• Members 1 post 0 knowledge points
    July 23, 2025, 12:53 p.m.

    Hey everyone,

    I'm trying to learn dplyr and I'm stuck on what seems like a simple problem. I have a data frame sales_data with columns for product_category, region, and sales_total.

    
    # Sample Data
    sales_data <- data.frame(
      product_category = c("Electronics", "Clothing", "Electronics", "Books", "Clothing"),
      region = c("North", "North", "South", "South", "West"),
      sales_total = c(1200, 300, 800, 150, 450)
    )
    

    I want to calculate the total sales for each product_category. I thought this code would work, but it's just returning the same number of rows as my original data frame.

    
    library(dplyr)
    
    sales_data %>%
      group_by(product_category) %>%
      mutate(total_sales = sum(sales_total))
    

    What am I doing wrong? I expected to get one row for "Electronics", one for "Clothing", and one for "Books". Thanks in advance!

  • Members 1 post 0 knowledge points
    July 30, 2025, 1:24 p.m.

    Marked as best answer by July 30, 2025, 1:27 p.m.

    Here is another answer. It isn't the best one.

  • Members 1 post 0 knowledge points
    July 23, 2025, 1:26 p.m.

    That's a very common mix-up when you're starting out. You're looking for the summarise() function, not mutate().

    mutate() adds (or modifies) columns and always keeps every row of the data frame. When you combine it with group_by(), it computes the value once per group and then repeats that value on every row of that group. That is why you still get five rows back.
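    To see this concretely, here is a minimal sketch that reruns your mutate() pipeline on the sample data. The result keeps all five rows, and each row carries its category's total. The object name `with_totals` is just an illustrative name, and the trailing ungroup() only removes the grouping so the result prints as a plain tibble.

```r
library(dplyr)

sales_data <- data.frame(
  product_category = c("Electronics", "Clothing", "Electronics", "Books", "Clothing"),
  region = c("North", "North", "South", "South", "West"),
  sales_total = c(1200, 300, 800, 150, 450)
)

# group_by() + mutate(): the sum is computed per group and then written
# back onto every row of that group, so no rows are dropped
with_totals <- sales_data %>%
  group_by(product_category) %>%
  mutate(total_sales = sum(sales_total)) %>%
  ungroup()

print(with_totals)
```

    This per-row repetition is useful when you actually want it, for example to compute each sale's share of its category total with sales_total / total_sales.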

    summarise() (or its alias summarize()) collapses each group into a single row, keeping the grouping column plus the summary columns you define.
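    Because every expression inside summarise() reduces its group to one value, you can compute several summaries at once and still get exactly one row per category. A minimal sketch on the same sample data; the column names `n_sales` and `avg_sale` are illustrative choices, not anything dplyr requires:

```r
library(dplyr)

sales_data <- data.frame(
  product_category = c("Electronics", "Clothing", "Electronics", "Books", "Clothing"),
  region = c("North", "North", "South", "South", "West"),
  sales_total = c(1200, 300, 800, 150, 450)
)

# Each summary collapses its group to a single value,
# so the result has one row per product_category
category_summary <- sales_data %>%
  group_by(product_category) %>%
  summarise(
    total_sales = sum(sales_total),  # total per category
    n_sales = n(),                   # number of rows in the group
    avg_sale = mean(sales_total)     # mean sale per category
  )

print(category_summary)
```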

    This should give you the output you want:

    
    sales_data %>%
      group_by(product_category) %>%
      summarise(total_sales = sum(sales_total))
    
    # # A tibble: 3 × 2
    #   product_category total_sales
    #   <chr>                  <dbl>
    # 1 Books                    150
    # 2 Clothing                 750
    # 3 Electronics             2000
    
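    For reference, dplyr also has two shorter ways to get the same totals. This is a sketch assuming dplyr 1.1.0 or later, which is when the `.by` argument was added to summarise(). One difference worth knowing: group_by() sorts the groups, while `.by` keeps them in the order they first appear in the data. count() with a `wt` column sums that column per group, and `name` sets the output column name (the default would be "n").

```r
library(dplyr)

sales_data <- data.frame(
  product_category = c("Electronics", "Clothing", "Electronics", "Books", "Clothing"),
  region = c("North", "North", "South", "South", "West"),
  sales_total = c(1200, 300, 800, 150, 450)
)

# Alternative 1: per-operation grouping with .by (dplyr >= 1.1.0).
# Groups stay in order of first appearance and the result is not left grouped.
totals_by <- sales_data %>%
  summarise(total_sales = sum(sales_total), .by = product_category)

# Alternative 2: a weighted count, i.e. sum(sales_total) per category,
# returned in sorted category order
totals_count <- sales_data %>%
  count(product_category, wt = sales_total, name = "total_sales")

print(totals_by)
print(totals_count)
```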

    Hope that helps!