Window Functions, Rowwise Operations, and tidyr

Window functions compute a value for each row based on a window of related rows — similar to SQL window functions. In dplyr, they are used inside mutate() after group_by(). Common window functions include row_number() for rank, rank(), min_rank(), dense_rank(), percent_rank() for various ranking schemes, cumsum(), cummean(), cummax(), cummin() for cumulative statistics, and lag() / lead() for shifting values within a group. These are indispensable for calculating running totals, percentage-of-group, and period-over-period changes. The rowwise() function changes the unit of computation from the column to the row, allowing you to apply functions that are not vectorised across rows. Combined with c_across(), you can compute row-wise statistics like a row mean or the maximum across a selection of columns. For the same task on a data frame without dplyr, apply(df, 1, fun) works but rowwise() integrates naturally with the pipe workflow. tidyr complements dplyr by providing tools to reshape and tidy data. pivot_longer() converts multiple columns into two columns (name and value), transforming wide format into long format. pivot_wider() does the reverse. separate() splits a single character column into multiple columns on a delimiter. unite() merges multiple columns into one. fill() propagates a non-NA value forward or backward to fill gaps. These functions together with dplyr cover the vast majority of real-world data wrangling tasks.

Example

library(dplyr)
library(tidyr)

sales <- data.frame(
    month  = rep(c("Jan","Feb","Mar","Apr"), times = 2),
    region = rep(c("North","South"), each = 4),
    revenue = c(120,135,110,150, 90,95,105,115)
)

# Window functions inside mutate after group_by
sales |>
    group_by(region) |>
    mutate(
        rank_rev    = row_number(desc(revenue)),
        pct_of_max  = revenue / max(revenue),
        cumulative  = cumsum(revenue),
        mom_change  = revenue - lag(revenue, default = first(revenue))
    ) |>
    ungroup()

# rowwise() for row-level computation
scores <- data.frame(
    student = c("Alice","Bob","Carol"),
    math    = c(88, 72, 95),
    english = c(91, 85, 78),
    science = c(79, 90, 88)
)
scores |>
    rowwise() |>
    mutate(
        avg_score = mean(c_across(math:science)),
        max_score = max(c_across(math:science))
    ) |>
    ungroup()

# pivot_longer — wide to long
long_sales <- sales |>
    rename(rev = revenue) |>
    pivot_longer(cols = rev, names_to = "metric", values_to = "value")

# More typical pivot_longer: multiple measure columns
scores_long <- scores |>
    pivot_longer(cols      = -student,
                 names_to  = "subject",
                 values_to = "score")
scores_long

# pivot_wider — long back to wide
scores_long |>
    pivot_wider(names_from  = subject,
                values_from = score)

# separate and unite
data.frame(date_range = c("2024-01-01/2024-03-31",
                           "2024-04-01/2024-06-30")) |>
    separate(date_range, into = c("start", "end"), sep = "/")

scores |>
    unite("id", student, sep = "_") |>
    head(2)

HTML

Full Editor

Related Resources

R Reference

Complete tag & property list

R How-To Guides

Step-by-step practical guides

R Exercises

Practice what you've learned