Core dplyr Verbs: filter, select, mutate, arrange

dplyr is the tidyverse's grammar of data manipulation: a consistent set of verbs that covers the most common data transformation tasks. Each verb is a function that takes a data frame as its first argument and returns a data frame, which makes the verbs composable with the pipe operator. The package works with in-memory data frames and, through dbplyr, translates the same code to SQL for database backends, so the skill transfers across environments.

filter() keeps the rows that satisfy one or more logical conditions. Column names are used directly, without quoting (non-standard evaluation). Conditions separated by commas are combined with AND; use the | operator for OR.

select() chooses columns by name, position, or helper functions such as starts_with(), ends_with(), contains(), matches(), and everything(). A minus sign in front of a column drops it, and the new_name = old_name syntax renames a column during selection.

mutate() adds new columns or modifies existing ones. Expressions are evaluated in the context of the data frame, so column names can be used directly, and a column created earlier in a mutate() call can be referenced by later expressions in the same call. transmute() is a variant that returns only the newly created columns; recent dplyr releases mark it as superseded in favor of mutate(.keep = "none"), although it still works.

arrange() reorders rows by one or more columns. Wrap a column in desc() to sort it in descending order.
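Several of the select() helpers and transmute() are described above but are easy to confuse in practice, so here is a minimal sketch of how they behave. The scores data frame and its column names are hypothetical and exist only for this illustration.

```r
library(dplyr)

# Hypothetical toy data, used only for this illustration
scores <- data.frame(
    student_id = 1:3,
    math_score = c(90, 75, 60),
    art_score  = c(70, 85, 95),
    home_room  = c("A", "B", "A")
)

# Columns whose names end with "_score"
score_cols <- select(scores, ends_with("_score"))

# Columns whose names contain the literal string "room"
room_cols <- select(scores, contains("room"))

# Columns whose names match a regular expression
regex_cols <- select(scores, matches("^(math|art)"))

# Move home_room to the front, then keep every other column in its original order
reordered <- select(scores, home_room, everything())

# Keep only the newly created column
avg_only <- transmute(scores, avg_score = (math_score + art_score) / 2)

names(score_cols)
names(reordered)
avg_only
```

Pairing a named column with everything() is a common way to reorder columns without listing all of them.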
Example
library(dplyr)

# Sample data
employees <- data.frame(
    id     = 1:8,
    name   = c("Alice","Bob","Carol","Dave","Eve","Frank","Gina","Hank"),
    dept   = c("HR","IT","IT","Finance","HR","IT","Finance","HR"),
    salary = c(55000,72000,48000,95000,61000,88000,102000,58000),
    years  = c(3, 7, 2, 12, 5, 9, 15, 4),
    active = c(TRUE,TRUE,FALSE,TRUE,TRUE,TRUE,TRUE,FALSE)
)

# filter — keep rows matching conditions
filter(employees, dept == "IT")
filter(employees, salary > 60000, active)            # comma = AND; a logical column needs no == TRUE
filter(employees, dept == "HR" | dept == "Finance")  # | = OR
filter(employees, between(salary, 50000, 80000))

# select — choose columns
select(employees, name, dept, salary)
select(employees, -id, -active)        # drop columns
select(employees, starts_with("s"))    # only salary matches here
select(employees, emp_name = name, department = dept, salary)

# mutate — add / modify columns
employees <- mutate(employees,
    bonus       = salary * 0.10,
    total_comp  = salary + bonus,
    seniority   = case_when(
        years <  3 ~ "junior",
        years <  8 ~ "mid",
        TRUE       ~ "senior"
    )
)

# arrange — sort rows
arrange(employees, salary)              # ascending
arrange(employees, desc(salary))        # descending
arrange(employees, dept, desc(salary))  # multi-column sort

# Chain the verbs with the native pipe (|> requires R 4.1 or later)
employees |>
    filter(active) |>
    select(name, dept, total_comp, seniority) |>
    arrange(dept, desc(total_comp))
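
Because every verb takes a data frame first and returns a data frame, a piped chain is just another way to write nested function calls. The sketch below compares the two forms. The staff data frame is hypothetical, and the native pipe assumes R 4.1 or later.

```r
library(dplyr)

# Hypothetical toy data, used only for this illustration
staff <- data.frame(
    name   = c("Ana", "Ben", "Cy", "Dee"),
    dept   = c("IT", "HR", "IT", "HR"),
    salary = c(70000, 50000, 65000, 58000)
)

# Nested form: evaluated inside-out (filter, then select, then arrange)
nested <- arrange(
    select(
        filter(staff, salary > 55000),
        name, dept, salary
    ),
    dept, desc(salary)
)

# Piped form: the same three steps, read top to bottom
piped <- staff |>
    filter(salary > 55000) |>
    select(name, dept, salary) |>
    arrange(dept, desc(salary))

# Both forms produce the same data frame
identical(nested, piped)
```

The piped form is usually easier to read and edit, since each step sits on its own line in the order it runs.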