
Cutting, Sorting, and Deduplication with cut, sort, and uniq

The `cut` command extracts specific columns or character ranges from each line of input. With `-f` it selects fields (using the delimiter set by `-d`), and with `-c` it selects character positions. It is a fast, simple alternative to `awk` when you need fixed columns from delimited text.

`sort` arranges lines lexically by default. The `-n` flag enables numeric sorting, `-r` reverses the order, `-k` selects the field to sort by, and `-t` sets the field delimiter. The `-u` flag removes duplicates in the same pass. Sorting is frequently applied before `uniq` to consolidate and count repeated lines in log files.

`uniq` collapses consecutive duplicate lines, optionally showing counts (`-c`), only duplicated lines (`-d`), or only unique lines (`-u`). Because `uniq` compares only adjacent lines, the input must usually be sorted first. The combination `sort | uniq -c | sort -rn` produces a frequency table, which is invaluable for log analysis: finding the most common IP addresses, URLs, or error messages.
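
For a quick feel of that pattern, here is a minimal sketch that fabricates three requests with printf instead of reading a real log file:

printf 'GET /a\nGET /b\nGET /a\n' | sort | uniq -c | sort -rn
#   2 GET /a
#   1 GET /b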
Example
# ---- cut examples ----

# Extract field 1 (username) from /etc/passwd (: delimiter)
cut -d: -f1 /etc/passwd

# Extract fields 1 and 7 (username and shell)
cut -d: -f1,7 /etc/passwd

# Extract characters 1-10 from each line
cut -c1-10 /var/log/syslog

# Extract from character 15 to end of line
cut -c15- /var/log/syslog
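
# A hedged extra: pull a range of fields (2 through 4) from comma-separated data
# (data.csv is a hypothetical placeholder, not a file from this article)
cut -d, -f2-4 data.csv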

# ---- sort examples ----

# Alphabetical sort
sort /etc/passwd

# Numeric sort (field 3 = UID), colon delimiter
sort -t: -k3 -n /etc/passwd

# Reverse sort (largest UID first)
sort -t: -k3 -rn /etc/passwd

# Sort by size (first column of du -sh output), human-readable
du -sh /var/log/* | sort -h

# Remove duplicate lines during sort
sort -u words.txt

# Human-readable sort (1K, 10M, 2G): order filesystems by size (column 2)
df -h | sort -k2 -h
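
# A hedged extra: sort on two keys -- shell (field 7), then username (field 1)
sort -t: -k7,7 -k1,1 /etc/passwd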

# ---- uniq examples ----

# Count occurrences of each line
sort access.log | uniq -c

# Sort by frequency (most common first)
sort access.log | uniq -c | sort -rn | head -20

# Show only lines that appear more than once
sort file.txt | uniq -d

# Show only lines that appear exactly once
sort file.txt | uniq -u
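
# A hedged extra (GNU coreutils): fold case while sorting, then count
# duplicates case-insensitively with uniq's -i flag
sort -f file.txt | uniq -ci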

# Full pipeline: top 10 IPs hitting your web server
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
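
# A hedged variant: top HTTP status codes, assuming nginx's default
# "combined" log format, where the status code is field 9
awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn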