shellmap

Remove duplicate lines from a file

Strip duplicate lines from a file, optionally preserving original order.

How to remove duplicate lines from a file in each shell

Bash (unix)
awk '!seen[$0]++' file.txt

`awk '!seen[$0]++'` keeps the FIRST occurrence of each line and PRESERVES the original order. `seen[$0]++` returns the count before incrementing, so `!seen[$0]++` is true only the first time a given line appears, and awk's default action prints it. `sort -u file.txt` is shorter but SORTS, which destroys input order. `uniq` on its own only removes ADJACENT duplicates, so it needs pre-sorted input; `sort file.txt | uniq` is equivalent to `sort -u file.txt`.
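To see the three behaviours side by side, here is a small sketch. The sample file `demo.txt` and its contents are made up for illustration.

```shell
# Hypothetical sample input: "b" and "a" each appear twice, never adjacently.
tmp=$(mktemp -d)
printf '%s\n' b a b c a > "$tmp/demo.txt"

# Order-preserving: the first occurrence of each line wins.
awk_out=$(awk '!seen[$0]++' "$tmp/demo.txt")

# Sorted: duplicates are removed, but input order is lost.
sort_out=$(sort -u "$tmp/demo.txt")

# uniq alone: only adjacent duplicates collapse, so nothing is removed here.
uniq_out=$(uniq "$tmp/demo.txt")

printf 'awk:\n%s\n\nsort -u:\n%s\n\nuniq:\n%s\n' "$awk_out" "$sort_out" "$uniq_out"
rm -rf "$tmp"
```

With this input, `awk` prints `b a c`, `sort -u` prints `a b c`, and plain `uniq` prints all five lines unchanged (one item per line in each case).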

Zsh (unix)
awk '!seen[$0]++' file.txt

Same external `awk`; the one-liner behaves identically under macOS's BSD `awk`. If the lines are already in a zsh array, `print -l ${(u)lines[@]}` does the same natively: `(u)` keeps the first occurrence of each element and preserves order, and `-l` prints one element per line.

Fish (unix)
awk '!seen[$0]++' file.txt

Same external `awk`. A fish-native route exists, `cat file.txt | string collect | string split \n | path sort -u`, but `path sort` SORTS (input order is lost) and `awk` is far shorter and faster. To dedupe into a list variable: `set -l uniq (awk '!seen[$0]++' file.txt)`.

PowerShell (windows)
Get-Content file.txt | Select-Object -Unique

`Select-Object -Unique` preserves input order and keeps the first occurrence. Per Microsoft's documentation, `-Unique` on `Select-Object` is CASE-SENSITIVE (`Foo` and `foo` are both kept) in Windows PowerShell 5.1 and in pwsh 7; pwsh 7.4 added a `-CaseInsensitive` switch for collapsing them. `Sort-Object -Unique` is the opposite: case-insensitive by default, case-sensitive with `-CaseSensitive`, and it sorts, so order is lost. For an explicit order-preserving, case-sensitive filter that does not depend on cmdlet defaults, use a `HashSet` (a `@{}` hashtable will not do, because its keys are case-insensitive): `$seen=[System.Collections.Generic.HashSet[string]]::new(); Get-Content file.txt | Where-Object { $seen.Add($_) }`.

cmd.exe (windows)
powershell -Command "Get-Content file.txt | Select-Object -Unique"

cmd has no native order-preserving dedupe, so shell out to PowerShell as above. `sort /unique file.txt > out.txt` also works, but it SORTS the output (order is lost); the dedupe happens as part of the sort. Order-preserving dedupe on cmd without PowerShell is not practical in a one-liner; install Git for Windows to get `awk`.

Equivalents listed for Bash, Zsh, Fish, PowerShell, cmd.exe.

Gotchas & notes

  • **Order-preserving vs sorted, pick deliberately**: `awk '!seen[$0]++'` keeps original order (first-seen wins), which suits log files and command history. `sort -u` rearranges into sorted order, which suits IP lists and hostnames. They produce DIFFERENT outputs from the same input. `awk` is also typically faster than `sort -u` on large files: it is roughly O(N) with a hashtable, versus O(N log N) for the sort.
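  • **Memory footprint**: `awk '!seen[$0]++'` keeps every distinct line in RAM (the hashtable grows linearly with the number of unique lines). For a 100 GB log with 99% duplicates, that is about 1 GB of unique lines plus hashtable overhead. `sort -u` does an external merge sort that spills to temporary files, so it is slower but its memory use is bounded; it needs temp disk space roughly proportional to the input instead. For a 100 GB log with mostly-distinct lines, neither is happy. If uniqueness only matters on one column, extract it first (`cut -f1 file.txt | sort -u`) so far less data has to be held or sorted.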
  • **Case sensitivity**: `awk '!seen[$0]++'` is CASE-SENSITIVE: `Foo` and `foo` are different lines. For case-insensitive dedup use `awk '!seen[tolower($0)]++'`, which keeps the first-seen spelling. `sort -fu` (fold case) collapses case-equivalent lines. In PowerShell the two cmdlets have opposite defaults: `Select-Object -Unique` is case-sensitive (pwsh 7.4+ adds `-CaseInsensitive`), while `Sort-Object -Unique` is case-insensitive unless you add `-CaseSensitive`. Spell out the switch you want rather than relying on defaults.
  • **Trailing whitespace + line endings**: lines that differ only by a trailing `\r`, space, or tab, or by invisible Unicode characters (NBSP `\u00a0`, zero-width space `\u200b`), count as DISTINCT. Normalize before dedup: `awk '{sub(/[ \t\r]+$/,"")}!seen[$0]++' file.txt` strips trailing spaces, tabs, and `\r` (it prints the stripped lines, and leaves NBSP and zero-width characters alone). CRLF-vs-LF is a common cause of the "the file has duplicates but `sort -u` doesn't remove them" mystery.
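The case-folding and whitespace-normalizing variants from the notes above can be checked on a small made-up input. This is only a sketch: the file name `mixed.txt` and its contents are hypothetical, chosen to include one CRLF-terminated line, one line with trailing spaces, and one line that differs only in case.

```shell
tmp=$(mktemp -d)
# Hypothetical input: "foo" (LF), "foo" (CRLF), "foo" with trailing spaces, "Foo".
printf 'foo\nfoo\r\nfoo  \nFoo\n' > "$tmp/mixed.txt"

# Plain dedup: all four lines differ byte-wise, so all four survive.
plain_count=$(awk '!seen[$0]++' "$tmp/mixed.txt" | wc -l | tr -d ' ')

# Strip trailing spaces, tabs, and \r first: the three "foo" variants collapse.
normalized=$(awk '{sub(/[ \t\r]+$/,"")} !seen[$0]++' "$tmp/mixed.txt")

# Also fold case in the lookup key: only the first-seen spelling remains.
folded=$(awk '{sub(/[ \t\r]+$/,"")} !seen[tolower($0)]++' "$tmp/mixed.txt")

printf 'plain: %s lines\nnormalized:\n%s\nfolded:\n%s\n' "$plain_count" "$normalized" "$folded"
rm -rf "$tmp"
```

Here the plain one-liner keeps 4 lines, the normalized variant keeps `foo` and `Foo`, and the case-folded variant keeps only `foo`. Folding case only in the array key, rather than rewriting `$0`, means the output retains the original spelling of whichever line came first.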
