shellmap

Dedupe lines while preserving order

Remove duplicate lines from input but KEEP the first occurrence of each in its original position. Useful for recency-ordered lists that must stay unique, `$PATH` cleanup, and history dedup.

How to dedupe lines while preserving order in each shell

Bash (unix)
awk '!seen[$0]++' file.txt

The CANONICAL one-liner. Mechanism: `seen[$0]++` returns the current count BEFORE incrementing, so on first sight it returns 0 (falsy → `!0` = 1, truthy → print); on any later sight it returns 1+ (truthy → `!1` = 0, falsy → skip). Single pass; memory grows with the number of DISTINCT lines (one hash entry each), not with total input size.
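To make the implicit parts visible, here is the same program written out long-hand as a small Bash function. `dedupe_first` is a hypothetical helper name used only for this sketch; it should behave the same as the one-liner on any POSIX awk.

```shell
#!/usr/bin/env bash
# dedupe_first: hypothetical name; long-hand equivalent of awk '!seen[$0]++'.
dedupe_first() {
  awk '
    {
      count = seen[$0]++   # POST-increment: count gets the OLD value (0 on first sight)
      if (count == 0)      # the same test that !seen[$0]++ performs
        print              # default action of a bare pattern: print the whole line
    }
  ' "$@"                   # reads the named files, or stdin when none are given
}

printf '%s\n' b a b c a | dedupe_first
# prints b, a, c (one per line): the later b and a are dropped
```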

Zsh (unix)
awk '!seen[$0]++' file.txt

Fish (unix)
awk '!seen[$0]++' file.txt
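PowerShell (windows)
Get-Content file.txt | Select-Object -Unique

PRESERVES ORDER (first occurrence wins). `Select-Object -Unique` compares CASE-SENSITIVELY by default in both Windows PowerShell 5.1 and pwsh; for case-insensitive dedup on pwsh 7.4+, add `-CaseInsensitive`. Hashtable alternative: `$seen = @{}; Get-Content file.txt | Where-Object { -not $seen.ContainsKey($_) -and ($seen[$_] = 1) }`, but `@{}` keys are case-INSENSITIVE, so this variant folds case. For a case-sensitive set: `$seen = [System.Collections.Generic.HashSet[string]]::new(); Get-Content file.txt | Where-Object { $seen.Add($_) }` (`Add` returns `$false` for a repeat).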

cmd.exe (windows)
powershell -NoProfile -Command "Get-Content file.txt | Select-Object -Unique"

cmd has no native order-preserving dedup. The `sort` command sorts (and DESTROYS ORDER); even `sort /unique` (Win10+) sorts before dropping duplicates. Shell out to PowerShell instead.

Equivalents listed for Bash, Zsh, Fish, PowerShell, cmd.exe.

Gotchas & notes

  • **`awk '!seen[$0]++'` is the canonical idiom**, widely quoted in Unix references and Q&A. The full expansion: `$0` is the whole line; `seen[$0]` is an associative-array lookup (auto-initialized to 0 / empty string on first access); `++` is POST-increment (returns the old value, then adds 1); `!` negates. So on first sight `!seen[$0]++` is `!0`, i.e. `1` (truthy; default action = print); on the second sight it is `!1`, i.e. `0` (falsy; no print). Memory: one entry per unique line in awk's internal hash, so usage scales with the number and length of DISTINCT lines rather than total input. `sort -u` instead spills to temporary files (slower, but memory stays bounded). For very large input with FEW unique lines: awk wins. For very large input with MANY unique lines where SORTED output is acceptable: `sort -u` wins.
  • **`sort -u` vs `sort | uniq` vs `uniq`**: `sort -u` sorts AND dedupes (no order preservation). `sort | uniq` gives the same result as `sort -u` in typical use but costs an extra process and a second pass over the sorted data. `uniq` ALONE only collapses ADJACENT duplicates, so on unsorted input it misses non-adjacent dups. NONE of these removes every duplicate while preserving original order; that's the job of `awk '!seen[$0]++'`.
  • **Case-folding while deduping**: `awk '!seen[tolower($0)]++'` lowercases the LOOKUP KEY but prints the ORIGINAL line, so "Apple" and "apple" count as the same and the FIRST one seen wins. `sort -fu` (`-f` fold case, `-u` unique) does the same case-insensitive dedup but DESTROYS ORDER. pwsh `Select-Object -Unique` is case-SENSITIVE by default; on pwsh 7.4+ add `-CaseInsensitive` to fold case.
  • **Normalizing-before-dedup pitfalls**: trailing whitespace, CRLF vs LF line endings, and NBSP / zero-width characters all make "logically identical" lines compare different. Pre-normalize: `awk '{gsub(/[[:space:]]+$/, ""); print}' | awk '!seen[$0]++'` (`[[:space:]]` includes `\r`, so this also strips CRLF endings). For `$PATH` dedup (the most common real-world use): `echo "$PATH" | tr ":" "\n" | awk '!seen[$0]++' | paste -sd: -` splits on `:`, dedupes preserving order, and rejoins with `:` (the trailing `-` makes BSD/macOS `paste` read stdin). This eliminates PATH bloat from repeated appends (e.g. re-sourcing a shell rc file) without changing the resolution order of the remaining entries.
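A quick way to see how `uniq`, `sort -u`, and the awk idiom differ is to feed them input whose duplicates are NOT adjacent. This Bash sketch uses made-up sample lines and pins `LC_ALL=C` so the `sort -u` order is deterministic across locales.

```shell
#!/usr/bin/env bash
# Sample input (illustrative only): no two NEIGHBOURING lines are equal.
input=$(printf '%s\n' pear Apple pear apple Apple)

# uniq only collapses adjacent repeats, so all five lines come back.
printf '%s\n' "$input" | uniq

# sort -u dedupes but emits sorted output: Apple, apple, pear.
printf '%s\n' "$input" | LC_ALL=C sort -u

# awk keeps the first occurrence in input order: pear, Apple, apple.
printf '%s\n' "$input" | awk '!seen[$0]++'

# Case-folded key, original spelling printed: pear, Apple.
printf '%s\n' "$input" | awk '!seen[tolower($0)]++'
```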
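When duplicates differ only in case or trailing junk, one option is to dedupe on a normalized KEY in a single awk pass while still printing each kept line verbatim. This is a sketch, not the only approach: `dedupe_folded` is a hypothetical name, and it strips only trailing spaces, tabs, and carriage returns (NBSP and zero-width characters are left alone).

```shell
#!/usr/bin/env bash
# dedupe_folded: hypothetical helper. Dedupes on a normalized key
# (trailing whitespace/CR removed, then lowercased) but prints the
# original line exactly as it first appeared.
dedupe_folded() {
  awk '
    {
      key = $0
      sub(/[ \t\r]+$/, "", key)   # drop trailing spaces, tabs, and CR (from CRLF files)
      key = tolower(key)          # compare case-insensitively
      if (!seen[key]++)
        print                     # print the original line, not the key
    }
  ' "$@"
}

printf 'Apple\napple \nbanana\nAPPLE\n' | dedupe_folded
# prints Apple, then banana
```

Unlike the two-stage pipeline in the bullet, which outputs the normalized text, this keeps the first spelling untouched; choose whichever output you actually need.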
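The `$PATH` pipeline can be wrapped in a reusable Bash function. `dedupe_path` is a hypothetical name; this sketch assumes the colon-separated list arrives as the first argument and prints the deduplicated list.

```shell
#!/usr/bin/env bash
# dedupe_path: hypothetical helper. Keeps the first occurrence of each
# colon-separated entry in its original position.
dedupe_path() {
  # printf sidesteps echo quirks; tr splits on ':'; awk dedupes in order;
  # "paste -sd: -" rejoins with ':' (the "-" means stdin, needed by BSD paste).
  printf '%s\n' "$1" | tr ':' '\n' | awk '!seen[$0]++' | paste -sd: -
}

dedupe_path '/usr/bin:/bin:/usr/bin:/opt/x/bin:/bin'
# prints /usr/bin:/bin:/opt/x/bin
```

To apply it to the current shell: `PATH=$(dedupe_path "$PATH")`. Empty entries (which mean the current directory) are treated as ordinary values, so repeated empty entries collapse to one.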
