Dedupe lines while preserving order

Remove duplicate lines from input but KEEP the first occurrence in its original position. Typical uses: recency-ordered lists, `$PATH` cleanup, and shell history dedup.

How to dedupe lines while preserving order in each shell

Bash (unix)

awk '!seen[$0]++' file.txt

The CANONICAL one-liner. Mechanism: `seen[$0]++` returns the current count BEFORE incrementing, so on first sight it returns 0 (falsy → `!0` is 1, truthy → print); on later sights it returns 1 or more (truthy → negation is 0, falsy → skip). Single pass; memory is one hash entry per UNIQUE line.
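
As a rough illustration of the mechanism described above, the sketch below runs the idiom next to a spelled-out long form on a small, made-up input. The variable names (`sample`, `short`, `long`) and the sample values are assumptions for the example only.

```shell
# Hypothetical sample input, one value per line: b a b c a
sample=$(printf '%s\n' b a b c a)

# Idiom: the pattern is true only the first time a line is seen.
short=$(printf '%s\n' "$sample" | awk '!seen[$0]++')

# Long form of the same logic, spelled out step by step.
long=$(printf '%s\n' "$sample" | awk '{
  if (!($0 in seen)) print   # first sighting: emit the line
  seen[$0] = 1               # remember it for later sightings
}')

printf '%s\n' "$short"
```

Both forms should produce `b`, `a`, `c` on separate lines: the second `b` and second `a` are dropped, and the survivors keep their original relative order.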

Zsh (unix)

awk '!seen[$0]++' file.txt

Fish (unix)

awk '!seen[$0]++' file.txt

PowerShell (windows)

Get-Content file.txt | Select-Object -Unique

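PRESERVES ORDER (the first occurrence wins). `Select-Object -Unique` compares strings CASE-SENSITIVELY by default, both in Windows PowerShell 5.1 and in pwsh 6+. For case-insensitive dedup, pwsh 7.4+ adds `Select-Object -Unique -CaseInsensitive`; `Select-Object` has no `-CaseSensitive` switch. Alternative: `$seen = @{}; Get-Content file.txt | Where-Object { -not $seen.ContainsKey($_) -and ($seen[$_] = 1) }`. Note that `@{}` keys are case-INSENSITIVE, so this variant treats "Apple" and "apple" as duplicates.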

cmd.exe (windows)

powershell -NoProfile -Command "Get-Content file.txt | Select-Object -Unique"

cmd has no native order-preserving dedup. The `sort` command sorts (and DESTROYS ORDER); even `sort /unique` (Win10+) sorts first. Shell out to PowerShell instead.

Equivalents listed for Bash, Zsh, Fish, PowerShell, cmd.exe.

Gotchas & notes

**`awk '!seen[$0]++'` is the canonical idiom**, a long-standing staple of awk one-liner collections. The full expansion: `$0` is the whole line; `seen[$0]` is an associative-array lookup (auto-initialized to 0 / empty string on first access); `++` is POST-increment (returns the old value, then adds 1); `!` negates. So: first sight `!seen[$0]++` is `!0` is `1` (truthy, default action = print); second sight `!seen[$0]++` is `!1` is `0` (falsy, no print). Memory: one entry per unique line, in awk's internal hash, so usage grows with the number and length of UNIQUE lines, not with total input size. `sort -u` spills to temp files (slower, but bounded memory). For very large input with FEW unique lines: awk wins. For very large input with MANY unique lines where the original order does not matter: `sort -u` wins.
**`sort -u` vs `sort | uniq` vs `uniq`**: `sort -u` sorts AND dedupes (no order preservation). `sort | uniq` gives the same result as `sort -u` but costs an extra process and a second pass over the data. `uniq` ALONE only dedupes ADJACENT duplicates, so `uniq` on unsorted input misses non-adjacent dups. NONE of these preserve original order; that's the job of `awk '!seen[$0]++'`.
**Case-folding while deduping**: `awk '!seen[tolower($0)]++'` lowercases the LOOKUP KEY but prints the ORIGINAL line, so "Apple" and "apple" count as the same and the FIRST one seen wins. `sort -fu` (`-f` fold case, `-u` unique) does the same case-insensitive dedup but DESTROYS ORDER. pwsh `Select-Object -Unique` is case-SENSITIVE by default; pwsh 7.4+ accepts `-CaseInsensitive` for a folded comparison.
**Normalizing-before-dedup pitfalls**: trailing whitespace, CRLF vs LF line endings, NBSP / zero-width characters all make "logically identical" lines compare different. Pre-normalize: `awk '{gsub(/[[:space:]]+$/, ""); print}' file.txt | awk '!seen[$0]++'`. This strips trailing spaces, tabs, and the CR of CRLF, but NOT NBSP or zero-width characters, which need their own substitution. For `$PATH` dedup (the most common real-world use): `echo "$PATH" | tr ":" "\n" | awk '!seen[$0]++' | paste -sd: -` splits on `:`, dedupes preserving order, and rejoins with `:`. The trailing `-` tells `paste` to read stdin, which BSD/macOS `paste` requires. This eliminates "PATH bloat from idempotent appends" without changing the resolution order of remaining entries.
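
The notes above can be tried side by side with a small sketch like the following. The sample values, the variable names, and the `dedupe_path` helper are all hypothetical, chosen only for illustration. The character class `[ \t\r]` is used in place of `[[:space:]]` on the assumption that some older awk builds lack POSIX character classes.

```shell
# Hypothetical sample input, one value per line: b a b c a
sample=$(printf '%s\n' b a b c a)

# uniq alone: no ADJACENT duplicates here, so nothing is removed.
adjacent_only=$(printf '%s\n' "$sample" | uniq)

# sort -u: deduped, but the original order is gone (C locale for stable output).
sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort -u)

# awk: deduped, first occurrences stay in place.
ordered=$(printf '%s\n' "$sample" | awk '!seen[$0]++')

# Case-folded key: "apple" is dropped because "Apple" came first.
folded=$(printf '%s\n' Apple apple Fig | awk '!seen[tolower($0)]++')

# Strip trailing spaces, tabs, and CR before comparing.
normalized=$(printf 'x \r\nx\ny\t\ny\n' \
  | awk '{ sub(/[ \t\r]+$/, ""); print }' \
  | awk '!seen[$0]++')

# dedupe_path is a hypothetical helper name, not a standard command.
dedupe_path() {
  printf '%s\n' "$1" | tr ':' '\n' | awk '!seen[$0]++' | paste -sd: -
}

dedupe_path "/usr/bin:/bin:/usr/bin:/opt/bin:/bin"
```

With this input, `dedupe_path` should print `/usr/bin:/bin:/opt/bin`: each directory keeps the position of its first occurrence, so lookup order among the remaining entries is unchanged. Passing the value as an argument rather than reading `$PATH` directly keeps the helper easy to test before wiring it into a shell profile.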
