Count words in a file
Count the number of whitespace-separated words in a file or piped text.
How to count words in a file in each shell
Bash: `wc -w file.txt`
`wc` stands for word count. Flags: `-w` words, `-l` lines, `-c` BYTES (not chars), `-m` CHARS (locale-aware, multi-byte), `-L` longest-line length (not in POSIX, so not available everywhere). `wc file.txt` with no flag prints three counts: `lines words bytes filename`. From stdin: `cat file.txt | wc -w` works, but `wc -w < file.txt` avoids the extra `cat` process and leaves the filename out of the output.
Zsh: `wc -w file.txt`
Same external `wc`. macOS ships BSD `wc`, which accepts the same `-w`/`-l`/`-c`/`-m` flags as GNU `wc`, with the same meanings. zsh has no built-in word counter; the external GNU/BSD `wc` is the canonical answer.
Fish: `wc -w file.txt`
Same external `wc`. To capture the count in a fish variable: `set -l words (wc -w < file.txt | string trim)`. The `string trim` strips leading whitespace (some `wc` implementations, such as BSD `wc` on macOS, pad their output with spaces for alignment).
PowerShell: `(Get-Content file.txt | Measure-Object -Word).Words`
`Measure-Object -Word` counts whitespace-separated tokens in each input string. Use `-Line` for lines and `-Character` for chars. Note that `Measure-Object -Word` works on STRING input: `Get-Content` reads the file as an array of strings (one per line), while `Get-Content -Raw` reads it as one big string; both give the same word count. For the BYTE count, use `(Get-Item file.txt).Length`.
cmd.exe: `powershell -Command "(Get-Content file.txt | Measure-Object -Word).Words"`
cmd has no native word counter, so shell out to Windows PowerShell (`powershell.exe`, included with Windows 10+). The closest built-in, `find /c /v "" file.txt`, counts LINES (not words). For a bare line count: `type file.txt | find /c /v ""`.
Equivalents listed for Bash, Zsh, Fish, PowerShell, cmd.exe.
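For the POSIX-style shells (Bash, Zsh), the `wc` usage above can be sketched as a small script. The sample text and the `count_words` helper name are hypothetical, introduced only for illustration; `mktemp` is assumed to be available, as it is on typical Linux and macOS systems.

```shell
# Illustrative sketch: sample text and the count_words helper are made up.

# Create a throwaway sample file: 2 lines, 5 words, 28 bytes.
sample=$(mktemp)
printf 'hello brave new\nworld again\n' > "$sample"

# count_words FILE: print only the word count.
# Reading from stdin keeps the filename out of wc's output;
# tr removes the padding (and newline) that some wc builds emit.
count_words() {
    wc -w < "$1" | tr -d '[:space:]'
}

words=$(count_words "$sample")
lines=$(wc -l < "$sample" | tr -d '[:space:]')
bytes=$(wc -c < "$sample" | tr -d '[:space:]')

echo "words=$words lines=$lines bytes=$bytes"   # prints: words=5 lines=2 bytes=28

rm -f "$sample"
```

Stripping whitespace with `tr` plays the same role here as `string trim` in the Fish entry: the captured value is a bare number regardless of which `wc` produced it.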
Gotchas & notes
- **`wc -w` definition of "word"**: a maximal run of non-whitespace characters. Whitespace = space, tab, newline, vertical-tab, form-feed, carriage-return (per `isspace(3)`). Punctuation INSIDE a word counts as part of it: `it's` is one word; `hello,world` (no space after comma) is one word. To split on punctuation too: `tr -c 'a-zA-Z' '\n' < file.txt | grep -c .`, which replaces every non-alphabetic character with a newline and then counts the non-empty lines. This also splits on apostrophes and digits, so `it's` becomes two words.
- **`-c` is BYTES, `-m` is CHARS**: a 100-byte UTF-8 file made of 50 two-byte characters has `wc -c = 100`, `wc -m = 50`. For "characters" in the user-facing sense, ALWAYS use `-m`, and make sure a UTF-8 locale is active (`LC_ALL=en_US.UTF-8` or similar); otherwise `-m` may fall back to byte-counting. `LC_ALL=C wc -m file.txt` is equivalent to `wc -c`.
- **pwsh `Measure-Object -Word` quirks**: counts whitespace-separated tokens per INPUT STRING. `Get-Content file.txt` reads the file as an array of strings (one per line), and `Measure-Object -Word` then sums tokens across all lines. The result is identical to `wc -w` for ASCII text. For UTF-8 multi-byte text, pwsh works on decoded .NET strings with Unicode-aware whitespace handling, so multi-byte text counts cleanly once the file has been decoded (no locale dependency, unlike GNU `wc`).
- **Cross-OS gotcha (line endings)**: a CRLF-terminated file (Windows-authored text read on Unix) has `\r\n` line endings. GNU and BSD `wc -w` both treat `\r` as whitespace, as does pwsh `Measure-Object -Word`, so the word count is unaffected. The BYTE count (`-c`), however, is N bytes larger for N lines because of the extra `\r` per line. For the exact file size in bytes, use `(Get-Item file.txt).Length` (pwsh) or `stat -c%s file.txt` (Linux; `stat -f%z file.txt` on macOS/BSD).
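
Two of the notes above, the definition of a word and the effect of CRLF line endings, can be checked with a short sketch. The sample strings are made up for illustration, and the text is piped from `printf` so that no files are needed.

```shell
# Illustrative sketch: the sample strings are made up for this example.

# 1. Punctuation does not split words: "it's" and "hello,world" count as one word each.
plain=$(printf "it's hello,world\n" | wc -w | tr -d '[:space:]')

# Splitting on every non-letter instead yields: it, s, hello, world.
split=$(printf "it's hello,world\n" | tr -c 'a-zA-Z' '\n' | grep -c .)

# 2. CRLF adds one byte per line but leaves the word count unchanged.
lf_bytes=$(printf 'one two\nthree\n' | wc -c | tr -d '[:space:]')
crlf_bytes=$(printf 'one two\r\nthree\r\n' | wc -c | tr -d '[:space:]')
lf_words=$(printf 'one two\nthree\n' | wc -w | tr -d '[:space:]')
crlf_words=$(printf 'one two\r\nthree\r\n' | wc -w | tr -d '[:space:]')

echo "plain=$plain split=$split"                   # prints: plain=2 split=4
echo "LF:   bytes=$lf_bytes words=$lf_words"       # prints: LF:   bytes=14 words=3
echo "CRLF: bytes=$crlf_bytes words=$crlf_words"   # prints: CRLF: bytes=16 words=3
```

The `-m` versus `-c` difference is left out of the sketch on purpose, because its result depends on the active locale.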