shellmap

Extract email addresses from text

Pull every email address out of a log file or block of text.

How to extract email addresses from text in each shell

Bash (Unix)
grep -Eo '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' file.txt

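`grep -E` enables extended regex, so `+` and `{}` need no escaping. `-o` (only-matching) prints just each match rather than the whole line. With several files, add `-h` to drop the `file:` prefix.

The `\.[A-Za-z]{2,}` tail requires a dot followed by a TLD of 2+ letters, which keeps trailing punctuation out of the match:
- `foo@bar.com.` yields `foo@bar.com`.
- `foo@bar.` yields nothing.

For UNIQUE emails, pipe through `sort -u`. For lowercase normalization, pipe through `tr A-Z a-z` before `sort -u`, so that `Alice@Example.com` and `alice@example.com` count once. `-i` is unnecessary here because the character classes already accept both cases.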
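The following is a minimal Bash sketch (it also runs in Zsh) that chains those steps: extract, lowercase, then de-duplicate. `extract_emails` is a hypothetical helper name, not a standard command. The sample lines are made up to exercise the trailing-dot, port, and line-break cases.

```bash
#!/usr/bin/env bash
# Sketch only: extract_emails is a hypothetical helper, not a standard tool.
extract_emails() {
  # -E: extended regex; -o: one match per output line;
  # -h: no "file:" prefix when several files are given.
  # With no file arguments, grep reads stdin.
  grep -Eoh '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' "$@" |
    tr '[:upper:]' '[:lower:]' |   # normalize case BEFORE de-duplicating
    sort -u                        # sorted, one line per distinct address
}

# Made-up sample input covering the edge cases discussed on this page.
printf '%s\n' \
  'Contact Alice@Example.com or alice@example.com.' \
  'Mail bob@example. now' \
  'Relay carol@mail.example.org:25' \
  'mailto:dave@' \
  'example.com' |
  extract_emails
# alice@example.com
# carol@mail.example.org
```

Order matters in this pipeline. Lowercasing before `sort -u` is what merges the two spellings of Alice's address into one line. The address split across the last two lines is not matched, because `grep` works line by line.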

Zsh (Unix)
grep -Eo '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' file.txt

Same external `grep`. macOS ships BSD `grep`, which supports `-E` and `-o` identically. PCRE features such as `\d` and lookaheads need `grep -P`. GNU `grep` provides `-P`; BSD `grep` does NOT. For PCRE on macOS, run `brew install grep` and then use `ggrep -P`.
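The fallback can be scripted by probing for a `grep` that accepts `-P`. The sketch below tries the system `grep` first and then Homebrew's `ggrep`. `pcre_grep` is a hypothetical helper name. The probe relies on grep's documented exit codes (1 = no lines selected, 2 = error), and it assumes `ggrep` is on `PATH` after `brew install grep`.

```bash
#!/usr/bin/env bash
# Sketch only: pcre_grep is a hypothetical helper, not a standard tool.
# Prints the name of the first grep binary that accepts -P.
pcre_grep() {
  local candidate
  for candidate in grep ggrep; do
    # On empty input, a working `grep -P` exits 1 ("no match").
    # An unknown -P option or a build without PCRE exits 2;
    # a missing binary makes the shell return 127.
    "$candidate" -P 'x' </dev/null >/dev/null 2>&1
    if [ "$?" -eq 1 ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

if g=$(pcre_grep); then
  # \d and the (?=...) lookahead are PCRE syntax, not POSIX ERE.
  printf 'order 42 shipped\n' | "$g" -Po '\d+(?= shipped)'   # prints 42
else
  echo 'no PCRE-capable grep found; try: brew install grep' >&2
fi
```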

Fish (Unix)
grep -Eo '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' file.txt

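Same external `grep`. The Fish-native alternative is `string match -ar '[\w.%+-]+@[\w.-]+\.[A-Za-z]{2,}' < file.txt`. Here `-a` prints all matches on each line and `-r` enables regex. It is a builtin, so no `grep` process is forked. It uses PCRE2 syntax, not POSIX-extended, which is why `\w` works.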

PowerShell (Windows)
Select-String -Path file.txt -Pattern '[\w.%+-]+@[\w.-]+\.[A-Za-z]{2,}' -AllMatches | ForEach-Object { $_.Matches.Value }

PowerShell regex is .NET. It supports `\w` (word characters, including underscore and non-ASCII letters), `\d`, and lookarounds. `Select-String` matches line by line and by default reports only the first hit on each line; `-AllMatches` returns every hit. `$_.Matches.Value` extracts the matched substrings. Matching is case-INsensitive by default, the opposite of grep. Add the `-CaseSensitive` switch to change that.

cmd.exe (Windows)
findstr /R "[A-Za-z0-9._+%-][A-Za-z0-9._+%-]*@[A-Za-z0-9.-][A-Za-z0-9.-]*\.[A-Za-z][A-Za-z]*" file.txt

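`findstr /R` uses a LIMITED regex flavor with no `+`, `?`, `{N,M}`, or shorthand classes like `\w`. The `[A-Za-z]+` idiom must be expanded to `[A-Za-z][A-Za-z]*` (one char, then zero or more).

`findstr` also has no `-o` equivalent. It prints every matching LINE, so it tells you which lines contain an address rather than extracting the addresses themselves. Inside a `.bat` file, write each `%` in the pattern as `%%`.

For real extraction on cmd, shell out to PowerShell: `powershell -Command "(Select-String -Path file.txt -Pattern '[\w.+-]+@[\w.-]+\.[A-Za-z]{2,}' -AllMatches).Matches.Value"`.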

Equivalents listed for Bash, Zsh, Fish, PowerShell, cmd.exe.

Gotchas & notes

  • **RFC 5322 email regexes are notoriously long**: the full grammar admits quoted local-parts (`"weird user"@example.com`), comments, and IP-literal domains (`user@[192.168.1.1]`). The internationalization extensions add non-ASCII addresses such as `użytkownik@example.pl`. The 99% practical regex `[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}` covers typical Gmail / Outlook / corporate addresses but rejects some valid ones. Pick the right precision for the use case: log mining tolerates false negatives, while address validation needs a library, not a regex.
  • **TLD length minimum**: the `\.[A-Za-z]{2,}` tail keeps matches from ending in punctuation. `Mail alice@example. now` yields no match rather than `alice@example.`. Requiring 2+ letters costs nothing in practice, since no single-letter top-level domains currently exist. Because `[` and numeric TLDs are not allowed, the regex also misses IP-literal domains such as `user@[1.2.3.4]`. For a domain with a port (`alice@mail.example.com:25`), the match stops correctly at the `:` because `:` isn't in the TLD class.
  • **Avoid newline-spanning matches**: none of the character classes include a newline, and `grep`, `string match`, and `Select-String` all read input line by line. An address broken across lines (`mailto:alice@` followed by `example.com` on the next line) is therefore never matched. That's the desired behavior here; a match spanning two lines would be a bug. To search the whole file as one string instead, use `Get-Content -Raw file.txt | Select-String -Pattern ... -AllMatches`. The classes still exclude newlines, so the matches stay within single lines.
  • **Privacy + GDPR: emails are personal data.** Extracting emails from logs creates a personal-data export, which is a regulated activity under GDPR / CCPA. Retain the smallest set you need, store it in a controlled location, and delete it on a schedule. Don't commit the output to git. Don't paste it into a third-party web regex tester such as regex101 either. Saving or sharing a pattern there can store your test string on the tester's servers, so that obvious-looking testing step leaks the data you just extracted.
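One way to act on "retain the smallest set you need" is to keep only per-domain counts instead of the addresses. This assumes counts are enough to answer your question. The sketch below reduces the extracted addresses to distinct-address counts per domain and never keeps the local parts. `email_domain_counts` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: email_domain_counts is a hypothetical helper, not a standard tool.
email_domain_counts() {
  grep -Eoh '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' "$@" |
    tr '[:upper:]' '[:lower:]' |
    sort -u |          # one line per distinct address
    sed 's/.*@//' |    # drop the local part, keep only the domain
    sort | uniq -c |   # count distinct addresses per domain
    sort -rn           # most common domain first
}

# x.com is counted twice (a@ and b@; A@X.com folds into a@x.com), y.org once.
printf '%s\n' 'a@x.com A@X.com b@x.com' 'c@y.org' | email_domain_counts
```

If the result must go to disk, run `umask 077` before the redirect so newly created files are readable only by you.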
