shellmap

Extract a substring by regex

Pull a substring out of a string or a stream of input using a regex — for parsing log lines, extracting an ID from a URL, scraping a version number, or tokenizing a config.

How to extract a substring by regex in each shell

Bash (Unix)
echo "v1.2.3-beta" | grep -oP '\d+\.\d+\.\d+'

`grep -o` prints ONLY the matched portion (not the whole line). `-P` enables Perl-compatible regex (PCRE) and exists only in GNU grep; the BSD grep that ships with macOS does not have it. For BSD or portable scripts, `grep -oE '[0-9]+\.[0-9]+\.[0-9]+'` works (POSIX ERE, which has no `\d` and no lookarounds; `-o` itself is a GNU/BSD extension rather than POSIX). To print only a capture group: `sed -nE 's/.*(v[0-9]+\.[0-9]+\.[0-9]+).*/\1/p'`. Bash built-in: `[[ "v1.2.3" =~ v([0-9]+\.[0-9]+\.[0-9]+) ]] && echo "${BASH_REMATCH[1]}"` is the fastest option inside a script because it forks no process.
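A minimal bash-only sketch of the `[[ =~ ]]` approach extended to every match on a line, roughly what `grep -o` does but without forking. The function name `all_versions` and the sample input are hypothetical; the loop strips everything up to and including each match, then searches the remainder again.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): print every vX.Y.Z version found in a string,
# using only [[ =~ ]] and BASH_REMATCH, so no grep/sed subprocess is started.
all_versions() {
  local rest=$1
  # Keep the regex in a variable and expand it unquoted, so bash treats it as a regex.
  local re='v([0-9]+\.[0-9]+\.[0-9]+)'
  while [[ $rest =~ $re ]]; do
    printf '%s\n' "${BASH_REMATCH[1]}"
    # Remove the shortest prefix ending in the whole match; the quotes make the
    # match text literal inside the ${var#pattern} glob.
    rest=${rest#*"${BASH_REMATCH[0]}"}
  done
}

all_versions "upgrade v1.2.3 -> v4.5.6-beta"   # prints 1.2.3 and 4.5.6 on separate lines
```

The loop relies on the regex never matching an empty string: with a pattern such as `[0-9]*`, an empty match would leave `rest` unchanged and the loop would never end.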

Zsh (Unix)
echo "v1.2.3-beta" | grep -oP '\d+\.\d+\.\d+'

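The same external `grep` command works. Zsh's built-in regex match: `[[ "v1.2.3" =~ v([0-9]+\.[0-9]+\.[0-9]+) ]] && echo "$match[1]"`. Zsh stores capture groups in the `$match` array (and the whole match in `$MATCH`) unless the `BASH_REMATCH` option is set, so `$match[1]` is the zsh counterpart of bash's `${BASH_REMATCH[1]}`. To filter an array of lines, use parameter expansion: `(${(M)lines:#*pattern*})` keeps only the matching elements (`*pattern*` is a glob, not a regex); combine it with `sed` for the actual extraction.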

Fish (Unix)
echo "v1.2.3-beta" | string match -r '\d+\.\d+\.\d+'

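Fish's `string` builtin provides native regex support (PCRE2). `string match -r` (regex mode) prints the full match followed by each capture group; `string match -rg` (`--groups-only`) prints only the capture groups, one per line. `string replace -r` does substitution. No fork, so it is fast. By default `string match -r` reports only the first match in each argument; add `-a`/`--all` to get every match. To print just the captured version: `string match -r --groups-only 'v(\d+\.\d+\.\d+)' "v1.2.3-beta"` returns `1.2.3`.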

PowerShell (Windows)
if ("v1.2.3-beta" -match 'v(\d+\.\d+\.\d+)') { $matches[1] }

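On a single string, the `-match` operator returns `$True`/`$False` AND populates the `$matches` automatic variable; on an array it filters the elements instead and does not set `$matches`. A failed match leaves the previous `$matches` unchanged, so check the result before reading it. `$matches[0]` is the full match; `$matches[1]`, `$matches[2]` are capture groups; `$matches['name']` holds named groups (`(?<version>...)`). `-match` is case-insensitive; use `-cmatch` for case-sensitive matching. For multiple matches in one string: `[regex]::Matches("v1.2.3 and v4.5.6", 'v(\d+\.\d+\.\d+)') | ForEach-Object { $_.Groups[1].Value }`. `[regex]` is the type accelerator for `System.Text.RegularExpressions.Regex`, which exposes the full .NET regex API.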

cmd.exe (Windows)
powershell -NoProfile -Command "if ('v1.2.3-beta' -match 'v(\d+\.\d+\.\d+)') { $matches[1] }"

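cmd's `findstr` supports a limited regex dialect but cannot extract a substring or capture group; it only prints whole matching lines. For substring extraction, shell out to PowerShell (`powershell`, or `pwsh` where PowerShell 7 is installed). `findstr /R /C:"v[0-9]*\.[0-9]*\.[0-9]*" file.txt` returns matching LINES only. `findstr` has no `+` quantifier, and `[0-9]*` also matches zero digits, so write `[0-9][0-9]*` when at least one digit is required.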

Equivalents listed for Bash, Zsh, Fish, PowerShell, cmd.exe.

Gotchas & notes

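  • POSIX BRE (basic regex, the `grep` default), POSIX ERE (`grep -E`), PCRE (`grep -P`, fish `string`) and PowerShell regex (the .NET engine, largely Perl-compatible) all differ subtly. The most portable choice is ERE limited to bracket expressions such as `[0-9]` (no `\d`), with no lookarounds, lazy quantifiers or backreferences (POSIX does not define backreferences for ERE). Named groups are a common trap: `(?P<name>...)` is Python syntax that PCRE and Perl also accept, but .NET/PowerShell expects `(?<name>...)` and POSIX regex has no named groups at all. For complex scripts, just pick one runtime (Python regex / pwsh regex / awk) and stick with it.
  • GREEDY vs LAZY matching is one of the most common regex bugs. `.*` is greedy: it matches as much as possible. Matching `<a href="A">link1</a><a href="B">link2</a>` against `<a href="(.*)">` captures `A">link1</a><a href="B`, because `.*` runs on to the last `">`. Use lazy `.*?` (PCRE, .NET and fish only; POSIX BRE/ERE in `grep -E`, `sed -E` and bash `=~` have no lazy quantifiers) or a negated character class such as `[^"]*`, which cannot run past the closing quote and works in every dialect. pwsh quantifiers are greedy by default too; it also supports `(?:...)` for non-capturing groups and `(?=...)` for lookaheads.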
  • For "extract many things from one line", capture groups are cleaner than chained greps. Bash: `[[ "$line" =~ ([0-9]+)\.([0-9]+)\.([0-9]+) ]] && major="${BASH_REMATCH[1]}" minor="${BASH_REMATCH[2]}" patch="${BASH_REMATCH[3]}"`. pwsh: `if ($line -match '(\d+)\.(\d+)\.(\d+)') { $major=$matches[1]; $minor=$matches[2]; $patch=$matches[3] }`. Both are far cleaner than three grep invocations.
  • When the input is structured data (CSV rows, JSON-ish keys), use a real parser instead of a relaxed regex: CSV → `awk -F,` (only when no field contains a quoted comma), `csvtool`, or pwsh `Import-Csv`; JSON → `jq`; YAML → `yq`. Regex on structured data is fragile once quoting and escaping edge cases appear.
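A small bash sketch of the greedy-capture bug from the notes above, using the same sample HTML. It contrasts `(.*)` with the negated class `([^"]*)` under bash's POSIX ERE `=~` (which has no lazy `.*?`), then lists every `href` value with `grep -oE` and `sed -E`. The variable names are illustrative only.

```bash
#!/usr/bin/env bash
html='<a href="A">link1</a><a href="B">link2</a>'

# Greedy: .* extends to the LAST '">' on the line, swallowing the first link.
greedy_re='<a href="(.*)">'
[[ $html =~ $greedy_re ]] && printf 'greedy: %s\n' "${BASH_REMATCH[1]}"

# Negated class: [^"]* cannot cross a double quote, so it stops at the first one.
# This is the portable fix, since POSIX ERE (bash =~, grep -E, sed -E) has no .*?
class_re='<a href="([^"]*)">'
[[ $html =~ $class_re ]] && printf 'class: %s\n' "${BASH_REMATCH[1]}"

# Every href value on the line, one per line (grep -o is a GNU/BSD extension).
printf '%s\n' "$html" | grep -oE 'href="[^"]*"' | sed -E 's/^href="(.*)"$/\1/'
```

The negated class is usually the better habit even where `.*?` exists, because it states exactly which characters a field may contain.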
