Skip to content
shellmap

Parse a URL into its scheme, host, port, path, and query

Extract individual pieces of a URL (scheme, userinfo, host, port, path, query, fragment) — for log parsing, building tooling that rewrites endpoints, splitting connection strings, or generating routing tables from a manifest.

How to parse a url into its scheme, host, port, path, and query in each shell

Bashunix
python3 -c "from urllib.parse import urlparse; u=urlparse(\"https://user:[email protected]:8080/api/v2?key=value#section\"); print(u.scheme, u.hostname, u.port, u.path, u.query)"

Python `urlparse` is the only truly safe POSIX-portable URL parser available in standard tooling. The pure-bash alternative — `${url#*://}` parameter expansion + nested expansions — works for simple cases but FAILS on IPv6 hostnames (`[::1]:8080`), userinfo with `@` in the password, percent-encoded path segments, or query strings with `=` in values. For one-shot extraction `python3` is consistent across distros; for higher-throughput parsing, write a Python script not a chain of `sed`/`awk`.

Zshunix
python3 -c "from urllib.parse import urlparse; u=urlparse(\"https://user:[email protected]:8080/api/v2?key=value#section\"); print(u.scheme, u.hostname, u.port, u.path, u.query)"
Fishunix
python3 -c "from urllib.parse import urlparse; u=urlparse('https://user:[email protected]:8080/api/v2?key=value#section'); print(u.scheme, u.hostname, u.port, u.path, u.query)"
PowerShellwindows
$u = [System.Uri]"https://user:[email protected]:8080/api/v2?key=value#section"; $u | Format-List Scheme, Host, Port, AbsolutePath, Query, Fragment, UserInfo

`[System.Uri]` is the canonical .NET parser — RFC 3986 compliant, handles IPv6 (`[::1]` brackets), IDN (Unicode hostnames), userinfo, percent-encoded everything. Properties: `Scheme`, `Host`, `Port` (returns -1 if default port), `AbsolutePath`, `Query` (starts with `?`), `Fragment` (starts with `#`), `UserInfo`. The `.Port` returns the IMPLICIT default if the URL didn't specify one (so `https://example.com` → `.Port = 443`); use `.IsDefaultPort` to check whether the port was explicit.

cmd.exewindows
powershell -NoProfile -Command "[System.Uri]'https://user:[email protected]:8080/api/v2?key=value' | Format-List"

No native URL parser in cmd. Shell to pwsh for `[System.Uri]`. Crude cmd hack: `for /f "tokens=1,2 delims=://" %a in ("%URL%") do @echo scheme=%a host=%b` — works for the simplest `scheme://host` only, breaks on EVERY edge case (port, path, userinfo, IPv6). Don't roll-your-own.

Equivalents listed for Bash, Zsh, Fish, PowerShell, cmd.exe.

Gotchas & notes

  • **Pure-bash parsing — the regex-trap minefield**. Common pattern: `proto="${url%%://*}"; rest="${url#*://}"; host="${rest%%/*}"; path="${rest#*/}"`. This breaks on: (a) IPv6 hostnames `https://[::1]:8080/foo` — `rest%%/*` returns `[::1]:8080` correctly but extracting just the host (without port) needs another pass; (b) URLs without paths `https://example.com` — `rest#*/` returns `https://example.com` (no slash to strip); (c) userinfo with `@` in password `https://user:p@[email protected]/` — `rest%%@*` greedy-matches wrong; (d) percent-encoded delimiters `https://example.com/path%2Fwith%2Fslash` — the encoded `/` shouldn't split. Three failure modes any of which silently produce wrong data. Use Python (always available) or `[System.Uri]` (pwsh) — never roll-your-own regex for URLs in production.
  • **Query string parsing — different problem, different tool**. Splitting a query into `key=value` pairs needs URL-decoding of EACH side: `?a=hello%20world&b=1+2` → `{a: "hello world", b: "1 2"}` (note `+` decodes to space ONLY in `application/x-www-form-urlencoded`, NOT in the path). Python: `from urllib.parse import parse_qs; parse_qs(u.query)` returns `{"a": ["hello world"], "b": ["1 2"]}` (always lists — same key can repeat). pwsh: `[System.Web.HttpUtility]::ParseQueryString($u.Query)` (requires `Add-Type -AssemblyName System.Web` on pwsh 5.1; built-in on pwsh 7). jq for JSON contexts: `echo "$json" | jq -r ".url | capture(\".*?\\\\?(?<q>.*)\") .q"` — clunky, prefer Python.
  • **IDN / punycode — when the hostname is Unicode**. `https://日本.jp` is a valid URL with an IDN hostname. Browsers display it in Unicode but speak ASCII (`xn--wgv71a.jp`) on the wire. `[System.Uri]` decodes `.IdnHost` (punycode form `xn--...`) vs `.Host` (Unicode form). Python: `import idna; idna.encode("日本.jp")` → `b'xn--wgv71a.jp'`. Crucial when: (a) generating Host headers manually (must use ASCII punycode), (b) cert SAN matching (cert lists ASCII form), (c) DNS queries (DNS protocol is ASCII). The `python3 -c "from urllib.parse import urlparse"` form transparently produces the punycode in `.hostname` since Python 3.6 — so the basic recipe above already handles it correctly.
  • **RFC 3986 corner cases that break naive parsers**: (a) `mailto:[email protected]` — scheme is `mailto`, NO `//` authority section, the "path" is the email address; (b) `tel:+1-555-1234` — same scheme-with-no-authority pattern; (c) `data:text/plain;base64,SGVsbG8=` — `data:` URLs encode payload inline; (d) `file:///path/to/file` — three slashes (empty authority + absolute path); (e) `https://example.com:` — colon with empty port (rare but legal) — `[System.Uri]` returns `Port=-1`, python `urlparse` returns `port=None`. ALWAYS test the parser against RFC 3986 examples + the WHATWG URL Standard test suite if the URL stream is user-input.
  • **Reconstructing a URL after editing one piece** — composing back is the inverse problem and equally trap-laden. Python: `from urllib.parse import urlunparse, urlencode; urlunparse((scheme, netloc, path, params, query, fragment))` — `netloc` is `user:pass@host:port` already-assembled (don't pass them separately). pwsh: `[System.UriBuilder]::new("https", "example.com", 8443, "/api", "?k=v")` then `.Uri.ToString()`. Common mistake: string-concatenating `"https://" + host + ":" + port + path` — breaks for IPv6 (needs `[]`), produces double-slash if `path` is empty or already starts with `/`, doesn't escape special chars in path. `UriBuilder` handles all of this correctly.

Related commands

Related tasks