shellmap

Pretty-print an XML file

Indent and format an XML document for human reading.

How to pretty-print an XML file in each shell

Bash (unix)
xmllint --format data.xml

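`xmllint` ships with libxml2 (almost universal on Linux and macOS). `--format` reindents the document, 2 spaces per level by default; the `XMLLINT_INDENT` environment variable changes the indent string. `--noblanks` drops ignorable whitespace while parsing (useful when the input carries stray or inconsistent indentation). `xmlstarlet fo data.xml` is an alternative with more knobs: `--indent-tab` / `--indent-spaces 4` / `--encode utf-8`. Both tools fail on input that is not well-formed, so formatting doubles as a well-formedness check; `xmllint --noout data.xml` is the check-only idiom. Neither validates against a DTD or schema unless asked to (for `xmllint`: `--valid`, `--schema`, `--relaxng`).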
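
Because `xmllint --format` writes to stdout, redirecting straight back onto the input file would truncate it before it is read. A minimal sketch of an in-place variant follows, assuming `xmllint` and `mktemp` are on the PATH; `pretty_xml_inplace` is a hypothetical helper name, not part of libxml2.

```shell
# pretty_xml_inplace FILE (hypothetical helper, not an xmllint feature):
# reformat FILE in place, leaving it untouched if it is not well-formed.
pretty_xml_inplace() {
  local src=$1 tmp
  tmp=$(mktemp) || return 1
  # xmllint exits non-zero when the document fails to parse.
  if xmllint --format "$src" > "$tmp"; then
    cat "$tmp" > "$src"   # overwrite the contents, keep the file's permissions
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Usage: `pretty_xml_inplace data.xml`. The formatted output goes to a temporary file first, so a parse error never clobbers the original.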

Zsh (unix)
xmllint --format data.xml

Same external `xmllint` as in Bash. macOS ships libxml2 (`/usr/bin/xmllint`). Run `brew install xmlstarlet` for the more user-friendly CLI. To format piped input: `cat data.xml | xmllint --format -` (the `-` reads stdin).

Fish (unix)
xmllint --format data.xml

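Same external `xmllint`. To capture the output in a Fish variable: `set -l xml (xmllint --format data.xml | string collect)`. `string collect` keeps the multi-line output as one string instead of splitting it into a list on newlines.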

PowerShell (windows)
$x = [xml](Get-Content data.xml -Raw); $sw = New-Object IO.StringWriter; $w = New-Object Xml.XmlTextWriter($sw); $w.Formatting='Indented'; $x.WriteContentTo($w); $sw.ToString()

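The `[xml]` cast loads the file into an `XmlDocument`, and the `XmlTextWriter` chain writes it back out with indentation. Shorter via XDocument: `[System.Xml.Linq.XDocument]::Parse((Get-Content data.xml -Raw)).ToString()` is a single expression that returns indented XML (note that `XDocument.ToString()` omits the XML declaration). Both indent by 2 spaces by default. For the `XmlTextWriter` chain, change that with `$w.Indentation` and `$w.IndentChar`; `XmlWriterSettings.IndentChars` applies when the writer is created with `XmlWriter.Create` instead. Both approaches preserve namespaces correctly (string-replace approaches don't).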

cmd.exe (windows)
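powershell -Command "Add-Type -AssemblyName System.Xml.Linq; [System.Xml.Linq.XDocument]::Parse((Get-Content data.xml -Raw)).ToString()"

cmd has no XML formatter. Shell out to Windows PowerShell (`powershell.exe`, built into Windows 10 and later; `pwsh` is the separately installed PowerShell 7). The `Add-Type` call loads `System.Xml.Linq`, which Windows PowerShell 5.1 may not have loaded by default; it is harmless when the assembly is already loaded. Legacy `msxsl.exe` (deprecated but downloadable) does XSLT-based formatting, which is overkill for pretty-printing. `type data.xml` just prints the file as-is, with no formatting.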

Equivalents listed for Bash, Zsh, Fish, PowerShell, cmd.exe.

Gotchas & notes

  • **XML declaration + encoding**: a file starting with `<?xml version="1.0" encoding="UTF-8"?>` should be parsed as UTF-8 even if the OS default differs. `xmllint --format` honors the declaration. The pwsh `[xml](Get-Content -Raw)` pattern instead decodes the file with `Get-Content`'s encoding before the XML parser ever sees it, so pass `-Encoding utf8` on Windows PowerShell 5.1, where BOM-less files default to the system ANSI code page (commonly Windows-1252). A mismatched encoding turns non-ASCII characters into mojibake, while ASCII-only documents parse fine, so the bug surfaces only on i18n content.
  • **Namespace preservation**: pretty-printing must preserve `xmlns:foo` declarations and prefix bindings. `xmllint --format` is correct (it parses and re-serializes via libxml2's namespace-aware writer). String-replace approaches (`sed` to insert newlines) can BREAK namespaces because they edit markup without parsing it. In pwsh, an `[xml]` document's `OuterXml` re-serializes via .NET's namespace-aware writer, which is also correct (though `OuterXml` itself is not indented). Run `xmllint --noout` on the result if you suspect breakage; it reports malformed markup and undefined namespace prefixes.
  • **Whitespace-sensitive XML (`xml:space="preserve"`)**: some documents (markup embedded in XML, preformatted text such as program source, SVG `<text>`) depend on exact whitespace. Reformatting can change it: adding or dropping indentation alters character content that the consuming application treats as significant. For these documents, do NOT pretty-print with `--format`; or use `xmllint --pretty 2`, which the man page describes as adding whitespace inside tags while preserving content.
  • **Streaming vs DOM-load for huge XML**: `xmllint --format` loads the WHOLE document into RAM (libxml2 tree), so a multi-gigabyte file can exhaust memory and get `xmllint` OOM-killed. `xmllint --stream` keeps memory bounded, but the man page documents it for validating large files (with `--valid` or `--relaxng`), not as a formatting mode, so do not rely on `--stream --format` to emit reindented output. The pwsh `[xml]` cast also DOM-loads; for huge XML, pair `XmlReader` (forward-only, streaming) with an indenting `XmlWriter` in a per-node copy loop: more code, but roughly constant memory.
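
The whitespace-sensitivity note above can be turned into a guard that refuses to reindent risky files. The sketch below is a text-level heuristic only: it greps for the attribute rather than parsing the document, and `format_unless_preserve` is a hypothetical helper name.

```shell
# format_unless_preserve FILE (hypothetical helper): print FILE reformatted,
# or print it unchanged when it declares xml:space="preserve" anywhere.
format_unless_preserve() {
  # Heuristic text match; it does not parse XML, so it can also fire on
  # the same string inside a comment or CDATA section.
  if grep -Eq "xml:space[[:space:]]*=[[:space:]]*[\"']preserve[\"']" "$1"; then
    echo "skipping reformat: $1 declares xml:space=\"preserve\"" >&2
    cat "$1"
  else
    xmllint --format "$1"
  fi
}
```

Erring on the side of skipping is deliberate here: an unformatted file is merely harder to read, while a reformatted whitespace-sensitive file may no longer mean the same thing.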
