Uses quick-xml to parse and re-indent XML bodies, similar to how JSON responses are already formatted. Both buffered and streaming paths are supported. The xml.format and xml.indent format options are now accepted (previously rejected as unsupported). Default indent is 4 spaces, matching JSON. Whitespace-only text nodes are stripped during formatting so that already-indented XML gets re-indented cleanly without doubling up. Mixed content whitespace is preserved. Invalid XML falls back to syntax-highlight-only (no crash). Closes ducaale#231
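The whitespace rule described above can be illustrated with a minimal sketch. The `Event` enum and `strip_whitespace_only` below are hypothetical stand-ins written for this illustration; they are not quick-xml's types and not the code in this PR. The idea is that a text node made up only of whitespace is dropped, while text that merely starts or ends with a space (mixed content) is kept untouched.

```rust
/// Simplified stand-in for a parser's event stream.
/// Hypothetical: this is not quick-xml's event type.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Start(String),
    End(String),
    Text(String),
}

/// Drop text events that consist only of whitespace. Everything else is
/// kept as-is, including text with leading or trailing spaces.
fn strip_whitespace_only(events: Vec<Event>) -> Vec<Event> {
    events
        .into_iter()
        .filter(|ev| match ev {
            Event::Text(t) => !t.chars().all(char::is_whitespace),
            _ => true,
        })
        .collect()
}
```

With this rule, the indentation text between `<p>` and its first child disappears, but the trailing space in `Hello ` before `<b>` survives, so `<p>Hello <b>world</b></p>` keeps its meaning.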
@ducaale both comments addressed. Please let me know if you want these squashed into one commit.
blyxxyz
left a comment
Very clean implementation, thank you!
The code looks good, haven't tested it yet.
- add xml format options to --format-options help text
- log xml format errors with log::debug!() before fallback
- handle Error::Io and Interrupted in format_xml
- remove trailing blank line in tests/cli.rs
3ff0244 to
7e3f0d9
Compare
blyxxyz
left a comment
Thanks for bearing with me. This time I tested it with a few different documents, particularly https://github.com/shlomif/perl-XML-LibXML/blob/master/example/test.xhtml. It works great!
The formatting isn't identical to HTTPie though:
Original XML
</p></dd><dt><strong><a name="item_toString">toString</a></strong></dt><dd><p><strong>toString</strong> is a deparsing function, so the DOM Tree can be translated into a string,
ready for output. The optional <strong>$format</strong> parameter sets the indenting of the output. This parameter is expected to
be an <em>integer</em> value, that specifies the number of linebreaks for each node. For more
information about the formatted output check the documentation of <em>xmlDocDumpFormatMemory</em> in <em>libxml2/tree.h</em> .
</p>
xh output
</dd>
<dt>
<strong>
<a name="item_toString">toString</a>
</strong>
</dt>
<dd>
<p>
<strong>toString</strong> is a deparsing function, so the DOM Tree can be translated into a string,
ready for output. The optional <strong>$format</strong> parameter sets the indenting of the output. This parameter is expected to
be an <em>integer</em> value, that specifies the number of linebreaks for each node. For more
information about the formatted output check the documentation of <em>xmlDocDumpFormatMemory</em> in <em>libxml2/tree.h</em> .
</p>
HTTPie output
</dd>
<dt>
<strong>
<a name="item_toString">toString</a>
</strong>
</dt>
<dd>
<p>
<strong>toString</strong>
is a deparsing function, so the DOM Tree can be translated into a string,
ready for output. The optional
<strong>$format</strong>
parameter sets the indenting of the output. This parameter is expected to
be an
<em>integer</em>
value, that specifies the number of linebreaks for each node. For more
information about the formatted output check the documentation of
<em>xmlDocDumpFormatMemory</em>
in
<em>libxml2/tree.h</em>
.
</p>
</dd>
HTTPie changes the whitespace more. Each tag gets its own line even if it's nested between text nodes, and leading and trailing whitespace are removed.
It might be nice to do that too but it doesn't look like quick-xml has an easy way to do it? trim_text(true) produces results like this, which is IMO worse:
<p>As an equivalent of<strong>createElement</strong>, but it creates a<strong>Text Node</strong>bound to the DOM.</p>
So I'm fine with merging this.
quick-xml gives you events one at a time (Text, Start, Text, End...). It doesn't give you "here's a mixed-content parent with all its children." You'd have to buffer children of each element, detect mixed content, then re-emit with custom splitting logic.
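To make the buffering requirement concrete, here is a rough sketch of the detection step only. `Node` and `has_mixed_content` are hypothetical names invented for this illustration, not part of quick-xml or this PR. It assumes the children of an element have already been collected into a list, which is exactly the buffering a purely event-based formatter avoids.

```rust
/// Hypothetical, simplified child node of an already-buffered element.
#[derive(Debug)]
enum Node {
    Text(String),
    Element { name: String, children: Vec<Node> },
}

/// An element has mixed content when it holds at least one child element
/// and at least one text child that is not whitespace-only.
fn has_mixed_content(children: &[Node]) -> bool {
    let has_element = children
        .iter()
        .any(|c| matches!(c, Node::Element { .. }));
    let has_text = children
        .iter()
        .any(|c| matches!(c, Node::Text(t) if !t.trim().is_empty()));
    has_element && has_text
}
```

The answer is only known once the closing tag of the parent has been seen, so nothing inside the parent could be written out before then. That is what makes HTTPie-style splitting awkward on top of a streaming event reader.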
Co-authored-by: Jan Verbeek <jan.verbeek@posteo.nl>
Closes #231
Uses quick-xml to parse and re-indent XML response bodies. Works for both buffered and streaming (--stream) output.
quick-xml 0.38 is already in the dependency tree (via plist/syntect), so this doesn't add a new crate to the lockfile.
How it works
Reads XML events with trim_text(false) to preserve meaningful whitespace (e.g. <p>Hello <b>world</b></p>), but skips whitespace-only text nodes so that already-indented input gets re-indented without doubling up.
For the streaming+color path, a small HighlightingWriter adapter bridges quick-xml's event-based output with syntect's line-based highlighting. quick-xml writes formatted bytes and calls flush() after each event, and the adapter highlights and forwards them to the terminal on each flush.
Invalid XML falls back to syntax-highlight-only, same as how invalid JSON is handled.
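The flush-driven adapter idea can be sketched with the standard library alone. `FlushAdapter` below is a hypothetical illustration, not the HighlightingWriter from this PR: it assumes the producer calls `flush()` at sensible boundaries, and it uses a plain closure as a stand-in for the highlighter.

```rust
use std::io::{self, Write};

/// Hypothetical adapter: collects bytes until `flush()`, then hands the
/// pending chunk to a transform (a stand-in for a highlighter) and
/// forwards the result to the inner writer.
struct FlushAdapter<W: Write, F: FnMut(&str) -> String> {
    inner: W,
    pending: Vec<u8>,
    transform: F,
}

impl<W: Write, F: FnMut(&str) -> String> FlushAdapter<W, F> {
    fn new(inner: W, transform: F) -> Self {
        Self { inner, pending: Vec::new(), transform }
    }
}

impl<W: Write, F: FnMut(&str) -> String> Write for FlushAdapter<W, F> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Only buffer here; nothing reaches the inner writer yet.
        self.pending.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        // Lossy conversion keeps the sketch simple. A chunk that splits a
        // multi-byte character would need more careful handling.
        let text = String::from_utf8_lossy(&self.pending).into_owned();
        self.pending.clear();
        let out = (self.transform)(&text);
        self.inner.write_all(out.as_bytes())?;
        self.inner.flush()
    }
}
```

Because the transform only runs on flush, the writer on the producing side needs no knowledge of highlighting, and output still reaches the terminal incrementally rather than after the whole body has been read.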
Format options
- --format-options=xml.format:false disables formatting
- --format-options=xml.indent:2 changes indent (default 4, matching JSON)
These were already recognized by the CLI but rejected as "Unsupported".
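For illustration only, here is one way the two options could be turned into settings. `XmlOptions` and `parse_xml_options` are hypothetical names and do not reflect xh's actual option parser. The comma-separated form for combining options, and formatting being enabled by default, are assumptions of this sketch; the default indent of 4 comes from the description above.

```rust
/// Hypothetical settings struct for this sketch.
#[derive(Debug, PartialEq)]
struct XmlOptions {
    format: bool,
    indent: usize,
}

impl Default for XmlOptions {
    fn default() -> Self {
        // Assumption: formatting is on by default. Indent 4 matches JSON.
        Self { format: true, indent: 4 }
    }
}

/// Parse a list such as "xml.format:false,xml.indent:2".
/// Unknown keys and malformed values are reported as errors.
fn parse_xml_options(spec: &str) -> Result<XmlOptions, String> {
    let mut opts = XmlOptions::default();
    for item in spec.split(',').filter(|s| !s.is_empty()) {
        let (key, value) = item
            .split_once(':')
            .ok_or_else(|| format!("missing ':' in {item:?}"))?;
        match key {
            "xml.format" => {
                opts.format = value
                    .parse::<bool>()
                    .map_err(|_| format!("invalid bool {value:?}"))?;
            }
            "xml.indent" => {
                opts.indent = value
                    .parse::<usize>()
                    .map_err(|_| format!("invalid number {value:?}"))?;
            }
            other => return Err(format!("unsupported option {other:?}")),
        }
    }
    Ok(opts)
}
```

An empty string yields the defaults, so callers that pass no XML options get 4-space formatting without any special casing.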
Not included
HTML pretty-printing. HTML needs error recovery, void elements, implicit closing. Fundamentally different from XML. Could be a separate PR if there's interest.