XML is designed as a human-readable markup language, but only if it is pretty formatted. If you open an XML file, you may fail to understand the content, especially for a large XML file. However, you can print an XML on Bash or Python to make it easily human-readable. This comprehensive guide presents the different ways to print XML in Linux Bash and Python pretty.

Why Pretty Print XML Files?

When working with XML data, having a pretty printed version makes it easier for humans to read and understand the hierarchical structure. Some key benefits of pretty printing XML files include:

  • Improved readability: Properly indented XML with line breaks is much easier to visually parse for humans compared to minified XML.

  • Easier editing: Pretty printed XML can be easily edited by hand in a text editor if needed since the formatting makes the content structure clear.

  • Debugging assistance: When debugging XML processing code, having a formatted XML file simplifies spotting issues quickly.

  • Code sharing: Pretty XML code is easier to share and discuss with others compared to minified XML.

So while XML parsers can handle minified XML just fine, having a pretty printed version offers significant advantages for direct human consumption.

How to Pretty Print XML in Python

Python has great XML support through various built-in libraries. When working with XML data in Python, you may need to pretty print an XML string or file to inspect the contents. Here are a few different ways to accomplish this.

Using xml.dom.minidom

The xml.dom.minidom module provides a simple way to parse XML into a DOM object and pretty print it back as XML.

Here‘s an example:

import xml.dom.minidom

xml_string = ‘‘‘
<books>
  <book>
    <title>Data Science for Beginners</title>
    <author>Mark Newman</author>
    <pages>250</pages>
  </book>
  <book>
    <title>Automate the Boring Stuff</title>
    <author>Uma Student</author>
    <pages>425</pages>
  </book>
</books>‘‘‘

dom = xml.dom.minidom.parseString(xml_string)
pretty_xml_string = dom.toprettyxml()

print(pretty_xml_string)

This parses the XML string into a DOM object, pretty prints it using the toprettyxml() method, and prints the output.

The key things to note are:

  • Import xml.dom.minidom to work with the DOM APIs
  • Use parseString() to parse an XML string into a DOM
  • Call toprettyxml() on the DOM object to pretty print back to an XML string

This prints the XML with proper indentation and line breaks:

<?xml version="1.0" ?>
<books>
    <book>
        <title>Data Science for Beginners</title>
        <author>Mark Newman</author>
        <pages>250</pages>
    </book>
    <book>
        <title>Automate the Boring Stuff</title>
        <author>Uma Student</author>
        <pages>425</pages>
    </book>
</books>

We can also load and pretty print an external XML file using parse():

dom = xml.dom.minidom.parse(‘books.xml‘)
print(dom.toprettyxml()) 

Using lxml

The lxml module is another popular XML handling library for Python. It provides an lxml.etree API for XML manipulation along with tools like prettify() to pretty print XML.

Here is an lxml example:

from lxml import etree

xml_string = ‘‘‘
<books>
  <book>
    <title>Data Science for Beginners</title>
    <author>Mark Newman</author>
    <pages>250</pages>
  </book>
  <book>  
    <title>Automate the Boring Stuff</title>
    <author>Uma Student</author>
    <pages>425</pages>
  </book>
</books>‘‘‘

root = etree.fromstring(xml_string)
print(etree.tostring(root, pretty_print=True).decode())

The key differences from minidom are:

  • Import lxml.etree for the XML APIs
  • Use fromstring() to parse an XML string into an etree object
  • Call tostring() on the root node with pretty_print=True to pretty print the XML
  • Decode the bytes output to get a string

This prints the pretty formatted XML:

<books>
  <book>
    <title>Data Science for Beginners</title>
    <author>Mark Newman</author>
    <pages>250</pages>
  </book>
  <book>
    <title>Automate the Boring Stuff</title>
    <author>Uma Student</author>
    <pages>425</pages>
  </book>
</books>

We could also use parse() to load and parse an external XML file.

So lxml provides more power and flexibility in working with XML while also allowing easy pretty printing.

Using xmlformatter

For pretty printing XML without needing to parse it, we can use the xmlformatter package.

The xmlformatter module works directly on XML strings and handles indentation, line breaks, and other formatting automatically:

import xmlformatter

xml_string = ‘‘‘<?xml version="1.0" encoding="UTF-8"?>
<books><book id="1"><author>Mark Newman</author><title>Data Science for Beginners</title></book><book id="2"><author>Uma Student</author><title>Automate the Boring Stuff</title></book></books>‘‘‘

print(xmlformatter.format_xml(xml_string))

This pretty prints the XML string directly without needing to parse it into a DOM or etree object first:

<?xml version="1.0" encoding="UTF-8"?>
<books>
  <book id="1">
    <author>Mark Newman</author>
    <title>Data Science for Beginners</title>
  </book>
  <book id="2">
    <author>Uma Student</author>
    <title>Automate the Boring Stuff</title>
  </book>
</books>

The xmlformatter module handles all the indentation, line breaks, and formatting out of the box. This provides a simple way to pretty print XML strings directly.

How to Pretty Print XML in Linux Bash

In addition to Python, Linux Bash also makes it easy to format XML files on the command line for improved readability.

Here are a few different options for pretty printing XML using Bash tools.

Using xmllint

The xmllint command line tool is included with the libxml2-utils package on most Linux distributions.

To pretty print an XML file with proper indentation:

xmllint --format input.xml

For example:

xmllint --format books.xml

This formats the XML file and prints the output to stdout.

We can also write the output to a file:

xmllint --format books.xml > formatted_books.xml

Now formatted_books.xml will contain the pretty printed XML.

Using xmlstarlet

The xmlstarlet tool is another popular choice for command line XML manipulation on Linux.

To pretty print XML files with xmlstarlet:

xmlstarlet format input.xml

For example:

xmlstarlet format books.xml 

This indents the XML properly and prints it.

We can also use output redirection just like with xmllint:

xmlstarlet format books.xml > formatted_books.xml

So xmlstarlet provides another handy pretty printing option if working directly with XML files from the Linux command line.

Using the Python xml.dom.minidom

Since Python also runs great on Linux, we can reuse the Python example from earlier for pretty printing XML files:

python3 -c "import xml.dom.minidom; dom = xml.dom.minidom.parse(‘books.xml‘); print(dom.toprettyxml())"

Here we directly invoke the Python interpreter and use the xml.dom.minidom module to load, pretty print, and output an XML file in a single one-liner command.

This allows reusing Python‘s powerful XML abilities directly from Bash.

When to Pretty Print XML?

Here are some common use cases where having a pretty printed XML version is useful:

  • Viewing XML config files: Pretty print system config files like Apache .conf files for readability.

  • Debugging XML data: Format XML datasets, feeds, or responses to inspect contents.

  • Generating sample XML: Creating formatted XML files to use as demo data.

  • Improving XML editor experience: Manually editing pretty XML in IDEs or text editors.

  • Print debugging: Include pretty printed XML in stack traces to simplify debugging.

  • Code sharing: Formatted XML is easier to share and discuss with coworkers.

So while XML parsing works fine with minified XML, having an indented and formatted XML version makes many developer tasks simpler.

Conclusion

This guide has covered multiple techniques to pretty print XML using both Python and Linux Bash tools. The key takeaways are:

  • In Python, use xml.dom.minidom, lxml, or xmlformatter for pretty printing.
  • In Linux Bash, use xmllint or xmlstarlet for formatting XML files.
  • Pretty printing improves XML readability and editing for developers.

Being able to generate properly indented XML with line breaks ensures your XML data remains human-friendly. Both Python and Bash offer great options to keep your XML pretty.

Similar Posts