Modify XML Files with Python: A Practical, Repeatable Workflow

XML shows up in places you least expect. One week it is an Android manifest, the next it is a legacy integration that still speaks SOAP, and then you are asked to patch a CI config stored as XML because a vendor tool only accepts that format. I have seen teams lose hours hand-editing tags, only to ship a file that looks correct but fails validation because a namespace or encoding slipped. When I modify XML, I want repeatable steps: parse safely, locate the exact node, apply a focused change, and write the file back in a predictable form. That process should be boring, not risky.

If you are here to change attributes, add new nodes, or update text without breaking the document, you are in the right place. I will walk you through how I do it with Python in 2026, focusing on xml.etree.ElementTree for most tasks, and calling out when I switch to other tools. You will get runnable examples, real-world pitfalls, and a clear mental model for text and structure so you can modify XML with confidence.

Why XML Still Shows Up in 2026

I meet XML in three common places. First, config and build systems: Maven POM files, Android manifests, some enterprise CI systems, and deployment descriptors still rely on XML because the ecosystems around them do. Second, document formats like Office Open XML and SVG are XML under the hood. Third, integrations that cannot easily move to JSON because of strict schemas, signatures, or compliance requirements still use XML. The point is not whether XML is trendy; the point is that you will keep touching it, and mistakes are expensive.

I treat XML as a tree first, text second. That mental shift matters because modifying XML is rarely about concatenating strings. You are editing a structure with ordering, attributes, and nested children. When you get that right, the output is clean, stable, and easy to test. When you get it wrong, you end up with missing namespaces, overwritten text, or empty tags that fail validation.

In modern Python work, I also care about the surrounding workflow. I run scripts in a small virtual environment, I keep changes deterministic so diffs stay clean, and I use type hints so refactors do not create silent errors. I also lean on AI-assisted tooling for query drafts or quick XPath ideas, then verify those against the XML tree in code. The machine helps, but I still validate by running the script and reviewing the output.

Mental Model: Elements, Attributes, Text, and Tail

The xml.etree.ElementTree module represents XML as a tree. The tree has two core concepts: ElementTree, which represents the document, and Element, which represents a node. I keep this model in my head every time I modify a file:

element.tag is the name of the node, like country or neighbor.
element.attrib is a dictionary of attributes.
element.text is the text inside the element before any child elements.
element.tail is the text that appears after the element but before the next sibling.
Child elements are stored as a list-like sequence you can iterate.

The last two fields, text and tail, are the tricky parts. If you only ever edit XML that is either simple or pretty printed, you might not notice them. But in mixed-content documents, tail can hold whitespace or actual data, and careless edits can move or delete it. I always inspect it when I am working with documents that include inline formatting or significant whitespace.

When I explain this to teammates, I use a simple analogy: elements are folders, attributes are labels on the folders, text is a sticky note inside the folder, and tail is a sticky note stuck to the outside of the folder before the next folder. It sounds silly, but it makes tail memorable, and that prevents subtle bugs.
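A quick way to make the model concrete is to parse a tiny mixed-content snippet and print each field. The snippet below is illustrative. The last two lines show the classic tail pitfall: removing an element silently takes its tail text with it.

```python
import xml.etree.ElementTree as ET

# Illustrative mixed-content snippet: text, one child, and tail text.
note = ET.fromstring('<note lang="en">Call <b>Ana</b> before noon</note>')
bold = note[0]  # children are indexable like a list

print(note.tag)          # note
print(note.attrib)       # {'lang': 'en'}
print(repr(note.text))   # 'Call '
print(repr(bold.text))   # 'Ana'
print(repr(bold.tail))   # ' before noon' (stored on <b>, not on <note>)

# Pitfall: removing <b> also removes its tail text.
note.remove(bold)
print(ET.tostring(note, encoding='unicode'))  # <note lang="en">Call </note>
```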

Parsing and Navigating Without Surprises

I parse XML in two common ways: from a file on disk or from a string. For file work, I keep ElementTree around because it has the write method. For string data, I use ET.fromstring and then create a tree if I plan to write it back.

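import xml.etree.ElementTree as ET

# Sample document (illustrative values)
xml_text = '''
<COUNTRIES>
    <country name="INDIA">
        <neighbor name="NEPAL" direction="N"/>
        <neighbor name="SRI LANKA" direction="S"/>
    </country>
    <country name="CHINA">
        <neighbor name="MONGOLIA" direction="N"/>
    </country>
</COUNTRIES>
'''

# Parse from a string
root = ET.fromstring(xml_text)
string_tree = ET.ElementTree(root)  # wrap the root if you plan to call write()

# Parse from a file (assumed to hold the same document) and keep the tree for writing later
tree = ET.parse('countries.xml')
root = tree.getroot()
print(root.tag)  # COUNTRIES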

Once I have the root element, I navigate by walking the tree. The most used methods are:

iter(tag) to walk the entire subtree.
findall(tag) to get direct child elements with a tag.
find(tag) to get the first matching child.
get(attr) to read a specific attribute.

for neighbor in root.iter('neighbor'):
    print(neighbor.get('name'), neighbor.get('direction'))

for country in root.findall('country'):
    name = country.get('name')
    neighbor = country.find('neighbor')
    print(name, neighbor.get('name'))

For 2026 workflows, I often add a small safety layer. If a tag might not exist, I check for None and fail fast with a clear error, because a loop over an empty result that quietly changes nothing can be worse than a hard failure. I also keep my paths explicit, because ElementTree supports only a subset of XPath and you do not want a query that accidentally matches more than it should.
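The fail-fast check fits in one small helper. This is a sketch: require_one is a hypothetical name, and it treats both zero and multiple matches as errors, which is the strictness I want in patch scripts.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def require_one(parent: ET.Element, path: str,
                namespaces: dict[str, str] | None = None) -> ET.Element:
    """Return the single element matching path, or raise with a clear message."""
    matches = parent.findall(path, namespaces)
    if len(matches) != 1:
        raise LookupError(
            f'expected exactly one match for {path!r}, found {len(matches)}')
    return matches[0]


# Usage on an illustrative document with a duplicated <db> entry.
root = ET.fromstring('<cfg><db host="a"/><db host="b"/><cache ttl="60"/></cfg>')
cache = require_one(root, 'cache')
print(cache.get('ttl'))  # 60
try:
    require_one(root, 'db')
except LookupError as err:
    print(err)  # expected exactly one match for 'db', found 2
```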

Modifying Attributes, Text, and Structure

Most modifications fall into one of three categories: update attributes, change text, or change the structure. The good news is that ElementTree makes each of these easy once you have the right node.

Here is a realistic example: add a new neighbor, update a rank, and remove a legacy node. I often write these as small, composable functions that I can test.

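import xml.etree.ElementTree as ET

# Sample document (illustrative values)
xml_text = '''
<STATES>
    <state name="KERALA">
        <rank>1</rank>
        <neighbor name="TAMIL NADU" direction="E"/>
    </state>
    <state name="GUJARAT">
        <rank>2</rank>
        <neighbor name="RAJASTHAN" direction="N"/>
        <neighbor name="MADHYA PRADESH" direction="E"/>
    </state>
    <state name="PUNJAB">
        <rank>3</rank>
        <neighbor name="HARYANA" direction="S"/>
    </state>
</STATES>
'''
root = ET.fromstring(xml_text)

# Update a rank value
state = root.find("state[@name='GUJARAT']")
rank = state.find('rank')
rank.text = '5'

# Add a new neighbor element
new_neighbor = ET.SubElement(state, 'neighbor')
new_neighbor.set('name', 'MAHARASHTRA')
new_neighbor.set('direction', 'S')

# Remove a legacy neighbor by name
for neighbor in list(state.findall('neighbor')):
    if neighbor.get('name') == 'RAJASTHAN':
        state.remove(neighbor)

print(ET.tostring(root, encoding='unicode'))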

A few details matter here. I wrap state.findall('neighbor') in list() before removing. findall already returns a new list, so the copy is defensive; the real hazard is iterating over state itself (for child in state) while removing children, which can skip items. I set attributes with set because it makes intent clear. And I always assign text as a string, even if the data is numeric, because ElementTree raises a TypeError when it tries to serialize an int.

When I need to place a new element at a specific position, I use insert(index, element) instead of SubElement, which always appends. For example, to put a new element directly after rank, the first child of a state, call state.insert(1, element) rather than appending it at the end. Order often matters for legacy systems.

Namespaces and Mixed Content Without Pain

Namespaces are where many XML edits go wrong. If you open a file and see xmlns="http://example.com/schema", you are in namespace land. The tag names in ElementTree now include the namespace URI in braces, which can look odd at first.

Here is a practical pattern I use:

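import xml.etree.ElementTree as ET

# Sample document with a default namespace (illustrative)
xml_text = '''
<items xmlns="http://example.com/schema">
    <item id="A1">Alpha</item>
    <item id="B2">Beta</item>
</items>
'''
root = ET.fromstring(xml_text)
ns = {'x': 'http://example.com/schema'}
item = root.find('x:item[@id="B2"]', namespaces=ns)
item.text = 'Beta Prime'
print(item.tag)  # {http://example.com/schema}item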

The key is to define a namespace map and use a prefix in your queries. Do not strip namespaces unless you own the entire document and are sure no validators rely on them. If you need to output with a specific prefix, call ET.register_namespace before writing.
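To see what register_namespace changes, serialize the same tree before and after registering a prefix. The URI and the x prefix below are illustrative. The registry is process-wide, so register once at import time rather than inside a loop.

```python
import xml.etree.ElementTree as ET

NS = 'http://example.com/schema'
root = ET.fromstring(f'<x:items xmlns:x="{NS}"><x:item id="A1">Alpha</x:item></x:items>')

# Without a registered prefix, ElementTree generates ns0, ns1, ...
before = ET.tostring(root, encoding='unicode')
print(before)
# <ns0:items xmlns:ns0="http://example.com/schema"><ns0:item id="A1">Alpha</ns0:item></ns0:items>

ET.register_namespace('x', NS)  # global: affects every later serialization
after = ET.tostring(root, encoding='unicode')
print(after)
# <x:items xmlns:x="http://example.com/schema"><x:item id="A1">Alpha</x:item></x:items>
```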

Mixed content is the other tricky area. Suppose you have <p>Hello <b>world</b> and friends</p>. Here, the p element has text of 'Hello ', the b element has text of 'world', and the b element has a tail of ' and friends'. If you rewrite p (for example, set p.text and drop b) without keeping tail in mind, you can lose part of the sentence. When I edit mixed content, I inspect text and tail explicitly, and I write small tests that compare the output string to a known good value.
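When I do need to delete an inline element, I first move its tail onto whatever precedes it. Here is a sketch; remove_keep_tail is a hypothetical helper name, and it deliberately drops the element's own text, since that is what removing means.

```python
import xml.etree.ElementTree as ET


def remove_keep_tail(parent: ET.Element, child: ET.Element) -> None:
    """Remove child from parent without losing the text that follows it."""
    if child.tail:
        index = list(parent).index(child)
        if index == 0:
            # No previous sibling: the tail joins the parent's leading text.
            parent.text = (parent.text or '') + child.tail
        else:
            prev = parent[index - 1]
            prev.tail = (prev.tail or '') + child.tail
    parent.remove(child)


p = ET.fromstring('<p>Hello <b>world</b> and friends</p>')
remove_keep_tail(p, p.find('b'))
print(ET.tostring(p, encoding='unicode'))  # <p>Hello  and friends</p>
```

The double space in the output is expected: 'Hello ' and ' and friends' are joined as-is, and tidying that is a separate, explicit decision.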

Writing Back Safely and Predictably

Writing XML back to disk is where small choices can create messy diffs. I care about three things: encoding, XML declaration, and deterministic formatting. I also want writes to be atomic, so I do not leave a half-written file if a job is interrupted.

ElementTree can write with a declaration and a chosen encoding:

from pathlib import Path
import xml.etree.ElementTree as ET

root = ET.Element('settings')
ET.SubElement(root, 'feature', enabled='true')
path = Path('settings.xml')
tree = ET.ElementTree(root)

# Write with declaration and UTF-8 encoding
with path.open('wb') as f:
    tree.write(f, encoding='UTF-8', xml_declaration=True)

ElementTree had no pretty printer until Python 3.9 added ET.indent(), so on older interpreters I run a small formatter. For small files, I use xml.dom.minidom to reformat the string. For large files, I avoid reformatting to reduce memory use and keep changes small. Here is a lightweight function that indents the tree in place without extra dependencies:

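def indent(elem, level=0):
    """Indent elem and its descendants in place, four spaces per level."""
    pad = '\n' + '    ' * level
    if len(elem):
        # Put the first child on its own line, one level deeper.
        if not elem.text or not elem.text.strip():
            elem.text = pad + '    '
        for child in elem:
            indent(child, level + 1)
        # The last child's tail precedes this element's closing tag,
        # so pull it back to this element's level.
        last = elem[-1]
        if not last.tail or not last.tail.strip():
            last.tail = pad
    # A non-root element's tail positions its next sibling.
    if level and (not elem.tail or not elem.tail.strip()):
        elem.tail = pad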

Call indent(root) before writing if you want consistent formatting. I also write to a temporary file and replace the target so the write is atomic:

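from pathlib import Path
import os

# tree is the ElementTree built in the previous example
path = Path('settings.xml')
tmp_path = path.with_suffix('.xml.tmp')
tree.write(tmp_path, encoding='UTF-8', xml_declaration=True)
os.replace(tmp_path, path)  # single rename; atomic on POSIX within one filesystem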

This pattern is simple and reliable. If the process stops mid-write, the original file stays intact.
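On Python 3.9 and later, the standard library's ET.indent does the same job as the hand-rolled function, with a configurable indent string. A small sketch of the built-in route on an illustrative tree:

```python
import xml.etree.ElementTree as ET

root = ET.Element('settings')
ET.SubElement(root, 'feature', enabled='true')
group = ET.SubElement(root, 'group', name='beta')
ET.SubElement(group, 'feature', enabled='false')

tree = ET.ElementTree(root)
ET.indent(tree, space='    ')  # Python 3.9+; rewrites whitespace-only text/tail in place
print(ET.tostring(root, encoding='unicode'))
```

ET.indent only replaces text and tail values that are empty or whitespace-only, so it will not clobber real mixed content, but it still reflows every whitespace run in the file. That is why I skip it when diffs need to stay small.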

Common Mistakes and Guardrails

I see the same XML mistakes repeatedly, so I add guardrails to avoid them.

Forgetting namespaces. Your find calls return None, you silently skip updates, and the output still looks valid. I always check that the nodes I plan to modify were actually found.
Editing the wrong node. If there are multiple tags with the same name, an unscoped find can target the wrong one. I anchor queries with attributes or a parent path.
Clobbering text in mixed-content elements. I inspect text and tail for any element that might contain inline tags.
Assuming attribute order matters. XML attribute order is not significant. ElementTree keeps insertion order since Python 3.8, but earlier versions sorted attributes on output, and other tools may reorder them. If a downstream tool relies on order, that tool is brittle, and I isolate the change or switch to a serializer that can preserve order.
Ignoring encoding. If the file includes non-ASCII characters, write with UTF-8 and include the XML declaration. I keep text in Python as str, not bytes.
Modifying while iterating. Removing elements while iterating over the same list can skip siblings. I wrap the list in list() when I need to delete items.
Overwriting unrelated whitespace. When diffs matter, I avoid global pretty printing and edit only what I must.

When performance matters, I measure, not guess. For small configs, ElementTree edits are often in the 5–30 ms range. For multi-megabyte docs, parse and write can land in the 80–300 ms range on a modern laptop, depending on the shape of the tree. If that is too slow, I move to lxml or consider a streaming approach with iterparse so I do not hold the whole tree in memory.

Choosing the Right Tool and a Practical Patch Workflow

I reach for xml.etree.ElementTree first because it is in the standard library and easy to deploy. But there are times when another tool is the better choice. Here is how I decide, framed as a modern workflow comparison.

Traditional approach | Modern approach in 2026
--- | ---
Manual edits in a text editor | Small Python patch script with tests
Full DOM load for huge files | iterparse or lxml streaming
Direct XML string concatenation | Tree edits with explicit node selection
No safety for untrusted input | defusedxml for safer parsing
Ad hoc diff checks | Automated checks in CI

If I need advanced XPath support, I use lxml.etree, which implements XPath 1.0 in full (ElementTree supports only a subset) and is generally faster for large files. If I need schema validation, I look at xmlschema or a vendor-provided validator. If I parse untrusted XML, I switch to defusedxml.ElementTree to reduce risk from malicious payloads. And if I need JSON-like access, xmltodict can be a quick bridge for read-only tasks, but I rarely write with it because it can blur the line between attributes and elements.

Here is a compact patch workflow I use when I want a reliable, repeatable change. It is the pattern I run in CI and during deployments.

from pathlib import Path
import xml.etree.ElementTree as ET
import os

def patch_config(path: Path) -> None:
    tree = ET.parse(path)
    root = tree.getroot()

    # Ensure target node exists
    feature = root.find("feature[@name='search']")
    if feature is None:
        feature = ET.SubElement(root, 'feature')
        feature.set('name', 'search')

    # Update attribute and text
    feature.set('enabled', 'true')
    feature.text = 'enabled by deployment patch'

    # Write atomically
    tmp_path = path.with_suffix('.xml.tmp')
    tree.write(tmp_path, encoding='UTF-8', xml_declaration=True)
    os.replace(tmp_path, path)

patch_config(Path('app-config.xml'))

This gives me an idempotent patch. If I run it multiple times, I get the same output. That makes it safe for automation and safe for humans who rerun scripts when a deployment fails.
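Idempotency is cheap to check: apply the patch twice in memory and compare the serialized output. This sketch factors the edit out of patch_config into a hypothetical apply_search_patch so it can run on a parsed string instead of a file.

```python
import xml.etree.ElementTree as ET


def apply_search_patch(root: ET.Element) -> None:
    """Same edit as patch_config, without the file I/O."""
    feature = root.find("feature[@name='search']")
    if feature is None:
        feature = ET.SubElement(root, 'feature')
        feature.set('name', 'search')
    feature.set('enabled', 'true')
    feature.text = 'enabled by deployment patch'


root = ET.fromstring('<config><feature name="search" enabled="false"/></config>')
apply_search_patch(root)
once = ET.tostring(root, encoding='unicode')
apply_search_patch(root)
twice = ET.tostring(root, encoding='unicode')
assert once == twice, 'patch is not idempotent'
print(once)
```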

Real-World Scenario: Patch an Android Manifest Safely

Android manifests are a classic example of XML that looks simple until you need to make a precision edit. Imagine a build system where you need to add a <uses-permission> entry and update a version attribute. You also want to avoid destroying the manifest’s namespace or its formatting.

from pathlib import Path
import xml.etree.ElementTree as ET
import os

ANDROID_NS = 'http://schemas.android.com/apk/res/android'
ET.register_namespace('android', ANDROID_NS)
ns = {'android': ANDROID_NS}
MANIFEST = Path('AndroidManifest.xml')

def patch_manifest(path: Path) -> None:
    tree = ET.parse(path)
    root = tree.getroot()

    # Update versionCode and versionName on the manifest tag
    root.set(f'{{{ANDROID_NS}}}versionCode', '42')
    root.set(f'{{{ANDROID_NS}}}versionName', '2.3.0')

    # Ensure INTERNET permission exists
    permissions = root.findall('uses-permission')
    has_internet = any(
        p.get(f'{{{ANDROID_NS}}}name') == 'android.permission.INTERNET'
        for p in permissions
    )
    if not has_internet:
        p = ET.Element('uses-permission')
        p.set(f'{{{ANDROID_NS}}}name', 'android.permission.INTERNET')
        # Insert as the first child of <manifest>
        root.insert(0, p)

    tmp_path = path.with_suffix('.tmp')
    tree.write(tmp_path, encoding='UTF-8', xml_declaration=True)
    os.replace(tmp_path, path)

patch_manifest(MANIFEST)

Two details matter here. First, the android namespace lives on attributes, so we must use the full {namespace}name form when setting or reading attribute values. Second, ET.register_namespace preserves the android: prefix on output. If you skip it, Python writes a generated prefix like ns0, which makes for confusing diffs. The inserted uses-permission element needs no namespace because manifest element names are not namespaced; only their android: attributes are.
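The any(...) loop can also be written as a single ElementTree query, because ElementPath resolves a prefix inside an attribute predicate through the namespaces map. A small sketch with an inline, illustrative manifest; has_permission is a hypothetical helper, and it assumes the permission name contains no single quotes.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = 'http://schemas.android.com/apk/res/android'
ns = {'android': ANDROID_NS}

manifest_root = ET.fromstring(f'''<manifest xmlns:android="{ANDROID_NS}" package="com.example.app">
    <uses-permission android:name="android.permission.INTERNET"/>
    <application android:label="Demo"/>
</manifest>''')


def has_permission(manifest: ET.Element, permission: str) -> bool:
    # @android:name is expanded to {ANDROID_NS}name via the ns map.
    query = f"uses-permission[@android:name='{permission}']"
    return manifest.find(query, ns) is not None


print(has_permission(manifest_root, 'android.permission.INTERNET'))  # True
print(has_permission(manifest_root, 'android.permission.CAMERA'))    # False
```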

Real-World Scenario: Update a Maven POM with Version Pins

Maven POM files use namespaces and a structured hierarchy. The typical change is pinning a dependency version or adding a plugin configuration. Here’s how I find a dependency by groupId and artifactId and update its version.

from pathlib import Path
import xml.etree.ElementTree as ET

POM = Path('pom.xml')
POM_NS = 'http://maven.apache.org/POM/4.0.0'
ET.register_namespace('', POM_NS)
ns = {'m': POM_NS}

def update_dependency_version(path: Path, group: str, artifact: str, version: str) -> None:
    tree = ET.parse(path)
    root = tree.getroot()
    deps = root.findall('.//m:dependencies/m:dependency', ns)
    target = None
    for dep in deps:
        g = dep.find('m:groupId', ns)
        a = dep.find('m:artifactId', ns)
        if g is not None and a is not None and g.text == group and a.text == artifact:
            target = dep
            break
    if target is None:
        raise ValueError(f'Dependency not found: {group}:{artifact}')
    v = target.find('m:version', ns)
    if v is None:
        v = ET.SubElement(target, f'{{{POM_NS}}}version')
    v.text = version
    tree.write(path, encoding='UTF-8', xml_declaration=True)

update_dependency_version(POM, 'org.slf4j', 'slf4j-api', '2.0.13')

Notice the ET.register_namespace('', POM_NS) call. Maven POM uses a default namespace, so I register the empty prefix to keep the file’s original shape. Without it, the output gets ns0-prefixed tags, which is noisy. This is also a good example of a strict match: I only update the dependency if both groupId and artifactId match, and I do not touch anything else.
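Two standard-library features can shorten this further, sketched below on an illustrative inline POM. A [child='text'] predicate matches on a child element's text, so both coordinates fit in one query. The default_namespace argument to tostring and write (Python 3.8+) emits unprefixed tags without touching the process-wide registry, but it raises ValueError if any element in the tree sits outside that namespace.

```python
import xml.etree.ElementTree as ET

POM_NS = 'http://maven.apache.org/POM/4.0.0'
ns = {'m': POM_NS}

# Illustrative POM fragment; real files carry many more elements.
pom_text = f'''<project xmlns="{POM_NS}">
  <dependencies>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>1.7.36</version>
    </dependency>
  </dependencies>
</project>'''

root = ET.fromstring(pom_text)

# Every [m:child='text'] predicate must hold, so both coordinates have to match.
query = ".//m:dependency[m:groupId='org.slf4j'][m:artifactId='slf4j-api']"
matches = root.findall(query, ns)
if len(matches) != 1:
    raise ValueError(f'expected one matching dependency, found {len(matches)}')
matches[0].find('m:version', ns).text = '2.0.13'

# Serialize with the POM namespace as the unprefixed default.
out = ET.tostring(root, encoding='unicode', default_namespace=POM_NS)
print(out)
```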

Edge Cases: Attributes vs Elements, and When Structure Lies

XML is flexible, sometimes too flexible. Different systems represent the same data in different ways. For example, you may see a feature flag as an attribute (<feature name="search" enabled="true"/>) in one file and as a child element (<enabled>true</enabled>) in another. You cannot assume a consistent schema unless you validate it.

I handle this by writing small helper functions that normalize access.

from __future__ import annotations

def get_bool_attr_or_child(elem, name: str) -> str | None:
    # Prefer the attribute, fall back to a child element's text
    if name in elem.attrib:
        return elem.attrib[name]
    child = elem.find(name)
    return child.text if child is not None else None

# Example usage (root is a parsed config tree)
feature = root.find("feature[@name='search']")
value = get_bool_attr_or_child(feature, 'enabled')

This approach keeps your script resilient to minor schema differences, which is common in enterprise XML. I still log or assert if the format is not what I expect. Silent behavior is the enemy in patch scripts.
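Because that helper returns raw strings, I pair it with a strict converter so an unexpected value fails loudly instead of quietly counting as false. parse_flag is a hypothetical name; it accepts the xs:boolean spellings (true, false, 1, 0), case-insensitively, which is a deliberate leniency.

```python
from __future__ import annotations


def parse_flag(raw: str | None, *, default: bool | None = None) -> bool:
    """Convert an XML flag string to bool, raising on anything unexpected."""
    if raw is None:
        if default is None:
            raise ValueError('flag is missing and no default was given')
        return default
    value = raw.strip().lower()
    if value in ('true', '1'):
        return True
    if value in ('false', '0'):
        return False
    raise ValueError(f'unexpected flag value: {raw!r}')


print(parse_flag('true'))               # True
print(parse_flag(' 0 '))                # False
print(parse_flag(None, default=False))  # False
```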

Another subtle edge case is when whitespace is meaningful. In some XML formats, such as certain template languages or document markup, whitespace is not just formatting. If you format or strip aggressively, you may break rendering. When I cannot confidently change whitespace, I avoid pretty printing entirely and modify only the specific element values.

Streaming Large XML with `iterparse`

For very large XML files, loading the entire tree can be expensive. iterparse lets you stream through the document and handle elements as they close. This is useful when you need to modify or remove a subset of elements without keeping everything in memory.

Here is a pattern that updates certain nodes while discarding finished elements to save memory. This example changes every <item> element with status="legacy" to status="active".

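import xml.etree.ElementTree as ET
from pathlib import Path
from xml.sax.saxutils import quoteattr

SOURCE = Path('big.xml')
TARGET = Path('big_patched.xml')

# Assumes a namespace-free document whose <item> elements are direct children of the root.
context = ET.iterparse(SOURCE, events=('start', 'end'))
_, root = next(context)  # the first event is 'start' for the root element
depth = 0

with TARGET.open('w', encoding='utf-8') as out:
    out.write("<?xml version='1.0' encoding='UTF-8'?>\n")
    attrs = ''.join(f' {k}={quoteattr(v)}' for k, v in root.attrib.items())
    out.write(f'<{root.tag}{attrs}>\n')
    for event, elem in context:
        depth += 1 if event == 'start' else -1
        if event == 'end' and depth == 0:  # a direct child of the root just closed
            if elem.tag == 'item' and elem.get('status') == 'legacy':
                elem.set('status', 'active')
            elem.tail = None  # tails may not be parsed yet; write our own newline
            out.write(ET.tostring(elem, encoding='unicode') + '\n')
            root.remove(elem)  # drop the finished subtree to keep memory flat
    out.write(f'</{root.tag}>\n')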

There is an important caveat: once you stream with iterparse and discard elements as you go (with elem.clear() or by removing them from their parent), you cannot freely access ancestors or siblings later. You need to plan your logic around the streaming nature. For more sophisticated streaming edits, I typically use lxml because it offers better incremental parsing tools.

Validation: Make It Hard to Ship Broken XML

If you are editing a file that has a schema, validate it. This is not optional in production work. A single missing namespace or a wrong element name can cost hours to debug downstream. Even a lightweight validation step can catch common errors.

Here is a minimal approach where you run a sanity check after modification. It does not require a formal schema; it just verifies key nodes exist and that the output parses cleanly.

from pathlib import Path
import xml.etree.ElementTree as ET

def sanity_check(path: Path) -> None:
    tree = ET.parse(path)
    root = tree.getroot()
    # Example invariants
    if root.find("feature[@name='search']") is None:
        raise ValueError('search feature missing')
    if root.find('settings') is None:
        raise ValueError('settings node missing')

sanity_check(Path('app-config.xml'))

If you have an XSD, you can validate against it using third-party libraries. It adds friction, but it saves you from subtle bugs. I treat schema validation as a release gate, not a development convenience.

When Not to Use Python for XML Edits

Most edits are perfect for Python. But I do not force it if another tool is simpler or safer.

If the change is a one-off and there is already a reliable CLI tool, I may use xmlstarlet or a vendor script.
If the XML is signed (like some SAML or SOAP security payloads), modifying it will break the signature. In that case, you must use the system that re-signs the payload, not a manual script.
If the XML is part of a build pipeline with a strict formatting rule, I match the existing tooling rather than reformatting in Python.
If the file is extremely large (hundreds of MB), streaming and specialized tooling may be more appropriate than loading into memory.

The key is to match the tool to the risk profile. Python is great, but only if you are honest about the constraints.

Deep Dive: Reliable Attribute Updates with Explicit Checks

A common real-world task is updating attributes across multiple nodes. The danger is updating more than you intended. Here is a pattern I use to ensure I only touch nodes that match a narrow filter.

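import xml.etree.ElementTree as ET

# Sample document (illustrative values)
xml_text = '''
<services>
    <service name="search" env="prod" enabled="false"/>
    <service name="search" env="staging" enabled="false"/>
    <service name="billing" env="prod" enabled="false"/>
</services>
'''
root = ET.fromstring(xml_text)

# Only enable the prod search service
for svc in root.findall("service[@name='search'][@env='prod']"):
    svc.set('enabled', 'true')

print(ET.tostring(root, encoding='unicode'))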

Notice the double attribute filter. I am being explicit about name and env. This pattern is safer than using a broader query and then checking conditions in Python, because it gives you an exact match. If the filter returns zero elements, I treat it as a signal to double-check the file rather than silently continuing.

Working with Lists and Order-Sensitive XML

Some XML formats care about element order. For example, a schema may require <name> before <description>. If you append a new element at the end, you can break validation even though the XML is well-formed.

Here is a pattern that inserts in the right place:

import xml.etree.ElementTree as ET

root = ET.fromstring('<config><name>demo</name><version>1</version></config>')

# Insert <description> right after <name>
name = root.find('name')
idx = list(root).index(name)
description = ET.Element('description')
description.text = 'example config'
root.insert(idx + 1, description)

print(ET.tostring(root, encoding='unicode'))

This keeps the order stable. I use this for config formats that are schema-validated or that have order-sensitive consumers.

Handling Comments and Processing Instructions

ElementTree does not preserve XML comments or processing instructions by default (Python 3.8+ can keep those inside the root element if you opt in). If the file you edit relies on comments or processing instructions for documentation or tooling, you should either opt in, avoid ElementTree, or accept that comments might be lost. This is another case where lxml can be a better choice, because it preserves comments and processing instructions.

If comments are important in your workflow, I recommend one of two approaches:

Use lxml, which keeps comments and processing instructions by default.
Avoid re-serializing the entire file. Instead, make small edits in place with a targeted tool or a template-driven approach.

I treat comment loss as a real signal: if people care about comments in a file, losing them will cause confusion, even if the XML still works.
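If you want to stay in the standard library, here is a minimal sketch of the opt-in route (Python 3.8+): build the parser around a TreeBuilder with insert_comments=True and insert_pis=True. parse_keeping_comments is a hypothetical helper name. Comments that sit outside the root element are still dropped.

```python
import xml.etree.ElementTree as ET


def parse_keeping_comments(text: str) -> ET.Element:
    """Parse text, keeping comments and processing instructions inside the root."""
    builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
    parser = ET.XMLParser(target=builder)
    parser.feed(text)
    return parser.close()


xml_text = '<config><!-- keep me --><feature name="search"/><?lint skip?></config>'
root = parse_keeping_comments(xml_text)
root.find('feature').set('enabled', 'true')
out = ET.tostring(root, encoding='unicode')
print(out)
```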

A Production-Grade Patch Script Template

When I write a reusable script for a team, I give it a predictable structure: parse, patch, validate, write. I also add logging and exit codes for CI. Here is a compact template you can adapt.

from __future__ import annotations

from pathlib import Path
import xml.etree.ElementTree as ET
import os
import sys

def patch(root: ET.Element) -> None:
    # Example: ensure feature exists and is enabled
    feature = root.find("feature[@name='search']")
    if feature is None:
        feature = ET.SubElement(root, 'feature', name='search')
    feature.set('enabled', 'true')
    feature.text = 'enabled by patch'

def validate(root: ET.Element) -> None:
    if root.find("feature[@name='search']") is None:
        raise ValueError('feature search missing after patch')

def main(path: Path) -> int:
    tree = ET.parse(path)
    root = tree.getroot()
    patch(root)
    validate(root)
    tmp_path = path.with_suffix('.xml.tmp')
    tree.write(tmp_path, encoding='UTF-8', xml_declaration=True)
    os.replace(tmp_path, path)
    return 0

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print('Usage: patch_xml.py <path-to-xml>', file=sys.stderr)
        raise SystemExit(2)
    raise SystemExit(main(Path(sys.argv[1])))

This style makes it easy to test by swapping in a string input or a test fixture file. It also makes the patch deterministic, which is crucial for automation.

Testing Strategy: Diff-Friendly and Focused

I keep XML patch tests small and explicit. I do not test the entire file; I test the exact part I change. Here is a simple test approach using a string fixture.

import xml.etree.ElementTree as ET

def patch(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    feature = root.find("feature[@name='search']")
    if feature is None:
        feature = ET.SubElement(root, 'feature', name='search')
    feature.set('enabled', 'true')
    return ET.tostring(root, encoding='unicode')

input_xml = '<config/>'
output_xml = patch(input_xml)
assert "feature" in output_xml
assert "enabled=\"true\"" in output_xml

This test is not pretty, but it catches the functional behavior. For more structured tests, I parse the output and validate the node structure instead of comparing raw strings. That approach avoids brittle failures due to whitespace formatting changes.
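A sketch of the structured version: parse the output and assert on the nodes you care about, including one that must stay untouched. It repeats the patch function from the string-fixture test so it runs on its own; the test function name is illustrative.

```python
import xml.etree.ElementTree as ET


def patch(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    feature = root.find("feature[@name='search']")
    if feature is None:
        feature = ET.SubElement(root, 'feature', name='search')
    feature.set('enabled', 'true')
    return ET.tostring(root, encoding='unicode')


def test_patch_adds_exactly_one_enabled_feature() -> None:
    out = ET.fromstring(patch('<config><feature name="export"/></config>'))
    search = out.findall("feature[@name='search']")
    assert len(search) == 1
    assert search[0].get('enabled') == 'true'
    # The unrelated node is left alone.
    assert out.find("feature[@name='export']").get('enabled') is None


test_patch_adds_exactly_one_enabled_feature()
```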

Performance Considerations: Focus on the Right Bottleneck

I often see teams worry about parsing performance when their real bottleneck is disk I/O or validation. For small to medium XML files, parsing and writing is typically fast enough. If you are modifying XML in a tight loop across thousands of files, then speed matters.

A few tips that help without overengineering:

Process many files in one Python process instead of launching an interpreter per file, so you pay startup overhead once.
Avoid pretty printing in batch workflows; it adds significant overhead and noise.
If you only need a subset of data, iterparse can reduce memory footprint.
Use lxml if you need faster XPath or more control; it is often faster in practice for complex queries.
Batch writes and avoid rewriting files that do not change. I compute a checksum or compare the serialized output to the original before writing.
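The skip-unchanged idea fits in a few lines. This sketch compares the full serialized bytes rather than a checksum (equivalent for this purpose and simpler); write_if_changed is a hypothetical name, and it reuses the temp-file-plus-os.replace pattern from earlier.

```python
from pathlib import Path
import os
import tempfile
import xml.etree.ElementTree as ET


def write_if_changed(tree: ET.ElementTree, path: Path) -> bool:
    """Atomically rewrite path only if the serialized tree differs; return True if written."""
    new_bytes = ET.tostring(tree.getroot(), encoding='UTF-8', xml_declaration=True)
    if path.exists() and path.read_bytes() == new_bytes:
        return False  # identical content: leave the file (and its mtime) alone
    tmp_path = path.with_suffix(path.suffix + '.tmp')
    tmp_path.write_bytes(new_bytes)
    os.replace(tmp_path, path)
    return True


# Usage sketch in a throwaway directory.
with tempfile.TemporaryDirectory() as scratch:
    target = Path(scratch) / 'settings.xml'
    tree = ET.ElementTree(ET.fromstring('<settings><feature enabled="true"/></settings>'))
    first = write_if_changed(tree, target)   # True: the file did not exist yet
    second = write_if_changed(tree, target)  # False: same bytes as on disk
    tree.getroot().find('feature').set('enabled', 'false')
    third = write_if_changed(tree, target)   # True: content changed
```

One caveat: ElementTree's serialization can differ cosmetically from the original file (declaration quote style, <a/> versus <a />), so the first run may rewrite a file whose content did not change. After that, reruns are no-ops.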

When I need data on performance, I measure before and after. My default approach is to instrument the script with a simple timer or use a profiler. That keeps me honest and prevents unnecessary optimization work.

Alternative Approaches: When You Need More Power

There are times when ElementTree is not enough. Here is a quick map of alternatives and when I reach for them.

lxml.etree: Full XPath, better namespace handling, comment preservation, and often faster. I use it for complex queries or when I must preserve comments.
defusedxml: Safer parsing for untrusted XML. This is a security move for ingestion or API input.
xmlschema: Schema validation and type-aware parsing. I use it in regulated or strictly validated environments.
xmltodict: Quick read access with a dict-like interface, mostly for read-only tasks or quick inspection.
xmlstarlet or CLI tools: Good for quick one-off edits or shell pipelines when Python is overkill.

I am not loyal to a tool; I am loyal to reliable output. If switching tools reduces risk or improves clarity, I do it.

Modern Tooling and AI-Assisted Workflow

My day-to-day workflow mixes scripting with assisted generation. I often ask an AI assistant to draft XPath queries or suggest a tree traversal, then I validate it against the real XML. This is helpful when dealing with deep or unfamiliar schemas. The guardrail is simple: I never trust generated queries without a test or a quick inspect of the parsed tree.

For example, I might use AI to guess where a config is stored in a long XML, then I verify by printing the matching nodes or counts. If the AI suggests a broad query that could hit multiple nodes, I tighten it manually. The combination is powerful: fast drafts, deliberate verification.

If you adopt this workflow, keep it disciplined. Add print checks or assertions, and never ship a patch script that you have not run on a real sample file.

A Practical Checklist Before You Ship

I keep a short checklist before I ship XML edits, especially in CI or deployment contexts.

Parse the file and confirm the root tag is what I expect.
Locate the exact node(s) and assert the count matches expectation.
Make the smallest change possible; avoid global reformatting.
Write atomically and preserve encoding with XML declaration.
Re-parse the output to ensure it is well-formed.
If a schema exists, validate against it.
Compare a minimal diff to ensure only intended changes are present.

This checklist prevents 90% of XML-related incidents I have seen.
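The last checklist item can be automated with difflib from the standard library: serialize before and after, diff them, and fail if anything beyond the intended lines moved. xml_diff is a hypothetical helper, and the two-line threshold is specific to this illustrative one-value edit.

```python
import difflib
import xml.etree.ElementTree as ET


def xml_diff(before: str, after: str, name: str = 'config.xml') -> str:
    """Return a unified diff between two serialized XML documents."""
    return ''.join(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f'a/{name}',
        tofile=f'b/{name}',
    ))


before = ('<config>\n'
          '  <feature name="search">off</feature>\n'
          '  <feature name="export">on</feature>\n'
          '</config>')
root = ET.fromstring(before)
root.find("feature[@name='search']").text = 'on'
after = ET.tostring(root, encoding='unicode')

diff = xml_diff(before, after)
print(diff)
# Count only real +/- lines, not the ---/+++ file headers.
changed = [line for line in diff.splitlines()
           if line.startswith(('+', '-')) and not line.startswith(('+++', '---'))]
assert len(changed) == 2, f'unexpected extra changes:\n{diff}'
```

Because ElementTree re-serializes the whole document, this check is also how I catch cosmetic churn such as <a/> turning into <a />.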

Closing: The Practical Path I Recommend

When you modify XML with Python, I want you to keep it boring and predictable. Parse into a tree, select the exact node, make the smallest change, and write it back with a clear encoding and atomic replace. Respect namespaces, be explicit about order when it matters, and add validation when the schema is strict. That mindset turns XML from a source of anxiety into a routine automation task.

If you take away one thing, let it be this: treat XML as structure, not as a string. The more you lean into the tree model and explicit selection, the fewer surprises you will face. That is what makes XML edits safe, repeatable, and fast to ship.
