XML still shows up in real systems when you least expect it: vendor feeds, config files, legacy APIs, and those “just one quick change” export jobs that suddenly become weekly tasks. I’ve spent plenty of time triaging broken XML, and the pattern is consistent—most bugs are caused by small, silent edits that aren’t validated, or by scripts that treat XML like plain text. If you’ve ever grepped and replaced a tag only to discover the file no longer parses, you know the pain.
I’ll show you a reliable, repeatable way to modify XML with Python using the standard library. You’ll learn how to parse and inspect a document, update attributes and text safely, add or remove elements, preserve formatting where possible, and write changes back without breaking consumers. I’ll also cover common mistakes, when XML is the right tool, and how I approach performance and validation in 2026 workflows.
Along the way I’ll use concrete, real-world examples—think customer orders, deployment manifests, and compliance reports—so you can adapt the patterns directly to your work.
Why XML Still Matters (and When It Doesn’t)
I rarely choose XML for new systems, but I still edit it weekly. The reasons are practical: certain enterprise integrations require it, some SaaS vendors only support XML uploads, and a lot of infrastructure tooling (including older build pipelines and device configs) expects XML. If you’re in finance, healthcare, or manufacturing, there’s a decent chance you’re touching XML today.
That said, you shouldn’t reach for XML by default. I recommend XML when you need:
- Mixed content (text plus markup) like documentation or markup-heavy data
- Complex hierarchical data with attributes that carry meaning
- Validation via XSD, especially in regulated environments
- Interoperability with older systems or industry standards
I avoid XML when:
- You control both ends and can use JSON or protobuf
- The document is huge and performance is the primary goal
- You need strict ordering and minimal overhead for machine-only data
If you’re modifying XML because it’s the contract you’ve inherited, the key is to treat it as a tree, not text. Python’s xml.etree.ElementTree is the fastest path to safe edits for most use cases.
A Reliable Mental Model: XML as a Tree
When I explain XML to junior engineers, I use a “directory tree” analogy. Each element is a folder, attributes are metadata on the folder, and text nodes are the files inside it. That helps clarify why naive string editing breaks things: you’re cutting across tree boundaries.
Python’s xml.etree.ElementTree gives you two core concepts:
- ElementTree: the whole document
- Element: a single node in that tree
You parse a document into a tree, walk the tree, modify nodes, and write it back. That’s the core loop.
Here’s a small XML example that we’ll build on:
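<Orders>
  <Order id="…" status="pending">
    <Customer>
      <Name>Rina Patel</Name>
      <Email>[email protected]</Email>
    </Customer>
    <Item sku="BK-482" quantity="…">Technical Writing Handbook</Item>
    <Item sku="JS-110" quantity="…">JavaScript Patterns</Item>
  </Order>
</Orders>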
This is simple, but it already has attributes (id, status, sku, quantity), text nodes (item names), and nested elements. We can safely manipulate it with ElementTree.
Parsing and Inspecting XML the Right Way
I prefer parsing from a file for production work and from a string for test fixtures. Here’s both:
import xml.etree.ElementTree as ET

# From a file
tree = ET.parse("orders.xml")
root = tree.getroot()

# From a string (useful in tests)
xml_payload = """
<Orders>
  <Order id="…" status="pending"/>
</Orders>
"""
root_from_string = ET.fromstring(xml_payload)
Once you have a root, you can inspect it:
print(root.tag) # Orders
print(root.attrib) # {}
print(len(list(root))) # number of children
I rely on these core patterns to navigate:
- find() for the first matching child
- findall() for direct children
- iter() to walk the whole subtree
For example, to list all items across all orders:
for item in root.iter("Item"):
    print(item.get("sku"), item.text)
In my experience, using iter() is less error-prone for deeply nested documents because you don’t need to know the exact structure in advance.
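To make the difference between the three calls concrete, here is a minimal sketch. The sample document, including its ids and quantities, is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the ids and quantities are made up for illustration
SAMPLE = """
<Orders>
  <Order id="A-1" status="pending">
    <Item sku="BK-482" quantity="2">Technical Writing Handbook</Item>
    <Item sku="JS-110" quantity="1">JavaScript Patterns</Item>
  </Order>
  <Order id="A-2" status="shipped">
    <Item sku="BK-482" quantity="1">Technical Writing Handbook</Item>
  </Order>
</Orders>
"""

root = ET.fromstring(SAMPLE)

first_order = root.find("Order")            # first matching direct child, or None
direct_items = root.findall("Item")         # Items are grandchildren, so this is empty
nested_items = root.findall("Order/Item")   # a path reaches one level deeper
all_items = list(root.iter("Item"))         # every Item anywhere under root

print(first_order.get("id"))
print(len(direct_items), len(nested_items), len(all_items))
```

The empty result from findall("Item") is the usual surprise: it only looks at direct children unless you give it a path.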
Modifying Attributes and Text Safely
Most edits fall into two categories: attributes and text. I recommend starting with the attribute approach because it’s explicit and keeps the markup clean.
Update an attribute
Let’s change an order status from pending to shipped:
order = root.find("Order")
order.set("status", "shipped")
Update text
Let’s fix an item title that was entered incorrectly:
for item in root.iter("Item"):
    if item.get("sku") == "BK-482":
        item.text = "Technical Writing Field Guide"
Add a new attribute
Maybe we want to record the shipping carrier:
order.set("carrier", "UPS")
Remove an attribute
If you’re cleaning data, remove an attribute with pop:
order.attrib.pop("carrier", None)
I always use None as the default to avoid KeyErrors in pipelines.
Write it back
Once you’re done, write the file:
tree.write("orders.xml", encoding="utf-8", xml_declaration=True)
That’s the minimal “parse → edit → write” workflow you should use for almost every XML change.
Adding, Removing, and Reordering Elements
Beyond simple edits, you’ll often need to add elements or remove outdated ones. ElementTree makes this straightforward, but there are some gotchas around ordering.
Add a new child element
Let’s add a discount to an order. I like to be explicit about insertion:
order = root.find("Order")
discount = ET.Element("Discount")
discount.set("currency", "USD")
discount.text = "5.00"
order.append(discount)
Add a child with SubElement
This is a shorter pattern that’s good for one-liners:
notes = ET.SubElement(order, "Notes")
notes.text = "Customer requested gift wrap."
Remove an element
If an item is out of stock, remove it:
for item in list(order.findall("Item")):
    if item.get("sku") == "JS-110":
        order.remove(item)
Notice the list(...) wrapper. That avoids modifying the collection while iterating.
Reorder elements
XML consumers sometimes care about order. You can rebuild children in the required order:
# Example: ensure Customer comes before Items
children = list(order)

# Rebuild in preferred order
customers = [c for c in children if c.tag == "Customer"]
items = [c for c in children if c.tag == "Item"]
others = [c for c in children if c.tag not in {"Customer", "Item"}]

# Slice assignment replaces the children but keeps the order's attributes and text
# (order.clear() would also wipe the attributes)
order[:] = customers + items + others
I only do this when the schema or a consumer explicitly requires ordering.
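When the required order covers more than two tags, a small helper is easier to maintain. This is a sketch under my own assumptions: the helper name reorder_children and the sample order are hypothetical, and it uses slice assignment because Element.clear() also resets attributes and text.

```python
import xml.etree.ElementTree as ET

def reorder_children(parent, tag_order):
    """Reorder parent's children so tags follow tag_order; unknown tags go last."""
    rank = {tag: i for i, tag in enumerate(tag_order)}
    # sorted() is stable, so children with the same tag keep their relative order
    ordered = sorted(parent, key=lambda child: rank.get(child.tag, len(rank)))
    # Slice assignment swaps the children and leaves attributes and text alone
    parent[:] = ordered

# Hypothetical order with children in the "wrong" sequence
order = ET.fromstring(
    '<Order id="A-1" status="pending">'
    "<Item sku='BK-482'>Technical Writing Handbook</Item>"
    "<Notes>Gift wrap</Notes>"
    "<Customer><Email>rina@example.com</Email></Customer>"
    "<Item sku='JS-110'>JavaScript Patterns</Item>"
    "</Order>"
)
reorder_children(order, ["Customer", "Item"])
print([child.tag for child in order])
print(order.attrib)
```

Because the sort is stable, the two Item elements stay in their original relative order, which matters if item sequence is meaningful to the consumer.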
Namespaces and Real-World XML
Namespaces are the biggest stumbling block I see. If your XML has a namespace like this:
…
Then find("LineItem") won’t work. You need a namespace map:
ns = {"inv": "http://example.com/invoice"}
line_items = root.findall("inv:LineItem", ns)
I recommend defining ns once and reusing it across your script. If you don’t, you’ll get empty results and waste time debugging.
Here’s a full example that updates a namespace-aware document:
import xml.etree.ElementTree as ET
ns = {"inv": "http://example.com/invoice"}
tree = ET.parse("invoice.xml")
root = tree.getroot()
for item in root.findall("inv:LineItem", ns):
    if item.get("sku") == "LAP-900":
        item.set("status", "backorder")

# Keep the default namespace (no generated prefix) in output
ET.register_namespace("", ns["inv"])
tree.write("invoice.xml", encoding="utf-8", xml_declaration=True)
The register_namespace call prevents Python from renaming your namespace prefixes, which can break downstream systems.
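To see what that renaming looks like, here is a small before-and-after sketch. The URI is a made-up one used only here, and note that register_namespace changes a process-wide registry, so it affects every later serialization in the same interpreter.

```python
import xml.etree.ElementTree as ET

# Hypothetical URI used only in this sketch
URI = "http://example.com/invoice-demo"
doc = f'<Invoice xmlns="{URI}"><LineItem sku="LAP-900"/></Invoice>'
root = ET.fromstring(doc)

# Parsed tags carry the URI in Clark notation: {uri}local
print(root.tag)

before = ET.tostring(root, encoding="unicode")   # prefix is auto-generated
ET.register_namespace("", URI)                   # process-wide registration
after = ET.tostring(root, encoding="unicode")    # default namespace, no prefix

print(before)
print(after)
```

Both outputs are equivalent XML, but consumers that match on literal prefixes or do string comparisons will only accept the second form.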
Formatting and Pretty-Printing Without Breaking Things
ElementTree doesn’t preserve indentation or comments by default. If you write a file back, the formatting often collapses into a single line. That’s acceptable for machines but annoying for humans.
In Python 3.9+, you can use ET.indent to pretty-print before saving:
import xml.etree.ElementTree as ET
tree = ET.parse("orders.xml")
root = tree.getroot()
# Modify content…
ET.indent(tree, space="  ", level=0)
tree.write("orders.xml", encoding="utf-8", xml_declaration=True)
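If you need to preserve comments, note that ElementTree’s default parser drops them. Since Python 3.8 you can opt in with ET.XMLParser(target=ET.TreeBuilder(insert_comments=True)), which keeps comments inside the root element, or you can switch to lxml. In 2026, I still reach for lxml when I need: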
- Full comment preservation
- XPath that’s more expressive
- Schema validation in the same workflow
But if your requirements are modest and you want zero dependencies, ElementTree is still my default.
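If comments are the only thing pushing you away from the standard library, it is worth knowing that a TreeBuilder can be asked to keep them (Python 3.8+). A minimal sketch, with a made-up sample document:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with one comment inside the root element
SOURCE = """<Orders>
  <!-- reviewed by ops -->
  <Order id="A-1" status="pending"/>
</Orders>"""

# Default parser: the comment is dropped
plain = ET.fromstring(SOURCE)

# Opt-in parser (Python 3.8+): comments inside the root become tree nodes
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept = ET.fromstring(SOURCE, parser=parser)

plain_xml = ET.tostring(plain, encoding="unicode")
kept_xml = ET.tostring(kept, encoding="unicode")
print(plain_xml)
print(kept_xml)
```

One caveat: comment nodes now show up when you iterate, and their tag is not a string, so compare tags by name (elem.tag == "Order") rather than calling string methods on every tag.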
Advanced Modifications: Merging, Cloning, and Bulk Updates
Once you’re comfortable with the basics, you can tackle more advanced edits. I’ll show two patterns I use often: merging and bulk updates.
Merge two XML documents
Imagine you have customers.xml and orders.xml and want to merge customer data into each order. I’ll keep it simple: add a customer_id attribute to each order, looked up by email.
import xml.etree.ElementTree as ET

customers_tree = ET.parse("customers.xml")
customers_root = customers_tree.getroot()
orders_tree = ET.parse("orders.xml")
orders_root = orders_tree.getroot()

# Build lookup map from customers
email_to_id = {}
for cust in customers_root.findall("Customer"):
    email = cust.findtext("Email")
    cust_id = cust.get("id")
    if email and cust_id:
        email_to_id[email] = cust_id

# Apply to orders
for order in orders_root.findall("Order"):
    email = order.findtext("Customer/Email")
    if email in email_to_id:
        order.set("customer_id", email_to_id[email])

orders_tree.write("orders.xml", encoding="utf-8", xml_declaration=True)
This pattern—build a lookup dict, then apply—scales well and is easy to reason about.
Clone and modify nodes
If you need to duplicate nodes with small changes, use copy:
import copy
import xml.etree.ElementTree as ET
order = root.find("Order")
item = order.find("Item")
new_item = copy.deepcopy(item)
new_item.set("sku", "BK-900")
new_item.set("quantity", "1")
new_item.text = "XML Engineering Guide"
order.append(new_item)
I use deep copies when an element has children; it avoids manual re-creation.
Common Mistakes I See (and How You Should Avoid Them)
I’ve reviewed a lot of XML scripts, and the same issues appear again and again.
Mistake 1: Treating XML as plain text
People do replace("pending", ...) and it works—until it doesn’t. Any attribute ordering change, whitespace change, or nested element breaks it. Always parse XML into a tree.
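Here is a small sketch of how that goes wrong. The sample is made up: the word "pending" appears both as a status and inside a free-text note.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: "pending" appears as a status and inside free text
SOURCE = (
    '<Orders><Order id="A-1" status="pending">'
    "<Notes>Refund pending review</Notes>"
    "</Order></Orders>"
)

# Text replacement rewrites every occurrence, including the note
as_text = SOURCE.replace("pending", "shipped")

# Tree edit touches only the attribute we mean to change
root = ET.fromstring(SOURCE)
for order in root.iter("Order"):
    if order.get("status") == "pending":
        order.set("status", "shipped")
as_tree = ET.tostring(root, encoding="unicode")

print(as_text)
print(as_tree)
```

The text version still parses, which is what makes this class of bug hard to spot: the file is valid, but the data is wrong.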
Mistake 2: Ignoring namespaces
If your find() calls return nothing, the document probably has namespaces. Always check the root tag for a namespace URI, then use a namespace map.
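A quick way to do that check is to split the root tag, which ElementTree stores as {uri}local when a namespace is present. The helper name split_tag is my own, not part of the library.

```python
import xml.etree.ElementTree as ET

def split_tag(tag):
    """Split '{uri}local' into (uri, local); uri is '' when there is no namespace."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag

root = ET.fromstring('<Invoice xmlns="http://example.com/invoice"><LineItem/></Invoice>')
uri, local = split_tag(root.tag)
ns = {"inv": uri} if uri else {}

without_ns = root.findall("LineItem")          # ignores the namespace, finds nothing
with_ns = root.findall("inv:LineItem", ns)     # uses the detected URI

print(uri, local)
print(len(without_ns), len(with_ns))
```

Building the namespace map from the document itself is handy in debugging; in production scripts I would still hard-code the expected URI so an unexpected namespace fails loudly.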
Mistake 3: Modifying while iterating
If you remove nodes while iterating, you’ll skip items. Convert to a list first: for node in list(root.iter("Item")):
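The skipping is easy to reproduce. This sketch builds a made-up order with four items and tries to remove all of them both ways.

```python
import xml.etree.ElementTree as ET

def build():
    # Hypothetical order with four empty Item children
    return ET.fromstring("<Order>" + "<Item/>" * 4 + "</Order>")

# Buggy: removing from the element we are iterating shifts the remaining children
buggy = build()
for item in buggy:
    buggy.remove(item)

# Safe: iterate over a snapshot, remove from the live element
safe = build()
for item in list(safe):
    safe.remove(item)

print(len(buggy), len(safe))
```

The buggy loop finishes without an error and leaves items behind, so nothing tells you it went wrong unless you count afterwards.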
Mistake 4: Dropping the XML declaration
Some consumers require the XML declaration. Always write with xml_declaration=True unless you’re certain it’s optional.
Mistake 5: Losing formatting you actually care about
ElementTree doesn’t preserve comments or original whitespace. If human readability is required, either re-indent or use a parser that preserves formatting.
Performance and Scale Considerations
ElementTree is efficient for small to medium documents. For very large XML files (hundreds of MB or more), I recommend streaming with iterparse so you don’t load the entire tree into memory.
Here’s a streaming pattern that updates attributes on the fly:
import xml.etree.ElementTree as ET

# Streaming parse for large files
context = ET.iterparse("large_orders.xml", events=("end",))
for event, elem in context:
    if elem.tag == "Order":
        if elem.get("status") == "pending":
            elem.set("status", "queued")
        # Serialize or otherwise consume the finished order here,
        # then clear it to free memory
        elem.clear()
In my experience this runs in roughly the 50–200 ms range for small files and scales about linearly with file size; exact numbers depend on your hardware and the shape of the document. The key is to clear elements you no longer need to keep memory bounded.
If you need sub-10ms latency, you probably shouldn’t be editing XML in the request path. Use a background job or pre-processing step instead.
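Streaming edits only help if each record is written out before it is cleared. Here is one way that could look, as a sketch under simplifying assumptions: the function name stream_requeue is hypothetical, the document has no namespaces, and only the root tag and its record children are carried over (root attributes and whitespace are not).

```python
import io
import xml.etree.ElementTree as ET

def stream_requeue(src, dst, record_tag="Order"):
    """Copy records from src to dst one at a time, re-queuing pending ones.

    Simplification: only the root tag and its record children are carried
    over; root attributes and inter-element whitespace are not.
    """
    context = ET.iterparse(src, events=("start", "end"))
    _, root = next(context)               # first event: start of the root element
    dst.write(f"<{root.tag}>")
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            if elem.get("status") == "pending":
                elem.set("status", "queued")
            elem.tail = None              # do not copy trailing whitespace
            dst.write(ET.tostring(elem, encoding="unicode"))
            root.clear()                  # drop finished records from memory
    dst.write(f"</{root.tag}>")

# In-memory stand-ins for the input and output files
source = io.BytesIO(
    b'<Orders><Order id="A-1" status="pending"><Item sku="BK-482"/></Order>'
    b'<Order id="A-2" status="shipped"/></Orders>'
)
out = io.StringIO()
stream_requeue(source, out)
result = ET.fromstring(out.getvalue())
print([order.get("status") for order in result])
```

For real files you would pass a path or a binary file object as src and a text-mode file as dst, and write the XML declaration yourself if the consumer needs it.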
Traditional vs Modern Workflow (2026 Perspective)
I still use plain scripts for XML edits, but I rely on AI-assisted workflows for scaffolding and validation. Here’s how I see it today:
Traditional approach:
- One-off script
- Handwritten logic
- Manual checks
- README notes
I’ll still write the actual parsing and editing logic myself, but I use AI to scaffold boilerplate, generate sample data, and draft tests. You can get to a stable script much faster, but you still need to understand the tree model to avoid mistakes.
Real-World Scenarios I Handle with These Patterns
If you’re wondering where this applies in practice, here are a few situations I’ve actually dealt with:
- Updating deployment manifests in a legacy CI system
- Merging partner-provided XML feeds into a unified catalog
- Normalizing user profile exports before importing into a new platform
- Cleaning up attribute inconsistencies before a compliance audit
- Redacting sensitive fields from XML logs
All of these are safe, repeatable, and auditable when you use proper XML parsing instead of text edits.
A Complete, Runnable Example You Can Reuse
Here’s a script that ties everything together: parse, update, add, remove, and write with indentation. It’s small enough to adapt but covers the core techniques.
import xml.etree.ElementTree as ET

INPUT_FILE = "orders.xml"
OUTPUT_FILE = "orders.updated.xml"

# Load the XML
tree = ET.parse(INPUT_FILE)
root = tree.getroot()

# Update order status
for order in root.findall("Order"):
    if order.get("status") == "pending":
        order.set("status", "processing")

# Fix a product name
for item in root.iter("Item"):
    if item.get("sku") == "BK-482":
        item.text = "Technical Writing Field Guide"

# Add a note to a specific order (here, the first one)
first_order = root.find("Order")
if first_order is not None:
    notes = ET.SubElement(first_order, "Notes")
    notes.text = "Auto-generated note: customer confirmed address."

# Remove discontinued items from every order
for parent in root.findall("Order"):
    # In this example, Items are direct children of Order
    for item in list(parent.findall("Item")):
        if item.get("sku") == "JS-110":
            parent.remove(item)

# Pretty-print (Python 3.9+)
ET.indent(tree, space="  ", level=0)

# Save the updated file
tree.write(OUTPUT_FILE, encoding="utf-8", xml_declaration=True)
That script is intentionally compact. In production, I add validation, logging, and guardrails to make it safer in pipelines. I’ll show those next.
Validation, Schemas, and Confidence Checks
XML’s biggest strength is that it can be validated. If you’re modifying a file that’s consumed by another system, validation is how you avoid late-night rollback calls.
Lightweight validation: sanity checks
If you don’t have an XSD, do at least a few structural checks before you write output:
- Ensure required attributes exist (id, status, etc.)
- Ensure required children exist (Customer, Item)
- Ensure text values are non-empty for critical nodes
Here’s a quick pattern I use:
required_order_attrs = {"id", "status"}
required_children = {"Customer", "Item"}

for order in root.findall("Order"):
    missing_attrs = required_order_attrs - set(order.attrib)
    if missing_attrs:
        raise ValueError(f"Order missing attributes: {missing_attrs}")
    child_tags = {child.tag for child in order}
    if not required_children.issubset(child_tags):
        raise ValueError("Order missing required children")
This won’t replace schema validation, but it prevents obvious breakage.
Schema validation (when you have an XSD)
ElementTree doesn’t validate against XSD, so if you need schema validation, you’ll want lxml. I’ll keep this brief because the rest of this article sticks to the standard library, but here’s the conceptual flow:
- Parse XML into an lxml tree
- Load the XSD
- Validate and capture errors
- Only write output if validation passes
I use schema validation for any integration that rejects malformed XML or where business rules are encoded in the XSD.
Checksum and diff-based assurance
If you’re running XML modifications in a pipeline, I recommend adding a basic diff check after modification. It helps you detect unexpected edits:
- Count the number of modified nodes
- Check that only expected tags changed
- Save a small diff summary for audit
You can do this by walking the tree before and after and comparing tag, attrib, and text. It’s not expensive for small documents and provides confidence in CI.
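A before-and-after walk could be sketched like this. The function names fingerprint and diff_summary are my own, and the positional paths are a simplification: removing an element from the middle of a list shifts its later siblings, so they show up as changes too.

```python
import copy
import xml.etree.ElementTree as ET

def fingerprint(root):
    """Map a positional path to (attributes, stripped text) for every element."""
    result = {}

    def walk(elem, path):
        result[path] = (dict(elem.attrib), (elem.text or "").strip())
        for index, child in enumerate(elem):
            walk(child, f"{path}/{child.tag}[{index}]")

    walk(root, root.tag)
    return result

def diff_summary(before, after):
    """Return paths that were added, removed, or changed between two trees."""
    old, new = fingerprint(before), fingerprint(after)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }

# Hypothetical document and edits
root = ET.fromstring(
    '<Orders><Order id="A-1" status="pending">'
    '<Item sku="BK-482">Handbook</Item></Order></Orders>'
)
original = copy.deepcopy(root)          # snapshot before editing
root.find("Order").set("status", "shipped")
ET.SubElement(root.find("Order"), "Notes").text = "Gift wrap"

summary = diff_summary(original, root)
print(summary)
```

In CI, you can assert that summary["changed"] only contains paths you expected to touch and fail the job otherwise.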
Handling Edge Cases That Break Scripts
XML is full of edge cases that don’t show up in the first demo. Here are the ones that tend to break real scripts—and how I work around them.
Edge Case 1: Empty elements and None values
Empty tags are valid XML. ElementTree may return None for text, so always guard:
text_value = (item.text or "").strip()
This avoids unexpected NoneType errors when you call .strip() or .lower().
Edge Case 2: Whitespace-only text nodes
Whitespace can exist between elements and be treated as text. This matters if you’re iterating and expecting only “meaningful” text. Use strip() and explicitly check length.
Edge Case 3: Attributes as numbers
Attributes are always strings. If you store numeric values, you must parse them:
qty = int(item.get("quantity", "0"))
item.set("quantity", str(qty + 1))
Don’t set a raw integer; it must be a string.
Edge Case 4: Mixed content
XML can mix text and child elements in one node. ElementTree represents this with .text for content before the first child and .tail for content after each child. If you’re editing narrative XML (docs, policies), this is critical:
paragraph = root.find("Paragraph")
print(paragraph.text)  # text before the first child
for child in paragraph:
    print(child.tag, child.text, child.tail)
If you ignore .tail, you might delete or reorder content unintentionally.
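The most common way to lose content is removing a child: its .tail goes with it. A sketch of a removal that keeps the trailing text, using a hypothetical helper name and made-up tags:

```python
import xml.etree.ElementTree as ET

def remove_keep_tail(parent, child):
    """Remove child from parent but keep the text that followed it."""
    index = list(parent).index(child)
    tail = child.tail or ""
    if index == 0:
        parent.text = (parent.text or "") + tail      # tail joins the leading text
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + tail  # tail joins the previous sibling
    parent.remove(child)

# Hypothetical narrative paragraph with an inline element to drop
paragraph = ET.fromstring(
    "<Paragraph>Keep this <Draft>drop this</Draft> and this <b>too</b>.</Paragraph>"
)
remove_keep_tail(paragraph, paragraph.find("Draft"))
result = ET.tostring(paragraph, encoding="unicode")
print(result)
```

A plain parent.remove(child) on the same input would also delete " and this ", which is text the author never asked to remove.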
Edge Case 5: Comments and processing instructions
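ElementTree’s default parser doesn’t preserve comments or processing instructions. If you need them, you either:
- Opt in with TreeBuilder(insert_comments=True, insert_pis=True) on Python 3.8+, which keeps the ones inside the root element
- Switch to lxml
- Or accept that they will be lost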
I pick lxml whenever the XML is hand-maintained and comments are documentation.
Safer Modification Patterns for Production
It’s one thing to write a quick script, but production workflows need guardrails. Here are patterns I use in real pipelines.
Pattern 1: Read → validate → modify → validate → write
I always validate both before and after modification. Before ensures you’re starting from valid input; after ensures you didn’t break it.
Pattern 2: Use temporary output and atomic replace
Never overwrite the source file directly if it’s important. Write to a temp file, then rename:
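import os
import tempfile

# Create the temp file next to the target so os.replace stays on one filesystem
target_dir = os.path.dirname(os.path.abspath(OUTPUT_FILE))
with tempfile.NamedTemporaryFile("w", delete=False, suffix=".xml", dir=target_dir) as tmp:
    tmp_path = tmp.name

# Write modified XML to tmp_path
tree.write(tmp_path, encoding="utf-8", xml_declaration=True)

# Atomic replace
os.replace(tmp_path, OUTPUT_FILE)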
This prevents partial writes and makes the pipeline more resilient.
Pattern 3: Build a logging trail
If you’re modifying files in bulk, log exactly what you changed:
- File name
- Node count modified
- List of key identifiers (e.g., order IDs)
- Timestamp
This is especially useful for compliance audits.
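One way to capture those four fields is a single JSON line per file. This is a sketch, not a prescribed format: the logger name and the function build_change_record are hypothetical.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("xml_edits")   # hypothetical logger name

def build_change_record(file_name, modified_ids):
    """Assemble one audit entry: file, count, identifiers, and a UTC timestamp."""
    return {
        "file": file_name,
        "modified_count": len(modified_ids),
        "ids": sorted(modified_ids),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

record = build_change_record("orders.xml", {"A-2", "A-1"})
logger.info(json.dumps(record))   # one JSON line per file is easy to grep and diff
print(record["modified_count"], record["ids"])
```

Sorting the identifiers keeps the log stable between runs, so two audit entries for the same change compare equal apart from the timestamp.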
Pattern 4: Controlled updates with whitelists
If only certain tags should be changed, make it explicit:
allowed_tags = {"Order", "Item", "Notes"}
for elem in root.iter():
    if elem.tag not in allowed_tags:
        continue
    # Apply allowed edits only
This keeps your script from accidentally rewriting unrelated parts of the tree.
Alternative Approaches (When ElementTree Isn’t Enough)
ElementTree is great, but not always sufficient. Here’s how I decide:
Use ElementTree when:
- Documents are small to medium
- You only need basic tag/attribute edits
- You prefer zero dependencies
Use lxml when:
- You need schema validation
- You need to preserve comments and processing instructions
- You need full XPath support
Use streaming (iterparse) when:
- Files are very large
- Memory usage is a constraint
- You can handle single-pass modifications
Use DOM-style libraries when:
- You need to preserve formatting exactly
- You’re doing fine-grained edits in a human-maintained file
I keep the majority of scripts in ElementTree for simplicity, and I move to lxml only when requirements force me.
Practical Scenario Walkthroughs
Let’s take this from theory to concrete scenarios. These are real patterns I’ve used and optimized.
Scenario 1: Update deployment manifests
Say you have an XML manifest file with versioned deployments. You need to update the version and add a release note.
- Find the deployment node by ID
- Update the version attribute
- Add a release-note child element with text
- Ensure the new child appears after the sibling it must follow
This is the ordering case where you rebuild children to satisfy consumers.
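As a rough sketch of that workflow: every tag and attribute name below (Manifest, Deployment, Description, Targets, ReleaseNote) is an assumption for illustration, as is the function name bump_version. It uses insert() at a computed position instead of rebuilding all children.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest: tag and attribute names are assumptions for this sketch
MANIFEST = """
<Manifest>
  <Deployment id="web" version="1.4.0">
    <Description>Public site</Description>
    <Targets>prod</Targets>
  </Deployment>
</Manifest>
"""

def bump_version(root, deployment_id, version, note, after_tag="Description"):
    """Set the version and insert a ReleaseNote right after the after_tag child."""
    for deployment in root.iter("Deployment"):
        if deployment.get("id") != deployment_id:
            continue
        deployment.set("version", version)
        release_note = ET.Element("ReleaseNote")
        release_note.text = note
        tags = [child.tag for child in deployment]
        # Insert after the anchor sibling, or append when the anchor is absent
        position = tags.index(after_tag) + 1 if after_tag in tags else len(tags)
        deployment.insert(position, release_note)
        return deployment
    raise LookupError(f"No deployment with id {deployment_id!r}")

root = ET.fromstring(MANIFEST)
deployment = bump_version(root, "web", "1.5.0", "Fixes checkout timeout")
print(deployment.get("version"), [child.tag for child in deployment])
```

Raising when the deployment id is missing is a deliberate choice here: a silent no-op in a release pipeline is usually worse than a failed job.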
Scenario 2: Merge vendor feed with internal identifiers
A common problem is reconciling vendor XML feeds with internal IDs. The workflow is:
- Load vendor feed
- Build a lookup table from your internal data
- Add the internal identifier to each matching product
This is exactly the “build a dict, then apply” pattern from the merge example.
Scenario 3: Redact sensitive values
If you’re exporting logs or records for external use, you may need to remove sensitive tags.
For example, removing SSN and CreditCard nodes:
sensitive = {"SSN", "CreditCard"}
for elem in list(root.iter()):
    if elem.tag in sensitive:
        parent = root.find(".")  # placeholder: this is just root; use a parent map
The main gotcha here is that ElementTree doesn’t give you parent references by default. You can build a parent map:
parent_map = {c: p for p in root.iter() for c in p}
for elem in list(root.iter()):
    if elem.tag in sensitive:
        parent = parent_map.get(elem)
        if parent is not None:
            parent.remove(elem)
This pattern is extremely useful for deletions anywhere in the tree.
Scenario 4: Normalizing date formats
If you have inconsistent date formats across nodes, normalize them:
from datetime import datetime

for node in root.iter("OrderDate"):
    raw = (node.text or "").strip()
    if not raw:
        continue
    parsed = datetime.strptime(raw, "%m/%d/%Y")
    node.text = parsed.strftime("%Y-%m-%d")
Normalization is best done before other processing so downstream systems don’t choke on inconsistent formats.
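When the input really is inconsistent, a single strptime format raises ValueError on the first value that doesn't match. A sketch that tries several formats and reports what it could not parse; the format list and the sample values are assumptions about the data.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Formats to try, in order; this list is an assumption about the input data
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")

def normalize_date(raw):
    """Return raw as YYYY-MM-DD, or None when no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None

# Hypothetical sample with one value that cannot be parsed
root = ET.fromstring(
    "<Orders>"
    "<OrderDate>03/09/2025</OrderDate>"
    "<OrderDate>2025-03-10</OrderDate>"
    "<OrderDate>next Tuesday</OrderDate>"
    "</Orders>"
)
unparsed = []
for node in root.iter("OrderDate"):
    raw = (node.text or "").strip()
    fixed = normalize_date(raw)
    if fixed is None:
        unparsed.append(raw)      # leave the value alone and report it
    else:
        node.text = fixed

dates = [node.text for node in root.iter("OrderDate")]
print(dates, unparsed)
```

Leaving unparseable values untouched and collecting them is safer than guessing; ambiguous inputs such as 03/04/2025 still need a decision about which convention the source uses.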
Comparison Table for Editing Strategies
I like to summarize approach trade-offs in a quick table. It helps decide which tool to use for a given job.
Criterion | ElementTree | iterparse
… | Best | Overkill
… | Not built-in | Possible, but complex
… | Weak | Weak
… | Weak | Best
… | Limited | Limited

The takeaway: start with ElementTree, move to lxml for advanced needs, and stream for huge files.
Testing XML Modifications (Quick and Effective)
Tests for XML edits don’t need to be heavy. I usually add a few fast tests that run in CI:
- Parse output to ensure it’s well-formed
- Validate required tags/attributes exist
- Assert that key modifications were applied
A minimal test strategy is:
- Store input XML fixture
- Run the script
- Parse output XML
- Assert on specific nodes
This can be done with unittest or pytest; the important part is validating the output structure, not the exact formatting.
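The four steps above can be sketched as a single test function. The fixture and the mark_processing function are stand-ins I made up for the script under test; the test body itself works unchanged under pytest.

```python
import xml.etree.ElementTree as ET

# Hypothetical input fixture
FIXTURE = (
    '<Orders><Order id="A-1" status="pending">'
    '<Item sku="BK-482">Technical Writing Handbook</Item>'
    "</Order></Orders>"
)

def mark_processing(xml_text):
    """Stand-in for the script under test: pending orders become processing."""
    root = ET.fromstring(xml_text)
    for order in root.iter("Order"):
        if order.get("status") == "pending":
            order.set("status", "processing")
    return ET.tostring(root, encoding="unicode")

def test_pending_orders_are_marked_processing():
    output = mark_processing(FIXTURE)
    root = ET.fromstring(output)                 # well-formed, or this raises
    order = root.find("Order")
    assert order is not None                     # required structure survived
    assert order.get("status") == "processing"   # the edit was applied
    assert order.findtext("Item") == "Technical Writing Handbook"  # nothing else moved

test_pending_orders_are_marked_processing()
```

Note that the assertions compare parsed nodes, not output strings, so the test keeps passing if indentation or attribute quoting changes.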
Guardrails for Team Workflows
When multiple people touch XML scripts, consistency is everything. I set a few lightweight conventions:
- Always log input and output file names
- Always include a --dry-run mode
- Always run a basic validation before writing
- Always preserve XML declaration unless explicitly disabled
A --dry-run mode can just skip writing and print a diff summary. It prevents accidental modifications in shared environments.
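A sketch of how that flag could be wired up with argparse. The argument names, the function names, and the sample document are assumptions for illustration; the demo at the bottom runs against a temporary directory so it does not touch real files.

```python
import argparse
import os
import tempfile
import xml.etree.ElementTree as ET

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Mark pending orders as processing")
    parser.add_argument("input")
    parser.add_argument("output")
    parser.add_argument("--dry-run", action="store_true",
                        help="report changes without writing the output file")
    return parser.parse_args(argv)

def run(args):
    tree = ET.parse(args.input)
    changed = []
    for order in tree.getroot().iter("Order"):
        if order.get("status") == "pending":
            order.set("status", "processing")
            changed.append(order.get("id"))
    verb = "would change" if args.dry_run else "changed"
    print(f"{len(changed)} order(s) {verb}: {changed}")
    if not args.dry_run:
        tree.write(args.output, encoding="utf-8", xml_declaration=True)
    return changed

# Demo against a throwaway directory with a hypothetical one-order file
with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "orders.xml")
    dst = os.path.join(workdir, "orders.updated.xml")
    with open(src, "w", encoding="utf-8") as handle:
        handle.write('<Orders><Order id="A-1" status="pending"/></Orders>')
    changed = run(parse_args([src, dst, "--dry-run"]))
    wrote_output = os.path.exists(dst)

print(changed, wrote_output)
```

The edits still happen in memory during a dry run, which is the point: the summary reflects exactly what a real run would do, and only the write is skipped.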
When XML Is the Wrong Tool (And What I Use Instead)
Sometimes the best XML modification is no modification at all. If you have control over both ends, I’ll usually migrate to JSON or a binary format:
- JSON for lightweight APIs and most data interchange
- Protocol Buffers when you need strong typing and compact size
- Parquet or Avro for large-scale analytics
If you’re stuck with XML, don’t fight it. Just treat it as a tree and build safe transformations.
Practical Tips That Save Time
These are small, practical tips I rely on:
- Use findtext() with a default for direct text access without extra None checks
- Use attrib.get() with defaults to avoid KeyErrors
- Build a parent map when deleting nodes deep in the tree
- Always set encoding="utf-8" explicitly when writing
- Use ET.register_namespace() when namespaces are present
They seem minor, but they prevent hours of debugging.
A Production-Grade Script Skeleton
Below is a more structured version of the earlier example. It includes validation, logging, and a safer write pattern.
import os
import tempfile
import xml.etree.ElementTree as ET

INPUT_FILE = "orders.xml"
OUTPUT_FILE = "orders.updated.xml"


def validate_root(root):
    if root.tag != "Orders":
        raise ValueError("Unexpected root tag")


def update_orders(root):
    updated = 0
    for order in root.findall("Order"):
        if order.get("status") == "pending":
            order.set("status", "processing")
            updated += 1
    return updated


def fix_items(root):
    fixed = 0
    for item in root.iter("Item"):
        if item.get("sku") == "BK-482":
            item.text = "Technical Writing Field Guide"
            fixed += 1
    return fixed


def remove_discontinued(root):
    removed = 0
    # Build the child -> parent map once, before removing anything
    parent_map = {c: p for p in root.iter() for c in p}
    for item in list(root.iter("Item")):
        if item.get("sku") == "JS-110":
            parent = parent_map.get(item)
            if parent is not None:
                parent.remove(item)
                removed += 1
    return removed


def main():
    tree = ET.parse(INPUT_FILE)
    root = tree.getroot()
    validate_root(root)
    updated = update_orders(root)
    fixed = fix_items(root)
    removed = remove_discontinued(root)
    ET.indent(tree, space="  ", level=0)
    # Write next to the target so os.replace stays on one filesystem
    target_dir = os.path.dirname(os.path.abspath(OUTPUT_FILE))
    with tempfile.NamedTemporaryFile("w", delete=False, suffix=".xml", dir=target_dir) as tmp:
        tmp_path = tmp.name
    tree.write(tmp_path, encoding="utf-8", xml_declaration=True)
    os.replace(tmp_path, OUTPUT_FILE)
    print(f"Updated: {updated}, Fixed: {fixed}, Removed: {removed}")


if __name__ == "__main__":
    main()
This is the skeleton I use when I want a reliable script I can hand to another engineer without a lot of verbal explanation.
Debugging Tips for XML Scripts
When a script fails, the error often appears far from the real cause. These debugging habits help me pinpoint issues quickly:
- Print root.tag and root.attrib early
- Use ET.dump(elem) to inspect a subtree during debugging
- Save intermediate XML output when a change is complex
- Log the number of nodes matched by a findall()
One of the simplest checks is:
matches = root.findall("Order")
print(f"Matched {len(matches)} orders")
If this prints 0, you almost always have a namespace issue.
Explainability and Auditability in 2026 Workflows
Even in 2026, XML is often part of compliance pipelines. That means you need to be able to explain exactly what changed. I recommend:
- Versioning the script itself
- Logging a summary of modifications
- Keeping copies of the original and modified XML
This is especially important when your script is part of data transformation in regulated environments. You’re not just editing XML—you’re creating a chain of custody.
Quick Checklist Before You Run an XML Modification Script
I keep a mental checklist that saves me from mistakes:
- Did I parse the XML into a tree (no string replacements)?
- Did I check for namespaces?
- Did I validate required elements/attributes?
- Did I use xml_declaration=True when writing?
- Did I preserve formatting if humans will read the file?
- Did I log what changed?
If I can’t say yes to those, I slow down and add the missing pieces.
Final Thoughts
XML isn’t exciting, but it’s still essential. The scripts you write to modify it often run quietly in the background, and that’s exactly why they need to be correct. A single malformed tag can break an integration; a missing attribute can cause a compliance error. The good news is that you don’t need a heavy stack to get this right.
If you remember one thing: treat XML as a tree, not text. Parse it, edit it via nodes, validate it, and write it back cleanly. With a few safe patterns, your scripts become predictable and resilient.


