Fast XML parsing using Expat in Python
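Python's xml.parsers.expat module provides fast XML parsing through the Expat library. Expat is a non-validating parser: it checks that a document is well-formed but does not validate it against a DTD. You create a parser object with ParserCreate(), assign handler functions to it, and the parser calls those handlers as it encounters elements, attributes, and text. Because this event-driven approach does not build a document tree, it is memory-efficient and well suited to processing large XML files.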
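To make "non-validating" concrete, here is a minimal sketch. The helper try_parse and both sample documents are hypothetical, written only for this illustration. The first document breaks the rule declared in its own DTD but is still well-formed, so Expat accepts it; the second has a mismatched closing tag, so Expat raises ExpatError.

```python
import xml.parsers.expat

def try_parse(xml_text):
    """Hypothetical helper: return 'ok' if Expat accepts the document, else an error summary."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
        return "ok"
    except xml.parsers.expat.ExpatError as err:
        # ErrorString() converts the numeric error code into readable text
        return f"line {err.lineno}, column {err.offset}: {xml.parsers.expat.ErrorString(err.code)}"

# The DTD says <note> must contain <to>, but Expat does not validate, so this is accepted
invalid_but_well_formed = """<?xml version="1.0"?>
<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>
<note><from>Vamsi</from></note>"""

# <to> is never closed before </note>: not well-formed, so Expat raises ExpatError
not_well_formed = "<note><to>Krishna</note>"

print(try_parse(invalid_but_well_formed))
print(try_parse(not_well_formed))
```

If you need DTD or schema validation, you must add it on top of Expat; the parser itself only enforces well-formedness.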
How Expat Parser Works
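The three handlers used most often with the Expat parser are:
- StartElementHandler(name, attributes) − Called when an opening tag is encountered; attributes is a dictionary of the tag's attributes
- EndElementHandler(name) − Called when a closing tag is encountered
- CharacterDataHandler(data) − Called with the character data (text) found between tags; a single run of text may arrive in several calls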
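The three handlers are often combined to pull the text out of each element. The sketch below uses a stack of open elements so that each piece of character data is attached to the innermost open tag. The function name element_texts is hypothetical. The buffer_text attribute is a real xmlparser setting that asks pyexpat to merge adjacent text chunks into fewer CharacterDataHandler calls; whitespace-only content, such as the newlines between child elements, is skipped.

```python
import xml.parsers.expat

def element_texts(xml_text):
    """Hypothetical helper: return (tag, text) pairs for elements with non-whitespace text."""
    parser = xml.parsers.expat.ParserCreate()
    # Merge adjacent chunks of text into fewer handler calls
    parser.buffer_text = True

    open_elements = []  # stack of (tag, list of text chunks) for currently open tags
    results = []        # (tag, text) pairs, in the order closing tags are seen

    def on_start(tag, attrs):
        open_elements.append((tag, []))

    def on_chars(data):
        # Attach the text to the innermost open element
        if open_elements:
            open_elements[-1][1].append(data)

    def on_end(tag):
        _, chunks = open_elements.pop()
        text = "".join(chunks).strip()
        if text:  # skip whitespace-only content such as newlines
            results.append((tag, text))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_chars
    parser.Parse(xml_text, True)
    return results

sample = """<parent student_rollno="15">
<child1 Student_name="Krishna"> Strive for progress, not perfection</child1>
<child2 student_name="vamsi"> There are no shortcuts to any place worth going</child2>
</parent>"""

print(element_texts(sample))
```

For this sample the result contains only child1 and child2, because the only text directly inside parent is newlines.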
Example
Here's how to parse XML data using Expat with custom handler functions:
import xml.parsers.expat

# Called for every opening tag with the tag name and a dict of its attributes
def first_element(tag, attrs):
    print('first element:', tag, attrs)

# Called for every closing tag with the tag name
def last_element(tag):
    print('last element:', tag)

# Called for each run of character data (text) between tags
def character_value(value):
    print('Character value:', repr(value))

# Create the parser and register the handlers
parser_expat = xml.parsers.expat.ParserCreate()
parser_expat.StartElementHandler = first_element
parser_expat.EndElementHandler = last_element
parser_expat.CharacterDataHandler = character_value

# The second argument (isfinal=1) marks this string as the last chunk of input
parser_expat.Parse("""<?xml version="1.0"?>
<parent student_rollno="15">
<child1 Student_name="Krishna"> Strive for progress, not perfection</child1>
<child2 student_name="vamsi"> There are no shortcuts to any place worth going</child2>
</parent>""", 1)
The output of the above code is:
first element: parent {'student_rollno': '15'}
Character value: '\n'
first element: child1 {'Student_name': 'Krishna'}
Character value: ' Strive for progress, not perfection'
last element: child1
Character value: '\n'
first element: child2 {'student_name': 'vamsi'}
Character value: ' There are no shortcuts to any place worth going'
last element: child2
Character value: '\n'
last element: parent
Key Points
- The parser processes XML sequentially, calling handlers as it encounters elements
- Whitespace inside elements (such as the newlines between child elements) is reported as character data; Expat normally delivers each line ending as a separate chunk
- Attributes are passed as dictionaries to the start element handler
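- The second argument to Parse() (isfinal, passed as 1 here) tells the parser that this is the final chunk of data; pass False for earlier chunks when feeding a document in pieces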
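The isfinal argument matters when a large document arrives in pieces. The following is a minimal sketch: the function count_elements_in_chunks and the tiny document are hypothetical. Each chunk is passed with isfinal=False, and an empty final call with isfinal=True closes the document. Expat buffers incomplete tokens between calls, so a chunk boundary can fall in the middle of a tag.

```python
import xml.parsers.expat

def count_elements_in_chunks(chunks):
    """Hypothetical helper: count start tags while feeding the document piece by piece."""
    parser = xml.parsers.expat.ParserCreate()
    counts = {}

    def on_start(name, attrs):
        counts[name] = counts.get(name, 0) + 1

    parser.StartElementHandler = on_start
    for chunk in chunks:
        parser.Parse(chunk, False)  # more data will follow
    parser.Parse(b"", True)         # signal the end of the document
    return counts

document = b"<parent><child1>a</child1><child2>b</child2><child2>c</child2></parent>"
# Split at arbitrary byte positions; both boundaries fall inside a tag
pieces = [document[:10], document[10:33], document[33:]]
print(count_elements_in_chunks(pieces))  # {'parent': 1, 'child1': 1, 'child2': 2}
```

For a file on disk, parser.ParseFile() with a file opened in binary mode reads the data in chunks for you.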
Conclusion
The Expat parser is well suited to fast, memory-efficient XML processing in Python. It uses event-driven parsing with handler functions to process XML elements, attributes, and character data sequentially.