Handle Duplicate Keys in YAML with Python Using ruamel.yaml
In this tutorial, you’ll learn how to use ruamel.yaml to detect and prevent duplicate keys in YAML files.
You’ll explore various methods to identify duplicates, implement custom constructors, and more.
Detect Duplicate Keys When Parsing YAML
To parse a YAML string, you can use the ruamel.yaml library:
from ruamel.yaml import YAML
from ruamel.yaml.constructor import DuplicateKeyError

yaml_str = """
person:
  name: Ali
  age: 30
  age: 31
"""

yaml = YAML()
yaml.allow_duplicate_keys = False
try:
    data = yaml.load(yaml_str)
except DuplicateKeyError as e:
    print(e)
Output:
while constructing a mapping
  in "<unicode string>", line 3, column 3:
    name: Ali
    ^ (line: 3)
found duplicate key "age" with value "31" (original value: "30")
  in "<unicode string>", line 5, column 3:
    age: 31
    ^ (line: 5)
The parser raises a DuplicateKeyError because the age key appears twice in the person mapping.
Customize Duplicate Key Handling
You can override the construct_mapping method to customize the duplicate key handling according to your needs.
from ruamel.yaml import YAML
from ruamel.yaml.constructor import SafeConstructor

class UniqueKeyConstructor(SafeConstructor):
    def construct_mapping(self, node, deep=False):
        mapping = {}
        # node.value holds the (key_node, value_node) pairs of the mapping
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            if key in mapping:
                # Report the duplicate but keep loading
                print(f"Duplicate key detected: {key}")
            value = self.construct_object(value_node, deep=deep)
            mapping[key] = value  # a later value replaces an earlier one
        return mapping

class UniqueKeyYAML(YAML):
    def __init__(self, **kw):
        super().__init__(**kw)
        # Use the custom constructor instead of the default one
        self.Constructor = UniqueKeyConstructor

yaml_str = """
database:
  host: localhost
  port: 3306
  port: 5432
"""

yaml = UniqueKeyYAML(typ='safe')
data = yaml.load(yaml_str)
print(data)
Output:
Duplicate key detected: port
{'database': {'host': 'localhost', 'port': 5432}}
By overriding construct_mapping, you can log a message when a duplicate key is found instead of raising an error.
The last value for port overwrites the previous one.
Keep All Duplicate Keys
To keep all values associated with duplicate keys during parsing, you can track them using a list.
from ruamel.yaml import YAML
from ruamel.yaml.constructor import SafeConstructor
from collections import defaultdict

class MultiValueConstructor(SafeConstructor):
    def construct_mapping(self, node, deep=False):
        collected = defaultdict(list)
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            value = self.construct_object(value_node, deep=deep)
            collected[key].append(value)
        # Keys seen once keep their plain value; repeated keys become a list
        return {k: v[0] if len(v) == 1 else v for k, v in collected.items()}

class MultiValueYAML(YAML):
    def __init__(self, **kw):
        super().__init__(**kw)
        self.Constructor = MultiValueConstructor

yaml_str = """
server:
  address: 192.168.1.1
  address: 192.168.1.2
  address: 192.168.1.3
"""

yaml = MultiValueYAML(typ='safe')
data = yaml.load(yaml_str)
print(data)
Output:
{'server': {'address': ['192.168.1.1', '192.168.1.2', '192.168.1.3']}}
Now, all the address values are stored in a list.
Handle Duplicate Keys in Nested Structures
To detect duplicates in nested structures, your custom constructor should recursively check for duplicates at each level.
from ruamel.yaml import YAML
from ruamel.yaml.constructor import SafeConstructor, ConstructorError

class NestedUniqueKeyConstructor(SafeConstructor):
    def construct_mapping(self, node, deep=False):
        mapping = {}
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            if key in mapping:
                raise ConstructorError(
                    "while constructing a mapping", node.start_mark,
                    f"found duplicate key ({key})", key_node.start_mark
                )
            # construct_object routes nested mappings back through
            # construct_mapping, so every level gets the same check
            mapping[key] = self.construct_object(value_node, deep=deep)
        return mapping

class NestedUniqueKeyYAML(YAML):
    def __init__(self, **kw):
        super().__init__(**kw)
        self.Constructor = NestedUniqueKeyConstructor

yaml_str = """
project:
  name: Sahara
  details:
    manager: Omar
    manager: Layla
"""

yaml = NestedUniqueKeyYAML(typ='safe')
try:
    data = yaml.load(yaml_str)
except ConstructorError as e:
    print(e)
Output:
while constructing a mapping
  in "<unicode string>", line 5, column 5
found duplicate key (manager)
  in "<unicode string>", line 6, column 5
This ensures that duplicates within nested dictionaries, such as the manager key, are detected and reported.
Prevent Duplicate Keys When Writing YAML
Duplicate keys are rejected when the YAML is loaded, so with allow_duplicate_keys set to False the load fails before anything is written to the output file.
from ruamel.yaml import YAML

yaml_str = """
settings:
  theme: light
  theme: dark
"""

yaml = YAML()
yaml.allow_duplicate_keys = False
try:
    data = yaml.load(yaml_str)
    data['settings']['language'] = 'en'
    with open('output.yaml', 'w') as f:
        yaml.dump(data, f)
except Exception as e:
    print(e)
Output:
while constructing a mapping
  in "<unicode string>", line 3, column 3:
    theme: light
    ^ (line: 3)
found duplicate key "theme" with value "dark" (original value: "light")
  in "<unicode string>", line 4, column 3:
    theme: dark
    ^ (line: 4)