Python - Removing Duplicate Dicts in List
When working with lists of dictionaries in Python, you may encounter duplicate entries that need to be removed. Dictionaries can be compared for equality with ==, but because they are mutable they are unhashable, so they cannot be stored in a set to filter duplicates directly. This article explores four effective methods to remove duplicate dictionaries from a list.
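To see why the naive approach fails, the short sketch below (using a two-element example list) tries to pass a list of dictionaries straight to set():

```python
cities = [
    {"Place": "Kochi", "State": "Kerala"},
    {"Place": "Kochi", "State": "Kerala"},  # duplicate
]

# Dictionaries are unhashable, so set() raises TypeError
try:
    unique = set(cities)
except TypeError as err:
    print(err)  # unhashable type: 'dict'
```

The methods below all work around this by deriving a hashable stand-in for each dictionary, or by delegating the problem to pandas.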
Method 1: Using Tuple Conversion with a Set
This approach converts each dictionary to a sorted tuple of its items, which is hashable and can be tracked in a set:
def remove_duplicates(dict_list):
    seen = set()
    result = []
    for d in dict_list:
        # Sorting the items makes the tuple independent of key insertion order
        tuple_form = tuple(sorted(d.items()))
        if tuple_form not in seen:
            seen.add(tuple_form)
            result.append(d)
    return result

# Example data
cities = [
    {"Place": "Haldwani", "State": "Uttarakhand"},
    {"Place": "Hisar", "State": "Haryana"},
    {"Place": "Shillong", "State": "Meghalaya"},
    {"Place": "Kochi", "State": "Kerala"},
    {"Place": "Bhopal", "State": "Madhya Pradesh"},
    {"Place": "Kochi", "State": "Kerala"},  # Duplicate
    {"Place": "Haridwar", "State": "Uttarakhand"}
]

unique_cities = remove_duplicates(cities)
print(unique_cities)

Output:
[{'Place': 'Haldwani', 'State': 'Uttarakhand'}, {'Place': 'Hisar', 'State': 'Haryana'}, {'Place': 'Shillong', 'State': 'Meghalaya'}, {'Place': 'Kochi', 'State': 'Kerala'}, {'Place': 'Bhopal', 'State': 'Madhya Pradesh'}, {'Place': 'Haridwar', 'State': 'Uttarakhand'}]
Method 2: Using Pandas DataFrame
Pandas provides the built-in drop_duplicates method, which is convenient for large datasets (note that it requires the column values to be hashable):
import pandas as pd

def remove_duplicates_pandas(dict_list):
    df = pd.DataFrame(dict_list)
    # drop_duplicates keeps the first occurrence of each row by default
    df.drop_duplicates(inplace=True)
    return df.to_dict(orient='records')

# Example data
cities = [
    {"Place": "Haldwani", "State": "Uttarakhand"},
    {"Place": "Hisar", "State": "Haryana"},
    {"Place": "Shillong", "State": "Meghalaya"},
    {"Place": "Kochi", "State": "Kerala"},
    {"Place": "Bhopal", "State": "Madhya Pradesh"},
    {"Place": "Kochi", "State": "Kerala"},  # Duplicate
    {"Place": "Haridwar", "State": "Uttarakhand"}
]

unique_cities = remove_duplicates_pandas(cities)
print(unique_cities)

Output:
[{'Place': 'Haldwani', 'State': 'Uttarakhand'}, {'Place': 'Hisar', 'State': 'Haryana'}, {'Place': 'Shillong', 'State': 'Meghalaya'}, {'Place': 'Kochi', 'State': 'Kerala'}, {'Place': 'Bhopal', 'State': 'Madhya Pradesh'}, {'Place': 'Haridwar', 'State': 'Uttarakhand'}]
Method 3: Using Hash with Frozenset
This method creates a hash from the dictionary's items using frozenset for efficient comparison. Keep in mind that all values must be hashable, and that storing only the hash means two distinct dictionaries could, in rare cases, collide and be wrongly treated as duplicates:
def make_hashable(d):
    # A frozenset of (key, value) pairs is hashable if every value is hashable
    return hash(frozenset(d.items()))

def remove_duplicates_hash(dict_list):
    seen = set()
    result = []
    for d in dict_list:
        hash_value = make_hashable(d)
        if hash_value not in seen:
            seen.add(hash_value)
            result.append(d)
    return result

# Example data
cities = [
    {"Place": "Haldwani", "State": "Uttarakhand"},
    {"Place": "Hisar", "State": "Haryana"},
    {"Place": "Shillong", "State": "Meghalaya"},
    {"Place": "Kochi", "State": "Kerala"},
    {"Place": "Bhopal", "State": "Madhya Pradesh"},
    {"Place": "Kochi", "State": "Kerala"},  # Duplicate
    {"Place": "Haridwar", "State": "Uttarakhand"}
]

unique_cities = remove_duplicates_hash(cities)
print(unique_cities)

Output:
[{'Place': 'Haldwani', 'State': 'Uttarakhand'}, {'Place': 'Hisar', 'State': 'Haryana'}, {'Place': 'Shillong', 'State': 'Meghalaya'}, {'Place': 'Kochi', 'State': 'Kerala'}, {'Place': 'Bhopal', 'State': 'Madhya Pradesh'}, {'Place': 'Haridwar', 'State': 'Uttarakhand'}]
Method 4: Using Helper Function with Sorted Tuples
This approach is Method 1 factored into a reusable helper function that converts each dictionary to a sorted tuple for comparison:
def dict_to_sorted_tuple(d):
    # Same conversion as Method 1, extracted for reuse
    return tuple(sorted(d.items()))

def remove_duplicates_helper(dict_list):
    seen = set()
    result = []
    for d in dict_list:
        tuple_form = dict_to_sorted_tuple(d)
        if tuple_form not in seen:
            seen.add(tuple_form)
            result.append(d)
    return result

# Example data
cities = [
    {"Place": "Haldwani", "State": "Uttarakhand"},
    {"Place": "Hisar", "State": "Haryana"},
    {"Place": "Shillong", "State": "Meghalaya"},
    {"Place": "Kochi", "State": "Kerala"},
    {"Place": "Bhopal", "State": "Madhya Pradesh"},
    {"Place": "Kochi", "State": "Kerala"},  # Duplicate
    {"Place": "Haridwar", "State": "Uttarakhand"}
]

unique_cities = remove_duplicates_helper(cities)
print(unique_cities)

Output:
[{'Place': 'Haldwani', 'State': 'Uttarakhand'}, {'Place': 'Hisar', 'State': 'Haryana'}, {'Place': 'Shillong', 'State': 'Meghalaya'}, {'Place': 'Kochi', 'State': 'Kerala'}, {'Place': 'Bhopal', 'State': 'Madhya Pradesh'}, {'Place': 'Haridwar', 'State': 'Uttarakhand'}]
Comparison of Methods
| Method | Performance | Memory Usage | Best For |
|---|---|---|---|
| Tuple conversion | Good | Low | Small to medium datasets |
| Pandas | Excellent | High | Large datasets with complex data |
| Hash with frozenset | Very good | Medium | Fast comparison needed |
| Helper function | Good | Low | Clean, readable code |
Conclusion
Choose the method based on your specific needs: use pandas for large datasets, frozenset hashing for performance, or tuple conversion for simplicity. All methods effectively remove duplicate dictionaries while preserving the original order of unique entries.
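As a closing variant (not one of the four methods above, just a compact sketch of the same tuple-conversion idea), Method 1 can be condensed into a single dict comprehension. It relies on dicts preserving insertion order (Python 3.7+):

```python
def remove_duplicates_oneliner(dict_list):
    # Key each dict by its sorted-tuple form; a later duplicate overwrites
    # the stored value, but since duplicates are equal the result is the
    # same, and the key's original insertion position is preserved.
    return list({tuple(sorted(d.items())): d for d in dict_list}.values())

cities = [
    {"Place": "Kochi", "State": "Kerala"},
    {"Place": "Bhopal", "State": "Madhya Pradesh"},
    {"Place": "Kochi", "State": "Kerala"},  # duplicate
]
print(remove_duplicates_oneliner(cities))
```

This trades a little readability for brevity; for anything beyond a quick script, the explicit loop of Method 1 is easier to maintain.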
