FuzzyWuzzy Python library
In this tutorial, we are going to learn about the FuzzyWuzzy Python library, which provides fuzzy string matching. Python already offers modules such as re and difflib for comparing strings, but FuzzyWuzzy wraps the comparison in a simple scoring API: its functions return an integer score from 0 to 100 indicating how closely two strings match, instead of a true/false result or a string.
Installation
To work with the FuzzyWuzzy library, we need to install fuzzywuzzy and, optionally, python-Levenshtein:
pip install fuzzywuzzy
pip install python-Levenshtein
The python-Levenshtein package is recommended because it speeds up the comparisons. Without it, FuzzyWuzzy falls back to the slower pure-Python SequenceMatcher from difflib and prints a warning on import.
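To confirm that both packages are visible to the interpreter, a quick check along the following lines can help. This is only a sketch: the helper name is_installed is made up for illustration, and it relies on the fact that python-Levenshtein is imported under the module name Levenshtein.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the module can be found on the import path (hypothetical helper)."""
    return importlib.util.find_spec(module_name) is not None


# "Levenshtein" is the import name of the python-Levenshtein package.
for module_name in ("fuzzywuzzy", "Levenshtein"):
    status = "installed" if is_installed(module_name) else "missing"
    print(f"{module_name}: {status}")
```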
Basic String Comparison with fuzz.ratio()
The fuzz.ratio() method compares two complete strings and returns a similarity score from 0 to 100. It does no preprocessing, so differences in case and spacing lower the score. Let's see how it works:
from fuzzywuzzy import fuzz
# 100 for identical strings
print(f"Identical Strings: {fuzz.ratio('tutorialspoint', 'tutorialspoint')}")
# Lower scores for different strings
print(f"Case Difference: {fuzz.ratio('tutorialspoint', 'TutorialsPoint')}")
print(f"With Space: {fuzz.ratio('tutorialspoint', 'Tutorials Point')}")
# 0 for completely different strings
print(f"Different Strings: {fuzz.ratio('abcd', 'efgh')}")
Identical Strings: 100
Case Difference: 86
With Space: 83
Different Strings: 0
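To see where such a score comes from, the following sketch reproduces the calculation with difflib.SequenceMatcher, which FuzzyWuzzy uses when python-Levenshtein is not installed. The function name ratio_sketch is an illustrative assumption, not part of the library: the score is twice the number of matched characters divided by the combined length of both strings, scaled to 100.

```python
from difflib import SequenceMatcher


def ratio_sketch(a, b):
    """Approximate fuzz.ratio(): 100 * 2 * matched / (len(a) + len(b))."""
    total = len(a) + len(b)
    if not a or not b:
        return 0  # assumption: an empty string scores 0
    matcher = SequenceMatcher(None, a, b)
    # Sum the sizes of all matching blocks found by difflib.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return round(100 * 2 * matched / total)


print(ratio_sketch("tutorialspoint", "TutorialsPoint"))
print(ratio_sketch("tutorialspoint", "Tutorials Point"))
```

In the second call, 12 of the 29 characters across both strings are matched on each side, so the score is 100 * 24 / 29, which rounds to 83.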
Partial String Matching with fuzz.partial_ratio()
The partial_ratio() method compares the shorter string against the best-matching substring of the longer one, which is useful when one string is contained in the other. Like ratio(), it is case-sensitive:
from fuzzywuzzy import fuzz
# Partial matching when one string contains the other
print(f"Substring Match: {fuzz.partial_ratio('tutorials', 'tutorialspoint')}")
print(f"Regular Ratio: {fuzz.ratio('tutorials', 'tutorialspoint')}")
# partial_ratio() does not ignore case, so a case mismatch scores 0
print(f"Case Mismatch: {fuzz.partial_ratio('TUTORIALS', 'tutorialspoint')}")
Substring Match: 100
Regular Ratio: 78
Case Mismatch: 0
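The idea behind partial matching can be sketched with the standard library alone. The version below is a simplified assumption, not the library's implementation: it slides a window the length of the shorter string across the longer one and keeps the best score, whereas FuzzyWuzzy only checks windows aligned with matching blocks, so the two can disagree on some inputs. The names ratio and partial_ratio_sketch are illustrative.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    """Similarity of two strings on a 0 to 100 scale."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def partial_ratio_sketch(a, b):
    """Best score of the shorter string against every same-length window of the longer."""
    shorter, longer = sorted((a, b), key=len)
    if not shorter:
        return 0  # assumption: an empty string scores 0
    width = len(shorter)
    return max(
        ratio(shorter, longer[start:start + width])
        for start in range(len(longer) - width + 1)
    )


print(partial_ratio_sketch("tutorials", "tutorialspoint"))
# Lowercase both inputs first if the comparison should ignore case.
print(partial_ratio_sketch("TUTORIALS".lower(), "tutorialspoint".lower()))
```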
Advanced Matching with fuzz.WRatio()
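fuzz.WRatio() (Weighted Ratio) is the most forgiving of the three methods. It first preprocesses both strings (lowercasing them and replacing non-alphanumeric characters with spaces), then computes several scores, including plain, partial, and token-based ratios, applies a weight to each, and returns the highest: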
from fuzzywuzzy import fuzz
# Trailing punctuation is removed during preprocessing
print(f"Extra Characters: {fuzz.WRatio('tutorialspoint', 'tutorialspoint!!!')}")
# Case is ignored after preprocessing
print(f"Case Difference: {fuzz.WRatio('tutorialspoint', 'TutorialsPoint')}")
# Token-based scoring tolerates reordered words
print(f"Word Order: {fuzz.WRatio('tutorials point', 'point tutorials')}")
# Still 0 for completely different strings
print(f"Different Strings: {fuzz.WRatio('abcd', 'efgh')}")
Extra Characters: 100
Case Difference: 100
Word Order: 95
Different Strings: 0
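The following sketch illustrates the "score several ways, keep the best" idea using only the standard library. It is a deliberately reduced model and not the library's code: the real WRatio() also considers partial and token-set ratios. The function names are illustrative assumptions; the 0.95 weight mirrors the scaling FuzzyWuzzy applies to its token-based scores.

```python
import re
from difflib import SequenceMatcher


def ratio(a, b):
    """Similarity of two strings on a 0 to 100 scale."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def normalize(text):
    """Lowercase, turn non-word characters into spaces, and trim the ends."""
    return re.sub(r"\W", " ", text).lower().strip()


def token_sort_ratio(a, b):
    """Compare the strings after sorting their words alphabetically."""
    return ratio(" ".join(sorted(a.split())), " ".join(sorted(b.split())))


def weighted_ratio_sketch(a, b):
    """Return the best of the plain score and the down-weighted token-sort score."""
    a, b = normalize(a), normalize(b)
    base = ratio(a, b)
    reordered = token_sort_ratio(a, b) * 0.95  # token scores are scaled down
    return round(max(base, reordered))


print(weighted_ratio_sketch("tutorials point", "point tutorials"))
```

For the reordered pair, the plain score is low but the token-sort score is a perfect 100, which the 0.95 weight reduces to 95.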
Comparison of Methods
| Method | Best For | Key Feature |
|---|---|---|
| fuzz.ratio() | Exact string comparison | Simple character-by-character matching |
| fuzz.partial_ratio() | Substring matching | Finds best partial match |
| fuzz.WRatio() | General purpose | Automatically selects best algorithm |
Practical Example
Here's a practical example comparing different approaches for name matching:
from fuzzywuzzy import fuzz
name1 = "John Smith"
name2 = "SMITH, JOHN"
print(f"ratio(): {fuzz.ratio(name1, name2)}")
print(f"partial_ratio(): {fuzz.partial_ratio(name1, name2)}")
print(f"WRatio(): {fuzz.WRatio(name1, name2)}")
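ratio(): 10
partial_ratio(): 14
WRatio(): 95
ratio() and partial_ratio() compare the raw characters and are case-sensitive, so they score this pair very low (the exact partial_ratio() value can vary by a few points depending on the version and on whether python-Levenshtein is installed). WRatio() lowercases both names, drops the comma, and compares the words regardless of order, which is why it recognizes them as the same name.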
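To make the effect of each normalization step visible, the sketch below scores the same two names three ways using difflib only: raw, lowercased, and with punctuation removed and words sorted. It is an illustration under simplified assumptions (the helper names ratio and sorted_tokens are hypothetical), not a reproduction of FuzzyWuzzy's internals.

```python
import re
from difflib import SequenceMatcher


def ratio(a, b):
    """Similarity of two strings on a 0 to 100 scale."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def sorted_tokens(text):
    """Lowercase, strip punctuation, and return the words in alphabetical order."""
    return sorted(re.sub(r"\W", " ", text).lower().split())


name1 = "John Smith"
name2 = "SMITH, JOHN"

raw_score = ratio(name1, name2)
lowered_score = ratio(name1.lower(), name2.lower())
sorted_score = ratio(" ".join(sorted_tokens(name1)), " ".join(sorted_tokens(name2)))

print(f"raw: {raw_score}")
print(f"lowercased: {lowered_score}")
print(f"cleaned and sorted: {sorted_score}")
```

Each step raises the score, and once both names reduce to the same sorted word list the comparison is a perfect match.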
Conclusion
FuzzyWuzzy scores string similarity on a scale of 0 to 100 and offers different methods for different use cases. Use ratio() for a strict, case-sensitive comparison of whole strings, partial_ratio() when one string may be contained in the other, and WRatio() for general-purpose matching, since it normalizes case and punctuation and returns the best of several weighted scores.
