Summary
When comparing a feature on two different records, differences in case, punctuation, or diacritics shouldn't lower the similarity score.
Impact
Because strings are not normalized before the JaroWinkler/Levenshtein comparison, scores drop even when both values refer to the same thing.
Steps to reproduce
Examples to consider:
Thomas vs thomas
O'Hara vs Ohara
321 Main St vs 321 Main St.
Jose vs José
Albany vs Albany
Expected behavior
Each of the five cases above should produce a similarity score of 1.0.
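One possible approach is to normalize both values before they reach the comparison. The sketch below is only an illustration, not the project's implementation: `normalize` is a hypothetical helper name, and stripping diacritics, punctuation, and extra whitespace is an assumed policy based on the examples above. It uses only the Python standard library.

```python
import unicodedata


def normalize(value: str) -> str:
    """Hypothetical helper: canonicalize a string before similarity scoring."""
    # Decompose accented characters, e.g. "é" becomes "e" plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", value)
    # Drop the combining marks so "José" and "Jose" share the same letters.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Case-insensitive form ("Thomas" and "thomas" become identical).
    folded = without_marks.casefold()
    # Remove punctuation (any Unicode category starting with "P"),
    # which covers the apostrophe in "O'Hara" and the period in "St.".
    no_punct = "".join(
        ch for ch in folded if not unicodedata.category(ch).startswith("P")
    )
    # Trim the ends and collapse internal runs of whitespace.
    return " ".join(no_punct.split())


# Pairs taken from the examples in this report.
pairs = [
    ("Thomas", "thomas"),
    ("O'Hara", "Ohara"),
    ("321 Main St", "321 Main St."),
    ("Jose", "José"),
    ("Albany", "Albany"),
]

for left, right in pairs:
    print(left, "|", right, "|", normalize(left) == normalize(right))
```

Once both sides normalize to the same string, any similarity measure that returns 1.0 for identical inputs will score the pair as 1.0. One trade-off to consider: removing all punctuation also merges values such as hyphenated names, so the set of stripped characters may need tuning per feature.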