
Similarity comparisons should ignore whitespace characters #243

@m-goggins

Description


Summary

We want to compare only the meaningful parts of strings, i.e., the non-whitespace characters, so whitespace characters should be removed from strings in feature_iter.
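A minimal sketch of the proposed normalization follows. It assumes feature values arrive as plain strings. `strip_whitespace` and `iter_normalized` are hypothetical helpers, not existing project code, and the real `feature_iter` signature is not shown in this issue. The sketch only illustrates removing every whitespace character (leading, trailing, and interior) from each value before comparison.

```python
from typing import Iterable, Iterator, Tuple


def strip_whitespace(value: str) -> str:
    """Hypothetical helper: remove every whitespace character from value."""
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines, ...), so joining the pieces drops all of it.
    return "".join(value.split())


def iter_normalized(pairs: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Hypothetical wrapper: yield (feature, value) pairs with whitespace removed."""
    for feature, value in pairs:
        yield feature, strip_whitespace(value)


if __name__ == "__main__":
    # "address" is an illustrative feature name, not necessarily a real one.
    print(list(iter_normalized([("address", " 123  Main\tSt ")])))
```

Because the helper only removes characters, it can be applied to each string value wherever feature values are yielded, without changing how the surrounding comparison logic works.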

Impact

The screenshot below shows an example of how removing whitespace from the middle of strings produces more accurate string similarity scores: the spaces no longer count as matching characters.

Screenshot 2025-03-10 at 2.37.27 PM.png
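The effect can be reproduced with a small sketch. It uses plain Jaro similarity, implemented from scratch here as a stand-in; the project's actual comparison function may use a different metric. The strings "Ann Lee" and "Bob Kim" are illustrative and share no letters, so the only "match" between them is the space.

```python
def jaro(s1: str, s2: str) -> float:
    """Return the Jaro similarity of s1 and s2, in the range [0.0, 1.0]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they lie within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the number of matched characters out of order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def strip_whitespace(value: str) -> str:
    """Hypothetical helper: remove every whitespace character."""
    return "".join(value.split())


if __name__ == "__main__":
    a, b = "Ann Lee", "Bob Kim"
    print(f"{jaro(a, b):.3f}")  # 0.429: the shared space is counted as a match
    print(f"{jaro(strip_whitespace(a), strip_whitespace(b)):.3f}")  # 0.000
```

With the space included, two names that have nothing in common score 3/7 (about 0.429). Once whitespace is stripped they correctly score 0.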

This ticket is closely related to #238.

Metadata


Assignees

Labels

No labels

Type

No fields configured for Bug.

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests
