fix: hash for Unicode will call write_u8 like Ascii does #73

Merged
seanmonstar merged 1 commit into master from unicode-hash-each-byte
Dec 24, 2024

Conversation

@seanmonstar
Owner

@conradludgate

This still fails on:

let k1 = UniCase::new("Maße");
let k2 = UniCase::ascii("maße");

But I will admit that this looks like a misuse of unicase rather than behaviour anyone should rely on.
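
One plausible reading of why this pair still disagrees is that the two variants normalise differently before hashing: ASCII lowercasing leaves 'ß' untouched, while Unicode case folding expands it to "ss". The sketch below uses only the standard library. `toy_fold`, `hash_unicode_style` and `hash_ascii_style` are hypothetical stand-ins written for illustration, not the crate's implementation, and `toy_fold` special-cases only 'ß'.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical stand-in for Unicode case folding (assumption): maps 'ß'
/// to "ss" and otherwise defers to `char::to_lowercase`.
fn toy_fold(s: &str) -> String {
    let mut out = String::new();
    for c in s.chars() {
        match c {
            'ß' => out.push_str("ss"),
            _ => out.extend(c.to_lowercase()),
        }
    }
    out
}

/// Feed every byte to the hasher individually with `write_u8`.
fn hash_bytes(bytes: impl Iterator<Item = u8>) -> u64 {
    let mut hasher = DefaultHasher::new();
    for b in bytes {
        hasher.write_u8(b);
    }
    hasher.finish()
}

/// Hash the bytes of the folded text.
fn hash_unicode_style(s: &str) -> u64 {
    hash_bytes(toy_fold(s).into_bytes().into_iter())
}

/// Hash the bytes after ASCII-only lowercasing; non-ASCII bytes pass through.
fn hash_ascii_style(s: &str) -> u64 {
    hash_bytes(s.bytes().map(|b| b.to_ascii_lowercase()))
}

fn main() {
    // Pure-ASCII text: both paths feed the same bytes, so the hashes agree.
    assert_eq!(hash_unicode_style("Hello"), hash_ascii_style("hELLO"));
    // Non-ASCII text: "masse" and "maße" are different byte sequences.
    assert_ne!(hash_unicode_style("Maße"), hash_ascii_style("maße"));
}
```

Under these assumptions, the two paths can only agree when the text handed to the ASCII path really is ASCII.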

@seanmonstar
Owner Author

Crazy personal opinion: that probably should yell at you and die.

@marvin-j97

To make it worse, "Maße" means something like "measurements", while "Masse" means "mass" (as in weight or group of things/people), so they aren't even the same word.

@seanmonstar
Owner Author

seanmonstar commented Dec 23, 2024

Even words that are spelled exactly the same can have different meanings based on context. This crate is not trying to say the words mean the same thing. It's saying that according to the Unicode Case Folding algorithm, they are equivalent for matching purposes.
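
To make "equivalent for matching purposes" concrete, here is a rough standard-library sketch. `approx_fold` and `matches_caseless` are hypothetical helpers, and uppercasing then lowercasing is only an approximation of Unicode case folding that happens to work for this example. It is not the algorithm the crate uses.

```rust
/// Rough approximation of case folding (assumption): uppercase, then
/// lowercase. `str::to_uppercase` expands 'ß' to "SS".
fn approx_fold(s: &str) -> String {
    s.to_uppercase().to_lowercase()
}

/// Caseless match: two strings match if their folded forms are identical.
fn matches_caseless(a: &str, b: &str) -> bool {
    approx_fold(a) == approx_fold(b)
}

fn main() {
    // Different words to a German reader, but equivalent for matching.
    assert!(matches_caseless("Maße", "MASSE"));
    assert!(matches_caseless("Maße", "masse"));
    // Different text after folding still does not match.
    assert!(!matches_caseless("Maße", "Maß"));
}
```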

@conradludgate

To put it more formally: it's a logic bug to use UniCase::ascii with non-ASCII text, so any inconsistencies there are the fault of the user and not the library. LGTM
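
For illustration only, the "ASCII constructor requires ASCII text" contract could be enforced at construction time, along the lines of the "yell at you and die" idea above. `AsciiKey` is a hypothetical type invented for this sketch. It is not part of unicase, and nothing in this thread says the crate performs such a check.

```rust
/// Hypothetical wrapper (not the unicase API) that accepts only ASCII text.
#[derive(Debug)]
struct AsciiKey<'a>(&'a str);

impl<'a> AsciiKey<'a> {
    /// Fallible constructor: returns `None` for non-ASCII input.
    fn try_new(s: &'a str) -> Option<Self> {
        if s.is_ascii() {
            Some(AsciiKey(s))
        } else {
            None
        }
    }

    /// Panicking constructor: treats non-ASCII input as a caller bug.
    fn new(s: &'a str) -> Self {
        Self::try_new(s).expect("AsciiKey requires ASCII-only text")
    }

    /// Borrow the wrapped text.
    fn as_str(&self) -> &str {
        self.0
    }
}

fn main() {
    assert_eq!(AsciiKey::new("Masse").as_str(), "Masse");
    // 'ß' is not ASCII, so the fallible constructor rejects it.
    assert!(AsciiKey::try_new("maße").is_none());
}
```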

@seanmonstar seanmonstar merged commit de5ebf0 into master Dec 24, 2024
@seanmonstar seanmonstar deleted the unicode-hash-each-byte branch December 24, 2024 14:06

Development

Successfully merging this pull request may close these issues.

incorrect hash impl

3 participants