Fix bBitMinHash NumPy pickling issue #248

Merged
ekzhu merged 2 commits into ekzhu:master from 123epsilon:fix_bbit_pickling
Nov 3, 2025

@123epsilon
Contributor

cc @ekzhu

Problem:

Recent PRs such as #246 failed tests on Python 3.11 because of a change in the way NumPy handles type coercion in certain functions (such as left_shift, <<). With the newer NumPy releases used on recent Python versions, pickling a bBitMinHash caused its hashvalues to overflow to zero, because NumPy, now being strict, refused to coerce the type from np.uint32 to np.uint64. This effectively removed the ability to pickle these objects and, worse, it failed silently.

This problem was not present in previous NumPy versions (e.g., 1.x) because they were more lenient with type coercion and would implicitly change the hashvalue type to np.uint64.
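To make the overflow concrete, here is a minimal sketch. It is not the code from bBitMinHash: the value and shift are made up, and both operands are given explicit dtypes so the result is the same on every NumPy version. Under NEP 50, a plain Python integer shift count adopts the dtype of the NumPy operand instead of widening the result, so a np.uint32 hash value ends up in the "narrow" case below.

```python
import numpy as np

# Illustrative values only (assumptions, not taken from the library).
hv = np.uint32(0xABCD0000)   # a hash value stored in 32 bits
shift = 16                   # offset that pushes every set bit past bit 31

# Both operands are uint32, so the result stays 32 bits wide and the
# shifted-out high bits are silently discarded.
narrow = np.left_shift(hv, np.uint32(shift))

# Widening to uint64 before shifting keeps all the bits.
wide = np.left_shift(np.uint64(hv), np.uint64(shift))

# A native Python int has no fixed width, so nothing is lost either.
exact = int(hv) << shift

print(narrow.dtype, int(narrow))
print(wide.dtype, hex(int(wide)))
print(hex(exact))
```

No exception or warning is raised in the narrow case, which matches the silent failure described above.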

Solution:

We resolve this issue by performing the bitwise operations in bBitMinHash's __getstate__ routine with Python's native int type instead. This type has arbitrary precision, which avoids overflows. Moreover, the result is implicitly coerced to np.uint64, as we expect. This works across NumPy versions and therefore across Python versions as well.
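The idea can be sketched as follows. This is a hypothetical illustration rather than the actual __getstate__ implementation: the helper names pack_slot and unpack_slot, the slot layout, and the use of struct are all assumptions. The key step is calling int() on each value before shifting, so the arithmetic happens in Python integers regardless of the NumPy dtype of the input.

```python
import struct

def pack_slot(values, b):
    """Pack the lowest b bits of each value into one Python int (hypothetical helper)."""
    mask = (1 << b) - 1
    slot = 0
    for i, v in enumerate(values):
        # int() turns a NumPy scalar (or any integer-like object) into an
        # arbitrary-precision Python int, so the shift cannot overflow.
        slot |= (int(v) & mask) << (i * b)
    return slot

def unpack_slot(slot, b, count):
    """Recover `count` b-bit values from a packed slot (hypothetical helper)."""
    mask = (1 << b) - 1
    return [(slot >> (i * b)) & mask for i in range(count)]

# 32 two-bit values fill exactly one 64-bit slot.
values = [3, 0, 2, 1] * 8
slot = pack_slot(values, b=2)

# The packed integer fits in an unsigned 64-bit field, so it can be
# serialized as 8 bytes and read back without loss.
raw = struct.pack("<Q", slot)
restored = unpack_slot(struct.unpack("<Q", raw)[0], 2, len(values))
print(restored == values)
```

Because the packed value is only narrowed to 64 bits at the very end, the intermediate shifts do not depend on NumPy's promotion rules.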

References

NEP 50 - Promotion rules for Python scalars

[DOC] Changes to NumPy data type promotion

@123epsilon
Contributor Author

Update: edited Ubuntu repo in tests as in #246.

I submitted this change as a separate PR mainly so that the PR itself can serve as documentation for posterity, since the change is fairly non-trivial.

@123epsilon
Contributor Author

cc @ekzhu

@ekzhu ekzhu merged commit f52e023 into ekzhu:master Nov 3, 2025
6 checks passed