chore: add pyright type checking and ci checks for it #280
chore: add pyright type checking and ci checks for it #280ekzhu merged 22 commits intoekzhu:masterfrom
Conversation
…pe hints in MinHash and LSH classes
…handling in _hashed_byteswap
ekzhu
left a comment
There was a problem hiding this comment.
Thanks! Could you resolve the conflict that was introduced by the latest merge.
Sure @ekzhu |
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #280 +/- ##
=========================================
Coverage ? 77.52%
=========================================
Files ? 15
Lines ? 2056
Branches ? 0
=========================================
Hits ? 1594
Misses ? 462
Partials ? 0
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR adds Pyright static type checking to the project to improve code quality and type safety. The changes include configuration for Pyright in pyproject.toml, a new CI workflow for automated type checking, and various fixes to type hints and error handling across the codebase.
- Configures Pyright with basic type checking mode and selective rule disabling
- Adds GitHub Actions workflow for automated Pyright checks
- Improves type annotations and error handling in storage, LSH, and MinHash modules
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
pyproject.toml |
Adds Pyright configuration with basic type checking mode and various reports disabled |
.github/workflows/checks.yml |
Adds new Pyright CI job using uv and pyright |
datasketch/weighted_minhash.py |
Changes type check from Iterable to Sized, adds redundant type annotation, refactors array initialization |
datasketch/storage.py |
Adds return type annotations, changes None returns to ValueError, adds buffer_size property, makes seeds parameter optional |
datasketch/minhash.py |
Updates import statements, changes type hints from Iterable to Sized/Union[Sized, np.ndarray], adds return type to _parse_hashvalues |
datasketch/lshensemble.py |
Extracts variable to avoid potential type checking issues with array indexing |
datasketch/lsh_bloom.py |
Adds None checks for n and fp parameters in validation |
datasketch/lsh.py |
Adds explicit type annotations for storage attributes, changes _merge return type to None, adds None check for hashfunc |
datasketch/experimental/aio/storage.py |
Adds type: ignore comments for async Redis method calls |
Comments suppressed due to low confidence (1)
datasketch/minhash.py:142
- The
_parse_hashvaluesmethod is called twice for the samehashvaluesinput: once on line 121 to get the length, and again on line 142 to assign toself.hashvalues. This is inefficient - the result from line 121 should be stored and reused on line 142 to avoid parsing the same data twice.
hashvalues = self._parse_hashvalues(hashvalues)
num_perm = len(hashvalues)
if num_perm > _hash_range:
# Because 1) we don't want the size to be too large, and
# 2) we are using 4 bytes to store the size value
raise ValueError(
"Cannot have more than %d number of\
permutation functions"
% _hash_range
)
self.seed = seed
self.num_perm = num_perm
# Check the hash function.
if not callable(hashfunc):
raise ValueError("The hashfunc must be a callable.")
self.hashfunc = hashfunc
# Check for use of hashobj and issue warning.
if hashobj is not None:
warnings.warn("hashobj is deprecated, use hashfunc instead.", DeprecationWarning, stacklevel=2)
# Initialize hash values
if hashvalues is not None:
self.hashvalues = self._parse_hashvalues(hashvalues)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
@bhimrazy great work! And Happy New Year to you 🎆 |
What does this PR do?
Checklist