chore: add pyright type checking and ci checks for it by bhimrazy · Pull Request #280 · ekzhu/datasketch

bhimrazy · 2025-11-11T06:57:28Z

What does this PR do?

Adds Pyright type checking configuration and CI workflow to improve code quality and type safety.
Fixes type hints, assertions, and error handling in storage, LSH, MinHash, and related modules.
Updates pyproject.toml with Pyright settings.
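
As a rough illustration of the kind of error-handling fix a type checker tends to prompt (a hypothetical sketch, not code taken from this PR): a factory that falls through and returns `None` for an unknown configuration can raise `ValueError` instead, so its return type is no longer optional. The names `make_storage` and `DictStorage` and the `"dict"` type string are assumptions made up for this example.

```python
from typing import Dict, List


class DictStorage:
    """Hypothetical in-memory storage, used only for this illustration."""

    def __init__(self) -> None:
        self._data: Dict[bytes, List[bytes]] = {}

    def insert(self, key: bytes, value: bytes) -> None:
        self._data.setdefault(key, []).append(value)

    def get(self, key: bytes) -> List[bytes]:
        return self._data.get(key, [])


def make_storage(config: Dict[str, str]) -> DictStorage:
    """Return a storage object, or raise instead of silently returning None."""
    if config.get("type") == "dict":
        return DictStorage()
    # Raising keeps the return type non-optional, so callers need no None checks.
    raise ValueError(f"Unknown storage type: {config.get('type')!r}")


storage = make_storage({"type": "dict"})
storage.insert(b"k", b"v")
```

With a non-optional return type, call sites such as `storage.insert(...)` type-check without an extra `if storage is not None` guard.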

Checklist

Are unit tests passing?
Documentation added/updated for all public APIs?
Is this a breaking change? If yes, add "[BREAKING]" to the PR title.

ekzhu

Thanks! Could you resolve the conflict that was introduced by the latest merge?

bhimrazy · 2025-12-28T06:08:20Z

> Thanks! Could you resolve the conflict that was introduced by the latest merge?

Sure @ekzhu

codecov-commenter · 2026-01-02T08:36:02Z

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 78.12500% with 7 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (master@4e29f97). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| datasketch/storage.py | 76.92% | 3 Missing ⚠️ |
| datasketch/lsh.py | 71.42% | 2 Missing ⚠️ |
| datasketch/minhash.py | 66.66% | 1 Missing ⚠️ |
| datasketch/weighted_minhash.py | 80.00% | 1 Missing ⚠️ |

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff            @@
##             master     #280   +/-   ##
=========================================
  Coverage          ?   77.52%           
=========================================
  Files             ?       15           
  Lines             ?     2056           
  Branches          ?        0           
=========================================
  Hits              ?     1594           
  Misses            ?      462           
  Partials          ?        0

| Flag | Coverage Δ |
|---|---|
| unittests | `77.52% <78.12%> (?)` |

Copilot

Pull request overview

This PR adds Pyright static type checking to the project to improve code quality and type safety. The changes include configuration for Pyright in pyproject.toml, a new CI workflow for automated type checking, and various fixes to type hints and error handling across the codebase.

Configures Pyright with basic type checking mode and selective rule disabling
Adds GitHub Actions workflow for automated Pyright checks
Improves type annotations and error handling in storage, LSH, and MinHash modules

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| `pyproject.toml` | Adds Pyright configuration with basic type checking mode and various reports disabled |
| `.github/workflows/checks.yml` | Adds new Pyright CI job using uv and pyright |
| `datasketch/weighted_minhash.py` | Changes type check from `Iterable` to `Sized`, adds redundant type annotation, refactors array initialization |
| `datasketch/storage.py` | Adds return type annotations, changes `None` returns to `ValueError`, adds `buffer_size` property, makes `seeds` parameter optional |
| `datasketch/minhash.py` | Updates import statements, changes type hints from `Iterable` to `Sized`/`Union[Sized, np.ndarray]`, adds return type to `_parse_hashvalues` |
| `datasketch/lshensemble.py` | Extracts variable to avoid potential type checking issues with array indexing |
| `datasketch/lsh_bloom.py` | Adds None checks for `n` and `fp` parameters in validation |
| `datasketch/lsh.py` | Adds explicit type annotations for storage attributes, changes `_merge` return type to `None`, adds None check for `hashfunc` |
| `datasketch/experimental/aio/storage.py` | Adds `type: ignore` comments for async Redis method calls |
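
The `Iterable` to `Sized` change listed above matters because `len()` requires `__len__`, which `Iterable` does not guarantee. The sketch below illustrates the distinction using only the standard library; `count_weights` is a hypothetical helper written for this example and is not part of datasketch.

```python
from collections.abc import Iterable, Sized


def count_weights(v: Sized) -> int:
    """Return the dimension of a weight vector; len() needs Sized, not Iterable."""
    if not isinstance(v, Sized):
        raise ValueError("Input vector must be a sized collection")
    return len(v)


# A generator is iterable but has no __len__, so it is not Sized.
gen = (x for x in range(3))
gen_is_iterable = isinstance(gen, Iterable)
gen_is_sized = isinstance(gen, Sized)

dim = count_weights([0.5, 1.0, 2.0])
```

Annotating the parameter as `Sized` lets a checker such as Pyright accept the `len(v)` call, whereas an `Iterable` annotation would not justify it.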

Comments suppressed due to low confidence (1)

datasketch/minhash.py:142

The _parse_hashvalues method is called twice for the same hashvalues input: once on line 121 to get the length, and again on line 142 to assign to self.hashvalues. This is inefficient - the result from line 121 should be stored and reused on line 142 to avoid parsing the same data twice.

            hashvalues = self._parse_hashvalues(hashvalues)
            num_perm = len(hashvalues)
        if num_perm > _hash_range:
            # Because 1) we don't want the size to be too large, and
            # 2) we are using 4 bytes to store the size value
            raise ValueError(
                "Cannot have more than %d number of\
                    permutation functions"
                % _hash_range
            )
        self.seed = seed
        self.num_perm = num_perm
        # Check the hash function.
        if not callable(hashfunc):
            raise ValueError("The hashfunc must be a callable.")
        self.hashfunc = hashfunc
        # Check for use of hashobj and issue warning.
        if hashobj is not None:
            warnings.warn("hashobj is deprecated, use hashfunc instead.", DeprecationWarning, stacklevel=2)
        # Initialize hash values
        if hashvalues is not None:
            self.hashvalues = self._parse_hashvalues(hashvalues)
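
A minimal sketch of the parse-once pattern this comment suggests, assuming a simplified stand-in for the parser. `Sketch`, `parse_hashvalues`, and the `parse_calls` counter are hypothetical names for illustration and do not reproduce the actual `MinHash` constructor or the change made in this PR.

```python
from typing import List, Optional, Sequence

parse_calls = 0  # counts how often the (hypothetical) parser runs


def parse_hashvalues(hashvalues: Sequence[int]) -> List[int]:
    """Stand-in for a parsing helper; hypothetical, for illustration only."""
    global parse_calls
    parse_calls += 1
    return [int(h) for h in hashvalues]


class Sketch:
    """Hypothetical class showing the parse-once pattern."""

    def __init__(
        self, num_perm: int = 128, hashvalues: Optional[Sequence[int]] = None
    ) -> None:
        parsed: Optional[List[int]] = None
        if hashvalues is not None:
            parsed = parse_hashvalues(hashvalues)  # parse exactly once
            num_perm = len(parsed)
        self.num_perm = num_perm
        # Reuse the parsed result rather than parsing the raw input again.
        self.hashvalues = parsed if parsed is not None else [0] * num_perm


s = Sketch(hashvalues=[3, 1, 2])
```

Keeping the parsed result in a local variable also gives the type checker a single, already-narrowed value to reason about.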

ekzhu · 2026-01-02T16:56:48Z

@bhimrazy great work!

And Happy New Year to you 🎆

bhimrazy · 2026-01-03T07:01:06Z

> @bhimrazy great work!
>
> And Happy New Year to you 🎆

Thanks, @ekzhu! 🙌
Happy New Year to you too ✨

bhimrazy added 16 commits November 11, 2025 12:41

add config to pyproject toml

aa0ea4e

add checks for pyright

7c02db4

Merge branch 'master' into checks/add-pyright

7376b2b

update

46fa79c

update

cbd16de

Merge branch 'master' into checks/add-pyright

2e31e35

Merge branch 'master' into checks/add-pyright

ac8e358

fix: add type ignore comments for Redis storage methods and update type hints in MinHash and LSH classes

ceb7e8e

fix: update input validation in WeightedMinHashGenerator to check for sized iterables

5c55b7f

fix: add type hints and improve error handling in storage functions

d1288c1

fix: add assertions to ensure keys and hashtables are initialized in AsyncMinHashLSH

8ec9ab5

Merge branch 'master' into checks/add-pyright

3fdd1bf

fix: update _merge method signature to return None and improve error handling in _hashed_byteswap

91157dd

fix: remove unnecessary assertion for upper bound in MinHashLSHEnsemble

7930b73

fix: simplify hashvalues initialization in WeightedMinHashGenerator

3856e4d

fix: remove unnecessary assertions for keys and hashtables in AsyncMinHashLSH

94f77d9

bhimrazy changed the title ~~Checks/add-pyright~~ chore: add pyright type checking and ci checks for it Dec 20, 2025

bhimrazy marked this pull request as ready for review December 20, 2025 04:57

bhimrazy requested a review from ekzhu as a code owner December 20, 2025 04:57

ekzhu approved these changes Dec 26, 2025

Merge branch 'master' into checks/add-pyright

8e12a2d

ekzhu reviewed Dec 30, 2025

datasketch/minhash.py (outdated, resolved)

ekzhu reviewed Dec 30, 2025

datasketch/minhash.py (outdated, resolved)

Merge branch 'master' into checks/add-pyright

5d6e7ff

Copilot AI review requested due to automatic review settings January 2, 2026 08:34

Copilot started reviewing on behalf of bhimrazy January 2, 2026 08:34

Copilot AI reviewed Jan 2, 2026

datasketch/weighted_minhash.py (resolved)

datasketch/storage.py (outdated, resolved)

.github/workflows/checks.yml (outdated, resolved)

refactor: remove duplicate parsing of hashvalues

1a0bb76

bhimrazy and others added 3 commits January 2, 2026 14:47

refactor: update type hints for hashvalues and permutations in MinHash class

0cc41e2

apply suggestion

57589c5

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

revert seeds none change

ee99594

ekzhu merged commit 50ce2a0 into ekzhu:master Jan 2, 2026
10 checks passed

bhimrazy deleted the checks/add-pyright branch January 3, 2026 06:59

ekzhu merged 22 commits into ekzhu:master from bhimrazy:checks/add-pyright
