
[247] Implement MinHashLSHDeletionSession to speed up key deletions #272

Merged: ekzhu merged 6 commits into ekzhu:master from Varun0157:master on Nov 8, 2025

Conversation

@Varun0157
Contributor

(Resolves #247)

Implementation

We add a MinHashLSHDeletionSession, analogous to MinHashLSHInsertionSession, to support batched removal of keys. To do so, we implement buffered operations for the Redis storage layer.

Buffered operations already appear to be implemented for the Cassandra storage layer, so no changes were needed there for compatibility.
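To illustrate the buffering idea described above, here is a minimal, self-contained sketch. It is not the code from this PR and does not use datasketch or Redis: `ToyStore`, `ToyDeletionSession`, `delete_many`, and the `buffer_size` value are all hypothetical names chosen for illustration, and the store simply counts simulated round trips.

```python
from typing import Dict, Hashable, List


class ToyStore:
    """In-memory stand-in for a remote key store (hypothetical); counts simulated round trips."""

    def __init__(self) -> None:
        self.data: Dict[Hashable, bytes] = {}
        self.round_trips = 0

    def delete_many(self, keys: List[Hashable]) -> None:
        # One simulated network call removes the whole batch.
        self.round_trips += 1
        for key in keys:
            self.data.pop(key, None)


class ToyDeletionSession:
    """Buffers removals and sends them in batches (illustrative only, not datasketch code)."""

    def __init__(self, store: ToyStore, buffer_size: int) -> None:
        if buffer_size < 1:
            raise ValueError("buffer_size must be positive")
        self.store = store
        self.buffer_size = buffer_size
        self._pending: List[Hashable] = []

    def __enter__(self) -> "ToyDeletionSession":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Flush whatever is still buffered when the session closes.
        self.flush()

    def remove(self, key: Hashable) -> None:
        # Queue the key; only talk to the store once the buffer is full.
        self._pending.append(key)
        if len(self._pending) >= self.buffer_size:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self.store.delete_many(self._pending)
            self._pending = []


store = ToyStore()
store.data = {f"doc{i}": b"" for i in range(10)}

with ToyDeletionSession(store, buffer_size=4) as session:
    for i in range(10):
        session.remove(f"doc{i}")

print(len(store.data), store.round_trips)  # 0 keys left after 3 batched calls
```

In this sketch, ten removals with a buffer of four result in three batched calls (4 + 4 + 2) rather than ten individual ones. Flushing in `__exit__` is a design choice of the sketch so that a partially filled buffer is not silently dropped.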

Testing

We add unit tests to ensure the MinHashLSHDeletionSession performs as expected.

Benchmarking

Comparing buffered and unbuffered key removal across a range of key counts, we find a consistent speedup with buffered operations.

[Figures: remote_redis_deletion_benchmark (Remote Redis) and local_redis_deletion_benchmark (Local Redis)]

The database resided in us-east-1 while the testing client was in South Asia.

The speedup scales with the number of keys, since we are effectively bringing the number of network calls (the main bottleneck) down from $O(\text{keys})$ to $O(\lceil \text{keys} / \text{buffer\_size} \rceil)$.
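As a rough worked example of the call counts above, the following sketch computes the number of round trips with and without buffering. It assumes, for simplicity, that an unbuffered removal costs one network call per key and that each full or partial buffer costs one call. The helper name `round_trips` and the buffer size of 500 are hypothetical, not values taken from the PR or its benchmark.

```python
def round_trips(num_keys: int, buffer_size: int) -> int:
    """Number of batched calls needed: ceil(num_keys / buffer_size)."""
    if num_keys < 0:
        raise ValueError("num_keys must be non-negative")
    if buffer_size < 1:
        raise ValueError("buffer_size must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (num_keys + buffer_size - 1) // buffer_size


BUFFER_SIZE = 500  # hypothetical value, for illustration only

for num_keys in (1_000, 10_000, 100_000):
    unbuffered_calls = num_keys  # assumed: one call per key
    buffered_calls = round_trips(num_keys, BUFFER_SIZE)
    print(f"{num_keys} keys: {unbuffered_calls} unbuffered calls, "
          f"{buffered_calls} buffered calls")
```

Under these assumptions, removing 100,000 keys takes 200 batched calls instead of 100,000 individual ones, so the number of round trips saved grows with the key count.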

The benchmarking script can be found here.

@Varun0157 requested a review from ekzhu on November 6, 2025 at 22:33
Owner

@ekzhu ekzhu left a comment

Let's create a separate PR for adding test_redis.py with integration tests.

@ekzhu ekzhu merged commit deddaec into ekzhu:master Nov 8, 2025
6 checks passed

Development

Successfully merging this pull request may close these issues.

could add a MinHashLSHDeleteSession, similar as MinHashLSHInsertionSession

2 participants