-
Notifications
You must be signed in to change notification settings - Fork 99
fix: extdedup really use memmapped ondisk hash table
#3020
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: extdedup really use memmapped ondisk hash table
#3020
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes the extdedup command to properly use memory-mapped on-disk hash tables instead of in-memory hash tables when memory limits are exceeded. The fix addresses issue #2462 by ensuring that the hash table operations work directly with the memory-mapped file rather than creating separate in-memory copies.
Key changes:
- Refactored ExtDedupCache to work directly with memory-mapped hash tables instead of maintaining separate disk and memory structures
- Added comprehensive test coverage for large dataset deduplication with memory constraints
- Simplified the hash table initialization process to use proper memory-mapping
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/odhtcache.rs | Refactored to use HashTable directly with memory mapping instead of HashTableOwned, removed duplicate disk field, and simplified initialization |
| tests/test_extdedup.rs | Added comprehensive test for large dataset deduplication with memory limits and helper function to generate test data |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
resolves #2462