
@jqnatividad (Collaborator)

resolves #2462

jqnatividad requested a review from Copilot September 28, 2025 21:57

Copilot AI left a comment

Pull Request Overview

This PR fixes the extdedup command so that, once the memory limit is exceeded, it actually uses a memory-mapped on-disk hash table instead of an in-memory one. It resolves issue #2462 by having hash table operations work directly on the memory-mapped file rather than on a separate in-memory copy.

Key changes:

  • Refactored ExtDedupCache to work directly with memory-mapped hash tables instead of maintaining separate disk and memory structures
  • Added comprehensive test coverage for large dataset deduplication with memory constraints
  • Simplified the hash table initialization process to use proper memory-mapping
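A minimal sketch of the underlying idea, assuming a simple design that is neither qsv's nor odht's: a fixed-capacity, linear-probing hash set whose slots live directly in a borrowed byte buffer, so the same code could operate on a memory-mapped file region without first copying it into an owned table. `ByteBufSet`, `SLOT`, and the 64-bit fingerprint scheme are hypothetical. Because only a hash is stored, two distinct rows with the same fingerprint would be treated as duplicates. A zeroed `Vec<u8>` stands in for the mmap'd file so the example needs only std.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Bytes per slot: one little-endian u64 fingerprint.
const SLOT: usize = 8;

/// Hypothetical fixed-capacity hash set whose slots live directly in a
/// caller-provided byte buffer (a Vec here, an mmap'd region in practice).
/// The set never copies the buffer into an owned table. 0 marks an empty slot.
struct ByteBufSet<'a> {
    buf: &'a mut [u8],
    slots: usize,
    len: usize,
}

impl<'a> ByteBufSet<'a> {
    /// Wraps `buf`; it must be zeroed and sized to a multiple of SLOT bytes.
    fn new(buf: &'a mut [u8]) -> Self {
        assert!(buf.len() >= SLOT && buf.len() % SLOT == 0);
        let slots = buf.len() / SLOT;
        ByteBufSet { buf, slots, len: 0 }
    }

    fn len(&self) -> usize {
        self.len
    }

    /// Hashes the key and forces the low bit on so 0 can mean "empty".
    fn fingerprint(key: &[u8]) -> u64 {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        h.finish() | 1
    }

    fn read(&self, i: usize) -> u64 {
        let mut b = [0u8; SLOT];
        b.copy_from_slice(&self.buf[i * SLOT..(i + 1) * SLOT]);
        u64::from_le_bytes(b)
    }

    fn write(&mut self, i: usize, v: u64) {
        self.buf[i * SLOT..(i + 1) * SLOT].copy_from_slice(&v.to_le_bytes());
    }

    /// Returns true if `key` was newly inserted, false if already present.
    /// Panics when every slot is occupied.
    fn insert(&mut self, key: &[u8]) -> bool {
        let fp = Self::fingerprint(key);
        let mut i = (fp as usize) % self.slots;
        for _ in 0..self.slots {
            match self.read(i) {
                0 => {
                    self.write(i, fp);
                    self.len += 1;
                    return true;
                }
                existing if existing == fp => return false,
                _ => i = (i + 1) % self.slots, // linear probing
            }
        }
        panic!("ByteBufSet is full");
    }
}

fn main() {
    // A zeroed Vec stands in for the memory-mapped file region.
    let mut backing = vec![0u8; 1024 * SLOT];
    let mut seen = ByteBufSet::new(&mut backing);
    let rows: [&[u8]; 5] = [b"a,1", b"b,2", b"a,1", b"c,3", b"b,2"];
    let unique: Vec<&[u8]> = rows.iter().copied().filter(|r| seen.insert(r)).collect();
    println!("{} unique of {} rows, {} stored", unique.len(), rows.len(), seen.len());
}
```

One caveat on the design choice: `DefaultHasher` output is not guaranteed to be stable across Rust releases, so a table meant to persist on disk between runs would want a fixed, documented hash function instead.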

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/odhtcache.rs | Refactored to use HashTable directly with memory mapping instead of HashTableOwned, removed duplicate disk field, and simplified initialization |
| tests/test_extdedup.rs | Added comprehensive test for large dataset deduplication with memory limits and helper function to generate test data |
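The test file gains a helper that generates data for large-dataset deduplication. A minimal sketch of that idea, with hypothetical names (`generate_dup_csv`, `dedup_rows`) and not the actual helper in tests/test_extdedup.rs: emit rows that cycle through a fixed number of distinct values so the expected unique count is known up front, then check it against a simple first-occurrence dedup. Writing the CSV to disk and running extdedup under a memory limit, as the PR's test does, is left out.

```rust
use std::collections::HashSet;
use std::fmt::Write;

/// Hypothetical test-data helper: a header plus `rows` data rows drawn from
/// only `distinct` unique values, so the expected dedup result is known.
fn generate_dup_csv(rows: usize, distinct: usize) -> String {
    assert!(distinct > 0 && distinct <= rows);
    let mut csv = String::from("id,name,value\n");
    for i in 0..rows {
        let k = i % distinct; // the same row repeats every `distinct` rows
        writeln!(csv, "{k},name_{k},{}", k * 10).unwrap();
    }
    csv
}

/// Reference dedup: keeps the first occurrence of each data row, in order.
fn dedup_rows(csv: &str) -> Vec<&str> {
    let mut seen = HashSet::new();
    csv.lines().skip(1).filter(|line| seen.insert(*line)).collect()
}

fn main() {
    let csv = generate_dup_csv(10_000, 2_500);
    let unique = dedup_rows(&csv);
    println!("{} rows in, {} unique rows out", csv.lines().count() - 1, unique.len());
}
```

Making the row count large relative to any memory limit is, presumably, what pushes extdedup onto its on-disk table; the sizes the PR actually uses are not shown here.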

jqnatividad and others added 4 commits September 28, 2025 18:04
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
jqnatividad requested a review from Copilot September 28, 2025 22:06

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

jqnatividad requested a review from Copilot September 28, 2025 23:05

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

jqnatividad requested a review from Copilot September 28, 2025 23:13

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

jqnatividad merged commit 6895d83 into master Sep 29, 2025
16 of 17 checks passed
jqnatividad deleted the 2475-extdedup-really-use-memmapped-ondisk-hash-table branch September 29, 2025 03:12

Successfully merging this pull request may close these issues.

extdedup memory-mapped on-disk hash table

2 participants