Disable DiskCache in hf_xet, continue to use it in git_xet #535

Merged
rajatarya merged 10 commits into main from rajat/make_chunk_cache_optin on Oct 23, 2025
@rajatarya rajatarya (Collaborator) commented Oct 23, 2025

cc: @assafvayner - would welcome your feedback but I know you are out this week.

In this PR:
2. hf_xet : disables DiskCache by default.
3. git_xet : continues to use DiskCache by default, set to 10GB as before.
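
As a rough illustration of the two defaults listed above, here is a minimal sketch, not the actual xet-core code. The `Client` enum, the function names, and the constant are hypothetical, and the exact byte value behind "10GB" is an assumption, since the description does not say whether it is decimal or binary.

```rust
/// Which client is constructing the chunk cache (illustrative names only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Client {
    HfXet,
    GitXet,
}

/// Assumed value for "10GB"; the PR text does not say whether this is
/// decimal (10^10) or binary (10 * 2^30) gigabytes.
const GIT_XET_DISK_CACHE_BYTES: u64 = 10_000_000_000;

/// Default disk cache capacity in bytes for each client.
fn default_disk_cache_bytes(client: Client) -> u64 {
    match client {
        Client::HfXet => 0,
        Client::GitXet => GIT_XET_DISK_CACHE_BYTES,
    }
}

/// In this sketch a capacity of zero means "no disk cache".
fn disk_cache_enabled(capacity_bytes: u64) -> bool {
    capacity_bytes > 0
}

fn main() {
    for client in [Client::HfXet, Client::GitXet] {
        let bytes = default_disk_cache_bytes(client);
        println!(
            "{:?}: {} bytes, disk cache enabled = {}",
            client,
            bytes,
            disk_cache_enabled(bytes)
        );
    }
}
```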

I am testing it manually while putting up the review, so it might need more commits to get it fully working. Right now all the unit tests pass, but I haven't yet verified the functionality with manual testing.

- random eviction
- implements ChunkCache trait
- hf-xet default chunk_cache is 0 bytes
- MemoryCache default size is 20% of system RAM, but configurable
- git_xet uses DiskCache with default 10GB disk cache
- hf_xet uses MemoryCache with default size being 20% of system RAM
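
For readers unfamiliar with random eviction, the idea named in the commit list can be sketched as below. This is an illustrative, std-only toy, not the MemoryCache from this PR (which a later comment says was removed) and not the real ChunkCache trait. The type, its methods, and the small xorshift generator are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Tiny xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Byte-bounded in-memory cache that evicts a random entry when full.
struct RandomEvictionCache {
    capacity_bytes: usize,
    used_bytes: usize,
    entries: HashMap<String, Vec<u8>>,
    keys: Vec<String>, // mirrors `entries` so a victim can be picked by index
    rng: XorShift64,
}

impl RandomEvictionCache {
    fn new(capacity_bytes: usize, seed: u64) -> Self {
        Self {
            capacity_bytes,
            used_bytes: 0,
            entries: HashMap::new(),
            keys: Vec::new(),
            rng: XorShift64(seed.max(1)), // xorshift must not start at zero
        }
    }

    fn get(&self, key: &str) -> Option<&[u8]> {
        self.entries.get(key).map(|v| v.as_slice())
    }

    /// Returns false when the item alone exceeds the capacity.
    fn put(&mut self, key: &str, data: Vec<u8>) -> bool {
        if data.len() > self.capacity_bytes {
            return false;
        }
        // Drop any previous value stored under the same key.
        if let Some(old) = self.entries.remove(key) {
            self.used_bytes -= old.len();
            self.keys.retain(|k| k.as_str() != key);
        }
        // Evict randomly chosen entries until the new item fits.
        while self.used_bytes + data.len() > self.capacity_bytes {
            let idx = (self.rng.next_u64() % self.keys.len() as u64) as usize;
            let victim = self.keys.swap_remove(idx);
            if let Some(old) = self.entries.remove(&victim) {
                self.used_bytes -= old.len();
            }
        }
        self.used_bytes += data.len();
        self.keys.push(key.to_string());
        self.entries.insert(key.to_string(), data);
        true
    }
}

fn main() {
    let mut cache = RandomEvictionCache::new(10, 42);
    cache.put("a", vec![0u8; 4]);
    cache.put("b", vec![1u8; 4]);
    cache.put("c", vec![2u8; 4]); // forces one random eviction
    println!("entries = {}, used = {} bytes", cache.entries.len(), cache.used_bytes);
    println!("c cached = {}", cache.get("c").is_some());
}
```

Random eviction avoids the bookkeeping of tracking access order, at the cost of sometimes discarding recently used entries.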
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR introduces a MemoryCache implementation for the ChunkCache trait and configures hf_xet to use RAM-based caching (20% of system RAM) instead of disk-based caching, while git_xet continues using DiskCache (10GB).

Key changes:

  • Implements MemoryCache with random eviction and a configurable capacity expressed as a percentage of system RAM
  • Routes cache strategy selection through the cache_size configuration parameter (0 = MemoryCache, >0 = DiskCache)
  • Refactors disk cache constants to distinguish between overall capacity and per-file limits
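
The routing rule in the second bullet (0 = MemoryCache, >0 = DiskCache) could look roughly like the sketch below. It reflects the revision this overview describes; a later comment in this thread notes that MemoryCache was removed before merge. The `CacheStrategy` enum and `select_cache_strategy` function are hypothetical names, not identifiers from the repository.

```rust
/// Cache backend chosen from the configured size (illustrative type).
#[derive(Debug, PartialEq, Eq)]
enum CacheStrategy {
    /// Selected when the configured size is zero.
    Memory,
    /// Selected for any positive size, which becomes the disk capacity.
    Disk { capacity_bytes: u64 },
}

/// Maps the `cache_size` setting to a strategy: 0 picks the memory cache,
/// anything larger picks the disk cache with that capacity.
fn select_cache_strategy(cache_size: u64) -> CacheStrategy {
    if cache_size == 0 {
        CacheStrategy::Memory
    } else {
        CacheStrategy::Disk {
            capacity_bytes: cache_size,
        }
    }
}

fn main() {
    println!("{:?}", select_cache_strategy(0));
    println!("{:?}", select_cache_strategy(10_000_000_000));
}
```

Overloading a size of zero as a mode switch keeps the configuration surface small, but it also means a single parameter carries two meanings, which may be part of why the design changed during review.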

Reviewed Changes

Copilot reviewed 9 out of 11 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| chunk_cache/src/memory.rs | New MemoryCache implementation with random eviction policy |
| chunk_cache/src/lib.rs | Exports MemoryCache and adds MEMORY_CACHE_PERCENTAGE configuration |
| chunk_cache/src/disk.rs | Refactors cache size constants and updates import statements |
| chunk_cache/Cargo.toml | Adds sysinfo dependency for system memory detection |
| cas_client/src/remote_client.rs | Adds with_cache() constructor for custom cache injection |
| cas_client/src/lib.rs | Re-exports MemoryCache types and constants |
| data/src/remote_client_interface.rs | Implements cache strategy selection logic based on cache_size |
| data/Cargo.toml | Adds chunk_cache dependency |
| hf_xet/src/lib.rs | Sets HF_XET_CHUNK_CACHE_SIZE_BYTES=0 to enable MemoryCache |
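
The last entry refers to the HF_XET_CHUNK_CACHE_SIZE_BYTES environment variable. A minimal sketch of how a consumer might resolve such a variable into a byte count follows. The helper name and the fallback behaviour for unset or malformed values are assumptions for illustration, not the project's documented behaviour.

```rust
use std::env;

/// Variable name taken from the file summary above.
const CACHE_SIZE_ENV: &str = "HF_XET_CHUNK_CACHE_SIZE_BYTES";

/// Parses the raw variable value as a byte count. Falls back to
/// `default_bytes` when the variable is unset or is not a valid u64
/// (the fallback policy is an assumption of this sketch).
fn resolve_cache_size(raw: Option<&str>, default_bytes: u64) -> u64 {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(default_bytes)
}

fn main() {
    // Read the variable once and resolve it with a default of 0 bytes.
    let raw = env::var(CACHE_SIZE_ENV).ok();
    let size = resolve_cache_size(raw.as_deref(), 0);
    println!("{} resolves to {} bytes", CACHE_SIZE_ENV, size);
}
```

Keeping the parsing in a pure function makes the behaviour easy to test without mutating the process environment.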

@rajatarya rajatarya changed the title from "(feature): Add MemoryCache to chunk_cache crate, use in hf_xet" to "Disable DiskCache in hf_xet, continue to use it in git_xet" Oct 23, 2025
@rajatarya (Collaborator, Author) commented

@seanses @hoytak: this should be ready to review now. I removed MemoryCache and, I think, addressed the feedback on using with_cache_size in TranslatorConfig.

@seanses seanses (Collaborator) left a comment

LGTM!

@rajatarya rajatarya merged commit f2d1587 into main Oct 23, 2025
6 checks passed
@rajatarya rajatarya deleted the rajat/make_chunk_cache_optin branch October 23, 2025 23:28
hoytak added a commit that referenced this pull request Nov 6, 2025
hoytak added a commit that referenced this pull request Nov 10, 2025
This PR disables the disk cache by default in hf_xet using cargo
features instead of in-code logic.

Reverts #535
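
The follow-up commit message above describes moving the default from in-code logic to cargo features. A compile-time switch of that kind can be sketched as follows. The feature name `disk_cache` and the 10 GB figure are assumptions for illustration; the commit message does not name the actual feature.

```rust
/// True only when the crate is built with the hypothetical `disk_cache`
/// cargo feature (for example via `--features disk_cache`).
fn disk_cache_compiled_in() -> bool {
    cfg!(feature = "disk_cache")
}

/// Default cache capacity chosen at compile time rather than by
/// inspecting a runtime setting. The 10 GB value is an assumed constant.
fn default_cache_bytes() -> u64 {
    if disk_cache_compiled_in() {
        10_000_000_000
    } else {
        0
    }
}

fn main() {
    println!(
        "disk cache compiled in = {}, default capacity = {} bytes",
        disk_cache_compiled_in(),
        default_cache_bytes()
    );
}
```

With this approach each binary picks its default through the features it enables in its Cargo.toml, so no runtime branch on a sentinel size is needed.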