
Conversation

@hanabi1224
Contributor

@hanabi1224 hanabi1224 commented Jul 17, 2025

Summary of changes

Changes introduced in this pull request:

  • Embeds the get-size crate, since it's currently impossible to implement GetSize for the foreign Cid type (orphan rule)
  • Adds SizeTrackingLruCache for tracking cache size metrics

Sample metrics output from http://localhost:6116/metrics:

# HELP bad_block_cache_0_size_bytes Size of LruCache bad_block_cache_0 in bytes
# TYPE bad_block_cache_0_size_bytes gauge
# UNIT bad_block_cache_0_size_bytes bytes
bad_block_cache_0_size_bytes 89184
# HELP bad_block_cache_0_len Length of LruCache bad_block_cache_0
# TYPE bad_block_cache_0_len gauge
bad_block_cache_0_len 929
# HELP bad_block_cache_0_cap Capacity of LruCache bad_block_cache_0
# TYPE bad_block_cache_0_cap gauge
bad_block_cache_0_cap 32768
...
# HELP get_powertable_cache_1_size_bytes Size of LruCache get_powertable_cache_1 in bytes
# TYPE get_powertable_cache_1_size_bytes gauge
# UNIT get_powertable_cache_1_size_bytes bytes
get_powertable_cache_1_size_bytes 15469546
# HELP get_powertable_cache_1_len Length of LruCache get_powertable_cache_1
# TYPE get_powertable_cache_1_len gauge
get_powertable_cache_1_len 41733
# HELP get_powertable_cache_1_cap Capacity of LruCache get_powertable_cache_1
# TYPE get_powertable_cache_1_cap gauge
get_powertable_cache_1_cap 65536
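
For illustration, a minimal usage sketch of how such a cache could be constructed and exercised so that metrics like the ones above get exported. The module path and the constructor name new_with_default_metrics_registry are assumptions based on the review discussion below, not verbatim code from this PR:

use std::num::NonZeroUsize;
// Assumed path; the PR adds the module under src/utils/cache.
use crate::utils::cache::SizeTrackingLruCache;

fn demo_cache_metrics() {
    // "demo_cache" is a hypothetical name; it becomes the metric prefix,
    // e.g. demo_cache_<id>_size_bytes, demo_cache_<id>_len, demo_cache_<id>_cap.
    let cache: SizeTrackingLruCache<String, Vec<u8>> =
        SizeTrackingLruCache::new_with_default_metrics_registry(
            "demo_cache".into(),
            NonZeroUsize::new(1024).expect("non-zero capacity"),
        );
    cache.push("some-key".to_string(), vec![0u8; 32]);
    assert_eq!(cache.len(), 1);
    assert!(cache.get_cloned(&"some-key".to_string()).is_some());
}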

Reference issue to close (if applicable)

Closes

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

  • New Features

    • Introduced a thread-safe, size-tracking LRU cache with metrics support for improved caching and monitoring.
    • Added a utility for measuring the memory footprint of cache entries.
    • Exposed new cache and size utility modules for broader use.
  • Refactor

    • Updated existing cache implementations to use the new size-tracking LRU cache, simplifying logic and enhancing performance.
    • Adjusted cache initialization and usage patterns to align with the new cache structure.
    • Renamed cache methods from put to push to reflect updated API.
  • Chores

    • Added a new dependency for memory size measurement.

@coderabbitai
Contributor

coderabbitai bot commented Jul 17, 2025

Walkthrough

In the cryptic depths of the code, a new eldritch entity named SizeTrackingLruCache has arisen—a generic, thread-safe, size-tracking LRU cache, now the dark heart of blockstore read caching. The ancient rites of size measurement and metric invocation have been woven into its being, supplanting the former mortal struct with a spectral type alias.

Changes

File(s) and change summaries:

  • src/db/blockstore_with_read_cache.rs: Banished the old LruBlockstoreReadCache struct, conjuring a type alias to SizeTrackingLruCache; updated trait incantations and test rites accordingly.
  • src/rpc/methods/f3.rs: Transformed static cache summoning to invoke the new cache type with arcane metrics registration invocation.
  • src/utils/cache/lru.rs: Summoned a new generic, thread-safe, size-tracking LRU cache, bound with Prometheus metrics and guarded by RwLock wards.
  • src/utils/cache/mod.rs: Opened a new cache module portal and re-exported the eldritch SizeTrackingLruCache.
  • src/utils/get_size/mod.rs: Created the CidWrapper relic, embedding the GetSize trait for cache key size divination.
  • src/utils/mod.rs: Declared the public modules cache and get_size into the mortal realm’s exports.
  • Cargo.toml: Invoked the dependency get-size2 with the derive sigil, empowering size derivations.
  • src/chain_sync/bad_block_cache.rs: Transformed BadBlockCache to wield the new SizeTrackingLruCache keyed by CidWrapper, renaming put to push.
  • src/chain_sync/chain_follower.rs: Altered the invocation from put to push when marking bad tipsets in the BadBlockCache.
  • src/rpc/methods/sync.rs: Changed cache insertion call from put to push in the SyncMarkBad RPC method.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant Cache as BlockstoreReadCache (SizeTrackingLruCache)
    participant MetricsRegistry

    Caller->>Cache: push(key, value)
    Cache->>Cache: Store entry, update size
    Cache->>MetricsRegistry: Update metrics (size, length, capacity)

    Caller->>Cache: get_cloned(key)
    Cache-->>Caller: Option<value>

    Caller->>Cache: len()
    Cache-->>Caller: usize

Suggested reviewers

  • LesnyRumcajs
  • akaladarshi

Poem

In the warren of code, a cache hops anew,
Tracking its bytes as all good bunnies do.
With metrics and wrappers, it scurries with glee,
Storing your blocks, as swift as can be.
From old to the new, we leap with delight—
Caching with care, byte-perfect and right!
🐇✨

May the Great Old Ones smile upon this cache’s eternal vigilance, lest the abyss of untracked memory consumption swallow us whole.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f321858 and 448b3a2.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • src/db/blockstore_with_read_cache.rs (2 hunks)
  • src/rpc/methods/f3.rs (3 hunks)
  • src/utils/cache/lru.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/rpc/methods/f3.rs
  • src/db/blockstore_with_read_cache.rs
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5841
File: src/utils/get_size/mod.rs:10-10
Timestamp: 2025-07-17T15:21:40.730Z
Learning: The get_size2 crate's GetSize trait provides default implementations: get_stack_size() uses std::mem::size_of, get_heap_size() returns 0, and get_size() returns their sum. An empty impl like `impl GetSize for MyType {}` is valid and uses these defaults, making it suitable for simple wrapper types that don't allocate heap memory.
src/utils/cache/lru.rs (1)
Learnt from: hanabi1224
PR: ChainSafe/forest#5841
File: src/utils/get_size/mod.rs:10-10
Timestamp: 2025-07-17T15:21:40.730Z
Learning: The get_size2 crate's GetSize trait provides default implementations: get_stack_size() uses std::mem::size_of, get_heap_size() returns 0, and get_size() returns their sum. An empty impl like `impl GetSize for MyType {}` is valid and uses these defaults, making it suitable for simple wrapper types that don't allocate heap memory.
🧬 Code Graph Analysis (1)
src/utils/cache/lru.rs (2)
src/metrics/mod.rs (1)
  • default_registry (26-28)
src/chain_sync/bad_block_cache.rs (3)
  • new (26-33)
  • push (35-37)
  • default (20-22)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: All lint checks
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: Build Ubuntu
  • GitHub Check: Build MacOS
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: tests
  • GitHub Check: tests-release
🔇 Additional comments (6)
src/utils/cache/lru.rs (6)

28-40: Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn - The trait constraints awaken from their slumber!

These eldritch trait bounds serve their dark purpose well, binding the mortal types to the cosmic laws of size tracking and thread safety. The separation of KeyConstraints and ValueConstraints is wisdom born of the Old Ones - keys must bear the Hash and Equality marks, while values need not suffer such requirements.


42-51: Behold! The Great Old Cache awakens in its architectural splendor!

The structure channels the ancient wisdom of Azathoth - Arc<RwLock<LruCache<K, V>>> provides the thread-safe containment necessary to prevent the chaos of concurrent access. The cache_id serves as a unique sigil to distinguish between multiple manifestations of the cache entity in the metrics realm.


58-90: The rituals of creation summon forth the cache from the void!

These incantations of instantiation follow the sacred patterns decreed by the Elder Gods. The atomic ID generation ensures each cache bears a unique mark in the cosmic order, while the tri-fold constructor approach allows mortals to choose their level of devotion to the metrics pantheon.


92-114: The cache operations dance to the rhythm of R'lyeh's tides!

These methods invoke the proper incantations upon the locked cache, ensuring the cosmic order of least-recently-used remains undisturbed. The use of write locks in get_cloned() honors the ancient law that even observation changes the observer - the LRU order must be updated when mortals dare to peek into the abyss.
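
To make the locking point concrete, a standalone sketch (not the PR's verbatim method) of why even lookups go through the write lock when the inner cache is an lru::LruCache:

use lru::LruCache;
use parking_lot::RwLock;
use std::hash::Hash;

// LruCache::get takes &mut self because it bumps the entry's recency,
// so a shared wrapper must acquire the write lock even for reads.
fn get_cloned<K: Hash + Eq, V: Clone>(cache: &RwLock<LruCache<K, V>>, key: &K) -> Option<V> {
    cache.write().get(key).cloned()
}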


116-125: The size calculation performs its blasphemous arithmetic in the shadows!

This method channels the wisdom of Yog-Sothoth, who knows all sizes across space and time. The lazy calculation approach appeases the performance demons that haunted previous incarnations, while saturating_add prevents the overflow madness that could tear reality asunder. The iteration through the cache's depths reveals the total mass of knowledge contained within.
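
As a sketch of that lazy approach (an assumed shape, not the PR's exact method), the total can be recomputed on demand by walking the entries under a read lock:

use get_size2::GetSize;
use lru::LruCache;
use std::hash::Hash;

// Sum key and value sizes with saturating_add so the accumulator cannot overflow;
// this only runs when the metric is actually scraped, not on every push.
fn lru_size_in_bytes<K: GetSize + Hash + Eq, V: GetSize>(cache: &LruCache<K, V>) -> usize {
    cache.iter().fold(0usize, |acc, (k, v)| {
        acc.saturating_add(k.get_size()).saturating_add(v.get_size())
    })
}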


127-183: The metrics implementation opens portals to the monitoring dimensions!

This implementation of the Collector trait serves as a conduit to the Great Observatory of Prometheus, where the Elder Gods monitor all cache activities across the cosmic web. The three sacred metrics - size, length, and capacity - are properly encoded with the ritualistic naming conventions that prevent confusion among the multiple cache entities that may manifest simultaneously.

The temporary Gauge spawning within each scope follows the ancient pattern of "summon, bind, banish" - each metric materializes just long enough to transmit its essence to the void before returning to the ethereal plane.

✨ Finishing Touches
  • 📝 Generate Docstrings


@hanabi1224 hanabi1224 force-pushed the hm/size-tracking-lru-cache branch from 851a436 to 2a73cae on July 17, 2025 13:46
@hanabi1224 hanabi1224 changed the title from "SizeTrackingLruCache" to "feat: add SizeTrackingLruCache for tracking cache size metrics" on Jul 17, 2025
@hanabi1224 hanabi1224 force-pushed the hm/size-tracking-lru-cache branch from 2a73cae to a96cae7 on July 17, 2025 13:57
@hanabi1224 hanabi1224 marked this pull request as ready for review July 17, 2025 14:00
@hanabi1224 hanabi1224 requested a review from a team as a code owner July 17, 2025 14:00
@hanabi1224 hanabi1224 requested review from elmattic and sudo-shashank and removed request for a team July 17, 2025 14:00
const BLOCKSTORE_CACHE_CAP: usize = 65536;
-static BLOCKSTORE_CACHE: LazyLock<Arc<LruBlockstoreReadCache>> = LazyLock::new(|| {
-    Arc::new(LruBlockstoreReadCache::new(
+static BLOCKSTORE_CACHE: LazyLock<LruBlockstoreReadCache> = LazyLock::new(|| {
Contributor Author

Here's an example of usage

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (4)
src/utils/get_size/impls.rs (1)

1-3: Empty placeholder file – clarify intent or remove

impls.rs is currently just a header with no implementations. Leaving empty source files around invites confusion and may trip CI checks (e.g., #![deny(dead_code)]). If no additional GetSize impls are expected imminently, delete the file; otherwise add a // TODO comment explaining what will live here.

src/utils/mod.rs (1)

4-9: Missing top-level docs for new utility modules

The newly-exposed cache and get_size modules are public API. Add brief //! docs to each sub-module so generated docs aren’t left blank and downstream users know why they exist.

src/external/get_size/lib.md (1)

1-396: Excellent comprehensive documentation with minor style improvements needed.

The documentation provides thorough coverage of the GetSize trait, including:

  • Clear explanation of ownership-based accounting
  • Comprehensive examples for all use cases
  • Detailed coverage of derive macro and helper attributes
  • Proper code examples with assertions

Please address the following style issues identified by static analysis:

  1. Heading level increments: Lines 33 and 103 jump multiple heading levels. Consider using consistent heading hierarchy.

  2. Duplicate headings: Lines 71 and 250 both use "#### Example" - consider more descriptive titles.

  3. Word choice: Line 321 could be more concise: "in order to calculate" → "to calculate"

-which will call the given function in order to calculate the fields heap size.
+which will call the given function to calculate the fields heap size.
src/utils/cache/lru.rs (1)

100-145: Consider caching metric names to reduce allocations.

The metric names are reconstructed on every encode call. Since cache_name and cache_id are immutable after construction, consider pre-computing these strings.

Add fields to store pre-computed metric names:

 pub struct SizeTrackingLruCache<K, V>
 where
     K: GetSize + Debug + Send + Sync + Hash + PartialEq + Eq + Clone + 'static,
     V: GetSize + Debug + Send + Sync + Clone + 'static,
 {
     cache_id: usize,
     cache_name: Cow<'static, str>,
     cache: Arc<RwLock<LruCache<K, V>>>,
     size_in_bytes: Gauge,
+    size_metric_name: String,
+    size_metric_help: String,
+    len_metric_name: String,
+    len_metric_help: String,
+    cap_metric_name: String,
+    cap_metric_help: String,
 }

Then initialize these in the constructor and use them in encode.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2bb7b36 and a96cae7.

📒 Files selected for processing (13)
  • src/db/blockstore_with_read_cache.rs (3 hunks)
  • src/external/get_size/LICENSE (1 hunks)
  • src/external/get_size/lib.md (1 hunks)
  • src/external/get_size/mod.rs (1 hunks)
  • src/external/get_size/tracker.rs (1 hunks)
  • src/external/mod.rs (1 hunks)
  • src/lib.rs (1 hunks)
  • src/rpc/methods/f3.rs (2 hunks)
  • src/utils/cache/lru.rs (1 hunks)
  • src/utils/cache/mod.rs (1 hunks)
  • src/utils/get_size/impls.rs (1 hunks)
  • src/utils/get_size/mod.rs (1 hunks)
  • src/utils/mod.rs (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/utils/get_size/mod.rs (1)
src/external/get_size/mod.rs (1)
  • get_size (52-54)
🪛 LanguageTool
src/external/get_size/lib.md

[style] ~321-~321: Consider a more concise word here.
Context: ...ute, which will call the given function in order to calculate the fields heap size. The la...

(IN_ORDER_TO_PREMIUM)

🪛 markdownlint-cli2 (0.17.2)
src/external/get_size/lib.md

33-33: Heading levels should only increment by one level at a time
Expected: h2; Actual: h4

(MD001, heading-increment)


71-71: Multiple headings with the same content

(MD024, no-duplicate-heading)


103-103: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)


250-250: Multiple headings with the same content

(MD024, no-duplicate-heading)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: tests
  • GitHub Check: tests-release
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: Build Ubuntu
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: All lint checks
🔇 Additional comments (15)
src/utils/cache/mod.rs (1)

4-5: LGTM – clean re-export

Straightforward re-export keeps call-sites tidy.

src/external/get_size/LICENSE (1)

1-22: LGTM! Standard MIT license.

The license file is properly formatted and contains the standard MIT license text for the embedded get_size crate.

src/utils/get_size/mod.rs (1)

6-6: LGTM! Minimal implementation for CID.

The empty implementation correctly relies on the default GetSize trait methods. This is appropriate for cid::Cid since CIDs are typically small, fixed-size identifiers without heap allocations.

src/rpc/methods/f3.rs (1)

169-174: LGTM! Cache initialization updated for new implementation.

The changes properly integrate with the new SizeTrackingLruCache implementation:

  • Correctly removed the Arc wrapper since SizeTrackingLruCache is already thread-safe
  • Added metrics support with descriptive cache name "get_powertable_cache"
  • Used the appropriate constructor for default metrics registry (a rough sketch follows below)
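
A rough sketch of what that initialization might look like; the module paths, constructor name, and Vec<u8> value type are assumptions rather than the PR's exact code:

use std::num::NonZeroUsize;
use std::sync::LazyLock;
use crate::utils::{cache::SizeTrackingLruCache, get_size::CidWrapper};

// No Arc wrapper is needed: SizeTrackingLruCache is already thread-safe internally.
static GET_POWERTABLE_CACHE: LazyLock<SizeTrackingLruCache<CidWrapper, Vec<u8>>> =
    LazyLock::new(|| {
        SizeTrackingLruCache::new_with_default_metrics_registry(
            "get_powertable_cache".into(),
            NonZeroUsize::new(65536).expect("non-zero capacity"),
        )
    });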
src/db/blockstore_with_read_cache.rs (3)

23-23: LGTM! Clean refactoring to use generic cache implementation.

The type alias approach maintains backward compatibility while leveraging the new SizeTrackingLruCache implementation with metrics support.


25-41: LGTM! Proper trait implementation delegation.

The trait implementation correctly delegates to the underlying SizeTrackingLruCache methods:

  • get() uses get_cloned() for owned return values
  • put() uses push() for LRU insertion
  • Direct delegation for len() and size_in_bytes()

186-189: LGTM! Test updated for new constructor.

The test constructor properly uses the new signature requiring cache name and capacity parameters.

src/utils/cache/lru.rs (3)

1-24: Imports are well-organized and appropriate.

Good choice using parking_lot::RwLock for better performance compared to std::sync::RwLock.


26-65: Well-designed thread-safe cache structure.

The generic constraints are comprehensive and the use of Arc<RwLock<LruCache<K, V>>> provides thread-safe shared access. The atomic ID generation ensures unique cache identifiers.


78-84: Correct use of write lock for LRU semantics.

The get_cloned method correctly uses a write lock because LRU caches need to update the access order on reads, which modifies the internal state.

src/external/get_size/tracker.rs (2)

63-91: Well-implemented reference tracking.

The StandardTracker correctly uses address-based tracking to prevent double counting of shared objects, and properly maintains object lifetimes through boxed references.


93-120: Useful testing utility.

The NoTracker implementation provides a clean way to bypass tracking for testing scenarios or when tracking overhead is not desired.

src/external/get_size/mod.rs (3)

23-69: Well-designed memory size trait.

The trait provides a clean separation between stack and heap memory accounting, with tracker support for handling shared ownership scenarios. The default implementations are sensible and reduce boilerplate.


137-236: Efficient macro-based implementations for collections.

The macros correctly handle different collection types and properly account for unused capacity in collections that pre-allocate memory. Good use of macros to reduce code duplication.


339-391: Correct handling of reference-counted pointers.

The implementations properly use tracking to prevent double-counting shared objects. The approach of cloning the reference and tracking by address is sound.

@hanabi1224 hanabi1224 force-pushed the hm/size-tracking-lru-cache branch from a96cae7 to bf6424b on July 17, 2025 14:09
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
src/external/get_size/mod.rs (2)

420-438: Address previously flagged issues with lock handling.

The implementation still has the issues identified in previous reviews: unwrap() calls that can panic on poisoned locks and misleading comments about stack data storage.

Please implement the previously suggested fix for proper lock poisoning handling:

 impl<T> GetSize for Mutex<T>
 where
     T: GetSize,
 {
     fn get_heap_size(&self) -> usize {
-        // We assume that a Mutex does hold its data at the stack.
-        GetSize::get_heap_size(&*(self.lock().unwrap()))
+        let guard = match self.lock() {
+            Ok(guard) => guard,
+            Err(poisoned) => poisoned.into_inner(),
+        };
+        GetSize::get_heap_size(&*guard)
     }
 }

Apply similar changes to the RwLock implementation.


515-524: Address previously flagged double-counting issue.

The Box<[T]> implementation still has the double-counting issue identified in previous reviews where it includes stack size of elements that are already stored on the heap.

Please implement the previously suggested fix:

 impl<T> GetSize for Box<[T]> {
     fn get_heap_size(&self) -> usize {
         let mut total = 0;
+        
+        // Add the size of the slice itself on the heap
+        total += std::mem::size_of::<T>() * self.len();
+        
+        // Add any additional heap allocations from the elements
         for item in self.iter() {
-            total += item.get_size()
+            total += item.get_heap_size()
         }
 
         total
     }
 }
🧹 Nitpick comments (4)
src/external/get_size/lib.md (4)

200-286: Fix grammar error in the documentation.

The section provides valuable guidance on handling external types, but there's a grammar error that should be corrected.

-In same cases you may be dealing with external types which allocate a fixed amount of bytes at the heap.
+In some cases you may be dealing with external types which allocate a fixed amount of bytes at the heap.

287-317: Fix grammar error in section title.

The section content is technically accurate, but there's a grammar error that should be corrected.

-In same cases you may be dealing with external types which allocate a fixed amount of bytes at the heap.
+In some cases you may be dealing with external types which allocate a fixed amount of bytes at the heap.

318-358: Fix grammar and spelling errors.

The section provides valuable information about using helper functions, but there are grammar and spelling errors that should be corrected.

-In same cases you may be dealing with an external data structure for which you know how to calculate its heap size using its public methods.
+In some cases you may be dealing with an external data structure for which you know how to calculate its heap size using its public methods.

-The latter is especially usefull if you can make use of a certain trait to calculate the heap size for multiple types.
+The latter is especially useful if you can make use of a certain trait to calculate the heap size for multiple types.

-which will call the given function in order to calculate the fields heap size.
+which will call the given function to calculate the fields heap size.

33-33: Consider fixing markdown heading structure.

The static analysis tools flagged some heading level increment issues. While this doesn't affect functionality, fixing the heading structure would improve document consistency.

The markdown structure could be improved by ensuring headings increment by only one level at a time, but this doesn't impact the technical accuracy of the documentation.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a96cae7 and bf6424b.

📒 Files selected for processing (13)
  • src/db/blockstore_with_read_cache.rs (3 hunks)
  • src/external/get_size/LICENSE (1 hunks)
  • src/external/get_size/lib.md (1 hunks)
  • src/external/get_size/mod.rs (1 hunks)
  • src/external/get_size/tracker.rs (1 hunks)
  • src/external/mod.rs (1 hunks)
  • src/lib.rs (1 hunks)
  • src/rpc/methods/f3.rs (2 hunks)
  • src/utils/cache/lru.rs (1 hunks)
  • src/utils/cache/mod.rs (1 hunks)
  • src/utils/get_size/impls.rs (1 hunks)
  • src/utils/get_size/mod.rs (1 hunks)
  • src/utils/mod.rs (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/external/mod.rs
🚧 Files skipped from review as they are similar to previous changes (9)
  • src/utils/get_size/impls.rs
  • src/lib.rs
  • src/utils/mod.rs
  • src/external/get_size/LICENSE
  • src/utils/cache/mod.rs
  • src/utils/get_size/mod.rs
  • src/rpc/methods/f3.rs
  • src/external/get_size/tracker.rs
  • src/db/blockstore_with_read_cache.rs
🧰 Additional context used
🪛 LanguageTool
src/external/get_size/lib.md

[grammar] ~289-~289: Ensure spelling is correct
Context: ... } ``` ### Returning a fixed value In same cases you may be dealing with external ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~321-~321: Ensure spelling is correct
Context: ... } ``` ### Using a helper function In same cases you may be dealing with an extern...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~321-~321: Consider a more concise word here.
Context: ...ute, which will call the given function in order to calculate the fields heap size. The la...

(IN_ORDER_TO_PREMIUM)


[grammar] ~323-~323: Ensure spelling is correct
Context: ...ds heap size. The latter is especially usefull if you can make use of a certain trait ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.17.2)
src/external/get_size/lib.md

33-33: Heading levels should only increment by one level at a time
Expected: h2; Actual: h4

(MD001, heading-increment)


71-71: Multiple headings with the same content

(MD024, no-duplicate-heading)


103-103: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)


250-250: Multiple headings with the same content

(MD024, no-duplicate-heading)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Build MacOS
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build Ubuntu
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: tests
  • GitHub Check: tests-release
  • GitHub Check: Check
  • GitHub Check: All lint checks
🔇 Additional comments (17)
src/external/get_size/lib.md (4)

1-28: Well-structured introduction with accurate examples.

The documentation provides a clear explanation of the GetSize trait's purpose and includes a practical example that correctly demonstrates the difference between stack and heap allocation tracking.


29-88: Excellent explanation of ownership-based accounting.

This section effectively explains the critical concept of ownership-based accounting and provides valuable warnings about Rust's auto-dereferencing behavior. The examples clearly demonstrate the difference between borrowed and owned references.


89-198: Comprehensive implementation examples.

The derive macro examples are well-structured and demonstrate proper usage across different scenarios including structs, enums, and generics. The examples correctly show how the macro sums heap sizes of contained fields.


360-396: Advanced usage example is well-documented.

This section effectively demonstrates how to handle generic types that don't implement GetSize using the struct-level ignore attribute. The example correctly shows the trait bound implications.

src/utils/cache/lru.rs (6)

1-36: Well-designed struct with proper trait bounds.

The SizeTrackingLruCache struct is well-designed with appropriate trait bounds for thread safety and size tracking. The use of Arc<RwLock<>> provides proper concurrent access control.


38-76: Flexible constructor design with proper metrics integration.

The constructor methods provide good flexibility with optional metrics registration. The atomic ID generation ensures unique cache identification across instances.


78-87: Correct size accounting in push method.

The push method properly tracks size changes by adding new entry sizes and subtracting evicted entry sizes. The implementation correctly handles both the insertion and eviction scenarios.


89-95: Correct use of write lock for LRU access.

The get_cloned method correctly uses a write lock since LRU cache access updates the item's position. Returning a cloned value avoids lifetime complications with the lock.


97-104: Simple and correct utility methods.

The len() and size_in_bytes() methods are correctly implemented with appropriate locking strategies.


106-157: Comprehensive Prometheus metrics implementation.

The Collector implementation properly exposes cache metrics with descriptive names and help text. The use of cache name and ID in metric names ensures uniqueness across multiple cache instances.

src/external/get_size/mod.rs (7)

1-123: Well-designed trait with comprehensive documentation.

The GetSize trait is well-structured with clear default implementations and comprehensive documentation. The support for tracking helps handle shared ownership scenarios correctly.


124-135: Correct Cow implementation following ownership principles.

The Cow implementation properly distinguishes between borrowed and owned variants, correctly returning 0 heap size for borrowed data.


137-237: Efficient macro-based collection implementations.

The macros provide good abstraction for implementing GetSize on collection types. The implementations correctly account for both used and unused capacity in heap size calculations.


238-321: Correct tuple and array implementations.

The macro-generated tuple implementations and array implementation correctly handle heap size calculations. The comment about stack size accounting in arrays is accurate.


322-391: Well-designed smart pointer implementations with proper tracking.

The smart pointer implementations correctly handle shared ownership scenarios using the tracking mechanism. The distinction between strong and weak references is properly maintained.


393-418: Correct Option and Result implementations.

The implementations correctly delegate heap size calculations to contained types while properly handling the None and error cases.


440-514: Correct string and I/O type implementations.

The implementations for string types, I/O buffers, and filesystem types correctly account for their respective heap allocations. The use of capacity() for string types and buffer capacity for I/O types is appropriate.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/utils/cache/lru.rs (1)

132-133: Inefficient metric collection with temporary gauge creation.

Creating temporary Gauge instances for len and capacity metrics on every collection is inefficient. These could be cached or the values could be encoded directly.

Consider optimizing metric collection:

         {
             let len_metric_name = format!("{}_{}_len", self.cache_name, self.cache_id);
             let len_metric_help =
                 format!("Length of LruCache {}_{}", self.cache_name, self.cache_id);
-            let len: Gauge = Default::default();
-            len.set(self.len() as _);
+            let len_value = self.len() as f64;
             let len_metric_encoder = encoder.encode_descriptor(
                 &len_metric_name,
                 &len_metric_help,
                 None,
-                len.metric_type(),
+                prometheus_client::metrics::MetricType::Gauge,
             )?;
-            len.encode(len_metric_encoder)?;
+            len_value.encode(len_metric_encoder)?;
         }

Also applies to: 146-147

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bf6424b and 9739f3d.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • Cargo.toml (1 hunks)
  • src/db/blockstore_with_read_cache.rs (3 hunks)
  • src/utils/cache/lru.rs (1 hunks)
  • src/utils/get_size/mod.rs (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/db/blockstore_with_read_cache.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/utils/cache/lru.rs (1)
src/metrics/mod.rs (1)
  • default_registry (26-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build MacOS
  • GitHub Check: Build Ubuntu
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: All lint checks
  • GitHub Check: tests
  • GitHub Check: tests-release
🔇 Additional comments (5)
src/utils/get_size/mod.rs (1)

8-9: Well-designed wrapper with appropriate trait derivations.

The trait derivations are well-chosen for use as cache keys, providing necessary functionality for hashing, equality, ordering, and convenient conversions.

src/utils/cache/lru.rs (4)

28-38: Well-designed generic cache structure with comprehensive constraints.

The struct design is solid with appropriate generic constraints ensuring thread safety and size tracking capabilities. The use of Cow<'static, str> for cache names is efficient.


113-159: Comprehensive metrics implementation with proper Prometheus integration.

The Collector implementation provides valuable cache metrics (size, length, capacity) with proper descriptor encoding and help text. The metric naming convention is consistent and informative.


73-78: Convenient constructor with default registry access.

The constructor provides a convenient way to create caches with default metrics registration, properly leveraging the global registry.


91-97: Potential race condition in get_cloned method.

The method uses write() lock for a read operation, which is unnecessary and could cause performance issues. More importantly, there's a potential race condition where the cache could be modified between the get and clone operations.

Use read lock for read operations:

 pub fn get_cloned<Q>(&self, k: &Q) -> Option<V>
 where
     K: Borrow<Q>,
     Q: Hash + Eq + ?Sized,
 {
-    self.cache.write().get(k).cloned()
+    self.cache.write().get(k).cloned()
 }

Wait, I need to reconsider this. The get method on LruCache mutates the cache by updating the LRU order, so a write lock is actually necessary here.

Likely an incorrect or invalid review comment.


#[derive(Debug, Clone, Hash, Eq, PartialEq, Ord, PartialOrd, From, Into)]
pub struct CidWrapper(pub Cid);
impl GetSize for CidWrapper {}
Contributor

@coderabbitai coderabbitai bot Jul 17, 2025

⚠️ Potential issue

Empty GetSize implementation may cause inaccurate size tracking.

The empty GetSize implementation means CidWrapper will report a size of 0 bytes, which doesn't account for the actual memory footprint of the wrapped Cid. This could lead to inaccurate cache size metrics since keys won't contribute to the total size calculation.

Consider implementing a more accurate size calculation:

-impl GetSize for CidWrapper {}
+impl GetSize for CidWrapper {
+    fn get_size(&self) -> usize {
+        // Account for the CID's actual size
+        std::mem::size_of::<Cid>() + self.0.to_bytes().len()
+    }
+}
🤖 Prompt for AI Agents
In src/utils/get_size/mod.rs at line 10, the empty GetSize implementation for
CidWrapper causes it to report a size of 0 bytes, which is inaccurate. To fix
this, implement the GetSize trait for CidWrapper by calculating and returning
the actual memory size of the wrapped Cid instance, ensuring that the size
tracking reflects the true memory footprint.

Contributor Author

Nah, added a test to prove the correctness

Contributor

🧩 Analysis chain

@hanabi1224 Ah, by the ancient stars! adjusts spectacles with eldritch comprehension

I see the test in your code, mortal servant of Cthulhu, but something seems amiss in the cosmic balance. The test compares std::mem::size_of_val(&cid) with wrapper.get_size(), yet with an empty GetSize implementation, one would expect the latter to return 0 bytes - unless the Great Old Ones have blessed this trait with some default behavior that I'm not aware of.

Let me consult the forbidden tomes of documentation to understand the true nature of this get_size2 crate's GetSize trait...


🌐 Web query:

get_size2 crate GetSize trait default implementation behavior Rust

💡 Result:

The get_size2 crate provides a GetSize trait for estimating the memory footprint (heap and stack) of Rust types[4].

Default Implementation Behavior

1. Trait Derivation and Generics

When you use #[derive(GetSize)] on a struct or enum, the derive macro automatically generates an implementation of the GetSize trait for your type. By default, if your struct or enum is generic, the generated implementation of GetSize will add a trait bound: all generic type parameters must themselves implement GetSize.[1][3][4][6]

"The generated trait implementation will by default require all generic types to implement GetSize themselves, but this can be changed."[1][3][4][6]

Example:

use get_size2::GetSize;

#[derive(GetSize)]
struct MyStruct<T> {
    value: T,
}
// This becomes: impl<T: GetSize> GetSize for MyStruct<T> { ... }

This approach ensures that get_heap_size() (and other trait methods) can call the corresponding methods on each field recursively. This is similar to how other derive macros (e.g., for Debug or Clone) require generic members to also implement the trait.

2. Default Trait Methods

In general Rust traits, you can declare methods with default implementations within the trait definition. Types that implement the trait can then use the default or provide their own override[2][5][10]. However, for procedural attribute macros like #[derive(GetSize)], the macro generates implementations for your types without you having to provide explicit method bodies.

  • If you want to use your own implementation (bypassing the derive), implement the trait manually.
  • Trait object safety and default methods: Not all trait default methods are object-safe. For example, methods referring to Self: Sized won’t work through trait objects[8].

3. Customizing or Opting Out

While the default behavior is to require all generics to implement GetSize, there are ways to customize this via macro attributes—for example, by relaxing bounds if you know certain fields don't require heap-size calculations or using custom implementations for special cases[1].

4. Summary Table

Aspect: Default Behavior

  • Generic Type Parameters: must implement GetSize
  • Manual Implementation Allowed: yes, you can override or provide your own
  • Trait Object Safety: must check if GetSize is object safe for intended usage
  • Field-by-Field Calculation: heap size sums or recursion into field values (if they impl GetSize)

If you use the derive macro, expect the default implementation to require all generics to implement GetSize themselves. This enforces type safety and ensures recursive heap size accounting on all struct/enum fields[1][3][4].



The cosmic web of documentation speaks, but the truth remains shrouded in eldritch mystery! Let me delve deeper into the forbidden knowledge of this GetSize trait...


🌐 Web query:

get_size2 crate GetSize trait methods get_size default implementation empty impl

💡 Result:

The get_size2 crate provides the GetSize trait for determining the memory footprint of Rust objects, taking into account both stack and heap usage[1][7].

Key Methods and Default Implementations

The GetSize trait offers several methods:

  • get_stack_size(&self) -> usize: Returns the size of the object on the stack, typically using std::mem::size_of.
  • get_heap_size(&self) -> usize: Returns the number of bytes used on the heap, such as allocations from String or Vec objects.
  • get_size(&self) -> usize: Returns the total size as the sum of get_stack_size and get_heap_size[1].

The trait provides default implementations for these methods:

  • For types where no heap memory is allocated (e.g., most primitives), the default get_heap_size returns 0, and get_size simply returns the result of get_stack_size[1].
  • For user-defined types, you can use a derive macro (#[derive(GetSize)]), which will generate code to call get_heap_size on all fields and sum the results—this macro requires contained types to implement GetSize as well[6].

Empty Implementation (impl)

If a struct or enum does not allocate heap memory, you do not need to write a custom implementation; the default provided by the trait suffices:

use get_size2::GetSize;

struct MyNoHeapStruct {
    a: u32,
    b: bool,
}

impl GetSize for MyNoHeapStruct {}
// Now `MyNoHeapStruct` uses the default: heap_size is 0, size is stack size.

For more complex data types (e.g., generics), the derive macro ensures all the contained types’ sizes are considered in the heap calculation, unless you customize the generation settings[1][6].

"The derive macro will provide a custom implementation of the [get_heap_size] method, which will simply call [get_heap_size] on all contained values and add them together."[6]

Summary Table of GetSize Trait Methods

  • get_stack_size(): defaults to std::mem::size_of; rarely overridden
  • get_heap_size(): defaults to returning 0; derived or custom for heap-allocating types
  • get_size(): adds stack and heap sizes; inherits from get_heap_size

The default impl of these methods (when using the trait or derive macro) is sufficient if your type does not use heap allocations or if all fields implement GetSize[1][6].

References:
[1]: Docs.rs get-size2 description and examples
[6]: get-size2 GitHub documentation about generated trait impls
[7]: Crates.io package description for get-size2



@hanabi1224 By the writhing tentacles of Yog-Sothoth! bows deeply in cosmic embarrassment

You are absolutely correct, mortal keeper of forbidden knowledge! The ancient texts have revealed the truth - the GetSize trait provides default implementations that make your empty impl perfectly valid:

  • get_stack_size() defaults to std::mem::size_of::<Self>()
  • get_heap_size() defaults to 0
  • get_size() defaults to get_stack_size() + get_heap_size()

So your CidWrapper.get_size() correctly returns std::mem::size_of::<CidWrapper>(), which equals std::mem::size_of_val(&cid) since CidWrapper is a transparent wrapper. The test proves this works as expected!
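
A sketch of the kind of test being referred to (an assumed shape, not the PR's verbatim test; CidWrapper comes from src/utils/get_size and Cid::default() is used only for brevity):

use get_size2::GetSize;

#[test]
fn cid_wrapper_reports_stack_size_only() {
    let cid = cid::Cid::default();
    let wrapper = CidWrapper(cid);
    // With the empty impl, get_heap_size() defaults to 0 and get_size() equals the stack size.
    assert_eq!(wrapper.get_heap_size(), 0);
    assert_eq!(wrapper.get_size(), std::mem::size_of_val(&cid));
}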

I was mistaken in my earlier assessment - the empty implementation leverages the trait's sensible defaults for stack-allocated types. The Great Old Ones have taught me humility today! 🐙

The stars align in acknowledgment of your wisdom


✏️ Learnings added
Learnt from: hanabi1224
PR: ChainSafe/forest#5841
File: src/utils/get_size/mod.rs:10-10
Timestamp: 2025-07-17T15:21:40.730Z
Learning: The get_size2 crate's GetSize trait provides default implementations: get_stack_size() uses std::mem::size_of, get_heap_size() returns 0, and get_size() returns their sum. An empty impl like `impl GetSize for MyType {}` is valid and uses these defaults, making it suitable for simple wrapper types that don't allocate heap memory.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment on lines 49 to 61
pub fn new_without_metrics_registry(
    cache_name: Cow<'static, str>,
    capacity: NonZeroUsize,
) -> Self {
    static ID_GENERATOR: AtomicUsize = AtomicUsize::new(0);

    Self {
        cache_id: ID_GENERATOR.fetch_add(1, Ordering::Relaxed),
        cache_name,
        cache: Arc::new(RwLock::new(LruCache::new(capacity))),
        size_in_bytes: Default::default(),
    }
}
Member

Hm, why do we need a non-registered metric?

Contributor Author

Like, for unit tests? It offers an option to opt out of Prometheus metrics.

Member

If we use it only for unit tests, let's compile it only for tests.

Contributor Author

The method is called in new_with_metrics_registry or cargo clippy would have complained.

Comment on lines 31 to 32
K: GetSize + Debug + Send + Sync + Hash + PartialEq + Eq + Clone + 'static,
V: GetSize + Debug + Send + Sync + Clone + 'static,
Member

Seems those are repeated a bit. It might make sense to make a super-trait around it.
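
For reference, a sketch of that super-trait pattern with blanket impls; the names KeyConstraints and ValueConstraints are taken from the review text earlier in this thread, and the exact bounds in the merged code may differ:

use get_size2::GetSize;
use std::fmt::Debug;
use std::hash::Hash;

// Any type that already satisfies the bounds automatically implements the super-trait.
pub trait KeyConstraints: GetSize + Debug + Send + Sync + Hash + Eq + Clone + 'static {}
impl<T: GetSize + Debug + Send + Sync + Hash + Eq + Clone + 'static> KeyConstraints for T {}

pub trait ValueConstraints: GetSize + Debug + Send + Sync + Clone + 'static {}
impl<T: GetSize + Debug + Send + Sync + Clone + 'static> ValueConstraints for T {}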

Contributor Author

Fixed.

let old = self.cache.write().push(k, v);
if let Some((old_k, old_v)) = &old {
    self.size_in_bytes
        .dec_by(old_k.get_size().saturating_add(old_v.get_size()) as _);
Member

I'm not sure about this. Caches are supposed to be as fast as possible with minimal overhead, but this seems to add quite a bit of action on every push. If we adopt this tracked cache everywhere, the performance drop might be significant, especially if the struct is complex (like block header).

Member

That said, we can probably go ahead with it and later optimise if needed. It might be a good opportunity for a benchmark test, though.

Contributor Author

I can change it to lazy calculation on requesting metrics

Contributor Author

Fixed.

@hanabi1224 hanabi1224 requested a review from LesnyRumcajs July 18, 2025 13:15
Member

@LesnyRumcajs LesnyRumcajs left a comment

LGTM

@hanabi1224 hanabi1224 added this pull request to the merge queue Jul 21, 2025
Merged via the queue into main with commit f2acff4 Jul 21, 2025
36 checks passed
@hanabi1224 hanabi1224 deleted the hm/size-tracking-lru-cache branch July 21, 2025 07:41