-
Notifications
You must be signed in to change notification settings - Fork 182
fix: reduce db read ops in chain export #5868
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe changes introduce a new method for serializing and computing CIDs for block headers, replacing direct CID computation with a process that returns both the CID and serialized bytes. Additionally, chain traversal logic is refactored to propagate optional block data alongside CIDs, optimizing data retrieval by using available block data when possible and minimizing unnecessary database fetches. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant RawBlockHeader
participant Multihash
participant CID
Client->>RawBlockHeader: car_block()
RawBlockHeader->>RawBlockHeader: serialize to CBOR (Vec<u8>)
RawBlockHeader->>Multihash: blake2b256(serialized bytes)
Multihash-->>RawBlockHeader: multihash digest
RawBlockHeader->>CID: create CID(multihash, codec)
CID-->>RawBlockHeader: CID
RawBlockHeader-->>Client: (CID, serialized bytes)
sequenceDiagram
participant ChainStream
participant Queue
participant DB
ChainStream->>Queue: pop (Cid, Option<Vec<u8>>)
alt Data present
Queue-->>ChainStream: (Cid, Some(data))
ChainStream-->>Client: emit (Cid, data)
else Data missing
Queue-->>ChainStream: (Cid, None)
ChainStream->>DB: fetch data for Cid
alt Data found
DB-->>ChainStream: data
ChainStream-->>Client: emit (Cid, data)
else Data not found
DB-->>ChainStream: error
ChainStream-->>Client: error (if fail_on_dead_links)
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
✨ Finishing Touches
🧪 Generate unit tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/blocks/header.rs(2 hunks)src/ipld/util.rs(11 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: tests-release
- GitHub Check: tests
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
- GitHub Check: All lint checks
🔇 Additional comments (7)
src/blocks/header.rs (2)
11-22: Clean import reorganization!The imports are well-organized and the addition of multihash utilities is appropriate for the new
car_block()functionality.
103-110: Excellent optimization for reducing DB reads!The
car_block()method efficiently returns both the CID and serialized bytes, which aligns perfectly with the PR objective of reducing database read operations during chain export.src/ipld/util.rs (5)
105-105: Smart optimization for Task enum!Adding optional block data to the
Emitvariant enables reusing already-serialized data, avoiding redundant database lookups.
190-201: Well-implemented data retrieval optimization!The logic correctly prioritizes using provided block data before falling back to database fetches, maintaining the same error handling behavior.
291-291: Consistent queue type update!The queue type change to
Vec<(Cid, Option<Vec<u8>>)>properly supports the optimization across both stream implementations.
468-491: Consistent optimization in unordered stream!The implementation properly handles optional block data, mirroring the optimization in
ChainStreamwhile maintaining appropriate error handling.
559-565: Good refactoring!Extracting the
ipld_to_cidfunction improves code readability and eliminates duplication.
f789dad to
7bf39d8
Compare
7bf39d8 to
42b9081
Compare
|
Do you have some metrics for mainnet? |
|
@LesnyRumcajs no, mainnet snap export is slow on my laptop. I can measure with a DO droplet if you prefer. |
|
Yes, please measure it. Let's have at least five runs before the change and five runs after, optimally from exactly the same tipset. |
@LesnyRumcajs Instead of using DO droplet with shared CPU, I tested on my laptop with |
Summary of changes
(Originally part of #5867)
This PR makes some small refactoring to
stream_exportandunordered_stream_exportto reduce db read operations.I see ~5% perf gain in calibnet snapshot export on my laptop
Changes introduced in this pull request:
Reference issue to close (if applicable)
Closes
Other information and links
Change checklist
Summary by CodeRabbit
Summary by CodeRabbit