
Conversation

@hanabi1224 hanabi1224 commented Jul 28, 2025

Summary of changes

(Originally part of #5867)
This PR makes small refactorings to stream_export and unordered_stream_export to reduce database read operations.

I see a ~5% performance gain in calibnet snapshot export on my laptop.


Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

  • New Features
    • Improved block streaming performance by carrying block data alongside CIDs, reducing redundant data retrieval.
  • Refactor
    • Updated internal handling of block identifiers and data to optimize processing and retrieval.
    • Enhanced block header serialization and CID computation for improved consistency.
  • Bug Fixes
    • Enhanced error handling for missing block data during chain traversal.

coderabbitai bot commented Jul 28, 2025

Walkthrough

The changes introduce a new method for serializing and computing CIDs for block headers, replacing direct CID computation with a process that returns both the CID and serialized bytes. Additionally, chain traversal logic is refactored to propagate optional block data alongside CIDs, optimizing data retrieval by using available block data when possible and minimizing unnecessary database fetches.

Changes

  • src/blocks/header.rs (Block Header Serialization & CID Computation): Refactored RawBlockHeader to add car_block() for CBOR serialization and CID computation using Blake2b256. Updated cid() to use car_block(). Adjusted import statements.
  • src/ipld/util.rs (Chain Traversal with Block Data Propagation): Changed Task::Emit to carry both a Cid and optional block data. Updated traversal queues in ChainStream and UnorderedChainStream to handle (Cid, Option<Vec<u8>>) tuples. Refactored logic to use block data when available, reducing redundant database fetches. Extracted an ipld_to_cid helper.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant RawBlockHeader
    participant Multihash
    participant CID

    Client->>RawBlockHeader: car_block()
    RawBlockHeader->>RawBlockHeader: serialize to CBOR (Vec<u8>)
    RawBlockHeader->>Multihash: blake2b256(serialized bytes)
    Multihash-->>RawBlockHeader: multihash digest
    RawBlockHeader->>CID: create CID(multihash, codec)
    CID-->>RawBlockHeader: CID
    RawBlockHeader-->>Client: (CID, serialized bytes)
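The flow in the diagram above can be sketched in Rust. This is a simplified stand-in model, not the real implementation: MockHeader, the byte serialization, and the DefaultHasher digest stand in for the actual RawBlockHeader, DAG-CBOR encoding, and Blake2b256 multihash in src/blocks/header.rs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for the real `RawBlockHeader`.
pub struct MockHeader {
    pub epoch: i64,
    pub miner: String,
}

impl MockHeader {
    /// Stand-in serialization; the real code produces DAG-CBOR bytes.
    pub fn to_bytes(&self) -> Vec<u8> {
        format!("{}|{}", self.epoch, self.miner).into_bytes()
    }

    /// Mirrors the idea of `car_block()`: serialize once, hash the bytes
    /// (Blake2b256 in the real code; `DefaultHasher` here as a stand-in),
    /// and return both the identifier and the serialized bytes so callers
    /// can reuse the bytes instead of re-reading the block from the DB.
    pub fn car_block(&self) -> (u64, Vec<u8>) {
        let bytes = self.to_bytes();
        let mut hasher = DefaultHasher::new();
        bytes.hash(&mut hasher);
        (hasher.finish(), bytes)
    }

    /// Mirrors `cid()` being reimplemented on top of `car_block()`.
    pub fn cid(&self) -> u64 {
        self.car_block().0
    }
}
```

The key point is that the serialized bytes are produced exactly once and handed back together with the identifier, rather than being recomputed or re-fetched later.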
sequenceDiagram
    participant ChainStream
    participant Queue
    participant DB

    ChainStream->>Queue: pop (Cid, Option<Vec<u8>>)
    alt Data present
        Queue-->>ChainStream: (Cid, Some(data))
        ChainStream-->>Client: emit (Cid, data)
    else Data missing
        Queue-->>ChainStream: (Cid, None)
        ChainStream->>DB: fetch data for Cid
        alt Data found
            DB-->>ChainStream: data
            ChainStream-->>Client: emit (Cid, data)
        else Data not found
            DB-->>ChainStream: error
            ChainStream-->>Client: error (if fail_on_dead_links)
        end
    end
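A minimal sketch of the emit-or-fetch decision in the diagram above, with a HashMap standing in for the database and String keys standing in for CIDs (the real code in src/ipld/util.rs works with Cid values and a blockstore, and streams rather than returns):

```rust
use std::collections::HashMap;

/// Stand-in blockstore: CID-like key -> block bytes.
type MockDb = HashMap<String, Vec<u8>>;

/// Emit a (cid, data) pair, reading from the DB only when the
/// traversal queue did not already carry the block's bytes.
pub fn emit(
    db: &MockDb,
    cid: String,
    data: Option<Vec<u8>>,
    fail_on_dead_links: bool,
) -> Result<Option<(String, Vec<u8>)>, String> {
    // Fast path: the queue already carried the serialized block.
    if let Some(bytes) = data {
        return Ok(Some((cid, bytes)));
    }
    // Slow path: fall back to a database read.
    match db.get(&cid) {
        Some(bytes) => Ok(Some((cid, bytes.clone()))),
        None if fail_on_dead_links => Err(format!("missing block data for {cid}")),
        None => Ok(None), // dead link tolerated: skip the block
    }
}
```

This keeps the original error behavior (missing data is an error only when fail_on_dead_links is set) while letting the fast path skip the database entirely.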

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes



📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7bf39d8 and 42b9081.

📒 Files selected for processing (2)
  • src/blocks/header.rs (2 hunks)
  • src/ipld/util.rs (11 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/blocks/header.rs
  • src/ipld/util.rs


@hanabi1224 hanabi1224 marked this pull request as ready for review July 28, 2025 12:17
@hanabi1224 hanabi1224 requested a review from a team as a code owner July 28, 2025 12:17
@hanabi1224 hanabi1224 requested review from LesnyRumcajs and elmattic and removed request for a team July 28, 2025 12:17

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19c11c8 and 3108e1b.

📒 Files selected for processing (2)
  • src/blocks/header.rs (2 hunks)
  • src/ipld/util.rs (11 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: tests-release
  • GitHub Check: tests
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build Ubuntu
  • GitHub Check: Build MacOS
  • GitHub Check: All lint checks
🔇 Additional comments (7)
src/blocks/header.rs (2)

11-22: Clean import reorganization!

The imports are well-organized and the addition of multihash utilities is appropriate for the new car_block() functionality.


103-110: Excellent optimization for reducing DB reads!

The car_block() method efficiently returns both the CID and serialized bytes, which aligns perfectly with the PR objective of reducing database read operations during chain export.

src/ipld/util.rs (5)

105-105: Smart optimization for Task enum!

Adding optional block data to the Emit variant enables reusing already-serialized data, avoiding redundant database lookups.
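As a minimal sketch of that variant change (a stand-in model; the real Task enum in src/ipld/util.rs has more variants and uses Cid rather than this String stand-in):

```rust
/// Stand-in for a CID.
type MockCid = String;

/// Sketch of the updated variant: `Emit` now carries optional,
/// already-serialized block data alongside the identifier.
pub enum MockTask {
    Emit(MockCid, Option<Vec<u8>>),
}

/// A consumer only needs a database read when the data is absent.
pub fn needs_db_read(task: &MockTask) -> bool {
    matches!(task, MockTask::Emit(_, None))
}
```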


190-201: Well-implemented data retrieval optimization!

The logic correctly prioritizes using provided block data before falling back to database fetches, maintaining the same error handling behavior.


291-291: Consistent queue type update!

The queue type change to Vec<(Cid, Option<Vec<u8>>)> properly supports the optimization across both stream implementations.


468-491: Consistent optimization in unordered stream!

The implementation properly handles optional block data, mirroring the optimization in ChainStream while maintaining appropriate error handling.


559-565: Good refactoring!

Extracting the ipld_to_cid function improves code readability and eliminates duplication.

@hanabi1224 hanabi1224 force-pushed the hm/reduce-db-read-in-chain-export branch 3 times, most recently from f789dad to 7bf39d8, July 28, 2025 12:32
@hanabi1224 hanabi1224 force-pushed the hm/reduce-db-read-in-chain-export branch from 7bf39d8 to 42b9081, July 28, 2025 12:37
@LesnyRumcajs
Member

Do you have some metrics for mainnet?

@hanabi1224
Contributor Author

@LesnyRumcajs no, mainnet snapshot export is slow on my laptop. I can measure on a DO droplet if you prefer.

@LesnyRumcajs
Member

Yes, please measure it. Let's have at least five runs before the change and five runs after, optimally from exactly the same tipset.

@hanabi1224
Contributor Author

> Yes, please measure it. Let's have at least five runs before the change and five runs after, optimally from exactly the same tipset.

@LesnyRumcajs Instead of using a DO droplet with a shared CPU, I tested on my laptop with time forest-cli snapshot export --dry-run -d 900 -t 5184000; I see a ~2% perf gain on mainnet.

# PR branch
./forest_snapshot_mainnet_2025-07-29_height_5184000.forest.car.zst: 0B (0B/s)
Export completed.

real    28m58.045s
user    0m0.278s
sys     0m0.285s
./forest_snapshot_mainnet_2025-07-29_height_5184000.forest.car.zst: 0B (0B/s)
Export completed.

real    27m57.873s
user    0m0.284s
sys     0m0.222s
./forest_snapshot_mainnet_2025-07-29_height_5184000.forest.car.zst: 0B (0B/s)
Export completed.

real    29m19.508s
user    0m0.190s
sys     0m0.390s

# main branch
./forest_snapshot_mainnet_2025-07-29_height_5184000.forest.car.zst: 0B (0B/s)
Export completed.

real    29m53.178s
user    0m0.332s
sys     0m0.279s
./forest_snapshot_mainnet_2025-07-29_height_5184000.forest.car.zst: 0B (0B/s)
Export completed.

real    28m49.859s
user    0m0.321s
sys     0m0.260s
./forest_snapshot_mainnet_2025-07-29_height_5184000.forest.car.zst: 0B (0B/s)
Export completed.

real    29m13.298s
user    0m0.474s
sys     0m0.286s
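Averaging the three runs on each branch (a quick back-of-the-envelope sketch, not part of the PR) puts the mean improvement at roughly 1.9%, consistent with the reported ~2%:

```rust
/// Mean of a slice of wall-clock times in seconds.
pub fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Relative improvement of `new` over `old`, as a fraction of `old`.
pub fn speedup(old: f64, new: f64) -> f64 {
    (old - new) / old
}
```

Plugging in the six `real` timings above: the PR branch averages about 1725 s versus about 1759 s on main.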

@hanabi1224 hanabi1224 enabled auto-merge July 31, 2025 09:34
@hanabi1224 hanabi1224 added this pull request to the merge queue Aug 4, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 4, 2025
@hanabi1224 hanabi1224 added this pull request to the merge queue Aug 4, 2025
Merged via the queue into main with commit 97bedb9 Aug 4, 2025
44 checks passed
@hanabi1224 hanabi1224 deleted the hm/reduce-db-read-in-chain-export branch August 4, 2025 10:37