
Conversation

@hanabi1224 hanabi1224 commented Jul 28, 2025

Summary of changes

(Originally part of #5867)
This PR makes small refactorings to stream_export and unordered_stream_export to reduce database read operations.

I see a ~5% performance gain in calibnet snapshot export on my laptop.


Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

  • New Features
    • Improved block streaming performance by carrying block data alongside CIDs, reducing redundant data retrieval.
  • Refactor
    • Updated internal handling of block identifiers and data to optimize processing and retrieval.
    • Enhanced block header serialization and CID computation for improved consistency.
  • Bug Fixes
    • Enhanced error handling for missing block data during chain traversal.

coderabbitai bot commented Jul 28, 2025

Walkthrough

The changes introduce a new method for serializing and computing CIDs for block headers, replacing direct CID computation with a process that returns both the CID and serialized bytes. Additionally, chain traversal logic is refactored to propagate optional block data alongside CIDs, optimizing data retrieval by using available block data when possible and minimizing unnecessary database fetches.

Changes

  • src/blocks/header.rs (Block Header Serialization & CID Computation): Refactored RawBlockHeader to add car_block() for CBOR serialization and CID computation using Blake2b256. Updated cid() to use car_block(). Adjusted import statements.
  • src/ipld/util.rs (Chain Traversal with Block Data Propagation): Changed Task::Emit to carry both a Cid and optional block data. Updated traversal queues in ChainStream and UnorderedChainStream to handle (Cid, Option<Vec<u8>>) tuples. Refactored logic to use block data when available, reducing redundant database fetches. Extracted an ipld_to_cid helper.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant RawBlockHeader
    participant Multihash
    participant CID

    Client->>RawBlockHeader: car_block()
    RawBlockHeader->>RawBlockHeader: serialize to CBOR (Vec<u8>)
    RawBlockHeader->>Multihash: blake2b256(serialized bytes)
    Multihash-->>RawBlockHeader: multihash digest
    RawBlockHeader->>CID: create CID(multihash, codec)
    CID-->>RawBlockHeader: CID
    RawBlockHeader-->>Client: (CID, serialized bytes)
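The flow in the diagram above can be sketched in Rust. This is a simplified stand-in model, not the real implementation: MockHeader, the byte serialization, and the DefaultHasher digest stand in for the actual RawBlockHeader, DAG-CBOR encoding, and Blake2b256 multihash in src/blocks/header.rs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for the real `RawBlockHeader`.
pub struct MockHeader {
    pub epoch: i64,
    pub miner: String,
}

impl MockHeader {
    /// Stand-in serialization; the real code produces DAG-CBOR bytes.
    pub fn to_bytes(&self) -> Vec<u8> {
        format!("{}|{}", self.epoch, self.miner).into_bytes()
    }

    /// Mirrors the idea of `car_block()`: serialize once, hash the bytes
    /// (Blake2b256 in the real code; `DefaultHasher` here as a stand-in),
    /// and return both the identifier and the serialized bytes so callers
    /// can reuse the bytes instead of re-reading the block from the DB.
    pub fn car_block(&self) -> (u64, Vec<u8>) {
        let bytes = self.to_bytes();
        let mut hasher = DefaultHasher::new();
        bytes.hash(&mut hasher);
        (hasher.finish(), bytes)
    }

    /// Mirrors `cid()` being reimplemented on top of `car_block()`.
    pub fn cid(&self) -> u64 {
        self.car_block().0
    }
}
```

The key point is that the serialized bytes are produced exactly once and handed back together with the identifier, rather than being recomputed or re-fetched later.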
sequenceDiagram
    participant ChainStream
    participant Queue
    participant DB

    ChainStream->>Queue: pop (Cid, Option<Vec<u8>>)
    alt Data present
        Queue-->>ChainStream: (Cid, Some(data))
        ChainStream-->>Client: emit (Cid, data)
    else Data missing
        Queue-->>ChainStream: (Cid, None)
        ChainStream->>DB: fetch data for Cid
        alt Data found
            DB-->>ChainStream: data
            ChainStream-->>Client: emit (Cid, data)
        else Data not found
            DB-->>ChainStream: error
            ChainStream-->>Client: error (if fail_on_dead_links)
        end
    end
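A minimal sketch of the emit-or-fetch decision in the diagram above, with a HashMap standing in for the database and String keys standing in for CIDs (the real code in src/ipld/util.rs works with Cid values and a blockstore, and streams rather than returns):

```rust
use std::collections::HashMap;

/// Stand-in blockstore: CID-like key -> block bytes.
type MockDb = HashMap<String, Vec<u8>>;

/// Emit a (cid, data) pair, reading from the DB only when the
/// traversal queue did not already carry the block's bytes.
pub fn emit(
    db: &MockDb,
    cid: String,
    data: Option<Vec<u8>>,
    fail_on_dead_links: bool,
) -> Result<Option<(String, Vec<u8>)>, String> {
    // Fast path: the queue already carried the serialized block.
    if let Some(bytes) = data {
        return Ok(Some((cid, bytes)));
    }
    // Slow path: fall back to a database read.
    match db.get(&cid) {
        Some(bytes) => Ok(Some((cid, bytes.clone()))),
        None if fail_on_dead_links => Err(format!("missing block data for {cid}")),
        None => Ok(None), // dead link tolerated: skip the block
    }
}
```

This keeps the original error behavior (missing data is an error only when fail_on_dead_links is set) while letting the fast path skip the database entirely.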

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes



📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7bf39d8 and 42b9081.

📒 Files selected for processing (2)
  • src/blocks/header.rs (2 hunks)
  • src/ipld/util.rs (11 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/blocks/header.rs
  • src/ipld/util.rs


@hanabi1224 hanabi1224 marked this pull request as ready for review July 28, 2025 12:17
@hanabi1224 hanabi1224 requested a review from a team as a code owner July 28, 2025 12:17
@hanabi1224 hanabi1224 requested review from LesnyRumcajs and elmattic and removed request for a team July 28, 2025 12:17

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19c11c8 and 3108e1b.

📒 Files selected for processing (2)
  • src/blocks/header.rs (2 hunks)
  • src/ipld/util.rs (11 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: tests-release
  • GitHub Check: tests
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build Ubuntu
  • GitHub Check: Build MacOS
  • GitHub Check: All lint checks
🔇 Additional comments (7)
src/blocks/header.rs (2)

11-22: Clean import reorganization!

The imports are well-organized and the addition of multihash utilities is appropriate for the new car_block() functionality.


103-110: Excellent optimization for reducing DB reads!

The car_block() method efficiently returns both the CID and serialized bytes, which aligns perfectly with the PR objective of reducing database read operations during chain export.

src/ipld/util.rs (5)

105-105: Smart optimization for Task enum!

Adding optional block data to the Emit variant enables reusing already-serialized data, avoiding redundant database lookups.
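As a minimal sketch of that variant change (a stand-in model; the real Task enum in src/ipld/util.rs has more variants and uses Cid rather than this String stand-in):

```rust
/// Stand-in for a CID.
type MockCid = String;

/// Sketch of the updated variant: `Emit` now carries optional,
/// already-serialized block data alongside the identifier.
pub enum MockTask {
    Emit(MockCid, Option<Vec<u8>>),
}

/// A consumer only needs a database read when the data is absent.
pub fn needs_db_read(task: &MockTask) -> bool {
    matches!(task, MockTask::Emit(_, None))
}
```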


190-201: Well-implemented data retrieval optimization!

The logic correctly prioritizes using provided block data before falling back to database fetches, maintaining the same error handling behavior.


291-291: Consistent queue type update!

The queue type change to Vec<(Cid, Option<Vec<u8>>)> properly supports the optimization across both stream implementations.


468-491: Consistent optimization in unordered stream!

The implementation properly handles optional block data, mirroring the optimization in ChainStream while maintaining appropriate error handling.


559-565: Good refactoring!

Extracting the ipld_to_cid function improves code readability and eliminates duplication.

@hanabi1224 hanabi1224 force-pushed the hm/reduce-db-read-in-chain-export branch 3 times, most recently from f789dad to 7bf39d8, July 28, 2025 12:32
@hanabi1224 hanabi1224 force-pushed the hm/reduce-db-read-in-chain-export branch from 7bf39d8 to 42b9081, July 28, 2025 12:37
@LesnyRumcajs
Member

Do you have some metrics for mainnet?

@hanabi1224
Contributor Author

@LesnyRumcajs no, mainnet snapshot export is slow on my laptop. I can measure on a DO droplet if you prefer.

@LesnyRumcajs
Member

Yes, please measure it. Let's have at least five runs before the change and five runs after, optimally from exactly the same tipset.

@hanabi1224
Contributor Author

> Yes, please measure it. Let's have at least five runs before the change and five runs after, optimally from exactly the same tipset.

@LesnyRumcajs Instead of using a DO droplet with a shared CPU, I tested on my laptop with time forest-cli snapshot export --dry-run -d 900 -t 5184000; I see a ~2% perf gain on mainnet.

# PR branch
./forest_snapshot_mainnet_2025-07-29_height_5184000.forest.car.zst: 0B (0B/s)
Export completed.

real    28m58.045s
user    0m0.278s
sys     0m0.285s
./forest_snapshot_mainnet_2025-07-29_height_5184000.forest.car.zst: 0B (0B/s)
Export completed.

real    27m57.873s
user    0m0.284s
sys     0m0.222s
./forest_snapshot_mainnet_2025-07-29_height_5184000.forest.car.zst: 0B (0B/s)
Export completed.

real    29m19.508s
user    0m0.190s
sys     0m0.390s

# main branch
./forest_snapshot_mainnet_2025-07-29_height_5184000.forest.car.zst: 0B (0B/s)
Export completed.

real    29m53.178s
user    0m0.332s
sys     0m0.279s
./forest_snapshot_mainnet_2025-07-29_height_5184000.forest.car.zst: 0B (0B/s)
Export completed.

real    28m49.859s
user    0m0.321s
sys     0m0.260s
./forest_snapshot_mainnet_2025-07-29_height_5184000.forest.car.zst: 0B (0B/s)
Export completed.

real    29m13.298s
user    0m0.474s
sys     0m0.286s
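Averaging the three runs on each branch (a quick back-of-the-envelope sketch, not part of the PR) puts the mean improvement at roughly 1.9%, consistent with the reported ~2%:

```rust
/// Mean of a slice of wall-clock times in seconds.
pub fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Relative improvement of `new` over `old`, as a fraction of `old`.
pub fn speedup(old: f64, new: f64) -> f64 {
    (old - new) / old
}
```

Plugging in the six `real` timings above: the PR branch averages about 1725 s versus about 1759 s on main.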

@hanabi1224 hanabi1224 enabled auto-merge July 31, 2025 09:34
@hanabi1224 hanabi1224 added this pull request to the merge queue Aug 4, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 4, 2025
@hanabi1224 hanabi1224 added this pull request to the merge queue Aug 4, 2025
Merged via the queue into main with commit 97bedb9 Aug 4, 2025
44 checks passed
@hanabi1224 hanabi1224 deleted the hm/reduce-db-read-in-chain-export branch August 4, 2025 10:37