-
Notifications
You must be signed in to change notification settings - Fork 182
fix: CarStream should support large data blocks like F3 snap #6099
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds on-the-fly transcoding for local non-forest CAR sources during import and conditional post-download transcoding; replaces direct Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller
participant Import as "import_chain_as_forest_car"
participant FS as "Filesystem/DB"
participant DL as "Downloader"
participant Transcode as "transcode_into_forest_car"
rect rgb(239,246,255)
note over Caller,Import: Local source flow
Caller->>Import: import_chain_as_forest_car(from_path)
Import->>FS: ForestCar::is_valid(from_path)
alt valid
Import->>FS: move_or_copy_file -> forest_car_db_path
else invalid
Import->>Transcode: transcode(from_path -> temp_forest_car)
Transcode-->>Import: temp_forest_car
Import->>FS: persist(temp_forest_car -> forest_car_db_path)
end
end
rect rgb(245,255,245)
note over Caller,Import: Remote URL flow (downloaded)
Caller->>Import: import_chain_as_forest_car(url)
Import->>DL: download(url -> downloaded_car_temp_path)
DL-->>Import: downloaded_car_temp_path
Import->>FS: ForestCar::is_valid(downloaded_car_temp_path)
alt valid
Import->>FS: persist(downloaded_car_temp_path -> forest_car_db_path)
else invalid
Import->>Transcode: transcode(downloaded_car_temp_path -> forest_car_db_temp_path)
Transcode-->>Import: forest_car_db_temp_path
Import->>FS: persist(forest_car_db_temp_path -> forest_car_db_path)
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
Comment |
| } | ||
|
|
||
| pub fn uvi_bytes() -> UviBytes { | ||
| let mut decoder = UviBytes::default(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UviBytes::default cannot be banned in .clippy.toml because it does not belong to UviBytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/utils/db/car_stream.rs (1)
166-176: Validate CARv2 header fields and bound UviBytes (prevent underflow/DoS)Unchecked casts from header_v2 fields and an unbounded UviBytes allow negative/huge values to produce underflows or massive allocations — validate non‑negative offsets/sizes, bound UviBytes to the CARv1 byte window, and reject malformed headers.
- src/utils/db/car_stream.rs (around lines 165–176 and 331–336): apply the patch below to validate data_offset/data_size (use try_into/try_from), compute a safe uvi_limit from the validated max_car_v1_bytes, and use uvi_bytes_with_limit(uvi_limit).
- src/db/car/plain.rs (around lines 129–136): header_v2.data_offset and header_v2.data_size are cast with
asto u64; validate they are non‑negative (try_into/try_from) before changing cursor positions or computing limits.Applicable patch (use as-is for car_stream.rs):
@@ - // Skip v2 header bytes - if let Some(header_v2) = &header_v2 { - let mut to_skip = vec![0; header_v2.data_offset as usize]; - reader.read_exact(&mut to_skip).await?; - } + // Skip v2 header bytes (validated, non-negative) + if let Some(header_v2) = &header_v2 { + let data_offset: usize = header_v2 + .data_offset + .try_into() + .map_err(|_| io::Error::new(io::ErrorKind::InvalidData, "negative data_offset"))?; + let mut to_skip = vec![0; data_offset]; + reader.read_exact(&mut to_skip).await?; + } @@ - let max_car_v1_bytes = header_v2 - .as_ref() - .map(|h| h.data_size as u64) - .unwrap_or(u64::MAX); - let mut reader = FramedRead::new(reader.take(max_car_v1_bytes), uvi_bytes()); + let max_car_v1_bytes: u64 = header_v2 + .as_ref() + .map(|h| { + u64::try_from(h.data_size) + .map_err(|_| io::Error::new(io::ErrorKind::InvalidData, "negative data_size")) + }) + .transpose()? + .unwrap_or(u64::MAX); + let uvi_limit: usize = max_car_v1_bytes.try_into().unwrap_or(usize::MAX); + let mut reader = FramedRead::new(reader.take(max_car_v1_bytes), uvi_bytes_with_limit(uvi_limit)); @@ -pub fn uvi_bytes() -> UviBytes { - let mut decoder = UviBytes::default(); - decoder.set_max_len(usize::MAX); - decoder -} +/// Returns a UviBytes codec with a caller‑provided maximum frame length. +pub fn uvi_bytes_with_limit(limit: usize) -> UviBytes { + let mut c = UviBytes::default(); + c.set_max_len(limit); + c +} + +/// Returns an unbounded UviBytes codec (intended for encoding or fully‑buffered inputs). +pub fn uvi_bytes() -> UviBytes { + uvi_bytes_with_limit(usize::MAX) +}
🧹 Nitpick comments (2)
src/db/car/forest.rs (1)
267-270: Reuse the decoder in the loop to avoid per‑iteration reinit.Tiny win and clearer intent.
- while let Some(block_frame) = uvi_bytes().decode_eof(&mut zstd_frame)? { + let mut uv = uvi_bytes(); + while let Some(block_frame) = uv.decode_eof(&mut zstd_frame)? { let CarBlock { cid, data } = CarBlock::from_bytes(block_frame)?; block_map.insert(cid.into(), data); }src/daemon/db_util.rs (1)
203-207: Fix log message: it says “fallback to copy” but we may transcode.Message can mislead when a local raw CAR is transcoded in move_or_copy.
- tracing::warn!( - "Snapshot file is not a valid forest.car.zst file, fallback to copy" - ); + tracing::warn!( + "Snapshot is not a valid .forest.car.zst; will copy/transcode as needed" + );
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/daemon/db_util.rs(2 hunks)src/db/car/forest.rs(5 hunks)src/utils/db/car_stream.rs(3 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
📚 Learning: 2025-08-11T14:00:47.060Z
Learnt from: hanabi1224
PR: ChainSafe/forest#5886
File: src/daemon/db_util.rs:236-259
Timestamp: 2025-08-11T14:00:47.060Z
Learning: In Forest's snapshot import (`src/daemon/db_util.rs`), when F3 CID is present in snapshot metadata but the actual F3 data is missing, this should cause a hard error as it indicates snapshot corruption. Only errors during the F3 import process itself (after data is successfully retrieved) should be non-fatal and logged.
Applied to files:
src/daemon/db_util.rs
📚 Learning: 2025-08-25T14:17:09.129Z
Learnt from: hanabi1224
PR: ChainSafe/forest#5978
File: .github/workflows/unit-tests.yml:0-0
Timestamp: 2025-08-25T14:17:09.129Z
Learning: hanabi1224's download_file_with_cache function in src/utils/net/download_file.rs preserves URL path structure in local cache directories by using cache_dir.join(url.path().strip_prefix('/').unwrap_or_else(|| url.path())), so snapshots from https://forest-snapshots.fra1.cdn.digitaloceanspaces.com/rpc_test/ are cached locally at ~/.cache/forest/test/rpc-snapshots/rpc_test/ (including the rpc_test subdirectory from the URL path).
Applied to files:
src/daemon/db_util.rs
📚 Learning: 2025-08-04T13:36:22.993Z
Learnt from: hanabi1224
PR: ChainSafe/forest#5835
File: src/chain/tests.rs:58-76
Timestamp: 2025-08-04T13:36:22.993Z
Learning: In projects using tokio with "full" features enabled, `Vec<u8>` implements `AsyncWrite` through blanket implementations provided by Tokio. This allows `&mut Vec<u8>` to be used directly with async write operations without needing additional wrapper types like `SyncIoBridge`.
Applied to files:
src/db/car/forest.rs
📚 Learning: 2025-08-04T13:36:22.993Z
Learnt from: hanabi1224
PR: ChainSafe/forest#5835
File: src/chain/tests.rs:58-76
Timestamp: 2025-08-04T13:36:22.993Z
Learning: In the Forest codebase, `Vec<u8>` can be used as an `AsyncWrite` implementation in test contexts. The user confirmed that tests using `&mut Vec<u8>` with `export` and `export_v2` functions compile and pass both locally and on CI.
Applied to files:
src/db/car/forest.rs
🧬 Code graph analysis (2)
src/daemon/db_util.rs (2)
src/db/car/forest.rs (1)
is_valid(140-142)src/utils/io/mmap.rs (1)
open(56-62)
src/db/car/forest.rs (1)
src/utils/db/car_stream.rs (1)
uvi_bytes(332-336)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: tests
- GitHub Check: tests-release
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
- GitHub Check: All lint checks
🔇 Additional comments (6)
src/utils/db/car_stream.rs (1)
286-286: Header encode switch to helper looks good.Safe use; encode path is unaffected by max_len.
src/db/car/forest.rs (3)
159-163: LGTM: bounded header decode via helper.Header frame is small; using the shared helper improves consistency.
313-316: LGTM: header encode via helper.No behavior change; centralized configuration.
402-406: LGTM: error context improved.Clearer tracing without altering control flow.
src/daemon/db_util.rs (2)
164-169: Good: direct transcode for local non‑Forest CAR avoids wasted copy.Reduces I/O and temp space.
300-304: LGTM: added structured tracing for transcode.Helpful for support/telemetry.
|
no green checkmark! |
@LesnyRumcajs fixed! |
src/utils/db/car_stream.rs
Outdated
|
|
||
| pub fn uvi_bytes() -> UviBytes { | ||
| let mut decoder = UviBytes::default(); | ||
| decoder.set_max_len(usize::MAX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this reasoanble? Shouldn't we still put some bounds on it, like at most 4 GiB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Made it 8GiB which is reasonably large
src/utils/db/car_stream.rs
Outdated
|
|
||
| pub fn uvi_bytes() -> UviBytes { | ||
| let mut decoder = UviBytes::default(); | ||
| decoder.set_max_len(8 * 1024 * 1024 * 1024); // 8GiB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make a constant out of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
c64da20
Summary of changes
This PR fixes an issue in transcoding a v2 plain CAR (generated by Lotus) into ForestCAR by increasing
max_lenofUviBytes(default is128MiB)Changes introduced in this pull request:
Reference issue to close (if applicable)
Closes
Other information and links
Change checklist
Summary by CodeRabbit
New Features
Bug Fixes