-
Notifications
You must be signed in to change notification settings - Fork 182
feat: forest-cli snapshot export-diff #6074
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds an end-to-end diff snapshot export flow: CLI subcommand, new RPC Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant CLI as forest-cli
participant RPC as Forest RPC
participant Lock as CHAIN_EXPORT_LOCK
participant Archive as archive_cmd::do_export
participant Stream as stream_chain
participant FS as Filesystem
User->>CLI: snapshot export-diff --from F --to T --depth D -o path
CLI->>RPC: Forest.ChainExportDiff{from:F,to:T,depth:D,output_path}
RPC->>Lock: try_lock()
alt lock acquired
RPC->>Archive: do_export(store, Arc<TS@F>, out, epoch=F, depth=D, diff=T, diff_depth=D, force=true)
Archive->>Stream: stream_chain(db, tipsets, stateroot_limit_exclusive=T)
Stream-->>Archive: blocks/messages bounded by exclusive T
Archive->>FS: write CAR.zst (temp)
Archive-->>RPC: Ok
RPC-->>CLI: Ok
CLI->>FS: persist temp -> final path
CLI-->>User: Export completed
else busy
RPC-->>CLI: Error: export in progress
CLI-->>User: Failure
end
sequenceDiagram
autonumber
participant Job1 as Export/ExportDiff #1
participant Job2 as Export/ExportDiff #2
participant Lock as CHAIN_EXPORT_LOCK
Job1->>Lock: try_lock()
Note right of Lock: First job acquires lock
Lock-->>Job1: acquired
Job2->>Lock: try_lock()
Lock-->>Job2: error (already locked)
Job1-->>Lock: release on completion
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
.github/workflows/forest.yml (1)
621-647: Add the new job to the umbrella “integration-tests-status”.The aggregator omits
calibnet-export-diff-check, so the status job won’t reflect failures from this new check.Apply:
- calibnet-export-check-v2 + - calibnet-export-diff-check - calibnet-no-discovery-checkssrc/tool/subcommands/archive_cmd.rs (2)
262-267: Avoid borrowing a temporary Arc across .await; materialize the Arc first.
&store.into()takes a reference to a temporaryArc, which will not live across the async call and will fail to compile. Create anArcbinding and pass&arcinstead.Apply this diff:
- let store = ManyCar::try_from(snapshot_files)?; - let heaviest_tipset = store.heaviest_tipset()?; - do_export( - &store.into(), + let store: Arc<_> = ManyCar::try_from(snapshot_files)?.into(); + let heaviest_tipset = store.heaviest_tipset()?; + do_export( + &store, heaviest_tipset.into(), output_path, epoch, depth, diff, diff_depth, force, )
1168-1179: Test uses&store.into()temporary; fix lifetime as in production code.Same borrow-of-temporary issue in test; make
Arcexplicit and pass&store.Apply this diff:
- let store = AnyCar::try_from(calibnet::DEFAULT_GENESIS).unwrap(); - let heaviest_tipset = store.heaviest_tipset().unwrap(); - do_export( - &store.into(), + let store: Arc<_> = AnyCar::try_from(calibnet::DEFAULT_GENESIS).unwrap().into(); + let heaviest_tipset = store.heaviest_tipset().unwrap(); + do_export( + &store, heaviest_tipset.into(), output_path.path().into(), Some(0), 1, None, None, false, )
🧹 Nitpick comments (4)
CHANGELOG.md (1)
32-33: Fix link: reference the PR, not an Issue.6074 is a PR. Use the PR URL to match the contribution guidelines in the header.
Apply:
- - [#6074](https://github.com/ChainSafe/forest/issues/6074) Added `forest-cli snapshot export-diff` subcommand for exporting a diff snapshot. + - [#6074](https://github.com/ChainSafe/forest/pull/6074) Added `forest-cli snapshot export-diff` subcommand for exporting a diff snapshot.src/ipld/util.rs (1)
113-115: Doc/param naming drift: “exclusive” bound; align comments and names.Behavior changes look correct, but docs still refer to
stateroot_limit. Update tostateroot_limit_exclusiveto prevent off‑by‑one confusion.Apply:
-/// Stream all blocks that are reachable before the `stateroot_limit` epoch in a depth-first +/// Stream all blocks that are reachable before the `stateroot_limit_exclusive` epoch (exclusive) in a depth-first @@ -/// * `stateroot_limit` - An epoch that signifies how far back (exclusive) we need to inspect tipsets, +/// * `stateroot_limit_exclusive` - Epoch strictly before which we stop deep inspection,Also good call on:
- Using
>for the exclusive bound (Lines 256, 269).- Keeping the genesis special case (Line 269).
Also applies to: 146-150, 156-159, 165-169, 180-181, 256-265, 269-280
src/cli/subcommands/snapshot_cmd.rs (2)
44-58: Parameter UX: validatefrom > toand positivedepth.Defaulting
depthtofrom - tois sensible, but add guards to fail fast on invalid inputs (e.g.,from <= to, negative/zero depth).Apply:
ExportDiff { @@ depth: Option<crate::chain::ChainEpochDelta>, },And in the handler (see below) add checks before issuing the RPC.
157-226: Create output directory and validate epochs before export.
- Ensure the parent directory exists to avoid ENOENT from
tempfile_in.- Validate
from > toand that effectivedepth> 0 before RPC.Apply:
@@ - let depth = depth.unwrap_or_else(|| from - to); + let depth = depth.unwrap_or_else(|| from - to); + if from <= to { + anyhow::bail!("`--from` ({from}) must be greater than `--to` ({to})"); + } + if depth <= 0 { + anyhow::bail!("`--depth` must be positive (got {depth})"); + } @@ - let output_dir = output_path.parent().context("invalid output path")?; + let output_dir = output_path.parent().context("invalid output path")?; + std::fs::create_dir_all(output_dir) + .with_context(|| format!("creating output dir {}", output_dir.display()))?;Optional: preflight fetch
Forest.Version/StateNetworkNameto include chain finality in the CLI error, mirroring the server‑side check.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
.github/workflows/forest.yml(1 hunks)CHANGELOG.md(1 hunks)scripts/tests/calibnet_export_diff_check.sh(1 hunks)src/cli/subcommands/snapshot_cmd.rs(3 hunks)src/ipld/util.rs(6 hunks)src/rpc/methods/chain.rs(4 hunks)src/rpc/mod.rs(1 hunks)src/tool/subcommands/api_cmd/test_snapshots_ignored.txt(1 hunks)src/tool/subcommands/archive_cmd.rs(7 hunks)src/tool/subcommands/mod.rs(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
🧬 Code graph analysis (4)
src/cli/subcommands/snapshot_cmd.rs (3)
src/rpc/methods/chain.rs (18)
chain(359-359)chain(390-390)handle(141-153)handle(167-182)handle(197-211)handle(226-258)handle(271-279)handle(292-303)handle(316-404)handle(417-460)handle(473-497)handle(513-522)handle(537-542)handle(557-595)handle(610-626)handle(641-646)handle(719-730)handle(748-759)src/rpc/client.rs (3)
call(92-151)client(114-114)request(272-285)src/db/car/forest.rs (2)
new_forest_car_temp_path_in(471-478)std(110-110)
scripts/tests/calibnet_export_diff_check.sh (1)
scripts/tests/harness.sh (2)
forest_init(124-148)forest_query_epoch(77-79)
src/rpc/methods/chain.rs (1)
src/tool/subcommands/archive_cmd.rs (1)
do_export(513-616)
src/tool/subcommands/archive_cmd.rs (4)
src/ipld/util.rs (2)
stream_chain(146-159)stream_graph(163-169)src/chain/store/chain_store.rs (1)
heaviest_tipset(221-229)src/db/car/any.rs (1)
heaviest_tipset(73-79)src/db/car/many.rs (1)
heaviest_tipset(141-147)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: tests-release
- GitHub Check: tests
- GitHub Check: Build forest binaries on Linux AMD64
🔇 Additional comments (10)
src/tool/subcommands/api_cmd/test_snapshots_ignored.txt (1)
81-82: LGTM; keep ignore list aligned with RPC set.Placement after
Forest.ChainExportis sensible. Please ensure the API snapshot generator and docs include the new method..github/workflows/forest.yml (1)
309-330: Good addition. Ensure artifacts (forest, forest-cli, forest-tool) are present in PATH.Steps mirror other calibnet checks; looks fine.
Please confirm
make install-slim-quickuploadsforest-cliandforest-toolalong withforest(the script needs both). If not, extend the upload glob.src/tool/subcommands/mod.rs (1)
5-5: Visibility widening looks appropriate.
pub(crate)enables RPC to reuse archive internals without exposing them publicly.src/rpc/mod.rs (1)
84-85: RPC registration LGTM.
ForestChainExportDiffis correctly added to the method set and will be included in OpenRPC.src/tool/subcommands/archive_cmd.rs (4)
552-556: Off‑by‑one sanity for exclusive stateroot bound (request verification).With an exclusive bound,
diff_limit = diff_ts.epoch() - diff_depthshould include exactlydiff_depthstate‑roots (e.g., 3007870..=3006971 for 900). Looks right; please confirm with a quick unit/integration check.
40-40: LGTM: Correctly switching diff pre-scan to stream_chain.
stream_chainwith dead-link failures on by default is appropriate for computing the seen set; retainingstream_graphfor the merge path is fine.
997-998: LGTM: Arc wrap of Tipset at call site.
root.into()toArc<Tipset>matches the newdo_exportAPI.
1030-1031: LGTM: Arc wrap for diff export call.Consistent with the new signature and usage.
src/rpc/methods/chain.rs (2)
407-461: Forest.ChainExportDiff wiring looks solid; reuse of do_export is good.Parameter flow (
from,to,depth) and defaultingdiff_depthto finality match the CLI semantics. After the lock fix, this path should work as intended.Consider adding a small RPC test to assert the lock error when a concurrent export is in progress.
1059-1067: LGTM: Params struct mirrors CLI flags and is JSON-friendly.Field names and types align with expected usage;
PathBufis fine for server-side paths.
| #!/usr/bin/env bash | ||
| # This script is checking the correctness of | ||
| # the diff snapshot export feature. | ||
| # It requires both the `forest` and `forest-cli` binaries to be in the PATH. | ||
|
|
||
| set -eu | ||
|
|
||
| source "$(dirname "$0")/harness.sh" | ||
|
|
||
| forest_init "$@" | ||
|
|
||
| db_path=$($FOREST_TOOL_PATH db stats --chain calibnet | grep "Database path:" | cut -d':' -f2- | xargs) | ||
| snapshot=$(find "$db_path/car_db"/*.car.zst | tail -n 1) | ||
| snapshot_epoch=$(forest_query_epoch "$snapshot") | ||
|
|
||
| echo "Exporting diff snapshot @ $snapshot_epoch with forest-cli snapshot export-diff" | ||
| $FOREST_CLI_PATH snapshot export-diff --from "$snapshot_epoch" --to "$((snapshot_epoch - 900))" -d 900 -o diff1 | ||
|
|
||
| $FOREST_CLI_PATH shutdown --force | ||
|
|
||
| echo "Exporting diff snapshot @ $snapshot_epoch with forest-tool archive export" | ||
| $FOREST_TOOL_PATH archive export --epoch "$snapshot_epoch" --depth 900 --diff "$((snapshot_epoch - 900))" --diff-depth 900 -o diff2 "$snapshot" | ||
|
|
||
| echo "Comparing diff snapshots" | ||
| diff diff1 diff2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Harden the script (pipefail, robust snapshot selection, and deterministic compare).
Current use of globbed find and plain diff is brittle. Add pipefail, fail fast if no snapshot, and pick the newest snapshot deterministically. Also prefer comparing normalized metadata to avoid false negatives from compression framing.
Apply:
@@
-set -eu
+set -euo pipefail
@@
-db_path=$($FOREST_TOOL_PATH db stats --chain calibnet | grep "Database path:" | cut -d':' -f2- | xargs)
-snapshot=$(find "$db_path/car_db"/*.car.zst | tail -n 1)
-snapshot_epoch=$(forest_query_epoch "$snapshot")
+db_path="$(
+ $FOREST_TOOL_PATH db stats --chain calibnet \
+ | grep -E "^Database path:" \
+ | cut -d':' -f2- \
+ | xargs
+)"
+# Pick latest snapshot by mtime; fail if none.
+snapshot="$(ls -1t "$db_path"/car_db/*.car.zst 2>/dev/null | head -n1 || true)"
+if [[ -z "${snapshot:-}" ]]; then
+ echo "Error: no *.car.zst snapshots found under: $db_path/car_db" >&2
+ exit 1
+fi
+snapshot_epoch="$(forest_query_epoch "$snapshot")"
@@
-echo "Exporting diff snapshot @ $snapshot_epoch with forest-cli snapshot export-diff"
-$FOREST_CLI_PATH snapshot export-diff --from "$snapshot_epoch" --to "$((snapshot_epoch - 900))" -d 900 -o diff1
+DEPTH="${DEPTH:-900}"
+TO_EPOCH="$((snapshot_epoch - DEPTH))"
+echo "Exporting diff snapshot @ $snapshot_epoch with forest-cli snapshot export-diff (to=$TO_EPOCH depth=$DEPTH)"
+$FOREST_CLI_PATH snapshot export-diff --from "$snapshot_epoch" --to "$TO_EPOCH" -d "$DEPTH" -o diff1
@@
-echo "Exporting diff snapshot @ $snapshot_epoch with forest-tool archive export"
-$FOREST_TOOL_PATH archive export --epoch "$snapshot_epoch" --depth 900 --diff "$((snapshot_epoch - 900))" --diff-depth 900 -o diff2 "$snapshot"
+echo "Exporting diff snapshot @ $snapshot_epoch with forest-tool archive export"
+$FOREST_TOOL_PATH archive export --epoch "$snapshot_epoch" --depth "$DEPTH" --diff "$TO_EPOCH" --diff-depth "$DEPTH" -o diff2 "$snapshot"
@@
-echo "Comparing diff snapshots"
-diff diff1 diff2
+echo "Comparing diff snapshots (metadata)"
+$FOREST_TOOL_PATH archive info diff1 > info1.txt
+$FOREST_TOOL_PATH archive info diff2 > info2.txt
+diff -u info1.txt info2.txtOptional: if strict byte-for-byte comparison is desired and deterministic, keep a second diff diff1 diff2 step.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #!/usr/bin/env bash | |
| # This script is checking the correctness of | |
| # the diff snapshot export feature. | |
| # It requires both the `forest` and `forest-cli` binaries to be in the PATH. | |
| set -eu | |
| source "$(dirname "$0")/harness.sh" | |
| forest_init "$@" | |
| db_path=$($FOREST_TOOL_PATH db stats --chain calibnet | grep "Database path:" | cut -d':' -f2- | xargs) | |
| snapshot=$(find "$db_path/car_db"/*.car.zst | tail -n 1) | |
| snapshot_epoch=$(forest_query_epoch "$snapshot") | |
| echo "Exporting diff snapshot @ $snapshot_epoch with forest-cli snapshot export-diff" | |
| $FOREST_CLI_PATH snapshot export-diff --from "$snapshot_epoch" --to "$((snapshot_epoch - 900))" -d 900 -o diff1 | |
| $FOREST_CLI_PATH shutdown --force | |
| echo "Exporting diff snapshot @ $snapshot_epoch with forest-tool archive export" | |
| $FOREST_TOOL_PATH archive export --epoch "$snapshot_epoch" --depth 900 --diff "$((snapshot_epoch - 900))" --diff-depth 900 -o diff2 "$snapshot" | |
| echo "Comparing diff snapshots" | |
| diff diff1 diff2 | |
| #!/usr/bin/env bash | |
| # This script is checking the correctness of | |
| # the diff snapshot export feature. | |
| # It requires both the `forest` and `forest-cli` binaries to be in the PATH. | |
| set -euo pipefail | |
| source "$(dirname "$0")/harness.sh" | |
| forest_init "$@" | |
| db_path="$( | |
| $FOREST_TOOL_PATH db stats --chain calibnet \ | |
| | grep -E "^Database path:" \ | |
| | cut -d':' -f2- \ | |
| | xargs | |
| )" | |
| # Pick latest snapshot by mtime; fail if none. | |
| snapshot="$(ls -1t "$db_path"/car_db/*.car.zst 2>/dev/null | head -n1 || true)" | |
| if [[ -z "${snapshot:-}" ]]; then | |
| echo "Error: no *.car.zst snapshots found under: $db_path/car_db" >&2 | |
| exit 1 | |
| fi | |
| snapshot_epoch="$(forest_query_epoch "$snapshot")" | |
| DEPTH="${DEPTH:-900}" | |
| TO_EPOCH="$((snapshot_epoch - DEPTH))" | |
| echo "Exporting diff snapshot @ $snapshot_epoch with forest-cli snapshot export-diff (to=$TO_EPOCH depth=$DEPTH)" | |
| $FOREST_CLI_PATH snapshot export-diff --from "$snapshot_epoch" --to "$TO_EPOCH" -d "$DEPTH" -o diff1 | |
| $FOREST_CLI_PATH shutdown --force | |
| echo "Exporting diff snapshot @ $snapshot_epoch with forest-tool archive export" | |
| $FOREST_TOOL_PATH archive export --epoch "$snapshot_epoch" --depth "$DEPTH" --diff "$TO_EPOCH" --diff-depth "$DEPTH" -o diff2 "$snapshot" | |
| echo "Comparing diff snapshots (metadata)" | |
| $FOREST_TOOL_PATH archive info diff1 > info1.txt | |
| $FOREST_TOOL_PATH archive info diff2 > info2.txt | |
| diff -u info1.txt info2.txt |
🤖 Prompt for AI Agents
In scripts/tests/calibnet_export_diff_check.sh around lines 1 to 25, the script
is brittle: it lacks pipefail, doesn't fail if no snapshot is found, selects
snapshots non-deterministically, and uses a raw diff that can report false
differences due to compression framing. Add set -o pipefail alongside set -eu;
after finding snapshots, check and exit non-zero if none found; select the
newest snapshot deterministically (e.g. sort by mtime or use a reliable newest
selection) instead of relying on glob order; when comparing outputs, normalize
or decompress archives/metadata before diff (so compression framing won't
produce false mismatches) and keep an optional final strict byte-for-byte
comparison step if desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
scripts/tests/calibnet_export_diff_check.sh (1)
6-6: Good: pipefail enabled.Thanks for adopting
set -euo pipefail; this eliminates a class of silent failures in the test harness.
🧹 Nitpick comments (10)
scripts/tests/calibnet_export_diff_check.sh (4)
12-14: Harden snapshot discovery (deterministic latest, fail fast, robust parsing).Globbing through
find "$db_path/car_db"/*.car.zst | tail -n 1is brittle and may pick an arbitrary file. Prefer mtime-based selection with an explicit “none found” guard and more robustdb_pathparsing.- db_path=$($FOREST_TOOL_PATH db stats --chain calibnet | grep "Database path:" | cut -d':' -f2- | xargs) - snapshot=$(find "$db_path/car_db"/*.car.zst | tail -n 1) - snapshot_epoch=$(forest_query_epoch "$snapshot") + db_path="$( + $FOREST_TOOL_PATH db stats --chain calibnet \ + | grep -E '^Database path:' \ + | cut -d':' -f2- \ + | xargs + )" + # Pick newest snapshot by mtime; fail clearly if none. + snapshot="$(ls -1t "$db_path"/car_db/*.car.zst 2>/dev/null | head -n1 || true)" + if [[ -z "${snapshot:-}" ]]; then + echo "Error: no *.car.zst snapshots found under: $db_path/car_db" >&2 + exit 1 + fi + snapshot_epoch="$(forest_query_epoch "$snapshot")"
16-23: Parametrize depth and compute target epoch once; add guards.Make depth configurable, derive
--tofrom it, and add a simple lower-bound guard to avoid negative epochs in edge cases.- echo "Exporting diff snapshot @ $snapshot_epoch with forest-cli snapshot export-diff" - $FOREST_CLI_PATH snapshot export-diff --from "$snapshot_epoch" --to "$((snapshot_epoch - 900))" -d 900 -o diff1 + DEPTH="${DEPTH:-900}" + TO_EPOCH="$((snapshot_epoch - DEPTH))" + if (( TO_EPOCH < 0 )); then + echo "Error: computed --to epoch ($TO_EPOCH) is negative (from=$snapshot_epoch depth=$DEPTH)" >&2 + exit 1 + fi + echo "Exporting diff snapshot @ $snapshot_epoch with forest-cli snapshot export-diff (to=$TO_EPOCH depth=$DEPTH)" + $FOREST_CLI_PATH snapshot export-diff --from "$snapshot_epoch" --to "$TO_EPOCH" -d "$DEPTH" -o diff1 @@ - echo "Exporting diff snapshot @ $snapshot_epoch with forest-tool archive export" - $FOREST_TOOL_PATH archive export --epoch "$snapshot_epoch" --depth 900 --diff "$((snapshot_epoch - 900))" --diff-depth 900 -o diff2 "$snapshot" + echo "Exporting diff snapshot @ $snapshot_epoch with forest-tool archive export" + $FOREST_TOOL_PATH archive export \ + --epoch "$snapshot_epoch" \ + --depth "$DEPTH" \ + --diff "$TO_EPOCH" \ + --diff-depth "$DEPTH" \ + -o diff2 "$snapshot"
24-25: Compare normalized metadata, not raw bytes (avoid compression framing noise).Byte-for-byte diffs of compressed outputs can spuriously fail. Compare canonicalized metadata from
archive info. Optionally keep strict compare behind a flag.- echo "Comparing diff snapshots" - diff diff1 diff2 + echo "Comparing diff snapshots (metadata)" + $FOREST_TOOL_PATH archive info diff1 > info1.txt + $FOREST_TOOL_PATH archive info diff2 > info2.txt + diff -u info1.txt info2.txt + if [[ "${STRICT_BYTE_COMPARE:-0}" == "1" ]]; then + echo "Strict byte-by-byte compare enabled" + diff diff1 diff2 + fi
6-10: Use tmpdir + trap to avoid file collisions and ensure cleanup.Guard against parallel CI runs and leave a clean workspace.
set -euo pipefail source "$(dirname "$0")/harness.sh" forest_init "$@" + +# Work in a temporary directory and clean up on exit. +TMPDIR="$(mktemp -d -t calibnet-export-diff-XXXXXX)" +trap 'rm -rf "$TMPDIR"' EXIT +cd "$TMPDIR"src/tool/subcommands/archive_cmd.rs (6)
262-275: Construct Arc once for the store; avoid temporaryinto()+ reference.This improves readability and avoids borrowing a temporary. It’s consistent with other call sites that already use
Arc.- let store = ManyCar::try_from(snapshot_files)?; - let heaviest_tipset = store.heaviest_tipset()?; - do_export( - &store.into(), - heaviest_tipset.into(), + let store = Arc::new(ManyCar::try_from(snapshot_files)?); + let heaviest_tipset = store.heaviest_tipset()?; + do_export( + &store, + heaviest_tipset.into(), output_path, epoch, depth, diff, diff_depth, force, ) .await
540-540: PreferArc::clonefor clarity when cloning Arcs.Makes intent explicit and avoids confusion with cloning the underlying store.
- let index = ChainIndex::new(store.clone()); + let index = ChainIndex::new(Arc::clone(store));
997-1005: Minor: avoid implicitinto()forrootin helpers.Being explicit about
Arcmaterialization reads clearer in these paths.- do_export( - &store, - root.into(), + let root = Arc::new(root); + do_export( + &store, + root, output_path.clone(), Some(epoch), depth, diff, diff_depth, force, )
1031-1038: Sameroot.into()nit as above in diff export helper.- do_export( - &store, - root.into(), + let root = Arc::new(root); + do_export( + &store, + root, output_path.clone(), Some(epoch), depth, diff, diff_depth, force, )
1171-1179: Test: constructArconce to match production path.Avoid borrowing a temporary; also mirrors the main code path suggested above.
- do_export( - &store.into(), - heaviest_tipset.into(), + let store = Arc::new(store); + let heaviest_tipset = Arc::new(heaviest_tipset); + do_export( + &store, + heaviest_tipset, output_path.path().into(), Some(0), 1, None, None, false, )
512-516: Change API to take Arc<impl Blockstore + Send + Sync + 'static> by value — update callersMirrors
root: Arc<Tipset>and removes the odd&Arcborrow; update the callers below to pass theArc(remove&or clone as needed):
- src/rpc/methods/chain.rs:447 — do_export(&ctx.store_owned(), start_ts, output_path)
- src/tool/subcommands/archive_cmd.rs:264 — do_export(&store.into(), heaviest_tipset.into(), output_path)
- src/tool/subcommands/archive_cmd.rs:995 — do_export(&store, root.into(), output_path.clone())
- src/tool/subcommands/archive_cmd.rs:1028 — do_export(&store, root.into(), output_path.clone())
- src/tool/subcommands/archive_cmd.rs:1170 — do_export(&store.into(), heaviest_tipset.into(), output_path.path().into())
Also search for out-of-crate uses before merging.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
scripts/tests/calibnet_export_diff_check.sh(1 hunks)src/tool/subcommands/archive_cmd.rs(8 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
📚 Learning: 2025-09-16T12:55:26.955Z
Learnt from: hanabi1224
PR: ChainSafe/forest#6079
File: src/chain_sync/metrics.rs:6-7
Timestamp: 2025-09-16T12:55:26.955Z
Learning: HEAD_EPOCH references in shell scripts (like scripts/tests/calibnet_eth_mapping_check.sh) that extract data from `forest-cli info show` are unrelated to Rust metrics constants with the same name and should not be flagged when metrics cleanup is performed.
Applied to files:
scripts/tests/calibnet_export_diff_check.sh
🧬 Code graph analysis (2)
scripts/tests/calibnet_export_diff_check.sh (1)
scripts/tests/harness.sh (2)
forest_init(124-148)forest_query_epoch(77-79)
src/tool/subcommands/archive_cmd.rs (3)
src/ipld/util.rs (3)
stream_chain(146-159)stream_graph(163-169)new(63-67)src/chain/store/chain_store.rs (3)
heaviest_tipset(221-229)new(116-142)new(562-566)src/blocks/tipset.rs (3)
genesis(415-452)new(242-260)new(494-511)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Build MacOS
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build Ubuntu
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: All lint checks
- GitHub Check: tests
- GitHub Check: tests-release
- GitHub Check: Check
🔇 Additional comments (1)
src/tool/subcommands/archive_cmd.rs (1)
552-556: Good: switched tostream_chainwith dead-link failures.Using
stream_chain(fail_on_dead_links = true) is safer for constructing the diff “seen” set thanstream_graph.
LesnyRumcajs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Co-authored-by: Hubert <hubert@chainsafe.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/cli/subcommands/snapshot_cmd.rs (1)
173-175: Nice: adopted prior depth defaulting suggestion.
Matches the earlier review note; thanks for folding it in.
🧹 Nitpick comments (5)
src/cli/subcommands/snapshot_cmd.rs (5)
44-58: Clarify flag semantics in help text.
Add that --from must be >= --to, and that depth defaults to (from - to). This reduces user confusion and mismatches with filenames.
44-58: Optional: parity withexportflags.
Consider adding--dry-run/--skip-checksum(even if checksum is a no-op here) for UI parity betweenexportandexport-diff.
176-185: Make output path handling resilient (no parent, missing dirs).
Currentlyparent()None (e.g., "diff.car.zst") errors, and missing directories aren’t created. Default to.and ensure the directory exists.Apply this diff:
- let output_dir = output_path.parent().context("invalid output path")?; + let output_dir = output_path.parent().unwrap_or_else(|| Path::new(".")); + std::fs::create_dir_all(output_dir) + .with_context(|| format!("failed to create output dir {}", output_dir.display()))?;Note: Consider applying the same improvement in the
Exportpath (Lines 103–105) for consistency.
200-213: Ensure progress task cannot outlive on early errors.
The spawned reporter keeps running if the RPC call fails before reaching the abort. The next comment proposes a minimal restructure that guarantees abort/await on all paths.
214-223: Always abort/await the progress task even on RPC errors.
Restructure to handle abortion before propagating errors.Apply this diff:
- // Manually construct RpcRequest because snapshot export could - // take a few hours on mainnet - client - .call(ForestChainExportDiff::request((params,))?.with_timeout(Duration::MAX)) - .await?; - - handle.abort(); - pb.finish(); - _ = handle.await; + // Manually construct RpcRequest because snapshot export could + // take a few hours on mainnet + let res = client + .call(ForestChainExportDiff::request((params,))?.with_timeout(Duration::MAX)) + .await; + handle.abort(); + pb.finish(); + _ = handle.await; + res?;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/cli/subcommands/snapshot_cmd.rs(3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
🧬 Code graph analysis (1)
src/cli/subcommands/snapshot_cmd.rs (3)
src/rpc/methods/chain.rs (18)
chain(359-359)chain(390-390)handle(141-153)handle(167-182)handle(197-211)handle(226-258)handle(271-279)handle(292-303)handle(316-404)handle(417-460)handle(473-497)handle(513-522)handle(537-542)handle(557-595)handle(610-626)handle(641-646)handle(719-730)handle(748-759)src/rpc/client.rs (3)
call(92-151)client(114-114)request(272-285)src/db/car/forest.rs (2)
new_forest_car_temp_path_in(471-478)std(110-110)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: Check
- GitHub Check: tests-release
- GitHub Check: tests
- GitHub Check: All lint checks
🔇 Additional comments (4)
src/cli/subcommands/snapshot_cmd.rs (4)
9-9: LGTM: correct import for diff params.
163-171: LGTM: calibnet naming compat mirrorsexportpath.
224-226: LGTM: atomic persist and completion message.
173-191: Require --from >= --to and derive default depth from the range; don't force equality unless the export API requires it.
ChainEpochDelta == ChainEpoch == i64, so casting (from - to) is safe. Implement: ensure from >= to, let expected_depth = (from - to) as crate::chain::ChainEpochDelta, depth = depth.unwrap_or(expected_depth), ensure depth > 0. Enforce depth == expected_depth only after confirming the export API requires exact equality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/cli/subcommands/snapshot_cmd.rs (1)
173-175: Also enforcefrom > to(independent ofdepth).With an explicit positive
depth,from <= tocurrently passes CLI validation and will rely on server‑side behavior. Make intent explicit and fail early.- let depth = depth.unwrap_or_else(|| from - to); - anyhow::ensure!(depth > 0, "depth must be positive"); + anyhow::ensure!(from > to, "`from` must be greater than `to`"); + let depth = depth.unwrap_or_else(|| from - to); + anyhow::ensure!(depth > 0, "depth must be positive");
🧹 Nitpick comments (4)
src/cli/subcommands/snapshot_cmd.rs (4)
44-58: Validate epochs at parse-time to fail fast.Enforce non‑negative epochs via clap so users get immediate feedback; keep runtime checks too.
/// Epoch to export from - #[arg(long)] + #[arg(long, value_parser = clap::value_parser!(i64).range(0..))] from: i64, /// Epoch to diff against - #[arg(long)] + #[arg(long, value_parser = clap::value_parser!(i64).range(0..))] to: i64,
163-171: DRY up chain name mapping.The calibnet mapping is duplicated in Export and ExportDiff. Consider extracting a small helper and reusing it in both branches.
Example (outside this hunk):
fn snapshot_chain_name(raw: &str) -> &str { if raw == calibnet::NETWORK_GENESIS_NAME { calibnet::NETWORK_COMMON_NAME } else { raw } }
183-185: Clarify handling whenoutput_pathpoints to a non‑existent directory.
is_dir()returns false for non‑existent paths, so a user‑intended directory might be treated as a file path. Consider documenting this or adding a--output-dirflag to disambiguate.
224-226: Optional: emit a checksum for parity withexport.Diff exports don’t persist a checksum; consider computing SHA‑256 client‑side post‑persist (or have RPC return it) for UX consistency.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/cli/subcommands/snapshot_cmd.rs(3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
🧬 Code graph analysis (1)
src/cli/subcommands/snapshot_cmd.rs (3)
src/rpc/methods/chain.rs (18)
chain(359-359)chain(390-390)handle(141-153)handle(167-182)handle(197-211)handle(226-258)handle(271-279)handle(292-303)handle(316-404)handle(417-460)handle(473-497)handle(513-522)handle(537-542)handle(557-595)handle(610-626)handle(641-646)handle(719-730)handle(748-759)src/rpc/client.rs (3)
call(92-151)client(114-114)request(272-285)src/db/car/forest.rs (2)
new_forest_car_temp_path_in(471-478)std(110-110)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
- GitHub Check: Bootstrap checks - Forest
- GitHub Check: Bootstrap checks - Lotus
- GitHub Check: Diff snapshot export checks
- GitHub Check: Calibnet eth mapping check
- GitHub Check: State migrations
- GitHub Check: Calibnet stateless mode check
- GitHub Check: Snapshot export checks
- GitHub Check: Calibnet kademlia checks
- GitHub Check: Snapshot export checks v2 with F3 data
- GitHub Check: db-migration-checks
- GitHub Check: Calibnet api test-stateful check
- GitHub Check: Calibnet stateless RPC check
- GitHub Check: Calibnet no discovery checks
- GitHub Check: Wallet tests
- GitHub Check: Devnet checks
- GitHub Check: Forest CLI checks
- GitHub Check: Calibnet check
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: tests-release
- GitHub Check: tests
- GitHub Check: All lint checks
🔇 Additional comments (4)
src/cli/subcommands/snapshot_cmd.rs (4)
9-9: Import looks correct.Param type aligns with the new RPC handler.
176-181: Filename generation LGTM.Pattern matches the PR description and is stable/predictable.
193-212: Progress reporting LGTM.Spinner + size poller mirrors Export path and is correctly aborted.
214-219: Unbounded per‑request timeout matches Export path.Consistent approach; OK if the RPC can take hours.
|
|
||
| #[allow(clippy::too_many_arguments)] | ||
| async fn do_export( | ||
| pub async fn do_export( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hanabi1224 Do you think we should move the core logic of this in separate crate/file ?
I don't think we should use cli methods directly in the RPC crate, we should have separation of concerns here.
We can do this as a follow up work, as it's not really an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coderabbitai please create a follow up issue for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
@hanabi1224 I've created a follow-up issue to address the architectural concern raised by @akaladarshi about separating the core export logic from CLI-specific code. The issue describes the problem, proposes a solution to move the shared export functionality to a dedicated module, and includes references to this PR and comment thread.
The issue suggests refactoring the do_export function and related logic into a separate module that can be shared between CLI and RPC layers without violating separation of concerns principles.
(\ /)
( . .)
o_(")(") thoughtful rabbit organizing architectural improvements
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue for the architectural improvement
gh issue create \
--title "Refactor export logic to separate module for better separation of concerns" \
--body "## Problem
Currently, the RPC layer directly uses CLI methods from the \`archive_cmd\` module, which violates separation of concerns principles. Specifically, the \`do_export\` function in \`src/tool/subcommands/archive_cmd.rs\` is being used by both:
- CLI commands (\`forest-tool archive export\`)
- RPC methods (\`Forest.ChainExportDiff\`)
## Proposed Solution
Move the core export logic to a separate crate or dedicated module that can be shared between CLI and RPC layers without creating architectural dependencies.
This would involve:
1. Extracting the core export functionality from \`archive_cmd.rs\`
2. Creating a new module/crate for shared export logic
3. Having both CLI and RPC layers depend on this shared module
4. Maintaining clear separation between presentation layer (CLI/RPC) and business logic (export operations)
## Context
This issue was identified during review of PR #6074 which adds \`forest-cli snapshot export-diff\` functionality.
**References:**
- PR: https://github.com/ChainSafe/forest/pull/6074
- Comment: https://github.com/ChainSafe/forest/pull/6074#discussion_r2358997339
- Requested by: @hanabi1224
## Benefits
- Better architectural separation
- Easier testing of core export logic
- Reduced coupling between CLI and RPC layers
- More maintainable codebase" \
--assignee hanabi1224 \
--label enhancement
Length of output: 44
@hanabi1224 I've successfully created a follow-up issue (#6075) to address the architectural concern raised by @akaladarshi. The issue outlines the need to refactor the export logic into a separate module to maintain proper separation of concerns between the CLI and RPC layers.
The issue includes:
- Clear problem description of the current architectural coupling
- Proposed solution with specific steps
- Context referencing this PR and comment thread
- Benefits of the refactoring
This will help ensure the codebase maintains clean architectural boundaries as it evolves.
(\ /)
( . .)
o_(")(") thoughtful rabbit organizing architectural improvements
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue without the label since 'enhancement' doesn't exist
gh issue create \
--title "Refactor export logic to separate module for better separation of concerns" \
--body "## Problem
Currently, the RPC layer directly uses CLI methods from the \`archive_cmd\` module, which violates separation of concerns principles. Specifically, the \`do_export\` function in \`src/tool/subcommands/archive_cmd.rs\` is being used by both:
- CLI commands (\`forest-tool archive export\`)
- RPC methods (\`Forest.ChainExportDiff\`)
## Proposed Solution
Move the core export logic to a separate crate or dedicated module that can be shared between CLI and RPC layers without creating architectural dependencies.
This would involve:
1. Extracting the core export functionality from \`archive_cmd.rs\`
2. Creating a new module/crate for shared export logic
3. Having both CLI and RPC layers depend on this shared module
4. Maintaining clear separation between presentation layer (CLI/RPC) and business logic (export operations)
## Context
This issue was identified during review of PR #6074 which adds \`forest-cli snapshot export-diff\` functionality.
**References:**
- PR: https://github.com/ChainSafe/forest/pull/6074
- Comment: https://github.com/ChainSafe/forest/pull/6074#discussion_r2358997339
- Requested by: @hanabi1224
## Benefits
- Better architectural separation
- Easier testing of core export logic
- Reduced coupling between CLI and RPC layers
- More maintainable codebase" \
--assignee hanabi1224
Length of output: 47
@hanabi1224 Perfect! I've successfully created follow-up issue #6096 to address the architectural concern raised by @akaladarshi about separating the core export logic from CLI-specific code.
The issue has been assigned to you and includes:
- Clear description of the current architectural coupling problem
- Proposed solution to extract shared export logic into a dedicated module
- Context and references to this PR and comment thread
- Benefits of maintaining proper separation of concerns
This will help ensure the codebase maintains clean architectural boundaries as the export functionality continues to evolve.
(\ /)
( . .)
o_(")(") organized rabbit with architectural blueprints
🧠 Learnings used
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
Summary of changes
This PR adds
forest-cli snapshot export-diffsubcommand for exporting a diff snapshot from a live node, to achive what the offline toolforest-tool archive exportdoes.Note that,
--diff-depth 900in implied in the new toolChanges introduced in this pull request:
Reference issue to close (if applicable)
Closes
Other information and links
Change checklist
Summary by CodeRabbit
New Features
Documentation
Tests
Refactor