Skip to content

Conversation

@hanabi1224
Copy link
Contributor

@hanabi1224 hanabi1224 commented Nov 10, 2025

Summary of changes

To match the logic in Lotus: filecoin-project/lotus#535

	if h.Timestamp > now+buildconstants.AllowableClockDriftSecs {
		return xerrors.Errorf("block was from the future (now=%d, blk=%d): %w", now, h.Timestamp, consensus.ErrTemporal)
	}

        func isPermanent(err error) bool {
	        return !errors.Is(err, consensus.ErrTemporal)
        }

        if isPermanent(err) {
	        syncer.bad.Add(b.Cid(), NewBadBlockReason([]cid.Cid{b.Cid()}, "%s", err.Error()))
        }

Changes introduced in this pull request:

Reference issue to close (if applicable)

Closes

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

  • Bug Fixes
    • Fixed a synchronization issue that occurred when processing time-traveling blocks, preventing sync disruptions.
    • Enhanced bad block tracking throughout the synchronization workflow to improve chain sync stability and reliability.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 10, 2025

Walkthrough

The changes implement a fix for sync issues caused by time-traveling blocks by propagating a shared bad block cache through the chain synchronization pipeline. The cache is now passed through chain_follower into SyncTask execution and validate_tipset, enabling validation steps to track and reject previously failed blocks during tipset validation.

Changes

Cohort / File(s) Summary
Documentation
CHANGELOG.md
Added entry documenting fix for sync issue caused by time-traveling blocks (PR #6241)
Bad Block Cache Logging
src/chain_sync/bad_block_cache.rs
Added warning log statement when pushing CID to cache
Chain Synchronization Pipeline
src/chain_sync/chain_follower.rs, src/chain_sync/tipset_syncer.rs
Propagated bad_block_cache parameter through chain follower and task execution paths; updated validate_tipset to accept optional cache and push failed block CIDs (except time-travel errors) into cache; removed previous per-block iteration logic from mark_bad_tipset

Sequence Diagram

sequenceDiagram
    participant CF as chain_follower
    participant SSM as SyncStateMachine
    participant ST as SyncTask
    participant TV as validate_tipset
    participant BBC as BadBlockCache

    CF->>CF: Initialize with bad_block_cache
    CF->>SSM: Pass bad_block_cache.clone()
    
    SSM->>ST: Spawn SyncTask with bad_block_cache
    ST->>TV: execute() with bad_block_cache param
    
    TV->>TV: Validate tipset blocks
    alt Validation Success
        TV-->>ST: Ok(())
    else Validation Failure (non-time-travel)
        TV->>BBC: push(failed_block_cid)
        BBC-->>TV: Logged warning
        TV-->>ST: Err(TipsetSyncerError)
    else Time-Travel Error
        TV-->>ST: Err(TimeTravellingBlock)
    end
    
    ST-->>SSM: SyncEvent
    SSM-->>CF: State transition
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

  • Signature changes across multiple public APIs: chain_follower and SyncTask::execute both gain new parameters requiring verification of all call sites.
  • Parameter propagation complexity: bad_block_cache flows through multiple layers (chain_follower → SyncStateMachine → SyncTask → validate_tipset), requiring careful trace-through of each pass and clone operation.
  • Logic removal verification: Confirm that removing per-block iteration logic from mark_bad_tipset doesn't inadvertently lose important caching behavior in edge cases.
  • Error handling changes: Review conditional caching logic in validate_tipset to ensure only appropriate error types trigger cache population, especially distinguishing time-travel errors.

Possibly related PRs

  • ChainSafe/forest#5841: Replaces bad block cache implementation (introduces SizeTrackingLruCache and renames put→push), directly related to cache call sites being updated here.
  • ChainSafe/forest#5842: Modifies bad-block caching and mark_bad_tipset behavior for handling bad tipsets, overlaps with removal of per-block caching logic in this PR.
  • ChainSafe/forest#6165: Modifies chain_follower function signature and related chain synchronization APIs, involves similar parameter propagation concerns.

Suggested reviewers

  • LesnyRumcajs
  • akaladarshi
  • sudo-shashank

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix: do not mark time traveling blocks as bad' directly and specifically describes the main change in the changeset, which involves modifying validation logic to exclude time-traveling blocks from being cached as bad blocks.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch hm/do-not-mark-time-traveling-blocks-as-bad

Comment @coderabbitai help to get the list of available commands and usage tips.

@hanabi1224 hanabi1224 marked this pull request as ready for review November 10, 2025 15:38
@hanabi1224 hanabi1224 requested a review from a team as a code owner November 10, 2025 15:38
@hanabi1224 hanabi1224 requested review from LesnyRumcajs and sudo-shashank and removed request for a team November 10, 2025 15:38
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/chain_sync/bad_block_cache.rs (1)

32-35: Use structured fields for tracing and avoid string formatting.

Prefer structured, queryable logs and avoid allocs in the hot path.

Apply:

-        tracing::warn!("Marked bad block: {c}");
+        tracing::warn!(cid = %c, "Marked bad block");
CHANGELOG.md (1)

36-39: Clarify the fix and align wording.

Be explicit that future-timestamp blocks aren’t marked bad and are retried later.

[respond_minimally]

-- [#6241](https://github.com/ChainSafe/forest/pull/6241) Fixed a sync issue that is caused by time traveling block(s).
+- [#6241](https://github.com/ChainSafe/forest/pull/6241) Do not mark time‑travelling (future‑timestamp) blocks as bad; return a temporal error and retry later (Lotus parity).
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 154bf9c and 19eda8e.

📒 Files selected for processing (4)
  • CHANGELOG.md (1 hunks)
  • src/chain_sync/bad_block_cache.rs (1 hunks)
  • src/chain_sync/chain_follower.rs (4 hunks)
  • src/chain_sync/tipset_syncer.rs (3 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5944
File: src/chain/store/index.rs:0-0
Timestamp: 2025-08-18T03:09:47.932Z
Learning: In Forest's tipset_by_height caching implementation, hanabi1224 prefers performance-conscious solutions that leverage finality guarantees rather than expensive chain walking for fork detection. The approach of constraining cache lookups to finalized epochs (using CHECKPOINT_INTERVAL >= CHAIN_FINALITY) provides fork safety without the performance cost of ancestry verification.
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 6057
File: src/cli/subcommands/f3_cmd.rs:0-0
Timestamp: 2025-09-09T10:37:17.947Z
Learning: hanabi1224 prefers having default timeouts (like 10m for --no-progress-timeout) to prevent commands from hanging indefinitely, even when the timeout flag isn't explicitly provided by users. This fail-fast approach is preferred over requiring explicit flag usage.
📚 Learning: 2025-08-11T14:00:47.060Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5886
File: src/daemon/db_util.rs:236-259
Timestamp: 2025-08-11T14:00:47.060Z
Learning: In Forest's snapshot import (`src/daemon/db_util.rs`), when F3 CID is present in snapshot metadata but the actual F3 data is missing, this should cause a hard error as it indicates snapshot corruption. Only errors during the F3 import process itself (after data is successfully retrieved) should be non-fatal and logged.

Applied to files:

  • src/chain_sync/tipset_syncer.rs
📚 Learning: 2025-08-08T12:11:55.266Z
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5867
File: src/ipld/util.rs:461-487
Timestamp: 2025-08-08T12:11:55.266Z
Learning: Forest (src/ipld/util.rs, Rust): In UnorderedChainStream::poll_next, dropping `extract_sender` (when no more tipsets and the extract queue is empty) is the intended shutdown signal for workers. Any subsequent attempt to enqueue work after this drop is a logic error and should be treated as an error; do not change `send()` to ignore a missing sender.

Applied to files:

  • src/chain_sync/tipset_syncer.rs
🧬 Code graph analysis (1)
src/chain_sync/chain_follower.rs (1)
src/chain_sync/tipset_syncer.rs (1)
  • validate_tipset (100-147)
🪛 GitHub Actions: Link Checker
src/chain_sync/bad_block_cache.rs

[warning] 1-1: Lychee: Error creating request: InvalidPathToUri("/")

src/chain_sync/tipset_syncer.rs

[warning] 1-1: Lychee: Error creating request: InvalidPathToUri("/")

src/chain_sync/chain_follower.rs

[warning] 1-1: Lychee: Error creating request: InvalidPathToUri("/")

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Build MacOS
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build Ubuntu
  • GitHub Check: All lint checks
  • GitHub Check: tests-release
  • GitHub Check: tests
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: Check
🔇 Additional comments (2)
src/chain_sync/tipset_syncer.rs (1)

105-106: Don’t mark transient validation errors as permanently bad (exclude TipsetParentNotFound).

Current logic excludes only TimeTravellingBlock; TipsetParentNotFound is also transient (see Lines 196–205), but will be pushed to bad_block_cache, causing permanent rejection. Align with Lotus’ isPermanent(err) idea.

Apply within validate_tipset:

-                warn!("Validating block [CID = {cid}] in EPOCH = {epoch} failed: {why}");
-                match &why {
-                    TipsetSyncerError::TimeTravellingBlock(_, _) => {
-                        // Do not mark a block as bad for temporary errors.
-                        // See <https://github.com/filecoin-project/lotus/blob/v1.34.1/chain/sync.go#L602> in Lotus
-                    }
-                    _ => {
-                        if let Some(bad_block_cache) = bad_block_cache {
-                            bad_block_cache.push(cid);
-                        }
-                    }
-                };
+                warn!(epoch, %cid, error = %why, "block validation failed");
+                let is_temp = matches!(
+                    &why,
+                    TipsetSyncerError::TimeTravellingBlock(_, _)
+                        | TipsetSyncerError::TipsetParentNotFound(_)
+                );
+                if !is_temp {
+                    if let Some(ref bad_block_cache) = bad_block_cache {
+                        bad_block_cache.push(cid);
+                    }
+                }

Optionally extract the predicate:

fn is_temporary_sync_error(e: &TipsetSyncerError) -> bool {
    matches!(e,
        TipsetSyncerError::TimeTravellingBlock(_, _)
        | TipsetSyncerError::TipsetParentNotFound(_)
    )
}

Then use if !is_temporary_sync_error(&why) { … }.

Also applies to: 129-140

⛔ Skipped due to learnings
Learnt from: hanabi1224
Repo: ChainSafe/forest PR: 5886
File: src/daemon/db_util.rs:236-259
Timestamp: 2025-08-11T14:00:47.060Z
Learning: In Forest's snapshot import (`src/daemon/db_util.rs`), when F3 CID is present in snapshot metadata but the actual F3 data is missing, this should cause a hard error as it indicates snapshot corruption. Only errors during the F3 import process itself (after data is successfully retrieved) should be non-fatal and logged.
src/chain_sync/chain_follower.rs (1)

233-234: Ripgrep found the validate_tipset call site as expected, but cargo check failed due to environment limitations.

The rg search located the call to validate_tipset at line 851, which falls within the changed range (852–859). However, the sandbox environment lacks a C compiler, preventing cargo check from completing. The structural changes appear correct, but manual verification in your local environment is required to confirm compilation:

cargo check -q

Also applies to: 260-261, 829-835, 852-859

@hanabi1224 hanabi1224 enabled auto-merge November 10, 2025 16:15
@hanabi1224 hanabi1224 added this pull request to the merge queue Nov 10, 2025
Merged via the queue into main with commit 21cc6c6 Nov 10, 2025
51 checks passed
@hanabi1224 hanabi1224 deleted the hm/do-not-mark-time-traveling-blocks-as-bad branch November 10, 2025 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants