Remove RR Sync: Testing #17694
Description
Validate the assumption that ELP2P works as expected when performing EL Sync to fill in the unsafe gap.
Run a real-world/sysgo testing scenario that syncs using the op-node.
List of Tracked Tests
- Test sync.CLSync behavior with unsafe payload queue: op-acceptance-tests: Cache and canonicalize L2EL payloads after gap fill #17675: TestSyncAfterInitialELSync @pcw109550
- Test L2EL behavior while EL Syncing, using direct engine / eth API calls to the EL. Insight: We only need to rely on FCU, not newPayload, because EL Sync only triggers with FCU.
  - Test op-geth: op-acceptance-tests: ELP2P for EL Syncing for unsafe gap #17752: TestL2ELP2PCanonicalChainAdvancedByFCU @pcw109550
  - Test op-reth (not easily mergeable to dev because we need the reth binary): [PoC] op-acceptance-tests: ELP2P for EL Syncing for unsafe gap for reth #17802: TestL2ELP2PCanonicalChainAdvancedByFCU, but tweaked @pcw109550
    - Merge op-reth tests, or run at op-rs/kona
- Test L2EL behavior when FCUed with invalid hash. Expected return: SYNCING. op-acceptance-tests: Edge cases with engine APIs #18001 TestELP2PFCUUnavailableHash
- Test that, during FCU, the safe head cannot be advanced when the unsafe head hash cannot be validated. op-acceptance-tests: Edge cases with engine APIs #18001 TestSafeDoesNotAdvanceWhenUnsafeIsSyncing_NoELP2P
- Test multiple scenarios when the payload is INVALID (newPayload return value is INVALID). op-acceptance-tests: Edge cases with engine APIs #18001 TestInvalidPayloadThroughCLP2P (only works in geth; reth has a different newPayload impl)
- The test demonstrates that the op-node does not rewind when an INVALID payload is detected while EL Syncing (e.g. VALID -> SYNCING -> ... -> SYNCING -> INVALID). It checks that rewinding is not yet implemented in op-node. op-acceptance-tests: Edge cases with engine APIs #18001 TestCLUnsafeNotRewoundOnInvalidDuringELSync
- Use op-node to check that further sync (with initial sync on/off) completes and the unsafe head reaches the sequencer tip. This must be validated using a real chain, possibly using the sync tester.
  - Verify that if an unsafe chain gap emerges due to network issues, a node running in CLSync or ELSync mode will fill the gap and continue to advance its unsafe chain, without help from the L1 derivation pipeline. op-node: update OnUnsafeL2Payload to perform ELSync gap filling #17751: TestUnsafeChainStalling_{CLSync|ELSync} @nonsense
  - Same as the case above, but with actual stopping/starting of the op-node and not just a network issue. Verifies that if an unsafe chain gap emerges, a node running in CLSync or ELSync mode will fill the gap after booting up: op-node: update OnUnsafeL2Payload to perform ELSync gap filling #17751: TestUnsafeChainStalling_{CLSync|ELSync}_RestartOpNode_Long @nonsense
  - Show that the unsafe chain does not stall if RR sync is disabled, which is the case on current develop (i.e. we can't just switch RR sync on develop as is, without losing functionality) op-node: update OnUnsafeL2Payload to perform ELSync gap filling #17751: TestUnsafeChainStalling_DisabledReqRespSync @nonsense
  - Validate the behavior above using a real chain @nonsense
- ELP2P down, but chain still advancing since the unsafe payloads build on top of the unsafe head: will be implemented on top of tests after we fill in the unsafe gap. This scenario occurs when the verifier eventually reaches the unsafe head tip @pcw109550
  - op-acceptance-tests: ELP2P down but payload appendable till chain tip #17895 TestReachUnsafeTipByAppendingUnsafePayload
- Reorg cases, where a safe head reorg happened. The FCU result will be INVALID. Test Reset behavior. This may be mocked using the sync tester, since it is harder to test. When the safe head reorgs, the FCU call will (eventually) return INVALID, because that unsafe payload will not build on top of the safe head. Use sync-tester or test-sequencer. Ref: op-node: Support multiple ELSync runs #17627 (comment) @pcw109550
  - op-acceptance-tests: Reorg then gap filling tests #17893 TestUnsafeGapFillAfterSafeReorg, TestUnsafeGapFillAfterUnsafeReorg_RestartL2CL, TestUnsafeGapFillAfterUnsafeReorg_RestartCLP2P
- Cover op-reth syncing, all scenarios above may pass using reth @pcw109550 at op-node: update OnUnsafeL2Payload to perform ELSync gap filling #17751, branch nonsense/deprecate-req-res-use-elsync, commit 79efd35341c2e55685a1c078e3a00860e7a7b12d. TestL2ELP2PCanonicalChainAdvancedByFCU not tested because it does not use op-node.
  - TestUnsafeGapFillAfterSafeReorg
  - TestUnsafeGapFillAfterUnsafeReorg_RestartL2CL
  - TestUnsafeGapFillAfterUnsafeReorg_RestartCLP2P
  - TestReachUnsafeTipByAppendingUnsafePayload
  - TestSyncAfterInitialELSync
  - TestUnsafeChainStalling_CLSync_Short
  - TestUnsafeChainStalling_CLSync_Long
  - TestUnsafeChainStalling_CLSync_RestartOpNode_Long
  - TestUnsafeChainStalling_ELSync_Short
  - TestUnsafeChainStalling_ELSync_Long
  - TestUnsafeChainStalling_ELSync_RestartOpNode_Long
  - TestUnsafeChainStalling_DisabledReqRespSync
  - reth version:
    reth-optimism-cli Version: 1.8.2
    Commit SHA: fe10c0785241a2ab92ee80c1e68629835d822770
Discussions
We have multiple combinations:
- syncmode: sync.CLSync / sync.ELSync
- RR Sync enabled / disabled
- EL implementation: geth / reth
Note that we assume EL P2P connectivity is stable, meaning the syncing EL is connected to an EL that is fully synced.
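The three dimensions listed above multiply out to eight configurations. A minimal Go sketch that enumerates them, for example to drive table-driven subtests, could look like this. The `Combo` type, its field names, and the `Name` format are hypothetical and exist only for this illustration; they are not part of op-acceptance-tests.

```go
package main

import "fmt"

// Combo is a hypothetical description of one test configuration.
type Combo struct {
	SyncMode string // "sync.CLSync" or "sync.ELSync"
	RRSync   bool   // whether req-resp (RR) sync is enabled
	EL       string // "geth" or "reth"
}

// Name returns a label that could be used as a subtest name.
func (c Combo) Name() string {
	return fmt.Sprintf("%s/rr=%t/%s", c.SyncMode, c.RRSync, c.EL)
}

// Combos returns the cartesian product of the three dimensions.
func Combos() []Combo {
	var out []Combo
	for _, mode := range []string{"sync.CLSync", "sync.ELSync"} {
		for _, rr := range []bool{true, false} {
			for _, el := range []string{"geth", "reth"} {
				out = append(out, Combo{SyncMode: mode, RRSync: rr, EL: el})
			}
		}
	}
	return out
}
```

Each returned `Combo` could be passed to `t.Run(c.Name(), ...)` so that a failure names the exact configuration that broke.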
Validate the EL side payload caching behavior, and check that if the gaps are filled, the latest payload can be appended to the unsafe chain and become canonicalized, after the initial EL sync run.
We should benchmark how long it takes to fill in the unsafe gap on a real chain. We may query the EL directly with eth_getBlockByNumber("latest") to check that the EL actually reached the unsafe head tip, rather than relying on the optimism_syncStatus result from the op-node.
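As a rough illustration of querying the EL directly, the following Go sketch sends a standard `eth_getBlockByNumber` JSON-RPC request with the `latest` tag and decodes the hex block number. The helper names `LatestBlockNumber` and `parseBlockNumber` are made up for this sketch, and the endpoint URL is whatever the EL under test exposes.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strconv"
	"strings"
)

// rpcResponse holds only the fields this sketch reads from the JSON-RPC reply.
type rpcResponse struct {
	Result *struct {
		Number string `json:"number"` // hex quantity, e.g. "0x1b4"
	} `json:"result"`
	Error *struct {
		Message string `json:"message"`
	} `json:"error"`
}

// parseBlockNumber extracts the block number from a raw eth_getBlockByNumber reply.
func parseBlockNumber(body []byte) (uint64, error) {
	var resp rpcResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, fmt.Errorf("decode response: %w", err)
	}
	if resp.Error != nil {
		return 0, fmt.Errorf("rpc error: %s", resp.Error.Message)
	}
	if resp.Result == nil {
		return 0, fmt.Errorf("no block returned")
	}
	return strconv.ParseUint(strings.TrimPrefix(resp.Result.Number, "0x"), 16, 64)
}

// LatestBlockNumber asks the EL at url for its latest block and returns the number.
func LatestBlockNumber(ctx context.Context, url string) (uint64, error) {
	reqBody := []byte(`{"jsonrpc":"2.0","id":1,"method":"eth_getBlockByNumber","params":["latest",false]}`)
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(reqBody))
	if err != nil {
		return 0, err
	}
	req.Header.Set("Content-Type", "application/json")
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer res.Body.Close()
	body, err := io.ReadAll(res.Body)
	if err != nil {
		return 0, err
	}
	return parseBlockNumber(body)
}
```

Calling `LatestBlockNumber` against both the syncing EL and the fully synced EL gives two numbers that can be compared without going through the op-node.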
More concretely, the testing scenario will be:
- Prepare an op-node and EL (with stable ELP2P) that are fully synced, reaching the unsafe tip
- Shut the op-node down.
- Wait 5 minutes so that the EL (with stable ELP2P) does not advance, intentionally creating the unsafe gap
- Start the op-node with
  - RR Sync disabled (via flag)
  - Patched to do the EL Sync to fill in the gap
  - Connected via CLP2P and receiving unsafe payloads from the sequencer
- Measure the time until the EL (with stable ELP2P) reaches the tip.
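The final measurement step could be sketched as a polling loop in Go. This is only an illustration under stated assumptions: `HeadFunc`, `TimeToReachTip`, and `ErrTimeout` are hypothetical names, and each `HeadFunc` is assumed to return a node's latest block number (for instance by querying the EL over JSON-RPC). The tip is re-read on every poll because the sequencer keeps producing blocks while the gap is being filled.

```go
package main

import (
	"errors"
	"time"
)

// HeadFunc returns a node's current latest block number (hypothetical hook).
type HeadFunc func() (uint64, error)

// ErrTimeout is returned when the verifier never catches up in time.
var ErrTimeout = errors.New("verifier did not reach the tip before the timeout")

// TimeToReachTip polls both nodes until the verifier head is at or past the
// reference tip, and returns how long that took.
func TimeToReachTip(verifier, reference HeadFunc, interval, timeout time.Duration) (time.Duration, error) {
	start := time.Now()
	for {
		tip, err := reference()
		if err != nil {
			return 0, err
		}
		head, err := verifier()
		if err != nil {
			return 0, err
		}
		if head >= tip {
			return time.Since(start), nil
		}
		if time.Since(start) >= timeout {
			return 0, ErrTimeout
		}
		time.Sleep(interval)
	}
}
```

In the scenario above, the timer would start right after the op-node is restarted, with `verifier` pointing at the EL behind the restarted op-node and `reference` at a fully synced EL.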