Move batch authentication to derivation pipeline, remove BatchInbox#357
Conversation
Code Review
This pull request refactors the batch authentication mechanism by removing the BatchInbox smart contract and shifting its validation logic off-chain to the derivation pipeline. The new approach relies on the BatchAuthenticator contract emitting BatchInfoAuthenticated events, which the op-node then scans for within a defined lookback window to authorize batches. The changes introduce new Go functions to compute batch hashes and collect these authentication events from L1 receipts, updating calldata_source and blob_data_source to use this event-based authentication. Review comments highlight a potential performance bottleneck and reorg-safety concern in the CollectAuthenticatedBatches implementation, suggesting a more efficient eth_getLogs approach, and raise a liveness risk due to the fixed BatchAuthLookbackWindow potentially being too short during periods of high L1 congestion.
```go
for blockNum := startBlock; blockNum <= ref.Number; blockNum++ {
	blockRef, err := fetcher.L1BlockRefByNumber(ctx, blockNum)
	if err != nil {
		return nil, NewTemporaryError(fmt.Errorf("batch auth: failed to fetch L1 block ref %d: %w", blockNum, err))
	}
	_, receipts, err := fetcher.FetchReceipts(ctx, blockRef.Hash)
	if err != nil {
		return nil, NewTemporaryError(fmt.Errorf("batch auth: failed to fetch receipts for block %d: %w", blockNum, err))
	}
	for h := range collectAuthEventsFromReceipts(receipts, authenticatorAddr) {
		allAuthenticated[h] = true
	}
}
```
This loop introduces a severe performance bottleneck. For every L1 block processed, it performs up to 100 iterations, each making two RPC calls (L1BlockRefByNumber and FetchReceipts). This results in ~200 RPC calls per L1 block, which will likely cause the node to fall behind during synchronization. Furthermore, L1BlockRefByNumber is not reorg-safe as it retrieves the block currently at that height, which may not belong to the canonical chain being derived. It is highly recommended to use eth_getLogs with a block range to retrieve all relevant events in a single, efficient, and reorg-safe RPC call.
The use of L1BlockRefByNumber seems fine to me.
Re: the iterations, can we cache the allAuthenticated set to avoid iterating 100 times for each consecutive block?
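The reorg-safety question in this thread can be illustrated with a toy model. The sketch below is illustrative only (simplified stand-ins, not the real op-node types): a by-number lookup can return a stale fork block mid-reorg, while a traversal keyed by parent hash can only ever visit ancestors of the ref block.

```go
package main

import "fmt"

// blockRef is a toy stand-in for an L1 block reference.
type blockRef struct {
	Number     uint64
	Hash       string
	ParentHash string
}

// testChain models canonical chain a0 <- a1 <- a2, plus a stale fork
// block b1 at height 1 that a by-number lookup might return mid-reorg.
func testChain() (map[string]blockRef, map[uint64]blockRef) {
	byHash := map[string]blockRef{
		"a0": {0, "a0", ""},
		"a1": {1, "a1", "a0"},
		"a2": {2, "a2", "a1"},
		"b1": {1, "b1", "a0"},
	}
	// The node is still serving the fork block at height 1.
	byNumber := map[uint64]blockRef{0: byHash["a0"], 1: byHash["b1"], 2: byHash["a2"]}
	return byHash, byNumber
}

// canonicalAncestors walks parent hashes from refHash; every block it
// visits is an ancestor of the ref block by construction.
func canonicalAncestors(byHash map[string]blockRef, refHash string) []string {
	var out []string
	for h := refHash; h != ""; h = byHash[h].ParentHash {
		out = append(out, h)
	}
	return out
}

func main() {
	byHash, byNumber := testChain()
	fmt.Println(byNumber[1].Hash)                 // "b1": by-number can observe the fork
	fmt.Println(canonicalAncestors(byHash, "a2")) // [a2 a1 a0]: parent-hash traversal stays on one chain
}
```

As the discussion notes, even by-number lookups are only transiently inconsistent, since the derivation pipeline resets on reorgs.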
Force-pushed 7c2d257 to 6051341
op-node/rollup/derive/data_source.go (Outdated)
```go
// Legacy mode: verify sender
return isAuthorizedBatchSender(tx, dsCfg.l1Signer, batcherAddr, logger)
```
Can we require authenticatedHashes to be not nil for the TEE batcher, and only check isAuthorizedBatchSender for the fallback batcher, to avoid the TEE batcher not providing the hash?
```go
//
// At ~12s per L1 block, 100 blocks ≈ 20 minutes. This gives the batcher ample time
// to land the batch data transaction on L1 after the authentication transaction,
// even under moderate L1 congestion or batcher restarts. The window is intentionally
```
What does moderate mean in that case? What could happen in the case of a re-org for example?
What does moderate mean in that case?
Nothing in particular, this comment was co-written by Claude so it's a bit pretentious in wording. I'll re-write to make it sound less like I made a formal study of L1 congestion before landing on the 100 value for the lookback window.
What could happen in the case of a re-org for example?
If we're traversing the lookback window mid-reorg, we could run into inconsistent state, but the derivation pipeline is reset on reorgs, so the issue would be transient.
```go
	}
}

if len(allAuthenticated) > 0 {
```
Nit: returning 0 authenticated batches can be a debug log.
Force-pushed 7c2e7cb to 3765a04
Yeah, that's a good idea. Added an LRU cache.
Force-pushed 834dc67 to 2a3142d
… contract

Move batch authentication from on-chain validation (BatchInbox contract calling BatchAuthenticator.validateBatch) to off-chain verification in the derivation pipeline via BatchInfoAuthenticated event scanning.

Key changes:
- Add batch_authenticator.go: event-based auth using CollectAuthenticatedBatches, which scans a lookback window once per L1 block and returns authenticated hashes
- Add isBatchTxAuthorized helper shared by calldata and blob data sources
- Remove BatchInbox contract, interface, tests, bindings, and snapshots entirely (Espresso-introduced, never in Celo fork)
- Remove validateBatch/validateTeeBatch/validateNonTeeBatch from BatchAuthenticator (authenticateBatchInfo and signer management remain)
- Regenerate BatchAuthenticator Go bindings
- Revert BatchInbox-related fields from L1Deployments, ChainState, DeployEspressoOutput, and calculateBatchInboxAddr back to upstream signatures (introduce+remove no-ops)
- Update integration/devnet tests for EOA BatchInbox behavior

Co-authored-by: OpenCode <noreply@opencode.ai>
… auth

When batch authentication is enabled, the TEE batcher requires a matching BatchInfoAuthenticated event on L1. The fallback (non-TEE) batcher does not call authenticateBatchInfo, so its transactions had no auth events and were rejected by the derivation pipeline.

Add FallbackBatcherAddress to rollup.Config and the derivation pipeline's DataSourceConfig. When event-based auth finds no matching event for a batch, it falls back to sender verification against the fallback batcher address. This allows the non-TEE batcher to post batches without on-chain auth events while still requiring event-based auth for the TEE batcher.

Co-authored-by: OpenCode <noreply@opencode.ai>
The validBatchInfo mapping was written to on every authenticateBatchInfo call but no longer read by the derivation pipeline, which now uses BatchInfoAuthenticated events instead. Removing it saves ~20k gas per call.

- Remove mapping declaration and write from BatchAuthenticator.sol
- Remove from IBatchAuthenticator interface
- Remove test assertions on validBatchInfo
- Update SECURITY_ANALYSIS.md to reflect event-based auth model
- Bump contract version to 1.1.0
- Regenerate ABI/storage snapshots and Go bindings

Co-authored-by: OpenCode <noreply@opencode.ai>
Force-pushed 2a3142d to 9451342
```go
if dsCfg.BatchAuthEnabled() {
	// Event-based authentication: TEE batcher must have an auth event
	if authenticatedHashes[batchHash] {
		return true
	}
	// Fallback batcher: accept via sender verification
	if dsCfg.fallbackBatcherAddress != (common.Address{}) {
		if isAuthorizedBatchSender(tx, dsCfg.l1Signer, dsCfg.fallbackBatcherAddress, logger) {
			return true
		}
	}
	logger.Warn("batch not authenticated via event or fallback sender",
		"txHash", tx.Hash(), "batchHash", batchHash)
	return false
```
This looks like it's removing the need for a switch batcher step? If so can the switching functionality be removed from the contract?
Batchers may post simultaneously, it won't break anything; but they still shouldn't, so batchers poll it here: https://github.com/EspressoSystems/optimism-espresso-integration/pull/357/changes#diff-c734d1296b2fd691221b92df3edf09c7533c507a74c2316117745c75c3ad5776R814
```go
CaffNodeConfig espresso.CLIConfig `json:"caff_node_config,omitempty"`

BatchAuthenticatorAddress common.Address `json:"batch_authenticator_address,omitempty,omitzero"`
FallbackBatcherAddress    common.Address `json:"fallback_batcher_address,omitempty,omitzero"`
```
Could we drop this address and just use the existing batcher address? I believe this would also simplify the isBatchTxAuthorized implementation since it would avoid a second callsite for isAuthorizedBatchSender
Could we drop this address and just use the existing batcher address
Certainly, but it would be an extensive change, since all the wiring outside of the derivation pipeline assumes the existing batcher address is the TEE batcher. I'm not opposed to making that change in the future, but would prefer to leave it out of scope of this PR.
cc @philippecamacho @shenkeyao what do you think?
Hey @QuentinI, could you explain a bit what and where this external wiring is? At least from within the derivation pipeline there doesn't seem to be any assumption that the existing batcher address is the TEE batcher.
Sorry, forgot to answer your question. Here's a PR based on this branch that does just that: #370
```go
batchAuthCacheOnce.Do(func() {
	// Size slightly larger than the lookback window to avoid thrashing
	// at the boundary. lru.New only errors on size <= 0.
	batchAuthCache, _ = lru.New[common.Hash, map[common.Hash]bool](int(BatchAuthLookbackWindow) + 16)
```
I think + 2 would be sufficient here instead of + 16. The lookback window covers 101 blocks (the ref block plus 100 ancestors), so 101 slots seems like the natural size. But the traversal reads from newest to oldest, which means the ref block is touched first and becomes the LRU entry. With exactly 101 slots, inserting the next ref block evicts the previous ref block (its parent), which we're about to read — causing a cascade of evict-and-refetch through the entire window.
With 102 slots, the cache has room for the 101 new window entries plus one stale entry (the block that just fell out of the lookback window). That stale entry is the LRU since it wasn't touched in the current traversal, so it's the one that gets evicted — no cascade.
That is fair, I was being conservative. I'll push it down to +2.
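The sizing argument in this thread can be checked with a small simulation. The sketch below uses a minimal hand-rolled LRU (standing in for the hashicorp/golang-lru cache used in the real code; `access` and `missesPerRefBlock` are hypothetical names) and traverses sliding lookback windows newest-to-oldest. With capacity equal to the window, every access misses in steady state (the eviction cascade described above); with one extra slot, only the new ref block misses.

```go
package main

import (
	"container/list"
	"fmt"
)

// lruCache is a minimal LRU over int keys (block numbers).
type lruCache struct {
	cap   int
	ll    *list.List // front = most recently used
	items map[int]*list.Element
}

func newLRU(capacity int) *lruCache {
	return &lruCache{cap: capacity, ll: list.New(), items: map[int]*list.Element{}}
}

// access touches key, inserting it on a miss (evicting the LRU entry
// when full). It reports whether the access was a miss.
func (c *lruCache) access(key int) bool {
	if el, ok := c.items[key]; ok {
		c.ll.MoveToFront(el)
		return false
	}
	if c.ll.Len() >= c.cap {
		back := c.ll.Back()
		c.ll.Remove(back)
		delete(c.items, back.Value.(int))
	}
	c.items[key] = c.ll.PushFront(key)
	return true
}

// missesPerRefBlock simulates the lookback traversal: for each new ref
// block, the window [ref-window+1, ref] is visited newest-to-oldest.
// It returns the miss count for the final (steady-state) ref block.
func missesPerRefBlock(capacity, window, refBlocks int) int {
	c := newLRU(capacity)
	misses := 0
	for ref := window; ref < window+refBlocks; ref++ {
		misses = 0
		for b := ref; b > ref-window; b-- { // newest to oldest
			if c.access(b) {
				misses++
			}
		}
	}
	return misses
}

func main() {
	const window = 101 // ref block plus 100 ancestors
	fmt.Println(missesPerRefBlock(window, window, 10))   // 101: full cascade every block
	fmt.Println(missesPerRefBlock(window+1, window, 10)) // 1: only the new ref block misses
}
```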
The fallback batcher (op-batcher-fallback) was using Anvil account #3 (0x90F79bf6EB2c4f870365E785982E1f101E93b906), which is also the Deployer key used by devnet tests for admin transactions (SwitchBatcher, TransferOwnership, etc.). When both the fallback batcher service and test code sign L1 transactions from the same account concurrently, nonce collisions cause transactions to never be mined, leading to timeouts. Switch the fallback batcher to Anvil account #6 (0x976EA74026E726554dB657fA54763abd0C3a0aa9) across docker-compose, prepare-allocs (nonTeeBatcher in rollup config), and .env. Co-authored-by: OpenCode <noreply@opencode.ai>
Query BatchAuthenticator.activeIsTee() before publishing to L1. The inactive batcher (TEE or fallback) now skips publishStateToL1 instead of posting transactions that would be ignored by the derivation pipeline. This was previously unnecessary because the BatchInbox contract's validateBatch() would revert inactive batcher transactions during eth_estimateGas. With the move to an EOA batch inbox, both batchers can freely post, so an explicit idle check is needed. Co-authored-by: OpenCode <noreply@opencode.ai>
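The idle check described in this commit can be sketched as follows. This is a hedged outline, not the real op-batcher code: the real driver queries activeIsTee() through the generated BatchAuthenticator binding, while here a stub interface stands in for the contract call, and shouldPublish is a hypothetical helper name.

```go
package main

import "fmt"

// batchAuthenticator abstracts the one contract read the idle check
// needs. The real code calls the generated binding's ActiveIsTee.
type batchAuthenticator interface {
	ActiveIsTee() (bool, error)
}

// stubAuth is a test double for the on-chain contract.
type stubAuth struct{ teeActive bool }

func (s stubAuth) ActiveIsTee() (bool, error) { return s.teeActive, nil }

// shouldPublish reports whether this batcher instance should run
// publishStateToL1: the TEE batcher publishes only while the TEE role
// is active on-chain, the fallback batcher only while it is not.
func shouldPublish(auth batchAuthenticator, isTEEBatcher bool) (bool, error) {
	teeActive, err := auth.ActiveIsTee()
	if err != nil {
		return false, err
	}
	return teeActive == isTEEBatcher, nil
}

func main() {
	auth := stubAuth{teeActive: true}
	teeOK, _ := shouldPublish(auth, true)
	fallbackOK, _ := shouldPublish(auth, false)
	fmt.Println(teeOK, fallbackOK) // true false
}
```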
…tes32,address)

PR #363 removed the 'address indexed signer' parameter from the BatchInfoAuthenticated event, but the Go derivation pipeline still used the old 2-parameter signature. The keccak256 hash of the wrong signature never matched actual log topics, causing CollectAuthenticatedBatches to find zero events and reject every batch.

- Fix ABI string in batch_authenticator.go
- Update test mocks to use 2-topic logs matching the actual event
- Regenerate Go bindings from fresh forge artifacts

Co-authored-by: OpenCode <noreply@opencode.ai>
CollectAuthenticatedBatches walks backwards ~100 L1 blocks per call, making an L1BlockRefByHash RPC call for each step. Since consecutive L1 blocks share ~99 blocks in their lookback windows, these calls are almost entirely redundant after the first traversal. Add a second global LRU cache (blockRefCache) mapping block hash to L1BlockRef, alongside the existing receipt cache. On steady state this reduces L1BlockRefByHash RPC calls from ~100 per L1 block to ~0-1, a ~50x reduction in total RPC overhead for the batch auth path. Co-authored-by: OpenCode <noreply@opencode.ai>
…vePublishOnly

In-flight sendTxWithEspresso goroutines spawned before deactivation can take ~25s to drain their queued Txmgr.Send calls. The previous 10s delay was insufficient, causing stale TEE batcher transactions to appear in the post-switch monitoring window and failing the assertion.

Co-authored-by: OpenCode <noreply@opencode.ai>
```go
// resetBatchAuthCaches resets both global caches (receipt and block ref).
// This is only intended for use in tests to ensure isolation between test cases.
func resetBatchAuthCaches() {
	batchAuthCache = nil
	batchAuthCacheOnce = sync.Once{}
	blockRefCache = nil
	blockRefCacheOnce = sync.Once{}
}
```
I'm wondering why you went for a global variable, as opposed to just passing an instance down to where it is needed?
This is to reduce diff with upstream by keeping the changes in a separate file.
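The pattern under discussion, package-level caches guarded by sync.Once with a test-only reset that re-arms the Once, can be sketched in miniature (simplified, hypothetical names; the real caches are LRU caches, not plain maps):

```go
package main

import (
	"fmt"
	"sync"
)

// Package-level cache lazily initialized exactly once.
var (
	cache     map[string]bool
	cacheOnce sync.Once
)

// getCache initializes the cache on first use and returns it.
func getCache() map[string]bool {
	cacheOnce.Do(func() {
		cache = make(map[string]bool)
	})
	return cache
}

// resetCache re-arms the sync.Once so each test starts from a clean
// state. Replacing the Once value is safe here because tests run this
// between cases, not concurrently with getCache.
func resetCache() {
	cache = nil
	cacheOnce = sync.Once{}
}

func main() {
	getCache()["a"] = true
	fmt.Println(len(getCache())) // 1
	resetCache()
	fmt.Println(len(getCache())) // 0: fresh cache after reset
}
```

The trade-off noted in the reply: globals keep the change confined to one new file (small diff against upstream), at the cost of needing this explicit reset hook for test isolation.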
```go
// Keyed by block hash so it is naturally reorg-safe: after a reorg the
// parent-hash traversal follows a different chain and stale entries are
// never hit. Thread-safe via lru.Cache's internal mutex.
batchAuthCache *lru.Cache[common.Hash, map[common.Hash]bool]
```
What about not verifying the tee signature on chain and instead having the event emitted by the batch authenticator contract include the batch hash and TEE signature and then verifying those in the derivation pipeline?
This would be more efficient, but less generic, and it would require an update to the derivation pipeline - and thus a hardfork - if we ever want to change how the verification is done, e.g., if we transition to ZK-proofs instead of TEE attestations.
@piersy @shenkeyao @jjeangal
shenkeyao left a comment:
LGTM. Two devnet test CIs are failing--maybe rerun them?
That's expected, it needs this first: EspressoSystems/kona-celo-fork#15
Force-pushed 37b0281 to cff0be6
Merged 0983f1c into celo-integration-rebase-14.2
…357)

Squashed commits:

* Move batch authentication into derivation pipeline, remove BatchInbox contract
* Add FallbackBatcherAddress to derivation pipeline for non-TEE batcher auth
* Remove validBatchInfo mapping from BatchAuthenticator
* Comments
* Fix txmgr wrapper
* Fix nonce collision: use separate account for fallback batcher
* Add batcher-side idle check: inactive batcher skips publishing
* Fix event signature mismatch: BatchInfoAuthenticated(bytes32) not (bytes32,address)
* Cache L1BlockRef lookups in batch auth lookback window traversal
* Increase batcher switch stabilization delay to 60s in TestBatcherActivePublishOnly
* Reduce lookback window
* Pin eth2-val-tools
* Update succinct versions

Co-authored-by: OpenCode <noreply@opencode.ai>

# Conflicts:
# espresso/docker-compose.yml
# op-batcher/batcher/driver.go
# op-e2e/system/e2esys/setup.go
This PR:

- Moves batch authentication from on-chain to the derivation pipeline. The derivation pipeline now scans L1 receipts for `BatchInfoAuthenticated` events in a 100-block lookback window, compatible with `op-program` fault proofs.
- Removes the `BatchInbox` contract entirely.
- Removes the `validBatchInfo` mapping from `BatchAuthenticator`. The contract no longer needs to store batch info for on-chain validation — it only needs to emit the `BatchInfoAuthenticated` event.
- Adds `FallbackBatcherAddress` for dual-mode auth. The TEE batcher must have a matching `BatchInfoAuthenticated` event. The fallback (non-TEE) batcher doesn't call `authenticateBatchInfo`, so it's authorized via sender verification against a configured `FallbackBatcherAddress`. This fixes the derivation pipeline rejecting all fallback batcher transactions.

This PR does not:

- Add `fallback_batcher_address` to the rollup config JSON — tracked separately. The `succinct-proposer` Rust deserializer rejects this unknown field, causing devnet test failures.

Key places to review:

- `op-node/rollup/derive/batch_authenticator.go` — new file: event scanning logic (`CollectAuthenticatedBatches`, `ComputeCalldataBatchHash`, `ComputeBlobBatchHash`)
- `op-node/rollup/derive/data_source.go` — `isBatchTxAuthorized()` dual-mode auth (event-based for TEE, sender-based for fallback)
- `packages/contracts-bedrock/src/L1/BatchAuthenticator.sol` — storage removal (`validBatchInfo` mapping removed)