Conversation
6cb90c6 to
c266ef9
Compare
c266ef9 to
968fb96
Compare
ab6a7fe to
e670cb3
Compare
b6ed730 to
58f2418
Compare
| /// The source must be an OP Stack L2 CL RPC exposing optimism_syncStatus. | ||
| #[arg(long, visible_alias = "l2.follow.source", env = "KONA_NODE_L2_FOLLOW_SOURCE")] | ||
| pub l2_follow_source: Option<Url>, |
There was a problem hiding this comment.
Can we make this any clearer to prevent misconfiguration?
Maybe something like this?
| /// The source must be an OP Stack L2 CL RPC exposing optimism_syncStatus. | |
| #[arg(long, visible_alias = "l2.follow.source", env = "KONA_NODE_L2_FOLLOW_SOURCE")] | |
| pub l2_follow_source: Option<Url>, | |
| /// If configured, we will not run derivation, instead trusting the provided delegate to do so. The provided url must be an OP Stack L2 CL RPC exposing optimism_syncStatus. | |
| #[arg(long, visible_alias = "l2.derivation.delegate.rpc.url", env = "KONA_NODE_L2_DERIVATION_DELEGATE_RPC_URL")] | |
| pub l2_derivation_delegate_rpc_url: Option<Url>, |
There was a problem hiding this comment.
Agreed, would even love to add the URL to the design doc if possible
There was a problem hiding this comment.
The reason I chose the flag name l2.follow.source is to leave room for a small namespace around how the CL follows an external L2 source, e.g.:
l2.follow.sourcel2.follow.verifyl2.follow.poll-interval
This mirrors the existing naming in op-node, which already uses the l2.follow.* prefix, so I was trying to stay consistent with established conventions.
Conceptually, the CL is "following" an external L2 derivation source, which is why follow felt like the right abstraction to me. That said, I understand the concern about misconfiguration and ambiguity, especially since this is an RPC URL and not just an abstract source.
Open to tightening the name if we think the explicitness is worth the verbosity, and I agree that adding the concrete URL to the design doc would help reduce confusion regardless of the final flag name.
|
|
||
| /// Optional derivation delegation client. | ||
| #[clap(flatten)] | ||
| pub derivation_delegate_args: DerivationDelegateArgs, |
There was a problem hiding this comment.
Nit: we could remove the option inside DerivationDelegateArgs and instead just wrap the whole thing inside an Option so that it is really optional to use the derivation delegation client:
| pub derivation_delegate_args: DerivationDelegateArgs, | |
| pub derivation_delegate_args: Option<DerivationDelegateArgs>, |
There was a problem hiding this comment.
Can we rely on the optionality of DerivationDelegateConfig instead of DerivationDelegateArgs?
/// L2 derivation delegate connection arguments.
#[derive(Clone, Debug, Default, clap::Args)]
pub struct DerivationDelegateArgs {
/// The source must be an OP Stack L2 CL RPC exposing optimism_syncStatus.
#[arg(long, visible_alias = "l2.follow.source", env = "KONA_NODE_L2_FOLLOW_SOURCE")]
pub l2_follow_source: Option<Url>,
}
impl DerivationDelegateArgs {
/// Builds the derivation delegate configuration if an L2 CL URL was provided.
pub fn config(self) -> Option<DerivationDelegateConfig> {
self.l2_follow_source.map(|url| DerivationDelegateConfig { l2_cl_url: url })
}
}I followed the existing pattern for the L2ClientArgs.
theochap
left a comment
There was a problem hiding this comment.
General question: how hard/cumbersome would it be to make the delegate derivation actor be a separate implementation of some derivation actor trait? Depending on the CLI inputs we could decide to either spawn a normal or a "delegate derivation actor". There's probably some plumbing to do at the RollupService level, but it would be really helpful down the line. The normal and the light derivation actors actually share very little logic so this would make sense to have them implement the same trait.
If this is too long/hard to do in this PR, I am happy to track this as a follow-up work. It would be really helpful though to get it done sooner than later (and also a great testimony to how flexible kona's architecture can be!)
| { | ||
| /// Hardcoded poll interval for Derivation Delegation | ||
| const DERIVATION_DELEGATE_POLL_INTERVAL: std::time::Duration = | ||
| std::time::Duration::from_secs(4); |
There was a problem hiding this comment.
We probably said it somewhere, but would it make sense to use websockets instead? Is that on the roadmap as a future improvement? Hardcoded polls are quite brittle in my experience, specially if we want to guarantee a similar level of responsiveness we had before the light CL introduction
There was a problem hiding this comment.
Good question. Today, the light CL design consumes delegate derivation info via the HTTP RPC optimism_syncStatus. As far as I know, op-node does not currently expose this over WebSockets, so polling is effectively the only option right now.
That said, I agree that hardcoded polling can be brittle if we want to preserve the same level of responsiveness. There is definitely room to generalize both the connection mechanism and the data source.
I would consider WS / push based updates a reasonable future improvement rather than something we can rely on today.
That said, the external source itself could potentially be generalized to an EL source ( via eth_getBlockByNumber(safe|finalized)).
|
|
||
| /// Sends a safe L2 block for consolidation (Derivation Delegate mode). | ||
| /// Note: This does not wait for the engine to process it. | ||
| async fn send_safe_l2_block(&self, safe_l2: L2BlockInfo) -> EngineClientResult<()>; |
There was a problem hiding this comment.
If we had a separate DelegateDerivationActor struct we could:
- Define an associated type
- Add this associated type as an argument of
send_derived_attributes - For the normal derivation client implementation the associated type can be
OpAttributesWithParent, for the delegate client this can beL2BlockInfo.
There was a problem hiding this comment.
There was a problem hiding this comment.
My understanding is that we want to use a unified EngineActorRequest::ProcessSafeL2SignalRequest. This request must handle two different types, l2blockinfo and attributes. So we need a enum or a trait(suggested at #3245 (comment)).
if we use the associated type, we still need to convert or wrap with the trait / enum. afaik we cannot use the traits because of #3245 (comment).
I think it is better to use the union (trait / enum) and combine two safe related method into one like:
async fn send_safe_l2_signal(&self, signal: ConsolidateInput) -> EngineClientResult<()>;There was a problem hiding this comment.
Attempted to unify by using enums at 442d9f2
| }, | ||
| /// Derivation Delegation: consolidate based on safe L2 block info. | ||
| BlockInfo(L2BlockInfo), | ||
| } |
There was a problem hiding this comment.
Instead of making it an enum, I'd make two structures and have them implement the same trait. The parent struct (that holds the ConsolidateInput) can be generic over the ConsolidateInput and when we do the actor wiring (after CLI parsing), we can decide which type to plug in there.
There was a problem hiding this comment.
I attempted this approach. So making the ConsolidateInput as a trait. Currently ConsolidateInput is a field of ConsolidateTask:
pub struct ConsolidateTask<EngineClient_: EngineClient> {
/// The engine client.
pub client: Arc<EngineClient_>,
/// The [`RollupConfig`].
pub cfg: Arc<RollupConfig>,
/// The input for consolidation (either attributes or block info).
pub input: ConsolidateInput,
}So this struct will receive another trait like
pub struct ConsolidateTask<EngineClient_: EngineClient, ConsolidateInput_: ConsolidateInput> {
/// The engine client.
pub client: Arc<EngineClient_>,
/// The [`RollupConfig`].
pub cfg: Arc<RollupConfig>,
/// The input for consolidation (either attributes or block info).
pub input: ConsolidateInput_,
}The tasks are consumed at
, pub fn enqueue(&mut self, task: EngineTask<EngineClient_>) {
self.tasks.push(task);
self.task_queue_length.send_replace(self.tasks.len());
}so if we add an additional trait, my understanding is that all other tasks must be aware of this type. Are there any clear methods to use the additional trait types in this case?
There was a problem hiding this comment.
Yeah I suppose that this would entail that EngineTask become generic, ie instead of being an enum, they implement a trait.
Let's leave it that way for now. I will revisit in the future and try to simplify this structure
936a5f5 to
102877e
Compare
- Comments and naming - Drop `is_derived` from ConsolidateTask - Combine SafeL2 related EngineProcessingRequests - Clean up derivation delegate sync status validation - Comments and method renaming for reconcile_to_safe_head case - Clean up derivation delegate sync status validation - Explicitly ignoring DerivationActorRequest for the remainig case - Merge derivationActor state machine - break out derivation delegation logic - DelegationDerivationActor with some hacks - Wire in ConfiguredDerivationActor
102877e to
205c5d0
Compare
| /// Derivation delegate provider. | ||
| derivation_delegate_provider: DerivationDelegateClient, | ||
| /// L1 provider for validating L1 info for derivation delegation. | ||
| l1_provider: AlloyChainProvider, |
There was a problem hiding this comment.
Acknowledging that @op-will suggested that DerivationDelegateClient and AlloyChainProvider may be swapped to a trait for unit testing. The previous comment is gone because i attempted to create a separate delegation actor.
theochap
left a comment
There was a problem hiding this comment.
Small comment: instead of having delegate_actor.rs and delegate_client.rs inside the actors/derivation folder, I'd create a subfolder delegated and then add mod.rs/actor.rs/client.rs there. This makes the structure easier to follow
| derivation_actor_request_rx, | ||
| self.create_pipeline().await, | ||
| ))) | ||
| }; |
There was a problem hiding this comment.
I think in the future we should try to make this enum go away. This is certainly out of the scope of this PR but this showcases that our current implementation of the RollupNode is not flexible enough. Ideally swapping out one actor for another one should be easier at the rollup node level.
Good to merge the PR as is. Dealing with the rollup node interface is a workstream in itself
theochap
left a comment
There was a problem hiding this comment.
Looks good, a few small comments but I think this look much better like that. Thanks for taking our feedback into consideration! Good for you to merge once the comments are addressed
~Base branch: op-rs/kona#3242 ~May conflict with op-rs/kona#3229, op-rs/kona#3253 Implement the kona light CL([Design doc](https://github.com/ethereum-optimism/design-docs/blob/main/protocol/kona-node-light-cl.md)): - [DerivationActor - Target Determination](https://github.com/ethereum-optimism/design-docs/blob/main/protocol/kona-node-light-cl.md#derivationactor---target-determination) - [EngineActor - Fork Choice Update](https://github.com/ethereum-optimism/design-docs/blob/main/protocol/kona-node-light-cl.md#engineactor---fork-choice-update) ```mermaid flowchart TB subgraph A ["Normal Mode (Derivation)"] direction TB subgraph A0 ["Rollup Node Service"] direction TB A_Derivation["DerivationActor<br/>(L1->L2 derivation)"] A_Engine["EngineActor"] A_UnsafeSrc["Unsafe Source<br/>(P2P gossip / Sequencer)"] end A_L1[(L1 RPC)] A_EL[(Execution Layer)] A_L1 -->|L1 info| A_Derivation A_UnsafeSrc -->|unsafe| A_Engine A_Derivation -->|"safe(attr)/finalized"| A_Engine A_Engine -->|engine API| A_EL end subgraph B ["Light CL Mode"] direction TB subgraph B0 ["Rollup Node Service"] direction TB B_DerivationX[["DerivationActor<br/>(NEW: Poll external syncStatus)"]] B_Engine["EngineActor"] B_UnsafeSrc["Unsafe Source<br/>(P2P gossip / Sequencer)"] end B_L1[(L1 RPC)] B_Ext[(External CL RPC<br/>optimism_syncStatus)] B_EL[(Execution Layer)] %% Connections B_Ext -->|safe/finalized/currentL1| B_DerivationX B_L1 -->|canonical L1 check| B_DerivationX B_DerivationX -->|"safe(blockInfo)/finalized (validated)"| B_Engine B_UnsafeSrc -->|unsafe| B_Engine %% Visual indicator for disabled actor B_Engine -->|engine API| B_EL end ``` ### Testing #### Acceptance Tests Running guidelines detailed at op-rs/kona#3199: - [x] `TestFollowL2_Safe_Finalized_CurrentL1` - [x] `TestFollowL2_WithoutCLP2P` - [ ] `TestFollowL2_ReorgRecovery` (blocked by [kona: Check L2 reorg due to L1 reorg](#18676)) Injecting CurrentL1 is blocked by [kona: Revise SyncStatus CurrentL1 Selection](#18673) #### Local Sync Tests Validated with syncing op-sepolia between kona-node light CL <> sync tester, successfully finishing the initial EL sync and progress every safety levels reaching each tip. #### Devnet Tests Commit op-rs/kona@0b36fdd is baked to `us-docker.pkg.dev/oplabs-tools-artifacts/dev-images/kona-node:0b36fdd-light-cl` and deployed at `changwan-0` devnet: - As a verifier: `changwan-0-kona-geth-f-rpc-3` [[grafana]](https://optimistic.grafana.net/d/nUSlc3d4k/bedrock-networks?orgId=1&refresh=30s&from=now-1h&to=now&timezone=browser&var-network=changwan-0&var-node=$__all&var-layer=$__all&var-safety=l2_finalized&var-cluster=$__all&var-konaNodes=changwan-0-kona-geth-f-rpc-3) - As a sequencer: `changwan-0-kona-geth-f-sequencer-3` [[grafana]](https://optimistic.grafana.net/d/nUSlc3d4k/bedrock-networks?orgId=1&refresh=30s&from=now-1h&to=now&timezone=browser&var-network=changwan-0&var-node=$__all&var-layer=$__all&var-safety=l2_finalized&var-cluster=$__all&var-konaNodes=changwan-0-kona-geth-f-sequencer-3) - As a standby | leader Noticed all {unsafe, safe, finalized} head progression as a kona node light CL.
Base branch: #3242May conflict with #3229, #3253Implement the kona light CL(Design doc):
flowchart TB subgraph A ["Normal Mode (Derivation)"] direction TB subgraph A0 ["Rollup Node Service"] direction TB A_Derivation["DerivationActor<br/>(L1->L2 derivation)"] A_Engine["EngineActor"] A_UnsafeSrc["Unsafe Source<br/>(P2P gossip / Sequencer)"] end A_L1[(L1 RPC)] A_EL[(Execution Layer)] A_L1 -->|L1 info| A_Derivation A_UnsafeSrc -->|unsafe| A_Engine A_Derivation -->|"safe(attr)/finalized"| A_Engine A_Engine -->|engine API| A_EL end subgraph B ["Light CL Mode"] direction TB subgraph B0 ["Rollup Node Service"] direction TB B_DerivationX[["DerivationActor<br/>(NEW: Poll external syncStatus)"]] B_Engine["EngineActor"] B_UnsafeSrc["Unsafe Source<br/>(P2P gossip / Sequencer)"] end B_L1[(L1 RPC)] B_Ext[(External CL RPC<br/>optimism_syncStatus)] B_EL[(Execution Layer)] %% Connections B_Ext -->|safe/finalized/currentL1| B_DerivationX B_L1 -->|canonical L1 check| B_DerivationX B_DerivationX -->|"safe(blockInfo)/finalized (validated)"| B_Engine B_UnsafeSrc -->|unsafe| B_Engine %% Visual indicator for disabled actor B_Engine -->|engine API| B_EL endTesting
Acceptance Tests
Running guidelines detailed at #3199:
TestFollowL2_Safe_Finalized_CurrentL1TestFollowL2_WithoutCLP2PTestFollowL2_ReorgRecovery(blocked by kona: Check L2 reorg due to L1 reorg)Injecting CurrentL1 is blocked by kona: Revise SyncStatus CurrentL1 Selection
Local Sync Tests
Validated with syncing op-sepolia between kona-node light CL <> sync tester, successfully finishing the initial EL sync and progress every safety levels reaching each tip.
Devnet Tests
Commit 0b36fdd is baked to
us-docker.pkg.dev/oplabs-tools-artifacts/dev-images/kona-node:0b36fdd-light-cland deployed atchangwan-0devnet:changwan-0-kona-geth-f-rpc-3[grafana]changwan-0-kona-geth-f-sequencer-3[grafana]Noticed all {unsafe, safe, finalized} head progression as a kona node light CL.