Skip to content

op-node/rollup/sequencing: Fix temporary engine error handling#12258

Merged
sebastianst merged 1 commit intodevelopfrom
seb/sequencer-temp-err-fix
Oct 16, 2024
Merged

op-node/rollup/sequencing: Fix temporary engine error handling#12258
sebastianst merged 1 commit intodevelopfrom
seb/sequencer-temp-err-fix

Conversation

@sebastianst
Copy link
Copy Markdown
Member

Description

Do schedule a next sequencer action even if the building state is empty in onEngineTemporaryError.
This can happen if the previous successful payload cleared the building state and then when startBuildingBlock is entered, it hits a temp error but never modified the cleared building state, so the temp error handler would never schedule the next action.

Tests

Additional context

Metadata

Towards #12240 #12041 (but not necessarily fixing yet, if the current solution is wrong)

@bearpebble
Copy link
Copy Markdown
Contributor

@sebastianst just my two cents for a potential minimally invasive fix.

Add a StartedSequencing bool member to the BuildingState struct and then assign true right before the PreparePayloadAttributes() call, i.e.

d.log.Info("Started sequencing new block", "parent", l2Head, "l1Origin", l1Origin)
d.latest.StartedSequencing = true

This assumes setting d.latest.Started earlier than currently in onBuildStarted() is not an option. Because if that's fine as well, it can just be assigned in the same place with the same result I think.

@sebastianst
Copy link
Copy Markdown
Member Author

@sebastianst just my two cents for a potential minimally invasive fix.

Add a StartedSequencing bool member to the BuildingState struct and then assign true right before the PreparePayloadAttributes() call, i.e.

d.log.Info("Started sequencing new block", "parent", l2Head, "l1Origin", l1Origin)
d.latest.StartedSequencing = true

This assumes setting d.latest.Started earlier than currently in onBuildStarted() is not an option. Because if that's fine as well, it can just be assigned in the same place with the same result I think.

Yep that would also have been my 2nd try. But removing the early return might just be the better fix.

@mdehoog
Copy link
Copy Markdown
Contributor

mdehoog commented Oct 11, 2024

@sebastianst to add some data to the testing story: we've been running this change on some internal L3 testnets and it's working well

Copy link
Copy Markdown
Contributor

@protolambda protolambda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the change itself is valid, to get the sequencer unstuck, if it ever does encounter a combination of nextActionOK == false and temporary engine errors.

But I am a little puzzled why nextActionOK is false to begin with. It should only be false if the sequencer isn't able to sequence (during sync from L1, during reset of L1 sync, during extraordinary safe-head lag, or when meant to be inactive). So that may be worth investigating more, and I think we can drop nextActionOK state altogether, if we capture the above no-sequence cases individually.

Without the nextActionOK reason, I don't see how we can put together a coherent test. So I am in favor of merging this, based on what we have seen, and then refactoring it to not have a nextActionOK at all anymore.

@protolambda protolambda marked this pull request as ready for review October 16, 2024 11:28
@sebastianst sebastianst enabled auto-merge October 16, 2024 11:34
@sebastianst sebastianst disabled auto-merge October 16, 2024 13:11
Do schedule a next sequencer action even if the building state is empty
in `onEngineTemporaryError`.
This can happen if the previous successful payload cleared the building
state and then when `startBuildingBlock` is entered, it hits a temp
error but never modified the cleared building state, so the temp error
handler would never schedule the next action.

Towards #12240 #12041
@sebastianst sebastianst force-pushed the seb/sequencer-temp-err-fix branch from c647801 to 82c60df Compare October 16, 2024 13:12
@sebastianst sebastianst enabled auto-merge October 16, 2024 13:12
@sebastianst sebastianst added this pull request to the merge queue Oct 16, 2024
Merged via the queue into develop with commit 86e5f63 Oct 16, 2024
@sebastianst sebastianst deleted the seb/sequencer-temp-err-fix branch October 16, 2024 13:26
samlaf pushed a commit to samlaf/optimism that referenced this pull request Nov 10, 2024
…eum-optimism#12258)

Do schedule a next sequencer action even if the building state is empty
in `onEngineTemporaryError`.
This can happen if the previous successful payload cleared the building
state and then when `startBuildingBlock` is entered, it hits a temp
error but never modified the cleared building state, so the temp error
handler would never schedule the next action.

Towards ethereum-optimism#12240 ethereum-optimism#12041
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants