Notify lookup sync of gossip processing results #5722
Squashed commit of the following: commit e6c8955 Author: dapplion <35266934+dapplion@users.noreply.github.com> Date: Mon May 6 22:52:48 2024 +0900 Notify lookup sync of gossip processing results
I was thinking about this more. In order to avoid sync having to know anything (explicitly) about block processing results, can't we do the following:
Eventually the block will either fail or be processed. We will eventually get more requests for the chain of blocks, and it will eventually succeed, because the block will either be failed by the beacon processor or it will be successful. The issue is just around restarting the chains. To prevent loops, we could wait a slot or something. Just thinking this would be a way to avoid explicit communication from the beacon processor.
For child lookups it's a bit tricky. Assume you are a bit behind, such that given this chain of blocks: you have to create a lookup for slot 10, fetch the block, attempt to process it, get an unknown-parent error, fetch 9, etc. until 6. If 6 is processing, lookup 7 needs some event to make progress. So the beacon processor must tell sync that block 6 has been processed. Otherwise, dropping all lookups that are children of 6 while 6 is processing sounds silly. You must leave child lookups of 6 paused somehow, awaiting their ancestor to process. Sync lookup technically doesn't need to know the processing result; it would be enough to have a signal to continue the children of 6. But then we have to add a new SyncMessage and have a dedicated handler in block lookups. While it reduces how much information the processor leaks, I don't see how it simplifies things. We already have a

Note: the moment that sync lookup checks the chain's processing cache or da_checker, we break the separation of concerns between sync lookup and gossip processing. If we are crossing that line, let's implement a properly, informationally complete fix. Notice that these ignores will happen thousands of times, one per attestation, unless we have something like #5706
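The "pause children until the ancestor finishes processing" idea described above can be sketched roughly as follows. This is illustrative Rust under stated assumptions, not Lighthouse's actual types: `Lookups`, `LookupState`, and `on_parent_processed` are hypothetical names.

```rust
use std::collections::HashMap;

// Hypothetical state of a single block lookup.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LookupState {
    /// The parent block is still in the beacon processor; this lookup
    /// must not progress until we get a signal about the parent.
    AwaitingParent,
    /// Free to download / process.
    Active,
}

// Hypothetical collection of lookups: slot -> (parent slot, state).
struct Lookups {
    by_slot: HashMap<u64, (u64, LookupState)>,
}

impl Lookups {
    /// The signal discussed above: the processor tells sync that
    /// `parent_slot` finished processing, so its children may resume.
    fn on_parent_processed(&mut self, parent_slot: u64) {
        for (parent, state) in self.by_slot.values_mut() {
            if *parent == parent_slot && *state == LookupState::AwaitingParent {
                *state = LookupState::Active;
            }
        }
    }
}
```

In the slot 6..10 example, lookups 7..10 would sit in `AwaitingParent` and only the direct child (7) becomes `Active` once 6's processing signal arrives; 8..10 then unblock one hop at a time.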
```rust
if reprocess_tx.try_send(reprocess_msg).is_err() {
    error!(self.log, "Failed to inform block import"; "source" => "rpc", "block_root" => %hash)
};
if matches!(process_type, BlockProcessType::SingleBlock { .. }) {
```
Why are we removing this match?
Is it redundant because we now only process single blocks here?
Yes, blobs are processed in `process_rpc_blobs`.
beacon_node/network/src/network_beacon_processor/sync_methods.rs
@mergify queue
🛑 The pull request has been removed from the queue
@mergify queue
🛑 The pull request has been removed from the queue
@mergify refresh
✅ Pull request refreshed
@mergify queue
✅ The pull request has been merged automatically at 93e0649
General problem
Lookup sync and gossip interact in very racy ways. They interact at all because we want to prevent downloading via ReqResp things that we are already processing. Note that this optimization carries complexity, and we may decide it isn't worth it at all.
Consider the image above: we can receive an attestation referencing a block not yet imported into fork-choice at these moments in time:

- **A, before processing**: easy, download and process the block.
- **B, during processing**: the block is already processing, so we should skip the download. However, we must consider two pitfalls: processing may fail, and child lookups may stall.
- **C, while waiting for blobs**: the block is fully processed but not yet in fork-choice. In this case we don't have to worry about processing failures, but the child lookup issue remains.
- **D, after import**: this trigger should never happen. If it does, we will download all block components and get a `BlockAlreadyKnown` error.

Issues with current unstable (da9d386)
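One way to picture the four windows above is as a decision function from the block's current status to the lookup's action. This is an illustrative sketch; none of these enum or function names are Lighthouse's actual API.

```rust
// Hypothetical status of a block at the moment an attestation arrives.
#[derive(Debug, PartialEq)]
enum BlockStatus {
    Unknown,       // A: before processing
    Processing,    // B: the gossip block is in the beacon processor
    AwaitingBlobs, // C: block valid, blobs still missing
    Imported,      // D: already in fork-choice
}

// Hypothetical action the lookup should take for each window.
#[derive(Debug, PartialEq)]
enum LookupAction {
    DownloadAndProcess,      // A: easy case
    SkipDownloadAwaitResult, // B: wait for the processing result
    SkipBlockFetchBlobs,     // C: only blob components are needed
    Ignore,                  // D: should never trigger a lookup
}

fn action_for(status: &BlockStatus) -> LookupAction {
    match status {
        BlockStatus::Unknown => LookupAction::DownloadAndProcess,
        BlockStatus::Processing => LookupAction::SkipDownloadAwaitResult,
        BlockStatus::AwaitingBlobs => LookupAction::SkipBlockFetchBlobs,
        BlockStatus::Imported => LookupAction::Ignore,
    }
}
```

The racy part is exactly window B: "await result" only works if the processor reliably delivers that result to sync, which is what this PR sets out to guarantee.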
^ introduces the optimization to prevent downloads for triggers B and C. However, it fails to explicitly handle both pitfalls listed above. If processing fails, a lookup may get stuck. It also creates a spammy situation where a lookup gets dropped immediately if the block is still processing, once per attestation.
^ Fixed the spammy loop, but it makes the processing-failure case worse. Completed chains remain in a hashset for 60 seconds, so a failing lookup will be blocked from being retried for that period.
Proposed Changes
Sync lookup must be aware of gossip processing results. I think there's no way around that.
The least invasive way to achieve that IMO is:
`BlockComponentProcessed`

With the above we tackle:
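A rough sketch of the intended flow, assuming a `BlockComponentProcessed`-style result is delivered from the beacon processor to lookup sync. The variant and handler names below are illustrative assumptions, not the actual Lighthouse types.

```rust
// Hypothetical processing result forwarded to lookup sync.
#[derive(Debug)]
enum ProcessResult {
    /// Block fully imported into fork-choice.
    Imported,
    /// Block valid but still missing components (e.g. blobs).
    MissingComponents,
    /// Processing failed with some error.
    Err(String),
}

// Hypothetical handler: what lookup sync does with the result.
fn on_block_component_processed(result: ProcessResult) -> &'static str {
    match result {
        // The block is accepted (fully or partially): child lookups
        // waiting on this ancestor can make progress.
        ProcessResult::Imported | ProcessResult::MissingComponents => {
            "continue_child_lookups"
        }
        // Processing failed: retry the lookup instead of leaving it
        // stuck behind a block that will never import.
        ProcessResult::Err(_) => "retry_lookup",
    }
}
```

The point is that both branches require sync to see the result: without the failure branch, a lookup can get stuck (the bug described above); without the success branch, children of a processing block cannot resume.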
Pitfalls
This PR relies on these assumptions:

- If `availability_checker.has_block(block)` returns true, the block is permanently valid and is missing blobs.
- If `reqresp_pre_import_cache.contains_key(block)` returns true, a `BlockComponentProcessed` event MUST be emitted some time in the future.

We don't have e2e tests to assert those conditions at the moment. I think code comments should be good for now, but it would be great to codify them in the future somehow.
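Until e2e coverage exists, the second invariant could be kept honest with a small guard around the cache: every root inserted must eventually be removed by exactly one processed event. A minimal sketch, assuming a simplified stand-in for `reqresp_pre_import_cache` keyed by block root; the type and method names are hypothetical.

```rust
use std::collections::HashSet;

// Simplified stand-in for the pre-import cache keyed by block root.
struct PreImportCache {
    roots: HashSet<[u8; 32]>,
}

impl PreImportCache {
    fn new() -> Self {
        Self { roots: HashSet::new() }
    }

    /// Register a block root whose processing result is pending.
    fn insert(&mut self, root: [u8; 32]) {
        self.roots.insert(root);
    }

    /// Invariant: every inserted root MUST eventually reach this call
    /// via a BlockComponentProcessed-style event; removal is the only
    /// exit. Returns false if the event arrived for an unknown root.
    fn on_component_processed(&mut self, root: &[u8; 32]) -> bool {
        self.roots.remove(root)
    }
}
```

A root that stays in the set forever would indicate the invariant was violated, which is exactly the situation that leaves a lookup stuck.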
Todo
Closes #5693