answer RequestManager queries from disk if possible#6109
Conversation
When restarting beacon node, orphaned blocks remain in the database but on startup, only the canonical chain as selected by fork choice loads. When a new block is discovered that builds on top of an orphaned block, the orphaned block is re-downloaded using sync/request manager, despite it already being present on disk. Such queries can be answered locally to improve discovery speed of alternate forks.
e274539 to
c2fc6de
Compare
for example, fasttracks these, when data is available tto connect it to the DAG quicker |
|
I'm not too sure about this one - ie the keeping of forks in the db is a bug more than a feature and we're increasing complexity of the flow quite a bit here, for a fringe case where you both restart and are loading a non-final chain. It feels as though the better approach would perhaps be to load all heads on startup, which also would involve storing all heads on any head update (this is trivial btw). Here, we end up with a significant increase in module coupling, creating kind of a spaghetti/pingpong flow just to support a minor use case, without instead considering the underlying source of this discord. |
|
Yeah, an alternative to this (request manager ingress) + #6122 (sync manager / gossip ingress) is to extend the DB schema. Has the disadvantage of also loading non-canonical but orphaned branches. But, on the other hand, avoids the two code segments from here + #6122; not sure which one is better at this time. Given the situation on Goerli, either of the fixes are quite important, because the network is currently split across three major branches that are relatively deep. In such a situation, re-discovering all the blocks from the network takes hours, preventing timely inclusion of a different but eventually canonical branch. I think starting with the change that does not affect DB schema (this + #6122), seeing how it goes, before optimizing it further, is also valid. The |
My point is more that this is a long-standing bug in chaindag - not loading branches has the consequence of them not getting pruned either, which is a fundamental bug (it's not a feature so fixing it is not a disadvantage) - here, we're basically heaping workarounds on top of this flaw which would not be necessary if the chaindag was doing the right thing (remembering what it has in the database). Fixing this is as easy as writing a new list of non-canonical heads every now and then and reloading it on startup, and then everything will work much better - and we can keep the decoupled, more simple design by relying on chaindag working as it should. |
|
Regarding pruning, either of the fixes improves the situation, as the end result is that I suspect the total complexity to be similar ~130 lines, but a DB based solution obviously also needs more thorough testing given its persistence. I think going with the workarounds for now, to unblock other improvements while goerli still is an interesting network to test on, is also alright. Monitoring those nodes quite closely these days and finding small tweaks in different areas. Once the DB based solution exists, can remove the workarounds. Have annotated the workaround with a TODO highlighting the discussion, in #6122 |
- #4732 KZG primitives added - #4785 added gossip validation for BlobSidecar but did not check KZG - #4844 added req/resp for BlobSidecar but did not validate responses - #5445 fixed KZG validate_blobs in state transition (Opt[bool] jank) - ethereum/consensus-specs#3531 new spec hits - #5565 new spec, + fixed gossip missing inclusion proof + KZG checks - #6109 goerli hardening, + fixed Req/Resp missing inclusion proof check - #6741 copies the req/resp gap to Fulu - #8254 removes the Deneb blob_quarantine - This one now finally adds the missing KZG checks to req/resp
- #4732 KZG primitives added - #4785 added gossip validation for BlobSidecar but did not check KZG - #4844 added req/resp for BlobSidecar but did not validate responses - #5445 fixed KZG validate_blobs in state transition (Opt[bool] jank) - ethereum/consensus-specs#3531 new spec hits - #5565 new spec, + fixed gossip missing inclusion proof + KZG checks - #6109 goerli hardening, + fixed Req/Resp missing inclusion proof check - #6741 copies the remaining req/resp missing KZG check gap to Fulu - #8254 removes the Deneb blob_quarantine - This one now finally adds the missing KZG checks to req/resp
- #4732 KZG primitives added - #4785 added gossip validation for BlobSidecar but did not check KZG - #4844 added req/resp for BlobSidecar but did not validate responses - #5445 fixed KZG validate_blobs in state transition (Opt[bool] jank) - ethereum/consensus-specs#3531 new spec hits - #5565 new spec, + fixed gossip missing inclusion proof + KZG checks - #6109 goerli hardening, + fixed Req/Resp missing inclusion proof check - #6741 copies the remaining req/resp missing KZG check gap to Fulu - #8254 removes the Deneb blob_quarantine - This one now finally adds the missing KZG checks to req/resp
When restarting beacon node, orphaned blocks remain in the database but on startup, only the canonical chain as selected by fork choice loads. When a new block is discovered that builds on top of an orphaned block, the orphaned block is re-downloaded using sync/request manager, despite it already being present on disk. Such queries can be answered locally to improve discovery speed of alternate forks.