This repository was archived by the owner on Aug 2, 2021. It is now read-only.

rewrite pull syncer #1451

@acud

Description


the pull syncer has a few edge-case bugs that are very hard to trace and debug.

we sometimes do not get chunks at the expected nodes, and the current syncer infrastructure makes it very difficult to support these debugging efforts.

work outline:

  • submit an initial spec to iterate on top of
  • write the protocol boilerplate for initialising a new protocol over the devp2p network

sync:

  • retrieve stream cursors upon node connection
  • drop cursors when a peer moves out of depth
  • establish streams inside the nearest neighbourhood (NN) according to kademlia depth
  • make sure that stream cancellations happen on depth change
  • debounce mechanism
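
The depth-gated stream establishment and the debounce item above can be sketched roughly as follows. This is a minimal illustration under assumed semantics, not the actual stream package API; `shouldSync` and `debouncer` are hypothetical names:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// shouldSync reports whether a peer at proximity order po falls within
// the current kademlia depth, i.e. inside the nearest neighbourhood
// where sync streams should be established. (Hypothetical helper.)
func shouldSync(po, depth int) bool {
	return po >= depth
}

// debouncer coalesces bursts of depth-change notifications so that
// stream setup/teardown runs once per quiet period rather than once
// per event.
type debouncer struct {
	mu    sync.Mutex
	timer *time.Timer
	delay time.Duration
}

// trigger schedules f after the debounce delay, cancelling any
// previously scheduled run.
func (d *debouncer) trigger(f func()) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.timer != nil {
		d.timer.Stop()
	}
	d.timer = time.AfterFunc(d.delay, f)
}

func main() {
	fmt.Println(shouldSync(3, 2)) // peer inside depth
	fmt.Println(shouldSync(1, 2)) // peer outside depth

	d := &debouncer{delay: 10 * time.Millisecond}
	done := make(chan struct{})
	for i := 0; i < 5; i++ {
		d.trigger(func() { close(done) }) // only the last call fires
	}
	<-done
	fmt.Println("debounced")
}
```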

sync-localstore-intervals:

  • make sure closed intervals are always delivered from localstore pull
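
One way to read the closed-intervals requirement: only fully delivered, closed ranges of localstore bin IDs are ever recorded, so the next request always starts at the first real gap. A rough sketch under that assumption (the `intervals` type here is hypothetical, not the actual intervals store):

```go
package main

import (
	"fmt"
	"sort"
)

// interval is a closed range of bin IDs that has been fully delivered.
type interval struct{ from, to uint64 }

// intervals tracks delivered closed ranges for one stream.
type intervals struct{ ranges []interval }

// add records a fully delivered closed interval, merging it with any
// overlapping or adjacent neighbours.
func (s *intervals) add(from, to uint64) {
	merged := interval{from, to}
	var out []interval
	for _, r := range s.ranges {
		if r.to+1 >= merged.from && merged.to+1 >= r.from {
			// overlapping or adjacent: fold into the merged range
			if r.from < merged.from {
				merged.from = r.from
			}
			if r.to > merged.to {
				merged.to = r.to
			}
		} else {
			out = append(out, r)
		}
	}
	out = append(out, merged)
	sort.Slice(out, func(i, j int) bool { return out[i].from < out[j].from })
	s.ranges = out
}

// next returns the start of the first gap at or after from.
func (s *intervals) next(from uint64) uint64 {
	for _, r := range s.ranges {
		if from >= r.from && from <= r.to {
			from = r.to + 1
		}
	}
	return from
}

func main() {
	var s intervals
	s.add(1, 5)
	s.add(6, 10) // adjacent: merges into [1,10]
	s.add(20, 25)
	fmt.Println(s.next(1))  // first gap after the merged interval
	fmt.Println(s.next(20)) // first gap after [20,25]
}
```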

get stream < cursor (history priority):

  • test case for continuous intervals (no gaps)
  • test case for missing intervals (enclosing interval should still persist)

get stream > cursor (live priority):

  • test case for continuous intervals (no gaps)
  • test case for missing intervals (enclosing interval should still persist). difficult to test, since intervals are fetched faster than chunks can be erased to create gaps; on hold
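
The `< cursor` / `> cursor` split above comes down to this: at connection time the server reports a cursor (the highest bin ID it holds), and a requested range is split into a bounded historical stream up to the cursor plus an unbounded live stream above it. A hedged sketch, with `request` and `splitAtCursor` as assumed names:

```go
package main

import "fmt"

// request describes one stream request. to is ignored for live
// (unbounded) requests.
type request struct {
	from, to uint64
	live     bool
}

// splitAtCursor turns a desired range starting at from into at most
// two requests: bounded history up to the peer's cursor, and an
// unbounded live stream above it.
func splitAtCursor(from, cursor uint64) []request {
	var reqs []request
	if from <= cursor {
		// historical portion, fetched with priority
		reqs = append(reqs, request{from: from, to: cursor})
	}
	liveFrom := cursor + 1
	if from > cursor {
		liveFrom = from
	}
	reqs = append(reqs, request{from: liveFrom, live: true})
	return reqs
}

func main() {
	for _, r := range splitAtCursor(1, 100) {
		fmt.Printf("%+v\n", r)
	}
	for _, r := range splitAtCursor(200, 100) {
		fmt.Printf("%+v\n", r)
	}
}
```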

consistency:

  • test case for no duplicate chunks delivered
  • check no overlap between historical and live streaming
  • make sync bins within depth feature toggle configurable and write test cases that validate syncing with it on and off

cluster/snapshot:

  • test that chunks are sent correctly in a star topology (adapt second test from syncer_test.go from existing stream package)
  • test that 3 nodes with full connectivity sync between them and that each node's localstore ends up as the union of all three nodes' localstores
  • test that chunks are synced correctly in a larger full topology w/ discovery. this test vector needs more description
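
The invariant behind the 3-node full-connectivity test can be stated in a few lines: after syncing settles, every node's store equals the union of all initial stores. A sketch of that assertion (helper names are hypothetical):

```go
package main

import "fmt"

// union collects the distinct chunk addresses across all nodes'
// initial localstores.
func union(stores ...[]string) map[string]bool {
	u := make(map[string]bool)
	for _, s := range stores {
		for _, addr := range s {
			u[addr] = true
		}
	}
	return u
}

// converged checks one node's final store (assumed duplicate-free)
// against the expected union.
func converged(store []string, want map[string]bool) bool {
	if len(store) != len(want) {
		return false
	}
	for _, addr := range store {
		if !want[addr] {
			return false
		}
	}
	return true
}

func main() {
	want := union([]string{"a", "b"}, []string{"b", "c"}, []string{"d"})
	fmt.Println(len(want))                                   // distinct addresses
	fmt.Println(converged([]string{"a", "b", "c", "d"}, want)) // fully synced node
	fmt.Println(converged([]string{"a", "b"}, want))           // node still missing chunks
}
```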

resilience:

  • guarantee that there's always a live historical fetch with an unbounded stream
  • check that existing historical and live historical stream fetchers terminate when depth changes and node moves out of depth

optimisations/benchmarking:

  • is it faster to deliver 3000 chunks concurrently in 3000 different messages between two nodes, or is it more effective to send one message with 3000 chunks? this should be easily measurable
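
A crude local harness for the batching question: push the same chunks through a channel (standing in for the wire) either as many single-chunk messages or as one batched message, and time both. Real numbers need the devp2p transport; this only shows the shape of the measurement, and all names are assumptions:

```go
package main

import (
	"fmt"
	"time"
)

// msg stands in for a wire message carrying one or more chunks.
type msg struct{ chunks [][]byte }

// deliver pushes the given batches through an unbuffered channel and
// counts the chunks received, returning the elapsed time.
func deliver(batches [][][]byte) (received int, elapsed time.Duration) {
	ch := make(chan msg)
	start := time.Now()
	go func() {
		for _, b := range batches {
			ch <- msg{chunks: b}
		}
		close(ch)
	}()
	for m := range ch {
		received += len(m.chunks)
	}
	return received, time.Since(start)
}

func main() {
	chunk := make([]byte, 4096)

	// strategy A: 3000 messages with one chunk each
	var single [][][]byte
	for i := 0; i < 3000; i++ {
		single = append(single, [][]byte{chunk})
	}
	// strategy B: one message with 3000 chunks
	big := make([][]byte, 3000)
	for i := range big {
		big[i] = chunk
	}

	nA, tA := deliver(single)
	nB, tB := deliver([][][]byte{big})
	fmt.Println(nA, tA, nB, tB)
}
```

Both strategies must deliver the same chunk count; only the timing differs, which is the quantity to compare on the real transport.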

tooling:

  • add an assertion to smoke tests so we know when syncing is done

logging:
