This repository was archived by the owner on Feb 1, 2023. It is now read-only.

Improve Request Sharding #167

@dirkmc

Description


#165 outlines session request sharding in Bitswap. Each session maintains a list of peers that have responded to a wantlist request with a block, ordered by latency. Requests are split such that groups of CIDs (wants) are sent to groups of peers. As responses come in, the session adjusts the split factor up or down to maintain a balance between redundancy and too many duplicates.
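The adjustment described above can be sketched roughly as follows. This is a hypothetical illustration, not Bitswap's actual code: the function name, thresholds, and duplicate-ratio heuristic are all assumptions. The direction of adjustment follows from the example later in this issue: a higher split factor spreads wants across more, smaller peer groups (fewer duplicates), while a lower one sends each want to more peers (more redundancy).

```go
package main

import "fmt"

// adjustSplitFactor is an illustrative sketch: raise the split factor when
// too many duplicate blocks arrive, lower it when there is too little
// redundancy. The 0.4 / 0.2 thresholds are invented for the example.
func adjustSplitFactor(split, duplicates, total int) int {
	if total == 0 {
		return split
	}
	dupRatio := float64(duplicates) / float64(total)
	switch {
	case dupRatio > 0.4: // too many duplicates: split wants more thinly
		return split + 1
	case dupRatio < 0.2 && split > 1: // too little redundancy: consolidate
		return split - 1
	}
	return split
}

func main() {
	fmt.Println(adjustSplitFactor(3, 5, 10)) // 50% duplicates -> widen split to 4
	fmt.Println(adjustSplitFactor(3, 1, 10)) // 10% duplicates -> narrow split to 2
}
```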

Responses are processed on a block-by-block basis

The live want queue has 32 slots. Once the queue is full, each time a response block is processed:

  1. one slot in the live want queue opens up (the slot the response block CID occupied)
  2. the slot is filled with one CID (want) from the secondary queue
  3. the CID is sent out as a new want request
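The three steps above amount to a one-in-one-out queue. A minimal sketch, with hypothetical type and field names (the real session state is more involved):

```go
package main

import "fmt"

const liveWantLimit = 32 // the live want queue has 32 slots

// session is an illustrative sketch of the one-in-one-out behaviour:
// a response block frees its live slot, and one want is promoted from
// the secondary queue and sent out.
type session struct {
	liveWants map[string]bool // live wants, keyed by CID (strings for brevity)
	pending   []string        // secondary queue of wants not yet sent
}

// receiveBlock processes one response block and returns the CID, if any,
// promoted from the secondary queue as a new want request.
func (s *session) receiveBlock(cid string) (promoted string, ok bool) {
	delete(s.liveWants, cid) // 1. the slot the block occupied opens up
	if len(s.pending) == 0 || len(s.liveWants) >= liveWantLimit {
		return "", false
	}
	promoted = s.pending[0] // 2. fill the slot from the secondary queue
	s.pending = s.pending[1:]
	s.liveWants[promoted] = true // 3. the CID goes out as a new want request
	return promoted, true
}

func main() {
	s := &session{
		liveWants: map[string]bool{"cid-1": true},
		pending:   []string{"cid-2", "cid-3"},
	}
	next, _ := s.receiveBlock("cid-1")
	fmt.Println(next) // cid-2
}
```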

However, the splitter is designed to work over groups of blocks, not a single block. So regardless of the split factor, wants will always be sent to the same peers until the split factor changes.

For example, if there are 8 peers (A - H) and the split factor is 3, the peers will be split into three groups. The CID will be sent to the peers in the first group, and the other two groups will be disregarded:

cid        ->  A D G
<none>     ->  B E H
<none>     ->  C F

Wants for all blocks will be sent to peers A, D and G until the split factor changes. And regardless of the split factor, every want will be sent to peer A.

Shard wants on a streaming basis

It would be better for a few reasons if all blocks in a response were processed as a group. However, responses often contain only a small number of blocks anyway, so any solution should handle a single block as well as several blocks.

There is a WIP PR to adjust the peer list order probabilistically, which should mitigate the problem, as the order of peers will change somewhat with each request. However, it may be easier to reason about the problem if wants are instead sharded on a streaming basis.

Sharding should:

  • Probabilistically select N peers to send the block to.
    N is analogous to the split factor: it should vary to find a balance between redundancy and too many duplicates.
  • Select peers with low latency and high probability of having the target block.
  • Balance load so peers don't get overloaded.
  • Ask nearby peers for blocks even if they didn't have any so far.
    Asking nearby peers we are already connected to for a block is a relatively cheap operation. If several peers are downloading the same data, we want them to quickly start downloading from each other so as to
    • reduce load on the "seed" peer and the network as a whole
    • increase bandwidth (download from several peers at once)
