This repository was archived by the owner on Aug 2, 2021. It is now read-only.

Kademlia Upgrade #1535

@nonsense


I think we should discuss how and when we are going to tackle a refactor of Kademlia. We already have more than five known issues that the current implementation does not address:

  1. Kademlia suggests peers when a node retrieves remote chunks without knowing how utilised those peers are. This results in suboptimal recommendations when we have a set of N peers equally distant from a given chunk (see "Kademlia peers are not utilised adequately during retrieval", #1533).

  2. The number of connections per bin is inconsistent: sometimes it is 2, sometimes it is 20. We need a more deterministic way to build up a Kademlia table (see "investigate kademlia connectivity and suggest peer functionality", #1436).

  3. Currently a node pull-syncs with all peers in a given bin. With push sync we might want to disable syncing of all bins < depth (or not?) and only sync with the peers within our depth (>= depth). If we still decide to keep pull sync on lower bins (0, 1, etc. < depth), we should definitely not sync with all our peers in bin 0, but only with a few. Basically there needs to be a distinction between peers: some peers should be available for retrieve requests, some peers should be available for syncing, and this should be more explicit. Right now the Kademlia implementation has a single container of connections, so we should think about how we want to design this.

  4. Light nodes need to have connections with other peers and a Kademlia table so that they issue retrieve requests properly, but ideally they should not appear in the Kademlia table of full nodes, as we don't want Kademlia to suggest them for syncing or for other capabilities that they don't have. However, it makes sense for light nodes to share their view of the network with full nodes, so it seems there is a benefit in having them run part of the hive protocol?

  5. Kademlia connectivity state saving and restoring should be more deterministic (see "investigate restarting of networks and traffic incurred", #1396). If we restart our node, we should prefer nodes that we were recently connected to, so that we don't incur syncing costs. (FYI, our smoke tests suffer from this: if you just restart a deployment, nodes connect to new peers and start historic syncing.)

  6. Visibility over Kademlia (some connections and known peers are hidden) and usage of peers can be improved (see "improve kademlia table output", #1403). Currently we don't have a good dashboard à la torrent client, where we can see how many chunks we have sent to or received from a peer and how many are in flight. It'd be nice to have this so that we can increase the throughput of Swarm in general.

  7. Move loading and storing of Kademlia known and connected peers outside of the hive protocol?
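
To make item 1 a bit more concrete, here is a minimal sketch of what utilisation-aware peer suggestion could look like. It is not taken from the current code: the `Peer` type, its `PO` and `InFlight` fields, and the `SuggestPeer` function are all hypothetical names, and "fewest in-flight requests" is only one possible measure of utilisation.

```go
package main

import "fmt"

// Peer is a hypothetical view of a connected peer. The fields are
// illustrative and are not taken from the Swarm codebase.
type Peer struct {
	ID       string
	PO       int // proximity order of the peer relative to the chunk address
	InFlight int // outstanding retrieve requests currently sent to this peer
}

// SuggestPeer returns the peer with the highest proximity order and
// breaks ties between equally distant peers by picking the one with
// the fewest in-flight requests. The boolean is false if peers is empty.
func SuggestPeer(peers []Peer) (Peer, bool) {
	if len(peers) == 0 {
		return Peer{}, false
	}
	best := peers[0]
	for _, p := range peers[1:] {
		closer := p.PO > best.PO
		lessBusy := p.PO == best.PO && p.InFlight < best.InFlight
		if closer || lessBusy {
			best = p
		}
	}
	return best, true
}

func main() {
	peers := []Peer{
		{ID: "a", PO: 3, InFlight: 12},
		{ID: "b", PO: 3, InFlight: 2},
		{ID: "c", PO: 1, InFlight: 0},
	}
	if p, ok := SuggestPeer(peers); ok {
		fmt.Println(p.ID) // prints "b": as close as "a", but less busy
	}
}
```

Peer "c" is idle but further away, so it is not chosen; utilisation only decides between peers at the same distance. Whether distance should always win over utilisation is exactly the kind of thing we would need to agree on.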

I suggest we discuss these soon and decide how and when to tackle them.
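
As a starting point for the discussion of items 3 and 4, the sketch below shows one way the distinction between retrieval peers and syncing peers could be made explicit. Everything in it is an assumption for illustration: the `Role` flags, the `Peer` type, the `SyncPeers` function, and the `maxShallow` limit do not exist in the current implementation.

```go
package main

import "fmt"

// Role is a hypothetical bit set describing what a peer may be used for.
type Role uint8

const (
	RoleRetrieve Role = 1 << iota // peer can serve retrieve requests
	RoleSync                      // peer can take part in syncing
)

// Peer is an illustrative peer record, not the Swarm type.
type Peer struct {
	ID    string
	Bin   int  // Kademlia bin (proximity order relative to our address)
	Roles Role // capabilities the peer advertises
}

// SyncPeers selects the peers to sync with: every sync-capable peer in
// bins >= depth, plus at most maxShallow sync-capable peers from each
// shallower bin. Peers without RoleSync (e.g. light nodes) are skipped.
func SyncPeers(peers []Peer, depth, maxShallow int) []Peer {
	var out []Peer
	perBin := make(map[int]int) // peers already picked per shallow bin
	for _, p := range peers {
		if p.Roles&RoleSync == 0 {
			continue
		}
		if p.Bin >= depth {
			out = append(out, p)
			continue
		}
		if perBin[p.Bin] < maxShallow {
			perBin[p.Bin]++
			out = append(out, p)
		}
	}
	return out
}

func main() {
	full := RoleRetrieve | RoleSync
	peers := []Peer{
		{ID: "a", Bin: 0, Roles: full},
		{ID: "b", Bin: 0, Roles: full},
		{ID: "c", Bin: 2, Roles: full},
		{ID: "d", Bin: 3, Roles: RoleRetrieve}, // no sync capability
	}
	for _, p := range SyncPeers(peers, 2, 1) {
		fmt.Println(p.ID)
	}
}
```

With `depth = 2` and `maxShallow = 1`, only one of the two bin 0 peers is selected, the bin 2 peer is always selected, and the retrieve-only peer is never suggested for syncing. Setting `maxShallow` to 0 would correspond to disabling sync on all bins < depth.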
