Accurate DONT_HAVE timeouts #385
Description
Background
When a client requests a file using a Bitswap Session, the Session broadcasts want-have for the root block's CID to all connected peers. When a peer responds with HAVE it is added to the Session.
For subsequent blocks, the Session sends want-have to all peers in the Session and want-block to one peer, optimistically. The Session waits for a response or a timeout from that peer before sending want-block to another peer.
Currently the timeout is calculated by pinging the peer to estimate its latency and then adding a constant to allow for processing time.
Proposal
It would be more accurate to instead measure the time between sending a want and receiving a block, HAVE or DONT_HAVE in response.
The MessageQueue keeps a list of sent wants. Modify the MessageQueue sent wants list to also keep the time at which the want was sent.
Add a mechanism by which the MessageQueue can listen for messages from its peer. When a message arrives, calculate the delta between the time a want was sent and the time the corresponding block, HAVE or DONT_HAVE was received. Keep an exponentially weighted moving average (EWMA) of the peer's response times and use it as the DONT_HAVE timeout.
Caveats
Typically we broadcast want-have during Session discovery and receive HAVE in response. From that point forward we have a rough estimate of latency for peers that respond.
Broadcast want-haves are sent with send-dont-have=false, meaning that if the peer does not have the block it won't respond at all. The peer may later receive the block from someone else and only then respond with HAVE, which will skew its apparent latency.
I think this is unlikely to happen often enough to matter much.