docs: add initial stream! protocol specification #1454
### Definition of stream
A protocol that facilitates data transmission between two Swarm nodes, specifically targeting sequential data in the form of a sequence of chunks as defined by Swarm. The protocol should cater for the following requirements:
- the client should be able to request arbitrary ranges from the server
- the client can be assumed to already have some of the data, and can therefore opt in to selectively request chunks based on their hashes
Aren't those two requirements actually the same? If the client can request arbitrary ranges from the server, doesn't that already mean it can selectively request chunks?
Not really: a client can, for example, ask for a range of bin indexes without a roundtrip, and that is a valid request for a range of chunks.
For more granular control, the client can opt into a roundtrip.
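A minimal Go sketch of that distinction: a plain range request needs no roundtrip, while setting a flag asks the server to offer hashes first. `RangeRequest`, its fields and `Validate` are hypothetical names for illustration only, since the actual message definitions are still a todo in this PR. The validation mirrors two rules from the draft spec: stream indexes are always > 0, and intervals are closed.

```go
package main

import "fmt"

// RangeRequest is a hypothetical request for the closed range [From, To]
// of stream indexes. Field names are illustrative, not the spec's.
type RangeRequest struct {
	From, To  uint64
	Roundtrip bool // true: server offers hashes first, client replies with the wanted ones
}

// Validate checks two rules from the draft spec: stream indexes are
// always > 0, and intervals are closed, so a valid range has From <= To.
func (r RangeRequest) Validate() error {
	if r.From == 0 {
		return fmt.Errorf("stream indexes must be > 0, got From=%d", r.From)
	}
	if r.From > r.To {
		return fmt.Errorf("invalid interval [%d, %d]", r.From, r.To)
	}
	return nil
}

func main() {
	// Plain range request: the server just delivers the range, no roundtrip.
	plain := RangeRequest{From: 1, To: 100}
	// Granular request: the client first sees the offered hashes.
	selective := RangeRequest{From: 1, To: 100, Roundtrip: true}
	fmt.Println(plain.Validate(), selective.Validate())
}
```

Both requests cover the same closed range; only the `Roundtrip` flag decides whether the client gets a chance to filter by hash before delivery.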
docs/Stream-Protocol-Spec.md
As mentioned, the client is typically expected to already have some of the data in the stream. To mitigate duplicate data transmission, the stream protocol provides a configurable message roundtrip before batch delivery, which allows the downstream peer to selectively request only the chunks it does not store at the time of the request.
This comes, expectedly, at a certain price. Since delivery batches are pre-negotiated and do not rely on the mere benevolence of nodes, we can conclude that the delivery batches are optimised for _urgency_ rather than for maximising batch utilisation (this would, however, be more apparent with unbounded streams).
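The offered/wanted roundtrip described above reduces, on the client side, to a set difference: the server offers the hashes for the requested range, and the client answers with only those it does not hold. This is a minimal Go sketch; the function name `wanted`, the hex-string hash representation and the in-memory map standing in for the local store are all assumptions for illustration.

```go
package main

import "fmt"

// wanted returns the subset of offered chunk hashes that the client does
// not already store, preserving the server's offer order. The local store
// is modelled as a set of hex-encoded hashes (assumption); a real node
// would query its chunk store instead.
func wanted(offered []string, have map[string]bool) []string {
	var want []string
	for _, h := range offered {
		if !have[h] {
			want = append(want, h)
		}
	}
	return want
}

func main() {
	offered := []string{"aa01", "bb02", "cc03"}
	have := map[string]bool{"bb02": true}
	fmt.Println(wanted(offered, have)) // [aa01 cc03]
}
```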
I don't really understand that sentence. It is also vague: "a certain price"? "benevolence of nodes"? Please give an easy-to-understand, specific real-world example to illustrate what you mean.
I've actually had a few outstanding thoughts about this design:
Also, @nonsense, I expect different requirements to pop up while implementing this (they are already coming up), so I can PR them into this document as I work on the new protocol implementation. In general, my expectation for this PR is to agree on a baseline of what should be done and to confirm that there are no reservations about the design.
- the range is defined by the client and should be strictly respected and followed by the server
- all intervals specified in protocol messages are closed (inclusive)
- when a roundtrip is configured, chunk deliveries can be handled concurrently (so their order is not guaranteed), but the server must send an end-of-batch message carrying the topmost session index to signal the end of a batch
- when a roundtrip is not configured, chunks are expected to be sent in order, one after the other
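A client-side sketch of the roundtrip batch rules, assuming a hypothetical `batch` type: deliveries are accepted in any order, and the closed interval [from, topmost] is only reported once the end-of-batch message has arrived. Waiting additionally for every wanted chunk is an assumption of this sketch, not a rule in the spec; it guards against the end-of-batch message being handled before concurrent delivery handlers finish. Because stream indexes are always > 0, a topmost of 0 can serve as the "not received yet" marker.

```go
package main

import "fmt"

// batch tracks one hypothetical roundtrip batch on the client side.
type batch struct {
	from    uint64          // first index of the requested closed interval
	pending map[string]bool // wanted hashes not yet delivered
	topmost uint64          // set by end-of-batch; 0 means not received (indexes are > 0)
}

func newBatch(from uint64, want []string) *batch {
	p := make(map[string]bool, len(want))
	for _, h := range want {
		p[h] = true
	}
	return &batch{from: from, pending: p}
}

// deliver records a chunk delivery; arrival order does not matter.
func (b *batch) deliver(hash string) { delete(b.pending, hash) }

// endOfBatch records the topmost index announced by the server.
func (b *batch) endOfBatch(topmost uint64) { b.topmost = topmost }

// done returns the closed interval [from, topmost] covered by the batch,
// once end-of-batch was seen and (assumption) no wanted chunk is pending.
func (b *batch) done() (uint64, uint64, bool) {
	if b.topmost == 0 || len(b.pending) > 0 {
		return 0, 0, false
	}
	return b.from, b.topmost, true
}

func main() {
	b := newBatch(1, []string{"aa01", "cc03"})
	b.deliver("cc03") // out-of-order delivery is fine
	b.endOfBatch(100)
	_, _, ok := b.done()
	fmt.Println(ok) // false: aa01 is still pending
	b.deliver("aa01")
	from, to, ok := b.done()
	fmt.Println(from, to, ok, to-from+1) // 1 100 true 100
}
```

Note the `to-from+1` count: because intervals are closed, [1, 100] covers 100 indexes, not 99.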
Lines 30 and 31 are not clear to me.
In the current implementation of Swarm we handle multiple messages concurrently, so depending on how we specify `ChunkDeliveryMsg`, we will be handling many of those concurrently.
From the server's perspective we always send chunks in order, since we write messages to the TCP connection in order... I don't understand why we have to specify these things here.
- stream indexes are always > 0
- syncing is an implementation of the stream protocol
- the client is expected to manage all intervals, and therefore:
  - the server is designed to be stateless, except for managing an offered/wanted roundtrip and knowing whether a stream is bounded (e.g. the server knows that syncing streams are always unbounded from the localstore perspective, since data can always enter the system; this is not the case for a live video stream, for example)
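Since the client manages all intervals, it needs some record of which closed ranges it already holds in order to pick the next range to request. A minimal Go sketch, assuming a hypothetical in-memory `intervals` type for a single stream (persistence and per-stream bookkeeping are left out):

```go
package main

import (
	"fmt"
	"sort"
)

// intervals is a hypothetical client-side record of the closed ranges
// [from, to] of one stream that are already stored locally.
type intervals struct {
	r [][2]uint64 // sorted, merged closed ranges
}

// add inserts the closed range [from, to] and merges overlapping or
// adjacent ranges (e.g. [1, 10] and [11, 15] become [1, 15]).
func (iv *intervals) add(from, to uint64) {
	iv.r = append(iv.r, [2]uint64{from, to})
	sort.Slice(iv.r, func(i, j int) bool { return iv.r[i][0] < iv.r[j][0] })
	merged := [][2]uint64{iv.r[0]}
	for _, cur := range iv.r[1:] {
		last := &merged[len(merged)-1]
		if cur[0] <= last[1]+1 { // overlapping or adjacent closed ranges
			if cur[1] > last[1] {
				last[1] = cur[1]
			}
			continue
		}
		merged = append(merged, cur)
	}
	iv.r = merged
}

// next returns the first index not yet stored, starting at 1
// because stream indexes are always > 0.
func (iv *intervals) next() uint64 {
	n := uint64(1)
	for _, r := range iv.r {
		if r[0] > n {
			break // gap found before this range
		}
		if r[1] >= n {
			n = r[1] + 1
		}
	}
	return n
}

func main() {
	var iv intervals
	iv.add(1, 10)
	iv.add(21, 30)
	iv.add(11, 15) // adjacent to [1, 10]
	fmt.Println(iv.r, iv.next()) // [[1 15] [21 30]] 16
}
```

With this record on the client, the server only has to answer the range it is asked for, which is what keeps it (mostly) stateless.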
I suggest we format the exception cases here a bit better. There are two cases:
- a specific `GetRange` flow: 1. get range, 2. offered, 3. wanted, 4. deliver and batch done.
- unbounded streams: the server knows that the client has requested an unbounded stream, so it does what we have up at line 32. For that to happen, the server keeps state on that request type until the connection is dead or the client says stop.
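The four-step `GetRange` flow listed above can be checked with a small transition table. This Go sketch covers only the bounded flow with a roundtrip; the step names are illustrative rather than the spec's message names, and letting batch-done directly follow wanted (when the client wants nothing) is an assumption.

```go
package main

import "fmt"

// step enumerates the messages of one bounded GetRange flow.
type step int

const (
	getRange step = iota
	offered
	wanted
	delivery
	batchDone
)

// allowed lists which message may follow which within one flow.
var allowed = map[step][]step{
	getRange: {offered},
	offered:  {wanted},
	wanted:   {delivery, batchDone}, // assumption: empty wanted set ends the batch
	delivery: {delivery, batchDone},
}

// validFlow reports whether msgs starts with GetRange, follows the
// transition table and ends with batch-done.
func validFlow(msgs []step) bool {
	if len(msgs) == 0 || msgs[0] != getRange {
		return false
	}
	for i := 1; i < len(msgs); i++ {
		ok := false
		for _, next := range allowed[msgs[i-1]] {
			if next == msgs[i] {
				ok = true
				break
			}
		}
		if !ok {
			return false
		}
	}
	return msgs[len(msgs)-1] == batchDone
}

func main() {
	flow := []step{getRange, offered, wanted, delivery, delivery, batchDone}
	fmt.Println(validFlow(flow))                     // true
	fmt.Println(validFlow([]step{getRange, wanted})) // false
}
```

Everything the server must remember for this flow lives between `offered` and `batchDone`; the unbounded-stream case would need separate state kept until disconnect or stop.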
@acud I think you've converged on something simpler than what we already have. I suggest we iterate on it again and make this document a bit more succinct; in my opinion it has too much prose for what the protocol is about.
…or message definitions
As discussed with @nonsense, we are merging this in a Draft state (see the table at the top of the spec .md file), so we can iterate on it through the new syncer PR without having to maintain two different PRs at the same time.
This PR adds a spec for the stream! protocol.
The proposed design should bring more clarity to the implementation, remove unnecessary abstractions and simplify both the server side and the client side.
The aim is to remove as much state management as possible. Ideally we would create a completely stateless server; this is, however, not fully possible when there is an `offered`/`wanted` roundtrip.
Human-readable version here: https://github.com/ethersphere/swarm/blob/stream-spec/docs/Stream-Protocol-Spec.md
Todo:
- `StreamState` codes and possible errors