This repository was archived by the owner on Aug 2, 2021. It is now read-only.

docs: add initial stream! protocol specification#1454

Merged
acud merged 8 commits into master from stream-spec on Jun 20, 2019

Conversation

@acud
Contributor

@acud acud commented Jun 11, 2019

This PR adds a spec for the stream! protocol.

The proposed design should bring more clarity to the implementation, remove unnecessary abstractions, and simplify both the server side and the client side.
The aim is to remove as much state management as possible. Ideally we would have a completely stateless server; this is, however, not fully possible when an offered/wanted roundtrip is configured.

human-readable version here: https://github.com/ethersphere/swarm/blob/stream-spec/docs/Stream-Protocol-Spec.md

Todo:

  • examples of the messages are still missing
  • diagram of message exchange
  • diagram of initial stream query (getting session indexes etc)
  • diagram of unbounded stream
  • diagram of bounded stream
  • define all StreamState codes and possible errors

### definition of stream
a protocol that facilitates data transmission between two swarm nodes, specifically targeting sequential data in the form of a sequence of chunks as defined by swarm. the protocol should cater for the following requirements:
- client should be able to request arbitrary ranges from the server
- client can be assumed to have some of the data already and therefore can opt in to selectively request chunks based on their hashes
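The two request shapes above can be sketched as message types. This is a minimal illustration only; all names and fields here are assumptions, not identifiers from the spec.

```go
// Sketch of the two request shapes described above. All names and
// field choices are illustrative assumptions, not taken from the spec.
package main

import "fmt"

// GetRange asks the server for chunks in a closed bin-index interval.
type GetRange struct {
	Stream    string // stream name, e.g. a syncing bin (assumed encoding)
	From      uint64 // first index, inclusive
	To        uint64 // last index, inclusive; 0 could signal unbounded
	Roundtrip bool   // whether to negotiate hashes before delivery
}

// WantedHashes is the optional roundtrip reply: the client flags which
// of the offered chunks it is missing and wants delivered.
type WantedHashes struct {
	Ruid uint64 // request id correlating with the server's offer
	Want []bool // Want[i] == true means "deliver offered hash i"
}

func main() {
	req := GetRange{Stream: "SYNC|6", From: 1, To: 1000, Roundtrip: true}
	fmt.Printf("request %s [%d,%d] roundtrip=%v\n", req.Stream, req.From, req.To, req.Roundtrip)
}
```

The first requirement maps onto `GetRange` alone; the second adds the `WantedHashes` roundtrip on top of it.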
Contributor

@nonsense commented Jun 14, 2019


aren't those two requirements actually the same requirement? if client can request arbitrary ranges from the server, then it basically means that the client can selectively request chunks?

Contributor Author


not really: bin index ranges can be requested without a roundtrip, for example, and that constitutes a valid request for a range of chunks.
for more granular control you have the option of a roundtrip

- client can be assumed to have some of the data already and therefore can opt in to selectively request chunks based on their hashes

As mentioned, the client is typically expected to already have some of the data in the stream. to mitigate duplicate data transmission, the stream protocol provides a configurable message roundtrip before batch delivery, which allows the downstream peer to selectively request only the chunks it does not store at the time of the request.
This comes, expectedly, at a certain price. Since delivery batches are pre-negotiated and do not rely on the mere benevolence of nodes, delivery batches are optimised for _urgency_ rather than for maximising batch utilisation (this would, however, be more apparent with unbounded streams).
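The selective-request step described above can be sketched as a small function: the client walks the offered hashes and flags only those it does not already store. Function name and signature are assumptions for illustration.

```go
// Minimal sketch of the offered/wanted selection step: flag only the
// offered chunks the client is missing. Names are assumptions.
package main

import "fmt"

// selectWanted returns a bit-per-hash want vector: true where the
// client does not store the offered chunk and wants it delivered.
func selectWanted(offered []string, have map[string]bool) []bool {
	want := make([]bool, len(offered))
	for i, h := range offered {
		want[i] = !have[h]
	}
	return want
}

func main() {
	offered := []string{"aa", "bb", "cc"}
	have := map[string]bool{"bb": true} // chunk bb is already stored locally
	fmt.Println(selectWanted(offered, have)) // only aa and cc are wanted
}
```

The price mentioned above is the extra roundtrip latency: delivery cannot start until this want vector has travelled back to the server.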
Contributor


I don't really understand that sentence. It is also vague - certain price? benevolence of nodes? Please give easy to understand and specific example of real-world scenarios to illustrate what you mean.

@acud
Contributor Author

acud commented Jun 16, 2019

I've actually had a few outstanding thoughts with this design:

  1. Retrieve requests are actually not covered within the scope of this document. Now comes the question: should we keep retrieve requests as part of the stream protocol, or are they a good refactoring candidate to be pulled out into a separate spec? the fact that streams are defined as streams of chunks with monotonically increasing indexes means that we look at streams through the identity of their bin indexes rather than through their hashes.
  2. Maybe we should consider pulling retrieve requests into a separate protocol; that would also make it easier to account for adaptive nodes. in addition, a paid symmetric counterpart to retrieve requests could be added to this new protocol to cover both cases of: (a) i want to pay someone to fetch something for me (b) i want to pay someone to push something onto the network for me.
  3. I am not 100% sure that handling deliveries at the individual chunk level is effective for hauling large amounts of data. I'd really like to have a special delivery message that can carry multiple chunks, at least for syncing (the devp2p max message size is 16mb; to me it seems rather wasteful not to make use of that frame size. we could deliver almost 4000 chunks in one go with this option). That being said, this should and can easily be benchmarked.
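The arithmetic behind point 3 can be checked back-of-the-envelope. Assuming 4 KiB chunks each prefixed by a 32-byte address (the exact per-chunk framing overhead is an assumption here):

```go
// Rough capacity check for a multi-chunk delivery message under the
// 16 MiB devp2p message size limit. The 32-byte per-chunk address
// overhead is an assumption for illustration.
package main

import "fmt"

const (
	maxFrame  = 16 * 1024 * 1024 // devp2p max message size
	chunkSize = 4096             // swarm chunk payload size
	addrSize  = 32               // assumed per-chunk address prefix
)

// chunksPerMessage returns how many chunks fit in one frame.
func chunksPerMessage() int {
	return maxFrame / (chunkSize + addrSize)
}

func main() {
	fmt.Println(chunksPerMessage()) // on the order of 4000 chunks per message
}
```

That lands close to the "almost 4000 chunks" figure in the comment; real framing overhead would shave a little more off.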

@acud
Contributor Author

acud commented Jun 16, 2019

also, @nonsense, i'm taking into account that different requirements will pop up while implementing this (they are already coming up), so i could PR them into this document while working on the new protocol implementation. in general, my expectation from this PR is to agree on a baseline of what should be done and to see that there are no reservations about the design

@acud acud mentioned this pull request Jun 17, 2019
22 tasks
- range is defined by client and should be strictly respected and followed by server
- all intervals specified in protocol messages are closed (inclusive)
- when roundtrip is configured - chunk deliveries can be handled concurrently (therefore their order is not guaranteed), but a server end-of-batch with topmost session index must be sent to signal the end of a batch
- when roundtrip is not configured - chunks are expected to be sent in order, one after the other
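The roundtrip case above implies a client-side completion rule: deliveries may arrive in any order, and the batch only counts as done once the server's end-of-batch message (carrying the topmost session index) has arrived and every wanted chunk has been delivered. A minimal sketch, with all names assumed:

```go
// Sketch of client-side batch completion for the roundtrip case:
// out-of-order deliveries plus an explicit end-of-batch signal.
// All names here are illustrative assumptions.
package main

import "fmt"

type batch struct {
	wanted  map[uint64]bool // indexes the client asked for in the roundtrip
	topmost uint64          // set by the end-of-batch message; 0 = pending
}

// deliver records an incoming chunk delivery; order does not matter.
func (b *batch) deliver(idx uint64) { delete(b.wanted, idx) }

// endOfBatch records the server's end-of-batch signal.
func (b *batch) endOfBatch(topmost uint64) { b.topmost = topmost }

// done reports whether the batch is complete.
func (b *batch) done() bool { return b.topmost != 0 && len(b.wanted) == 0 }

func main() {
	b := &batch{wanted: map[uint64]bool{1: true, 2: true, 3: true}}
	for _, idx := range []uint64{3, 1, 2} { // deliveries arrive out of order
		b.deliver(idx)
	}
	fmt.Println(b.done()) // false: end-of-batch not yet received
	b.endOfBatch(3)
	fmt.Println(b.done()) // true
}
```

Without the roundtrip, none of this bookkeeping is needed: chunks arrive strictly in order, so the client's highest seen index is the batch state.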
Contributor


lines 30 and 31 are not clear to me.

in the current impl of Swarm, we handle multiple messages concurrently, so depending on how we specify the ChunkDeliveryMsg, we will be handling many of those concurrently.

from a server perspective we always send chunks in order as we write messages to the TCP connection in order... i don't understand why we have to specify these things here.

- stream indexes are always > 0
- syncing is an implementation of the stream protocol
- the client is expected to manage all intervals, and therefore:
- the server is designed to be stateless, except for managing an offered/wanted roundtrip and the knowledge of whether a stream is bounded (e.g. the server knows that syncing streams are always unbounded from the localstore perspective - data can always enter the system; this is not the case for a live video stream, for example)
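The client-side interval management implied above can be sketched as follows: the client records which closed ranges it has completed and derives the next range to request, so the server needs to keep no such state. Names are assumptions for illustration.

```go
// Sketch of client-side interval bookkeeping: completed closed ranges
// are tracked locally and the next index to request is derived from
// them. Names and structure are assumptions.
package main

import "fmt"

type interval struct{ from, to uint64 } // closed: both ends inclusive

// nextFrom returns the first index at or after start that is not
// covered by the (sorted, non-overlapping) completed intervals.
func nextFrom(done []interval, start uint64) uint64 {
	next := start
	for _, iv := range done {
		if iv.from <= next && next <= iv.to {
			next = iv.to + 1
		}
	}
	return next
}

func main() {
	done := []interval{{1, 3}, {5, 6}} // ranges already synced
	fmt.Println(nextFrom(done, 1))     // 4: the first uncovered index
}
```

Because the intervals live entirely on the client, a reconnecting client can resume from its own records without the server remembering anything about it.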
Contributor


i suggest we format the exception cases here a bit better. there are two cases:

  • a specific GetRange flow -> 1. get range, 2. offered, 3. wanted, 4. deliver and batch done.
  • unbounded streams - the server knows that the client has requested an unbounded stream, so it does what we have up at line 32. for that to happen, the server keeps state on that request type until the connection is dead or the client says stop.

@nonsense
Contributor

@acud i think you've converged on something simpler than what we already have. i suggest we iterate on it again and make this document a bit more succinct - it has too much prose for what the protocol is about, in my opinion.

@acud acud merged commit 8afb316 into master Jun 20, 2019
@acud
Contributor Author

acud commented Jun 20, 2019

As discussed with @nonsense, we are merging this in a Draft state (see table at the top of the spec MD), so we can iterate over this through the new syncer PR without having to maintain two different PRs at the same time

@acud acud deleted the stream-spec branch June 20, 2019 10:22
@acud acud restored the stream-spec branch July 1, 2019 08:50
@acud acud deleted the stream-spec branch July 5, 2019 11:44