Context
What is GraphSync?
GraphSync is a protocol to synchronize IPLD graphs across peers. The full specification can be found here: https://github.com/ipld/specs/blob/master/graphsync/graphsync.md
Where Bitswap deals with requesting individual blocks from other peers, GraphSync assumes that the data you're interested in is a node or set of nodes in an IPLD graph. It also lets you ask remote peers to return the results of querying the graph with an IPLD Selector.
There are potentially several use cases for requesting data this way, but perhaps the simplest to understand is getting a single node at a deeply nested path. Using only Bitswap, getting to a deeply nested path means several roundtrips: making a query for a block, receiving it, following a link, requesting the next block, and repeating until you reach the final link in the path. With GraphSync, you could make a query to a remote node using a path selector and have that node perform the traversal of the path locally, then send you all the blocks in the path at once, in a single roundtrip. (It has to send all the blocks in the path so that you can verify the results locally, starting from the block you already have.)
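The roundtrip difference described above can be illustrated with a toy model. This is only a sketch: `block`, `store`, `fetchBlockByBlock`, and `fetchWithPathSelector` are hypothetical names invented for illustration, blocks are identified by plain strings rather than CIDs, and none of this is the go-graphsync or Bitswap API.

```go
package main

import "fmt"

// block is a toy stand-in for an IPLD block: named links to child block IDs.
type block struct {
	links map[string]string
}

// store is a toy remote peer holding blocks by ID.
type store map[string]block

// get simulates one request/response exchange for a single block.
func (s store) get(id string) (block, error) {
	b, ok := s[id]
	if !ok {
		return block{}, fmt.Errorf("block %q not found", id)
	}
	return b, nil
}

// fetchBlockByBlock mimics the Bitswap-style flow: request a block, follow
// one link, request the next block. Every block costs one roundtrip.
func fetchBlockByBlock(remote store, root string, path []string) ([]string, int, error) {
	ids := []string{}
	roundtrips := 0
	current := root
	for i := 0; i <= len(path); i++ {
		b, err := remote.get(current)
		roundtrips++
		if err != nil {
			return nil, roundtrips, err
		}
		ids = append(ids, current)
		if i == len(path) {
			break
		}
		next, ok := b.links[path[i]]
		if !ok {
			return nil, roundtrips, fmt.Errorf("block %q has no link %q", current, path[i])
		}
		current = next
	}
	return ids, roundtrips, nil
}

// fetchWithPathSelector mimics the GraphSync-style flow: the remote peer walks
// the path locally (modelled here by reusing the same walk) and returns every
// block on the path in one response, so the requester pays for one roundtrip.
func fetchWithPathSelector(remote store, root string, path []string) ([]string, int, error) {
	ids, _, err := fetchBlockByBlock(remote, root, path)
	return ids, 1, err
}

func main() {
	remote := store{
		"root": {links: map[string]string{"a": "A"}},
		"A":    {links: map[string]string{"b": "B"}},
		"B":    {links: map[string]string{"c": "C"}},
		"C":    {},
	}
	path := []string{"a", "b", "c"}

	ids, n, err := fetchBlockByBlock(remote, "root", path)
	fmt.Println("block by block:", ids, "roundtrips:", n, "err:", err)

	ids, n, err = fetchWithPathSelector(remote, "root", path)
	fmt.Println("path selector: ", ids, "roundtrips:", n, "err:", err)
}
```

In both cases the requester ends up with the same set of blocks; only the number of exchanges differs, and it grows with path depth in the block-by-block case.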
Status of go-graphsync
go-graphsync is the initial implementation of GraphSync in Go. It is nearing alpha 'feature complete' status as of April 2019, and will be ready for use around the beginning of May. go-graphsync was initially written to support use cases in Filecoin. Filecoin's integration, however, is not likely to begin until circa Q3 2019.
Important Caveat: The initial implementation of go-graphsync is entirely single peer to single peer -- a graphsync request is made directly to only one peer at a time. It assumes that you've already found providers for the graph. It assumes you know the peer you are requesting from has the data you want, or that you will write the code to query multiple peers outside of go-graphsync.
Status of go-ipld-prime
go-ipld-prime is a complete rewrite of the Go implementation of the IPLD specification that the IPLD team has been working on for some time. It is relevant to GraphSync because:
- go-ipld-prime is the only implementation of IPLD in Go that supports IPLD selectors.
- go-graphsync therefore relies on go-ipld-prime in its implementation and assumes the underlying node format for the DAG is one that is compatible with go-ipld-prime (relevant in particular because go-ipld-prime does not currently support nodes encoded in protobuf).
go-ipld-prime looks very different from go-ipld-format, and switching all or parts of IPFS to use it is potentially a large task.
IPFS Use Cases For Graphsync
This RFC is in part meant to identify potential use cases for GraphSync.
The most obvious use case for GraphSync is working with UnixFS directories. GraphSync provides an efficient method for requesting a deeply nested path in a UnixFS directory. With augmentation, it might provide a more efficient way to transfer entire UnixFS directories without requiring as many roundtrips to traverse the directory and request more nodes.
There are potentially other use cases anywhere the data that IPFS works with is in a DAG structure. We can use discussion in this issue to identify some of these use cases.
Integration Path, Questions, Challenges
UnixFS & GraphSync
As stated before, GraphSync relies on go-ipld-prime which currently does not support nodes that use an internal protobuf serialization format. UnixFS, at least in its v1 implementation, uses protobufs for serializing nodes.
Therefore, to integrate GraphSync with UnixFS, we would either need to augment go-ipld-prime with protobuf support or first complete UnixFS v2, which is intended to be based on go-ipld-prime directly. Given that supporting protobufs in go-ipld-prime is non-trivial and potentially quite challenging, and moreover that having UnixFS v2 complete would unlock a number of potential features, it seems like it would be much easier to simply prioritize UnixFS v2 and wait on its completion to integrate GraphSync.
Independent CoreAPI GraphSync
While potentially not as useful to users of IPFS (most importantly package managers) as UnixFS integration, it would be useful for real-world testing to be able to make GraphSync queries on the real IPFS network from the IPFS Core API or the command line. Enabling GraphSync in the CoreAPI would also allow people writing applications on top of IPFS to experiment with how GraphSync might unlock new types of uses for IPFS.
The path to CoreAPI GraphSync integration is potentially much shorter than integration in UnixFS: it simply requires agreeing on a specification for what the CoreAPI function signatures would look like, and then implementing them. There are no obvious blockers for proceeding with CoreAPI integration once go-graphsync is feature complete.
Future Integrations / Supporting Providers/DHT Work
The ability to query into a graph from a root node might pair well with efforts to experiment with different strategies for providing only some nodes in a DAG. However, this probably will need to wait for further work on provider strategies.