RFC: GraphSync integration #6208

@hannahhoward

Context

What is GraphSync?

GraphSync is a protocol to synchronize IPLD graphs across peers. The full specification can be found here: https://github.com/ipld/specs/blob/master/graphsync/graphsync.md

Where Bitswap deals with requesting individual blocks from other peers, GraphSync assumes that the data you're interested in is a node or set of nodes in an IPLD graph. It allows you to ask a remote peer to return the results of querying that graph with an IPLD Selector.

There are potentially several use cases for requesting data this way, but perhaps the simplest to understand is fetching a single node at a deeply nested path. Using only Bitswap, reaching a deeply nested path takes several roundtrips: you request a block, receive it, follow a link, request the next block, and repeat until you reach the final link in the path. With GraphSync, you could send a remote node a query with a path selector and have that node traverse the path locally, then send back all the blocks in the path at once, in a single roundtrip. (It has to send every block in the path so that you can verify the results locally, starting from the block you already have.)
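To make the roundtrip difference concrete, here is a minimal sketch that only counts request/response exchanges. It is a toy model, not go-graphsync or Bitswap code: `Block`, `Store`, `FetchBlockByBlock`, and `FetchWithPathQuery` are names assumed for illustration, CIDs are plain strings, and the "remote peer" is an in-memory map.

```go
package main

import "fmt"

// Block is a toy stand-in for an IPLD block: a set of named links to children.
type Block struct {
	Links map[string]string // link name -> CID of the child block
}

// Store is a toy stand-in for the remote peer's blockstore, keyed by CID.
type Store map[string]Block

// FetchBlockByBlock models the block-at-a-time flow: every block on the path
// costs one request/response exchange with the remote peer. The returned
// count is the number of exchanges that successfully returned a block.
func FetchBlockByBlock(remote Store, root string, path []string) ([]string, int, error) {
	blk, ok := remote[root]
	if !ok {
		return nil, 0, fmt.Errorf("block %s not found", root)
	}
	cids, trips := []string{root}, 1
	for _, name := range path {
		next, ok := blk.Links[name]
		if !ok {
			return nil, trips, fmt.Errorf("no link %q under %s", name, cids[len(cids)-1])
		}
		if blk, ok = remote[next]; !ok {
			return nil, trips, fmt.Errorf("block %s not found", next)
		}
		cids = append(cids, next)
		trips++ // one more exchange for the block just fetched
	}
	return cids, trips, nil
}

// FetchWithPathQuery models the path-query flow: the remote peer performs the
// same walk against its own store and ships every block back in one response.
func FetchWithPathQuery(remote Store, root string, path []string) ([]string, int, error) {
	cids, _, err := FetchBlockByBlock(remote, root, path) // conceptually runs on the remote side
	if err != nil {
		return nil, 1, err
	}
	return cids, 1, nil
}

func main() {
	remote := Store{
		"cid-root": {Links: map[string]string{"a": "cid-a"}},
		"cid-a":    {Links: map[string]string{"b": "cid-b"}},
		"cid-b":    {Links: map[string]string{"c": "cid-c"}},
		"cid-c":    {},
	}
	path := []string{"a", "b", "c"}
	_, slow, _ := FetchBlockByBlock(remote, "cid-root", path)
	_, fast, _ := FetchWithPathQuery(remote, "cid-root", path)
	fmt.Println("block by block:", slow, "roundtrips")
	fmt.Println("path query:", fast, "roundtrip")
}
```

In this model the block-at-a-time cost grows with the depth of the path, while the path query stays at one exchange regardless of depth. Both flows return the same list of blocks, which is what lets the requester check the chain of links itself.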

Status of go-graphsync

go-graphsync is the initial implementation of GraphSync in Go. It is nearing alpha 'feature complete' status as of April 2019, and will be ready for use around the beginning of May. go-graphsync was initially written to support use cases in Filecoin. Filecoin's integration, however, is not likely to begin until circa Q3 2019.

Important Caveat: The initial implementation of go-graphsync is entirely single peer to single peer -- a graphsync request is made directly to only one peer at a time. It assumes that you've already found providers for the graph. It assumes you know the peer you are requesting from has the data you want, or that you will write the code to query multiple peers outside of go-graphsync.
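As a rough illustration of what "writing the code to query multiple peers outside of go-graphsync" could look like, here is a minimal sketch that tries candidate peers one at a time. Everything in it is an assumption made for the example: `PeerID`, `RequestFunc`, and `RequestFromAny` are hypothetical names, and `RequestFunc` stands in for whatever single-peer request call the caller actually has. It does not use the go-graphsync API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// PeerID is a placeholder for a real peer identifier type.
type PeerID string

// RequestFunc is a hypothetical hook: it issues one single-peer request and
// returns the CIDs of the blocks that peer sent back.
type RequestFunc func(ctx context.Context, p PeerID) ([]string, error)

// RequestFromAny tries each candidate peer in order and returns the first
// successful response along with the peer that produced it.
func RequestFromAny(ctx context.Context, peers []PeerID, request RequestFunc) (PeerID, []string, error) {
	if len(peers) == 0 {
		return "", nil, errors.New("no candidate peers")
	}
	var lastErr error
	for _, p := range peers {
		if err := ctx.Err(); err != nil {
			return "", nil, err // caller cancelled or timed out
		}
		blocks, err := request(ctx, p)
		if err == nil {
			return p, blocks, nil
		}
		lastErr = fmt.Errorf("peer %s: %w", p, err)
	}
	return "", nil, fmt.Errorf("all %d peers failed, last error: %w", len(peers), lastErr)
}

// demo wires RequestFromAny to a fake request function in which only peerB
// holds the graph.
func demo() (PeerID, []string, error) {
	fake := func(_ context.Context, p PeerID) ([]string, error) {
		if p != "peerB" {
			return nil, errors.New("graph not available")
		}
		return []string{"cid-root", "cid-a"}, nil
	}
	return RequestFromAny(context.Background(), []PeerID{"peerA", "peerB", "peerC"}, fake)
}

func main() {
	p, blocks, err := demo()
	fmt.Println(p, blocks, err)
}
```

Sequential fallback is the simplest strategy; it still assumes the candidate list came from some provider lookup done elsewhere.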

Status of go-ipld-prime

go-ipld-prime is a complete rewrite of the Go implementation of the IPLD specification that the IPLD team has been working on for some time. It is relevant to GraphSync because:

  1. go-ipld-prime is the only implementation of IPLD in Go that supports IPLD selectors
  2. go-graphsync therefore relies on go-ipld-prime in its implementation and assumes the underlying node format for the DAG is one compatible with go-ipld-prime (relevant in particular because go-ipld-prime does not currently support nodes encoded in protobuf)

go-ipld-prime looks very different from go-ipld-format, and switching all or parts of IPFS to use it is potentially a large task.

IPFS Use Cases For Graphsync

This RFC is intended in part to identify potential use cases for GraphSync.

The most obvious use case for GraphSync is working with UnixFS directories. GraphSync provides an efficient method for requesting a deeply nested path in a UnixFS directory. With augmentation, it might provide a more efficient way to transfer entire UnixFS directories without requiring as many roundtrips to traverse the directory and request more nodes.

There are potentially other use cases anywhere the data that IPFS works with is in a DAG structure. We can use discussion in this issue to identify some of these use cases.

Integration Path, Questions, Challenges

UnixFS & GraphSync

As stated before, GraphSync relies on go-ipld-prime which currently does not support nodes that use an internal protobuf serialization format. UnixFS, at least in its v1 implementation, uses protobufs for serializing nodes.

Therefore, to integrate GraphSync with UnixFS, we would either need to augment go-ipld-prime with protobuf support OR we would need to first complete UnixFS v2, which is intended to be based on go-ipld-prime directly. Given that supporting protobufs in go-ipld-prime is non-trivial and potentially quite challenging, and moreover that having UnixFS v2 complete would unlock a number of potential features, it seems like it would be much easier to simply prioritize UnixFS v2 and wait on its completion to integrate GraphSync.

Independent CoreAPI GraphSync

While potentially less useful to users of IPFS, and most importantly to package managers, it would be useful for real-world testing to be able to make GraphSync queries on the real IPFS network from the IPFS Core API or the command line. Enabling GraphSync in the CoreAPI would also allow people writing applications on top of IPFS to experiment with how GraphSync might unlock new types of uses for IPFS.

The path to CoreAPI GraphSync integration is potentially much shorter than integration in UnixFS: it simply requires agreeing on a specification for what the CoreAPI function signatures would look like, and then implementing them. There are no obvious blockers for proceeding with CoreAPI integration once go-graphsync is feature complete.
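As a starting point for that discussion, here is one possible shape for such a signature, written as a self-contained sketch. It is purely hypothetical: `GraphSyncAPI`, its `Request` method, and the `cannedAPI` test double are invented for this example and do not exist in the IPFS CoreAPI or in go-graphsync. Peers, CIDs, and paths are plain strings, and a real version would presumably take a proper IPLD selector rather than a path.

```go
package main

import (
	"context"
	"fmt"
)

// GraphSyncAPI is a HYPOTHETICAL CoreAPI surface, not an agreed specification.
type GraphSyncAPI interface {
	// Request asks a single peer to evaluate a path under root and returns
	// the CIDs of the blocks it sent back, in traversal order.
	Request(ctx context.Context, peer, root string, path []string) ([]string, error)
}

// cannedAPI is a test double that answers from a fixed table:
// peer -> root CID -> CIDs along the one path that peer knows about.
type cannedAPI map[string]map[string][]string

func (c cannedAPI) Request(ctx context.Context, peer, root string, path []string) ([]string, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	cids, ok := c[peer][root]
	if !ok {
		return nil, fmt.Errorf("peer %q has no response for root %q", peer, root)
	}
	if len(path)+1 > len(cids) {
		return nil, fmt.Errorf("path of length %d is deeper than the stored graph", len(path))
	}
	return cids[:len(path)+1], nil // root plus one block per path segment
}

// fetchDepth shows application code written against the interface only, so
// the backing implementation can be swapped without changing callers.
func fetchDepth(api GraphSyncAPI, peer, root string, path []string) (int, error) {
	cids, err := api.Request(context.Background(), peer, root, path)
	if err != nil {
		return 0, err
	}
	return len(cids), nil
}

func main() {
	api := cannedAPI{"peerA": {"cid-root": {"cid-root", "cid-a", "cid-b"}}}
	n, err := fetchDepth(api, "peerA", "cid-root", []string{"a", "b"})
	fmt.Println(n, err)
}
```

Taking the peer as an explicit argument mirrors the single-peer nature of the initial go-graphsync implementation; whether the CoreAPI should expose that detail is one of the questions the specification would need to settle.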

Future Integrations / Supporting Providers/DHT Work

The ability to query into a graph from a root node might pair well with efforts to experiment with different strategies for providing only some nodes in a DAG. However, this probably will need to wait for further work on provider strategies.
