RFC: GraphSync integration #6208

## Context

### What is GraphSync?

GraphSync is a protocol to synchronize IPLD graphs across peers. The full specification can be found here: https://github.com/ipld/specs/blob/master/graphsync/graphsync.md

Where Bitswap deals with requesting individual blocks from other peers, GraphSync assumes that the data you're interested in is a node or set of nodes in an IPLD graph. It also lets you ask a remote peer to return the results of querying that graph with an [IPLD Selector](https://github.com/ipld/specs/blob/master/selectors/selectors.md).

There are potentially several use cases for requesting data this way, but perhaps the simplest to understand is getting a single node at a deeply nested path. Using only Bitswap, reaching a deeply nested path takes several roundtrips: you request a block, receive it, follow a link, request the next block, and repeat until you reach the final link in the path. With GraphSync, you could make a query to a remote node using a path selector and have that node perform the traversal locally, then send you all the blocks in the path at once, in a single roundtrip. (It has to send every block in the path so that you can verify the results locally, starting from the block you already have.)
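To make the roundtrip difference concrete, here is a minimal toy model in Go. It does not use go-graphsync, Bitswap, or real CIDs: `Block`, `Remote`, `GetBlock`, `Query`, and `ResolveWithBitswap` are all hypothetical names invented for this sketch, blocks are identified by plain strings, and for simplicity both approaches also fetch the root block. It only counts how many requests each approach makes to the remote peer.

```go
package main

import (
	"errors"
	"fmt"
)

// Block is a toy stand-in for an IPLD block: a set of named links to other
// blocks. Real blocks are addressed by CID and carry encoded data.
type Block struct {
	Links map[string]string
}

// Remote is a hypothetical peer that holds blocks and counts requests served.
type Remote struct {
	Blocks     map[string]Block
	RoundTrips int
}

var errNotFound = errors.New("block or link not found")

// GetBlock models a block-at-a-time request: one block per roundtrip.
func (r *Remote) GetBlock(id string) (Block, error) {
	r.RoundTrips++
	b, ok := r.Blocks[id]
	if !ok {
		return Block{}, errNotFound
	}
	return b, nil
}

// Query models a path query: the remote walks the path locally and returns
// every block it visited, in order, in a single roundtrip.
func (r *Remote) Query(root string, path []string) ([]Block, error) {
	r.RoundTrips++
	cur, ok := r.Blocks[root]
	if !ok {
		return nil, errNotFound
	}
	visited := []Block{cur}
	for _, seg := range path {
		next, ok := cur.Links[seg]
		if !ok {
			return nil, errNotFound
		}
		if cur, ok = r.Blocks[next]; !ok {
			return nil, errNotFound
		}
		visited = append(visited, cur)
	}
	return visited, nil
}

// ResolveWithBitswap follows the path from the requester's side, asking the
// remote for one block at a time.
func ResolveWithBitswap(r *Remote, root string, path []string) (Block, error) {
	cur, err := r.GetBlock(root)
	if err != nil {
		return Block{}, err
	}
	for _, seg := range path {
		next, ok := cur.Links[seg]
		if !ok {
			return Block{}, errNotFound
		}
		if cur, err = r.GetBlock(next); err != nil {
			return Block{}, err
		}
	}
	return cur, nil
}

// sampleRemote builds a four-block chain: root -a-> blkA -b-> blkB -c-> blkC.
func sampleRemote() *Remote {
	return &Remote{Blocks: map[string]Block{
		"root": {Links: map[string]string{"a": "blkA"}},
		"blkA": {Links: map[string]string{"b": "blkB"}},
		"blkB": {Links: map[string]string{"c": "blkC"}},
		"blkC": {},
	}}
}

func main() {
	path := []string{"a", "b", "c"}

	bs := sampleRemote()
	if _, err := ResolveWithBitswap(bs, "root", path); err != nil {
		fmt.Println("block-at-a-time error:", err)
		return
	}

	gs := sampleRemote()
	blocks, err := gs.Query("root", path)
	if err != nil {
		fmt.Println("path query error:", err)
		return
	}

	fmt.Printf("block-at-a-time: %d roundtrips\n", bs.RoundTrips)
	fmt.Printf("path query: %d roundtrip(s), %d blocks returned\n", gs.RoundTrips, len(blocks))
}
```

In this model the block-at-a-time cost grows with the depth of the path, while the path query stays at one request regardless of depth. The sketch leaves out verification of the returned blocks, which a real requester would still need to do.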

### Status of go-graphsync

[go-graphsync](https://github.com/ipfs/go-graphsync) is the initial implementation of GraphSync in Go. It is nearing alpha 'feature complete' status as of April 2019, and will be ready for use around the beginning of May 2019. `go-graphsync` was initially written to support use cases in Filecoin. Filecoin's integration, however, is not likely to begin until around Q3 2019.

Important Caveat: The initial implementation of go-graphsync is entirely single peer to single peer: a GraphSync request is made directly to only one peer at a time. It assumes that you've already found providers for the graph, and that you either know the peer you are requesting from has the data you want or will write the code to query multiple peers outside of go-graphsync.

### Status of go-ipld-prime

[go-ipld-prime](https://github.com/ipld/go-ipld-prime) is a complete rewrite of the Go implementation of the IPLD specification that the IPLD team has been working on for some time. It is relevant to GraphSync because:
1. `go-ipld-prime` is the only implementation of IPLD in Go that supports IPLD selectors.
2. `go-graphsync` therefore relies on `go-ipld-prime` in its implementation and assumes the underlying node format for the DAG is one that is compatible with `go-ipld-prime` (relevant in particular because `go-ipld-prime` does not currently support nodes encoded in protobuf).

`go-ipld-prime` looks very different from `go-ipld-format`, and switching all or parts of IPFS to use it is potentially a large task.

## IPFS Use Cases For Graphsync

This RFC is in part meant to identify potential use cases for GraphSync.

The most obvious use case for GraphSync is working with UnixFS directories. GraphSync provides an efficient method for requesting a deeply nested path in a UnixFS directory. With augmentation, it might provide a more efficient way to transfer entire UnixFS directories, without requiring as many roundtrips to traverse the directory and request more nodes.

There are potentially other use cases anywhere the data that IPFS works with is in a DAG structure. We can use discussion in this issue to identify some of these use cases.

## Integration Path, Questions, Challenges

### UnixFS & GraphSync

As stated before, GraphSync relies on `go-ipld-prime` which currently does not support nodes that use an internal protobuf serialization format. UnixFS, at least in its v1 implementation, uses protobufs for serializing nodes.

Therefore, to integrate GraphSync with UnixFS, we would either need to augment `go-ipld-prime` with protobuf support OR we would need to first complete UnixFS v2, which is intended to be based on `go-ipld-prime` directly. Given that supporting protobufs in `go-ipld-prime` is non-trivial and potentially quite challenging, and moreover that having UnixFS v2 complete would unlock a number of potential features, it seems like it would be much easier to simply prioritize UnixFS v2 and wait on its completion to integrate GraphSync.

### Independent CoreAPI GraphSync

While it is potentially less useful to users of IPFS, and most importantly to package managers, than UnixFS integration, being able to make GraphSync queries on the real IPFS network from the IPFS Core API or the command line would be useful for real-world testing. Enabling GraphSync in the CoreAPI would also let people writing applications on top of IPFS experiment with how GraphSync might unlock new types of uses for IPFS.

The path to CoreAPI GraphSync integration is potentially much shorter than integration in UnixFS: it simply requires agreeing on a specification for what the CoreAPI function signatures would look like, and then implementing them. There are no obvious blockers for proceeding with CoreAPI integration once `go-graphsync` is feature complete.

### Future Integrations / Supporting Providers/DHT Work

The ability to query into a graph from a root node might pair well with efforts to experiment with different strategies for providing only some nodes in a DAG. However, this probably will need to wait for further work on provider strategies.
