Ecosystem context
Raw block and CAR stream Gateway response types are added in #8758 and https://github.com/ipfs/go-ipfs/issues/8769.
This unlocks exciting features in the IPFS ecosystem:
- light clients that fetch data from multiple gateways in trustless fashion
  - mobile web browsers with low impact on battery
  - IoT devices fetching firmware updates
  - etc.
- "transport gateways"
  - easier to implement: no HTML hosting, only Block / CAR
  - lower risk, no automated DMCA takedowns
What is missing
The ability to quickly test whether the data behind a CID is already present in the Gateway's local cache.
Essentially, an equivalent of what we can already test via the CLI/RPC API:
ipfs block get --offline /ipfs/{cid} → errors if block is not present locally
ipfs dag stat --offline /ipfs/{cid} → errors if full DAG is not present locally
I think doing this per block only should be enough – checking whether the entire DAG is present may be too expensive / overkill in practice.
Why we need it
- Enables inexpensive checks that do not trigger data retrieval → reduced cost of running a gateway
- Light clients are able to use multiple gateways more efficiently → improved performance on the client
  - It is fair to assume light clients will have a gateway pool (list). Such a client should be able to probe which gateways have the data in their local cache and can respond with it immediately (without hitting DHT/retrieval), and then send a GET request to one of them.
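To make the gateway-pool idea concrete, here is a minimal client-side sketch. Everything in it is an assumption for illustration: `pick_cached_gateway` and the `probe` callable are hypothetical names, and `probe(gateway, cid)` stands in for whatever cheap "is it in your local cache?" check the gateway ends up exposing.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def pick_cached_gateway(
    gateways: List[str],
    cid: str,
    probe: Callable[[str, str], bool],
) -> Optional[str]:
    """Return the first gateway (in list order) whose probe reports a local hit.

    `probe` is a hypothetical callable: it should return True only when the
    gateway says it can serve `cid` without any remote retrieval.
    """

    def safe_probe(gateway: str) -> bool:
        try:
            return probe(gateway, cid)
        except Exception:
            # An unreachable or misbehaving gateway simply counts as a miss.
            return False

    if not gateways:
        return None

    # Probe all gateways concurrently; probes are assumed to be cheap.
    with ThreadPoolExecutor(max_workers=len(gateways)) as pool:
        hits = list(pool.map(safe_probe, gateways))

    for gateway, hit in zip(gateways, hits):
        if hit:
            return gateway
    return None
```

If no gateway reports a hit, the client can fall back to sending a regular GET to any gateway in the pool and accept the cost of remote retrieval.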
How to implement this
An HTTP HEAD request is a good candidate: it does not return any payload, only HTTP headers.
Right now, HEAD requests are used for shallow preload of root blocks: depending on the resource type, a HEAD request usually triggers block fetch events along the requested content path, up to the root block of the final path segment. We can't change this, because it works as expected – clients use HEAD to read the Content-Length of unixfs files and raw blocks, which is why the root block has to be fetched if it is not present in the local datastore.
This means we need an additional flag to signal that we want a local datastore check that does not trigger any additional work.
How to indicate "no-remote-fetch" when sending an HTTP HEAD request for /ipfs/{cid}?
Perhaps RFC 7234#only-if-cached?
Cache-Control: only-if-cached could be used for requesting payload only if the gateway already has the data and can return it immediately. If the data is not cached locally, and the response requires an expensive remote fetch, a 504 (Gateway Timeout) status code should be returned.
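On the gateway side, the decision described above could look roughly like the sketch below. It is not how go-ipfs implements anything: `parse_cache_control` and `head_status` are hypothetical helpers, and `block_is_local` stands in for a lookup in the local datastore.

```python
def parse_cache_control(value: str) -> set:
    """Return the lowercase directive names found in a Cache-Control value."""
    directives = set()
    for part in value.split(","):
        # Drop any "=argument" suffix, e.g. "max-age=60" -> "max-age".
        name = part.strip().split("=", 1)[0].strip().lower()
        if name:
            directives.add(name)
    return directives


def head_status(cache_control: str, block_is_local: bool) -> int:
    """Pick a status code for HEAD /ipfs/{cid} (hypothetical helper).

    `block_is_local` is assumed to come from a local datastore check
    that never touches the network.
    """
    only_if_cached = "only-if-cached" in parse_cache_control(cache_control)
    if only_if_cached and not block_is_local:
        # Answering would require a remote fetch, so refuse with 504.
        return 504
    # Local hit, or a regular HEAD that is allowed to fetch the root block.
    return 200
```

A regular HEAD without the directive keeps today's behavior in this sketch, so existing clients that rely on Content-Length are unaffected.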
HEAD + Cache-Control: only-if-cached + optional Accept seems to cover the needs of light clients.
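From the client's point of view, such a probe could be sent as in the following sketch, which uses only the Python standard library. The function names, the example gateway URL, and the timeout are assumptions for illustration; treating 504 as "not in local cache" follows the proposal above and is not something existing gateways are guaranteed to do.

```python
import urllib.error
import urllib.request
from typing import Optional


def build_probe(gateway: str, cid: str, accept: Optional[str] = None) -> urllib.request.Request:
    """Build a HEAD request that asks the gateway not to fetch anything remotely."""
    headers = {"Cache-Control": "only-if-cached"}
    if accept is not None:
        # Optional: ask about a specific response type (e.g. raw block or CAR).
        headers["Accept"] = accept
    url = f"{gateway.rstrip('/')}/ipfs/{cid}"
    return urllib.request.Request(url, method="HEAD", headers=headers)


def is_cached(gateway: str, cid: str, timeout: float = 2.0) -> bool:
    """Return True if the gateway answers the probe with a 2xx status."""
    try:
        with urllib.request.urlopen(build_probe(gateway, cid), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError as err:
        if err.code == 504:
            # Proposed signal: data is not in the gateway's local cache.
            return False
        raise
```

For example, `is_cached("https://gateway.example", cid)` (a placeholder host) would return `False` when the gateway replies with 504, at which point the client can move on to the next gateway in its pool.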