Gateway: fast check if CID is in local datastore cache (only-if-cached) #8783

@lidel

Ecosystem context

Raw block and CAR stream Gateway response types are added in #8758 and https://github.com/ipfs/go-ipfs/issues/8769.

This unlocks new possibilities in the IPFS ecosystem:

  • light clients that fetch data from multiple gateways in a trustless fashion
    • Mobile web browsers with low impact on battery
    • IoT devices fetching firmware updates
    • etc.
  • "transport gateways"
    • easier to implement: no HTML hosting, only Block / CAR
    • lower risk, no automated DMCA takedowns

What is missing

The ability to quickly test whether the data behind a CID is already present in the Gateway's local cache.

Essentially, an equivalent of what we can already test via the CLI/RPC API:

  • ipfs block get --offline /ipfs/{cid} → errors if block is not present locally
  • ipfs dag stat --offline /ipfs/{cid} → errors if full DAG is not present locally

I think doing this only per block should be enough – checking if entire DAG is present may be too expensive / overkill in practice.

Why we need it

  • Enables inexpensive checks that do not trigger data retrieval → reduced cost of running gateway
  • Light clients are able to use multiple gateways more efficiently → improved performance on the client
    • It is fair to assume light clients will have a pool (list) of gateways. Such a client should be able to probe which gateways have the data in their local cache and can respond with it immediately (without hitting the DHT or triggering retrieval), and then send a GET request to one of them.
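
The gateway-pool selection described above could look roughly like the sketch below. It is only an illustration: `pick_gateway` and the injected `probe` callable are hypothetical names, and how `probe` decides that a gateway has the block locally is deliberately left open here.

```python
from typing import Callable, Iterable, Optional


def pick_gateway(
    gateways: Iterable[str],
    cid: str,
    probe: Callable[[str, str], bool],
) -> Optional[str]:
    """Return the first gateway whose probe reports the block as locally cached.

    `probe(gateway, cid)` is a hypothetical callable that returns True when
    the gateway can answer from its local datastore without remote retrieval.
    """
    for gateway in gateways:
        try:
            if probe(gateway, cid):
                return gateway
        except OSError:
            # Unreachable or misbehaving gateway: treat it as a cache miss.
            continue
    # No gateway has the block cached; the caller can fall back to a
    # regular GET against any gateway and accept the retrieval cost.
    return None
```

A client would call this once per CID and send the actual GET only to the returned gateway, falling back to a normal request when the result is `None`.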

How to implement this

An HTTP HEAD request is a good candidate: it does not return any payload, only HTTP headers.

Right now, HEAD requests are used for shallow preload of root blocks: depending on the resource type, a HEAD request usually triggers block fetches along the requested content path, up to the root block of the final path segment. We can't change this, because it works as expected: clients use HEAD to read the Content-Length of unixfs files and raw blocks, which is why the root block has to be fetched if it is not present in the local datastore.

This means we need an additional flag to signal that we only want a local datastore check, without triggering any additional work.

How to indicate "no-remote-fetch" when sending an HTTP HEAD request for /ipfs/{cid}?

Perhaps RFC 7234#only-if-cached?

Cache-Control: only-if-cached could be used to request the payload only if the gateway already has the data and can return it immediately. If the data is not cached locally and the response would require an expensive remote fetch, a 504 (Gateway Timeout) status code should be returned.
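
To make the proposed gateway-side behavior concrete, here is a minimal decision sketch. It is not go-ipfs code: `has_only_if_cached`, `action_for`, the action strings, and modelling the local datastore as a plain set of CIDs are all assumptions made for illustration.

```python
from typing import Mapping, Set


def has_only_if_cached(headers: Mapping[str, str]) -> bool:
    """True if the request's Cache-Control header carries only-if-cached."""
    value = ""
    for name, header_value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "cache-control":
            value = header_value
    directives = {d.strip().lower() for d in value.split(",")}
    return "only-if-cached" in directives


def action_for(headers: Mapping[str, str], cid: str, local_blocks: Set[str]) -> str:
    """Decide how a gateway would handle a request for /ipfs/{cid}.

    `local_blocks` stands in for the local datastore (an assumption).
    """
    if cid in local_blocks:
        return "serve-local"    # data is cached: answer immediately
    if has_only_if_cached(headers):
        return "respond-504"    # not cached: 504 Gateway Timeout, no fetch
    return "fetch-remote"       # existing behavior: retrieve the block
```

Requests without the directive keep today's behavior, so existing clients that rely on HEAD to read Content-Length would be unaffected.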

HEAD + Cache-Control: only-if-cached + optional Accept seem to cover the needs of light clients.
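
From the client side, such a probe might be sketched as follows, assuming a gateway implements the behavior proposed here (200 when cached, 504 when not). `build_probe` and `is_cached` are hypothetical helpers, and the `application/vnd.ipld.raw` Accept value in the usage note is an assumption about how a raw block would be requested.

```python
import urllib.error
import urllib.request
from typing import Optional


def build_probe(gateway: str, cid: str, accept: Optional[str] = None) -> urllib.request.Request:
    """Build a HEAD request asking the gateway to answer only from local cache."""
    url = f"{gateway.rstrip('/')}/ipfs/{cid}"
    req = urllib.request.Request(url, method="HEAD")
    req.add_header("Cache-Control", "only-if-cached")
    if accept is not None:
        req.add_header("Accept", accept)
    return req


def is_cached(gateway: str, cid: str, accept: Optional[str] = None, timeout: float = 5.0) -> bool:
    """Return True if the gateway reports the data as locally available."""
    try:
        with urllib.request.urlopen(build_probe(gateway, cid, accept), timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 504:
            # Proposed signal for "not in local cache".
            return False
        raise  # other HTTP errors are left to the caller
```

For example, `is_cached("https://gateway.example", cid, accept="application/vnd.ipld.raw")` would check for a single raw block; the gateway URL here is a placeholder.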

Metadata

Labels

  • kind/enhancement: A net-new feature or improvement to an existing feature
  • topic/gateway: Topic gateway
