Skip to content

Spec for CIDs#530

Open
darobin wants to merge 10 commits intomainfrom
cid
Open

Spec for CIDs#530
darobin wants to merge 10 commits intomainfrom
cid

Conversation

@darobin
Copy link
Contributor

@darobin darobin commented Mar 12, 2026

No description provided.

@darobin darobin requested a review from lidel March 12, 2026 09:58
Robin Berjon and others added 3 commits March 12, 2026 11:58
Co-authored-by: Volker Mische <volker.mische@gmail.com>
Comment on lines +149 to +153
* If it's 34 bytes long with the leading bytes `[0x12, 0x20, ...]`, it's a CIDv0.
* The CID's multihash is `cid`.
* The CID's multicodec is DagProtobuf
* The CID's version is 0.
* Otherwise, let `N` be the first varint in `cid`. This is the CID's version.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using byte length anywhere in the binary decoding path isn't ideal and not how we do it in practice so I wouldn't want encourage this.

What we actually do, for both binary paths, is read the first bytes as a varint, which tells us what to do next: if the value is 0x12 then that's sha2-256 and this is a CIDv0 cause we've discovered the multihash and we proceed accordingly, if the value is 0x01 then it's a CIDv1 and we proceed accordingly.

One minor edge is that the JS decoder will also read the value 0x00 and decode the rest as a CIDv0 but that's probably not a good idea.

Copy link
Member

@lidel lidel Mar 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using byte length anywhere in the binary decoding path isn't ideal and not how we do it in practice

How so? I compared go-cid and js-multiformats:

Check go-cid (Kubo) js-multiformats (Helia)
String CIDv0 len == 46 && "Qm" prefix source[0] == 'Q' only
Binary CIDv0 [0x12, 0x20] two-byte check first varint == 18 only
Invalid version (if not CIDv0) rejects != 1 rejects != 0 && != 1

js-multiformats skips the 46-char length check for strings and the second byte (0x20) verification for binary CIDv0 detection.

Feels safer if the spec match the stricter go-cid behavior, especially in LLM times where machines will read this and generate code based on what we write here, no?

The js implementation may want to tighten its checks to match -- the missing 0x20 verification could theoretically cause formal check/audit issues (a future CID version that starts with 0x12 etc).

Copy link
Member

@lidel lidel Mar 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a stab at clarifications in 3582173 -- lmk if i'm too strict/paranoid

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my comment is on the byte decoding path, not the string decoding path which is different

# corresponding human readable CID
base58btc - cidv1 - raw - sha2-256-256-6e6ff7950a36187a801613426e858dce686cd7d7e3c0fc42ee0330072d245c95
```
See: https://cid.ipfs.io/#zb2rhe5P4gXftAwvA4eXQ5HJwsER2owDyS9sKaQRRVQPn93bA
Copy link
Member

@lidel lidel Mar 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@darobin how strongly you feel about this entire section?

iiucc this is an invention that feels off as part of the main spec, it is just an opinionated way of presenting things in diagnostic tool at https://cid.ipfs.tech/, no?

My main pet peeve is putting too much trust in string representation that depends on things like "the name of the multicodec code", which is very brittle long term, because it suggests the names of codecs are permanent -- THEY ARE NOT, strings in the table.csv were changed many times in past, they are provisional hints at best.

Feels like removal is the best way to avoid headache, or we could move this to "Notes for implementers" appendix on how to -- any preference?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend moving it to a non-normative Implementations Notes section and explicitly call out the brittleness of the name column in multicodes table, personally

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wouldn't keep anything that's not needed for interop in there.

FWIW, the point of this PR was to move the spec over, not to fix it 🤣 I would have suggested moving first, then doing separate PRs to fix, but I guess we're too deep in at this point!

lidel added 6 commits March 16, 2026 21:58
- lead binary CIDv0 detection with two-byte check [0x12, 0x20]
  instead of "34 bytes long with leading bytes"
- add explicit dag-pb (0x70) multicodec for CIDv0
- fix step numbering (both were "1.", text referenced "step 2")
- fix sub-bullet indentation for CommonMark nesting
- drop CIDv2/CIDv3 reservation, simplify to "otherwise malformed"
- clarify multibase is not part of CID, list SHOULD-support bases
- update FAQ tone from informal README to spec language
- add multicodec hex values for commonly used IPFS codecs
- use web.archive.org links for ipld.io specs
- pin libp2p spec link to specific commit
- link original CIDv1 RFC in historical design decisions
- comment out implementations list pending conformance review
- fix "multihash to hash content addressed" grammar
- remove double spaces (lines 45, 80)
- standardize ipfs.io to ipfs.tech
- rename stringified form production to <cidv1-str> to avoid
  shadowing the binary <cidv1> definition
add thanks entries for people who made substantive spec
content changes in the original multiformats/cid repo
- integrate multibase-is-not-part-of-CID note into section flow
- remove redundant "CID is fundamentally binary" restatement
- rename <multibase-codec> to <multibase-prefix> in grammar
- link multibase.csv table for prefix reference
- add explicit prefix characters for SHOULD-support bases
- note base choice depends on string length and case-sensitivity
- sha256 → sha2-256 to match multicodec table
- add numeric values: dag-pb (0x70), sha2-256 (0x12), cidv0 (0)
- clarify base58btc prefix z is not present in CIDv0 strings
- fix "follow the following" stutter
@ipfs ipfs deleted a comment from github-actions bot Mar 16, 2026
@github-actions
Copy link

🚀 Build Preview on IPFS ready

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants