Conversation
Co-authored-by: Volker Mische <volker.mische@gmail.com>
src/data-formats/cid.md
Outdated
| * If it's 34 bytes long with the leading bytes `[0x12, 0x20, ...]`, it's a CIDv0. | ||
| * The CID's multihash is `cid`. | ||
| * The CID's multicodec is DagProtobuf | ||
| * The CID's version is 0. | ||
| * Otherwise, let `N` be the first varint in `cid`. This is the CID's version. |
There was a problem hiding this comment.
Using byte length anywhere in the binary decoding path isn't ideal and not how we do it in practice so I wouldn't want encourage this.
What we actually do, for both binary paths, is read the first bytes as a varint, which tells us what to do next: if the value is 0x12 then that's sha2-256 and this is a CIDv0 cause we've discovered the multihash and we proceed accordingly, if the value is 0x01 then it's a CIDv1 and we proceed accordingly.
One minor edge is that the JS decoder will also read the value 0x00 and decode the rest as a CIDv0 but that's probably not a good idea.
There was a problem hiding this comment.
Using byte length anywhere in the binary decoding path isn't ideal and not how we do it in practice
How so? I compared go-cid and js-multiformats:
| Check | go-cid (Kubo) | js-multiformats (Helia) |
|---|---|---|
| String CIDv0 | len == 46 && "Qm" prefix |
source[0] == 'Q' only |
| Binary CIDv0 | [0x12, 0x20] two-byte check |
first varint == 18 only |
| Invalid version | (if not CIDv0) rejects != 1 |
rejects != 0 && != 1 |
js-multiformats skips the 46-char length check for strings and the second byte (0x20) verification for binary CIDv0 detection.
Feels safer if the spec match the stricter go-cid behavior, especially in LLM times where machines will read this and generate code based on what we write here, no?
The js implementation may want to tighten its checks to match -- the missing 0x20 verification could theoretically cause formal check/audit issues (a future CID version that starts with 0x12 etc).
There was a problem hiding this comment.
Took a stab at clarifications in 3582173 -- lmk if i'm too strict/paranoid
There was a problem hiding this comment.
my comment is on the byte decoding path, not the string decoding path which is different
| # corresponding human readable CID | ||
| base58btc - cidv1 - raw - sha2-256-256-6e6ff7950a36187a801613426e858dce686cd7d7e3c0fc42ee0330072d245c95 | ||
| ``` | ||
| See: https://cid.ipfs.io/#zb2rhe5P4gXftAwvA4eXQ5HJwsER2owDyS9sKaQRRVQPn93bA |
There was a problem hiding this comment.
@darobin how strongly you feel about this entire section?
iiucc this is an invention that feels off as part of the main spec, it is just an opinionated way of presenting things in diagnostic tool at https://cid.ipfs.tech/, no?
My main pet peeve is putting too much trust in string representation that depends on things like "the name of the multicodec code", which is very brittle long term, because it suggests the names of codecs are permanent -- THEY ARE NOT, strings in the table.csv were changed many times in past, they are provisional hints at best.
Feels like removal is the best way to avoid headache, or we could move this to "Notes for implementers" appendix on how to -- any preference?
There was a problem hiding this comment.
I would recommend moving it to a non-normative Implementations Notes section and explicitly call out the brittleness of the name column in multicodes table, personally
There was a problem hiding this comment.
Yeah, I wouldn't keep anything that's not needed for interop in there.
FWIW, the point of this PR was to move the spec over, not to fix it 🤣 I would have suggested moving first, then doing separate PRs to fix, but I guess we're too deep in at this point!
- lead binary CIDv0 detection with two-byte check [0x12, 0x20] instead of "34 bytes long with leading bytes" - add explicit dag-pb (0x70) multicodec for CIDv0 - fix step numbering (both were "1.", text referenced "step 2") - fix sub-bullet indentation for CommonMark nesting - drop CIDv2/CIDv3 reservation, simplify to "otherwise malformed"
- clarify multibase is not part of CID, list SHOULD-support bases - update FAQ tone from informal README to spec language - add multicodec hex values for commonly used IPFS codecs - use web.archive.org links for ipld.io specs - pin libp2p spec link to specific commit - link original CIDv1 RFC in historical design decisions - comment out implementations list pending conformance review
- fix "multihash to hash content addressed" grammar - remove double spaces (lines 45, 80) - standardize ipfs.io to ipfs.tech - rename stringified form production to <cidv1-str> to avoid shadowing the binary <cidv1> definition
add thanks entries for people who made substantive spec content changes in the original multiformats/cid repo
- integrate multibase-is-not-part-of-CID note into section flow - remove redundant "CID is fundamentally binary" restatement - rename <multibase-codec> to <multibase-prefix> in grammar - link multibase.csv table for prefix reference - add explicit prefix characters for SHOULD-support bases - note base choice depends on string length and case-sensitivity
- sha256 → sha2-256 to match multicodec table - add numeric values: dag-pb (0x70), sha2-256 (0x12), cidv0 (0) - clarify base58btc prefix z is not present in CIDv0 strings - fix "follow the following" stutter
🚀 Build Preview on IPFS ready
|
No description provided.