feat!: add hash_kind to TreeRefIter and Data#2497
Conversation
Sebastian Thiel (Byron)
left a comment
There was a problem hiding this comment.
Thanks a lot! I think it's great to see that overall, passing the hash kind through isn't crazy. I also left a few comments which I hope can lead to a small direction change.
hash_len to TreeRefIter and Datahash_kind to TreeRefIter and Data
|
Sebastian Thiel (@Byron) I made some changes based on your feedback. The PR contains a lot of small commits now, feel free to shuffle them around as you see fit! The fuzzer seems to have found something, though at first sight, I wasn’t sure what to make of that finding. I’ll try figuring out what this is about, but maybe you’re more familiar with it and can quickly trace it back to one of my changes. I added one |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7383f0210e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "Codex (@codex) address that feedback".
3586649 to
3430e49
Compare
Co-authored-by: Sebastian Thiel <sebastian.thiel@icloud.com>
22a23da to
a9ed60e
Compare
There was a problem hiding this comment.
Pull request overview
This PR updates gitoxide’s object decoding to correctly handle non-SHA1 repositories by making tree entry hash sizes depend on the decoded tree’s hash kind (e.g., 32 bytes for SHA-256 instead of a hardcoded 20). This propagates hash_kind through core borrowed-object APIs (TreeRefIter, TreeRef::from_bytes, Data, and ObjectRef parsing) and updates downstream callers and tests accordingly.
Changes:
- Add
hash_kindtogix_object::TreeRefIterandgix_object::Data, and thread it through tree decoding/iteration and object decoding entrypoints. - Update call sites across
gix,gix-pack,gix-odb,gix-ref,gix-merge, andgitoxide-coreto provide the correct hash kind. - Extend/adjust tests and fixtures (including SHA-256 fixtures) and make diff test snapshots hash-agnostic.
Reviewed changes
Copilot reviewed 46 out of 50 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| justfile | Runs gix-diff-tests with SHA-256 fixtures in unit-tests recipe. |
| gix/tests/gix/object/tree/mod.rs | Adds regression test ensuring tree decode uses the tree id’s hash kind. |
| gix/src/repository/impls.rs | Populates hash_kind when producing empty-tree Data. |
| gix/src/repository/diff.rs | Passes tree id hash kind into TreeRefIter::from_bytes() for diffing. |
| gix/src/object/tree/traverse.rs | Passes root tree id hash kind into traversal iterator. |
| gix/src/object/tree/mod.rs | Uses tree id hash kind for decode/iteration and for Data::new() during lookups. |
| gix/src/object/tree/diff/for_each.rs | Passes per-tree hash kind into diff iterator construction. |
| gix/src/object/mod.rs | Constructs gix_object::Data with id-derived hash kind for decoding helpers. |
| gix-traverse/src/tree/depthfirst.rs | Supplies hash kind to TreeRefIter::from_bytes() during traversal. |
| gix-ref/tests/refs/file/mod.rs | Test Find impl now sets Data.hash_kind from requested id. |
| gix-ref/src/store/file/raw_ext.rs | Adjusts Data destructuring for added hash_kind field. |
| gix-pack/tests/pack/data/input.rs | Test objects and Find impl now set hash_kind from env/id. |
| gix-pack/src/index/verify.rs | Uses new ObjectRef::from_bytes(data, kind, hash_kind) signature. |
| gix-pack/src/index/traverse/mod.rs | Builds Data with id-derived hash kind before checksum verification. |
| gix-pack/src/data/output/count/objects/mod.rs | Uses obj.hash_kind when iterating trees. |
| gix-pack/src/bundle/find.rs | Sets Data.hash_kind from pack object hash. |
| gix-odb/src/store_impls/loose/find.rs | Sets Data.hash_kind from requested id. |
| gix-odb/src/store_impls/dynamic/find.rs | Sets Data.hash_kind from pack object hash in dynamic store. |
| gix-odb/src/memory.rs | Sets Data.hash_kind from requested id. |
| gix-odb/src/cache.rs | Creates cached Data with id.kind() as hash kind. |
| gix-object/tests/object/tree/iter.rs | Updates tree iterator tests to supply hash kind (from env). |
| gix-object/tests/object/tree/from_bytes.rs | Updates tree decode tests to supply hash kind (from env). |
| gix-object/tests/object/tree/entries.rs | Updates tree parsing tests to supply hash kind (from env). |
| gix-object/tests/object/tree/editor.rs | Test Find impl now sets Data.hash_kind from requested id. |
| gix-object/tests/object/tag.rs | Adds test for invalid target id length in tag parsing. |
| gix-object/tests/object/object_ref.rs | Updates ObjectRef::from_loose() tests for new signature. |
| gix-object/tests/object/encode.rs | Adds hash-kind-aware roundtrip macro and updates tree roundtrip. |
| gix-object/tests/object/commit/mod.rs | Adds test for invalid object id length in commit parsing. |
| gix-object/src/tree/ref_iter.rs | Core change: TreeRefIter carries hash kind; tree decoding uses hash-len parameter. |
| gix-object/src/parse.rs | Tightens hex_hash parsing to only accept lengths matching enabled hash kinds. |
| gix-object/src/object/mod.rs | ObjectRef::from_loose/from_bytes now require hash kind to decode trees correctly. |
| gix-object/src/lib.rs | Updates docs and adds hash_kind fields to TreeRefIter and Data structs. |
| gix-object/src/data.rs | Data::new() signature changed to include hash kind; tree decode uses it. |
| gix-object/fuzz/fuzz_targets/fuzz_tree.rs | Fuzzes tree decoding for both SHA-1 and SHA-256 hash kinds. |
| gix-object/fuzz/Cargo.toml | Enables gix-hash sha1+sha256 in fuzz crate dependencies. |
| gix-object/benches/edit_tree.rs | Bench Find impl now sets Data.hash_kind. |
| gix-object/benches/decode_objects.rs | Bench uses env-selected hash kind for tree parsing benchmarks. |
| gix-merge/tests/merge/blob/mod.rs | Test Find impl now sets Data.hash_kind from requested id. |
| gix-merge/src/tree/function.rs | Supplies base tree hash kind to TreeRefIter::from_bytes(). |
| gix-diff/tests/fixtures/generated-archives/make_blob_repo_sha256.tar | Adds SHA-256 fixture archive for diff tests. |
| gix-diff/tests/fixtures/generated-archives/.gitignore | Ignores newly added SHA-256 generated fixture tar. |
| gix-diff/tests/diff/tree_with_rewrites.rs | Makes snapshots hash-agnostic and opens ODB with correct object hash kind. |
| gix-diff/tests/diff/tree.rs | Uses hash-agnostic assertions and shared ODB-opening helper. |
| gix-diff/tests/diff/main.rs | Adds fixture hash-kind detection, ODB open helper, and snapshot normalization utilities. |
| gix-diff/tests/diff/index.rs | Updates snapshots/assertions and expected ids for SHA-1 vs SHA-256 fixtures. |
| gix-diff/tests/Cargo.toml | Enables gix-hash sha1+sha256 features for diff test crate. |
| gitoxide-core/src/index/checkout.rs | Ensures Data.hash_kind is set in custom Find implementations. |
|
Thanks so much! Please note that I accidentally added |
No worries! I’ve created #2532 as a small follow-up. |
Update call sites for gix 0.83's API: `SignatureRef::from_bytes` now returns a `Result` rather than an `Option` [1], and `TreeRefIter::from_bytes` takes the object hash kind [2]. The crate sets `default-features = false` on gix, so also enable the `sha1` feature, which `gix-hash` no longer enables by default [3]. For `TreeRefIter::from_bytes`, pass the hash kind of the tree id already in hand (`id.kind()`), matching how gitoxide's own tree API derives it. Make the `Item` size guard robust to the enabled hash features instead of asserting an absolute byte count. `gix::ObjectId` is a SHA-1/SHA-256 enum whose width depends on which `gix-hash` features are unified into the build; the gix-testtools dev-dependency enables `gix-hash/sha256` in test builds, widening it there but not in production. Bound only the part we own rather than the gix-controlled fields. [1]: GitoxideLabs/gitoxide@91c854e (GitoxideLabs/gitoxide#2546) [2]: GitoxideLabs/gitoxide@3b156e8 (GitoxideLabs/gitoxide#2497) [3]: GitoxideLabs/gitoxide@1019b42 (GitoxideLabs/gitoxide#2441) Co-authored-by: Eliah Kagan <degeneracypressure@gmail.com>
Update call sites for gix 0.83's API: `SignatureRef::from_bytes` now returns a `Result` rather than an `Option` [1], and `TreeRefIter::from_bytes` takes the object hash kind [2]. The crate sets `default-features = false` on gix, so also enable the `sha1` feature, which `gix-hash` no longer enables by default [3]. For `TreeRefIter::from_bytes`, pass the hash kind of the tree id already in hand (`id.kind()`), matching how gitoxide's own tree API derives it. Update the `Item` size guard from 200 to 240 bytes. The gix-testtools dev-dependency enables `gix-hash/sha256` in test builds (where this guard runs), so `gix::ObjectId` is its wider SHA-1/SHA-256 enum (33 bytes) and `Item` holds three of them; a production (sha1-only) build is ~200. [1]: GitoxideLabs/gitoxide@91c854e (GitoxideLabs/gitoxide#2546) [2]: GitoxideLabs/gitoxide@3b156e8 (GitoxideLabs/gitoxide#2497) [3]: GitoxideLabs/gitoxide@1019b42 (GitoxideLabs/gitoxide#2441) Co-authored-by: Eliah Kagan <degeneracypressure@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Update call sites for gix 0.83's API: `SignatureRef::from_bytes` now returns a `Result` rather than an `Option` [1], and `TreeRefIter::from_bytes` takes the object hash kind [2]. The crate sets `default-features = false` on gix, so also enable the `sha1` feature, which `gix-hash` no longer enables by default [3]. For `TreeRefIter::from_bytes`, pass the hash kind of the tree id already in hand (`id.kind()`), matching how gitoxide's own tree API derives it. Update the `Item` size guard from 200 to 240 bytes. The gix-testtools dev-dependency enables `gix-hash/sha256` in test builds (where this guard runs), so `gix::ObjectId` is its wider SHA-1/SHA-256 enum (33 bytes) and `Item` holds three of them; a production (sha1-only) build is ~200. [1]: GitoxideLabs/gitoxide#2546 [2]: GitoxideLabs/gitoxide@3b156e8 (GitoxideLabs/gitoxide#2497) [3]: GitoxideLabs/gitoxide@1019b42 (GitoxideLabs/gitoxide#2441) Co-authored-by: Eliah Kagan <degeneracypressure@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Update call sites for gix 0.83's API: `SignatureRef::from_bytes` now returns a `Result` rather than an `Option` [1], and `TreeRefIter::from_bytes` takes the object hash kind [2]. The crate sets `default-features = false` on gix, so also enable the `sha1` feature, which `gix-hash` no longer enables by default [3]. For `TreeRefIter::from_bytes`, pass the hash kind of the tree id already in hand (`id.kind()`), matching how gitoxide's own tree API derives it. Update the `Item` size guard from 200 to 240 bytes. The gix-testtools dev-dependency enables `gix-hash/sha256` in test builds (where this guard runs), so `gix::ObjectId` is its wider SHA-1/SHA-256 enum (33 bytes) and `Item` holds three of them; a production (sha1-only) build is ~200. [1]: GitoxideLabs/gitoxide#2546 [2]: GitoxideLabs/gitoxide@3b156e8 (GitoxideLabs/gitoxide#2497) [3]: GitoxideLabs/gitoxide@1019b42 (GitoxideLabs/gitoxide#2441) Co-authored-by: Eliah Kagan <degeneracypressure@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Update call sites for gix 0.83's API: `SignatureRef::from_bytes` now returns a `Result` rather than an `Option` [1], and `TreeRefIter::from_bytes` takes the object hash kind [2]. The crate sets `default-features = false` on gix, so also enable the `sha1` feature, which `gix-hash` no longer enables by default [3]. For `TreeRefIter::from_bytes`, pass the hash kind of the tree id already in hand (`id.kind()`), matching how gitoxide's own tree API derives it. Update the `Item` size guard from 200 to 240 bytes. The gix-testtools dev-dependency enables `gix-hash/sha256` in test builds (where this guard runs), so `gix::ObjectId` is its wider SHA-1/SHA-256 enum (33 bytes) and `Item` holds three of them; a production (sha1-only) build is ~200. [1]: GitoxideLabs/gitoxide@b060eb2, GitoxideLabs/gitoxide@91c854e (GitoxideLabs/gitoxide#2546) [2]: GitoxideLabs/gitoxide@3b156e8 (GitoxideLabs/gitoxide#2497) [3]: GitoxideLabs/gitoxide@1019b42 (GitoxideLabs/gitoxide#2441) Co-authored-by: Eliah Kagan <degeneracypressure@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Update call sites for gix 0.83's API: `SignatureRef::from_bytes` now returns a `Result` rather than an `Option` [1], and `TreeRefIter::from_bytes` takes the object hash kind [2]. The crate sets `default-features = false` on gix, so also enable the `sha1` feature, which `gix-hash` no longer enables by default [3]. For `TreeRefIter::from_bytes`, pass the hash kind of the tree id already in hand (`id.kind()`), matching how gitoxide's own tree API derives it. Update the `Item` size guard from 200 to 240 bytes. The gix-testtools dev-dependency enables `gix-hash/sha256` in test builds (where this guard runs), so `gix::ObjectId` is its wider SHA-1/SHA-256 enum (33 bytes) and `Item` holds three of them; a production (sha1-only) build is ~200. [1]: GitoxideLabs/gitoxide@b060eb2, GitoxideLabs/gitoxide@91c854e (GitoxideLabs/gitoxide#2546) [2]: GitoxideLabs/gitoxide@3b156e8 (GitoxideLabs/gitoxide#2497) [3]: GitoxideLabs/gitoxide@1019b42 (GitoxideLabs/gitoxide#2441) Co-authored-by: Eliah Kagan <degeneracypressure@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Update call sites for gix 0.83's API: `SignatureRef::from_bytes` is no longer generic over the error type and returns a concrete-error `Result` [1], and `TreeRefIter::from_bytes` takes the object hash kind [2]. The crate sets `default-features = false` on gix, so also enable the `sha1` feature, which `gix-hash` no longer enables by default [3]. For `TreeRefIter::from_bytes`, pass the hash kind of the tree id already in hand (`id.kind()`), matching how gitoxide's own tree API derives it. Update the `Item` size guard from 200 to 240 bytes. The gix-testtools dev-dependency enables `gix-hash/sha256` in test builds (where this guard runs), so `gix::ObjectId` is its wider SHA-1/SHA-256 enum (33 bytes) and `Item` holds three of them; a production (sha1-only) build is ~200. [1]: GitoxideLabs/gitoxide@b060eb2, GitoxideLabs/gitoxide@91c854e (GitoxideLabs/gitoxide#2546) [2]: GitoxideLabs/gitoxide@3b156e8 (GitoxideLabs/gitoxide#2497) [3]: GitoxideLabs/gitoxide@1019b42 (GitoxideLabs/gitoxide#2441) Co-authored-by: Eliah Kagan <degeneracypressure@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This PR is an attempt at making the hash length in
gix_object::tree::ref_iter::decodedepend on the decoded tree’s hash’s size. Previously, it was hardcoded to 20 which is only correct for SHA1, but not for SHA256.There were quite a few follow-up changes as
TreeRef::from_bytesas well asTreeRefIter::from_bytesnow takehash_kindas an argument. Also, sinceTreeRefcan be instantiated inData::decode,Datanow also has to carryhash_kind. Another option that I didn’t explore would have been to addhash_kindas an argument toData::decode.Update 2026-04-04: I changed the initial addition of
hash_lentohash_kind. Tests are now passing, with the exception of the fuzzer having found an issue.Review Tasks