Reftables support#7117
Conversation
fba6f0e to
b122fb0
Compare
|
Any news on this PR? |
@csware I guess the single biggest thing that these PRs need is review. If you have the time to help review I think that would help quite a bit to move this forward. Other than that I also plan to have some of the ICs in my team at GitLab review those PRs in the next Git release cycle. |
|
@csware Thanks for your initial feedback! One note though: I'd recommend to review the prerequisite PRs listed in the description first. This PR here is basically a merge of all of these plus the reftable backend. |
In our util headers we expose the `GIT_INLINE()` macro to declare a function as inline, but we don't provide a way to access the keyword directly. Expose a new `GIT_INLINE_KEYWORD` to fill this gap. This define will be used in a subsequent commit.
ba8f5de to
4521682
Compare
Import the reftable library from commit 4fee6ff3b2 (Merge branch 'ps/reftable-portability', 2026-04-08). This is an exact copy of the reftable library. The library will be wired into libgit2 over the next couple of commits.
4521682 to
749b5ee
Compare
|
The patch series that makes the reftable library more portable has been merged upstream, so I've updated the pull request with the exact version that we have in upstream Git now. |
|
Some questions/comments from me. I mostly focused on the reftable code and not the infrastructure around it. Thanks. |
|
|
||
| #define poll(fds, fds_len, timeout) p_poll(fds, fds_len, timeout) | ||
|
|
||
| #define REFTABLE_ALLOW_BANNED_ALLOCATORS |
There was a problem hiding this comment.
I couldn't understand what this header is for?
There was a problem hiding this comment.
@knayak-gitlab Do you mean the header specifically, or this specific define here?
There was a problem hiding this comment.
This particular define. Nvm though I didn't check the other files. All good here.
| uint32_t reftable_rand(void) | ||
| { | ||
| return git_rand(CSPRNG_BYTES_INSECURE); | ||
| return rand(); |
There was a problem hiding this comment.
Shouldn't we use git_rand_next() here?
There was a problem hiding this comment.
Is it because of the type? but that shouldn't matter for when converting uint64_t to uint32_t for a random number no?
There was a problem hiding this comment.
@knayak-gitlab It's mostly that I missed git_rand_next(). Will adapt.
|
|
||
| *out = NULL; | ||
|
|
||
| #ifdef GIT_EXPERIMENTAL_SHA256 |
There was a problem hiding this comment.
So sha256 support itself is experimental in libgit2?
| if ((error = refdb_reftable_check_ref(data->stack, data->ref->name, data->expected_oid, data->expected_target)) < 0) { | ||
| data->error = error; | ||
| git_error_set(GIT_ERROR_REFERENCE, "old reference value does not match"); | ||
| goto out; | ||
| } | ||
|
|
||
| if ((error = refdb_reftable_check_refname_available(data->stack, NULL, data->ref->name, data->force)) < 0) { | ||
| data->error = error; | ||
| goto out; | ||
| } |
There was a problem hiding this comment.
Both these function independently read the ref from the stack, wouldn't it be better if we supplied it?
|
|
||
| /* We do not (yet?) support renames across different worktree stacks. */ | ||
| if (git_repository_is_worktree(backend->repo) && | ||
| git_reference__is_per_worktree_ref(old_name) != git_reference__is_per_worktree_ref(new_name)) { |
There was a problem hiding this comment.
But this doesn't validate against two different worktrees, it only ensures both the names are worktree refs.
There was a problem hiding this comment.
@knayak-gitlab Hm, right. As far as I can see this can only be the case when we have two named worktree refs (refs/worktree/) that name different worktrees. So something like this should address this, right?
diff --git a/src/libgit2/refdb_reftable.c b/src/libgit2/refdb_reftable.c
index 8b0219a38..e9013f739 100644
--- a/src/libgit2/refdb_reftable.c
+++ b/src/libgit2/refdb_reftable.c
@@ -1311,13 +1311,46 @@ static int refdb_reftable_rename(git_reference **out,
data.out = out;
data.error = 0;
- /* We do not (yet?) support renames across different worktree stacks. */
+ /*
+ * Are we in a worktree, but the references aren't both per-worktree
+ * refs? In that case they cannot be part of the same worktree, but we
+ * don't support cross-worktree renames yet.
+ */
if (git_repository_is_worktree(backend->repo) &&
git_reference__is_per_worktree_ref(old_name) != git_reference__is_per_worktree_ref(new_name)) {
error = GIT_EINVALID;
goto out;
}
+ /*
+ * Otherwise, are both refs named worktree refs, but for different
+ * worktrees? If so, we need to bail out, as well.
+ */
+ if (!git__prefixcmp(old_name, "refs/worktree/") &&
+ !git__prefixcmp(new_name, "refs/worktree/")) {
+ const char *old_wtname_end = strchr(old_name + strlen("refs/worktree/"), '/');
+ const char *new_wtname_end = strchr(new_name + strlen("refs/worktree/"), '/');
+ size_t old_prefix_len, new_prefix_len;
+
+ if (!old_wtname_end || !new_wtname_end) {
+ error = GIT_EINVALID;
+ goto out;
+ }
+
+ old_prefix_len = p_strnlen(old_name, old_wtname_end - old_name);
+ new_prefix_len = p_strnlen(new_name, new_wtname_end - new_name);
+
+ if (old_prefix_len != new_prefix_len) {
+ error = GIT_EINVALID;
+ goto out;
+ }
+
+ if (git__strncmp(old_name, new_name, old_prefix_len)) {
+ error = GIT_EINVALID;
+ goto out;
+ }
+ }
+
if ((error = refdb_reftable_stack_for_refname(&data.stack, backend, old_name)) < 0)
goto out;
| creation = old_log; | ||
| git__free(creation.refname); | ||
| creation.refname = git__strdup(new_name); | ||
| memset(&old_log, 0, sizeof(old_log)); |
There was a problem hiding this comment.
Why do we need to memset only during rename? Why do we need to memset at all?
There was a problem hiding this comment.
@knayak-gitlab The idea here is that we're passing ownership of the old_log over to the updates array. I'll restructure this a bit and add a comment to explain.
| if ((error = refdb_reftable_stack_for(&stack, backend, REFDB_REFTABLE_STACK_MAIN)) < 0) | ||
| goto out; | ||
|
|
||
| if ((error = reftable_stack_compact_all(stack->stack, NULL)) < 0) { |
There was a problem hiding this comment.
So no way to do geometric compression?
There was a problem hiding this comment.
@knayak-gitlab I guess this can be added to the interfaces at a later point.
749b5ee to
6246f77
Compare
|
@knayak-gitlab Thanks for your review, back to you! 🏓 |
While the reftable library is mostly decoupled from the Git codebase, some specific functionality intentionally taps into Git subsystems. To make porting of the reftable library easier all of these are contained in "system.h" and "system.c". Reimplement those compatibility shims so that they work for libgit2.
Wire up the reftable library with CMake. At the current point in time the library is not yet plugged into libgit2 itself.
Extract the function `is_per_worktree_ref()` from the "files" backend and expose it via "refs.h". This function will be reused by the "reftable" backend.
Implement the reftable backend that is used to read and write reftables. The backend is not yet used anywhere after this commit.
Wire up the "reftable" storage format via the newly implemented reftable backend.
Generate test resources for reftables. These resources are basically the
Git repositories we already have, but converted to use the "reftable"
format. For most of the part, this conversion is done by executing `git
refs migrate`.
A couple notes:
- This require a recent Git upstream version with not-yet-upstreamed
patches due to a bug in `git refs migrate` with reflogs.
- The migration command does not yet support repositories with
worktrees. Those were converted by first removing the worktrees,
migrating the refs and then recreating them.
- The HEAD_TRACKER reference in testrepo.git is not recognized as a
root ref and is thus not automatically migrated.
- testrepo.git has an empty reflog for refs/heads/with-empty-log that
does not get migrated.
Wire up reftable tests so that they can be executed by setting the `CLAR_REF_FORMAT` environment variable. This only catches tests that use `cl_git_sandbox_init()`, but that should cover most of our tests. So this infrastructure isn't perfect, but for now it's good enough. We may want to iterate on it in the future.
Wire up three new CI jobs that exercise libgit2 with the reftable backend.
6246f77 to
035634f
Compare
The changes look good now! Thanks. |
|
I have a project that is blocked on this PR landing. In short - users who initialize repos with I am happy to test my project against a branch build if that would help. I am also willing to contribute an AI code review (Opus + GPT 5.5 + Gemini working as a “crew”) if that would be of help. Thanks to @pks-gitlab for carrying this through multiple iterations. |
|
Thanks @pks-gitlab for pushing on this! |
## Summary - Remove the `git2` crate and its C dependencies (`libgit2-sys`, `libz-sys`), replacing all remaining usage with the git CLI and other previously used rust crates - Replace `git2::Patch` in `buffer_diff` with `imara-diff` (used elsewhere in the codebase for word diffing) - Replace `git2::Oid` with `[u8; 20]` and the `hex` crate - Replace `git2::Repository` in `RealGitRepository` with stored paths and git CLI calls via the existing `GitBinary` - Fixes a bug where linked worktree git dir events could cause the repository entry to be dropped ### Motivation libgit2 does not support the [reftable](https://github.blog/2024-04-29-highlights-from-git-2-45/#preliminary-reftable-support) storage format that was introduced in git 2.45 (see [libgit2#7117](libgit2/libgit2#7117)), meaning that git operations in the app using these (like git status, branch names, diffs, e.g.) appeared broken for any repository using `--ref-format=reftable`. [Git 3 will use reftables by default](https://www.deployhq.com/blog/git-3-0-on-the-horizon-what-git-users-need-to-know-about-the-next-major-release), so this needs to be supported eventually. Most operations used the git binary already, but there were still a handfull of stragglers using libgit2. By swapping out the remaining uses, we can remove the dependancy of libgit2 and eliminate ~30k lines of vendored C and the associated build complexity (cmake, libz, pkg-config), as well as ensure that all git operations go through similar codepaths. Closes zed-industries#46747 Closes zed-industries#45702 fixes ZED-76X fixes ZED-73N ### Notes - `reload_index` is removed from the `GitRepository` trait since both implementations were no-ops (the CLI always reads from disk) - `change_branch` is simplified to `git checkout <name>`, which natively handles local/remote branch resolution - `load_index_text` and `load_committed_text` no longer explicitly filter symlinks (the old code returned `None` for symlinked entries) ## Test plan - [x] `cargo test -p git` (34 tests) - [x] `cargo test -p buffer_diff` (14 tests) - [x] `cargo test -p worktree -- test_linked_worktree_git_dir_events_do_not_panic` - [x] `cargo test -p util` (108 tests) - [x] `cargo test -p project -- test_file_status test_repository_subfolder_git_status test_update_gitignore` - [ ] Manual testing of diff gutter, staging, branch switching, remote operations - [ ] Manual testing with a reftable repository Release Notes: - Fixed git integration not working with repositories using the reftable reference storage format --------- Co-authored-by: Anthony Eid <anthony@zed.dev>
## Summary - Remove the `git2` crate and its C dependencies (`libgit2-sys`, `libz-sys`), replacing all remaining usage with the git CLI and other previously used rust crates - Replace `git2::Patch` in `buffer_diff` with `imara-diff` (used elsewhere in the codebase for word diffing) - Replace `git2::Oid` with `[u8; 20]` and the `hex` crate - Replace `git2::Repository` in `RealGitRepository` with stored paths and git CLI calls via the existing `GitBinary` - Fixes a bug where linked worktree git dir events could cause the repository entry to be dropped ### Motivation libgit2 does not support the [reftable](https://github.blog/2024-04-29-highlights-from-git-2-45/#preliminary-reftable-support) storage format that was introduced in git 2.45 (see [libgit2#7117](libgit2/libgit2#7117)), meaning that git operations in the app using these (like git status, branch names, diffs, e.g.) appeared broken for any repository using `--ref-format=reftable`. [Git 3 will use reftables by default](https://www.deployhq.com/blog/git-3-0-on-the-horizon-what-git-users-need-to-know-about-the-next-major-release), so this needs to be supported eventually. Most operations used the git binary already, but there were still a handfull of stragglers using libgit2. By swapping out the remaining uses, we can remove the dependancy of libgit2 and eliminate ~30k lines of vendored C and the associated build complexity (cmake, libz, pkg-config), as well as ensure that all git operations go through similar codepaths. Closes zed-industries#46747 Closes zed-industries#45702 fixes ZED-76X fixes ZED-73N ### Notes - `reload_index` is removed from the `GitRepository` trait since both implementations were no-ops (the CLI always reads from disk) - `change_branch` is simplified to `git checkout <name>`, which natively handles local/remote branch resolution - `load_index_text` and `load_committed_text` no longer explicitly filter symlinks (the old code returned `None` for symlinked entries) ## Test plan - [x] `cargo test -p git` (34 tests) - [x] `cargo test -p buffer_diff` (14 tests) - [x] `cargo test -p worktree -- test_linked_worktree_git_dir_events_do_not_panic` - [x] `cargo test -p util` (108 tests) - [x] `cargo test -p project -- test_file_status test_repository_subfolder_git_status test_update_gitignore` - [ ] Manual testing of diff gutter, staging, branch switching, remote operations - [ ] Manual testing with a reftable repository Release Notes: - Fixed git integration not working with repositories using the reftable reference storage format --------- Co-authored-by: Anthony Eid <anthony@zed.dev>
This pull request introduces support for the reftable backend. It depends on the following list of pull requests:
Other than that, all tests are passing and there aren't any memory leaks. But some polishing is still required for non-Linux platforms.
This pull request supersedes #5462.