Port all git functionality to use git CLI#3833
Conversation
db17525 to
a8686e2
Compare
|
Do we introduce a specific git cli minimum version requirement this way? I remember graphite breaking because my linux distro git was to old for their wrapper. |
CodSpeed Performance ReportMerging #3833 will degrade performances by 9.61%Comparing Summary
Benchmarks breakdown
|
3a8b6cb to
b631841
Compare
|
@konstin I don't think this relies on anything even relatively new, but I can audit the commands and see the minimum version we require. |
|
Benchmarks are pretty inconsistent but it seems like overall this change introduces a ~5% regression, which is expected because we have to shell out to Git quite a bit. We may be able to improve this somewhat taking into account the new cost (e.g. some of the |
| #[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)] | ||
| pub struct GitOid { | ||
| len: usize, | ||
| bytes: [u8; 40], |
There was a problem hiding this comment.
libgit2 hex encodes object IDs to reduce their size (by half). However we can't use the hex crate directly because we would need to support odd-length strings, so I left it out for now. We could also just store the IDs as Strings and do a bit of extra cloning.
| // Skip anything related to templates, they just call all sorts of issues as | ||
| // we really don't want to use them yet they insist on being used. See #6240 | ||
| // for an example issue that comes up. | ||
| // opts.external_template(false); |
There was a problem hiding this comment.
I can't seem to find a way to do this with the CLI...
| git_config.set_bool("core.autocrlf", false)?; | ||
| } | ||
| // if let Ok(mut git_config) = self.repo.config() { | ||
| // git_config.set_bool("core.autocrlf", false)?; |
There was a problem hiding this comment.
Also unsure if we need to do this.
There was a problem hiding this comment.
I think we can just remove this. We don't support vendoring right? Unless I'm misunderstanding.
There was a problem hiding this comment.
What sort of vendoring are you thinking of? core.autocrlf configures whether git automatically converts line endings (CRLF to LF) for checked out files. I think this can still affect us, but I'm not sure whether it matters (or why it matters for Cargo)?
There was a problem hiding this comment.
Ah you mean cargo vendoring. Yeah that looks to be why they needed this, it shouldn't affect us. I'll remove it.
| } | ||
|
|
||
| let mut out = [0; 40]; | ||
| out[..s.len()].copy_from_slice(s.as_bytes()); |
There was a problem hiding this comment.
Do we need to validate that this a valid hex?
There was a problem hiding this comment.
I don't think so, git will end up failing if we try to identify a commit using a malformed/broken hash anyways. I guess we could for better error messages, but it seems unlikely anyways?
|
|
||
| maybe_gc_repo(repo)?; | ||
|
|
||
| clean_repo_temp_files(repo); |
There was a problem hiding this comment.
Both of those were linked to libgit2 issues. The git cli runs garbage collection automatically on git fetch, git commit, and other commands, so it shouldn't be a problem now.
| } | ||
| let result = match self { | ||
| // Resolve the commit pointed to by the tag. | ||
| Self::Tag(s) => repo.rev_parse(&format!("refs/remotes/origin/tags/{s}^0")), |
There was a problem hiding this comment.
What does the ^0 signify here?
There was a problem hiding this comment.
^0 is short for ^{commit}, which recursively "peels" to the underlying commit object. e.g. if the tag points to another tag, which then points to a commit. I'm not 100% sure if we need to do this, but the previous code had calls to peel(ObjectType::Commit), so this retains the original behavior. I'll add a more detailed comment explaining this.
|
So just confirming -- the overall scheme here is the same, right? We have one checkout for Git repo, and then we fetch individual commits and clone them into separate directories to build? |
| ProcessBuilder::new("git") | ||
| .arg("init") | ||
| .cwd(path) | ||
| .exec_with_streaming(&mut silent, &mut silent, true)?; |
There was a problem hiding this comment.
Is there any risk of dropping important output here? (What did we do with Git CLI output before, when fetching?)
There was a problem hiding this comment.
This is the same thing we've always done in fetch_with_cli. The true means that stderr output will still be captured in the error message if the command fails, the silent callbacks just make sure that the stdout/stderr output is not piped to the user's terminal (Command::{stdout,stderr} are overridden).
There was a problem hiding this comment.
Turns out we can use exec_with_output here instead, which is a bit clearer.
Yes, there should be no change in behavior. |
Summary
We currently rely on libgit2 for most git-related functionality. However, libgit2 has long-standing performance issues, as well as lags significantly behind git in terms of new features. For these reasons we now use the git CLI by default for fetching repositories (#1781). This PR completely drops libgit2 in favor of the git CLI for all git-related functionality, which should allow us to use features such as partial clones and sparse checkouts in the future for performance.
There is also a lot of technical debt in the current git code as it's mostly taken from Cargo. Switching to the git CLI vastly simplifies the
uv-gitcodebase.Eventually we might want to look into switching to
gitoxide, but it's currently too immature for our use case.Test Plan