Skip to content

Git-Xet: LFS custom transfer agent with Xet protocol#425

Merged
seanses merged 28 commits intomainfrom
di/lfs-integration
Sep 8, 2025
Merged

Git-Xet: LFS custom transfer agent with Xet protocol#425
seanses merged 28 commits intomainfrom
di/lfs-integration

Conversation

@seanses
Copy link
Collaborator

@seanses seanses commented Jul 24, 2025

This PR builds a Git integration called git-xet that enables users to upload files using the Xet protocol as part of a standard git push.

This integration builds on the Git LFS custom transfer adapter protocol, the same mechanism we now use to handle Git LFS uploads for files larger than 5 GB through multipart PUT.
To enable uploads to Xet, users run git-xet install, which writes the following configuration to the Git config file at a selected scope [--system, --global (default), or --local]:

[lfs "customtransfer.xet"]
	path = git-xet
	args = transfer
	concurrent = true

This setup registers a new transfer adapter named xet, allowing Git to delegate LFS file transfers to the git-xet binary when applicable.

On the server side, support is rolled out in two stages:

Stage 1 (Upload): The Git LFS batch API for the "upload" operation is updated.

  • If a repo is Xet enabled but users didn't run git-xet install, moon-landing rejects the request when users initiated git push and returns an instruction to install git-xet. (see demo in screenshot 1).
Screenshot 2025-07-24 at 3 54 16 PM
  • If a repo is Xet enabled and users have git-xet configured correctly, moon-landing accepts the request and replies with CAS server URL and access token, which git-xet will use to upload files to Xet. (see demo in screenshot 2 & 3).
Screenshot 2025-07-24 at 3 55 13 PM Screenshot 2025-07-24 at 3 56 05 PM
  • If a repo is NOT Xet enabled, upload goes through the LFS path.

Stage 2 (Download): The same logic is applied to the Git LFS batch API for the "download" operation.

Server side changes are at https://github.com/huggingface-internal/moon-landing/pull/14572

@seanses seanses marked this pull request as draft July 24, 2025 22:19
@seanses seanses changed the title LFS custom transfer agent with Xet protocol Git-Xet: LFS custom transfer agent with Xet protocol Aug 21, 2025
@seanses seanses marked this pull request as ready for review August 21, 2025 17:40
Copy link
Contributor

@jgodlew jgodlew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good, I mostly have nits.

// user for a password to fetch public repository data.
// 2. The Git Remote URL, which should be something like "https://git.com/repo.git" This URL is used for the Git
// Credential Helper. This way existing https Git remote credentials can be re-used for LFS.
pub fn get_creds(repo: &GitRepo, remote_url: &GitUrl, operation: Operation) -> Result<Arc<dyn CredentialHelper>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we have debug/info logs for the selected credential helper?

}

// 6. check Git credential helper
Ok(GitCredentialHelper::new(repo.git_path()?, &derived_host_url)?)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case this fails as in the git credential returns nothing, maybe log that we failed to use the git credential command.

struct CliOverrides {
/// Increase verbosity of output (-v, -vv, etc.)
#[clap(long, short = 'v', action = ArgAction::Count)]
pub verbose: u8,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to set up logging somewhere, unless it's set up already? we should also make distinction for if we respect RUST_LOG over the -v flag

also set a max/min maybe? I'm guessing it's:
0 - error
1 - warn
2 - info
3 - debug
4 - trace
so min 0, max 4

Copy link
Collaborator

@rajatarya rajatarya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanses I am only through first 15 files of this PR, will review the remaining files later today.

Overall I LOVE the tests and the comments - my suggestions are mostly around adding more comments.

}

async fn terminate(&mut self) -> Result<()> {
Ok(())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we log something here? Is terminate() expected to be called by someone? I am wondering if a debug/trace log message makes sense here.

Copy link
Collaborator

@rajatarya rajatarya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is overall looking good. Most of my comments are to add documentation/comments and maybe restructure code for easier readability.

The child process part needs some additional testing.

Great work @seanses! This is a big feature and it is great to see its implementation really close to completed!

Copy link
Collaborator

@hoytak hoytak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor things, but overall looks good to me. I'm assuming it works overall. I suspect there will be a lot to tweak when it gets used more widely, but it looks solid.

Copy link
Collaborator

@rajatarya rajatarya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! I think this is ready to merge now, and any suggestions could come in subsequent PRs.

Love the comments added and the additional tests.

@seanses seanses merged commit 0e1f9f4 into main Sep 8, 2025
6 checks passed
@seanses seanses deleted the di/lfs-integration branch September 8, 2025 23:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants