Closed
Description
I'd like to propose a new design for the scanner code in the lfs package. The scanner functions are concerned with extracting LFS pointers from a range of git objects:
- `lfs.ScanRefs()` scans a range of commits.
- `lfs.ScanTree()` scans just the tree of a single commit.
- `lfs.ScanIndex()` scans uncommitted objects in the git index.
- `lfs.ScanUnpushed()` looks for local objects that have not been pushed to a given remote.
- `lfs.ScanPreviousVersions()` scans commits in a ref since a given time.
There are a number of problems with the current code, as it has grown organically:
- Common usage of the scanner functions buffers the LFS pointers into a slice. This means operations that depend on the scanner can't start until it's done. This is most apparent when you notice that LFS uploads of large git pushes don't start until `git rev-list` has run on the entire history.
- No shared context: each `lfs.ScanRefs()` in a `git push` with lots of branches/tags results in duplicated work, such as comparing local and remote refs, or spinning up extra `git cat-file` processes for each ref.
- Lots of coupling with the `config` and `lfs` packages.
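To make the first problem concrete, here is a minimal sketch contrasting the buffered pattern with a channel-based one. `Pointer`, `scanBuffered`, and `scanStreaming` are hypothetical names made up for this illustration. They are not part of the `lfs` package, and the "scan" is just a loop over a list of OIDs rather than a real `git rev-list` run.

```go
package main

import "fmt"

// Pointer is a hypothetical stand-in for an LFS pointer.
type Pointer struct {
	Oid string
}

// scanBuffered mimics the current pattern: every pointer is collected into a
// slice, so the caller sees nothing until the whole scan has finished.
func scanBuffered(oids []string) []Pointer {
	pointers := make([]Pointer, 0, len(oids))
	for _, oid := range oids {
		pointers = append(pointers, Pointer{Oid: oid})
	}
	return pointers
}

// scanStreaming sends each pointer on a channel as soon as it is found, so a
// consumer can start working while the scan is still running.
func scanStreaming(oids []string) <-chan Pointer {
	out := make(chan Pointer)
	go func() {
		defer close(out) // closing the channel ends the consumer's range loop
		for _, oid := range oids {
			out <- Pointer{Oid: oid}
		}
	}()
	return out
}

func main() {
	oids := []string{"aaa", "bbb", "ccc"}

	for _, p := range scanBuffered(oids) {
		fmt.Println("buffered:", p.Oid)
	}
	for p := range scanStreaming(oids) {
		fmt.Println("streamed:", p.Oid)
	}
}
```

With the streaming variant, a transfer queue could pick up the first pointer while later commits are still being walked, which is the behavior the buffered version rules out.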
@ttaylorr and I came up with a design for a new scanner, focusing on async communication so it can begin feeding LFS Pointers to the transfer queue almost immediately.
Essentially, I want to create a `gitscanner.Scanner` object that can handle multiple scan options. Here's an illustration in pseudo-Go code for a pre-push implementation:
```go
// The hook receives commit information on stdin in the form:
//   <local ref> <local sha1> <remote ref> <remote sha1>
// Pushes with lots of branches/tags will have to scan multiple commit ranges.

// This spins up the `git cat-file --batch-check` and `git cat-file --batch` commands.
gscanner := gitscanner.New()

// The scanner can calculate the remote refs needed for a push just once,
// instead of for each ref line in stdin.
// https://github.com/github/git-lfs/blob/49354864921c6d4acaedf70ccd33a6ff5abff431/lfs/scanner.go#L296-L310
gscanner.Remote("origin")

go func() {
	for pointer := range gscanner.Output {
		// pass pointer to the transfer queue
	}
}()

scanner := bufio.NewScanner(os.Stdin)
for scanner.Scan() {
	line := strings.TrimSpace(scanner.Text())
	// decodeRefs also returns the right (remote) side; it is not used here.
	left, _ := decodeRefs(line)

	// this blocks until the rev-list command has finished
	gscanner.ScanLeftToRemote(left)
}

// kills the `git cat-file` commands
gscanner.Close()
```
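As a rough, runnable approximation of the shape proposed above, the sketch below stubs out the git work entirely. Everything in it is an assumption made for illustration: the `listObjects` callback stands in for `git rev-list` and `git cat-file`, `collect` is a made-up helper, and closing `Output` in `Close()` is just one way the consumer loop could be ended. It is not the git-lfs implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// Pointer is a hypothetical stand-in for an LFS pointer.
type Pointer struct {
	Oid string
}

// Scanner is an illustrative sketch of the proposed object.
type Scanner struct {
	Output chan Pointer
	remote string
	// listObjects is a stub for the real `git rev-list` / `git cat-file` work.
	listObjects func(left, remote string) []string
}

// New returns a Scanner that uses the given stub to find object IDs.
func New(listObjects func(left, remote string) []string) *Scanner {
	return &Scanner{Output: make(chan Pointer), listObjects: listObjects}
}

// Remote records the remote once, so it is shared by every later scan.
func (s *Scanner) Remote(name string) { s.remote = name }

// ScanLeftToRemote blocks until every pointer for this ref has been sent.
func (s *Scanner) ScanLeftToRemote(left string) {
	for _, oid := range s.listObjects(left, s.remote) {
		s.Output <- Pointer{Oid: oid}
	}
}

// Close signals that no more scans will run, which ends range loops on Output.
func (s *Scanner) Close() { close(s.Output) }

// collect scans each ref and returns the OIDs in the order they arrived.
func collect(refs []string) []string {
	s := New(func(left, remote string) []string {
		return []string{left + "-1", left + "-2"} // fake OIDs for the demo
	})
	s.Remote("origin")

	var got []string
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for p := range s.Output {
			got = append(got, p.Oid) // a real consumer would feed the transfer queue
		}
	}()

	for _, ref := range refs {
		s.ScanLeftToRemote(ref)
	}
	s.Close()
	wg.Wait() // make sure the consumer has drained Output before returning
	return got
}

func main() {
	fmt.Println(collect([]string{"main", "dev"})) // prints [main-1 main-2 dev-1 dev-2]
}
```

One detail the sketch adds is the `sync.WaitGroup`: without some way to wait for the consumer goroutine after `Close()`, the hook could exit before the last pointers have been handed to the transfer queue.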
