rewrite new catfilebatch implementation for upcoming gitscanner pkg#1650
rewrite new catfilebatch implementation for upcoming gitscanner pkg#1650technoweenie merged 12 commits intomasterfrom
Conversation
|
That failure is due to It passes locally because it's referencing a prototype branch that I have locally, but isn't pushed to GitHub. It should pass with those objects, but I think we have a better way of testing git stuff in our go test. I don't want this to be purely tested by our bash tests. |
ttaylorr
left a comment
There was a problem hiding this comment.
Looking good. Dropped some thoughts on the particulars of the implementation, but I think this is going to be awesome 🤘
lfs/gitscanner_catfilebatch.go
Outdated
|
|
||
| func catFileBatchOutput(pointerCh chan *WrappedPointer, cmd *wrappedCmd, errCh chan error) { | ||
| for { | ||
| l, err := cmd.Stdout.ReadBytes('\n') |
There was a problem hiding this comment.
I think using a *bufio.Reader might be better here, that way you can just call Text() or Bytes().
There was a problem hiding this comment.
On second thought, maybe the scanner pattern could work to return the three fields:
scanner := newCatFileBatchScanner(cmd.Stdout)
for scanner.Scan() {
scanner.Sha1()
scanner.Type()
scanner.Size()
}
if err := scanner.Err(); err != nil && err != io.EOF {
// ...
}This should make it a little easier to test this code and I dig the logical separation.
There was a problem hiding this comment.
I think this would make a good follow up PR. Trying not to change things too much.
lfs/gitscanner_catfilebatch.go
Outdated
| "io/ioutil" | ||
| "strconv" | ||
|
|
||
| "github.com/pkg/errors" |
There was a problem hiding this comment.
Did you mean to import github.com/github/git-lfs/errors?
lfs/gitscanner_catfilebatch.go
Outdated
| break // Legit errors | ||
| } | ||
|
|
||
| p, err := DecodePointer(bytes.NewBuffer(nbuf)) |
There was a problem hiding this comment.
You could use a limit reader here to avoid buffering this data.
p, err := DecodePointer(io.LimitReader(cmd.Stdout, s)There might be some internal buffering going on in DecodePointer, but I at least think this looks a little cleaner.
lfs/gitscanner_catfilebatch.go
Outdated
|
|
||
| _, err = cmd.Stdout.ReadBytes('\n') // Extra \n inserted by cat-file | ||
| if err != nil { | ||
| if err != io.EOF { |
There was a problem hiding this comment.
Could you reverse this conditional, making it a little easier to read?
if err != nil {
if err == io.EOF {
break
}
errCh <- errors.Wrap(err, "...")
}There was a problem hiding this comment.
Actually, that's invalid. We always want to break here after an error. But, it IS the end of the for, so there's no sense in the break anyway.
There was a problem hiding this comment.
Well duh, break != continue. I wrote a scanner impl as you suggested, which was pretty easy. Simplifies the flow a tad too.
lfs/gitscanner_catfilebatch.go
Outdated
| } | ||
| } | ||
|
|
||
| stderr, _ := ioutil.ReadAll(cmd.Stderr) |
There was a problem hiding this comment.
I should really push my branch that wraps this behavior in the command itself, but I think that's outside the scope of this PR.
|
Welp, I give up. I can't use the So, I went back to my original idea, of just adding fake lfs pointers to the repository and running the scanner against itself. That worked well, until I tried to push. So unfortunately, I can't add go tests for this right now :( I'll bring the test back once this code has been extracted to a separate package. |
|
Converted |
| func catFileBatchOutput(pointerCh chan *WrappedPointer, cmd *wrappedCmd, errCh chan error) { | ||
| scanner := &catFileBatchScanner{r: cmd.Stdout} | ||
| for scanner.Scan() { | ||
| pointerCh <- scanner.Pointer() |
There was a problem hiding this comment.
So clean 😍 ! Awesome work.
This is the first step towards #1649. This is a pattern that worked for a small scanner prototype I hacked up this weekend. The scanner currently relies on
git rev-list,git cat-file --batch-check, andgit cat-file --batch. Feeding data through requires 3 channels:The commands themselves continue running in goroutines until the input channel is closed. Once closed, the stdout reader will eventually return
io.EOF, which will eventually close the output channel. The output channel ofgit cat-file --batch-checkwould be an input channel forgit cat-file --batch, so its closure would eventually closegit cat-file --batchtoo.This code is written in the lfs package with prefixed filenames and prefixed types where necessary. The goal is to get the new design in there, and then start phasing out coupling on the channel wrappers,
*lfs.WrappedPointer, etc. Doing this in thelfspackage is vital, so we can watch it run our test suite and find issues earlier rather than later.