Skip to content

rewrite new catfilebatch implementation for upcoming gitscanner pkg#1650

Merged
technoweenie merged 12 commits intomasterfrom
catfilebatch-v2
Nov 16, 2016
Merged

rewrite new catfilebatch implementation for upcoming gitscanner pkg#1650
technoweenie merged 12 commits intomasterfrom
catfilebatch-v2

Conversation

@technoweenie
Copy link
Contributor

This is the first step towards #1649. This is a pattern that worked for a small scanner prototype I hacked up this weekend. The scanner currently relies on git rev-list, git cat-file --batch-check, and git cat-file --batch. Feeding data through requires 3 channels:

  • An output channel that is sent results from the current command.
  • An input channel that receives values (usually git sha1s) from something.
  • A single error channel that all commands share.

The commands themselves continue running in goroutines until the input channel is closed. Once closed, the stdout reader will eventually return io.EOF, which will eventually close the output channel. The output channel of git cat-file --batch-check would be an input channel for git cat-file --batch, so its closure would eventually close git cat-file --batch too.

This code is written in the lfs package with prefixed filenames and prefixed types where necessary. The goal is to get the new design in there, and then start phasing out coupling on the channel wrappers, *lfs.WrappedPointer, etc. Doing this in the lfs package is vital, so we can watch it run our test suite and find issues earlier rather than later.

@technoweenie
Copy link
Contributor Author

That failure is due to cat-file --batch returning this:

126fd41019b623ce1621a638d2535b26e0edb4df missing

It passes locally because it's referencing a prototype branch that I have locally, but isn't pushed to GitHub. It should pass with those objects, but I think we have a better way of testing git stuff in our go test. I don't want this to be purely tested by our bash tests.

Copy link
Contributor

@ttaylorr ttaylorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Dropped some thoughts on the particulars of the implementation, but I think this is going to be awesome 🤘


func catFileBatchOutput(pointerCh chan *WrappedPointer, cmd *wrappedCmd, errCh chan error) {
for {
l, err := cmd.Stdout.ReadBytes('\n')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using a *bufio.Reader might be better here, that way you can just call Text() or Bytes().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, maybe the scanner pattern could work to return the three fields:

scanner := newCatFileBatchScanner(cmd.Stdout)

for scanner.Scan() {
        scanner.Sha1()
        scanner.Type()
        scanner.Size()
}

if err := scanner.Err(); err != nil && err != io.EOF {
        // ...
}

This should make it a little easier to test this code and I dig the logical separation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would make a good follow up PR. Trying not to change things too much.

"io/ioutil"
"strconv"

"github.com/pkg/errors"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to import github.com/github/git-lfs/errors?

break // Legit errors
}

p, err := DecodePointer(bytes.NewBuffer(nbuf))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use a limit reader here to avoid buffering this data.

p, err := DecodePointer(io.LimitReader(cmd.Stdout, s)

There might be some internal buffering going on in DecodePointer, but I at least think this looks a little cleaner.


_, err = cmd.Stdout.ReadBytes('\n') // Extra \n inserted by cat-file
if err != nil {
if err != io.EOF {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you reverse this conditional, making it a little easier to read?

if err != nil {
        if err == io.EOF {
               break
        }

        errCh <- errors.Wrap(err, "...")
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, that's invalid. We always want to break here after an error. But, it IS the end of the for, so there's no sense in the break anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well duh, break != continue. I wrote a scanner impl as you suggested, which was pretty easy. Simplifies the flow a tad too.

}
}

stderr, _ := ioutil.ReadAll(cmd.Stderr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should really push my branch that wraps this behavior in the command itself, but I think that's outside the scope of this PR.

@technoweenie technoweenie added this to the v1.5.0 milestone Nov 15, 2016
@technoweenie
Copy link
Contributor Author

Welp, I give up. I can't use the test package, since it imports lfs. So, trying to use it to test the git scanner in the lfs package results in an import loop.

So, I went back to my original idea, of just adding fake lfs pointers to the repository and running the scanner against itself. That worked well, until I tried to push.

7e9e5ac30f2216fd0fd6f5faed316f2d5983361a4203c3330cfa46ef65bb4767 does not exist in .git/lfs/objects. Tried git/fixtures/add.lfs.txt, which matches 7e9e5ac30f2216fd0fd6f5faed316f2d5983361a4203c3330cfa46ef65bb4767.
error: failed to push some refs to 'https://github.com/github/git-lfs'

So unfortunately, I can't add go tests for this right now :( I'll bring the test back once this code has been extracted to a separate package.

@technoweenie
Copy link
Contributor Author

Converted catFileBatchCheck as well. I think this is ready to go. I want to start on the new public scanner interface in the next PR.

func catFileBatchOutput(pointerCh chan *WrappedPointer, cmd *wrappedCmd, errCh chan error) {
scanner := &catFileBatchScanner{r: cmd.Stdout}
for scanner.Scan() {
pointerCh <- scanner.Pointer()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So clean 😍 ! Awesome work.

@technoweenie technoweenie mentioned this pull request Nov 16, 2016
3 tasks
Copy link
Contributor

@ttaylorr ttaylorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great.

@technoweenie technoweenie merged commit ced5270 into master Nov 16, 2016
@ttaylorr ttaylorr deleted the catfilebatch-v2 branch December 16, 2016 22:47
@ttaylorr ttaylorr deleted the catfilebatch-v2 branch December 16, 2016 22:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants