filter-process: fix reading 1024 byte files by larsxschneider · Pull Request #1699 · git-lfs/git-lfs

larsxschneider · 2016-11-22T08:52:40Z

larsxschneider · 2016-11-22T09:10:12Z

test/test-filter-process.sh

-  echo "skip: $0 git version does not include support for filter protocol"
-  exit
-fi
+ensure_git_version_isnt $VERSION_LOWER "2.11.0"


This seems to work but it still generates a few error messages:

test/testhelpers.sh: line 416: 10#rc2: value too great for base (error token is "10#rc2")
test/testhelpers.sh: line 420: 10#rc2: value too great for base (error token is "10#rc2")

See: https://travis-ci.org/git-lfs/git-lfs/jobs/177907753

... Oh, a few lines down I read this:

skip: test/test-install-custom-hooks-path-unsupported.sh (git version > 2.9.0)

The git version is clearly greater than 2.9, right?

However, this is a separate issue and unrelated to the 1024 error.

These "custom hook paths" tests have two parts:

The test/test-install-custom-hooks-path-unsupported.sh runs in versions of Git that don't support custom hooks paths (i.e., Git <2.9.0), and ensures that the custom hooks directory is ignored, and the hooks are placed in .git/hooks/.

The test/test-install-custom-hooks-path.sh runs on versions of Git that do support custom hooks paths (i.e., Git >=2.9.0), and ensures that the custom hooks directory is respected.

There's no (current) clean way to do this in one file, so I split it into two, and only one of them runs at a time.

If we read exactly the number of bytes that fit into `p`, the implementation already detected an "overfilled" buffer and returned the number of read bytes `n` and `nil` as the error. Fix this by not treating the exact number as an "overfill". This way, the next `readPacket()` call reads an empty chunk and `read()` returns the number of read bytes `n` and `io.EOF` as a (legitimate) error. c.f. #1697

larsxschneider · 2016-11-22T17:03:06Z

Heads up: this seems to work for me on macOS and Linux but I have a hanging process on Windows. Unfortunately I cannot share the repository that creates the hanging process and I haven't been able to create a small test case, yet.

ttaylorr · 2016-11-22T17:16:24Z

> I have a hanging process on Windows.

Are you running Go 1.7.3 on the Windows box?

larsxschneider · 2016-11-22T17:18:01Z

Ah!! There was something about 1.7... ... let me try!

ttaylorr · 2016-11-22T17:42:39Z

OK, I am convinced that this is the correct solution. Here's why:

When we try to clean a file, the first thing that we do is see if that file is already an LFS pointer, and if it is, we pass that file through directly. To check if the file is an LFS pointer, we only read part of it, since the whole file could be rather large. We read the first 1024 bytes of the file (as dictated by the blobSizeCutoff).

We buffer the data in that read by passing it through a io.TeeReader into a bytes.Buffer (so we can use the data later on if the file wasn't already a pointer). This happens here. Here's where things get interesting. We combine the buffered data with the actual reader (an *os.File, usually) if we buffered less data than the total length of the file. That happens here. We only concatenate if there is more data, since if we had read the entire contents of the file, there'd be no more data left to read, and the call would either a) block forever, or b) return an io.EOF immediately.

However! Our pkt-line implementation of io.Reader splits the last read and the EOF read into two. So if you're at the end of a file, you'll get:

Read([1024]byte) => (1024, nil)
Read([n]byte) => (0, io.EOF)

This is totally valid according to the Go documentation for the io.Reader interface, but introduces a very subtle problem in our code. Consider a 1024 byte file, and the pkt-line representation of it. We'd have:

A header with the pathname and command (consumed by the filter-process scanner)
A flush packet (consumed by the filter-process scanner)
1024 bytes of the file's contents (consumed by the reader above)
A flush packet (NOT CONSUMED!)

Since we filled the buffer, and didn't reach into the next packet (which would have been a 0000 flush), the stream is left 4 bytes behind where it should be, thus the next command isn't recognized, and the command exits in a dirty state.

This fix is correct because it combines the last read into:

Read([1024]byte) => (1024, io.EOF)

because the reader reads packets until it has to buffer, not until the given p buffer is full. This causes it to read the flush packet against a perfectly sized buffer, which advances the buffer to the right state, even though we don't concatenate the readers.
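The difference between the two behaviors can be reproduced with a simplified reader. The sketch below is not the git-lfs implementation: `contentReader`, its `stopFull` switch (which emulates the pre-fix "buffer is full, return now" check), and the `NEXT` marker standing in for the following command are all assumptions made for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strconv"
)

// readPacket reads one pkt-line payload; a flush packet yields an empty slice.
func readPacket(r io.Reader) ([]byte, error) {
	var hdr [4]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, err
	}
	size, err := strconv.ParseUint(string(hdr[:]), 16, 16)
	if err != nil {
		return nil, err
	}
	if size == 0 {
		return nil, nil // flush packet
	}
	if size < 4 {
		return nil, fmt.Errorf("invalid packet length %d", size)
	}
	payload := make([]byte, size-4)
	_, err = io.ReadFull(r, payload)
	return payload, err
}

// contentReader exposes the packets up to the next flush as an io.Reader.
type contentReader struct {
	src      io.Reader
	buf      []byte // payload bytes that did not fit into the caller's p
	stopFull bool   // true emulates the pre-fix check: return once p is full
}

func (c *contentReader) Read(p []byte) (int, error) {
	n := copy(p, c.buf)
	c.buf = c.buf[n:]
	for len(c.buf) == 0 {
		if c.stopFull && n == len(p) {
			return n, nil // the flush packet stays unread
		}
		chunk, err := readPacket(c.src)
		if err != nil {
			return n, err
		}
		if len(chunk) == 0 {
			return n, io.EOF // flush consumed together with the last data
		}
		nn := copy(p[n:], chunk)
		c.buf = append(c.buf, chunk[nn:]...)
		n += nn
	}
	return n, nil
}

// demo reads 1024 content bytes into a 1024-byte buffer and reports the byte
// count, whether io.EOF was returned, and what is left in the stream.
func demo(stopFull bool) (int, bool, string) {
	var stream bytes.Buffer
	fmt.Fprintf(&stream, "%04x", 1024+4)
	stream.Write(bytes.Repeat([]byte{'x'}, 1024))
	stream.WriteString("0000") // flush ending the content
	stream.WriteString("NEXT") // stand-in for the next command
	r := &contentReader{src: &stream, stopFull: stopFull}
	n, err := r.Read(make([]byte, 1024))
	return n, err == io.EOF, stream.String()
}

func main() {
	fmt.Println(demo(true))  // 1024 false 0000NEXT
	fmt.Println(demo(false)) // 1024 true NEXT
}
```

In the first case a caller that never issues a second `Read` leaves `0000` in front of the next command; in the second the stream is already positioned correctly after a single call.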

This behavior is covered under our unit tests here and here.

technoweenie

The fix and explanations make sense to me 👍

eminence · 2016-11-22T17:52:58Z

I'd be happy to test this new version on my set of test data (the data that found this issue). It totals about 10GB and 138000 files. Would you happen to have a 64-bit linux binary handy?

ttaylorr · 2016-11-22T18:12:04Z

@eminence: sure. Here are builds for both AMD64 and 386 architectures built against Linux containing all of the commits in this PR:

~/g/git-lfs (lars/test-1024-error) $ git show -q HEAD
commit 0899760263ad35ef11bd7e8af2799b8ea33ae624
Author: Taylor Blau <me@ttaylorr.com>
Date:   Tue Nov 22 10:57:18 2016 -0700

    git: remove extraneous length check

    `n`, representing the total number of bytes we have read into `p`, can never
    become `>len(p)`. As such, let's remove a check for that impossible condition.

git-lfs-linux-386-1.5.0.tar.gz
git-lfs-linux-amd64-1.5.0.tar.gz

eminence · 2016-11-22T18:49:05Z

Thanks! It's been running for about 30 minutes now with no errors. I'll report back here when it completes.

Edit: just finished! Took 32 minutes in total.

Because full support of the "filter" attribute and protocol was only introduced with Git version 2.11.0, we require at least that version of Git for our t/t-filter-process.sh test script. This attribute and the associated protocol were developed in part so as to make the Git LFS client's integration with Git more efficient.

The original version of the t/t-filter-process.sh test script predated the release of Git v2.11.0, so when it was introduced in commit d1dca3e of PR git-lfs#1617, it included a workaround technique for detecting whether the version of Git in use did or did not support the "filter" attribute yet. This workaround was preceded by a comment describing how it worked.

In commit 54ddc13 of PR git-lfs#1699 the workaround was removed and replaced with a call to the ensure_git_version_isnt() function from our t/testhelpers.sh shell library, since Git version 2.11.0 had been released and so we could perform a simpler version check to confirm whether Git supported the "filter" attribute. However, the comment describing the original workaround technique for detecting support for the "filter" attribute was left in place, so we update it now.

filter-process: add test to demonstrate problem with 1024 byte files

b4c6d79

c.f. #1697

larsxschneider assigned ttaylorr Nov 22, 2016

filter-process: enable test for Git 2.11 and up

54ddc13

larsxschneider commented Nov 22, 2016

larsxschneider mentioned this pull request Nov 22, 2016

external filter 'git-lfs filter-process' failed #1697

Closed

larsxschneider force-pushed the lars/test-1024-error branch from 2c0a349 to 53fcd96 Compare November 22, 2016 11:22

Travis-CI complains that it cannot create directory `repo'. Rename it!

161d2c8

larsxschneider changed the title ~~filter-process: add test to demonstrate problem with 1024 byte files~~ filter-process: fix reading 1024 byte files Nov 22, 2016

technoweenie mentioned this pull request Nov 22, 2016

Support for Windows Long Paths #1690

Closed

technoweenie approved these changes Nov 22, 2016

git: remove extraneous length check

0899760

`n`, representing the total number of bytes we have read into `p`, can never become `>len(p)`. As such, let's remove a check for that impossible condition.

ttaylorr approved these changes Nov 22, 2016

Merge branch 'master' into lars/test-1024-error

212dd0f

ttaylorr merged commit 21b2fbf into master Nov 22, 2016

ttaylorr deleted the lars/test-1024-error branch November 22, 2016 20:18

ttaylorr added a commit that referenced this pull request Nov 22, 2016

Backport lars/test-1024-error from #1699 to release-1.5

24b87db

ttaylorr mentioned this pull request Nov 22, 2016

Backport #1699 for v1.5.x: filter-process: fix reading 1024 byte files #1708

Merged

ttaylorr added a commit that referenced this pull request Nov 22, 2016

Merge pull request #1708 from git-lfs/release-1.5-backport-1699

73f182e

Backport #1699 for v1.5.x: filter-process: fix reading 1024 byte files

larsxschneider added a commit that referenced this pull request Nov 23, 2016

GitLFS requires Go 1.7+

d99f597

Apparently the process-filter does not work with Go below version 1.7 cf. #1699 (comment)

larsxschneider mentioned this pull request Nov 23, 2016

GitLFS requires Go 1.7+ #1711

Merged
