Skip to content

filter-process: fix reading 1024 byte files#1699

Merged
ttaylorr merged 6 commits intomasterfrom
lars/test-1024-error
Nov 22, 2016
Merged

filter-process: fix reading 1024 byte files#1699
ttaylorr merged 6 commits intomasterfrom
lars/test-1024-error

Conversation

@larsxschneider
Copy link
Member

c.f. #1697

echo "skip: $0 git version does not include support for filter protocol"
exit
fi
ensure_git_version_isnt $VERSION_LOWER "2.11.0"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to work but it still generates a few error messages:

test/testhelpers.sh: line 416: 10#rc2: value too great for base (error token is "10#rc2")
test/testhelpers.sh: line 420: 10#rc2: value too great for base (error token is "10#rc2")

See: https://travis-ci.org/git-lfs/git-lfs/jobs/177907753

... Oh, a few lines down I read this:

skip: test/test-install-custom-hooks-path-unsupported.sh (git version > 2.9.0)

The git version is clear greater than 2.9, right?

However, this is a separate issue and unrelated to the 1024 error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These "custom hook paths" tests have two parts:

  • The test/test-install-custom-hooks-path-unsupported.sh runs in versions of Git that don't support custom hooks paths (i.e., Git <2.9.0), and ensures that the custom hooks directory is ignored, and the hooks are placed in .git/hooks/.
  • the test/test-install-custom-hooks-path.sh runs on versions of Git that do support custom hooks paths (i.e., Git >=2.9.0), and ensures that the custom hooks directory is respected.

There's no (current) clean way to do this in one file, so I split it into two, and only one of them runs at a time.

If we read exactly the number of bytes that fit into `p` then
implementation detected already an "overfilled" buffer and returned
the number of read bytes `n` and `nil` as error.

Fix this by treading the exact number not as "overfill". This way the
next `readPacket()` call we will read an empty chunk and `read()` will
return the number of read bytes `n` and `o.EOF` as (legitmate) error.

c.f. #1697
@larsxschneider larsxschneider changed the title filter-process: add test to demonstrate problem with 1024 byte files filter-process: fix reading 1024 byte files Nov 22, 2016
@larsxschneider
Copy link
Member Author

Heads up: this seems to work for me on macOS and Linux but I have a hanging process on Windows. Unfortunately I cannot share the repository that creates the hanging process and I haven't been able to create a small test case, yet.

@ttaylorr
Copy link
Contributor

I have a hanging process on Windows.

Are you running Go 1.7.3 on the Windows box?

@larsxschneider
Copy link
Member Author

Ah!! There was something about 1.7... ... let me try!

@ttaylorr
Copy link
Contributor

OK, I am convinced that this is the correct solution. Here's why:

When we try to clean a file, the first thing that we do is see if that file is already an LFS pointer, and if it is, we pass that file through directly. To check if the file is an LFS pointer, we only read part of it, since the whole file could be rather large. We read the first 1024 bytes of the file (as dictated by the blobSizeCutoff.

We buffer the data in that read by passing it through a io.TeeReader into a bytes.Buffer (so we can use the data later on if the file wasn't already a pointer). This happens here. Here's where things get interesting. We combine the buffered data with the actual reader (an *os.File, usually) if we buffered less data than the total length of the file. That happens here. We only concatenate if there is more data, since if we had read the entire contents of the file, there'd be no more data left to read, and the call would either a) block forever, or b) return an io.EOF immediately.

However! Our pkt-line implementation of io.Reader splits the last read and the EOF read into two. So if you're at the end of a file, you'll get:

  1. Read([1024]byte) => (1024, nil)
  2. Read[n]byte) => (0, io.EOF)

This is totally valid according to the Go documentation for the io.Reader interface, but introduces a very subtle problem in our code. Consider a 1024 byte file, and the pkt-line representation of it. We'd have:

  1. A header with the pathname and command (consumed by the filter-process scanner)
  2. A flush packet (consumed by the filter-process scanner)
  3. 1024 bytes of the file's contents (consumed by the reader above)
  4. A flush packet (NOT CONSUMED!)

Since we filled the buffer, and didn't reach into the next packet (which would have been a 0000 flush), the stream is left 4 bytes behind where it should be, thus the next command isn't recognized, and the command exits in a dirty state.

This fix is correct because it combines the last read into:

  1. Read([1024]byte) => (1024, io.EOF)

because the reader reads packets until it has to buffer, not until the given p buffer is full. This causes it to read the flush packet against a perfectly sized buffer, which advances the buffer to the right state, even though we don't concatenate the readers.

This behavior is covered under our unit tests here and here.

Copy link
Contributor

@technoweenie technoweenie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix and explanations make sense to me 👍

@eminence
Copy link

I'd be happy to test this new version on my set of test data (the data that found this issue). It totals about 10GB and 138000 files. Would you happen to have a 64-bit linux binary handy?

`n`, representing the total number of bytes we have read into `p`, can never
become `>len(p)`. As such, let's remove a check for that impossible condition.
@ttaylorr
Copy link
Contributor

@eminence: sure. Here are builds for both AMD64 and 386 architectures built against Linux containing all of the commits in this PR:

~/g/git-lfs (lars/test-1024-error) $ git show -q HEAD
commit 0899760263ad35ef11bd7e8af2799b8ea33ae624
Author: Taylor Blau <me@ttaylorr.com>
Date:   Tue Nov 22 10:57:18 2016 -0700

    git: remove extraneous length check

    `n`, representing the total number of bytes we have read into `p`, can never
    become `>len(p)`. As such, let's remove a check for that impossible condition.

git-lfs-linux-386-1.5.0.tar.gz
git-lfs-linux-amd64-1.5.0.tar.gz

@eminence
Copy link

eminence commented Nov 22, 2016

Thanks! It's been running for about 30 minuets now with no errors. I'll report back here when it completes.

Edit: just finished! Took 32minutes in total.

@ttaylorr ttaylorr merged commit 21b2fbf into master Nov 22, 2016
@ttaylorr ttaylorr deleted the lars/test-1024-error branch November 22, 2016 20:18
ttaylorr added a commit that referenced this pull request Nov 22, 2016
Backport #1699 for v1.5.x: filter-process: fix reading 1024 byte files
larsxschneider added a commit that referenced this pull request Nov 23, 2016
Apparently the process-filter does not work with Go below version 1.7
cf. #1699 (comment)
dhiwakarK pushed a commit to dhiwakarK/expert-octo-doodle that referenced this pull request Nov 3, 2022
Apparently the process-filter does not work with Go below version 1.7
cf. git-lfs/git-lfs#1699 (comment)

Former-commit-id: d99f597a8b6072113cebb7f77d9c66d0d1dc449d
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Nov 18, 2024
Because full support of the "filter" attribute and protocol were only
introduced with Git version 2.11.0, we require at least that version of
Git for our t/t-filter-process.sh test script.  This attribute and
the associated protocol were developed in part so as to make the
Git LFS client's integration with Git more efficient.

The original version of the t/t-filter-process.sh test script predated
the release of Git v2.11.0, so when it was introduced in commit
d1dca3e of PR git-lfs#1617, it included
a workaround technique for detecting whether the version of Git in use
did or did not support the "filter" attribute yet.  This workaround
was preceded by a comment describing how it worked.

In commit 54ddc13 of PR git-lfs#1699 the
workaround was removed and replaced with a call to the
ensure_git_version_isnt() function from our t/testhelpers.sh shell
library, since Git version 2.11.0 had been released and so we could
perform a simpler version check to confirm whether Git supported the
"filter" attribute.

However, the comment describing the original workaround technique
for detecting support for the "filter" attribute was left in place,
so we update it now.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Nov 18, 2024
Because full support of the "filter" attribute and protocol were only
introduced with Git version 2.11.0, we require at least that version of
Git for our t/t-filter-process.sh test script.  This attribute and
the associated protocol were developed in part so as to make the
Git LFS client's integration with Git more efficient.

The original version of the t/t-filter-process.sh test script predated
the release of Git v2.11.0, so when it was introduced in commit
d1dca3e of PR git-lfs#1617, it included
a workaround technique for detecting whether the version of Git in use
did or did not support the "filter" attribute yet.  This workaround
was preceded by a comment describing how it worked.

In commit 54ddc13 of PR git-lfs#1699 the
workaround was removed and replaced with a call to the
ensure_git_version_isnt() function from our t/testhelpers.sh shell
library, since Git version 2.11.0 had been released and so we could
perform a simpler version check to confirm whether Git supported the
"filter" attribute.

However, the comment describing the original workaround technique
for detecting support for the "filter" attribute was left in place,
so we update it now.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants