filter-process: Ignore file size in working copy when cleaning #6011
cynix wants to merge 5 commits into git-lfs:main
Conversation
Sometimes when Git sends us the clean command, the target file in the working copy does not have the correct size (e.g. it's empty). Since Git will feed us the blob to clean via stdin, we have to read it all anyway, especially if we're invoked via filter-process where leaving any data unread will cause our Pktline's reader to not be positioned correctly for parsing the next command. Don't trust the file size on disk when determining how much to copy.
// If we simply truncate the file to 0 bytes in situ, Git
// may not be able to finish reading its full contents, and
// might send us fewer packets than actually needed.
I don't know how to actually reproduce what Git is doing when it invokes our filter-process and sends us clean commands with incorrect files in the working copy, so I'm using this util to simulate it. The GIT_TRACE_PACKET output looks sensible and matches what I see when the issue actually occurs. Let me know if you figure out a way to reproduce this natively.
Hey, thanks very much for this PR and for the investigation into the problems reported in #4974, which have proven challenging to replicate! We really appreciate this kind of contribution from our community of users. I think this PR deserves a thoughtful review and so I hope to devote a solid chunk of time to it as soon as I can. My apologies in advance if that takes a week or two; I can assure you it's not because we're not interested in this potential fix for a longstanding issue!
Thank you again for this PR and for your patience, @cynix! I've started working on a review, and will get that done as soon as possible so you can see my suggested revisions, which are for the most part quite minor, at least as regards the work in this PR at the moment. However, there are some ancillary changes I think we'll need to make as a consequence of this PR, and so I may try to implement those in another PR which we could merge ahead of this one, or ask if I can contribute some commits to this PR (for which I'd need an invitation to collaborate on your repository). My apologies for the delay in replying, and thanks again very much for this bug fix!
@chrisd8088 Thanks for the update. I've invited you as a collaborator, feel free to add commits as you see fit. I'm guessing you're thinking of finding a way to still report progress to the user even though we can't trust the file size? (e.g. use the file size on disk as a hint for progress display only, not for determining how much to copy, or something like that) :D
Yes, that's one of the issues, indeed. At the moment, the file size in the progress log (when enabled with GIT_LFS_PROGRESS) comes from the file on disk, which this PR shows we can't trust. Since noticing that I've been pulled off in another direction, but will return to this topic shortly, I very much hope!
This comment was marked as spam.
Hey, I just wanted to note that although I haven't had time to circle back to this PR and the outstanding progress log issues, I will do so in time because I think this is a valuable change and I'd like to get it merged. Thanks again for your contribution and your patience!
It's taken me much longer than I expected to circle back to this PR, but I have not forgotten about it!

For my own reference, I wanted to note that the progress log (when enabled with GIT_LFS_PROGRESS) depends on the ContentLength reported for each response. The Go HTTP client automatically inserts an Accept-Encoding: gzip header into download requests, and when a server responds with gzip-encoded data, it decodes the body itself and reports a ContentLength of -1. Under these conditions, we'll also bypass our throttling of progress log messages, as I noted above in #6011 (comment). Enabling a progress log with GIT_LFS_PROGRESS exposes these issues.
Thanks for keeping this on your radar. Keen to see it merged eventually :) |
In prior commits in this PR we introduced a new configuration option named "lfs.transfer.httpDownloadEncoding", which may be set to either of the values "gzip" or "zstd", with a default value of "gzip".

When our new option is not defined or is set to "gzip", the Git LFS client simply follows its legacy behaviour when making object transfer download requests over HTTP. Specifically, the client should allow the "net/http" package in the Go standard library to insert an "Accept-Encoding: gzip" header into the requests, and if the server chooses to respond with gzip-encoded data, the client should expect the "net/http" package to automatically decode the body of the response.

By contrast, when our new option is set to the value "zstd", the Git LFS client will explicitly add an "Accept-Encoding: zstd" header when it makes an object transfer download HTTP request. If the server chooses to encode the body of its response, it should include a "Content-Encoding: zstd" header which the client will recognize and then attempt to decode the response body using the zstd algorithm.

In either case, the server may also send a Content-Length header, in which case the value of that header will be the length of the encoded content and not the length of the original, uncompressed data. For this reason, when the "net/http" package handles gzip-encoded response data, it returns a Response structure whose ContentLength field is set to -1. This informs any reader of the data stream in the structure's Body field that the length of the stream is unknown:

https://github.com/golang/go/blob/go1.25.7/src/net/http/response.go#L90-L92
https://github.com/golang/go/blob/go1.25.7/src/net/http/transport.go#L2385

The same issue pertains when we handle a zstd-encoded response to an object download request, except that the "net/http" package is not responsible for decoding the data in the response body.
Instead, we decode this data in the download() method of the basicDownloadAdapter structure in our "tq" package.

Later in its operation, the download() method passes the Response structure's ContentLength field to the CopyWithCallback() function from our "tools" package. Thus when handling a response with a Content-Length header and a zstd-encoded body, we currently pass an inaccurate value to the CopyWithCallback() function, one which represents the length of the encoded object data and not its actual uncompressed size.

We therefore update the download() method to set the ContentLength field to -1 after decoding a zstd-encoded response body, to match the behaviour of the "net/http" package when it decodes a gzip-encoded response body.

One consequence of this change will be visible if a user has set the GIT_LFS_PROGRESS environment variable to an absolute file path. In this case, the progress log messages we write to that path will contain a misleading "-1" value as the denominator of the fraction which indicates how much object data has been received so far. However, this is a known issue which already occurs when a server returns a gzip-encoded response body, and one which we expect to address comprehensively in a future PR.

See, for reference: git-lfs#6011 (comment)
Sometimes when Git sends us the clean command, the target file in the
working copy does not have the correct size (e.g. it's empty). Since
Git will feed us the blob to clean via stdin, we have to read it all
anyway, especially if we're invoked via filter-process where leaving
any data unread will cause our Pktline's reader to not be positioned
correctly for parsing the next command. Don't trust the file size on
disk when determining how much to copy.
Resolves #4974 (specifically, the secondary problem reported in that issue).