Skip to content

Avoid deadlock when transfer queue fails#3800

Merged
bk2204 merged 1 commit intogit-lfs:masterfrom
bk2204:abort-wait-queue
Sep 9, 2019
Merged

Avoid deadlock when transfer queue fails#3800
bk2204 merged 1 commit intogit-lfs:masterfrom
bk2204:abort-wait-queue

Conversation

@bk2204
Copy link
Member

@bk2204 bk2204 commented Aug 28, 2019

In 1412d6e ("Don't fail if we lack objects the server has", 2019-04-30), we changed the code to abort later if a missing object occurs. In doing so, we had to consider the case where the transfer queue aborts early for some reason and ensure that the sync.WaitGroup does not unnecessarily block due to outstanding objects never getting processed.

However, the approach we used, which was to explicitly add the number of items we skipped processing, was error prone and didn't cover all cases. Notably, a DNS failure could randomly cause a hang during a push. Solve this by creating a class for a wait group which is abortable and simply abort it if we encounter an error, preventing any deadlocks caused by miscounting the number of items.

There's no test for this case because triggering it requires network to be disabled, which isn't something which we can do in our testsuite. As reported, existing tests do sporadically fail in this case, so when run in a suitable environment, our testsuite already covers this case.

/cc @QuLogic as reporter
Fixes #3798.

@bk2204 bk2204 requested a review from a team August 28, 2019 21:19
@QuLogic
Copy link
Contributor

QuLogic commented Aug 31, 2019

This works for me.

Copy link
Contributor

@PastelMobileSuit PastelMobileSuit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 1412d6e ("Don't fail if we lack objects the server has",
2019-04-30), we changed the code to abort later if a missing object
occurs.  In doing so, we had to consider the case where the transfer
queue aborts early for some reason and ensure that the sync.WaitGroup
does not unnecessarily block due to outstanding objects never getting
processed.

However, the approach we used, which was to explicitly add the number of
items we skipped processing, was error prone and didn't cover all cases.
Notably, a DNS failure could randomly cause a hang during a push.  Solve
this by creating a class for a wait group which is abortable and simply
abort it if we encounter an error, preventing any deadlocks caused by
miscounting the number of items.
@bk2204 bk2204 merged commit 15e3c32 into git-lfs:master Sep 9, 2019
@bk2204 bk2204 deleted the abort-wait-queue branch September 9, 2019 18:00
@chrisd8088 chrisd8088 mentioned this pull request Jun 19, 2024
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 3, 2025
Since commit 1412d6e of PR git-lfs#3634,
during push operations the Git LFS client has sometimes avoided
reporting an error when an object to be pushed is missing locally
if the remote server reports that it has a copy already.

To implement this feature, a new Missing element was added to the
Transfer and objectTuple structures in our "tq" package, and the
Add() method of the TransferQueue structure in that package was
updated to accept an additional "missing" flag, which the method
uses to set the Missing element of the objectTuple structure it
creates and sends to the "incoming" channel.  Batches of objects to be
pushed are then gathered from this channel by the collectBatches()
method of the TransferQueue structure.

As batches of objectTuple structures are collected, they are passed
to the enqueueAndCollectRetriesFor() method, which converts them
to Transfer structures using the ToTransfers() method and then passes
them to the Batch() function, which is defined in our tq/api.go
source file.  This function initializes a batchRequest structure
which contains the set of Transfer structures as its Objects element,
and then passes those to the Batch() method specific to the current
batch transfer adapter's structure.  These Batch() methods return
a BatchResponse structure, which the Batch() function then returns
to the enqueueAndCollectRetriesFor() method.  The BatchResponse
structure also contains an Objects element which is another set
of Transfer structures that represent the per-object metadata
received from the remote server.

After the enqueueAndCollectRetriesFor() method receives a BatchResponse
during a push operation, if any of the Transfer structures in that
response define an upload action to be performed, this implies that
the remote server does not have a copy of those objects.

As one of the changes we made in PR git-lfs#3634, we introduced a step into
the enqueueAndCollectRetriesFor() method which halts the push operation
if the server's response indicates that the server lacks a copy of an
object, and if the "missing" value passed to the Add() method for that
object was set to "true".  (Initially, this step also decremented the
count of the number of objects waiting to be transferred, but this
created the potential for stalled push operations, and so another
approach to handling an early exit from the batch transfer process was
implemented in commit eb83fcd of
PR git-lfs#3800.)

Also in PR git-lfs#3634, several methods of the uploadContext structure in
our "commands" package were revised to set the "missing" value for
each object before calling the Add() method of the TransferQueue
structure.  Specifically, the ensureFile() method of the uploadContext
structure first checks for the presence of the object data file in
the .git/lfs/objects local storage directories.  If that does not
exist, then the method looks for a file in the working tree at the
path associated with the Git LFS pointer that corresponds to the
object.  If that file also does not exist, and if the
"lfs.allowIncompletePush" Git configuration option is set to "false",
then the method returns "true", and this value is ultimately used
for the "missing" argument in the call to the TransferQueue's
Add() method for the object.

Note that the file in the working tree may have any content, or be
entirely empty; its simple presence is enough to change the value
returned by the ensureFile() method, given the other conditions
described above.  We expect to revise this unintuitive behaviour in
a subsequent commit in this PR.

Before we make that change, however, we first adjust two aspects
of the implementation from PR git-lfs#3634 so as to simplify our handling
of missing objects during push operations.  We make one of these
adjustments in this commit, and the other in a following commit
in this PR.

As noted above, the enqueueAndCollectRetriesFor() method halts a
push operation if the server's response indicates that the server lacks
a copy of an object, and if the "missing" value passed to the Add()
method for that object was set to "true".  In order for the method
to determine an object's "missing" value, at present it consults
the Missing element of the object's Transfer structure from the
set of those structures provided in the BatchResponse structure.

However, Git LFS remote servers do not report an object's local
"missing" status, so these Missing fields are not populated from
the server's response.  Instead, the appropriate values for these
Missing fields in the Transfer structures representing the list
of objects in the server's response are set by either the Batch()
method of the tqClient structure, which handles HTTP-based Git LFS
Batch API requests and responses, and the Batch() method of the
SSHBatchClient structure, which handles the equivalent in the SSH
version of the Git LFS object transfer protocol.

These methods are invoked by the Batch() function in the "tq" package,
and are passed a batchRequest structure, whose Objects element has
been populated by that function from the list of Transfer structures
passed to it by the enqueueAndCollectRetriesFor() method.  The
Batch() methods for both the HTTP and SSH versions of the Git LFS
object transfer protocol iterate over this input set of Transfer
structures and create local maps from each object's ID to the
value of the Missing element from the object's input Transfer structure.

After performing the relevant HTTP or SSH batch request and receiving
a response from the remote server, the Batch() methods then iterate
over the list of objects in the server's reply and set each object's
output Transfer structure's Missing element from the value found in
the local map.

This design dates from the original implementation for the HTTP-based
Git LFS transfer protocol in commit 1412d6e
of PR git-lfs#3634, and was then implemented for the SSH-based protocol in
commit 594f8e3 of PR git-lfs#4446, when
that protocol was introduced.

However, we can simplify this design by eliminating the need to
create and reference local maps in the Batch() methods for each
transfer protocol, if we instead make use of the map of object IDs
to lists of objectTuple structures held in the "transfers" element
of the TransferQueue structure.  This "transfers" element contains
 a map from object IDs to "objects" structures, which in turn
have "objects" elements that are lists of objectTuple structures.
These are populated by the remember() method of the TransferQueue
method, which is called by the Add() method to record all the
input data it is passed when an object is added to the transfer queue.

One potential complication is that the "objects" structure allows
for multiple objectTuples to be recorded for a common object ID,
in case the Add() method is called repeatedly for the same Git LFS
object.  Hypothetically, this could allow for different values
of the "missing" argument to the Add() method to be recorded for
the same object ID during the operation of the transfer queue.

In practice, though, this is not possible, because the Add() method
will only be called once per unique object ID during push operations.
When a "git lfs pre-push" or "git lfs push" command runs, the
UploadPointers() method of the uploadContext structure in the
"commands" is the only caller of the Add() method of the TransferQueue
structure, and the UploadPointers() method only performs that call
for the object IDs returned by the prepareUpload() method, which it
calls before the TransferQueue's Add() method.

In the common case where a Git LFS push command scans through the
history of one or more Git references to find Git LFS pointers,
the UploadPointers() method of the uploadContext structure is called
for each individual pointer found in the Git history.  As a consequence,
the objects are passed individually to the prepareUpload() method.
That method checks if the given object have been processed already by
calling the HasUploaded() method, and if that returns "true", the
prepareUpload() method returns nothing, so the UploadPointers() method
in turn does not invoke the UploadPointers() method for that object.
If the HasUploaded() method returns "false", though, the object is
returned by the prepareUpload() method, so the UploadPointers() method
invokes the TransferQueue's Add() method, and then calls the
SetUploaded() method to register the object ID in the set of IDs
that have been processed so far.

Note that is unlikely for multiple Git LFS pointers with the same
object ID to be found in a repository's Git history in the first place,
because these pointers will normally be identical to each other, and
so will have identical Git blob SHA-1 values.  Therefore when we
run "git rev-list --objects ..." to retrieve the Git blobs reachable
from a set of Git references, Git LFS pointers with the same
Git LFS object IDs will normally only be represented by a single
Git blob.

However, as demonstrated by our t/t-duplicate-oids.sh test script,
it is possible for Git LFS pointers with legacy values for their
"version" field to exist, which may then reference the same SHA-256
Git LFS object ID as a modern Git LFS pointer but be stored as
distinct Git blobs.  Similarly, it is possible for a Git LFS pointer
with extension fields to resolve to the same final content data as
a pointer without those fields, or with different extension fields,
and so could also result in distinct Git blobs that are all valid
Git LFS pointers with the same SHA-256 object IDs.

Aside from the common case where a Git LFS push command scans through
the history of one or more Git references to find Git LFS pointers,
the "git lfs push" command may also be invoked with its --object-id
or --stdin options, along with a list of specific Git LFS object IDs
to be uploaded.  When handling these command-line options, the
uploadsWithObjectIDs() function in our "commands" package will
invoke the UploadPointers() method of the uploadContext structure
with the full set of Git LFS object IDs passed to the command.

Because a user may inadvertently supply the same object ID more than
once, this means duplicate IDs may be passed by the UploadPointers()
method to the prepareUpload() method, in which case that method's
call to the HasUploaded() method would return the same "false" value
for all the duplicate object IDs in the list.  Hence, this check is
not sufficient in this case to avoid duplicate IDs being returned
by the prepareUpload() method.  Fortunately, though, the
prepareUpload() method performs another round of de-duplication using
a local StringSet from the "tools" package of the standard Go library.
Each object ID is added to this "uniqOids" set as it is processed,
unless the set already contains that object ID, in which case the
object is skipped.

This second level of de-duplication logic ensures that it is not
possible for UploadPointers() method to receive object IDs from the
prepareUpload() method unless they have not been processed before.
This in turn guarantees that the Add() method of the TransferQueue
structure will never be called for the same Git LFS object ID more
than once during a push operation.

(As a sidebar, the use of the "uniqOids" set to de-duplicate objects
during push operations was originally introduced in commit
d770dfa of PR git-lfs#1600 to guard against
duplicate object IDs which arose from distinct Git LFS pointers with
the same SHA-256 object IDs, as can result from the use of legacy
pointer "version" values.  At the time, the prepareUpload() method
was called by an upload() function rather than the newer
UploadPointers() method.  During normal "git lfs pre-push" and
"git lfs push" commands, without the --object-id or --stdin options,
this upload() function would be passed all the Git LFS objects found
from scanning the appropriate Git history, and so might encounter
duplicate object IDs which would not be filtered by checking the set
maintained by the SetUploaded() method, since that was only called
after the Add() method of the TransferQueue structure.  Later, with
the changes in PR git-lfs#1953, the logic changed such that objects are
only handled individually during push operations when the --object-id
and --stdin options are not supplied, and since then the checks of the
"uniqOids" set in the prepareUpload() method have only guarded against
duplicate object IDs provided via those command-line options.)

As described above, we can rely on the object ID de-duplication in the
"commands" package to prevent the same object being requested for
upload more than once.  This ensures that the "objects" structure
found in the "transfers" map element of the TransferQueue structure
will only contain a single objectTuple.  Therefore, in our revisions
to the enqueueAndCollectRetriesFor() method, we can simply check the
Missing element of that first (and only) objectTuple to determine
whether the "missing" argument was set to "true" when the Add() method
of the TransferQueue structure was called for that unique object.

(Note that in PR git-lfs#2476, the "transfers" structure was updated to
maintain a list of objectTuple structures for each object ID, in order
to handle duplicate OIDs passed during "smudge" filter operations
using Git's "delay" capability for long-running filter drivers
like the "git lfs filter-process" command.  This filter never
handles upload operations, though, so the de-duplication logic in
the "tq" package is not applicable to push operations, since those
are fully de-duplicated in the "commands" package.)

In prior commits in this PR we refactored and expanded the tests
relevant to the handling of missing objects during push operations,
in part to provide additional validation of the changes in this
commit, including several new tests in our t/t-push-failures-local.sh
which attempt to perform push operations with missing objects while
using the SSH-based version of the Git LFS object transfer protocol.

In particular, the "push reject missing object (lfs.allowincompletepush
false) (git-lfs-transfer)" test checks that when an object is missing
locally, and is detected as such by the ensureFile() method of the
uploadContext structure because there is also no file in the working
tree at the path associated with the object's Git LFS pointer, the
Git LFS client abandons the push operation immediately and does not
proceed to try to upload either the missing object or any other objects.

The test validates this behaviour by checking that no objects are
present on the server after the "git push" command exits, and also
by looking for error and trace log messages the client should output
when it halts the push operation.  We added the trace log message in
a previous commit in this PR because we intend to revise the error
message in a subsequent commit, which will make the message more
consistent, so it is not specific to the case where no file is found
in the working tree, but will also mean we cannot use the revised
error message as an indicator that the client has halted the push
operation where we expect that it should.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 3, 2025
Since commit 1412d6e of PR git-lfs#3634,
during push operations the Git LFS client has sometimes avoided
reporting an error when an object to be pushed is missing locally
if the remote server reports that it has a copy already.

To implement this feature, a new Missing element was added to the
Transfer and objectTuple structures in our "tq" package, and the
Add() method of the TransferQueue structure in that package was
updated to accept an additional "missing" flag, which the method
uses to set the Missing element of the objectTuple structure it
creates and sends to the "incoming" channel.  Batches of objects to be
pushed are then gathered from this channel by the collectBatches()
method of the TransferQueue structure.

As batches of objectTuple structures are collected, they are passed
to the enqueueAndCollectRetriesFor() method, which converts them
to Transfer structures using the ToTransfers() method and then passes
them to the Batch() function, which is defined in our tq/api.go
source file.  This function initializes a batchRequest structure
which contains the set of Transfer structures as its Objects element,
and then passes those to the Batch() method specific to the current
batch transfer adapter's structure.  These Batch() methods return
a BatchResponse structure, which the Batch() function then returns
to the enqueueAndCollectRetriesFor() method.  The BatchResponse
structure also contains an Objects element which is another set
of Transfer structures that represent the per-object metadata
received from the remote server.

After the enqueueAndCollectRetriesFor() method receives a BatchResponse
during a push operation, if any of the Transfer structures in that
response define an upload action to be performed, this implies that
the remote server does not have a copy of those objects.

As one of the changes we made in PR git-lfs#3634, we introduced a step into
the enqueueAndCollectRetriesFor() method which halts the push operation
if the server's response indicates that the server lacks a copy of an
object, and if the "missing" value passed to the Add() method for that
object was set to "true".  (Initially, this step also decremented the
count of the number of objects waiting to be transferred, but this
created the potential for stalled push operations, and so another
approach to handling an early exit from the batch transfer process was
implemented in commit eb83fcd of
PR git-lfs#3800.)

Also in PR git-lfs#3634, several methods of the uploadContext structure in
our "commands" package were revised to set the "missing" value for
each object before calling the Add() method of the TransferQueue
structure.  Specifically, the ensureFile() method of the uploadContext
structure first checks for the presence of the object data file in
the .git/lfs/objects local storage directories.  If that does not
exist, then the method looks for a file in the working tree at the
path associated with the Git LFS pointer that corresponds to the
object.  If that file also does not exist, and if the
"lfs.allowIncompletePush" Git configuration option is set to "false",
then the method returns "true", and this value is ultimately used
for the "missing" argument in the call to the TransferQueue's
Add() method for the object.

Note that the file in the working tree may have any content, or be
entirely empty; its simple presence is enough to change the value
returned by the ensureFile() method, given the other conditions
described above.  We expect to revise this unintuitive behaviour in
a subsequent commit in this PR.

Before we make that change, however, we first adjust two aspects
of the implementation from PR git-lfs#3634 so as to simplify our handling
of missing objects during push operations.  We made one of these
adjustments in the previous commit in this PR, and we make the
other in this commit.

As noted above, the enqueueAndCollectRetriesFor() method halts a
push operation if the server's response indicates that the server lacks
a copy of an object, and if the "missing" value passed to the Add()
method for that object was set to "true".  When this occurs, the
enqueueAndCollectRetriesFor() method outputs an error message that
is distinct from the message which would otherwise be reported later,
when the partitionTransfers() method of the TransferQueue rechecks
whether individual objects' data files are present in the local
storage directories.

The message reported by the enqueueAndCollectRetriesFor() method
when it abandons a push operation states that the client was
"Unable to find source for object".  This message was originally
introduced in commit fea77e1 of
PR git-lfs#3398, and was generated by the ensureFile() method of the
uploadContext structure in our "commands" package, when that method
was unable to locate either an object file in the local storage
directories, or a corresponding file in the working tree at the path
of the object's Git LFS pointer.

As such, the wording of the message alludes to the expectation that
the ensureFile() method will try to recreate a missing object file
from a file in the working tree, i.e., from the object's "source".
This is how the method is described in the code comments that
precede it, and how it was intended to operate since it was first
added in PR git-lfs#176.

However, the ensureFile() has never actually recreated object files
in this way, due to an oversight in its implementation, and given
the challenges posed by the likelihood that files in the current
working tree do not correspond exactly to the source of missing Git LFS
object files, we expect to simply remove the ensureFile() method in
a subsequent commit in this PR.

In PR git-lfs#3634 the enqueueAndCollectRetriesFor() method of the TransferQueue
structure was revised to halt push operations if the server reports that
it requires the upload of an object for which the "missing" value
provided to the TransferQueue's Add() method was set to "true".  The
error message reported in such a case was copied from the message
formerly output by the ensureFile() method of the uploadContext
structure, and that method was altered so it no longer generated this
error message.

This error message is not the only one reported by the Git LFS client
when an object file is missing during a push operation, though, because
it is only output when the ensureFile() method does not find a file
(with any content) in the working at the path associated with the
object's pointer.  When a file does exist in the working tree, but
the actual object file in the Git LFS local storage directories is
missing, the push operation proceeds past the checks in the
enqueueAndCollectRetriesFor() method and continues to the point where
that method calls the addToAdapter() method of the TransferQueue
structure.  That method in turn calls the partitionTransfers() method
to determine which of the current batch of objects to be uploaded
have local object files present and which do not.

If the partitionTransfers() method finds that an object file is
missing for a given object, it creates a new MalformedObjectError
with the "missing" element of that error structure set to "true".
The method then instantiates a TransferResult structure for the object
and sets the Error element of the TransferResult to the new
MalformedObjectError.  Finally, the method returns this TransferResult
along with the other TransferResult structures it creates for all the
other objects in the batch.

The addToAdapter() method passes these TransferResult structures
individually to the handleTransferResult() method, which checks whether
the Error element is defined for the given TransferResult.  If an
error was encountered, and is one which indicates the object transfer
should not be retried, then the handleTransferResult() method sends
the error to the TransferQueue's "errorc" channel.  For errors of the
MalformedObjectError type, this is always the case.

During a push operation, the CollectErrors() method of the uploadContext
structure in the "commands" package receives these errors from the
channel, and if they are errors of the MalformedObjectError type and
have a "true" values in their "missing" element, the object's ID and
the filename associated with the object's pointer are recorded in the
"missing" map of the uploadContext structure.

When the ReportErrors() method of the uploadContext structure is then
run, it iterates over the keys and values of the "missing" map and
outputs an error message containing both the object ID and the
associated filename of the object's pointer, along with a "(missing)"
prefix.  As well, a leading error message is output, whose exact text
depends on the value of the "lfs.allowIncompletePush" configuration
option, and if this option is set to its default value of "false",
several trailing error messages are output which provide a hint as to
how to set that option if the user wants to allow a subsequent push
operation with missing objects to proceed to completion as best it can.

These per-object error messages were first defined in commit
9be11e8 of PR git-lfs#2082, and the trailing
hints were added in commit f5f5731 of
PR git-lfs#3109, at which time the "lfs.allowIncompletePush" option's default
values was changed to "false".

As mentioned above, in a subsequent commit in this PR we expect to
remove the ensureFile() method of the uploadContext structure.  We
still expect to perform a check for missing object files in the
uploadTransfer() method, though, as this will typically find any
missing object files at the start of a push operation, and thereby
allow the enqueueAndCollectRetriesFor() method of the TransferQueue
structure to halt the operation as soon as one of those objects is
found to be required by the remote server.

For the time being, we will leave the secondary check for missing
object files in place in the partitionTransfers() method of the
TransferQueue structure, as this method also tests the size of the
object file and reports those with unexpected sizes as corrupt.

Nevertheless, we would like to make our error messages as consistent
as possible when handling missing object files.  Therefore we revise
the enqueueAndCollectRetriesFor() method so it no longer returns the
"Unable to find source for object" error message, but instead returns a
new MalformedObjectError with a "true" value for its "missing" element.

One advantage of this change is that we remove the somewhat stale
wording of the previous message, which reflected the assumption that
the ensureFile() method of the uploadContext structure would attempt
to recreate missing object files from "source" files in the working
tree, even though the method has never actually done so.

Another advantage is that by returning a MalformedObjectError, the
existing logic of the CollectErrors() and ReportErrors() methods of the
uploadContext structure will handle the error exactly as if it had been
generated by the TransferQueue's partitionTransfers() method, and will
output both the same leading error message and trailing hint messages
as in that case.

As a result, we also adjust several tests in our t/t-pre-push.sh
and t/t-push-failures-local.sh scripts to expect the error messages
output by the ReportErrors() method instead of the message previously
generated by the enqueueAndCollectRetriesFor() method.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 3, 2025
Since commit 1412d6e of PR git-lfs#3634,
during push operations the Git LFS client has sometimes avoided
reporting an error when an object to be pushed is missing locally
if the remote server reports that it has a copy already.

To implement this feature, a new Missing element was added to the
Transfer and objectTuple structures in our "tq" package, and the
Add() method of the TransferQueue structure in that package was
updated to accept an additional "missing" flag, which the method
uses to set the Missing element of the objectTuple structure it
creates and sends to the "incoming" channel.  Batches of objects to be
pushed are then gathered from this channel by the collectBatches()
method of the TransferQueue structure.

As batches of objectTuple structures are collected, they are passed
to the enqueueAndCollectRetriesFor() method, which converts them
to Transfer structures using the ToTransfers() method and then passes
them to the Batch() function, which is defined in our tq/api.go
source file.  This function initializes a batchRequest structure
which contains the set of Transfer structures as its Objects element,
and then passes those to the Batch() method specific to the current
batch transfer adapter's structure.  These Batch() methods return
a BatchResponse structure, which the Batch() function then returns
to the enqueueAndCollectRetriesFor() method.  The BatchResponse
structure also contains an Objects element which is another set
of Transfer structures that represent the per-object metadata
received from the remote server.

After the enqueueAndCollectRetriesFor() method receives a BatchResponse
during a push operation, if any of the Transfer structures in that
response define an upload action to be performed, this implies that
the remote server does not have a copy of those objects.

As one of the changes we made in PR git-lfs#3634, we introduced a step into
the enqueueAndCollectRetriesFor() method which halts the push operation
if the server's response indicates that the server lacks a copy of an
object, and if the "missing" value passed to the Add() method for that
object was set to "true".  (Initially, this step also decremented the
count of the number of objects waiting to be transferred, but this
created the potential for stalled push operations, and so another
approach to handling an early exit from the batch transfer process was
implemented in commit eb83fcd of
PR git-lfs#3800.)

Also in PR git-lfs#3634, several methods of the uploadContext structure in
our "commands" package were revised to set the "missing" value for
each object before calling the Add() method of the TransferQueue
structure.  Specifically, the ensureFile() method of the uploadContext
structure first checks for the presence of the object data file in
the .git/lfs/objects local storage directories.  If that does not
exist, then the method looks for a file in the working tree at the
path associated with the Git LFS pointer that corresponds to the
object.  If that file also does not exist, and if the
"lfs.allowIncompletePush" Git configuration option is set to "false",
then the method returns "true", and this value is ultimately used
for the "missing" argument in the call to the TransferQueue's
Add() method for the object.

Note that the file in the working tree may have any content, or be
entirely empty; its simple presence is enough to change the value
returned by the ensureFile() method, given the other conditions
described above.  We expect to revise this unintuitive behaviour in
a subsequent commit in this PR.

Before we make that change, however, we first adjust two aspects
of the implementation from PR git-lfs#3634 so as to simplify our handling
of missing objects during push operations.  We make one of these
adjustments in this commit, and the other in a following commit
in this PR.

As noted above, the enqueueAndCollectRetriesFor() method halts a
push operation if the server's response indicates that the server lacks
a copy of an object, and if the "missing" value passed to the Add()
method for that object was set to "true".  In order for the method
to determine an object's "missing" value, at present it consults
the Missing element of the object's Transfer structure from the
set of those structures provided in the BatchResponse structure.

However, Git LFS remote servers do not report an object's local
"missing" status, so these Missing fields are not populated from
the server's response.  Instead, the appropriate values for these
Missing fields in the Transfer structures representing the list
of objects in the server's response are set by either the Batch()
method of the tqClient structure, which handles HTTP-based Git LFS
Batch API requests and responses, and the Batch() method of the
SSHBatchClient structure, which handles the equivalent in the SSH
version of the Git LFS object transfer protocol.

These methods are invoked by the Batch() function in the "tq" package,
and are passed a batchRequest structure, whose Objects element has
been populated by that function from the list of Transfer structures
passed to it by the enqueueAndCollectRetriesFor() method.  The
Batch() methods for both the HTTP and SSH versions of the Git LFS
object transfer protocol iterate over this input set of Transfer
structures and create local maps from each object's ID to the
value of the Missing element from the object's input Transfer structure.

After performing the relevant HTTP or SSH batch request and receiving
a response from the remote server, the Batch() methods then iterate
over the list of objects in the server's reply and set each object's
output Transfer structure's Missing element from the value found in
the local map.

This design dates from the original implementation for the HTTP-based
Git LFS transfer protocol in commit 1412d6e
of PR git-lfs#3634, and was then implemented for the SSH-based protocol in
commit 594f8e3 of PR git-lfs#4446, when
that protocol was introduced.

However, we can simplify this design by eliminating the need to
create and reference local maps in the Batch() methods for each
transfer protocol, if we instead make use of the map of object IDs
to lists of objectTuple structures held in the "transfers" element
of the TransferQueue structure.  This "transfers" element contains
 a map from object IDs to "objects" structures, which in turn
have "objects" elements that are lists of objectTuple structures.
These are populated by the remember() method of the TransferQueue
method, which is called by the Add() method to record all the
input data it is passed when an object is added to the transfer queue.

One potential complication is that the "objects" structure allows
for multiple objectTuples to be recorded for a common object ID,
in case the Add() method is called repeatedly for the same Git LFS
object.  Hypothetically, this could allow for different values
of the "missing" argument to the Add() method to be recorded for
the same object ID during the operation of the transfer queue.

In practice, though, this is not possible, because the Add() method
will only be called once per unique object ID during push operations.
When a "git lfs pre-push" or "git lfs push" command runs, the
UploadPointers() method of the uploadContext structure in the
"commands" is the only caller of the Add() method of the TransferQueue
structure, and the UploadPointers() method only performs that call
for the object IDs returned by the prepareUpload() method, which it
calls before the TransferQueue's Add() method.

In the common case where a Git LFS push command scans through the
history of one or more Git references to find Git LFS pointers,
the UploadPointers() method of the uploadContext structure is called
for each individual pointer found in the Git history.  As a consequence,
the objects are passed individually to the prepareUpload() method.
That method checks if the given object have been processed already by
calling the HasUploaded() method, and if that returns "true", the
prepareUpload() method returns nothing, so the UploadPointers() method
in turn does not invoke the UploadPointers() method for that object.
If the HasUploaded() method returns "false", though, the object is
returned by the prepareUpload() method, so the UploadPointers() method
invokes the TransferQueue's Add() method, and then calls the
SetUploaded() method to register the object ID in the set of IDs
that have been processed so far.

Note that is unlikely for multiple Git LFS pointers with the same
object ID to be found in a repository's Git history in the first place,
because these pointers will normally be identical to each other, and
so will have identical Git blob SHA-1 values.  Therefore when we
run "git rev-list --objects ..." to retrieve the Git blobs reachable
from a set of Git references, Git LFS pointers with the same
Git LFS object IDs will normally only be represented by a single
Git blob.

However, as demonstrated by our t/t-duplicate-oids.sh test script,
it is possible for Git LFS pointers with legacy values for their
"version" field to exist, which may then reference the same SHA-256
Git LFS object ID as a modern Git LFS pointer but be stored as
distinct Git blobs.  Similarly, it is possible for a Git LFS pointer
with extension fields to resolve to the same final content data as
a pointer without those fields, or with different extension fields,
and so could also result in distinct Git blobs that are all valid
Git LFS pointers with the same SHA-256 object IDs.

Aside from the common case where a Git LFS push command scans through
the history of one or more Git references to find Git LFS pointers,
the "git lfs push" command may also be invoked with its --object-id
or --stdin options, along with a list of specific Git LFS object IDs
to be uploaded.  When handling these command-line options, the
uploadsWithObjectIDs() function in our "commands" package will
invoke the UploadPointers() method of the uploadContext structure
with the full set of Git LFS object IDs passed to the command.

Because a user may inadvertently supply the same object ID more than
once, this means duplicate IDs may be passed by the UploadPointers()
method to the prepareUpload() method, in which case that method's
call to the HasUploaded() method would return the same "false" value
for all the duplicate object IDs in the list.  Hence, this check is
not sufficient in this case to avoid duplicate IDs being returned
by the prepareUpload() method.  Fortunately, though, the
prepareUpload() method performs another round of de-duplication using
a local StringSet from the "tools" package of the standard Go library.
Each object ID is added to this "uniqOids" set as it is processed,
unless the set already contains that object ID, in which case the
object is skipped.

This second level of de-duplication logic ensures that it is not
possible for UploadPointers() method to receive object IDs from the
prepareUpload() method unless they have not been processed before.
This in turn guarantees that the Add() method of the TransferQueue
structure will never be called for the same Git LFS object ID more
than once during a push operation.

(As a sidebar, the use of the "uniqOids" set to de-duplicate objects
during push operations was originally introduced in commit
d770dfa of PR git-lfs#1600 to guard against
duplicate object IDs which arose from distinct Git LFS pointers with
the same SHA-256 object IDs, as can result from the use of legacy
pointer "version" values.  At the time, the prepareUpload() method
was called by an upload() function rather than the newer
UploadPointers() method.  During normal "git lfs pre-push" and
"git lfs push" commands, without the --object-id or --stdin options,
this upload() function would be passed all the Git LFS objects found
from scanning the appropriate Git history, and so might encounter
duplicate object IDs which would not be filtered by checking the set
maintained by the SetUploaded() method, since that was only called
after the Add() method of the TransferQueue structure.  Later, with
the changes in PR git-lfs#1953, the logic changed such that objects are
only handled individually during push operations when the --object-id
and --stdin options are not supplied, and since then the checks of the
"uniqOids" set in the prepareUpload() method have only guarded against
duplicate object IDs provided via those command-line options.)

As described above, we can rely on the object ID de-duplication in the
"commands" package to prevent the same object being requested for
upload more than once.  This ensures that the "objects" structure
found in the "transfers" map element of the TransferQueue structure
will only contain a single objectTuple.  Therefore, in our revisions
to the enqueueAndCollectRetriesFor() method, we can simply check the
Missing element of that first (and only) objectTuple to determine
whether the "missing" argument was set to "true" when the Add() method
of the TransferQueue structure was called for that unique object.

(Note that in PR git-lfs#2476, the "transfers" structure was updated to
maintain a list of objectTuple structures for each object ID, in order
to handle duplicate OIDs passed during "smudge" filter operations
using Git's "delay" capability for long-running filter drivers
like the "git lfs filter-process" command.  This filter never
handles upload operations, though, so the de-duplication logic in
the "tq" package is not applicable to push operations, since those
are fully de-duplicated in the "commands" package.)

In prior commits in this PR we refactored and expanded the tests
relevant to the handling of missing objects during push operations,
in part to provide additional validation of the changes in this
commit, including several new tests in our t/t-push-failures-local.sh
which attempt to perform push operations with missing objects while
using the SSH-based version of the Git LFS object transfer protocol.

In particular, the "push reject missing object (lfs.allowincompletepush
false) (git-lfs-transfer)" test checks that when an object is missing
locally, and is detected as such by the ensureFile() method of the
uploadContext structure because there is also no file in the working
tree at the path associated with the object's Git LFS pointer, the
Git LFS client abandons the push operation immediately and does not
proceed to try to upload either the missing object or any other objects.

The test validates this behaviour by checking that no objects are
present on the server after the "git push" command exits, and also
by looking for error and trace log messages the client should output
when it halts the push operation.  We added the trace log message in
a previous commit in this PR because we intend to revise the error
message in a subsequent commit, which will make the message more
consistent, so it is not specific to the case where no file is found
in the working tree, but will also mean we cannot use the revised
error message as an indicator that the client has halted the push
operation where we expect that it should.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 3, 2025
Since commit 1412d6e of PR git-lfs#3634,
during push operations the Git LFS client has sometimes avoided
reporting an error when an object to be pushed is missing locally
if the remote server reports that it has a copy already.

To implement this feature, a new Missing element was added to the
Transfer and objectTuple structures in our "tq" package, and the
Add() method of the TransferQueue structure in that package was
updated to accept an additional "missing" flag, which the method
uses to set the Missing element of the objectTuple structure it
creates and sends to the "incoming" channel.  Batches of objects to be
pushed are then gathered from this channel by the collectBatches()
method of the TransferQueue structure.

As batches of objectTuple structures are collected, they are passed
to the enqueueAndCollectRetriesFor() method, which converts them
to Transfer structures using the ToTransfers() method and then passes
them to the Batch() function, which is defined in our tq/api.go
source file.  This function initializes a batchRequest structure
which contains the set of Transfer structures as its Objects element,
and then passes those to the Batch() method specific to the current
batch transfer adapter's structure.  These Batch() methods return
a BatchResponse structure, which the Batch() function then returns
to the enqueueAndCollectRetriesFor() method.  The BatchResponse
structure also contains an Objects element which is another set
of Transfer structures that represent the per-object metadata
received from the remote server.

After the enqueueAndCollectRetriesFor() method receives a BatchResponse
during a push operation, if any of the Transfer structures in that
response define an upload action to be performed, this implies that
the remote server does not have a copy of those objects.

As one of the changes we made in PR git-lfs#3634, we introduced a step into
the enqueueAndCollectRetriesFor() method which halts the push operation
if the server's response indicates that the server lacks a copy of an
object, and if the "missing" value passed to the Add() method for that
object was set to "true".  (Initially, this step also decremented the
count of the number of objects waiting to be transferred, but this
created the potential for stalled push operations, and so another
approach to handling an early exit from the batch transfer process was
implemented in commit eb83fcd of
PR git-lfs#3800.)

Also in PR git-lfs#3634, several methods of the uploadContext structure in
our "commands" package were revised to set the "missing" value for
each object before calling the Add() method of the TransferQueue
structure.  Specifically, the ensureFile() method of the uploadContext
structure first checks for the presence of the object data file in
the .git/lfs/objects local storage directories.  If that does not
exist, then the method looks for a file in the working tree at the
path associated with the Git LFS pointer that corresponds to the
object.  If that file also does not exist, and if the
"lfs.allowIncompletePush" Git configuration option is set to "false",
then the method returns "true", and this value is ultimately used
for the "missing" argument in the call to the TransferQueue's
Add() method for the object.

Note that the file in the working tree may have any content, or be
entirely empty; its simple presence is enough to change the value
returned by the ensureFile() method, given the other conditions
described above.  We expect to revise this unintuitive behaviour in
a subsequent commit in this PR.

Before we make that change, however, we first adjust two aspects
of the implementation from PR git-lfs#3634 so as to simplify our handling
of missing objects during push operations.  We made one of these
adjustments in the previous commit in this PR, and we make the
other in this commit.

As noted above, the enqueueAndCollectRetriesFor() method halts a
push operation if the server's response indicates that the server lacks
a copy of an object, and if the "missing" value passed to the Add()
method for that object was set to "true".  When this occurs, the
enqueueAndCollectRetriesFor() method outputs an error message that
is distinct from the message which would otherwise be reported later,
when the partitionTransfers() method of the TransferQueue rechecks
whether individual objects' data files are present in the local
storage directories.

The message reported by the enqueueAndCollectRetriesFor() method
when it abandons a push operation states that the client was
"Unable to find source for object".  This message was originally
introduced in commit fea77e1 of
PR git-lfs#3398, and was generated by the ensureFile() method of the
uploadContext structure in our "commands" package, when that method
was unable to locate either an object file in the local storage
directories, or a corresponding file in the working tree at the path
of the object's Git LFS pointer.

As such, the wording of the message alludes to the expectation that
the ensureFile() method will try to recreate a missing object file
from a file in the working tree, i.e., from the object's "source".
This is how the method is described in the code comments that
precede it, and how it was intended to operate since it was first
added in PR git-lfs#176.

However, the ensureFile() has never actually recreated object files
in this way, due to an oversight in its implementation, and given
the challenges posed by the likelihood that files in the current
working tree do not correspond exactly to the source of missing Git LFS
object files, we expect to simply remove the ensureFile() method in
a subsequent commit in this PR.

In PR git-lfs#3634 the enqueueAndCollectRetriesFor() method of the TransferQueue
structure was revised to halt push operations if the server reports that
it requires the upload of an object for which the "missing" value
provided to the TransferQueue's Add() method was set to "true".  The
error message reported in such a case was copied from the message
formerly output by the ensureFile() method of the uploadContext
structure, and that method was altered so it no longer generated this
error message.

This error message is not the only one reported by the Git LFS client
when an object file is missing during a push operation, though, because
it is only output when the ensureFile() method does not find a file
(with any content) in the working at the path associated with the
object's pointer.  When a file does exist in the working tree, but
the actual object file in the Git LFS local storage directories is
missing, the push operation proceeds past the checks in the
enqueueAndCollectRetriesFor() method and continues to the point where
that method calls the addToAdapter() method of the TransferQueue
structure.  That method in turn calls the partitionTransfers() method
to determine which of the current batch of objects to be uploaded
have local object files present and which do not.

If the partitionTransfers() method finds that an object file is
missing for a given object, it creates a new MalformedObjectError
with the "missing" element of that error structure set to "true".
The method then instantiates a TransferResult structure for the object
and sets the Error element of the TransferResult to the new
MalformedObjectError.  Finally, the method returns this TransferResult
along with the other TransferResult structures it creates for all the
other objects in the batch.

The addToAdapter() method passes these TransferResult structures
individually to the handleTransferResult() method, which checks whether
the Error element is defined for the given TransferResult.  If an
error was encountered, and is one which indicates the object transfer
should not be retried, then the handleTransferResult() method sends
the error to the TransferQueue's "errorc" channel.  For errors of the
MalformedObjectError type, this is always the case.

During a push operation, the CollectErrors() method of the uploadContext
structure in the "commands" package receives these errors from the
channel, and if they are errors of the MalformedObjectError type and
have a "true" values in their "missing" element, the object's ID and
the filename associated with the object's pointer are recorded in the
"missing" map of the uploadContext structure.

When the ReportErrors() method of the uploadContext structure is then
run, it iterates over the keys and values of the "missing" map and
outputs an error message containing both the object ID and the
associated filename of the object's pointer, along with a "(missing)"
prefix.  As well, a leading error message is output, whose exact text
depends on the value of the "lfs.allowIncompletePush" configuration
option, and if this option is set to its default value of "false",
several trailing error messages are output which provide a hint as to
how to set that option if the user wants to allow a subsequent push
operation with missing objects to proceed to completion as best it can.

These per-object error messages were first defined in commit
9be11e8 of PR git-lfs#2082, and the trailing
hints were added in commit f5f5731 of
PR git-lfs#3109, at which time the "lfs.allowIncompletePush" option's default
values was changed to "false".

As mentioned above, in a subsequent commit in this PR we expect to
remove the ensureFile() method of the uploadContext structure.  We
still expect to perform a check for missing object files in the
uploadTransfer() method, though, as this will typically find any
missing object files at the start of a push operation, and thereby
allow the enqueueAndCollectRetriesFor() method of the TransferQueue
structure to halt the operation as soon as one of those objects is
found to be required by the remote server.

For the time being, we will leave the secondary check for missing
object files in place in the partitionTransfers() method of the
TransferQueue structure, as this method also tests the size of the
object file and reports those with unexpected sizes as corrupt.

Nevertheless, we would like to make our error messages as consistent
as possible when handling missing object files.  Therefore we revise
the enqueueAndCollectRetriesFor() method so it no longer returns the
"Unable to find source for object" error message, but instead returns a
new MalformedObjectError with a "true" value for its "missing" element.

One advantage of this change is that we remove the somewhat stale
wording of the previous message, which reflected the assumption that
the ensureFile() method of the uploadContext structure would attempt
to recreate missing object files from "source" files in the working
tree, even though the method has never actually done so.

Another advantage is that by returning a MalformedObjectError, the
existing logic of the CollectErrors() and ReportErrors() methods of the
uploadContext structure will handle the error exactly as if it had been
generated by the TransferQueue's partitionTransfers() method, and will
output both the same leading error message and trailing hint messages
as in that case.

As a result, we also adjust several tests in our t/t-pre-push.sh
and t/t-push-failures-local.sh scripts to expect the error messages
output by the ReportErrors() method instead of the message previously
generated by the enqueueAndCollectRetriesFor() method.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request May 22, 2025
Since commit 1412d6e of PR git-lfs#3634,
during push operations the Git LFS client has sometimes avoided
reporting an error when an object to be pushed is missing locally
if the remote server reports that it has a copy already.

To implement this feature, a new Missing element was added to the
Transfer and objectTuple structures in our "tq" package, and the
Add() method of the TransferQueue structure in that package was
updated to accept an additional "missing" flag, which the method
uses to set the Missing element of the objectTuple structure it
creates and sends to the "incoming" channel.  Batches of objects to be
pushed are then gathered from this channel by the collectBatches()
method of the TransferQueue structure.

As batches of objectTuple structures are collected, they are passed
to the enqueueAndCollectRetriesFor() method, which converts them
to Transfer structures using the ToTransfers() method and then passes
them to the Batch() function, which is defined in our tq/api.go
source file.  This function initializes a batchRequest structure
which contains the set of Transfer structures as its Objects element,
and then passes those to the Batch() method specific to the current
batch transfer adapter's structure.  These Batch() methods return
a BatchResponse structure, which the Batch() function then returns
to the enqueueAndCollectRetriesFor() method.  The BatchResponse
structure also contains an Objects element which is another set
of Transfer structures that represent the per-object metadata
received from the remote server.

After the enqueueAndCollectRetriesFor() method receives a BatchResponse
during a push operation, if any of the Transfer structures in that
response define an upload action to be performed, this implies that
the remote server does not have a copy of those objects.

As one of the changes we made in PR git-lfs#3634, we introduced a step into
the enqueueAndCollectRetriesFor() method which halts the push operation
if the server's response indicates that the server lacks a copy of an
object, and if the "missing" value passed to the Add() method for that
object was set to "true".  (Initially, this step also decremented the
count of the number of objects waiting to be transferred, but this
created the potential for stalled push operations, and so another
approach to handling an early exit from the batch transfer process was
implemented in commit eb83fcd of
PR git-lfs#3800.)

Also in PR git-lfs#3634, several methods of the uploadContext structure in
our "commands" package were revised to set the "missing" value for
each object before calling the Add() method of the TransferQueue
structure.  Specifically, the ensureFile() method of the uploadContext
structure first checks for the presence of the object data file in
the .git/lfs/objects local storage directories.  If that does not
exist, then the method looks for a file in the working tree at the
path associated with the Git LFS pointer that corresponds to the
object.  If that file also does not exist, and if the
"lfs.allowIncompletePush" Git configuration option is set to "false",
then the method returns "true", and this value is ultimately used
for the "missing" argument in the call to the TransferQueue's
Add() method for the object.

Note that the file in the working tree may have any content, or be
entirely empty; its simple presence is enough to change the value
returned by the ensureFile() method, given the other conditions
described above.  We expect to revise this unintuitive behaviour in
a subsequent commit in this PR.

Before we make that change, however, we first adjust two aspects
of the implementation from PR git-lfs#3634 so as to simplify our handling
of missing objects during push operations.  We make one of these
adjustments in this commit, and the other in a following commit
in this PR.

As noted above, the enqueueAndCollectRetriesFor() method halts a
push operation if the server's response indicates that the server lacks
a copy of an object, and if the "missing" value passed to the Add()
method for that object was set to "true".  In order for the method
to determine an object's "missing" value, at present it consults
the Missing element of the object's Transfer structure from the
set of those structures provided in the BatchResponse structure.

However, Git LFS remote servers do not report an object's local
"missing" status, so these Missing fields are not populated from
the server's response.  Instead, the appropriate values for these
Missing fields in the Transfer structures representing the list
of objects in the server's response are set by either the Batch()
method of the tqClient structure, which handles HTTP-based Git LFS
Batch API requests and responses, and the Batch() method of the
SSHBatchClient structure, which handles the equivalent in the SSH
version of the Git LFS object transfer protocol.

These methods are invoked by the Batch() function in the "tq" package,
and are passed a batchRequest structure, whose Objects element has
been populated by that function from the list of Transfer structures
passed to it by the enqueueAndCollectRetriesFor() method.  The
Batch() methods for both the HTTP and SSH versions of the Git LFS
object transfer protocol iterate over this input set of Transfer
structures and create local maps from each object's ID to the
value of the Missing element from the object's input Transfer structure.

After performing the relevant HTTP or SSH batch request and receiving
a response from the remote server, the Batch() methods then iterate
over the list of objects in the server's reply and set each object's
output Transfer structure's Missing element from the value found in
the local map.

This design dates from the original implementation for the HTTP-based
Git LFS transfer protocol in commit 1412d6e
of PR git-lfs#3634, and was then implemented for the SSH-based protocol in
commit 594f8e3 of PR git-lfs#4446, when
that protocol was introduced.

However, we can simplify this design by eliminating the need to
create and reference local maps in the Batch() methods for each
transfer protocol, if we instead make use of the map of object IDs
to lists of objectTuple structures held in the "transfers" element
of the TransferQueue structure.  This "transfers" element contains
 a map from object IDs to "objects" structures, which in turn
have "objects" elements that are lists of objectTuple structures.
These are populated by the remember() method of the TransferQueue
method, which is called by the Add() method to record all the
input data it is passed when an object is added to the transfer queue.

One potential complication is that the "objects" structure allows
for multiple objectTuples to be recorded for a common object ID,
in case the Add() method is called repeatedly for the same Git LFS
object.  Hypothetically, this could allow for different values
of the "missing" argument to the Add() method to be recorded for
the same object ID during the operation of the transfer queue.

In practice, though, this is not possible, because the Add() method
will only be called once per unique object ID during push operations.
When a "git lfs pre-push" or "git lfs push" command runs, the
UploadPointers() method of the uploadContext structure in the
"commands" is the only caller of the Add() method of the TransferQueue
structure, and the UploadPointers() method only performs that call
for the object IDs returned by the prepareUpload() method, which it
calls before the TransferQueue's Add() method.

In the common case where a Git LFS push command scans through the
history of one or more Git references to find Git LFS pointers,
the UploadPointers() method of the uploadContext structure is called
for each individual pointer found in the Git history.  As a consequence,
the objects are passed individually to the prepareUpload() method.
That method checks if the given object have been processed already by
calling the HasUploaded() method, and if that returns "true", the
prepareUpload() method returns nothing, so the UploadPointers() method
in turn does not invoke the UploadPointers() method for that object.
If the HasUploaded() method returns "false", though, the object is
returned by the prepareUpload() method, so the UploadPointers() method
invokes the TransferQueue's Add() method, and then calls the
SetUploaded() method to register the object ID in the set of IDs
that have been processed so far.

Note that is unlikely for multiple Git LFS pointers with the same
object ID to be found in a repository's Git history in the first place,
because these pointers will normally be identical to each other, and
so will have identical Git blob SHA-1 values.  Therefore when we
run "git rev-list --objects ..." to retrieve the Git blobs reachable
from a set of Git references, Git LFS pointers with the same
Git LFS object IDs will normally only be represented by a single
Git blob.

However, as demonstrated by our t/t-duplicate-oids.sh test script,
it is possible for Git LFS pointers with legacy values for their
"version" field to exist, which may then reference the same SHA-256
Git LFS object ID as a modern Git LFS pointer but be stored as
distinct Git blobs.  Similarly, it is possible for a Git LFS pointer
with extension fields to resolve to the same final content data as
a pointer without those fields, or with different extension fields,
and so could also result in distinct Git blobs that are all valid
Git LFS pointers with the same SHA-256 object IDs.

Aside from the common case where a Git LFS push command scans through
the history of one or more Git references to find Git LFS pointers,
the "git lfs push" command may also be invoked with its --object-id
or --stdin options, along with a list of specific Git LFS object IDs
to be uploaded.  When handling these command-line options, the
uploadsWithObjectIDs() function in our "commands" package will
invoke the UploadPointers() method of the uploadContext structure
with the full set of Git LFS object IDs passed to the command.

Because a user may inadvertently supply the same object ID more than
once, this means duplicate IDs may be passed by the UploadPointers()
method to the prepareUpload() method, in which case that method's
call to the HasUploaded() method would return the same "false" value
for all the duplicate object IDs in the list.  Hence, this check is
not sufficient in this case to avoid duplicate IDs being returned
by the prepareUpload() method.  Fortunately, though, the
prepareUpload() method performs another round of de-duplication using
a local StringSet from the "tools" package of the standard Go library.
Each object ID is added to this "uniqOids" set as it is processed,
unless the set already contains that object ID, in which case the
object is skipped.

This second level of de-duplication logic ensures that it is not
possible for UploadPointers() method to receive object IDs from the
prepareUpload() method unless they have not been processed before.
This in turn guarantees that the Add() method of the TransferQueue
structure will never be called for the same Git LFS object ID more
than once during a push operation.

(As a sidebar, the use of the "uniqOids" set to de-duplicate objects
during push operations was originally introduced in commit
d770dfa of PR git-lfs#1600 to guard against
duplicate object IDs which arose from distinct Git LFS pointers with
the same SHA-256 object IDs, as can result from the use of legacy
pointer "version" values.  At the time, the prepareUpload() method
was called by an upload() function rather than the newer
UploadPointers() method.  During normal "git lfs pre-push" and
"git lfs push" commands, without the --object-id or --stdin options,
this upload() function would be passed all the Git LFS objects found
from scanning the appropriate Git history, and so might encounter
duplicate object IDs which would not be filtered by checking the set
maintained by the SetUploaded() method, since that was only called
after the Add() method of the TransferQueue structure.  Later, with
the changes in PR git-lfs#1953, the logic changed such that objects are
only handled individually during push operations when the --object-id
and --stdin options are not supplied, and since then the checks of the
"uniqOids" set in the prepareUpload() method have only guarded against
duplicate object IDs provided via those command-line options.)

As described above, we can rely on the object ID de-duplication in the
"commands" package to prevent the same object being requested for
upload more than once.  This ensures that the "objects" structure
found in the "transfers" map element of the TransferQueue structure
will only contain a single objectTuple.  Therefore, in our revisions
to the enqueueAndCollectRetriesFor() method, we can simply check the
Missing element of that first (and only) objectTuple to determine
whether the "missing" argument was set to "true" when the Add() method
of the TransferQueue structure was called for that unique object.

(Note that in PR git-lfs#2476, the "transfers" structure was updated to
maintain a list of objectTuple structures for each object ID, in order
to handle duplicate OIDs passed during "smudge" filter operations
using Git's "delay" capability for long-running filter drivers
like the "git lfs filter-process" command.  This filter never
handles upload operations, though, so the de-duplication logic in
the "tq" package is not applicable to push operations, since those
are fully de-duplicated in the "commands" package.)

In prior commits in this PR we refactored and expanded the tests
relevant to the handling of missing objects during push operations,
in part to provide additional validation of the changes in this
commit, including several new tests in our t/t-push-failures-local.sh
which attempt to perform push operations with missing objects while
using the SSH-based version of the Git LFS object transfer protocol.

In particular, the "push reject missing object (lfs.allowincompletepush
false) (git-lfs-transfer)" test checks that when an object is missing
locally, and is detected as such by the ensureFile() method of the
uploadContext structure because there is also no file in the working
tree at the path associated with the object's Git LFS pointer, the
Git LFS client abandons the push operation immediately and does not
proceed to try to upload either the missing object or any other objects.

The test validates this behaviour by checking that no objects are
present on the server after the "git push" command exits, and also
by looking for error and trace log messages the client should output
when it halts the push operation.  We added the trace log message in
a previous commit in this PR because we intend to revise the error
message in a subsequent commit, which will make the message more
consistent, so it is not specific to the case where no file is found
in the working tree, but will also mean we cannot use the revised
error message as an indicator that the client has halted the push
operation where we expect that it should.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request May 22, 2025
Since commit 1412d6e of PR git-lfs#3634,
during push operations the Git LFS client has sometimes avoided
reporting an error when an object to be pushed is missing locally
if the remote server reports that it has a copy already.

To implement this feature, a new Missing element was added to the
Transfer and objectTuple structures in our "tq" package, and the
Add() method of the TransferQueue structure in that package was
updated to accept an additional "missing" flag, which the method
uses to set the Missing element of the objectTuple structure it
creates and sends to the "incoming" channel.  Batches of objects to be
pushed are then gathered from this channel by the collectBatches()
method of the TransferQueue structure.

As batches of objectTuple structures are collected, they are passed
to the enqueueAndCollectRetriesFor() method, which converts them
to Transfer structures using the ToTransfers() method and then passes
them to the Batch() function, which is defined in our tq/api.go
source file.  This function initializes a batchRequest structure
which contains the set of Transfer structures as its Objects element,
and then passes those to the Batch() method specific to the current
batch transfer adapter's structure.  These Batch() methods return
a BatchResponse structure, which the Batch() function then returns
to the enqueueAndCollectRetriesFor() method.  The BatchResponse
structure also contains an Objects element which is another set
of Transfer structures that represent the per-object metadata
received from the remote server.

After the enqueueAndCollectRetriesFor() method receives a BatchResponse
during a push operation, if any of the Transfer structures in that
response define an upload action to be performed, this implies that
the remote server does not have a copy of those objects.

As one of the changes we made in PR git-lfs#3634, we introduced a step into
the enqueueAndCollectRetriesFor() method which halts the push operation
if the server's response indicates that the server lacks a copy of an
object, and if the "missing" value passed to the Add() method for that
object was set to "true".  (Initially, this step also decremented the
count of the number of objects waiting to be transferred, but this
created the potential for stalled push operations, and so another
approach to handling an early exit from the batch transfer process was
implemented in commit eb83fcd of
PR git-lfs#3800.)

Also in PR git-lfs#3634, several methods of the uploadContext structure in
our "commands" package were revised to set the "missing" value for
each object before calling the Add() method of the TransferQueue
structure.  Specifically, the ensureFile() method of the uploadContext
structure first checks for the presence of the object data file in
the .git/lfs/objects local storage directories.  If that does not
exist, then the method looks for a file in the working tree at the
path associated with the Git LFS pointer that corresponds to the
object.  If that file also does not exist, and if the
"lfs.allowIncompletePush" Git configuration option is set to "false",
then the method returns "true", and this value is ultimately used
for the "missing" argument in the call to the TransferQueue's
Add() method for the object.

Note that the file in the working tree may have any content, or be
entirely empty; its simple presence is enough to change the value
returned by the ensureFile() method, given the other conditions
described above.  We expect to revise this unintuitive behaviour in
a subsequent commit in this PR.

Before we make that change, however, we first adjust two aspects
of the implementation from PR git-lfs#3634 so as to simplify our handling
of missing objects during push operations.  We made one of these
adjustments in the previous commit in this PR, and we make the
other in this commit.

As noted above, the enqueueAndCollectRetriesFor() method halts a
push operation if the server's response indicates that the server lacks
a copy of an object, and if the "missing" value passed to the Add()
method for that object was set to "true".  When this occurs, the
enqueueAndCollectRetriesFor() method outputs an error message that
is distinct from the message which would otherwise be reported later,
when the partitionTransfers() method of the TransferQueue rechecks
whether individual objects' data files are present in the local
storage directories.

The message reported by the enqueueAndCollectRetriesFor() method
when it abandons a push operation states that the client was
"Unable to find source for object".  This message was originally
introduced in commit fea77e1 of
PR git-lfs#3398, and was generated by the ensureFile() method of the
uploadContext structure in our "commands" package, when that method
was unable to locate either an object file in the local storage
directories, or a corresponding file in the working tree at the path
of the object's Git LFS pointer.

As such, the wording of the message alludes to the expectation that
the ensureFile() method will try to recreate a missing object file
from a file in the working tree, i.e., from the object's "source".
This is how the method is described in the code comments that
precede it, and how it was intended to operate since it was first
added in PR git-lfs#176.

However, the ensureFile() has never actually recreated object files
in this way, due to an oversight in its implementation, and given
the challenges posed by the likelihood that files in the current
working tree do not correspond exactly to the source of missing Git LFS
object files, we expect to simply remove the ensureFile() method in
a subsequent commit in this PR.

In PR git-lfs#3634 the enqueueAndCollectRetriesFor() method of the TransferQueue
structure was revised to halt push operations if the server reports that
it requires the upload of an object for which the "missing" value
provided to the TransferQueue's Add() method was set to "true".  The
error message reported in such a case was copied from the message
formerly output by the ensureFile() method of the uploadContext
structure, and that method was altered so it no longer generated this
error message.

This error message is not the only one reported by the Git LFS client
when an object file is missing during a push operation, though, because
it is only output when the ensureFile() method does not find a file
(with any content) in the working at the path associated with the
object's pointer.  When a file does exist in the working tree, but
the actual object file in the Git LFS local storage directories is
missing, the push operation proceeds past the checks in the
enqueueAndCollectRetriesFor() method and continues to the point where
that method calls the addToAdapter() method of the TransferQueue
structure.  That method in turn calls the partitionTransfers() method
to determine which of the current batch of objects to be uploaded
have local object files present and which do not.

If the partitionTransfers() method finds that an object file is
missing for a given object, it creates a new MalformedObjectError
with the "missing" element of that error structure set to "true".
The method then instantiates a TransferResult structure for the object
and sets the Error element of the TransferResult to the new
MalformedObjectError.  Finally, the method returns this TransferResult
along with the other TransferResult structures it creates for all the
other objects in the batch.

The addToAdapter() method passes these TransferResult structures
individually to the handleTransferResult() method, which checks whether
the Error element is defined for the given TransferResult.  If an
error was encountered, and is one which indicates the object transfer
should not be retried, then the handleTransferResult() method sends
the error to the TransferQueue's "errorc" channel.  For errors of the
MalformedObjectError type, this is always the case.

During a push operation, the CollectErrors() method of the uploadContext
structure in the "commands" package receives these errors from the
channel, and if they are errors of the MalformedObjectError type and
have a "true" values in their "missing" element, the object's ID and
the filename associated with the object's pointer are recorded in the
"missing" map of the uploadContext structure.

When the ReportErrors() method of the uploadContext structure is then
run, it iterates over the keys and values of the "missing" map and
outputs an error message containing both the object ID and the
associated filename of the object's pointer, along with a "(missing)"
prefix.  As well, a leading error message is output, whose exact text
depends on the value of the "lfs.allowIncompletePush" configuration
option, and if this option is set to its default value of "false",
several trailing error messages are output which provide a hint as to
how to set that option if the user wants to allow a subsequent push
operation with missing objects to proceed to completion as best it can.

These per-object error messages were first defined in commit
9be11e8 of PR git-lfs#2082, and the trailing
hints were added in commit f5f5731 of
PR git-lfs#3109, at which time the "lfs.allowIncompletePush" option's default
values was changed to "false".

As mentioned above, in a subsequent commit in this PR we expect to
remove the ensureFile() method of the uploadContext structure.  We
still expect to perform a check for missing object files in the
uploadTransfer() method, though, as this will typically find any
missing object files at the start of a push operation, and thereby
allow the enqueueAndCollectRetriesFor() method of the TransferQueue
structure to halt the operation as soon as one of those objects is
found to be required by the remote server.

For the time being, we will leave the secondary check for missing
object files in place in the partitionTransfers() method of the
TransferQueue structure, as this method also tests the size of the
object file and reports those with unexpected sizes as corrupt.

Nevertheless, we would like to make our error messages as consistent
as possible when handling missing object files.  Therefore we revise
the enqueueAndCollectRetriesFor() method so it no longer returns the
"Unable to find source for object" error message, but instead returns a
new MalformedObjectError with a "true" value for its "missing" element.

One advantage of this change is that we remove the somewhat stale
wording of the previous message, which reflected the assumption that
the ensureFile() method of the uploadContext structure would attempt
to recreate missing object files from "source" files in the working
tree, even though the method has never actually done so.

Another advantage is that by returning a MalformedObjectError, the
existing logic of the CollectErrors() and ReportErrors() methods of the
uploadContext structure will handle the error exactly as if it had been
generated by the TransferQueue's partitionTransfers() method, and will
output both the same leading error message and trailing hint messages
as in that case.

As a result, we also adjust several tests in our t/t-pre-push.sh
and t/t-push-failures-local.sh scripts to expect the error messages
output by the ReportErrors() method instead of the message previously
generated by the enqueueAndCollectRetriesFor() method.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

t-push-bad-dns gets stuck if no network is available

3 participants