Skip to content

Reuse SSH sessions when possible and only close at client exit#6266

Merged
chrisd8088 merged 14 commits into
git-lfs:mainfrom
chrisd8088:perform-consistent-exit-cleanup
May 21, 2026
Merged

Reuse SSH sessions when possible and only close at client exit#6266
chrisd8088 merged 14 commits into
git-lfs:mainfrom
chrisd8088:perform-consistent-exit-cleanup

Conversation

@chrisd8088

Copy link
Copy Markdown
Member

This PR updates the Git LFS client so that where possible, SSH sessions are reused for multiple related operations.

This change will permit the "pure" SSH-based Git LFS transfer protocol to be used when pushing or fetching multiple branches, resolving the issue reported in #6118.

As well, this PR revises the Git LFS client so that when exiting, it always tries to cleanly terminate any SSH sessions that were started for the "pure" SSH-based transfer protocol.

This PR will be most easily reviewed on a commit-by-commit basis, with whitespace differences ignored.

SSH Session Reuse

When we introduced support for the "pure" SSH-based Git LFS object transfer protocol in PR #4446, we designed the SSHTransfer structure in our ssh package to be the basic abstraction with which we represent and manage SSH processes and connections for the protocol.

In commit 31d3fb7 of the same PR we added an SSHTransfer() method to the Client structure of our lfsapi package in order to provide a common interface through which we could initiate and share SSH sessions. This method always returns either a new SSHTransfer structure, representing a new SSH session and underlying set of connections, or else returns a nil value to indicate that the "pure" SSH-based Git LFS protocols are not supported by the remote service, or that an error condition has occurred.

Because the SSHTransfer() method always returns a new SSH session (when it does not return nil), at present we create separate sessions for both object transfer requests and locking requests.

Further, as reported in #6118, when a transfer queue completes we shut down any SSH session it was using, but then we attempt to reuse that session in subsequent transfer queues, which causes an error.

Specifically, when uploading or downloading objects for multiple Git references, we establish a single transfer manifest for the entire operation. The first time one of the methods of the lazyManifest structure in our tq package is invoked, it creates a concreteManifest structure by calling the newConcreteManifest() function, which calls the SSHTransfer() function of our lfsapi package. Thus if that function returns a non-nil value (i.e., an SSHTransfer structure), then the manifest retains a pointer to that structure for the lifetime of the Git LFS client process.

We then pass that manifest to the NewTransferQueue() function each time we start a new transfer queue. After all the objects to be transferred have been added to the queue, we then call the Wait() method of the TransferQueue structure.

When all the object transfers are complete, if the queue's manifest contains non-nil sshTransfer field, then the Wait() method calls the Shutdown() method of the SSHTransfer structure, which closes the underlying SSH connections.

The consequence is that if we then attempt to start another queue for the same operation and remote endpoint, we do attempt to reuse the same SSHTransfer structure and associated SSH session, but all of its SSH processes and connections have been terminated. This leads to the error reported in #6118.

To avoid this problem, we first refactor the SSHTransfer() method in the lfsapi package into two methods. The new initSSHTransfer() method performs the same actions as the SSHTransfer() method did before, and as such may return
a nil value if the remote service does not support the git-lfs-transfer command.

The SSHTransfer() method now maintains a map of the SSH sessions for which it has called the initSSHTransfer() method, and only calls that method if it is required. Otherwise, the value returned by the initSSHTransfer() method for the given operation and remote parameters is returned again, even if it is a nil value.

To maintain this map safely, the SSHTransfer() method first acquires a mutex, which we add to the Client structure along with the map. For keys, we use a concatenation of the unique pairs of operation and remote strings passed to the method as its parameters. This technique follows the existing approach used in our commands package, where we construct keys for the global tqManifest map by concatenating the operation and remote string parameters of the getTransferManifestOperationRemote() function.

One advantage of this change is that it also resolves the problem whereby we would previously create an extra SSH process and connection for each locking request, even if they were to be made to the same remote service as subsequent object transfer requests, and for the same type of operation (i.e., an upload or download operation).

Another advantage is that when the Git LFS client is exiting, we can identify the set of SSH sessions that were opened for the SSH-based transfer protocol and attempt to terminate them all properly, instead of closing the individual session used for a transfer queue after that queue has completed.

To do this, we add a closeSSHTransfers() method to the Client structure in the lfsapi package, and then invoke this new method from the Client structure's Close() method. The Close() method is called by the commands package's closeAPIClient() function when a Git LFS command has completed normally.

Our new closeSSHTransfers() method iterates over all the non-nil values in the map maintained by the SSHTransfer() method and calls the Shutdown() method of each SSHTransfer structure, thus attempting to terminate them properly. We can then revise the Wait() method of the TransferQueue structure so that it no longer attempts to shut down the SSH session associated with the queue's manifest.

Collectively, these revisions and enhancements should resolve the problem reported in #6118. To verify this, we add a new test to our t/t-batch-transfer.sh shell script which runs each of the git push and git lfs fetch commands for multiple branches, causing both commands to start and stop more than a single transfer queue.

Consistent Exit Handling

The changes described above ensure we reuse SSH sessions for the "pure" SSH-based Git LFS transfer protocols, and that we only attempt to terminate them when the client is exiting, at least under normal circumstances.

However, if certain errors occur or if one of the signals we trap is received, then the Git LFS client may halt immediately, and in these cases we would like it to at least attempt to terminate any SSH sessions started for the SSH-based transfer protocol.

In PR #6255 we revised all of the individual command implementation functions in our commands package so that they always invoke one of a small set of exit wrapper functions when they need to force the client to halt. We can now take advantage of this fact to ensure that regardless of circumstances, the Git LFS client will always attempt to terminate any active SSH sessions that were opened for the "pure" SSH-based Git LFS transfer protocols.

Most obviously, we add a call to the Cleanup() function of our commands package to each of the exit wrapper functions, just before they call the Exit() function of the Go standard library's os package.

Next, we refactor the Cleanup() function into two functions, with the actual cleanup activities performed by a new doCleanup() function, which is then invoked by the Cleanup() function exactly one time only, via the Do() method of a Once structure from the sync package of the Go standard library. We assign this Once structure to a new cleanupOnce global variable in our commands package, which replaces the once variable of the same type in our main package. In our main() function, we then just call the commands package's Cleanup() directly in the two instances where we previously did so via the once variable's Do() method.

Finally, we add to the doCleanup() function a call to the closeAPIClient() function, which acquires the necessary global mutex before invoking the global apiClient variable's Close() method. This supplants the legacy call to the closeAPIClient() function from our Run() function, so we just remove that call.

Other Changes

In addition to the principal changes described above, this PR also makes a number of small refinements, including:

  • We fix the SSH connection IDs in the trace log messages we output when uninitialized connections are ignored while reducing the number of active connections.
  • We output new messages to help clarify when lock verification begins and ends in our trace logs.
  • We update our existing tests of the "pure" SSH-based Git LFS object transfer protocol to perform additional checks and to make use of the setup_expected_concurrent_transfers() test helper function we added in PR #6241.
  • We rename one of these tests and revise our assert_ssh_transfer_sessions() test helper function to correctly reflect the conditions checked by the test, in which the number of SSH connections is limited to fewer than the number of objects to be transferred in a single batch. This causes the transfer of some objects to be delayed, but still performed within the same batch as the others (and to be performed by the same transfer queue).
  • We revise a number of our Go test functions and secondary programs so that we consistently call the Close() method of the lfsapi package's Client structure type, whenever such a structure has reached the end of its lifetime. For the same reason, we update one instance in the Git LFS client where we create a temporary Client structure, and eliminate another similar instance which exists solely to support some of our Go test functions.

Resolves #6118.
/cc @galets as reporter.

chrisd8088 added 12 commits May 15, 2026 23:19
In later commits in this PR we expect to revise our SSH connection
handling so that when a Git LFS remote supports the "pure" SSH Git LFS
object transfer protocol, we reuse SSH connections as much as possible
for both locking API requests and object transfers.

At present, though, we start separate SSH connections for locking API
requests and for object transfers.  As well, when verifying locks or
making other locking API requests for multiple remote Git references,
we start a separate SSH connection for each unique reference.

This is particularly noticeable when a "git lfs pre-push" command is
passed multiple references by the Git pre-push hook, in which case
the command attempts to check for locks held on the remote by other
users and makes one lock verification request for each Git reference
being pushed.  When the "pure" SSH Git LFS protocol is in use, the
consequence is that we start one SSH connection for each reference,
plus one or more SSH connections for the subsequent object transfers.

One challenge we encountered while working toward a resolution of this
issue is that our trace logs currently do not clearly indicate when
the Git LFS client is making a lock verification request.  To help
clarify the behaviour of the client in this regard and aid future
debugging efforts we therefore now add trace log messages which bracket
each lock verification request and which state the remote reference for
which the request is being made.
In commit 693c6f8 of PR git-lfs#5905 we
updated the trace log messages output by the Git LFS client when
it starts or stops SSH connections for the "pure" SSH-based Git LFS
object transfer protocol, and in particular we revised several of
the messages output by the setConnectionCount() method of the
SSHTransfer structure in our "ssh" package to more clearly identify
SSH connections by their internal session ID.

In the case of the "terminating pure SSH connection" message, we
made this change correctly, by adding a session ID to the message
which we calculated using the current loop index variable "i" plus
the variable "tn", which gives the initial offset into the slice of
SSH connections over which the loop iterates.

However, in the case of the "skipping uninitialized lazy pure SSH
connection" message, we accidentally omitted the "tn" variable's offset
from the value we interpolate into the message as the session ID, with
the result that this message's session IDs are not aligned with those
of the "terminating pure SSH connection" messages.

To fix this oversight we simply make sure to include the slice offset
from the "tn" variable in the session IDs we calculate when generating
"skipping uninitialized lazy pure SSH connection" trace log messages.
First in PR git-lfs#4446 and then later in PR git-lfs#5905 we added tests to our
"t/t-batch-transfer.sh" shell script to verify the behaviour of the
Git LFS client when it uploads and downloads objects using the
"pure" SSH-based Git LFS object transfer protocol.

Each of these tests performs a "git push" command followed by a
"git clone" command, checks the trace log output of those commands
to ensure the expected number of SSH connections were started, and
also checks that the commands have transferred all the objects
created by the tests.

However, at least in the case of the "git clone" commands, the tests
do not actually confirm that the exit codes of the commands are zero
(which indicates success).  We set the "errexit" shell attribute in
each test, but this only causes the tests' subshells to exit if the
final command in a pipeline fails, and we perform the "git clone"
commands in pipelines where the final tee(1) command always succeeds.

We therefore now add checks of the first element of the PIPESTATUS
shell array variable after each "git clone" command to verify that
these commands have succeeded and returned exit statuses of zero.

We also add the same checks after each "git push" command in our
tests of the SSH-based Git LFS object transfer protocol, and revise
the tests to run the "git push" commands in pipelines where a final
"tee" command captures the trace messages output by the commands
into a log file.

The existing design was sufficient to confirm that the "git push"
commands succeeded, since the "errexit" shell attribute would cause
the tests to fail if the commands returned a non-zero exit code.
Should this occur, though, the trace messages generated by the
commands would be lost, because we redirected the commands' standard
output into a log file.

To aid future debugging efforts we therefore now adopt the same
idiom we use elsewhere in our shell test suite.  In each of these
tests we pipe the output of the "git push" command into a "tee"
command, and then check the first element of the PIPESTATUS array
variable to confirm that the "git push" command succeeded.

This change ensures that if a "git push" command does fail then the
end_test() function which follows the test will report the full set
of trace log messages from the command.
In commit a84cd13 of PR git-lfs#5905 we added
a pair of tests to our "t/t-batch-transfer.sh" shell test script, both
of which verify that the Git LFS client is able to successfully upload
and download multiple objects using the "pure" SSH-based Git LFS object
transfer protocol.

These tests were intended to demonstrate that we had resolved the bug
reported in git-lfs#5880, where the Git LFS client would crash when trying to
transfer multiple objects using the SSH-based transfer protocol.

The first test, named "batch transfers with ssh endpoint and multiple
objects (git-lfs-transfer)", creates three objects and then checks that
when pushing and fetching the objects, the Git LFS client uses three
separate SSH processes to do so.  Note that because we set the
"lfs.ssh.autoMultiplex" configuration option to "true", the test
expects that these SSH processes will share a single multiplexed
connection.

The second test, which we named "batch transfers with ssh endpoint and
multiple objects and batches (git-lfs-transfer)", also creates three
objects and then pushes and fetches them, but only after setting the
"lfs.concurrentTransfers" configuration option to a value smaller than
the number of objects to be transferred.  This forces the Git LFS client
to use only two separate SSH processes to transfer the objects, so that
the transfer of the third object is delayed until the first two objects
have been successfully transferred.

Although the second test does verify this behaviour of the Git LFS
client, the test's name is not accurate, nor is one of its internal
code comments, because these both state that the objects are
transferred in multiple batches.  While the transfer queue does delay
the transfer of one object until the other two have been transferred,
this all still occurs within a single batch of objects.

We therefore now update the name of the test to "batch transfers with
ssh endpoint and multiple objects exceeding workers (git-lfs-transfer)",
which more accurately reflects the conditions created by the test.
We also rename the test's repository to align with the new test name,
and revise the incorrect code comment to better explain the effect of
setting the "lfs.concurrentTransfers" configuration option.

Note that the remainder of this commit description is provided here
solely for future reference, and to clarify and correct a few details
from the commit descriptions in PR git-lfs#5905.

The reason we originally developed the "batch transfers with ssh
endpoint and multiple objects and batches (git-lfs-transfer)" test
was to help reproduce the bug reported in git-lfs#5880, in which the Git LFS
client would panic and crash when it tried to use an SSH process it
had previously closed to transfer an object.  Hence the test tries
to force the client to reuse an SSH process to transfer at least one
object, unlike what occurs with the preceding "batch transfers with
ssh endpoint and multiple objects (git-lfs-transfer)" test.  In that
test, each of the three objects will be transferred using a separate
SSH process, since the number of objects is fewer than eight, which
is the default value of the "lfs.concurrentTransfers" option.

In practice, both of these tests could reproduce the crash, absent
some of the other changes we made in PR git-lfs#5905.  Most obviously, in
commit eca6c8a we updated the client
to avoid attempting to use a previously closed SSH process.  This
stopped the panic and crash from occurring, so the tests would of
course succeed once that change was made.

However, prior to making that correction, we first resolved a bug in
our "lfs-ssh-echo" test helper utility, and this on its own was
sufficient to allow our new tests to pass, although we did not intend
for that to be the case.

Specifically, in commit 9af2883 of the
same PR git-lfs#5905, we adjusted the "lfs-ssh-echo" utility so that it set
defined read/write permissions on the temporary file it uses to simulate
the behaviour of OpenSSH when the ControlMaster argument is supplied.
This allowed subsequent invocations of the utility to reopen the file,
whereas previously they would fail with a non-zero exit code.

As it happens, though, the two tests we introduced in commit
a84cd13 depended on the "lfs-ssh-echo"
utility failing unexpectedly in order to fully reproduce the crash bug
from git-lfs#5880.

The crash would occur when the Git LFS client attempted to reuse an
SSH process it had previously closed, which was only possible when
the client had shut down one transfer queue and then started another.
The crash did not occur if just a single queue was used, even if the
queue transferred objects in multiple batches or even in sequence
within a batch in order to respect the maximum transfer concurrency
limit.  Thus the values of the "lfs.transfer.batchSize" and
"lfs.concurrentTransfers" configuration option had no effect on
whether the crash occurred or not.

Instead, to reproduce the crash, the client needed to be induced to
start a second transfer queue after shutting down the initial one.
As they were originally written, both of the tests we added in commit
a84cd13 implicitly expected a second
queue to be started after the first queue experienced the failure of
an SSH process.  The tests therefore effectively relied on the bug in
the "lfs-ssh-echo" command to cause this to occur, but because we
fixed that bug before adding the tests, they could never actually
reproduce the issue as we intended.

Both tests create and then clone a repository containing several
Git LFS objects.  When cloning, the "git lfs filter-process" command
is invoked by Git and asked to apply its "smudge" filter to each
object.  Git sends these requests via its long-running filter protocol,
and indicates that the filter process may delay its response, so the
"git lfs filter-process" command replies with "status=delayed" messages
and then enqueues the objects for download.

To transfer the first object, the client starts an initial SSH process
with the ControlMaster argument set to "yes".  For the second object,
since the "lfs.ssh.autoMultiplex" configuration option is enabled,
the client starts another SSH process but sets its ControlMaster
argument to "no", with the expectation that both processes will share
the same underlying SSH connection.

Our shell test suite uses our "lfs-ssh-echo" utility in place of an
actual SSH program like OpenSSH, and the bug in the "lfs-ssh-echo"
utility caused it to exit with an error status code when it was run
with the ControlMaster argument set to "no".

When the bug was present, then, the second object transfer in the
client's initial queue would fail, and queue would treat this error as
one for which the transfer should not be retried.

As a consequence, when Git sent an initial "list_available_blobs"
request, per the long-running filter protocol, the Git LFS client
would reply with only the file path corresponding to the first object.
Git would then retrieve the "smudged" content of that file via another
"smudge" filter protocol request (this time with the "can-delay" option
not permitted, though), after which it would make another request
to list the available blobs.

At this point, the first transfer queue would have been shut down,
because when the "git lfs filter-process" command receives the first
"list_available_blobs" request from Git, it calls the Wait() method
of the TransferQueue structure in our "tq" package.  This method
closes various channels and then calls the Shutdown() method of the
SSHTransfer structure in our "ssh" package, which terminates all of
the running SSH processes.

When our "git lfs filter-process" command responds to the second
"list_available_blobs" request from Git, we might expect that it would
reply with an error status for the second object, since the queue
failed to download it and determined that further transfers would
not be attempted.

However, since commit e764429 of
PR git-lfs#2511, when we implemented support for the delay feature of Git's
long-running filter protocol, this is not how the Git LFS client
behaves under these conditions.  Since the transfer queue has been
closed down, when the filterCommand() function in our "commands"
package handles the second "list_available_blobs" request from Git,
it finds that no file paths can be read from the queue's "available"
channel.  However, it then iterates through the list of outstanding
file paths for which no content data has been returned to Git, and
sends each of those file paths back in its response.  (Specifically,
the function iterates over the "ptrs" map and replies to Git with
all of the map's keys.  These keys are the file paths Git sent in
"smudge" requests with the "can-delay" option enabled, and for which
no content data has been sent back to Git.)

Because Git receives the file path corresponding to the second object
in reply to its second "list_available_blobs" request, Git now
requests that object's "smudged" content, and does so with the
"can-delay" option disabled.  When the "git lfs filter-process"
command receives this request, its filterCommand() function invokes
the smudge() function for the single file's path, which in turn
calls the Smudge() method of the GitFilter structure in our "lfs"
package.  That method runs the same structure's downloadFile() method,
and that calls the "tq" package's NewTransferQueue() function to
start a dedicated queue to download just the one object.

The bug we fixed in commit eca6c8a
would now cause the client to panic and crash.  Note, though, that
our description in that commit suggests that the crash would occur
between batches in a single transfer queue, which is not accurate.
As explained above, for the crash to occur, a first transfer queue
must have been fully shut down, and then another queue started.

The new TransferQueue structure would be initialized with the same
concreteManifest structure as was populated for the original queue.
The batchClientAdapter field of this concreteManifest structure
pointed to an extant SSHBatchClient structure, and the "transfer"
field in that structure in turn pointed to the SSHTransfer structure
that was used to start and stop the SSH connections and processes
for the initial transfer queue.  When the SSHBatchClient structure's
batchInternal() method would attempt to retrieve a session, it would
receive a "nil" value from the SSHTransfer structure's Connection()
method and then try to dereference it, causing a panic.

All of which explains why, when we fixed the bug in the "lfs-ssh-echo"
utility, the tests we introduced in PR git-lfs#5905 were no longer effective
at actually simulating the conditions under which the Git LFS client
would panic and crash.
In commit aa08c37 of PR git-lfs#6241 we
changed the default value of the "lfs.concurrentTransfers"
configuration option from eight to a dynamic limit based on the
number of CPUs in the current system.

This change required that we also update a number of the tests in
our shell test suite, as they previously were written with the
value of eight hard-coded as the expected default value of the
"lfs.concurrentTransfers" option.

We therefore added a new setup_expected_concurrent_transfers() helper
function to our "t/testhelpers.sh" shell library, and revised several
test scripts to invoke this function before executing any tests.
The function sets a global "expectedConcurrentTransfers" variable with
the expected default value of the "lfs.concurrentTransfers" option so
that tests can now refer to this variable instead of using a fixed
hard-coded value as they did before.

However, when making these revisions, we inadvertently overlooked the
tests of the "pure" SSH-only Git LFS transfer protocol in our
"t/t-batch-transfer.sh" script.  At present, these tests also contain
a hard-coded value for the expected maximum number of object transfers
which may be performed concurrently.  The tests pass this value to
the assert_ssh_transfer_sessions() helper function, which is defined
in the same script as the tests.

One reason we did not adjust these tests in PR git-lfs#6241 is that the
assert_ssh_transfer_sessions() function's fourth parameter, which is
expected to receive a value representing the maximum number of
concurrent transfers, is misnamed as "objs_per_batch".

We chose the parameter name "objs_per_batch" when we first introduced
the function in commit b20b6e9 of
PR git-lfs#5905.  In the same PR, we then added tests of the SSH-based object
transfer protocol, including the "batch transfers with ssh endpoint
and multiple objects and batches (git-lfs-transfer)" test in which
we explicitly set the "lfs.concurrentTransfers" with a value of two.

As we explained in a prior commit in this PR, that test was also
misnamed, as it effectively forced the Git LFS client to only transfer
two objects at a time, but did not alter the batch size or the
number of batches the queue processed.  Hence we have now renamed
the test to "batch transfers with ssh endpoint and multiple objects
exceeding workers (git-lfs-transfer)", which more accurately reflects
the conditions established by the test.

The assert_ssh_transfer_sessions() function's fourth parameter, which
is currently named "objs_per_batch", is used to calculate the maximum
number of trace log messages the function expects to find which record
the start of an SSH process or its termination.  Since this number is
actually determined by the value of the "lfs.concurrentTransfers"
option, we now rename the parameter to "max_concurrency" to better
reflect its purpose.

As well, in the tests where we call the assert_ssh_transfer_sessions()
function and pass a hard-coded value of eight for the fourth parameter,
we now instead pass the value of the "expectedConcurrentTransfers"
variable.  This will allow the tests to continue to pass regardless of
how we choose to define the "lfs.concurrentTransfers" option's default
value in the future, as the setup_expected_concurrent_transfers()
function should always return the appropriate value.

Note that for the reasons outlined in PR git-lfs#6258, we plan to temporarily
reverse the principal changes from PR git-lfs#6241.  However, we also intend
to eventually re-adopt the use of a default concurrency limit that
scales with the number of CPUs, at least for HTTP-based object transfers.
Hence we still retain the setup_expected_concurrent_transfers() function
in PR git-lfs#6258 and can rely on it being present in the future.
In PR git-lfs#763 we revised the way the Git LFS client created and deleted
temporary files, and as part of this change we updated our top-level
main() function so that upon exit it invokes a function to delete any
stale temporary files and always does so exactly one time only.

Specifically, in commit a0704be we
updated the main() function to call the ClearTempObjects() function
in our "lfs" package using the Do() method of a Once structure from
the "sync" package in the Go standard library.  The main() function
invoked the ClearTempObjects() function via the Do() method in two
instances, when the main() function was exiting normally, and when
it received a signal from the operating system.  Then in commit
a3a2369 of the same PR we refined
the list of signals for which the client will be notified to exit
and restricted them to just the SIGINT and SIGKILL signals.

Later, in commit b9c5a10 of PR git-lfs#1390
we created a Cleanup() function in our "commands" package and changed
the Do() methods in our main() function to call this new Cleanup()
function instead.  In turn, the Cleanup() function then invoked the
ClearTempObjects() function.

Finally, in commit e9121fd of
PR git-lfs#2689 we replaced the ClearTempObjects() function with the
Cleanup() method of the Filesystem structure in a new "fs" package.
In the same commit we also added a Cleanup() method for the
Configuration structure in our "config" package, which just invokes
the Cleanup() method on the Configuration structure's "fs" field.
We then updated the Cleanup() function in the "commands" package to
call the Cleanup() method of the global "cfg" variable, whose type
is that of a Configuration structure from our "config" package.

These refactoring steps did not alter the basic design from PR git-lfs#763,
though, in which stale temporary files are removed when the Git LFS
client either receives a SIGINT and SIGKILL signal or otherwise
exits normally.

Note that these conditions include the case where the main() function
will return a non-zero exit code of 127 because the Run() function in
our "commands" package has returned this value.  However, this case
occurs only when a user provides an unknown Git LFS command name or
invalid set of command-line flag options and arguments.  When that
occurs, the Execute() method of the Command structure from the
"github.com/spf13/cobra" package returns a non-nil "error" value to
our Run() function, which then returns the value 127 to the main()
function.

In other instances when the Git LFS client encounters an unrecoverable
error condition, though, our command implementation functions call
one of the exit wrapper functions in our "commands" package, such as
Exit() or ExitWithCode().  In a recent set of changes in PR git-lfs#6255 we
ensured that one of these wrapper functions is always used when the
client needs to halt immediately.

However, these wrapper functions do not consistently perform the
cleanup steps that are otherwise executed by the Run() and main()
functions when the client exits normally.  In subsequent commits in
this PR we expect to revise our exit wrapper functions so that when
the client halts it always performs the same set of cleanup actions.

As an initial step toward this goal, we begin by moving the Once
structure from the "main" package to our "commands" package.  We
also refactor the Cleanup() function in the "commands" package so
that it just invokes a new doCleanup() function using the Do()
method of the Once structure.

The doCleanup() function then performs the actual removal of any
stale temporary files by calling the Cleanup() method of the global
"cfg" variable, which in turn executes the Cleanup() method of a
Filesystem structure from our "fs" package.

Next, we revise the main() function so that it calls the Cleanup()
function of the "commands" package directly, both at the end of
the main() function and when a SIGINT or SIGKILL signal has been
received.  As before, though, the temporary file cleanup operation
itself should occur at most one time only, as it is exclusively
invoked via the Once structure's Do() method.

With these changes in place, we will now be able to make further
refinements to ensure that the Git LFS client always performs
cleanup operations when exiting.  In subsequent commits in this PR
we will update the exit wrapper functions in our "commands" package
so they all call the Cleanup() function, which can now guarantee
on its own that it will only execute the cleanup activities a
single time.
In a prior commit in this PR we refactored our "main" package's
main() function and the Cleanup() function in our "commands" package
so that the Cleanup() function can now guarantee on its own that
it will only execute the cleanup activities a single time.

At present, the only caller of the Cleanup() function is the
main() function, which does so before it calls the Exit() function
from the Go standard library's "os" package.

However, as we described in the prior commit in which we refactored
the Cleanup() function, the individual command implementation functions
in our "commands" package often call one of the exit wrapper functions
in the same package when an unrecoverable error condition is
encountered.  These exit wrapper functions in turn call the "os"
package's Exit() function, so they cause the Git LFS client to halt
without performing any cleanup activities.

Since the Cleanup() function is now able to guarantee that it will
only execute the cleanup operations a single time, we can update
all of the exit wrapper functions which call the "os" package's
Exit() function so that they first invoke our Cleanup() function.

In a recent set of changes in PR git-lfs#6255 we ensured that one of these
wrapper functions is always used when the client needs to halt
immediately.  As a consequence, with the changes from this commit
in place the client should now always perform its cleanup activities
when exiting, regardless of whether or not an error condition was
encountered.
In PR git-lfs#1839 we first developed our "lfsapi" package, and in commit
fecdc9e of that PR we introduced the
getAPIClient() function to our "commands" package to ensure that we
only initialize a single, global Client structure from the "lfsapi"
package.

The getAPIClient() function acquires a global mutex, appropriately
named "global", and then checks if the "commands" package's "apiClient"
global variable has been initialized or not.  If not, the function
calls the NewClient() function of the "lfsapi" package and assigns
the newly instantiated Client structure to the "apiClient" variable.

Later, in commit 2435a67 of PR git-lfs#2184,
we added an HTTPLogger field to the "lfsapi" package's Client
structure.  This field's type was that of a WriteCloser structure from
the Go standard library's "io" package, and since this type has a
Close() method, we also defined a Close() method for the Client
structure that called the HTTPLogger field's Close() method, unless
the field was nil.

In the same commit in PR git-lfs#2184 we then updated the Run() function in
our "commands" package to call the new Close() method of the
Client structure, after first calling the getAPIClient() function
to ensure that the "apiClient" variable was not nil.

Subsequently, in commit d16a320
of PR git-lfs#2633, we introduced a closeAPIClient() function to the
"lfsapi" package, which acquires the "global" mutex and then
calls the "apiClient" variable's Close() method, but only if the
variable is not nil.  By revising the Run() function to then just
call the closeAPIClient() function before returning, this change
avoided the need to instantiate a Client structure only in order
to close it.

Note that later, in PR git-lfs#3244, we refactored the "lfsapi" package and
moved portions of its functionality into a new "lfshttp" package,
including the HTTP logging subsystem.  The Close() method of the
"lfsapi" package's Client structure now just calls the Close() method
of the "lfshttp" package's Client structure, and that structure is
the one which contains an httpLogger field.  Regardless, it remains
the responsibility of the "commands" package to call the Close()
method on its "apiClient" variable when necessary, in order to
release any resources associated with it or its internal components.

However, because the "apiClient" variable's Close() method is only
called when the Run() function is about to return, the resources
associated with our HTTP logging subsystem are only properly released
when the Git LFS client exits normally.  If the client encounters an
unrecoverable error condition, it invokes one of the exit wrapper
functions in our "commands" package, which call the Exit() function
of the Go standard library's "os" package and therefore do not
return.  As a result, the Run() function is not completely executed,
and the closeAPIClient() function will never be called.

Likewise, if the client receives a SIGINT or SIGKILL signal, the
signal handler established by our main() function in the "main"
package will halt the program by calling the "os" package's Exit()
function, and so the closeAPIClient() function will also not be
called in this case.

Fortunately, in previous commits in this PR as well as in PR git-lfs#6255
we have made a series of revisions which ensure that when exiting,
the Git LFS client will always call the Cleanup() function in our
"commands" package, regardless of the circumstances under which the
client needs to halt.

We therefore now revise our "commands" package so that the
closeAPIClient() function is called from the Cleanup() function
rather than the Run() function.  This change will guarantee that
the resources associated with our HTTP logging subsystem are always
properly released when the Git LFS client exits.

We also take the opportunity to add a check of the "error" return
value from the closeAPIClient() function, so that if it is not nil,
the Cleanup() function will report the corresponding error message to
the standard error file descriptor.
In previous commits in this PR, we revised the Git LFS client such
that it now always performs a consistent set of cleanup activities
when exiting, regardless of the circumstances.  In particular, the
Cleanup() function in our "commands" package should always be executed
before the client halts, and one of the steps this function now takes
is to invoke the closeAPIClient() function in the "commands" package.

The closeAPIClient() function checks if the "apiClient" global
variable has been initialized.  If it has, then it points to a Client
structure from our "lfsapi" package, and the closeAPIClient() function
calls that structure's Close() method to release any resources
associated with the structure.

However, we also instantiate Client structures from the "lfsapi"
package in a number of other instances, primarily for our Go test
suite but not exclusively so.  While in many of these cases we do
not strictly need to call the Close() method on the Client structures,
for consistency we now revise all these instances so that we do not
create a Client structure without eventually also calling its
Close() method.

In the Environ() function of our "lfs" package, this is simply a
matter of registering the Close() method to run when the function
returns, as the Client structure created by the function is only
referenced within the scope of the function.

In the newConcreteManifest() function of our "tq" package, however,
we remove the conditional block which would instantiate a new Client
structure if the function's "apiClient" parameter had a nil value.
The only purpose of this conditional block was to allow some of our
Go test functions in the "tq" package to avoid creating a Client
structure to pass to the package's NewManifest() function, which
instantiates a new lazyManifest structure.  When one of that structure's
methods was first invoked, it would call the newConcreteManifest()
function, and a Client structure from the "lfsapi" package would be
created at that time.

With this conditional block removed, we then update all of the
test functions in the "tq" package which call the NewManifest()
function to ensure that they create a Client structure to pass to
that function as its "apiClient" parameter, and that they register
the structure's Close() function to run when the test function
returns.
In previous commits in this PR, we revised the Git LFS client such
that it now always performs a consistent set of cleanup activities
when exiting, regardless of the circumstances.  In particular, the
Cleanup() function in our "commands" package should always be executed
before the client halts, and one of the steps this function now takes
is to invoke the closeAPIClient() function in the "commands" package.

The closeAPIClient() function checks if the "apiClient" global
variable has been instantiated.  If it has, then it points to a Client
structure from our "lfsapi" package, and the closeAPIClient() function
calls that structure's Close() method to release any resources
associated with the structure.

In addition, we updated several other packages in our main Git LFS
client program and its Go test functions so that after a Client
structure from the "lfsapi" package is instantiated, its Close()
method will always be invoked when the structure is no longer in use.

For consistency, we now update the three other programs in our
project which instantiate Client structures from the "lfsapi" package
so that they also always call the Close() method on these structures
at the end of their lifetimes.

Note that in the case of the "t/git-lfs-test-server-api" program,
we also revise the buildManifest() function as it no longer needs
to return an "error" value, but does now need to return the Client
structure it creates.

In commit d40d1b7 of PR git-lfs#6254 we
altered the NewClient() function in the "lfsapi" package so that
it does not return an "error" value.  At that time we could also
have simplified the buildManifest() function in the
"t/git-lfs-test-server-api" program to not return an "error" value,
as it previously did so only to report any error condition returned
by the NewClient() function of the "lfsapi" package.  However, we
overlooked this potential simplification in PR git-lfs#6254, so we make
it now instead.
When we introduced support for the "pure" SSH-based Git LFS object
transfer protocol in PR git-lfs#4446, we designed the SSHTransfer structure
in our "ssh" package to be the basic abstraction with which we
represent and manage SSH processes and connections.

Then in commit 31d3fb7 of the same
PR we added an SSHTransfer() method to the Client structure of our
"lfsapi" package in order to provide a common interface through
which we could initiate and share SSH sessions for multiple purposes.
As we wrote in the description of that commit:

  The lfsapi client is used to perform operations for SSH authentication
  already, so we know we'll have one wherever SSH operations might be
  done.  Let's move the instantiation to the client so that we can reuse
  it both in the transfer queue code and in the locking code as well.

Using this SSHTransfer() method of the Client structure does have the
effect of making consistent all the instances in which we instantiate
and initialize new SSH connections for the "pure" SSH-based Git LFS
object transfer and locking protocols.

Note that the SSHTransfer() method may return a nil value rather
than a fully initialized SSH session, not only in the case of an
error condition, but also when the remote service does not provide
support for the "pure" SSH-based Git LFS protocols.  In this case,
when the request to execute the "git-lfs-transfer" command fails,
the Git LFS client will fall back to attempting to execute the
"git-lfs-authenticate" command over SSH.  That step involves a
different abstraction, though, namely the SSHResolver type in our
"lfshttp" package.

While our implementation of the Client structure's SSHTransfer()
method allows us to establish a new SSH session when the "pure"
SSH-based Git LFS protocols are supported, we neglected to account
for the fact that the method creates a new set of sessions each time
it is called.  The consequence is that when handling SSH-based object
transfers as well as locking requests, we create an additional SSH
process and connection just for the locking requests.

Further, while we shut down the SSH connections we use for object
transfers after the first transfer queue completes, we never do the
same for the connections we create for locking requests.

In a subsequent commit in this PR we will address the latter issue,
as well as the issue reported in git-lfs#6118, which arises because we shut
down SSH connections at the end of each transfer queue, but also try
to reuse them between queues.  Thus when we are transferring objects
over SSH for more than one Git reference, the second queue we start
always fails.

To resolve this problem we will need to shut down SSH connections only
when the Git LFS client is exiting, rather than when a single transfer
queue completes.  In turn, this implies we need to retain a listing
of all the unique sets of SSH sessions we create during the lifetime
of the Git LFS client.

We therefore refactor the SSHTransfer() method in our "lfsapi" package
into two methods.  The new initSSHTransfer() method performs the same
actions as the SSHTransfer() method did before, and as such may return
a nil value if the remote service does not support the
"git-lfs-transfer" command.

The SSHTransfer() method now maintains a map of the SSH sessions for
which it has called the initSSHTransfer() method, and only calls that
method if it is required.  Otherwise, the value returned by the
initSSHTransfer() method for the given "operation" and "remote"
parameters is returned again, even if it is a nil value.

To maintain this map safely, the SSHTransfer() method first acquires
a mutex, which we add to the Client structure along with the map.
For keys, we use a concatenation of the unique pairs of "operation"
and "remote" strings passed to the method as its parameters.  This
technique follows the existing approach used in our "commands"
package, where we construct keys for the global "tqManifest" map
by concatenating the "operation" and "remote" string parameters
of the getTransferManifestOperationRemote() function.

One advantage of this change is that it also resolves the problem
whereby we would previously create an extra SSH process and connection
for each locking request, even if they were to be made to the same
remote service as subsequent object transfer requests, and for the
same type of operation (i.e., an upload or download operation).

Because we now reuse SSH connections for both object transfer and
locking requests, we can simplify the assert_ssh_transfer_sessions()
helper function we added to the "t/t-batch-transfer.sh" shell test
script in commit b20b6e9 of PR git-lfs#5905.

The assert_ssh_transfer_sessions() helper function, which checks the
number of trace log messages that record the start of an SSH process
or its termination, no longer needs to make special provision for the
fact that certain operations would cause an extra SSH process to be
started just to make a locking API request, and that this extra SSH
process would never be terminated.
When we introduced support for the "pure" SSH-based Git LFS object
transfer protocol in PR git-lfs#4446, we designed the SSHTransfer structure
in our "ssh" package to be the basic abstraction with which we
represent and manage SSH processes and connections.

Then in commit 31d3fb7 of the same
PR we added an SSHTransfer() method to the Client structure of our
"lfsapi" package in order to provide a common interface through
which we could initiate and share SSH sessions for multiple purposes.

As we described in a previous commit in this PR, our original
implementation of the SSHTransfer() method always either returned
a new SSHTransfer structure, representing a new SSH session and
underlying set of connections, or else returned a nil value to
indicate that the "pure" SSH-based Git LFS protocols were not
supported by the remote service, or that an error condition had
occurred.

Because the SSHTransfer() method always returns a new SSH session
(when it does not return nil), at present we create separate
sessions for both object transfer requests and locking requests.

However, our original intention appears to have been that we
would reuse SSH sessions as much as possible, per the remarks in
commit 31d3fb7:

  The lfsapi client is used to perform operations for SSH authentication
  already, so we know we'll have one wherever SSH operations might be
  done.  Let's move the instantiation to the client so that we can reuse
  it both in the transfer queue code and in the locking code as well.

In a prior commit in this PR we have addressed this issue by
refactoring the SSHTransfer() method so that for a given set of
input parameters, it always returns the same value.  In particular
this means that if a remote service supports the SSH-based Git LFS
protocols, then the method always returns the same SSH session.

A further issue in our original design, as reported in git-lfs#6118, is
that when a transfer queue completes, we shut down any SSH session
it was using, but then we attempt to reuse that session in
subsequent transfer queues, which causes an error.

Specifically, when uploading or downloading objects for multiple
Git references, we establish a single transfer manifest for the
entire operation.  The first time one of the methods of the
"lazyManifest" structure in our "tq" package is invoked, it creates
a "concreteManifest" structure by calling the newConcreteManifest()
function, which calls the SSHTransfer() function of our "lfsapi"
package.  Thus if that function returns a non-nil value (i.e., an
SSHTransfer structure), then the manifest retains a pointer to that
structure for the lifetime of the Git LFS client process.

We then pass that manifest to the NewTransferQueue() function each
time we start a new transfer queue.  After all the objects to be
transferred have been added to the queue, we then call the Wait()
method of the TransferQueue structure.

When all the object transfers are complete, if the queue's manifest
contains non-nil "sshTransfer" field, then the Wait() method calls
the Shutdown() method of the SSHTransfer structure, which closes
the underlying SSH connections.

The consequence is that if we then attempt to start another queue
for the same operation and remote endpoint, we do attempt to reuse the
same SSHTransfer structure and associated SSH session, but all of its
SSH processes and connections have been terminated.  This leads to
the error reported in git-lfs#6118.

Conversely, until we refactored the SSHTransfer() method in a prior
commit in this PR, we used to create a separate SSH session to make
locking requests via the SSH-based Git LFS protocol, and then never
attempted to close this session at all.

We can now take advantage of a series of other changes we made in
previous commits in this PR to ensure that we always attempt to
close SSH sessions, but only at the end of the Git LFS client's
lifetime and not when a single transfer queue completes.

In these prior commits, we revised the main() function of our "main"
package, along with the Run() function and exit wrapper functions
in our "commands" package, so that when the Git LFS client is about
to exit, it will always make exactly one call to the Close() method
of the global "apiClient" variable in that package.

As well, the SSHTransfer() method of the Client structure in the
"lfsapi" package now maintains a map of all the SSH sessions it has
initialized.  We can therefore add a closeSSHTransfers() method
which iterates over all the non-nil values in this map and calls
the Shutdown() method of each SSHTransfer structure, and then invoke
this new closeSSHTransfers() method from the Client structure's
Close() method.

We also revise the Wait() method of the TransferQueue structure so
that it no longer attempts to call the Shutdown() method of the
SSHTransfer structure associated with the queue's manifest.

As these changes should resolve the problem reported in git-lfs#6118, we can
then add a new test to our "t/t-batch-transfer.sh" shell test script
to verify that this is indeed the case.  Our new "batch transfers with
ssh endpoint and multiple branches (git-lfs-transfer)" test does this
by running each of the "git push" and "git lfs fetch" commands for
multiple Git references, which causes both commands to start and stop
more than a single transfer queue.

Note that because our new test uses the "git lfs fetch" command to
download Git LFS objects for multiple branches at once, we have to
adjust the assert_ssh_transfer_sessions() helper function slightly.
In our other tests of the SSH-based Git LFS object transfer protocol,
we use the "git clone" command to download objects, which only does
so for a single Git reference at most when checking out files into
the new working tree.

During this checkout phase of the "git clone" command, it invokes
either the "git lfs filter-process" or "git lfs smudge" commands,
depending on the version of Git available.  With versions of Git
earlier than 2.11.0, the "git lfs smudge" command will be used,
and will be invoked once for each individual object to be downloaded.
As a result, our assert_ssh_transfer_sessions() helper function
adjusts the number of distinct SSH sessions it expects to find
in the trace log it is passed.

However, because our new test runs the "git lfs fetch" command
directly, only a single SSH session should be started even when
the version of Git available is older than v2.11.0.  Hence we
revise the assert_ssh_transfer_sessions() helper function so that
it now checks the trace log for evidence that the "git lfs smudge"
command has been run, and otherwise assumes that it does not need
to account for each object being transferred by a separate Git LFS
command.

Note also that without the enhancements we made in a prior commit in
this PR, this test would have required that we further adjust the
assert_ssh_transfer_sessions() helper function.  We used to start a
separate SSH session to perform locking requests over the SSH-based
Git LFS protocol, and we used to initialize one such SSH session
for each Git reference.  Thus we would have needed to revise the
helper function to account for the extra SSH session started for
each branch processed by the "git push" command in our new test.

Previously, the assert_ssh_transfer_sessions() function simply
incremented the expected counts of SSH connections by one when locking
requests were anticipated, but this only worked because none of our
tests of the "pure" SSH-based object transfer protocol created more
than a single Git branch.  Fortunately, when we refactored the
SSHTransfer() method in the "lfsapi" package of the client so that
it now avoids creating duplicative SSH sessions, we were able to
simplify the helper function at the same time and remove the logic
which incremented the expected SSH connection counts.
@chrisd8088 chrisd8088 requested a review from a team as a code owner May 19, 2026 06:54
@chrisd8088 chrisd8088 added ssh Related to SSH connections or the pure SSH object transfer protocol cleanup labels May 19, 2026

@larsxschneider larsxschneider left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you 🙇

Comment thread t/t-batch-transfer.sh

GIT_TRACE=1 git push origin main >push.log 2>&1
GIT_TRACE=1 git push origin main 2>&1 | tee push.log
[ 0 -eq "${PIPESTATUS[0]}" ]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This captures the exit status for all SSH tests as the commit message describes it. Should we do the same for other tests? (e.g. https://github.com/git-lfs/git-lfs/blame/b84a8a68c81e1710ec456922ee9eac615458104a/t/t-batch-transfer.sh#L91 )

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, most especially the test of the git-lfs-authenticate flow (i.e., using SSH only to acquire HTTP credentials), but decided to keep the scope of the PR somewhat more focused.

There are certainly lots of instances in our test suite where we could standardize how we perform command checks, but I figure that's best left for a more comprehensive PR that tackles all our tests at once.

Comment thread commands/commands.go
Comment thread lfsapi/lfsapi.go Outdated
chrisd8088 and others added 2 commits May 20, 2026 18:15
As suggested by larsxschneider on PR review, we can slightly improve
the closeAPIClient() function in our "commands" package by resetting
the "apiClient" global variable to nil after calling its Close()
method.

The closeAPIClient() function is only called from the doCleanup()
function we introduced in a previous commit in this PR, and should
only be performed just before the Git LFS client halts.  As such,
we do not expect subsequent calls to the getAPIClient() function,
but if they should occur (perhaps due to a regression we introduce
in the future), they will now initialize a new Client structure
from the "lfsapi" package rather than return a nil value, which
might cause the client to panic and crash.

Co-authored-by: Lars Schneider <larsxschneider@github.com>
In a prior commit in this PR we refactored the SSHTransfer() method
of the Client structure in our "lfsapi" package so that for a given set
of input parameters, it always returns the same value.  In particular
this means that if a remote service supports the "pure" SSH-based
Git LFS object transfer and locking protocols, then the method always
returns the same SSH session.

The new initSSHTransfer() method of the same Client structure performs
the actions that the SSHTransfer() method did before, and as such
may return a nil value if the remote service does not provide the
"git-lfs-transfer" command for execution via SSH.

The SSHTransfer() method now maintains a map of the SSH sessions for
which it has called the initSSHTransfer() method, and only calls that
method if it is required.  Otherwise, the value returned by the
initSSHTransfer() method for the given "operation" and "remote"
parameters is returned again, even if it is a nil value.

At present, the initSSHTransfer() method returns a nil value when a
remote service is not configured for the SSH-based Git LFS transfer
protocols or does not support them, but also when an error occurs
while attempting to initialize an SSH session with a remote service.
This design is inherited from the original implementation of the
SSHTransfer() method in commit 31d3fb7
of PR git-lfs#4446.

Because we now cache the result of the initSSHTransfer() method and
return it from the SSHTransfer() method, if an error occurs while
trying to establish an SSH session, then this will prevent subsequent
calls to the SSHTransfer() method from trying again to establish
a session.

As noted by larsxschneider on PR review, this behaviour deviates
slightly from our legacy behaviour, in that we might previously
have successfully initialized one SSH session to make locking
requests, while failing to do so for object transfer requests,
or vice versa.

In practice, this difference is limited in its impact, due to the
order in which we call the SSHTransfer() method, and to a lesser
extent due to the default setting of the "lfs.locksVerify"
configuration option.

During an upload operation, unless the "lfs.locksVerify" configuration
option is set to "false", we attempt to make lock verification requests
to the remote endpoint.  However, unless the "lfs.locksVerify" option
is set to "true", we allow these to fail without halting the upload
operation.

Further, we actually try to establish an SSH session for the object
upload transfers prior to attempting lock verification, for the
reasons detailed below.  The consequence is that with our new design
of the SSHTransfer() method, if the initial attempt to establish an
SSH session is successful, then all the lock verification requests
will utilize that session, as will the subsequent object transfer
requests.

On the other hand, if the initial attempt to establish an SSH
session fails, then upload operation cannot succeed because the
object transfer requests will lack an active SSH session.  Hence
the fact that the locking requests will now also always fail in
this case is of little significance.

Nevertheless, we now adjust the SSHTransfer() and initSSHTransfer()
methods so that we avoid caching a nil value when an error occurs
while attempting to establish an SSH session.

This change will ensure that if in the future we alter our commands'
implementation such that the initial attempts to establish an SSH
session are made for optional actions, then these may safely fail
without necessarily causing the subsequent requisite actions to fail
as well.

For example, suppose we were to revise the order in which we call
the SSHTransfer() method so that we first did so specifically for
lock requests and not for object transfer requests.  Then unless the
"lfs.locksVerify" configuration option was set to "true", the
calls to the SSHTransfer() method for the lock requests could all
fail to establish an SSH session, but since we now avoid caching
a nil value when an error has occurred, the call to the SSHTransfer()
method for the object transfer requests might succeed and return
a valid SSH session.

Note, though, that at present this is not the order in which these
calls to the SSHTransfer() method are made, for the reasons we
outline below.

Both the "git lfs pre-push" or "git lfs push" commands begin by
calling the newUploadContext() function, which creates an initial
manifest for the transfer queue by calling the
getTransferManifestOperationRemote() function and passing the
result to the newLockVerifier() function.

The getTransferManifestOperationRemote() function then invokes the
NewManifest() function from our "tq" package, which returns a
simple "lazyManifest" structure, allowing us to defer any complex
initialization steps at this point.

However, the newLockVerifier() function calls the manifest's
IsStandaloneTransfer() method, which forces the manifest to be
upgraded to a "concreteManifest" structure using the "tq" package's
newConcreteManifest() function.  That function then calls the
SSHTransfer() method of the "lfsapi" package's Client structure to
try to establish an SSH session.

The consequence is that we try to establish an SSH session for the
object transfer requests first, although the lock verification requests
occur before the object transfer requests.
@chrisd8088 chrisd8088 merged commit bc812ce into git-lfs:main May 21, 2026
10 checks passed
@chrisd8088 chrisd8088 deleted the perform-consistent-exit-cleanup branch May 21, 2026 03:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cleanup ssh Related to SSH connections or the pure SSH object transfer protocol

Projects

None yet

Development

Successfully merging this pull request may close these issues.

could not get connection for batch request: pure SSH connection unavailable

2 participants