Skip to content

Fix crash during pure SSH object transfer with multiple objects#5905

Merged
chrisd8088 merged 12 commits intogit-lfs:mainfrom
chrisd8088:ssh-batch-multi-fix
Nov 6, 2024
Merged

Fix crash during pure SSH object transfer with multiple objects#5905
chrisd8088 merged 12 commits intogit-lfs:mainfrom
chrisd8088:ssh-batch-multi-fix

Conversation

@chrisd8088
Copy link
Member

This PR fixes a bug where the Git LFS client may crash while cloning a repository with multiple Git LFS objects using the "pure" SSH version of the Git LFS transfer protocol.

Our existing test of the SSH object transfer protocol did not detect the problem because it only pushes and fetches a single object, so we add more tests to confirm that the changes in this PR are effective.

In order for these additional tests to succeed, we also fix a latent issue in our lfs-ssh-echo test utility program whereby it silently fails while trying to emulate the behaviour of the OpenSSH client when it is asked to multiplex a new SSH session over an existing SSH connection's control socket.

As well, we revise the trace log messages generated by our SSH session handling code and make several other minor adjustments, including renaming the receiver variables of our SSHTransfer structure so they do not conflict with the name of our tr text localization package, and expanding our existing single-object test of the SSH object transfer protocol to align with the more comprehensive checks performed by our new tests.

This PR will be most easily reviewed on a commit-by-commit basis.

Resolves #5880.

In subsequent commits in this PR we expect to resolve a pair of issues
which prevent us from testing the SSH object transfer protocol with
more than a single object.  The SSH transfer protocol was introduced in
PR git-lfs#4446, and in commit 691de51 of
that PR, the "batch transfers with ssh endpoint (git-lfs-transfer)" test
in our t/t-batch-transfer.sh test script was added to validate that the
new protocol worked as expected.

This test pushes and then fetches a single object, so the issues which
arise when handling multiple objects in a batch do not cause the test
to fail.  Nevertheless, before addressing those issues, we first expand
the existing test so it checks that the git-lfs-transfer command is
seen in the trace log messages during a push operation.  The test
already checks that this command is used during a clone operation.

As well, we enhance the "batch transfers with ssh endpoint
(git-lfs-authenticate)" test, which checks that when the SSH transfer
protocol is not available but Git is configured to use SSH for a remote,
the Git LFS client performs an authorization handshake over SSH prior
to using HTTP for its object transfers.  In the test we now confirm
that the git-lfs-authenticate command is seen in the trace log messages
generated during a push operation, and that the object was
successfully pushed and has been stored by the remote server.

We also make a small revision to the "batch transfers succeed with an
empty hash algorithm" test to remove an unnecessary file redirection.
In subsequent commits in this PR we expect to resolve a pair of issues
which prevent us from testing the SSH object transfer protocol with
more than a single object.  The SSH transfer protocol was introduced in
PR git-lfs#4446, and in commit 691de51 of
that PR, the "batch transfers with ssh endpoint (git-lfs-transfer)" test
in our t/t-batch-transfer.sh test script was added to validate that the
new protocol worked as expected.

This test pushes and then fetches a single object, so the issues which
arise when handling multiple objects in a batch do not cause the test to
fail.  Hence we intend to add two other tests to accompany the existing
one so as to validate the SSH transfer protocol with multiple objects.

As these tests will all perform object transfers only over SSH, our
HTTP-based lfstest-gitserver test helper program will not be used in
any of them.  That program retains a copy of each object it receives
in memory to simulate a remote Git LFS server.  As a consequence, many
of our tests use the assert_server_object() function defined in our
t/testhelpers.sh library to confirm that an object has been received
by the remote server; this function makes an HTTP batch request to
the lfstest-gitserver program and checks the JSON response.

In our tests of the SSH transfer protocol, by contrast, object
data will be proxied by the lfs-ssh-echo helper program to the
git-lfs-transfer command, which will write the data into a separate
bare Git repository in the location provided as command argument.
In fact this is how the existing test already operates; it uses
the ssh_remote() function from our t/testhelpers.sh library to
establish the SSH URL for its remote repository.  The URLs returned
by this function always include the directory path from our REMOTEDIR
variable.  We expect to also use the ssh_remote() function in our
new tests to establish their remote repositories.

In both our existing test and the ones we will add in a subsequent
commit in this PR, we would like to confirm that the Git LFS objects
we push have been successfully written into the "lfs/objects" cache
in the remote repositories.  To simplify such checks, we define a new
assert_remote_object() assertion function in our shell test library.

Unlike the assert_server_object() function, which makes an HTTP
request, our new assertion acts in a similar manner to the existing
assert_local_object() function by simply validating the size and
existence of an object file in the appropriate subdirectory of the
"lfs/objects" cache hierarchy of a bare repository.  However, our new
assert_remote_object() function constructs the path to the repository
using the REMOTEDIR variable, rather than checking the object's
presence in the current local repository as the assert_local_object()
function does.

With the assert_remote_object() function defined, we then update our
existing "batch transfers with ssh endpoint (git-lfs-transfer)" test
to make use of it after pushing an object over the SSH transfer
protocol, and we will use the function for the same purpose in the
additional tests we expect to introduce in a later commit in this PR.
In subsequent commits in this PR we expect to resolve a pair of issues
which prevent us from testing the SSH object transfer protocol with
more than a single object.  The SSH transfer protocol was introduced in
PR git-lfs#4446, and in commit 691de51 of
that PR, the "batch transfers with ssh endpoint (git-lfs-transfer)" test
in our t/t-batch-transfer.sh test script was added to validate that the
new protocol worked as expected.

This test pushes and then fetches a single object, so the issues which
arise when handling multiple objects in a batch do not cause the test to
fail.  Hence we intend to add two other tests to accompany the existing
one so as to validate the SSH transfer protocol with multiple objects.

All these tests will perform object transfers over SSH, but with varying
numbers and types of SSH sessions.  While our existing test only pushes
and fetches a single object, the tests we will add in a later commit
in this PR will push and fetch multiple objects in each batch.

By default, the Begin() method of the adapterBase structure in our "tq"
package starts eight goroutines to process the objects in a batch, each
running the structure's worker() method.  The first worker routine must
start an SSH session, and depending on how quickly other workers pull
objects from the batch transfer queue, additional SSH sessions may be
created as well, one per active worker.

On platforms other than Windows, we set the default value of the
"lfs.ssh.autoMultiplex" configuration option to "true".  When this
option is "true" and the available SSH client is considered to be
compatible with OpenSSH, we attempt use OpenSSH's ControlMaster option
to create a control socket and multiplex all SSH sessions over a
single common connection.  We expect the first SSH session to be
established with a ControlMaster argument value of "yes", and other
sessions with a value of "no".

At present, our existing "batch transfers with ssh endpoint
(git-lfs-transfer)" test does not check the value of the ControlMaster
argument.  As we expect to add tests which may create more than one SSH
session, however, we would like to validate the number of sessions that
create new control sockets, and the number of SSH sessions overall.

To perform these SSH session count checks we add two dedicated
assertion functions, assert_ssh_transfer_sessions() and
assert_ssh_transfer_session_counts(), with the former calling the
latter several times.  These functions are unlikely to be useful
outside of the context of our tests of the SSH object transfer
protocol, so we define them directly in the t/t-batch-transfer.sh
test script rather than in our generic t/testhelpers.sh library.

The primary function, assert_ssh_transfer_sessions(), checks the
number of times the git-lfs-transfer command is executed over SSH,
and confirms that each session has the ControlMaster option set to
"yes", which is always the case for our existing test, so long as we
update the test to force the use of SSH connection multiplexing on
Windows by explicitly setting the "lfs.ssh.autoMultiplex"
configuration option to "true".

The assert_ssh_transfer_sessions() function then uses the secondary
assert_ssh_transfer_session_counts() function to validate that the
expected number of startup, success, and termination trace log messages
are seen for each SSH session.

One specific issue arises in regard to the number of SSH sessions
and git-lfs-transfer commands we expect to see started during a Git
push operation.  At present, the Git LFS client always starts a unique
SSH session to run the git-lfs-transfer command when checking for
Git LFS locks on the objects being pushed, and this session will
create a control socket if the SSH client in use is considered
compatible with OpenSSH and the "lfs.ssh.autoMultiplex" configuration
option is set to "true".  The control socket opened for this session
is distinct from the one created later by the first SSH session
started by the batch transfer worker routines.

The unique SSH session used for the lock verification request during
push operations is managed with a separate instance of the SSHTransfer
structure from our "ssh" package.  At present, the Git LFS client never
calls the setConnectionCount() method with a zero argument on this
instance of the SSHTransfer structure, so the dedicated SSH session
used for lock verification is never explicitly terminated.

Therefore our new assert_ssh_transfer_sessions() function adjusts
the number of git-lfs-transfer command execution messages and the
number of start and success trace log messages it expects when a push
(i.e., "upload") operation is performed to be one higher than the
number of termination trace log messages it expects.  This allows
the function to account for the additional SSH control socket
session started for the lock verification step, and then never
explicitly terminated.
In commit 448b0c4 of PR git-lfs#5537 our
lfs-ssh-echo test helper utility was updated to examine the value of
any provided ControlMaster command-line argument and simulate the action
of the OpenSSH program when it is passed a ControlMaster argument with
a value of either "yes" or "no".  (The other values OpenSSH supports
for this option are not accepted by our lfs-ssh-echo utility at the
moment.)

When a ControlMaster command-line argument is found, and has the value
"yes", the lfs-ssh-echo program attempts to create a temporary file at
the location given by the ControlPath command-line argument, which must
also be provided.  If the file already exists, or some other error
occurs while creating it, the program halts with a non-zero exit code.

If the ControlMaster argument's value is "no", and the ControlPath
argument is also defined, the program attempts to open the file at
the given path, and exits with a non-zero code if the file does not
already exist or some other error occurs.

This logic is designed to emulate the behaviour of OpenSSH, which
supports the use of these arguments to multiplex SSH sessions over
a common connection using a control socket.  When the ControlMaster
argument is "yes", OpenSSH creates a socket, which may then be shared
with other invocations of OpenSSH by setting the ControlMaster argument
to "no" and passing the socket's path in the ControlPath argument.

Our lfs-ssh-echo program, however, creates its temporary file with
no defined file permissions (i.e., with a file mode argument of zero).
This results in a file which can not be opened by any other instances
of the lfs-ssh-echo program.  At the moment this does not cause any
problems with our test suite, because we have no tests which exercise
the SSH version of the Git LFS object transfer protocol with more
than a single object, so multiple SSH sessions with the same
ControlPath argument are never started.

We would like to create additional tests which push and fetch multiple
Git LFS objects over SSH, though, to help diagnose issues such as
those described in git-lfs#5880.  Therefore we change our lfs-ssh-echo
program to use the conventional file mode of 0666 when creating its
temporary file, which will allow subsequent invocations of the program
with the ControlMaster argument set to "no" to also open the same file.

As well as fixing this bug, we also alter the way the lfs-ssh-echo
program handles errors when the file specified by the ControlPath
argument can not be created or opened.  We now output distinct error
messages in the cases where the file already exists or does not exist,
respectively, as well as in the cases where some other type of error
occurs.  Previously, we assumed any error must be due to a file either
existing or not existing, and did not report the actual error message,
so problems such as those caused by missing file permissions were
obscured.
Since PR git-lfs#4446 the SSHTransfer() method of the Client structure in
our "lfsapi" package has output several trace log messages when
attempting to instantiate a new SSHTransfer structure.  This method
takes "operation" and "remote" string arguments, and the returned
SSHTransfer structure is specific to those values.

As this method may be called several times with distinct arguments
during the execution of a Git LFS process, we add the "operation"
and "remote" variables to the trace log messages, which will help
clarify the calling context of the method in our diagnostic logs.
In commit 326b1ee of PR git-lfs#5063 we added
trace logging to several functions and methods related to the creation
and termination of SSH sessions in order to help analyze the diagnostic
logs generated when transferring Git LFS objects over SSH.

However, in the startConnection() function in our ssh/connection.go
source file, we report the successful creation of a session even if
we are returning a non-nil error value.  Therefore we revise our
trace logging in that function to distinguish between unsuccessful
and successful conditions based on whether the PktlineConnection
structure's Start() method returned an error or not.

In addition, we also update a number of our other trace log messages to
include the relevant session ID.  (Note that we refer to SSH sessions
as "connections" in our code, although in practice they may share a
single SSH connection using a control socket.)

Because we maintain a set of SSH sessions and do not necessarily start
or terminate all of them at the same time, this change will provide
more clarity as to the state of each individual session at different
points in a trace log.

Finally, we rephrase several trace log messages generated by the
setConnectionCount() method of the SSHTransfer structure so they
more fully explain when the method is terminating specific sessions
because it has been asked to reduce the total number of sessions.
In commit b44cbe4 of PR git-lfs#5136 the
"multiplex" argument was added to the FormatArgs() function in our
ssh/ssh.go source file.  Later, the "controlPath" argument was also
added, in commit 448b0c4 of PR git-lfs#5537.

Neither of these arguments is used in the function's body, though,
so we can remove them now.
In PR git-lfs#4446 we defined a new SSHTransfer structure and a set of methods
for it in our ssh/connection.go source file, and used the name "tr" for
the receiver variables of those methods.

Later, in commit 5dbbf13 of PR git-lfs#5674,
we imported our "tr" message translation package into the same source
file in order to format and localize an error message generated by the
startConnection() function.  Because this function is not one of the
methods of the SSHTransfer structure there was no conflict with the
"tr" receiver variables of those methods.

However, in subsequent commits in this PR we expect to revise one of
the SSHTransfer structure's methods to output an error message, and
will want to use the Get() method of the "tr" package's global Tr
variable.  We are not able to do this with the current name of the
"tr" receiver variable as it masks the "tr" package name within the
method's scope.

Therefore we now rename all our "tr" receiver variables to "st" in the
ssh/connection.go source file, so as to avoid any namespace conflicts
with our "tr" package.
In commit 44b8801 of PR git-lfs#5634 we
revised the Connection() method of the SSHTransfer structure in our
"ssh" package to create SSH sessions on demand, rather than simply
return pre-existing sessions created by the setConnectionCount()
method, which allowed us to reduce the total number of SSH sessions
in many cases.

As part of this change, we altered the Connection() method to return
an error if it could not create a session successfully, and we then
updated the callers of this method to check for a non-nil error.

However, the method still returned a nil error value, along with a
nil PktlineConnection value, if the requested session ID exceeded
the maximum number of sessions permitted and so no additional session
was created or returned.  However, the method's callers all now assume
a nil error value implies a non-nil PktlineConnection value.

In particular, the batchInternal() method of the SSHBatchClient structure
in the "tq" package originally checked for a nil return value from the
Connection() method, as this was the only way the Connection() method
indicated a failure to find or create an active SSH session.  Since
the change in commit 44b8801, the
batchInternal() method now checks only for a non-nil error return value,
which means it does not detect the case where a nil PktlineConnection
value is returned.

As a result, in situations such as those described in git-lfs#5880, the
batchInternal() method may cause a Go panic when it attempts to call
the Lock() method of a nil PktlineConnection value.  One way this can
occur is after one batch transfer operation has finished and the
Wait() method of the TransferQueue structure has called the Shutdown()
method of the sshTransfer field of the TransferQueue's concreteManifest.
That closes all the SSH sessions and sets the "conn" array to nil.
But a reference to the same SSHTransfer structure is retained by the
sshTransfer field of the SSHBatchClient structure to which the
concreteManifest's batchClientAdapter field points.  When that manifest
is reused by a subsequent batch transfer operation, the batchInternal()
method of the SSHBatchClient structure attempts to retrieve a session,
receives a nil PktlineConnection value, and causes a panic by trying
to use it to run the Lock() method.

We therefore adjust the Connection() method of the SSHTransfer structure
to return a non-nil error when the requested session ID exceeds the
maximum number of sessions allowed.  This also covers the case where
the "conn" array has been set to nil, as the len() built-in function
returns zero for the length of a nil array.
In commit 691de51 of PR git-lfs#4446 we added
the "batch transfers with ssh endpoint (git-lfs-transfer)" test to our
t/t-batch-transfer.sh test script in order to validate that the new
SSH object transfer protocol for Git LFS was operating as expected.

This test only creates a single test object, and so does not cause
the Git LFS client to establish multiple SSH sessions during the batch
object transfer phase.  While the sole SSH session started during this
phase may create a control socket to multiplex its connection, it will
not be shared by any other SSH sessions, since only one is started.

In practice, this means that even if the "lfs.ssh.autoMultiplex"
configuration option is set to "true" and the available SSH client
is considered to be compatible with OpenSSH, the "batch transfers with
ssh endpoint (git-lfs-transfer)" test will never try to start an SSH
session with a value for the ControlMaster argument other than "yes".

Our test suite sets the GIT_SSH environment variable to refer to our
lfs-ssh-echo test utility program rather than an actual SSH client.
As noted in git-lfs#5903, at present the Git LFS client treats this program,
which it does not recognize as matching the filename of any known
SSH client, as compatible with OpenSSH, and so our tests invoke our
lfs-ssh-echo utility with the ControlMaster and ControlPath arguments.

As described in a prior commit in this PR, our lfs-ssh-echo test
utility program had a bug which prevented it from starting SSH
sessions with the ControlMaster argument set to "no", so the "batch
transfers with ssh endpoint (git-lfs-transfer)" test would have
failed if it tried to push more than a single object.

Such a test would also fail if it tried to fetch more than a single
object, due to the bug described in git-lfs#5880.  Specifically, we returned
a nil from the Connection() method of the SSHTransfer structure in our
"ssh" package when we had terminated all the SSH sessions, but then
sometimes tried to reference that nil pointer in the batchInternal()
method of the SSHBatchClient structure in the "tq" package, causing
a Go panic condition.

In prior commits in this PR we resolved these problems, first the bug
in the lfs-ssh-echo test utility and then the bad return value from the
Connection() method of the SSHTransfer structure in the Git LFS client.
In the case of the latter issue, we adjusted the Connection() method
to return a non-nil error when the requested session ID exceeds the
maximum number of sessions permitted, which also covers the case where
all the sessions have already been terminated.

Therefore we can now introduce additional tests of the SSH object
transfer protocol which push and fetch multiple objects.  In these
tests we use the new assert_remote_object() function that we defined
in a prior commit in this PR to confirm that the pushed objects are
all written to the remote repository.  As well, we use the new
assert_ssh_transfer_sessions() function we defined in another prior
commit in this PR to check that the number of SSH sessions that
create a control socket matches our expectations, as do the number
of startup, success, and termination trace log messages.

The first of our new tests leaves the maximum number of concurrent
object transfers set to the default value of eight, so that all three
of the objects we create in the test are pushed in a single batch,
and may also be fetched in a single batch if the available version
of Git is 2.11.0 or higher.  To push or fetch these objects in a
single batch requires that the Git LFS client establish as many as
three separate SSH sessions per invocation, of which only the first
should create a control socket.

The second of our new tests sets the maximum number of concurrent
object transfers to two, so we expect to see a maximum of only two SSH
sessions per invocation of the Git LFS client, and only the first of
these sessions should start an SSH connection with a control socket.

Prior to version 2.11.0, Git did not support the "process" filter
attribute, and so during a clone operation Git LFS would be invoked
via the "smudge" filter instead, once for each object.  When testing
with such a version of Git, our assert_ssh_transfer_sessions()
function adjusts its expectations to account for the fact that each
invocation of Git LFS during a clone operation will establish its
own SSH session with a control socket, and so the total number of
trace log messages from these sessions should match the total number
of objects being fetched.
@chrisd8088 chrisd8088 requested a review from a team as a code owner November 4, 2024 23:50
Copy link
Member

@larsxschneider larsxschneider left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stellar PR. A joy to read/review 🙇


# On upload we currently spawn one extra control socket SSH connection
# to run locking commands and never shut it down cleanly, so our expected
# start counts are higher than our expected termination counts.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you consider it a bug that the locking command session never shuts down cleanly?

I assume we do not create the extra control socket if locking is disabled, right? (via lfs.locksverify=false)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we do not create the extra control socket if locking is disabled, right? (via lfs.locksverify=false)

That's correct, I believe.

Would you consider it a bug that the locking command session never shuts down cleanly?

It's not ideal, but it's not causing any harm, so far as I know. I wrote up this issue in a bit more detail at the end of the description of #5880:

Ultimately, we may want to avoid trying to shut down our SSH connections at the end of individual batch operations, and do some analysis of the preferred lifetime for them. (Among other issues, when pushing, we create two separate ControlMaster=yes connections, one for locking operations and another for batch operations; it would be nice to avoid this duplication.)

Shutting down SSH connections cleanly at the end of the entire Git LFS process may also entail some refactoring of how we cause the program to exit. Right now, we make a token effort at closing resources associated with our API clients, but really that just amounts to closing our logger, and when we call os.Exit() in several dozen places, that step is skipped and we let Go take care of closing open I/O handles. If we want to always gracefully close multiple open SSH connections by sending a quit message, though, we may have to refactor these os.Exit() calls into a common exit routine.

It's also why I drew myself a class diagram in #5880 (comment).

To manage the lifetime of all the SSH sessions, including the one used for lock verification, we may have to do some more substantial revisions throughout the code in our commands package to replace all the os.Exit() calls with another mechanism. At any rate, it seemed out of scope for this PR.


# On Windows we do not multiplex SSH connections by default, so we
# enforce their use in order to match other platforms' connection counts.
git config --global lfs.ssh.autoMultiplex true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is automultiplex disabled on Windows?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GetExeAndArgs() function in our ssh package effectively sets the default value of lfs.ssh.autoMultiplex to be false and not true because Windows SSH clients (of which there are a few) don't always support multiplexing SSH sessions over a single connection, and may fail if they see an unrecognized control socket option.

There was a discussion of this concern in #5537 (comment), which led to the changes from that PR #5537 setting the default to false on Windows. The conclusion at that time was to just let Windows users who had OpenSSH installed (via Git for Windows, for instance) set the configuration option to true manually if they wanted to use multiplexing.

I did note, while working on this PR, that the lfs.ssh.autoMultiplex section in our git-lfs-config(5) manual page doesn't explain this difference in the default value on Windows. I have some work-in-progress documentation updates and I put an edit to the manual page into that set of changes, but since you've reminded me again about the issue, I'll just put that commit into this PR.

if [ "download" = "$direction" ]; then
gitversion="$(git version | cut -d" " -f3)"
set +e
compare_version "$gitversion" '2.11.0'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Git 2.11 is a few years old. I wonder if we should require a minimum Git version for Git LFS to get rid of a few of those version checks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question—it's probably worth exploring the idea. I think we'd want to announce any such plans in an issue, with a proposed future release version at the first one where we'd set a new minimum Git version, and collect input from users as to what that minimum should be. Some folks are likely to be running Git LFS in systems with fairly old Git clients, especially if they have one of the older OS platforms we still support.

Right now, we claim in our README that Git v1.8.2 or v1.8.5 is the minimum, although our Git v2.0.0 is the earliest one we build for use in our CI jobs.

For now and for the near-term release we have planned, though, I don't think we should change anything about the Git versions we try to support. (As one example of how our project is expected to work with quite old software, when we overlooked the fact that upgrading to Go 1.21 in our 3.5.x releases meant that we dropped support for Windows 7 and 8, that caused problems for the Git for Windows project maintainers.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #5921 will at least update the README.md file to match the minimal Git version we use in our CI jobs.

As suggested by larsxschneider on PR review, we should use "clone.log"
as the name of the log files we capture from "git clone" commands in
our tests of the SSH object transfer protocol, rather than "trace.log",
so we adjust those file names now.

This change aligns these log files' names with the "push.log" files
we create in the same tests, and also with the "clone.log" files
created by the clone_repo() function in our t/testhelpers.sh shell
library.
Since commit 448b0c4 in PR git-lfs#5537
the GetExeAndArgs() function in our "ssh" package sets the default
value of our "lfs.ssh.autoMultiplex" configuration option to "false"
when running on Windows, and "true" otherwise.

This choice was made because the SSH clients available on Windows
may not support multiplexing SSH sessions over a single connection,
as OpenSSH does with its ControlMaster and ControlPath options.

Since some of these SSH clients may fail if they are passed the
ControlMaster and ControlPath options, we require Windows users who
want to use SSH multiplexing to explicitly enable it by setting the
"lfs.ssh.autoMultiplex" option to "true".  See also the discussion in:

  git-lfs#5537 (comment)

However, our git-lfs-config(5) manual page was not updated in PR git-lfs#5537
to reflect the change in the default value of the "lfs.ssh.autoMultiplex"
option on Windows, so we update it now.

Note that users with the Git for Windows project installed will
typically have a version of OpenSSH available which supports the
ControlMaster option.  However, the OpenSSH for Windows client may
not support multiplexing, as noted in PowerShell/Win32-OpenSSH#1328.
@chrisd8088 chrisd8088 merged commit 9d69005 into git-lfs:main Nov 6, 2024
@chrisd8088 chrisd8088 deleted the ssh-batch-multi-fix branch November 6, 2024 06:18
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Nov 18, 2024
In commit 5e654f2 in PR git-lfs#565 a pair
of test assertion functions were added to the forerunner of our
current t/testhelpers.sh shell library.  These assert_local_object()
and refute_local_object() functions check for the presence or absence
of a file in the object cache maintained by the Git LFS client in
a local repository.

To perform these checks, the functions capture the output of the
"git lfs env" command and parse the contents of the LocalMediaDir
line, which reports the full path to the Git LFS object cache
location.  To retrieve the path, the functions ignore the first 14
characters of the line, as that corresponds to the length of the
LocalMediaDir field name (13 characters) plus one character in order
to account for the equals sign which follows the field name.

Later PRs have added three other assertion functions that follow
the same design.  The delete_local_object() function was added in
commit 97434fe of PR git-lfs#742 to help test
the "git lfs fetch" command's --prune option, the corrupt_local_object()
function was added in commit 4b0f50e
of PR git-lfs#2082 to help test the detection of corrupted local objects
during push operations, and most recently, the assert_remote_object()
function was added in commit 9bae8eb
of PR git-lfs#5905 to improve our tests of the SSH object transfer protocol
for Git LFS.

All of these functions retrieve the object cache location by ignoring
the first 14 characters from the LocalMediaDir line in the output of
the "git lfs env" command.  However, the refute_local_object() function
contains a hint of an alternative approach to parsing this line's data.

A local "regex" variable is defined in the refute_local_object()
function, which matches the LocalMediaDir field name and equals sign
and captures the subsequent object cache path value.  Although this
"regex" variable was included when the function was first introduced,
it has never been used, and does not appear in any of the other similar
functions.

While reviewing PR git-lfs#5905, larsxschneider suggested an even simpler
option than using a regular expression to extract the object cache
path from the LocalMediaDir line.  Rather than asking the Bash shell
to start its parameter expansion at a fixed offset of 14 characters
into the string, we can define a pattern which matches the leading
LocalMediaDir field name and equals sign and specify that the shell
should remove that portion of the string during parameter expansion.
See also the discussion in this review comment from PR git-lfs#5905:

  git-lfs#5905 (comment)

In addition to these changes, we can remove the definition of the
"regex" variable from the refute_local_object() function, as it
remains unused.

Co-authored-by: Lars Schneider <larsxschneider@github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Crash during pure SSH object transfer with multiple objects

2 participants