Skip to content

Temporarily restore concurrency limit to historical fixed default#6258

Merged
chrisd8088 merged 3 commits into
git-lfs:mainfrom
chrisd8088:limit-concurrency-again
May 21, 2026
Merged

Temporarily restore concurrency limit to historical fixed default#6258
chrisd8088 merged 3 commits into
git-lfs:mainfrom
chrisd8088:limit-concurrency-again

Conversation

@chrisd8088

@chrisd8088 chrisd8088 commented Apr 30, 2026

Copy link
Copy Markdown
Member

This PR reverses the key change from PR #6241 and restores the fixed, historical default of 8 for the lfs.concurrentTransfers configuration option, so that we do not spawn large numbers of child processes when transferring objects using our "pure" SSH protocol or when using a custom transfer agent.

After the upcoming v3.8.0 Git LFS release we will revisit the issue of scaling default concurrency settings with CPU counts while limiting the number of child processes we might create. See #6259 for that discussion.

This PR will be most easily reviewed on a commit-by-commit basis.

Additional Context

As discussed in PRs #6241 and #6234, we would like to refine the concurrency controls for the Git LFS client so that they can scale with the number of available CPUs, at least when performing HTTP-based object transfers.

We adopted an initial implementation of in PR #6241, which was effective for HTTP object transfers and improved performance significantly on multi-core CPUs.

However, this change also has the unintended effect of scaling the number of child processes we execute when handling object transfers using a custom agent or when using multiplexed SSH connections. While some systems may not be affected, creating large numbers of separate processes is at best not ideal and at worst may run afoul of system or container limits.

We therefore temporarily reverse the key change from PR #6241 and restore the fixed, historical default of 8 for the lfs.concurrentTransfers configuration option.

After our upcoming release of version 3.8.0 of the Git LFS client, we intend to revisit this issue and find a more comprehensive solution so that we can safely scale thread counts while keeping process counts within reasonable bounds.

Other Changes

We also take the opportunity to correct the placement and wording of a comment regarding concurrency support for SSH-based object transfers.

/cc @joshfriend as the author of PRs #6241 and #6234.

When we first developed the "pure" SSH-only Git LFS object transfer
protocol in PR git-lfs#4446, our initial prototype supported only a single SSH
connection, and so in commit 594f8e3
of that PR we updated the NewManifest() function in our "tq" package to
set the maximum number of concurrent transfers to one when the new
SSH adapter was in use.  We also added a comment which states that
"Multiple concurrent transfers are not yet supported".

Later in the same PR we implemented support for multiple concurrent
SSH connections using the ControlMaster option of the ssh(1) command,
and so in commit 9c46a38 we revised
the NewManifest() function to no longer set the maximum number of
concurrent transfers to one when the new SSH adapter was in use.
However, we accidentally left the comment regarding the lack of
support for concurrent transfers in place.

Then in commit b44cbe4 of PR git-lfs#5136
we resolved the problem reported in git-lfs#5064 by updating the
NewManifest() function a third time.  The function once again sets
the maximum number of concurrent transfers to one when the SSH
adapter is in use, but now only when SSH multiplexing is not enabled,
as might occur when using an SSH agent other than the ssh(1) command.

When we made this most recent change, though, we neither updated the
original comment nor moved it to appear prior to the relevant
conditional block in the NewManifest() function, so we make those
corrections now.
@chrisd8088 chrisd8088 requested a review from a team as a code owner April 30, 2026 16:10
@chrisd8088 chrisd8088 added the git-config Related to our handling of Git configuration options label Apr 30, 2026
In PR git-lfs#6241 we recently adopted a change which made the default
value of the "lfs.concurrentTransfers" configuration option
variable and scaled to three times the number of available CPUs.

This change benefits users with multi-core CPUs when they transfer
Git LFS objects over HTTP, since we start multiple concurrent
threads (i.e., "goroutines") to make Git LFS Batch API requests
and object transfer requests in parallel.

However, the "lfs.concurrentTransfers" configuration option also
controls the number of parallel object transfers when using
protocols other than HTTP, and in these cases, we spawn multiple
separate child processes to perform the transfer operations.

For example, when using the "pure" SSH-only Git LFS object transfer
protocol, we may run multiple instances of the ssh(1) command,
unless SSH multiplexing is not enabled.  (Note that we may also
use another SSH agent, depending on what has been configured via
the "core.sshCommand" option and the GIT_SSH or GIT_SSH_COMMAND
environment variables.)

When multiplexing is enabled, we set the ssh(1) command's ControlMaster
option so that all the separate processes we start will share a single
underlying SSH connection.  We also only start SSH agent processes
on demand in this case.  When transferring many objects, though, we
will start multiple processes, up to the limit specified by the
"lfs.concurrentTransfers" configuration option.

As another example, when using our custom transfer protocol we always
start multiple separate instances of the transfer agent program, which
may either be our "git lfs standalone-file" command or an agent
configured with the "lfs.customTransfer.*" options.  In the first case,
we only start a single process, but otherwise we always start exactly
as many processes as is specified by the "lfs.concurrentTransfers"
option, unless the "lfs.concurrentTransfers.<name>.concurrent" option
is set to "false".

Finally, even when using our standard HTTP-based transfer adapter,
we may start multiple threads in situations where we do not need
them all.

For instance, in cases where the "git lfs smudge" command is used
instead of the more-efficient "git lfs filter-process" command, each
invocation of the "git lfs smudge" command will transfer only a
single object, but will nevertheless start the number of threads
specified by the "lfs.concurrentTransfers" option.

Typical modern installations of Git LFS do not use the "git lfs smudge"
command, but if the "filter.lfs.process" Git configuration option is
missing, or if a version of Git older than v2.11.0 is in use, then
the "git lfs smudge" command may still be utilized.

Even when the "git lfs filter-process" command is used, when just
a small number of objects need to be transferred, we will still start
the number of threads specified by the "lfs.concurrentTransfers"
option.

In all these instances, if we set the default value of the
"lfs.concurrentTransfers" option based on the number of CPUs then we
may start many more processes or threads than we did prior to the
changes in PR git-lfs#6241.  This may have unexpected and negative
consequences for our users, particularly in those cases where we
spawn separate processes to transfer objects, such as when a
custom transfer agent is configured.

Ideally, we would address all of these issues by refactoring our
HTTP and custom transfer adapters to start threads or processes only
on demand, as the SSH transfer adapter already does, while also
introducing new configuration options to better set limits on the
number of transfer agent processes we spawn as distinct from
the number of threads we start during HTTP object transfers.

As we expect to make a v3.8.0 release shortly, though, refactoring
work of this nature is out of scope for the near term.  We therefore
now reverse the key change from PR git-lfs#6241, and set the default value
of the "lfs.concurrentTransfers" option to its historical value of 8.

Looking forward, we should write a proposal to discuss how to implement
any new configuration options such that they respect the historical
meaning of the "lfs.concurrentTransfers" option.  As one consideration,
if a user has explicitly set the "lfs.concurrentTransfers" option but
has not set any of the new options, then the "lfs.concurrentTransfers"
option's value should be understood as a maximum limit for the number
of threads or processes we start during object transfers.  On the
other hand, if a user sets one of our new options but not the
"lfs.concurrentTransfers" option, then we might allow its value to
override the default value of the "lfs.concurrentTransfers" option.
@chrisd8088 chrisd8088 force-pushed the limit-concurrency-again branch from a396653 to d05aba2 Compare April 30, 2026 17:48
@chrisd8088

Copy link
Copy Markdown
Member Author

Hey @joshfriend, I'm sorry about this PR!

It certainly wasn't my intention to reverse course when I approved and merged PR #6241, which seemed like a relatively straightforward enhancement with clear performance benefits. I should have thought a bit more carefully about all the dusty edge cases we have to support.

While I don't think it's a disaster if we spin up, say, 10 or 12 custom transfer agents instead of 8, I wouldn't want to surprise users who discover we're now forking 384 processes on their 128-core machine, just to transfer one or two objects via their custom transfer process.

@joshfriend

Copy link
Copy Markdown
Contributor

Not a problem, Chris! In my setup with pretty vanilla github hosted LFS that was a pretty great default tuning, but I had no test harness available for some of the other transport methods and clearly missed a few things.

#6234 is the main unlock for performance though. We could leave the default parallelism pretty low as it sat before, with affordances for increasing it, and I think that would still be a big win!

chrisd8088 added a commit that referenced this pull request May 21, 2026
In commit aa08c37 of PR #6241 we
changed the default value of the "lfs.concurrentTransfers"
configuration option from eight to a dynamic limit based on the
number of CPUs in the current system.

This change required that we also update a number of the tests in
our shell test suite, as they previously were written with the
value of eight hard-coded as the expected default value of the
"lfs.concurrentTransfers" option.

We therefore added a new setup_expected_concurrent_transfers() helper
function to our "t/testhelpers.sh" shell library, and revised several
test scripts to invoke this function before executing any tests.
The function sets a global "expectedConcurrentTransfers" variable with
the expected default value of the "lfs.concurrentTransfers" option so
that tests can now refer to this variable instead of using a fixed
hard-coded value as they did before.

However, when making these revisions, we inadvertently overlooked the
tests of the "pure" SSH-only Git LFS transfer protocol in our
"t/t-batch-transfer.sh" script.  At present, these tests also contain
a hard-coded value for the expected maximum number of object transfers
which may be performed concurrently.  The tests pass this value to
the assert_ssh_transfer_sessions() helper function, which is defined
in the same script as the tests.

One reason we did not adjust these tests in PR #6241 is that the
assert_ssh_transfer_sessions() function's fourth parameter, which is
expected to receive a value representing the maximum number of
concurrent transfers, is misnamed as "objs_per_batch".

We chose the parameter name "objs_per_batch" when we first introduced
the function in commit b20b6e9 of
PR #5905.  In the same PR, we then added tests of the SSH-based object
transfer protocol, including the "batch transfers with ssh endpoint
and multiple objects and batches (git-lfs-transfer)" test in which
we explicitly set the "lfs.concurrentTransfers" with a value of two.

As we explained in a prior commit in this PR, that test was also
misnamed, as it effectively forced the Git LFS client to only transfer
two objects at a time, but did not alter the batch size or the
number of batches the queue processed.  Hence we have now renamed
the test to "batch transfers with ssh endpoint and multiple objects
exceeding workers (git-lfs-transfer)", which more accurately reflects
the conditions established by the test.

The assert_ssh_transfer_sessions() function's fourth parameter, which
is currently named "objs_per_batch", is used to calculate the maximum
number of trace log messages the function expects to find which record
the start of an SSH process or its termination.  Since this number is
actually determined by the value of the "lfs.concurrentTransfers"
option, we now rename the parameter to "max_concurrency" to better
reflect its purpose.

As well, in the tests where we call the assert_ssh_transfer_sessions()
function and pass a hard-coded value of eight for the fourth parameter,
we now instead pass the value of the "expectedConcurrentTransfers"
variable.  This will allow the tests to continue to pass regardless of
how we choose to define the "lfs.concurrentTransfers" option's default
value in the future, as the setup_expected_concurrent_transfers()
function should always return the appropriate value.

Note that for the reasons outlined in PR #6258, we plan to temporarily
reverse the principal changes from PR #6241.  However, we also intend
to eventually re-adopt the use of a default concurrency limit that
scales with the number of CPUs, at least for HTTP-based object transfers.
Hence we still retain the setup_expected_concurrent_transfers() function
in PR #6258 and can rely on it being present in the future.
@chrisd8088 chrisd8088 merged commit 1529d04 into git-lfs:main May 21, 2026
10 checks passed
@chrisd8088 chrisd8088 deleted the limit-concurrency-again branch May 21, 2026 23:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

git-config Related to our handling of Git configuration options

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants