Consider options for scaling concurrency of all transfer agents

As discussed in PRs #6241, #6234, and #6258 we would like to refine the concurrency controls for the Git LFS client so that they can scale with the number of available CPUs, at least when performing HTTP-based object transfers.

We adopted an initial implementation of in PR #6241, which was effective for HTTP object transfers and improved performance significantly on multi-core CPUs.

However, as described in PR #6258, this change also had the unintended effect of scaling the number of child processes we spawn when handling object transfers using a custom agent or when using multiplexed SSH connections.  While some systems may not be affected, creating large numbers of separate processes is at best not ideal and at worst may run afoul of system or container limits.  We therefore temporarily reversed the key change from PR #6241 in PR #6258.

After our v3.8.0 release, we should consider the best approach for a comprehensive solution that safely scales thread counts while keeping process counts within reasonable bounds.

#### SSH Transfer Agents

When using the "pure" SSH-only Git LFS object transfer protocol, we may run multiple instances of the `ssh(1) command`, unless SSH multiplexing is not enabled.  (Note that we may also use another SSH agent, depending on what has been configured via the `core.sshCommand` option and the `GIT_SSH` or `GIT_SSH_COMMAND` environment variables.)
    
When multiplexing is enabled, we set the `ssh(1)` command's `ControlMaster` option so that all the separate processes we start will share a single underlying SSH connection.  We also only start SSH agent processes on demand in this case.  When transferring many objects, though, we will start multiple processes, up to the limit specified by the `lfs.concurrentTransfers` configuration option.

#### HTTP Transfer Adapter

When using our standard HTTP-based transfer adapter, we may start multiple threads in situations where we do not need them all.
    
For instance, in cases where the `git lfs smudge` command is used instead of the more-efficient `git lfs filter-process` command, each invocation of the `git lfs smudge` command will transfer only a single object, but will nevertheless start the number of threads specified by the `lfs.concurrentTransfers` option.

Typical modern installations of Git LFS do not use the `git lfs smudge` command, but if the `filter.lfs.process` Git configuration option is missing, or if a version of Git older than v2.11.0 is in use, then the `git lfs smudge` command may still be utilized.

Even when the `git lfs filter-process` command is used, when just a small number of objects need to be transferred, we will still start the number of threads specified by the `lfs.concurrentTransfers` option.

#### Custom Transfer Agents

When using our custom transfer protocol we always start multiple separate instances of the transfer agent program, which may either be our `git lfs standalone-file` command or an agent configured with the `lfs.customTransfer.*` options.  In the first case, we only [start](https://github.com/git-lfs/git-lfs/blob/14ae1707938bf0a7b77e5bfe333b196574bbc843/tq/custom.go#L357) a single process, but otherwise we always [start](https://github.com/git-lfs/git-lfs/blob/14ae1707938bf0a7b77e5bfe333b196574bbc843/tq/custom.go#L389) exactly as many processes as is [specified](https://github.com/git-lfs/git-lfs/blob/14ae1707938bf0a7b77e5bfe333b196574bbc843/tq/adapterbase.go#L91-L97) by the `lfs.concurrentTransfers` option, unless the `lfs.concurrentTransfers.<name>.concurrent` option  is set to `false`.

In a related vein, the "Existing Implementation Issues" section from https://github.com/git-lfs/git-lfs/pull/6119#pullrequestreview-3675296168 describes an inconsistency in our handling of concurrency controls for custom transfer agents, and so is worth repeated here again for additional context:

> ... the `lfs.concurrentTransfers` option specifies the number of custom adapter processes that we [start](https://github.com/git-lfs/git-lfs/blob/1f2f7c697b5568b4bf1ed76e7ed4115dd1f0b3ba/tq/custom.go#L144) ([unless](https://github.com/git-lfs/git-lfs/blob/1f2f7c697b5568b4bf1ed76e7ed4115dd1f0b3ba/tq/custom.go#L120) the `lfs.customTransfer.<name>.concurrent` flag is set to `false`, in which case we only start [one](https://github.com/git-lfs/git-lfs/blob/1f2f7c697b5568b4bf1ed76e7ed4115dd1f0b3ba/tq/custom.go#L406) process).  We also [start](https://github.com/git-lfs/git-lfs/blob/1f2f7c697b5568b4bf1ed76e7ed4115dd1f0b3ba/tq/adapterbase.go#L96) one internal goroutine (i.e., thread) to communicate with each process.
>
> This all occurs within the `Begin()` method of the `adapterbase` structure where we [loop](https://github.com/git-lfs/git-lfs/blob/1f2f7c697b5568b4bf1ed76e7ed4115dd1f0b3ba/tq/adapterbase.go#L91-L97) as many times as is specified by the `lfs.concurrentTransfers` option, or the value with which we've overridden it.  This loop calls the `WorkerStarting()` method of the `customAdapter` structure because we've set that structure's `transferImpl` element to [point](https://github.com/git-lfs/git-lfs/blob/1f2f7c697b5568b4bf1ed76e7ed4115dd1f0b3ba/tq/custom.go#L346) back to itself.
>
> The `WorkerStarting()` method, in addition to starting one custom transfer agent process, then [sends](https://github.com/git-lfs/git-lfs/blob/1f2f7c697b5568b4bf1ed76e7ed4115dd1f0b3ba/tq/custom.go#L152-L155) an initialization message to the process.
>
> That all makes sense, but where I think we have an oversight is that the initialization message [includes](https://github.com/git-lfs/git-lfs/blob/1f2f7c697b5568b4bf1ed76e7ed4115dd1f0b3ba/tq/custom.go#L68-L69) `concurrent` and `concurrenttransfers` elements, and we've [documented](https://github.com/git-lfs/git-lfs/blob/1f2f7c697b5568b4bf1ed76e7ed4115dd1f0b3ba/docs/custom-transfers.md?plain=1#L134-L138) that we include the latter specifically in case "the transfer process wants to implement its own concurrency and wants to respect this setting".
>
> But there's no way for the process to "respect" the `concurrenttransfers` setting since we've already started that many separate processes, and each one just gets a copy of the same initialization message.
>
> Arguably, the Git LFS client should either start a single process, pass it the concurrency settings, and let it figure out how many child processes to spawn, or start multiple processes per the concurrency settings but then not pass those settings on because there's no purpose to doing so.
>
> However, we've had this design and implementation for many years, so we can't really reverse course now.  However, perhaps we can find a way to improve the situation as part of the `"version": 2.0` revisions to the custom transfer protocol.

#### Future Considerations

Ideally, we would address all of these issues by refactoring our HTTP and custom transfer adapters to start threads or processes only on demand, as the SSH transfer adapter already does, while also introducing new configuration options to better set limits on the number of transfer agent processes we spawn as distinct from the number of threads we start during HTTP object transfers.

Some work in this direction has already been developed in PR #6234, for HTTP transfers specifically.
    
We should write a proposal to discuss how to implement any new configuration options such that they respect the historical meaning of the `lfs.concurrentTransfers` option.  As one consideration, if a user has explicitly set the `lfs.concurrentTransfers` option but has not set any of the new options, then the `lfs.concurrentTransfers` option's value should be understood as a maximum limit for the number of threads or processes we start during object transfers.  On the other hand, if a user sets one of our new options but not the `lfs.concurrentTransfers` option, then we might allow its value to override the default value of the `lfs.concurrentTransfers` option.

/cc @joshfriend as the author of PRs #6241 and #6234.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider options for scaling concurrency of all transfer agents #6259

SSH Transfer Agents

HTTP Transfer Adapter

Custom Transfer Agents

Future Considerations

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Consider options for scaling concurrency of all transfer agents #6259

Description

SSH Transfer Agents

HTTP Transfer Adapter

Custom Transfer Agents

Future Considerations

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions