Merged
Conversation
chrisd8088
approved these changes
Jan 29, 2024
Member
chrisd8088
left a comment
There was a problem hiding this comment.
Looks good, thank you! I had one suggestion only for an update to a comment.
Right now, we spawn potentially several connections without necessarily needing to. For example, if we're transferring two objects, we can practically use at most three connections: one for the batch and one for each object. To make this more efficient and avoid needless overhead, let's not actually create the connection until we attempt to acquire it. At that point, if it doesn't exist, we'll spawn a new one (using the control path socket if possible) and then start sending data. Otherwise, unless it's the initial connection, let's just stub it out until we actually need it. Note that we still create the connection before it's needed because we create all workers up front, but that will change in a future commit now that we have this change in place. Co-authored-by: Chris Darroch <chris8088@github.com>
Right now, we spawn potentially several connections without necessarily needing to. For example, if we're transferring two objects, we can practically use at most three connections: one for the batch and one for each object. Instead of spawning all the workers up front, let's change how we do things and simply pass in the worker number at first. Then, acquire the connection when we're uploading or downloading so that the connection is only spawned on demand. This means that we spawn fewer workers in case there are few objects.
chrisd8088
added a commit
to chrisd8088/git-lfs
that referenced
this pull request
Oct 22, 2024
In commit 44b8801 of PR git-lfs#5634 we revised the Connection() method of the SSHTransfer structure in our "ssh" package to create connections on demand, rather than simply return pre-existing connections created by the setConnectionCount() method, which allowed us to reduce the total number of SSH connections in many cases. As part of this change, we altered the Connection() method to return an error if it could not create a connection successfully, and we then updated the callers of this method to check for a non-nil error. However, the method still returned nil if the requested connection number exceeded the maximum number of connections permitted, and none of the callers check for that condition. In particular, the batchInternal() method of the SSHBatchClient structure in the "tq" package originally checked for a nil return value from the Connection() method, as this was the only way the Connection() method indicated a failure to find an active SSH connection. With the change in commit 44b8801, the batchInternal() method now checks only for a non-nil error return value, which means it does not detect the case where a nil PktlineConnection value is returned. As a result, in situations such as those described in git-lfs#5880, the batchInternal() method may cause a Go panic when it attempts to call the Lock() method of a nil PktlineConnection value. One way this can occur is after one batch transfer operation has finished and the Wait() method of the TransferQueue structure has called the Shutdown() method of the sshTransfer field of the TransferQueue's concreteManifest. That closes all the SSH connections and sets the "conn" array to nil. But a reference to the same SSHTransfer structure is retained by the sshTransfer field of the SSHBatchClient structure to which the concreteManifest's batchClientAdapter field points. When that manifest is reused by a subsequent batch transfer operation, the batchInternal() method of the SSHBatchClient structure attempts to retrieve a connection, receives a nil PktlineConnection value, and causes a panic by trying to use it to run the Lock() method. We therefore adjust the Connection() method of the SSHTransfer structure to return a non-nil error when the requested connection number exceeds the maximum number of permitted SSH connections. This also covers the case where the "conn" array has been set to nil, as the len() built-in function returns zero for the length of a nil array.
chrisd8088
added a commit
to chrisd8088/git-lfs
that referenced
this pull request
Oct 23, 2024
In commit 44b8801 of PR git-lfs#5634 we revised the Connection() method of the SSHTransfer structure in our "ssh" package to create connections on demand, rather than simply return pre-existing connections created by the setConnectionCount() method, which allowed us to reduce the total number of SSH connections in many cases. As part of this change, we altered the Connection() method to return an error if it could not create a connection successfully, and we then updated the callers of this method to check for a non-nil error. However, the method still returned nil if the requested connection number exceeded the maximum number of connections permitted, and none of the callers check for that condition. In particular, the batchInternal() method of the SSHBatchClient structure in the "tq" package originally checked for a nil return value from the Connection() method, as this was the only way the Connection() method indicated a failure to find an active SSH connection. With the change in commit 44b8801, the batchInternal() method now checks only for a non-nil error return value, which means it does not detect the case where a nil PktlineConnection value is returned. As a result, in situations such as those described in git-lfs#5880, the batchInternal() method may cause a Go panic when it attempts to call the Lock() method of a nil PktlineConnection value. One way this can occur is after one batch transfer operation has finished and the Wait() method of the TransferQueue structure has called the Shutdown() method of the sshTransfer field of the TransferQueue's concreteManifest. That closes all the SSH connections and sets the "conn" array to nil. But a reference to the same SSHTransfer structure is retained by the sshTransfer field of the SSHBatchClient structure to which the concreteManifest's batchClientAdapter field points. When that manifest is reused by a subsequent batch transfer operation, the batchInternal() method of the SSHBatchClient structure attempts to retrieve a connection, receives a nil PktlineConnection value, and causes a panic by trying to use it to run the Lock() method. We therefore adjust the Connection() method of the SSHTransfer structure to return a non-nil error when the requested connection number exceeds the maximum number of permitted SSH connections. This also covers the case where the "conn" array has been set to nil, as the len() built-in function returns zero for the length of a nil array.
chrisd8088
added a commit
to chrisd8088/git-lfs
that referenced
this pull request
Oct 29, 2024
In commit 44b8801 of PR git-lfs#5634 we revised the Connection() method of the SSHTransfer structure in our "ssh" package to create connections on demand, rather than simply return pre-existing connections created by the setConnectionCount() method, which allowed us to reduce the total number of SSH connections in many cases. As part of this change, we altered the Connection() method to return an error if it could not create a connection successfully, and we then updated the callers of this method to check for a non-nil error. However, the method still returned nil if the requested connection number exceeded the maximum number of connections permitted, and none of the callers check for that condition. In particular, the batchInternal() method of the SSHBatchClient structure in the "tq" package originally checked for a nil return value from the Connection() method, as this was the only way the Connection() method indicated a failure to find an active SSH connection. With the change in commit 44b8801, the batchInternal() method now checks only for a non-nil error return value, which means it does not detect the case where a nil PktlineConnection value is returned. As a result, in situations such as those described in git-lfs#5880, the batchInternal() method may cause a Go panic when it attempts to call the Lock() method of a nil PktlineConnection value. One way this can occur is after one batch transfer operation has finished and the Wait() method of the TransferQueue structure has called the Shutdown() method of the sshTransfer field of the TransferQueue's concreteManifest. That closes all the SSH connections and sets the "conn" array to nil. But a reference to the same SSHTransfer structure is retained by the sshTransfer field of the SSHBatchClient structure to which the concreteManifest's batchClientAdapter field points. When that manifest is reused by a subsequent batch transfer operation, the batchInternal() method of the SSHBatchClient structure attempts to retrieve a connection, receives a nil PktlineConnection value, and causes a panic by trying to use it to run the Lock() method. We therefore adjust the Connection() method of the SSHTransfer structure to return a non-nil error when the requested connection number exceeds the maximum number of permitted SSH connections. This also covers the case where the "conn" array has been set to nil, as the len() built-in function returns zero for the length of a nil array.
chrisd8088
added a commit
to chrisd8088/git-lfs
that referenced
this pull request
Nov 1, 2024
In commit 44b8801 of PR git-lfs#5634 we revised the Connection() method of the SSHTransfer structure in our "ssh" package to create connections on demand, rather than simply return pre-existing connections created by the setConnectionCount() method, which allowed us to reduce the total number of SSH connections in many cases. As part of this change, we altered the Connection() method to return an error if it could not create a connection successfully, and we then updated the callers of this method to check for a non-nil error. However, the method still returned nil if the requested connection number exceeded the maximum number of connections permitted, and none of the callers check for that condition. In particular, the batchInternal() method of the SSHBatchClient structure in the "tq" package originally checked for a nil return value from the Connection() method, as this was the only way the Connection() method indicated a failure to find an active SSH connection. With the change in commit 44b8801, the batchInternal() method now checks only for a non-nil error return value, which means it does not detect the case where a nil PktlineConnection value is returned. As a result, in situations such as those described in git-lfs#5880, the batchInternal() method may cause a Go panic when it attempts to call the Lock() method of a nil PktlineConnection value. One way this can occur is after one batch transfer operation has finished and the Wait() method of the TransferQueue structure has called the Shutdown() method of the sshTransfer field of the TransferQueue's concreteManifest. That closes all the SSH connections and sets the "conn" array to nil. But a reference to the same SSHTransfer structure is retained by the sshTransfer field of the SSHBatchClient structure to which the concreteManifest's batchClientAdapter field points. When that manifest is reused by a subsequent batch transfer operation, the batchInternal() method of the SSHBatchClient structure attempts to retrieve a connection, receives a nil PktlineConnection value, and causes a panic by trying to use it to run the Lock() method. We therefore adjust the Connection() method of the SSHTransfer structure to return a non-nil error when the requested connection number exceeds the maximum number of permitted SSH connections. This also covers the case where the "conn" array has been set to nil, as the len() built-in function returns zero for the length of a nil array.
chrisd8088
added a commit
to chrisd8088/git-lfs
that referenced
this pull request
Nov 3, 2024
In commit 44b8801 of PR git-lfs#5634 we revised the Connection() method of the SSHTransfer structure in our "ssh" package to create connections on demand, rather than simply return pre-existing connections created by the setConnectionCount() method, which allowed us to reduce the total number of SSH connections in many cases. As part of this change, we altered the Connection() method to return an error if it could not create a connection successfully, and we then updated the callers of this method to check for a non-nil error. However, the method still returned nil if the requested connection number exceeded the maximum number of connections permitted, and none of the callers check for that condition. In particular, the batchInternal() method of the SSHBatchClient structure in the "tq" package originally checked for a nil return value from the Connection() method, as this was the only way the Connection() method indicated a failure to find an active SSH connection. With the change in commit 44b8801, the batchInternal() method now checks only for a non-nil error return value, which means it does not detect the case where a nil PktlineConnection value is returned. As a result, in situations such as those described in git-lfs#5880, the batchInternal() method may cause a Go panic when it attempts to call the Lock() method of a nil PktlineConnection value. One way this can occur is after one batch transfer operation has finished and the Wait() method of the TransferQueue structure has called the Shutdown() method of the sshTransfer field of the TransferQueue's concreteManifest. That closes all the SSH connections and sets the "conn" array to nil. But a reference to the same SSHTransfer structure is retained by the sshTransfer field of the SSHBatchClient structure to which the concreteManifest's batchClientAdapter field points. When that manifest is reused by a subsequent batch transfer operation, the batchInternal() method of the SSHBatchClient structure attempts to retrieve a connection, receives a nil PktlineConnection value, and causes a panic by trying to use it to run the Lock() method. We therefore adjust the Connection() method of the SSHTransfer structure to return a non-nil error when the requested connection number exceeds the maximum number of permitted SSH connections. This also covers the case where the "conn" array has been set to nil, as the len() built-in function returns zero for the length of a nil array.
chrisd8088
added a commit
to chrisd8088/git-lfs
that referenced
this pull request
Nov 4, 2024
In commit 44b8801 of PR git-lfs#5634 we revised the Connection() method of the SSHTransfer structure in our "ssh" package to create SSH sessions on demand, rather than simply return pre-existing sessions created by the setConnectionCount() method, which allowed us to reduce the total number of SSH sessions in many cases. As part of this change, we altered the Connection() method to return an error if it could not create a session successfully, and we then updated the callers of this method to check for a non-nil error. However, the method still returned a nil error value, along with a nil PktlineConnection value, if the requested session ID exceeded the maximum number of sessions permitted and so no additional session was created or returned. However, the method's callers all now assume a nil error value implies a non-nil PktlineConnection value. In particular, the batchInternal() method of the SSHBatchClient structure in the "tq" package originally checked for a nil return value from the Connection() method, as this was the only way the Connection() method indicated a failure to find or create an active SSH session. Since the change in commit 44b8801, the batchInternal() method now checks only for a non-nil error return value, which means it does not detect the case where a nil PktlineConnection value is returned. As a result, in situations such as those described in git-lfs#5880, the batchInternal() method may cause a Go panic when it attempts to call the Lock() method of a nil PktlineConnection value. One way this can occur is after one batch transfer operation has finished and the Wait() method of the TransferQueue structure has called the Shutdown() method of the sshTransfer field of the TransferQueue's concreteManifest. That closes all the SSH sessions and sets the "conn" array to nil. But a reference to the same SSHTransfer structure is retained by the sshTransfer field of the SSHBatchClient structure to which the concreteManifest's batchClientAdapter field points. When that manifest is reused by a subsequent batch transfer operation, the batchInternal() method of the SSHBatchClient structure attempts to retrieve a session, receives a nil PktlineConnection value, and causes a panic by trying to use it to run the Lock() method. We therefore adjust the Connection() method of the SSHTransfer structure to return a non-nil error when the requested session ID exceeds the maximum number of sessions allowed. This also covers the case where the "conn" array has been set to nil, as the len() built-in function returns zero for the length of a nil array.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Right now, we spawn potentially several connections without necessarily needing to. For example, if we're transferring two objects, we can practically use at most three connections: one for the batch and one for each object.
Instead of spawning all the workers up front, let's change how we do things and simply pass in the worker number at first. Then, acquire the connection when we're uploading or downloading so that the connection is only spawned on demand. This means that we spawn fewer workers in case there are few objects.
/cc #5622