refactor(ssh): use agent instead of temporary files for private keys#196
refactor(ssh): use agent instead of temporary files for private keys#196water-sucks merged 5 commits intonix-community:mainfrom
Conversation
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
📝 WalkthroughWalkthroughThis PR introduces a dedicated SSH agent management module, refactors SSH system architecture to use a configuration-based approach with separate SSHConfig and AgentManager types, integrates this new architecture into the apply command flow, and removes the TempFile abstraction from system utilities. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/system/ssh.go (1)
256-260:⚠️ Potential issue | 🟠 MajorDon’t retain an unverified sudo password on auth failure.
The password is cached before verification. If verification fails once, future calls won’t reprompt because
s.sudoPasswordis already set.✅ Suggested fix
s.sudoPassword = bytePassword if err = s.testRemoteRoot(rootCmd); err != nil { + s.sudoPassword = nil return fmt.Errorf("failed to verify %s password: %s", rootCmd, err) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/system/ssh.go` around lines 256 - 260, The code currently sets s.sudoPassword = bytePassword before verification, causing an unverified password to be cached; move the assignment so s.sudoPassword is only set after testRemoteRoot succeeds, or if you prefer to keep the assignment where it is, ensure you clear it on error (e.g., set s.sudoPassword = nil) when testRemoteRoot(rootCmd) returns an error; update the block around testRemoteRoot, referencing s.sudoPassword, testRemoteRoot, rootCmd, and bytePassword accordingly.
🧹 Nitpick comments (1)
internal/system/ssh.go (1)
296-297: Clone should copy password bytes instead of sharing the same backing array.Sharing the same
[]byteacross clones is fragile for concurrent/session lifecycle handling.✅ Suggested refactor
- sudoPassword: s.sudoPassword, + sudoPassword: append([]byte(nil), s.sudoPassword...),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/system/ssh.go` around lines 296 - 297, The Clone implementation currently assigns the sudoPassword slice by reference (sudoPassword: s.sudoPassword), which shares the backing array across clones; change Clone (or the method constructing the cloned ssh session) to deep-copy the password bytes: if s.sudoPassword != nil create a new []byte of the same length, copy the contents, and assign that new slice to sudoPassword (handle nil appropriately) so each clone has its own backing array.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/apply/apply.go`:
- Around line 236-241: The deferred call to sshAgent.Stop() may panic when
sshUtils.NewAgentManager(log) returns an error and sshAgent is nil; update the
code to only defer sshAgent.Stop() when sshAgent is non-nil (e.g., check the
result of sshUtils.NewAgentManager and, after ensuring err == nil and sshAgent
!= nil, register defer sshAgent.Stop()), or assign sshAgent to a local variable
and use a nil-check inside the deferred closure before calling Stop(); reference
sshAgent, sshUtils.NewAgentManager, and Stop() when making the change.
In `@internal/ssh/agent.go`:
- Around line 312-319: The code calls os.Stat(tmpdir) and if it returns
os.ErrNotExist it creates the dir with os.MkdirAll(tmpdir, 0o777|os.ModeSticky)
but then still returns the original stat error unconditionally; update the
branch so that when errors.Is(err, os.ErrNotExist) and os.MkdirAll succeeds you
do NOT return the original err (only return on MkdirAll failure or other
non-NotExist stat errors). Locate the tmpdir check around os.Stat and
os.MkdirAll and remove or guard the final return "", err so success paths
continue normally.
- Around line 242-243: The goroutine passed to s.wg.Go (e.g., the callback that
currently defers s.wg.Done() around s.acceptConnections and the handler at the
shown callback at lines 296–299) is calling Done() manually even though
WaitGroup.Go automatically handles Done(); remove the manual defer s.wg.Done()
from any functions or anonymous goroutines supplied to s.wg.Go (leave the
callsites like s.wg.Go(s.acceptConnections) or the anonymous callback intact),
ensuring Done() is only called by WaitGroup.Go to avoid a double Done/underflow
panic.
In `@internal/system/ssh.go`:
- Around line 100-116: The current logic only uses PrivateKeyCmd output when
options.AgentManager is present, causing key-based auth to be dropped if
AgentManager is nil or manager.Add fails; change the flow in the block around
options.PrivateKeyCmd/runPrivateKeyCmd so that after obtaining privateKey (or
its bytes) you attempt to add it to the agent if manager != nil and manager.Add
succeeds, but regardless of agent availability or Add failure, parse the
returned private key into an ssh.Signer (use the same conversion that
runPrivateKeyCmd returns) and append a direct key auth method to auth (e.g.,
ssh.PublicKeys with the signer or equivalent), while keeping the existing
ssh.PublicKeysCallback(manager.Client().Signers) when manager is present and Add
succeeded; reference options.PrivateKeyCmd, runPrivateKeyCmd, manager.Add, auth,
and ssh.PublicKeysCallback to locate where to add the fallback signer append.
- Around line 164-166: The code appends a possibly relative path when
os.UserHomeDir() fails; change the block in internal/system/ssh.go so you check
the error and the returned homeDir before building knownHostsUserFile and
appending to defaultKnownHosts (i.e., only construct filepath.Join(homeDir,
".ssh", "known_hosts") and append it when err == nil and homeDir != "");
optionally fall back to HOME env or log/skipped addition when the home directory
can't be determined to avoid writing a known_hosts file into the current working
directory.
---
Outside diff comments:
In `@internal/system/ssh.go`:
- Around line 256-260: The code currently sets s.sudoPassword = bytePassword
before verification, causing an unverified password to be cached; move the
assignment so s.sudoPassword is only set after testRemoteRoot succeeds, or if
you prefer to keep the assignment where it is, ensure you clear it on error
(e.g., set s.sudoPassword = nil) when testRemoteRoot(rootCmd) returns an error;
update the block around testRemoteRoot, referencing s.sudoPassword,
testRemoteRoot, rootCmd, and bytePassword accordingly.
---
Nitpick comments:
In `@internal/system/ssh.go`:
- Around line 296-297: The Clone implementation currently assigns the
sudoPassword slice by reference (sudoPassword: s.sudoPassword), which shares the
backing array across clones; change Clone (or the method constructing the cloned
ssh session) to deep-copy the password bytes: if s.sudoPassword != nil create a
new []byte of the same length, copy the contents, and assign that new slice to
sudoPassword (handle nil appropriately) so each clone has its own backing array.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
cmd/apply/apply.gointernal/ssh/agent.gointernal/system/ssh.gointernal/system/system.gointernal/system/tempfile_linux.gointernal/system/tempfile_other.go
💤 Files with no reviewable changes (3)
- internal/system/tempfile_linux.go
- internal/system/system.go
- internal/system/tempfile_other.go
4b70a36 to
460b51e
Compare
|
@Sporif since I know you worked on the temp private key functionality, I'd like your input on this in particular. |
Sporif
left a comment
There was a problem hiding this comment.
This is a nice solution for non-Linux systems. I wish ssh-agent supported abstract unix sockets though, that would make it as good as memfd for Linux. But overall I'm fine with this approach, and it works well in my limited testing so far.
This spins up an internal SSH agent server fallback for when an SSH agent is not running. All private key material that is accessed using this command must never be persisted to disk, and should instead use this SSH agent.
460b51e to
f073b9e
Compare
Description
After some testing, the temporary file mechanism introduced in #175 is a bit brittle sometimes on non-Linux systems, and has the potential to leave private key material on the system in failure scenarios. Even though the Linux implementation uses
memfdto make sure this never happens, I still wanted to see if there was an alternative approach for storing keys. And there is! We can just spin up an agent ourselves or add keys to an existing agent.This PR creates an SSH
AgentManagerto do just that: it either uses an existing SSH agent, or spins up its own internal SSH agent to add keys to, and subsequent processes (such asnix-copy-closurecan inherit this value throughSSH_AUTH_SOCKso that we don't need to pass temporary file paths throughNIX_SSHOPTSeither.The agent server can be disabled using a new setting,
ssh.enable_agent_serverif so desired.If an existing SSH agent is used, any added keys are removed from the agent at the end of the program.
Also, unrelated fix: it handles errors for
knownhostsdetection if$HOMEis not defined, which should be rare.Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Chores