Skip to content

refactor(ssh): use agent instead of temporary files for private keys#196

Merged
water-sucks merged 5 commits intonix-community:mainfrom
water-sucks:ssh-refactors
Mar 1, 2026
Merged

refactor(ssh): use agent instead of temporary files for private keys#196
water-sucks merged 5 commits intonix-community:mainfrom
water-sucks:ssh-refactors

Conversation

@water-sucks
Copy link
Copy Markdown
Collaborator

@water-sucks water-sucks commented Feb 26, 2026

Description

After some testing, the temporary file mechanism introduced in #175 is a bit brittle sometimes on non-Linux systems, and has the potential to leave private key material on the system in failure scenarios. Even though the Linux implementation uses memfd to make sure this never happens, I still wanted to see if there was an alternative approach for storing keys. And there is! We can just spin up an agent ourselves or add keys to an existing agent.

This PR creates an SSH AgentManager to do just that: it either uses an existing SSH agent, or spins up its own internal SSH agent to add keys to, and subsequent processes (such as nix-copy-closure can inherit this value through SSH_AUTH_SOCK so that we don't need to pass temporary file paths through NIX_SSHOPTS either.

The agent server can be disabled using a new setting, ssh.enable_agent_server if so desired.

If an existing SSH agent is used, any added keys are removed from the agent at the end of the program.

Also, unrelated fix: it handles errors for knownhosts detection if $HOME is not defined, which should be rare.

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced SSH agent support with automatic management and connection pooling for improved SSH operation reliability.
    • Improved SSH configuration handling with better reconnection and cloning capabilities.
  • Bug Fixes

    • Refined SSH error handling and connection lifecycle management.
  • Chores

    • Removed legacy temporary file implementations in favor of standard system approaches.

@water-sucks
Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 26, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 26, 2026

📝 Walkthrough

Walkthrough

This PR introduces a dedicated SSH agent management module, refactors SSH system architecture to use a configuration-based approach with separate SSHConfig and AgentManager types, integrates this new architecture into the apply command flow, and removes the TempFile abstraction from system utilities.

Changes

Cohort / File(s) Summary
SSH Agent Management
internal/ssh/agent.go
New module introducing AgentManager for thread-safe SSH agent handling with support for connecting to existing agents via SSH_AUTH_SOCK or creating in-process Unix-socket-based agents. Includes key management, lifecycle control, and optional in-memory agent server functionality.
SSH System Refactoring
internal/system/ssh.go
Major architectural shift replacing monolithic SSHSystem with separate SSHConfig and SSHSystem types. Introduces SSHConfigOptions for centralized configuration, refactors authentication/host-key logic into new helpers (knownHostsCallback, dialClient, runPrivateKeyCmd), and adds Reconnect/Clone methods.
Apply Command Integration
cmd/apply/apply.go
Integrates new SSH agent and configuration-based system by creating an AgentManager at startup, building SSHConfig with agent/host information, and deferring cleanup of SSH resources.
System Utilities
internal/system/system.go
Removes accumulation of Nix SSH options from remote-copy branches, simplifying NIX_SSHOPTS construction logic.
Tempfile Removal
internal/system/tempfile_linux.go, internal/system/tempfile_other.go
Deletes platform-specific TempFile implementations including constructors, path accessors, and cleanup methods.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 7.69% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main architectural change: replacing temporary files with SSH agent for private key handling.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/system/ssh.go (1)

256-260: ⚠️ Potential issue | 🟠 Major

Don’t retain an unverified sudo password on auth failure.

The password is cached before verification. If verification fails once, future calls won’t reprompt because s.sudoPassword is already set.

✅ Suggested fix
 	s.sudoPassword = bytePassword

 	if err = s.testRemoteRoot(rootCmd); err != nil {
+		s.sudoPassword = nil
 		return fmt.Errorf("failed to verify %s password: %s", rootCmd, err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/system/ssh.go` around lines 256 - 260, The code currently sets
s.sudoPassword = bytePassword before verification, causing an unverified
password to be cached; move the assignment so s.sudoPassword is only set after
testRemoteRoot succeeds, or if you prefer to keep the assignment where it is,
ensure you clear it on error (e.g., set s.sudoPassword = nil) when
testRemoteRoot(rootCmd) returns an error; update the block around
testRemoteRoot, referencing s.sudoPassword, testRemoteRoot, rootCmd, and
bytePassword accordingly.
🧹 Nitpick comments (1)
internal/system/ssh.go (1)

296-297: Clone should copy password bytes instead of sharing the same backing array.

Sharing the same []byte across clones is fragile for concurrent/session lifecycle handling.

✅ Suggested refactor
-		sudoPassword: s.sudoPassword,
+		sudoPassword: append([]byte(nil), s.sudoPassword...),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/system/ssh.go` around lines 296 - 297, The Clone implementation
currently assigns the sudoPassword slice by reference (sudoPassword:
s.sudoPassword), which shares the backing array across clones; change Clone (or
the method constructing the cloned ssh session) to deep-copy the password bytes:
if s.sudoPassword != nil create a new []byte of the same length, copy the
contents, and assign that new slice to sudoPassword (handle nil appropriately)
so each clone has its own backing array.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/apply/apply.go`:
- Around line 236-241: The deferred call to sshAgent.Stop() may panic when
sshUtils.NewAgentManager(log) returns an error and sshAgent is nil; update the
code to only defer sshAgent.Stop() when sshAgent is non-nil (e.g., check the
result of sshUtils.NewAgentManager and, after ensuring err == nil and sshAgent
!= nil, register defer sshAgent.Stop()), or assign sshAgent to a local variable
and use a nil-check inside the deferred closure before calling Stop(); reference
sshAgent, sshUtils.NewAgentManager, and Stop() when making the change.

In `@internal/ssh/agent.go`:
- Around line 312-319: The code calls os.Stat(tmpdir) and if it returns
os.ErrNotExist it creates the dir with os.MkdirAll(tmpdir, 0o777|os.ModeSticky)
but then still returns the original stat error unconditionally; update the
branch so that when errors.Is(err, os.ErrNotExist) and os.MkdirAll succeeds you
do NOT return the original err (only return on MkdirAll failure or other
non-NotExist stat errors). Locate the tmpdir check around os.Stat and
os.MkdirAll and remove or guard the final return "", err so success paths
continue normally.
- Around line 242-243: The goroutine passed to s.wg.Go (e.g., the callback that
currently defers s.wg.Done() around s.acceptConnections and the handler at the
shown callback at lines 296–299) is calling Done() manually even though
WaitGroup.Go automatically handles Done(); remove the manual defer s.wg.Done()
from any functions or anonymous goroutines supplied to s.wg.Go (leave the
callsites like s.wg.Go(s.acceptConnections) or the anonymous callback intact),
ensuring Done() is only called by WaitGroup.Go to avoid a double Done/underflow
panic.

In `@internal/system/ssh.go`:
- Around line 100-116: The current logic only uses PrivateKeyCmd output when
options.AgentManager is present, causing key-based auth to be dropped if
AgentManager is nil or manager.Add fails; change the flow in the block around
options.PrivateKeyCmd/runPrivateKeyCmd so that after obtaining privateKey (or
its bytes) you attempt to add it to the agent if manager != nil and manager.Add
succeeds, but regardless of agent availability or Add failure, parse the
returned private key into an ssh.Signer (use the same conversion that
runPrivateKeyCmd returns) and append a direct key auth method to auth (e.g.,
ssh.PublicKeys with the signer or equivalent), while keeping the existing
ssh.PublicKeysCallback(manager.Client().Signers) when manager is present and Add
succeeded; reference options.PrivateKeyCmd, runPrivateKeyCmd, manager.Add, auth,
and ssh.PublicKeysCallback to locate where to add the fallback signer append.
- Around line 164-166: The code appends a possibly relative path when
os.UserHomeDir() fails; change the block in internal/system/ssh.go so you check
the error and the returned homeDir before building knownHostsUserFile and
appending to defaultKnownHosts (i.e., only construct filepath.Join(homeDir,
".ssh", "known_hosts") and append it when err == nil and homeDir != "");
optionally fall back to HOME env or log/skipped addition when the home directory
can't be determined to avoid writing a known_hosts file into the current working
directory.

---

Outside diff comments:
In `@internal/system/ssh.go`:
- Around line 256-260: The code currently sets s.sudoPassword = bytePassword
before verification, causing an unverified password to be cached; move the
assignment so s.sudoPassword is only set after testRemoteRoot succeeds, or if
you prefer to keep the assignment where it is, ensure you clear it on error
(e.g., set s.sudoPassword = nil) when testRemoteRoot(rootCmd) returns an error;
update the block around testRemoteRoot, referencing s.sudoPassword,
testRemoteRoot, rootCmd, and bytePassword accordingly.

---

Nitpick comments:
In `@internal/system/ssh.go`:
- Around line 296-297: The Clone implementation currently assigns the
sudoPassword slice by reference (sudoPassword: s.sudoPassword), which shares the
backing array across clones; change Clone (or the method constructing the cloned
ssh session) to deep-copy the password bytes: if s.sudoPassword != nil create a
new []byte of the same length, copy the contents, and assign that new slice to
sudoPassword (handle nil appropriately) so each clone has its own backing array.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1097d85 and d643fef.

📒 Files selected for processing (6)
  • cmd/apply/apply.go
  • internal/ssh/agent.go
  • internal/system/ssh.go
  • internal/system/system.go
  • internal/system/tempfile_linux.go
  • internal/system/tempfile_other.go
💤 Files with no reviewable changes (3)
  • internal/system/tempfile_linux.go
  • internal/system/system.go
  • internal/system/tempfile_other.go

@water-sucks water-sucks force-pushed the ssh-refactors branch 2 times, most recently from 4b70a36 to 460b51e Compare February 26, 2026 02:56
@water-sucks water-sucks requested a review from Sporif February 26, 2026 02:56
@water-sucks
Copy link
Copy Markdown
Collaborator Author

@Sporif since I know you worked on the temp private key functionality, I'd like your input on this in particular.

Copy link
Copy Markdown
Collaborator

@Sporif Sporif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice solution for non-Linux systems. I wish ssh-agent supported abstract unix sockets though, that would make it as good as memfd for Linux. But overall I'm fine with this approach, and it works well in my limited testing so far.

@water-sucks water-sucks merged commit 0d699d6 into nix-community:main Mar 1, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants