Skip to content

fix: tie agent goroutine to daemon shutdown ctx + add runGit helper#433

Merged
tomasz-tomczyk merged 2 commits intomainfrom
fix/daemon-shutdown-and-rungit
May 2, 2026
Merged

fix: tie agent goroutine to daemon shutdown ctx + add runGit helper#433
tomasz-tomczyk merged 2 commits intomainfrom
fix/daemon-shutdown-and-rungit

Conversation

@tomasz-tomczyk
Copy link
Copy Markdown
Owner

Summary

Two coupled correctness fixes for the daemon's shutdown path. They ship together because the second is groundwork for cancellable git work and shares one mental model with the first.

1. Daemon shutdown safety for the agent goroutine

runAgentCmd was creating its own context.WithTimeout(context.Background(), 10*time.Minute) and was launched bare (go s.runAgentCmd(...)), unparented from the daemon's signal-handled context. On SIGINT/SIGTERM/idle-timeout the agent subprocess (Setsid via daemonSysProcAttr) kept running, burning LLM API tokens until its 10-minute budget elapsed, and then called AddReply / RefreshFileList / RefreshDiffs / notify on a session whose watcher and SSE goroutines were already gone — after WriteFiles had run.

  • Server gains shutdownCtx and bgWG fields plus SetShutdownCtx, effectiveCtx, WaitBackground helpers.
  • runServe calls srv.SetShutdownCtx(ctx) immediately after wiring the signal context.
  • handleAgentRequest wraps its goroutine with bgWG.Add/Done.
  • runAgentCmd derives its 10-min timeout from effectiveCtx(); skips the post-run refresh fan-out if the shutdown ctx is already done (the reply is still added in-memory and persisted by the subsequent WriteFiles).
  • The shutdown path now runs in this order: session.Shutdown() (final SSE) → httpServer.Shutdown (drains in-flight handlers, gates further bgWG.Add calls) → WaitBackground(30s) (drains spawned agent runners) → WriteFiles.
  • On initErr, runServe calls stop() immediately instead of <-ctx.Done(). Without this the daemon would sit in 503-land for up to an hour after a failed init, holding a port and a process slot.

2. runGit helper

git.go had ~42 exec.Command("git", ...) call sites with no shared place to add cross-cutting concerns (ctx cancellation, env hardening, stderr capture).

Added runGit(ctx, dir, args...) ([]byte, error) that uses exec.CommandContext, sets dir, appends GIT_TERMINAL_PROMPT=0 to the env (prevents the daemon from blocking on a credential prompt with no tty), and includes captured stderr in the returned error.

Converted the high-traffic call sites: IsGitRepo, RepoRoot, AllTrackedFiles (used by handleFilesList), changedFilesStaged, changedFilesUnstaged, changedFilesBranch. Migration is intentionally partial — the helper docstring notes other call sites should be converted opportunistically.

The pre-existing test helper named runGit (in testutil_test.go) was renamed to gitT to free the name; purely mechanical, all _test.go updated in one pass.

Review

  • go build ./... clean
  • go vet ./... clean
  • gofmt -l . clean
  • golangci-lint run ./... clean (0 issues)
  • go test -race -count=1 ./... passing

Test plan

  • Added TestRunAgentCmd_CancelledByShutdownCtx — verifies agent subprocess is terminated within 5s of shutdown ctx cancel (regression guard for orphaned subprocess bug).
  • Added TestWaitBackground_TimesOut — verifies the wait helper drains cleanly when idle and times out as configured when goroutines are wedged.
  • Existing agent / handler / git tests continue to pass under -race.

🤖 Generated with Claude Code

Two coupled fixes shipped together. Both center on the daemon's shutdown
behaviour: the first stops orphaning agent subprocesses, the second is the
groundwork that lets future git operations actually be cancellable.

## 1. Daemon shutdown safety for the agent goroutine

`runAgentCmd` was creating its own `context.WithTimeout(context.Background(),
10*time.Minute)` and was launched bare (`go s.runAgentCmd(...)`), unparented
from the daemon's signal-handled context. On SIGINT/SIGTERM/idle-timeout the
agent subprocess (Setsid via daemonSysProcAttr) kept running, burning LLM API
tokens until its 10-minute budget elapsed, and then called AddReply /
RefreshFileList / RefreshDiffs / notify on a session whose watcher and SSE
goroutines were already gone — after WriteFiles had run.

Changes:
- `Server` gains `shutdownCtx` and `bgWG` fields plus `SetShutdownCtx`,
  `effectiveCtx`, and `WaitBackground` helpers.
- `runServe` calls `srv.SetShutdownCtx(ctx)` right after wiring the signal
  context, so all server-spawned background work parents on the same ctx.
- `handleAgentRequest` wraps the goroutine in `bgWG.Add/Done`.
- `runAgentCmd` derives its 10-min timeout from `effectiveCtx()` instead of
  `context.Background()`, and skips the post-run RefreshFileList/RefreshDiffs
  fan-out if the shutdown ctx is already done (the reply is still added to the
  in-memory session and persisted by the subsequent WriteFiles).
- The shutdown path in `runServe` calls `srv.WaitBackground(30*time.Second)`
  between `session.Shutdown()` and `session.WriteFiles()` so in-flight agent
  replies land before the file is rewritten. 30s ceiling so a wedged agent
  can't hang the daemon indefinitely.
- On `initErr`, `runServe` now calls `stop()` immediately instead of
  `<-ctx.Done()`. Without this the daemon would sit in 503-land for up to an
  hour after a failed init, holding a port and a process slot.

Tests: added `TestRunAgentCmd_CancelledByShutdownCtx` (verifies shutdown ctx
cancel kills the subprocess promptly) and `TestWaitBackground_TimesOut`
(verifies the wait helper drains cleanly and times out as expected).

## 2. `runGit` helper

`git.go` had ~42 `exec.Command("git", ...)` call sites, each repeating the
`exec.Command` + `cmd.Dir` + `cmd.Output()` + error-wrap pattern, with no
shared place to add cross-cutting concerns (ctx cancellation, env hardening,
stderr capture).

Added `runGit(ctx, dir, args...) ([]byte, error)` that uses
`exec.CommandContext`, sets dir, appends `GIT_TERMINAL_PROMPT=0` to the env
(prevents the daemon from blocking on a credential prompt with no tty), and
captures stderr in the returned error.

Converted the high-traffic call sites: `IsGitRepo`, `RepoRoot`,
`AllTrackedFiles` (used by handleFilesList), `changedFilesStaged`,
`changedFilesUnstaged`, `changedFilesBranch` (used by ChangedFilesScoped via
the session refresh path). Migration is intentionally partial — the helper
docstring notes that other call sites should be converted opportunistically.

The pre-existing test helper named `runGit` (in testutil_test.go) was renamed
to `gitT` — purely mechanical, all _test.go updated in one pass.
Two follow-up corrections to the prior commit, surfaced in pre-ship review:

1. Restore the SetSession docstring that the prior edit accidentally fused
   into SetShutdownCtx's comment block.

2. Reorder the runServe shutdown sequence so httpServer.Shutdown completes
   BEFORE WaitBackground starts. handleAgentRequest calls s.bgWG.Add(1)
   synchronously inside the handler, so a late-arriving request mid-shutdown
   could call Add concurrently with WaitBackground's Wait(), which
   sync.WaitGroup explicitly forbids and panics on. Draining handlers via
   httpServer.Shutdown first guarantees all Add() calls are done before
   Wait() begins. session.Shutdown() still runs first so the final SSE
   "server-shutdown" event reaches connected clients before their
   connections are torn down.

   This will be squashed into the parent commit on merge.
@codecov
Copy link
Copy Markdown

codecov Bot commented May 2, 2026

Codecov Report

❌ Patch coverage is 85.96491% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.27%. Comparing base (38ee4e0) to head (4b96bd8).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
git.go 86.95% 2 Missing and 1 partial ⚠️
main.go 57.14% 2 Missing and 1 partial ⚠️
server.go 92.59% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #433      +/-   ##
==========================================
+ Coverage   67.18%   67.27%   +0.08%     
==========================================
  Files          29       29              
  Lines        9814     9848      +34     
==========================================
+ Hits         6594     6625      +31     
- Misses       2698     2701       +3     
  Partials      522      522              
Flag Coverage Δ
e2e 33.98% <77.19%> (+0.09%) ⬆️
unit 63.54% <78.94%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@tomasz-tomczyk tomasz-tomczyk merged commit 75db040 into main May 2, 2026
8 checks passed
@tomasz-tomczyk tomasz-tomczyk deleted the fix/daemon-shutdown-and-rungit branch May 2, 2026 12:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant