fix: tie agent goroutine to daemon shutdown ctx + add runGit helper#433
Merged
tomasz-tomczyk merged 2 commits intomainfrom May 2, 2026
Merged
fix: tie agent goroutine to daemon shutdown ctx + add runGit helper#433tomasz-tomczyk merged 2 commits intomainfrom
tomasz-tomczyk merged 2 commits intomainfrom
Conversation
Two coupled fixes shipped together. Both center on the daemon's shutdown
behaviour: the first stops orphaning agent subprocesses, the second is the
groundwork that lets future git operations actually be cancellable.
## 1. Daemon shutdown safety for the agent goroutine
`runAgentCmd` was creating its own `context.WithTimeout(context.Background(),
10*time.Minute)` and was launched bare (`go s.runAgentCmd(...)`), unparented
from the daemon's signal-handled context. On SIGINT/SIGTERM/idle-timeout the
agent subprocess (Setsid via daemonSysProcAttr) kept running, burning LLM API
tokens until its 10-minute budget elapsed, and then called AddReply /
RefreshFileList / RefreshDiffs / notify on a session whose watcher and SSE
goroutines were already gone — after WriteFiles had run.
Changes:
- `Server` gains `shutdownCtx` and `bgWG` fields plus `SetShutdownCtx`,
`effectiveCtx`, and `WaitBackground` helpers.
- `runServe` calls `srv.SetShutdownCtx(ctx)` right after wiring the signal
context, so all server-spawned background work parents on the same ctx.
- `handleAgentRequest` wraps the goroutine in `bgWG.Add/Done`.
- `runAgentCmd` derives its 10-min timeout from `effectiveCtx()` instead of
`context.Background()`, and skips the post-run RefreshFileList/RefreshDiffs
fan-out if the shutdown ctx is already done (the reply is still added to the
in-memory session and persisted by the subsequent WriteFiles).
- The shutdown path in `runServe` calls `srv.WaitBackground(30*time.Second)`
between `session.Shutdown()` and `session.WriteFiles()` so in-flight agent
replies land before the file is rewritten. 30s ceiling so a wedged agent
can't hang the daemon indefinitely.
- On `initErr`, `runServe` now calls `stop()` immediately instead of
`<-ctx.Done()`. Without this the daemon would sit in 503-land for up to an
hour after a failed init, holding a port and a process slot.
Tests: added `TestRunAgentCmd_CancelledByShutdownCtx` (verifies shutdown ctx
cancel kills the subprocess promptly) and `TestWaitBackground_TimesOut`
(verifies the wait helper drains cleanly and times out as expected).
## 2. `runGit` helper
`git.go` had ~42 `exec.Command("git", ...)` call sites, each repeating the
`exec.Command` + `cmd.Dir` + `cmd.Output()` + error-wrap pattern, with no
shared place to add cross-cutting concerns (ctx cancellation, env hardening,
stderr capture).
Added `runGit(ctx, dir, args...) ([]byte, error)` that uses
`exec.CommandContext`, sets dir, appends `GIT_TERMINAL_PROMPT=0` to the env
(prevents the daemon from blocking on a credential prompt with no tty), and
captures stderr in the returned error.
Converted the high-traffic call sites: `IsGitRepo`, `RepoRoot`,
`AllTrackedFiles` (used by handleFilesList), `changedFilesStaged`,
`changedFilesUnstaged`, `changedFilesBranch` (used by ChangedFilesScoped via
the session refresh path). Migration is intentionally partial — the helper
docstring notes that other call sites should be converted opportunistically.
The pre-existing test helper named `runGit` (in testutil_test.go) was renamed
to `gitT` — purely mechanical, all _test.go updated in one pass.
Two follow-up corrections to the prior commit, surfaced in pre-ship review: 1. Restore the SetSession docstring that the prior edit accidentally fused into SetShutdownCtx's comment block. 2. Reorder the runServe shutdown sequence so httpServer.Shutdown completes BEFORE WaitBackground starts. handleAgentRequest calls s.bgWG.Add(1) synchronously inside the handler, so a late-arriving request mid-shutdown could call Add concurrently with WaitBackground's Wait(), which sync.WaitGroup explicitly forbids and panics on. Draining handlers via httpServer.Shutdown first guarantees all Add() calls are done before Wait() begins. session.Shutdown() still runs first so the final SSE "server-shutdown" event reaches connected clients before their connections are torn down. This will be squashed into the parent commit on merge.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #433 +/- ##
==========================================
+ Coverage 67.18% 67.27% +0.08%
==========================================
Files 29 29
Lines 9814 9848 +34
==========================================
+ Hits 6594 6625 +31
- Misses 2698 2701 +3
Partials 522 522
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two coupled correctness fixes for the daemon's shutdown path. They ship together because the second is groundwork for cancellable git work and shares one mental model with the first.
1. Daemon shutdown safety for the agent goroutine
runAgentCmdwas creating its owncontext.WithTimeout(context.Background(), 10*time.Minute)and was launched bare (go s.runAgentCmd(...)), unparented from the daemon's signal-handled context. On SIGINT/SIGTERM/idle-timeout the agent subprocess (Setsid viadaemonSysProcAttr) kept running, burning LLM API tokens until its 10-minute budget elapsed, and then calledAddReply/RefreshFileList/RefreshDiffs/notifyon a session whose watcher and SSE goroutines were already gone — afterWriteFileshad run.ServergainsshutdownCtxandbgWGfields plusSetShutdownCtx,effectiveCtx,WaitBackgroundhelpers.runServecallssrv.SetShutdownCtx(ctx)immediately after wiring the signal context.handleAgentRequestwraps its goroutine withbgWG.Add/Done.runAgentCmdderives its 10-min timeout fromeffectiveCtx(); skips the post-run refresh fan-out if the shutdown ctx is already done (the reply is still added in-memory and persisted by the subsequentWriteFiles).session.Shutdown()(final SSE) →httpServer.Shutdown(drains in-flight handlers, gates furtherbgWG.Addcalls) →WaitBackground(30s)(drains spawned agent runners) →WriteFiles.initErr,runServecallsstop()immediately instead of<-ctx.Done(). Without this the daemon would sit in 503-land for up to an hour after a failed init, holding a port and a process slot.2.
runGithelpergit.gohad ~42exec.Command("git", ...)call sites with no shared place to add cross-cutting concerns (ctx cancellation, env hardening, stderr capture).Added
runGit(ctx, dir, args...) ([]byte, error)that usesexec.CommandContext, sets dir, appendsGIT_TERMINAL_PROMPT=0to the env (prevents the daemon from blocking on a credential prompt with no tty), and includes captured stderr in the returned error.Converted the high-traffic call sites:
IsGitRepo,RepoRoot,AllTrackedFiles(used byhandleFilesList),changedFilesStaged,changedFilesUnstaged,changedFilesBranch. Migration is intentionally partial — the helper docstring notes other call sites should be converted opportunistically.The pre-existing test helper named
runGit(intestutil_test.go) was renamed togitTto free the name; purely mechanical, all_test.goupdated in one pass.Review
go build ./...cleango vet ./...cleangofmt -l .cleangolangci-lint run ./...clean (0 issues)go test -race -count=1 ./...passingTest plan
TestRunAgentCmd_CancelledByShutdownCtx— verifies agent subprocess is terminated within 5s of shutdown ctx cancel (regression guard for orphaned subprocess bug).TestWaitBackground_TimesOut— verifies the wait helper drains cleanly when idle and times out as configured when goroutines are wedged.-race.🤖 Generated with Claude Code