Description
Wire signal.NotifyContext for SIGTERM, SIGQUIT, and os.Interrupt in cmd/thv-proxyrunner/main.go and propagate the resulting context through cobra.ExecuteContext so that Kubernetes pod termination triggers graceful connection drain rather than abrupt process exit. Currently, main.go calls app.NewRootCmd().Execute() with no signal-aware context, meaning SIGTERM from the pod lifecycle will terminate the process immediately without giving in-flight MCP proxy connections a chance to drain.
Context
This task implements RC-14 from RFC THV-0047 (Manual Horizontal Scaling for vMCP and Proxy Runner). When the proxyrunner is deployed as multiple StatefulSet replicas, Kubernetes sends SIGTERM to pods during rolling updates, scale-down events, and node drains. Without a signal-aware context, in-flight MCP sessions are dropped at the transport layer without going through the graceful drain path in transportHandler.Stop. The fix is a small, self-contained change to main.go that mirrors the existing pattern in cmd/vmcp/main.go. The operator is responsible for setting an appropriate terminationGracePeriodSeconds; the proxyrunner simply stops accepting new connections and waits for transportHandler.Stop to complete.
Dependencies: None
Blocks: None (TASK-003 is independent of all other tasks in this epic)
Acceptance Criteria
Technical Approach
Recommended Implementation
Replace the current app.NewRootCmd().Execute() call in cmd/thv-proxyrunner/main.go with the signal.NotifyContext + ExecuteContext pattern taken verbatim from cmd/vmcp/main.go. The context flows through the existing chain: ExecuteContext → runCmdFunc (which reads cmd.Context()) → mcpRunner.Run(ctx) → the select { case <-ctx.Done() } block that calls stopMCPServer, which calls transportHandler.Stop. No changes to the app package are required.
Patterns & Frameworks
signal.NotifyContext from os/signal (stdlib) — creates a context that is cancelled when the specified OS signals arrive; this is the idiomatic Go pattern for signal-driven graceful shutdown since Go 1.16
syscall.SIGTERM and syscall.SIGQUIT from syscall (stdlib) — standard Kubernetes termination signals; SIGQUIT additionally triggers a goroutine dump useful for debugging stuck drain scenarios
cobra.Command.ExecuteContext — passes the context to all subcommand RunE functions via cmd.Context(); runCmdFunc already calls cmd.Context() on line 77 of run.go, so no changes to the command handler are needed
- Operator controls drain duration via
terminationGracePeriodSeconds on the proxyrunner Deployment/Pod spec — the proxyrunner itself does not set an internal exit deadline
Code Pointers
cmd/thv-proxyrunner/main.go — the only file that needs to change; replace Execute() with signal.NotifyContext + ExecuteContext; add imports for context, fmt, os/signal, and syscall
cmd/vmcp/main.go — reference implementation to follow exactly; lines 31–38 show the complete pattern
cmd/thv-proxyrunner/app/run.go:77 — ctx := cmd.Context() already reads the context from Cobra; no change needed here
pkg/runner/runner.go:529–531 — select { case <-ctx.Done(): stopMCPServer("Context cancelled") } is the existing drain path that triggers transportHandler.Stop; it fires automatically once the signal-aware context is cancelled
Component Interfaces
The change is entirely in cmd/thv-proxyrunner/main.go. The final main function should match this structure:
func main() {
// Bind TOOLHIVE_DEBUG env var early, before logger initialization.
if err := viper.BindEnv("debug", "TOOLHIVE_DEBUG"); err != nil {
slog.Error("failed to bind TOOLHIVE_DEBUG env var", "error", err)
}
// Initialize the logger
var opts []logging.Option
if viper.GetBool("debug") {
opts = append(opts, logging.WithLevel(slog.LevelDebug))
}
l := logging.New(opts...)
slog.SetDefault(l)
// Create a context that will be canceled on signal
ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM, syscall.SIGQUIT)
defer cancel()
// Execute the root command with context
if err := app.NewRootCmd().ExecuteContext(ctx); err != nil {
slog.Error(fmt.Sprintf("Error executing command: %v", err))
os.Exit(1)
}
}
New imports required: "context", "fmt", "os/signal", "syscall" (in addition to the existing imports already present in the file).
Testing Strategy
Unit Tests
Integration Tests
Edge Cases
Out of Scope
- Operator-side
terminationGracePeriodSeconds configuration — this is an operator concern, not a proxyrunner code change
- vMCP-side graceful shutdown — covered by the vMCP epic
- Any changes to
cmd/thv-proxyrunner/app/run.go, cmd/thv-proxyrunner/app/commands.go, or pkg/runner/runner.go — the context chain and drain logic already exist
- Custom exit-deadline timeout inside the proxyrunner process — the operator controls the drain window via
terminationGracePeriodSeconds
References
Description
Wire
signal.NotifyContextforSIGTERM,SIGQUIT, andos.Interruptincmd/thv-proxyrunner/main.goand propagate the resulting context throughcobra.ExecuteContextso that Kubernetes pod termination triggers graceful connection drain rather than abrupt process exit. Currently,main.gocallsapp.NewRootCmd().Execute()with no signal-aware context, meaning SIGTERM from the pod lifecycle will terminate the process immediately without giving in-flight MCP proxy connections a chance to drain.Context
This task implements RC-14 from RFC THV-0047 (Manual Horizontal Scaling for vMCP and Proxy Runner). When the proxyrunner is deployed as multiple StatefulSet replicas, Kubernetes sends SIGTERM to pods during rolling updates, scale-down events, and node drains. Without a signal-aware context, in-flight MCP sessions are dropped at the transport layer without going through the graceful drain path in
transportHandler.Stop. The fix is a small, self-contained change tomain.gothat mirrors the existing pattern incmd/vmcp/main.go. The operator is responsible for setting an appropriateterminationGracePeriodSeconds; the proxyrunner simply stops accepting new connections and waits fortransportHandler.Stopto complete.Dependencies: None
Blocks: None (TASK-003 is independent of all other tasks in this epic)
Acceptance Criteria
cmd/thv-proxyrunner/main.gocreates asignal.NotifyContextcancellable onos.Interrupt,syscall.SIGTERM, andsyscall.SIGQUITdefer cancel()is called immediately aftersignal.NotifyContextto ensure the OS signal listener is released on exitapp.NewRootCmd().ExecuteContext(ctx)is called instead ofapp.NewRootCmd().Execute(), passing the signal-aware contextrunCmdFuncincmd/thv-proxyrunner/app/run.goalready readscmd.Context()— no changes needed there; the context flows through without modificationtransportHandler.Stopto complete before exiting (exercised by the existing context-cancellation path inrunner.Run)ExecuteContextmatches the vmcp pattern: log the error viaslog.Errorand callos.Exit(1)main.go—signal.NotifyContextis the single mechanismTechnical Approach
Recommended Implementation
Replace the current
app.NewRootCmd().Execute()call incmd/thv-proxyrunner/main.gowith thesignal.NotifyContext+ExecuteContextpattern taken verbatim fromcmd/vmcp/main.go. The context flows through the existing chain:ExecuteContext→runCmdFunc(which readscmd.Context()) →mcpRunner.Run(ctx)→ theselect { case <-ctx.Done() }block that callsstopMCPServer, which callstransportHandler.Stop. No changes to theapppackage are required.Patterns & Frameworks
signal.NotifyContextfromos/signal(stdlib) — creates a context that is cancelled when the specified OS signals arrive; this is the idiomatic Go pattern for signal-driven graceful shutdown since Go 1.16syscall.SIGTERMandsyscall.SIGQUITfromsyscall(stdlib) — standard Kubernetes termination signals; SIGQUIT additionally triggers a goroutine dump useful for debugging stuck drain scenarioscobra.Command.ExecuteContext— passes the context to all subcommandRunEfunctions viacmd.Context();runCmdFuncalready callscmd.Context()on line 77 ofrun.go, so no changes to the command handler are neededterminationGracePeriodSecondson the proxyrunner Deployment/Pod spec — the proxyrunner itself does not set an internal exit deadlineCode Pointers
cmd/thv-proxyrunner/main.go— the only file that needs to change; replaceExecute()withsignal.NotifyContext+ExecuteContext; add imports forcontext,fmt,os/signal, andsyscallcmd/vmcp/main.go— reference implementation to follow exactly; lines 31–38 show the complete patterncmd/thv-proxyrunner/app/run.go:77—ctx := cmd.Context()already reads the context from Cobra; no change needed herepkg/runner/runner.go:529–531—select { case <-ctx.Done(): stopMCPServer("Context cancelled") }is the existing drain path that triggerstransportHandler.Stop; it fires automatically once the signal-aware context is cancelledComponent Interfaces
The change is entirely in
cmd/thv-proxyrunner/main.go. The finalmainfunction should match this structure:New imports required:
"context","fmt","os/signal","syscall"(in addition to the existing imports already present in the file).Testing Strategy
Unit Tests
main.godirectly (it is an entrypoint with no testable logic beyond the signal wiring); the existing runner tests cover the drain pathrunCmdFuncreceives a non-nil context viacmd.Context()— this is already exercised by existing runner unit tests inpkg/runner/runner_test.goIntegration Tests
terminationGracePeriodSecondselapsesos.Interrupt(CTRL-C) to a locally running proxyrunner; verify clean exit (existing behaviour should be preserved)Edge Cases
signal.NotifyContextnatively (the context is already cancelled; no additional action needed)transportHandler.Start:ctx.Done()fires before the transport is running;stopMCPServeris never called because theselectis reached only afterStartreturns — verify the process exits cleanly in this pathOut of Scope
terminationGracePeriodSecondsconfiguration — this is an operator concern, not a proxyrunner code changecmd/thv-proxyrunner/app/run.go,cmd/thv-proxyrunner/app/commands.go, orpkg/runner/runner.go— the context chain and drain logic already existterminationGracePeriodSecondsReferences
cmd/vmcp/main.goinstacklok/toolhive— identical signal-context pattern to applysignal.NotifyContextGo documentation: https://pkg.go.dev/os/signal#NotifyContext