thv restart skips supervisor restart when PID is transiently zero #4429
Description
Bug description
isSupervisorProcessAlive in pkg/workloads/manager.go only checks the error from GetWorkloadPID, not the PID value. When GetWorkloadPID returns (0, nil) (which happens when ResetWorkloadPID sets process_id to 0 during transport restart), the function returns true, falsely reporting the supervisor as alive. This causes both maybeSetupContainerWorkload and maybeSetupRemoteWorkload to skip starting a new supervisor.
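The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual code in pkg/workloads/manager.go: the `pidReader` type, the `supervisorAliveErrOnly` function, and the `errNoStatus` value are hypothetical stand-ins, and the real signature of GetWorkloadPID may differ (it may take a context, for example).

```go
package main

import (
	"errors"
	"fmt"
)

// pidReader is a hypothetical stand-in for a GetWorkloadPID-style lookup.
type pidReader func(name string) (int, error)

// errNoStatus is a hypothetical error for a missing status entry.
var errNoStatus = errors.New("no status entry")

// supervisorAliveErrOnly mirrors the check described in this issue:
// it inspects only the error and discards the PID value.
func supervisorAliveErrOnly(getPID pidReader, name string) bool {
	if _, err := getPID(name); err != nil {
		return false
	}
	return true
}

func main() {
	// Simulates the window in which the PID has been reset to 0.
	resetWindow := func(string) (int, error) { return 0, nil }
	// Simulates a lookup that fails outright.
	missing := func(string) (int, error) { return 0, errNoStatus }

	fmt.Println(supervisorAliveErrOnly(resetWindow, "server")) // true: PID 0 is reported as alive
	fmt.Println(supervisorAliveErrOnly(missing, "server"))     // false
}
```

The first call returns true even though no process can have PID 0 as a supervisor, which is the false positive that lets the restart path skip starting a new supervisor.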
Steps to reproduce
- Start a remote MCP server with thv run
- Trigger a transport restart (e.g., health check failure causes proxy reconnect)
- During the 5-60s window where ResetWorkloadPID has set process_id to 0, run thv restart <server>
- The restart silently no-ops because isSupervisorProcessAlive returns true
Expected behavior
thv restart should detect that PID 0 is not a valid supervisor process and proceed with the restart.
Actual behavior
thv restart treats PID 0 as a live supervisor and returns without restarting, leaving the workload in a broken state.
Additional context
PR #4401 added pid <= 0 guards to KillProcess and FindProcess in the process package, but isSupervisorProcessAlive was missed because it reads the PID from the status file directly and doesn't call either of those functions.
The fix is a one-line change: capture the PID return value and add || pid <= 0 to the existing error check.
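A rough sketch of the proposed guard, under stated assumptions: `pidReader`, `supervisorAlive`, and `errNoStatus` are hypothetical names used for illustration, and the real isSupervisorProcessAlive may take different parameters or perform further liveness checks after this guard.

```go
package main

import (
	"errors"
	"fmt"
)

// pidReader is a hypothetical stand-in for a GetWorkloadPID-style lookup.
type pidReader func(name string) (int, error)

// errNoStatus is a hypothetical error for a missing status entry.
var errNoStatus = errors.New("no status entry")

// supervisorAlive captures the PID and treats any non-positive value
// as "no supervisor", in addition to the existing error check.
func supervisorAlive(getPID pidReader, name string) bool {
	pid, err := getPID(name)
	if err != nil || pid <= 0 {
		return false
	}
	return true
}

func main() {
	cases := []struct {
		label string
		pid   int
		err   error
	}{
		{"reset window", 0, nil},
		{"negative pid", -1, nil},
		{"lookup error", 0, errNoStatus},
		{"valid pid", 4242, nil},
	}
	for _, c := range cases {
		c := c // capture per-iteration copy for the closure
		alive := supervisorAlive(func(string) (int, error) { return c.pid, c.err }, "server")
		fmt.Printf("%s: alive=%v\n", c.label, alive)
	}
}
```

With this guard, only the case with a positive PID and no error is reported as alive, so a restart issued during the reset window would proceed to start a new supervisor.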