Bug description
isSupervisorProcessAlive in pkg/workloads/manager.go only checks the error from GetWorkloadPID, not the PID value. When GetWorkloadPID returns (0, nil) (which happens when ResetWorkloadPID sets process_id to 0 during transport restart), the function returns true, falsely reporting the supervisor as alive. This causes both maybeSetupContainerWorkload and maybeSetupRemoteWorkload to skip starting a new supervisor.
Steps to reproduce
- Start a remote MCP server with thv run
- Trigger a transport restart (e.g., health check failure causes proxy reconnect)
- During the 5-60s window where ResetWorkloadPID has set process_id to 0, run thv restart <server>
- The restart silently no-ops because isSupervisorProcessAlive returns true
Expected behavior
thv restart should detect that PID 0 is not a valid supervisor process and proceed with the restart.
Actual behavior
thv restart treats PID 0 as a live supervisor and returns without restarting, leaving the workload in a broken state.
Additional context
PR #4401 added pid <= 0 guards to KillProcess and FindProcess in the process package, but isSupervisorProcessAlive was missed because it reads the PID from the status file directly and doesn't call either of those functions.
The fix is a one-line change: capture the PID return value and add || pid <= 0 to the existing error check.
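A minimal sketch of the proposed guard, for illustration only. It does not reproduce the actual code in pkg/workloads/manager.go: the pidGetter type, the stub lookups, and both function names below are hypothetical stand-ins for GetWorkloadPID and isSupervisorProcessAlive, and the real signatures may differ (for example, they likely take a context).

```go
package main

import "errors"

// pidGetter is a hypothetical stand-in for the status-file lookup
// (GetWorkloadPID in the issue); the real signature may differ.
type pidGetter func(name string) (int, error)

// errNoStatus is an illustrative error for a workload with no status entry.
var errNoStatus = errors.New("no status entry for workload")

// Stub lookups modelling the three states discussed in the issue.
func resetPID(string) (int, error)   { return 0, nil }           // after ResetWorkloadPID
func missingPID(string) (int, error) { return 0, errNoStatus }   // lookup fails
func livePID(string) (int, error)    { return 4242, nil }        // a recorded supervisor PID

// isSupervisorAliveBuggy mirrors the reported behavior: only the error
// is inspected, so (0, nil) is treated as a live supervisor.
func isSupervisorAliveBuggy(getPID pidGetter, name string) bool {
	_, err := getPID(name)
	return err == nil
}

// isSupervisorAliveFixed captures the PID and also rejects pid <= 0,
// which is the one-line change the issue proposes.
func isSupervisorAliveFixed(getPID pidGetter, name string) bool {
	pid, err := getPID(name)
	if err != nil || pid <= 0 {
		return false
	}
	return true
}
```

With the reset stub, the buggy variant reports the supervisor as alive while the fixed variant does not; both variants agree on the error case and on a positive PID, so the guard only changes behavior for the PID 0 window described above.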