[Bug]: Kanban workers stuck in zombie state after SIGTERM — claim never released, task blocked forever (#28181)

## Summary

When a Kanban worker process receives **SIGTERM** (from a gateway restart, launchd/systemd cgroup cleanup, `enforce_max_runtime`, or `_terminate_reclaimed_worker`), the single-query signal handler (`_signal_handler_q` in `cli.py`) calls `_agent.interrupt()` and raises `KeyboardInterrupt`, but the Python process **does not exit cleanly**. It remains in the process table as a **zombie** (shown as `<defunct>` on macOS).

The dispatcher's `detect_crashed_workers` / `release_stale_claims` checks probe the worker with `os.kill(pid, 0)`, which succeeds (raises no exception) for zombie processes because they still have a process-table entry. The dispatcher therefore treats the worker as alive and **keeps extending the claim** forever. The task remains `running` indefinitely and never gets re-dispatched.
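The signal-0 behaviour described above can be reproduced outside Hermes. The following is a minimal, POSIX-only sketch; `pid_exists_via_kill` and `demo_zombie_probe` are illustrative names, not functions from the Hermes codebase. It forks a child that exits immediately, probes it before the parent reaps it, and probes it again afterwards.

```python
import os
import time


def pid_exists_via_kill(pid: int) -> bool:
    """Signal-0 probe: True if the PID still has a process-table entry."""
    try:
        os.kill(pid, 0)  # sends no signal; only performs the existence check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def demo_zombie_probe() -> tuple[bool, bool]:
    """Return the probe result (before reaping, after reaping) for an exited child."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child exits at once and stays a zombie until the parent waits
    time.sleep(0.2)  # give the child time to exit
    before = pid_exists_via_kill(pid)
    os.waitpid(pid, 0)  # reap the child, which removes its process-table entry
    after = pid_exists_via_kill(pid)
    return before, after
```

The probe reports the exited child as present until the parent calls `waitpid`, which is why a liveness check built only on `os.kill(pid, 0)` cannot tell a zombie from a running worker.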

## Root Cause

In `cli.py` lines 14144–14158, `_signal_handler_q` is registered for `SIGTERM` and `SIGHUP` in single-query mode (`chat -q`). When a Kanban worker receives the signal:

1. `_signal_handler_q` calls `_agent.interrupt(...)` and sleeps for the grace window.
2. The handler raises `KeyboardInterrupt`.
3. The agent loop dies, but the process remains in the process table as a zombie.
4. `_pid_alive()` uses `os.kill(pid, 0)`, which succeeds even for zombies, so the worker is reported as alive.
5. The dispatcher extends the claim forever, leaving the task stuck permanently.
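One way to close the gap in step 4 would be a liveness check that looks at the process state rather than only at PID existence. The sketch below is an assumption about how that could look, not the actual `_pid_alive()` implementation: `process_state` and `pid_alive_and_not_zombie` are hypothetical names. It reads `/proc/<pid>/stat` where that exists (Linux) and falls back to `ps -o stat= -p <pid>` elsewhere (for example macOS).

```python
import os
import subprocess
from typing import Optional


def process_state(pid: int) -> Optional[str]:
    """Return the one-letter process state ('R', 'S', 'Z', ...), or None if the PID is gone."""
    if os.path.exists("/proc/self/stat"):
        # Linux: the state is the first field after the parenthesised command name.
        try:
            with open(f"/proc/{pid}/stat", errors="replace") as fh:
                stat = fh.read()
        except (FileNotFoundError, ProcessLookupError):
            return None
        return stat.rsplit(")", 1)[1].split()[0]
    # macOS/BSD: ask ps for the state column only; output is empty for an unknown PID.
    result = subprocess.run(
        ["ps", "-o", "stat=", "-p", str(pid)],
        capture_output=True, text=True, check=False,
    )
    state = result.stdout.strip()
    return state[:1] or None


def pid_alive_and_not_zombie(pid: int) -> bool:
    """Hypothetical replacement for a signal-0 probe that treats zombies as dead."""
    state = process_state(pid)
    return state is not None and state not in ("Z", "X")
```

If the dispatcher is the direct parent of the worker, calling `os.waitpid(pid, os.WNOHANG)` would be an alternative: it reaps the zombie and reports the exit in one step, without parsing process state.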

## Impact

- Kanban tasks stuck in `running` state forever
- Downstream dependent tasks never execute
- Manual recovery required
- Affects all gateway-managed kanban setups, especially macOS with launchd

## Steps to Reproduce

1. Set up kanban with `dispatch_in_gateway: true`
2. Run `hermes gateway restart` while a kanban worker is running
3. Observe: the worker process becomes `<defunct>`, task stays `running` forever

## Proposed Fix

The signal handler should check for the `HERMES_KANBAN_TASK` environment variable and call `block_task()` to release the claim before the process dies. A fix is being tested locally.
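A rough sketch of that idea follows. It is not the fix under test. Because the real signature of `block_task()` is not shown in this issue, the sketch takes the release function as a parameter; `make_worker_signal_handler` and `release_claim` are hypothetical names, and the `(task_id, reason)` argument order is an assumption.

```python
import os
import signal
from typing import Callable


def make_worker_signal_handler(release_claim: Callable[[str, str], None]):
    """Build a handler that releases the Kanban claim before interrupting the worker.

    `release_claim(task_id, reason)` stands in for `block_task()`; its real
    signature may differ.
    """

    def _handler(signum: int, frame) -> None:
        task_id = os.environ.get("HERMES_KANBAN_TASK")
        if task_id:
            reason = f"Worker interrupted by {signal.Signals(signum).name}"
            try:
                release_claim(task_id, reason)
            except Exception:
                # Best effort: a failed release must not prevent shutdown.
                pass
        raise KeyboardInterrupt

    return _handler


# Possible registration in single-query mode (assumes block_task accepts these arguments):
# signal.signal(signal.SIGTERM, make_worker_signal_handler(block_task))
```

Releasing the claim before raising means the task leaves `running` even if the process later lingers as a zombie, so the dispatcher no longer depends on the PID probe alone.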

## Workaround

```
hermes kanban block <task_id> "Worker was interrupted — manual recovery"
```
