Gateway does not auto-start after container restart/upgrade — signal-initiated shutdown persists `gateway_state=stopped`

## Summary
On Docker (s6-overlay) deployments, after a container restart or image upgrade (`docker compose up -d --force-recreate`), the gateway does **not** auto-start. Messaging channels (WeChat/Telegram/etc.) go silently dark while the CLI still works. Root cause: every shutdown path — including the SIGTERM a supervisor/container sends on stop/upgrade — unconditionally persists `gateway_state=stopped`, and `container_boot` then treats that as an explicit user stop and refuses to bring the gateway back up.

## Environment
- Hermes Agent 0.16.0, official Docker image (s6-overlay), `hermes dashboard` + gateway in one container, `HERMES_HOME` on a persistent volume.

## Repro
1. Gateway running (`gateway_state=running`).
2. `docker compose up -d --force-recreate` (or any image upgrade / container restart) sends SIGTERM to the gateway process.
3. New container boots; `container_boot` reads `gateway_state=stopped`, registers the s6 service **down** → gateway never starts.
4. Messaging channels stay down; `hermes gateway status` shows "not running". No error is surfaced to the user.

## Root cause
`_stop_impl()` in `gateway/run.py` ends with an **unconditional** status write:
```python
self._update_runtime_status("stopped", self._exit_reason)   # run.py:5955 (0.16.0)
```
This runs for *every* shutdown, including signal-initiated ones (SIGTERM from s6/Docker on restart/upgrade). `hermes_cli/container_boot.py` intentionally preserves an explicit `stopped` across restarts ("explicit stopped/failed states keep winning") to respect a user who ran `hermes gateway stop`. But it cannot distinguish:
- **user-requested stop** (`hermes gateway stop`) — should persist `stopped`; vs
- **signal/container-initiated stop** (SIGTERM on upgrade/restart) — desired state is still "running"; should auto-recover.

Both currently write `stopped`, so a routine upgrade is misread as a deliberate stop and the gateway stays down.
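
To make the ambiguity concrete, here is a toy model of the two code paths. It is not the actual Hermes code: `StopCause`, `persist_on_stop_current`, and `boot_should_start` are hypothetical names, and the boot rule is only my reading of the "explicit stopped/failed states keep winning" comment.

```python
from enum import Enum


class StopCause(Enum):
    USER = "user"      # e.g. `hermes gateway stop`
    SIGNAL = "signal"  # e.g. SIGTERM from s6/Docker on restart/upgrade


def persist_on_stop_current(cause: StopCause) -> str:
    """Model of today's shutdown path: the cause is ignored, "stopped" is always written."""
    return "stopped"


def boot_should_start(gateway_state: str) -> bool:
    """Model of the boot decision: an explicit stopped/failed state keeps the service down."""
    return gateway_state not in ("stopped", "failed")


if __name__ == "__main__":
    for cause in StopCause:
        state = persist_on_stop_current(cause)
        print(f"{cause.value}: persisted={state} start_on_boot={boot_should_start(state)}")
```

Because the persisted value is identical for both causes, no logic in the boot step alone can recover the original intent; the distinction has to be recorded at shutdown time or stored separately.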

## Relationship to #39381
This sits one layer above the s6 `down`-marker issue (#39381): `container_boot` decides whether to lay down the s6 `down` marker based on this `gateway_state`. Fixing the state semantics prevents the upgrade case from ever reaching the down-marker path.

## Possible fixes (seeking direction before a PR)
1. In the shutdown path, only persist `stopped` for **user-requested** stops; for signal-initiated shutdown, leave the desired state untouched (or write a distinct `interrupted`/`crashed` runtime state that `container_boot` treats as "recover").
2. Or have `container_boot` distinguish "process was signaled/interrupted" from "user explicitly stopped".
3. Or track a persistent `desired_state` (running/stopped), set only by explicit start/stop commands, separate from the transient runtime status.

Happy to implement once there's agreement on the preferred shape.
