
fix(gateway): skip persisting gateway_state=stopped on signal-initiated shutdown (#42675) #42789

Closed
kyssta-exe wants to merge 1 commit into NousResearch:main from kyssta-exe:auto-fix/issue-42675

Conversation

@kyssta-exe

Contributor

Fixes #42675. When Docker sends SIGTERM for container restart/upgrade, the gateway unconditionally persists gateway_state=stopped to gateway_state.json. On next boot, container_boot.py reads this state and refuses to auto-start the gateway, leaving messaging channels silently dead. The fix skips persisting stopped state when the shutdown was signal-initiated (no planned-stop or takeover marker), preserving the running state so container_boot auto-starts on next boot.
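A minimal sketch of the behaviour described above, not the actual code in gateway/run.py. The JSON layout of gateway_state.json, the function names, and the `skip_on_signal` toggle are all assumptions made for illustration; only the file name and the `running`/`stopped` state values come from the description.

```python
import json
import tempfile
from pathlib import Path

STATE_KEY = "gateway_state"  # assumed key name inside gateway_state.json


def write_state(state_file: Path, state: str) -> None:
    """Persist the gateway state (hypothetical file layout)."""
    state_file.write_text(json.dumps({STATE_KEY: state}))


def read_state(state_file: Path) -> str:
    """Read back the persisted state, as a boot script might."""
    return json.loads(state_file.read_text())[STATE_KEY]


def on_shutdown(state_file: Path, *, signal_initiated: bool,
                marker_present: bool, skip_on_signal: bool) -> None:
    """Hypothetical shutdown hook.

    skip_on_signal=False models the reported bug: "stopped" is always written.
    skip_on_signal=True models this PR: an unmarked, signal-initiated shutdown
    leaves the file untouched.
    """
    if skip_on_signal and signal_initiated and not marker_present:
        return
    write_state(state_file, "stopped")


def simulate(**kwargs: bool) -> str:
    """Start from a running gateway, shut down, return the persisted state."""
    with tempfile.TemporaryDirectory() as tmp:
        state_file = Path(tmp) / "gateway_state.json"
        write_state(state_file, "running")
        on_shutdown(state_file, **kwargs)
        return read_state(state_file)
```

With `skip_on_signal=False`, a SIGTERM from a container restart leaves `stopped` on disk. With `skip_on_signal=True`, the same shutdown leaves `running`, while a shutdown that carries a planned-stop or takeover marker still records `stopped`.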

@alt-glitch added the labels type/bug (Something isn't working), P1 (High: major feature broken, no workaround), comp/gateway (Gateway runner, session dispatch, delivery), and area/docker (Docker image, Compose, packaging) on Jun 9, 2026
@alt-glitch

Collaborator

Likely duplicate of #42740 — same root cause and the same fix. Consolidating there; flagging for maintainer review.

@liuhao1024

Contributor

✅ Verified — signal-initiated teardown skips gateway_state persistence

Reviewed the full diff in gateway/run.py.

The fix is correct. No issues found.

@benbarclay

Collaborator

Superseded by #43236 (merged), which fixes #42675 using the existing planned-stop marker primitive instead of inferring intent from the signal.

The concern with skipping the stopped persist on any signal is that it treats every unmarked SIGTERM identically — but an OOM-kill or external kill under systemd Restart=always is also unmarked, and the marker-based classification distinguishes those from an operator stop reliably (see the analysis in #42517). #43236 also had to persist running rather than merely skip the write, because the mid-shutdown draining marker isn't in _AUTOSTART_STATES either — a real container-restart E2E surfaced that. Thanks for surfacing and pushing on this!
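To illustrate the difference between skipping the write and persisting running, here is a rough sketch. It is not the code merged in #43236: `AUTOSTART_STATES` is a stand-in for `_AUTOSTART_STATES` (whose real contents are not shown in this thread), and the `draining` state value and function names are assumptions.

```python
# Stand-in for _AUTOSTART_STATES; the real set's contents are an assumption here.
AUTOSTART_STATES = frozenset({"running"})


def state_after_skip(state_on_disk: str) -> str:
    """Skip-the-write approach: whatever was on disk mid-shutdown stays there."""
    return state_on_disk


def state_after_marker_fix(planned_stop_marker: bool) -> str:
    """Marker-based approach: only a planned stop records "stopped";
    any unmarked shutdown explicitly records "running"."""
    return "stopped" if planned_stop_marker else "running"


def would_autostart(state: str) -> bool:
    """Models the boot-time check: auto-start only from an allowed state."""
    return state in AUTOSTART_STATES
```

If the gateway has already written a draining state by the time shutdown completes, skipping the final write leaves `draining` on disk and the boot check still refuses to start. Writing `running` explicitly avoids that, and an operator stop with the marker present still yields `stopped`.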

@benbarclay benbarclay closed this Jun 10, 2026


Development

Successfully merging this pull request may close these issues.

Gateway does not auto-start after container restart/upgrade — signal-initiated shutdown persists gateway_state=stopped

4 participants