fix(gateway): mark only still-running sessions resume_pending on drain timeout by teknium1 · Pull Request #12332 · NousResearch/hermes-agent

teknium1 · 2026-04-19T00:35:12Z

Summary

Follow-up to #12301. The drain-timeout branch now marks only sessions that are still blocking the shutdown, not every session that was active when the drain started.

The original landing used active_agents.keys() (the drain-start snapshot) when marking resume_pending. That snapshot includes sessions that finished gracefully during the drain window. Marking them would give their next turn a stray "your previous turn was interrupted by a gateway restart" system note even though the prior turn actually completed cleanly.
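The stale-snapshot problem can be illustrated with a minimal sketch. The dictionaries and session keys below are hypothetical stand-ins, not the gateway's real data structures; they only show how a copy taken at drain start diverges from the live map by the time the drain times out.

```python
# Hypothetical stand-in for the live map of running agents (session key -> agent).
running_agents = {"session-a": "agent-a", "session-b": "agent-b"}

# Drain starts: snapshot what is active right now.
active_agents = dict(running_agents)

# During the drain window, session-a finishes its turn cleanly and is removed
# from the live map. The snapshot is not updated.
running_agents.pop("session-a")

# Drain times out. Iterating the snapshot would still mark session-a,
# while iterating the live map marks only the session still blocking shutdown.
marked_from_snapshot = sorted(active_agents.keys())
marked_from_live = sorted(running_agents.keys())

print(marked_from_snapshot)
print(marked_from_live)
```

In this sketch, `session-a` appears only in the snapshot-based result, which corresponds to the false interruption note described above.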

Changes

- `gateway/run.py`: swap `active_agents.keys()` for filtered `self._running_agents.items()` iteration in the drain-timeout mark loop. Mirrors `_interrupt_running_agents()` exactly: same set, same pending-sentinel skip.
- `tests/gateway/test_restart_resume_pending.py`: two regression tests.
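A rough sketch of the filtered iteration, under stated assumptions: `PENDING_SENTINEL` and `sessions_to_mark` are hypothetical names introduced for illustration, and the real loop in `gateway/run.py` may be structured differently. The point is only that the loop reads the live map at timeout and skips entries whose agent has not been constructed yet.

```python
from typing import Any, Dict, List

# Hypothetical placeholder stored in the map while an agent is still being built.
PENDING_SENTINEL = object()


def sessions_to_mark(running_agents: Dict[str, Any]) -> List[str]:
    """Return the session keys that would be marked resume_pending at drain timeout."""
    to_mark: List[str] = []
    # Copy the items so the loop stays safe if entries are removed while iterating.
    for session_key, agent in list(running_agents.items()):
        if agent is PENDING_SENTINEL:
            # No agent was constructed, so no turn was interrupted: skip.
            continue
        to_mark.append(session_key)
    return to_mark


live = {"still-running": "agent-object", "not-built-yet": PENDING_SENTINEL}
print(sessions_to_mark(live))
```

A session that finished during the drain window is simply absent from the live map, so it needs no explicit check.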

Validation

| Scenario | Before | After |
| --- | --- | --- |
| Session finishes during drain window | Marked `resume_pending`; next turn gets a false interruption note | Not marked; normal fresh turn |
| Session still running at drain timeout | Marked | Marked (unchanged) |
| Pending sentinel (agent not constructed yet) in `_running_agents` | Marked | Skipped; mirrors `_interrupt_running_agents` behaviour |

Targeted test runs:

tests/gateway/test_restart_resume_pending.py test_gateway_shutdown.py test_restart_drain.py test_clean_shutdown_marker.py — 57 passed (31 in resume_pending suite, up from 29 with the two new regression tests).

…n timeout

Follow-up to #12301. The drain-timeout branch of _stop_impl() was iterating the drain-start snapshot (active_agents) when marking sessions resume_pending. That snapshot can include sessions that finished gracefully during the drain window; marking them would give their next turn a stray 'your previous turn was interrupted by a gateway restart' system note even though the prior turn actually completed cleanly.

Iterate self._running_agents at timeout time instead, mirroring _interrupt_running_agents() exactly:
- only sessions still blocking the shutdown get marked
- pending sentinels (AIAgent construction not yet complete) are skipped

Changes:
- gateway/run.py: swap active_agents.keys() for filtered self._running_agents.items() iteration in the drain-timeout mark loop.
- tests/gateway/test_restart_resume_pending.py: two regression tests: finisher-during-drain not marked, pending sentinel not marked.


teknium1 merged commit c49a58a into main Apr 19, 2026
3 of 5 checks passed

teknium1 deleted the hermes/hermes-9ac4ef9e branch April 19, 2026 00:40
