fix(gateway): mark only still-running sessions resume_pending on drain timeout #12332

Merged
teknium1 merged 1 commit into main from hermes/hermes-9ac4ef9e on Apr 19, 2026

Conversation

@teknium1

Contributor

Summary

Follow-up to #12301 — the drain-timeout branch now marks only sessions that are still blocking the shutdown, not every session that was active when the drain started.

The original landing used active_agents.keys() (the drain-start snapshot) when marking resume_pending. That snapshot includes sessions that finished gracefully during the drain window. Marking them would give their next turn a stray "your previous turn was interrupted by a gateway restart" system note even though the prior turn actually completed cleanly.
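The difference between the two sets can be illustrated with plain dictionaries. This is a minimal sketch, not the gateway's code: the session keys, the placeholder agent objects, and the variable names are all hypothetical, and only the snapshot-versus-live-registry distinction comes from the description above.

```python
# Hypothetical stand-ins: the keys and agent objects are illustrative only.
running_agents = {"session-a": object(), "session-b": object()}

# Snapshot taken when the drain starts (plays the role of active_agents).
drain_start_snapshot = dict(running_agents)

# During the drain window, session-a finishes cleanly and leaves the
# live registry. The snapshot is not updated.
del running_agents["session-a"]

# At drain timeout, the two sources disagree about who is still running.
marked_from_snapshot = sorted(drain_start_snapshot.keys())
marked_from_live = sorted(running_agents.keys())

print(marked_from_snapshot)  # includes the session that already finished
print(marked_from_live)      # only the session still blocking shutdown
```

Marking from the snapshot flags session-a even though its turn completed, which is the source of the stray interruption note.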

Changes

  • gateway/run.py: swap active_agents.keys() for filtered self._running_agents.items() iteration in the drain-timeout mark loop. Mirrors _interrupt_running_agents() exactly — same set, same pending-sentinel skip.
  • tests/gateway/test_restart_resume_pending.py: two regression tests.
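The shape of the filtered loop might look roughly like the sketch below. The sentinel object, the function name, and the `mark` callback are assumptions made for illustration; the actual names and the way gateway/run.py records resume_pending are not shown in this PR description.

```python
from typing import Callable, Dict, List

# Hypothetical sentinel: stands in for the placeholder stored in the
# registry while an agent is not constructed yet.
PENDING_SENTINEL = object()


def mark_still_running(
    running_agents: Dict[str, object],
    mark: Callable[[str], None],
) -> List[str]:
    """Call mark() for every session that still has a real agent running."""
    marked = []
    # Iterate over a list copy so the registry may change while we mark.
    for session_key, agent in list(running_agents.items()):
        if agent is PENDING_SENTINEL:
            # No agent exists yet, so there is no turn to interrupt.
            continue
        mark(session_key)
        marked.append(session_key)
    return marked


# Usage: one real agent and one pending sentinel in the registry.
registry = {"still-running": object(), "not-constructed-yet": PENDING_SENTINEL}
flagged: List[str] = []
result = mark_still_running(registry, flagged.append)
print(result)
```

Reading the registry at timeout time means sessions that already finished are simply absent, so no separate "did it finish?" check is needed.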

Validation

| Scenario | Before | After |
| --- | --- | --- |
| Session finishes during drain window | Marked resume_pending; next turn gets a false interruption note | Not marked; normal fresh turn |
| Session still running at drain timeout | Marked | Marked (unchanged) |
| Pending sentinel (agent not constructed yet) in _running_agents | Marked | Skipped; mirrors _interrupt_running_agents behaviour |

Targeted test runs:

  • tests/gateway/test_restart_resume_pending.py test_gateway_shutdown.py test_restart_drain.py test_clean_shutdown_marker.py — 57 passed (31 in resume_pending suite, up from 29 with the two new regression tests).

…n timeout

Follow-up to #12301.

The drain-timeout branch of _stop_impl() was iterating the drain-start
snapshot (active_agents) when marking sessions resume_pending. That
snapshot can include sessions that finished gracefully during the drain
window — marking them would give their next turn a stray
'your previous turn was interrupted by a gateway restart' system note
even though the prior turn actually completed cleanly.

Iterate self._running_agents at timeout time instead, mirroring
_interrupt_running_agents() exactly:
- only sessions still blocking the shutdown get marked
- pending sentinels (AIAgent construction not yet complete) are skipped

Changes:
- gateway/run.py: swap active_agents.keys() for filtered
  self._running_agents.items() iteration in the drain-timeout mark loop.
- tests/gateway/test_restart_resume_pending.py: two regression tests —
  finisher-during-drain not marked, pending sentinel not marked.
@teknium1 teknium1 merged commit c49a58a into main Apr 19, 2026
3 of 5 checks passed
@teknium1 teknium1 deleted the hermes/hermes-9ac4ef9e branch April 19, 2026 00:40
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
aj-nt pushed a commit to aj-nt/hermes-agent that referenced this pull request May 1, 2026
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026