fix(gateway): drain in-flight work before restart#7290
Conversation
kshitijk4poor
left a comment
There was a problem hiding this comment.
PR Review — #7290: fix(gateway): drain in-flight work before restart
Verdict: Comment (observations and suggestions, nothing blocking)
Tests: 4123 passed, 11 failed (all 11 pre-existing on upstream/main). 0 new failures.
Live test: N/A (gateway runtime — tested via unit tests)
Warnings
-
gateway/run.py:
_drain_active_agentsstatus thrashing —_update_runtime_status("draining")is called 4 times in this method, including inside the 0.1s poll loop. Each call does a full JSON read-modify-write cycle to disk viawrite_runtime_status(). Over a 60s drain timeout, that's ~600 file read + 600 file write operations. Consider throttling to once per second or only whenactive_agentscount changes. -
Duplicated config parser:
_load_restart_drain_timeout()vs_get_restart_drain_timeout()—gateway/run.pyandhermes_cli/gateway.pyboth parseagent.restart_drain_timeoutfrom env/YAML independently. Therun.pyversion hardcodes60.0as default in two places, while thegateway.pyversion correctly referencesDEFAULT_CONFIG["agent"]["restart_drain_timeout"]. If the default changes inconfig.py, therun.pyversion silently drifts. -
Photo-merge logic copied a third time —
_queue_or_replace_pending_event()replicates the photo-burst media merging pattern frombase.py(and already duplicated atrun.py:2311). This is now in 3 places. Consider extracting a shared helper. -
Exit code 75 is an undocumented magic number — Used in
_stop_impland matched byRestartForceExitStatus=75in the systemd unit. The convention is sound (EX_TEMPFAIL from sysexits.h), but the number appears in 3 locations with no named constant or comment explaining why 75 was chosen.
Suggestions
-
Shared
_make_runner()test fixture —test_gateway_shutdown.pyandtest_restart_drain.pyeach have their own_make_runner()that constructs aGatewayRunnerviaobject.__new__()with ~18 manually-set attributes.test_session_boundary_hooks.pyandtest_session_race_guard.pyalso needed to add the 9 new attributes inline. A shared fixture intests/gateway/conftest.pywould prevent this N-file maintenance burden when new attributes are added. -
getattr(self, "_session_model_overrides", {})band-aid — The_session_model_overrideschange in_apply_session_model_overridesis a defensive workaround for theobject.__new__()test pattern. The proper fix is adding_session_model_overrides: Dict[str, Dict[str, str]] = {}to the class-level defaults alongside the other 9 attributes added in this PR. This would be consistent with the stated purpose of those class-level defaults. -
Test coverage gaps for new methods — The following are untested:
_launch_detached_restart_command()(complex subprocess orchestration),_load_busy_input_mode()and_load_restart_drain_timeout()(config loading with env/YAML/default fallback), and theexit_code = 75path throughstop(restart=True, service_restart=True). The exit code path is particularly important since it's the mechanism that makes systemd auto-restart work. -
request_restart()only tested via mock — Its actual implementation (idempotency guard via_restart_task_started, background task creation) is never exercised. A test calling it twice and asserting the second returnsFalsewould verify the guard.
Looks Good
- Core drain-before-disconnect logic is well-structured:
stop()→_drain_active_agents()→ optional_interrupt_running_agents()→_finalize_shutdown_agents()→ adapter disconnect. Clean separation of concerns. - Re-entrancy guard on
stop()via_stop_taskis a good pattern — second call awaits the same task instead of double-stopping. /restartcommand correctly registered inCOMMAND_REGISTRY(gateway-only) and added to the active-session bypass list inbase.py.- Busy-session handler integration through
BasePlatformAdapteris cleanly layered — the adapter delegates to the runner's handler without coupling to gateway internals. - Systemd changes are solid:
ExecReload=/bin/kill -USR1 $MAINPIDfor graceful restart,RestartForceExitStatus=75for the exit-code restart signal,reload-or-restartinstead of hard restart. - Launchd restart now does SIGTERM → wait-for-drain → kickstart instead of the previous atomic kill+restart. The
_wait_for_gateway_exitrefactor (returning bool, optional force_after) is clean. restart_drain_timeoutconfig properly added toDEFAULT_CONFIG,cli-config.yaml.example, and exposed via env varHERMES_RESTART_DRAIN_TIMEOUT.- All new test files pass. The PR fixes some pre-existing test failures rather than introducing any.
|
Addressed the review feedback in Implemented:
Verification:
I kept the fixture cleanup intentionally scoped to the restart/shutdown-related tests in this PR rather than turning this branch into a wider gateway test-helper consolidation. |
|
Merged via PR #7503. Your commits were cherry-picked onto current main with your authorship preserved in git log. Added agent.close() cleanup that was missing from the refactored stop() and fixed the /restart CommandDef category. Thanks for the solid contribution, Kenny! |
Three problems fixed: 1. bobashopcashier missing from v0.9.0 contributor list despite authoring the gateway drain PR (#7290, salvaged into #7503). Their email (kennyx102@gmail.com) was missing from AUTHOR_MAP. 2. release.py only scanned git commit authors, missing Co-authored-by trailers. Now parse_coauthors() extracts trailers from commit bodies. 3. No mechanism to detect contributors from salvaged PRs (where original author only appears in PR description, not git log). Changes: - scripts/release.py: add kennyx102@gmail.com to AUTHOR_MAP, enhance get_commits() to parse Co-authored-by trailers, filter AI assistants (Claude, Copilot, Cursor Agent) from co-author lists - scripts/contributor_audit.py: new script that cross-references git authors, co-author trailers, and salvaged PR descriptions. Reports unknown emails and contributors missing from release notes. - RELEASE_v0.9.0.md: add bobashopcashier to community contributors Usage: python scripts/contributor_audit.py --since-tag v2026.4.8 python scripts/contributor_audit.py --since-tag v2026.4.8 --release-file RELEASE_v0.9.0.md
Three problems fixed: 1. bobashopcashier missing from v0.9.0 contributor list despite authoring the gateway drain PR (#7290, salvaged into #7503). Their email (kennyx102@gmail.com) was missing from AUTHOR_MAP. 2. release.py only scanned git commit authors, missing Co-authored-by trailers. Now parse_coauthors() extracts trailers from commit bodies. 3. No mechanism to detect contributors from salvaged PRs (where original author only appears in PR description, not git log). Changes: - scripts/release.py: add kennyx102@gmail.com to AUTHOR_MAP, enhance get_commits() to parse Co-authored-by trailers, filter AI assistants (Claude, Copilot, Cursor Agent) from co-author lists - scripts/contributor_audit.py: new script that cross-references git authors, co-author trailers, and salvaged PR descriptions. Reports unknown emails and contributors missing from release notes. - RELEASE_v0.9.0.md: add bobashopcashier to community contributors Usage: python scripts/contributor_audit.py --since-tag v2026.4.8 python scripts/contributor_audit.py --since-tag v2026.4.8 --release-file RELEASE_v0.9.0.md
…arch#9264) Three problems fixed: 1. bobashopcashier missing from v0.9.0 contributor list despite authoring the gateway drain PR (NousResearch#7290, salvaged into NousResearch#7503). Their email (kennyx102@gmail.com) was missing from AUTHOR_MAP. 2. release.py only scanned git commit authors, missing Co-authored-by trailers. Now parse_coauthors() extracts trailers from commit bodies. 3. No mechanism to detect contributors from salvaged PRs (where original author only appears in PR description, not git log). Changes: - scripts/release.py: add kennyx102@gmail.com to AUTHOR_MAP, enhance get_commits() to parse Co-authored-by trailers, filter AI assistants (Claude, Copilot, Cursor Agent) from co-author lists - scripts/contributor_audit.py: new script that cross-references git authors, co-author trailers, and salvaged PR descriptions. Reports unknown emails and contributors missing from release notes. - RELEASE_v0.9.0.md: add bobashopcashier to community contributors Usage: python scripts/contributor_audit.py --since-tag v2026.4.8 python scripts/contributor_audit.py --since-tag v2026.4.8 --release-file RELEASE_v0.9.0.md
…arch#9264) Three problems fixed: 1. bobashopcashier missing from v0.9.0 contributor list despite authoring the gateway drain PR (NousResearch#7290, salvaged into NousResearch#7503). Their email (kennyx102@gmail.com) was missing from AUTHOR_MAP. 2. release.py only scanned git commit authors, missing Co-authored-by trailers. Now parse_coauthors() extracts trailers from commit bodies. 3. No mechanism to detect contributors from salvaged PRs (where original author only appears in PR description, not git log). Changes: - scripts/release.py: add kennyx102@gmail.com to AUTHOR_MAP, enhance get_commits() to parse Co-authored-by trailers, filter AI assistants (Claude, Copilot, Cursor Agent) from co-author lists - scripts/contributor_audit.py: new script that cross-references git authors, co-author trailers, and salvaged PR descriptions. Reports unknown emails and contributors missing from release notes. - RELEASE_v0.9.0.md: add bobashopcashier to community contributors Usage: python scripts/contributor_audit.py --since-tag v2026.4.8 python scripts/contributor_audit.py --since-tag v2026.4.8 --release-file RELEASE_v0.9.0.md
…arch#9264) Three problems fixed: 1. bobashopcashier missing from v0.9.0 contributor list despite authoring the gateway drain PR (NousResearch#7290, salvaged into NousResearch#7503). Their email (kennyx102@gmail.com) was missing from AUTHOR_MAP. 2. release.py only scanned git commit authors, missing Co-authored-by trailers. Now parse_coauthors() extracts trailers from commit bodies. 3. No mechanism to detect contributors from salvaged PRs (where original author only appears in PR description, not git log). Changes: - scripts/release.py: add kennyx102@gmail.com to AUTHOR_MAP, enhance get_commits() to parse Co-authored-by trailers, filter AI assistants (Claude, Copilot, Cursor Agent) from co-author lists - scripts/contributor_audit.py: new script that cross-references git authors, co-author trailers, and salvaged PR descriptions. Reports unknown emails and contributors missing from release notes. - RELEASE_v0.9.0.md: add bobashopcashier to community contributors Usage: python scripts/contributor_audit.py --since-tag v2026.4.8 python scripts/contributor_audit.py --since-tag v2026.4.8 --release-file RELEASE_v0.9.0.md
…arch#9264) Three problems fixed: 1. bobashopcashier missing from v0.9.0 contributor list despite authoring the gateway drain PR (NousResearch#7290, salvaged into NousResearch#7503). Their email (kennyx102@gmail.com) was missing from AUTHOR_MAP. 2. release.py only scanned git commit authors, missing Co-authored-by trailers. Now parse_coauthors() extracts trailers from commit bodies. 3. No mechanism to detect contributors from salvaged PRs (where original author only appears in PR description, not git log). Changes: - scripts/release.py: add kennyx102@gmail.com to AUTHOR_MAP, enhance get_commits() to parse Co-authored-by trailers, filter AI assistants (Claude, Copilot, Cursor Agent) from co-author lists - scripts/contributor_audit.py: new script that cross-references git authors, co-author trailers, and salvaged PR descriptions. Reports unknown emails and contributors missing from release notes. - RELEASE_v0.9.0.md: add bobashopcashier to community contributors Usage: python scripts/contributor_audit.py --since-tag v2026.4.8 python scripts/contributor_audit.py --since-tag v2026.4.8 --release-file RELEASE_v0.9.0.md
…arch#9264) Three problems fixed: 1. bobashopcashier missing from v0.9.0 contributor list despite authoring the gateway drain PR (NousResearch#7290, salvaged into NousResearch#7503). Their email (kennyx102@gmail.com) was missing from AUTHOR_MAP. 2. release.py only scanned git commit authors, missing Co-authored-by trailers. Now parse_coauthors() extracts trailers from commit bodies. 3. No mechanism to detect contributors from salvaged PRs (where original author only appears in PR description, not git log). Changes: - scripts/release.py: add kennyx102@gmail.com to AUTHOR_MAP, enhance get_commits() to parse Co-authored-by trailers, filter AI assistants (Claude, Copilot, Cursor Agent) from co-author lists - scripts/contributor_audit.py: new script that cross-references git authors, co-author trailers, and salvaged PR descriptions. Reports unknown emails and contributors missing from release notes. - RELEASE_v0.9.0.md: add bobashopcashier to community contributors Usage: python scripts/contributor_audit.py --since-tag v2026.4.8 python scripts/contributor_audit.py --since-tag v2026.4.8 --release-file RELEASE_v0.9.0.md
What does this PR do?
Fixes gateway self-restart so in-flight agent work is drained before shutdown instead of being torn down mid-response.
This adds a drain-aware
/restartpath inside the gateway, teaches shutdown/restart to wait for active agents up toagent.restart_drain_timeout, and updates service-manager restart behavior sohermes gateway restartrequests graceful shutdown before forcing a replacement.Related Issue
Fixes #7286
Type of Change
Changes Made
agent.restart_drain_timeoutconfig/defaults and expose drain/restart state through gateway runtime status/restartcommand and drain-aware runner shutdown that preserves the current response path when possibledisplay.busy_input_mode: queueis set/restart, and service restart behaviorHow to Test
Platform Tested
Checklist
pytest tests/ -qand all tests pass