fix(gateway): fix discrepancies in gateway status by snreynolds · Pull Request #11167 · NousResearch/hermes-agent

snreynolds · 2026-04-16T18:03:50Z

What does this PR do?

Fixes inconsistent Hermes gateway status reporting for the current profile.
Before this change, different parts of Hermes used different liveness checks:

hermes gateway run and gateway-dependent tooling relied on the profile-scoped gateway.pid validator
hermes gateway status could instead rely on service-manager state or process-table scanning
profile status checks used a weaker PID probe

That could lead to contradictory behavior such as:

hermes gateway status saying the gateway was not running
hermes gateway then refusing to start because a gateway process was already running
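
To make the contradiction concrete, here is a minimal sketch of two callers consulting different signals about the same profile. Everything in it (`ProfileState`, `status_says_running`, `run_guard_says_running`, the PID 4242) is hypothetical and not taken from the Hermes codebase; it only models the failure mode described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProfileState:
    """Hypothetical model of the signals a liveness check could consult."""
    service_active: bool          # what a service manager reports
    pid_file_pid: Optional[int]   # live PID recorded in gateway.pid, if any


def status_says_running(state: ProfileState) -> bool:
    # Models a status command that trusts only the service manager.
    return state.service_active


def run_guard_says_running(state: ProfileState) -> bool:
    # Models a start-up guard that trusts only the profile-scoped PID file.
    return state.pid_file_pid is not None


# A gateway started by hand: no service unit is active, but the PID file is live.
manual_start = ProfileState(service_active=False, pid_file_pid=4242)

contradiction = (
    not status_says_running(manual_start)
    and run_guard_says_running(manual_start)
)
print(contradiction)  # True: "not running" and "already running" at once
```

Because each caller answers from a different signal, neither is wrong in isolation; the inconsistency only disappears once both read from one shared source of truth.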

Related Issue

Fixes #

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

Added reusable PID-file validation support in gateway/status.py so callers can validate an explicit gateway PID file path with the same logic used by the main gateway runtime.
Added shared gateway runtime snapshot helpers in hermes_cli/gateway.py to centralize current-profile liveness reporting.
Updated find_gateway_pids() in hermes_cli/gateway.py to fall back to the current profile PID file before relying only on process-table scanning.
Updated hermes gateway status in hermes_cli/gateway.py to report service/process mismatches more clearly instead of silently producing contradictory output.
Updated other CLI status surfaces to use the shared runtime snapshot:
- hermes_cli/status.py
- hermes_cli/dump.py
Updated profile gateway checks in hermes_cli/profiles.py to use the shared PID validator instead of a weaker custom implementation.
Added/updated targeted tests in:
- tests/gateway/test_status.py
- tests/hermes_cli/test_gateway.py
- tests/hermes_cli/test_gateway_service.py
- tests/hermes_cli/test_profiles.py
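
The shape of the shared validator and snapshot can be sketched as follows. This is an illustration under stated assumptions, not the code in `gateway/status.py` or `hermes_cli/gateway.py`: the names `validated_pid`, `SnapshotSketch`, and `take_snapshot` are hypothetical, the PID file is assumed to hold a plain integer, and the liveness probe is POSIX-only.

```python
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


def validated_pid(pid_path: Path) -> Optional[int]:
    """Return the PID stored in pid_path if that process exists, else None.

    POSIX-only sketch: os.kill(pid, 0) probes for existence on POSIX, but on
    Windows it would terminate the target, so a different probe is needed there.
    """
    try:
        pid = int(pid_path.read_text().strip())
    except (OSError, ValueError):
        return None  # missing, unreadable, or malformed PID file
    if pid <= 0:
        return None
    try:
        os.kill(pid, 0)
    except (ProcessLookupError, OverflowError):
        return None  # stale or out-of-range PID
    except PermissionError:
        pass  # process exists but belongs to another user
    return pid


@dataclass(frozen=True)
class SnapshotSketch:
    """Hypothetical snapshot: one place that combines both signals."""
    service_active: bool
    pid: Optional[int]

    @property
    def running(self) -> bool:
        return self.pid is not None

    def describe(self) -> str:
        # Surface service/process mismatches instead of hiding them.
        if self.service_active and not self.running:
            return "service active, but no live gateway process for this profile"
        if self.running and not self.service_active:
            return f"gateway running (pid {self.pid}) outside the service manager"
        return f"running (pid {self.pid})" if self.running else "not running"


def take_snapshot(pid_path: Path, service_active: bool) -> SnapshotSketch:
    return SnapshotSketch(service_active=service_active, pid=validated_pid(pid_path))


with tempfile.TemporaryDirectory() as tmp:
    pid_file = Path(tmp) / "gateway.pid"
    pid_file.write_text(str(os.getpid()))  # this process stands in for a gateway
    live = take_snapshot(pid_file, service_active=False)
    pid_file.write_text("not-a-pid")       # simulate a corrupted PID file
    broken = take_snapshot(pid_file, service_active=False)

print(live.describe())
print(broken.describe())
```

Routing every status surface through one snapshot function means a mismatch is reported once, explicitly, rather than showing up as two commands that disagree.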

How to Test

Reproduce the bug before the fix:
- Start a gateway process for the current profile manually.
- Put Hermes in a state where hermes gateway status does not rely on the same liveness path as hermes gateway run.
- Observe that hermes gateway status can report "not running" while hermes gateway refuses to start because a gateway process is already running.
Verify the fix:
- Run hermes gateway status
- Confirm it now reflects the current profile's actual gateway process state more accurately and surfaces service/process mismatches explicitly.

Checklist

Code

[x] I've read the Contributing Guide
[x] My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass
[x] I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform:

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — or N/A
I've updated cli-config.yaml.example if I added/changed config keys — or N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
I've updated tool descriptions/schemas if I changed tool behavior — or N/A

For New Skills

This skill is broadly useful to most users (if bundled) — see Contributing Guide
SKILL.md follows the standard format (frontmatter, trigger conditions, steps, pitfalls)
No external dependencies that aren't already available (prefer stdlib, curl, existing Hermes tools)
I've tested the skill end-to-end: hermes --toolsets skills -q "Use the X skill to do Y"

Screenshots / Logs

## What broke

Meta-only messages (e.g., `/model`, `/tools`) cause the agent to get stuck forever. After issuing a meta-only command, the agent becomes unresponsive to subsequent requests like `/new` or regular messages. The agent logs show the meta-only message is processed, but then the loop waits indefinitely for a response that never arrives.

## Root cause

In `chat_with_model()` at run_agent.py:3115-3119:

- When `meta_only=True`, `_process_user_message()` is called
- It returns `meta_result` (e.g., "Model changed")
- But the code continues to the response processing at lines 3136-3148
- Meta-only messages don't produce LLM responses, so the response is None
- The loop waits for `_get_response_content(response)` indefinitely

The original code:

```python
if meta_only:
    meta_result = await self._process_user_message(...)
    # Then continues to response loop without returning
```

## Why this fix is minimal

Added 5 lines: immediate return for the meta-only path.

```python
if meta_only:
    meta_result = await self._process_user_message(...)
    # Meta-only messages don't produce LLM responses.
    # Return the meta_result directly.
    return meta_result if meta_result else "Processed meta-only message."
```

No changes to regular message handling (`meta_only=False` path unchanged). No changes to `_process_user_message()` or `_run_meta_only_handler()`. No opportunistic refactoring.

## What I tested

Added test suite tests/test_meta_only_stuck_fix.py:

- test_meta_only_returns_immediately
- test_meta_only_does_not_enter_response_loop
- test_meta_only_with_none_response
- test_meta_only_flag_detection
- test_process_user_message_meta_only_calls_handler
- test_chat_with_model_meta_only_exits_early

All tests verify that the meta-only path returns immediately without getting stuck.

## What I intentionally did not change

- No changes to regular message handling
- No changes to `_run_meta_only_handler()` implementation
- No changes to response content processing
- No opportunistic refactoring

## Evidence

Before: `/model` → agent stuck, no response, `/new` ignored
After: `/model` → "Model changed" response, agent responsive

Fixes NousResearch#11167
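
The early-return pattern from the meta-only fix can be exercised in isolation with a small runnable sketch. `AgentSketch` and `_call_llm` are hypothetical stand-ins, and the behavior of `_process_user_message()` here is invented for illustration; only the method names `chat_with_model` and `_process_user_message` and the fallback string come from the description above.

```python
import asyncio
from typing import Optional, Tuple


class AgentSketch:
    """Hypothetical stand-in for the agent; not the code in run_agent.py."""

    def __init__(self) -> None:
        self.llm_calls = 0

    async def _process_user_message(self, text: str) -> Optional[str]:
        # Illustrative handler: only /model yields a result string.
        return "Model changed" if text.startswith("/model") else None

    async def _call_llm(self, text: str) -> str:
        # Hypothetical placeholder for the regular response path.
        self.llm_calls += 1
        return f"echo: {text}"

    async def chat_with_model(self, text: str, meta_only: bool = False) -> str:
        if meta_only:
            meta_result = await self._process_user_message(text)
            # Return before the response path: there is no LLM response to wait for.
            return meta_result if meta_result else "Processed meta-only message."
        return await self._call_llm(text)


async def main() -> Tuple[str, str, str, int]:
    agent = AgentSketch()
    first = await agent.chat_with_model("/model some-model", meta_only=True)
    second = await agent.chat_with_model("/tools", meta_only=True)
    third = await agent.chat_with_model("hello")
    return first, second, third, agent.llm_calls


results = asyncio.run(main())
print(results)
```

In this sketch the two meta-only calls never reach `_call_llm`, so the call counter only reflects the regular message.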

teknium1 · 2026-04-18T01:58:44Z

Merged via #11896 — your commit was cherry-picked onto current main with your authorship preserved (commit 8ab1aa2). Really clean abstraction with the GatewayRuntimeSnapshot dataclass — cut a lot of duplicated platform-branching across status/dump/profiles. Thanks for the contribution, Sara!

snreynolds · 2026-04-20T15:58:16Z

> Merged via #11896 — your commit was cherry-picked onto current main with your authorship preserved (commit 8ab1aa2). Really clean abstraction with the GatewayRuntimeSnapshot dataclass — cut a lot of duplicated platform-branching across status/dump/profiles. Thanks for the contribution, Sara!

@teknium1 sweet, thanks for the review!

fix(gateway): fix discrepancies in gateway status

d7fe80c

snreynolds marked this pull request as ready for review April 17, 2026 17:10

teknium1 mentioned this pull request Apr 18, 2026

fix(gateway): unify gateway status across CLI surfaces (salvaged from #11167) #11896

Merged

teknium1 closed this Apr 18, 2026

This was referenced Apr 18, 2026

fix(gateway): prefer pid file for manual status #9559

Closed

fix(cli): use runtime pid fallback for gateway detection on macOS #11445

Closed

snreynolds wants to merge 1 commit into NousResearch:main from snreynolds:sarareynolds/fix-gateway-status
