fix(gateway): overwrite stale PID in gateway_state.json on restart#1632

Merged
teknium1 merged 1 commit into
NousResearch:mainfrom
nidhi-singh02:fix/stale-pid-gateway-state
Mar 17, 2026

@nidhi-singh02 nidhi-singh02 commented Mar 17, 2026

Contributor

What does this PR do?

write_runtime_status() used setdefault() for pid and start_time, which
preserved stale values from the previous process when the old state file
already existed on disk. Changed to direct assignment so the current
process PID is always written on gateway startup.

gateway.pid was unaffected (uses fresh writes), but gateway_state.json
retained the old PID indefinitely after restarts.
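
The difference between the two behaviors can be shown on a plain dictionary. This is a minimal sketch, not the code from `gateway/status.py`: the helper names and the stale values are hypothetical, and only the `pid` and `start_time` keys come from the description above.

```python
import os
import time


def stamp_with_setdefault(state: dict) -> dict:
    """Pre-fix behavior: keys already present win, so stale values survive."""
    state.setdefault("pid", os.getpid())
    state.setdefault("start_time", time.time())
    return state


def stamp_with_assignment(state: dict) -> dict:
    """Post-fix behavior: the current process always overwrites both fields."""
    state["pid"] = os.getpid()
    state["start_time"] = time.time()
    return state


# Hypothetical state as it might look when loaded from a file
# left behind by an earlier process.
stale = {"pid": -1, "start_time": 0.0}

kept = stamp_with_setdefault(dict(stale))    # still carries pid == -1
fresh = stamp_with_assignment(dict(stale))   # carries the current PID
```

`setdefault()` only helps on the very first write, when the keys are absent; once a state file exists, every later process inherits whatever the first one recorded.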

Related Issue

Fixes #1631

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

- `gateway/status.py`, lines 198-199: changed `setdefault()` to direct assignment for the `pid` and `start_time` fields in `write_runtime_status()`

How to Test

  1. Start the gateway: `hermes gateway run`
  2. Note the PID in `~/.hermes/gateway_state.json`
  3. Restart: `hermes gateway run --replace`
  4. Check `~/.hermes/gateway_state.json`: the PID should now match the new process
  5. Verify with `ps aux | grep gateway`: the PID of the running gateway should match the one in the file
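
The manual check in the last two steps could also be scripted. The sketch below is an assumption-laden helper, not part of the PR: the function names are hypothetical, it assumes the state file is a JSON object with a top-level `pid` field, and the liveness probe relies on POSIX signal semantics.

```python
import json
import os
from pathlib import Path


def read_state_pid(state_path: Path):
    """Return the "pid" field of a JSON state file, or None if it is absent."""
    with state_path.open(encoding="utf-8") as fh:
        return json.load(fh).get("pid")


def pid_is_running(pid: int) -> bool:
    """POSIX-only probe: signal 0 tests for existence without delivering a signal."""
    if pid <= 0:
        return False  # non-positive values address process groups, not one process
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but is owned by another user
    return True
```

After a restart, calling `read_state_pid(Path.home() / ".hermes" / "gateway_state.json")` and passing the result to `pid_is_running()` should report a live process if the fix is in place. A stale PID will usually report a dead one, although PID reuse means that is not guaranteed.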

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: Debian (aarch64, Raspberry Pi)

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

For New Skills

  • This skill is broadly useful to most users (if bundled) — see Contributing Guide
  • SKILL.md follows the standard format (frontmatter, trigger conditions, steps, pitfalls)
  • No external dependencies that aren't already available (prefer stdlib, curl, existing Hermes tools)
  • I've tested the skill end-to-end: hermes --toolsets skills -q "Use the X skill to do Y"

Screenshots / Logs

Signed-off-by: nidhi-singh02 <nidhi2894@gmail.com>

teknium1 merged commit 247e3c1 into NousResearch:main on Mar 17, 2026
teknium1 added a commit that referenced this pull request Mar 17, 2026
Verifies that write_runtime_status() overwrites pid and start_time
from a previous process rather than preserving them via setdefault().
Covers the fix from PR #1632.
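
A regression test of the kind this commit describes might look roughly like the following. It is a sketch under stated assumptions, not the test that was committed: `write_runtime_status_sketch()` is a hypothetical stand-in for the real `write_runtime_status()`, and the `note` key is invented to show that unrelated fields are left alone.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def write_runtime_status_sketch(state_path: Path) -> dict:
    """Hypothetical stand-in: load any existing state, then stamp pid/start_time."""
    state = {}
    if state_path.exists():
        state = json.loads(state_path.read_text(encoding="utf-8"))
    state["pid"] = os.getpid()         # direct assignment, as in the fix
    state["start_time"] = time.time()
    state_path.write_text(json.dumps(state), encoding="utf-8")
    return state


def test_overwrites_stale_pid_and_start_time():
    with tempfile.TemporaryDirectory() as tmp:
        state_path = Path(tmp) / "gateway_state.json"
        # Simulate a file left behind by a previous process.
        stale = {"pid": -1, "start_time": 0.0, "note": "kept"}
        state_path.write_text(json.dumps(stale), encoding="utf-8")

        write_runtime_status_sketch(state_path)

        saved = json.loads(state_path.read_text(encoding="utf-8"))
        assert saved["pid"] == os.getpid()
        assert saved["start_time"] > 0.0
        assert saved["note"] == "kept"  # unrelated fields survive the rewrite
```

Seeding the file with an impossible PID such as `-1` makes the failure unambiguous: a `setdefault()`-based implementation would leave it in place and the first assertion would fail.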
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
…-gateway-state

fix(gateway): overwrite stale PID in gateway_state.json on restart
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
…search#1631)

Verifies that write_runtime_status() overwrites pid and start_time
from a previous process rather than preserving them via setdefault().
Covers the fix from PR NousResearch#1632.
teknium1 added a commit that referenced this pull request May 16, 2026
…text (#26823)

Adds _sanitize_tool_error() in model_tools and routes both error paths
through it: registry.dispatch's try/except (the primary path for tool
exceptions) and handle_function_call's outer except (defense in depth).

Stripping targets structural framing tokens that the model itself can
react to even though json.dumps already handles wire-layer escaping:
XML role tags (tool_call, function_call, result, response, output,
input, system, assistant, user), CDATA sections, and markdown code
fences. Caps message body at 2000 chars and wraps with [TOOL_ERROR]
prefix.

Defense-in-depth: a tool exception carrying '<tool_call>...' won't
break message framing (json escapes it), but the model still reads
those tokens and they nudge it toward role-confusion framing.

Ported from ironclaw#1639 (one piece of #3838's three-feature scout).
The truncated-tool-call (#1632) and empty-response-recovery (#1677,
#1720) pieces are skipped because main now implements both far more
thoroughly (run_agent.py L8147/L12209/L13012 for truncation retry +
length rewrite; L4500/L15090+ for empty-response scaffolding stripper,
multi-stage nudge, fallback model activation).
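
The sanitizing steps listed in this commit message can be sketched as below. This is an illustration only and not the `_sanitize_tool_error()` from `model_tools`: the function name, the regular expressions, the order of the steps, and the choice to drop whole CDATA sections are all assumptions. The tag list, the 2000-character cap, and the `[TOOL_ERROR]` prefix come from the message.

```python
import re

MAX_BODY_CHARS = 2000  # cap stated in the commit message

ROLE_TAGS = (
    "tool_call", "function_call", "result", "response", "output",
    "input", "system", "assistant", "user",
)

# Opening or closing role tags, with optional attributes (assumed pattern).
_TAG_RE = re.compile(r"</?(?:%s)\b[^>]*>" % "|".join(ROLE_TAGS), re.IGNORECASE)
# Whole CDATA sections, including their content (assumed behavior).
_CDATA_RE = re.compile(r"<!\[CDATA\[.*?\]\]>", re.DOTALL)
# Runs of three or more backticks, i.e. markdown fence markers.
_FENCE_RE = re.compile(r"`{3,}")


def sanitize_tool_error_sketch(message: str) -> str:
    """Strip framing tokens, cap the body length, and add the error prefix."""
    text = _CDATA_RE.sub("", message)
    text = _TAG_RE.sub("", text)
    text = _FENCE_RE.sub("", text)
    text = text.strip()[:MAX_BODY_CHARS]
    return f"[TOOL_ERROR] {text}"
```

The payload text between role tags is kept in this sketch; only the tags themselves are removed, which matches the stated goal of removing framing tokens rather than hiding the error.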