fix: improve gateway restart resilience and WSL stability #8213

Closed
albsa wants to merge 1 commit into NousResearch:main from albsa:fix/gateway-restart-resume-stt-docs

Conversation

@albsa albsa commented Apr 12, 2026

What does this PR do?

This PR improves Hermes gateway reliability during restarts and makes local speech-to-text more stable on WSL-style environments.

Specifically it:

  • saves in-flight gateway turns when restart drain times out
  • resumes those interrupted turns automatically after the gateway starts again
  • switches local faster-whisper loading to CPU/int8 for reliability when partial CUDA setups cause libcublas failures
  • updates the FAQ to recommend idempotent WSL startup wrappers so healthy gateways are not bounced by repeated startup triggers

Related Issue

Fixes #

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • 📝 Documentation update
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

gateway/run.py

  • track active inbound events per session
  • persist interrupted in-flight turns to a restart-resume file when restart drain times out
  • replay those saved turns automatically after gateway startup
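The three steps above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the class name `InFlightTracker`, its methods, the resume file location, and the JSON layout are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical location; the PR does not state where the resume file lives.
RESUME_FILE = Path(tempfile.gettempdir()) / "gateway_restart_resume.json"


class InFlightTracker:
    """Tracks the active inbound event for each session (illustrative only)."""

    def __init__(self, resume_file=RESUME_FILE):
        self.resume_file = Path(resume_file)
        self.active = {}  # session_id -> JSON-serializable event payload

    def start(self, session_id, event):
        """Record that a turn is now running for this session."""
        self.active[session_id] = event

    def finish(self, session_id):
        """Forget the turn once its final reply has been delivered."""
        self.active.pop(session_id, None)

    def persist_interrupted(self):
        """Call when restart drain times out: save turns that are still running."""
        if not self.active:
            return 0
        tmp = self.resume_file.with_suffix(".tmp")
        tmp.write_text(json.dumps(self.active))
        tmp.replace(self.resume_file)  # rename into place to avoid a half-written file
        return len(self.active)

    def load_and_clear(self):
        """Call once at startup: return saved turns and delete the file."""
        if not self.resume_file.exists():
            return {}
        turns = json.loads(self.resume_file.read_text())
        self.resume_file.unlink()
        return turns
```

Deleting the file as soon as it is read means a second restart does not replay the same turns again, and a startup with no file simply returns an empty mapping.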

tools/transcription_tools.py

  • force local faster-whisper to load with device="cpu" and compute_type="int8"
  • avoids WSL/CTranslate2/CUDA partial-runtime failures such as missing libcublas.so.12
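A minimal sketch of the forced-CPU load, assuming the `WhisperModel` constructor from faster-whisper. The helper name `load_local_whisper` and the `model_cls` parameter are hypothetical and exist only so the example can be exercised without the library installed.

```python
def load_local_whisper(model_name="base", model_cls=None):
    """Load faster-whisper pinned to CPU/int8 so no CUDA libraries are needed."""
    if model_cls is None:
        # Imported lazily so this module still imports when faster-whisper is absent.
        from faster_whisper import WhisperModel
        model_cls = WhisperModel
    # Pinning device and compute_type skips GPU auto-detection entirely.
    return model_cls(model_name, device="cpu", compute_type="int8")
```

The trade-off is that a machine with a working CUDA setup also runs on CPU, which is slower but avoids the partial-runtime failure path.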

website/docs/reference/faq.md

  • document idempotent WSL startup guidance
  • warn against startup wrappers that restart an already healthy gateway
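One way to make a startup wrapper idempotent is to check for a live gateway before launching anything. The sketch below is not taken from the FAQ: it assumes the gateway's process ID is recorded in a PID file, and the function names are hypothetical. It relies on POSIX signal semantics, so it is meant for the Linux side of WSL.

```python
import os
from pathlib import Path


def gateway_is_running(pid_file):
    """Return True if pid_file names a process that is still alive."""
    try:
        pid = int(Path(pid_file).read_text().strip())
    except (FileNotFoundError, ValueError):
        return False  # no PID file, or its contents are not a number
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without delivering a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def ensure_gateway(pid_file, start):
    """Call start() only when no gateway is running; never restart a live one."""
    if gateway_is_running(pid_file):
        return "already-running"
    start()
    return "started"
```

A liveness check on the PID is only a proxy for health. A wrapper that needs a stronger guarantee would have to probe the gateway itself.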

How to Test

1. Restart-resume behavior

  • start Hermes gateway
  • send a Telegram/gateway message that takes long enough to still be running during restart
  • trigger a gateway restart while the task is active
  • verify the gateway stores the interrupted turn and resumes it automatically after startup
  • verify the user eventually gets a final reply instead of being left with only tool-progress output

2. STT stability

  • send a voice note in a WSL environment with local STT enabled
  • verify local transcription works in CPU/int8 mode
  • verify the previous libcublas/CUDA failure path no longer occurs

3. Regression check

  • ensure normal gateway requests still complete correctly
  • ensure startup without interrupted turns does not attempt any replay

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform:
    • WSL2 / Linux gateway environment

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

Validation run locally:

  • python3 -m py_compile gateway/run.py
  • ./venv/bin/python -m pytest tests/gateway/test_session_boundary_hooks.py -q

Observed user-facing failure before fix:

  • gateway could emit tool-progress messages on Telegram
  • gateway restart/drain timeout could interrupt active work
  • user would receive no final reply

Observed behavior after local fix:

  • restart-resume logic compiles cleanly
  • targeted gateway tests pass
  • FAQ updated with idempotent WSL startup guidance

Notes

This PR intentionally excludes local environment-specific startup wrapper changes outside the repository.

@teknium1

Contributor

Thanks for the contribution @albsa! All three changes in this PR have since been implemented on main independently.


This is an automated hermes-sweeper review.

  • Gateway restart-resume: cb4addaca (fix(gateway): auto-resume sessions after drain-timeout restart #12301) introduced resume_pending on SessionEntry, mark_resume_pending() / clear_resume_pending() in gateway/session.py, the drain-timeout flagging block in gateway/run.py (line 2808), and the reason-aware system note injected on the first post-restart turn (line 10413). This is a fuller implementation of the same concept.
  • faster-whisper CUDA → CPU fallback: 4350668ae (fix(transcription): fall back to CPU when CUDA runtime libs are missing) added _load_local_whisper_model() with try-device=auto / catch-CUDA-lib / retry-device=cpu compute_type=int8, plus a mid-transcribe() eviction+retry path, in tools/transcription_tools.py (line 362). Main's implementation is strictly more complete than the forced-CPU approach here.
  • WSL gateway FAQ: a8fd7257b (feat(gateway): WSL-aware gateway #7510) already added a WSL-specific FAQ section in website/docs/reference/faq.md (line 431) covering systemd unreliability, foreground/tmux/nohup alternatives, wsl.conf steps, and Task Scheduler auto-start guidance.
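For readers comparing the two approaches, the try-auto / catch / retry-on-CPU pattern described in the second bullet might look roughly like this. It is a sketch, not the code on main: the function name, the `model_cls` parameter, and the substring heuristic for spotting a missing CUDA library are all assumptions.

```python
def load_with_cpu_fallback(model_name, model_cls=None):
    """Try automatic device selection first; retry on CPU/int8 if CUDA libs fail."""
    if model_cls is None:
        # Imported lazily so the example can run without faster-whisper installed.
        from faster_whisper import WhisperModel
        model_cls = WhisperModel
    try:
        return model_cls(model_name, device="auto")
    except (RuntimeError, OSError) as exc:
        # Assumed heuristic: only errors that mention CUDA libraries trigger the fallback.
        message = str(exc).lower()
        if not any(token in message for token in ("libcublas", "libcudnn", "cuda")):
            raise
        return model_cls(model_name, device="cpu", compute_type="int8")
```

Unlike a forced-CPU load, this keeps GPU acceleration on machines where the CUDA runtime is complete.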

No further action needed — closing as implemented.

@teknium1 teknium1 closed this Apr 28, 2026