Skip to content

fix: harden gateway persistence and desktop session loading#40550

Closed
bmoore210 wants to merge 2 commits into
NousResearch:mainfrom
bmoore210:fix/reliability-turn-persistence
Closed

fix: harden gateway persistence and desktop session loading#40550
bmoore210 wants to merge 2 commits into
NousResearch:mainfrom
bmoore210:fix/reliability-turn-persistence

Conversation

@bmoore210

Copy link
Copy Markdown
Contributor

Summary

  • persist the current user turn before provider/tool execution so early crashes keep the inbound message in the transcript
  • harden gateway startup by repairing stale/missing SSL certificate environment state
  • avoid duplicate gateway fallback writes and cover transient failure transcript behavior
  • fix Desktop startup/session-list failure on large multi-profile installs by scoping the initial session request to the active profile and using a longer timeout for session list requests

Verification

  • HERMES_HOME=$(mktemp -d) python -m pytest tests/run_agent/test_413_compression.py tests/run_agent/test_860_dedup.py tests/gateway/test_7100_transient_failure_transcript.py tests/gateway/test_ssl_cert_detection.py -q -o 'addopts=' --tb=short
    • 43 passed in 9.92s
  • cd apps/desktop && npx vitest run src/hermes.test.ts --environment jsdom
    • 2 passed
  • cd apps/desktop && npm run type-check -- --pretty false
    • passed

Desktop investigation notes

  • local unauthenticated /api/sessions correctly returns 401 Unauthorized
  • authenticated API request with X-Hermes-Session-Token returns session rows from state.db
  • Desktop was showing Hermes couldn't start because the renderer's initial all-profile session request exceeded the 15s IPC timeout on a large local profile set; the same request measured about 17.5s locally, while the active/default profile scoped request completed in about 2.3s

@alpindiay alpindiay left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: PR #40550 -- Harden gateway persistence and desktop session loading

Summary: Multi-pronged crash-resilience and hardening PR touching agent persistence, gateway SSL certs, desktop session loading, and test cleanup.

Findings:

[PASS] Early turn-start persistence (conversation_loop.py): Good pattern -- persists the inbound user turn before any provider call or tool execution. Wrapped in try/except with a warning log, non-fatal by design. The existing _last_flushed_db_idx dedup mechanism makes this idempotent.

[PASS] SSL cert hardening (gateway/run.py): Solid fix -- previously, SSL_CERT_FILE set to a nonexistent path would break all httpx/OpenAI client construction. Now validates the file exists and falls back to certifi. Test coverage present for both stale-path and valid-path cases.

[PASS] Gateway crash-resilience fallback (gateway/run.py): If the agent errors before reaching its own persistence (e.g. provider init failure), the gateway now persists the user message. Dedup logic scans last 10 transcript entries -- good enough for practical purposes.

[PASS] Desktop session loading (desktop-controller.tsx): Profile-aware session loading respects profileScope. Dependency array correctly includes profileScope.

[PASS] Longer session-list timeouts (hermes.ts): 60s timeout for session list calls (up from default 30s). Documented with a named constant. Tested.

[PASS] Test improvements: New test for early persistence in test_413_compression.py, updated assertions to use any() for better failure messages. db.close() in finally blocks in test_860_dedup.py fixes real resource leaks. New hermes.test.ts covers timeout behavior.

[NOTE] Dedup window (gateway _handle_message_with_agent): The fallback dedup scans only the last 10 transcript messages. If the transcript is very long and the same message text appears earlier, you could get a false negative (duplicate). Consider comparing against only the most recent user message, or using message_id. Very low practical risk.

[NOTE] profileScope dependency array: Adding profileScope to useCallback deps means sessions reload when scope changes -- intentional but changes re-fetch semantics slightly. Worth a smoke test in multi-profile setups.

Verdict: Strong PR. All changes are well-scoped, tested, and address real failure modes. Approve with the minor dedup window note as a future optimization.

@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder comp/gateway Gateway runner, session dispatch, delivery labels Jun 6, 2026
@teknium1

teknium1 commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Merged via #41103. Your commits were cherry-picked onto current main with your authorship preserved in git log (rebase-merge):

  • 330ca4585 fix: harden gateway startup and turn persistence
  • b55ac4526 fix(desktop): scope session list to active profile + longer timeout

All four concerns from this PR shipped. The desktop session-loading fix also resolves a community report where switching to a profile agent emptied the chat-history panel on a single-host remote install (the active profile's rows were windowed out of the cross-profile recency page). One small follow-up on our side: added an AUTHOR_MAP entry so CI's attribution gate passed. Thanks for the fix!

@teknium1 teknium1 closed this Jun 7, 2026
agogo233 added a commit to agogo233/hermes-agent that referenced this pull request Jun 8, 2026
* upstream/main: (430 commits)
  fix(yuanbao): bound ws.close() so an idle server can't stall shutdown ~5s (NousResearch#40607)
  docs: add Urdu translation of README (NousResearch#40578)
  fix(hindsight): send only new-turn delta on append retains instead of whole session (NousResearch#40605)
  feat(gateway): render terminal tool calls as native bash code blocks on markdown platforms (NousResearch#41215)
  feat(desktop): stop the chat viewport from following streaming output (NousResearch#41414)
  chore(release): map AlchemistChaos co-author email for NousResearch#40135 salvage
  fix(desktop): recover chat after sleep/wake by revalidating a stale remote backend
  fix(web): make _has_env config-aware so SEARXNG_URL auto-detect honors Hermes config
  fix(web): honor Hermes config-aware SEARXNG_URL lookup
  install.sh: hint at root-owned npm cache when desktop npm install fails (NousResearch#39688)
  fix(tools): percent-encode non-ascii URL components
  fix(skills): browse shows full catalog, not first 5000 (NousResearch#41413)
  feat(desktop+gateway): remote media relay — attach images/PDFs and display gateway images over the network
  feat(desktop): full tool-backend config (pickers + per-backend settings) in Settings (NousResearch#41232)
  hardening(api-server): scan cron prompts on REST create/update for parity with the agent tool
  fix: skip MCP preflight content-type probe on reconnect when already ready (NousResearch#40604)
  fix(kanban): sweep deferred scratch parent on non-scratch child completion + tests
  fix: defer scratch workspace cleanup when task has active children (NousResearch#33774)
  feat(onboarding): opt-in structured profile-build path on first contact (NousResearch#41114)
  feat(compression): temporal anchoring in compaction summaries (NousResearch#41102)
  test(discord): align clarify/model-picker tests with fail-closed component auth (NousResearch#41338)
  chore(release): map Dusk1e and LaPhilosophie for approval fail-closed salvage (NousResearch#33844, NousResearch#33866, NousResearch#30964)
  fix(discord): fail closed for component button auth when no allowlist set
  fix(feishu): fail closed for update prompt card actions
  fix(slack): re-check gateway auth on approval and slash-confirm buttons
  fix: guard int(os.getenv()) casts against malformed env vars (NousResearch#40598)
  fix: respect Honcho env var fallback in doctor and honcho status
  chore(release): add synapsesx to AUTHOR_MAP for NousResearch#40495 salvage
  fix(research): keep tool_call/tool_response pairs intact when compressing trajectories
  fix(simplex): accept display name in SIMPLEX_ALLOWED_USERS
  fix(desktop): make the running-turn timer per-session (NousResearch#41182)
  test(approval): regression for shell-escape denylist bypass (NousResearch#36846, NousResearch#36847)
  fix(security): strip shell escapes in denylist normalizer; fail-closed on missing approval module
  fix(stream+output-cap): guard empty streams and parse OpenRouter output-cap errors (NousResearch#40589)
  fix(desktop): bootstrap falls back to installed agent install.sh on GitHub 404
  feat(dashboard): change UI font from the theme picker, independent of theme (NousResearch#41145)
  fix(cli): return bool (not None) when a destructive-slash confirmation is cancelled (NousResearch#40583)
  fix(desktop): preserve configured base_url on same-provider model switch (NousResearch#41121)
  fix(desktop): stop bare-URL autolinker swallowing trailing emphasis asterisks (NousResearch#41093)
  fix(cron): bound the desktop run-history query to one job (NousResearch#41088)
  fix(desktop): scope in-session /model switch per-session, stop process-env leak (NousResearch#41120)
  chore: map bmoore210 author email for PR NousResearch#40550 salvage
  fix(desktop): scope session list to active profile + longer timeout
  fix: harden gateway startup and turn persistence
  fix(computer_use): honor custom vision routing
  fix(aux): honor model.default_headers on auxiliary client too (NousResearch#40033)
  fix(agent): honor model.default_headers for custom OpenAI-compatible providers (NousResearch#40033)
  docs(i18n): port deep-audit corrections to zh-Hans mirror (NousResearch#41104)
  fix(compression): don't overwrite the -1 post-compression sentinel in preflight seed (NousResearch#36718)
  chore(release): map singhsanidhya741@gmail.com to sanidhyasin (NousResearch#41094)
  ...
changman pushed a commit to changman/hermes-agent that referenced this pull request Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants