fix: harden gateway persistence and desktop session loading#40550
fix: harden gateway persistence and desktop session loading#40550bmoore210 wants to merge 2 commits into
Conversation
alpindiay
left a comment
There was a problem hiding this comment.
Code Review: PR #40550 -- Harden gateway persistence and desktop session loading
Summary: Multi-pronged crash-resilience and hardening PR touching agent persistence, gateway SSL certs, desktop session loading, and test cleanup.
Findings:
[PASS] Early turn-start persistence (conversation_loop.py): Good pattern -- persists the inbound user turn before any provider call or tool execution. Wrapped in try/except with a warning log, non-fatal by design. The existing _last_flushed_db_idx dedup mechanism makes this idempotent.
[PASS] SSL cert hardening (gateway/run.py): Solid fix -- previously, SSL_CERT_FILE set to a nonexistent path would break all httpx/OpenAI client construction. Now validates the file exists and falls back to certifi. Test coverage present for both stale-path and valid-path cases.
[PASS] Gateway crash-resilience fallback (gateway/run.py): If the agent errors before reaching its own persistence (e.g. provider init failure), the gateway now persists the user message. Dedup logic scans last 10 transcript entries -- good enough for practical purposes.
[PASS] Desktop session loading (desktop-controller.tsx): Profile-aware session loading respects profileScope. Dependency array correctly includes profileScope.
[PASS] Longer session-list timeouts (hermes.ts): 60s timeout for session list calls (up from default 30s). Documented with a named constant. Tested.
[PASS] Test improvements: New test for early persistence in test_413_compression.py, updated assertions to use any() for better failure messages. db.close() in finally blocks in test_860_dedup.py fixes real resource leaks. New hermes.test.ts covers timeout behavior.
[NOTE] Dedup window (gateway _handle_message_with_agent): The fallback dedup scans only the last 10 transcript messages. If the transcript is very long and the same message text appears earlier, you could get a false negative (duplicate). Consider comparing against only the most recent user message, or using message_id. Very low practical risk.
[NOTE] profileScope dependency array: Adding profileScope to useCallback deps means sessions reload when scope changes -- intentional but changes re-fetch semantics slightly. Worth a smoke test in multi-profile setups.
Verdict: Strong PR. All changes are well-scoped, tested, and address real failure modes. Approve with the minor dedup window note as a future optimization.
|
Merged via #41103. Your commits were cherry-picked onto current
All four concerns from this PR shipped. The desktop session-loading fix also resolves a community report where switching to a profile agent emptied the chat-history panel on a single-host remote install (the active profile's rows were windowed out of the cross-profile recency page). One small follow-up on our side: added an AUTHOR_MAP entry so CI's attribution gate passed. Thanks for the fix! |
* upstream/main: (430 commits) fix(yuanbao): bound ws.close() so an idle server can't stall shutdown ~5s (NousResearch#40607) docs: add Urdu translation of README (NousResearch#40578) fix(hindsight): send only new-turn delta on append retains instead of whole session (NousResearch#40605) feat(gateway): render terminal tool calls as native bash code blocks on markdown platforms (NousResearch#41215) feat(desktop): stop the chat viewport from following streaming output (NousResearch#41414) chore(release): map AlchemistChaos co-author email for NousResearch#40135 salvage fix(desktop): recover chat after sleep/wake by revalidating a stale remote backend fix(web): make _has_env config-aware so SEARXNG_URL auto-detect honors Hermes config fix(web): honor Hermes config-aware SEARXNG_URL lookup install.sh: hint at root-owned npm cache when desktop npm install fails (NousResearch#39688) fix(tools): percent-encode non-ascii URL components fix(skills): browse shows full catalog, not first 5000 (NousResearch#41413) feat(desktop+gateway): remote media relay — attach images/PDFs and display gateway images over the network feat(desktop): full tool-backend config (pickers + per-backend settings) in Settings (NousResearch#41232) hardening(api-server): scan cron prompts on REST create/update for parity with the agent tool fix: skip MCP preflight content-type probe on reconnect when already ready (NousResearch#40604) fix(kanban): sweep deferred scratch parent on non-scratch child completion + tests fix: defer scratch workspace cleanup when task has active children (NousResearch#33774) feat(onboarding): opt-in structured profile-build path on first contact (NousResearch#41114) feat(compression): temporal anchoring in compaction summaries (NousResearch#41102) test(discord): align clarify/model-picker tests with fail-closed component auth (NousResearch#41338) chore(release): map Dusk1e and LaPhilosophie for approval fail-closed salvage (NousResearch#33844, NousResearch#33866, NousResearch#30964) fix(discord): fail closed for component button auth when no allowlist set fix(feishu): fail closed for update prompt card actions fix(slack): re-check gateway auth on approval and slash-confirm buttons fix: guard int(os.getenv()) casts against malformed env vars (NousResearch#40598) fix: respect Honcho env var fallback in doctor and honcho status chore(release): add synapsesx to AUTHOR_MAP for NousResearch#40495 salvage fix(research): keep tool_call/tool_response pairs intact when compressing trajectories fix(simplex): accept display name in SIMPLEX_ALLOWED_USERS fix(desktop): make the running-turn timer per-session (NousResearch#41182) test(approval): regression for shell-escape denylist bypass (NousResearch#36846, NousResearch#36847) fix(security): strip shell escapes in denylist normalizer; fail-closed on missing approval module fix(stream+output-cap): guard empty streams and parse OpenRouter output-cap errors (NousResearch#40589) fix(desktop): bootstrap falls back to installed agent install.sh on GitHub 404 feat(dashboard): change UI font from the theme picker, independent of theme (NousResearch#41145) fix(cli): return bool (not None) when a destructive-slash confirmation is cancelled (NousResearch#40583) fix(desktop): preserve configured base_url on same-provider model switch (NousResearch#41121) fix(desktop): stop bare-URL autolinker swallowing trailing emphasis asterisks (NousResearch#41093) fix(cron): bound the desktop run-history query to one job (NousResearch#41088) fix(desktop): scope in-session /model switch per-session, stop process-env leak (NousResearch#41120) chore: map bmoore210 author email for PR NousResearch#40550 salvage fix(desktop): scope session list to active profile + longer timeout fix: harden gateway startup and turn persistence fix(computer_use): honor custom vision routing fix(aux): honor model.default_headers on auxiliary client too (NousResearch#40033) fix(agent): honor model.default_headers for custom OpenAI-compatible providers (NousResearch#40033) docs(i18n): port deep-audit corrections to zh-Hans mirror (NousResearch#41104) fix(compression): don't overwrite the -1 post-compression sentinel in preflight seed (NousResearch#36718) chore(release): map singhsanidhya741@gmail.com to sanidhyasin (NousResearch#41094) ...
Summary
Verification
HERMES_HOME=$(mktemp -d) python -m pytest tests/run_agent/test_413_compression.py tests/run_agent/test_860_dedup.py tests/gateway/test_7100_transient_failure_transcript.py tests/gateway/test_ssl_cert_detection.py -q -o 'addopts=' --tb=short43 passed in 9.92scd apps/desktop && npx vitest run src/hermes.test.ts --environment jsdom2 passedcd apps/desktop && npm run type-check -- --pretty falseDesktop investigation notes
/api/sessionscorrectly returns401 UnauthorizedX-Hermes-Session-Tokenreturns session rows fromstate.dbHermes couldn't startbecause the renderer's initial all-profile session request exceeded the 15s IPC timeout on a large local profile set; the same request measured about 17.5s locally, while the active/default profile scoped request completed in about 2.3s