perf(orchestrator): skip blocking Lore memory prefetch on parent agent #3
davidgut1982 wants to merge 1 commit into
The parent orchestrator calls MemoryManager.prefetch_all() ->
kb_search(top_k=30) to Lore synchronously at the start of every
run_conversation (~4.3s/request), before the first LLM call. The parent
never uses the result for routing: SOUL.md delegates ALL recall to the
`memory` child profile. The prefetch is therefore pure latency.
Pass skip_memory=True at every user-facing PARENT/orchestrator
construction point, which leaves _memory_manager=None so the prefetch is
never reached. Delegated children build their own memory manager
independently and are unaffected — recall still works when the parent
delegates to a memory-capable child.
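
A minimal sketch of the mechanism described above, using hypothetical stand-in classes (`FakeMemoryManager`, `SketchAgent`) rather than the real `AIAgent` and `MemoryManager`. Only the names `skip_memory`, `_memory_manager`, `prefetch_all` and `run_conversation` come from this change; everything else is assumed for illustration:

```python
class FakeMemoryManager:
    """Stand-in for MemoryManager: counts prefetch calls instead of querying Lore."""

    def __init__(self):
        self.prefetch_calls = 0

    def prefetch_all(self, query):
        self.prefetch_calls += 1
        return [f"recalled: {query}"]


class SketchAgent:
    """Hypothetical simplified agent; the real constructor takes many more arguments."""

    def __init__(self, skip_memory=False):
        # With skip_memory=True no manager is built, mirroring _memory_manager=None.
        self._memory_manager = None if skip_memory else FakeMemoryManager()

    def run_conversation(self, user_message):
        recalled = []
        if self._memory_manager is not None:
            # Blocking prefetch before the first LLM call (the cost the parent paid).
            recalled = self._memory_manager.prefetch_all(user_message)
        return {"message": user_message, "recalled": recalled}


parent = SketchAgent(skip_memory=True)   # orchestrator: no manager, no prefetch
child = SketchAgent(skip_memory=False)   # memory-capable child: builds its own manager
parent_result = parent.run_conversation("hello")
child_result = child.run_conversation("hello")
```

In this sketch the parent returns an empty `recalled` list without calling any prefetch, while the child still performs its own recall.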
Parent construction points changed:
- gateway/platforms/api_server.py _create_agent (HTTP /v1/chat/completions)
- gateway/run.py _run_agent (Telegram/Discord/Slack inbound)
- gateway/run.py _run_background_task (gateway background tasks)
- tui_gateway/server.py _make_agent (TUI / dashboard chat; was conditional
  on HERMES_IGNORE_RULES, now forced True for the parent)
- tui_gateway/server.py _background_agent_kwargs (TUI background tasks)
Tradeoff (passive writes): the parent's _memory_manager is also what
performs the post-turn passive transcript write
(_sync_external_memory_for_turn -> sync_all -> provider.sync_turn) and
queue_prefetch_all. With skip_memory=True these passive writes no longer
fire for PARENT-handled turns. Explicit memory(action="add") / kb_add via
the MCP knowledge server are unaffected, and delegated children still
sync their own turns. Surfaced for human review before deploy.
Tests:
- tests/gateway/test_api_server_toolset.py: assert _create_agent passes
skip_memory=True.
- tests/gateway/test_orchestrator_skip_memory.py: assert the TUI parent
helpers pass skip_memory=True; assert a real AIAgent(skip_memory=True)
yields _memory_manager is None; assert the delegated child construction
carries its own skip_memory kwarg (parent value does not leak in).
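
The shape of these assertions can be sketched as follows. This is not the content of the real test files: `make_parent_agent` and `build_child_kwargs` are hypothetical helpers standing in for the gateway factories and the delegated-child builder, and the agent class is replaced by a `MagicMock`:

```python
from unittest.mock import MagicMock


def make_parent_agent(agent_cls):
    """Hypothetical factory standing in for helpers such as _create_agent."""
    return agent_cls(model="example-model", skip_memory=True)


def build_child_kwargs(parent_kwargs, child_skip_memory):
    """Hypothetical child builder: sets its own skip_memory, never copies the parent's."""
    child_kwargs = {k: v for k, v in parent_kwargs.items() if k != "skip_memory"}
    child_kwargs["skip_memory"] = child_skip_memory
    return child_kwargs


def test_parent_passes_skip_memory():
    agent_cls = MagicMock()
    make_parent_agent(agent_cls)
    # The constructor must have been called with skip_memory=True.
    assert agent_cls.call_args.kwargs["skip_memory"] is True


def test_child_value_is_independent():
    parent_kwargs = {"model": "example-model", "skip_memory": True}
    child_kwargs = build_child_kwargs(parent_kwargs, child_skip_memory=False)
    # The child's flag is its own; the parent's value is untouched and does not leak in.
    assert child_kwargs["skip_memory"] is False
    assert parent_kwargs["skip_memory"] is True
```

Both functions follow the `test_*` naming that pytest collects, so the sketch can be dropped into a scratch file and run directly.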
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🔎 Lint report:
| Rule | Count |
|---|---|
| unresolved-import | 1 |
First entries
tests/gateway/test_orchestrator_skip_memory.py:21: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
✅ Fixed issues: none
Unchanged: 4957 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
Closing: superseded by Lore-side performance fixes (kb_get batching + Postgres GIN/random_page_cost planner fix + semantic default recall). Those dropped the parent prefetch from ~4.3s to ~115ms (measured, 40x), so skip_memory would now save only ~115ms while losing passive memory, which is a bad trade. Keeping this branch for reference; the change still validly decouples the router from Lore availability if ever needed, but it should not merge.
…NousResearch#34192) (NousResearch#34382)

NousResearch#34192 reports Hostinger's 'Hermes WebUI' catalog crashes on startup with:

    /usr/bin/tini: No such file or directory

The image moved from tini to s6-overlay as PID 1 (/init) earlier in 2026. Orchestration templates that still pin /usr/bin/tini as the entrypoint, like the Hostinger Hermes WebUI catalog, have no binary to exec and the container crashes immediately.

Hermes has no control over the Hostinger catalog template, but we can make the image backward-compatible by symlinking /usr/bin/tini -> /init during the s6-overlay install step. External wrappers that exec /usr/bin/tini will land on the same s6-overlay reaper they would have landed on if they'd used the canonical /init entrypoint.

The image's own ENTRYPOINT continues to be /init verbatim; the shim is purely for legacy external wrappers, not for the image's own runtime path. Once affected catalogs are updated, the symlink can be removed.

Other issues NousResearch#34192 raises that are NOT addressed by this PR:
* Problem #2 (UID 1024 vs 10000 mismatch): already fixed by NousResearch#33148 (S6_KEEP_ENV=1) and NousResearch#32412 (with-contenv shebangs). The Hostinger template likely needs to update its env-var propagation.
* Problem #3 (incompatible session formats): RFC for pluggable SessionDB is tracked in NousResearch#23717.
* Problem #4 (Telegram polling conflict): an operations problem on Hostinger's side, not in this codebase.

This PR is scoped to the one issue that can be fixed inside Dockerfile: the missing /usr/bin/tini binary.

Tests (3 in test_dockerfile_tini_compat_shim.py):
- test_tini_compat_symlink_present
  Guard: the symlink line must exist in Dockerfile.
- test_tini_compat_comment_explains_why
  The NousResearch#34192 anchor comment must be present so future readers know why the shim is there (avoid accidental removal).
- test_entrypoint_still_init_not_tini
  Sanity check: ENTRYPOINT remains /init (s6-overlay). The shim is only for external wrappers.

Refs: NousResearch#34192

Partial fix: addresses the immediate tini-binary crash. Catalog-side fixes still needed by Hostinger for the UID and session-format problems documented in the issue.

Co-authored-by: Cursor <cursoragent@cursor.com>
What it does
The parent orchestrator calls `MemoryManager.prefetch_all()` → `kb_search(top_k=30)` to Lore (CT 121) synchronously at the start of every `run_conversation()` (`agent/conversation_loop.py` ~654), before the first LLM call. Measured ~4.3s on every request.

The parent never uses that result for routing: `SOUL.md` explicitly delegates ALL recall to the `memory` child profile. The prefetch is therefore pure latency the orchestrator pays and discards.

This PR passes the existing `skip_memory=True` constructor flag at every user-facing PARENT/orchestrator construction point. When set, `agent_init.py` leaves `_memory_manager=None` and the prefetch is never reached. No change to `conversation_loop.py` or `memory_manager.py`: the prefetch logic is untouched; we simply don't build a memory manager on the parent.

Why it's safe
- All recall is delegated to the `memory` child: the parent never consults prefetch output for routing decisions, so dropping it changes no routing behavior.
- Delegated children build their own `_memory_manager` independently (the generic `delegate_task` path in `tools/delegate_tool.py` and profile-based children decide their own `skip_memory`). Setting the flag on the parent does not flow into the child constructor, so recall still works when the parent delegates to a memory-capable child.
- Explicit `memory(action="add")` and `kb_add` via the MCP knowledge server are independent of the parent's `_memory_manager` prefetch path and continue to work.
- The agent cache preserves `_memory_manager=None` across turns (the cache stores the constructed instance; `_init_cached_agent_for_turn` never rebuilds the manager), so the saving applies to every turn in a session, not just the first.

Tradeoff surfaced for review: passive memory writes
The parent's `_memory_manager` is also what performs the post-turn passive transcript write: `conversation_loop.py` (~4261) → `AIAgent._sync_external_memory_for_turn` (`run_agent.py` ~2104) → `MemoryManager.sync_all` → `provider.sync_turn`, plus `queue_prefetch_all` to warm the next turn.

With `skip_memory=True`, these passive writes no longer fire for PARENT-handled turns. Concretely, what stops accumulating in Lore from the parent path: the automatic `(user, assistant)` turn-pair transcript capture and the next-turn prefetch warming.

What is not lost:
- Explicit `memory(action="add")` writes and `kb_add` via the MCP knowledge server.
- The generic delegated child: `_build_child_agent` already sets `skip_memory=True` (and re-binds built-in `MEMORY.md` / `USER.md` from the parent), so it is unchanged.

Net: passive parent-transcript accumulation into Lore stops; child-driven memory and all explicit writes continue. This is the one real behavioral change, flagged for a human to accept before deploy.
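
The tradeoff can be illustrated with a small sketch. The call chain names (`sync_all`, `sync_turn`, and a post-turn sync hook) follow the description above, but `RecordingProvider`, `SketchMemoryManager` and the function body are hypothetical simplifications, not the code in `run_agent.py` or `memory_manager.py`:

```python
class RecordingProvider:
    """Stand-in for an external memory provider; stores turns in a list, not in Lore."""

    def __init__(self):
        self.turns = []

    def sync_turn(self, user_text, assistant_text):
        self.turns.append((user_text, assistant_text))


class SketchMemoryManager:
    """Hypothetical manager that fans a finished turn out to every provider."""

    def __init__(self, providers):
        self.providers = providers

    def sync_all(self, user_text, assistant_text):
        for provider in self.providers:
            provider.sync_turn(user_text, assistant_text)


def sync_external_memory_for_turn(memory_manager, user_text, assistant_text):
    """Post-turn hook: becomes a no-op when no memory manager was built."""
    if memory_manager is None:
        return False  # skip_memory=True path: nothing is written passively
    memory_manager.sync_all(user_text, assistant_text)
    return True


provider = RecordingProvider()
with_memory = sync_external_memory_for_turn(SketchMemoryManager([provider]), "hi", "hello")
without_memory = sync_external_memory_for_turn(None, "hi", "hello")
```

Only the first call records a turn pair; the second, which models a parent built with `skip_memory=True`, writes nothing. That silent no-op is the behavioral change a reviewer is being asked to accept.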
Parent construction points changed
| File | Function | Surface |
|---|---|---|
| `gateway/platforms/api_server.py` | `_create_agent` | HTTP `/v1/chat/completions` |
| `gateway/run.py` | `_run_agent` | Telegram/Discord/Slack inbound |
| `gateway/run.py` | `_run_background_task` | Gateway background tasks |
| `tui_gateway/server.py` | `_make_agent` | TUI / dashboard chat (was conditional on `HERMES_IGNORE_RULES`; now forced `True` for the parent) |
| `tui_gateway/server.py` | `_background_agent_kwargs` | TUI background tasks |

Construction points intentionally not changed:
- `tools/delegate_tool.py` `_build_child_agent`: already `skip_memory=True` (children); left as-is so child memory scoping is unchanged.
- `gateway/platforms/feishu_comment.py`, gateway `_handle_compress_command`, the gateway hygiene/compaction agent, background_review/curator: already `skip_memory=True`.

Flagged for reviewer
- The ACP adapter (`acp_adapter/session.py` ~624) is another user-facing top-level parent that currently builds a memory manager and pays the same ~4.3s prefetch. It was outside the stated scope (api_server / telegram / TUI), so it is left unchanged here; call out if you want it included.

Tests
- `tests/gateway/test_api_server_toolset.py`: assert `_create_agent` passes `skip_memory=True`.
- `tests/gateway/test_orchestrator_skip_memory.py` (new): assert the TUI parent helpers pass `skip_memory=True`; assert a real `AIAgent(skip_memory=True)` yields `_memory_manager is None`; assert the delegated-child construction carries its own `skip_memory` kwarg (parent value does not leak into the child).

Targeted run: `17 passed`. Full `tests/gateway/` suite: `5862 passed` with 9 pre-existing, unrelated failures (telegram-markdown-escaping + wecom-callback test-isolation flakes) that reproduce identically on `main` without this change.

🤖 Generated with Claude Code