feat(delegate): subagent architecture v2#3387
Conversation
Adds three tools for querying a self-hosted OpenViking server: - viking_search: semantic search over memories, resources, and skills (auto/fast/deep modes) - viking_read: read content at a viking:// URI with configurable detail levels (abstract / overview / read) - viking_browse: explore the OpenViking filesystem layout (tree/list/stat) All tools are gated behind check_fn that verifies the server is reachable, so they are silently absent when OpenViking is not running — same pattern as Honcho. The 'openviking' named toolset is also added to TOOLSETS and to the shared tools list so gateway and CLI pick them up automatically. Configure via: OPENVIKING_ENDPOINT (default: http://127.0.0.1:1933) OPENVIKING_API_KEY (optional, for secured deployments) Closes NousResearch#3368
/model was fully implemented in hermes_cli/model_switch.py and hermes_cli/main.py but was never registered in COMMAND_REGISTRY, causing it to be absent from /help output and Telegram bot commands. Fixes NousResearch#3371
…ceful no-op without structured memory
…nject skill content in child system prompt
…ic cache, detailed trace, opt-in checkpointing
Mibayy
left a comment
There was a problem hiding this comment.
Code Review — feat(delegate): subagent architecture v2
Reviewed by Hermes Agent. Overall the PR is solid — correct behavior, good test coverage, clean module separation. Found 2 bugs, 4 warnings, and 6 suggestions.
Summary
New modules — delegate_dag.py, delegate_checkpoint.py, delegate_blackboard.py, openviking_tool.py — are all well-scoped and independently testable. The wiring in delegate_tool.py is the complex part; see inline comments.
Verdict: Changes Requested — two issues must be fixed before merge (race condition on DAG results read, and double config load on depth check). The rest are improvements.
Mibayy
left a comment
There was a problem hiding this comment.
Self-review — feat(delegate): subagent architecture v2
Reviewed commit: 1cde0fca. 127 tests passing. Found 2 bugs to fix before merge, plus warnings and suggestions.
🔴 Must fix
1. Race condition on results list in DAG mode (delegate_tool.py ~line 893)
_run_task reads the shared results list inside a ThreadPoolExecutor to build completed_so_far for DAG dependency injection — no lock. In batch mode, concurrent tasks can read a stale snapshot and miss predecessor summaries.
Fix: protect with a threading.Lock or use a separate thread-safe dict:
_results_lock = threading.Lock()
_completed: dict[str, dict] = {}
def _run_task(i, t):
if dag_enabled:
with _results_lock:
completed_so_far = dict(_completed)
resolved_task = resolve_deps(t, completed_so_far)
...
with _results_lock:
_completed[str(i)] = result
return result2. _get_max_depth() called twice per depth check (delegate_tool.py ~line 774-777)
Each call loads the config file from disk. Cache in a local:
max_depth = _get_max_depth()
if depth >= max_depth:
return json.dumps({"error": f"Delegation depth limit reached ({max_depth})..."})⚠️ Warnings
3. CheckpointStore._connect() — new connection per call
check_same_thread=False is correct but creates connection churn under parallel checkpoint saves. A single threading.Lock + reused connection, or WAL mode + connection pool, would be cleaner.
4. queue.pop(0) in topological_sort is O(n)
collections.deque + popleft() is idiomatic O(1). Fine for 3-task batches but worth fixing.
5. Critic agent gets toolsets=['terminal']
A critic that only reads a summary string has no need for terminal access. toolsets=[] is more minimal and avoids unintended write capabilities in the critic subagent.
6. openviking_tool.py — no error handling on r.json()
A 502 from a proxy returns non-JSON and raises an unhandled json.JSONDecodeError. Wrap in try/except:
try:
data = r.json()
except Exception:
return json.dumps({"error": f"Non-JSON response from server: {r.status_code}"})💡 Suggestions
7. _load_skill_content — add debug log on miss
logger.debug(f"Skill '{name}' not found") helps users diagnose typos in skill names.
8. _format_structured_context — key capitalization inconsistent
Known keys get title-case headers; unknown keys get key.capitalize() which leaves underscores: my_custom_key → My_custom_key. Either strip underscores or normalize all headers the same way.
9. Critic summary truncation — magic number
summary[:4000] should be a named constant (_CRITIC_MAX_SUMMARY_CHARS = 4000).
10. openviking_tool.py — consider splitting to its own PR
It's a separate feature (OpenViking memory integration) bundled here. The current PR is already 3270 lines. Easier to review and roll back independently.
11. docs/plans/ directory — remove before merge
The 4 subagent-v2-phase*.md files are internal planning artifacts. Add to .gitignore or drop from the branch.
✅ Looks good
- Module separation clean: dag / checkpoint / blackboard are independently testable with zero coupling to each other.
- All new features opt-in — backward compatibility preserved for all existing
delegate_taskcallers. _SM_AVAILABLEgraceful no-op pattern applied consistently across cache, hot facts, and memory mode._compute_child_toolsets/_strip_blocked_toolscompat wrapper is correct.- Hot facts injection reads parent's already-loaded
_memory_store— no extra disk I/O. - Test coverage thorough: every new helper has unit tests before integration-level wiring.
- delegate_dag: deque.popleft() replaces queue.pop(0) -- O(1) vs O(n)
- delegate_checkpoint: single persistent conn + threading.Lock, WAL mode
- delegate_tool: _CRITIC_MAX_SUMMARY_CHARS constant, critic toolsets=[],
debug log on skill miss, key.replace('_',' ').title() for unknown keys,
blackboard comment
- openviking_tool: try/except on r.json() in search and read paths
- Remove docs/plans/subagent-v2-*.md planning artifacts, gitignore them
Follow-up — all review items resolved (commit
|
…for subagents
- Add DELEGATION_GUIDANCE to prompt_builder.py: injected into system prompt
when delegate_task is available. Instructs Hermes to act as a manager agent
— decompose implementation tasks (3+ steps) and delegate rather than executing
directly. Criteria: feature impl, multi-file changes, DAG-able deps, 5+ tool
calls. Exceptions: quick lookups, iterative debugging, direct replies.
- Add MCP workspace guard (mcp_tool.py):
- Thread-local _tl.subagent_guard flag, off by default
- _SUBAGENT_BLOCKED_TOOL_NAMES: {set_project_root, switch_project, reindex}
- _make_tool_handler checks flag before dispatching — blocked tools return
a clear error without touching the MCP session
- set_subagent_mcp_guard(active) public API for delegate_tool.py
- Wire guard into delegate_tool.py:
- Import set_subagent_mcp_guard
- set_subagent_mcp_guard(True) before child.run_conversation()
- set_subagent_mcp_guard(False) in finally block (always restored)
- Thread-local: each subagent thread isolated, parent unaffected
- Tests (test_mcp_subagent_guard.py, 10 tests):
- Blocked names coverage
- Read-only tools not blocked
- Block active when guard on, pass-through when guard off
- Thread-local isolation (guard in main does not leak to child thread)
|
Thanks for the thorough work here, Mibayy — there's clearly a lot of thought behind this. We're closing this PR for now, but want to explain the reasoning in detail so you can address these concerns if you'd like to reopen or submit a new PR. Overall Design ConcernThe core question we kept coming back to: what real problem does this solve? The current DAG dependencies defeat the purpose of delegation. If task B depends on task A's output, they run sequentially. The parent agent could just do them in order and keep full context between steps. Delegation's value is parallelism for independent work. Sequential delegation with dependency injection is strictly worse than the parent doing it — you lose context continuity and pay subagent overhead. Blackboard has a timing problem. Siblings run in parallel. By the time one writes a key, the others may already be done or past the point where that information would help. Non-deterministic timing makes this unreliable for anything important. Generator-critic doubles cost for something the parent already does. The parent agent reads every subagent's summary and decides what to do next — that IS the verification step. Spawning a separate critic subagent to check the work first is redundant and doubles API cost and wall-clock time. Retry is redundant with parent-level re-delegation. The parent sees failed results and can re-delegate with adjusted context. Building retry into the delegation layer adds complexity in a place where it's harder to observe and debug. Checkpointing is infrastructure without a consumer. The PR saves subagent state to SQLite but nothing ever resumes from a checkpoint. It's a storage layer for a feature that doesn't exist yet. Manager-first delegation guidance (DELEGATION_GUIDANCE) is the most concerning change. It injects a prompt telling Hermes to act as a "manager agent" and prefer delegating over direct execution — on by default, no config to disable. This would fundamentally change agent behavior for all users and frustrate people who want the agent to just do things. What IS Valuable
Bundled Unrelated ChangesThe PR also includes several unrelated pieces that should be separate PRs:
Test IssuesNearly all test files hardcode Path ForwardYou're welcome to address these concerns and open new PRs. Our suggestion:
Thanks again for the effort here. The code quality on the individual pieces is solid — it's the overall design direction we want to think through more carefully. |
… feedback Teknium feedback: generator-critic doubles cost for what the parent already does, retry is redundant with parent-level re-delegation. Changes: - Disabled _run_with_verify() calls (3 locations) - Disabled _run_with_retry() via 'if False' guards (3 locations) - Removed verify and depends_on from tool description - Functions remain in codebase for future opt-in use This keeps the feat/subagent-v2 branch for personal use while aligning with upstream direction.
feat(delegate): subagent architecture v2
Extends
delegate_taskwith a set of opt-in features for multi-agent coordination, quality control, and resilience. All existing calls work unchanged.What this PR adds
Coordination
skillsfield on each task: inject skill content directly into child system promptblackboard.enabled: shared thread-safe key-value store visible to all siblings in a batchdag.enabled+depends_onfield: topological sort with predecessor summary injectionQuality
verify: trueon a task (orverify.enabledglobally): run a critic subagent after the generator, attachesverdict(valid/invalid/unknown) andcritic_summaryto the resultretry.max_retries: retry failed tasks with the previous error injected as contextObservability and resilience
observability.detailed_trace: per-tool timing, result size, and error status intool_tracecheckpoint.enabled: save subagent state to SQLite every N iterations (uses~/.hermes/state.db)semantic_cache: hook for structured memory dedup (graceful no-op without feat(memory): native structured memory — typed SQLite fact store with FTS5 #3093)Config
DEFAULT_CONFIG['delegation']with safe defaultsmax_depthis now configurable (default 2, unchanged from prior behavior)memory_access(none/read/read-write) controls whether subagents can access structured memory (graceful no-op without feat(memory): native structured memory — typed SQLite fact store with FTS5 #3093)Manager-first strategy (new)
DELEGATION_GUIDANCEadded toprompt_builder.py: injected into the system prompt wheneverdelegate_taskis availableMCP workspace guard (new)
set_project_root/switch_project/reindexcall inside a subagent would corrupt the shared codebase-index state for all siblings and the parent_tl.subagent_guard) activated in_run_single_childbeforechild.run_conversation(), always restored infinally_make_tool_handlerchecks the flag and returns a clear error for the three blocked tool names — parent and sibling threads are unaffectedfind_symbol,search_codebase,get_structure_summary, etc.) work normally inside subagentsRelationship to other PRs
memory_accessandsemantic_cacheare graceful no-ops when feat(memory): native structured memory — typed SQLite fact store with FTS5 #3093 is not merged. When it is, subagents can read or write typed facts directly.skills=field in each task is exactly what makes that skill composable with delegation. Passskills=["codebase-index"]to a subagent and it will usesearch_codebase/find_symbolinstead of grep/cat -- same 99% token savings, now available inside delegated tasks.Dependency matrix
New files
Tests
70 new tests, all passing.
Example: subagent with codebase-index skill (pairs with #3294)
{ "tasks": [ { "goal": "Find all callers of send_message and check for missing error handling", "skills": ["codebase-index"], "toolsets": ["terminal", "file"] } ] }The subagent will call
get_dependents("send_message")andget_function_sourceinstead of grep/cat -- the token savings from #3294 apply inside delegated tasks too.Example: full DAG batch
{ "tasks": [ { "id": "design", "goal": "Design the notification system schema", "skills": ["codebase-index", "nextjs-supabase-notifications"], "verify": true }, { "id": "impl", "goal": "Implement the notification system", "depends_on": ["design"], "skills": ["codebase-index", "test-driven-development"], "verify": true } ] }Config: