feat(delegate): add model_hint parameter for per-task model routing#37966

Open
huu951008-gif wants to merge 3 commits into
NousResearch:mainfrom
huu951008-gif:feat/delegate-model-hint
@huu951008-gif huu951008-gif commented Jun 3, 2026

Summary

Three OMO-inspired improvements bundled for atomic review:

  1. delegate_task model_hint parameter (OMO 11.3, P0): lets a parent route a child to a different model, using short names like haiku / opus or full model IDs.
  2. hermes_orchestrator boulder + ultragoal (OMO 11.2, P1): a persistent evidence chain for every delegated task.
  3. Task-type keyword routing (OMO 11.1, P1): auto-injects system-prompt templates based on goal keywords such as /research and /critic.

Why

Borrowed from Sisyphus Labs' OMO tool (lazyclaudecode / oh-my-openagent) — addresses three real gaps in delegate_task:

  • No model layering — all subagents inherit parent's model; can't cheaply route "explore this dir" to haiku
  • No persistence — long-task intermediate state is lost; no way to resume or audit
  • No orchestration templates — every task needs a hand-written system prompt; no reusable patterns

Changes

  • tools/delegate_tool.py — add model_hint parameter + task-type keyword detection (~+200 lines)
  • hermes_orchestrator.py (new) — OrchestratorContext + boulder + ultragoal + audit (590 lines)
  • hermes_task_templates.py (new) — keyword regex + 6 task-type templates (320 lines)
  • tests/test_hermes_orchestrator.py (new) — 22 unit tests
  • tests/test_hermes_task_templates.py (new) — 27 unit tests
  • tests/tools/test_delegate_model_hint.py (new) — 20 unit tests

Test Results

$ pytest tests/test_hermes_task_templates.py \
        tests/test_hermes_orchestrator.py \
        tests/tools/test_delegate_model_hint.py \
        tests/tools/test_delegate.py
204 passed, 1 warning in 18.68s

The total breaks down as: 135 original delegate tests + 20 model_hint tests = 155, + 22 orchestrator tests = 177, + 27 template tests = 204.

Backward Compatibility

✅ All new features are opt-in:

  • model_hint=None → identical behavior to before
  • hermes_orchestrator.py is a separate module (not auto-imported in delegate_task)
  • hermes_task_templates.py is wrapped in try/except ImportError — if missing, delegate_task works as before
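
The import guard in the last bullet could look roughly like the sketch below. It assumes only what the commit message states, namely that `hermes_task_templates` exposes `detect_task_type(goal)` returning `(template_key, model_hint, desc)` or `None`. The helper name `recommended_model_hint` is hypothetical and is not taken from the PR.

```python
from typing import Optional

try:
    # Optional module: only present when the templates file is installed.
    from hermes_task_templates import detect_task_type
except ImportError:
    detect_task_type = None  # templates missing: behave as before


def recommended_model_hint(goal: str) -> Optional[str]:
    """Hypothetical helper: keyword-recommended model hint, or None."""
    if detect_task_type is None:
        return None  # no templates module, so no recommendation
    detected = detect_task_type(goal)
    # Assumed return shape: (template_key, model_hint, desc) or None.
    return detected[1] if detected else None
```

When the module is absent, the helper returns `None` for every goal, which matches the "works as before" guarantee.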

Usage Examples

1. model_hint (OMO 11.3)

# Parent on opus, child on haiku (1/20 cost)
delegate_task(
    goal="Scan ~/.hermes/skills/ for contract skills",
    model_hint="haiku",
    toolsets=["file", "terminal"],
)

# Batch with mixed models
delegate_task(tasks=[
    {"goal": "Read the README and summarize it", "model_hint": "haiku"},
    {"goal": "Compare the options and give a primary recommendation", "model_hint": "opus"},
])
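
As a rough illustration of how a short name or full ID might be resolved, here is a minimal sketch. It is not the PR's `_resolve_model_hint()` implementation: the alias table, the placeholder IDs, and the "not a known short name means full ID" heuristic are all assumptions. Only `claude-haiku-4-5` appears elsewhere in this PR, and its pairing with `haiku` is assumed.

```python
from typing import Optional

# Hypothetical alias table. Only "claude-haiku-4-5" appears in this PR;
# the other two values are placeholders, not real model IDs.
SHORT_NAMES = {
    "haiku": "claude-haiku-4-5",
    "sonnet": "<full-sonnet-model-id>",
    "opus": "<full-opus-model-id>",
}


def is_full_model_id(hint: str) -> bool:
    """Assumed heuristic: anything that is not a known short name is a full ID."""
    return hint.strip().lower() not in SHORT_NAMES


def resolve_model_hint(hint: Optional[str], parent_model: str) -> str:
    """Return the model a child should run on."""
    if hint is None or not hint.strip():
        return parent_model  # no hint: inherit the parent's model
    if is_full_model_id(hint):
        return hint.strip()  # pass full IDs through unchanged
    return SHORT_NAMES[hint.strip().lower()]
```

With this sketch, `resolve_model_hint(None, parent_model)` returns the parent's model, mirroring the `model_hint=None` backward-compatibility guarantee.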

2. boulder + ultragoal (OMO 11.2)

from hermes_orchestrator import OrchestratorContext

with OrchestratorContext(
    "Scan the directory for contract skills",
    model="claude-haiku-4-5",
    success_criteria=["Find every contract-related skill path"],
    anti_criteria=["Do not modify files"],
) as ctx:
    ctx.append_step(tool="terminal", args_summary="ls ~/.hermes/skills")
    ctx.complete_step(result_summary="Found 2 files")
    ctx.record_evidence(
        source="工具输出",  # source tag meaning "tool output"
        content="ls returned contract-review/, contract-review-intent/",
        citation="terminal: ls ~/.hermes/skills/",
    )
# audit.md is generated automatically at ~/.hermes/orchestrator/ultragoal/{task_id}/audit.md

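The source field takes one of five tags: 一手 (firsthand), 二手 (secondhand), 推断 (inferred), 工具输出 (tool output), or 用户输入 (user input). It addresses the failure mode described in the Mavis paper, where a secondhand source is passed off as firsthand: each source type gets a distinct emoji marker in audit.md, so reviewers can spot goal drift at a glance.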
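
To make the marker idea concrete, the sketch below renders one evidence entry as an audit.md line. The PR does not list which emoji it uses or how lines are laid out, so the marker table, the fallback marker, and the line format here are all placeholders chosen for illustration.

```python
# Hypothetical marker table: the PR says each source type gets a distinct
# emoji but does not say which, so these choices are placeholders.
SOURCE_MARKERS = {
    "一手": "🟢",      # firsthand
    "二手": "🟡",      # secondhand
    "推断": "🟠",      # inferred
    "工具输出": "🔧",  # tool output
    "用户输入": "👤",  # user input
}


def format_evidence_line(source: str, content: str, citation: str) -> str:
    """Render one evidence record as a Markdown bullet (assumed layout)."""
    marker = SOURCE_MARKERS.get(source, "❓")  # unknown tags stay visible
    return f"- {marker} [{source}] {content} ({citation})"
```

Keeping the raw tag next to the marker means the line stays readable even where emoji do not render.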

3. Task-type keyword routing (OMO 11.1)

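# Trigger keywords are detected automatically → template injected + model recommended
delegate_task(goal="/research the current AI agent ecosystem")   # → sonnet + research template
delegate_task(goal="/critic review this plan")                   # → opus + Momus 7-dimension critique template
delegate_task(goal="扫一下 skills/ 找 contract")                  # Chinese trigger 扫一下 ("take a quick scan of skills/ for contract") → haiku + read-only recon template
delegate_task(goal="/implement build a CLI tool")                # → opus + 5-step implementation template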

Caller-supplied model_hint always wins:

delegate_task(goal="/critic the plan", model_hint="haiku")  # haiku takes precedence
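
The precedence rules stated in this PR (per-task hint beats top-level hint, and any caller-supplied hint beats the keyword recommendation) can be summarized in a few lines. The function name and signature below are hypothetical and only illustrate the ordering.

```python
from typing import Optional


def pick_model_hint(
    task_hint: Optional[str],
    top_level_hint: Optional[str],
    keyword_hint: Optional[str],
) -> Optional[str]:
    """Return the first hint that is set; None means inherit the parent's model."""
    # Ordered from highest to lowest priority.
    for candidate in (task_hint, top_level_hint, keyword_hint):
        if candidate:
            return candidate
    return None
```

For the example above, `pick_model_hint(None, "haiku", "opus")` yields `"haiku"`, so the `/critic` recommendation of opus is overridden.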

Supported Keywords

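| English trigger | Chinese trigger | Template | Default model |
|---|---|---|---|
| /research | 调研 | research (4-step multi-source research) | sonnet |
| /implement | 实施 | implement (5-step plan + regression) | opus |
| /review | 审查 | review (multi-perspective, must reach a verdict) | opus |
| /critic | 挑刺 | critic (Momus 7-dimension critique) | opus |
| /workflow | 编排 | workflow (multi-agent orchestration) | sonnet |
| /explore | 扫一下 | explore (read-only quick recon) | haiku |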
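
A minimal sketch of keyword detection over this table follows. It mirrors the regex strategy described in the commit message (word boundary for English triggers, plain match for Chinese triggers, longer triggers tried first), but it is not the code in hermes_task_templates.py: the function name is hypothetical, and it returns a simplified `(template_key, model)` pair rather than the documented 3-tuple.

```python
import re
from typing import List, Optional, Tuple

# (English trigger, Chinese trigger, template key, default model), per the table.
TRIGGERS = [
    ("/research", "调研", "research", "sonnet"),
    ("/implement", "实施", "implement", "opus"),
    ("/review", "审查", "review", "opus"),
    ("/critic", "挑刺", "critic", "opus"),
    ("/workflow", "编排", "workflow", "sonnet"),
    ("/explore", "扫一下", "explore", "haiku"),
]


def _build_patterns() -> List[Tuple[str, "re.Pattern[str]", str, str]]:
    patterns = []
    for english, chinese, key, model in TRIGGERS:
        # English: trailing \b so "/reviewer" does not match "/review".
        patterns.append(
            (english, re.compile(re.escape(english) + r"\b", re.IGNORECASE), key, model)
        )
        # Chinese: plain match, no lookbehind/lookahead.
        patterns.append((chinese, re.compile(re.escape(chinese)), key, model))
    # Try longer triggers first.
    patterns.sort(key=lambda item: len(item[0]), reverse=True)
    return patterns


_PATTERNS = _build_patterns()


def detect_task_type_sketch(goal: str) -> Optional[Tuple[str, str]]:
    """Return (template_key, default_model) for the first matching trigger."""
    for _trigger, pattern, key, model in _PATTERNS:
        if pattern.search(goal):
            return key, model
    return None
```

Because triggers are tried longest first, a goal such as `/critic 审查这个方案` resolves to the critic template even though it also contains the review trigger 审查.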

Thread Safety

  • hermes_orchestrator: threading.Lock() around all file ops, atomic writes (.tmp + rename)
  • hermes_task_templates: pure functions, no I/O
  • delegate_task: model_hint additions are pure value-passing, no shared state
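
The lock-plus-atomic-write pattern from the first bullet can be sketched as follows. This is an illustration under assumptions, not the code in hermes_orchestrator.py: the function name, the single module-level lock, and the `fsync` call are choices made here.

```python
import json
import os
import threading
from pathlib import Path

# One process-wide lock guarding all file operations (assumed granularity).
_FILE_LOCK = threading.Lock()


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write JSON so that readers never observe a half-written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = path.with_name(path.name + ".tmp")
    with _FILE_LOCK:
        with open(tmp_path, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, ensure_ascii=False, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        # os.replace swaps the file in one step when both paths are on
        # the same filesystem.
        os.replace(tmp_path, path)
```

The lock serializes writers within one process, while the rename protects readers (including other processes) from seeing partial content.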

Related

  • multi-agent-harness/SKILL.md, section 11 (roadmap items 11.1/11.2/11.3)
  • OMO tool research report (/tmp/omo_research.html)
  • Skill roadmap (local): ~/.hermes/skills/multi-agent-harness/SKILL.md

Generated with Claude Code

Hermes OMO Contributor and others added 2 commits June 3, 2026 14:32
Borrowed from Sisyphus Labs' OMO tool (lazyclaudecode) — model layering
pattern: opus for decisions, haiku for reconnaissance. Lets parents route
children to a different model than their own, with short names like
"haiku"/"opus"/"sonnet" or full model ids.

- New `_resolve_model_hint()` helper + `_is_full_model_id()` detector
- `delegate_task()` signature: add `model_hint: Optional[str] = None`
- Top-level + per-task resolution (per-task model_hint beats top-level model_hint)
- Backward compatible: defaults to None, 135 existing delegate tests pass

Tests: 20 new (TestIsFullModelId / TestResolveModelHint /
TestDelegateTaskSignature / TestDelegateTaskModelResolution) +
135 original delegate tests = 155 passed.

See multi-agent-harness/SKILL.md section 11.3 (OMO P0 roadmap).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…11.2)

Borrowed from Sisyphus Labs' OMO tool ultragoal + start-work-continuation
mechanism. Solves the problem of lost intermediate results after
delegate_task completes — supports long-task resume and evidence audit.

Key design:
  - ~/.hermes/orchestrator/boulder/{task_id}.json — task progress (steps, status, breakpoints)
  - ~/.hermes/orchestrator/ultragoal/{task_id}/ — goal + embedded success/anti criteria
  - ultragoal/.../evidence/step_NNN.json — per-step evidence with source tag
  - ultragoal/.../audit.md — human-readable audit report

The evidence source field (一手 firsthand / 二手 secondhand / 推断 inferred /
工具输出 tool output / 用户输入 user input) is the key innovation. It addresses
Mavis' 'goal drift' / 'secondhand source passed as firsthand' failure mode:
different source types get distinct emoji markers in audit.md, so reviewers
can spot drift at a glance.

Public API:
  - OrchestratorContext (context manager — auto-finish on exit/exception)
  - make_task_id / create_boulder / load_boulder / save_boulder
  - append_step / complete_step / finish_boulder / list_boulders
  - set_goal / load_goal / record_evidence / write_audit

Thread-safe (uses lock around all file ops).
Atomic writes (.tmp + rename) to prevent half-written state.

Tests: 22 unit tests covering CRUD, steps, ultragoal, audit, context
manager (auto-finish on success and exception), list_boulders filters.

See multi-agent-harness/SKILL.md section 11.2 (OMO P1 roadmap).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

alt-glitch added the type/feature (New feature or request), P3 (Low: cosmetic, nice to have), and tool/delegate (Subagent delegation) labels on Jun 3, 2026

Borrowed from Sisyphus Labs' OMO ultrawork keyword mechanism: detect
task-type keywords in the goal string and auto-inject the corresponding
system-prompt template + recommended model_hint.

Supported keywords (English + Chinese):
  /research / 调研      → research template + sonnet
  /implement / 实施     → implement template + opus
  /review / 审查        → review template + opus
  /critic / 挑刺        → critic template (Momus) + opus
  /workflow / 编排      → workflow template + sonnet
  /explore / 扫一下     → explore template (read-only) + haiku

Caller-supplied model_hint always wins (overrides keyword recommendation).
Templates are opt-in: if no keyword is detected, delegate_task behaves
identically to before.

Regex strategy:
  - English triggers: \b word boundary
  - Chinese triggers: no lookbehind/lookahead (adjacency between Chinese
    characters causes false positives such as "调研一" being misread as
    "调研" + boundary)
  - Length-descending matching (longer prefix wins)

Public API:
  detect_task_type(goal) -> Optional[(template_key, model_hint, desc)]
  inject_template(goal, task_type=..., context=...) -> str

Tests: 27 unit tests covering keyword detection (English + Chinese + edge
cases), template injection, integration with delegate_task (auto
model_hint set, caller override wins).

See multi-agent-harness/SKILL.md section 11.1 (OMO P1 roadmap).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
agt-user pushed a commit to agt-user/hermes-agent that referenced this pull request Jun 9, 2026
Allow individual tasks in the  array to specify their own
 and/or , overriding the global delegation config
for that task only.

When a per-task override is present,
is called with a task-scoped config so that base_url, api_key, and
api_mode are derived correctly from the per-task provider — not the
global delegation config. Tasks without overrides fall back to the
pre-resolved  dict as before (no regression for existing usage).

Changes:
- : add  and  fields to the
  per-task object inside  array
-  loop: resolve per-task credentials when override
  is present; otherwise reuse global  (zero overhead)

Closes NousResearch#35437
Related: NousResearch#34489, NousResearch#31537, NousResearch#36790, NousResearch#30388, NousResearch#37966