feat(delegate): add model_hint parameter for per-task model routing #37966
Open
huu951008-gif wants to merge 3 commits into
Borrowed from Sisyphus Labs' OMO tool (lazyclaudecode), which uses a model layering pattern: opus for decisions, haiku for reconnaissance. Lets parents route children to a different model than their own, using short names like "haiku"/"opus"/"sonnet" or full model ids.
- New `_resolve_model_hint()` helper + `_is_full_model_id()` detector
- `delegate_task()` signature: add `model_hint: Optional[str] = None`
- Top-level + per-task resolution (a per-task model_hint beats the top-level model_hint)
- Backward compatible: defaults to None, 135 existing delegate tests pass
Tests: 20 new (TestIsFullModelId / TestResolveModelHint / TestDelegateTaskSignature / TestDelegateTaskModelResolution) + 135 original delegate tests = 155 passed.
See multi-agent-harness/SKILL.md section 11.3 (OMO P0 roadmap).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…11.2)
Borrowed from Sisyphus Labs' OMO tool ultragoal + start-work-continuation
mechanism. Solves the problem of lost intermediate results after
delegate_task completes — supports long-task resume and evidence audit.
Key design:
- ~/.hermes/orchestrator/boulder/{task_id}.json — task progress (steps, status, breakpoints)
- ~/.hermes/orchestrator/ultragoal/{task_id}/ — goal + embedded success/anti criteria
- ultragoal/.../evidence/step_NNN.json — per-step evidence with source tag
- ultragoal/.../audit.md — human-readable audit report
The evidence source field (一手 firsthand / 二手 secondhand / 推断 inferred /
工具输出 tool output / 用户输入 user input) is the key innovation. It addresses
Mavis' 'goal drift' / 'secondhand source passed as firsthand' failure mode:
different source types get distinct emoji markers in audit.md so reviewers
can spot drift at a glance.
Public API:
- OrchestratorContext (context manager — auto-finish on exit/exception)
- make_task_id / create_boulder / load_boulder / save_boulder
- append_step / complete_step / finish_boulder / list_boulders
- set_goal / load_goal / record_evidence / write_audit
Thread-safe (uses lock around all file ops).
Atomic writes (.tmp + rename) to prevent half-written state.
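The lock-plus-rename pattern named in the two lines above can be sketched roughly as follows. This is a minimal illustration, assuming a single module-level lock and JSON payloads; `atomic_write_json`, `read_json`, and the demo path are hypothetical names, not the contents of hermes_orchestrator.py.

```python
import json
import os
import tempfile
import threading
from pathlib import Path

# Hypothetical module-level lock guarding every file operation.
_FILE_LOCK = threading.Lock()


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write payload to a sibling .tmp file, then rename it over the target."""
    tmp_path = path.with_name(path.name + ".tmp")
    with _FILE_LOCK:
        path.parent.mkdir(parents=True, exist_ok=True)
        with open(tmp_path, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, ensure_ascii=False, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        # os.replace is atomic when source and target share a filesystem,
        # so readers see either the old file or the new one, never a partial write.
        os.replace(tmp_path, path)


def read_json(path: Path) -> dict:
    """Read a JSON file under the same lock used for writes."""
    with _FILE_LOCK:
        with open(path, "r", encoding="utf-8") as fh:
            return json.load(fh)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        target = Path(root) / "boulder" / "demo_task.json"
        atomic_write_json(target, {"status": "running", "steps": []})
        atomic_write_json(target, {"status": "done", "steps": ["step_001"]})
        print(read_json(target)["status"])
```

A single lock serializes all writers inside one process; it does not protect against a second process writing the same file, which the rename step only partly mitigates.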
Tests: 22 unit tests covering CRUD, steps, ultragoal, audit, context
manager (auto-finish on success and exception), list_boulders filters.
See multi-agent-harness/SKILL.md section 11.2 (OMO P1 roadmap).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Borrowed from Sisyphus Labs' OMO ultrawork keyword mechanism: detect
task-type keywords in the goal string and auto-inject the corresponding
system-prompt template + recommended model_hint.
Supported keywords (English + Chinese):
/research / 调研 → research template + sonnet
/implement / 实施 → implement template + opus
/review / 审查 → review template + opus
/critic / 挑刺 → critic template (Momus) + opus
/workflow / 编排 → workflow template + sonnet
/explore / 扫一下 → explore template (read-only) + haiku
Caller-supplied model_hint always wins (overrides keyword recommendation).
Templates are opt-in: if no keyword is detected, delegate_task behaves
identically to before.
Regex strategy:
- English triggers: \b word boundary
- Chinese triggers: no lookbehind/lookahead (adjacency between Chinese characters
causes false positives like "调研一" being misread as "调研"+boundary)
- Length-descending matching (longer prefix wins)
Public API:
detect_task_type(goal) -> Optional[(template_key, model_hint, desc)]
inject_template(goal, task_type=..., context=...) -> str
Tests: 27 unit tests covering keyword detection (English + Chinese + edge
cases), template injection, integration with delegate_task (auto
model_hint set, caller override wins).
See multi-agent-harness/SKILL.md section 11.1 (OMO P1 roadmap).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
agt-user pushed a commit to agt-user/hermes-agent that referenced this pull request on Jun 9, 2026
Allow individual tasks in the array to specify their own and/or , overriding the global delegation config for that task only.
When a per-task override is present, is called with a task-scoped config so that base_url, api_key, and api_mode are derived correctly from the per-task provider, not the global delegation config. Tasks without overrides fall back to the pre-resolved dict as before (no regression for existing usage).
Changes:
- : add and fields to the per-task object inside array
- loop: resolve per-task credentials when override is present; otherwise reuse global (zero overhead)
Closes NousResearch#35437
Related: NousResearch#34489, NousResearch#31537, NousResearch#36790, NousResearch#30388, NousResearch#37966
Summary
Three OMO-inspired improvements bundled for atomic review:
- `delegate_task` `model_hint` parameter (OMO 11.3, P0): lets a parent route a child to a different model using short names like `haiku`/`opus` or full IDs.
- `hermes_orchestrator` boulder + ultragoal (OMO 11.2, P1): persistent evidence chain for every delegated task.
- Task-type keyword routing (OMO 11.1): triggered by keywords such as `/research`, `/critic`, etc.
Why
Borrowed from Sisyphus Labs' OMO tool (`lazyclaudecode/oh-my-openagent`). It addresses three real gaps in `delegate_task`.
Changes
- `tools/delegate_tool.py`: add `model_hint` parameter + task-type keyword detection (~+200 lines)
- `hermes_orchestrator.py` (new): `OrchestratorContext` + boulder + ultragoal + audit (590 lines)
- `hermes_task_templates.py` (new): keyword regex + 6 task-type templates (320 lines)
- `tests/test_hermes_orchestrator.py` (new): 22 unit tests
- `tests/test_hermes_task_templates.py` (new): 27 unit tests
- `tests/tools/test_delegate_model_hint.py` (new): 20 unit tests
Test Results
Plus full delegate regression: 177 passed (was 155, +27 new templates +22 orchestrator = 204).
Backward Compatibility
✅ All new features are opt-in:
- `model_hint=None` → identical behavior to before
- `hermes_orchestrator.py` is a separate module (not auto-imported in `delegate_task`)
- The import of `hermes_task_templates.py` is wrapped in `try/except ImportError`: if the module is missing, `delegate_task` works as before
Usage Examples
1. model_hint (OMO 11.3)
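A rough sketch of how a hint could resolve to a model, with the per-task value taking priority over the top-level one. The alias table, the placeholder model ids, and the "contains a slash" heuristic are all assumptions made for illustration; the function names below are hypothetical stand-ins and do not reproduce the PR's `_resolve_model_hint()` or `_is_full_model_id()`.

```python
from typing import Optional

# Hypothetical alias table: the real short-name mapping is not shown in this PR text.
MODEL_ALIASES = {
    "haiku": "example-provider/haiku-model",
    "sonnet": "example-provider/sonnet-model",
    "opus": "example-provider/opus-model",
}


def is_full_model_id(hint: str) -> bool:
    """Assumed heuristic: treat anything containing '/' as a full model id."""
    return "/" in hint


def resolve_model_hint(hint: Optional[str], parent_model: str) -> str:
    """Map a short name or full id to a model id; None keeps the parent's model."""
    if hint is None:
        return parent_model
    cleaned = hint.strip()
    if not cleaned:
        return parent_model
    if is_full_model_id(cleaned):
        return cleaned
    # Unknown short names fall back to the parent's model in this sketch.
    return MODEL_ALIASES.get(cleaned.lower(), parent_model)


def model_for_task(task: dict, top_level_hint: Optional[str], parent_model: str) -> str:
    """A per-task model_hint beats the top-level model_hint."""
    hint = task.get("model_hint") or top_level_hint
    return resolve_model_hint(hint, parent_model)


if __name__ == "__main__":
    parent = "example-provider/opus-model"
    tasks = [
        {"goal": "scan the repo layout", "model_hint": "haiku"},
        {"goal": "decide on the refactor plan"},
    ]
    for task in tasks:
        print(task["goal"], "->", model_for_task(task, "sonnet", parent))
```

Falling back to the parent's model for an unknown short name is one possible choice; raising an error instead would surface typos earlier.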
2. boulder + ultragoal (OMO 11.2)
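The auto-finish behavior and the per-step evidence tagging might look something like the sketch below. It keeps everything in memory rather than writing under ~/.hermes/orchestrator/, and `TrackedTask`, `TaskRecord`, and the "running"/"done"/"failed" status strings are hypothetical; the real `OrchestratorContext` API may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Source tags as listed in this PR:
# firsthand, secondhand, inferred, tool output, user input.
VALID_SOURCES = {"一手", "二手", "推断", "工具输出", "用户输入"}


@dataclass
class TaskRecord:
    """Hypothetical in-memory stand-in for a boulder file plus its evidence list."""
    task_id: str
    goal: str
    status: str = "running"
    error: Optional[str] = None
    evidence: List[dict] = field(default_factory=list)


class TrackedTask:
    """Marks the task done on normal exit and failed when an exception escapes."""

    def __init__(self, task_id: str, goal: str) -> None:
        self.record = TaskRecord(task_id=task_id, goal=goal)

    def __enter__(self) -> "TrackedTask":
        return self

    def record_evidence(self, summary: str, source: str) -> None:
        """Append one evidence entry; reject source tags outside the known set."""
        if source not in VALID_SOURCES:
            raise ValueError(f"unknown evidence source: {source!r}")
        step = len(self.record.evidence) + 1
        self.record.evidence.append(
            {"step": f"step_{step:03d}", "summary": summary, "source": source}
        )

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            self.record.status = "done"
        else:
            self.record.status = "failed"
            self.record.error = str(exc)
        return False  # never swallow the exception


if __name__ == "__main__":
    with TrackedTask("demo-001", "summarize the design doc") as task:
        task.record_evidence("read docs/design.md directly", source="一手")
        task.record_evidence("child agent reported the API is stable", source="二手")
    print(task.record.status, [e["source"] for e in task.record.evidence])
```

Validating the source tag at record time means an audit report never has to guess how a piece of evidence was obtained.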
The `source` field (一手 firsthand / 二手 secondhand / 推断 inferred / 工具输出 tool output / 用户输入 user input) addresses the Mavis paper's "secondhand source passed as firsthand" failure mode: different source types get distinct emoji markers in audit.md so reviewers can spot goal drift at a glance.
3. Task-type keyword routing (OMO 11.1)
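Keyword detection along these lines could be sketched as below. The trigger list comes from the keywords in this PR; everything else is an assumption: `detect_keyword` and `effective_model_hint` are hypothetical names, the sketch returns two fields where the PR's `detect_task_type` returns three, and placing the word boundary after English triggers is a guess at the regex strategy.

```python
import re
from typing import Optional, Tuple

# (trigger, template_key, recommended model_hint), from the keyword list in this PR.
TRIGGERS = [
    ("/research", "research", "sonnet"), ("调研", "research", "sonnet"),
    ("/implement", "implement", "opus"), ("实施", "implement", "opus"),
    ("/review", "review", "opus"), ("审查", "review", "opus"),
    ("/critic", "critic", "opus"), ("挑刺", "critic", "opus"),
    ("/workflow", "workflow", "sonnet"), ("编排", "workflow", "sonnet"),
    ("/explore", "explore", "haiku"), ("扫一下", "explore", "haiku"),
]


def _compile(trigger: str) -> re.Pattern:
    escaped = re.escape(trigger)
    if trigger.isascii():
        # Assumption: boundary after the trigger, so "/reviewer" does not match "/review".
        return re.compile(escaped + r"\b", re.IGNORECASE)
    # Chinese triggers: plain substring match, no lookaround.
    return re.compile(escaped)


# Longest trigger first, so a longer keyword is tried before a shorter one.
_PATTERNS = [
    (_compile(trigger), key, hint)
    for trigger, key, hint in sorted(TRIGGERS, key=lambda t: len(t[0]), reverse=True)
]


def detect_keyword(goal: str) -> Optional[Tuple[str, str]]:
    """Return (template_key, model_hint) for the first matching trigger, else None."""
    for pattern, key, hint in _PATTERNS:
        if pattern.search(goal):
            return key, hint
    return None


def effective_model_hint(goal: str, caller_hint: Optional[str] = None) -> Optional[str]:
    """A caller-supplied hint wins; otherwise use the keyword's recommendation, if any."""
    if caller_hint is not None:
        return caller_hint
    detected = detect_keyword(goal)
    return detected[1] if detected else None


if __name__ == "__main__":
    print(effective_model_hint("/explore the repo layout"))
    print(effective_model_hint("/explore the repo layout", caller_hint="opus"))
```

When no trigger matches and the caller passes nothing, the function returns None, which mirrors the opt-in behavior described for the templates.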
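Caller-supplied `model_hint` always wins.
Supported Keywords
/research / 调研 → research template + sonnet
/implement / 实施 → implement template + opus
/review / 审查 → review template + opus
/critic / 挑刺 → critic template (Momus) + opus
/workflow / 编排 → workflow template + sonnet
/explore / 扫一下 → explore template (read-only) + haiku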
Thread Safety
- `hermes_orchestrator`: `threading.Lock()` around all file ops, atomic writes (.tmp + rename)
- `hermes_task_templates`: pure functions, no I/O
- `delegate_task`: `model_hint` additions are pure value-passing, no shared state
Related
- /tmp/omo_research.html
- ~/.hermes/skills/multi-agent-harness/SKILL.md
Generated with Claude Code