feat: Orchestrator Subagents #3

Draft
pefontana wants to merge 35 commits into main from orchestrator-role

@pefontana (Owner) commented Apr 15, 2026

Summary

  • Subagents can now spawn their own subagents. Opt in with role="orchestrator" on delegate_task — the child retains the delegation toolset and can parallelize its own workers. Bounded by delegation.max_spawn_depth (1–3, default 2); flip delegation.orchestrator_enabled: false to disable globally.
  • Higher default parallelism. Batch mode now runs up to 5 concurrent subagents (was 3), hard cap 8. Tune via delegation.max_concurrent_children.
  • Dead config removed. delegation.default_toolsets was documented but never read; removed from the example config and docs. No behavior change — existing configs still parse.

Three delegation-related changes, each self-contained and reviewable in isolation.

Delegation plumbing cleanup

  • Both delegate_task call sites in run_agent.py now route through a single _dispatch_delegate_task helper. Fixes a silent drop of acp_command / acp_args on the main agent loop — those fields were in the schema but never forwarded.
  • DelegateEvent enum with back-compat aliases for the existing progress event strings consumed by the gateway SSE, ACP adapter, and CLI spinner.
  • Default max_concurrent_children: 3 → 5, with an absolute cap of 8 (aligned with OpenClaw's DEFAULT_SUBAGENT_MAX_CONCURRENT). Values above the cap clamp with a warning log.
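
The clamping behaviour in the last bullet can be sketched as follows. This is a minimal illustration, not the PR's code: the function and constant names are hypothetical, and the lower bound of 1 and the fallback on unparsable input are assumptions. Only the default (5) and the cap (8) come from the description above.

```python
import logging

logger = logging.getLogger(__name__)

DEFAULT_MAX_CONCURRENT_CHILDREN = 5  # default stated in the PR description
MAX_CONCURRENT_CHILDREN_CAP = 8      # absolute cap stated in the PR description


def clamp_concurrency(value=None):
    """Return a usable concurrency limit (hypothetical helper)."""
    if value is None:
        return DEFAULT_MAX_CONCURRENT_CHILDREN
    try:
        requested = int(value)
    except (TypeError, ValueError):
        # Assumption: unparsable values fall back to the default.
        logger.warning("invalid max_concurrent_children %r; using default", value)
        return DEFAULT_MAX_CONCURRENT_CHILDREN
    if requested > MAX_CONCURRENT_CHILDREN_CAP:
        logger.warning(
            "max_concurrent_children=%d exceeds cap %d; clamping",
            requested,
            MAX_CONCURRENT_CHILDREN_CAP,
        )
        return MAX_CONCURRENT_CHILDREN_CAP
    # Assumption: at least one child must be allowed to run.
    return max(1, requested)
```

Values from config and from an environment variable can both pass through the same helper, so the warning and the cap stay in one place.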

Remove dead default_toolsets config

  • delegation.default_toolsets was declared in cli.py's CLI_CONFIG, documented in cli-config.yaml.example, and documented in the delegation feature docs, but never consulted at runtime. _load_config() ignored it entirely; the live fallback is the hardcoded DEFAULT_TOOLSETS module constant in tools/delegate_tool.py.
  • Removed from all three surfaces.
  • Regression test in tests/hermes_cli/test_config_drift.py guards against re-introduction.

Orchestrator role

  • New role: "leaf" | "orchestrator" parameter on delegate_task (top-level and per-task in batch mode). Leaf children are unchanged; orchestrator children retain the delegation toolset and receive a role-aware system prompt telling them they can spawn their own workers.
  • delegation.max_spawn_depth (1-3, default 2) bounds the delegation tree — orchestrator requests are silently coerced to leaf when the child would exceed the depth cap.
  • delegation.orchestrator_enabled (default true) is a global kill switch that forces every child to leaf regardless of the per-call role.
  • End-to-end test covers parent → orchestrator (depth 1) → two leaves (depth 2) nesting with full role/toolset/depth invariants.

Follow-ups from review

  • TASK_PROGRESS events relayed upward by nested orchestrators were falling through to the TASK_TOOL_STARTED renderer, which treated the batched summary string as if it were a tool name. Added an explicit TASK_PROGRESS branch with pass-through relay and a distinct render. Reachable only once nesting is enabled.
  • _build_child_progress_callback now accepts DelegateEvent enum values and new-style "delegate.*" strings in addition to the legacy strings.
  • website/docs/guides/delegation-patterns.md updated to match features/delegation.md on nested-delegation opt-in.

Type of Change

  • ✨ New feature (non-breaking change that adds functionality)
  • ♻️ Refactor (no behavior change)
  • 📝 Documentation update
  • ✅ Tests

How to Test

  1. pytest tests/tools/test_delegate.py tests/hermes_cli/test_config_drift.py -v — 102 passing.
  2. Schema: python -c "from tools.delegate_tool import DELEGATE_TASK_SCHEMA as S; assert S['parameters']['properties']['role']['enum'] == ['leaf', 'orchestrator']; print('schema OK')"
  3. Defaults: python -c "from hermes_cli.config import DEFAULT_CONFIG as C; assert C['delegation']['max_spawn_depth'] == 2 and C['delegation']['orchestrator_enabled'] is True; print('defaults OK')"
  4. Back-compat: python -c "from tools.delegate_tool import MAX_DEPTH; assert MAX_DEPTH == 2; print('back-compat OK')"
  5. Docs: grep default_toolsets in tools/, cli*.yaml*, and website/docs/**/*.md — only audit-only references remain (class variables on Atropos environments, local var in hermes_cli/dump.py, the regression test itself).

Backward Compatibility

  • role defaults to "leaf" — no existing caller changes behavior.
  • MAX_DEPTH = 2 constant remains as the hardcoded fallback and is still exported for tests.
  • Progress event consumers get both old string names AND new enum values during the deprecation window.
  • default_toolsets was never functional, so removing it changes no observable behavior.

Checklist

  • Tests added (102 passing)
  • Docs updated (website/docs/user-guide/features/delegation.md, website/docs/guides/delegation-patterns.md)
  • cli-config.yaml.example updated for new config keys
  • Conventional Commits (feat(delegate): / refactor(delegate): / fix(delegate): / docs(delegate): / test(delegate): / chore(delegate):)
  • No unrelated changes

… params

Both delegate_task call sites in run_agent.py hardcoded a subset of the
schema params (goal, context, toolsets, tasks, max_iterations) and
silently dropped acp_command and acp_args. Any future schema additions
would hit the same drift.

Replace both call sites with _dispatch_delegate_task() which forwards
the entire validated schema from function_args. Also threads the
conversation messages reference through for future context inheritance
plumbing (M1).

Replace informal progress event strings (_thinking, tool.started, etc.)
with a DelegateEvent enum. The callback normalises incoming legacy
strings through _LEGACY_EVENT_MAP. External consumers (gateway SSE,
ACP adapter, CLI) continue to receive legacy string names during the
deprecation window, so no consumer changes are required.

Raise _DEFAULT_MAX_CONCURRENT_CHILDREN from 3 to 5. Add an absolute cap
of 8 (aligned with OpenClaw's DEFAULT_SUBAGENT_MAX_CONCURRENT) — values
above 8 from config or env are clamped with a warning log.

Update schema description, cli-config.yaml.example, and delegation docs
to reflect the new defaults.

- TestDispatchDelegateTask: verifies acp_command/acp_args forwarding
  and that _dispatch_delegate_task threads messages through
- TestDelegateEventEnum: enum values, legacy map coverage, normalisation
  in the progress callback, unknown event rejection
- TestConcurrencyDefaults: default=5, cap at 8, warning log on clamp,
  env var cap, within-range passthrough
- Fix pre-existing test bug: test_task_index_prefix_in_batch_mode was
  passing tool_name as event_type (wrong signature)
- Update test_constants assertion from 3 to 5

Extract duplicated cap-check logic from _get_max_concurrent_children
into _clamp_concurrency helper. Fix stale "default 3" comment in the
schema. Simplify test_acp_args_forwarded assertion.

… enum members

Add test_progress_callback_normalises_thinking (both _thinking and
reasoning.available), test_progress_callback_tool_completed_is_noop.
Document that TASK_SPAWNED/COMPLETED/FAILED are reserved for M3.
Rename test_progress_callback_normalises_legacy_events for clarity.

Grep found 4 more places still saying "up to 3": schema description,
tips.py, delegation-patterns.md, overview.md. Updated to match new
default of 5 (max 8).

…tring

The `messages` kwarg was threaded through `_dispatch_delegate_task` into
`delegate_task` but never referenced inside the function body. Readers
(including reviewers) kept assuming parent conversation history was being
forwarded to child agents, which it is not. Remove the dead parameter and
the test that asserted forwarding so the code matches the behavior.

Also rephrase `_dispatch_delegate_task`'s docstring: the consolidation
gives us a single call site, not automatic param forwarding — new schema
fields still need to be added in one place.

delegation.default_toolsets was declared in cli.py's CLI_CONFIG default
dict and documented in cli-config.yaml.example, but never read: none of
tools/delegate_tool.py, _load_config(), or any call site ever looked it
up. The live fallback is the DEFAULT_TOOLSETS module constant at
tools/delegate_tool.py:101, which stays as-is.

hermes_cli/config.py's DEFAULT_CONFIG["delegation"] already omits the
key — this commit aligns cli.py with that.

Adds a regression test in tests/hermes_cli/test_config_drift.py so a
future refactor that re-adds the key without wiring it up to
_load_config() fails loudly.

Part of Initiative 2 / M0.5.

Matches the default-config removal in the preceding commit.
default_toolsets was documented for users to set but was never actually
read at runtime, so showing it in the example config and the delegation
user guide was misleading.

No deprecation note is added: the key was always a no-op, so users who
copied it from the example continue to see no behavior change. Their
config.yaml still parses; the key is just silently unused, same as
before.

Part of Initiative 2 / M0.5.

…config

The prior form of this test asserted on CLI_CONFIG["delegation"] after
importing cli, which only passed by accident of pytest-xdist worker
scheduling. cli._hermes_home is frozen at module import time (cli.py:76),
before the tests/conftest.py autouse HERMES_HOME-isolation fixture can
fire, so CLI_CONFIG ends up populated by deep-merging the contributor's
actual ~/.hermes/config.yaml over the defaults (cli.py:359-366). Any
contributor (like me) who still has the legacy key set in their own
config causes a false failure the moment another test file in the same
xdist worker imports cli at module level.

Asserting on the source of load_cli_config() instead sidesteps all of
that: the test now checks the defaults literal directly and is
independent of user config, HERMES_HOME, import order, and worker
scheduling.
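
A source-level assertion of this kind can be sketched as below. It is an illustration under stated assumptions: `dict_keys_in_source` is a hypothetical helper, and `EXAMPLE_SOURCE` is an invented stand-in for the loader's source text (in a real test that text would come from something like `inspect.getsource(load_cli_config)`). The keys shown in the stand-in are illustrative only.

```python
import ast


def dict_keys_in_source(source):
    """Collect every string literal used as a dict key in `source`."""
    keys = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Dict):
            for key in node.keys:
                # `key` is None for `**unpacking` entries; those are skipped.
                if isinstance(key, ast.Constant) and isinstance(key.value, str):
                    keys.add(key.value)
    return keys


# Invented stand-in for the loader source; not the project's real defaults.
EXAMPLE_SOURCE = '''
def load_cli_config():
    defaults = {
        "delegation": {
            "max_iterations": 50,
        },
    }
    return defaults
'''


def test_default_toolsets_absent_from_defaults():
    assert "default_toolsets" not in dict_keys_in_source(EXAMPLE_SOURCE)
```

Because the check inspects text rather than an imported, already-merged config dict, it cannot be affected by a contributor's own config file or by import order.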

Demonstrated failure mode before this fix:
  pytest tests/hermes_cli/test_config_drift.py \
         tests/hermes_cli/test_skills_hub.py -o addopts=""
  -> FAILED (CLI_CONFIG["delegation"] contained "default_toolsets"
     from the user's ~/.hermes/config.yaml)

Part of Initiative 2 / M0.5.

Introduces the configurable depth cap and global kill switch for the
M3 orchestrator-role feature. No behavior change on defaults:
max_spawn_depth=2 matches the legacy MAX_DEPTH=2 hard-coded value;
orchestrator_enabled=True is a no-op until M3 commit 3 wires up role.

Changes:
- tools/delegate_tool.py: _MIN_SPAWN_DEPTH, _MAX_SPAWN_DEPTH_CAP,
  _get_max_spawn_depth() (clamps to [1, 3] with warning log, mirrors
  existing _clamp_concurrency pattern), _get_orchestrator_enabled()
  with bool/string YAML coercion.  Depth guard at delegate_task
  now reads _get_max_spawn_depth() instead of MAX_DEPTH directly.
  MAX_DEPTH stays as the hardcoded default fallback and test import.
- hermes_cli/config.py: DEFAULT_CONFIG["delegation"] seeds the two
  new keys.  Not seeded in cli.py:CLI_CONFIG — follows the
  delegation.reasoning_effort precedent; cli.py's deep-merge picks
  up user overrides regardless.
- tests/tools/test_delegate.py: TestMaxSpawnDepth (4 cases —
  default, clamp-low, clamp-high, invalid-falls-back).
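
The two getters described above might look roughly like this. The sketch takes the delegation config as a plain dict argument, which is an assumption made to keep it self-contained; the function names, the set of strings treated as false, and the warning wording are likewise illustrative. The bounds (1 to 3) and the default (2) come from the commit message.

```python
import logging

logger = logging.getLogger(__name__)

MIN_SPAWN_DEPTH = 1
MAX_SPAWN_DEPTH_CAP = 3
DEFAULT_SPAWN_DEPTH = 2  # matches the legacy MAX_DEPTH value


def get_max_spawn_depth(delegation_cfg):
    """Read max_spawn_depth, clamping to [1, 3]; invalid values fall back to 2."""
    raw = delegation_cfg.get("max_spawn_depth", DEFAULT_SPAWN_DEPTH)
    try:
        depth = int(raw)
    except (TypeError, ValueError):
        logger.warning("invalid max_spawn_depth %r; using %d", raw, DEFAULT_SPAWN_DEPTH)
        return DEFAULT_SPAWN_DEPTH
    clamped = min(max(depth, MIN_SPAWN_DEPTH), MAX_SPAWN_DEPTH_CAP)
    if clamped != depth:
        logger.warning("max_spawn_depth=%d out of range; clamped to %d", depth, clamped)
    return clamped


def get_orchestrator_enabled(delegation_cfg):
    """Coerce the kill switch to bool, accepting YAML-style strings."""
    raw = delegation_cfg.get("orchestrator_enabled", True)
    if isinstance(raw, bool):
        return raw
    if isinstance(raw, str):
        # Assumption: these spellings mean "disabled"; anything else enables.
        return raw.strip().lower() not in {"false", "0", "no", "off"}
    return bool(raw)
```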
Wires the 'role' param through schema -> delegate_task() -> dispatch ->
_build_child_agent -> stashed on child. No behavior change yet: Commit 3
adds the toolset re-add + role-aware prompt. Commit 2 verifies the
plumbing reaches the child and the schema description signals the
feature to the parent LLM.

Changes:
- tools/delegate_tool.py:
  - Module-level _normalize_role(r) (near _clamp_concurrency), returns
    'leaf' or 'orchestrator'; unknown strings warn and coerce to 'leaf'.
  - DELEGATE_TASK_SCHEMA: new 'role' property at top level AND per-task
    under tasks[].items.  Top-level description text split into leaf vs
    orchestrator capability statements so the parent LLM discovers that
    role='orchestrator' unlocks nested delegation.
  - delegate_task(): accepts role=Optional[str]; normalises top_role;
    single-task dict at :738 now includes 'role' for batch/single
    uniformity; child-build loop resolves effective_role = normalise(
    t.get('role') or top_role) and forwards to _build_child_agent.
  - _build_child_agent(): accepts role='leaf' kwarg; stashes
    child._delegate_role for introspection (commit 3 will overwrite
    with effective_role post-degrade).
  - Registry handler lambda: forwards role=args.get('role') for the
    Atropos dispatch path (dead for run_agent.py which short-circuits
    to _dispatch_delegate_task).
- run_agent.py:_dispatch_delegate_task: forwards role through to
  tools.delegate_tool.delegate_task.
- tests/tools/test_delegate.py:TestOrchestratorRoleSchema (4 cases —
  default→leaf, explicit orchestrator stashed, nonsense→leaf+warning,
  schema shape assertions for top-level and per-task 'role' properties).
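
The normalisation and the per-task override described in this commit can be sketched like so. The names are hypothetical, and the case-insensitive matching is an assumption; the commit message only states that unknown strings warn and coerce to 'leaf' and that a per-task role takes precedence over the top-level one.

```python
import logging

logger = logging.getLogger(__name__)

VALID_ROLES = ("leaf", "orchestrator")


def normalize_role(role):
    """Return 'leaf' or 'orchestrator'; unknown values warn and become 'leaf'."""
    if role is None or role == "":
        return "leaf"
    cleaned = str(role).strip().lower()  # assumption: matching ignores case
    if cleaned in VALID_ROLES:
        return cleaned
    logger.warning("unknown delegate role %r; coercing to 'leaf'", role)
    return "leaf"


def effective_roles(tasks, top_role=None):
    """Resolve one role per task: the task's own role wins over the top-level one."""
    top = normalize_role(top_role)
    return [normalize_role(task.get("role") or top) for task in tasks]
```

For example, a batch with a top-level `role="orchestrator"` and one task that sets `role="leaf"` resolves to one orchestrator and one leaf.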
The behavior change. Orchestrator children (role='orchestrator',
allowed by delegation.orchestrator_enabled and child_depth <
max_spawn_depth) retain the 'delegation' toolset and receive a
role-aware system prompt derived from OpenClaw's buildSubagentSystemPrompt
canSpawn branch. Leaf children are unchanged from pre-M3 behavior.

Changes:
- tools/delegate_tool.py:
  - _build_child_agent: role resolution block at the top — computes
    child_depth, max_spawn, orchestrator_ok (kill switch AND depth),
    effective_role (single degrade point).  Toolset re-add appends
    "delegation" when effective_role == 'orchestrator' (runs after the
    existing _strip_blocked_tools branches — unconditional on parent
    toolset membership since orchestrator capability is granted by role
    not inheritance; documented in test_intersection_preserves_delegation_bound).
    child._delegate_role now stashes effective_role (post-degrade).
  - _build_child_system_prompt: new role/max_spawn_depth/child_depth
    kwargs; leaf prompt unchanged; orchestrator appends a spawning
    block with WHEN/WHEN NOT to delegate guidance + literal depth
    note that branches between "children MUST be leaves" (at the
    floor) and "children can themselves be orchestrators" (below it).
    Per-call role model means "can be", not "will be" — orchestrators
    explicitly pass role='orchestrator' for nested delegation.
  - _EXCLUDED_TOOLSET_NAMES: comment explaining the "delegation"
    entry is an advertising exclusion, not a runtime block; the
    role-driven re-add in _build_child_agent overrides it.
- tests/tools/test_delegate.py: TestOrchestratorRoleBehavior (9 cases)
  - Role resolution: _keeps_delegation_at_depth_1,
    _blocked_at_max_spawn_depth, _enabled_false_forces_leaf
  - Prompt content: _leaf_does_not_mention_delegation,
    _orchestrator_mentions_delegation_capability,
    _at_depth_floor_says_children_are_leaves,
    _below_floor_allows_more_nesting
  - Batch + intersection: _batch_mode_per_task_role_override,
    _intersection_preserves_delegation_bound (documents design choice)
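
The single degrade point and the toolset re-add can be illustrated with the sketch below. Function names and signatures are hypothetical, and `child_depth = parent_depth + 1` is an assumption; the condition itself (kill switch AND `child_depth < max_spawn_depth`) is taken from the commit message.

```python
def resolve_child_role(requested_role, parent_depth, max_spawn_depth=2,
                       orchestrator_enabled=True):
    """Return the role the child actually gets after kill-switch and depth checks."""
    child_depth = parent_depth + 1  # assumption: depth grows by one per hop
    orchestrator_ok = orchestrator_enabled and child_depth < max_spawn_depth
    if requested_role == "orchestrator" and orchestrator_ok:
        return "orchestrator"
    return "leaf"  # every other case degrades to leaf


def child_toolsets(base_toolsets, effective_role):
    """Strip 'delegation' for every child, then re-add it for orchestrators only."""
    toolsets = [name for name in base_toolsets if name != "delegation"]
    if effective_role == "orchestrator":
        toolsets.append("delegation")
    return toolsets
```

With the default depth of 2, a top-level agent (depth 0) can create an orchestrator at depth 1, and anything that orchestrator spawns at depth 2 is forced to leaf.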
Satisfies parent plan §7 item 3 acceptance: parent delegates to an
orchestrator child, which delegates to two leaf grandchildren; results
bubble up correctly.

Mocking strategy (plan §3.6 G3 sketch): single run_agent.AIAgent patch
with a side_effect factory that keys on the child's
ephemeral_system_prompt — orchestrator prompts contain the string
"Orchestrator Role" (see _build_child_system_prompt), leaves don't.
The orchestrator mock's run_conversation recursively calls
delegate_task with tasks=[{goal:...},{goal:...}] to spawn two leaves.
This keeps the whole test in one patch context and avoids depth-indexed
nesting patterns that are fragile.

Also updates test_constants to cover the two new config getters
(_get_max_spawn_depth, _get_orchestrator_enabled) and the two new
bound constants (_MIN_SPAWN_DEPTH=1, _MAX_SPAWN_DEPTH_CAP=3), as
called for by plan §4 Commit 4.

Assertions: MockAgent called exactly 3 times (1 orchestrator + 2
leaves); orchestrator got 'delegation' in its toolset and an
orchestrator prompt; both grandchildren did NOT get 'delegation' and
received leaf prompts.

- cli-config.yaml.example: two commented-out lines in the delegation
  block advertising max_spawn_depth and orchestrator_enabled.
- website/docs/user-guide/features/delegation.md:
  - Replace "Depth Limit" section with "Depth Limit and Nested
    Orchestration": role='leaf' vs 'orchestrator' usage example,
    max_spawn_depth bounds, orchestrator_enabled kill switch, and the
    125-leaf cost warning for max_spawn_depth=3.
  - "Key Properties" bullets updated to reflect opt-in nested
    delegation and to split leaf/orchestrator capability statements.
  - Configuration YAML example: two commented-out lines for the new
    keys, matching the cli-config.yaml.example style.

Pure text; independently revertable.
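
One plausible reading of the 125-leaf figure is a full fan-out of 5 children at each of 3 levels. The arithmetic below works under that assumption; it is not taken from the docs themselves, and the function names are illustrative.

```python
def max_leaves(children_per_agent, max_spawn_depth):
    """Upper bound on deepest-level agents when every level fans out fully."""
    return children_per_agent ** max_spawn_depth


def max_total_agents(children_per_agent, max_spawn_depth):
    """Upper bound on all spawned agents across every level of the tree."""
    return sum(children_per_agent ** depth for depth in range(1, max_spawn_depth + 1))


print(max_leaves(5, 3))        # 125
print(max_leaves(5, 2))        # 25
print(max_total_agents(5, 3))  # 155
```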
@github-actions

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: Outbound network calls (POST/PUT)

Outbound POST/PUT requests in new code could be data exfiltration. Verify the destination URLs are legitimate.

Matches (first 10):

2577:+            resp = httpx.post(

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

…child callback

Addresses two bugs surfaced by codex in the M3 PR review
(.multi-agent/review-20260415-173416/reviews/codex.md):

1. HIGH — TASK_PROGRESS fall-through to TASK_TOOL_STARTED rendering.
   _LEGACY_EVENT_MAP maps "subagent_progress" to DelegateEvent.TASK_PROGRESS,
   but _build_child_progress_callback had no TASK_PROGRESS branch.  Any
   TASK_PROGRESS event fell through to the TASK_TOOL_STARTED display/batch
   block, which treated the pre-batched summary string (in the tool_name
   positional slot) as if it were a tool name — rendering
   '├─ ⚡ 🔀 [1] terminal, file' and re-batching accumulated emoji prefixes
   on each upward hop.  This path is newly reachable in M3: nested
   orchestrators relay subagent_progress from grandchildren upward via
   this callback.  Before M3 the toolset strip blocked the nesting that
   produces this traffic.

   Fix: explicit TASK_PROGRESS branch.  Renders with a distinct 🔀 prefix
   (no get_tool_emoji lookup) and relays upward as-is without re-batching
   (the payload is already a batched summary).

2. MEDIUM — DelegateEvent enum values silently dropped.  The callback
   only did _LEGACY_EVENT_MAP.get(event_type), so cb(DelegateEvent.TASK_THINKING,
   ...) or cb('delegate.task_thinking', ...) produced no output.  The enum
   was added in M0 as the "new normalized event type" but no call path
   accepted it.

   Fix: normalize enum instances directly, then fall back to the legacy
   map, then to DelegateEvent(str) construction for new-style "delegate.*"
   strings.
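
The three-step normalisation in the second fix can be sketched as follows. Only `"delegate.task_thinking"` and the legacy names `_thinking`, `reasoning.available`, `tool.started`, and `subagent_progress` appear in this PR; the other enum values and the `normalize_event` name are assumptions made for the example.

```python
from enum import Enum


class DelegateEvent(str, Enum):
    TASK_THINKING = "delegate.task_thinking"
    TASK_TOOL_STARTED = "delegate.task_tool_started"  # value assumed
    TASK_PROGRESS = "delegate.task_progress"          # value assumed


LEGACY_EVENT_MAP = {
    "_thinking": DelegateEvent.TASK_THINKING,
    "reasoning.available": DelegateEvent.TASK_THINKING,
    "tool.started": DelegateEvent.TASK_TOOL_STARTED,
    "subagent_progress": DelegateEvent.TASK_PROGRESS,
}


def normalize_event(event_type):
    """Map an enum member, legacy string, or new-style string to DelegateEvent."""
    if isinstance(event_type, DelegateEvent):
        return event_type                      # 1. already an enum member
    legacy = LEGACY_EVENT_MAP.get(event_type)
    if legacy is not None:
        return legacy                          # 2. legacy string name
    try:
        return DelegateEvent(event_type)       # 3. new-style "delegate.*" string
    except ValueError:
        return None                            # unknown event
```

A callback built on this can then branch on `DelegateEvent.TASK_PROGRESS` explicitly instead of letting it fall through to the tool-started renderer.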

Tests added (tests/tools/test_delegate.py, in TestDelegateEventEnum):
- test_progress_callback_accepts_enum_value_directly
- test_progress_callback_accepts_new_style_string
- test_progress_callback_task_progress_not_misrendered

Addresses the hermes reviewer finding in the M3 PR review
(.multi-agent/review-20260415-173416/reviews/hermes.md §warnings #3):
the Constraints section's "No nesting" bullet was stale against M3.

Replaces with a "Nested delegation is opt-in" bullet that mirrors the
wording already landed in website/docs/user-guide/features/delegation.md
in the M3 docs commit (55aecde).  Covers role='leaf' vs 'orchestrator',
the max_spawn_depth bound, and the orchestrator_enabled kill switch —
matching the feature doc's leaf-vs-orchestrator capability distinction.

Removes internal milestone references (M0, M0.5, M3) from code,
tests, and docs in the delegation PR surface. Milestone tags were
useful for tracking the rollout but carry no meaning for upstream
readers or future maintainers; the feature and its rationale
should stand on their own.

Mechanical substitutions only — no behavior change, no docstring
rewrites, no test renames.  102 tests still pass.

@pefontana pefontana changed the title Orchestrator role Orchestrator Subagents Apr 16, 2026
@pefontana pefontana changed the title Orchestrator Subagents feat: Orchestrator Subagents Apr 16, 2026

Two small follow-ups after the orchestrator-role PR review:

1. hermes_cli/config.py:DEFAULT_CONFIG["delegation"] now seeds
   max_concurrent_children=5 alongside max_spawn_depth and
   orchestrator_enabled.  cli-config.yaml.example, docs, and the
   schema all advertise this key; the canonical default dict was
   the only surface that still omitted it.

2. website/docs/user-guide/features/delegation.md's "always blocked
   for subagents" bullet list was stale against the new orchestrator
   role: delegation is retained for role="orchestrator" children
   (see _build_child_agent re-add).  Softened the heading from
   "always blocked" to "blocked", and rewrote the delegation bullet
   to point at the Depth Limit and Nested Orchestration section.

Tests still pass (123).

pefontana pushed a commit that referenced this pull request Apr 20, 2026
…lls/

- Rename skill to touchdesigner-mcp (matches blender-mcp convention)
- Move from skills/creative/ to optional-skills/creative/
- Fix duplicate pitfall numbering (#3 appeared twice)
- Update SKILL.md cross-references for renumbered pitfalls
- Update setup.sh path for new directory location

…ousResearch#13148)

* feat(security): URL query param + userinfo + form body redaction

Port from nearai/ironclaw#2529.

Hermes already has broad value-shape coverage in agent/redact.py
(30+ vendor prefixes, JWTs, DB connstrs, etc.) but missed three
key-name-based patterns that catch opaque tokens without recognizable
prefixes:

1. URL query params - OAuth callback codes (?code=...),
   access_token, refresh_token, signature, etc. These are opaque and
   won't match any prefix regex. Now redacted by parameter NAME.

2. URL userinfo (https://user:pass@host) - for non-DB schemes. DB
   schemes were already handled by _DB_CONNSTR_RE.

3. Form-urlencoded body (k=v pairs joined by ampersands) -
   conservative, only triggers on clean pure-form inputs with no
   other text.

Sensitive key allowlist matches ironclaw's (exact case-insensitive,
NOT substring - so token_count and session_id pass through).
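
Name-based query redaction with exact, case-insensitive key matching can be sketched with the standard library. This is not the code in agent/redact.py: the function name, the placeholder text, and the short key set are assumptions (the real allowlist is longer).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative subset of sensitive parameter names.
SENSITIVE_KEYS = frozenset({"code", "access_token", "refresh_token", "signature"})


def redact_query_params(url, placeholder="REDACTED"):
    """Replace values of sensitive query parameters, matched by exact name."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    redacted = [
        # Exact match on the lowercased name, so `token_count` is not touched.
        (key, placeholder if key.lower() in SENSITIVE_KEYS else value)
        for key, value in pairs
    ]
    return urlunsplit(parts._replace(query=urlencode(redacted)))
```

Note that `urlencode` re-encodes every value, so untouched parameters may come back with different percent-encoding than they went in with.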

Tests: +20 new test cases across 3 test classes. All 75 redact tests
pass; gateway/test_pii_redaction and tools/test_browser_secret_exfil
also green.

Known pre-existing limitation: _ENV_ASSIGN_RE greedy match swallows
whole all-caps ENV-style names + trailing text when followed by
another assignment. Left untouched here (out of scope); URL query
redaction handles the lowercase case.

* feat: replace kimi-k2.5 with kimi-k2.6 on OpenRouter and Nous Portal

Update model catalogs for OpenRouter (fallback snapshot), Nous Portal,
and NVIDIA NIM to reference moonshotai/kimi-k2.6.  Add kimi-k2.6 to
the fixed-temperature frozenset in auxiliary_client.py so the 0.6
contract is enforced on aggregator routings.

Native Moonshot provider lists (kimi-coding, kimi-coding-cn, moonshot,
opencode-zen, opencode-go) are unchanged — those use Moonshot's own
model IDs which are unaffected.

…NousResearch#13021)

Replaces the serial for-loop in tick() with ThreadPoolExecutor so all
jobs due in a single tick run concurrently. A slow job no longer blocks
others from executing, fixing silent job skipping (issue NousResearch#9086).

Thread safety:
- Session/delivery env vars migrated from os.environ to ContextVars
  (gateway/session_context.py) so parallel jobs can't clobber each
  other's delivery targets. Each thread gets its own copied context.
- jobs.json read-modify-write cycles (advance_next_run, mark_job_run)
  protected by threading.Lock to prevent concurrent save clobber.
- send_message_tool reads delivery vars via get_session_env() for
  ContextVar-aware resolution with os.environ fallback.

Configuration:
- cron.max_parallel_jobs in config.yaml (null = unbounded, 1 = serial)
- HERMES_CRON_MAX_PARALLEL env var override
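
The combination of a thread pool and per-job context copies can be sketched as below. The names `delivery_target`, `run_job`, and `tick` are illustrative, and treating `None` as "one worker per due job" is an assumption about how "unbounded" might be realised.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a session/delivery variable that used to live in os.environ.
delivery_target = contextvars.ContextVar("delivery_target", default=None)


def run_job(job):
    """Set this job's delivery target and report it back."""
    delivery_target.set(job["target"])
    return f"{job['name']} -> {delivery_target.get()}"


def tick(due_jobs, max_parallel=None):
    """Run every due job concurrently, each inside its own copied context."""
    if not due_jobs:
        return []
    workers = max_parallel or len(due_jobs)  # None: one worker per job (assumed)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            # copy_context() gives each job a private snapshot, so a set()
            # in one job is invisible to the others and to the caller.
            pool.submit(contextvars.copy_context().run, run_job, job)
            for job in due_jobs
        ]
        return [future.result() for future in futures]
```

Passing `max_parallel=1` degenerates to serial execution, which matches the `1 = serial` setting described above.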

Based on PR NousResearch#9169 by @VenomMoth1.

Fixes NousResearch#9086

kshitijk4poor and others added 12 commits April 20, 2026 11:53
Extract 12 Codex Responses API format-conversion and normalization functions
from run_agent.py into agent/codex_responses_adapter.py, following the
existing pattern of anthropic_adapter.py and bedrock_adapter.py.

run_agent.py: 12,550 → 11,865 lines (-685 lines)

Functions moved:
- _chat_content_to_responses_parts (multimodal content conversion)
- _summarize_user_message_for_log (multimodal message logging)
- _deterministic_call_id (cache-safe fallback IDs)
- _split_responses_tool_id (composite ID splitting)
- _derive_responses_function_call_id (fc_ prefix conversion)
- _responses_tools (schema format conversion)
- _chat_messages_to_responses_input (message format conversion)
- _preflight_codex_input_items (input validation)
- _preflight_codex_api_kwargs (API kwargs validation)
- _extract_responses_message_text (response text extraction)
- _extract_responses_reasoning_text (reasoning extraction)
- _normalize_codex_response (full response normalization)

All functions are stateless module-level functions. AIAgent methods remain
as thin one-line wrappers. Both module-level helpers are re-exported from
run_agent.py for backward compatibility with existing test imports.

Includes multimodal inline image support (PR NousResearch#12969) that the original PR
was missing.

Based on PR NousResearch#12975 by @kshitijk4poor.

…in/QQ adapters

Add dm_policy and group_policy to the WhatsApp adapter, bringing parity
with WeCom/Weixin/QQ. Allows independent control of DM and group access:
disable DMs entirely, allowlist specific senders/groups, or keep open.

- dm_policy: open (default) | allowlist | disabled
- group_policy: open (default) | allowlist | disabled
- Config bridging for YAML → env vars
- 22 tests covering all policy combinations

Backward compatible — defaults preserve existing behavior.
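
The three policy values map onto a small decision function. This is a sketch only: `is_allowed` is a hypothetical name, and raising on an unknown policy is an assumption rather than the adapter's documented behaviour.

```python
def is_allowed(policy, sender_id, allowlist=()):
    """Decide whether a DM sender or group may reach the agent."""
    if policy == "open":
        return True
    if policy == "allowlist":
        return sender_id in allowlist
    if policy == "disabled":
        return False
    raise ValueError(f"unknown policy: {policy!r}")
```

The same function can serve both `dm_policy` and `group_policy`, with separate allowlists for senders and groups.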

Cherry-picked from PR NousResearch#11597 by @MassiveMassimo.
Dropped the run.py group auth bypass (would have skipped user auth
for ALL platforms, not just WhatsApp).

…iders (NousResearch#13152)

Add kimi-k2.6 as the top model in kimi-coding, kimi-coding-cn, and
moonshot static provider lists (models.py, setup.py, main.py).
kimi-k2.5 retained alongside it.

…providers

Section 3 (user-defined endpoints) added the plain ep_name to seen_slugs
but not the custom:-prefixed slug. Section 4 generates custom:<name> via
custom_provider_slug() and checks seen_slugs — since the prefixed slug
was missing, the same provider appeared twice in /model.

Register custom_provider_slug(display_name).lower() in seen_slugs after
Section 3 emits a provider, so Section 4's dedup correctly suppresses
the duplicate.

Closes NousResearch#12293.
Co-authored-by: bennytimz <bennytimz@users.noreply.github.com>

…search#13157)

Kimi's gateway selects the correct temperature server-side based on the
active mode (thinking -> 1.0, non-thinking -> 0.6).  Sending any
temperature value — even the previously "correct" one — conflicts with
gateway-managed defaults.

Replaces the old approach of forcing specific temperature values (0.6
for non-thinking, 1.0 for thinking) with an OMIT_TEMPERATURE sentinel
that tells all call sites to strip the temperature key from API kwargs
entirely.

Changes:
- agent/auxiliary_client.py: OMIT_TEMPERATURE sentinel, _is_kimi_model()
  prefix check (covers all kimi-* models), _fixed_temperature_for_model()
  returns sentinel for kimi models.  _build_call_kwargs() strips temp.
- run_agent.py: _build_api_kwargs, flush_memories, and summary generation
  paths all handle the sentinel by popping/omitting temperature.
- trajectory_compressor.py: _effective_temperature_for_model returns None
  for kimi (sentinel mapped), direct client calls use kwargs dict to
  conditionally include temperature.
- mini_swe_runner.py: same sentinel handling via wrapper function.
- 6 test files updated: all 'forces temperature X' assertions replaced
  with 'temperature not in kwargs' assertions.
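
The sentinel pattern described in these changes can be sketched as follows. The function names mirror the ones listed above without the leading underscore, but the bodies are illustrative: stripping a vendor prefix such as `moonshotai/` before the `kimi-` check and the 0.7 default temperature are both assumptions.

```python
# Unique marker meaning "do not send a temperature at all".
OMIT_TEMPERATURE = object()


def is_kimi_model(model):
    """Prefix check on the model name, ignoring any vendor prefix (assumed)."""
    name = (model or "").rsplit("/", 1)[-1].lower()
    return name.startswith("kimi-")


def fixed_temperature_for_model(model):
    """Return the sentinel for kimi models, None when no override applies."""
    return OMIT_TEMPERATURE if is_kimi_model(model) else None


def build_call_kwargs(model, messages, temperature=0.7):
    """Build API kwargs, dropping the temperature key when the sentinel applies."""
    kwargs = {"model": model, "messages": messages, "temperature": temperature}
    if fixed_temperature_for_model(model) is OMIT_TEMPERATURE:
        kwargs.pop("temperature", None)
    return kwargs
```

Using a dedicated object rather than `None` keeps "no override" and "omit the key" distinguishable at every call site.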

Net: -76 lines (171 added, 247 removed).
Inspired by PR NousResearch#13137 (@kshitijk4poor).

…abled (NousResearch#13162)

When createForumTopic fails with 'not a forum' in a private chat,
the error now tells the user exactly what to do: enable Topics in
the DM chat settings from the Telegram app.

Also adds a Prerequisites callout to the docs explaining this
client-side requirement before the config section.

…tree isolation

Adds a _resolve_path() helper that reads TERMINAL_CWD and uses it as
the base for relative path resolution. Applied to _check_sensitive_path,
read_file_tool, _update_read_timestamp, and _check_file_staleness.

Absolute paths and non-worktree sessions (no TERMINAL_CWD) are
unaffected — falls back to os.getcwd().
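
A helper along these lines would give the described behaviour. It is a sketch, not the PR's `_resolve_path`: the `environ` parameter is added here so the example can be exercised without touching the real environment, and the `expanduser` call is an assumption.

```python
import os
from pathlib import Path


def resolve_path(path, environ=None):
    """Resolve `path` against TERMINAL_CWD when set, else the process cwd."""
    env = os.environ if environ is None else environ
    candidate = Path(path).expanduser()
    if candidate.is_absolute():
        return candidate  # absolute paths are never rebased
    base = env.get("TERMINAL_CWD") or os.getcwd()
    return Path(base) / candidate
```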

Fixes NousResearch#12689.

# Conflicts:
#	tests/agent/test_subagent_progress.py
#	tools/delegate_tool.py
@github-actions

🚨 CRITICAL Supply Chain Risk Detected

This PR contains a pattern that has been used in real supply chain attacks. A maintainer must review the flagged code carefully before merging.

🚨 CRITICAL: Install-hook file added or modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/setup.py

Scanner only fires on high-signal indicators: .pth files, base64+exec/eval combos, subprocess with encoded commands, or install-hook files. Low-signal warnings were removed intentionally — if you're seeing this comment, the finding is worth inspecting.

pefontana added a commit that referenced this pull request Apr 24, 2026
Three independent reviews surfaced a handful of real bugs. Fixing all
of them here:

* **SIGTERM orphans hook subprocesses (codex #1).** The CLI only
  installed a SIGINT handler — SIGTERM (from ``kill``, ``timeout``,
  systemd stop, CI harnesses) skips atexit entirely and leaves every
  in-flight hook subprocess running as an orphan owned by init.  Adds
  ``_async_pool_sigterm_handler`` which terminates tracked subprocess
  groups inline, then routes to ``sys.exit(128 + SIGTERM)``.  Inline
  termination is required because ``ThreadPoolExecutor`` uses
  non-daemon threads: Python waits for every worker to return before
  running atexit, and workers block inside
  ``proc.communicate(timeout=spec.timeout)`` until the subprocess
  dies.  Renamed ``_maybe_install_sigint_handler`` →
  ``_maybe_install_signal_handlers`` (with back-compat alias).
  Verified: ``kill -TERM`` on a hermes CLI running a 4 s ``sleep``
  hook now exits in ~0.7 s with no orphan, was 4 s + orphan.

* **Subprocess groups for reliable termination.** Hooks are now
  spawned with ``start_new_session=True`` so the subprocess is its
  own PGID leader.  Shutdown / SIGINT / SIGTERM paths call
  ``os.killpg`` on the group instead of ``proc.terminate()`` — without
  this, a bash script's orphaned ``sleep`` child kept the parent
  stdout FD open and blocked ``proc.communicate`` for the full sleep
  duration.  ``_terminate_group`` / ``_kill_group`` helpers fall back
  to plain ``terminate`` / ``kill`` on edge cases where ``getpgid``
  fails (already-exited proc, non-POSIX).

* **``hermes hooks test --no-wait`` blocks for full hook runtime
  (codex #2).** The flag advertised fire-and-forget but the CLI's
  ``ThreadPoolExecutor`` atexit ``pool.shutdown(wait=True)`` joined
  the worker anyway, which in turn waited for the subprocess.
  ``_cmd_test`` now polls briefly for ``_live_procs`` to fill (so the
  subprocess definitely spawned), then ``os._exit(0)`` — skipping
  atexit entirely.  The subprocess keeps running under init because
  of ``start_new_session=True``.  Verified: CLI exit dropped from 2.3
  s to 76 ms for a 2-second hook, and the hook still writes its
  audit log 3 s later after the CLI is gone.

* **Stale ``_child_role_for_batch`` test (claude #1 / hermes #2).**
  The test from commit 76d3ffd4 asserted the *old* helper field name
  — no code path sets it post-refactor (455c136f), so the test
  passed trivially without verifying anything.  Fixed to assert
  ``_child_role`` (the real field) is stripped, and added an
  explanatory message so a future failure is easier to diagnose.
  Module-header docstring updated too.

* **``submit()`` RuntimeError branch: stale-semaphore parity fix
  (claude #3).**  Same pattern I already fixed in
  ``_on_async_future_done``, missed here: a concurrent
  ``_reset_async_pool`` between ``acquire`` and ``release`` would
  cause ``_async_sem_get()`` to lazy-create a fresh sem and over-
  release on it.  Snapshot ``_async_sem_inst`` + swallow
  ``ValueError`` like the symmetric path.

* **Shutdown race: proc registered after the snapshot (claude #4 /
  hermes #1).** Worker that got between ``subprocess.Popen()`` and
  ``_register_live_proc(proc)`` would miss the shutdown-sweep
  snapshot and block for the full ``spec.timeout``.  After
  registering, the worker now checks ``_async_shutting_down`` and
  self-terminates its subprocess group.

* **WARN log noise on SIGTERM'd children (claude #5).**
  Shutdown-induced exits (rc = -15 / -9) no longer spam a per-proc
  ``WARNING`` — demoted to ``DEBUG`` when ``_async_shutting_down``
  is set.  Both the atexit path and the signal handlers now set the
  flag before terminating, so a Ctrl-C or a ``kill -TERM`` with 10
  running hooks emits zero warn lines instead of 10.
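
The process-group approach from the second bullet can be sketched with the standard library. This is a POSIX-only illustration with hypothetical helper names, not the PR's `_terminate_group` implementation; the fallback covers the cases the bullet mentions (already-exited process, platforms without `killpg`).

```python
import os
import signal
import subprocess
import sys


def spawn_in_own_group(argv):
    """Start a subprocess as the leader of a new session and process group."""
    return subprocess.Popen(argv, start_new_session=True)


def terminate_group(proc):
    """Send SIGTERM to the whole group; fall back to terminating the process."""
    try:
        os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
    except (ProcessLookupError, PermissionError, AttributeError):
        # Already exited, not permitted, or no killpg on this platform.
        proc.terminate()


if __name__ == "__main__":
    child = spawn_in_own_group([sys.executable, "-c", "import time; time.sleep(30)"])
    terminate_group(child)
    print("exit code:", child.wait(timeout=10))
```

Signalling the group rather than the single process also reaches grandchildren such as a shell script's `sleep`, which is what otherwise keeps the pipe open.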

Still outstanding (documented trade-offs, not fixed here):

* Gateway shutdown blocks the event loop for up to ``grace_seconds``
  (claude #2).  Acknowledged as a follow-up candidate via
  ``loop.run_in_executor``.
* ``_maybe_install_signal_handlers`` is still leading-underscore
  (claude #6).  Cosmetic; kept consistent with the rest of the
  module's private-by-convention API.

All 101 hook tests still pass.