docs+fix(gateway): multi-agent routing guide + use_profile scope fix (6/6)#37502
Open
davidgut1982 wants to merge 12 commits into
Force-pushed from 50ccde5 to a0d513a.
Introduce the AgentProfile dataclass and a ContextVar (_current_agent_profile) that lets the path getters (get_hermes_home, get_skills_dir, get_memory_dir) resolve to the active agent's home directory under asyncio.
- agent/profile.py: AgentProfile, use_profile() context manager, load_agent_registry() from GatewayConfig
- hermes_constants.py: get_hermes_home() reads the ContextVar before the env fallback
- tests/agent/test_profile_contextvar.py: ContextVar isolation under asyncio.gather, nested contexts, registry loading
Single-agent installs see zero change: with no profile bound, lookups fall back to the HERMES_HOME env var as before.
Add an agent_id field to SessionSource and SessionEntry, and prefix session keys with agent:<id>: in build_session_key. The default "main" preserves every historical key string for single-agent installs.
- gateway/session.py: SessionSource.agent_id, SessionEntry.agent_id, build_session_key prefixing
- hermes_state.py: sessions table migration (agent_id TEXT DEFAULT 'main'), new idx_sessions_agent index
- tests/gateway/test_session.py: build_session_key prefixing for all chat_type × agent_id combinations
- tests/*/test_session_boundary_hooks.py: agent_id kwarg in the hook payload
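A sketch of the key-prefixing rule, reading "default 'main' preserves every historical key string" as: the prefix is omitted for the default agent. The SessionSource fields and the `_legacy_key` format below are hypothetical stand-ins, not the real key layout.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_AGENT_ID = "main"


@dataclass
class SessionSource:
    # Hypothetical subset of fields, for illustration only.
    platform: str
    chat_id: str
    chat_type: str = "dm"
    thread_id: Optional[str] = None
    agent_id: str = DEFAULT_AGENT_ID


def _legacy_key(source: SessionSource) -> str:
    # Hypothetical stand-in for the pre-multi-agent key format.
    parts = [source.platform, source.chat_type, source.chat_id]
    if source.thread_id:
        parts.append(source.thread_id)
    return ":".join(parts)


def build_session_key(source: SessionSource) -> str:
    key = _legacy_key(source)
    if source.agent_id == DEFAULT_AGENT_ID:
        # Single-agent installs keep byte-identical keys, so no migration is needed.
        return key
    return f"agent:{source.agent_id}:{key}"
```

Keeping the default unprefixed means existing rows in the sessions table still resolve, while any non-default agent gets a disjoint keyspace.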
… hook
Add declarative routing (routes: match → agent) and a select_agent plugin
hook. _attach_agent_id injects the resolved agent_id into event.source
before build_session_key. Seven platform adapters get pre-injection for
batching paths; the rest inherit it from base.py.
- gateway/agent_routing.py: resolve_agent_id(), _route_matches()
- gateway/config.py: agents, routes, default_agent schema
- gateway/platforms/base.py: _attach_agent_id(), set_routing_context()
- gateway/platforms/{telegram,discord,slack,matrix,feishu,wecom,yuanbao}.py:
pre-batch injection
- hermes_cli/plugins.py: select_agent hook registration
- tests/gateway/test_agent_routing.py: declared-order matching, hook chain,
default fallback, profile isolation
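The resolution order tested above (hook chain, declared-order route matching, default fallback) can be sketched as follows. This assumes routes are dicts of the form `{"match": {...}, "agent": "..."}` and that the select_agent hooks run before the declarative routes; both the shapes and that ordering are assumptions, not confirmed by the PR text.

```python
from typing import Callable, Iterable, Mapping, Optional, Sequence

# Hypothetical shapes for illustration.
Source = Mapping[str, str]
SelectAgentHook = Callable[[Source], Optional[str]]


def _route_matches(match: Mapping[str, str], source: Source) -> bool:
    # Every key in the match block must equal the source field;
    # an empty match block therefore acts as a catch-all.
    return all(source.get(k) == v for k, v in match.items())


def resolve_agent_id(
    source: Source,
    routes: Sequence[Mapping[str, object]],
    hooks: Iterable[SelectAgentHook] = (),
    default_agent: str = "main",
) -> str:
    # 1. Plugin hooks: the first one returning a non-empty id wins (assumed order).
    for hook in hooks:
        chosen = hook(source)
        if chosen:
            return chosen
    # 2. Declarative routes, first match in declared order.
    for route in routes:
        if _route_matches(route.get("match", {}), source):
            return str(route["agent"])
    # 3. Nothing matched: fall back to the configured default agent.
    return default_agent
```

Declared order matters: a narrow route (platform + chat_id) must be listed before a broad platform-only route, or the broad one will shadow it.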
…s agent_id to hooks
GatewayRunner loads the agent registry at init and wraps every inbound
message in use_profile(). AIAgent accepts an optional profile= kwarg.
All invoke_hook call sites gain an agent_id= kwarg. _handle_message is
split into _handle_message (ContextVar plumbing) + _handle_message_inner
(legacy logic) so tests that grep the source body continue to work.
- gateway/run.py: registry loading, use_profile() wrapping, hook kwargs
- run_agent.py: AIAgent(profile=), profile-aware model/toolset resolution
- model_tools.py, tools/{approval,terminal,delegate}.py: hook agent_id
- cli.py, tui_gateway/server.py: session boundary hook agent_id
- tests/gateway/test_profile_overrides.py: per-agent model/toolset overrides
- tests/test_model_tools.py: hook payload verification
- tests/gateway/test_{update,title,reasoning}_command.py: adapt to
_handle_message split
New hermes agent subcommand group: list, show, add, remove. It manages agent profiles and routing config in ~/.hermes/config.yaml.
- hermes_cli/agent.py: cmd_agent_list, cmd_agent_show, cmd_agent_add, cmd_agent_remove, with profile cloning and route cleanup
- hermes_cli/main.py: parser registration
- tests/hermes_cli/test_agent_cli.py: list/show/add/remove coverage, route orphan warnings, SOUL summarization
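One plausible reading of "route cleanup" plus "route orphan warnings" for `hermes agent remove`, sketched on an in-memory config dict. The function name `remove_agent`, the config shape, and the choice to drop routes for the removed agent while only warning about routes to other unknown agents are all assumptions for illustration.

```python
import copy
from typing import Any, Dict, List, Tuple


def remove_agent(config: Dict[str, Any], agent_id: str) -> Tuple[Dict[str, Any], List[str]]:
    """Hypothetical helper: return (new_config, messages) without mutating `config`."""
    new = copy.deepcopy(config)  # never mutate the caller's loaded config
    agents = new.setdefault("agents", {})
    if agent_id not in agents:
        raise KeyError(f"unknown agent: {agent_id}")
    del agents[agent_id]

    kept: List[Dict[str, Any]] = []
    messages: List[str] = []
    for route in new.get("routes", []):
        target = route.get("agent")
        if target == agent_id:
            # Cleanup: routes that pointed at the removed agent are dropped.
            messages.append(f"removed route to {agent_id}: {route.get('match')}")
            continue
        if target not in agents:
            # Orphan: kept as-is but flagged so the operator can fix it.
            messages.append(f"warning: route targets unknown agent {target!r}")
        kept.append(route)
    new["routes"] = kept
    return new, messages
```

Working on a deep copy keeps the write to config.yaml all-or-nothing: the caller persists `new` only after the removal succeeded.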
…veries
Cron tick and delivery routing now bind the correct profile before execution. jobs.py does NOT persist agent_id in the JSON; the directory is the identity. Delivery uses nullcontext() for the unrouted case.
- cron/jobs.py: in-memory agent_id stamping at read time, directory-based identity (no JSON field)
- cron/scheduler.py: use_profile() wrapper in the tick path
- gateway/delivery.py: use_profile() wrapper per delivery target
- tests/cron/test_scheduler.py: agent_id propagation in delivery targets
The OpenAI-compatible HTTP adapter was the one inbound surface from PR NousResearch#25660 that never called ``_attach_agent_id``: every ``/v1/chat/completions``, ``/v1/responses``, and ``/v1/runs`` request fell through to ``default_agent`` regardless of the configured routes, silently undermining the multi-agent guarantee on any deployment that exposes the API server.

Add a single routing entry point, ``_resolve_agent_profile``, that:

* Reads ``X-Hermes-Chat-Id`` / ``X-Hermes-User-Id`` / ``X-Hermes-Thread-Id`` from the request (sanitised with the same length and control-character caps as the existing ``X-Hermes-Session-Id`` / ``X-Hermes-Session-Key``).
* Builds a synthetic ``SessionSource(platform=API_SERVER, …)`` and pipes it through the shared ``_attach_agent_id`` hook, so declarative routes *and* the ``select_agent`` plugin hook fire exactly as they do for every other adapter.
* Looks up the resolved ``agent_id`` in ``self._gateway_ref._agent_registry`` and returns the matching ``AgentProfile`` (or ``None`` for legacy single-agent installs).

The three agent-invoking handlers (chat completions, responses, runs) now resolve the profile up front and bind it via ``use_profile`` for the duration of the run. Binding happens twice, once on the asyncio side and once inside the executor thread, because asyncio's default executor does not propagate ContextVars.

Behaviour is fully backward compatible: requests with no routing headers (the existing OpenAI-API contract) resolve to ``default_agent``, exactly as today.

New tests in ``tests/gateway/test_api_server_routing.py`` cover:

* Header sanitisation (CRLF rejection, length caps, whitespace).
* Route resolution: matching, no-header fall-through, unmatched-header fall-through, ``platform``-only catch-all, ``user_id`` and ``thread_id`` routes, route-order precedence.
* Resilience: missing gateway reference, empty registry.
* ContextVar isolation under ``asyncio.gather``, so two concurrent HTTP requests with different chat_ids stay isolated.

Refs: PR NousResearch#25660 (single-gateway multi-agent).
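A sketch of header sanitisation plus the double binding around the executor. `loop.run_in_executor` does not copy the caller's context into the worker thread, so the profile is passed as an argument and bound again inside the thread. `MAX_HEADER_LEN`, rejecting (rather than truncating) over-long values, and the flat `routes`/`registry` dicts are assumptions; the real path goes through ``_attach_agent_id`` and ``AgentProfile`` objects.

```python
import asyncio
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Iterator, Mapping, Optional

MAX_HEADER_LEN = 256  # hypothetical cap for illustration

_active_profile: ContextVar[Optional[str]] = ContextVar("_active_profile", default=None)


@contextmanager
def use_profile(profile: Optional[str]) -> Iterator[None]:
    token = _active_profile.set(profile)
    try:
        yield
    finally:
        _active_profile.reset(token)


def sanitise_header(value: Optional[str]) -> Optional[str]:
    if value is None:
        return None
    # Reject control characters (CR, LF, NUL, DEL, ...) before trimming,
    # so a trailing CRLF is refused rather than silently stripped.
    if any(ord(ch) < 32 or ord(ch) == 127 for ch in value):
        return None
    value = value.strip()
    if not value or len(value) > MAX_HEADER_LEN:
        return None
    return value


def _run_sync(profile: Optional[str], prompt: str) -> str:
    # Executor thread: the asyncio-side binding is not visible here, so re-bind.
    with use_profile(profile):
        return f"{_active_profile.get()} <- {prompt}"


async def handle(
    headers: Mapping[str, str],
    routes: Mapping[str, str],
    registry: Mapping[str, str],
    prompt: str,
) -> str:
    chat_id = sanitise_header(headers.get("X-Hermes-Chat-Id"))
    # No (or invalid) routing header: fall through to the default agent.
    agent_id = routes.get(chat_id, "main") if chat_id else "main"
    profile = registry.get(agent_id)
    loop = asyncio.get_running_loop()
    with use_profile(profile):  # first binding, on the asyncio side
        return await loop.run_in_executor(None, _run_sync, profile, prompt)
```

An alternative is `asyncio.to_thread`, which does copy the current context; passing the profile explicitly keeps the thread's binding correct regardless of which executor API a handler uses.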
…t_id
Two Code Critic WARN findings from the multi-agent apply review:
1. api_server.py: move the _active_run_agents registration inside the `with use_profile` block, so any post-construction lazy attribute access on the asyncio thread sees the correct per-agent profile. The executor thread re-binds independently in _run_sync.
2. delegate_tool.py: document why _build_child_progress_callback uses `subagent_id=` (TUI spawn-tree identity) while invoke_hook uses `agent_id=` (multi-agent routing). Different consumers, same run.
The MGA series added _attach_agent_id() to the handle_message hot path (base.py:handle_message). It reads three instance attributes that are set only in BasePlatformAdapter.__init__: _default_agent_id, _gateway_routes, and _gateway_ref. Any adapter constructed without running __init__ (or via a future partial-construction path) crashes with AttributeError on _default_agent_id, which is read outside the existing try/except. Because this runs on every inbound message, a missing attribute would crash real message processing, not just tests.
Harden the method to read all three via getattr() with safe defaults, so a partially constructed adapter degrades to the "main" agent instead of raising.
Also fix the _make_adapter() test helper, which deliberately bypasses __init__ via object.__new__() and sets attributes by hand: it predated the MGA routing attributes and never set them. Add the three so the mock faithfully mirrors a real __init__-constructed adapter.
Fixes 10 regressions in tests/gateway/test_active_session_text_merge.py.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
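A sketch of the getattr() hardening described above. The constructor signature, the dict-based source, and the `select_agent` method on the gateway object are hypothetical simplifications; the point is that an adapter created with `object.__new__()` still resolves to "main" instead of raising AttributeError.

```python
from typing import Any, Dict, List, Optional


class BasePlatformAdapter:
    def __init__(
        self,
        default_agent_id: str = "main",
        routes: Optional[List[Dict[str, Any]]] = None,
        gateway: Any = None,
    ) -> None:
        self._default_agent_id = default_agent_id
        self._gateway_routes = routes or []
        self._gateway_ref = gateway

    def _attach_agent_id(self, source: Dict[str, Any]) -> Dict[str, Any]:
        # getattr() with defaults: an adapter that skipped __init__ degrades
        # to the "main" agent instead of raising AttributeError.
        default = getattr(self, "_default_agent_id", None) or "main"
        routes = getattr(self, "_gateway_routes", None) or []
        gateway = getattr(self, "_gateway_ref", None)
        agent_id = default
        try:
            select = getattr(gateway, "select_agent", None)  # hypothetical hook entry point
            chosen = select(source) if callable(select) else None
            if chosen:
                agent_id = chosen
            else:
                for route in routes:
                    match = route.get("match", {})
                    if all(source.get(k) == v for k, v in match.items()):
                        agent_id = route["agent"]
                        break
        except Exception:
            # A routing failure must never drop an inbound message.
            agent_id = default
        source["agent_id"] = agent_id
        return source
```

Reading all three attributes before the try block, but through getattr(), keeps the fallback path itself exception-free.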
…jobs NameError)
The tick() fetch site was renamed from due_jobs to all_jobs in 8dcf8a840, but the sequential/parallel partition still referenced the old due_jobs name, raising NameError at runtime and failing ~17 cron tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The mga-5 feature (33fb5ea "propagate agent_id through scheduled jobs & deliveries") made _resolve_single_delivery_target always include an "agent_id" key in the resolved target dict (None when the job has no agent). That commit updated _resolve_delivery_target and most of the TestResolveDeliveryTarget expected dicts, but missed four cron-thread tests, which still asserted dicts without "agent_id" and so began failing once the feature merged.
Add "agent_id": None to the four stale expected dicts to match the intended feature contract and their already-updated sibling tests. No production behavior change.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Force-pushed from 64be325 to e2dc72f.
What
Add the multi-agent routing user guide and a sample cli-config.yaml.example, and apply a small fix that extends use_profile scope and documents subagent_id vs agent_id.
Why
Ships the documentation operators need to configure multi-agent routing, and closes a scope gap so the active profile is applied consistently (including the delegate/api_server paths), clarifying the distinction between a delegated subagent_id and the routed agent_id.
How to test
Platforms tested
Linux (CT 133 / Proxmox LXC)
Part 6 of 6 in the multi-agent gateway decomposition (replaces #34741). Depends on: #37500
Final PR in the chain. After all six merge in order, main matches the content of the original #34741.