feat(cron+api): propagate agent_id through jobs/deliveries + wire api_server routing (5/6) #37500

Open
davidgut1982 wants to merge 10 commits into NousResearch:main from davidgut1982:feat/mga-5-cron-api-propagation

Conversation

@davidgut1982

Contributor

What

Propagate agent_id through scheduled cron jobs and gateway deliveries, and wire the api_server platform adapter into multi-agent routing.

Why

Scheduled jobs and outbound deliveries must run under the correct agent identity, and the HTTP api_server adapter must participate in routing so API-driven traffic reaches the right agent like the chat platforms do.

How to test

python -m pytest tests/cron/test_scheduler.py tests/gateway/test_api_server_routing.py -q

Platforms tested

Linux (CT 133 / Proxmox LXC)

Part 5 of 6 in the multi-agent gateway decomposition (replaces #34741). Depends on: #37498

Note on conflict resolution: cherry-picking the cron commit conflicted in cron/jobs.py where main had added a _strict_retry tracking flag for JSON auto-repair. Resolved by keeping main's _strict_retry line (it is used later in load_jobs); the agent commit added nothing at that location.

@alt-glitch added the type/feature, P3, comp/gateway, and comp/cron labels on Jun 2, 2026
@davidgut1982 force-pushed the feat/mga-5-cron-api-propagation branch from b4fafd3 to f2f8641 on June 3, 2026 03:01
02356abc and others added 10 commits June 3, 2026 23:42

Introduce AgentProfile dataclass and a ContextVar (_current_agent_profile)
that lets path getters (get_hermes_home, get_skills_dir, get_memory_dir)
resolve to the active agent's home directory under asyncio.

- agent/profile.py: AgentProfile, use_profile() context manager,
  load_agent_registry() from GatewayConfig
- hermes_constants.py: get_hermes_home() reads ContextVar before env fallback
- tests/agent/test_profile_contextvar.py: ContextVar isolation under
  asyncio.gather, nested contexts, registry loading

Single-agent installs see zero change — no profile bound means fallback
to HERMES_HOME env var as before.

Add an agent_id field to SessionSource and SessionEntry, and prefix session
keys with agent:<id>: in build_session_key. The default "main" preserves every
historical key string for single-agent installs.

- gateway/session.py: SessionSource.agent_id, SessionEntry.agent_id,
  build_session_key prefixing
- hermes_state.py: sessions table migration (agent_id TEXT DEFAULT 'main'),
  new idx_sessions_agent index
- tests/gateway/test_session.py: build_session_key prefixing for all
  chat_type × agent_id combinations
- tests/*/test_session_boundary_hooks.py: hook payload agent_id kwarg
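
One way to read the key-prefixing rule is sketched below. The base key layout (`platform:chat_type:chat_id`) and the reduced SessionSource fields are hypothetical, and the sketch assumes that "preserves every historical key string" means the default agent gets no prefix at all.

```python
from dataclasses import dataclass

DEFAULT_AGENT_ID = "main"


@dataclass
class SessionSource:
    # Reduced, hypothetical field set for illustration only.
    platform: str
    chat_type: str
    chat_id: str
    agent_id: str = DEFAULT_AGENT_ID


def build_session_key(source: SessionSource) -> str:
    """Return the legacy key for the default agent, a prefixed key otherwise."""
    base = f"{source.platform}:{source.chat_type}:{source.chat_id}"  # assumed layout
    if source.agent_id == DEFAULT_AGENT_ID:
        return base  # unchanged, so existing single-agent sessions keep resolving
    return f"agent:{source.agent_id}:{base}"
```

With this reading, a session created before the migration and one created afterwards under "main" share the same key, while any other agent gets a disjoint key space.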
… hook

Add declarative routing (routes: match → agent) and a select_agent plugin
hook. _attach_agent_id injects the resolved agent_id into event.source
before build_session_key. Seven platform adapters get pre-injection for
batching paths; the rest inherit it from base.py.

- gateway/agent_routing.py: resolve_agent_id(), _route_matches()
- gateway/config.py: agents, routes, default_agent schema
- gateway/platforms/base.py: _attach_agent_id(), set_routing_context()
- gateway/platforms/{telegram,discord,slack,matrix,feishu,wecom,yuanbao}.py:
  pre-batch injection
- hermes_cli/plugins.py: select_agent hook registration
- tests/gateway/test_agent_routing.py: declared-order matching, hook chain,
  default fallback, profile isolation
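
Declared-order matching with a default fallback might look like the sketch below. The dict-based source, the route shape (`match` / `agent` keys), and the choice to consult hooks before declarative routes are assumptions for illustration; the actual signatures in gateway/agent_routing.py may differ.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

Route = Dict[str, Any]
SelectAgentHook = Callable[[Dict[str, Any]], Optional[str]]


def _route_matches(match: Dict[str, Any], source: Dict[str, Any]) -> bool:
    """A route matches when every key it declares equals the source's value."""
    return all(source.get(key) == value for key, value in match.items())


def resolve_agent_id(
    source: Dict[str, Any],
    routes: List[Route],
    default_agent: str = "main",
    hooks: Iterable[SelectAgentHook] = (),
) -> str:
    # Assumption: plugin hooks get the first say; the first non-empty answer wins.
    for hook in hooks:
        chosen = hook(source)
        if chosen:
            return chosen
    # Routes are tried in declared order, so specific routes should come first.
    for route in routes:
        if _route_matches(route.get("match", {}), source):
            return route["agent"]
    return default_agent
```

Since the first matching route wins, a `platform`-only route placed above a more specific `chat_id` route would shadow it, which is why declared order matters.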
…s agent_id to hooks

GatewayRunner loads the agent registry at init and wraps every inbound
message in use_profile(). AIAgent accepts an optional profile= kwarg.
All invoke_hook call sites gain agent_id= kwarg. _handle_message is
split into _handle_message (ContextVar plumbing) + _handle_message_inner
(legacy logic) so tests that grep the source body continue to work.

- gateway/run.py: registry loading, use_profile() wrapping, hook kwargs
- run_agent.py: AIAgent(profile=), profile-aware model/toolset resolution
- model_tools.py, tools/{approval,terminal,delegate}.py: hook agent_id
- cli.py, tui_gateway/server.py: session boundary hook agent_id
- tests/gateway/test_profile_overrides.py: per-agent model/toolset overrides
- tests/test_model_tools.py: hook payload verification
- tests/gateway/test_{update,title,reasoning}_command.py: adapt to
  _handle_message split

New hermes agent subcommand group: list, show, add, remove.
Manages agent profiles and routing config in ~/.hermes/config.yaml.

- hermes_cli/agent.py: cmd_agent_list, cmd_agent_show, cmd_agent_add,
  cmd_agent_remove with profile cloning and route cleanup
- hermes_cli/main.py: parser registration
- tests/hermes_cli/test_agent_cli.py: list/show/add/remove coverage,
  route orphan warnings, SOUL summarization

…veries

Cron tick and delivery routing now bind the correct profile before
execution. jobs.py does NOT persist agent_id in JSON — the directory
is the identity. Delivery uses nullcontext() for the unrouted case.

- cron/jobs.py: in-memory agent_id stamping at read time, directory-based
  identity (no JSON field)
- cron/scheduler.py: use_profile() wrapper in tick path
- gateway/delivery.py: use_profile() wrapper per delivery target
- tests/cron/test_scheduler.py: agent_id propagation in delivery targets
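
The "directory is the identity" idea can be sketched as below. The `<agent_id>/jobs.json` layout, the function signatures, and the `profile_context` helper are hypothetical; the sketch only illustrates stamping agent_id in memory at read time, keeping it out of the JSON, and using nullcontext() when no profile applies.

```python
import json
from contextlib import nullcontext
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional


def load_jobs(jobs_file: Path) -> List[Dict[str, Any]]:
    """Read jobs and stamp agent_id from the parent directory name (assumed layout)."""
    jobs = json.loads(jobs_file.read_text(encoding="utf-8"))
    agent_id = jobs_file.parent.name
    return [{**job, "agent_id": agent_id} for job in jobs]


def save_jobs(jobs_file: Path, jobs: List[Dict[str, Any]]) -> None:
    """Write jobs back without the in-memory agent_id stamp."""
    persisted = [{k: v for k, v in job.items() if k != "agent_id"} for job in jobs]
    jobs_file.write_text(json.dumps(persisted, indent=2), encoding="utf-8")


def profile_context(
    agent_id: Optional[str],
    registry: Dict[str, Any],
    bind: Callable[[Any], Any],
):
    """Return bind(profile) for a routed target, or a no-op context otherwise."""
    profile = registry.get(agent_id) if agent_id else None
    return bind(profile) if profile is not None else nullcontext()
```

Keeping agent_id out of the file means a jobs file copied into another agent's directory picks up the new identity on the next read, with no field to go stale.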
The OpenAI-compatible HTTP adapter was the one inbound surface from
PR NousResearch#25660 that never called ``_attach_agent_id`` — every
``/v1/chat/completions``, ``/v1/responses``, and ``/v1/runs`` request
fell through to ``default_agent`` regardless of the configured routes,
silently undermining the multi-agent guarantee on any deployment that
exposes the API server.

Add a single routing entry point, ``_resolve_agent_profile``, that:

  * Reads ``X-Hermes-Chat-Id`` / ``X-Hermes-User-Id`` / ``X-Hermes-Thread-Id``
    from the request (sanitised through the same length + control-char
    caps as the existing ``X-Hermes-Session-Id`` / ``X-Hermes-Session-Key``).
  * Builds a synthetic ``SessionSource(platform=API_SERVER, …)`` and
    pipes it through the shared ``_attach_agent_id`` hook so declarative
    routes *and* the ``select_agent`` plugin hook fire identically to
    every other adapter.
  * Looks up the resolved ``agent_id`` in
    ``self._gateway_ref._agent_registry`` and returns the matching
    ``AgentProfile`` (or ``None`` for legacy single-agent installs).
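
The header handling in the first bullet could be sketched like this. The 256-character cap, the function names, and the decision to reject (rather than truncate) over-long values are assumptions; only the three header names come from the commit message.

```python
from typing import Dict, Mapping, Optional

MAX_HEADER_LEN = 256  # hypothetical cap


def sanitize_routing_header(
    raw: Optional[str], max_len: int = MAX_HEADER_LEN
) -> Optional[str]:
    """Return a trimmed header value, or None if it is missing or unsafe."""
    if raw is None:
        return None
    # Check the raw value first so a trailing CR/LF is rejected, not stripped away.
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in raw):
        return None
    value = raw.strip()
    if not value or len(value) > max_len:
        return None
    return value


def routing_fields(headers: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Collect the three routing identifiers from a header mapping."""
    return {
        "chat_id": sanitize_routing_header(headers.get("X-Hermes-Chat-Id")),
        "user_id": sanitize_routing_header(headers.get("X-Hermes-User-Id")),
        "thread_id": sanitize_routing_header(headers.get("X-Hermes-Thread-Id")),
    }
```

A plain dict is case-sensitive, whereas real HTTP header mappings are usually case-insensitive; the lookup above assumes the latter or exact casing.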

The three agent-invoking handlers (chat completions, responses, runs)
now resolve the profile up front and bind it via ``use_profile`` for
the duration of the run.  Binding happens twice — once on the asyncio
side and once inside the executor thread — because asyncio's default
executor does not propagate ContextVars.

Behaviour is fully backward compatible: requests with no routing
headers (the existing OpenAI-API contract) resolve to
``default_agent``, exactly the current behaviour.

New tests in ``tests/gateway/test_api_server_routing.py`` cover:

  * Header sanitisation (CRLF rejection, length caps, whitespace).
  * Route resolution: matching, no-header fall-through, unmatched
    header fall-through, ``platform``-only catch-all, ``user_id`` and
    ``thread_id`` routes, route-order precedence.
  * Resilience: missing gateway reference, empty registry.
  * ContextVar isolation under ``asyncio.gather`` so two concurrent
    HTTP requests with different chat_ids stay isolated.

Refs: PR NousResearch#25660 (single-gateway multi-agent).

The MGA series added _attach_agent_id() into the handle_message hot path
(base.py:handle_message). It reads three instance attributes set only in
BasePlatformAdapter.__init__: _default_agent_id, _gateway_routes, and
_gateway_ref. Any adapter constructed without running __init__ (or a
future partial-construction path) crashes with AttributeError on
_default_agent_id, which is read outside the existing try/except.

Because this runs on every inbound message, a missing attribute would
crash real message processing, not just tests. Harden the method to read
all three via getattr() with safe defaults so a partially-constructed
adapter degrades to the "main" agent instead of raising.
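
The hardening amounts to the pattern below. `AdapterSketch` and `_routing_state` are hypothetical stand-ins for BasePlatformAdapter and the attribute reads inside _attach_agent_id; only the three attribute names and the "main" fallback come from the commit message.

```python
from typing import Any, List, Optional, Tuple


class AdapterSketch:
    def __init__(
        self,
        default_agent_id: str = "main",
        routes: Optional[List[dict]] = None,
        gateway_ref: Any = None,
    ) -> None:
        self._default_agent_id = default_agent_id
        self._gateway_routes = routes or []
        self._gateway_ref = gateway_ref

    def _routing_state(self) -> Tuple[str, List[dict], Any]:
        """Read routing attributes defensively so a missing one never raises."""
        default_agent_id = getattr(self, "_default_agent_id", "main")
        routes = getattr(self, "_gateway_routes", None) or []
        gateway_ref = getattr(self, "_gateway_ref", None)
        return default_agent_id, routes, gateway_ref


# An instance built without __init__, as the _make_adapter() test helper does.
partial = object.__new__(AdapterSketch)
```

Calling `partial._routing_state()` falls back to the "main" agent with no routes instead of raising AttributeError, while a normally constructed adapter returns its configured values.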

Also fix the _make_adapter() test helper, which deliberately bypasses
__init__ via object.__new__() and hand-sets attributes: it predated the
MGA routing attributes and never set them. Add the three so the mock
faithfully mirrors a real __init__-constructed adapter.

Fixes 10 regressions in tests/gateway/test_active_session_text_merge.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…jobs NameError)

The tick() fetch site was renamed from due_jobs to all_jobs in 8dcf8a840
but the sequential/parallel partition still referenced the old due_jobs
name, raising NameError at runtime and failing ~17 cron tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The mga-5 feature (33fb5ea "propagate agent_id through scheduled jobs
& deliveries") made _resolve_single_delivery_target always include an
"agent_id" key in the resolved target dict (None when the job has no
agent). That commit updated _resolve_delivery_target and most of the
TestResolveDeliveryTarget expected dicts, but four cron-thread tests
were missed and still asserted dicts without "agent_id", so they began
failing once the feature merged.

Add "agent_id": None to the four stale expected dicts to match the
intended feature contract and their already-updated sibling tests.
No production behavior change.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Labels

comp/cron: Cron scheduler and job management
comp/gateway: Gateway runner, session dispatch, delivery
P3: Low; cosmetic, nice to have
type/feature: New feature or request

3 participants