feat(gateway): single gateway, multiple agents #34741

Closed
davidgut1982 wants to merge 9 commits into NousResearch:main from davidgut1982:feat/single-gateway-multi-agent

@davidgut1982 commented May 29, 2026

Contributor

What

Enables a single hermes gateway run process to host N isolated AI agents, routing inbound messages by platform/chat/thread/user metadata through a declarative routes: table and an optional select_agent hook. Each agent has independent memory, skills, SOUL.md, and model config. Rebases and extends #25660 (original work by @02356abc) onto v0.15.0 with conflict resolution. Adds two commits: multi-agent routing for the api_server platform adapter, and a profile-scope fix that keeps the use_profile() ContextVar bound across async chains.

Files modified: gateway/agent_routing.py, gateway/config.py, gateway/delivery.py, gateway/platforms/api_server.py, gateway/platforms/base.py, gateway/session.py, agent/conversation_loop.py, agent/profile.py, and others (16 files scanned, all Windows-footguns-clean).

Why

The single-gateway bottleneck (#23735, #7517, #9514, #12099) limits multi-user and multi-workflow deployments. Multi-agent routing lets operators run multiple isolated agents in one process while keeping their state, models, and configurations separate. Session keys are namespaced (agent:<id>:...), and the feature is fully backward compatible: with only the default main agent configured, routing is a no-op.

Tests

pytest tests/agent/test_profile_contextvar.py tests/gateway/test_agent_routing.py tests/gateway/test_session.py -v

All tests pass; existing single-agent behavior is unchanged.

Platforms tested

Linux (CT/LXC environment, Python 3.13)

@davidgut1982 force-pushed the feat/single-gateway-multi-agent branch from 9cd1a37 to f855a5f on May 29, 2026 17:23
@alt-glitch alt-glitch added type/feature New feature or request P3 Low — cosmetic, nice to have comp/gateway Gateway runner, session dispatch, delivery comp/agent Core agent loop, run_agent.py, prompt builder comp/cli CLI entry point, hermes_cli/, setup wizard comp/cron Cron scheduler and job management comp/plugins Plugin system and bundled plugins platform/telegram Telegram bot adapter platform/slack Slack app adapter platform/discord Discord bot adapter platform/feishu Feishu / Lark adapter platform/wecom WeCom / WeChat Work adapter labels May 29, 2026
@alt-glitch

Collaborator

Rebase of #25660 onto v0.15.0 with two additional commits (API server routing + profile propagation). Original authorship by @02356abc preserved. Related follow-up issues: #25695, #25696, #25697, #25698.

@romansoft

This PR looks like the right architectural direction for multi-profile gateway routing.

One related enhancement that would make it especially useful for OpenAI-compatible clients like Open WebUI, LobeChat, LibreChat, etc.:

Could the API server optionally expose Hermes profiles as selectable OpenAI “models” and route requests by the incoming model field?

Current direction in this PR appears to support API-server profile routing via headers like:

  • X-Hermes-Chat-Id
  • X-Hermes-User-Id
  • X-Hermes-Thread-Id

That is useful for custom clients, but most OpenAI-compatible web UIs do not provide a clean way to set per-request custom routing headers from the model dropdown. They do already send the selected model in the request body.

Desired behavior:

GET /v1/models could return allowed Hermes profiles as model IDs:

    {
      "data": [
        { "id": "default", "object": "model", "owned_by": "hermes" },
        { "id": "researcher", "object": "model", "owned_by": "hermes" },
        { "id": "coder", "object": "model", "owned_by": "hermes" }
      ]
    }

Then:

    {
      "model": "researcher",
      "messages": [...]
    }

would route the request into the researcher Hermes profile, including that profile’s config, SOUL.md, memory, skills, toolsets, and session scope.

Important security/operational constraints:

  • Only expose profiles explicitly allowlisted for API-server use.
  • Keep current behavior as the default for backward compatibility.
  • Reject unknown or unauthorized model/profile names with a clear 404/403-style OpenAI-compatible error.
  • Preserve existing header-based routing for custom clients.
  • Ideally support both /v1/chat/completions and /v1/responses.

This would let one Hermes API server appear in Open WebUI as multiple selectable agents/profiles, instead of needing one API server process and port per profile.
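
To make the request above concrete, here is a minimal sketch of the model-field resolution, not an implementation against the actual api_server code. The names `API_ALLOWED_PROFILES`, `list_models`, and `resolve_profile` are hypothetical, and the error body simply follows the usual OpenAI error envelope (`error.message`, `error.type`, `error.param`, `error.code`).

```python
from typing import Any

# Hypothetical allowlist: profiles an operator has opted in for API-server use.
API_ALLOWED_PROFILES = ["default", "researcher", "coder"]


def list_models(allowed: list[str]) -> dict[str, Any]:
    """Build a GET /v1/models style payload from allowlisted profile names."""
    return {
        "object": "list",
        "data": [
            {"id": name, "object": "model", "owned_by": "hermes"}
            for name in allowed
        ],
    }


def resolve_profile(model: str, allowed: list[str]) -> tuple[int, dict[str, Any]]:
    """Map the request's model field to a profile, or to an OpenAI-shaped error.

    Returns an (HTTP status, JSON body) pair. Anything not on the allowlist is
    rejected, so unknown and non-exposed profiles look the same to the client.
    """
    if model in allowed:
        return 200, {"profile": model}
    return 404, {
        "error": {
            "message": f"The model '{model}' does not exist or is not exposed.",
            "type": "invalid_request_error",
            "param": "model",
            "code": "model_not_found",
        }
    }
```

Header-based routing could stay in front of this lookup, so clients that already send X-Hermes-* headers would see no change.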

@davidgut1982

Copy link
Copy Markdown
Contributor Author

@romansoft strong +1 — and worth zooming out, because you and I are asking for the same thing a lot of people have been asking for.

The demand is already well-documented. This PR alone references #23735, #7517, #9514, #12099. Beyond those, the "one gateway, many agents/personas" ask shows up repeatedly and independently:

Multiple authors, multiple platforms, multiple competing implementations — and several explicitly reject the current "run a separate gateway process per agent" answer as too heavyweight (especially for single-credential platforms). Your /v1/models idea is the cleanest client-facing expression of the same need: it makes all of that usable from the model dropdown in Open WebUI / LobeChat / LibreChat with zero client-side config.

So what's actually stopping it? Not effort — the code largely exists (this PR, #25660, #1991, #18510, #11439…). The blocker is that none of it is merged and there's no ratified design yet for how per-target routing should work: agents:/routes: (this PR) vs channel_overrides (#1991) vs topic_profiles (#18510) vs per-platform model. They overlap and partly conflict, and there's the separate agent_profiles: (delegation) concept in #9459 that uses the word "agent" too. Building a /v1/models surface on top of any one of these means betting on which routing mechanism wins review — and if it lands a different shape, the dropdown layer churns.

That's the only reason I'd sequence it as a follow-up rather than fold it in here: get the routing core ratified first, then add model-field → profile resolution on top (happy to author it, with your constraints — allowlist, backward-compatible default, clean OpenAI-shaped 404/403 for unknown profiles, headers preserved, both /v1/chat/completions and /v1/responses).

To be clear, I'm not trying to drive the direction here — just helping where I can and happy to adapt to whatever shape the maintainers prefer. The thing that would unblock all of it is a steer on the routing primitive. @teknium1 / @alt-glitch — is there a preferred direction among these (this PR's routes: table vs the per-channel/topic override approaches)? Once there's a blessed primitive, the client-facing layer @romansoft is describing is a small, well-scoped follow-up, and I'm glad to do the legwork to match whatever's chosen.

02356abc and others added 9 commits June 2, 2026 16:00
Introduce AgentProfile dataclass and a ContextVar (_current_agent_profile)
that lets path getters (get_hermes_home, get_skills_dir, get_memory_dir)
resolve to the active agent's home directory under asyncio.

- agent/profile.py: AgentProfile, use_profile() context manager,
  load_agent_registry() from GatewayConfig
- hermes_constants.py: get_hermes_home() reads ContextVar before env fallback
- tests/agent/test_profile_contextvar.py: ContextVar isolation under
  asyncio.gather, nested contexts, registry loading

Single-agent installs see zero change — no profile bound means fallback
to HERMES_HOME env var as before.
Add agent_id field to SessionSource and SessionEntry, prefix session keys
with agent:<id>: in build_session_key. Default "main" preserves every
historical key string for single-agent installs.

- gateway/session.py: SessionSource.agent_id, SessionEntry.agent_id,
  build_session_key prefixing
- hermes_state.py: sessions table migration (agent_id TEXT DEFAULT 'main'),
  new idx_sessions_agent index
- tests/gateway/test_session.py: build_session_key prefixing for all
  chat_type × agent_id combinations
- tests/*/test_session_boundary_hooks.py: hook payload agent_id kwarg
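
For illustration only, the prefixing could look like the sketch below. It assumes the pre-existing keys already started with agent:main:, which is one reading of "preserves every historical key string", and the `platform:chat_type:chat_id` tail is a hypothetical layout rather than the real one in gateway/session.py.

```python
from dataclasses import dataclass

DEFAULT_AGENT_ID = "main"


@dataclass
class SessionSource:
    platform: str
    chat_type: str
    chat_id: str
    agent_id: str = DEFAULT_AGENT_ID  # new field; defaults to the single agent


def build_session_key(source: SessionSource) -> str:
    """Namespace the session key by agent so agents never share a session."""
    # Hypothetical tail layout; only the agent:<id>: prefix comes from the PR.
    tail = f"{source.platform}:{source.chat_type}:{source.chat_id}"
    return f"agent:{source.agent_id}:{tail}"
```

Under that assumption, a source that never sets agent_id produces the same string as before, while two agents in the same chat get distinct keys.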
… hook

Add declarative routing (routes: match → agent) and a select_agent plugin
hook. _attach_agent_id injects the resolved agent_id into event.source
before build_session_key. Seven platform adapters get pre-injection for
batching paths; the rest inherit it from base.py.

- gateway/agent_routing.py: resolve_agent_id(), _route_matches()
- gateway/config.py: agents, routes, default_agent schema
- gateway/platforms/base.py: _attach_agent_id(), set_routing_context()
- gateway/platforms/{telegram,discord,slack,matrix,feishu,wecom,yuanbao}.py:
  pre-batch injection
- hermes_cli/plugins.py: select_agent hook registration
- tests/gateway/test_agent_routing.py: declared-order matching, hook chain,
  default fallback, profile isolation
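
A minimal sketch of declared-order matching with a hook chain and a default fallback. The `Route` shape, the dict-based source, and the choice to consult hooks before the routes table are assumptions for illustration; the real resolve_agent_id() may order these differently.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Source = dict[str, str]
SelectAgentHook = Callable[[Source], Optional[str]]


@dataclass
class Route:
    match: dict[str, str]  # every listed field must equal the source's value
    agent: str


def _route_matches(route: Route, source: Source) -> bool:
    # An empty match dict matches everything, which acts as a catch-all.
    return all(source.get(key) == value for key, value in route.match.items())


def resolve_agent_id(
    source: Source,
    routes: Sequence[Route],
    default_agent: str = "main",
    hooks: Sequence[SelectAgentHook] = (),
) -> str:
    # Assumption: the first hook that returns an agent id wins.
    for hook in hooks:
        chosen = hook(source)
        if chosen:
            return chosen
    # Routes are tried in declared order; the first match wins.
    for route in routes:
        if _route_matches(route, source):
            return route.agent
    return default_agent
```

With first-match semantics, specific routes (platform plus chat_id) need to be declared before platform-only catch-alls, otherwise the catch-all shadows them.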
…s agent_id to hooks

GatewayRunner loads the agent registry at init and wraps every inbound
message in use_profile(). AIAgent accepts an optional profile= kwarg.
All invoke_hook call sites gain agent_id= kwarg. _handle_message is
split into _handle_message (ContextVar plumbing) + _handle_message_inner
(legacy logic) so tests that grep the source body continue to work.

- gateway/run.py: registry loading, use_profile() wrapping, hook kwargs
- run_agent.py: AIAgent(profile=), profile-aware model/toolset resolution
- model_tools.py, tools/{approval,terminal,delegate}.py: hook agent_id
- cli.py, tui_gateway/server.py: session boundary hook agent_id
- tests/gateway/test_profile_overrides.py: per-agent model/toolset overrides
- tests/test_model_tools.py: hook payload verification
- tests/gateway/test_{update,title,reasoning}_command.py: adapt to
  _handle_message split
…veries

Cron tick and delivery routing now bind the correct profile before
execution. jobs.py does NOT persist agent_id in JSON — the directory
is the identity. Delivery uses nullcontext() for the unrouted case.

- cron/jobs.py: in-memory agent_id stamping at read time, directory-based
  identity (no JSON field)
- cron/scheduler.py: use_profile() wrapper in tick path
- gateway/delivery.py: use_profile() wrapper per delivery target
- tests/cron/test_scheduler.py: agent_id propagation in delivery targets
New hermes agent subcommand group: list, show, add, remove.
Manages agent profiles and routing config in ~/.hermes/config.yaml.

- hermes_cli/agent.py: cmd_agent_list, cmd_agent_show, cmd_agent_add,
  cmd_agent_remove with profile cloning and route cleanup
- hermes_cli/main.py: parser registration
- tests/hermes_cli/test_agent_cli.py: list/show/add/remove coverage,
  route orphan warnings, SOUL summarization

The OpenAI-compatible HTTP adapter was the one inbound surface from
PR NousResearch#25660 that never called ``_attach_agent_id`` — every
``/v1/chat/completions``, ``/v1/responses``, and ``/v1/runs`` request
fell through to ``default_agent`` regardless of the configured routes,
silently undermining the multi-agent guarantee on any deployment that
exposes the API server.

Add a single routing entry point, ``_resolve_agent_profile``, that:

  * Reads ``X-Hermes-Chat-Id`` / ``X-Hermes-User-Id`` / ``X-Hermes-Thread-Id``
    from the request (sanitised through the same length + control-char
    caps as the existing ``X-Hermes-Session-Id`` / ``X-Hermes-Session-Key``).
  * Builds a synthetic ``SessionSource(platform=API_SERVER, …)`` and
    pipes it through the shared ``_attach_agent_id`` hook so declarative
    routes *and* the ``select_agent`` plugin hook fire identically to
    every other adapter.
  * Looks up the resolved ``agent_id`` in
    ``self._gateway_ref._agent_registry`` and returns the matching
    ``AgentProfile`` (or ``None`` for legacy single-agent installs).
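
As a sketch of the header sanitisation step only: the 256-character cap, the function name, and returning ``None`` for rejected values are assumptions here, since the commit does not state the actual limits or the failure behaviour.

```python
from typing import Optional

MAX_HEADER_LEN = 256  # hypothetical cap; the real limit is not given above


def sanitise_routing_header(
    raw: Optional[str], max_len: int = MAX_HEADER_LEN
) -> Optional[str]:
    """Return a cleaned header value, or None if it is missing or unsafe."""
    if raw is None:
        return None
    # Trim only spaces and tabs so CR/LF survive to the control-char check.
    value = raw.strip(" \t")
    if not value or len(value) > max_len:
        return None
    # Reject ASCII control characters, which covers CRLF header injection.
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in value):
        return None
    return value
```

A rejected header is treated here as absent, so the request would fall through to ``default_agent`` rather than fail.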

The three agent-invoking handlers (chat completions, responses, runs)
now resolve the profile up front and bind it via ``use_profile`` for
the duration of the run.  Binding happens twice — once on the asyncio
side and once inside the executor thread — because asyncio's default
executor does not propagate ContextVars.

Behaviour is fully backward compatible: requests with no routing
headers (the existing OpenAI-API contract) resolve to
``default_agent``, exactly the current behaviour.

New tests in ``tests/gateway/test_api_server_routing.py`` cover:

  * Header sanitisation (CRLF rejection, length caps, whitespace).
  * Route resolution: matching, no-header fall-through, unmatched
    header fall-through, ``platform``-only catch-all, ``user_id`` and
    ``thread_id`` routes, route-order precedence.
  * Resilience: missing gateway reference, empty registry.
  * ContextVar isolation under ``asyncio.gather`` so two concurrent
    HTTP requests with different chat_ids stay isolated.

Refs: PR NousResearch#25660 (single-gateway multi-agent).
…t_id

Two Code Critic WARN findings from multi-agent apply review:

1. api_server.py: move _active_run_agents registration inside the
   `with use_profile` block so any post-construction lazy attribute
   access on the asyncio thread sees the correct per-agent profile.
   The executor thread re-binds independently in _run_sync.

2. delegate_tool.py: document why _build_child_progress_callback uses
   `subagent_id=` (TUI spawn-tree identity) while invoke_hook uses
   `agent_id=` (multi-agent routing). Different consumers, same run.

@davidgut1982

Contributor Author

Superseded by a chain of 6 smaller, focused PRs per CONTRIBUTING.md guidance (one logical change, reviewable in ~15 min). Each builds on the previous and should be merged in order:

Chain: #37495 → #37496 → #37497 → #37498 → #37500 → #37502

  1. #37495 feat(agent): AgentProfile + ContextVar for per-agent paths (1/6)
  2. #37496 feat(session): thread agent_id through session identity & DB schema (2/6)
  3. #37497 feat(gateway): routes table + select_agent hook + GatewayRunner profile binding (3/6)
  4. #37498 feat(cli): hermes agent subcommand (4/6)
  5. #37500 feat(cron+api): propagate agent_id through jobs/deliveries + api_server routing (5/6)
  6. #37502 docs+fix: multi-agent routing guide + use_profile scope fix (6/6)

The chain is rebased onto current main; the net diff of #37502 over main is the same 47-file feature set as this PR, with three minor cherry-pick conflicts resolved (documented in the respective PR bodies).

4 participants