feat(gateway): single gateway, multiple agents #34741
9cd1a37 to
f855a5f
This PR looks like the right architectural direction for multi-profile gateway routing. One related enhancement that would make it especially useful for OpenAI-compatible clients like Open WebUI, LobeChat, LibreChat, etc.:
Could the API server optionally expose Hermes profiles as selectable OpenAI “models” and route requests by the incoming
Current direction in this PR appears to support API-server profile routing via headers like:
That is useful for custom clients, but most OpenAI-compatible web UIs do not provide a clean way to set per-request custom routing headers from the model dropdown. They do already send the selected model in the request body.
Desired behavior:
Then:
would route the request into the
Important security/operational constraints:
This would let one Hermes API server appear in Open WebUI as multiple selectable agents/profiles, instead of needing one API server process and port per profile.
@romansoft strong +1, and worth zooming out, because you and I are asking for the same thing a lot of people have been asking for.
The demand is already well-documented. This PR alone references #23735, #7517, #9514, #12099. Beyond those, the "one gateway, many agents/personas" ask shows up repeatedly and independently:
Multiple authors, multiple platforms, multiple competing implementations, and several explicitly reject the current "run a separate gateway process per agent" answer as too heavyweight (especially for single-credential platforms). Your
So what's actually stopping it? Not effort: the code largely exists (this PR, #25660, #1991, #18510, #11439…). The blocker is that none of it is merged and there's no ratified design yet for how per-target routing should work:
That's the only reason I'd sequence it as a follow-up rather than fold it in here: get the routing core ratified first, then add
To be clear, I'm not trying to drive the direction here, just helping where I can and happy to adapt to whatever shape the maintainers prefer. The thing that would unblock all of it is a steer on the routing primitive. @teknium1 / @alt-glitch: is there a preferred direction among these (this PR's
Introduce AgentProfile dataclass and a ContextVar (_current_agent_profile) that
lets path getters (get_hermes_home, get_skills_dir, get_memory_dir) resolve to
the active agent's home directory under asyncio.
- agent/profile.py: AgentProfile, use_profile() context manager,
load_agent_registry() from GatewayConfig
- hermes_constants.py: get_hermes_home() reads ContextVar before env fallback
- tests/agent/test_profile_contextvar.py: ContextVar isolation under
asyncio.gather, nested contexts, registry loading
Single-agent installs see zero change: no profile bound means fallback to the
HERMES_HOME env var as before.
Add agent_id field to SessionSource and SessionEntry, prefix session keys with
agent:<id>: in build_session_key. Default "main" preserves every historical
key string for single-agent installs.
- gateway/session.py: SessionSource.agent_id, SessionEntry.agent_id,
build_session_key prefixing
- hermes_state.py: sessions table migration (agent_id TEXT DEFAULT 'main'),
new idx_sessions_agent index
- tests/gateway/test_session.py: build_session_key prefixing for all
chat_type × agent_id combinations
- tests/*/test_session_boundary_hooks.py: hook payload agent_id kwarg
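The two changes in this commit can be sketched as follows. This is an illustration under assumptions, not the code in gateway/session.py or hermes_state.py: the key segments after the `agent:<id>:` prefix are placeholders, the sketch assumes historical keys already began with `agent:main:` (one reading of "preserves every historical key string"), and the session store is assumed to be SQLite with a hypothetical `session_key` column.

```python
import sqlite3


def build_session_key(
    platform: str, chat_type: str, chat_id: str, agent_id: str = "main"
) -> str:
    """Namespace the key by agent; the trailing segments are placeholders."""
    return f"agent:{agent_id}:{platform}:{chat_type}:{chat_id}"


def migrate_sessions(conn: sqlite3.Connection) -> None:
    """Idempotently add sessions.agent_id and its index."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    columns = {row[1] for row in conn.execute("PRAGMA table_info(sessions)")}
    if "agent_id" not in columns:
        # A constant DEFAULT means pre-existing rows read back as 'main'.
        conn.execute("ALTER TABLE sessions ADD COLUMN agent_id TEXT DEFAULT 'main'")
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_sessions_agent ON sessions(agent_id)"
    )
    conn.commit()


# Demo against a throwaway in-memory database with a legacy-shaped table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (session_key TEXT PRIMARY KEY)")
conn.execute(
    "INSERT INTO sessions (session_key) VALUES (?)",
    (build_session_key("telegram", "dm", "42"),),
)
migrate_sessions(conn)
migrate_sessions(conn)  # second run is a no-op
legacy_agent = conn.execute("SELECT agent_id FROM sessions").fetchone()[0]
```

Under these assumptions a single-agent install never sees a key change, and rows written before the migration are attributed to `main` without a backfill step.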
… hook
Add declarative routing (routes: match → agent) and a select_agent plugin
hook. _attach_agent_id injects the resolved agent_id into event.source
before build_session_key. Seven platform adapters get pre-injection for
batching paths; the rest inherit it from base.py.
- gateway/agent_routing.py: resolve_agent_id(), _route_matches()
- gateway/config.py: agents, routes, default_agent schema
- gateway/platforms/base.py: _attach_agent_id(), set_routing_context()
- gateway/platforms/{telegram,discord,slack,matrix,feishu,wecom,yuanbao}.py:
pre-batch injection
- hermes_cli/plugins.py: select_agent hook registration
- tests/gateway/test_agent_routing.py: declared-order matching, hook chain,
default fallback, profile isolation
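A minimal sketch of declared-order route matching with a hook chain and default fallback, to make the resolution order concrete. The `Source` dataclass, the shape of a route entry, and the choice to consult hooks before the declarative routes are assumptions for this example; the commit message does not state the precedence used in gateway/agent_routing.py.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Sequence


@dataclass(frozen=True)
class Source:
    # Hypothetical stand-in for the routing fields of SessionSource.
    platform: str
    chat_id: Optional[str] = None
    user_id: Optional[str] = None
    thread_id: Optional[str] = None


SelectAgentHook = Callable[[Source], Optional[str]]


def _route_matches(match: Mapping[str, str], source: Source) -> bool:
    """A route matches when every key it declares equals the source's value.

    An empty match block therefore acts as a catch-all.
    """
    return all(getattr(source, key, None) == value for key, value in match.items())


def resolve_agent_id(
    source: Source,
    routes: Sequence[Mapping[str, object]],
    default_agent: str = "main",
    hooks: Sequence[SelectAgentHook] = (),
) -> str:
    # Assumed precedence: the first hook returning a non-None id wins.
    for hook in hooks:
        chosen = hook(source)
        if chosen is not None:
            return chosen
    # Declared order: the first matching route wins, so specific routes
    # must be listed before broader ones.
    for route in routes:
        if _route_matches(route["match"], source):  # type: ignore[arg-type]
            return str(route["agent"])
    return default_agent


routes = [
    {"match": {"platform": "telegram", "chat_id": "100"}, "agent": "support"},
    {"match": {"platform": "telegram"}, "agent": "general"},
]

specific = resolve_agent_id(Source("telegram", chat_id="100"), routes)
broad = resolve_agent_id(Source("telegram", chat_id="5"), routes)
fallback = resolve_agent_id(Source("discord"), routes)
```

Because matching is first-hit, swapping the two routes above would send chat `100` to `general`, which is why declared order is worth testing explicitly.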
…s agent_id to hooks
GatewayRunner loads the agent registry at init and wraps every inbound
message in use_profile(). AIAgent accepts an optional profile= kwarg.
All invoke_hook call sites gain agent_id= kwarg. _handle_message is
split into _handle_message (ContextVar plumbing) + _handle_message_inner
(legacy logic) so tests that grep the source body continue to work.
- gateway/run.py: registry loading, use_profile() wrapping, hook kwargs
- run_agent.py: AIAgent(profile=), profile-aware model/toolset resolution
- model_tools.py, tools/{approval,terminal,delegate}.py: hook agent_id
- cli.py, tui_gateway/server.py: session boundary hook agent_id
- tests/gateway/test_profile_overrides.py: per-agent model/toolset overrides
- tests/test_model_tools.py: hook payload verification
- tests/gateway/test_{update,title,reasoning}_command.py: adapt to
_handle_message split
…veries
Cron tick and delivery routing now bind the correct profile before execution.
jobs.py does NOT persist agent_id in JSON; the directory is the identity.
Delivery uses nullcontext() for the unrouted case.
- cron/jobs.py: in-memory agent_id stamping at read time, directory-based
identity (no JSON field)
- cron/scheduler.py: use_profile() wrapper in tick path
- gateway/delivery.py: use_profile() wrapper per delivery target
- tests/cron/test_scheduler.py: agent_id propagation in delivery targets
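The "directory is the identity" rule and the `nullcontext()` fallback can be sketched like this. The `<agent home>/cron/jobs.json` layout, the assumption that the agent home directory is named after the agent id, and the `delivery_scope` helper with its `bind` parameter are all hypothetical; only the behaviour (stamp at read time, never persist, no-op context when unrouted) comes from the commit message.

```python
import json
import tempfile
from contextlib import nullcontext
from pathlib import Path
from typing import Any, Callable, ContextManager, Optional


def _jobs_file(agent_home: Path) -> Path:
    # Hypothetical layout for this sketch.
    return agent_home / "cron" / "jobs.json"


def save_jobs(agent_home: Path, jobs: list[dict[str, Any]]) -> None:
    """Write jobs to disk, dropping the in-memory agent_id stamp."""
    persisted = [{k: v for k, v in job.items() if k != "agent_id"} for job in jobs]
    path = _jobs_file(agent_home)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(persisted, indent=2), encoding="utf-8")


def load_jobs(agent_home: Path) -> list[dict[str, Any]]:
    """Read jobs and stamp agent_id from the directory name (assumed to be the id)."""
    path = _jobs_file(agent_home)
    if not path.exists():
        return []
    jobs = json.loads(path.read_text(encoding="utf-8"))
    return [{**job, "agent_id": agent_home.name} for job in jobs]


def delivery_scope(
    profile: Optional[Any], bind: Callable[[Any], ContextManager[Any]]
) -> ContextManager[Any]:
    """Bind the profile for a routed target; use a no-op context when unrouted."""
    return bind(profile) if profile is not None else nullcontext()


with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp) / "agents" / "alpha"
    save_jobs(home, [{"name": "daily-digest", "agent_id": "alpha"}])
    on_disk = json.loads(_jobs_file(home).read_text(encoding="utf-8"))
    in_memory = load_jobs(home)
```

Keeping the id out of the JSON means a jobs file copied into another agent's directory is picked up by that agent with no rewriting.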
New hermes agent subcommand group: list, show, add, remove. Manages agent
profiles and routing config in ~/.hermes/config.yaml.
- hermes_cli/agent.py: cmd_agent_list, cmd_agent_show, cmd_agent_add,
cmd_agent_remove with profile cloning and route cleanup
- hermes_cli/main.py: parser registration
- tests/hermes_cli/test_agent_cli.py: list/show/add/remove coverage, route
orphan warnings, SOUL summarization
The OpenAI-compatible HTTP adapter was the one inbound surface from PR
NousResearch#25660 that never called ``_attach_agent_id``: every
``/v1/chat/completions``, ``/v1/responses``, and ``/v1/runs`` request fell
through to ``default_agent`` regardless of the configured routes, silently
undermining the multi-agent guarantee on any deployment that exposes the API
server.
Add a single routing entry point, ``_resolve_agent_profile``, that:
* Reads ``X-Hermes-Chat-Id`` / ``X-Hermes-User-Id`` / ``X-Hermes-Thread-Id``
from the request (sanitised through the same length + control-char caps as
the existing ``X-Hermes-Session-Id`` / ``X-Hermes-Session-Key``).
* Builds a synthetic ``SessionSource(platform=API_SERVER, …)`` and pipes it
through the shared ``_attach_agent_id`` hook so declarative routes *and* the
``select_agent`` plugin hook fire identically to every other adapter.
* Looks up the resolved ``agent_id`` in ``self._gateway_ref._agent_registry``
and returns the matching ``AgentProfile`` (or ``None`` for legacy
single-agent installs).
The three agent-invoking handlers (chat completions, responses, runs) now
resolve the profile up front and bind it via ``use_profile`` for the duration
of the run. Binding happens twice, once on the asyncio side and once inside
the executor thread, because asyncio's default executor does not propagate
ContextVars.
Behaviour is fully backward compatible: requests with no routing headers (the
existing OpenAI-API contract) resolve to ``default_agent``, exactly the current
behaviour.
New tests in ``tests/gateway/test_api_server_routing.py`` cover:
* Header sanitisation (CRLF rejection, length caps, whitespace).
* Route resolution: matching, no-header fall-through, unmatched header
fall-through, ``platform``-only catch-all, ``user_id`` and ``thread_id``
routes, route-order precedence.
* Resilience: missing gateway reference, empty registry.
* ContextVar isolation under ``asyncio.gather`` so two concurrent HTTP
requests with different chat_ids stay isolated.
Refs: PR NousResearch#25660 (single-gateway multi-agent).
…t_id
Two Code Critic WARN findings from multi-agent apply review:
1. api_server.py: move _active_run_agents registration inside the
`with use_profile` block so any post-construction lazy attribute access on
the asyncio thread sees the correct per-agent profile. The executor thread
re-binds independently in _run_sync.
2. delegate_tool.py: document why _build_child_progress_callback uses
`subagent_id=` (TUI spawn-tree identity) while invoke_hook uses `agent_id=`
(multi-agent routing). Different consumers, same run.
f855a5f to
1828ac2
Superseded by a chain of 6 smaller, focused PRs per CONTRIBUTING.md guidance (one logical change, reviewable in ~15 min). Each builds on the previous and should be merged in order:
Chain: #37495 → #37496 → #37497 → #37498 → #37500 → #37502
The chain is rebased onto current main; the net diff of #37502 over main is the same 47-file feature set as this PR, with three minor cherry-pick conflicts resolved (documented in the respective PR bodies).
What
Enables a single `hermes gateway run` process to host N isolated AI agents, routing inbound messages by platform/chat/thread/user metadata. Each agent has independent memory, skills, SOUL.md, and model config. Rebases and extends #25660 (original work by @02356abc) onto v0.15.0 with conflict resolution. Adds two commits: multi-agent routing for the `api_server` platform adapter, and a profile-scope fix ensuring the `use_profile()` ContextVar stays bound across async chains. Routes through a `routes:` table and optional `select_agent` hook.
Files modified: `gateway/agent_routing.py`, `gateway/config.py`, `gateway/delivery.py`, `gateway/platforms/api_server.py`, `gateway/platforms/base.py`, `gateway/session.py`, `agent/conversation_loop.py`, `agent/profile.py`, and others (16 files scanned, all Windows-footguns-clean).
Why
The single-gateway bottleneck (#23735, #7517, #9514, #12099) limits multi-user and multi-workflow deployments. Multi-agent routing lets operators run multiple isolated agents in one process while keeping their state, models, and configurations separate. Session keys are namespaced (`agent:<id>:...`) and the feature is fully backward compatible (the default single `main` agent is a no-op).
Tests
All tests pass; existing single-agent behavior is unchanged.
Platforms tested
Linux (CT/LXC environment, Python 3.13)