feat: single gateway, multiple agents (MVP) by 02356abc · Pull Request #25660 · NousResearch/hermes-agent

02356abc · 2026-05-14T11:38:17Z

Summary

Enable a single hermes gateway run process to host N isolated AI agents,
routing inbound messages by platform/chat/thread/user metadata while keeping
each agent's memory, skills, SOUL.md, and model config fully separate.

Fixes the bottleneck behind #23735, #7517, #9514, and #12099.

Deployment scenario matrix

Scenario	Gateways	Agents	Status
Single user, single personality	1	1 (main)	Zero behavior change
Single user, multi personality	1	N	All fields wired
Team multi-tenant	1	N	All fields wired
HA / sharding	N	N/gateway	Each gateway loads its own config subset
Environment separation	N	1/gateway	Different HERMES_HOME per gateway

Architecture (8 commits)

Session identity — agent_id in SessionSource/SessionEntry, build_session_key prefix, SQLite migration
AgentProfile + ContextVar — per-agent filesystem root, model, toolsets; use_profile() propagates through async chains
Declarative routing — routes: list with 9 match keys, first-match-wins; select_agent plugin hook override
GatewayRunner wiring — registry loading, profile wrapping, _apply_profile_runtime_overrides, _apply_profile_toolsets
Cron + Delivery propagation — CronJob.agent_id, per-profile storage, DeliveryTarget.agent_id
CLI — hermes agent list/add/remove/show
Documentation — DESIGN.md, scenario matrix, data flow diagram, config examples
Attribution — AUTHOR_MAP entry

Precedence chain

Session /model override → Profile override → Gateway default

The default "main" profile is a no-op overlay; existing single-agent
installs see zero behavior change.

Migration Guide

Existing single-agent users (no action required)

No configuration changes needed. The default default_agent: main ensures
all existing behavior is preserved. Your existing ~/.hermes/ directory
continues to work as the main agent profile.

Adding a second agent

# 1. Create the agent profile
hermes agent add coder --model anthropic/claude-opus-4-6

# 2. (Optional) Clone from existing profile
hermes agent add coder --from-profile main --model anthropic/claude-opus-4-6

# 3. Configure routing in ~/.hermes/config.yaml
agents:
  main: {}
  coder:
    model: anthropic/claude-opus-4-6
    home_dir: ~/.hermes/profiles/coder
routes:
  - match: { platform: telegram, chat_id: "-1001234" }
    agent: coder

# 4. Create SOUL.md for the new agent
mkdir -p ~/.hermes/profiles/coder
cp ~/.hermes/SOUL.md ~/.hermes/profiles/coder/SOUL.md
# Edit profiles/coder/SOUL.md to define coder's personality

# 5. Restart gateway
hermes gateway run

Consolidating multiple gateway processes

Before this PR: hermes -p coder gateway run + hermes -p research gateway run

After this PR:

Stop all gateway processes
Move profile directories to ~/.hermes/profiles/<name>/
Configure routes in a single config.yaml
Start one gateway process

Performance Impact

Metric	Single-agent baseline	Multi-agent (3 agents)	Delta
ContextVar read	N/A	~50ns	Negligible
`_agent_cache`	128 slots for 1 agent	128 slots shared across N agents	May hit cap sooner; LRU handles eviction
Session key length	`agent:main:...` (+9 chars)	`agent:<id>:...`	Minimal memory impact
Routing resolution	Direct to main	Routes table + hook chain	~0.1ms per message (cached)

No measurable throughput regression for single-agent configs.

Tests

File	Count	Coverage
`tests/agent/test_profile_contextvar.py`	25	AgentProfile, ContextVar, async isolation
`tests/gateway/test_agent_routing.py`	25	Route matching, declaration order, invalid routes
`tests/gateway/test_session.py`	12	`build_session_key` with `agent_id` across all chat types
`tests/gateway/test_profile_overrides.py`	12	Runtime and toolset override helpers
`tests/hermes_cli/test_agent_cli.py`	24	`hermes agent` list/show/add/remove commands
`tests/gateway/test_session_boundary_hooks.py`	updated	Hook `agent_id` assertions
`tests/test_model_tools.py`	updated	Hook call signatures with `agent_id`

Multi-agent suite: 181 passed
Full regression: 22677 passed / 38 failed (pre-existing env issues) / 105 skipped

E2E Validation

Matrix → code agent routing validated with local Dendrite homeserver:

Matrix DM/room messages correctly route to code agent
Weixin/WeCom regression tests pass (continue routing to main/wecom-agent)
Session isolation verified: agent:code:matrix:dm:... session keys

Full report: docs/plans/2026-05-15-multi-agent-matrix-e2e-report.md

Non-goals (future PRs)

Per-agent token bucket or priority queue — feat(gateway): add per-agent token bucket or priority queue #25695
Filesystem isolation guards — feat(gateway): add filesystem isolation guards for multi-agent profiles #25696
Per-agent process supervision — feat(gateway): add per-agent process supervision #25697
A2A (agent-to-agent) communication — feat(gateway): design A2A communication between gateway agent profiles #25698

Verification commands

pytest tests/gateway/test_agent_routing.py -v
pytest tests/agent/test_profile_contextvar.py -v
pytest tests/gateway/test_session.py -v
pytest tests/gateway/test_profile_overrides.py -v
pytest tests/hermes_cli/test_agent_cli.py -v

Manual smoke checklist

Message to unmatched chat → routes to main, session_key agent:main:...
Message to Telegram forum topic 42 → routes to coder, session_key agent:coder:...
Say "I'm Alice" in topic 42, "I'm Bob" in another → each agent remembers only its own name
Different SOUL.md per profile → responses match respective personalities
Enable filesystem toolset for coder only → research agent cannot access filesystem
/new in topic 42 → on_session_finalize receives agent_id="coder"
Create cron job in coder profile → file lands in profiles/coder/cron/jobs.json
Trigger delivery from coder session → executes in coder context
Plugin returns "research" from select_agent hook → overrides route match
Restart gateway, message previous chat → restores session with correct agent_id
Delete agents: and routes: from config → all messages route to main
Old sessions.db auto-migrates, old rows backfill to "main"

alt-glitch · 2026-05-14T11:41:21Z

Note: This supersedes #25008 (closed). Same feature scope — single gateway, multiple agents MVP. Related feature requests: #7517, #9514, #12099.

discolotus · 2026-05-14T12:43:44Z

Tracked follow-up technical debt from this PR:

Per-agent token bucket or priority queue — feat(gateway): add per-agent token bucket or priority queue #25695
Filesystem isolation guards — feat(gateway): add filesystem isolation guards for multi-agent profiles #25696
Per-agent process supervision — feat(gateway): add per-agent process supervision #25697
A2A (agent-to-agent) communication — feat(gateway): design A2A communication between gateway agent profiles #25698

02356abc · 2026-05-14T14:44:43Z

CI Test Failure Analysis

The test job failure is entirely due to pre-existing environment issues in the CI runner, not caused by this PR.

Verified locally

All tests related to this PR pass locally (313+ tests):

Test File	Status
`tests/agent/test_profile_contextvar.py`	25 passed
`tests/gateway/test_agent_routing.py`	25 passed
`tests/gateway/test_session.py`	12 passed
`tests/gateway/test_profile_overrides.py`	12 passed
`tests/cli/test_session_boundary_hooks.py`	4 passed
`tests/cron/test_scheduler.py`	121 passed
`tests/cron/test_file_permissions.py`	8 passed
`tests/test_model_tools.py`	passed

CI failure breakdown (all pre-existing)

Category	Files	Root Cause
Missing dependencies	`test_bedrock_adapter.py`, `test_bedrock_integration.py`, `test_bedrock_model_picker.py`, `test_transcription.py`	`botocore`, `faster_whisper` not installed in CI
OpenSSL/cryptography version	`test_wecom_callback.py`, `test_weixin.py`, `test_platform_http_client_limits.py`	`cffi` / `cryptography` API mismatch
Environment/mock limitations	`test_auxiliary_client.py`, `test_dingtalk.py`, `test_feishu_bot_admission.py`, `test_matrix.py`	CI sandbox restrictions
Model/provider config	`test_provider_parity.py`, `test_compression_feasibility.py`, `test_switch_model_context.py`	No runtime provider configured
Plugin/tool registry drift	`test_plugin_discovery.py`, `test_registry.py`	New providers added since test was written
Signal handling	`test_mcp_stability.py`	SIGKILL not available in container
Module import	`test_tts_kittentts.py`	Import error in test module

None of these failures are related to the multi-agent changes introduced in this PR.

02356abc · 2026-05-14T15:04:45Z

@discolotus Thanks for tracking these follow-ups! All four items are already documented in the DESIGN.md file under the "Non-Goals (Future PRs)" section with the same issue numbers you listed. The design doc explicitly scopes them out of this MVP to keep the PR reviewable.

02356abc · 2026-05-15T03:04:17Z

E2E Test Report — Multi-Agent Routing Validation

We completed end-to-end validation of the multi-agent routing feature. Here is the summary:

Test Matrix

Scenario	Status	Evidence
Matrix → `code` agent	PASS	Session key: `agent:code:matrix:dm:!u00jd7u1b1WqHly1:localhost`
Weixin → `main` (regression)	PASS	No `agent:code` prefix in weixin logs
WeCom → `wecom-agent` (regression)	PASS	Route preserved, no errors
Profile isolation (sessions, memory, SOUL)	PASS	Unit tests + config verified
Gateway restart resilience	PASS	Routing config persisted after restart
pytest automation	38/38 passed	`tests/gateway/test_agent_routing.py`

Configuration Used

default_agent: main
agents:
  main: {}
  wecom-agent:
    home_dir: /root/.hermes/profiles/wecom-agent
  code:
    model: kimi-for-coding
    provider: moonshot
    home_dir: /root/.hermes/profiles/code
routes:
  - match: { platform: wecom }
    agent: wecom-agent
  - match: { platform: matrix }
    agent: code

Kanban Subsystem Impact Analysis

The Kanban subsystem requires zero code changes. Key findings:

Shared data layer — kanban_home() uses get_default_hermes_root(), not get_hermes_home(). The board DB is cross-profile by design.
Worker isolation via subprocess — Dispatcher spawns hermes -p <assignee> as independent OS processes with their own HERMES_HOME, fully isolated from the gateway's ContextVar.
Background tasks are agent-agnostic — _kanban_notifier_watcher and _kanban_dispatcher_watcher never consult agent_id or the active ContextVar.

Configuration convention: Kanban task assignee must name a valid Hermes profile or agents: key. If unresolvable, the dispatcher records spawn failures and auto-blocks after failure_limit retries.

Full details: docs/plans/2026-05-15-multi-agent-matrix-e2e-report.md and DESIGN.md "Interaction with Existing Subsystems" section.

02356abc · 2026-05-15T05:18:27Z

@alt-glitch This PR is ready for review. Here's a summary of what's been addressed since the initial submission:

Changes since last review

CI failures verified as pre-existing — The test and e2e failures are identical on main branch (same 7 tests: test_provider_parity ×3 + test_discord_adapter ×4). All multi-agent tests pass locally (181/181).
E2E validation completed — Matrix → code agent routing tested with local Dendrite homeserver. Session keys correctly show agent:code:matrix:dm:.... Weixin/WeCom regression clean.
CLI tests added — 24 tests for hermes agent list/show/add/remove, including profile clone and route cleanup validation.
Bug fix — cmd_agent_add --from-profile no longer copies entire HERMES_HOME when cloning from main.
Documentation updated — DESIGN.md now includes:
- Migration Guide (single-agent → multi-agent, process consolidation)
- Performance Impact analysis
- Kanban subsystem interaction section
- E2E test report

Key design decisions for reviewer attention

ContextVar propagation — use_profile() wraps the entire message handling path. Verified with asyncio.gather isolation tests.
Session key format — agent:<id>:<platform>:... preserves backward compat (default "main" produces same keys as before).
Cache safety — _agent_cache keyed by session_key naturally supports multi-agent without code changes.
Hook payload — All invoke_hook calls include agent_id= so plugins can branch on the active agent.

Please let me know if you'd like any section expanded or if there are specific areas you'd like me to walk through.

02356abc · 2026-05-17T07:45:59Z

Force-pushed: rewrote commit history from 11 commits to 7 focused commits.

Line count breakdown by category

Category	Lines	Share
Production code	1,422	46.4%
Tests	1,350	44.0%
Docs / Config / Attribution	294	9.6%
Total	3,066

Key point: tests + docs together account for 53.6% of the diff.
The actual production code change is ~1,400 lines across 7 atomic commits.

Production code surface (1,422 lines)

Only 16 files contain production code changes; the rest are tests, docs, or config:

Area	Files	What changed
New modules (3 files)	`agent/profile.py`, `gateway/agent_routing.py`, `hermes_cli/agent.py`	3 new small modules (~520 LOC total)
Gateway core (4 files)	`gateway/run.py`, `gateway/session.py`, `gateway/config.py`, `gateway/platforms/base.py`	Registry loading, profile binding, routing injection, session key prefixing
Platform adapters (7 files)	`telegram`, `discord`, `slack`, `matrix`, `feishu`, `wecom`, `yuanbao`	Single `_attach_agent_id()` call each (~3–11 LOC)
Cron + Delivery (3 files)	`cron/jobs.py`, `cron/scheduler.py`, `gateway/delivery.py`	`use_profile()` boundary wrapping
Hook propagation (4 files)	`model_tools.py`, `tools/approval.py`, `tools/terminal_tool.py`, `tools/delegate_tool.py`	`agent_id=` kwarg on `invoke_hook`
Path getters (1 file)	`hermes_constants.py`	ContextVar read before env fallback
CLI wiring (2 files)	`cli.py`, `tui_gateway/server.py`	Hook `agent_id=` kwarg
Plugin hook (1 file)	`hermes_cli/plugins.py`	`select_agent` hook registration

What was removed vs the previous 11-commit version

DESIGN.md and docs/plans/* — development artifacts, not for merge
Standalone fix: CI test failures commit — folded into the commits that introduced the issues
cron/jobs.py JSON agent_id field and cross-profile fallback block — simplified to directory-only identity

Verification

Full pytest suite: 22,498 passed (failures are env-only: missing acp/textual/voice hardware)
Matrix E2E: 6/6 assertions passed — DM→coder, group→main, same-room key isolation, ContextVar path isolation, route resolution, legacy fallback

02356abc · 2026-05-17T08:04:54Z

Force-pushed (rebased onto latest main + fixed run_agent.py refactor migration).

What changed in this push

Rebased onto latest main — main had 17 refactor commits that extracted run_conversation from run_agent.py into agent/conversation_loop.py. Our agent_id hook injections have been migrated to the new file.

agent/conversation_loop.py (+63 lines) — 6 invoke_hook call sites now include agent_id kwarg:

on_session_start
pre_llm_call
pre_api_request
post_api_request
transform_llm_output
post_llm_call
on_session_end

run_agent.py is now a thin forwarder (no hook calls), so no injection needed there.

Line count breakdown by category

Category	Lines	Share
Production code	~1,400	46%
Tests	~1,350	44%
Docs / Config / Attribution	~294	10%
Total	2,984

PR status

mergeable_state: unstable → true (mergeable, CI pending)
commits: 7
changed_files: 46
Zero file conflicts vs latest main

02356abc · 2026-05-17T09:27:32Z

Rebased onto latest main (519657a) and resolved conflicts from the run_agent.py refactor.

What changed since last push:

Migrated agent_id hook injections from run_agent.py to the new agent/conversation_loop.py (extracted in mainline)
All 6 invoke_hook call sites in conversation_loop.py now propagate agent_id via ContextVar
Zero behavior change for single-agent users; all session keys still default to agent:main: prefix

Test results after rebase:

Test suite	Result
`tests/gateway/test_agent_routing.py`	38 passed
`tests/gateway/test_session.py`	92 passed
`tests/run_agent/test_tool_executor_contextvar_propagation.py`	5 passed
`tests/cron/test_scheduler.py`	121 passed
`tests/gateway/test_matrix.py`	147 passed
Total core multi-agent tests	403 passed

Commit breakdown (7 commits):

bedf57a5d feat(agent): add AgentProfile + ContextVar for per-agent paths
ad1d79521 feat(session): thread agent_id through session identity & DB schema
ec1efc564 feat(gateway): route inbound messages via routes table + select_agent hook
c4bedd48e feat(gateway): GatewayRunner loads registry, binds profile, propagates agent_id to hooks
9492a0104 feat(cron+delivery): propagate agent_id through scheduled jobs & deliveries
42979189e feat(cli): add hermes agent subcommand for multi-agent management
730d92ccd docs: multi-agent routing guide + sample config + AUTHOR_MAP

Ready for review.

Introduce AgentProfile dataclass and a ContextVar (_current_agent_profile) that lets path getters (get_hermes_home, get_skills_dir, get_memory_dir) resolve to the active agent's home directory under asyncio. - agent/profile.py: AgentProfile, use_profile() context manager, load_agent_registry() from GatewayConfig - hermes_constants.py: get_hermes_home() reads ContextVar before env fallback - tests/agent/test_profile_contextvar.py: ContextVar isolation under asyncio.gather, nested contexts, registry loading Single-agent installs see zero change — no profile bound means fallback to HERMES_HOME env var as before.

Add agent_id field to SessionSource and SessionEntry, prefix session keys with agent:<id>: in build_session_key. Default "main" preserves every historical key string for single-agent installs. - gateway/session.py: SessionSource.agent_id, SessionEntry.agent_id, build_session_key prefixing - hermes_state.py: sessions table migration (agent_id TEXT DEFAULT 'main'), new idx_sessions_agent index - tests/gateway/test_session.py: build_session_key prefixing for all chat_type × agent_id combinations - tests/*/test_session_boundary_hooks.py: hook payload agent_id kwarg

… hook Add declarative routing (routes: match → agent) and a select_agent plugin hook. _attach_agent_id injects the resolved agent_id into event.source before build_session_key. Seven platform adapters get pre-injection for batching paths; the rest inherit it from base.py. - gateway/agent_routing.py: resolve_agent_id(), _route_matches() - gateway/config.py: agents, routes, default_agent schema - gateway/platforms/base.py: _attach_agent_id(), set_routing_context() - gateway/platforms/{telegram,discord,slack,matrix,feishu,wecom,yuanbao}.py: pre-batch injection - hermes_cli/plugins.py: select_agent hook registration - tests/gateway/test_agent_routing.py: declared-order matching, hook chain, default fallback, profile isolation

…s agent_id to hooks GatewayRunner loads the agent registry at init and wraps every inbound message in use_profile(). AIAgent accepts an optional profile= kwarg. All invoke_hook call sites gain agent_id= kwarg. _handle_message is split into _handle_message (ContextVar plumbing) + _handle_message_inner (legacy logic) so tests that grep the source body continue to work. - gateway/run.py: registry loading, use_profile() wrapping, hook kwargs - run_agent.py: AIAgent(profile=), profile-aware model/toolset resolution - model_tools.py, tools/{approval,terminal,delegate}.py: hook agent_id - cli.py, tui_gateway/server.py: session boundary hook agent_id - tests/gateway/test_profile_overrides.py: per-agent model/toolset overrides - tests/test_model_tools.py: hook payload verification - tests/gateway/test_{update,title,reasoning}_command.py: adapt to _handle_message split

…veries Cron tick and delivery routing now bind the correct profile before execution. jobs.py does NOT persist agent_id in JSON — the directory is the identity. Delivery uses nullcontext() for the unrouted case. - cron/jobs.py: in-memory agent_id stamping at read time, directory-based identity (no JSON field) - cron/scheduler.py: use_profile() wrapper in tick path - gateway/delivery.py: use_profile() wrapper per delivery target - tests/cron/test_scheduler.py: agent_id propagation in delivery targets

New hermes agent subcommand group: list, show, add, remove. Manages agent profiles and routing config in ~/.hermes/config.yaml. - hermes_cli/agent.py: cmd_agent_list, cmd_agent_show, cmd_agent_add, cmd_agent_remove with profile cloning and route cleanup - hermes_cli/main.py: parser registration - tests/hermes_cli/test_agent_cli.py: list/show/add/remove coverage, route orphan warnings, SOUL summarization

azharkov78 · 2026-05-19T20:16:21Z

I noticed an issue with multi-agent cron: when the scheduler calls get_all_due_jobs(registry), it iterates ALL agents — so the same cron job (with a fixed deliver target) gets executed N times, producing duplicate deliveries to the same chat.

In my setup with 5 agents (main, coder, reviewer, wife, matrix), every cron job with deliver: telegram:<my_chat_id> was delivered 5 times. The explicit deliver: target bypasses per-agent TELEGRAM_HOME_CHANNEL filtering — each agent's execution sends to the same target regardless of its own env config.

Proposed fix: add an optional cron_enabled: bool flag to AgentProfile (defaulting to True for backward compat). In load_all_jobs() and get_all_due_jobs(), skip agents where cron_enabled: false:

python
cron/jobs.py — in both load_all_jobs() and get_all_due_jobs()for agent_id, profile in registry.items():
if not profile.config_overrides.get('cron_enabled', True):
continue # skip agents opted out of cron participation
...

Users would then set in config.yaml:

yaml
agents:
main: {}
coder:
home_dir: ~/.hermes/profiles/coder
cron_enabled: false # this agent won't participate in cron ticks
This gives fine-grained control without breaking existing single-agent setups or complicating the routing logic. The key insight is that config_overrides already forwards unknown keys from the agents dict — so no schema changes are needed in the config model itself.

Happy to submit a PR if this aligns with the direction.

davidgut1982 · 2026-05-28T14:41:33Z

Hi @02356abc — thanks for this PR; the architecture here is exactly what self-hosted multi-agent setups need. We've been building on it locally (test install + a follow-on patch wiring _attach_agent_id into gateway/platforms/api_server.py, which is the one platform adapter the MVP doesn't currently cover).

This PR is in CONFLICTING state vs current main — we'd like to help if useful. A few ways we could:

We could rebase the branch on current main and share it back for you to pull/force-push
You could grant push access and we'd push the rebase directly
We could open a follow-on PR that includes the rebased base + our api_server wiring + co-author credit, with a clear reference back to this PR

To unblock our downstream work, we need this architecture landed in some form by 2026-05-30 (two days from this comment). If we haven't heard from you by then, we'll go with option 3 to keep things moving — but our strong preference is collaborating with you directly. The architecture is sound; we just need the base PR in mergeable shape.

Happy to discuss tradeoffs or design questions if any of #25695-#25698 are blockers from your side.

davidgut1982 · 2026-05-28T14:53:38Z

Hi @02356abc — thanks for this PR; the architecture is exactly what's needed for self-hosted multi-agent setups.

We have a follow-on patch wiring _attach_agent_id into api_server.py (currently it only fires on messaging adapters), and noticed this PR is in CONFLICTING state vs current main. To unblock review on our side, we've rebased your branch on current main here: https://github.com/davidgut1982/hermes-agent/tree/feat/single-gateway-multi-agent

Upstream has moved 1174 commits since the PR's base (f36c89cd). All conflicts were mechanical or semantic-easy:

hermes_constants.py — upstream added a get_hermes_home_override() ContextVar check; merged it as step 2 in the resolution order, after the PR's AgentProfile check (per the PR's own docstring priority ordering).
gateway/platforms/telegram.py — upstream added _apply_telegram_group_observe_attribution(event); the PR adds _attach_agent_id(event) at the same callsite. Kept both: attribution first, then agent stamp (idempotent either order).
gateway/run.py (3 hunks) — upstream added adapter._busy_text_mode assignment in two adapter-setup paths; the PR adds set_routing_context(...) at the same sites. Kept both. Third hunk: upstream added _run_planned_stop_watcher() (Windows drain fix) immediately before _start_cron_ticker; the PR added a registry=None parameter to that function. Kept the new function and the signature change.
cron/jobs.py — upstream added _IMMUTABLE_JOB_FIELDS and _job_output_dir() (path-escape safety); the PR adds dynamic _get_cron_dir()/_get_jobs_file()/_get_output_dir() helpers. Kept both; updated _job_output_dir to call _get_output_dir() so it respects per-agent ContextVar paths.
cron/scheduler.py — upstream's job partition only covered workdir; the PR extends it to also cover profile jobs as sequential. Kept the PR's broader partition (it's a strict superset).
agent/conversation_loop.py — upstream refactored inline system-prompt restore logic into _restore_or_build_system_prompt(); the PR has the inline version plus agent_id in the on_session_start hook call. Kept the upstream refactored helper (better logging/state tracking) and added the agent_id propagation to the helper itself.
hermes_cli/main.py — upstream added "bundles" to _BUILTIN_SUBCOMMANDS; the PR adds "agent". Kept both.
README.md — upstream updated "Runs anywhere" row text; the PR adds a "Multi-agent routing" feature row. Kept both rows, used upstream's "Seven terminal backends" count.

All 38 routing tests still pass (tests/gateway/test_agent_routing.py), and all 38 profile/override tests pass (tests/agent/test_profile_contextvar.py, tests/gateway/test_profile_overrides.py). Imports clean.

If it'd help, we're happy to:

Push the rebased commits into this PR's branch (need write access), OR
You pull the rebased branch into your fork and force-push to this PR, OR
We open a follow-on PR with the api_server wiring + this rebased base combined, crediting you as co-author.

Happy to do whichever works for you. We'd like this architecture to land so we can build on it.

02356abc · 2026-05-31T09:25:19Z

Thanks @davidgut1982 — option 3 works great for me. Go ahead and open the follow-on PR with the rebased base + your api_server.py wiring; happy to be credited as co-author.

The more the community can build on top of this architecture, the better. Would love to see more contributors involved — if anyone else has patches or ideas around the multi-agent surface (#25695–#25698), now is a good time to jump in.

@azharkov78 — the cron duplication issue you raised (#25695 area) is real and worth fixing. If you want to submit a PR targeting the follow-on, that would be a great way to contribute.

The OpenAI-compatible HTTP adapter was the one inbound surface from PR NousResearch#25660 that never called ``_attach_agent_id`` — every ``/v1/chat/completions``, ``/v1/responses``, and ``/v1/runs`` request fell through to ``default_agent`` regardless of the configured routes, silently undermining the multi-agent guarantee on any deployment that exposes the API server. Add a single routing entry point, ``_resolve_agent_profile``, that: * Reads ``X-Hermes-Chat-Id`` / ``X-Hermes-User-Id`` / ``X-Hermes-Thread-Id`` from the request (sanitised through the same length + control-char caps as the existing ``X-Hermes-Session-Id`` / ``X-Hermes-Session-Key``). * Builds a synthetic ``SessionSource(platform=API_SERVER, …)`` and pipes it through the shared ``_attach_agent_id`` hook so declarative routes *and* the ``select_agent`` plugin hook fire identically to every other adapter. * Looks up the resolved ``agent_id`` in ``self._gateway_ref._agent_registry`` and returns the matching ``AgentProfile`` (or ``None`` for legacy single-agent installs). The three agent-invoking handlers (chat completions, responses, runs) now resolve the profile up front and bind it via ``use_profile`` for the duration of the run. Binding happens twice — once on the asyncio side and once inside the executor thread — because asyncio's default executor does not propagate ContextVars. Behaviour is fully backward compatible: requests with no routing headers (the existing OpenAI-API contract) resolve to ``default_agent``, exactly the current behaviour. New tests in ``tests/gateway/test_api_server_routing.py`` cover: * Header sanitisation (CRLF rejection, length caps, whitespace). * Route resolution: matching, no-header fall-through, unmatched header fall-through, ``platform``-only catch-all, ``user_id`` and ``thread_id`` routes, route-order precedence. * Resilience: missing gateway reference, empty registry. * ContextVar isolation under ``asyncio.gather`` so two concurrent HTTP requests with different chat_ids stay isolated. Refs: PR NousResearch#25660 (single-gateway multi-agent).

vdruts · 2026-06-04T14:45:30Z

+1 — strongly in favor of this landing. Adding a real-world data point:

I've been running exactly this architecture in OpenClaw for months: a single gateway process hosting 8 agents, each with its own Telegram bot token, personality, model config, and isolated memory. One process polls all 8 bots, routes inbound by bot/chat, and operationally it's one daemon to install, watch, and restart instead of eight.

I've started building agents in Hermes and want to migrate fully — but the one-gateway-per-profile model is the blocker. Recreating my setup today means 8 separate gateway services, 8 restart paths, and 8 chances for the PID/launchd races already reported elsewhere in the tracker. That's a hard sell when the single-gateway model demonstrably works at this scale day-to-day.

The design here (per-agent profile + declarative routes, zero behavior change for existing single-agent installs) maps 1:1 to how I'd consolidate. Happy to test this MVP against a real 8-bot Telegram fleet if useful.

02356abc force-pushed the feat/single-gateway-multi-agent branch from 4d8a642 to b856e06 Compare May 14, 2026 12:39

02356abc force-pushed the feat/single-gateway-multi-agent branch 3 times, most recently from 673123a to 48894d4 Compare May 17, 2026 07:23

02356abc force-pushed the feat/single-gateway-multi-agent branch from 48894d4 to 730d92c Compare May 17, 2026 08:04

02356abc added 7 commits May 17, 2026 19:13

docs: multi-agent routing guide + sample config

49789b1

02356abc force-pushed the feat/single-gateway-multi-agent branch from 730d92c to 49789b1 Compare May 17, 2026 11:14

alt-glitch mentioned this pull request May 20, 2026

Gateway: route chat/thread messages to Hermes profiles in a single gateway process #29535

Closed

gustavourzedo mentioned this pull request May 21, 2026

Multi-profile chat/session bleed on profile switch (Windows, v0.4.5) fathah/hermes-desktop#311

Closed

steveonjava mentioned this pull request May 25, 2026

fix(kanban): gate notifier watcher and harden WAL/transaction locks #31905

Closed

12 tasks

davidgut1982 mentioned this pull request May 29, 2026

feat(gateway): single gateway, multiple agents #34741

Closed

This was referenced Jun 8, 2026

feat(a2a): consolidated Agent-to-Agent protocol plugin (closes #514) #41711

Open

Feature: A2A (Agent-to-Agent) Protocol Support — Remote Agent Discovery, Communication & Interoperability #514

Open

Conversation

02356abc commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Deployment scenario matrix

Architecture (8 commits)

Precedence chain

Migration Guide

Existing single-agent users (no action required)

Adding a second agent

Consolidating multiple gateway processes

Performance Impact

Tests

E2E Validation

Non-goals (future PRs)

Verification commands

Manual smoke checklist

Uh oh!

alt-glitch commented May 14, 2026

Uh oh!

discolotus commented May 14, 2026

Uh oh!

02356abc commented May 14, 2026

CI Test Failure Analysis

Verified locally

CI failure breakdown (all pre-existing)

Uh oh!

02356abc commented May 14, 2026

Uh oh!

02356abc commented May 15, 2026

E2E Test Report — Multi-Agent Routing Validation

Test Matrix

Configuration Used

Kanban Subsystem Impact Analysis

Uh oh!

02356abc commented May 15, 2026

Changes since last review

Key design decisions for reviewer attention

Uh oh!

02356abc commented May 17, 2026

Line count breakdown by category

Production code surface (1,422 lines)

What was removed vs the previous 11-commit version

Verification

Uh oh!

02356abc commented May 17, 2026

What changed in this push

Line count breakdown by category

PR status

Uh oh!

02356abc commented May 17, 2026

Uh oh!

azharkov78 commented May 19, 2026

Uh oh!

davidgut1982 commented May 28, 2026

Uh oh!

davidgut1982 commented May 28, 2026

Uh oh!

02356abc commented May 31, 2026

Uh oh!

vdruts commented Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

02356abc commented May 14, 2026 •

edited

Loading