Route messages to per-profile workers behind one bot token #36872

Open
banditburai wants to merge 25 commits into NousResearch:main from banditburai:feat/profile-routing

banditburai commented Jun 1, 2026

Contributor

Summary

A single bot token exposes exactly one Hermes profile today. Serving several isolated identities (SOUL, memory, skills, sessions, model) requires one process and one credential per profile, because the token is single-consumer (Telegram getUpdates allows one consumer; Discord is one gateway WS per token) and HERMES_HOME plus several import-frozen module constants (auth paths, skills/hooks dirs) are process-global — so multiple profiles cannot share one process without cross-profile state bleed.

This PR routes each inbound message to a target profile by context (channel / topic / thread / guild / DM / @mention). Routing is platform-agnostic: it runs in the shared adapter seam, so it applies to every gateway platform (Telegram, Discord, Slack, Feishu, Matrix, …), not just one. A transport-owning front process holds the token and runs one tokenless worker (hermes -p <name> gateway run, api_server only) per routed profile; isolation comes from each worker being a separate process with its own HERMES_HOME. Unrouted messages run in-process on the host profile, unchanged.

Change

  • Route resolver: resolve_profile_route (gateway/routing.py:50) scores each route via _route_score (gateway/routing.py:13): thread_id(8) > chat_id/channel_id(4) > guild_id/user_id(2) > platform(1); channel_id is an alias for chat_id; a thread with no own route matches its parent_chat_id. No match → None (host).
  • Config: _validate_profile_routing (gateway/config.py:454) rejects an unknown/invalid/duplicate-signature route at load; an absent table leaves profile_routing=None.
  • Session keys: build_session_key takes a profile arg (gateway/session.py:600); prefix = f"agent:{profile}" (gateway/session.py:629). The default "main" reproduces the existing agent:main:… keys byte-for-byte.
  • Tier-1 overlays: resolve_channel_model (gateway/platforms/base.py:1599) resolves a per-channel model from any platform's channel_models config (a session /model override still wins). Telegram additionally reads per-topic model/system_prompt from its group_topics/dm_topics config, scoped by chat_id so they do not collide across groups reusing a thread_id; other platforms express the same intent through channel_models/channel_prompts keyed by channel id.
  • Routing seam (platform-agnostic): resolution runs in the shared base.handle_message (gateway/platforms/base.py:3628), keyed on SessionSource fields (platform/chat_id/channel_id/thread_id/guild_id/user_id), so every adapter reaching that seam routes with no per-platform code (Telegram, Slack, Feishu, Matrix, WhatsApp, Signal, and the Discord plugin among them); a platform routes on whichever of those fields it populates. On resolution error it falls back to host (no isolation boundary crossed yet). _maybe_dispatch_routed (gateway/run.py:8829) returns True on every matched path (rate-limited, success, and error, which posts a visible message) and False only when unrouted.
  • Worker process: _worker_run_args_for_profile (hermes_cli/gateway.py:626) sets HERMES_GATEWAY_ONLY_PLATFORMS=api_server and never --replace; _only_platforms_filter (gateway/run.py:1558) skips token adapters in a worker so the front keeps the single token.
  • Pool: WorkerPool.acquire (gateway/worker_pool.py:135) spawns/probes/reuses; _default_interlock (gateway/worker_pool.py:79) refuses to spawn while get_running_pid(<profile>/gateway.pid) is live; idle-evict, crash circuit-break with cooldown, _maintain_worker_pool reap/sweep (gateway/run.py:8692), and shutdown() teardown on stop.
  • Dispatch: WorkerClient.dispatch (gateway/worker_client.py:49) posts /v1/runs, relays message.delta/response.media, and resolves approval.request; continue_session loads the worker's own transcript via _continue_session_id (gateway/platforms/api_server.py:1334), off by default.
  • Rate limit: ProfileRateLimiter (gateway/rate_limit.py:32) is a per-profile token bucket checked before dispatch; /new and /reset bypass it.
  • Media: files cross the front↔worker boundary as media_ref tokens, never a path; confine_to_safe_root (gateway/media_spool.py:34) confines resolution to the spool root.
  • Manual control: parse_profile_mention (gateway/chat_bindings.py:21) splits a leading @<name> <body> and routes the turn without mutating the persisted /profile binding.
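
To make the scoring rule in the route-resolver bullet concrete, here is a minimal sketch of weighted route matching. It is not the code in gateway/routing.py: the names route_score and resolve_route, the dict-shaped routes and sources, and the choice to sum the weights of matched fields are all assumptions for illustration. Only the weights, the channel_id alias, the parent_chat_id inheritance, and the "no match returns None" behavior come from the description above.

```python
from typing import Any, Optional

# Weights as listed in the PR description. Summing them is an assumption.
_WEIGHTS = {"thread_id": 8, "chat_id": 4, "guild_id": 2, "user_id": 2, "platform": 1}


def route_score(match: dict[str, Any], source: dict[str, Any]) -> int:
    """Return the summed weight of the matched fields, or -1 on any mismatch."""
    match = dict(match)
    # channel_id is treated as an alias for chat_id.
    if "channel_id" in match:
        match.setdefault("chat_id", match.pop("channel_id"))
    score = 0
    for field, wanted in match.items():
        if field not in _WEIGHTS:
            return -1  # unknown field: treat the route as non-matching
        actual = source.get(field)
        # A thread with no route of its own may match through its parent chat.
        if field == "chat_id" and actual != wanted:
            actual = source.get("parent_chat_id")
        if actual != wanted:
            return -1
        score += _WEIGHTS[field]
    return score


def resolve_route(routes: list[Any], source: dict[str, Any]) -> Optional[str]:
    """Return the profile of the best-scoring route, or None for the host."""
    best_profile, best_score = None, 0
    for route in routes:
        if not isinstance(route, dict):
            continue  # non-dict entries are ignored
        score = route_score(route.get("match", {}), source)
        if score > best_score:
            best_profile, best_score = route.get("profile"), score
    return best_profile
```

With a channel-level route for chat "100" and a thread-level route for thread "7" in the same chat, a message in thread "7" resolves to the thread route, while any other thread in that chat falls back to the channel route.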

Unchanged: unrouted traffic, default-profile session keys, and existing /v1/runs clients (continue_session defaults off).
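
The claim that default-profile session keys stay unchanged can be illustrated with a small sketch. Only the prefix f"agent:{profile}" and the "main" default are taken from this PR; the function name session_key and the platform:chat_id suffix layout are hypothetical stand-ins for whatever build_session_key actually appends.

```python
def session_key(platform: str, chat_id: str, profile: str = "main") -> str:
    """Build a profile-namespaced session key (suffix layout is assumed)."""
    prefix = f"agent:{profile}"  # the only part confirmed by the PR text
    return f"{prefix}:{platform}:{chat_id}"
```

Because the profile defaults to "main", callers that never pass a profile keep producing agent:main:… keys, while a routed profile gets its own namespace and cannot collide with host sessions.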

Behavior

| Input | Resolved by | Outcome |
| --- | --- | --- |
| @<name> + existing profile | parse_profile_mention → profile_exists | this turn routes to worker <name>; binding unchanged |
| persisted /profile binding | ChatBindings | routes to the bound worker |
| config route match | resolve_profile_route | routes to route.profile |
| no match, default: null | resolve_profile_route → None | in-process host profile |
| resolution error | handle_message fallback | host profile (pre-isolation, fail-soft) |
| dispatch error after a match | _maybe_dispatch_routed | visible error to the user; no host fallback |
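
The first four rows above amount to a precedence chain: a leading @mention wins for one turn, then a persisted binding, then a config route, and otherwise the host. The sketch below shows that order under stated assumptions. pick_profile and its parameters are hypothetical, the mention parsing is a simplification of what parse_profile_mention might do, and the binding and config route are passed in already resolved.

```python
from typing import Callable, Optional


def pick_profile(
    text: str,
    bound_profile: Optional[str],
    config_route: Optional[str],
    profile_exists: Callable[[str], bool],
) -> tuple[Optional[str], str]:
    """Return (profile, body); a profile of None means the in-process host."""
    # 1. A leading "@name body" routes this turn only, if the profile exists.
    if text.startswith("@"):
        name, _, body = text[1:].partition(" ")
        if name and body.strip() and profile_exists(name):
            return name, body.strip()
    # 2. A persisted /profile binding comes next and is never mutated here.
    if bound_profile:
        return bound_profile, text
    # 3. Fall through to the config route; None means the host profile.
    return config_route, text
```

Text such as bob@example.com does not start with "@", so it is never treated as a mention in this sketch, and a mention of an unknown profile falls through to the binding with the original text intact.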

Testing

uv run --extra dev pytest tests/gateway/test_{profile_routing,profile_routing_config,session_key_profile,worker_arg_builder,only_platforms_env,worker_pool,worker_client,routed_dispatch,profile_rate_limit,routed_reset_scope,routed_dm_and_thread_rules,channel_model,channel_model_routing,config_channel_models_bridge,telegram_topic_overlay,telegram_dm_topic_prompt,media_refs_inbound,response_media_outbound,media_auth_and_cleanup,profile_command,at_mention_override,worker_session_continuity,routed_worker_live}.py -q → 117 passed.

  • Resolution: test_profile_routing.py::test_exact_thread_wins_over_channel, ::test_thread_inherits_parent_channel, ::test_channel_beats_guild, ::test_non_dict_routes_ignored.
  • Byte-identical default keys: test_session_key_profile.py::test_default_profile_keys_unchanged.
  • Routing seam: test_routed_dispatch.py::test_unrouted_returns_false (unrouted → host), ::test_dispatch_failure_is_fail_closed (a dispatch error returns True and posts a visible message).
  • Pool: test_worker_pool.py::test_standalone_pid_refuses_spawn (interlock), ::test_lazy_respawn_records_crash_without_reap, ::test_circuit_breaker_recovers_after_cooldown, ::test_acquire_after_shutdown_is_refused.
  • Rate limit: test_profile_rate_limit.py::test_throttled_profile_is_not_dispatched, ::test_control_commands_bypass_rate_limit.
  • Continuity opt-in stays off for existing clients: test_worker_session_continuity.py::test_off_by_default_so_existing_clients_stay_stateless.
  • Manual control: test_at_mention_override.py::test_email_like_text_is_not_a_mention, ::test_mention_does_not_mutate_binding.
  • DM/thread isolation: test_routed_dm_and_thread_rules.py::test_distinct_dm_peers_get_distinct_keys_under_same_profile, ::test_thread_without_own_route_inherits_parent_channel_profile.
  • Real-socket front↔worker round-trip: test_routed_worker_live.py drives the real WorkerClient against a real APIServerAdapter on a loopback TestServer: ::test_real_socket_dispatch_roundtrip (POST /v1/runs → SSE → run.completed), ::test_real_socket_continue_session_rehydrates (worker reloads its transcript from state.db), ::test_real_socket_outbound_media_emitted (MEDIA: → response.media, tag-free text), ::test_real_socket_reset_unknown_session_is_idempotent.

The model turn is stubbed (no live LLM call in CI) and the worker runs in-process rather than as a spawned subprocess; the spawn argv/env (test_worker_arg_builder.py) and route resolution (test_profile_routing.py) are covered separately.

Limitations

  • A routed reply is delivered once the turn completes; deltas are not streamed (_FrontConsumer.on_delta is a no-op).
  • A routed turn that requests tool approval is denied and the user is notified; interactive approval is not forwarded across the boundary.
  • Media resolution uses a shared spool directory, so the front and worker must share a filesystem; the wire payload is metadata-only.
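
Since media resolves against a shared spool directory, the confinement step matters on both sides of the boundary. The following is a sketch of how a reference could be confined to a spool root using pathlib. It is not the body of confine_to_safe_root; the function name confine and its signature are assumptions, and Path.is_relative_to requires Python 3.9 or later.

```python
from pathlib import Path


def confine(root: Path, ref: str) -> Path:
    """Resolve ref under root, refusing anything that escapes the root."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / ref).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"media ref escapes spool root: {ref!r}")
    return candidate
```

An absolute ref such as /etc/passwd replaces the root when joined, so it fails the same check as a ../ traversal.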

Related


Closes #4321
Closes #4622
Closes #5195
Closes #8339
Closes #9514
Closes #10143
Closes #18423
Closes #19809
Closes #24913
Addresses #13633
Addresses #7517
Supersedes #18510
Supersedes #24914
Refs #23735

…d when unrouted; fail-closed dispatch guard)
…f-heal + closed-pool race guard

The pool's evict/circuit-break machinery only ran on the next inbound
message. Record a crash on the on-demand respawn path in acquire() (a dead
SERVING/PROBING worker is counted before respawn) so the breaker trips even
when no periodic reap observed the death. _broken becomes a per-profile
cooldown map (was a permanent set) that self-heals after broken_cooldown.
shutdown() now latches _closed under the lock before snapshotting so an
acquire() racing shutdown can't spawn an orphan child.
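
The change from a permanent broken set to a per-profile cooldown map can be sketched as follows. This is an illustrative model, not the WorkerPool code: the class name CrashBreaker, the threshold parameter, and the injectable clock are assumptions, and only the idea of a map that self-heals after a cooldown comes from the commit message.

```python
import time
from typing import Callable


class CrashBreaker:
    """Per-profile circuit breaker that self-heals after a cooldown."""

    def __init__(self, threshold: int = 3, cooldown: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = threshold
        self._cooldown = cooldown
        self._clock = clock
        self._crashes: dict[str, int] = {}
        self._broken: dict[str, float] = {}  # profile -> time the breaker tripped

    def record_crash(self, profile: str) -> None:
        self._crashes[profile] = self._crashes.get(profile, 0) + 1
        if self._crashes[profile] >= self._threshold:
            self._broken[profile] = self._clock()

    def is_broken(self, profile: str) -> bool:
        tripped_at = self._broken.get(profile)
        if tripped_at is None:
            return False
        if self._clock() - tripped_at >= self._cooldown:
            # Cooldown elapsed: forget the trip and the crash count.
            del self._broken[profile]
            self._crashes.pop(profile, None)
            return False
        return True
```

Calling record_crash from the on-demand respawn path, as the commit describes, means the count advances even when no periodic reap saw the worker die.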
…set idempotency, control-cmd rate-limit, approval notice

- Wire pool upkeep: _maintain_worker_pool (reap+sweep) runs each
  _session_expiry_watcher tick; _worker_pool.shutdown() in _stop_impl so
  routed workers don't leak as orphans on gateway stop/restart.
- Session continuity: routed turns were amnesiac (conversation_loop never
  auto-loads by session_id). Add opt-in continue_session — worker_client
  sends the flag, /v1/runs rehydrates the worker's own transcript via
  _continue_session_id. OFF by default => existing /v1/runs clients byte-identical.
- Reset idempotency: /new or /reset before the first routed turn 404'd;
  reset_session + _aiohttp_delete now treat 404 as success.
- Rate limit: /new and /reset carry no model cost — exempt them from the
  per-profile token bucket so a reset can't be throttled.
- Approvals: a routed turn needing approval was silently auto-denied;
  _dispatch_to_worker now denies WITH a visible user notice.
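
The rate-limit exemption for /new and /reset can be pictured with a small per-profile token bucket. This is a sketch only: the class name ProfileBuckets, the allow signature, and the injectable clock are hypothetical, and the defaults mirror the capacity=20 and refill of 1 per second quoted in the review on this PR rather than anything verified against gateway/rate_limit.py.

```python
import time
from typing import Callable

CONTROL_COMMANDS = {"/new", "/reset"}  # carry no model cost, so never throttled


class ProfileBuckets:
    """One token bucket per profile, checked before dispatch."""

    def __init__(self, capacity: float = 20.0, refill_per_sec: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capacity = capacity
        self._rate = refill_per_sec
        self._clock = clock
        self._state: dict[str, tuple[float, float]] = {}  # profile -> (tokens, last)

    def allow(self, profile: str, text: str) -> bool:
        if text.strip().split(" ", 1)[0] in CONTROL_COMMANDS:
            return True  # bypass: a reset must not be throttled
        now = self._clock()
        tokens, last = self._state.get(profile, (self._capacity, now))
        # Refill for the elapsed time, capped at capacity.
        tokens = min(self._capacity, tokens + (now - last) * self._rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._state[profile] = (tokens, now)
        return allowed
```

Keeping state per profile is what stops one busy profile from consuming another profile's budget.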
`is_voice or (ext in _AUDIO_EXTS and is_voice)` is just `is_voice`;
behavior unchanged (audio still maps to voice on the next branch).
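
The simplification above is the absorption law, A or (B and A) == A. A short exhaustive check over both booleans confirms it; the function name simplification_holds and the variable is_audio_ext (standing in for ext in _AUDIO_EXTS) are illustrative only.

```python
from itertools import product


def simplification_holds() -> bool:
    """Check `is_voice or (is_audio_ext and is_voice)` equals `is_voice`."""
    return all(
        (is_voice or (is_audio_ext and is_voice)) == is_voice
        for is_voice, is_audio_ext in product([False, True], repeat=2)
    )
```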
…on, media, reset)

Wire the real WorkerClient to a real APIServerAdapter on an aiohttp TestServer
(a real listening socket), stubbing only the model turn, to exercise the actual
POST /v1/runs -> SSE relay -> run.completed path, continue_session transcript
rehydration from state.db, outbound MEDIA: -> response.media, and idempotent
reset over the wire.
@alt-glitch added the type/feature, P3, comp/gateway, and platform/telegram labels on Jun 1, 2026
The Tier-1 channel_models overlay runs on every _run_agent call, including
lean runners (cron/codex paths) that build GatewayRunner without setting
self.config. Read it via getattr so a missing config skips the overlay
instead of raising AttributeError (regressed tests/cron/test_codex_execution_paths.py).
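
A minimal sketch of the defensive read described above, assuming the overlay only needs a mapping of channel id to model. The helper name channel_models_for and the channel_models attribute on the config object are assumptions for illustration; the point is only that getattr with a default turns a missing config into an empty overlay instead of an AttributeError.

```python
from types import SimpleNamespace
from typing import Any


def channel_models_for(runner: Any) -> dict:
    """Return the channel_models mapping, or {} when the runner has no config."""
    config = getattr(runner, "config", None)  # lean runners may never set this
    if config is None:
        return {}
    return getattr(config, "channel_models", None) or {}


# A lean runner built without config, next to a fully configured one.
lean_runner = SimpleNamespace()
full_runner = SimpleNamespace(config=SimpleNamespace(channel_models={"100": "model-a"}))
```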

tonydwb left a comment

Code Review Summary

Verdict: Approved

✅ Looks Good

  • Clean, well-architected feature implementing per-profile worker routing behind one bot token
  • Strong isolation model: each profile runs in its own hermes gateway run worker with only api_server, sharing zero memory with the front
  • WorkerPool with proper lifecycle states (SPAWNING → PROBING → SERVING → DRAINING → REAPED/UNHEALTHY), idle eviction, and crash circuit breaker
  • WorkerClient uses HTTP+SSE over existing /v1/runs API — no new transport needed
  • MediaSpool with claim-check refs: front mints refs for inbound media, worker materializes them — clean cross-process boundary
  • ProfileRateLimiter (token bucket) prevents one chatty profile from starving others
  • ChatBindings persisted in JSON file under sessions dir — survives restarts
  • Route resolution precedence: @mention > /profile binding > config routing — clear and composable
  • session_key changes from agent:main:... to agent:<profile>:... — namespaced correctly per profile
  • Fail-closed: routed message errors are reported to the user and never silently fall back to host profile
  • Channel model routing via channel_models and resolve_channel_model — scoped by chat_id, collision-free
  • .gitignore for the media spool and bindings files — but verify these are included

💡 Suggestions

  • The approval handler in _dispatch_to_worker currently fails closed with auto-deny — consider wiring interactive approval relay as a follow-up so routed profiles can use approval-gated tools
  • ProfileRateLimiter defaults (capacity=20, refill=1/sec) should be documented as configurable in production
  • The confine_to_safe_root path escape guard in media_spool.py is good — consider adding a test for path traversal attempts
  • For a PR of this scale (2846 additions), consider splitting into smaller reviewable chunks in future (e.g., media spool separately, worker pool separately)

Reviewed by Hermes Agent
