Route messages to per-profile workers behind one bot token #36872
Open
banditburai wants to merge 25 commits into
…ssion /model still wins)
…ct-wins, thread→parent inheritance)
…d when unrouted; fail-closed dispatch guard)
…earch#18510; credit ayoahha, Donmeusi)
…f-heal + closed-pool race guard
The pool's evict/circuit-break machinery only ran on the next inbound message. Record a crash on the on-demand respawn path in acquire() (a dead SERVING/PROBING worker is counted before respawn) so the breaker trips even when no periodic reap observed the death. _broken becomes a per-profile cooldown map (was a permanent set) that self-heals after broken_cooldown. shutdown() now latches _closed under the lock before snapshotting so an acquire() racing shutdown can't spawn an orphan child.
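A minimal sketch of the crash accounting this commit describes, assuming one lock guards all pool state. `SketchWorkerPool`, `mark_dead`, `max_crashes`, and the string worker states are hypothetical stand-ins; only `_broken`, `broken_cooldown`, `_closed`, `acquire()`, and `shutdown()` are named in the commit, and real spawning/probing is elided.

```python
import threading
import time


class SketchWorkerPool:
    """Hypothetical stand-in for WorkerPool: models only crash counting,
    the per-profile cooldown map, and the shutdown latch."""

    def __init__(self, max_crashes=3, broken_cooldown=60.0, clock=time.monotonic):
        self._lock = threading.Lock()
        self._workers = {}   # profile -> "SERVING" | "DEAD" (stand-in for a process handle)
        self._crashes = {}   # profile -> crashes counted so far
        self._broken = {}    # profile -> clock() time at which the breaker tripped
        self._closed = False
        self.max_crashes = max_crashes
        self.broken_cooldown = broken_cooldown
        self._clock = clock

    def _is_broken_locked(self, profile):
        tripped_at = self._broken.get(profile)
        if tripped_at is None:
            return False
        if self._clock() - tripped_at >= self.broken_cooldown:
            # Self-heal: the cooldown entry expires instead of living forever.
            del self._broken[profile]
            self._crashes.pop(profile, None)
            return False
        return True

    def _record_crash_locked(self, profile):
        self._crashes[profile] = self._crashes.get(profile, 0) + 1
        if self._crashes[profile] >= self.max_crashes:
            self._broken[profile] = self._clock()

    def mark_dead(self, profile):
        """Test hook: simulate a worker dying without any reap noticing."""
        with self._lock:
            self._workers[profile] = "DEAD"

    def acquire(self, profile):
        with self._lock:
            if self._closed:
                raise RuntimeError("pool is shut down")
            if self._is_broken_locked(profile):
                raise RuntimeError(f"circuit open for {profile!r}")
            if self._workers.get(profile) == "DEAD":
                # Respawn path: count the death before respawning, so the
                # breaker trips even when no periodic reap observed it.
                del self._workers[profile]
                self._record_crash_locked(profile)
                if self._is_broken_locked(profile):
                    raise RuntimeError(f"circuit open for {profile!r}")
            self._workers[profile] = "SERVING"  # spawn/probe/reuse elided
            return profile

    def shutdown(self):
        with self._lock:
            self._closed = True  # latch first, under the same lock as acquire()
            snapshot = sorted(self._workers)
            self._workers.clear()
        return snapshot  # a real pool would terminate these children
```

Storing a trip time per profile, rather than membership in a set, is what lets the breaker close again without a gateway restart, and latching `_closed` under the lock means an `acquire()` that loses the race sees the latch instead of spawning a child nobody will reap.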
…set idempotency, control-cmd rate-limit, approval notice
- Wire pool upkeep: _maintain_worker_pool (reap+sweep) runs each _session_expiry_watcher tick; _worker_pool.shutdown() in _stop_impl so routed workers don't leak as orphans on gateway stop/restart.
- Session continuity: routed turns were amnesiac (conversation_loop never auto-loads by session_id). Add opt-in continue_session: worker_client sends the flag, /v1/runs rehydrates the worker's own transcript via _continue_session_id. OFF by default => existing /v1/runs clients byte-identical.
- Reset idempotency: /new or /reset before the first routed turn 404'd; reset_session + _aiohttp_delete now treat 404 as success.
- Rate limit: /new and /reset carry no model cost; exempt them from the per-profile token bucket so a reset can't be throttled.
- Approvals: a routed turn needing approval was silently auto-denied; _dispatch_to_worker now denies WITH a visible user notice.
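The rate-limit exemption can be sketched as a per-profile token bucket that returns early for `/new` and `/reset`. `SketchProfileRateLimiter` and its `allow()` method are hypothetical, not the real `ProfileRateLimiter` API. The defaults (capacity 20, refill 1 token/sec) are the ones quoted in the review, and the command check assumes the command is the first whitespace-separated word.

```python
import time


class SketchProfileRateLimiter:
    """Hypothetical per-profile token bucket with a control-command bypass."""

    CONTROL_COMMANDS = frozenset({"/new", "/reset"})  # no model cost

    def __init__(self, capacity=20.0, refill_per_sec=1.0, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._clock = clock
        self._buckets = {}  # profile -> (tokens, time of last refill)

    def allow(self, profile, text=""):
        stripped = text.strip()
        command = stripped.split(maxsplit=1)[0] if stripped else ""
        if command in self.CONTROL_COMMANDS:
            return True  # resets never spend a token, so they can't be throttled
        now = self._clock()
        tokens, last = self._buckets.get(profile, (self.capacity, now))
        # Refill proportionally to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        if tokens < 1.0:
            self._buckets[profile] = (tokens, now)
            return False
        self._buckets[profile] = (tokens - 1.0, now)
        return True
```

Each profile gets its own bucket, so a chatty profile drains only its own tokens and cannot starve others.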
`is_voice or (ext in _AUDIO_EXTS and is_voice)` is just `is_voice`; behavior unchanged (audio still maps to voice on the next branch).
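The simplification is the absorption law, A or (B and A) = A. Assuming `_AUDIO_EXTS` is a plain set or tuple, `ext in _AUDIO_EXTS` has no side effects, so dropping it cannot change behavior. A quick exhaustive check, with the hypothetical names `v` for `is_voice` and `a` for the membership test:

```python
from itertools import product


def absorption_rows():
    """Compare `v or (a and v)` with `v` for every pair of booleans.

    `v` stands in for `is_voice`; `a` stands in for `ext in _AUDIO_EXTS`.
    """
    return [(v, a, (v or (a and v)) == v) for v, a in product((False, True), repeat=2)]


for v, a, same in absorption_rows():
    print(f"is_voice={v!s:<5} audio_ext={a!s:<5} same={same}")
```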
…on, media, reset)
Wire the real WorkerClient to a real APIServerAdapter on an aiohttp TestServer (a real listening socket), stubbing only the model turn, to exercise the actual POST /v1/runs -> SSE relay -> run.completed path, continue_session transcript rehydration from state.db, outbound MEDIA: -> response.media, and idempotent reset over the wire.
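A minimal sketch of consuming the SSE stream this test exercises: a blank line ends an event, `data:` lines accumulate, and lines starting with `:` are comments. It assumes the worker puts the event name (`message.delta`, `response.media`, `run.completed`) in the SSE `event:` field and sends JSON `data` with a `text` key for deltas; both are assumptions about the `/v1/runs` wire format, and `iter_sse_events`/`relay` are hypothetical helpers, not `WorkerClient` code.

```python
import json


def iter_sse_events(lines):
    """Yield (event, data) pairs from SSE lines; a minimal subset of the spec."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it carried data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
    # Per the SSE spec, an event not terminated by a blank line is discarded.


def relay(lines, on_delta, on_media):
    """Hypothetical relay loop: forward deltas and media until run.completed."""
    for event, data in iter_sse_events(lines):
        payload = json.loads(data)
        if event == "message.delta":
            on_delta(payload.get("text", ""))
        elif event == "response.media":
            on_media(payload)
        elif event == "run.completed":
            return payload
    raise ConnectionError("stream ended before run.completed")
```

Raising when the stream ends without `run.completed` keeps a dropped connection from being mistaken for a finished turn.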
The Tier-1 channel_models overlay runs on every _run_agent call, including lean runners (cron/codex paths) that build GatewayRunner without setting self.config. Read it via getattr so a missing config skips the overlay instead of raising AttributeError (regressed tests/cron/test_codex_execution_paths.py).
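A sketch of the `getattr` guard, with the config modeled as a plain dict keyed by string chat ids (an assumption; the real config object differs). `channel_model_for`, `LeanRunner`, and `FullRunner` are hypothetical names.

```python
class LeanRunner:
    """Stand-in for a cron/codex runner that never sets self.config."""


class FullRunner:
    """Stand-in for a gateway runner with a loaded config."""

    def __init__(self, config):
        self.config = config


def channel_model_for(runner, chat_id):
    """Hypothetical overlay lookup that is skipped when there is no config."""
    config = getattr(runner, "config", None)  # no AttributeError on lean runners
    if config is None:
        return None  # skip the overlay entirely
    return config.get("channel_models", {}).get(str(chat_id))
```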
tonydwb
approved these changes
Jun 1, 2026
Code Review Summary
Verdict: Approved
✅ Looks Good
- Clean, well-architected feature implementing per-profile worker routing behind one bot token
- Strong isolation model: each profile runs in its own `hermes gateway run` worker with only `api_server`, sharing zero memory with the front
- `WorkerPool` with proper lifecycle states (SPAWNING → PROBING → SERVING → DRAINING → REAPED/UNHEALTHY), idle eviction, and crash circuit breaker
- `WorkerClient` uses HTTP+SSE over the existing `/v1/runs` API; no new transport needed
- `MediaSpool` with claim-check refs: the front mints refs for inbound media, the worker materializes them; clean cross-process boundary
- `ProfileRateLimiter` (token bucket) prevents one chatty profile from starving others
- `ChatBindings` persisted in a JSON file under the sessions dir; survives restarts
- Route resolution precedence: @mention > /profile binding > config routing; clear and composable
- `session_key` changes from `agent:main:...` to `agent:<profile>:...`, namespaced correctly per profile
- Fail-closed: routed message errors are reported to the user and never silently fall back to the host profile
- Channel model routing via `channel_models` and `resolve_channel_model`, scoped by chat_id, collision-free
- `.gitignore` for the media spool and bindings files, but verify these are included
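A sketch of the namespacing the reviewer points to. Only the `agent:{profile}` prefix and the `"main"` default come from the PR; the `platform:chat_id[:thread_id]` tail is an assumption, since the PR elides it as `…`. `build_session_key_sketch` is a hypothetical name.

```python
def build_session_key_sketch(platform, chat_id, profile="main", thread_id=None):
    """Hypothetical key builder; only the `agent:{profile}` prefix is from the PR."""
    prefix = f"agent:{profile}"
    parts = [prefix, platform, str(chat_id)]  # assumed tail layout
    if thread_id is not None:
        parts.append(str(thread_id))
    return ":".join(parts)
```

With the default `profile="main"`, unrouted traffic keeps the old `agent:main:` prefix, so existing sessions still resolve, while a routed profile gets a disjoint key space for the same chat.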
💡 Suggestions
- The approval handler in `_dispatch_to_worker` currently fails closed with auto-deny; consider wiring an interactive approval relay as a follow-up so routed profiles can use approval-gated tools
- `ProfileRateLimiter` defaults (capacity=20, refill=1/sec) should be documented as configurable in production
- The `confine_to_safe_root` path escape guard in `media_spool.py` is good; consider adding a test for path traversal attempts
- For a PR of this scale (2846 additions), consider splitting into smaller reviewable chunks in future (e.g., media spool separately, worker pool separately)
Reviewed by Hermes Agent
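The path-traversal test the reviewer suggests could target a guard shaped like the sketch below. `confine_to_safe_root_sketch` is a hypothetical re-implementation, not the code in `gateway/media_spool.py`: it resolves the candidate (collapsing `..` and following symlinks) and rejects anything outside the resolved root. `Path.is_relative_to` requires Python 3.9+.

```python
import tempfile
from pathlib import Path


def confine_to_safe_root_sketch(root, relative):
    """Hypothetical guard: resolve `relative` under `root`, rejecting any escape.

    resolve() collapses `..` and follows symlinks, so both `../x` and a symlink
    pointing outside the root fail the is_relative_to() check.
    """
    root_path = Path(root).resolve()
    candidate = (root_path / relative).resolve()
    if not candidate.is_relative_to(root_path):
        raise ValueError(f"path escapes spool root: {relative!r}")
    return candidate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spool:
        print(confine_to_safe_root_sketch(spool, "ab/cd.bin"))
        for attempt in ("../outside", "ab/../../outside", "/etc/passwd"):
            try:
                confine_to_safe_root_sketch(spool, attempt)
            except ValueError as exc:
                print("rejected:", exc)
```

Resolving both sides before comparing matters: comparing raw strings would miss `..` segments, and on systems where the temp dir itself is a symlink (e.g. macOS `/var`) an unresolved root would reject legitimate paths.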
Summary
A single bot token exposes exactly one Hermes profile today. Serving several isolated identities (SOUL, memory, skills, sessions, model) requires one process and one credential per profile, because the token is single-consumer (Telegram `getUpdates` allows one consumer; Discord is one gateway WS per token) and `HERMES_HOME` plus several import-frozen module constants (auth paths, skills/hooks dirs) are process-global, so multiple profiles cannot share one process without cross-profile state bleed.

This routes each inbound message to a target profile by context (channel / topic / thread / guild / DM / @mention). Routing is platform-agnostic: it runs in the shared adapter seam, so it applies to every gateway platform (Telegram, Discord, Slack, Feishu, Matrix, …), not just one. A transport-owning front process holds the token and runs a tokenless worker (`hermes -p <name> gateway run`, `api_server` only) per routed profile; isolation comes from the worker being a separate process with its own `HERMES_HOME`. Unrouted messages run in-process on the host profile, unchanged.

Change
- `resolve_profile_route` (`gateway/routing.py:50`) scores each route via `_route_score` (`gateway/routing.py:13`): `thread_id` (8) > `chat_id`/`channel_id` (4) > `guild_id`/`user_id` (2) > `platform` (1). `channel_id` is an alias for `chat_id`, and a thread with no route of its own matches its `parent_chat_id`. No match → `None` (host).
- `_validate_profile_routing` (`gateway/config.py:454`) rejects an unknown/invalid/duplicate-signature route at load; an absent table leaves `profile_routing=None`.
- `build_session_key` takes a `profile` arg (`gateway/session.py:600`); `prefix = f"agent:{profile}"` (`gateway/session.py:629`). The default `"main"` reproduces the existing `agent:main:…` keys byte-for-byte.
- `resolve_channel_model` (`gateway/platforms/base.py:1599`) resolves a per-channel model from any platform's `channel_models` config (a session `/model` override still wins). Telegram additionally reads per-topic `model`/`system_prompt` from its `group_topics`/`dm_topics` config, scoped by `chat_id` so they do not collide across groups reusing a `thread_id`; other platforms express the same intent through `channel_models`/`channel_prompts` keyed by channel id.
- Routing is resolved in `base.handle_message` (`gateway/platforms/base.py:3628`), keyed on `SessionSource` fields (`platform`/`chat_id`/`channel_id`/`thread_id`/`guild_id`/`user_id`), so every adapter reaching that seam routes with no per-platform code: Telegram, Slack, Feishu, Matrix, WhatsApp, Signal, and the Discord plugin among them. A platform routes on whichever of those fields it populates. On resolution error it falls back to host (no isolation boundary crossed yet).
- `_maybe_dispatch_routed` (`gateway/run.py:8829`) returns `True` on every matched path (rate-limited, success, and error, which posts a visible message) and `False` only when unrouted.
- `_worker_run_args_for_profile` (`hermes_cli/gateway.py:626`) sets `HERMES_GATEWAY_ONLY_PLATFORMS=api_server` and never passes `--replace`; `_only_platforms_filter` (`gateway/run.py:1558`) skips token adapters in a worker so the front keeps the single token.
- `WorkerPool.acquire` (`gateway/worker_pool.py:135`) spawns/probes/reuses workers; `_default_interlock` (`gateway/worker_pool.py:79`) refuses to spawn while `get_running_pid(<profile>/gateway.pid)` is live. The pool also covers idle eviction, crash circuit-breaking with cooldown, `_maintain_worker_pool` reap/sweep (`gateway/run.py:8692`), and `shutdown()` teardown on stop.
- `WorkerClient.dispatch` (`gateway/worker_client.py:49`) posts `/v1/runs`, relays `message.delta`/`response.media`, and resolves `approval.request`; `continue_session` loads the worker's own transcript via `_continue_session_id` (`gateway/platforms/api_server.py:1334`), off by default.
- `ProfileRateLimiter` (`gateway/rate_limit.py:32`) is a per-profile token bucket checked before dispatch; `/new` and `/reset` bypass it.
- Inbound media reaches workers as `media_ref` tokens, never a path; `confine_to_safe_root` (`gateway/media_spool.py:34`) confines resolution to the spool root.
- `parse_profile_mention` (`gateway/chat_bindings.py:21`) splits a leading `@<name> <body>` and routes the turn without mutating the persisted `/profile` binding.

Unchanged: unrouted traffic, default-profile session keys, and existing `/v1/runs` clients (`continue_session` defaults off).

Behavior
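The precedence in the table below can be sketched as follows. `parse_mention_sketch` and `pick_profile` are hypothetical names, and the allowed name characters `[A-Za-z0-9_-]` are an assumption. Requiring the `@` at the very start of the message is what keeps email-like text such as `bob@example.com` from being read as a mention.

```python
import re

# Hypothetical pattern: "@<name>" at the very start, whitespace, then the body.
_MENTION = re.compile(r"@([A-Za-z0-9_-]+)\s+(.*)", re.DOTALL)


def parse_mention_sketch(text):
    """Return (profile, body) for a leading '@<name> <body>', else None."""
    m = _MENTION.match(text)  # match() anchors at position 0
    if m is None:
        return None
    return m.group(1), m.group(2)


def pick_profile(text, binding, route_profile, known_profiles):
    """Precedence: @mention > /profile binding > config route > host (None)."""
    mention = parse_mention_sketch(text)
    if mention is not None and mention[0] in known_profiles:
        return mention[0], mention[1]  # per-turn override; binding is not touched
    if binding is not None:
        return binding, text
    if route_profile is not None:
        return route_profile, text
    return None, text  # unrouted: run on the host profile
```

An `@name` for a profile that does not exist falls through unchanged, so the message keeps its text and goes to whatever the binding or route selects.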
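| Case | Resolved by | Routes to |
|---|---|---|
| `@<name>` + existing profile | `parse_profile_mention` → `profile_exists` | `<name>`; binding unchanged |
| `/profile` binding | `ChatBindings` | bound profile |
| config route matches | `resolve_profile_route` | `route.profile` |
| `default: null`, no route matches | `resolve_profile_route` → `None` | host |
| resolution error | `handle_message` fallback | host |
| routed dispatch error | `_maybe_dispatch_routed` | visible error message, no host fallback |

Testing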
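The route scoring that the `test_profile_routing.py` cases below exercise can be sketched with the per-key weights from the Change section. Summing the weights of every key a route sets, and letting a route's `chat_id` also match a thread's `parent_chat_id`, are assumptions about how `_route_score` combines them; `route_score` and `resolve_profile` are hypothetical names.

```python
# Per-key weights quoted in the PR; the combination rule below is assumed.
WEIGHTS = {"thread_id": 8, "chat_id": 4, "guild_id": 2, "user_id": 2, "platform": 1}


def route_score(route, source):
    """Score a route against a message source; None if any key it sets mismatches."""
    score = 0
    for key, value in route.items():
        if key == "profile":
            continue
        field = "chat_id" if key == "channel_id" else key  # channel_id aliases chat_id
        if field not in WEIGHTS:
            return None  # unknown selector: treated as non-matching here
        if field == "chat_id":
            # A thread without its own route can still match its parent channel.
            matched = value in (source.get("chat_id"), source.get("parent_chat_id"))
        else:
            matched = source.get(field) == value
        if not matched:
            return None
        score += WEIGHTS[field]
    return score


def resolve_profile(routes, source):
    """Return the best-scoring route's profile, or None to run on the host."""
    best_profile, best_score = None, 0
    for route in routes:
        if not isinstance(route, dict):
            continue  # non-dict entries are ignored
        score = route_score(route, source)
        if score is not None and score > best_score:
            best_profile, best_score = route.get("profile"), score
    return best_profile
```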
`uv run --extra dev pytest tests/gateway/test_{profile_routing,profile_routing_config,session_key_profile,worker_arg_builder,only_platforms_env,worker_pool,worker_client,routed_dispatch,profile_rate_limit,routed_reset_scope,routed_dm_and_thread_rules,channel_model,channel_model_routing,config_channel_models_bridge,telegram_topic_overlay,telegram_dm_topic_prompt,media_refs_inbound,response_media_outbound,media_auth_and_cleanup,profile_command,at_mention_override,worker_session_continuity,routed_worker_live}.py -q` → 117 passed.

- `test_profile_routing.py::test_exact_thread_wins_over_channel`, `::test_thread_inherits_parent_channel`, `::test_channel_beats_guild`, `::test_non_dict_routes_ignored`.
- `test_session_key_profile.py::test_default_profile_keys_unchanged`.
- `test_routed_dispatch.py::test_unrouted_returns_false` (unrouted → host), `::test_dispatch_failure_is_fail_closed` (a dispatch error returns `True` and posts a visible message).
- `test_worker_pool.py::test_standalone_pid_refuses_spawn` (interlock), `::test_lazy_respawn_records_crash_without_reap`, `::test_circuit_breaker_recovers_after_cooldown`, `::test_acquire_after_shutdown_is_refused`.
- `test_profile_rate_limit.py::test_throttled_profile_is_not_dispatched`, `::test_control_commands_bypass_rate_limit`.
- `test_worker_session_continuity.py::test_off_by_default_so_existing_clients_stay_stateless`.
- `test_at_mention_override.py::test_email_like_text_is_not_a_mention`, `::test_mention_does_not_mutate_binding`.
- `test_routed_dm_and_thread_rules.py::test_distinct_dm_peers_get_distinct_keys_under_same_profile`, `::test_thread_without_own_route_inherits_parent_channel_profile`.
- `test_routed_worker_live.py` drives the real `WorkerClient` against a real `APIServerAdapter` on a loopback `TestServer`: `::test_real_socket_dispatch_roundtrip` (`POST /v1/runs` → SSE → `run.completed`), `::test_real_socket_continue_session_rehydrates` (worker reloads its transcript from `state.db`), `::test_real_socket_outbound_media_emitted` (`MEDIA:` → `response.media`, tag-free text), `::test_real_socket_reset_unknown_session_is_idempotent`.

The model turn is stubbed (no live LLM call in CI) and the worker runs in-process rather than as a spawned subprocess; the spawn argv/env (`test_worker_arg_builder.py`) and route resolution (`test_profile_routing.py`) are covered separately.

Limitations
- … `_FrontConsumer.on_delta` is a no-op).

Related
- … `confine_to_safe_root`, `gateway/media_spool.py:34`) adapts the `safe_root` helper from that PR.
- … `HERMES_HOME` `ContextVar` swap, with `/profile` and `@<name>` controls. This change keeps those controls (`gateway/chat_bindings.py:21`) and isolates via worker processes.
- … `base.handle_message` seam, `gateway/platforms/feishu.py:2876`) together with a skill library shared across profiles; the shared-library part is served by the existing `skills.external_dirs` config and is not part of this change.

Closes #4321
Closes #4622
Closes #5195
Closes #8339
Closes #9514
Closes #10143
Closes #18423
Closes #19809
Closes #24913
Addresses #13633
Addresses #7517
Supersedes #18510
Supersedes #24914
Refs #23735