Solving the Multi-Tenant Hermes Problem

**Multiplayer agentic AI is the future. Hermes can and should lead.**

_TL;DR: Memory operations bypass the hook system entirely, which makes tenant isolation impossible without forking core. We've been running a fix in production for months across numerous multi-tenant agents in different contexts, we're ready to upstream it, and we've released a Hermes-centric product that brings it all together._

We recently released a commons product for the good of the Hermes community called [Hermes Swarm Map](https://www.nimbleco.ai/field-research/hermes-swarm-map). It is [open-source](https://github.com/NimbleCoAI/hermes-swarm-map) and lets users not only orchestrate runtimes but also control the permission hooks needed to run Hermes agents more safely in multi-tenant environments. To do this we built on and expanded existing Hermes patterns. We think this work marks a critical turning point for the community, one that will determine whether we win or lose the global agentic AI race. This matters not just for Hermes, but more broadly for open-source, local-first systems pushing back against hyper-centralizing, enclosure-centric forces. Multiplayer is a novel form of alignment, from small groups of friends to enterprise installations. These are the problems we *need* to solve.

## The Problem

Every Hermes deployment beyond personal use hits the same wall: one agent = one tenant. Memory is global, sessions don't scope by tenant, and there is no isolation between groups, channels, or users. Run a Hermes agent in a private DM and a public group chat simultaneously, and sensitive information from the DM gets injected into the group session — where anyone can prompt the agent to reveal it.

This isn't a niche concern. It's the structural blocker preventing Hermes from serving teams, communities, and enterprises. The moment you deploy Hermes for more than one context — two Telegram groups, a DM and a channel, two clients on the same instance — you need tenant isolation that doesn't exist today. The workarounds are all variations of "run N separate processes," which doesn't scale and defeats the purpose of a unified agent platform.

Multi-tenancy is not an enterprise feature request. It's the next architectural frontier for agentic AI. Every major framework and platform in the space is actively solving this problem (LangChain's namespace tuples, OpenAI's thread isolation, AWS Bedrock's tenant context primitives, Dust.tt's workspace model). Hermes has the foundation to solve it well — but it needs targeted changes to close the gap.

## Community Demand

There are **12+ open issues** across this repo that are all symptoms of the same root problem. They cluster into four sub-problems:

**Gateway adapter multi-instance** — the `Dict[Platform, adapter]` architecture means one platform can only have one adapter instance. This blocks enterprise WeChat deployments (#29144: "1 gateway = 1 worker = 1 line = 1 WeChat. N workers = N gateway processes. Operations are unsustainable"), multi-account personal setups (#9756), multi-bot Telegram (#10452), and general multi-user gateways (#10030).
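
To make the constraint concrete, here is a minimal sketch of the difference between a registry keyed by platform alone and one keyed by a (platform, account) pair. `AdapterRegistry`, `account_id`, and this `Platform` enum are hypothetical stand-ins for illustration, not Hermes's actual classes.

```python
from enum import Enum


class Platform(Enum):
    TELEGRAM = "telegram"
    WECHAT = "wechat"


class AdapterRegistry:
    """Hypothetical registry keyed by (platform, account_id), not platform alone."""

    def __init__(self) -> None:
        self._adapters: dict[tuple[Platform, str], object] = {}

    def register(self, platform: Platform, account_id: str, adapter: object) -> None:
        key = (platform, account_id)
        if key in self._adapters:
            raise ValueError(f"adapter already registered for {key}")
        self._adapters[key] = adapter

    def get(self, platform: Platform, account_id: str) -> object:
        return self._adapters[(platform, account_id)]

    def for_platform(self, platform: Platform) -> list[object]:
        return [a for (p, _), a in self._adapters.items() if p is platform]


# A platform-only dict can hold just one adapter per platform:
by_platform: dict[Platform, str] = {}
by_platform[Platform.WECHAT] = "wechat-account-a"
by_platform[Platform.WECHAT] = "wechat-account-b"  # silently replaces account a

# Keying by (platform, account) keeps both:
registry = AdapterRegistry()
registry.register(Platform.WECHAT, "account-a", "wechat-account-a")
registry.register(Platform.WECHAT, "account-b", "wechat-account-b")
```

With the composite key, one process can hold several adapters for the same platform, which is the shape the issues above are asking for.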

**Profile routing** — operators want one gateway process serving multiple profiles, routed by channel/topic/group. This is independently requested for Telegram topic-to-profile routing (#10143), Discord (#18423, #19809), Feishu (#13633), and the Hermes-A365 Microsoft Agent 365 integration (#23735, where maintainers describe needing 4 separate gateway processes for 4 agent blueprints). Issue #9514 (single-daemon multi-agent with per-topic workspace and memory isolation) is the most active, with 8 comments from production operators.

**Memory/context isolation** — the core data safety problem. #28279 requests per-chat memory isolation (scoped memory) to prevent information learned in one context from being accessible in another. #4726 requests profile-scoped memory namespaces for multi-agent setups, documenting that holographic memory uses a single shared store with no namespace isolation. #14162 requests per-conversation context isolation for multi-chat deployments. A triage contributor (konsisumer) has identified these as duplicates across at least three independent filings.

**Session key correctness** — #17068 proposes including bot identity in the session key for multi-bot Telegram setups. Currently, two bots on the same instance produce identical session keys (`agent:main:telegram:dm:{user_id}`), causing session state bleed between profiles. This manifests immediately in any multi-profile deployment.
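
The collision is easy to reproduce in isolation. The sketch below rebuilds the key format quoted above and shows one possible way to include bot identity; the `bot_id` parameter and the `bot:` segment are assumptions for illustration, not the format proposed in #17068.

```python
from __future__ import annotations


def session_key(platform: str, chat_type: str, user_id: str,
                bot_id: str | None = None, agent: str = "main") -> str:
    """Build a session key. Passing bot_id adds a hypothetical bot segment."""
    parts = ["agent", agent, platform]
    if bot_id is not None:
        parts.append(f"bot:{bot_id}")
    parts += [chat_type, user_id]
    return ":".join(parts)


# Without bot identity, two bots talking to the same user share one key.
key_without_bot = session_key("telegram", "dm", "42")

# With bot identity, each bot gets its own session.
key_bot_a = session_key("telegram", "dm", "42", bot_id="bot_a")
key_bot_b = session_key("telegram", "dm", "42", bot_id="bot_b")
```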

The voices are unambiguous:

- **"This is the single reason I haven't migrated to Hermes yet."** — Donmeusi (#9514), running 9 specialized agents on a competing framework
- **A content agent read competitor monitoring memory and wrote it into a public article — the operator nearly faced legal consequences.** — jingchang0623-crypto (#9514, translated from Chinese), describing a real production incident
- **"这不是'N 个独立 bot'，这是'一个企业 agent，N 个成员以个人微信为入口接入'"** ("This isn't N independent bots — it's one enterprise agent, N members accessing via personal WeChat as entry points.") — #29144
- **Running 28 agents: "we went from ~200MB per process to a single ~400MB daemon for all 28"** — CorellisOrg (#9514), quantifying the resource win of shared infrastructure

No PRs addressing any of these issues have been merged or are currently open.

## What Hermes Already Does Well

Hermes's plugin/hook architecture is genuinely strong. The 17-hook system, particularly `pre_gateway_dispatch` for routing and gating, handles ~60% of what a multi-tenant deployment needs. Per-adapter environment variables handle mention filtering. The profile system provides agent-level configuration separation. These aren't afterthoughts; they're well-designed extension points.

`group_sessions_per_user` correctly separates conversation state between group and DM contexts. The adapter architecture cleanly abstracts platform differences. For single-tenant deployments, the architecture is solid.

The hook system is the right foundation for solving multi-tenancy. What's needed is not a rewrite — it's extending the hook coverage to the one subsystem that hooks currently can't reach.

## The Structural Gap

The gap is specific: **memory operations bypass the hook system entirely.**

No hook fires before a memory write. No hook fires before a memory read. No plugin can intercept the filesystem path where memories are stored. This means:

- `group_sessions_per_user` separates *conversation state* but not *memory*. A fact learned in a DM is globally available in every group session.
- A plugin can gate which messages reach the agent (`pre_gateway_dispatch`), but once a message is processed and stored as memory, that memory is accessible from every context.
- Memory writes go directly to a global store. There is no hook point where a plugin could say "this memory belongs to context X and should only be readable from context X."

This is not a criticism of the design — it's a natural consequence of building for single-tenant first. But it means that no matter how sophisticated a plugin is, it cannot enforce memory isolation. The fix has to be in the memory layer itself.
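
The failure mode can be reproduced with a toy model in a few lines. This is not Hermes code: `ToyAgent` is a hypothetical stand-in whose session history is keyed per context while its memory is a single global list with nothing in front of it.

```python
class ToyAgent:
    """Toy model: per-context session history, one global memory list."""

    def __init__(self) -> None:
        self.sessions: dict[str, list[str]] = {}
        self.memory: list[str] = []  # no context key and no hook guarding it

    def receive(self, context: str, text: str, remember: bool = False) -> None:
        self.sessions.setdefault(context, []).append(text)
        if remember:
            self.memory.append(text)

    def prompt_context(self, context: str) -> list[str]:
        # Every context sees the full memory plus only its own history.
        return self.memory + self.sessions.get(context, [])


agent = ToyAgent()
agent.receive("dm:alice", "my API key rotates on Friday", remember=True)
agent.receive("group:public", "hello everyone")

# The DM fact never entered the group's session, yet it is in the group's prompt.
leaked = agent.prompt_context("group:public")
```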

## How Other Frameworks Handle This

**LangChain/LangGraph** has the most mature model: hierarchical namespace tuples `(org_id, user_id, thread_id)` scope every memory operation. Isolation is enforced at the storage layer with production backends (Postgres, Redis) supporting row-level security. The framework distinguishes agent-scoped memory (shared) from user-scoped memory (isolated) as first-class concepts.

**AutoGen** uses topic-based isolation for conversational separation but has no built-in long-term memory store to namespace. Tenant isolation requires separate runtime instances — there is no shared multi-tenant runtime.

**CrewAI** documents this as an explicit failure mode: "there is no per-user isolation for CrewAI memory types, so the system will fail fast in production." Their community works around it with external providers (Mem0, Weaviate) that have native multi-user scoping.

**OpenAI Assistants API** uses threads as the isolation primitive — threads within one assistant never share context. Simple, effective, but closed-source and ephemeral (no persistent cross-thread memory).

Emerging academic work reinforces this direction. Rezazadeh et al. (2025, arXiv:2505.18279) propose dual-tier memory (private + selectively shared) with immutable provenance on every fragment. Arceo & Narsing (2026, Red Hat, arXiv:2605.05287) identify the critical failure mode in RAG-backed multi-tenant agents: retrieval ranks by relevance, not authorization, so Tenant A's query can surface Tenant B's documents.

The pattern across all of these: **isolation must be enforced at the storage layer, not assumed from process separation.**
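
The retrieval failure mode is worth a small illustration. In the sketch below, relevance is a crude word-overlap score and `Doc`, `retrieve`, and `retrieve_unscoped` are hypothetical names; the point is only the ordering of the two steps: filter by tenant first, then rank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    tenant: str
    text: str


def score(query: str, doc: Doc) -> int:
    """Crude relevance: number of lowercase words shared with the query."""
    return len(set(query.lower().split()) & set(doc.text.lower().split()))


def retrieve_unscoped(query: str, docs: list[Doc], k: int = 3) -> list[Doc]:
    """Ranks purely by relevance, so any tenant's document can win."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]


def retrieve(query: str, docs: list[Doc], tenant: str, k: int = 3) -> list[Doc]:
    """Filters by tenant at the storage step, then ranks what is left."""
    allowed = [d for d in docs if d.tenant == tenant]
    return sorted(allowed, key=lambda d: score(query, d), reverse=True)[:k]


docs = [
    Doc("tenant_a", "quarterly revenue forecast draft"),
    Doc("tenant_b", "quarterly revenue forecast final numbers"),
    Doc("tenant_a", "lunch menu"),
]
query = "quarterly revenue forecast numbers"

top_unscoped = retrieve_unscoped(query, docs, k=1)          # tenant_b's doc wins
top_scoped = retrieve(query, docs, tenant="tenant_a", k=1)  # only tenant_a is eligible
```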

## Our Implementation

We've been running a multi-tenant Hermes deployment in production for months: 8 agents across multiple platforms, serving a range of ventures, communities, frontier experiments, journalists, and commons public goods projects. Our fork (NimbleCoAI/hermes-agent) carries a small set of core patches, adapter-level fixes, and plugins. The core mechanism is `context_id`.

**Fork composition (35 commits, rebased against current upstream):**
- **4 core patches** — memory scoping via `context_id` (~70 lines across 3 files), input sanitization, `observe_only` field on `MessageEvent` + handler in `run.py`
- **11 adapter patches** — Signal (8): UUID auth, invite policy, profile name, mention stripping, runtime allowlist, `observe_only`, SSE stale reconnect, voice memo detection (content-type based), voice memos bypass observe_only, syncMessage group auto-detect; Mattermost (3): mention gating + system post filtering, join/leave gating, `observe_only`
- **4 plugins** (zero rebase risk) — `swarm_map_policy` (2 commits), `boot_md`, `lifecycle-notify`
- **3 security/fix patches** — approval command admin gating, denial feedback messages, model cascade noise suppression
- **5 infra patches** — faster-whisper Docker pre-install, GHCR publish, weekly upstream sync CI, org rename
- **3 docs** — branding, patch documentation, rebase journal
- **1 config** — deployment reference configs

Several commits are evolutionary (e.g. boot.md started as a builtin hook, then was refactored to a standalone plugin). Net unique functional changes: ~28.

Notable: `boot.md` was originally implemented as a `hooks.py` patch, but after discovering upstream explicitly removed boot.md support from hooks.py, we converted it to a standalone plugin. This respects the upstream architectural decision while preserving the functionality we need. Plugins sit outside the rebase surface entirely.

`observe_only` passive context was added for Signal and Mattermost adapters, following the pattern Telegram already has upstream via `observe_unmentioned_group_messages`. This lets agents see group messages they aren't mentioned in without responding, maintaining conversational coherence in multi-turn group discussions.
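
The behaviour can be sketched as follows. `ToyMessageEvent` and `handle` are hypothetical simplifications, not the fork's `MessageEvent` or the handler in `run.py`; they only show the idea that an observed message is recorded as context but never triggers a reply.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyMessageEvent:
    chat_id: str
    text: str
    observe_only: bool = False  # True when the bot was not mentioned


def handle(event: ToyMessageEvent,
           history: dict[str, list[str]],
           respond: Callable[[list[str]], str]) -> str | None:
    """Record every message; only generate a reply when not observe_only."""
    history.setdefault(event.chat_id, []).append(event.text)
    if event.observe_only:
        return None
    return respond(history[event.chat_id])


history: dict[str, list[str]] = {}
seen_by_model: list[list[str]] = []


def fake_respond(messages: list[str]) -> str:
    seen_by_model.append(list(messages))
    return "reply"


first = handle(ToyMessageEvent("g1", "we ship Tuesday", observe_only=True),
               history, fake_respond)
second = handle(ToyMessageEvent("g1", "@bot when do we ship?"), history, fake_respond)
```

When the mention finally arrives, the reply is generated with the earlier, unmentioned message already in context.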

**Rebase health:** We recently rebased our fork commits onto 410 upstream commits. Only 3 conflicts in 2 files — `tools/memory_tool.py` (our scoping composes cleanly with upstream's new promptware defense and drift detection features) and `plugins/platforms/mattermost/adapter.py` (file move from `gateway/platforms/`). The fork carries a weekly automated rebase CI workflow; when conflicts arise, it opens an issue for manual resolution. Full rebase journal is maintained in `docs/rebase-journal.md`.

When a message arrives from a group chat, we set `context_id = chat_id` on the MemoryStore. This routes memory writes to `memories/contexts/{context_id}/MEMORY.md` and merges global + context-scoped memories for reads.

Design decisions:

- **`context_id` is opaque** — any string works: chat_id, tenant_id, org_id, project_id. The memory layer doesn't interpret it.
- **Global memory is readable by all contexts** — an agent's core knowledge (name, personality, skills) is available everywhere.
- **Global memory requires explicit `scope="global"` to write** — prevents accidental cross-context leakage.
- **Fully backwards-compatible** — `context_id=None` produces identical behavior to upstream Hermes. Zero impact on existing single-tenant deployments.
- **Minimal surface area** — ~70 lines across `MemoryStore`, the memory tool, and the gateway adapter.
- **Composes with upstream security features** — upstream's new promptware defense (`_sanitize_entries_for_snapshot`) and external drift detection (`_detect_external_drift`) both work correctly with scoped paths. Security scanning runs on merged global+scoped entries; drift detection operates on `_path_for(target)` which routes correctly per scope.
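
The design decisions above can be sketched in a few dozen lines. This is an illustrative stand-in, not the fork's `MemoryStore`: the class name, the `_path` helper, the global file location `memories/MEMORY.md`, and the path-separator guard are all assumptions made for the example.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


class ScopedMemoryStore:
    """Illustrative context-scoped store (hypothetical, not the fork's code)."""

    def __init__(self, root: Path, context_id: str | None = None) -> None:
        # Added precaution for this sketch: context_id becomes a directory name.
        if context_id is not None and (
            context_id in {"", ".", ".."} or "/" in context_id or "\\" in context_id
        ):
            raise ValueError(f"unsafe context_id: {context_id!r}")
        self.root = root
        self.context_id = context_id

    def _path(self, scope: str) -> Path:
        if scope == "global" or self.context_id is None:
            return self.root / "MEMORY.md"  # assumed global location
        return self.root / "contexts" / self.context_id / "MEMORY.md"

    def write(self, entry: str, scope: str = "context") -> None:
        """Writes stay in the current context unless scope='global' is explicit."""
        path = self._path(scope)
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as fh:
            fh.write(entry + "\n")

    def read(self) -> list[str]:
        """Reads merge global entries with the current context's entries."""
        paths = [self._path("global")]
        if self.context_id is not None:
            paths.append(self._path("context"))
        entries: list[str] = []
        for path in paths:
            if path.exists():
                entries.extend(path.read_text(encoding="utf-8").splitlines())
        return entries


root = Path(tempfile.mkdtemp()) / "memories"
dm = ScopedMemoryStore(root, context_id="dm-alice")
group = ScopedMemoryStore(root, context_id="group-public")
unscoped = ScopedMemoryStore(root)  # context_id=None behaves like a single global store

dm.write("Alice's launch date is confidential")
dm.write("Agent name: Hermes", scope="global")
```

Here the group store can read the global entry but not the DM entry, and a store created with `context_id=None` reads and writes only the global file.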

This has been running in production with 8 agents for months without memory leakage incidents. Before this patch, we had the exact production incidents described in #9514 — information from one context appearing in another. All 8 agents were recently migrated to the clean fork and verified operational.

- Fork: [NimbleCoAI/hermes-agent](https://github.com/NimbleCoAI/hermes-agent)

## Proposed Architecture: The `memory:scope` Hook

Rather than merging our hardcoded `context_id = chat_id` mapping, we propose extending Hermes's hook system with a new `memory:scope` hook. This follows existing plugin patterns and gives operators full control over isolation policy.

```python
# Hook contract: memory:scope
# Called before every memory read and write operation.
# Returns the context_id that scopes this operation.
# Returning None = global scope (backwards-compatible default).

from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryScopeHookPayload:
    """Payload passed to memory:scope hook handlers."""
    operation: str          # "read" or "write"
    session_id: str         # Current session identifier
    platform: str           # e.g. "telegram", "discord", "api"
    channel_id: str | None  # Platform-specific channel/group ID
    user_id: str | None     # Platform-specific user ID
    agent_id: str           # Current agent/profile identifier
    metadata: dict = field(default_factory=dict)  # Additional context from the adapter


def memory_scope_hook(payload: MemoryScopeHookPayload) -> str | None:
    """
    Plugin implements this to determine memory scope.

    Examples:
        # Per-chat isolation (our production pattern)
        return payload.channel_id

        # Per-user isolation
        return f"user:{payload.user_id}"

        # Per-org with shared agent knowledge
        return f"org:{payload.metadata.get('org_id')}"

        # No isolation (upstream default)
        return None
    """
    return None  # Default stub: global scope
```

This approach:
- Preserves Hermes's plugin-first architecture
- Lets operators define isolation policy without forking
- Supports arbitrary tenancy hierarchies (per-chat, per-user, per-org, per-project)
- Requires no changes for existing single-tenant deployments (no hook registered = global scope)
- Can be implemented incrementally alongside the existing hook system
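
One way the dispatch side could work is sketched below. It is deliberately self-contained: `ScopePayload` is a trimmed, hypothetical version of the payload, `ScopeResolver` is not an existing Hermes class, and the "first non-None result wins" policy is an assumption that the final hook contract would need to settle.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ScopePayload:
    """Trimmed, hypothetical payload for this sketch."""
    operation: str                      # "read" or "write"
    platform: str
    channel_id: Optional[str] = None
    user_id: Optional[str] = None
    metadata: dict = field(default_factory=dict)


ScopeHook = Callable[[ScopePayload], Optional[str]]


class ScopeResolver:
    """Asks registered hooks for a context_id; no hooks means global scope."""

    def __init__(self) -> None:
        self._hooks: list[ScopeHook] = []

    def register(self, hook: ScopeHook) -> None:
        self._hooks.append(hook)

    def resolve(self, payload: ScopePayload) -> Optional[str]:
        # Assumed policy: the first hook that returns a scope decides.
        for hook in self._hooks:
            scope = hook(payload)
            if scope is not None:
                return scope
        return None


def per_chat_scope(payload: ScopePayload) -> Optional[str]:
    """Example plugin hook: one memory scope per platform channel."""
    if payload.channel_id is None:
        return None
    return f"{payload.platform}:{payload.channel_id}"


resolver = ScopeResolver()
resolver.register(per_chat_scope)
scope = resolver.resolve(
    ScopePayload(operation="write", platform="telegram", channel_id="-100123")
)
```

A deployment that registers no hook resolves every operation to `None`, which keeps today's single-tenant behaviour unchanged.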

## Known Gaps in Current Upstream

During our audit and production operation, we identified additional issues that affect multi-tenant and multi-platform deployments. Most are fixed in our fork, with patches available; one still needs a change upstream.

**Fixed in our fork (patches available):**
- **`observe_only` for Signal/Mattermost**: Telegram has `observe_unmentioned_group_messages` for passive group context. Signal and Mattermost lacked this. Added via `MessageEvent.observe_only` field + `run.py` handler (~100 lines). Agents can now passively observe group context between @mentions.
- **Signal mention stripping**: Group messages prefixed with @mention failed command detection (e.g. `@bot /sethome` didn't parse). Fixed — the Signal adapter now strips mentions before command parsing, matching the Mattermost adapter's existing behavior.
- **Signal group invite → allowlist**: After accepting a group invite, the group is now dynamically added to the runtime allowlist so subsequent messages are processed. Also handles founding-member scenario via syncMessage detection.
- **Signal SSE stale connection**: The SSE stream to signal-cli goes stale after a few minutes — daemon is healthy but SSE is dead. Added health monitor that forces reconnect when daemon is healthy but SSE idle >120s.
- **Voice memo `voiceNote` flag missing from SSE**: signal-cli's SSE endpoint doesn't include `voiceNote` in attachment JSON. Detection uses `contentType.startsWith("audio/")` as a workaround. Voice memos also bypass `observe_only` in groups (you can't @mention in a voice memo).
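
Two of the fixes above are small enough to sketch. The code below is a simplified illustration under stated assumptions, not the fork's adapter code: `SseWatchdog` and `is_voice_memo` are hypothetical names, the attachment is assumed to be a plain dict carrying a `contentType` key, and the 120-second threshold is the one quoted above.

```python
import time
from typing import Callable

STALE_AFTER_SECONDS = 120.0


class SseWatchdog:
    """Tracks the last SSE event and flags a stale stream."""

    def __init__(self, stale_after: float = STALE_AFTER_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._stale_after = stale_after
        self._clock = clock
        self._last_event = clock()

    def on_event(self) -> None:
        """Call whenever any SSE event arrives."""
        self._last_event = self._clock()

    def should_reconnect(self, daemon_healthy: bool) -> bool:
        # Only reconnect when the daemon is up; if it is down, a new SSE
        # connection would not help.
        idle = self._clock() - self._last_event
        return daemon_healthy and idle > self._stale_after


def is_voice_memo(attachment: dict) -> bool:
    """Content-type workaround: treat any audio attachment as a voice memo."""
    content_type = attachment.get("contentType") or ""
    return content_type.startswith("audio/")
```

The content-type check cannot distinguish a recorded voice memo from an ordinary audio file attachment, which is the trade-off of working without the `voiceNote` flag.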

**Also fixed in our fork (security-relevant for multi-tenant):**
- **Approval command admin gating**: Destructive commands (tool installs, config changes) are now gated behind admin-only approval. In multi-tenant, non-admin users in a group could previously trigger agent-wide changes.
- **Denial feedback with pattern context**: When command guards block an action, the denial message now includes what pattern triggered and suggests concrete alternatives. Without this, agents retry the same blocked action in a loop — a common failure mode we observed in OSINT workflows (see [NimbleCoAI/hermes-agent#13](https://github.com/NimbleCoAI/hermes-agent/pull/13)).

**Still needs upstream attention:**
- **`run.py` UUID auth gap**: `_is_user_authorized()` does raw string matching against `SIGNAL_ALLOWED_USERS` phone numbers but doesn't consult the adapter's UUID-resolved sets. When signal-cli provides UUID-only sender identifiers, valid users are rejected. Our fork includes UUID pre-resolution as an adapter-level patch, but the `run.py` auth gate needs a separate fix upstream.
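
A minimal sketch of what a UUID-aware check could look like, assuming the gate is given both identifiers and both allow sets. The function name and parameters here are hypothetical and do not mirror the real `_is_user_authorized()` signature.

```python
from __future__ import annotations


def is_sender_authorized(sender_number: str | None,
                         sender_uuid: str | None,
                         allowed_numbers: set[str],
                         allowed_uuids: set[str]) -> bool:
    """Accept a sender matched by phone number or by adapter-resolved UUID."""
    if sender_number is not None and sender_number in allowed_numbers:
        return True
    if sender_uuid is not None and sender_uuid in allowed_uuids:
        return True
    return False


allowed_numbers = {"+15550100"}
# Assumed to be resolved by the adapter from the allowed phone numbers.
allowed_uuids = {"3f2b6c1e-0000-4000-8000-000000000001"}

# A UUID-only sender is accepted when the resolved set is consulted.
uuid_only_ok = is_sender_authorized(
    None, "3f2b6c1e-0000-4000-8000-000000000001", allowed_numbers, allowed_uuids
)
```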

## About Us

Headed by [Juniper Bevensee](https://github.com/juniperbevensee), with 10+ years in distributed systems and ML/AI and 1+ year building and operating multi-tenant agentic AI systems across multiple organizations.

We're not theorizing about multi-tenancy. We've been running it in production, hitting the real failure modes, and building the fixes. The patches we're proposing come from operational experience and user testing, not speculation.

More about the network swarm can be found at:
- [NimbleCo AI](https://nimbleco.ai)
- [GitHub](https://github.com/NimbleCoAI)

## Path Forward

We're prepared to submit clean, tested PRs for each layer:

1. **PR1: Memory scoping via `context_id`** — the backwards-compatible storage-layer change. Zero impact on existing users. Solves #28279, #14162, #4726, and the memory isolation aspect of #9514.

2. **PR2: `memory:scope` hook** — pluggable isolation policy following existing hook patterns. Lets any plugin determine scope without forking core.

3. **Future layers** (separate discussions):
   - Session tenant key for adapter-level routing (#17068, #10452)
   - Per-tenant model/cost isolation (#23735)
   - Audit scoping and memory provenance
   - Multi-adapter instance support (#29144, #9756)
   - Signal UUID auth in `run.py` (`_is_user_authorized()` string matching gap)

   *Already implemented in our fork as adapter patches (available for separate PRs):*
   - `observe_only` parity for Signal/Mattermost
   - Signal mention stripping, group invite → runtime allowlist, SSE reconnect, voice memo detection
   - Boot.md as a standalone plugin

We maintain a clean fork with automated weekly upstream sync — anyone can try the memory scoping patch today at [NimbleCoAI/hermes-agent](https://github.com/NimbleCoAI/hermes-agent). Current fork stats: 35 commits (4 core + 11 adapter + 4 plugins + 3 security/fix + 5 infra + 3 docs + 1 config; ~28 net unique functional changes), recently rebased against 410 upstream commits with only 3 conflicts. The fork includes a CI/CD workflow that automatically attempts weekly rebases and opens issues on conflicts.

The 12+ open issues on this topic represent real operators with real deployments blocked on this capability. We'd like to work with the maintainers to land these changes upstream rather than fragmenting the ecosystem with forks. Happy to iterate on the design, adjust the hook contract, or adapt to whatever architectural direction the team prefers.

Thank you for your time and attention. We're so excited about the future of multiplayer agentic AI.

_Notes on contributor guidelines: This falls under Security Hardening, Bug Fix, and Core Patch with minimal surface. It is broad because the issue touches many parts of the codebase and brings together numerous existing issues and PRs, backed by external sources. The solutions themselves are straightforward. The plugins follow Hermes patterns, and wherever possible we sought to extend compatibility rather than diverge._


