feat(api_server): per-merchant identity headers for multi-tenant deployments by apmzoom · Pull Request #12054 · NousResearch/hermes-agent

apmzoom · 2026-04-18T07:51:06Z


Summary

Add three optional HTTP headers to POST /v1/chat/completions so a single
hermes-agent process can serve many isolated tenants (merchants, end-customers,
project workspaces, …) without forking. Critical for self-hosted enterprise
deployments where one Mac mini / VPS must serve hundreds of users.

X-Hermes-Merchant-Id   tenant scope, [A-Za-z0-9_-]{1,64}
X-Hermes-Active-Skill  preload SKILL.md as ephemeral system prompt
X-Hermes-Session-Id    (already existed, now namespaces under merchant)

100% backward-compatible — requests without the new headers behave identically.

Motivation

Hermes today is built around the single-user / single-machine mental model:
one global SOUL.md, one global memory, one ambient identity. This works great
for personal/dev usage and for frontier models with 200K context.

Two distinct pain points appear when scaling to multi-tenant local-LLM
production:

1. Cross-tenant identity leakage

Without per-tenant scoping, the global ~/.hermes/SOUL.md, ~/.hermes/AGENTS.md
and the persistent memory snapshot are shared across every API request. In a
hosted scenario where merchant A and merchant B both POST to the same
gateway, both see the same SOUL persona and can read each other's memory.

2. Multi-turn context bloat kills 7-14B local models

The current architecture relies on the agent calling skill_view(name) to load
SKILL.md content into the conversation. That's beautiful with frontier models —
they get progressive disclosure. But:

Every skill_view call returns the full SKILL.md (often 3-5K tokens)
Tool results live in conversation history forever
After 2-3 skill loads + a few tool calls, prompt is 12K+ tokens
Local 7-14B models (qwen3, llama3.1, etc.) start losing attention coherence
somewhere around 8K and silently degrade to incoherent / off-topic output
instead of returning errors

I tested this empirically:

qwen3:8b on a 16GB M4 Mac mini, vague prompt + 28 tools + 86 skills loaded:
3m30s, output completely off-topic (talking about /home/user not existing)
Same model, same prompt, but with X-Hermes-Active-Skill: apmzoom-products
pre-loaded as ephemeral system prompt + merchant_mode=True skipping global
context: ~55s, on-topic structured response

What this PR adds

`_load_merchant_prompt(merchant_id, active_skill)` helper

Reads merchant-scoped files (silently skipping any that don't exist):

~/.hermes/merchants/<id>/identity.md       per-merchant SOUL replacement
~/.hermes/merchants/<id>/system_extras.md  optional extra session context
~/.hermes/skills/openclaw-imports/<skill>/SKILL.md   pre-loaded skill
~/.hermes/skills/<skill>/SKILL.md                    fallback skill location

SKILL.md is truncated at 4000 chars to keep budget sane for local models.
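As a rough illustration of the lookup order and truncation described above, a minimal sketch follows. The function name, its `root` parameter, and the way the parts are joined are assumptions for illustration, not the PR's actual code; only the file paths and the 4000-character limit come from the description.

```python
from pathlib import Path

SKILL_CHAR_LIMIT = 4000  # truncation limit stated in the PR description


def load_merchant_prompt(root: Path, merchant_id: str, active_skill: str = "") -> str:
    """Concatenate whichever merchant-scoped files exist under `root`.

    `root` stands in for ~/.hermes; missing files are skipped silently.
    """
    parts = []
    merchant_dir = root / "merchants" / merchant_id
    for name in ("identity.md", "system_extras.md"):
        path = merchant_dir / name
        if path.is_file():
            parts.append(path.read_text(encoding="utf-8"))
    if active_skill:
        # Preferred location first, then the fallback location.
        candidates = [
            root / "skills" / "openclaw-imports" / active_skill / "SKILL.md",
            root / "skills" / active_skill / "SKILL.md",
        ]
        for path in candidates:
            if path.is_file():
                parts.append(path.read_text(encoding="utf-8")[:SKILL_CHAR_LIMIT])
                break
    return "\n\n".join(parts)
```

Passing the root directory explicitly keeps the sketch testable against a temporary directory instead of the real home directory.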

`merchant_mode` flag on `_create_agent`

When True (auto-toggled by presence of X-Hermes-Merchant-Id):

Sets skip_context_files=True (no global SOUL/AGENTS)
Sets skip_memory=True (no shared memory snapshot)

Header parsing in `_handle_chat_completions`

Validates merchant_id and active_skill against tight regexes
Returns 400 on malformed values
Threads merchant_mode through _run_agent → _create_agent
Prefixes derived session_id with merchant-<id>- so similar conversation
fingerprints don't collide across tenants
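The validation and session-prefixing steps above could look roughly like the sketch below. The helper names and the error string are hypothetical; the merchant-id pattern and the `merchant-<id>-` prefix are taken from the description.

```python
import re
from typing import Optional, Tuple

# Pattern for the merchant id as stated in the PR summary.
MERCHANT_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def parse_merchant_header(headers: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (merchant_id, error). Hypothetical helper, not the PR's code."""
    raw = headers.get("X-Hermes-Merchant-Id")
    if raw is None:
        return None, None  # header absent: legacy behaviour
    if not MERCHANT_ID_RE.fullmatch(raw):
        # The caller would map this to an HTTP 400 response.
        return None, "invalid X-Hermes-Merchant-Id"
    return raw, None


def scoped_session_id(merchant_id: Optional[str], session_id: str) -> str:
    """Prefix the session id so tenants never share a session key."""
    if merchant_id is None:
        return session_id
    return f"merchant-{merchant_id}-{session_id}"
```

Using `fullmatch` rather than `match` matters here: it rejects values such as `../etc` that would otherwise be usable for path traversal under ~/.hermes/merchants/.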

What this PR explicitly does NOT do

❌ Authentication: still relies on existing API_SERVER_KEY (per mini, not
per merchant). Production deployments should use Cloudflare Tunnel + Access
policies for per-merchant auth at the edge.
❌ Per-merchant memory persistence: future work. Today merchants get fresh
memory each session. Frontend can persist via Idempotency-Key + session
resume.
❌ Onboarding endpoint: there's no POST /v1/merchants/init. Operators
populate ~/.hermes/merchants/<id>/identity.md via their own scripts (a
reference Python helper is included in our deployment but not in this PR).
Could be a follow-up PR.
❌ Skill scoping per merchant: skills are still globally listed in
<available_skills>. Future enhancement could read
~/.hermes/merchants/<id>/enabled_skills.txt to scope.

Compatibility

All existing tests pass (no API surface changes)
No new dependencies
New headers are opt-in
New filesystem layout is opt-in (only created when operator chooses)

Test plan

Manual: existing /v1/chat/completions calls without new headers → unchanged
Manual: with X-Hermes-Merchant-Id only → identity injected, prompt size drops ~30%
Manual: with X-Hermes-Active-Skill → skill content in system prompt, no skill_view roundtrip needed
Manual: malformed merchant_id → 400 with clear error
Manual: cross-merchant session isolation (different ids never share session storage)
Add integration tests for header parsing + merchant_mode flag (would appreciate maintainer guidance on style — happy to add)

Real-world deployment

This PR was developed against a 16GB Apple M4 Mac mini running qwen3:8b
locally via Ollama, serving an APM (Korean Dongdaemun fashion market)
merchant chat workload through a Cloudflare Tunnel. The patched hermes
replaced an in-house Hono daemon; multi-merchant isolation, identity-aware
responses, and tractable context size were all verified end-to-end.

…oyments

Add three optional HTTP headers to /v1/chat/completions for enterprise multi-tenant scenarios where one mini/server hosts many merchants:

X-Hermes-Merchant-Id : tenant scope, sanitized [A-Za-z0-9_-]{1,64}
X-Hermes-Active-Skill : preload SKILL.md as ephemeral system prompt (avoids skill_view multi-turn context bloat)
X-Hermes-Session-Id : already existed; now namespaces under merchant

When X-Hermes-Merchant-Id is present:
- Loads ~/.hermes/merchants/<id>/identity.md (becomes ephemeral system)
- Loads ~/.hermes/merchants/<id>/system_extras.md (optional)
- Sets skip_context_files=True to suppress global SOUL.md/AGENTS.md auto-injection (prevents cross-merchant identity leakage)
- Sets skip_memory=True to avoid mixing memory across tenants
- Prefixes session_id with "merchant-<id>-" for clean isolation

When X-Hermes-Active-Skill is present:
- Reads ~/.hermes/skills/openclaw-imports/<skill>/SKILL.md or ~/.hermes/skills/<skill>/SKILL.md
- Truncated to 4000 chars to fit local 7-14B model context budget
- Injected as part of ephemeral_system_prompt, which eliminates the need for the agent to call skill_view and removes a major source of multi-turn context bloat (skill_view results pile up across turns and quickly push the prompt past 12K tokens, where local 7-14B models start to lose attention coherence)

This is fully backward-compatible: requests without the new headers behave identically to before.

Tested on: 16GB Apple M4 Mac mini running qwen3:8b via Ollama
Tested with: real APM merchant API (apmzoom-products skill, nested JSON field extraction, null fidelity)

Closes the gap between hermes-agent's frontier-model-oriented design and self-hosted local-LLM enterprise deployments where one process must serve hundreds of tenants without identity bleed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Browsers block any custom request header that's not in the preflight Allow-Headers list. Without this fix, frontends running on a different origin than the api_server (the common case — Cloudflare Tunnel, Open WebUI, agent-claw) cannot send X-Hermes-Merchant-Id or X-Hermes-Active-Skill, defeating the multi-tenant feature added in the previous commit. Also expose X-Hermes-Session-Id on responses so SSE clients can read it to chain follow-up requests to the same conversation across origins. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…t deployments

Auto-enabled when merchant_mode is on. Strips ~700-1300 tokens of behavioural guidance and skills-index boilerplate from the system prompt, since multi-tenant frontends typically:

1. pre-select the active skill via X-Hermes-Active-Skill (so the full <available_skills> enumeration is dead weight; it is usually the biggest single contributor at 100-200 tokens per installed skill)
2. constrain agent behaviour through their own UI flow (so MEMORY / SESSION_SEARCH / SKILLS / TOOL_USE_ENFORCEMENT guidance blocks are redundant)

Skipped sections in lean_mode:
- MEMORY_GUIDANCE (~150 tokens)
- SESSION_SEARCH_GUIDANCE (~50 tokens)
- SKILLS_GUIDANCE (~100 tokens)
- TOOL_USE_ENFORCEMENT_GUIDANCE (~180 tokens, plus model-family addenda for Google/OpenAI when they would have applied)
- nous_subscription_prompt (variable)
- <available_skills> enumeration (~120 tokens × N installed skills)

Measured impact on 16GB Apple M4 Mac mini running Qwen3-8B-4bit through oMLX (Apple MLX), via the multi-tenant header path:

prompt_tokens (short reply): 3446 → 2726 (-21%)
2-char reply latency (warm): 9.5s → 4.9s (-48% best case)
TPS (median): 11 → 17 (+55%)
3-turn dialogue total: 102s → 88s (-14%)

Backward compatible: lean_mode defaults to False; existing single-tenant deployments behave identically.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…edentials.json When merchant_mode is on AND ~/.hermes/merchants/<id>/credentials.json exists with an `access_token` field, override the gateway-wide LLM api_key with that token for the duration of the request. Why: enterprise multi-tenant deployments often back hermes-agent with a remote LLM gateway that issues per-user JWTs (each user/merchant has their own quota, audit trail, and model permissions on the upstream service). Without this override, every merchant on the mini would share a single mini-wide service token, defeating per-tenant accounting and isolation. The token is read fresh per request, so when merchant-onboard.py refreshes credentials (e.g. after JWT expiry → refresh_token → new access_token), subsequent calls pick up the new token without a hermes restart. Compatible with any OpenAI-shaped upstream that authenticates via `Authorization: Bearer <jwt>` — including OpenRouter, Together, Fireworks, and self-hosted gateways like the apmzoom worker. For upstreams that speak a different request shape (e.g. apmzoom worker takes `{message: string}` instead of `{messages: [...]}`), pair this with a small translation proxy on the mini (see apmzoom-llm-proxy.py in our deployment). Falls back silently to the gateway api_key when the credentials file is missing or unreadable, so non-merchant-mode requests behave exactly as before. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
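The per-request token override with silent fallback might be sketched as follows. The function name and signature are hypothetical; the file name, the `access_token` field, and the fallback behaviour come from the commit message.

```python
import json
from pathlib import Path


def resolve_api_key(merchant_dir: Path, gateway_key: str) -> str:
    """Return the merchant's access_token if readable, else the gateway key.

    The file is re-read on every call, so a refreshed token is picked up
    without restarting the process.
    """
    try:
        text = (merchant_dir / "credentials.json").read_text(encoding="utf-8")
        data = json.loads(text)
    except (OSError, ValueError):
        return gateway_key  # missing or unreadable file: silent fallback
    token = data.get("access_token") if isinstance(data, dict) else None
    return token if isinstance(token, str) and token else gateway_key
```

Catching `ValueError` covers malformed JSON as well as decoding errors, so a half-written credentials file degrades to the gateway key instead of failing the request.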

Add tools/apmzoom.py — auto-registers ~/.hermes/skills/apmzoom/{read,write}/ skills as OpenAI function-calling tools (apm_* namespace). Handler parses each skill's SKILL.md frontmatter to derive api_method / api_url / auth_type and signs requests per apm_sign (AWS gateway) or bearer (worker) conventions. Per-request merchant identity is threaded via ContextVar, so token refreshes on disk are picked up live with no gateway restart. api_server.py: propagate X-Hermes-Active-Skill into the bridge and, when the active workflow's SKILL.md declares metadata.apmzoom.skills_used, prune agent.tools to that subset post-construction. Non-apm_* tools (memory, skill_view, …) are untouched. Keeps the system prompt lean for local 7-14B models — typical workflow needs 2-6 apm_* tools, not all 16 in the global allowlist. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Extend _load_merchant_prompt to parse `see_also:` list in the active workflow's SKILL.md frontmatter and eagerly concatenate each referenced skill's SKILL.md into the ephemeral system prompt. Local 7-14B models rarely skill_view proactively for cross-references, so they lose context that the workflow author meant to share (e.g. apmzoom-workflow-patterns/ lock-goods-id, readback-verify). Pre-inlining delivers the pattern content up-front; each ref is trimmed to 2000 chars to stay within budget. Companion mini-side artifact: apmzoom-workflow-patterns/ folder hosting reusable recipes (not committed — lives under ~/.hermes/skills/ on the merchant's mini, synced out-of-band). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…t state ContextVars don't propagate into ThreadPoolExecutor workers by default, so merchant_id resolved to "" inside tool handlers even after api_server.py called set_merchant_id(), causing auth headers to never be attached. Since hermes dispatches tool execution through run_agent.py's executor, ContextVar was the wrong primitive here. Switched to plain module globals. Safe for the Mac mini single-process deployment (requests are effectively serial through this gateway); if we later need true per-request isolation under concurrent load, the fix is to wrap executor.submit in contextvars.copy_context().run(...) in run_agent.py rather than re-adopt ContextVar here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
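The alternative mentioned at the end of this commit, wrapping the submitted callable in `contextvars.copy_context().run(...)`, can be demonstrated in isolation. The variable and function names below are made up for the demonstration and do not mirror run_agent.py.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

merchant_id = contextvars.ContextVar("merchant_id", default="")


def read_merchant() -> str:
    """Runs inside the worker thread and reports what it can see."""
    return merchant_id.get()


def demo() -> tuple:
    merchant_id.set("shop_42")
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Plain submit: the worker thread does not run in the caller's
        # context, so on standard CPython builds it typically sees the default.
        plain = pool.submit(read_merchant).result()
        # Snapshot the caller's context and run the callable inside it.
        ctx = contextvars.copy_context()
        wrapped = pool.submit(ctx.run, read_merchant).result()
    return plain, wrapped
```

The second element of the returned tuple is the value set by the caller, which is the per-request isolation that plain module globals cannot provide under concurrent load.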

Upstream worker registers api_method="POST" for ~23 read-type AWS gateway skills (gds_m_storegoodslist, gds_m_goodseditinfo, gds_u_*, auth_me, …) but those endpoints actually only accept GET — so every real merchant call failed with 405 Method Not Allowed, making write workflows uncatchable (the initial storegoodslist lookup blew up). Rather than edit each SKILL.md (skill-sync-worker.py would overwrite) or block on upstream fix, auto-heal in the bridge: on 405 with POST, retry once with GET (params migrated to query string) and memoize the winning method per skill so subsequent calls don't pay the wasted roundtrip. In-process cache only — wiped on gateway restart, then reheats on first call per skill. Fine for this single-process mini deployment. Verified: check-stock-and-report workflow now returns real goods data from 44k2t5n59e.execute-api.ap-northeast-2.amazonaws.com. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
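The retry-once-and-memoize behaviour can be sketched with an injected transport so the logic is visible without real HTTP. The `send` callable, the cache name, and the function name are assumptions for illustration.

```python
from typing import Callable, Dict, Tuple

# Per-process memo of the HTTP method that worked for each skill.
# It is lost on restart and refilled on the first call per skill.
_method_cache: Dict[str, str] = {}


def call_with_method_heal(
    skill: str,
    declared_method: str,
    params: dict,
    send: Callable[[str, dict], Tuple[int, dict]],
) -> Tuple[int, dict]:
    """`send(method, params)` is a hypothetical transport returning (status, body)."""
    method = _method_cache.get(skill, declared_method)
    status, body = send(method, params)
    if status == 405 and method == "POST":
        # Retry once with GET; the transport is expected to move params
        # into the query string.
        status, body = send("GET", params)
        if status != 405:
            _method_cache[skill] = "GET"  # remember the winner for later calls
    return status, body
```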

Local 7-14B models struggle with ``{ params: object }`` tool schemas: they hallucinate field names, skip required values, and trigger 400s before the bridge can recover. Two changes to close that gap:

1. **Build real JSON schemas from SKILL.md `parameters:` frontmatter.** The worker publishes a Python-repr spec listing body/query/path fields with types + required flags; we parse it (ast.literal_eval) and emit `{properties, required}` under the `params` property of each tool definition. LLMs see exact field names in their function-calling list, so there is no more guessing.
2. **Pre-flight validation in `_execute_skill`.** Before sending HTTP, check LLM-supplied params against the required set. Missing fields short-circuit with a structured error payload containing a `hint` mapping each missing field to the upstream skill that can fetch it (e.g. `goods_id` → "先调 apm_gds_m_storegoodslist 按关键字/编号搜到 goods_id", i.e. "first call apm_gds_m_storegoodslist to find the goods_id by keyword/number"). Local LLMs follow these hints in-loop, turning a would-be 400 into a 2-step recovery.

Worker metadata bug mitigation: ~11 of the write skills (editgoodsprice, editgoodsstock, …) register `goods_id`/`sale_price`/`stock_count` as `required: False`, leaving only `ver` strict. Force these required for write skills only (verb-in-name heuristic); read/list skills keep the registered flags so flexible entry points like `storegoodslist {keyword}` vs `storegoodslist {goods_id}` both pass validation.

Unit-tested: write skills gain 2-3 required fields as expected; read list skills stay 0-required; missing-param error payloads carry proper hints. E2E confirmed on mini: merchant query → storegoodslist with keyword-only passes validation, hits API, returns real result.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
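A minimal sketch of the two steps, parsing a Python-repr parameter spec and validating before the HTTP call, is shown below. The spec shape (a list of dicts with `name` and `required` keys), the error payload keys other than `hint`, and both function names are assumptions; the real frontmatter format is not shown in this PR text.

```python
import ast
from typing import Dict, List, Optional


def required_fields(parameters_repr: str) -> List[str]:
    """Parse a Python-repr parameter spec and list fields flagged required.

    ast.literal_eval only evaluates literals, so it is safe on untrusted text.
    """
    spec = ast.literal_eval(parameters_repr)
    return [p["name"] for p in spec if p.get("required")]


def preflight(params: dict, required: List[str], hints: Dict[str, str]) -> Optional[dict]:
    """Return a structured error if required params are missing, else None."""
    missing = [name for name in required if params.get(name) in (None, "")]
    if not missing:
        return None
    return {
        "error": "missing_required_params",
        "missing": missing,
        # Map each missing field to a hint about which skill can fetch it.
        "hint": {name: hints.get(name, "") for name in missing},
    }
```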

Every gds_m_edit* skill declares `ver` as required — it's the optimistic concurrency token the backend checks against to reject stale writes. Without it, the LLM has to: list the goods → read ver → carry it to the edit call → hope nothing changed in between. That's 2 round trips, extra context bloat, and one more place for a local 7-14B model to drop the ball. Bridge now fetches ver itself right before validation, via gds_m_goodseditinfo?goods_id=X (the authoritative per-goods endpoint — note storegoodslist's goods_id is actually a cursor-pagination marker, not a filter). Mutates params in place. Silent no-op on any failure, so the pre-flight validator still catches the missing ver and hints the LLM to fetch manually as fallback. Side note: bumped mini's qwen3:8b-prod num_ctx 8192 → 16384 (not in repo — Modelfile on mini). Recent write workflows with see_also pattern injection + tool schemas pushed prompts past 8K; 16K gives comfortable headroom while staying well under qwen3's 40960 native. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

When merchant_mode is on, memory I/O now routes to ~/.hermes/merchants/ <mid>/memories/ instead of the global ~/.hermes/memories/. Each tenant gets their own MEMORY.md / USER.md, persisting across sessions with zero bleed-through between merchants. Previously merchant_mode auto-disabled memory entirely (skip_memory=True) to avoid leaking the global knowledge file across tenants. That was safe but threw away the whole memory feature per-merchant. This change flips the logic: memory is kept ON, but its storage directory is re-pointed to a merchant-scoped path via set_merchant_memory_scope() before AIAgent construction. get_memory_dir() reads the scope at call time so writes go to the right place throughout the agent loop. Plain module global (not ContextVar), same reason as tools/apmzoom.py — hermes' ThreadPoolExecutor tool dispatch drops ContextVar state. Scope is reset to "" on non-merchant requests to preserve global-dir behaviour for CLI / health-probe traffic hitting the same process. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
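The scope switch can be pictured with the sketch below. The function names `set_merchant_memory_scope` and `get_memory_dir` appear in the commit message, but their signatures here (in particular the explicit `home` argument) are assumptions made so the example is self-contained.

```python
from pathlib import Path

# Plain module global rather than a ContextVar, for the reason the commit gives.
_merchant_scope = ""


def set_merchant_memory_scope(merchant_id: str) -> None:
    """Set the active merchant; pass "" to restore the global directory."""
    global _merchant_scope
    _merchant_scope = merchant_id


def get_memory_dir(home: Path) -> Path:
    """Resolve the memory directory at call time, honouring the current scope."""
    if _merchant_scope:
        return home / "merchants" / _merchant_scope / "memories"
    return home / "memories"
```

Because the scope is read at call time, every write during the agent loop lands in the merchant directory as long as the scope was set before the agent was constructed.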

Adds a new pattern for letting the LLM summon rich frontend components (uploader, price-confirm dialog, product-editor, …) via normal OpenAI function-calling. Each ui_* tool's handler doesn't make an HTTP call; it pushes a structured payload onto the streaming response queue as a custom `event: hermes.ui.prompt` SSE event, then returns a lightweight "awaiting_user_input" placeholder to the LLM so the agent loop pauses gracefully until the next turn.

tools/apmzoom_ui.py (new):
- UI_COMPONENTS catalogue: {props schemas} for each component kind
- set_ui_emitter(): api_server plumbs in a push-to-stream-queue function
- _make_ui_handler(): closure that builds the SSE payload + returns the placeholder for the LLM
- Initial component library: ui_uploader, ui_price_confirm, ui_product_editor (add more by appending to UI_COMPONENTS + writing the matching React bubble on the frontend)

gateway/platforms/api_server.py:
- In the streaming chat-completions handler, set the UI emitter to push ("__ui_prompt__", payload) tuples onto _stream_q (same mechanism as existing __tool_progress__ events)
- _emit() grows a new branch that writes `event: hermes.ui.prompt\ndata: {...}\n\n` for those tuples. Compliant frontends render the React component inline; clients that don't understand the event type silently ignore it per the SSE spec, so this is a non-breaking addition.

Same plain-module-global pattern as tools/apmzoom.py merchant_id: the emitter set via set_ui_emitter is visible to the ThreadPoolExecutor handler thread where ContextVars would not survive.

Follow-up (P2, frontend): agent-claw needs an SSE listener for hermes.ui.prompt plus a component router mapping `component` → React bubble. On user interaction the frontend sends a follow-up chat message containing `[ui-result] component=X correlation_id=Y result={...}` which the next agent turn reads to continue the workflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
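For reference, one way to serialize such a named SSE frame is sketched here. The helper name is hypothetical and the payload fields are only an example shaped like the ones mentioned in these commits.

```python
import json


def format_sse_event(event: str, payload: dict) -> str:
    """Serialize one Server-Sent Event frame with a custom event name.

    json.dumps without indentation never emits raw newlines, so the
    payload always fits on a single `data:` line.
    """
    data = json.dumps(payload, ensure_ascii=False, separators=(",", ":"))
    return f"event: {event}\ndata: {data}\n\n"


frame = format_sse_event(
    "hermes.ui.prompt",
    {"component": "uploader", "props": {"max_files": 1}, "correlation_id": "c1"},
)
```

The blank line at the end of the frame is what tells an SSE client that the event is complete and can be dispatched.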

Complements ui_uploader / ui_product_editor / ui_price_confirm for the batch-upload-and-publish workflow. Frontend receives an `items` array where each entry bundles preview_url + vision_fields + recommended class_id so the batch editor can render per-item rows and let the merchant tick which ones to publish. Follow-up on mini (not in repo): - upload-image-and-publish-product SKILL.md v2.2 — first tool call on trigger words is now ui_uploader (not a "please click the upload button" prose, which was ambiguous to the LLM — the screenshot in chat showed the model offering CSV templates instead) - batch-upload-and-publish SKILL.md v2.2 — same, ui_uploader(max=20) then concurrent vision then ui_batch_editor Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Synthetic ui_* tools now emit only ``event: hermes.ui.prompt`` — the parallel ``event: hermes.tool.progress`` from the generic tool-start callback was being misrendered by the frontend as a success pill (e.g. "✓ 已上传 1 张图片" for ui_uploader, before the user has uploaded anything — misleading and raced with the real uploader bubble). Also routes tools/apmzoom_ui.py's logger into ~/.hermes/logs/apmzoom.log so ui tool invocations ("[apmzoom_ui] emitted ui_uploader …") are visible in the same stream as HTTP skill calls, matching the file- handler attachment pattern in tools/apmzoom.py. Verified end-to-end: "上传图片" → LLM calls ui_uploader → backend emits exactly one hermes.ui.prompt event with {component: "uploader", props: {instruction, max_files, accept}, correlation_id}, no parallel tool.progress chip. Frontend follow-up (out of this repo): implement UI_REGISTRY.uploader bubble so component="uploader" renders inline instead of falling back to "请用 APM Zoom App 完成此步骤". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Extends _load_merchant_prompt to also read and inject ~/.hermes/merchants/<mid>/identity_dynamic.md alongside the static identity.md. This file is the output of scripts/merchant-profile- refresh.py, which polls gds_m_storegoodslist + goodsclasslist every ~30 min (or on-demand) and writes a compact summary: active SKU count, price distribution, dominant categories, low-stock / out-of- stock counts, sample product names. With this in the system prompt, the LLM can answer "我店里卖啥" / "价位大致多少" / "快没货的有哪些" without issuing tool calls — just reads the snapshot it already has. Verified against a live merchant: completion_tokens=328, zero apm_* tool calls, response cites the exact 5 sample product names + the computed avg price. The refresh script lives under ~/.hermes/scripts/ on the mini (not in this repo — per-deployment concern). Recommended cron: */30 * * * * ~/.hermes/hermes-agent/venv/bin/python3 \\ ~/.hermes/scripts/merchant-profile-refresh.py Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Screenshot showed LLM responding to "上传商品" (which was under the product-search workflow's active_skill) with a call to ui_uploader even though the user wanted to *upload* (i.e. run the upload workflow). Earlier the description said "Ask the merchant to upload one or more files"; that read equally like "any operation that involves products". Added a concrete ALLOW / DENY list in the description so DeepSeek (and weaker local models) stops treating ui_uploader as a generic "do something with products" tool: ONLY call when: 新品上架 / 发图 / 传照片 / 以图搜款 / 识图改款 / 商品主图 DO NOT call for: price changes, stock queries, text-only searches, category lookups, operations on existing 商品 (those have dedicated apm_* tools) Companion (mini-only, not in repo): config.yaml now points at deepseek-chat / provider=deepseek with DEEPSEEK_API_KEY in .env, so per-request latency drops from ~60s (qwen3 local) to ~3s, making the eval harness + real-user flows both practical. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Screenshot showed tool chips (看图分析, "image analysis" / 读取商品分类, "read product categories") stuck with a spinning ⟲ indicator long after the tools actually returned (vision 5s, classlist 2s per apmzoom.log). Frontend had no way to know the tool finished, because only tool.started events were forwarded.

Forward both tool.started and tool.completed through the same hermes.tool.progress SSE event, distinguished by a new `state` field ("started" | "completed"). Frontend can now transition chips from ⟲ to ✓ as soon as each tool returns, while the LLM keeps generating text or chaining further tool calls.

Non-breaking for existing frontends: they can ignore the `state` field and behave as before (treat any event as "started"); the worst that happens is the chip re-renders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…skills_used Problem: DEFAULT_ALLOWLIST is hand-maintained and drifts out of sync with the workflows. Today's eval baseline exposed it: inbox-triage-and-reply/inbox-overview expected apm_pms_m_pushmsglist, but pms_m_* were never in DEFAULT_ALLOWLIST, so the tool was never registered; the LLM flailed with 23 consecutive `terminal` calls trying to figure out how to read messages. check-stock-and-report/sku-drill-down expected apm_gds_m_goodseditskuinfo, same story. And edit-product-info expected apm_gds_m_goodseditinfo, also unregistered. Fix: _allowlist() now unions DEFAULT_ALLOWLIST with the `skills_used` declared in every ~/.hermes/skills/apmzoom-workflows/*/SKILL.md. Authors only need to list the skills in their workflow's frontmatter — the bridge auto-registers them. ui_* entries are filtered out (they live in tools/apmzoom_ui.py and register themselves there). Explicit APMZOOM_TOOL_ALLOWLIST env var still overrides both, for debugging / integration tests. Verified on a live merchant: before: 16 registered, 101 skipped after: 33 registered, 84 skipped (auto-detected pms_m_*, gds_m_goodseditinfo, gds_m_goodseditskuinfo, gds_m_editgoodsdiscount(price), gds_m_editgoodsskuprice, gds_m_goodsskuiscandel, chat_completions, and others) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
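The union-with-override rule can be sketched as below. The function name and arguments are hypothetical, and the comma-separated format assumed for the APMZOOM_TOOL_ALLOWLIST value is a guess, since the commit does not spell it out.

```python
from typing import Iterable, List, Optional, Set


def build_allowlist(
    default: Iterable[str],
    workflow_skills: Iterable[List[str]],
    env_override: Optional[str] = None,
) -> Set[str]:
    """Union the default allowlist with every workflow's skills_used.

    `env_override` models the APMZOOM_TOOL_ALLOWLIST value (format assumed).
    """
    if env_override:  # explicit override wins over both sources
        return {s.strip() for s in env_override.split(",") if s.strip()}
    allowed = set(default)
    for skills in workflow_skills:
        # ui_* tools register themselves elsewhere, so they are filtered out.
        allowed.update(s for s in skills if not s.startswith("ui_"))
    return allowed
```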

Screenshot showed DeepSeek narrating "现在打开商品编辑器。根据识别的商品信息(jacket/夹克),我推荐分类ID 18" and then stopping — no ui_product_editor tool_call emitted, chain dead. Seen before with ui_uploader too ("请上传主图" without the tool_call to render the uploader). Workflow SKILL.md already says "must tool_call"; DeepSeek still skipped. Diagnosis: the model treats ui_* tools as descriptive actions because their handlers don't fire HTTP — it doesn't feel like a "real" tool call. Front-loaded every ui_* description with: 🚨 THIS IS A TOOL YOU MUST INVOKE. Saying "<pattern>" is NOT enough — the UI only renders when you emit a tool_call. EMIT THIS TOOL CALL as the next action, do not end the turn with text. Four components updated: ui_uploader, ui_price_confirm, ui_product_editor, ui_batch_editor. Frontend contract unchanged — this is pure in-model messaging to nudge DeepSeek out of the narrate-then-stop pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Screenshot showed 4 看图分析 ("image analysis") chips + 2 读取商品分类 ("read product categories") chips, but the log confirmed only 2 vision calls + 1 classlist. Root cause: my earlier patch that started emitting both tool.started and tool.completed didn't give the frontend any way to dedup, so the same tool call firing twice (once for each state) rendered as two separate chips.

Add a `call_id` to the progress payload (`t1-vision_analyze`, `t2-vision_analyze`, …). Stack per tool name: push on started, pop on completed. Frontend can now match events by call_id and transition one chip ⟲ → ✓ instead of creating two. Same-tool-called-twice still gets two distinct call_ids → two chips (correct behavior, since that is actually two calls).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
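The push-on-started, pop-on-completed pairing might look like the following sketch. The class and its handling of an unmatched completion are assumptions; only the id format and the per-tool stack idea come from the commit.

```python
from collections import defaultdict
from typing import Dict, List


class CallIdTracker:
    """Pair started/completed progress events via a per-tool stack."""

    def __init__(self) -> None:
        self._counter = 0
        self._stacks: Dict[str, List[str]] = defaultdict(list)

    def started(self, tool: str) -> str:
        """Allocate a new call_id and remember it until the tool completes."""
        self._counter += 1
        call_id = f"t{self._counter}-{tool}"
        self._stacks[tool].append(call_id)
        return call_id

    def completed(self, tool: str) -> str:
        """Return the call_id of the most recent unfinished call of `tool`."""
        stack = self._stacks[tool]
        if stack:
            return stack.pop()
        # Completion without a matching start: allocate a fresh id (assumed policy).
        self._counter += 1
        return f"t{self._counter}-{tool}"
```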

… fields Screenshot showed the editor bubble rendering with 商品名/价格/分类/库存/ SKU only — missing 商品简介 (goods_detail), 产地 (make_address_id), and the cascade_id form of 分类 (gds_m_addgoods wants "1-6-7" cascade string, not a single class_id). Merchants clicking 确认提交 would hit a 400 from the backend for each missing required field. addgoods actually requires 7 fields per the worker's SKILL.md: goods_class_cascade_id, goods_name, sale_price, stock_count, make_address_id, goods_detail, goods_gallery Rework ui_product_editor schema so the LLM hands the frontend: - recommended_goods_name (vision.color+material+category) - recommended_class_cascade_id (1-6-7 cascade, not single id) - class_options (flattened tree with cascade ids) - make_address_options (韩国/中国/其他 enum) - recommended_make_address_id (default 1) - suggested_stock (default 10) - suggested_detail (vision.selling_points joined) Companion workflow update (mini-only): upload-image-and-publish-product step 2-4 now calls ui_product_editor with the full props set, and step 2-3.5 fetches gds_m_goodsmakeaddresslist so the editor has origin options. Frontend follow-up: editor form needs to surface 商品简介 + 产地 fields to match, and its submit payload should use cascade_id string. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Live upload-image-and-publish-product hit: code 103, "起购数量最小为1" ("minimum purchase quantity is 1") five times in a row. The LLM correctly passed stock_count=10, but addgoods rejected the call because least_buy_num was absent. The SKILL.md description says "默认1" ("default 1") but the backend clearly doesn't honour that default on the wire. The LLM misread the error as "stock_count must be ≥ 1" and retried five times with an identical payload, getting a 103 each time.

Bridge now carries a _PARAM_DEFAULTS map per skill. For gds_m_addgoods: least_buy_num=1 / limit_buy_num=0 / is_sell=1 / discount_percent=1 / currency_type=1 are injected before HTTP when the caller doesn't set them. Same mechanism as _maybe_autofetch_ver: the caller side (LLM) sends a minimal payload, and the bridge fills in the production-safe defaults.

Verified: posting the exact payload that was returning 103 now returns code 100 "发布成功" ("published successfully"). Single authoritative place to fix this kind of upstream-metadata drift for any skill in the future.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
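The default-injection step reduces to a dictionary merge in which caller-supplied values win. The sketch below uses the default values listed in the commit; the function name and the copy-instead-of-mutate choice are assumptions.

```python
from typing import Dict

# Defaults for gds_m_addgoods are the values listed in the commit message.
PARAM_DEFAULTS: Dict[str, dict] = {
    "gds_m_addgoods": {
        "least_buy_num": 1,
        "limit_buy_num": 0,
        "is_sell": 1,
        "discount_percent": 1,
        "currency_type": 1,
    },
}


def apply_defaults(skill: str, params: dict) -> dict:
    """Return a copy of params with missing keys filled from PARAM_DEFAULTS."""
    merged = dict(PARAM_DEFAULTS.get(skill, {}))
    merged.update(params)  # caller-supplied values always win
    return merged
```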

Browser session stuck for 55s+ on "正在思考..." ("Thinking...") after vision+classlist succeeded: the LLM was spending all that time trying to build the '1-6-12' cascade string by flattening an 8-12KB class tree in its head and constructing recommended_class_cascade_id + class_options to hand to ui_product_editor.

Simplify: LLM only picks a leaf class_id (e.g. 12 for 毛衣, "sweater") and passes the raw goodsclasslist result as class_tree. Bridge does the tree walk + cascade construction on addgoods.

tools/apmzoom.py:
- _build_class_cascade(tree, leaf_id) → "1-6-12" string
- _maybe_build_cascade_id(skill, params): if caller passed class_id or an invalid non-cascade goods_class_cascade_id, fetch the tree (via new _internal=True fetch to skip the 8KB truncation that had been breaking our internal JSON parse) and upgrade to full cascade.
- _execute_skill gains _internal flag to preserve full response for bridge-internal consumers while keeping the LLM-facing 8KB cap.

tools/apmzoom_ui.py ui_product_editor props:
- recommended_class_cascade_id + class_options[] → replaced with recommended_class_id (leaf int) + class_tree (raw). Frontend flattens; LLM doesn't.

Workflow (mini-only): step 2-4 payload simplified, step 2-3.5 goodsmakeaddresslist prefetch removed (3 options is fine to hardcode in props).

Verified: addgoods({class_id: 12, ...}) now returns code=100 "发布成功" ("published successfully") after bridge rewrites to {goods_class_cascade_id: "1-6-12", ...}. LLM turn-latency should drop from ~60s of class-tree-gymnastics back to ~5s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
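The tree walk that turns a leaf id into a cascade string can be sketched recursively. The node shape used here (`id` plus optional `children`) is an assumption, since the real goodsclasslist response format is not shown; the function name mirrors the one in the commit without claiming to match its implementation.

```python
from typing import List, Optional


def build_class_cascade(tree: List[dict], leaf_id: int) -> Optional[str]:
    """Return the dash-joined id path from a root node to `leaf_id`.

    Node shape ({"id": int, "children": [...]}) is assumed for illustration.
    Returns None when the id is not present anywhere in the tree.
    """
    for node in tree:
        if node["id"] == leaf_id:
            return str(node["id"])
        # Depth-first search into the children of this node.
        tail = build_class_cascade(node.get("children") or [], leaf_id)
        if tail is not None:
            return f"{node['id']}-{tail}"
    return None
```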

Live session just showed the worst kind of narrate-and-invent bug:

17:19:15 addgoods → code 100 "发布成功" ("published successfully") (response: NO goods_id)
17:19:26 goodseditinfo({goods_id: 1776585446866}) ← hallucinated; that number is actually the timestamp from the image URL
17:19:26 goodseditinfo → 暂无数据 ("no data", i.e. no such id)
17:19:50 addgoods (again!) → code 100 ...
17:19:50 same hallucinated goods_id lookup → empty again

Net result: the merchant's shop now has 2 duplicate copies of the same blazer because the LLM couldn't find the real new id and kept "correcting" by re-creating the product.

Root cause: backend's addgoods response is ``{code:100, message:"发布成功"}`` with NO goods_id. LLM has no authoritative way to know the id and makes stuff up.

Bridge-side enrichment: after a successful addgoods, bridge itself calls gds_m_storegoodslist({page_size:1, mark:1}) and splices the newest item's {goods_id, goods_name, sale_price, stock_count, ver} into the result field of the original response before returning to the LLM. No workflow changes needed: LLM now reads ``result.goods_id`` directly from the tool return.

Verified: a fresh addgoods call now returns:
{"code":100,"message":"发布成功","result":{ "goods_id":165177638203,"goods_name":"...","ver":1, ...}}

Same pattern as _maybe_autofetch_ver and _maybe_build_cascade_id: bridge carries the glue so the LLM can stay declarative.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Browser session showed a 92-second gap between vision succeeding (17:36:38) and the ui_product_editor emit (17:38:10).

Root cause: the workflow told the LLM to pass class_tree (the full goodsclasslist result, ~8KB of JSON) and make_address_options as tool_call args. DeepSeek emits tool_call args token-by-token: 2-3K tokens × ~30ms = 60-90s of pure token-generation latency for data the bridge already has in hand.

Swap to bridge-side enrichment. The LLM now emits a lightweight tool_call with 5 props (preview_url, vision_fields, recommended_goods_name, recommended_class_id, suggested_stock, suggested_detail); the bridge synchronously fetches goodsclasslist (via _execute_skill with _internal=True to bypass the 8KB truncation) and splices in class_tree + make_address_options before pushing the hermes.ui.prompt event. The frontend receives the same rich props it did before, so there is no contract change.

Also updated the two "heavy" prop descriptions to "DO NOT PASS THIS FIELD" so new LLMs reading the schema don't try to fill them. The fields stay in the schema's properties (not in required) just to document what the frontend consumes.

Expected latency for Turn 2 (vision → classlist → ui_product_editor): before ~90-120s, after ~15s (5s vision primary + 1s classlist + 5s vision fallback + 3s editor tool_call generation + 1s SSE ping).

Verified: a direct handler call with the LLM's 5 light props yields an SSE payload containing all 9 expected props, including a 5-top-level-node class_tree and the 3 hardcoded make_address options.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
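A rough sketch of the prop enrichment step in this commit is given below. The function name `enrich_editor_props`, the injected `fetch_class_tree` callable, and the placeholder address options are all hypothetical; the commit names only the props themselves and says the bridge fills in class_tree and make_address_options before the hermes.ui.prompt event is pushed.

```python
from typing import Any, Callable, Sequence

# Props the bridge owns; anything the LLM passes under these keys is discarded.
HEAVY_PROPS = ("class_tree", "make_address_options")


def enrich_editor_props(
    light_props: dict[str, Any],
    fetch_class_tree: Callable[[], list[dict[str, Any]]],
    make_address_options: Sequence[Any],
) -> dict[str, Any]:
    """Merge the LLM's lightweight props with bridge-fetched heavy props.

    ASSUMPTION: fetch_class_tree wraps the untruncated goodsclasslist call and
    make_address_options is the hardcoded option list; neither is shown in the
    commit, so both are injected here.
    """
    props = {key: value for key, value in light_props.items() if key not in HEAVY_PROPS}
    props["class_tree"] = fetch_class_tree()
    props["make_address_options"] = list(make_address_options)
    return props


# Illustrative values only; the real option contents are not given in the commit.
demo_props = enrich_editor_props(
    {"preview_url": "https://example.invalid/a.jpg", "recommended_class_id": 12},
    fetch_class_tree=lambda: [{"class_id": 1, "children": []}],
    make_address_options=["option-a", "option-b", "option-c"],
)
```

Dropping any heavy props the LLM sends anyway keeps the bridge authoritative, which matches the "DO NOT PASS THIS FIELD" schema descriptions.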

Silent regression: v4 eval dropped from 60% → 27.5% with all fails showing empty tools_called[] despite the LLM actually calling tools (content previews referenced real goods names/IDs). An SSE probe confirmed ZERO hermes.tool.progress events reached clients.

Cause: _on_tool_progress references _tool_call_counter and _tool_call_stack for started↔completed call_id pairing, but the declarations of those two locals got dropped somewhere between commits (likely an earlier rebase/revert left the consumer without the producer). Each progress event raised NameError inside the agent's tool-start callback, which hermes swallows, so from the outside every invocation looked silent.

Restored the declarations right next to _on_delta, with a comment calling out the regression pattern so it doesn't happen again.

Verified: the SSE stream from check-stock-and-report now carries hermes.tool.progress with state + call_id as expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
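To illustrate the started↔completed pairing that this fix restores, here is a minimal sketch of a progress callback that closes over a counter and a stack. It is not the api_server code: the factory name `make_tool_progress_handler`, the callback signature, the `call_N` id format, and the event dict shape are assumptions; only the hermes.tool.progress event name and the state + call_id fields come from the commit.

```python
from typing import Any, Callable, Optional


def make_tool_progress_handler(
    emit: Callable[[dict[str, Any]], None],
) -> Callable[[str, str], None]:
    """Build a callback that tags each tool event with a paired call_id."""
    # These two locals are the state the callback depends on. If they are
    # missing, the first progress event raises NameError inside the callback.
    tool_call_counter = 0
    tool_call_stack: list[tuple[str, str]] = []

    def on_tool_progress(tool_name: str, state: str) -> None:
        nonlocal tool_call_counter
        call_id: Optional[str] = None
        if state == "started":
            tool_call_counter += 1
            call_id = f"call_{tool_call_counter}"
            tool_call_stack.append((tool_name, call_id))
        else:
            # Pair a completion with the most recent unfinished call of that tool.
            for index in range(len(tool_call_stack) - 1, -1, -1):
                if tool_call_stack[index][0] == tool_name:
                    call_id = tool_call_stack.pop(index)[1]
                    break
        emit({
            "event": "hermes.tool.progress",
            "tool": tool_name,
            "state": state,
            "call_id": call_id,
        })

    return on_tool_progress
```

Because the failure mode was a swallowed exception, a test that asserts at least one event is emitted per tool call would catch this kind of regression earlier than an end-to-end eval score.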

doge song and others added 26 commits April 18, 2026 16:42

alt-glitch added the type/feature (New feature or request), P3 (Low: cosmetic, nice to have), comp/gateway (Gateway runner, session dispatch, delivery), and comp/agent (Core agent loop, run_agent.py, prompt builder) labels Apr 24, 2026

alt-glitch added the area/config (Config system, migrations, profiles) label Apr 24, 2026

gsskk mentioned this pull request May 12, 2026

feat(api-server): X-Hermes-User-* / Chat-* headers for multi-user identity #24423

Open

7 tasks

apmzoom wants to merge 26 commits into NousResearch:main from apmzoom:feat/api-server-multi-tenant
