Skip to content

feat(webhook): direct delivery mode for zero-LLM push notifications#12473

Merged
teknium1 merged 1 commit into
mainfrom
hermes/hermes-1a63c86e
Apr 19, 2026
Merged

feat(webhook): direct delivery mode for zero-LLM push notifications#12473
teknium1 merged 1 commit into
mainfrom
hermes/hermes-1a63c86e

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

Summary

External services can POST to a webhook route and have the payload delivered directly to a user's chat via any gateway platform adapter — zero LLM tokens, sub-second delivery — by setting deliver_only: true on the route. Reuses every existing webhook primitive (HMAC auth, rate limits, idempotency, templates, cross-platform dispatch) instead of standing up a second HTTP ingress server.

Motivated by PR #12117 from the Antenna team — they identified a real gap (no zero-token external push path today) and proposed a parallel HTTP server. This PR closes the same gap additively in the existing webhook adapter with ~100 lines of new product code.

Changes

  • gateway/platforms/webhook.py — new _direct_deliver() helper + early dispatch branch in _handle_webhook when deliver_only=true. Startup validation in connect() rejects deliver_only with deliver=log (log-only direct delivery is pointless).
  • hermes_cli/main.py + hermes_cli/webhook.py--deliver-only flag on hermes webhook subscribe; list/show output marks direct-delivery routes clearly.
  • website/docs/user-guide/messaging/webhooks.md — new Direct Delivery Mode section with config example, CLI example, response codes, configuration gotchas.
  • skills/devops/webhook-subscriptions/SKILL.md — documents --deliver-only with use cases (bumped to v1.1.0, tags expanded).
  • tests/gateway/test_webhook_deliver_only.py — 14 new tests.

Example — config.yaml (Antenna's use case)

platforms:
  webhook:
    enabled: true
    extra:
      port: 8644
      secret: global-fallback
      routes:
        antenna-matches:
          secret: antenna-webhook-secret
          deliver: telegram
          deliver_only: true
          prompt: "🎉 New match: {match.user_name} matched with you!"
          deliver_extra:
            chat_id: "{match.telegram_chat_id}"

Example — dynamic subscription via CLI

hermes webhook subscribe antenna-matches \\
  --deliver telegram \\
  --deliver-chat-id 123456789 \\
  --deliver-only \\
  --prompt "🎉 New match: {match.user_name} matched with you!"

Validation

Check Result
New tests (tests/gateway/test_webhook_deliver_only.py) 14/14 pass
Full webhook suite (existing + new) 78/78 pass
E2E: real aiohttp server + real urllib POST 200 OK, agent NOT invoked, target adapter.send() called with rendered template
E2E: duplicate X-GitHub-Delivery ID status=duplicate, target called exactly once
E2E: CLI --deliver-only flag Writes deliver_only: true to subscriptions.json; rejects --deliver log combination
Startup validation deliver_only + deliver=log raises ValueError at connect()

Invariants preserved

  • HMAC signature validation (GitHub/GitLab/generic) — still enforced on every POST, verified in tests
  • Rate limiting per route — still applies, verified in tests
  • Idempotency cache via X-GitHub-Delivery / X-Request-ID — still applies, verified in tests
  • Body size limits, port conflict detection — unchanged
  • Agent-mode routes (the existing behaviour, deliver_only absent or false) — unchanged, backward compat verified in tests and CLI E2E

Credit

Problem identified by @H1an1 / Antenna in #12117. That PR proposed a separate 426-line HTTP server with its own bearer-token auth scheme bound to a different port (8643 alongside webhook's 8644). This PR delivers the same capability as a ~100-line extension to the existing webhook adapter, inheriting the stronger HMAC auth and configurable rate limits.

External services can now push plain-text notifications to a user's chat
via the webhook adapter without invoking the agent. Set deliver_only=true
on a route and the rendered prompt template becomes the literal message
body — dispatched directly to the configured target (Telegram, Discord,
Slack, GitHub PR comment, etc.).

Reuses all existing webhook infrastructure: HMAC-SHA256 signature
validation, per-route rate limiting, idempotency cache, body-size limits,
template rendering with dot-notation, home-channel fallback. No new HTTP
server, no new auth scheme, no new port.

Use cases: Supabase/Firebase webhooks → user notifications, monitoring
alert forwarding, inter-agent pings, background job completion alerts.

Changes:
- gateway/platforms/webhook.py: new _direct_deliver() helper + early
  dispatch branch in _handle_webhook when deliver_only=true. Startup
  validation rejects deliver_only with deliver=log.
- hermes_cli/main.py + hermes_cli/webhook.go: --deliver-only flag on
  subscribe; list/show output marks direct-delivery routes.
- website/docs/user-guide/messaging/webhooks.md: new Direct Delivery
  Mode section with config example, CLI example, response codes.
- skills/devops/webhook-subscriptions/SKILL.md: document --deliver-only
  with use cases (bumped to v1.1.0).
- tests/gateway/test_webhook_deliver_only.py: 14 new tests covering
  agent bypass, template rendering, status codes, HMAC still enforced,
  idempotency still applies, rate limit still applies, startup
  validation, and direct-deliver dispatch.

Validation: 78 webhook tests pass (64 existing + 14 new). E2E verified
with real aiohttp server + real urllib POST — agent not invoked, target
adapter.send() called with rendered template, duplicate delivery_id
suppressed.

Closes the gap identified in PR #12117 (thanks to @H1an1 / Antenna team)
without adding a second HTTP ingress server.
@teknium1 teknium1 merged commit 206a449 into main Apr 19, 2026
6 of 8 checks passed
@teknium1 teknium1 deleted the hermes/hermes-1a63c86e branch April 19, 2026 12:18
r266-tech added a commit to r266-tech/hermes-agent that referenced this pull request Apr 19, 2026
PR NousResearch#12473 (merged 2026-04-19) added a new --deliver-only flag to
`hermes webhook subscribe` for zero-LLM direct delivery, but
website/docs/reference/cli-commands.md options table did not
reference it. Add the row so CLI users can discover the flag from
the reference page instead of having to read the source.
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
…ousResearch#12473)

External services can now push plain-text notifications to a user's chat
via the webhook adapter without invoking the agent. Set deliver_only=true
on a route and the rendered prompt template becomes the literal message
body — dispatched directly to the configured target (Telegram, Discord,
Slack, GitHub PR comment, etc.).

Reuses all existing webhook infrastructure: HMAC-SHA256 signature
validation, per-route rate limiting, idempotency cache, body-size limits,
template rendering with dot-notation, home-channel fallback. No new HTTP
server, no new auth scheme, no new port.

Use cases: Supabase/Firebase webhooks → user notifications, monitoring
alert forwarding, inter-agent pings, background job completion alerts.

Changes:
- gateway/platforms/webhook.py: new _direct_deliver() helper + early
  dispatch branch in _handle_webhook when deliver_only=true. Startup
  validation rejects deliver_only with deliver=log.
- hermes_cli/main.py + hermes_cli/webhook.go: --deliver-only flag on
  subscribe; list/show output marks direct-delivery routes.
- website/docs/user-guide/messaging/webhooks.md: new Direct Delivery
  Mode section with config example, CLI example, response codes.
- skills/devops/webhook-subscriptions/SKILL.md: document --deliver-only
  with use cases (bumped to v1.1.0).
- tests/gateway/test_webhook_deliver_only.py: 14 new tests covering
  agent bypass, template rendering, status codes, HMAC still enforced,
  idempotency still applies, rate limit still applies, startup
  validation, and direct-deliver dispatch.

Validation: 78 webhook tests pass (64 existing + 14 new). E2E verified
with real aiohttp server + real urllib POST — agent not invoked, target
adapter.send() called with rendered template, duplicate delivery_id
suppressed.

Closes the gap identified in PR NousResearch#12117 (thanks to @H1an1 / Antenna team)
without adding a second HTTP ingress server.
aj-nt pushed a commit to aj-nt/hermes-agent that referenced this pull request May 1, 2026
…ousResearch#12473)

External services can now push plain-text notifications to a user's chat
via the webhook adapter without invoking the agent. Set deliver_only=true
on a route and the rendered prompt template becomes the literal message
body — dispatched directly to the configured target (Telegram, Discord,
Slack, GitHub PR comment, etc.).

Reuses all existing webhook infrastructure: HMAC-SHA256 signature
validation, per-route rate limiting, idempotency cache, body-size limits,
template rendering with dot-notation, home-channel fallback. No new HTTP
server, no new auth scheme, no new port.

Use cases: Supabase/Firebase webhooks → user notifications, monitoring
alert forwarding, inter-agent pings, background job completion alerts.

Changes:
- gateway/platforms/webhook.py: new _direct_deliver() helper + early
  dispatch branch in _handle_webhook when deliver_only=true. Startup
  validation rejects deliver_only with deliver=log.
- hermes_cli/main.py + hermes_cli/webhook.go: --deliver-only flag on
  subscribe; list/show output marks direct-delivery routes.
- website/docs/user-guide/messaging/webhooks.md: new Direct Delivery
  Mode section with config example, CLI example, response codes.
- skills/devops/webhook-subscriptions/SKILL.md: document --deliver-only
  with use cases (bumped to v1.1.0).
- tests/gateway/test_webhook_deliver_only.py: 14 new tests covering
  agent bypass, template rendering, status codes, HMAC still enforced,
  idempotency still applies, rate limit still applies, startup
  validation, and direct-deliver dispatch.

Validation: 78 webhook tests pass (64 existing + 14 new). E2E verified
with real aiohttp server + real urllib POST — agent not invoked, target
adapter.send() called with rendered template, duplicate delivery_id
suppressed.

Closes the gap identified in PR NousResearch#12117 (thanks to @H1an1 / Antenna team)
without adding a second HTTP ingress server.
Luminet2023 pushed a commit to Luminet2023/hermes-agent that referenced this pull request May 1, 2026
…ousResearch#12473)

External services can now push plain-text notifications to a user's chat
via the webhook adapter without invoking the agent. Set deliver_only=true
on a route and the rendered prompt template becomes the literal message
body — dispatched directly to the configured target (Telegram, Discord,
Slack, GitHub PR comment, etc.).

Reuses all existing webhook infrastructure: HMAC-SHA256 signature
validation, per-route rate limiting, idempotency cache, body-size limits,
template rendering with dot-notation, home-channel fallback. No new HTTP
server, no new auth scheme, no new port.

Use cases: Supabase/Firebase webhooks → user notifications, monitoring
alert forwarding, inter-agent pings, background job completion alerts.

Changes:
- gateway/platforms/webhook.py: new _direct_deliver() helper + early
  dispatch branch in _handle_webhook when deliver_only=true. Startup
  validation rejects deliver_only with deliver=log.
- hermes_cli/main.py + hermes_cli/webhook.go: --deliver-only flag on
  subscribe; list/show output marks direct-delivery routes.
- website/docs/user-guide/messaging/webhooks.md: new Direct Delivery
  Mode section with config example, CLI example, response codes.
- skills/devops/webhook-subscriptions/SKILL.md: document --deliver-only
  with use cases (bumped to v1.1.0).
- tests/gateway/test_webhook_deliver_only.py: 14 new tests covering
  agent bypass, template rendering, status codes, HMAC still enforced,
  idempotency still applies, rate limit still applies, startup
  validation, and direct-deliver dispatch.

Validation: 78 webhook tests pass (64 existing + 14 new). E2E verified
with real aiohttp server + real urllib POST — agent not invoked, target
adapter.send() called with rendered template, duplicate delivery_id
suppressed.

Closes the gap identified in PR NousResearch#12117 (thanks to @H1an1 / Antenna team)
without adding a second HTTP ingress server.
teknium1 pushed a commit that referenced this pull request May 5, 2026
PR #12473 (merged 2026-04-19) added a new --deliver-only flag to
`hermes webhook subscribe` for zero-LLM direct delivery, but
website/docs/reference/cli-commands.md options table did not
reference it. Add the row so CLI users can discover the flag from
the reference page instead of having to read the source.
Wizarck added a commit to Wizarck/hermes-agent that referenced this pull request May 5, 2026
…ba-mcp payload routing) (#1)

* fix(setup): stop hardcoding max-iterations copy

* fix(agent): silence quiet_mode in python library use

* fix: tighten quiet-mode salvage follow-ups

Follow-up for the helix4u easy-fix salvage batch:
- route remaining context-engine quiet-mode output through
  _should_emit_quiet_tool_messages() so non-CLI/library callers stay
  silent consistently
- drop the extra senderAliases computation from WhatsApp allowlist-drop
  logging and remove the now-unused import

This keeps the batch scoped to the intended fixes while avoiding
leaked quiet-mode output and unnecessary duplicate work in the bridge.

* fix(cli): sanitize interactive command output

* feat(cron): add wakeAgent gate — scripts can skip the agent entirely

Extends the existing cron script hook with a wake gate ported from
nanoclaw #1232. When a cron job's pre-check Python script (already
sandboxed to HERMES_HOME/scripts/) writes a JSON line like
```json
{"wakeAgent": false}
```
on its last stdout line, `run_job()` returns the SILENT marker and
skips the agent entirely — no LLM call, no delivery, no tokens spent.
Useful for frequent polls (every 1-5 min) that only need to wake the
agent when something has genuinely changed.

Any other script output (non-JSON, missing key, non-dict, `wakeAgent: true`,
truthy/falsy non-False values) behaves as before: stdout is injected
as context and the agent runs normally. Strict `False` is required
to skip — avoids accidental gating from arbitrary JSON.

Refactor:
- New pure helper `_parse_wake_gate(script_output)` in cron/scheduler.py
- `_build_job_prompt` accepts optional `prerun_script` tuple so the
  script runs exactly once per job (run_job runs it for the gate check,
  reuses the output for prompt injection)
- `run_job` short-circuits with SILENT_MARKER when gate fires

Script failures (success=False) still cannot trigger the gate — the
failure is reported as context to the agent as before.

This replaces the approach in closed PR #3837, which inlined bash
scripts via tempfile and lost the path-traversal/scripts-dir sandbox
that main's impl has. The wake-gate idea (the one net-new capability)
is ported on top of the existing sandboxed Python-script model.

Tests:
- 11 pure unit tests for _parse_wake_gate (empty, whitespace, non-JSON,
  non-dict JSON, missing key, truthy/falsy non-False, multi-line,
  trailing blanks, non-last-line JSON)
- 5 integration tests for run_job wake-gate (skip returns SILENT,
  wake-true passes through, script-runs-only-once, script failure
  doesn't gate, no-script regression)
- Full tests/cron/ suite: 194/194 pass

* fix(gateway): flush undelivered tail before segment reset to preserve streamed text (#8124)

When a streaming edit fails mid-stream (flood control, transport error)
and a tool boundary arrives before the fallback threshold is reached,
the pre-boundary tail in `_accumulated` was silently discarded by
`_reset_segment_state`. The user saw a frozen partial message and
missing words on the other side of the tool call.

Flush the undelivered tail as a continuation message before the reset,
computed relative to the last successfully-delivered prefix so we don't
duplicate content the user already saw.

* fix(gateway): cancel_background_tasks must drain late-arrivals (#12471)

During gateway shutdown, a message arriving while
cancel_background_tasks is mid-await (inside asyncio.gather) spawns
a fresh _process_message_background task via handle_message and adds
it to self._background_tasks.  The original implementation's
_background_tasks.clear() at the end of cancel_background_tasks
dropped the reference; the task ran untracked against a disconnecting
adapter, logged send-failures, and lingered until it completed on
its own.

Fix: wrap the cancel+gather in a bounded loop (MAX_DRAIN_ROUNDS=5).
If new tasks appeared during the gather, cancel them in the next
round.  The .clear() at the end is preserved as a safety net for
any task that appeared after MAX_DRAIN_ROUNDS — but in practice the
drain stabilizes in 1-2 rounds.

Tests: tests/gateway/test_cancel_background_drain.py — 3 cases.
- test_cancel_background_tasks_drains_late_arrivals: spawn M1, start
  cancel, inject M2 during M1's shielded cleanup, verify M2 is
  cancelled.
- test_cancel_background_tasks_handles_no_tasks: no-op path still
  terminates cleanly.
- test_cancel_background_tasks_bounded_rounds: baseline — single
  task cancels in one round, loop terminates.

Regression-guard validated: against the unpatched implementation,
the late-arrival test fails with exactly the expected message
('task leaked').  With the fix it passes.

Blast radius is shutdown-only; the audit classified this as MED.
Shipping because the fix is small and the hygiene is worth it.

While investigating the audit's other MEDs (busy-handler double-ack,
Discord ExecApprovalView double-resolve, UpdatePromptView
double-resolve), I verified all three were false positives — the
check-and-set patterns have no await between them, so they're
atomic on single-threaded asyncio.  No fix needed for those.

* fix(gateway): strip cursor from frozen message on empty fallback continuation (#7183)

When _send_fallback_final() is called with nothing new to deliver
(the visible partial already matches final_text), the last edit may
still show the cursor character because fallback mode was entered
after a failed edit.  Before this fix the early-return path left
_already_sent = True without attempting to strip the cursor, so the
message stayed frozen with a visible ▉ permanently.

Adds a best-effort edit inside the empty-continuation branch to clean
the cursor off the last-sent text.  Harmless when fallback mode
wasn't actually armed or when the cursor isn't present.  If the strip
edit itself fails (flood still active), we return without crashing
and without corrupting _last_sent_text.

Adapted from PR #7429 onto current main — the surrounding fallback
block grew the #10807 stale-prefix handling since #7429 was written,
so the cursor strip lives in the new else-branch where we still
return early.

3 unit tests covering: cursor stripped on empty continuation, no edit
attempted when cursor is not configured, cursor-strip edit failure
handled without crash.

Originally proposed as PR #7429.

* fix(telegram): warn on docker-only media paths

* fix: tighten telegram docker-media salvage follow-ups

Follow-up on top of the helix4u #6392 cherry-pick:
- reuse one helper for actionable Docker-local file-not-found errors
  across document/image/video/audio local-media send paths
- include /outputs/... alongside /output/... in the container-local
  path hint
- soften the gateway startup warning so it does not imply custom
  host-visible mounts are broken; the warning now targets the specific
  risky pattern of emitting container-local MEDIA paths without an
  explicit export mount
- add focused regressions for /outputs/... and non-document media hint
  coverage

This keeps the salvage aligned with the actual MEDIA delivery problem on
current main while reducing false-positive operator messaging.

* docs: clarify profiles vs workspaces

* fix(gateway): stop typing loops on session interrupt

* fix(gateway): keep typing loop overrides backward-compatible

* fix: tighten gateway interrupt salvage follow-ups

Follow-up on top of the helix4u #12388 cherry-picks:
- make deferred post-delivery callbacks generation-aware end-to-end so
  stale runs cannot clear callbacks registered by a fresher run for the
  same session
- bind callback ownership to the active session event at run start and
  snapshot that generation inside base adapter processing so later event
  mutation cannot retarget cleanup
- pass run_generation through proxy mode and drop stale proxy streams /
  final results the same way local runs are dropped
- centralize stop/new interrupt cleanup into one helper and replace the
  open-coded branches with shared logic
- unify internal control interrupt reason strings via shared constants
- remove the return from base.py's finally block so cleanup no longer
  swallows cancellation/exception flow
- add focused regressions for generation forwarding, proxy stale
  suppression, and newer-callback preservation

This addresses all review findings from the initial #12388 review while
keeping the fix scoped to stale-output/typing-loop interrupt handling.

* chore: add sgaofen to AUTHOR_MAP

* fix(feishu): split fenced code blocks in post payload

* fix(feishu): harden fenced post row splitting

* fix(feishu): drop dead helper and cover repeated fenced blocks

* skills: move 7 niche mlops/mcp skills to optional (#12474)

Built-in → optional-skills/:
  mlops/training/peft         → optional-skills/mlops/peft
  mlops/training/pytorch-fsdp → optional-skills/mlops/pytorch-fsdp
  mlops/models/clip           → optional-skills/mlops/clip
  mlops/models/stable-diffusion → optional-skills/mlops/stable-diffusion
  mlops/models/whisper        → optional-skills/mlops/whisper
  mlops/cloud/modal           → optional-skills/mlops/modal
  mcp/mcporter                → optional-skills/mcp/mcporter

Built-in mlops training kept: axolotl, trl-fine-tuning, unsloth.
Built-in mlops models kept: audiocraft, segment-anything.
Built-in mlops evaluation/research/huggingface-hub/inference all kept.
native-mcp stays built-in (documents the native MCP tool); mcporter was a
redundant alternative CLI.

Also: removed now-empty skills/mlops/cloud/ dir, refreshed
skills/mlops/models/DESCRIPTION.md and skills/mcp/DESCRIPTION.md to match
what's left, and synchronized both catalog pages (skills-catalog.md,
optional-skills-catalog.md).

* feat(webhook): direct delivery mode for zero-LLM push notifications (#12473)

External services can now push plain-text notifications to a user's chat
via the webhook adapter without invoking the agent. Set deliver_only=true
on a route and the rendered prompt template becomes the literal message
body — dispatched directly to the configured target (Telegram, Discord,
Slack, GitHub PR comment, etc.).

Reuses all existing webhook infrastructure: HMAC-SHA256 signature
validation, per-route rate limiting, idempotency cache, body-size limits,
template rendering with dot-notation, home-channel fallback. No new HTTP
server, no new auth scheme, no new port.

Use cases: Supabase/Firebase webhooks → user notifications, monitoring
alert forwarding, inter-agent pings, background job completion alerts.

Changes:
- gateway/platforms/webhook.py: new _direct_deliver() helper + early
  dispatch branch in _handle_webhook when deliver_only=true. Startup
  validation rejects deliver_only with deliver=log.
- hermes_cli/main.py + hermes_cli/webhook.go: --deliver-only flag on
  subscribe; list/show output marks direct-delivery routes.
- website/docs/user-guide/messaging/webhooks.md: new Direct Delivery
  Mode section with config example, CLI example, response codes.
- skills/devops/webhook-subscriptions/SKILL.md: document --deliver-only
  with use cases (bumped to v1.1.0).
- tests/gateway/test_webhook_deliver_only.py: 14 new tests covering
  agent bypass, template rendering, status codes, HMAC still enforced,
  idempotency still applies, rate limit still applies, startup
  validation, and direct-deliver dispatch.

Validation: 78 webhook tests pass (64 existing + 14 new). E2E verified
with real aiohttp server + real urllib POST — agent not invoked, target
adapter.send() called with rendered template, duplicate delivery_id
suppressed.

Closes the gap identified in PR #12117 (thanks to @H1an1 / Antenna team)
without adding a second HTTP ingress server.

* feat: add maps skill (OpenStreetMap + Overpass + OSRM, no API key)

Adds a maps optional skill with 8 commands, 44 POI categories, and
zero external dependencies. Uses free open data: Nominatim, Overpass
API, OSRM, and TimeAPI.io.

Commands: search, reverse, nearby, distance, directions, timezone,
area, bbox.

Improvements over original PR #2015:
- Fixed directory structure (optional-skills/productivity/maps/)
- Fixed distance argparse (--to flag instead of broken dual nargs=+)
- Fixed timezone (TimeAPI.io instead of broken worldtimeapi heuristic)
- Expanded POI categories from 12 to 44
- Added directions command with turn-by-turn OSRM steps
- Added area command (bounding box + dimensions for a named place)
- Added bbox command (POI search within a geographic rectangle)
- Added 23 unit tests
- Improved haversine (atan2 for numerical stability)
- Comprehensive SKILL.md with workflow examples

Co-authored-by: Mibayy <Mibayy@users.noreply.github.com>

* chore: remove unit tests from maps skill

Skills are self-contained scripts — they don't need test suites in
the repo.

* feat(skills): consolidate find-nearby into maps as a single location skill

find-nearby and the (new) maps optional skill both used OpenStreetMap's
Overpass + Nominatim to answer the same question — 'what's near this
location?' — so shipping both would be duplicate code for overlapping
capability. Consolidate into one active-by-default skill at
skills/productivity/maps/ that is a strict superset of find-nearby.

Moves + deletions:
- optional-skills/productivity/maps/ → skills/productivity/maps/ (active,
  no install step needed)
- skills/leisure/find-nearby/ → DELETED (fully superseded)

Upgrades to maps_client.py so it covers everything find-nearby did:
- Overpass server failover — tries overpass-api.de then
  overpass.kumi.systems so a single-mirror outage doesn't break the skill
  (new overpass_query helper, used by both nearby and bbox)
- nearby now accepts --near "<address>" as a shortcut that auto-geocodes,
  so one command replaces the old 'search → copy coords → nearby' chain
- nearby now accepts --category (repeatable) for multi-type queries in
  one call (e.g. --category restaurant --category bar), results merged
  and deduped by (osm_type, osm_id), sorted by distance, capped at --limit
- Each nearby result now includes maps_url (clickable Google Maps search
  link) and directions_url (Google Maps directions from the search point
  — only when a ref point is known)
- Promoted commonly-useful OSM tags to top-level fields on each result:
  cuisine, hours (opening_hours), phone, website — instead of forcing
  callers to dig into the raw tags dict

SKILL.md:
- Version bumped 1.1.0 → 1.2.0, description rewritten to lead with
  capability surface
- New 'Working With Telegram Location Pins' section replacing
  find-nearby's equivalent workflow
- metadata.hermes.supersedes: [find-nearby] so tooling can flag any
  lingering references to the old skill

External references updated:
- optional-skills/productivity/telephony/SKILL.md — related_skills
  find-nearby → maps
- website/docs/reference/skills-catalog.md — removed the (now-empty)
  'leisure' section, added 'maps' row under productivity
- website/docs/user-guide/features/cron.md — find-nearby example
  usages swapped to maps
- tests/tools/test_cronjob_tools.py, tests/hermes_cli/test_cron.py,
  tests/cron/test_scheduler.py — fixture string values swapped
- cli.py:5290 — /cron help-hint example swapped

Not touched:
- RELEASE_v0.2.0.md — historical record, left intact

E2E-verified live (Nominatim + Overpass, one query each):
- nearby --near "Times Square" --category restaurant --category bar → 3 results,
  sorted by distance, all with maps_url, directions_url, cuisine, phone, website
  where OSM had the tags

All 111 targeted tests pass across tests/cron/, tests/tools/, tests/hermes_cli/.

* chore(attribution): add AUTHOR_MAP entry for Mibayy

Adds the Mibayy noreply email to the AUTHOR_MAP so CI attribution checks
pass for the #3884 maps skill feat commit (7fa01faf).

* fix(tui): reject /model and agent-mutating slash passthroughs while running (#12548)

agent.switch_model() mutates self.model, self.provider, self.base_url,
self.api_key, self.api_mode, and rebuilds self.client / self._anthropic_client
in place.  The worker thread running agent.run_conversation reads those
fields on every iteration.  A concurrent config.set key=model or slash-
worker-mirrored /model / /personality / /prompt / /compress can send an
HTTP request with mismatched model + base_url (or the old client keeps
running against a new endpoint) — 400/404s the user never asked for.

Fix: same pattern as the session.undo / session.compress guards
(PR #12416) and the gateway runner's running-agent /model guard (PR
#12334).  Reject with 4009 'session busy' when session.running is True.

Two call sites guarded:
- config.set with key=model: primary /model entry point from Ink
- _mirror_slash_side_effects for model / personality / prompt /
  compress: slash-worker passthrough path that applies live-agent
  side effects

Idle sessions still switch models normally — regression guard test
verifies this.

Tests (tests/test_tui_gateway_server.py): 4 new cases.
- test_config_set_model_rejects_while_running
- test_config_set_model_allowed_when_idle (regression guard)
- test_mirror_slash_side_effects_rejects_mutating_commands_while_running
- test_mirror_slash_side_effects_allowed_when_idle (regression guard)

Validated: against unpatched server.py, the two 'rejects_while_running'
tests fail with the exact race they assert against.  With the fix all
4 pass.  Live E2E against the live Python environment confirmed both
guards enforce 4009 / 'session busy' exactly as designed.

* docs: add PR review guides, rework quickstart, slim down installation

Adds two complementary GitHub PR review guides from contest submissions:
- Cron-based PR review agent (from PR #5836 by @dieutx) — polls on a
  schedule, no server needed, teaches skills + memory authoring
- Webhook-based PR review (from PR #6503 by @gaijinkush) — real-time via
  GitHub webhooks, documents previously undocumented webhook feature
Both guides are cross-linked so users can pick the approach that fits.

Reworks quickstart.md by integrating the best content from PR #5744
by @aidil2105:
- Opinionated decision table ('The fastest path')
- Common failure modes table with causes and fixes
- Recovery toolkit sequence
- Session lifecycle verification step
- Better first-chat guidance with example prompts

Slims down installation.md:
- Removes 10-step manual/dev install section (already covered in
  developer-guide/contributing.md)
- Links to Contributing guide for dev setup
- Keeps focused on the automated installer + prerequisites + troubleshooting

* fix(tui): session.create build thread must clean up if session.close races (#12555)

When a user hits /new or /resume before the previous session finishes
initializing, session.close runs while the previous session.create's
_build thread is still constructing the agent.  session.close pops
_sessions[sid] and closes whatever slash_worker it finds (None at that
point — _build hasn't installed it yet), then returns.  _build keeps
running in the background, installs the slash_worker subprocess and
registers an approval-notify callback on a session dict that's now
unreachable via _sessions.  The subprocess leaks until process exit;
the notify callback lingers in the global registry.

Fix: _build now tracks what it allocates (worker, notify_registered)
and checks in its finally block whether _sessions[sid] still points
to the session it's building for.  If not, the build was orphaned by
a racing close, so clean up the subprocess and unregister the notify
ourselves.

tui_gateway/server.py:
- _build reads _sessions.get(sid) safely (returns early if already gone)
- tracks allocated worker + notify registration
- finally checks orphan status and cleans up

Tests (tests/test_tui_gateway_server.py): 2 new cases.
- test_session_create_close_race_does_not_orphan_worker: slow
  _make_agent, close mid-build, verify worker.close() and
  unregister_gateway_notify both fire from the build thread's
  cleanup path.
- test_session_create_no_race_keeps_worker_alive: regression guard —
  happy path does NOT over-eagerly clean up a live worker.

Validated: against the unpatched code, the race test fails with
'orphan worker was not cleaned up — closed_workers=[]'.  Live E2E
against the live Python environment confirmed the cleanup fires
exactly when the race happens.

* fix(discord): close two low-severity adapter races (#12558)

Two small races in gateway/platforms/discord.py, bundled together
since they're adjacent in the adapter and both narrow in impact.

1. on_message vs _resolve_allowed_usernames (startup window)
   DISCORD_ALLOWED_USERS accepts both numeric IDs and raw usernames.
   At connect-time, _resolve_allowed_usernames walks the bot's guilds
   (fetch_members can take multiple seconds) to swap usernames for IDs.
   on_message can fire during that window; _is_allowed_user compares
   the numeric author.id against a set that may still contain raw
   usernames — legitimate users get silently rejected for a few
   seconds after every reconnect.

   Fix: on_message awaits _ready_event (with a 30s timeout) when it
   isn't already set.  on_ready sets the event after the resolve
   completes.  In steady state this is a no-op (event already set);
   only the startup / reconnect window ever blocks.

2. join_voice_channel check-and-connect
   The existing-connection check at _voice_clients.get() and the
   channel.connect() call straddled an await boundary with no lock.
   Two concurrent /voice channel invocations could both see None and
   both call connect(); discord.py raises ClientException
   ("Already connected") on the loser.  Same race class for leave
   running concurrently with _voice_timeout_handler.

   Fix: per-guild asyncio.Lock (_voice_locks dict with lazy alloc via
   _voice_lock_for).  join_voice_channel and leave_voice_channel both
   run their body under the lock.  Sequential within a guild, still
   fully concurrent across guilds.

Both: LOW severity.  The first only affects username-based allowlists
on fast-follow-up messages at startup; the second is a narrow
exception on simultaneous voice commands.  Bundled so the adapter
gets a single coherent polish pass.

Tests (tests/gateway/test_discord_race_polish.py): 2 regression cases.
- test_concurrent_joins_do_not_double_connect: two concurrent
  join_voice_channel calls on the same guild result in exactly one
  channel.connect() invocation.
- test_on_message_blocks_until_ready_event_set: asserts the expected
  wait pattern is present in on_message (source inspection, since
  full discord.py client setup isn't practical here).

Regression-guard validated: against unpatched gateway/platforms/discord.py
both tests fail.  With the fix they pass.  Full Discord suite (118
tests) green.

* fix(tui-gateway): dispatch slow RPC handlers on a thread pool (#12546)

The stdin-read loop in entry.py calls handle_request() inline, so the
five handlers that can block for seconds to minutes
(slash.exec, cli.exec, shell.exec, session.resume, session.branch)
freeze the dispatcher. While one is running, any inbound RPC —
notably approval.respond and session.interrupt — sits unread in the
pipe buffer and lands only after the slow handler returns.

Route only those five onto a small ThreadPoolExecutor; every other
handler stays on the main thread so the fast-path ordering is
unchanged and the audit surface stays small. write_json is already
_stdout_lock-guarded, so concurrent response writes are safe. Pool
size defaults to 4 (overridable via HERMES_TUI_RPC_POOL_WORKERS).

- add _LONG_HANDLERS set + ThreadPoolExecutor + atexit shutdown
- new dispatch(req) function: pool for long handlers, inline for rest
- _run_and_emit wraps pool work in a try/except so a misbehaving
  handler still surfaces as a JSON-RPC error instead of silently
  dying in a worker
- entry.py swaps handle_request → dispatch
- 5 new tests: sync path still inline, long handlers emit via stdout,
  fast handler not blocked behind slow one, handler exceptions map to
  error responses, non-long methods always take the sync path

Manual repro confirms the fix: shell.exec(sleep 3) + terminal.resize
sent back-to-back now returns the resize response at t=0s while the
sleep finishes independently at t=3s. Before, both landed together
at t=3s.

Fixes #12546.

* chore(tui-gateway): inline one-off RPC_POOL_WORKERS, compact _LONG_HANDLERS

* chore(tui): /clean pass — inline one-off locals, tighten ConfirmPrompt

- providers.ts: drop the `dup` intermediate, fold the ternary inline
- paths.ts (fmtCwdBranch): inline `b` into the `tag` template
- prompts.tsx (ConfirmPrompt): hoist a single `lower = ch.toLowerCase()`,
  collapse the three early-return branches into two, drop the
  redundant bounds checks on arrow-key handlers (setSel is idempotent
  at 0/1), inline the `confirmLabel`/`cancelLabel` defaults at the
  use site
- modelPicker.tsx / config/env.ts / providers.test.ts: auto-formatter
  reflows picked up by `npm run fix`
- useInputHandlers.ts: drop the stray blank line that was tripping
  perfectionist/sort-imports (pre-existing lint error)

* chore(tui-gateway): inline _run_and_emit — one-off wrapper, belongs inside dispatch

* fix(tui): drain message queue on every busy → false transition

Previously the queue only drained inside the message.complete event
handler, so anything enqueued while a shell.exec (!sleep, !cmd) or a
failed agent turn was running would stay stuck forever — neither of
those paths emits message.complete. After Ctrl+C an interrupted
session would also orphan the queue because idle() flips busy=false
locally without going through message.complete.

Single source of truth: a useEffect that watches ui.busy. When the
session is settled (sid present, busy false, not editing a queue
item), pull one message and send it. Covers agent turn end,
interrupt, shell.exec completion, error recovery, and the original
startup hydration (first-sid case) all at once.

Dropped the now-redundant dequeue/sendQueued from
createGatewayEventHandler.message.complete and the accompanying
GatewayEventHandlerContext.composer field — the effect handles it.

* fix: add nous-research/ui package

* fix(compression): resolve missing config attribute in feasibility check

Commit 4a9c3565 added a reference to `self.config` in
`_check_compression_model_feasibility()` to pass the user-configured
`auxiliary.compression.context_length` to `get_model_context_length()`.
However, `AIAgent` never stores the loaded config dict as an instance
attribute — the config is loaded into a local variable `_agent_cfg` in
`__init__()` and discarded after init.

This causes an `AttributeError: 'AIAgent' object has no attribute
'config'` on every session start when compression is enabled, caught by
the try/except and logged as a non-fatal DEBUG message.

Fix: store the loaded config as `self._config` in `__init__()` and
update the reference in the feasibility check to use `self._config`.

* test(compression): cover real init feasibility override

* feat(compression): summaries now respect the conversation's language

Context compaction summaries were always produced in English regardless
of the conversation language, which injected English context into
non-English conversations and muddied the continuation experience.

Adds a one-sentence instruction to the shared `_summarizer_preamble`
used by both the initial-compaction and iterative-update prompt paths.
Placing it in the preamble (rather than adding it separately to each
prompt) means both code paths stay in sync with one edit.

Ported from anomalyco/opencode#20581. The original PR (#4670) landed
before main's prompt templates were refactored to share the
`_summarizer_preamble` and `_template_sections` blocks, so the
cherry-pick conflicted on the now-obsolete inline sections; re-applied
the essential one-line change on top of the current structure.

Verified: 48/48 existing compressor tests pass.

* fix(model_switch): enumerate dict-format models in /model picker

list_authenticated_providers() builds /model picker rows for CLI, TUI and
gateway flows, but fails to enumerate custom provider models stored in
dict form:

- custom_providers[] entries surface only the singular `model:` field,
  hiding every other model in the `models:` dict.
- providers: dict entries with dict-format `models:` are silently dropped
  and render as `(0 models)`.

Hermes's own writer (main.py::_save_custom_provider) persists configured
models as a dict keyed by model id, and most downstream readers
(agent/models_dev.py, gateway/run.py, run_agent.py, hermes_cli/config.py)
already consume that dict format. The /model picker was the only stale
path.

Add a dict branch in both sections of list_authenticated_providers(),
preferring dict (canonical) and keeping the list branch as fallback for
hand-edited / legacy configs. Dedup against the already-added default
model so nothing duplicates when the default is also a dict key.

Six new regression tests in tests/hermes_cli/ cover: dict models with a
default, dict models without a default, and default dedup against a
matching dict key.

Fixes #11677
Fixes #9148
Related: #11017

* fix(model_switch): section 3 base_url/model/dedup follow-up

On top of the salvaged PR #12505 (Jason/farion1231, which adds dict-format
models: enumeration to both sections), three section-3 refinements from
competing PR #11534 (YangManBOBO):

- accept base_url as canonical (matches Hermes's writer and custom_providers
  entries); keep api/url as fallbacks for legacy/hand-edited configs
- accept singular model as a default_model synonym, matching custom_providers
- add seen_slugs guard so the same provider slug appearing in both
  providers: dict and custom_providers: list emits exactly one picker row
  (providers: dict wins since section 3 runs first)

Two regression tests cover the new behavior. AUTHOR_MAP entry added for
farion1231 so CI doesn't reject the cherry-picked commit.

* refactor(discord): slim down the race-polish fix (#12644)

PR #12558 was heavy for what the fix actually is — essay-length
comments, a dedicated helper method where a setdefault would do, and
a source-inspection test with no real behavior coverage.  The
genuine code change is ~5 lines of new logic (1 field, 2 async with,
an on_ready wait block).

Trimmed:
- Replaced the 12-line _voice_lock_for helper with a setdefault
  one-liner at each call site (join_voice_channel, leave_voice_channel).
- Collapsed the 12-line comment on on_message's _ready_event wait to
  3 lines.  Dropped the warning log on timeout — pass-on-timeout is
  fine; if on_ready hangs that long, the bot is already broken and
  the log wouldn't help.
- Dropped the source-inspection test (greps the module source for
  expected substrings).  It was low-value scaffolding; the
  voice-serialization test covers actual behavior.

Net: -73 lines vs PR #12558.  Same two guarantees preserved, same
test passes (verified by stashing the fix and confirming failure).

* fix(agent): refresh skills prompt cache when disabled skills change

* feat(providers): add per-provider and per-model request_timeout_seconds config

Adds optional providers.<id>.request_timeout_seconds and
providers.<id>.models.<model>.timeout_seconds config, resolved via a new
hermes_cli/timeouts.py helper and applied where client_kwargs is built
in run_agent.py. Zero default behavior change: when both keys are unset,
the openai SDK default takes over.

Mirrors the existing _get_task_timeout pattern in agent/auxiliary_client.py
for auxiliary tasks - the primary turn path just never got the equivalent
knob.

Cross-project demand: openclaw/openclaw#43946 (17 reactions) asks for
exactly this config - specifically calls out Ollama cold-start hanging
the client.

* feat(providers): extend request_timeout_seconds to all client paths

Follow-up on top of mvanhorn's cherry-picked commit. Original PR only
wired request_timeout_seconds into the explicit-creds OpenAI branch at
run_agent.py init; router-based implicit auth, native Anthropic, and the
fallback chain were still hardcoded to SDK defaults.

- agent/anthropic_adapter.py: build_anthropic_client() accepts an optional
  timeout kwarg (default 900s preserved when unset/invalid).
- run_agent.py: resolve per-provider/per-model timeout once at init; apply
  to Anthropic native init + post-refresh rebuild + stale/interrupt
  rebuilds + switch_model + _restore_primary_runtime + the OpenAI
  implicit-auth path + _try_activate_fallback (with immediate client
  rebuild so the first fallback request carries the configured timeout).
- tests: cover anthropic adapter kwarg honoring; widen mock signatures
  to accept the new timeout kwarg.
- docs/example: clarify that the knob now applies to every transport,
  the fallback chain, and rebuilds after credential rotation.

* feat(providers): enforce request_timeout_seconds on OpenAI-wire primary calls

Live test with timeout_seconds: 0.5 on claude-sonnet-4.6 proved the
initial wiring was insufficient: run_agent.py was overriding the
client-level timeout on every call via hardcoded per-request kwargs.

Root cause: run_agent.py had two sites that pass an explicit timeout=
kwarg into chat.completions.create() — api_kwargs['timeout'] at line
7075 (HERMES_API_TIMEOUT=1800s default) and the streaming path's
_httpx.Timeout(..., read=HERMES_STREAM_READ_TIMEOUT=120s, ...) at line
5760. Both override the per-provider config value the client was
constructed with, so a 0.5s config timeout would silently not enforce.

This commit:
- Adds AIAgent._resolved_api_call_timeout() — config > HERMES_API_TIMEOUT env > 1800s default.
- Uses it for the non-streaming api_kwargs['timeout'] field.
- Uses it for the streaming path's httpx.Timeout(connect, read, write, pool)
  so both connect and read respect the configured value when set.
  Local-provider auto-bump (Ollama/vLLM cold-start) only applies when
  no explicit config value is set.
- New test: test_resolved_api_call_timeout_priority covers all three
  precedence cases (config, env, default).

Live verified: 0.5s config on claude-sonnet-4.6 now triggers
APITimeoutError at ~3s per retry, exhausts 3 retries in ~15s total
(was: 29-47s success with timeout ignored). Positive case (60s config
+ gpt-4o-mini) still succeeds at 1.3s.

* docs(providers): call out Bedrock as not covered by request_timeout_seconds

AWS Bedrock paths (bedrock_converse + AnthropicBedrock SDK) use boto3
with its own timeout config and are not wired to the per-provider knob.
Documented in cli-config.yaml.example and website configuration.md so
users don't expect it to take effect there.

* fix(environments): prevent terminal hang when commands background children (#8340)

When a user's command backgrounds a child (`cmd &`, `setsid cmd & disown`,
etc.), the backgrounded grandchild inherits the write-end of our stdout
pipe via fork(). The old `for line in proc.stdout` drain never EOF'd
until the grandchild closed the pipe — so for a uvicorn server, the
terminal tool hung indefinitely (users reported the whole session
deadlocking when asking the agent to restart a backend).

Fix: switch _drain() to select()-based non-blocking reads and stop
draining shortly after bash exits even if the pipe hasn't EOF'd. Any
output the grandchild writes after that point goes to an orphaned pipe,
which is exactly what the user asked for when they said '&'.

Adds regression tests covering the issue's exact repro and 5 related
patterns (plain bg, setsid+disown, streaming output, high volume,
timeout, UTF-8).

* fix(environments): use incremental UTF-8 decoder in select-based drain

The first draft of the fix called `chunk.decode("utf-8")` directly on
each 4096-byte `os.read()` result, which corrupts output whenever a
multi-byte UTF-8 character straddles a read boundary:

  * `UnicodeDecodeError` fires on the valid-but-truncated byte sequence.
  * The except handler clears ALL previously-decoded output and replaces
    the whole buffer with `[binary output detected ...]`.

Empirically: 10000 '日' chars (30001 bytes) through the wrapper loses
all 10000 characters on the first draft; the baseline TextIOWrapper
drain (which uses `encoding='utf-8', errors='replace'` on Popen)
preserves them all. This regression affects any command emitting
non-ASCII output larger than one chunk — CJK/Arabic/emoji in
`npm install`, `pip install`, `docker logs`, `kubectl logs`, etc.

Fix: swap to `codecs.getincrementaldecoder('utf-8')(errors='replace')`,
which buffers partial multi-byte sequences across chunks and substitutes
U+FFFD for genuinely invalid bytes. Flush on drain exit via
`decoder.decode(b'', final=True)` to emit any trailing replacement
character for a dangling partial sequence.

Adds two regression tests:
  * test_utf8_multibyte_across_read_boundary — 10000 U+65E5 chars,
    verifies count round-trips and no fallback fires.
  * test_invalid_utf8_uses_replacement_not_fallback — deliberate
    \xff\xfe between valid ASCII, verifies surrounding text survives.

* fix(feishu): allow bot-originated mentions from other bots

* fix(feishu): hydrate bot open_id for manual-setup users

Extends _hydrate_bot_identity() to also populate _bot_open_id (not just
_bot_name) by probing /open-apis/bot/v3/info — the same endpoint the
scan-to-create wizard uses. No extra scopes required beyond the tenant
access token.

Closes the manual-setup gap in #12450: users who configured Feishu
without running the wizard, and never set FEISHU_BOT_OPEN_ID, now get
a bot identity that _is_self_sent_bot_message() can actually use to
filter the adapter's own bot-sent events.

Each field is hydrated independently:
  - Env vars (FEISHU_BOT_OPEN_ID / FEISHU_BOT_USER_ID / FEISHU_BOT_NAME)
    still take precedence and skip their respective probe.
  - /bot/v3/info provides open_id + name.
  - Application-info endpoint remains as a best-effort fallback for
    bot_name only (needs admin:app.info:readonly scope).

Tests: 5 new cases covering env-var precedence, probe success, probe
failure fallback, and the end-to-end self-send filter gate after
hydration.

* chore: add bingo906 numeric qq email to AUTHOR_MAP

Maps 906014227@qq.com → bingo906 for PR #12450 attribution in the
weekly release notes.

* fix(agent): respect HTTP_PROXY/HTTPS_PROXY when using custom httpx transport

When creating httpx.Client with a custom transport for TCP keepalive,
proxy environment variables (HTTP_PROXY, HTTPS_PROXY) were ignored because
httpx only auto-reads them when transport=None.

Add _get_proxy_from_env() to explicitly read proxy settings and pass them
to httpx.Client, ensuring providers like kimi-coding-cn work correctly
when behind a proxy.

Fixes connection errors when HTTP_PROXY/HTTPS_PROXY are set.

* test(run_agent): pin proxy-env forwarding through keepalive transport

Adds a regression guard for the #11277 → proxy-bypass regression fixed in
42b394c3. With HTTPS_PROXY / HTTP_PROXY / ALL_PROXY set, the custom httpx
transport used for TCP keepalives must still route requests through an
HTTPProxy pool; without proxy env, no HTTPProxy mount should exist.

Also maps zrc <zhurongcheng@rcrai.com> → heykb in scripts/release.py
AUTHOR_MAP so the salvage PR passes the author-attribution CI check.

* feat: add Discord server introspection and management tool (#4753)

* feat: add Discord server introspection and management tool

Add a discord_server tool that gives the agent the ability to interact
with Discord servers when running on the Discord gateway. Uses Discord
REST API directly with the bot token — no dependency on the gateway
adapter's discord.py client.

The tool is only included in the hermes-discord toolset (zero cost for
users on other platforms) and gated on DISCORD_BOT_TOKEN via check_fn.

Actions (14):
- Introspection: list_guilds, server_info, list_channels, channel_info,
  list_roles, member_info, search_members
- Messages: fetch_messages, list_pins, pin_message, unpin_message
- Management: create_thread, add_role, remove_role

This addresses a gap where users on Discord could not ask Hermes to
review server structure, channels, roles, or members — a task competing
agents (OpenClaw) handle out of the box.

Files changed:
- tools/discord_tool.py (new): Tool implementation + registration
- model_tools.py: Add to discovery list
- toolsets.py: Add to hermes-discord toolset only
- tests/tools/test_discord_tool.py (new): 43 tests covering all actions,
  validation, error handling, registration, and toolset scoping

* feat(discord): intent-aware schema filtering + config allowlist + schema cleanup

- _detect_capabilities() hits GET /applications/@me once per process
  to read GUILD_MEMBERS / MESSAGE_CONTENT privileged intent bits.
- Schema is rebuilt per-session in model_tools.get_tool_definitions:
  hides search_members / member_info when GUILD_MEMBERS intent is off,
  annotates fetch_messages description when MESSAGE_CONTENT is off.
- New config key discord.server_actions (comma-separated or YAML list)
  lets users restrict which actions the agent can call, intersected
  with intent availability. Unknown names are warned and dropped.
- Defense-in-depth: runtime handler re-checks the allowlist so a stale
  cached schema cannot bypass a tightened config.
- Schema description rewritten as an action-first manifest (signature
  per action) instead of per-parameter 'required for X, Y, Z' cross-refs.
  ~25% shorter; model can see each action's required params at a glance.
- Added bounds: limit gets minimum=1 maximum=100, auto_archive_duration
  becomes an enum of the 4 valid Discord values.
- 403 enrichment: runtime 403 errors are mapped to actionable guidance
  (which permission is missing and what to do about it) instead of the
  raw Discord error body.
- 36 new tests: capability detection with caching and force refresh,
  config allowlist parsing (string/list/invalid/unknown), intent+allowlist
  intersection, dynamic schema build, runtime allowlist enforcement,
  403 enrichment, and model_tools integration wiring.

* docs(site): disable highlightSearchTermsOnTargetPage to keep URLs clean (#12661)

The @easyops-cn/docusaurus-search-local option appends ?_highlight=<term>
query params to links from the search bar. Docusaurus puts the query string
before the #anchor, producing URLs like

    /docs/foo?_highlight=bar#section

which look broken when copy-pasted. Turn the option off — Ctrl+F on the
landing page covers the same use case without polluting shareable links.

* feat(creative): add pixel-art-arcade and pixel-art-snes skills

* refactor(creative): consolidate pixel-art skills into single preset-based skill

Merges pixel-art-arcade and pixel-art-snes into one pixel-art skill with
named presets (arcade, snes) + parametric overrides. The underlying
pipeline was already identical across both variants — only palette size,
block size, and enhancement strength differed. A single preset-based
function is easier to discover, maintain, and extend (adding a new era
like gameboy or nes is just another preset dict).

Contributor authorship preserved on original additive commit.

* chore(release): add dodo-reach to AUTHOR_MAP

* refactor(creative): promote pixel-art from optional to built-in skills

* Fix Cloudflare 403s for openai-codex provider on server IPs

Add ChatGPT-Account-Id and originator headers when using chatgpt.com
backend-api endpoint. Matches official codex-rs CLI behavior to prevent
Cloudflare JavaScript challenges on non-residential IPs (VPS, Mac Mini,
always-on servers).

Applied in AIAgent.__init__ and _update_base_url_headers to cover both
initial setup and credential rotation paths.

* fix(codex): pin correct Cloudflare headers and extend to auxiliary client

The cherry-picked salvage (admin28980's commit) added codex headers only on the
primary chat client path, with two inaccuracies:

  - originator was 'hermes-agent' — Cloudflare whitelists codex_cli_rs,
    codex_vscode, codex_sdk_ts, and Codex* prefixes. 'hermes-agent' isn't on
    the list, so the header had no mitigating effect on the 403 (the
    account-id header alone may have been carrying the fix).
  - account-id header was 'ChatGPT-Account-Id' — upstream codex-rs auth.rs
    uses canonical 'ChatGPT-Account-ID' (PascalCase, trailing -ID).

Also, the auxiliary client (_try_codex + resolve_provider_client raw_codex
branch) constructs OpenAI clients against the same chatgpt.com endpoint with
no default headers at all — so compression, title generation, vision, session
search, and web_extract all still 403 from VPS IPs.

Consolidate the header set into _codex_cloudflare_headers() in
agent/auxiliary_client.py (natural home next to _read_codex_access_token and
the existing JWT decode logic) and call it from all four insertion points:

  - run_agent.py: AIAgent.__init__ (initial construction)
  - run_agent.py: _apply_client_headers_for_base_url (credential rotation)
  - agent/auxiliary_client.py: _try_codex (aux client)
  - agent/auxiliary_client.py: resolve_provider_client raw_codex branch

Net: -36/+55 lines, -25 lines of duplicated inline JWT decode replaced by a
single helper. User-Agent switched to 'codex_cli_rs/0.0.0 (Hermes Agent)' to
match the codex-rs shape while keeping product attribution.

Tests in tests/agent/test_codex_cloudflare_headers.py cover:
  - originator value, User-Agent shape, canonical header casing
  - account-ID extraction from a real JWT fixture
  - graceful handling of malformed / non-string / claim-missing tokens
  - wiring at all four insertion points (primary init, rotation, both aux paths)
  - non-chatgpt base URLs (openrouter) do NOT get codex headers
  - switching away from chatgpt.com drops the headers

* docs(memory): steer agents to save declarative facts, not instructions (#12665)

Imperative memory entries ('Always respond concisely', 'Run tests with
pytest -n 4') get re-read as directives in future sessions, causing
repeated work or overriding the user's current request. Add a short
phrasing guideline to MEMORY_GUIDANCE so the model writes declarative
facts instead ('User prefers concise responses', 'Project uses pytest
with xdist').

Credit: observation from @Mariandipietra on X.

* fix: use grid/cell components

* fix(patch): catch silent persistence failures and escape-drift in tool-call transport (#12669)

Two hardening layers in the patch tool, triggered by a real silent failure
in the previous session:

(1) Post-write verification in patch_replace — after write_file succeeds,
re-read the file and confirm the bytes on disk match the intended write.
If not, return an error instead of the current success-with-diff. Catches
silent persistence failures from any cause (backend FS oddities, stdin
pipe truncation, concurrent task races, mount drift).

(2) Escape-drift guard in fuzzy_find_and_replace — when a non-exact
strategy matches and both old_string and new_string contain literal
\' or \" sequences but the matched file region does not, reject the
patch with a clear error pointing at the likely cause (tool-call
serialization adding a spurious backslash around apostrophes/quotes).
Exact matches bypass the guard, and legitimate edits that add or
preserve escape sequences in files that already have them still work.

Why: in a prior tool call, old_string was sent with \' where the file
has ' (tool-call transport drift). The fuzzy matcher's block_anchor
strategy matched anyway and produced a diff the tool reported as
successful — but the file was never modified on disk. The agent moved
on believing the edit landed when it hadn't.

Tests: added TestPatchReplacePostWriteVerification (3 cases) and
TestEscapeDriftGuard (6 cases). All pass, existing fuzzy match and
file_operations tests unaffected.

* fix(tests): unstick CI — sweep stale tests from recent merges (#12670)

One source fix (web_server category merge) + five test updates that
didn't travel with their feature PRs. All 13 failures on the 04-19
CI run on main are now accounted for (5 already self-healed on main;
8 fixed here).

Changes
- web_server.py: add code_execution → agent to _CATEGORY_MERGE (new
  singleton section from #11971 broke no-single-field-category invariant).
- test_browser_camofox_state: bump hardcoded _config_version 18 → 19
  (also from #11971).
- test_registry: add browser_cdp_tool (#12369) and discord_tool (#4753)
  to the expected built-in tool set.
- test_run_agent::test_tool_call_accumulation: rewrite fragment chunks
  — #0f778f77 switched streaming name-accumulation from += to = to
  fix MiniMax/NIM duplication; the test still encoded the old
  fragment-per-chunk premise.
- test_concurrent_interrupt::_Stub: no-op
  _apply_pending_steer_to_tool_results — #12116 added this call after
  concurrent tool batches; the hand-rolled stub was missing it.
- test_codex_cli_model_picker: drop the two obsolete tests that
  asserted auto-import from ~/.codex/auth.json into the Hermes auth
  store. #12360 explicitly removed that behavior (refresh-token reuse
  races with Codex CLI / VS Code); adoption is now explicit via
  `hermes auth openai-codex`. Remaining 3 tests in the file (normal
  path, Claude Code fallback, negative case) still cover the picker.

Validation
- scripts/run_tests.sh across all 6 affected files + surrounding tests
  (54 tests total) all green locally.

* feat(providers): route gemini through the native AI Studio API

- add a native Gemini adapter over generateContent/streamGenerateContent
- switch the built-in gemini provider off the OpenAI-compatible endpoint
- preserve thought signatures and native functionResponse replay
- route auxiliary Gemini clients through the same adapter
- add focused unit coverage plus native-provider integration checks

* fix(gemini): tighten native routing and streaming replay

- only use the native adapter for the canonical Gemini native endpoint
- keep custom and /openai base URLs on the OpenAI-compatible path
- preserve Hermes keepalive transport injection for native Gemini clients
- stabilize streaming tool-call replay across repeated SSE events
- add follow-up tests for base_url precedence, async streaming, and duplicate tool-call chunks

* fix: imports

* fix(tui): /model picker surfaces curated list, matching classic CLI (#12671)

model.options unconditionally overwrote each provider's curated model
list with provider_model_ids() (live /models catalog), so TUI users
saw non-agentic models that classic CLI /model and `hermes model`
filter out via the curated _PROVIDER_MODELS source.

On Nous specifically the live endpoint returns ~380 IDs including
TTS, embeddings, rerankers, and image/video generators — the TUI
picker showed all of them. Classic CLI picker showed the curated
30-model list.

Drop the overwrite. list_authenticated_providers() already populates
provider['models'] with the curated list (same source as classic CLI
at cli.py:4792), sliced to max_models=50. Honor that.

Added regression test that fails if the handler ever re-introduces
a provider_model_ids() call over the curated list.

* fix: imports

* ci: add path filters to Docker and test workflows, remove supply chain audit

- Docker build only triggers on main push (code/config changes) and
  releases, no longer on every PR
- Tests skip markdown-only and docs-only changes
- Remove supply-chain-audit workflow

* ci(security): narrow supply-chain-audit to high-signal patterns only

PR #12681 removed the audit entirely because it fired on nearly every PR
(Dockerfile edits, dependency bumps, Actions version strings, plain
base64 usage, etc.) — reviewers were ignoring it like cancer warnings.

Restore it with aggressive scope reduction:

Kept (real attack signatures):
  - .pth file additions (litellm-attack mechanism)
  - base64 decode + exec/eval on the same line
  - subprocess with base64/hex/chr-encoded command argument
  - install-hook files (setup.py, sitecustomize.py, usercustomize.py,
    __init__.pth)

Removed (low-signal noise that fired constantly):
  - plain base64 encode/decode
  - plain exec/eval
  - outbound requests.post / httpx.post / urllib
  - CI/CD workflow file edits
  - Dockerfile / compose edits
  - pyproject.toml / requirements.txt edits
  - GitHub Actions version-tag unpinning
  - marshal / pickle / compile usage

Also gates the workflow itself on path filters so it only runs on PRs
touching Python or install-hook files — no more firing on docs/CI PRs.

The workflow still fails the check and posts a PR comment on
critical findings, but by design those findings are now rare and
worth inspecting when they occur.

* ci: bump test-job timeout from 10m to 20m (#12718)

Recent main runs have been hitting the 10-minute cap repeatedly — the
full non-integration suite no longer fits in that window on
ubuntu-latest. Cancelled runs leave main without a green signal, which
masks real regressions.

Bumps only the test job. The e2e job still finishes in ~25s, so its
10-minute cap stays as-is.

* fix(mcp-oauth): bidirectional auth_flow bridge + absolute expires_at (salvage #12025) (#12717)

* [verified] fix(mcp-oauth): bridge httpx auth_flow bidirectional generator

HermesMCPOAuthProvider.async_auth_flow wrapped the SDK's auth_flow with
'async for item in super().async_auth_flow(request): yield item', which
discards httpx's .asend(response) values and resumes the inner generator
with None. This broke every OAuth MCP server on the first HTTP response
with 'NoneType' object has no attribute 'status_code' crashing at
mcp/client/auth/oauth2.py:505.

Replace with a manual bridge that forwards .asend() values into the
inner generator, preserving httpx's bidirectional auth_flow contract.

Add tests/tools/test_mcp_oauth_bidirectional.py with two regression
tests that drive the flow through real .asend() round-trips. These
catch the bug at the unit level; prior tests only exercised
_initialize() and disk-watching, never the full generator protocol.

Verified against BetterStack MCP:
  Before: 'Connection failed (11564ms): NoneType...' after 3 retries
  After:  'Connected (2416ms); Tools discovered: 83'

Regression from #11383.

* [verified] fix(mcp-oauth): seed token_expiry_time + pre-flight AS discovery on cold-load

PR #11383's consolidation fixed external-refresh reloading and 401 dedup
but left two latent bugs that surfaced on BetterStack and any other OAuth
MCP with a split-origin authorization server:

1. HermesTokenStorage persisted only a relative 'expires_in', which is
   meaningless after a process restart. The MCP SDK's OAuthContext
   does NOT seed token_expiry_time in _initialize, so is_token_valid()
   returned True for any reloaded token regardless of age. Expired
   tokens shipped to servers, and app-level auth failures (e.g.
   BetterStack's 'No teams found. Please check your authentication.')
   were invisible to the transport-layer 401 handler.

2. Even once preemptive refresh did fire, the SDK's _refresh_token
   falls back to {server_url}/token when oauth_metadata isn't cached.
   For providers whose AS is at a different origin (BetterStack:
   mcp.betterstack.com for MCP, betterstack.com/oauth/token for the
   token endpoint), that fallback 404s and drops into full browser
   re-auth on every process restart.

Fix set:

- HermesTokenStorage.set_tokens persists an absolute wall-clock
  expires_at alongside the SDK's OAuthToken JSON (time.time() + TTL
  at write time).
- HermesTokenStorage.get_tokens reconstructs expires_in from
  max(expires_at - now, 0), clamping expired tokens to zero TTL.
  Legacy files without expires_at fall back to file-mtime as a
  best-effort wall-clock proxy, self-healing on the next set_tokens.
- HermesMCPOAuthProvider._initialize calls super(), then
  update_token_expiry on the reloaded tokens so token_expiry_time
  reflects actual remaining TTL. If tokens are loaded but
  oauth_metadata is missing, pre-flight PRM + ASM discovery runs
  via httpx.AsyncClient using the MCP SDK's own URL builders and
  response handlers (build_protected_resource_metadata_discovery_urls,
  handle_auth_metadata_response, etc.) so the SDK sees the correct
  token_endpoint before the first refresh attempt. Pre-flight is
  skipped when there are no stored tokens to keep fresh-install
  paths zero-cost.

Test coverage (tests/tools/test_mcp_oauth_cold_load_expiry.py):
- set_tokens persists absolute expires_at
- set_tokens skips expires_at when token has no expires_in
- get_tokens round-trips expires_at -> remaining expires_in
- expired tokens reload with expires_in=0
- legacy files without expires_at fall back to mtime proxy
- _initialize seeds token_expiry_time from stored tokens
- _initialize flags expired-on-disk tokens as is_token_valid=False
- _initialize pre-flights PRM + ASM discovery with mock transport
- _initialize skips pre-flight when no tokens are stored

Verified against BetterStack MCP:
  hermes mcp test betterstack -> Connected (2508ms), 83 tools
  mcp_betterstack_telemetry_list_teams_tool -> real team data, not
    'No teams found. Please check your authentication.'

Reference: mcp-oauth-token-diagnosis skill, Fix A.

* chore: map hermes@noushq.ai to benbarclay in AUTHOR_MAP

Needed for CI attribution check on cherry-picked commits from PR #12025.

---------

Co-authored-by: Hermes Agent <hermes@noushq.ai>

* terminal: steer long-lived server commands to background mode

* chore(release): add etherman-os and mark-ramsell to AUTHOR_MAP

* fix(terminal): rewrite `A && B &` to `A && { B & }` to prevent subshell leak

bash parses `A && B &` with `&&` tighter than `&`, so it forks a subshell
for the compound and backgrounds the subshell. Inside the subshell, B
runs foreground, so the subshell waits for B. When B is a process that
doesn't naturally exit (`python3 -m http.server`, `yes > /dev/null`, a
long-running daemon), the subshell is stuck in `wait4` forever and leaks
as an orphan reparented to init.

Observed in production: agents running `cd X && python3 -m http.server
8000 &>/dev/null & sleep 1 && curl ...` as a "start a local server, then
verify it" one-liner. Outer bash exits cleanly; the subshell never does.
Across ~3 days of use, 8 unique stuck-terminal events and 7 leaked
bash+server pairs accumulated on the fleet, with some sessions appearing
hung from the user's perspective because the subshell's open stdout pipe
kept the terminal tool's drain thread blocked.

This is distinct from the `set +m` fix in 933fbd8f (which addressed
interactive-shell job-control waiting at exit). `set +m` doesn't help
here because `bash -c` is non-interactive and job control is already
off; the problem is the subshell's own internal wait for its foreground
B, not the outer shell's job-tracking.

The fix: walk the command shell-aware (respecting quotes, parens, brace
groups, `&>`/`>&` redirects), find `A && B &` / `A || B &` at depth 0
and rewrite the tail to `A && { B & }`. Brace groups don't fork a
subshell — they run in the current shell. `B &` inside the group is a
simple background (no subshell wait). The outer `&` is absorbed into
the group, so the compound no longer needs an explicit subshell.

`&&` error-propagation is preserved exactly: if A fails, `&&`
short-circuits and B never runs.

- Skips quoted strings, comment lines, and `(…)` subshells
- Handles `&>/dev/null`, `2>&1`, `>&2` without mistaking them for `&`
- Resets chain state at `;`, `|`, and newlines
- Tracks brace depth so already-rewritten output is idempotent
- Walks using the existing `_read_shell_token` tokenizer, matching the
  pattern of `_rewrite_real_sudo_invocations`

Called once from `BaseEnvironment.execute` right after
`_prepare_command`, so it runs for every backend (local, ssh, docker,
modal, etc.) with no per-backend plumbing.

34 new tests covering rewrite cases, preservation cases, redirect
edge-cases, quoting/parens/backticks, idempotency, and empty/edge
inputs. End-to-end verified on a test VM: the exact vela-incident
command now returns in ~1.3s with no leaked bash, only the intentional
backgrounded server.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(pixel-art): add hardware palettes and video animation (#12725)

Expand the pixel-art skill from 2 presets (arcade, snes) to 14 presets
with hardware-accurate palettes (NES, Game Boy, PICO-8, C64, Apple II,
MS Paint, CRT mono), plus a procedural video overlay pipeline.

Ported from Synero/pixel-art-studio (MIT). Full attribution in
ATTRIBUTION.md.

What's in:
- scripts/palettes.py — 28 named RGB palettes (hardware + artistic)
- scripts/pixel_art.py — 14 presets, named palette support, CLI
- scripts/pixel_art_video.py — 12 animation scenes (stars, rain,
  fireflies, snow, embers, lightning, etc.) → MP4/GIF via ffmpeg
- references/palettes.md — palette catalog
- SKILL.md — clarify-tool workflow (offer style, then optional scene)

What's out (intentional):
- Wu's quantizer (PIL's built-in quantize suffices)
- Sobel edge-aware downsample (scipy dep not worth it)
- Atkinson/Bayer dither (would need numpy reimpl)
- Pollinations text-to-image (Hermes uses image_generate instead)

Video pipeline uses subprocess.run with check=True (replaces os.system)
and tempfile.TemporaryDirectory (replaces manual cleanup).

* docs(discord): document that free-response channels skip auto-threading (#12728)

Follow-up to 93fe4b35. The behavior (free-response channels bypass
auto-threading so the channel stays a lightweight inline chat) was
intentional but never documented, causing user confusion ("is this a
bug?" reports).

Adds one line to the behavior table, one paragraph under
discord.free_response_channels, and a cross-reference under
discord.auto_thread.

* refactor: remove smart_model_routing feature (#12732)

Smart model routing (auto-routing short/simple turns to a cheap model
across providers) was opt-in and disabled by default.  This removes the
feature wholesale: the routing module, its config keys, docs, tests, and
the orchestration scaffolding it required in cli.py / gateway/run.py /
cron/scheduler.py.

The /fast (Priority Processing / Anthropic fast mode) feature kept its
hooks into _resolve_turn_agent_config — those still build a route dict
and attach request_overrides when the model supports it; the route now
just always uses the session's primary model/provider rather than
running prompts through choose_cheap_model_route() first.

Also removed:
- DEFAULT_CONFIG['smart_model_routing'] block and matching commented-out
  example sections in hermes_cli/config.py and cli-config.yaml.example
- _load_smart_model_routing() / self._smart_model_routing on GatewayRunner
- self._smart_model_routing / self._active_agent_route_signature on
  HermesCLI (signature kept; just no longer initialised through the
  smart-routing pipeline)
- route_label parameter on HermesCLI._init_agent (only set by smart
  routing; never read elsewhere)
- 'Smart Model Routing' section in website/docs/integrations/providers.md
- tip in hermes_cli/tips.py
- entries in hermes_cli/dump.py + hermes_cli/web_server.py
- row in skills/autonomous-ai-agents/hermes-agent/SKILL.md

Tests:
- Deleted tests/agent/test_smart_model_routing.py
- Rewrote tests/agent/test_credential_pool_routing.py to target the
  simplified _resolve_turn_agent_config directly (preserves credential
  pool propagation + 429 rotation coverage)
- Dropped 'cheap model' test from test_cli_provider_resolution.py
- Dropped resolve_turn_route patches from cli + gateway test_fast_command
  — they now exercise the real method end-to-end
- Removed _smart_model_routing stub assignments from gateway/cron test
  helpers

Targeted suites: 74/74 in the directly affected test files;
tests/agent + tests/cron + tests/cli pass except 5 failures that
already exist on main (cron silent-delivery + alias quick-command).

* fix(tui): bound retained state against idle OOM

Guards four unbounded growth paths reachable at idle — the shape matches
reports of the TUI hitting V8's 2GB heap limit after ~1m of idle with 0
tokens used (Mark-Compact freed ~6MB of 2045MB → pure retention).

- `GatewayClient.logs` + `gateway.stderr` events: 200-line cap is bytes-
  uncapped; a chatty Python child emitting multi-MB lines (traceback,
  dumped config, unsplit JSON) retains everything. Truncate at 4KB/line.
- `GatewayClient.bufferedEvents`: unbounded until `drain()` fires. Cap
  at 2000 so a pre-mount event storm can't pin memory indefinitely.
- `useMainApp` gateway `exit` handler: didn't reset `turnController`, so
  a mid-stream crash left `bufRef`/`reasoningText` alive forever.
- `pasteSnips` count-capped (32) but byte-uncapped. Add a 4MB total cap
  and clear snips in `clearIn` so submitted pastes don't linger.
- `StylePool.transitionCache`: uncapped `Map<number,string>`. Full-clear
  at 32k entries (mirrors `charCache` pattern).

* fix(kimi): route temperature override by base_url — kimi-k2.5 needs 1.0 on api.moonshot.ai

Follow-up to #12144.  That PR standardized the kimi-k2.* temperature lock
against the Coding Plan endpoint (api.kimi.com/coding/v1) docs, where
non-thinking models require 0.6.  Verified empirically against Moonshot
(April 2026) that the public chat endpoint (api.moonshot.ai/v1) has a
different contract for kimi-k2.5: it only accepts temperature=1, and rejects
0.6 with:

    HTTP 400 "invalid temperature: only 1 is allowed for this model"

Users hit the public endpoint when KIMI_API_KEY is a legacy sk-* key (the
sk-kimi-* prefix routes to Coding Plan — see hermes_cli/auth.p…
bot-ted added a commit to bot-ted/hermes-agent that referenced this pull request May 7, 2026
* fix(aux): trigger fallback on 429 rate-limit errors in auxiliary client

When a provider returns a 429 rate-limit error (not billing-related),
the auxiliary client's call_llm/async_call_llm previously did NOT trigger
the fallback chain. This caused auxiliary tasks like session_search to
exhaust all 3 retries against the same rate-limited endpoint, losing
session metadata that depended on the summarization completing.

Root cause: `_is_payment_error()` only matched 429s containing billing
keywords ("credits", "insufficient funds", etc.). Provider-specific
rate-limit messages like Nous's "Hold up for a bit, you've exceeded the
rate limit on your API key" didn't match, so `_is_payment_error` returned
False, `_is_connection_error` returned False, and `should_fallback` was
False — all retries hit the same rate-limited provider.

Fix:
- New `_is_rate_limit_error()` function that detects 429 + rate-limit
  keywords, generic 429 without billing keywords, and OpenAI SDK
  `RateLimitError` class instances (which may omit .status_code).
- Updated `should_fallback` in both `call_llm` and `async_call_llm` to
  include `_is_rate_limit_error`.
- Updated the max_tokens retry path to also check for rate-limit errors.
- Updated the reason string to include "rate limit".

This complements the Nous rate guard (PR #10568) which prevents new calls
to Nous when already rate-limited — this fix handles the case where a
request is already in flight when the 429 arrives.

Related: #8023, #12554, #11034
Co-authored-by: Zeejay <zjtan1@gmail.com>

* chore: AUTHOR_MAP entry for zeejaytan

* fix(acp): preserve assistant reasoning metadata in session persistence

* chore: AUTHOR_MAP entry for Aslaaen

* feat(cli): add list_picker_providers for credential-filtered picker

The Telegram/Discord /model pickers currently call
list_authenticated_providers(), which returns every provider whose
credentials resolve locally and every model in its curated snapshot.
Two failure modes fall out:

- OpenRouter rows can include IDs the live catalog no longer carries.
- Provider rows can surface with zero callable models (e.g. a slug
  whose credential pool entry exists but has nothing behind it).

list_picker_providers() wraps the base function and post-processes the
result so the interactive picker only shows models the user can
actually select:

- OpenRouter's models come from fetch_openrouter_models() (live-catalog
  filtered against the curated OPENROUTER_MODELS snapshot).
- Rows with an empty models list are dropped, except custom endpoints
  (is_user_defined=True with an api_url) where the user may enter
  model ids manually.
- All other fields pass through unchanged.

The gateway /model handler switches to the new helper for the
interactive picker payload only. Typed /model <name> and the text
fallback list stay on list_authenticated_providers() so nothing is
hidden from power users or platforms without a picker.

Covered by nine focused unit tests in
tests/hermes_cli/test_list_picker_providers.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: AUTHOR_MAP entry for Tkander1715

* feat(tui): remove /provider alias for /model (#20358)

/model is the canonical command; /provider was a redundant alias that
dispatched to the same ModelPicker overlay. Drop the alias, the regex
branch in useCompletion, and the alias-coverage test.

* fix: resolve lazy session creation regressions (#18370 fallout) (#20363)

Fix three regressions introduced by PR #18370 (lazy session creation):

1. _finalize_session() uses stale session_key after compression (#20001)
2. session_key not synced after auto-compression in run_conversation (#20001)
3. pending_title ValueError leaves title wedged forever (#19029)
4. Gateway silently swallows null responses when agent did work (#18765)
5. One-time cleanup for accumulated ghost compression continuations (#20001)

Changes:
- tui_gateway/server.py: _finalize_session() now uses agent.session_id
  (falls back to session_key when agent is None). Refactor
  _sync_session_key_after_compress() with clear_pending_title and
  restart_slash_worker policy flags. Call it post-run_conversation()
  to sync session_key after auto-compression. Add ValueError handler
  to pending_title flush.
- gateway/run.py: Extract _normalize_empty_agent_response() helper that
  consolidates failed/partial/null response handling. Surfaces user-facing
  error when agent did work (api_calls > 0) but returned no text.
- hermes_state.py: Add finalize_orphaned_compression_sessions() — marks
  ghost continuation sessions as ended (non-destructive, preserves data).
- cli.py: One-time startup migration for orphaned compression sessions.

Test changes:
- tests/test_tui_gateway_server.py: Update pending_title ValueError test
  for post-#18370 architecture (title applied post-message, not at create).
- tests/test_lazy_session_regressions.py: 14 new regression tests covering
  all fixed paths.

* docs(web_tools): correct web_extract summarizer timeout comment

The comment at tools/web_tools.py:700-702 stated the runtime default for
auxiliary.web_extract.timeout is 360s. The actual runtime default is 30s
(_DEFAULT_AUX_TIMEOUT in agent/auxiliary_client.py:3140), used by
_get_task_timeout when no auxiliary.web_extract.timeout key is present in
config.yaml.

The 360s figure is the config template default written by
hermes_cli/config.py:697 into freshly-generated config.yaml files. It only
takes effect when that key exists in the user's config — not as a fallback.
Users on configs that predate commit 20b4060d (Apr 5, 2026), or who removed
the key, fall through to the 30s _DEFAULT_AUX_TIMEOUT runtime default.

The comment was introduced in 20b4060d alongside the template-default bump
from 30 to 360. The runtime default in auxiliary_client.py was not changed
in that commit and has remained 30s since 839d9d74 (Mar 28, 2026).

* docs(config): fix fallback provider config paths

* docs(prompt): clarify supported customization surfaces

* chore: AUTHOR_MAP entry for Beandon13

* docs: remove dead reference links in flash-attention skill

* docs: remove dead papers.md link from saelens references

* docs: fix broken nix-setup anchor for container-aware CLI

* fix(telegram): keep DM topic typing scoped

* refactor(telegram): make typing thread-id resolver symmetric with send

Mirror _message_thread_id_for_typing() with _message_thread_id_for_send():
both now map the General forum topic (thread id "1") to None upfront.

That removes the need for the retry-without-thread fallback in send_typing()
entirely — if _message_thread_id_for_typing() returns a non-None value, it's
a real user-created topic and falling back to the root chat is never correct.
If Telegram rejects the typing action (e.g. topic deleted mid-session), we
swallow it at debug level instead of bleeding the indicator into All Messages.

Updates the General-topic typing regression test to assert the new single-call
contract.

* docs(tts): document per-provider max_text_length caps

PR #13743 replaced the global MAX_TEXT_LENGTH=4000 with a per-provider
table and a user-override 'max_text_length:' key, but the user-guide
TTS page documented no length behaviour at all. Users hitting truncation
had no way to discover the new caps or the override.

Add an 'Input length limits' subsection after the existing Configuration
YAML block: provider default caps (Edge 5000 / OpenAI 4096 / xAI 15000 /
MiniMax 10000 / Mistral 4000 / Gemini 5000 / ElevenLabs model-aware /
NeuTTS,KittenTTS 2000), ElevenLabs model_id -> cap table (5k-40k), an
override example, and the validation rules (non-positive / non-integer /
boolean values fall through to the provider default).

* docs(skill/hermes-agent): sync slash commands + add durable-systems section

Mirrors the AGENTS.md #20226 additions (Toolsets / Delegation / Curator /
Cron / Kanban) into the user-facing hermes-agent skill, and closes the
drift in the in-session slash command list.

User report (wxrrior in Discord): the skill did not mention /goal, so a
brand-new session answering "/hermes-agent do you have any info on /goal"
confidently said it did not exist. Cross-check against the CommandDef
registry found 16 commands missing from the static list: /goal, /agents,
/busy, /copy, /curator, /debug, /footer, /gquota, /indicator, /kanban,
/redraw, /reload, /reload-skills, /snapshot, /steer, /topic.

Changes:
- Slash Commands header now tells the reader to run /help or check the
  live docs reference as the source of truth, and names the registry
  of record (hermes_cli/commands.py) so future drift gets flagged
  honestly instead of answered confidently wrong.
- Added all 16 missing commands, slotted into existing subsections
  (/goal and /steer in Session; /busy + /indicator + /footer in
  Configuration; /curator + /kanban + /reload-skills + /reload in
  Tools & Skills; /topic in Gateway; /copy in Utility; /gquota +
  /debug in Info).
- Toolsets table updated to the authoritative 30-key list from
  toolsets.py (added kanban, yuanbao, spotify, safe, debugging, video,
  feishu_doc, feishu_drive, discord, discord_admin, clarify; previously
  stopped at 20 keys).
- New "Durable & Background Systems" section before Troubleshooting
  covers Delegation, Cron, Curator, Kanban - each with a short rundown
  of CLI verbs, key invariants, and a pointer to the user-facing docs.
  Mirrors AGENTS.md #20226 but in the skill's user-facing register.
- Bumped version 2.0.0 -> 2.1.0.

* docs(cli): add --deliver-only flag to hermes webhook subscribe

PR #12473 (merged 2026-04-19) added a new --deliver-only flag to
`hermes webhook subscribe` for zero-LLM direct delivery, but
website/docs/reference/cli-commands.md options table did not
reference it. Add the row so CLI users can discover the flag from
the reference page instead of having to read the source.

* perf(ui-tui): narrow overlay subscriptions to focused selectors

Subscribe overlay components to computed theme/session selectors instead of the full UI store so unrelated UI state updates trigger fewer overlay renders.

* docs(cli): add skills reset subcommand to CLI reference

PR #11468 added `hermes skills reset` but cli-commands.md was not
updated. Adds the subcommand to the table and usage examples.

Closes #11543

* feat(kanban): generic diagnostics engine for task distress signals (#20332)

* feat(kanban): generic diagnostics engine for task distress signals

Replaces the hallucination-specific ``warnings`` / ``RecoverySection``
surface (shipped in PR #20232) with a reusable diagnostic-rule engine
that covers five distress kinds in v1 and can be extended without
touching UI code. The "something's wrong with this task" signal is
no longer limited to phantom card ids.

Closes the follow-up from #20232 discussion.

New module
----------
``hermes_cli/kanban_diagnostics.py`` — stateless, no-side-effect rule
engine. Each rule is a pure function of
``(task, events, runs, now, config) -> list[Diagnostic]``. Registry
is a simple list; adding a new distress kind is one function + one
import, no UI or API changes required.

v1 rule set
-----------
* ``hallucinated_cards`` (error) — folds the existing
  ``completion_blocked_hallucination`` event into the new surface.
* ``prose_phantom_refs`` (warning) — folds
  ``suspected_hallucinated_references``.
* ``repeated_spawn_failures`` (error → critical at 2x threshold) —
  fires when ``tasks.spawn_failures >= 3``; suggests
  ``hermes -p <profile> doctor`` / ``auth``.
* ``repeated_crashes`` (error → critical) — fires after N consecutive
  ``crashed`` run outcomes with no successful completion between;
  suggests ``hermes kanban log <id>``.
* ``stuck_in_blocked`` (warning) — fires after 24h in ``blocked``
  state with no comments / unblock attempts; suggests commenting.

Every diagnostic carries structured ``actions`` (reclaim, reassign,
unblock, cli_hint, comment, open_docs) that render consistently in
both CLI and dashboard. Suggested actions are highlighted; generic
recovery actions (reclaim / reassign) are available on every kind as
fallbacks.

Diagnostics auto-clear when the underlying failure resolves — a
clean ``completed``/``edited`` event drops hallucination diagnostics,
a successful run drops crash diagnostics, a comment drops
stuck-blocked diagnostics. Audit events persist; the badge goes away.

API
---
``plugin_api.py``:
* ``/board`` now attaches ``diagnostics`` (full list) and
  ``warnings`` (compact summary with ``highest_severity``) per task.
* ``/tasks/{id}`` attaches diagnostics so the drawer's Diagnostics
  section auto-opens on flagged tasks.
* NEW ``/diagnostics`` endpoint — fleet-wide listing, filterable by
  severity, sorted critical-first.

CLI
---
* NEW ``hermes kanban diagnostics [--severity X] [--task id]
  [--json]`` — fleet view or single-task view, matches dashboard rule
  output so CLI users see the same picture.
* ``hermes kanban show <id>`` now renders a Diagnostics section near
  the top with severity markers + suggested actions.

Dashboard
---------
* Card badge is severity-coloured (⚠ amber warning, !! orange error,
  !!! red critical) using ``warnings.highest_severity``.
* Attention strip above the toolbar counts EVERY task with active
  diagnostics (not just hallucinations), severity-coloured, lists
  affected tasks with Open buttons when expanded.
* Drawer's old ``RecoverySection`` replaced with generic
  ``DiagnosticsSection`` rendering a card per active diagnostic:
  title + detail + structured data (task-id chips when payload keys
  look like id lists) + action buttons. Reassign profile picker is
  inline per-diagnostic. Clipboard fallback uses ``.catch()`` for
  environments where writeText rejects.
* Three-rung severity palette; amber for warning, orange for error,
  red for critical. Uses CSS variables so theming is straightforward.

Tests
-----
* NEW ``tests/hermes_cli/test_kanban_diagnostics.py`` — 14 unit tests
  covering each rule's positive/negative/threshold paths, severity
  sorting, broken-rule isolation, and sqlite3.Row integration.
* Dashboard plugin tests extended: ``/diagnostics`` endpoint (empty,
  populated, severity-filtered), ``/board`` exposes both diagnostic
  list and compact summary with ``highest_severity``.
* Existing hallucination-specific test (``test_board_surfaces_
  warnings_field_for_hallucinated_completions``) updated to reflect
  the new contract: warning summary keys by diagnostic kind
  (``hallucinated_cards``) not event kind.

379 kanban-suite tests pass (+16 net from this PR).

Live verification
-----------------
Seeded all 5 diagnostic kinds + one clean + one plain-running task
(7 total) into an isolated HERMES_HOME, spun up the dashboard, and
verified:

* Attention strip: shows ``!! 5 tasks need attention`` in the
  error-severity orange; Show expands to a list of 5 rows ordered
  critical > error > warning.
* Card badges: error tasks render ``!!`` orange, warning tasks
  render ``⚠`` amber, clean and plain-running tasks render no badge.
* Each of the 5 rules opens a correctly-coloured, correctly-styled
  diagnostic card in the drawer with its specific suggested action.
* Live reassign from a diagnostic card flipped
  ``broken-ml-worker → alice`` and the drawer refreshed with the
  new assignee + the same diagnostic still firing (correct:
  spawn_failures counter hasn't reset yet).
* CLI ``hermes kanban diagnostics`` prints all 5 in severity order;
  ``--severity error`` narrows to 3; ``kanban show <id>`` includes
  the Diagnostics block at the top with suggested action hint.

Migration note
--------------
The old ``warnings`` shape (``{count, kinds, latest_at}``) is
preserved on the API but ``kinds`` now keys by diagnostic kind
(``hallucinated_cards``) instead of event kind
(``completion_blocked_hallucination``). ``highest_severity`` is a
new required field. The dashboard was the only consumer and has
been updated in the same commit; external API consumers of the
``warnings`` field will need to update their kind-match logic.

* feat(kanban/diagnostics): lead titles with the actual error text

The generic 'Worker crashed N runs in a row' / 'Worker failed to spawn
N times' titles buried the actual cause in the data section. Operators
had to open logs or expand the diagnostic to see WHY the worker is
stuck — rate-limit vs insufficient quota vs bad auth vs context
overflow vs network blip all looked identical at a glance.

New titles:

  Agent crashed 3x: openai: 429 Too Many Requests - rate limit reached
  Agent crashed 3x: anthropic: 402 insufficient_quota - credit balance
  Agent crashed 3x: provider auth error: 401 Unauthorized
  Agent spawn failed 4x: insufficient_quota: You exceeded your current

Detail keeps the full error snippet (capped at 500 chars + ellipsis
for tracebacks). Title takes the first line capped at 160 chars.
Fallback title if no error recorded stays honest ('no error recorded').

Tests: 4 new cases covering 429/billing/spawn/truncation. 383 total
pass (+4).

Live-verified on dashboard with 6 seeded scenarios
(rate-limit, billing, auth, context, network, spawn-billing) —
each card title leads with the actionable error text.

* docs(agent): remove stale BuiltinMemoryProvider references from memory module docstrings

The BuiltinMemoryProvider class was removed from the codebase but its
name lingered in the module-level docstrings of memory_manager.py and
memory_provider.py, creating false expectations:

- memory_manager.py docstring showed example code doing
  add_provider(BuiltinMemoryProvider(...)) which ImportError at runtime
- memory_provider.py docstring listed BuiltinMemoryProvider as
  'always present, not removable' — misleading for new contributors

The regression test (test_memory_user_id.py) already passes without
any reference to BuiltinMemoryProvider; it uses RecordingProvider
instances directly. The stale references were docs-only drift.

Update both docstrings to reflect the actual current architecture:
MemoryManager accepts external plugin providers only (one at a time).

Closes #14402

* docs(plugins): document ctx.dispatch_tool() in plugin capabilities table

* docs(guide): add Dispatch tools from slash commands section

* docs(cron): add context_from chaining section

Resolved merge against current main (new No-agent mode section added in parallel).

Co-authored-by: Tony Simons <tony@tonysimons.dev>

* chore: AUTHOR_MAP entry for asimons81

* feat: provider modules — ProviderProfile ABC, 33 providers, fetch_models, transport single-path

Introduces providers/ package — single source of truth for every
inference provider. Adding a simple api-key provider now requires one
providers/<name>.py file with zero edits anywhere else.

What this PR ships:
- providers/ package (ProviderProfile ABC + 33 profiles across 4 api_modes)
- ProviderProfile declarative fields: name, api_mode, aliases, display_name,
  env_vars, base_url, models_url, auth_type, fallback_models, hostname,
  default_headers, fixed_temperature, default_max_tokens, default_aux_model
- 4 overridable hooks: prepare_messages, build_extra_body,
  build_api_kwargs_extras, fetch_models
- chat_completions.build_kwargs: profile path via _build_kwargs_from_profile,
  legacy flag path retained for lmstudio/tencent-tokenhub (which have
  session-aware reasoning probing that doesn't map cleanly to hooks yet)
- run_agent.py: profile path for all registered providers; legacy path
  variable scoping fixed (all flags defined before branching)
- Auto-wires: auth.PROVIDER_REGISTRY, models.CANONICAL_PROVIDERS,
  doctor health checks, config.OPTIONAL_ENV_VARS, model_metadata._URL_TO_PROVIDER
- GeminiProfile: thinking_config translation (native + openai-compat nested)
- New tests/providers/ (79 tests covering profile declarations, transport
  parity, hook overrides, e2e kwargs assembly)

Deltas vs original PR (salvaged onto current main):
- Added profiles: alibaba-coding-plan, azure-foundry, minimax-oauth
  (were added to main since original PR)
- Skipped profiles: lmstudio, tencent-tokenhub stay on legacy path (their
  reasoning_effort probing has no clean hook equivalent yet)
- Removed lmstudio alias from custom profile (it's a separate provider now)
- Skipped openrouter/custom from PROVIDER_REGISTRY auto-extension
  (resolve_provider special-cases them; adding breaks runtime resolution)
- runtime_provider: profile.api_mode only as fallback when URL detection
  finds nothing (was breaking minimax /v1 override)
- Preserved main's legacy-path improvements: deepseek reasoning_content
  preserve, gemini Gemma skip, OpenRouter response caching, Anthropic 1M
  beta recovery, etc.
- Kept agent/copilot_acp_client.py in place (rejected PR's relocation —
  main has 7 fixes landed since; relocation would revert them)
- _API_KEY_PROVIDER_AUX_MODELS alias kept for backward compat with existing
  test imports

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #14418

* feat(providers): make all 33 providers pluggable under plugins/model-providers/

Every provider profile is now a self-contained plugin under
plugins/model-providers/<name>/, mirroring the plugins/platforms/
pattern established for IRC and Teams. The ProviderProfile ABC
stays in providers/; the per-provider profile data moves out.

- plugins/model-providers/<name>/__init__.py calls register_provider()
- plugins/model-providers/<name>/plugin.yaml declares kind: model-provider
- providers/__init__.py._discover_providers() lazily scans bundled plugins
  then $HERMES_HOME/plugins/model-providers/<name>/ (user override path)
- User plugins with the same name override bundled ones (last-writer-wins
  in register_provider)
- Legacy providers/<name>.py layout still supported for back-compat with
  out-of-tree editable installs
- Hermes PluginManager: new kind=model-provider; skipped like memory
  plugins (providers/ discovery owns them); standalone plugins with
  register_provider+ProviderProfile in their __init__.py auto-coerce to
  this kind (same heuristic as memory providers)
- skip_names extended to include 'model-providers' so the general
  PluginManager doesn't double-scan the category
- 4 new tests in tests/providers/test_plugin_discovery.py covering
  bundled discovery, user override, and general-loader isolation
- Docs updated: website/docs/developer-guide/adding-providers.md,
  provider-runtime.md, providers/README.md, plugins/model-providers/README.md

No API break: auth.py / config.py / doctor.py / models.py / runtime_provider.py /
model_metadata.py / auxiliary_client.py / chat_completions.py / run_agent.py
all still consume providers via get_provider_profile() / list_providers() —
they just now see plugin-discovered entries instead of pkgutil-iterated ones.

Third parties can now drop a single directory into
~/.hermes/plugins/model-providers/<name>/ to add or override an inference
provider without touching the repo.

* docs(cli): expand hermes import reference — add description, warning, and examples

* docs(bedrock): fix IAM permissions, add quickstart entry, add fallback provider, fix deployment section

* docs: fix Camofox Docker setup instructions

* docs(providers): Together/Groq/Perplexity cookbook via custom_providers

Three worked recipes for OpenAI-compatible cloud providers, plus the
Copilot HTTP 401 auto-recovery info block and the GMI Cloud row in the
compatible providers table. All three additions were on the original
docs/custom-providers-cookbook branch but its merge base predated 1186
main commits, making the rebase impractical (84k+ line conflict).

Replays just the providers.md additions onto current main.

* fix(tui): close slash parity gaps with CLI (#20339)

* fix(tui): close slash parity gaps with CLI

Route unsupported /skills subcommands through slash.exec, support /new <name>
titles, and handle /redraw natively so TUI behavior matches classic CLI. Also
filter gateway-only commands out of the TUI catalog while keeping /status
discoverable.

* fix(tui): run remaining CLI parity paths natively

Forward chat launch flags into the TUI runtime and handle live-session status
and skill reloads in the gateway process so TUI state no longer depends on the
slash worker's stale CLI instance.

* fix(tui): block stale snapshot restores

Prevent snapshot restore from running through the isolated slash worker because
it mutates disk state without refreshing the live TUI agent.

* chore: uptick

* fix(tui): guard async session title updates

Handle failures from the fire-and-forget session.title RPC so title-setting errors do not surface as unhandled promise rejections while preserving session-scoped messaging.

* docs(gemini): add Google Gemini guide

* chore: AUTHOR_MAP entry for jethac

* docs: align terminal-backend count and naming across docs and code

README:24 claimed "Six terminal backends" while tools/environments/ exposes
seven top-level backend choices through TERMINAL_ENV: local, docker, ssh,
singularity, modal, daytona, vercel_sandbox. Modal additionally has direct
and Nous-managed modes selected via terminal.modal_mode (the
ManagedModalEnvironment class is a Modal sub-mode, not a separate top-level
backend).

The same drift appeared in five other doc and code-comment sites with
inconsistent counts (six, seven, or implicit) and varying lists. Updated
all sites to a consistent seven-backend list in canonical order. The
configuration guide also clarifies how Modal's two modes are selected so
operators do not search for a non-existent backend: managed_modal value.

CONTRIBUTING.md:160 lists six backend filenames in a code tree but does
not carry the "Six terminal" prose; left out of scope per cohesion sweep
guidance to bundle only identical wording.

Files updated:
- README.md (line 24, marketing copy)
- website/docs/index.md (line 49, landing page)
- website/docs/user-guide/configuration.md (line 86, config guide)
- tools/environments/__init__.py (lines 3-6, package docstring)
- tools/file_operations.py (line 6, module docstring)
- environments/README.md (line 43, RL training docs — TERMINAL_ENV list)

* chore: AUTHOR_MAP entry for deep-name

* docs: refresh stale platform/LOC/test counts; clarify gateway vs plugin platforms

AGENTS.md is the AI-assistant entry doc, so its counts get used as ground
truth. Several values had drifted, and the same drift had spread to a few
user-facing surfaces. Fixing all of them in one commit so the count claims
agree and clearly distinguish gateway-core from plugin-shipped platforms.

AGENTS.md:
- run_agent.py "~12k LOC" → "~14k LOC as of 2026-05-03" (actual 14,097)
- cli.py     "~11k LOC" → "~12k LOC as of 2026-05-03" (actual 12,043)
- tools/environments/ list now lists all 7 user-selectable terminal backends
  in canonical order, matching tools/terminal_tool.py:2214-2215
- gateway/platforms/ list adds yuanbao and wecom_callback; the 19 names
  match the user-facing list at website/docs/integrations/index.md
- plugins/ tree now mentions plugins/platforms/ (irc, teams)
- tests/ snapshot "~15k tests across ~700 files as of Apr 2026" →
  "~19k tests across ~890 files as of 2026-05-03"

User-facing count claims:
- hermes_cli/tips.py:195 — "19 platforms" → "21 messaging platforms" with
  IRC and Microsoft Teams added to the named list
- website/docs/index.md:49 — "6 terminal backends" → "7 terminal backends:
  ..., Vercel Sandbox" (also corrected by PR #19044; same edit content)
- website/docs/index.md:50 — "15+ platforms from one gateway" → "21+ messaging
  platforms (19 in the gateway, plus IRC and Microsoft Teams via plugins)"
- website/docs/integrations/index.md:83-85 — "15+ messaging platforms" → "19+",
  added yuanbao to the linked list. The surrounding text scopes it to "configured
  through the same gateway subsystem", so plugin platforms (IRC, Teams) are
  intentionally not in this list
- website/scripts/generate-llms-txt.py:205 — "15+ platforms" → "21+ messaging
  platforms — 19 native to the gateway plus IRC and Microsoft Teams via plugins"

LOC and date stamps follow the existing AGENTS.md "as of <date>" convention
(line 56 already used this pattern). Source of truth for the gateway count is
gateway/config.py:130-148 (PlatformID enum); plugin platforms live in
plugins/platforms/.

Out of scope:
- RELEASE_v0.9.0.md historical "16 platforms" claim (immutable history)
- userStories.json verbatim user quotes
- Programmatic count generation from gateway/config.py + plugin manifests
  is a worthwhile build-system change but separate from these content fixes

* docs(skills): explain restoring bundled skills

* docs(docker): add section on connecting to local inference servers (vLLM, Ollama)

Adds a comprehensive guide for connecting Dockerized Hermes to local
inference servers like vLLM and Ollama, covering:
- Docker Compose networking (recommended)
- Standalone Docker run with host.docker.internal / --network host
- Connectivity verification steps
- Ollama-specific example

Closes #12308

* docs(docker): document API_SERVER_* env vars for exposing the OpenAI-compatible endpoint

Salvage of #11758. The PR's original diff was stale (the Docker Compose section on main has been heavily refactored — dashboard is now an embedded side-process, not a separate service), so the useful bit (API server env var requirements) is applied as a note on the basic `docker run` example.

Co-authored-by: xiangyong <xiangyong@zspace.cn>

* chore: AUTHOR_MAP entry for CES4751

* docs(discord): fix Server Members Intent + SSRC-mapping drift; add /voice join slash Choice

Salvage of #11350. Kept:
- Code: add an explicit /voice join Choice in the slash UI (runner accepts both 'join' and 'channel' but only 'channel' was in autocomplete).
- Docs: Server Members Intent is conditional (only needed if DISCORD_ALLOWED_USERS contains usernames); SSRC → user_id mapping uses the voice websocket SPEAKING opcode, not the Members intent.

Dropped from the original PR:
- HERMES_DISCORD_VOICE_PACKET_DUMP — this env var doesn't exist on main (it was in a different PR that isn't merged).
- DISCORD_PROXY docs — already documented on current main.
- DISCORD_ALLOW_MENTION_* docs — already on main.
- "barge-in mode" rewrite — current main actually does pause the listener during TTS (VoiceReceiver.pause() at discord.py:192); there is no barge_in_guard/barge_in_rms on main.

Co-authored-by: Michel Belleau <michel.belleau@malaiwah.com>

* docs(skills): modernize Obsidian file workflows

* chore: AUTHOR_MAP entry for counterposition

* docs(kanban): document handoff evidence metadata

* chore: AUTHOR_MAP entry for Fearvox

* docs: clarify Telegram group chat troubleshooting

* docs(codex): clarify OAuth auth prerequisite

* docs(voice): add Doubao speech integration examples (TTS + STT)

* chore: AUTHOR_MAP entry for Hypnus-Yuan

* docs(faq): use messaging extra for gateway deps

* chore: AUTHOR_MAP entry for xsfX20

* fix(kanban): unify failure counter across spawn/timeout/crash outcomes (#20410)

The dispatcher's circuit breaker only protected against spawn-side
failures (profile missing, workspace mount error, exec failure).
Workers that successfully spawned but then timed out or crashed
re-queued to ``ready`` with no counter increment, so the next tick
re-spawned them — loops forever until someone noticed. Reported
externally on Twitter (Forbidden Seeds) and confirmed by walking the
kernel: ``enforce_max_runtime`` flipped the task back to ready, emitted
a ``timed_out`` event, and never touched ``spawn_failures``; same for
``detect_crashed_workers``.

Fix: unify the counter across all non-success outcomes.

Schema
------
* ``tasks.spawn_failures`` → ``tasks.consecutive_failures``
* ``tasks.last_spawn_error`` → ``tasks.last_failure_error``
* Migration renames the columns in-place on existing DBs (``ALTER
  TABLE RENAME COLUMN`` — SQLite >= 3.25) so historical counter
  values are preserved. Row mappers fall through to the legacy names
  if both column renames and a migration somehow got out of sync.

Counter lifecycle
-----------------
New helper ``_record_task_failure(conn, task_id, error, *, outcome,
release_claim, end_run, event_payload_extra)`` is the single point
every non-success outcome funnels through:

* ``spawn_failed``  → ``_record_spawn_failure`` (kept as alias)
  calls it with ``release_claim=True, end_run=True`` — transitions
  running→ready, clears claim, closes run.
* ``timed_out`` → ``enforce_max_runtime`` already does the status
  transition + run close + event emission, then calls
  ``_record_task_failure`` with ``release_claim=False, end_run=False``
  just to bump the counter (and trip the breaker if needed).
* ``crashed`` → ``detect_crashed_workers`` same pattern, but the
  counter increment runs after the main write_txn closes (SQLite
  doesn't nest write transactions).

If the counter hits the breaker threshold (``DEFAULT_FAILURE_LIMIT=5``,
same as before), the task transitions to ``blocked`` with a ``gave_up``
event on top of whatever outcome-specific event was already emitted.

Reset semantics changed: the counter now clears only on successful
``complete_task`` (and operator ``reclaim_task`` — an explicit "I've
looked at this, try again with a fresh budget"). Previously
``_clear_spawn_failures`` ran on every successful spawn, which would
have wiped the counter before a timeout could accumulate past threshold
— exactly the loop this fix prevents.

Diagnostics
-----------
* ``_rule_repeated_spawn_failures`` → ``_rule_repeated_failures``. Now
  fires regardless of which outcome is at fault. Classifies the most
  recent failure (spawn_failed / timed_out / crashed) from the run
  history so the title ("Agent timeout x3", "Agent crash x4", "Agent
  spawn x5") and suggested action (``doctor`` for spawn, ``log`` for
  timeout/crash) stay outcome-specific without N duplicate rules.
* ``_rule_repeated_crashes`` kept as a narrower early-warning at
  threshold 2 (vs 3 for the unified rule), but now suppresses itself
  when the unified rule would also fire — avoids double-flagging.
* Diagnostic ``data`` payload now carries
  ``{consecutive_failures, most_recent_outcome, last_error}`` instead
  of spawn-specific keys.

CLI
---
* ``Task.consecutive_failures`` / ``Task.last_failure_error`` are the
  public fields now. Existing callers that referenced the old names
  get migrated (tests updated in this commit).
* Backward-compat: ``DEFAULT_SPAWN_FAILURE_LIMIT``,
  ``_clear_spawn_failures``, ``_record_spawn_failure`` stay as aliases.

Tests
-----
* 6 new kernel tests: timeout increments counter, 3 consecutive
  timeouts trip the breaker (was the reported gap), crash increments
  counter, reclaim clears counter, completion clears counter, spawn
  success does NOT clear counter.
* Diagnostic tests: updated ``repeated_spawn_failures`` cases to use
  the new kind name and add a timeout-loop test.
* Dashboard API test: spawn_failures column update → consecutive_failures.

389/389 kanban-suite tests pass.

Live verification
-----------------
Seeded 4 tasks in an isolated HERMES_HOME: 3 timeouts, 4 crashes,
2-spawn-failed + 2-timed-out, and a task that had prior failures but
completed successfully. Board correctly shows "!! 3 tasks need
attention" (the successful one has no badge because the counter
reset). Drawer for the timeout-loop task renders "Agent timeout x3"
with most_recent_outcome=timed_out and the "Check logs" suggested
action (not the spawn-flavoured "Verify profile"). The successful
task has zero diagnostics.

Closes the Forbidden-Seeds-reported gap.

* docs(guides): add guide for running Hermes locally with Ollama

Step-by-step guide covering Ollama installation, model selection,
Hermes configuration, speed optimization, and optional gateway bot
setup — all running on local hardware with zero API cost.

Includes hardware requirements, model comparison table with tool-call
support status, context window tuning, GPU offloading tips, fallback
provider setup, troubleshooting, and cost comparison.

* chore: AUTHOR_MAP entry for binhnt92

* docs: add Open WebUI bootstrap script

* chore: AUTHOR_MAP entry for acesjohnny

* docs(browser): document WSL-to-Windows Chrome MCP bridge

* chore: AUTHOR_MAP entry for liu-collab

* docs(i18n): add zh-Hans Tool Gateway, image gen, and Windows WSL guide

Made-with: Cursor

* docs: add Chinese (zh-CN) README translation

Closes #12954

- Add README.zh-CN.md with complete Simplified Chinese translation
- Add language switcher badge in README.md linking to Chinese version
- Add language switcher badge in README.zh-CN.md linking to English version

* chore: AUTHOR_MAP entry for zhanggttry

* docs: update VS Code setup instructions for ACP Client integration

* chore: AUTHOR_MAP entry for formulahendry

* test(kanban): cover metadata handoff round-trip

* feat(gateway): respect kanban.max_spawn config to limit concurrent tasks

The dispatch_once function already accepts a max_spawn parameter but the
gateway was calling it without passing any value, effectively ignoring
the configuration. This change reads kanban.max_spawn from config.yaml
and passes it through, allowing users to limit concurrent kanban tasks.

This prevents resource exhaustion scenarios where kanban dispatcher
spawns too many parallel workers on constrained hardware.

* guard kanban worker lifecycle by run id

* chore(release): AUTHOR_MAP entries for momowind and misery-hl

* feat(hindsight): probe API for update_mode='append' support, dedupe across processes

Mirrors the pattern already shipping in hindsight-integrations/openclaw:
probe `<api_url>/version` once per process, gate on Hindsight ≥ 0.5.0.
When supported, retains use a stable session-scoped `document_id`
(`session_id`) plus `update_mode='append'` so cross-process retains for
the same session merge into one document instead of producing
N-different-process-stamped duplicates. When unsupported (or probe
fails), fall back to the existing per-process unique
`f"{session_id}-{start_ts}"` document_id with no `update_mode` — the
resume-overwrite fix (#6654) keeps working unchanged on legacy servers.

Closes the dedup half of #20115. The proposed `document_id_strategy`
config knob isn't needed: auto-detection via the same /version probe
the OpenClaw plugin already uses gives the same outcome with no extra
config burden, and the choice is purely a function of what the server
can do.

Plumbing
--------
- Module-level helpers (`_meets_minimum_version`, `_fetch_hindsight_api_version`,
  `_check_api_supports_update_mode_append`) cache the result per api_url
  so every provider in the process gets one /version round-trip.
- One-time WARN logged when the API is older than 0.5.0, telling the
  user to upgrade for cross-session deduplication.
- New instance helper `_resolve_retain_target(fallback_doc_id)` returns
  `(document_id, update_mode)` based on cached capability. Wired into
  `sync_turn` and the `on_session_switch` flush path.
- For local_embedded mode, the probe URL is taken from the running
  client (`client.url`) so we hit the actual daemon port rather than
  the configured default.
- `update_mode` is set on the per-item dict; `aretain_batch` already
  threads `item['update_mode']` into the API call.

Tests
-----
- `TestUpdateModeAppendCapability` (5 cases): legacy fallback, modern
  stable+append, per-url cache, one-time warn, flush-on-switch resolves
  against the OLD session.
- Existing `_make_hindsight_provider` factory in the manager-side test
  file extended to seed `_mode`/`_api_url`/`_api_key`/`_client` and stub
  `_resolve_retain_target` so the bypass-init pattern keeps working.

E2E verified against installed `~/.hermes/hermes-agent`:
- Legacy probe (unreachable host) → `legacy-session-<ts>` doc_id,
  no `update_mode`.
- Modern probe (live local_embedded 0.5.6 daemon) → stable
  `modern-session` doc_id + `update_mode='append'`.
- `test_hermes_embedded_smoke.py` passes (90s).

* fix(api_server): SSE token batching + error handling for Open WebUI performance

Reduces SSE event rate ~500/turn → ~20/turn via 50ms text-delta batching in
_dispatch(), which eliminates markdown re-render storms on Open WebUI. Also:

- Trim tool_call.arguments in the response.completed event to 100KB
  (prevents silent hangs on 848KB+ single-line SSE events).
- Catch-all exception handlers in _write_sse_responses() + _write_sse_chat_completion()
  emit a proper error chunk instead of TransferEncodingError from incomplete
  chunked encoding when the agent crashes mid-stream.
- MAX_REQUEST_BYTES 1MB → 10MB; pass client_max_size to aiohttp Application to
  avoid silent 400s on truncated request bodies for long conversations.

Salvage of #17552 (api_server portion only). The contrib/openwebui-filter/
payload from that PR — Open WebUI Filter Function + benchmark writeup — is
a client-side user-installable add-on and doesn't need to live in the repo;
dropped here. Closes #17537.

Co-authored-by: bogerman1 <93757150+bogerman1@users.noreply.github.com>

* chore: AUTHOR_MAP entry for bogerman1

* feat(i18n): add French (fr) locale support

- Add fr.yaml with French translations for approval prompts and gateway messages
- Register 'fr' in SUPPORTED_LANGUAGES
- Add French aliases: french, français, fr-fr, fr-be, fr-ca, fr-ch
- Update locale sync comment in en.yaml

* feat(i18n): add Ukrainian locale

* chore: AUTHOR_MAP entry for olisikh

* arcee temperature + compression

* test(arcee): cover Trinity Large Thinking temperature + compression overrides

Salvage follow-up for PR #20344:
- AUTHOR_MAP entry for rob-maron (required by CI)
- 17 parametrized tests covering _is_arcee_trinity_thinking,
  _fixed_temperature_for_model Trinity override, and
  _compression_threshold_for_model, including sibling-model negatives
  (trinity-large-preview, trinity-mini) and the OpenRouter slug form.

* fix(doctor): report Kanban worker tools as runtime-gated

* fix(kanban): accept created_cards linked as child of completing task

Widens _verify_created_cards to also accept ids that are children of the
completing task in task_links. Previously we only accepted cards where
created_by matched the completing task's assignee, which was too strict
for legitimate orchestrator flows: a specifier creates a card (so
created_by=specifier, not worker), then a worker picks it up and passes
parents=[current_task] to kanban_create. The explicit link proves the
relationship and should be trusted.

Salvaged from #20022 @LeonSGP43 (full PR superseded by #20232 +
this patch; the linked-children relaxation was the portable
improvement).

* fix(kanban): measure max runtime from current run

* test(kanban): backdate task_runs.started_at alongside tasks.started_at

After #19473 landed (enforce_max_runtime reads from task_runs.started_at
rather than tasks.started_at), a regression test added earlier still
only backdated the tasks column. Backdate both so the test is robust
regardless of which column the enforcer reads from.

* fix(kanban): prevent child task dispatch when parent is not done

Add parent dependency guard to _set_status_direct so dragging
a task to the ready column is rejected (409) when its parents
are not all done. Previously the guard only existed in
recompute_ready, allowing direct status writes via the
dashboard API to bypass the dependency engine.

Root cause: after reclaiming stale workers, both T3 and T4
were set to ready via dashboard status writes in quick
succession, causing the writer to be spawned while the analyst
was blocked — upstream work wasn't done yet.

* feat(kanban): surface task_runs.summary on dashboard cards + ``kanban show``

The kanban-worker skill (built into the gateway dispatcher's spawn
prompt) instructs every worker to hand off via
``kanban_complete(summary=..., metadata=...)``. That writes the summary
onto the closing ``task_runs`` row, NOT onto ``tasks.result`` — the
latter is left NULL unless the caller passes ``result=`` explicitly.

Result: a glance at the dashboard or ``hermes kanban show <id>`` shows
a blank "Result:" section even when the worker did real work, which
on 2026-05-05 caused a Mac false-alarm ("Hermes did nothing") on a
task that had a 10-line completion summary on its run.

This patch surfaces the latest non-null run summary as
``latest_summary`` so the worker's actual handoff lands in front of
operators.

* New helpers ``kanban_db.latest_summary(conn, task_id)`` and
  ``kanban_db.latest_summaries(conn, task_ids)``. The batch variant
  uses a single window-function SELECT so the dashboard board endpoint
  doesn't pay an N+1 cost on multi-hundred-task boards.
* CLI ``hermes kanban show <id>`` prints a "Latest summary:" block
  when ``tasks.result`` is empty but a run has produced a summary
  (the existing "Result:" section still wins when populated, so the
  back-compat path for hand-edited results is untouched). JSON output
  gains a top-level ``latest_summary`` field.
* Dashboard ``/board`` and ``/tasks/{id}`` now include a
  ``latest_summary`` field on every task. Cards on /board carry a
  200-character preview (cheap to render, plenty for "what did this
  worker do?" at a glance); the drawer/detail endpoint returns the
  full summary.
* Five new tests cover: empty-runs case, post-complete surface,
  newest-of-multiple selection, empty-string skip, batch with
  missing tasks + empty input.

Smoke-tested locally against the live profile DB on the three
acceptance-criterion targets (t_f08fef91 cron-hygiene-audit,
t_007b7f1c EMA-analysis, t_05746fa4 self-assessment) — all three now
return their populated summaries via both ``latest_summary`` and
``latest_summaries``.

Test plan: 255/255 kanban tests pass + 91/91 dashboard plugin tests
pass. No regression on tasks where ``tasks.result`` is explicitly
populated (the existing "Result:" branch is preserved).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(kanban): wire dependency selects

* chore(release): AUTHOR_MAP entries for suncokret12 and mioimotoai-lgtm

* feat(i18n): add Turkish (tr) locale

- Add locales/tr.yaml with Turkish translations for all approval.* and gateway.* keys
- Register 'tr' in SUPPORTED_LANGUAGES
- Add Turkish aliases: turkish, türkçe, tr-tr

* fix: add Turkish locale references in config, tests, and docs

- hermes_cli/config.py: add tr to supported languages comment
- locales/en.yaml: add tr to locale file list comment
- tests/agent/test_i18n.py: add Turkish alias tests + explicit lang test
- website/docs/user-guide/configuration.md: add tr to supported values

* docs: document custom model aliases for /model command (#20475)

User-defined model aliases (config.yaml model_aliases: and
model.aliases.*) have worked since early versions but were entirely
undocumented. Add a dedicated 'Custom model aliases' section to
slash-commands.md covering both YAML config formats and the
'hermes config set' shell form, mirror a shorter version into the
configuring-models 'Alternative methods' section, and cross-link from
the two /model table rows.

Flagged by @weehowe on Twitter — he wasn't aware the feature existed.

* feat(models): add deepseek/deepseek-v4-pro to OpenRouter + Nous Portal curated lists (#20495)

Endpoint re-tested over 6 conversational turns (9 API calls, 3 tool calls)
and an 8-request burst — no rate limits, no errors, ~2-3s latency. The
historical rate-limit issues that caused its removal are gone.

- hermes_cli/models.py: add to OPENROUTER_MODELS and _PROVIDER_MODELS['nous']
- website/static/api/model-catalog.json: regenerated via build_model_catalog.py

* feat(models): add x-ai/grok-4.3 to OpenRouter + Nous Portal curated lists (#20497)

Endpoint validated over 6 conversational turns with tool calls (9 API
calls, 3 tool calls, 0 failures) and an 8-request burst (8/8 ok,
0 rate limits). Latency ~5-10s/call — slower than grok-4.20 but
expected for a reasoning model.

- hermes_cli/models.py: add to OPENROUTER_MODELS and _PROVIDER_MODELS['nous']
- website/static/api/model-catalog.json: regenerated

* fix: salvage batch — compaction guidance, memory authority, cache eviction after compression

- Fix /compact → /compress in context-overflow tips (closes #20020)
- Evict cached agent after session hygiene and /compress so system
  prompt refreshes with current SOUL.md, memory, and skills
- Restore memory authority across compaction: change 'informational
  background data' to 'authoritative reference data' in memory block
  and SUMMARY_PREFIX, with backward-compatible regex

Based on:
- PR #20027 by @LeonSGP43
- PR #18767 by @MacroAnarchy
- PR #17380 by @vominh1919

PR #17121 boundary marker fix already merged to main (2eef395e1).
PR #9262 user-message anchoring already on main via _ensure_last_user_message_in_tail().

* feat(browser): add Lightpanda engine support with automatic Chrome fallback

Add Lightpanda as an optional browser engine for local mode.
Lightpanda is a headless browser built from scratch in Zig -- faster
navigation than Chrome with significantly less memory.

One config line to enable:
  browser:
    engine: lightpanda

New functions in browser_tool.py:
- _get_browser_engine() -- config/env reader with validation + caching
- _should_inject_engine() -- only inject in local non-cloud mode
- _needs_lightpanda_fallback() -- detect empty/failed LP results
- _chrome_fallback_screenshot() -- temporary Chrome session for screenshots
- Engine injection in _run_browser_command (--engine flag)
- browser_vision pre-routes screenshots to Chrome when engine=lightpanda

Config:
- browser.engine in DEFAULT_CONFIG (auto/lightpanda/chrome)
- AGENT_BROWSER_ENGINE in OPTIONAL_ENV_VARS
- /browser status shows engine info in local mode

Rebased from PR #7144 onto current main. All existing code preserved --
pure additions only (+520/-2).

25 new tests + 81 total browser tests pass (0 failures).

* fix(browser): surface Lightpanda Chrome fallback warnings

* feat(tui): collapsible sections in startup banner (skills, system prompt, MCP)

The TUI SessionPanel banner now uses collapsible \u25b8/\u25be toggle
sections matching the existing Chevron convention used for runtime
agent details. Skills, system prompt, and MCP server lists are
collapsed by default; tools remain expanded as the most actionable
info.

- tui_gateway/server.py: _session_info() now passes agent._cached_system_prompt
  through to the TUI frontend
- ui-tui/src/types.ts: added system_prompt?: string to SessionInfo
- ui-tui/src/components/branding.tsx: rewrote SessionPanel with
  CollapseToggle helper + per-section useState toggles

Default states: tools=open, skills=collapsed, system=collapsed,
mcp=collapsed. Clicking any \u25b8/\u25be header toggles that section.

* fix(tui): collapse long system messages in transcript with expand toggle

System messages over 400 chars (system prompt, AGENTS.md, etc.) now
render as a collapsed \u25b8/\u25be toggle line in the transcript, matching
the Chevron convention used for runtime details. The summary shows
the first line + char count; clicking expands to full content.

* fix(browser): tighten Lightpanda fallback edge cases

* fix(gateway): preserve model picker current context

* fix(update): drop pip --quiet so slow installs don't look hung (#20679)

On Termux/Android aarch64 (and other platforms without prebuilt wheels
for some optional extras), 'pip install -e .[all]' compiles C/Rust
extensions from source. This can run for several minutes with zero
network activity and — with --quiet — zero stdout. Users report
'hermes update hangs at Updating Python dependencies', Ctrl+C it, then
re-run and see 'up to date' (because git pull already succeeded and the
pip step was still working when they interrupted).

Pip's default output is proportional to actual work (one line per
Collecting / Building wheel for X / Installing), so removing --quiet
costs nothing on fast hardware and prevents the false-hang interrupt
loop on slow hardware.

Reported via Discord on Termux/Android. Supersedes #20466 which
misdiagnosed the hang as PYTHONPATH shadowing (install.sh doesn't run
during 'hermes update', and terminal() doesn't inherit PYTHONPATH).

* fix(cli): guard logger.debug in signal handler (#13710 regression) (#20673)

CPython's logging module is not reentrant-safe.  `Logger.isEnabledFor`
caches level results in `Logger._cache`; under shutdown races the cache
can be cleared (`Logger._clear_cache`, triggered by logging config changes
from another thread) or mid-mutation when a signal fires, raising
`KeyError: <level_int>` (e.g. `KeyError: 10` for DEBUG) inside the signal
handler.

When that happens, the KeyError escapes before the `raise KeyboardInterrupt()`
on the next line can fire, which bypasses prompt_toolkit's normal interrupt
unwind and surfaces as the EIO cascade originally reported in #13710.

Issue #13710 shipped two defenses (asyncio exception handler + outer
`except (KeyError, OSError)` with EIO suppression) that cover the EIO
unwind path.  This patch closes the remaining escape hatch: the
`logger.debug` call at the top of `_signal_handler` itself.  Wrap it in a
bare `try/except Exception: pass` so logging can never raise through a
signal handler.

Observed in the wild: debug report on 0.12.0 (commit 8163d371) shows the
exact stack — KeyError: 10 at logging/__init__.py:1742 inside the
signal handler's `logger.debug`, followed by the EIO cascade from
prompt_toolkit's emergency flush.

Tests: adds `TestSignalHandlerLoggingRace` to
`tests/hermes_cli/test_suppress_eio_on_interrupt.py` with 6 new cases:
- normal path still raises KeyboardInterrupt
- KeyError(10) from logger.debug does not escape
- any Exception from logger.debug is swallowed
- agent.interrupt still fires when logger.debug raises
- agent.interrupt raising also does not escape
- BaseException (SystemExit) is NOT swallowed — guard uses `except Exception`
  deliberately so real shutdown signals still propagate

Closes #13710 regression.

* fix: harden install.sh against inherited Python env leakage

* chore: AUTHOR_MAP entry for adybag14-cyber

* fix(ui): reduce status-line jitter while scrolling

* fix(tui): stabilize FaceTicker elapsed width to prevent composer drift

* fix(tui): restore gap before duration when verb segment is hidden

The verb-padding change dropped the leading space in durationSegment on
the assumption that the verb's trailing pad always supplies the gap. But
the unicode spinner style sets showVerb=false, making verbSegment an
empty string — in that mode the output would become `{frame}· {duration}`
with no separator. Add the space back; harmless when the verb segment
is shown (its trailing pad still provides the gap).

* chore(release): map liuguangyong@hellobike -> liuguangyong93

* fix(kanban): reset code element background inside board

The Nous DS globals.css applies a global rule:
  code { background: var(--midground); color: var(--background); }

This paints an opaque cream/yellow fill on every <code> element,
which hides text in the kanban drawer's event-payload, run-meta,
and worker-log panes (all rendered as <code>).

Fix: scope a reset inside .hermes-kanban so <code> elements inherit
their parent's color and stay transparent.

* fix(cli): recover classic CLI output after resize

* feat(skills): add shop-app personal shopping assistant (optional) (#20702)

Port Shop.app's upstream SKILL.md (https://shop.app/SKILL.md) into
optional-skills/productivity/shop-app/ with Hermes-native adaptations:

- Proper Hermes frontmatter (name, description<=60 chars, version,
  author, license, prerequisites, metadata.hermes tags + related_skills
  + homepage + upstream)
- Swap Shop.app's bespoke 'message()' tool references for Hermes
  conventions: gateway adapters handle platform formatting, so the
  skill just writes markdown (no Telegram/WhatsApp/iMessage sections
  referencing a tool Hermes doesn't ship)
- Name Hermes tools where relevant: curl via 'terminal', HTML policy
  pages via 'web_extract', try-on via 'image_generate'
- Reframe session state as 'hold in your reasoning context for this
  conversation only' and forbid writing tokens to .env / disk — matches
  Hermes ephemeral-memory discipline
- Drop NO_REPLY convention (Shop-app-runtime specific)
- Trigger-first description so the skill loader picks it up when the
  user wants to search products, track orders, returns, or reorder

* feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709)

Replaces the per-directory shadow-repo design with a single shared shadow
git store at ~/.hermes/checkpoints/store/. Object DB is now deduplicated
across every working directory the agent has ever touched; a dozen
worktrees of the same project cost near-zero in additional disk.

Why
---
Pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/
grow to multi-GB on active machines:

1. Each working directory got its own full shadow git repo — no object
   dedup across projects or across worktrees of the same project.
2. _prune() was a documented no-op: max_snapshots only limited the
   /rollback listing. Loose objects accumulated forever.
3. Defaults: enabled=True, auto_prune=False — users paid the disk cost
   without ever asking for /rollback.

Field report on a single workstation: 847 MB across 47 shadow repos,
mostly redundant clones of the hermes-agent source tree.

Changes
-------
- tools/checkpoint_manager.py: full rewrite. Single bare store, per-project
  refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>),
  per-project metadata (store/projects/<hash>.json with workdir +
  created_at + last_touch). On first v2 init, any pre-v2 per-directory
  shadow repos are auto-migrated into legacy-<timestamp>/ so the new
  store starts clean. _prune() now actually rewrites the per-project ref
  to the last max_snapshots commits and runs git gc --prune=now. New
  _enforce_size_cap() drops oldest commits round-robin across projects
  when the store exceeds max_total_size_mb. _drop_oversize_from_index()
  filters any single file larger than max_file_size_mb out of the snapshot.
- hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI
  (status / list / prune / clear / clear-legacy) for managing the store
  outside a session.
- hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20,
  auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10.
  Tightened DEFAULT_EXCLUDES (added target/, *.so/*.dylib/*.dll,
  *.mp4/*.mov, *.zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.).
- run_agent.py / cli.py / gateway/run.py: thread the new kwargs through
  AIAgent and the startup auto_prune hooks.
- Tests rewritten to match v2 storage while keeping backwards-compat
  coverage for the pre-v2 prune path (per-directory shadow repos under
  base/ are still swept correctly for anyone mid-migration).
- Docs updated: user-guide/checkpoints-and-rollback.md explains the
  shared store, new defaults, migration, and the new CLI;
  reference/cli-commands.md documents 'hermes checkpoints'.

E2E validated
-------------
- Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/.
- Object dedup: two projects with an identical shared.py blob resolve to
  7 total objects in the store (v1 would have stored the blob twice).
- max_snapshots=3 actually enforced: after 6 commits, list shows 3.
- Orphan prune: deleting a project's workdir + 'hermes checkpoints prune
  --retention-days 0' removes its ref, index, and metadata; GC reclaims
  the objects.
- max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the
  tracked source code files.
- hermes checkpoints {status,prune,clear,clear-legacy} all work from the
  CLI without an agent running.

Breaking / migration
--------------------
No in-place data migration — legacy per-directory shadow repos are moved
into legacy-<timestamp>/ on first run. Old /rollback history is still
accessible by inspecting the archive with git; run
'hermes checkpoints clear-legacy' to reclaim the space when ready. Users
relying on /rollback must now set checkpoints.enabled=true (or pass
--checkpoints) explicitly.

* fix(cli): catch OSError in _resolve_attachment_path to prevent ENAMETOOLONG dropping long slash commands

When the user pastes a long slash command like \`/goal <long prose>\` into
\`hermes chat\`, the input flows into \`_detect_file_drop()\`, whose
\`starts_like_path\` prefilter accepts anything starting with \`/\` and
forwards it to \`_resolve_attachment_path()\`. That helper calls
\`Path.exists()\` which invokes \`os.stat()\`, which raises
\`OSError(errno=ENAMETOOLONG)\` — 63 on macOS, 36 on Linux — when the
candidate exceeds NAME_MAX (typically 255 bytes).

The OSError propagates up to the broad \`except Exception\` in
\`process_loop\` (cli.py:11798), gets logged at WARNING level, and the
user's input is silently dropped. From the user's POV the chat prompt
hangs — the only signal is in agent.log:

  WARNING cli: process_loop unhandled error (msg may be lost):
    [Errno 63] File name too long: "/goal Drive the space board..."

This affects any slash command with prose-length arguments — \`/goal\`
in particular but also \`/skill\`, \`/cron\`, custom user commands.

Fix: wrap the \`exists()\`/\`is_file()\` calls in try/except OSError so
structurally-invalid path candidates cleanly return None. The slash-
command dispatch path downstream (cli.py:11718) then handles the
input correctly.

Tests: two new regression cases in test_cli_file_drop.py cover the
original \`/goal\` reproducer and a synthetic long path. All 35 file-
drop tests pass.

Reproducer (without the fix):
  python -c "from cli import _detect_file_drop;
             _detect_file_drop('/goal ' + 'a'*300)"
  → OSError: [Errno 63] File name too long

* chore(release): map cleo@edaphic.xyz → curiouscleo

Follow-up to the salvaged fix for /goal ENAMETOOLONG drop — adds
AUTHOR_MAP entry so the release script resolves the commit author to
the correct GitHub user.

* docs(wsl2): expand Windows (WSL2) guide — filesystem, networking, services, pitfalls (#20748)

Replaces the 22-line stub with a ~320-line guide covering the parts of the
Windows/WSL2 split that specifically affect Hermes users:

- Why WSL2 (and not native Windows)
- Install: distro choice, WSL1→2, systemd via /etc/wsl.conf
- Filesystem boundary: /mnt/c vs \\wsl$, perf/perms/watchers/case,
  wslpath/wslview, CRLF + git core.autocrlf, clone-where guidance
- Networking in both directions:
  - WSL → Windows services: links to the canonical WSL2 Networking section
    in integrations/providers.md (mirrored mode, NAT + host IP, bind addr,
    firewall) instead of duplicating
  - Windows/LAN → Hermes in WSL: mirrored vs NAT, netsh portproxy one-liner,
    firewall rule, webhook tunneling pointer
- Long-running services: systemd gateway + Task Scheduler wsl.exe --exec
  'sleep infinity' to keep the VM alive at login
- GPU passthrough: NVIDIA works, AMD/Intel out of matrix
- Common pitfalls: connection refused, /mnt/c slowness, CRLF ^M,
  UNC warnings, post-sleep clock drift, mirrored-mode DNS with VPN,
  PATH, Defender scanning, VHDX disk reclaim

All internal links use site-absolute /docs/... form (matches the rest of
user-guide/); all seven link targets verified to exist.

* docs: pluggable surfaces coverage — model-provider guide, full plugin map, opt-in fix (#20749)

* docs(providers): add model-provider-plugin authoring guide + fix stale refs

New docs:
- website/docs/developer-guide/model-provider-plugin.md — full authoring
  guide (directory layout, minimal example, ProviderProfile fields,
  overridable hooks, user overrides, api_mode selection, auth types,
  testing, pip distribution)
- Wired into website/sidebars.ts under 'Extending'
- Cross-references added in:
  - guides/build-a-hermes-plugin.md (tip block)
  - developer-guide/adding-providers.md
  - developer-guide/provider-runtime.md

User guide:
- user-guide/features/plugins.md: Plugin types table grows from 3 to 4
  with 'Model providers' row

Stale comment cleanup (providers/*.py → plugins/model-providers/<name>/):
- hermes_cli/main.py:_is_profile_api_key_provider docstring
- hermes_cli/doctor.py:_build_apikey_providers_list docstring
- hermes_cli/auth.py: PROVIDER_REGISTRY + alias auto-extension comments
- hermes_cli/models.py: CANONICAL_PROVIDERS auto-extension comment

AGENTS.md:
- Project-structure tree: added plugins/model-providers/ row
- New section: 'Model-provider plugins' explaining discovery, override
  semantics, PluginManager integration, kind auto-coerce heuristic

Verified: docusaurus build succeeds, new page renders, all 3 cross-links
resolve. 347/347 targeted tests pass (tests/providers/,
tests/hermes_cli/test_plugins.py, tests/hermes_cli/test_runtime_provider_resolution.py,
tests/run_agent/test_provider_parity.py).

* docs(plugins): add 'pluggable interfaces at a glance' maps to plugins.md + build-a-hermes-plugin

Devs landing on either the user-guide plugin page or the build-a-plugin
guide now get an upfront table of every distinct pluggable surface with
a link to the right authoring doc. Previously they'd have to read the
full general-plugin guide to discover that model providers / platforms
/ memory / context engines are separate systems.

user-guide/features/plugins.md:
- New 'Pluggable interfaces — where to go for each' section below the
  existing…
RationallyPrime pushed a commit to RationallyPrime/hermes-agent that referenced this pull request May 8, 2026
PR NousResearch#12473 (merged 2026-04-19) added a new --deliver-only flag to
`hermes webhook subscribe` for zero-LLM direct delivery, but
website/docs/reference/cli-commands.md options table did not
reference it. Add the row so CLI users can discover the flag from
the reference page instead of having to read the source.
nickdlkk pushed a commit to nickdlkk/hermes-agent that referenced this pull request May 11, 2026
PR NousResearch#12473 (merged 2026-04-19) added a new --deliver-only flag to
`hermes webhook subscribe` for zero-LLM direct delivery, but
website/docs/reference/cli-commands.md options table did not
reference it. Add the row so CLI users can discover the flag from
the reference page instead of having to read the source.
rmulligan pushed a commit to rmulligan/hermes-agent that referenced this pull request May 11, 2026
PR NousResearch#12473 (merged 2026-04-19) added a new --deliver-only flag to
`hermes webhook subscribe` for zero-LLM direct delivery, but
website/docs/reference/cli-commands.md options table did not
reference it. Add the row so CLI users can discover the flag from
the reference page instead of having to read the source.
JinyuID pushed a commit to JinyuID/hermes-agent that referenced this pull request May 11, 2026
PR NousResearch#12473 (merged 2026-04-19) added a new --deliver-only flag to
`hermes webhook subscribe` for zero-LLM direct delivery, but
website/docs/reference/cli-commands.md options table did not
reference it. Add the row so CLI users can discover the flag from
the reference page instead of having to read the source.
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
…ousResearch#12473)

External services can now push plain-text notifications to a user's chat
via the webhook adapter without invoking the agent. Set deliver_only=true
on a route and the rendered prompt template becomes the literal message
body — dispatched directly to the configured target (Telegram, Discord,
Slack, GitHub PR comment, etc.).

Reuses all existing webhook infrastructure: HMAC-SHA256 signature
validation, per-route rate limiting, idempotency cache, body-size limits,
template rendering with dot-notation, home-channel fallback. No new HTTP
server, no new auth scheme, no new port.

Use cases: Supabase/Firebase webhooks → user notifications, monitoring
alert forwarding, inter-agent pings, background job completion alerts.

Changes:
- gateway/platforms/webhook.py: new _direct_deliver() helper + early
  dispatch branch in _handle_webhook when deliver_only=true. Startup
  validation rejects deliver_only with deliver=log.
- hermes_cli/main.py + hermes_cli/webhook.go: --deliver-only flag on
  subscribe; list/show output marks direct-delivery routes.
- website/docs/user-guide/messaging/webhooks.md: new Direct Delivery
  Mode section with config example, CLI example, response codes.
- skills/devops/webhook-subscriptions/SKILL.md: document --deliver-only
  with use cases (bumped to v1.1.0).
- tests/gateway/test_webhook_deliver_only.py: 14 new tests covering
  agent bypass, template rendering, status codes, HMAC still enforced,
  idempotency still applies, rate limit still applies, startup
  validation, and direct-deliver dispatch.

Validation: 78 webhook tests pass (64 existing + 14 new). E2E verified
with real aiohttp server + real urllib POST — agent not invoked, target
adapter.send() called with rendered template, duplicate delivery_id
suppressed.

Closes the gap identified in PR NousResearch#12117 (thanks to @H1an1 / Antenna team)
without adding a second HTTP ingress server.
jsboige pushed a commit to jsboige/hermes-agent that referenced this pull request May 14, 2026
PR NousResearch#12473 (merged 2026-04-19) added a new --deliver-only flag to
`hermes webhook subscribe` for zero-LLM direct delivery, but
website/docs/reference/cli-commands.md options table did not
reference it. Add the row so CLI users can discover the flag from
the reference page instead of having to read the source.
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…ousResearch#12473)

External services can now push plain-text notifications to a user's chat
via the webhook adapter without invoking the agent. Set deliver_only=true
on a route and the rendered prompt template becomes the literal message
body — dispatched directly to the configured target (Telegram, Discord,
Slack, GitHub PR comment, etc.).

Reuses all existing webhook infrastructure: HMAC-SHA256 signature
validation, per-route rate limiting, idempotency cache, body-size limits,
template rendering with dot-notation, home-channel fallback. No new HTTP
server, no new auth scheme, no new port.

Use cases: Supabase/Firebase webhooks → user notifications, monitoring
alert forwarding, inter-agent pings, background job completion alerts.

Changes:
- gateway/platforms/webhook.py: new _direct_deliver() helper + early
  dispatch branch in _handle_webhook when deliver_only=true. Startup
  validation rejects deliver_only with deliver=log.
- hermes_cli/main.py + hermes_cli/webhook.go: --deliver-only flag on
  subscribe; list/show output marks direct-delivery routes.
- website/docs/user-guide/messaging/webhooks.md: new Direct Delivery
  Mode section with config example, CLI example, response codes.
- skills/devops/webhook-subscriptions/SKILL.md: document --deliver-only
  with use cases (bumped to v1.1.0).
- tests/gateway/test_webhook_deliver_only.py: 14 new tests covering
  agent bypass, template rendering, status codes, HMAC still enforced,
  idempotency still applies, rate limit still applies, startup
  validation, and direct-deliver dispatch.

Validation: 78 webhook tests pass (64 existing + 14 new). E2E verified
with real aiohttp server + real urllib POST — agent not invoked, target
adapter.send() called with rendered template, duplicate delivery_id
suppressed.

Closes the gap identified in PR NousResearch#12117 (thanks to @H1an1 / Antenna team)
without adding a second HTTP ingress server.
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
PR NousResearch#12473 (merged 2026-04-19) added a new --deliver-only flag to
`hermes webhook subscribe` for zero-LLM direct delivery, but
website/docs/reference/cli-commands.md options table did not
reference it. Add the row so CLI users can discover the flag from
the reference page instead of having to read the source.
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…ousResearch#12473)

External services can now push plain-text notifications to a user's chat
via the webhook adapter without invoking the agent. Set deliver_only=true
on a route and the rendered prompt template becomes the literal message
body — dispatched directly to the configured target (Telegram, Discord,
Slack, GitHub PR comment, etc.).

Reuses all existing webhook infrastructure: HMAC-SHA256 signature
validation, per-route rate limiting, idempotency cache, body-size limits,
template rendering with dot-notation, home-channel fallback. No new HTTP
server, no new auth scheme, no new port.

Use cases: Supabase/Firebase webhooks → user notifications, monitoring
alert forwarding, inter-agent pings, background job completion alerts.

Changes:
- gateway/platforms/webhook.py: new _direct_deliver() helper + early
  dispatch branch in _handle_webhook when deliver_only=true. Startup
  validation rejects deliver_only with deliver=log.
- hermes_cli/main.py + hermes_cli/webhook.go: --deliver-only flag on
  subscribe; list/show output marks direct-delivery routes.
- website/docs/user-guide/messaging/webhooks.md: new Direct Delivery
  Mode section with config example, CLI example, response codes.
- skills/devops/webhook-subscriptions/SKILL.md: document --deliver-only
  with use cases (bumped to v1.1.0).
- tests/gateway/test_webhook_deliver_only.py: 14 new tests covering
  agent bypass, template rendering, status codes, HMAC still enforced,
  idempotency still applies, rate limit still applies, startup
  validation, and direct-deliver dispatch.

Validation: 78 webhook tests pass (64 existing + 14 new). E2E verified
with real aiohttp server + real urllib POST — agent not invoked, target
adapter.send() called with rendered template, duplicate delivery_id
suppressed.

Closes the gap identified in PR NousResearch#12117 (thanks to @H1an1 / Antenna team)
without adding a second HTTP ingress server.
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
PR NousResearch#12473 (merged 2026-04-19) added a new --deliver-only flag to
`hermes webhook subscribe` for zero-LLM direct delivery, but
website/docs/reference/cli-commands.md options table did not
reference it. Add the row so CLI users can discover the flag from
the reference page instead of having to read the source.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant