Skip to content

feat(profiles): --no-skills flag for empty profile creation#20986

Merged
teknium1 merged 1 commit into
mainfrom
hermes/hermes-1871cfcd
May 7, 2026
Merged

feat(profiles): --no-skills flag for empty profile creation#20986
teknium1 merged 1 commit into
mainfrom
hermes/hermes-1871cfcd

Conversation

@teknium1

@teknium1 teknium1 commented May 7, 2026

Copy link
Copy Markdown
Contributor

Summary

Lets users create a profile with zero bundled skills via hermes profile create <name> --no-skills. A .no-bundled-skills marker file in the profile root keeps hermes update from re-seeding skills on every update cycle.

Requested by @hiut1u on Twitter — use case is orchestrator profiles and narrow-task profiles where the 100+ bundled skills just dilute attention and require repeated manual deletion.

Changes

  • hermes_cli/profiles.py: create_profile() gains no_skills: bool; writes .no-bundled-skills marker; mutually exclusive with --clone / --clone-all (cloning explicitly copies skills). New has_bundled_skills_opt_out() helper + NO_BUNDLED_SKILLS_MARKER constant. seed_profile_skills() short-circuits on opted-out profiles and returns {skipped_opt_out: True}.
  • hermes_cli/main.py: adds --no-skills flag to hermes profile create; hermes update's all-profile skill-sync loop displays "opted out (--no-skills)" for marked profiles instead of re-seeding.
  • hermes_cli/web_server.py: POST /api/profiles accepts no_skills: bool for dashboard parity.
  • tests/hermes_cli/test_profiles.py: 6 new tests (TestNoSkillsOptOut) cover marker write, mutex with --clone/--clone-all, seed opt-out, fresh-profile unaffected, delete-marker-re-enables-seeding.

Validation

Before After
New profile 100+ bundled skills auto-seeded, no way to skip --no-skills creates empty profile
After hermes update Skills re-seeded, user has to delete again Marker honored, orchestrator: opted out (--no-skills)
Opt back in n/a rm .no-bundled-skills → re-seeded on next update

Targeted test run: scripts/run_tests.sh tests/hermes_cli/test_profiles.py tests/hermes_cli/test_cmd_update.py tests/hermes_cli/test_web_server.py249 passed.

E2E verified: real python -m hermes_cli.main profile create orchestrator --no-skills writes marker, skills/ stays empty; then seed_profile_skills() across all profiles reports orchestrator: opted out while coder: +89 new.

Opt-back-in path: delete .no-bundled-skills from the profile root — next hermes update seeds normally (also covered by test).

Closes: community request from @hiut1u

Adds `hermes profile create <name> --no-skills` to create a profile with
zero bundled skills. Writes a `.no-bundled-skills` marker file in the
profile root so `hermes update`'s all-profile skill sync loop also skips
the profile — without the marker, every update would re-seed skills and
the user would have to delete them again.

Use case (from @hiut1u): orchestrator profiles and narrow-task profiles
don't need 100+ bundled skills polluting their system prompt.

- create_profile() gains a `no_skills` param, mutually exclusive with
  `--clone` / `--clone-all` (cloning explicitly copies skills).
- seed_profile_skills() no-ops on opted-out profiles and returns
  `{skipped_opt_out: True}` so callers can report cleanly.
- Web API (POST /api/profiles) accepts `no_skills: bool`.
- Delete `.no-bundled-skills` to opt back in — next `hermes update`
  re-seeds normally.

6 new tests in TestNoSkillsOptOut cover marker write, mutual exclusion
with clone, seed_profile_skills opt-out, fresh profile unaffected, and
delete-marker-re-enables-seeding.
@github-actions

github-actions Bot commented May 7, 2026

Copy link
Copy Markdown
Contributor

🔎 Lint report: hermes/hermes-1871cfcd vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 7448 on HEAD, 7447 on base (🆕 +1)

🆕 New issues (1):

Rule Count
unresolved-attribute 1
First entries
tests/hermes_cli/test_profiles.py:348: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `None` in union `dict[Unknown, Unknown] | None`

✅ Fixed issues: none

Unchanged: 3913 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@alt-glitch alt-glitch added type/feature New feature or request comp/cli CLI entry point, hermes_cli/, setup wizard P3 Low — cosmetic, nice to have labels May 7, 2026
Comment thread hermes_cli/profiles.py Dismissed
Comment thread hermes_cli/profiles.py Dismissed
@teknium1 teknium1 merged commit 51f9953 into main May 7, 2026
10 of 12 checks passed
@teknium1 teknium1 deleted the hermes/hermes-1871cfcd branch May 7, 2026 11:34
bot-ted added a commit to bot-ted/hermes-agent that referenced this pull request May 7, 2026
* feat(browser): add Lightpanda engine support with automatic Chrome fallback

Add Lightpanda as an optional browser engine for local mode.
Lightpanda is a headless browser built from scratch in Zig -- faster
navigation than Chrome with significantly less memory.

One config line to enable:
  browser:
    engine: lightpanda

New functions in browser_tool.py:
- _get_browser_engine() -- config/env reader with validation + caching
- _should_inject_engine() -- only inject in local non-cloud mode
- _needs_lightpanda_fallback() -- detect empty/failed LP results
- _chrome_fallback_screenshot() -- temporary Chrome session for screenshots
- Engine injection in _run_browser_command (--engine flag)
- browser_vision pre-routes screenshots to Chrome when engine=lightpanda

Config:
- browser.engine in DEFAULT_CONFIG (auto/lightpanda/chrome)
- AGENT_BROWSER_ENGINE in OPTIONAL_ENV_VARS
- /browser status shows engine info in local mode

Rebased from PR #7144 onto current main. All existing code preserved --
pure additions only (+520/-2).

25 new tests + 81 total browser tests pass (0 failures).

* fix(browser): surface Lightpanda Chrome fallback warnings

* feat(tui): collapsible sections in startup banner (skills, system prompt, MCP)

The TUI SessionPanel banner now uses collapsible \u25b8/\u25be toggle
sections matching the existing Chevron convention used for runtime
agent details. Skills, system prompt, and MCP server lists are
collapsed by default; tools remain expanded as the most actionable
info.

- tui_gateway/server.py: _session_info() now passes agent._cached_system_prompt
  through to the TUI frontend
- ui-tui/src/types.ts: added system_prompt?: string to SessionInfo
- ui-tui/src/components/branding.tsx: rewrote SessionPanel with
  CollapseToggle helper + per-section useState toggles

Default states: tools=open, skills=collapsed, system=collapsed,
mcp=collapsed. Clicking any \u25b8/\u25be header toggles that section.

* fix(tui): collapse long system messages in transcript with expand toggle

System messages over 400 chars (system prompt, AGENTS.md, etc.) now
render as a collapsed \u25b8/\u25be toggle line in the transcript, matching
the Chevron convention used for runtime details. The summary shows
the first line + char count; clicking expands to full content.

* fix(browser): tighten Lightpanda fallback edge cases

* fix(gateway): preserve model picker current context

* fix(update): drop pip --quiet so slow installs don't look hung (#20679)

On Termux/Android aarch64 (and other platforms without prebuilt wheels
for some optional extras), 'pip install -e .[all]' compiles C/Rust
extensions from source. This can run for several minutes with zero
network activity and — with --quiet — zero stdout. Users report
'hermes update hangs at Updating Python dependencies', Ctrl+C it, then
re-run and see 'up to date' (because git pull already succeeded and the
pip step was still working when they interrupted).

Pip's default output is proportional to actual work (one line per
Collecting / Building wheel for X / Installing), so removing --quiet
costs nothing on fast hardware and prevents the false-hang interrupt
loop on slow hardware.

Reported via Discord on Termux/Android. Supersedes #20466 which
misdiagnosed the hang as PYTHONPATH shadowing (install.sh doesn't run
during 'hermes update', and terminal() doesn't inherit PYTHONPATH).

* fix(cli): guard logger.debug in signal handler (#13710 regression) (#20673)

CPython's logging module is not reentrant-safe.  `Logger.isEnabledFor`
caches level results in `Logger._cache`; under shutdown races the cache
can be cleared (`Logger._clear_cache`, triggered by logging config changes
from another thread) or mid-mutation when a signal fires, raising
`KeyError: <level_int>` (e.g. `KeyError: 10` for DEBUG) inside the signal
handler.

When that happens, the KeyError escapes before the `raise KeyboardInterrupt()`
on the next line can fire, which bypasses prompt_toolkit's normal interrupt
unwind and surfaces as the EIO cascade originally reported in #13710.

Issue #13710 shipped two defenses (asyncio exception handler + outer
`except (KeyError, OSError)` with EIO suppression) that cover the EIO
unwind path.  This patch closes the remaining escape hatch: the
`logger.debug` call at the top of `_signal_handler` itself.  Wrap it in a
bare `try/except Exception: pass` so logging can never raise through a
signal handler.

Observed in the wild: debug report on 0.12.0 (commit 8163d371) shows the
exact stack — KeyError: 10 at logging/__init__.py:1742 inside the
signal handler's `logger.debug`, followed by the EIO cascade from
prompt_toolkit's emergency flush.

Tests: adds `TestSignalHandlerLoggingRace` to
`tests/hermes_cli/test_suppress_eio_on_interrupt.py` with 6 new cases:
- normal path still raises KeyboardInterrupt
- KeyError(10) from logger.debug does not escape
- any Exception from logger.debug is swallowed
- agent.interrupt still fires when logger.debug raises
- agent.interrupt raising also does not escape
- BaseException (SystemExit) is NOT swallowed — guard uses `except Exception`
  deliberately so real shutdown signals still propagate

Closes #13710 regression.

* fix: harden install.sh against inherited Python env leakage

* chore: AUTHOR_MAP entry for adybag14-cyber

* fix(ui): reduce status-line jitter while scrolling

* fix(tui): stabilize FaceTicker elapsed width to prevent composer drift

* fix(tui): restore gap before duration when verb segment is hidden

The verb-padding change dropped the leading space in durationSegment on
the assumption that the verb's trailing pad always supplies the gap. But
the unicode spinner style sets showVerb=false, making verbSegment an
empty string — in that mode the output would become `{frame}· {duration}`
with no separator. Add the space back; harmless when the verb segment
is shown (its trailing pad still provides the gap).

* chore(release): map liuguangyong@hellobike -> liuguangyong93

* fix(kanban): reset code element background inside board

The Nous DS globals.css applies a global rule:
  code { background: var(--midground); color: var(--background); }

This paints an opaque cream/yellow fill on every <code> element,
which hides text in the kanban drawer's event-payload, run-meta,
and worker-log panes (all rendered as <code>).

Fix: scope a reset inside .hermes-kanban so <code> elements inherit
their parent's color and stay transparent.

* fix(cli): recover classic CLI output after resize

* feat(skills): add shop-app personal shopping assistant (optional) (#20702)

Port Shop.app's upstream SKILL.md (https://shop.app/SKILL.md) into
optional-skills/productivity/shop-app/ with Hermes-native adaptations:

- Proper Hermes frontmatter (name, description<=60 chars, version,
  author, license, prerequisites, metadata.hermes tags + related_skills
  + homepage + upstream)
- Swap Shop.app's bespoke 'message()' tool references for Hermes
  conventions: gateway adapters handle platform formatting, so the
  skill just writes markdown (no Telegram/WhatsApp/iMessage sections
  referencing a tool Hermes doesn't ship)
- Name Hermes tools where relevant: curl via 'terminal', HTML policy
  pages via 'web_extract', try-on via 'image_generate'
- Reframe session state as 'hold in your reasoning context for this
  conversation only' and forbid writing tokens to .env / disk — matches
  Hermes ephemeral-memory discipline
- Drop NO_REPLY convention (Shop-app-runtime specific)
- Trigger-first description so the skill loader picks it up when the
  user wants to search products, track orders, returns, or reorder

* feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709)

Replaces the per-directory shadow-repo design with a single shared shadow
git store at ~/.hermes/checkpoints/store/. Object DB is now deduplicated
across every working directory the agent has ever touched; a dozen
worktrees of the same project cost near-zero in additional disk.

Why
---
Pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/
grow to multi-GB on active machines:

1. Each working directory got its own full shadow git repo — no object
   dedup across projects or across worktrees of the same project.
2. _prune() was a documented no-op: max_snapshots only limited the
   /rollback listing. Loose objects accumulated forever.
3. Defaults: enabled=True, auto_prune=False — users paid the disk cost
   without ever asking for /rollback.

Field report on a single workstation: 847 MB across 47 shadow repos,
mostly redundant clones of the hermes-agent source tree.

Changes
-------
- tools/checkpoint_manager.py: full rewrite. Single bare store, per-project
  refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>),
  per-project metadata (store/projects/<hash>.json with workdir +
  created_at + last_touch). On first v2 init, any pre-v2 per-directory
  shadow repos are auto-migrated into legacy-<timestamp>/ so the new
  store starts clean. _prune() now actually rewrites the per-project ref
  to the last max_snapshots commits and runs git gc --prune=now. New
  _enforce_size_cap() drops oldest commits round-robin across projects
  when the store exceeds max_total_size_mb. _drop_oversize_from_index()
  filters any single file larger than max_file_size_mb out of the snapshot.
- hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI
  (status / list / prune / clear / clear-legacy) for managing the store
  outside a session.
- hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20,
  auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10.
  Tightened DEFAULT_EXCLUDES (added target/, *.so/*.dylib/*.dll,
  *.mp4/*.mov, *.zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.).
- run_agent.py / cli.py / gateway/run.py: thread the new kwargs through
  AIAgent and the startup auto_prune hooks.
- Tests rewritten to match v2 storage while keeping backwards-compat
  coverage for the pre-v2 prune path (per-directory shadow repos under
  base/ are still swept correctly for anyone mid-migration).
- Docs updated: user-guide/checkpoints-and-rollback.md explains the
  shared store, new defaults, migration, and the new CLI;
  reference/cli-commands.md documents 'hermes checkpoints'.

E2E validated
-------------
- Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/.
- Object dedup: two projects with an identical shared.py blob resolve to
  7 total objects in the store (v1 would have stored the blob twice).
- max_snapshots=3 actually enforced: after 6 commits, list shows 3.
- Orphan prune: deleting a project's workdir + 'hermes checkpoints prune
  --retention-days 0' removes its ref, index, and metadata; GC reclaims
  the objects.
- max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the
  tracked source code files.
- hermes checkpoints {status,prune,clear,clear-legacy} all work from the
  CLI without an agent running.

Breaking / migration
--------------------
No in-place data migration — legacy per-directory shadow repos are moved
into legacy-<timestamp>/ on first run. Old /rollback history is still
accessible by inspecting the archive with git; run
'hermes checkpoints clear-legacy' to reclaim the space when ready. Users
relying on /rollback must now set checkpoints.enabled=true (or pass
--checkpoints) explicitly.

* fix(cli): catch OSError in _resolve_attachment_path to prevent ENAMETOOLONG dropping long slash commands

When the user pastes a long slash command like \`/goal <long prose>\` into
\`hermes chat\`, the input flows into \`_detect_file_drop()\`, whose
\`starts_like_path\` prefilter accepts anything starting with \`/\` and
forwards it to \`_resolve_attachment_path()\`. That helper calls
\`Path.exists()\` which invokes \`os.stat()\`, which raises
\`OSError(errno=ENAMETOOLONG)\` — 63 on macOS, 36 on Linux — when the
candidate exceeds NAME_MAX (typically 255 bytes).

The OSError propagates up to the broad \`except Exception\` in
\`process_loop\` (cli.py:11798), gets logged at WARNING level, and the
user's input is silently dropped. From the user's POV the chat prompt
hangs — the only signal is in agent.log:

  WARNING cli: process_loop unhandled error (msg may be lost):
    [Errno 63] File name too long: "/goal Drive the space board..."

This affects any slash command with prose-length arguments — \`/goal\`
in particular but also \`/skill\`, \`/cron\`, custom user commands.

Fix: wrap the \`exists()\`/\`is_file()\` calls in try/except OSError so
structurally-invalid path candidates cleanly return None. The slash-
command dispatch path downstream (cli.py:11718) then handles the
input correctly.

Tests: two new regression cases in test_cli_file_drop.py cover the
original \`/goal\` reproducer and a synthetic long path. All 35 file-
drop tests pass.

Reproducer (without the fix):
  python -c "from cli import _detect_file_drop;
             _detect_file_drop('/goal ' + 'a'*300)"
  → OSError: [Errno 63] File name too long

* chore(release): map cleo@edaphic.xyz → curiouscleo

Follow-up to the salvaged fix for /goal ENAMETOOLONG drop — adds
AUTHOR_MAP entry so the release script resolves the commit author to
the correct GitHub user.

* docs(wsl2): expand Windows (WSL2) guide — filesystem, networking, services, pitfalls (#20748)

Replaces the 22-line stub with a ~320-line guide covering the parts of the
Windows/WSL2 split that specifically affect Hermes users:

- Why WSL2 (and not native Windows)
- Install: distro choice, WSL1→2, systemd via /etc/wsl.conf
- Filesystem boundary: /mnt/c vs \\wsl$, perf/perms/watchers/case,
  wslpath/wslview, CRLF + git core.autocrlf, clone-where guidance
- Networking in both directions:
  - WSL → Windows services: links to the canonical WSL2 Networking section
    in integrations/providers.md (mirrored mode, NAT + host IP, bind addr,
    firewall) instead of duplicating
  - Windows/LAN → Hermes in WSL: mirrored vs NAT, netsh portproxy one-liner,
    firewall rule, webhook tunneling pointer
- Long-running services: systemd gateway + Task Scheduler wsl.exe --exec
  'sleep infinity' to keep the VM alive at login
- GPU passthrough: NVIDIA works, AMD/Intel out of matrix
- Common pitfalls: connection refused, /mnt/c slowness, CRLF ^M,
  UNC warnings, post-sleep clock drift, mirrored-mode DNS with VPN,
  PATH, Defender scanning, VHDX disk reclaim

All internal links use site-absolute /docs/... form (matches the rest of
user-guide/); all seven link targets verified to exist.

* docs: pluggable surfaces coverage — model-provider guide, full plugin map, opt-in fix (#20749)

* docs(providers): add model-provider-plugin authoring guide + fix stale refs

New docs:
- website/docs/developer-guide/model-provider-plugin.md — full authoring
  guide (directory layout, minimal example, ProviderProfile fields,
  overridable hooks, user overrides, api_mode selection, auth types,
  testing, pip distribution)
- Wired into website/sidebars.ts under 'Extending'
- Cross-references added in:
  - guides/build-a-hermes-plugin.md (tip block)
  - developer-guide/adding-providers.md
  - developer-guide/provider-runtime.md

User guide:
- user-guide/features/plugins.md: Plugin types table grows from 3 to 4
  with 'Model providers' row

Stale comment cleanup (providers/*.py → plugins/model-providers/<name>/):
- hermes_cli/main.py:_is_profile_api_key_provider docstring
- hermes_cli/doctor.py:_build_apikey_providers_list docstring
- hermes_cli/auth.py: PROVIDER_REGISTRY + alias auto-extension comments
- hermes_cli/models.py: CANONICAL_PROVIDERS auto-extension comment

AGENTS.md:
- Project-structure tree: added plugins/model-providers/ row
- New section: 'Model-provider plugins' explaining discovery, override
  semantics, PluginManager integration, kind auto-coerce heuristic

Verified: docusaurus build succeeds, new page renders, all 3 cross-links
resolve. 347/347 targeted tests pass (tests/providers/,
tests/hermes_cli/test_plugins.py, tests/hermes_cli/test_runtime_provider_resolution.py,
tests/run_agent/test_provider_parity.py).

* docs(plugins): add 'pluggable interfaces at a glance' maps to plugins.md + build-a-hermes-plugin

Devs landing on either the user-guide plugin page or the build-a-plugin
guide now get an upfront table of every distinct pluggable surface with
a link to the right authoring doc. Previously they'd have to read the
full general-plugin guide to discover that model providers / platforms
/ memory / context engines are separate systems.

user-guide/features/plugins.md:
- New 'Pluggable interfaces — where to go for each' section below the
  existing 4-kinds table
- 10 rows covering every register_* surface (tool, hook, slash command,
  CLI subcommand, skill, model provider, platform, memory, context
  engine, image-gen)
- Explicit note: TTS/STT are NOT plugin-extensible yet — documented
  with a pointer to the current config.yaml 'command providers' pattern
  and a note that register_tts_provider()/register_stt_provider() may
  come later

guides/build-a-hermes-plugin.md:
- New :::info 'Not sure which guide you need?' map at the top so devs
  see all pluggable interfaces before investing in this 737-line
  general-plugin walkthrough
- Existing bottom :::tip expanded to include platform adapters alongside
  model/memory/context plugins

Verified:
- All 8 cross-doc links in the new plugins.md table resolve in a
  docusaurus build (SUCCESS, no new broken links)
- TTS link corrected (features/voice → features/tts; latter exists)
- Pre-existing broken links/anchors (cron-script-only, llms.txt,
  adding-platform-adapters#step-by-step-checklist) are unchanged

* docs(plugins): correct TTS/STT pluggability \u2014 they ARE plugins (command-providers)

Previous commit incorrectly said TTS/STT 'aren't plugin-extensible'. They
are, via the config-driven command-provider pattern \u2014 any CLI that reads
text and writes audio (or vice versa for STT) is automatically a plugin
with zero Python. The tts.md docs cover this extensively and I missed it.

plugins.md:
- TTS row: 'Config-driven (not a Python plugin)', points at
  tts.md#custom-command-providers
- STT row: points at tts.md#voice-message-transcription-stt (STT docs
  live in tts.md despite the filename)
- Expanded note: TTS/STT use config-driven shell-command templates as
  their plugin surface (full tts.providers.<name> registry for TTS;
  HERMES_LOCAL_STT_COMMAND escape hatch for STT)
- Any CLI that reads/writes files is automatically a plugin \u2014 no Python
  register_* API needed
- Future register_tts_provider()/register_stt_provider() hooks mentioned
  as nice-to-have for SDK/streaming cases, not as the primary story

build-a-hermes-plugin.md:
- Same map update: TTS/STT rows explicit, footer note corrected

Verified:
- tts.md anchors (custom-command-providers, voice-message-transcription-stt)
  exist and resolve in docusaurus build (SUCCESS, no new broken links)

* docs(plugins): expand pluggable interfaces table with MCP / event hooks / shell hooks / skill taps

Broadened the scope beyond Python register_* hooks. Hermes has MULTIPLE
plugin-style extension surfaces; they're now all in one table instead of
being scattered across feature docs.

Added rows for:
- **MCP servers** — config.yaml mcp_servers.<name> auto-registers external
  tools from any MCP server. Huge extensibility surface, previously not
  linked from the plugin map.
- **Gateway event hooks** — drop HOOK.yaml + handler.py into
  ~/.hermes/hooks/<name>/ to fire on gateway:startup, session:*, agent:*,
  command:* events. Separate from Python plugin hooks.
- **Shell hooks** — hooks: block in config.yaml runs shell commands on
  events (notifications, auditing, etc.).
- **Skill sources (taps)** — hermes skills tap add <repo> to pull in new
  skill registries beyond the built-in sources.

Both docs updated:
- user-guide/features/plugins.md: table column renamed to 'How' (mixes
  Python API + config-driven + drop-in-dir surfaces accurately)
- guides/build-a-hermes-plugin.md: :::info map at top mirrors the new
  surfaces with a forward-link to the consolidated table

Note block rewritten: instead of singling out TTS/STT as the 'different
style' exception, now honestly describes that Hermes deliberately
supports three plugin styles — Python APIs, config-driven commands, and
drop-in manifest directories — and devs should pick the one that fits
their integration.

Not included (considered and rejected):
- Transport layer (register_transport) — internal, not user-facing
- Tool-call parsers — internal, VLLM phase-2 thing
- Cloud browser providers — hardcoded registry, not drop-in yet
- Terminal backends — hardcoded if/elif, not drop-in yet
- Skill sources (the ABC) — hardcoded list, only taps are user-extensible

Verified:
- All 5 new anchors resolve (gateway-event-hooks, shell-hooks, skills-hub,
  custom-command-providers, voice-message-transcription-stt)
- Docusaurus build SUCCESS, zero new broken links
- Same 3 pre-existing broken links on main (cron-script-only, llms.txt,
  adding-platform-adapters#step-by-step-checklist)

* docs(plugins): cover every pluggable surface in both the overview and how-to

Both plugins.md and build-a-hermes-plugin.md now cover every extension
surface end-to-end \u2014 general plugin APIs, specialized plugin types,
config-driven surfaces \u2014 with concrete authoring patterns for each.

plugins.md:
- 'What plugins can do' table grows from 9 rows (general ctx.register_*
  only) to 14 rows covering register_platform, register_image_gen_provider,
  register_context_engine, MemoryProvider subclass, register_provider
  (model). Each row links to its full authoring guide.
- New 'Plugin sub-categories' section under Plugin Discovery explains
  how plugins/platforms/, plugins/image_gen/, plugins/memory/,
  plugins/context_engine/, plugins/model-providers/ are routed to
  different loaders \u2014 PluginManager vs the per-category own-loader
  systems.
- Explicit mention of user-override semantics at
  ~/.hermes/plugins/model-providers/ and ~/.hermes/plugins/memory/.

build-a-hermes-plugin.md:
- New '## Specialized plugin types' section (5 sub-sections):
  - Model provider plugins \u2014 ProviderProfile + plugin.yaml example,
    auto-wiring summary, link to full guide
  - Platform plugins \u2014 BasePlatformAdapter + register_platform() skeleton
  - Memory provider plugins \u2014 MemoryProvider subclass example
  - Context engine plugins \u2014 ContextEngine subclass example
  - Image-generation backends \u2014 ImageGenProvider + kind: backend example
- New '## Non-Python extension surfaces' section (5 sub-sections):
  - MCP servers \u2014 config.yaml mcp_servers.<name> example
  - Gateway event hooks \u2014 HOOK.yaml + handler.py example
  - Shell hooks \u2014 hooks: block in config.yaml example
  - Skill sources (taps) \u2014 hermes skills tap add example
  - TTS / STT command templates \u2014 tts.providers.<name> with type: command
- Distribute via pip / NixOS promoted from ### to ## (they were orphaned
  after the reorganization)

Each specialized / non-Python section has a concrete, copy-pasteable
example plus a 'Full guide:' link to the authoritative doc. Devs arriving
at the build-a-hermes-plugin guide now see every extension surface at
their disposal, not just the general tool/hook/slash-command surface.

Verified:
- Docusaurus build SUCCESS, zero new broken links
- All new cross-links (developer-guide/model-provider-plugin,
  adding-platform-adapters, memory-provider-plugin, context-engine-plugin,
  user-guide/features/mcp, skills#skills-hub, hooks#gateway-event-hooks,
  hooks#shell-hooks, tts#custom-command-providers,
  tts#voice-message-transcription-stt) resolve
- Same 3 pre-existing broken links on main (cron-script-only, llms.txt,
  adding-platform-adapters#step-by-step-checklist)

* docs(plugins): fix opt-in inconsistency — not every plugin is gated

The 'Every plugin is disabled by default' statement was wrong. Several
plugin categories intentionally bypass plugins.enabled:

- Bundled platform plugins (IRC, Teams) auto-load so shipped gateway
  channels are available out of the box. Activation per channel is via
  gateway.platforms.<name>.enabled.
- Bundled backends (plugins/image_gen/*) auto-load so the default
  backend 'just works'. Selection via <category>.provider config.
- Memory providers are all discovered; one is active via memory.provider.
- Context engines are all discovered; one is active via context.engine.
- Model providers: all 33 discovered at first get_provider_profile();
  user picks via --provider / config.

The plugins.enabled allow-list specifically gates:
- Standalone plugins (general tools/hooks/slash commands)
- User-installed backends
- User-installed platforms (third-party gateway adapters)
- Pip entry-point backends

Which matches the actual code in hermes_cli/plugins.py:737 where the
bundled+backend/platform check bypasses the allow-list.

Rewrote '## Plugins are opt-in' to:
- Retitle to 'Plugins are opt-in (with a few exceptions)'
- Narrow opening claim to 'General plugins and user-installed backends
  are disabled by default'
- Added 'What the allow-list does NOT gate' subsection with a full
  table of which bypass the gate and how they're activated instead
- Fixed migration section wording (bundled platform/backend plugins
  never needed grandfathering)

Verified: docusaurus build SUCCESS, zero new broken links.

* change: enable ruff/ty

* feat(ci): add typecheck (warnings only in CI)

* feat(skills/linear): add Documents support + Python helper script (#20752)

* feat(skills/linear): add Documents support + Python helper script

The bundled Linear skill (PR #1230) covered issues, projects, teams, and
workflow states via curl. It had no coverage for Linear's Documents API,
so fetching an RFC/doc from a linear.app URL required hand-writing
GraphQL against an underdocumented schema.

Adds:
- Documents section in SKILL.md explaining slugId extraction from URLs,
  the contentState (markdown) vs contentState (ProseMirror) split, and
  four canonical curl examples (fetch by slugId, fetch by UUID, list
  recent, title-search).
- scripts/linear_api.py — stdlib-only Python CLI wrapping the most
  common operations (whoami, list-teams, list/get/search/create/update
  issues, add-comment, update-status, list/get/search documents, raw
  GraphQL passthrough). Zero deps, reads LINEAR_API_KEY from env.

Auth header quirk (personal key takes bare $LINEAR_API_KEY, no Bearer
prefix) is already documented in the skill.

Found during RFC review: the existing skill's lack of document support
forced falling back to the browser (which hit Linear's login wall).
Also fixes a schema gotcha — the Document field is `contentState`, not
`contentData` (which returns 400).

Tested end-to-end against the production API:
  python3 linear_api.py whoami
  python3 linear_api.py get-document 38359beef67c
Both return expected payloads.

* fix(skills/linear): point LINEAR_API_KEY setup to the correct page

The org-level Settings > API page (/settings/api) only shows OAuth apps
and workspace-member keys. Personal API keys live under Account,
Security, access (/settings/account/security). Update both the setup
link in config.py (shown during hermes setup) and the setup step in
SKILL.md so users land on the page that can create a personal key.

* docs(plugins): close the gaps \u2014 image-gen-provider-plugin guide + publishing a skill tap (#20800)

Two pluggable surfaces were mentioned in the interfaces map without a
real authoring guide behind them:

1. **Image-gen backends** — only had 'See bundled examples' pointers.
   Now a full developer-guide/image-gen-provider-plugin.md (270 lines)
   mirroring the memory/context/model provider docs:
   - How discovery works, directory structure, plugin.yaml
   - ImageGenProvider ABC with every overridable method
     (name, display_name, is_available, list_models, default_model,
     get_setup_schema, generate)
   - Full authoring walkthrough with a working MyBackendImageGenProvider
   - Response-format reference (success_response / error_response)
   - Handling b64 vs URL output (save_b64_image helper)
   - User overrides at ~/.hermes/plugins/image_gen/<name>/
   - Testing recipe + pip distribution
   - Reference examples (openai, openai-codex, xai)

2. **Skill taps** — features/skills.md mentioned the CLI commands but
   never explained the repo contract for publishing a tap. Added
   'Publishing a custom skill tap' section under Skills Hub covering:
   - Repo layout (skills/<name>/SKILL.md by default)
   - Minimal working example
   - Non-default path configuration (taps.json)
   - Installing individual skills without subscribing
   - Trust-level handling
   - Full tap management CLI + in-session /skills tap commands

Wired into:
- website/sidebars.ts: image-gen-provider-plugin added to Extending group
- website/docs/user-guide/features/plugins.md: pluggable interfaces
  table + 'What plugins can do' table now link to the real guides
  instead of 'See bundled examples'
- website/docs/guides/build-a-hermes-plugin.md: top info map and
  inline sub-sections updated, 'Full guide:' line added to
  image-gen block, tap section mentions publishing

Verified: docusaurus build SUCCESS, new page renders at
/docs/developer-guide/image-gen-provider-plugin, anchor
#publishing-a-custom-skill-tap resolves from plugins.md +
build-a-hermes-plugin.md. Pre-existing zh-Hans broken links unchanged.

* fix(opencode-go): keep users on opencode-go instead of hijacking to native providers (#20802)

OpenCode Go and OpenCode Zen are flat-namespace model resellers — their
/v1/models returns bare IDs (deepseek-v4-flash, minimax-m2.7), and the
inference API rejects vendor-prefixed names with HTTP 401 'Model not
supported'. Two bugs fixed:

1. `switch_model` in hermes_cli/model_switch.py was silently switching the
   user off opencode-go to native deepseek when they typed
   `/model deepseek-v4-flash`. Step d found the model in opencode-go's live
   catalog, but step e (detect_provider_for_model) still ran and matched
   the bare name against deepseek's static catalog. Fix: track whether
   the live catalog resolved it; skip step e when it did.

2. `normalize_model_for_provider` in hermes_cli/model_normalize.py only
   stripped the exact `opencode-zen/` prefix, leaving arbitrary vendor
   prefixes like `minimax/minimax-m2.7` (commonly copied from aggregator
   slugs into fallback_model configs) intact — causing HTTP 401s when
   the fallback chain activated. Fix: opencode-go/opencode-zen strip ANY
   leading vendor prefix because their APIs are flat-namespace.

Tests: 11 new cases in tests/hermes_cli/test_opencode_go_flat_namespace.py
covering both normalization (prefix stripping, regression guards for
opencode-zen Claude hyphenation and openrouter vendor-prepending) and
switch_model (bare-name resolution on opencode-go's live catalog must
not trigger cross-provider hijack).

Reported by @Ufonik via Discord; Kimi K2.6 always worked because moonshotai
has no overlapping entry in a native provider's static catalog. Deepseek
and minimax failed because their v4/v2.7 names existed in the native
deepseek/minimax catalogs.

* feat(dashboard): add 'default-large' built-in theme with 18px base size (#20820)

Same Hermes Teal palette as the default theme, but with baseSize 18px,
lineHeight 1.65, and spacious density so the whole dashboard scales up.
Gives users a one-click bigger-text preset and a copyable reference for
authoring custom YAML themes with their own typography settings.

* refactor(web): per-capability backend selection for search/extract split

Introduce the foundation for independently selecting web search and
extract backends — enabling future combinations like SearXNG for
search + Firecrawl for extract.

Architecture:
- tools/web_providers/base.py: WebSearchProvider and WebExtractProvider
  ABCs with normalized result contracts (mirrors CloudBrowserProvider)
- tools/web_tools.py: _get_search_backend() and _get_extract_backend()
  read per-capability config keys, fall through to shared web.backend
- hermes_cli/config.py: web.search_backend and web.extract_backend in
  DEFAULT_CONFIG (empty = inherit from web.backend)

Behavioral change:
- web_search_tool() now dispatches via _get_search_backend()
- web_extract_tool() now dispatches via _get_extract_backend()
- When per-capability keys are empty (default), behavior is identical
  to before — _get_search_backend() falls through to _get_backend()

This is purely structural — no new backends are added. SearXNG and
other search-only/extract-only providers can now be added as simple
drop-in modules in follow-up PRs.

12 new tests, 49 existing tests pass with zero regressions.

Ref: #19198

* feat(web): add SearXNG as a native search-only backend

Adds SearXNG as a free, self-hosted web search provider.  SearXNG is a
privacy-respecting metasearch engine that requires no API key — just a
running instance and SEARXNG_URL pointing at it.

## What this adds

- `tools/web_providers/searxng.py` — `SearXNGSearchProvider` implementing
  `WebSearchProvider` (search only; no extract capability)
- `_is_backend_available("searxng")` — gates on SEARXNG_URL
- `_get_backend()` — accepts "searxng" as a configured value; adds it to
  auto-detect candidates (lower priority than paid services)
- `web_search_tool` — dispatches to SearXNG when it is the active backend
- `check_web_api_key()` — includes SearXNG in availability check
- `OPTIONAL_ENV_VARS["SEARXNG_URL"]` — registered with tools=["web_search"]
- `tools_config.py` — SearXNG appears in the `hermes tools` provider picker
- `nous_subscription.py` — `direct_searxng` detection, web_active / web_available
- `setup.py` — SEARXNG_URL listed in the missing-credential hint
- 23 tests covering: is_configured, happy-path search, score sorting, limit,
  HTTP/request errors, _is_backend_available, _get_backend, check_web_api_key

## Config

```yaml
# Use SearXNG for search, any paid provider for extract
web:
  search_backend: "searxng"
  extract_backend: "firecrawl"

# Or: SearXNG as the sole backend (web_extract will use the next available)
web:
  backend: "searxng"
```

SearXNG is search-only — it does not implement WebExtractProvider.  Users
who only configure SEARXNG_URL get web_search available; web_extract falls
back to the next available extract provider (or is unavailable if none).

Closes #19198 (Phase 2 Task 4 — SearXNG provider)
Ref: #11562 (original SearXNG PR)

* docs+skill: add searxng-search optional skill and documentation

Closes the remaining gaps from PR #11562 that weren't covered by the
core SearXNG integration landed in #20823.

- optional-skills/research/searxng-search/ — installable skill with
  SKILL.md (curl-based usage, category support, Python example) and
  searxng.sh helper script for health checks and instance queries
- website/docs/user-guide/configuration.md — SearXNG added to the
  Web Search Backends section (5 backends, backend table, per-capability
  split config example, correct search-only note)
- website/docs/reference/environment-variables.md — SEARXNG_URL row
- website/docs/reference/optional-skills-catalog.md — searxng-search entry

The core SearXNG code, OPTIONAL_ENV_VARS, hermes tools picker, and tests
were already on main via #20823.  This commit is purely additive docs +
the optional skill scaffold.

Credits from #11562 salvage:
  @w4rum — original _searxng_search structure
  @nathansdev — tools_config.py integration
  @moyomartin — category support and result formatting
  @0xMihai — config/env var approach
  @nicobailon — skill and documentation structure
  @searxng-fan — error handling patterns
  @local-first — self-hosted-first philosophy and docs

* docs: add Web Search + Extract feature page with SearXNG setup guide

* fix(feishu): keep topic replies in threads

Route Feishu topic progress, status, approval, stream, and fallback messages through threaded replies by preserving the originating message id as the reply target. Add regressions for tool progress topic metadata and Feishu metadata-driven reply routing.

* chore: follow-up cleanup for Feishu topic thread fix

- Remove dead metadata.get('reply_to') fallback in _send_raw_message;
  nothing in the codebase ever sets 'reply_to' inside a metadata dict —
  the key only appears as a top-level send_voice() keyword argument
- Simplify _status_thread_metadata construction in run.py to use a
  single dict literal instead of create-then-mutate pattern; the
  or-{} guard was dead since source.thread_id implies _progress_thread_id
  is also set for Feishu
- Add yuqian@zmetasoft.com to AUTHOR_MAP for contributor attribution

* fix(kanban): avoid fragile failure-column renames

* chore: follow-up cleanup for Kanban migration fix

- Expand migration comment to name the primary failure mode (missing
  column OperationalError from #20842) ahead of the secondary SQLite
  schema-reparse concern; also document the stale-cols-snapshot invariant
- Add clarifying comments on from_row() legacy fallback branches noting
  they are belt-and-suspenders dead code post-migration
- Add task_events comment in existing test explaining why the table is
  required by the migrator
- Add test_legacy_migration_no_legacy_columns_at_all: Scenario A —
  explicitly asserts the exact #20842 crash no longer occurs and that
  consecutive_failures defaults to 0 on a DB that never had spawn_failures
- Add test_legacy_migration_both_columns_already_present: Scenario D —
  asserts the migration is a no-op when both columns already exist,
  preserving the existing counter value

* fix(tui): bound virtual history offset searches

* ci(docker): don't cancel overlapping builds, guard :latest

Switch top-level concurrency to cancel-in-progress=false so every push
to main gets its own SHA-tagged image published — no more discarded
builds when commits land back-to-back.

Guard the :latest tag with a second job that has its own concurrency
group with cancel-in-progress=true plus a git-ancestor check against
the revision label on the current :latest. Together these guarantee
:latest only ever moves forward in history: a slower run whose commit
isn't a descendant of the current :latest refuses to clobber it, and
a newer push mid-way through the move-latest job preempts the older
one before it can retag.

- Every main push publishes nousresearch/hermes-agent:sha-<commit>
  with an org.opencontainers.image.revision label embedded.
- move-latest job reads that label off :latest, runs merge-base
  --is-ancestor, and only retags (via buildx imagetools create,
  registry-side, no rebuild) if our commit strictly descends.
- fetch-depth bumped to 1000 so merge-base has the history it needs.
- Release tag flow unchanged (unique tag, no race).

* docs(tool-gateway): rewrite as pitch-first marketing page (#20827)

Previous version read like internal API docs \u2014 leading with env var tables,
config YAML, and 'precedence' rules before ever explaining the product.
Complete rewrite inverts the structure so readers see value first,
mechanics last.

Structure now:
- Lede: 'One subscription. Every tool built in.' + pitch paragraph
- CTA: subscribe/manage button styled as a real call-to-action
- What's included: emoji-led table with expanded descriptions per tool.
  Image gen lists all 9 models by name (FLUX 2 Klein/Pro, Z-Image Turbo,
  Nano Banana Pro, GPT Image 1.5/2, Ideogram V3, Recraft V4 Pro, Qwen)
- Why it's here: value bullets \u2014 one bill, one signup, one key, same
  quality, bring-your-own anytime
- Get started: two-command flow (hermes model \u2192 hermes status)
- Eligibility: paid-tier note with upgrade link
- Mix and match: three realistic usage patterns
- Using individual image models: ID reference table for power users
- --- separator ---
- Configuration reference (demoted): use_gateway flag, disabling,
  self-hosted gateway env vars moved below the fold where they belong
- FAQ: streamlined, removed redundant content

Fact-checked against code:
- 9 FAL models confirmed from tools/image_generation_tool.py FAL_MODELS
- Status section output verified against hermes_cli/status.py
- Portal subscription URL preserved
- Self-hosted env vars (TOOL_GATEWAY_DOMAIN etc.) kept accurate

Verified: docusaurus build SUCCESS, page renders, no new broken links.

* fix(auth): fall back to global-root auth.json for providers missing in profile

Profile processes (kanban workers, cron subprocesses, delegated subagents)
read the profile's auth.json only. If a provider was authenticated at the
global root but not inside the profile, the profile's credential_pool
comes back empty and the process fails with 'No LLM provider configured'
— even though the credentials are sitting in ~/.hermes/auth.json. #18594
propagated HERMES_HOME correctly, which is what surfaced this: workers
now land in the right profile, and the profile turns out to shadow global
with no fallback.

Semantics (read-only, per-provider shadowing):
* Profile has any entries for provider X → use profile only (global ignored).
* Profile has zero entries for provider X → fall back to global.
* Writes (write_credential_pool, _save_auth_store) still target the profile.
* Classic mode (HERMES_HOME == global root) skips the fallback entirely —
  _global_auth_file_path() returns None.

Also mirrors the fallback in get_provider_auth_state so OAuth singletons
(nous, minimax-oauth, openai-codex, spotify) inherit cleanly — the Nous
shared-token store (PR #19712) remains the authoritative path for Nous
OAuth rotation, this just makes the read side consistent with it.

Seat belt: _load_global_auth_store() refuses to read the real user's
~/.hermes/auth.json under PYTEST_CURRENT_TEST even when HERMES_HOME points
to a profile-shaped path. Guard uses $HOME (stable across fixtures) rather
than Path.home() (which fixtures often monkeypatch to a tmp root).

Reported by @SeedsForbidden on Twitter as the credential_pool shadowing
follow-up to the #18594 fix.

* feat(gateway): per-platform gateway_restart_notification flag

Adds an opt-out toggle on PlatformConfig that gates both restart
lifecycle pings: the "♻ Gateway restarted" message sent to the chat
that issued /restart, and the "♻️ Gateway online" home-channel
startup notification. Defaults to True so existing deployments are
unaffected.

The motivating split is operator vs. end-user surfaces: a back-channel
like Telegram should keep these pings, while a Slack workspace shared
with end users should not surface gateway lifecycle noise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(gateway): also gate pre-restart "Gateway restarting" notification

Extend the gateway_restart_notification flag to cover
_notify_active_sessions_of_shutdown — the message that fires just
before drain ("⚠️ Gateway restarting — Your current task will be
interrupted. Send any message after restart and I'll try to resume
where you left off.") sent to active sessions and home channels.

Same operator/end-user reasoning: on a Slack workspace shared with
end users, "Gateway restarting" reads as "the bot is broken" — the
operator should be able to suppress it consistently with the other
two lifecycle pings rather than having a partial opt-out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: add guillaumemeyer to AUTHOR_MAP

For cherry-picked commits in PR #20801.

* fix(cli): submit LF enter in thin PTYs (#20896)

* fix(tui): refresh virtual offsets after row resize (#20898)

* fix(tui): honor skin highlight colors (#20895)

* fix(tui): steady transcript scrollbar (#20917)

* fix(tui): steady transcript scrollbar

Keep the visible scrollbar tied to committed viewport position while virtual history can still prefetch against pending scroll targets, and preserve drag grab offset synchronously for native-feeling scrollbar drags.

* fix(tui): smooth precision wheel scroll

Replace the opt-scroll throttle with frame-sized coalescing so modifier wheel gestures stay line-precise without stepping.

* fix(tui): restore voice push-to-talk parity (#20897)

* fix(tui): restore classic CLI voice push-to-talk parity

(cherry picked from commit 93b9ae301bb89f5b5e01b4b9f8ac91ffa74fbd9d)

* fix(tui): harden voice push-to-talk stop flow

Address review feedback from PR #16189 by stopping the active recorder before background transcription, documenting single-shot voice capture, and covering the TUI gateway flags with regression tests.

* fix(tui): preserve silent voice strike tracking

Keep single-shot voice recording's no-speech counter alive across starts so the TUI can still emit the three-strikes auto-disable event, and bind the auto-restart state at module scope for type checking.

* fix(tui): clean up voice stop failure path

Address follow-up review by naming the TUI flow as single-shot push-to-talk and cancelling the recorder when forced stop cannot produce a WAV.

* fix(tui): report busy voice capture starts

Return explicit start state from the voice wrapper so the TUI gateway does not report recording while forced-stop transcription is still cleaning up.

* fix(tui): handle busy voice record responses

Apply the gateway busy status immediately in the TUI and route forced-stop voice events to the session that sent the stop request.

* fix(tui): clear voice recording on null response

Treat a null voice.record RPC result as a failed optimistic start so the REC badge cannot stick after gateway-side errors.

* fix(tui): count silent manual voice stops

Preserve single-shot voice no-speech strikes through forced stop transcription so empty push-to-talk captures still trigger the three-strikes guard.

---------

Co-authored-by: Montbra <montbra@gmail.com>

* fix(gateway): don't dead-end setup wizard when only system-scope unit is installed

The setup wizard dropped non-root users at a bare shell prompt when
trying to start a system-scope gateway service. Previously
_require_root_for_system_service called sys.exit(1), which the
wizard's `except Exception` guards cannot catch (SystemExit is a
BaseException). Users with a pre-existing /etc/systemd/system unit
(e.g. from an earlier `sudo hermes setup` run) hit this whenever
they re-ran `hermes setup` as a regular user.

- Convert _require_root_for_system_service to raise a typed
  SystemScopeRequiresRootError (RuntimeError subclass) instead of
  sys.exit(1). The direct CLI path (`hermes gateway install|start|stop|
  restart|uninstall` without sudo) still exits 1 cleanly via a new
  catch at the top of gateway_command, matching the existing
  UserSystemdUnavailableError pattern.
- Add _system_scope_wizard_would_need_root() pre-check and
  _print_system_scope_remediation() helper. Both setup wizards
  (hermes_cli/setup.py and hermes_cli/gateway.py::gateway_setup) now
  detect the dead-end before prompting and print actionable guidance:
  either `sudo systemctl start <service>` this time, or uninstall the
  system unit and install a per-user one.
- Defense-in-depth: all 5 wizard prompt sites also catch
  SystemScopeRequiresRootError and fall back to the remediation
  helper if the pre-check is bypassed (race, etc.).

Tests: 12 new tests in TestSystemScopeRequiresRootError,
TestSystemScopeWizardPreCheck, TestSystemScopeRemediationOutput, and
TestGatewayCommandCatchesSystemScopeError covering the exception
contract, pre-check matrix (root vs non-root, system-only vs
user-present vs none vs explicit system=True), remediation output
for each action, and the direct-CLI exit-1 path.

* fix(gateway): wait for systemd restart readiness

* fix(discord): narrow rate-limit catch and move sync state under gateway/

Two follow-ups on top of helix4u's slash-command sync hardening:

- Only suppress exceptions that are actually Discord 429 rate limits
  (discord.RateLimited, HTTPException with status 429, or a clearly
  rate-limit-named duck type). Arbitrary failures that happen to expose
  a retry_after attribute now re-raise to the outer handler instead of
  silently swallowing a cooldown.
- Move the sync-state JSON under $HERMES_HOME/gateway/ so the home root
  stops collecting ad-hoc runtime files.

Added a test verifying unrelated exceptions don't get misclassified as
rate limits.

* docs(kanban): fix orchestrator skill setup instructions (#20958)

* docs(kanban): fix worker skill setup instructions too (#20960)

Follow-up to #20958. The worker skill section had the same stale
'hermes skills install devops/kanban-worker' command — kanban-worker
is also bundled, so that command fails with 'Could not fetch from any
source.'

Replace with bundled-skill verification + restore pattern, matching
the orchestrator section. Uses <your-worker-profile> placeholder since
assignees vary (researcher, writer, ops, linguist, reviewer, etc.)
rather than a single fixed 'worker' profile.

* feat(profiles): --no-skills flag for empty profile creation (#20986)

Adds `hermes profile create <name> --no-skills` to create a profile with
zero bundled skills. Writes a `.no-bundled-skills` marker file in the
profile root so `hermes update`'s all-profile skill sync loop also skips
the profile — without the marker, every update would re-seed skills and
the user would have to delete them again.

Use case (from @hiut1u): orchestrator profiles and narrow-task profiles
don't need 100+ bundled skills polluting their system prompt.

- create_profile() gains a `no_skills` param, mutually exclusive with
  `--clone` / `--clone-all` (cloning explicitly copies skills).
- seed_profile_skills() no-ops on opted-out profiles and returns
  `{skipped_opt_out: True}` so callers can report cleanly.
- Web API (POST /api/profiles) accepts `no_skills: bool`.
- Delete `.no-bundled-skills` to opt back in — next `hermes update`
  re-seeds normally.

6 new tests in TestNoSkillsOptOut cover marker write, mutual exclusion
with clone, seed_profile_skills opt-out, fresh profile unaffected, and
delete-marker-re-enables-seeding.

* fix: route Telegram image documents through photo handling

* chore: AUTHOR_MAP entry for mrcoferland

* test(docker): align Dockerfile contract tests with simplified TUI flow

The Dockerfile dropped the manual `@hermes/ink` materialisation gymnastics
in favour of letting npm workspaces resolve the bundled package
naturally. Two contract tests still asserted the older flow:

`test_dockerfile_installs_tui_dependencies` required:
    'ui-tui/packages/hermes-ink/package-lock.json' in dockerfile_text

…but the lockfile is no longer COPIED individually \u2014 the entire
`ui-tui/packages/hermes-ink/` tree is COPIED instead (the workspace
reference from `ui-tui/package.json` is `file:` so npm needs the
real source, not just a manifest stub).

`test_dockerfile_materializes_local_tui_ink_package` required a 7-clause
conjunction matching specific `rm -rf` / `npm install --omit=dev`
`--prefix node_modules/@hermes/ink` / `rm -rf .../react` invocations
that were stripped out when the workspace resolution was simplified.

Update the assertions to pin the *contract* the image actually has to
carry rather than the *exact shell incantations* the old flow used:

* TUI deps install: ui-tui/package.json + ui-tui/package-lock.json +
  ui-tui/packages/hermes-ink/ tree are all COPIED, and an npm
  install/ci step runs in ui-tui.
* Bundled hermes-ink: the workspace package source is COPIED (so
  `await import('@hermes/ink')` resolves at runtime).

This keeps the spirit of #15012 / #16690 (zombie reaping + bundled
workspace materialisation must continue to work) without locking the
Dockerfile into one specific implementation flavour.

Validation:

    $ pytest tests/tools/test_dockerfile_pid1_reaping.py -q
    6 passed in 1.43s

No production code change. Fixes the two failures observed on `main`
(run 25250051126):

`tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_installs_tui_dependencies`
`tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_materializes_local_tui_ink_package`

* test(update): patch isatty on real streams to fix xdist-flaky --yes tests

Two CI tests for the new `--yes` update flag (#18261) flaked under
`pytest-xdist` on Linux/Python 3.11 even though they passed every
local run on macOS/Python 3.14.4:

  FAILED tests/hermes_cli/test_update_yes_flag.py
    ::TestUpdateYesConfigMigration::test_no_yes_flag_still_prompts_in_tty
      `AssertionError: assert <MagicMock 'input'>.called is False`
  FAILED tests/hermes_cli/test_update_yes_flag.py
    ::TestUpdateYesStashRestore::test_yes_restores_stash_without_prompting
      `AssertionError: assert <MagicMock '_restore_stashed_changes'>.called is False`

Captured stdout for the first failure shows `cmd_update` taking the
"Non-interactive session \u2014 skipping config migration prompt." branch
\u2014 i.e. the `sys.stdin.isatty() and sys.stdout.isatty()` check at
`hermes_cli/main.py:7118` evaluated to `False` despite the test doing:

    with patch("hermes_cli.main.sys") as mock_sys:
        mock_sys.stdin.isatty.return_value = True
        mock_sys.stdout.isatty.return_value = True

The whole-module mock is fragile under xdist worker reuse: a sibling
test that imports `hermes_cli.main` first can leave another `sys`
reference resolved inside the function (re-import in a helper, etc.),
and the wholesale module replacement never gets consulted.

Switch to `patch.object(_sys.stdin, "isatty", return_value=True)` (and
the same for `stdout`). That patches the *attribute on the real stream
object* \u2014 every call site, no matter how it reached `sys.stdin`,
hits the patched method. Same fix applied to the stash-restore test
(it took the "non-TTY \u2192 skip restore prompt" branch for the same reason).

Validation:

    $ pytest tests/hermes_cli/test_update_yes_flag.py -q
    3 passed in 5.47s

No production code change. Fixes the two failures observed on `main`
(run 25250051126):

`tests/hermes_cli/test_update_yes_flag.py::TestUpdateYesConfigMigration::test_no_yes_flag_still_prompts_in_tty`
`tests/hermes_cli/test_update_yes_flag.py::TestUpdateYesStashRestore::test_yes_restores_stash_without_prompting`

Refs: #18261 (added the `--yes` flag + these tests).

* fix(web): force light color-scheme on docs iframe

The Documentation tab embeds the public Hermes Agent docs site via an
<iframe>. On any system where the browser's prefers-color-scheme
resolves to dark — the default on macOS with system dark mode, and
common on Linux/Windows too — the docs body text rendered nearly
invisible against its own background.

Cause: Docusaurus intentionally leaves <html> and <body> transparent
and relies on the browser's Canvas color to fill the viewport. Inside
our iframe, the iframe element had bg-background (the dashboard's dark
canvas) AND inherited the dashboard's dark color-scheme, so the
browser set the iframe's Canvas to a dark value. Docusaurus's
transparent body exposed that dark Canvas, and the docs body text
(tuned for a light Canvas) became near-illegible. Affects every
built-in dashboard theme.

Fix: replace bg-background on the iframe with [color-scheme:light]
(spec-blessed cross-origin override of the inherited color-scheme;
forces the iframe's Canvas to light) and bg-white (belt-and-suspenders
fallback during the brief paint window before content loads). The
docs site's own theme toggle keeps working — Docusaurus stores its
choice in localStorage and applies opaque dark backgrounds to its
layout elements that cover the white Canvas we forced.

* fix(security): close TOCTOU window when saving MCP OAuth credentials

_write_json (the persistence helper used by HermesTokenStorage for both
tokens and client_info) created the temp file via Path.write_text and
only chmod'd it to 0o600 afterward. Between create and chmod the file
existed on disk at the process umask (commonly 0o644 = world-readable),
briefly exposing MCP OAuth access/refresh tokens to other local users.

Use os.open with O_WRONLY|O_CREAT|O_EXCL and an explicit S_IRUSR|S_IWUSR
mode so the file is created atomically at 0o600, plus tighten the parent
dir to 0o700 so siblings can't traverse to the creds file. The temp name
also gains a per-process random suffix to avoid collisions between
concurrent writers and stale leftovers from a crashed prior write.

Mirrors the fix shipped for agent/google_oauth.py in #19673.

Adds a regression test asserting the resulting file mode is 0o600 and
the parent directory is 0o700 (skipped on Windows where POSIX mode bits
aren't enforced).

* chore(release): add Gutslabs to AUTHOR_MAP for PR #21148 salvage

* test(update): teach restart-mocks about the post-update survivor sweep

Issue #17648 added a post-update SIGTERM-survivor sweep to `cmd_update`:
~3s after issuing graceful/SIGTERM restarts, the code re-queries
`find_gateway_pids` and SIGKILLs anything still alive. That's the
right fix for stuck-drain gateways in production, but it broke three
unit tests that assumed `find_gateway_pids` would keep returning the
same PIDs forever:

  FAILED ::TestCmdUpdateLaunchdRestart::test_update_restarts_profile_manual_gateways
    AssertionError: Expected 'kill' to not have been called. Called 1 times.
    Calls: [call(12345, <Signals.SIGKILL: 9>)].

  FAILED ::TestCmdUpdateLaunchdRestart::test_update_profile_manual_gateway_falls_back_to_sigterm
    AssertionError: Expected 'kill' to have been called once. Called 2 times.
    Calls: [call(12345, SIGTERM), call(12345, SIGKILL)].

  FAILED ::TestServicePidExclusion::test_update_kills_manual_pid_but_not_service_pid
    assert 2 == 1
      manual_kills = [call(42999, SIGTERM), call(42999, SIGKILL)]

In each test `os.kill` is mocked, so the simulated PID never actually
exits \u2014 the sweep finds it again and escalates. The production code
is correct; the tests just need to model OS behaviour properly.

Two-test fix (profile-manual restart cases): use
`side_effect=[[12345], []]` so the first `find_gateway_pids` call
returns the live PID and the second (the sweep) returns nothing, as if
the OS had reaped the process.

Service-PID-exclusion fix: track which PIDs got killed in a closure
set, and exclude them on subsequent `fake_find` calls. `os.kill`
gets a `side_effect` that records the kill instead of swallowing it
silently. Now the sweep doesn't re-find the manual PID, no SIGKILL
escalation, `manual_kills == 1`.

Validation:

    $ pytest tests/hermes_cli/test_update_gateway_restart.py -q
    43 passed in 4.13s

No production code change. Fixes the three failures observed on `main`
(run 25250051126):

  test_update_restarts_profile_manual_gateways
  test_update_profile_manual_gateway_falls_back_to_sigterm
  test_update_kills_manual_pid_but_not_service_pid

Refs: #17648 (post-update survivor sweep that the tests didn't model).

* fix(image-routing): expose attached image paths in native multimodal text part

In native image mode (vision-capable models like gpt-4o, claude-sonnet-4),
build_native_content_parts() previously emitted only the user's caption
plus image_url parts. The local file path of each attached image never
appeared in the conversation text, so the model could see the pixels but
had no string handle for tools that take image_url: str (custom MCP
tools, vision_analyze on a re-look, attach-to-tracker workflows).

The text-mode path already injects an equivalent hint via
Runner._enrich_message_with_vision ("...vision_analyze using image_url:
<path>..."). This brings native mode to parity by appending one
"[Image attached at: <path>]" line per successfully attached image to
the user-text part of the multimodal turn. Skipped (unreadable) paths
are NOT advertised, so the model is never told a non-existent file is
attached.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(optional-skills): port Anthropic financial-services skills as optional finance bundle (#21180)

Adds 7 optional skills under optional-skills/finance/ adapted from
anthropics/financial-services (Apache-2.0):

  excel-author        — openpyxl conventions: blue/black/green cells,
                        formulas over hardcodes, named ranges, balance
                        checks, sensitivity tables. Ships recalc.py.
  pptx-author         — python-pptx for model-backed decks (pitch,
                        IC memo, earnings note) that bind every number
                        to a source workbook cell.
  dcf-model           — institutional DCF (49KB skill): projections,
                        WACC, terminal value, Bear/Base/Bull scenarios,
                        5x5 sensitivity tables. Ships validate_dcf.py.
  comps-analysis      — comparable company analysis: operating metrics,
                        multiples, statistical benchmarking.
  lbo-model           — leveraged buyout: S&U, debt schedule, cash
                        sweep, exit multiple, IRR/MOIC sensitivity.
  3-statement-model   — fully-integrated IS/BS/CF with balance-check
                        plugs. Ships references/ for formatting,
                        formulas, SEC filings.
  merger-model        — accretion/dilution analysis for M&A.

All seven are optional (not active by default). Users install via
'hermes skills install official/finance/<skill>'.

Hermesification:
- Stripped every Office JS / Office Add-in / mcp__office__*
  branch — skills assume headless openpyxl only.
- Replaced Cowork MCP data-source instructions with 'MCP first (via
  native-mcp), fall back to web_search/web_extract against SEC EDGAR
  and user-provided data'.
- Swapped Claude tool references (Bash, Read, Write, Edit, mcp__*)
  for Hermes-native equivalents and Python library calls.
- Canonical Hermes frontmatter (name/description/version/author/
  license/metadata.hermes.{tags,related_skills}).
- Descriptions tightened to 187-238 chars, trigger-first.
- Attribution preserved: author field credits 'Anthropic (adapted by
  Nous Research)', license: Apache-2.0, each SKILL.md links back to
  the upstream source directory.

Verification:
- All 7 discovered by OptionalSkillSource with source_id='official'
- Bundle fetch includes support files (scripts, references, troubleshooting)
- related_skills cross-refs all resolve within the bundle
- No Claude product / Cowork / Office JS / /mnt/skills leakage
  remains in body text (bounded mentions only in attribution blocks)

Source: https://github.com/anthropics/financial-services (Apache-2.0)

* test(skills): cover additional rescan paths in skill_commands cache (#14536)

The rescan-on-platform-change fix landed in #18739 ships one regression
test that exercises the HERMES_PLATFORM env-var path. Three other code
paths in get_skill_commands / _resolve_skill_commands_platform have no
direct coverage; this commit adds a regression test for each.

- Gateway session context (HERMES_SESSION_PLATFORM via ContextVar): the
  resolver consults get_session_env after HERMES_PLATFORM, and the
  gateway sets that variable through set_session_vars (a ContextVar),
  not os.environ. The test uses set_session_vars / clear_session_vars
  to drive the actual gateway signal, and the disabled-skill stub reads
  the same value via get_session_env. A regression that swapped
  get_session_env for plain os.getenv would still pass an env-var-based
  test but break concurrent gateway sessions, which is the bug the
  ContextVar plumbing exists to prevent.
- Returning to no-platform-scope (CLI / cron / RL rollouts after a
  gateway session): the cached telegram view must be dropped and the
  unfiltered scan repopulated when HERMES_PLATFORM is unset again.
- Same-platform cache hit: consecutive calls under the same platform
  scope must NOT rescan. The rescan trigger is change in scope, not
  "always re-resolve" — a gateway serving many consecutive telegram
  requests should pay the scan cost once, not per request.

The third test wraps scan_skill_commands with a spy after the cache is
primed, so the assertion is on call_count == 0 across three subsequent
get_skill_commands() calls.

All 39 tests in tests/agent/test_skill_commands.py pass under
scripts/run_tests.sh.

* fix(gateway): translate inbound document host paths to container paths for Docker backend

When terminal.backend is docker, inbound documents uploaded via messaging
platforms (Telegram, Slack, Discord, Feishu, Email, etc.) are cached at a host
path under ~/.hermes/cache/documents, but the container sandbox only sees them
at the auto-mounted /root/.hermes/cache/documents path.

This PR adds to_agent_visible_cache_path() in tools/credential_files.py (the
natural sibling to get_cache_directory_mounts()) and calls it at the
document-context-injection site in gateway/run.py so the agent always receives
a path it can open directly, matching the mount layout already established
by get_cache_directory_mounts() (#4846).

Scope: only Docker backend for now; other backends use different mount
semantics and are left unchanged until verified.

Fixes #18787

* feat(gateway): opt-in cleanup of temporary progress bubbles (#21186)

When display.cleanup_progress (or display.platforms.<plat>.cleanup_progress)
is true, the gateway deletes tool-progress bubbles, long-running '⏳ Still
working...' notices, and status-callback messages after the final response
is delivered successfully. Currently effective on adapters that implement
delete_message (Tel…
bot-ted added a commit to bot-ted/hermes-agent that referenced this pull request May 8, 2026
* fix(kanban): avoid fragile failure-column renames

* chore: follow-up cleanup for Kanban migration fix

- Expand migration comment to name the primary failure mode (missing
  column OperationalError from #20842) ahead of the secondary SQLite
  schema-reparse concern; also document the stale-cols-snapshot invariant
- Add clarifying comments on from_row() legacy fallback branches noting
  they are belt-and-suspenders dead code post-migration
- Add task_events comment in existing test explaining why the table is
  required by the migrator
- Add test_legacy_migration_no_legacy_columns_at_all: Scenario A —
  explicitly asserts the exact #20842 crash no longer occurs and that
  consecutive_failures defaults to 0 on a DB that never had spawn_failures
- Add test_legacy_migration_both_columns_already_present: Scenario D —
  asserts the migration is a no-op when both columns already exist,
  preserving the existing counter value

* fix(tui): bound virtual history offset searches

* ci(docker): don't cancel overlapping builds, guard :latest

Switch top-level concurrency to cancel-in-progress=false so every push
to main gets its own SHA-tagged image published — no more discarded
builds when commits land back-to-back.

Guard the :latest tag with a second job that has its own concurrency
group with cancel-in-progress=true plus a git-ancestor check against
the revision label on the current :latest. Together these guarantee
:latest only ever moves forward in history: a slower run whose commit
isn't a descendant of the current :latest refuses to clobber it, and
a newer push mid-way through the move-latest job preempts the older
one before it can retag.

- Every main push publishes nousresearch/hermes-agent:sha-<commit>
  with an org.opencontainers.image.revision label embedded.
- move-latest job reads that label off :latest, runs merge-base
  --is-ancestor, and only retags (via buildx imagetools create,
  registry-side, no rebuild) if our commit strictly descends.
- fetch-depth bumped to 1000 so merge-base has the history it needs.
- Release tag flow unchanged (unique tag, no race).

* docs(tool-gateway): rewrite as pitch-first marketing page (#20827)

Previous version read like internal API docs \u2014 leading with env var tables,
config YAML, and 'precedence' rules before ever explaining the product.
Complete rewrite inverts the structure so readers see value first,
mechanics last.

Structure now:
- Lede: 'One subscription. Every tool built in.' + pitch paragraph
- CTA: subscribe/manage button styled as a real call-to-action
- What's included: emoji-led table with expanded descriptions per tool.
  Image gen lists all 9 models by name (FLUX 2 Klein/Pro, Z-Image Turbo,
  Nano Banana Pro, GPT Image 1.5/2, Ideogram V3, Recraft V4 Pro, Qwen)
- Why it's here: value bullets \u2014 one bill, one signup, one key, same
  quality, bring-your-own anytime
- Get started: two-command flow (hermes model \u2192 hermes status)
- Eligibility: paid-tier note with upgrade link
- Mix and match: three realistic usage patterns
- Using individual image models: ID reference table for power users
- --- separator ---
- Configuration reference (demoted): use_gateway flag, disabling,
  self-hosted gateway env vars moved below the fold where they belong
- FAQ: streamlined, removed redundant content

Fact-checked against code:
- 9 FAL models confirmed from tools/image_generation_tool.py FAL_MODELS
- Status section output verified against hermes_cli/status.py
- Portal subscription URL preserved
- Self-hosted env vars (TOOL_GATEWAY_DOMAIN etc.) kept accurate

Verified: docusaurus build SUCCESS, page renders, no new broken links.

* fix(auth): fall back to global-root auth.json for providers missing in profile

Profile processes (kanban workers, cron subprocesses, delegated subagents)
read the profile's auth.json only. If a provider was authenticated at the
global root but not inside the profile, the profile's credential_pool
comes back empty and the process fails with 'No LLM provider configured'
— even though the credentials are sitting in ~/.hermes/auth.json. #18594
propagated HERMES_HOME correctly, which is what surfaced this: workers
now land in the right profile, and the profile turns out to shadow global
with no fallback.

Semantics (read-only, per-provider shadowing):
* Profile has any entries for provider X → use profile only (global ignored).
* Profile has zero entries for provider X → fall back to global.
* Writes (write_credential_pool, _save_auth_store) still target the profile.
* Classic mode (HERMES_HOME == global root) skips the fallback entirely —
  _global_auth_file_path() returns None.

Also mirrors the fallback in get_provider_auth_state so OAuth singletons
(nous, minimax-oauth, openai-codex, spotify) inherit cleanly — the Nous
shared-token store (PR #19712) remains the authoritative path for Nous
OAuth rotation, this just makes the read side consistent with it.

Seat belt: _load_global_auth_store() refuses to read the real user's
~/.hermes/auth.json under PYTEST_CURRENT_TEST even when HERMES_HOME points
to a profile-shaped path. Guard uses $HOME (stable across fixtures) rather
than Path.home() (which fixtures often monkeypatch to a tmp root).

Reported by @SeedsForbidden on Twitter as the credential_pool shadowing
follow-up to the #18594 fix.

* feat(gateway): per-platform gateway_restart_notification flag

Adds an opt-out toggle on PlatformConfig that gates both restart
lifecycle pings: the "♻ Gateway restarted" message sent to the chat
that issued /restart, and the "♻️ Gateway online" home-channel
startup notification. Defaults to True so existing deployments are
unaffected.

The motivating split is operator vs. end-user surfaces: a back-channel
like Telegram should keep these pings, while a Slack workspace shared
with end users should not surface gateway lifecycle noise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(gateway): also gate pre-restart "Gateway restarting" notification

Extend the gateway_restart_notification flag to cover
_notify_active_sessions_of_shutdown — the message that fires just
before drain ("⚠️ Gateway restarting — Your current task will be
interrupted. Send any message after restart and I'll try to resume
where you left off.") sent to active sessions and home channels.

Same operator/end-user reasoning: on a Slack workspace shared with
end users, "Gateway restarting" reads as "the bot is broken" — the
operator should be able to suppress it consistently with the other
two lifecycle pings rather than having a partial opt-out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: add guillaumemeyer to AUTHOR_MAP

For cherry-picked commits in PR #20801.

* fix(cli): submit LF enter in thin PTYs (#20896)

* fix(tui): refresh virtual offsets after row resize (#20898)

* fix(tui): honor skin highlight colors (#20895)

* fix(tui): steady transcript scrollbar (#20917)

* fix(tui): steady transcript scrollbar

Keep the visible scrollbar tied to committed viewport position while virtual history can still prefetch against pending scroll targets, and preserve drag grab offset synchronously for native-feeling scrollbar drags.

* fix(tui): smooth precision wheel scroll

Replace the opt-scroll throttle with frame-sized coalescing so modifier wheel gestures stay line-precise without stepping.

* fix(tui): restore voice push-to-talk parity (#20897)

* fix(tui): restore classic CLI voice push-to-talk parity

(cherry picked from commit 93b9ae301bb89f5b5e01b4b9f8ac91ffa74fbd9d)

* fix(tui): harden voice push-to-talk stop flow

Address review feedback from PR #16189 by stopping the active recorder before background transcription, documenting single-shot voice capture, and covering the TUI gateway flags with regression tests.

* fix(tui): preserve silent voice strike tracking

Keep single-shot voice recording's no-speech counter alive across starts so the TUI can still emit the three-strikes auto-disable event, and bind the auto-restart state at module scope for type checking.

* fix(tui): clean up voice stop failure path

Address follow-up review by naming the TUI flow as single-shot push-to-talk and cancelling the recorder when forced stop cannot produce a WAV.

* fix(tui): report busy voice capture starts

Return explicit start state from the voice wrapper so the TUI gateway does not report recording while forced-stop transcription is still cleaning up.

* fix(tui): handle busy voice record responses

Apply the gateway busy status immediately in the TUI and route forced-stop voice events to the session that sent the stop request.

* fix(tui): clear voice recording on null response

Treat a null voice.record RPC result as a failed optimistic start so the REC badge cannot stick after gateway-side errors.

* fix(tui): count silent manual voice stops

Preserve single-shot voice no-speech strikes through forced stop transcription so empty push-to-talk captures still trigger the three-strikes guard.

---------

Co-authored-by: Montbra <montbra@gmail.com>

* fix(gateway): don't dead-end setup wizard when only system-scope unit is installed

The setup wizard dropped non-root users at a bare shell prompt when
trying to start a system-scope gateway service. Previously
_require_root_for_system_service called sys.exit(1), which the
wizard's `except Exception` guards cannot catch (SystemExit is a
BaseException). Users with a pre-existing /etc/systemd/system unit
(e.g. from an earlier `sudo hermes setup` run) hit this whenever
they re-ran `hermes setup` as a regular user.

- Convert _require_root_for_system_service to raise a typed
  SystemScopeRequiresRootError (RuntimeError subclass) instead of
  sys.exit(1). The direct CLI path (`hermes gateway install|start|stop|
  restart|uninstall` without sudo) still exits 1 cleanly via a new
  catch at the top of gateway_command, matching the existing
  UserSystemdUnavailableError pattern.
- Add _system_scope_wizard_would_need_root() pre-check and
  _print_system_scope_remediation() helper. Both setup wizards
  (hermes_cli/setup.py and hermes_cli/gateway.py::gateway_setup) now
  detect the dead-end before prompting and print actionable guidance:
  either `sudo systemctl start <service>` this time, or uninstall the
  system unit and install a per-user one.
- Defense-in-depth: all 5 wizard prompt sites also catch
  SystemScopeRequiresRootError and fall back to the remediation
  helper if the pre-check is bypassed (race, etc.).

Tests: 12 new tests in TestSystemScopeRequiresRootError,
TestSystemScopeWizardPreCheck, TestSystemScopeRemediationOutput, and
TestGatewayCommandCatchesSystemScopeError covering the exception
contract, pre-check matrix (root vs non-root, system-only vs
user-present vs none vs explicit system=True), remediation output
for each action, and the direct-CLI exit-1 path.

* fix(tui): preserve session when switching personality

Previously, /personality in the TUI called _reset_session_agent() which
destroyed the agent, cleared conversation history, and effectively started
a new session. This made personality switching disruptive — users lost
their entire conversation context.

Now /personality updates the agent's ephemeral_system_prompt in-place and
injects a pivot marker into the conversation history. The marker tells
the model to adopt the new persona from that point forward, which is
necessary because LLMs tend to pattern-match their prior responses and
continue the established tone without an explicit signal.

Changes:
- tui_gateway/server.py: Rewrite _apply_personality_to_session to update
  the agent in-place instead of resetting. Inject a user-role pivot
  marker so the model actually switches style mid-conversation.
- ui-tui/src/app/slash/commands/session.ts: Update help text (no longer
  mentions history reset).
- tests/test_tui_gateway_server.py: Update test to verify history is
  preserved, pivot marker is injected, and ephemeral prompt is set.

* fix(gateway): wait for systemd restart readiness

* fix(discord): narrow rate-limit catch and move sync state under gateway/

Two follow-ups on top of helix4u's slash-command sync hardening:

- Only suppress exceptions that are actually Discord 429 rate limits
  (discord.RateLimited, HTTPException with status 429, or a clearly
  rate-limit-named duck type). Arbitrary failures that happen to expose
  a retry_after attribute now re-raise to the outer handler instead of
  silently swallowing a cooldown.
- Move the sync-state JSON under $HERMES_HOME/gateway/ so the home root
  stops collecting ad-hoc runtime files.

Added a test verifying unrelated exceptions don't get misclassified as
rate limits.

* docs(kanban): fix orchestrator skill setup instructions (#20958)

* docs(kanban): fix worker skill setup instructions too (#20960)

Follow-up to #20958. The worker skill section had the same stale
'hermes skills install devops/kanban-worker' command — kanban-worker
is also bundled, so that command fails with 'Could not fetch from any
source.'

Replace with bundled-skill verification + restore pattern, matching
the orchestrator section. Uses <your-worker-profile> placeholder since
assignees vary (researcher, writer, ops, linguist, reviewer, etc.)
rather than a single fixed 'worker' profile.

* feat(profiles): --no-skills flag for empty profile creation (#20986)

Adds `hermes profile create <name> --no-skills` to create a profile with
zero bundled skills. Writes a `.no-bundled-skills` marker file in the
profile root so `hermes update`'s all-profile skill sync loop also skips
the profile — without the marker, every update would re-seed skills and
the user would have to delete them again.

Use case (from @hiut1u): orchestrator profiles and narrow-task profiles
don't need 100+ bundled skills polluting their system prompt.

- create_profile() gains a `no_skills` param, mutually exclusive with
  `--clone` / `--clone-all` (cloning explicitly copies skills).
- seed_profile_skills() no-ops on opted-out profiles and returns
  `{skipped_opt_out: True}` so callers can report cleanly.
- Web API (POST /api/profiles) accepts `no_skills: bool`.
- Delete `.no-bundled-skills` to opt back in — next `hermes update`
  re-seeds normally.

6 new tests in TestNoSkillsOptOut cover marker write, mutual exclusion
with clone, seed_profile_skills opt-out, fresh profile unaffected, and
delete-marker-re-enables-seeding.

* fix: route Telegram image documents through photo handling

* chore: AUTHOR_MAP entry for mrcoferland

* test(docker): align Dockerfile contract tests with simplified TUI flow

The Dockerfile dropped the manual `@hermes/ink` materialisation gymnastics
in favour of letting npm workspaces resolve the bundled package
naturally. Two contract tests still asserted the older flow:

`test_dockerfile_installs_tui_dependencies` required:
    'ui-tui/packages/hermes-ink/package-lock.json' in dockerfile_text

…but the lockfile is no longer COPIED individually \u2014 the entire
`ui-tui/packages/hermes-ink/` tree is COPIED instead (the workspace
reference from `ui-tui/package.json` is `file:` so npm needs the
real source, not just a manifest stub).

`test_dockerfile_materializes_local_tui_ink_package` required a 7-clause
conjunction matching specific `rm -rf` / `npm install --omit=dev`
`--prefix node_modules/@hermes/ink` / `rm -rf .../react` invocations
that were stripped out when the workspace resolution was simplified.

Update the assertions to pin the *contract* the image actually has to
carry rather than the *exact shell incantations* the old flow used:

* TUI deps install: ui-tui/package.json + ui-tui/package-lock.json +
  ui-tui/packages/hermes-ink/ tree are all COPIED, and an npm
  install/ci step runs in ui-tui.
* Bundled hermes-ink: the workspace package source is COPIED (so
  `await import('@hermes/ink')` resolves at runtime).

This keeps the spirit of #15012 / #16690 (zombie reaping + bundled
workspace materialisation must continue to work) without locking the
Dockerfile into one specific implementation flavour.

Validation:

    $ pytest tests/tools/test_dockerfile_pid1_reaping.py -q
    6 passed in 1.43s

No production code change. Fixes the two failures observed on `main`
(run 25250051126):

`tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_installs_tui_dependencies`
`tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_materializes_local_tui_ink_package`

* test(update): patch isatty on real streams to fix xdist-flaky --yes tests

Two CI tests for the new `--yes` update flag (#18261) flaked under
`pytest-xdist` on Linux/Python 3.11 even though they passed every
local run on macOS/Python 3.14.4:

  FAILED tests/hermes_cli/test_update_yes_flag.py
    ::TestUpdateYesConfigMigration::test_no_yes_flag_still_prompts_in_tty
      `AssertionError: assert <MagicMock 'input'>.called is False`
  FAILED tests/hermes_cli/test_update_yes_flag.py
    ::TestUpdateYesStashRestore::test_yes_restores_stash_without_prompting
      `AssertionError: assert <MagicMock '_restore_stashed_changes'>.called is False`

Captured stdout for the first failure shows `cmd_update` taking the
"Non-interactive session \u2014 skipping config migration prompt." branch
\u2014 i.e. the `sys.stdin.isatty() and sys.stdout.isatty()` check at
`hermes_cli/main.py:7118` evaluated to `False` despite the test doing:

    with patch("hermes_cli.main.sys") as mock_sys:
        mock_sys.stdin.isatty.return_value = True
        mock_sys.stdout.isatty.return_value = True

The whole-module mock is fragile under xdist worker reuse: a sibling
test that imports `hermes_cli.main` first can leave another `sys`
reference resolved inside the function (re-import in a helper, etc.),
and the wholesale module replacement never gets consulted.

Switch to `patch.object(_sys.stdin, "isatty", return_value=True)` (and
the same for `stdout`). That patches the *attribute on the real stream
object* \u2014 every call site, no matter how it reached `sys.stdin`,
hits the patched method. Same fix applied to the stash-restore test
(it took the "non-TTY \u2192 skip restore prompt" branch for the same reason).

Validation:

    $ pytest tests/hermes_cli/test_update_yes_flag.py -q
    3 passed in 5.47s

No production code change. Fixes the two failures observed on `main`
(run 25250051126):

`tests/hermes_cli/test_update_yes_flag.py::TestUpdateYesConfigMigration::test_no_yes_flag_still_prompts_in_tty`
`tests/hermes_cli/test_update_yes_flag.py::TestUpdateYesStashRestore::test_yes_restores_stash_without_prompting`

Refs: #18261 (added the `--yes` flag + these tests).

* fix(web): force light color-scheme on docs iframe

The Documentation tab embeds the public Hermes Agent docs site via an
<iframe>. On any system where the browser's prefers-color-scheme
resolves to dark — the default on macOS with system dark mode, and
common on Linux/Windows too — the docs body text rendered nearly
invisible against its own background.

Cause: Docusaurus intentionally leaves <html> and <body> transparent
and relies on the browser's Canvas color to fill the viewport. Inside
our iframe, the iframe element had bg-background (the dashboard's dark
canvas) AND inherited the dashboard's dark color-scheme, so the
browser set the iframe's Canvas to a dark value. Docusaurus's
transparent body exposed that dark Canvas, and the docs body text
(tuned for a light Canvas) became near-illegible. Affects every
built-in dashboard theme.

Fix: replace bg-background on the iframe with [color-scheme:light]
(spec-blessed cross-origin override of the inherited color-scheme;
forces the iframe's Canvas to light) and bg-white (belt-and-suspenders
fallback during the brief paint window before content loads). The
docs site's own theme toggle keeps working — Docusaurus stores its
choice in localStorage and applies opaque dark backgrounds to its
layout elements that cover the white Canvas we forced.

* fix(security): close TOCTOU window when saving MCP OAuth credentials

_write_json (the persistence helper used by HermesTokenStorage for both
tokens and client_info) created the temp file via Path.write_text and
only chmod'd it to 0o600 afterward. Between create and chmod the file
existed on disk at the process umask (commonly 0o644 = world-readable),
briefly exposing MCP OAuth access/refresh tokens to other local users.

Use os.open with O_WRONLY|O_CREAT|O_EXCL and an explicit S_IRUSR|S_IWUSR
mode so the file is created atomically at 0o600, plus tighten the parent
dir to 0o700 so siblings can't traverse to the creds file. The temp name
also gains a per-process random suffix to avoid collisions between
concurrent writers and stale leftovers from a crashed prior write.

Mirrors the fix shipped for agent/google_oauth.py in #19673.

Adds a regression test asserting the resulting file mode is 0o600 and
the parent directory is 0o700 (skipped on Windows where POSIX mode bits
aren't enforced).

* chore(release): add Gutslabs to AUTHOR_MAP for PR #21148 salvage

* test(update): teach restart-mocks about the post-update survivor sweep

Issue #17648 added a post-update SIGTERM-survivor sweep to `cmd_update`:
~3s after issuing graceful/SIGTERM restarts, the code re-queries
`find_gateway_pids` and SIGKILLs anything still alive. That's the
right fix for stuck-drain gateways in production, but it broke three
unit tests that assumed `find_gateway_pids` would keep returning the
same PIDs forever:

  FAILED ::TestCmdUpdateLaunchdRestart::test_update_restarts_profile_manual_gateways
    AssertionError: Expected 'kill' to not have been called. Called 1 times.
    Calls: [call(12345, <Signals.SIGKILL: 9>)].

  FAILED ::TestCmdUpdateLaunchdRestart::test_update_profile_manual_gateway_falls_back_to_sigterm
    AssertionError: Expected 'kill' to have been called once. Called 2 times.
    Calls: [call(12345, SIGTERM), call(12345, SIGKILL)].

  FAILED ::TestServicePidExclusion::test_update_kills_manual_pid_but_not_service_pid
    assert 2 == 1
      manual_kills = [call(42999, SIGTERM), call(42999, SIGKILL)]

In each test `os.kill` is mocked, so the simulated PID never actually
exits \u2014 the sweep finds it again and escalates. The production code
is correct; the tests just need to model OS behaviour properly.

Two-test fix (profile-manual restart cases): use
`side_effect=[[12345], []]` so the first `find_gateway_pids` call
returns the live PID and the second (the sweep) returns nothing, as if
the OS had reaped the process.

Service-PID-exclusion fix: track which PIDs got killed in a closure
set, and exclude them on subsequent `fake_find` calls. `os.kill`
gets a `side_effect` that records the kill instead of swallowing it
silently. Now the sweep doesn't re-find the manual PID, no SIGKILL
escalation, `manual_kills == 1`.

Validation:

    $ pytest tests/hermes_cli/test_update_gateway_restart.py -q
    43 passed in 4.13s

No production code change. Fixes the three failures observed on `main`
(run 25250051126):

  test_update_restarts_profile_manual_gateways
  test_update_profile_manual_gateway_falls_back_to_sigterm
  test_update_kills_manual_pid_but_not_service_pid

Refs: #17648 (post-update survivor sweep that the tests didn't model).

* fix(image-routing): expose attached image paths in native multimodal text part

In native image mode (vision-capable models like gpt-4o, claude-sonnet-4),
build_native_content_parts() previously emitted only the user's caption
plus image_url parts. The local file path of each attached image never
appeared in the conversation text, so the model could see the pixels but
had no string handle for tools that take image_url: str (custom MCP
tools, vision_analyze on a re-look, attach-to-tracker workflows).

The text-mode path already injects an equivalent hint via
Runner._enrich_message_with_vision ("...vision_analyze using image_url:
<path>..."). This brings native mode to parity by appending one
"[Image attached at: <path>]" line per successfully attached image to
the user-text part of the multimodal turn. Skipped (unreadable) paths
are NOT advertised, so the model is never told a non-existent file is
attached.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(optional-skills): port Anthropic financial-services skills as optional finance bundle (#21180)

Adds 7 optional skills under optional-skills/finance/ adapted from
anthropics/financial-services (Apache-2.0):

  excel-author        — openpyxl conventions: blue/black/green cells,
                        formulas over hardcodes, named ranges, balance
                        checks, sensitivity tables. Ships recalc.py.
  pptx-author         — python-pptx for model-backed decks (pitch,
                        IC memo, earnings note) that bind every number
                        to a source workbook cell.
  dcf-model           — institutional DCF (49KB skill): projections,
                        WACC, terminal value, Bear/Base/Bull scenarios,
                        5x5 sensitivity tables. Ships validate_dcf.py.
  comps-analysis      — comparable company analysis: operating metrics,
                        multiples, statistical benchmarking.
  lbo-model           — leveraged buyout: S&U, debt schedule, cash
                        sweep, exit multiple, IRR/MOIC sensitivity.
  3-statement-model   — fully-integrated IS/BS/CF with balance-check
                        plugs. Ships references/ for formatting,
                        formulas, SEC filings.
  merger-model        — accretion/dilution analysis for M&A.

All seven are optional (not active by default). Users install via
'hermes skills install official/finance/<skill>'.

Hermesification:
- Stripped every Office JS / Office Add-in / mcp__office__*
  branch — skills assume headless openpyxl only.
- Replaced Cowork MCP data-source instructions with 'MCP first (via
  native-mcp), fall back to web_search/web_extract against SEC EDGAR
  and user-provided data'.
- Swapped Claude tool references (Bash, Read, Write, Edit, mcp__*)
  for Hermes-native equivalents and Python library calls.
- Canonical Hermes frontmatter (name/description/version/author/
  license/metadata.hermes.{tags,related_skills}).
- Descriptions tightened to 187-238 chars, trigger-first.
- Attribution preserved: author field credits 'Anthropic (adapted by
  Nous Research)', license: Apache-2.0, each SKILL.md links back to
  the upstream source directory.

Verification:
- All 7 discovered by OptionalSkillSource with source_id='official'
- Bundle fetch includes support files (scripts, references, troubleshooting)
- related_skills cross-refs all resolve within the bundle
- No Claude product / Cowork / Office JS / /mnt/skills leakage
  remains in body text (bounded mentions only in attribution blocks)

Source: https://github.com/anthropics/financial-services (Apache-2.0)

* test(skills): cover additional rescan paths in skill_commands cache (#14536)

The rescan-on-platform-change fix landed in #18739 ships one regression
test that exercises the HERMES_PLATFORM env-var path. Three other code
paths in get_skill_commands / _resolve_skill_commands_platform have no
direct coverage; this commit adds a regression test for each.

- Gateway session context (HERMES_SESSION_PLATFORM via ContextVar): the
  resolver consults get_session_env after HERMES_PLATFORM, and the
  gateway sets that variable through set_session_vars (a ContextVar),
  not os.environ. The test uses set_session_vars / clear_session_vars
  to drive the actual gateway signal, and the disabled-skill stub reads
  the same value via get_session_env. A regression that swapped
  get_session_env for plain os.getenv would still pass an env-var-based
  test but break concurrent gateway sessions, which is the bug the
  ContextVar plumbing exists to prevent.
- Returning to no-platform-scope (CLI / cron / RL rollouts after a
  gateway session): the cached telegram view must be dropped and the
  unfiltered scan repopulated when HERMES_PLATFORM is unset again.
- Same-platform cache hit: consecutive calls under the same platform
  scope must NOT rescan. The rescan trigger is change in scope, not
  "always re-resolve" — a gateway serving many consecutive telegram
  requests should pay the scan cost once, not per request.

The third test wraps scan_skill_commands with a spy after the cache is
primed, so the assertion is on call_count == 0 across three subsequent
get_skill_commands() calls.

All 39 tests in tests/agent/test_skill_commands.py pass under
scripts/run_tests.sh.

* fix(gateway): translate inbound document host paths to container paths for Docker backend

When terminal.backend is docker, inbound documents uploaded via messaging
platforms (Telegram, Slack, Discord, Feishu, Email, etc.) are cached at a host
path under ~/.hermes/cache/documents, but the container sandbox only sees them
at the auto-mounted /root/.hermes/cache/documents path.

This PR adds to_agent_visible_cache_path() in tools/credential_files.py (the
natural sibling to get_cache_directory_mounts()) and calls it at the
document-context-injection site in gateway/run.py so the agent always receives
a path it can open directly, matching the mount layout already established
by get_cache_directory_mounts() (#4846).

Scope: only Docker backend for now; other backends use different mount
semantics and are left unchanged until verified.

Fixes #18787

* feat(gateway): opt-in cleanup of temporary progress bubbles (#21186)

When display.cleanup_progress (or display.platforms.<plat>.cleanup_progress)
is true, the gateway deletes tool-progress bubbles, long-running '⏳ Still
working...' notices, and status-callback messages after the final response
is delivered successfully. Currently effective on adapters that implement
delete_message (Telegram); silently no-ops elsewhere. Off by default.
Failed runs skip cleanup so bubbles stay as breadcrumbs.

Minimal plumbing: base.py's existing post_delivery_callback slot now chains
new registrations onto any existing callback (with per-callback exception
isolation) rather than clobbering. Stale-generation registrations are
rejected so they can't step on a fresher run's callbacks. This lets the
cleanup callback coexist with the background-review release hook already
registered on the same slot.

Co-authored-by: mrcharlesiv <Mrcharlesiv@gmail.com>

* fix(kanban): heartbeat tool extends claim TTL, not just last_heartbeat_at

The kanban_heartbeat tool called heartbeat_worker but never
heartbeat_claim, so a worker that loops the tool while a single tool
call blocks the agent for >DEFAULT_CLAIM_TTL_SECONDS still got
reclaimed by release_stale_claims. The function name and
heartbeat_claim's own docstring imply otherwise:

  "Workers that know they'll exceed 15 minutes should call this
   every few minutes to keep ownership."

But there was no caller in the worker tool path. Workers couldn't
invoke heartbeat_claim themselves either — it isn't exposed as a tool.

Fix: _handle_heartbeat now calls heartbeat_claim first, reading
HERMES_KANBAN_CLAIM_LOCK from the worker env (the dispatcher pins
this in _default_spawn). Falls back to _claimer_id() for locally-
driven workers that didn't go through dispatcher spawn.

Test: tests/tools/test_kanban_tools.py::test_heartbeat_extends_claim_expires
rewinds claim_expires into the past, calls the tool, and asserts the
new value is at least now + DEFAULT_CLAIM_TTL_SECONDS // 2. Verified to
fail against the unfixed code (claim_expires stays at the rewound
value).

Closes the root cause underlying the symptom in #21141 (15-min
respawns of long-running workers). #21141 separately addresses
post-reclaim cleanup; this fixes the upstream "shouldn't have been
reclaimed in the first place" half.

* chore(release): map stephen0110 noreply email

* fix(kanban): stop reclaimed workers before retry

* fix(kanban): reap completed worker children in dispatch_once

The gateway-embedded dispatcher (default since `kanban.dispatch_in_gateway
= true`) is the parent of every spawned kanban worker. `_default_spawn`
calls `subprocess.Popen(..., start_new_session=True)` and returns the
pid — `start_new_session` detaches the controlling tty but does not
reparent to init, so the gateway keeps each worker as a child until it
`wait()`s for them.

Nothing in the dispatch loop ever calls `waitpid`. Result: every
completed worker becomes a `<defunct>` zombie that lingers until the
gateway exits. We hit ~430 zombies on a single hermes-agent container
after ~40 days of steady kanban traffic, approaching process-table
exhaustion on the host.

Fix: add a non-blocking reap loop at the top of `dispatch_once`, so
every dispatcher tick (default 60s) drains zombies that accumulated
since the last tick. WNOHANG keeps the call non-blocking; ChildProcessError
means no children to reap.

Why here, not a SIGCHLD handler:
- signal.signal requires the main thread; gateway threading model makes
  that placement non-trivial.
- Bounded staleness: at default interval=60s the maximum live zombie
  count is one tick's worth of worker completions.
- No interaction with detect_crashed_workers: that function only inspects
  rows where status='running', and rows reach 'done' (and stop being
  inspected) before their workers exit.

* chore(release): map sonic-netizen noreply email

* fix: auto-block repeated kanban retries

* chore(release): map mwnickerson noreply email

* feat(gateway): auto-resume interrupted sessions after restart

* fix(gateway): preserve resume marker on interrupted restart

* refactor(gateway): simplify auto-resume + extend to crash recovery

Follow-up on top of @kyan12's PR #20888 — same feature, cleaner shape,
wider coverage.

Changes:
- Drop the synthetic '[System note: ...]' in the internal MessageEvent.
  The existing _is_resume_pending branch in _handle_message_with_agent
  (run.py ~L13738) already injects a reason-aware recovery system note
  on the next turn.  With kyan's text in place the model saw two stacked
  system notes.  Now the event text is empty and the existing injection
  path owns the wording.
- Drop SessionStore.list_resume_pending() as a new public method.  The
  filter is 8 lines inline in _schedule_resume_pending_sessions() —
  one caller, no other pluggability need.
- Add 'restart_interrupted' to the auto-resume reason set.  That's the
  reason SessionStore.suspend_recently_active() stamps on sessions
  recovered from a crash/OOM/SIGKILL (no .clean_shutdown marker).
  Previously those sessions had to wait for a real user message to
  auto-resume; now they continue automatically at startup like
  drain-timeout interruptions do.
- Reasons live in a _AUTO_RESUME_REASONS frozenset at class scope so
  future reasons (e.g. 'manual_resume_request') can be opted in with
  one line.

Test coverage added:
- drain-timeout + crash-recovery both scheduled
- stale entries skipped (outside freshness window)
- suspended entries skipped (suspended > resume_pending)
- originless entries skipped (no routing target)
- disallowed reasons skipped (graceful forward-compat)

E2E verified end-to-end with a real on-disk SessionStore: 2 eligible
sessions scheduled, 2 ineligible skipped, empty-text internal events
delivered to the adapter.

Co-authored-by: Kevin Yan <kevyan1998@gmail.com>

* fix(auth): sync shared Nous refresh tokens

* refactor(auth): dedupe file-lock helper; document Nous lock order

Extract the shared flock/msvcrt boilerplate from _auth_store_lock and
_nous_shared_store_lock into a single _file_lock(lock_path, holder,
timeout, message) helper. Each caller keeps its own threading.local
holder so reentrancy state stays per-lock.

Also document the lock-ordering invariant on both wrappers:
_auth_store_lock is OUTER, _nous_shared_store_lock is INNER for all
runtime refresh paths. The one exception is _try_import_shared_nous_state,
which holds the shared lock alone across the full HTTP refresh+mint
cycle to prevent concurrent sibling imports from racing on the single-
use shared refresh token; that helper must not be called with the auth
lock already held.

* fix(gateway): avoid duplicated responses history

* chore: add AUTHOR_MAP entries for thelumiereguy and counterposition

* fix(gateway): use monotonic deadlines in QR onboarding flows

* fix(oauth,gateway): monotonic deadlines for polling/timeout loops

Widen PR #20314's fix to the other timeout-polling sites in the codebase
that share the same wall-clock-jump bug class. All of these measure elapsed
timeout duration, not civil time, so they belong on time.monotonic().

- hermes_cli/auth.py: auth-store file-lock timeout, Spotify OAuth callback
  wait, Nous portal device-auth token poll.
- hermes_cli/copilot_auth.py: Copilot OAuth device-flow token poll.
- hermes_cli/gateway.py: gateway systemd restart wait.
- hermes_cli/web_server.py: dashboard Codex device-auth user_code wait,
  dashboard Nous device-auth token poll. (sess["expires_at"] stays on
  time.time() — it's a persisted absolute timestamp, not a local
  deadline-polling variable.)
- agent/copilot_acp_client.py: Copilot ACP JSON-RPC request timeout.

* fix(weixin): replace all aiohttp ClientTimeout with asyncio.wait_for()

aiohttp ClientTimeout uses BaseTimerContext which calls
loop.call_later() internally. When invoked via
asyncio.run_coroutine_threadsafe() from cron jobs, this
triggers "Timeout context manager should be used inside a task"
errors, causing message delivery failures.

Replace all direct ClientTimeout usage with asyncio.wait_for():
- _upload_ciphertext: CDN upload (120s timeout)
- _download_bytes: CDN download (configurable timeout)
- _download_remote_media: remote media fetch (30s timeout)

Also set total=None on _send_session to disable aiohttp built-in
timeout, and change trust_env=True to False to bypass proxy for
WeChat CDN connections.

* test(weixin): update timeout assertion for asyncio.wait_for migration

* chore: AUTHOR_MAP entry for chenlinfeng@ruije / @noOne-list

* feat(security): enable secret redaction by default (#17691, #20785) (#21193)

Flip the default for HERMES_REDACT_SECRETS from off to on so the redactor
already wired into send_message_tool, logs, and tool output actually runs
on a fresh install.

- agent/redact.py: env-var default "" → "true"
- hermes_cli/config.py: DEFAULT_CONFIG security.redact_secrets True;
  two config-template comments rewritten
- gateway/run.py + cli.py: startup log / banner warning when the user
  has explicitly opted out, so the downgrade is visible in agent.log
  and at CLI banner time
- docs/reference/environment-variables.md: description reconciled
- tests: flipped the default-pin, restructured the force=True
  regression test to explicit-false instead of unset

Users who need raw credential values (redactor development) can still
opt out via security.redact_secrets: false in config.yaml or
HERMES_REDACT_SECRETS=false in .env.

Closes #17691.
Addresses #20785 (short-term output-pipeline recommendation).

* feat: add Discord message deletion action

* chore: AUTHOR_MAP entry for @likejudy

* fix(security): close TOCTOU window in hermes_cli/auth.py credential writers (#21194)

`_save_auth_store`, `_save_qwen_cli_tokens`, and `_write_shared_nous_state`
all created the temp file via `Path.open('w')` / `Path.write_text` and only
tightened permissions to 0o600 afterward. Between create and chmod the file
existed at the process umask (commonly 0o644 = world-readable on multi-user
hosts), briefly exposing OAuth access/refresh tokens for Nous, Codex,
Copilot, Claude, Qwen, Gemini, and every other native OAuth provider that
flows through auth.json.

Switch all three to `os.open(O_WRONLY|O_CREAT|O_EXCL, 0o600)` + `os.fdopen`
+ `fsync` so the file is atomic at 0o600 on creation. Tighten each parent
directory (`~/.hermes/`, Qwen auth dir, Nous shared auth dir) to 0o700 so
siblings can't traverse to the creds. `_save_auth_store` also gains a
per-process random temp suffix to match `agent/google_oauth.py` (#19673)
and `tools/mcp_oauth.py` (#21148).

Adds `tests/hermes_cli/test_auth_toctou_file_modes.py` asserting final
file mode 0o600 and parent dir mode 0o700 across all three writers, plus
an explicit `os.open(flags, mode)` check on the main auth.json writer
that would fail if anyone reintroduces the `Path.open('w')` pattern.
POSIX-only (mode bits skipped on Windows).

* fix(delegate): correct ACP docs — Claude Code CLI has no --acp flag

The delegate_task tool schema descriptions referenced 'claude --acp --stdio'
as an example, but Claude Code CLI does not support --acp or --stdio flags.

The ACP subprocess transport (agent/copilot_acp_client.py) is specifically
built for GitHub Copilot CLI ('copilot --acp --stdio').

Changes:
- Per-task acp_command example: 'claude' → 'copilot'
- Top-level acp_command description: remove 'Claude Code' reference,
  clarify requirement for ACP-compatible CLI (currently Copilot only)
- acp_args description: remove misleading claude-opus-4-6 example

Fixes #19055

* fix: exclude hidden and archive dirs from _find_skill rglob

* fix(gateway): preserve thread routing from cached live session sources

* fix(gateway): cap cached session sources with LRU eviction

Follow-up on top of Zyproth's session-source cache: swap the unbounded
dict for an OrderedDict with a 512-entry LRU cap so long-running
gateways can't accumulate stale entries for dead sessions forever.

- self._session_sources is now an OrderedDict
- _cache_session_source() move_to_end + popitem(last=False) above cap
- _get_cached_session_source() move_to_end on hit (LRU read bump)
- restart_test_helpers.py wires OrderedDict + _session_sources_max

* fix(mcp): give 'mcp add --command' a distinct argparse dest

The --command flag of `hermes mcp add` shared its argparse dest with the
top-level subparser (`dest="command"` in `hermes_cli/_parser.py`). When
the flag was omitted, argparse still wrote `args.command = None`,
clobbering the top-level value of `"mcp"`. The dispatcher then saw
`args.command is None` and fell through to interactive chat, so
`hermes mcp add ...` silently launched chat instead of registering the
server. `cmd_mcp_add` was never reached.

Use `dest="mcp_command"` on the flag and read it from `cmd_mcp_add`.
The user-facing CLI flag `--command` is unchanged; only the in-memory
namespace attribute moves. Also updates the `_make_args` helper in
`tests/hermes_cli/test_mcp_config.py` to populate the new dest, and
adds `tests/hermes_cli/test_mcp_add_command_dest.py` with a parser-
level regression test.

Closes #19785.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: add discodirector email to AUTHOR_MAP

* fix(bedrock): preserve reasoningContent across converse normalization

* feat(gateway): support [[as_document]] directive for skill media routing

Skills that produce large/lossless images (e.g. info-graph, where a
rendered JPG is 1-2 MB) currently lose quality in Telegram delivery
because `_IMAGE_EXTS` membership routes the file through
`send_multiple_images` → `sendMediaGroup`, which Telegram's server
re-encodes to JPEG @ 1280px max edge. The original bytes only survive
when the file goes through `send_document`, which the dispatch tables
in three places (`_process_message_background`, `_deliver_media_from_response`,
and the `send_message` tool's telegram path) only reach for files
whose extension is NOT in `_IMAGE_EXTS`.

This commit adds an `[[as_document]]` directive that mirrors the
existing `[[audio_as_voice]]` shape: a skill emits the directive once
in its response, and every image-extension MEDIA: file in that response
is delivered via `send_document` instead of `send_multiple_images` /
`sendPhoto`. The directive is detected at the dispatch sites (which see
the raw response) and the directive string is stripped from the
user-visible cleaned text in `extract_media` so it never leaks.

Granularity is intentionally all-or-nothing per response, matching
[[audio_as_voice]]'s scope. Skills that need fine control can split into
two responses.

Verified the targeted use case: info-graph emits

    信息图已生成(...)
    [[as_document]]
    MEDIA:/tmp/info-graph-x/infographic.jpg

→ Telegram receives `infographic.jpg` via sendDocument, original 1MB
JPEG bytes preserved, no recompression. Forwarding and download
filenames stay clean (`infographic.jpg`).

Tests: +3 cases in TestExtractMedia covering directive strip, isolation
from voice flag, and coexistence with [[audio_as_voice]]. All
113 pre-existing media/extract/send tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: update send_message_tool mocks for force_document kwarg

* chore: AUTHOR_MAP entry for @leon7609

* fix(model_switch): live model discovery for custom_providers in /model picker

custom_providers entries (section 4 of list_authenticated_providers) only
read the static models: dict from config.yaml, ignoring the live /v1/models
endpoint.  This means gateways like Bifrost that expose hundreds of models
only show the handful explicitly listed in config.

Add live discovery via fetch_api_models() for custom_providers entries
that have api_key + base_url, matching the existing behavior for user
providers: entries (section 3).  When the endpoint is reachable and
returns models, the live list replaces the static subset.

Fixes: /model picker showing only 9 models from a Bifrost gateway that
actually exposes 581.

* fix(memory): support OpenViking local resource uploads

* test(memory): harden OpenViking local upload coverage

* fix(memory): harden OpenViking local path uploads

* chore(release): add AUTHOR_MAP entries for ggnnggez and ehz0ah

Contributors to OpenViking local resource upload fix (#19569).

* docs(readme): drop misleading RL install-extras claim, defer to CONTRIBUTING

README.md:163 said atroposlib and tinker were pulled in by .[all,dev], but
.[all] does not include .[rl] — those dependencies live in pyproject.toml's
[rl] extra (lines 95-101). With the original wording, a contributor running
uv pip install -e ".[all,dev]" would not have atroposlib or tinker
installed.

Rather than swap one extra for another (which paths users to either of two
parallel install conventions — pip [rl] extra vs tinker-atropos submodule —
without saying which the project considers canonical), this PR drops the
specific install command from the README and links to CONTRIBUTING.md,
which already documents the actual development setup.

* fix(kanban): auto-block workers that exit without completing (#20894) (#21214)

When a kanban worker subprocess exits rc=0 but its task is still in
status='running', the agent almost certainly answered the task
conversationally without calling kanban_complete or kanban_block. The
dispatcher used to classify this as a generic crash and respawn, which
loops forever on small local models (gemma4-e2b q4 etc.) that keep
returning clean but unproductive output.

Dispatcher changes:
- The waitpid reap loop at the top of dispatch_once now records each
  reaped child's raw exit status in a bounded module registry
  (_recent_worker_exits, TTL 600s, size cap 4096).
- _classify_worker_exit distinguishes clean_exit / nonzero_exit /
  signaled / unknown using os.WIFEXITED / WIFSIGNALED.
- detect_crashed_workers consults the classification when a worker
  is found dead. clean_exit → protocol_violation event + immediate
  circuit-breaker trip (failure_limit=1). Everything else keeps the
  existing crashed-event + counter behavior.
- DispatchResult.auto_blocked now includes protocol-violation trips.

Gateway fix (Bug A in #20894):
- gateway.run._notify_active_sessions_of_shutdown snapshots
  self.adapters with list(...) before iterating. adapter.send() can
  hit a fatal-error path that pops the adapter from the dict, which
  was raising 'RuntimeError: dictionary changed size during iteration'
  during shutdown.

Regression tests:
- test_detect_crashed_workers_protocol_violation_auto_blocks verifies
  rc=0 + still-running → status=blocked on first occurrence with
  protocol_violation + gave_up events and NO crashed event.
- test_detect_crashed_workers_nonzero_exit_uses_default_limit verifies
  non-zero exits keep the existing 2-strike behavior.

Closes #20894.

* fix(dashboard): stabilize embedded chat resume and scrollback

* fix(dashboard): let embedded chat use a single scroll system

* fix(dashboard): route browser wheel into inner TUI scrolling

* chore: AUTHOR_MAP entry for @nouseman666

* fix(cli): honor positive tool preview length

* chore: AUTHOR_MAP entry for @GinWU05

* fix(credential_pool): resolve key mix-up when custom providers share base_url

When multiple custom_providers share the same base_url but have different API keys,

get_custom_provider_pool_key() always returned the first match, causing wrong-key

unauthorized errors. Add provider_name parameter to prefer exact name matches

over base_url-only matching, with fallback for backward compatibility.

Fixes #19083

* feat(cli): show context compression count in status bar

Display the number of context compressions in the CLI status bar when
compressions > 0, helping users understand conversation compression
pressure during long sessions.

- Wide layout (>=76 cols): shows 'cmp N' between context percent and duration
- Medium layout (52-75 cols): shows 'cmp N' between percent and duration
- Narrow layout (<52 cols): omitted to save space
- Color-coded: dim for 1-4, warn for 5-9, bad for 10+
- Hidden when zero to keep the bar clean for new sessions

Closes #18564

* refactor: replace 'cmp' text with 🗜️ emoji in status bar

Address review feedback to use the clamp emoji (��️) instead of
the plain text 'cmp' prefix for the compression count indicator.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(tui): surface compression count in Ink status bar

Parity with the classic CLI status bar (PR #18579). The Python backend
already exposes 'compressions' on SessionUsageResponse; this wires it
through the Ink Usage type and renders 'cmp N' next to the duration
segment of StatusRule.

- types.ts Usage: add optional compressions field
- appChrome.tsx StatusRule: render 'cmp N' when > 0, color-tiered by
  pressure (muted <5, warn 5-9, error 10+)
- Plain text 'cmp' token (no emoji) matches PR #18579's original author
  rationale and avoids Ink layout drift from VS16 emoji width

* chore(release): map altriatree@gmail.com -> @TruaShamu

* fix(curator): make manual runs synchronous

* docs(curator): update CLI docs for synchronous-by-default manual run

Follow-up to the previous commit which flipped 'hermes curator run'
default from async to sync. Updates the curator.md feature page and
cli-commands.md reference to show --background as the opt-in async
flag and note that the default now blocks until the LLM pass finishes.

* fix(install): remove uv exclude-newer cutoff

* docs: clarify API server tool execution locality

* fix(kanban): treat dashboard event-stream cancellation as normal shutdown

Stopping `hermes dashboard` with Ctrl-C while the Kanban dashboard is
open prints an ASGI traceback ending in
`plugins/kanban/dashboard/plugin_api.py::stream_events` at the
`asyncio.sleep(_EVENT_POLL_SECONDS)` line. This is a normal shutdown
path: Uvicorn cancels the open websocket task while it is sleeping in
the 300 ms poll loop. `asyncio.CancelledError` is a `BaseException` in
Python 3.8+ — the bare `except Exception:` handler below the existing
`WebSocketDisconnect:` clause does NOT catch it, so the cancellation
surfaces as an application traceback and routine dashboard exit looks
like a runtime failure.

Add an explicit `except asyncio.CancelledError: return` clause beside
the existing `WebSocketDisconnect` handler. Disconnection (client
closed the tab) and shutdown cancellation (dashboard process exiting)
are conceptually different paths but both warrant a quiet return; the
two clauses are kept separate to keep that intent explicit.

`asyncio` is already imported and used in this scope, so no new
import is needed. The bare `except Exception:` handler is preserved
verbatim, so genuine runtime failures still log a warning and close
the socket cleanly.

Closes #20790.

* chore(release): map SandroHub013 email

* test(kanban): regression for CancelledError swallow in stream_events

Drives stream_events directly and cancels the task while it is sleeping
in the poll loop, asserting the coroutine returns cleanly instead of
letting CancelledError bubble. Regression coverage for the Uvicorn
application traceback on dashboard Ctrl-C fixed by the preceding commit.

* fix(model_tools): log plugin hook exceptions instead of silently swallowing them

* feat(gateway): add `hermes gateway list` to show all profiles' gateway status

Add a new `hermes gateway list` subcommand that shows the running
status of gateways across all profiles in a single view:

    Gateways:
      ✓ default (current)        — PID 155469
      ✓ wx1                      — PID 166893
      ✗ dev                      — not running

Also includes `_print_other_profiles_gateway_status()` which appends
an "Other profiles" section to `hermes gateway status` output when
other profile gateways are running.

Both use existing `list_profiles()` and `find_profile_gateway_processes()`
— no new dependencies.

Closes #19127
Related: #19113, #4402, #4587

* fix(mcp-oauth): persist OAuth server metadata across process restarts (#21226)

The MCP SDK discovers OAuth server metadata (token_endpoint, etc.) on
demand and keeps it in memory only. Without disk persistence, a restart
with valid cached refresh tokens forces the SDK to fall back to the
guessed '{server_url}/token' path — which returns 404 on most real
providers (Notion, Atlassian, GitHub remote MCP, etc.) and triggers a
full browser re-authorization even though the refresh token is fine.

Add a .meta.json file next to the existing tokens/client_info files:

  HERMES_HOME/mcp-tokens/<server>.json        -- tokens (existing)
  HERMES_HOME/mcp-tokens/<server>.client.json -- client info (existing)
  HERMES_HOME/mcp-tokens/<server>.meta.json   -- oauth metadata (new)

Changes:
- HermesTokenStorage.save_oauth_metadata / load_oauth_metadata / _meta_path
  — disk layer for the discovered OAuthMetadata.
- HermesTokenStorage.remove() now also clears .meta.json so
  'hermes mcp remove <name>' and the manager's remove() path clean up fully.
- HermesMCPOAuthProvider._initialize cold-restores from disk before the
  existing pre-flight discovery runs. If disk has metadata we skip the
  discovery HTTP round-trips entirely.
- HermesMCPOAuthProvider._prefetch_oauth_metadata now persists ASM as
  soon as it's discovered, so even the first pre-flight run seeds disk.
- HermesMCPOAuthProvider._persist_oauth_metadata_if_changed() is called
  at the end of async_auth_flow so metadata discovered via the SDK's
  lazy 401-branch (not pre-flight) is also saved for next time.

Tests cover the storage roundtrip (save/load/missing/corrupt/remove) and
the manager provider path (cold-load restore, skip-when-in-memory,
persist-on-discover, noop-when-unchanged, end-to-end async_auth_flow).

Co-authored-by: nocturnum91 <50326054+nocturnum91@users.noreply.github.com>

* feat: add SSE transport support for MCP client

Add support for MCP servers using the SSE transport protocol
(SseServerTransport) alongside the existing Streamable HTTP and stdio
transports. Many MCP servers use SSE (GET /sse + POST /messages/)
which was previously unsupported -- the client silently fell back to
Streamable HTTP, causing 10s connection timeouts.

Changes:
- Import mcp.client.sse.sse_client with graceful fallback
- Check config.get('transport') == 'sse' in _run_http() to select
  the SSE transport path with proper timeout handling
- Read transport type from config in get_mcp_status() instead of
  hardcoding 'http' for URL-based servers
- Update docstring, example config, and feature list

* fix(browser): enforce cloud-metadata SSRF floor in hybrid routing (#16234) (#21228)

Cloud metadata endpoints (169.254.169.254 etc.) are now always blocked
by browser_navigate regardless of hybrid routing, allow_private_urls,
or backend.

Bug: commit 42c076d3 (#16136) added hybrid routing that flips
auto_local_this_nav=True for private URLs and short-circuits
_is_safe_url(). IMDS endpoints are technically private (169.254/16
link-local), so the sidecar happily routed them to a local Chromium,
and the agent could read IAM credentials via browser_snapshot. On
EC2/GCP/Azure this is a full SSRF-to-credential-theft.

Fix: new is_always_blocked_url() in url_safety.py — a narrow floor
that checks _BLOCKED_HOSTNAMES, _ALWAYS_BLOCKED_IPS,
_ALWAYS_BLOCKED_NETWORKS only. Applied as an independent gate in
browser_navigate's pre-nav and post-redirect checks, BEFORE
auto_local_this_nav gets a chance to short-circuit. Ordinary private
URLs (localhost, 192.168.x, 10.x, .local, CGNAT) still route to the
local sidecar as the #16136 feature intends.

Secondary fix (reporter's finding): _url_is_private() now explicitly
checks 172.16.0.0/12. ipaddress.is_private only covers that range on
Python ≥3.11 (bpo-40791), so on 3.10 runtimes those URLs were routed
to cloud instead of the local sidecar. No security impact — just a
correctness fix for the hybrid-routing feature.

Closes #16234.

* fix: WhatsApp bridge process leak and disable config asymmetry

- Add PID file mechanism to track bridge processes and kill stale ones on startup
- Improve _kill_port_process() with lsof fallback when fuser is not available
- Support explicit WhatsApp disable via config.yaml (whatsapp.enabled: false)
- Respect WHATSAPP_ENABLED=false env var to disable WhatsApp

Fixes #19124

* docs(contributing): align tool discovery and test runner with AGENTS.md

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(kanban): make dashboard board pin authoritative over server current file (#21230)

When the user created a new board via the dashboard with "switch" checked,
the server-side `current` file was flipped to the new board. Clicking the
original board's tab then showed no cards even though the count badge read
correctly — the REST fetch dropped `?board=` when the selection was
"default" and the backend fell through to `current` (= the new board),
returning a different board's data than the tab the user clicked.

Fix:
- `withBoard()` always appends `?board=<slug>` when a board is selected,
  including "default". The dashboard's tab selection becomes authoritative
  instead of silently deferring to the server's `current` file.
- `writeSelectedBoard()` persists every selection (including "default")
  to localStorage. Previously "default" was stripped, which meant the
  next page load had nothing to pin to and fell through to `current`.
- Same change applied to the WebSocket query builder in `openWs()`.

Contract verified live:
  current_board = "proj2"
  GET /board                  → proj2's tasks   (bug shape: falls through to current)
  GET /board?board=default    → default's tasks (fix: explicit pin wins)
  GET /board?board=proj2      → proj2's tasks

Closes #20879.

* fix: avoid unsupported anthropic context beta by default

* Follow latest child session on dashboard resume

* fix(openviking): add Bearer auth header and omit empty/legacy tenant headers (#21232)

Authenticated remote OpenViking servers derive tenancy from the Bearer
key, but the client was always sending X-OpenViking-Account and
X-OpenViking-User — defaulted to the literal string "default" — which
overrode the key-derived tenant and broke auth.

- _headers(): skip X-OpenViking-Account/-User when blank or "default"
  (treats the legacy default value as unset, so existing installs don't
  need to touch their .env)
- _headers(): send Authorization: Bearer <key> alongside X-API-Key for
  standard HTTP auth compatibility
- health(): include auth headers so /health works against servers that
  require authentication

Tests cover bearer emission, legacy "default" suppression, empty
suppression, real tenant passthrough, and authenticated health checks.

Fixes the same user report as #20695 (from @ZaynJarvis); that PR could
not be merged because its branch was stale against main and would have
reverted recent OpenViking work (#15696, local resource uploads, summary
URI normalization, fs-stat pre-check).

* feat: add transform_llm_output plugin hook

Enables plugins to transform LLM output text after generation,
useful for vocabulary/personality transformation without burning
inference tokens.

Follows same pattern as transform_tool_result and transform_terminal_output:
- First non-empty string result wins
- Fail-open: exceptions logged as warnings, agent continues
- Signature: (response_text, session_id, model, platform)

* test+docs: cover transform_llm_output hook + release author map

- tests/test_transform_llm_output_hook.py: dispatch semantics
  (kwargs contract, first-non-empty-string-wins, empty-string
  pass-through, raising-plugin fail-open, no-plugins = no-op)
- tests/hermes_cli/test_plugins.py: assert the new hook name is in
  VALID_HOOKS alongside the other transform_* hooks
- website/docs/user-guide/features/hooks.md: summary-table entry +
  full section mirroring transform_tool_result / transform_terminal_output
- scripts/release.py: map barnacleboy.jezzahehn@agentmail.to -> JezzaHehn
  (existing entry only covers the gmail address)

* feat(curator): add `hermes curator list-archived` command (#21236)

Lists the skills sitting in ~/.hermes/skills/.archive/ so users have
something to pass to `hermes curator restore`. `curator status` already
shows counts; this fills the name-discovery gap.

Archive layout is flat (`archive_skill` writes to `.archive/<skill>/`),
so the directory name IS the skill name — no frontmatter parsing
needed. Timestamped collision directories (`<skill>-<ts>`) are listed
literally; user can still pass them to `restore`.

Reshape of @EvilDrag0n's #20651, simplified: drop the frontmatter
rglob + preamble/trailer output + duplicate subcommand registration.

Co-authored-by: EvilDrag0n <lxl694522264@gmail.com>

* fix: require memory schema fields by action

* fix(tui): refresh scroll height at cached bottom

* fix(gateway): preserve max turns after env reload

* fix(discord): scope DISCORD_ALLOWED_ROLES to originating guild (CVSS 8.1)

The initial DISCORD_ALLOWED_ROLES implementation (#11608, merged from #9873)
scans every mutual guild when resolving a user's roles. This allows a
cross-guild DM bypass:

1. Bot is in both public server A and private server B.
2. User holds the allowed role in server A only.
3. User DMs the bot. The role check finds the role in A and authorizes the
   DM, granting access as if the user were trusted in server B.

Fix:
- DMs (no guild context) disable role-based auth by default. Opt-in via
  DISCORD_DM_ROLE_AUTH_GUILD=<guild_id> restricts role lookup to one
  explicitly-trusted guild.
- Guild messages check roles only in the originating guild
  (message.guild), never in other mutual guilds.
- Reject cached author.roles when the Member came from a different guild
  than the current message.

Backwards compatibility:
- DISCORD_ALLOWED_USERS behavior is unchanged (still works in both DMs
  and guild messages).
- Deployments that rely on roles in guild channels continue to work;
  role checks are now strictly scoped to that guild.
- Deployments that intentionally want role-based DM auth can opt into a
  single trusted guild via DISCORD_DM_ROLE_AUTH_GUILD.

Tests: 9 new regression guards in
tests/gateway/test_discord_roles_dm_scope.py covering the bypass path,
the opt-in path, cross-guild guild-message bypass, and backwards-compat
user-ID paths. 47/47 discord-auth tests pass.

Refs: #11608 (initial implementation), #7871 (feature request),
  #9873 (PR author credit @0xyg3n)

* fix(discord): extend role-scope fix to slash surface + fixture update

Sibling-site fix: _evaluate_slash_authorization was the fourth
_is_allowed_user caller and didn't pass guild/is_dm through, so slash
interactions would take the DM branch regardless of whether they came
from a guild channel. Now reads interaction.guild + in_dm and forwards.

Also updates test_discord_slash_auth fixture (_make_interaction) so
the SimpleNamespace guild mock has a get_member(uid)->None method —
required by the new guild-scoped fallback path in _is_allowed_user.
Tests exercising positive role paths still work via user.roles.

Three new regression tests in test_discord_roles_dm_scope:
- Slash DM + role in mutual public guild → rejected
- Slash in guild B + role only in guild A → rejected
- Slash in guild B + role in guild B → allowed (positive control)

368 Discord tests pass. test_discord_free_channel_skips_auto_thread
also fails on clean main (pre-existing, unrelated to this fix).

* fix(discord): route DM role-auth opt-in through config.yaml (not env var)

Per repo policy, ~/.hermes/.env is for secrets only. Guild IDs are
behavioral configuration, not secrets. Replacing the
DISCORD_DM_ROLE_AUTH_GUILD env var from the original fix with
discord.dm_role_auth_guild in config.yaml.

- New module-level _read_dm_role_auth_guild() helper reads
  hermes_cli.config.read_raw_config()['discord']['dm_role_auth_guild'].
  Fails closed on any parse error (safe default = DM role-auth off).
- DEFAULT_CONFIG['discord'] gains dm_role_auth_guild: '' with a comment
  documenting the opt-in.
- …
deestax added a commit to T3-Venture-Labs-Limited/hermes-agent that referenced this pull request May 8, 2026
…gin v1.0.0 docs (#26)

* fix(opencode-go): keep users on opencode-go instead of hijacking to native providers (#20802)

OpenCode Go and OpenCode Zen are flat-namespace model resellers — their
/v1/models returns bare IDs (deepseek-v4-flash, minimax-m2.7), and the
inference API rejects vendor-prefixed names with HTTP 401 'Model not
supported'. Two bugs fixed:

1. `switch_model` in hermes_cli/model_switch.py was silently switching the
   user off opencode-go to native deepseek when they typed
   `/model deepseek-v4-flash`. Step d found the model in opencode-go's live
   catalog, but step e (detect_provider_for_model) still ran and matched
   the bare name against deepseek's static catalog. Fix: track whether
   the live catalog resolved it; skip step e when it did.

2. `normalize_model_for_provider` in hermes_cli/model_normalize.py only
   stripped the exact `opencode-zen/` prefix, leaving arbitrary vendor
   prefixes like `minimax/minimax-m2.7` (commonly copied from aggregator
   slugs into fallback_model configs) intact — causing HTTP 401s when
   the fallback chain activated. Fix: opencode-go/opencode-zen strip ANY
   leading vendor prefix because their APIs are flat-namespace.

Tests: 11 new cases in tests/hermes_cli/test_opencode_go_flat_namespace.py
covering both normalization (prefix stripping, regression guards for
opencode-zen Claude hyphenation and openrouter vendor-prepending) and
switch_model (bare-name resolution on opencode-go's live catalog must
not trigger cross-provider hijack).

Reported by @Ufonik via Discord; Kimi K2.6 always worked because moonshotai
has no overlapping entry in a native provider's static catalog. Deepseek
and minimax failed because their v4/v2.7 names existed in the native
deepseek/minimax catalogs.

* feat(dashboard): add 'default-large' built-in theme with 18px base size (#20820)

Same Hermes Teal palette as the default theme, but with baseSize 18px,
lineHeight 1.65, and spacious density so the whole dashboard scales up.
Gives users a one-click bigger-text preset and a copyable reference for
authoring custom YAML themes with their own typography settings.

* refactor(web): per-capability backend selection for search/extract split

Introduce the foundation for independently selecting web search and
extract backends — enabling future combinations like SearXNG for
search + Firecrawl for extract.

Architecture:
- tools/web_providers/base.py: WebSearchProvider and WebExtractProvider
  ABCs with normalized result contracts (mirrors CloudBrowserProvider)
- tools/web_tools.py: _get_search_backend() and _get_extract_backend()
  read per-capability config keys, fall through to shared web.backend
- hermes_cli/config.py: web.search_backend and web.extract_backend in
  DEFAULT_CONFIG (empty = inherit from web.backend)

Behavioral change:
- web_search_tool() now dispatches via _get_search_backend()
- web_extract_tool() now dispatches via _get_extract_backend()
- When per-capability keys are empty (default), behavior is identical
  to before — _get_search_backend() falls through to _get_backend()

This is purely structural — no new backends are added. SearXNG and
other search-only/extract-only providers can now be added as simple
drop-in modules in follow-up PRs.

12 new tests, 49 existing tests pass with zero regressions.

Ref: #19198

* feat(web): add SearXNG as a native search-only backend

Adds SearXNG as a free, self-hosted web search provider.  SearXNG is a
privacy-respecting metasearch engine that requires no API key — just a
running instance and SEARXNG_URL pointing at it.

## What this adds

- `tools/web_providers/searxng.py` — `SearXNGSearchProvider` implementing
  `WebSearchProvider` (search only; no extract capability)
- `_is_backend_available("searxng")` — gates on SEARXNG_URL
- `_get_backend()` — accepts "searxng" as a configured value; adds it to
  auto-detect candidates (lower priority than paid services)
- `web_search_tool` — dispatches to SearXNG when it is the active backend
- `check_web_api_key()` — includes SearXNG in availability check
- `OPTIONAL_ENV_VARS["SEARXNG_URL"]` — registered with tools=["web_search"]
- `tools_config.py` — SearXNG appears in the `hermes tools` provider picker
- `nous_subscription.py` — `direct_searxng` detection, web_active / web_available
- `setup.py` — SEARXNG_URL listed in the missing-credential hint
- 23 tests covering: is_configured, happy-path search, score sorting, limit,
  HTTP/request errors, _is_backend_available, _get_backend, check_web_api_key

## Config

```yaml
# Use SearXNG for search, any paid provider for extract
web:
  search_backend: "searxng"
  extract_backend: "firecrawl"

# Or: SearXNG as the sole backend (web_extract will use the next available)
web:
  backend: "searxng"
```

SearXNG is search-only — it does not implement WebExtractProvider.  Users
who only configure SEARXNG_URL get web_search available; web_extract falls
back to the next available extract provider (or is unavailable if none).

Closes #19198 (Phase 2 Task 4 — SearXNG provider)
Ref: #11562 (original SearXNG PR)

* docs+skill: add searxng-search optional skill and documentation

Closes the remaining gaps from PR #11562 that weren't covered by the
core SearXNG integration landed in #20823.

- optional-skills/research/searxng-search/ — installable skill with
  SKILL.md (curl-based usage, category support, Python example) and
  searxng.sh helper script for health checks and instance queries
- website/docs/user-guide/configuration.md — SearXNG added to the
  Web Search Backends section (5 backends, backend table, per-capability
  split config example, correct search-only note)
- website/docs/reference/environment-variables.md — SEARXNG_URL row
- website/docs/reference/optional-skills-catalog.md — searxng-search entry

The core SearXNG code, OPTIONAL_ENV_VARS, hermes tools picker, and tests
were already on main via #20823.  This commit is purely additive docs +
the optional skill scaffold.

Credits from #11562 salvage:
  @w4rum — original _searxng_search structure
  @nathansdev — tools_config.py integration
  @moyomartin — category support and result formatting
  @0xMihai — config/env var approach
  @nicobailon — skill and documentation structure
  @searxng-fan — error handling patterns
  @local-first — self-hosted-first philosophy and docs

* docs: add Web Search + Extract feature page with SearXNG setup guide

* fix(feishu): keep topic replies in threads

Route Feishu topic progress, status, approval, stream, and fallback messages through threaded replies by preserving the originating message id as the reply target. Add regressions for tool progress topic metadata and Feishu metadata-driven reply routing.

* chore: follow-up cleanup for Feishu topic thread fix

- Remove dead metadata.get('reply_to') fallback in _send_raw_message;
  nothing in the codebase ever sets 'reply_to' inside a metadata dict —
  the key only appears as a top-level send_voice() keyword argument
- Simplify _status_thread_metadata construction in run.py to use a
  single dict literal instead of create-then-mutate pattern; the
  or-{} guard was dead since source.thread_id implies _progress_thread_id
  is also set for Feishu
- Add yuqian@zmetasoft.com to AUTHOR_MAP for contributor attribution

* fix(kanban): avoid fragile failure-column renames

* chore: follow-up cleanup for Kanban migration fix

- Expand migration comment to name the primary failure mode (missing
  column OperationalError from #20842) ahead of the secondary SQLite
  schema-reparse concern; also document the stale-cols-snapshot invariant
- Add clarifying comments on from_row() legacy fallback branches noting
  they are belt-and-suspenders dead code post-migration
- Add task_events comment in existing test explaining why the table is
  required by the migrator
- Add test_legacy_migration_no_legacy_columns_at_all: Scenario A —
  explicitly asserts the exact #20842 crash no longer occurs and that
  consecutive_failures defaults to 0 on a DB that never had spawn_failures
- Add test_legacy_migration_both_columns_already_present: Scenario D —
  asserts the migration is a no-op when both columns already exist,
  preserving the existing counter value

* fix(tui): bound virtual history offset searches

* ci(docker): don't cancel overlapping builds, guard :latest

Switch top-level concurrency to cancel-in-progress=false so every push
to main gets its own SHA-tagged image published — no more discarded
builds when commits land back-to-back.

Guard the :latest tag with a second job that has its own concurrency
group with cancel-in-progress=true plus a git-ancestor check against
the revision label on the current :latest. Together these guarantee
:latest only ever moves forward in history: a slower run whose commit
isn't a descendant of the current :latest refuses to clobber it, and
a newer push mid-way through the move-latest job preempts the older
one before it can retag.

- Every main push publishes nousresearch/hermes-agent:sha-<commit>
  with an org.opencontainers.image.revision label embedded.
- move-latest job reads that label off :latest, runs merge-base
  --is-ancestor, and only retags (via buildx imagetools create,
  registry-side, no rebuild) if our commit strictly descends.
- fetch-depth bumped to 1000 so merge-base has the history it needs.
- Release tag flow unchanged (unique tag, no race).

* docs(tool-gateway): rewrite as pitch-first marketing page (#20827)

Previous version read like internal API docs \u2014 leading with env var tables,
config YAML, and 'precedence' rules before ever explaining the product.
Complete rewrite inverts the structure so readers see value first,
mechanics last.

Structure now:
- Lede: 'One subscription. Every tool built in.' + pitch paragraph
- CTA: subscribe/manage button styled as a real call-to-action
- What's included: emoji-led table with expanded descriptions per tool.
  Image gen lists all 9 models by name (FLUX 2 Klein/Pro, Z-Image Turbo,
  Nano Banana Pro, GPT Image 1.5/2, Ideogram V3, Recraft V4 Pro, Qwen)
- Why it's here: value bullets \u2014 one bill, one signup, one key, same
  quality, bring-your-own anytime
- Get started: two-command flow (hermes model \u2192 hermes status)
- Eligibility: paid-tier note with upgrade link
- Mix and match: three realistic usage patterns
- Using individual image models: ID reference table for power users
- --- separator ---
- Configuration reference (demoted): use_gateway flag, disabling,
  self-hosted gateway env vars moved below the fold where they belong
- FAQ: streamlined, removed redundant content

Fact-checked against code:
- 9 FAL models confirmed from tools/image_generation_tool.py FAL_MODELS
- Status section output verified against hermes_cli/status.py
- Portal subscription URL preserved
- Self-hosted env vars (TOOL_GATEWAY_DOMAIN etc.) kept accurate

Verified: docusaurus build SUCCESS, page renders, no new broken links.

* fix(auth): fall back to global-root auth.json for providers missing in profile

Profile processes (kanban workers, cron subprocesses, delegated subagents)
read the profile's auth.json only. If a provider was authenticated at the
global root but not inside the profile, the profile's credential_pool
comes back empty and the process fails with 'No LLM provider configured'
— even though the credentials are sitting in ~/.hermes/auth.json. #18594
propagated HERMES_HOME correctly, which is what surfaced this: workers
now land in the right profile, and the profile turns out to shadow global
with no fallback.

Semantics (read-only, per-provider shadowing):
* Profile has any entries for provider X → use profile only (global ignored).
* Profile has zero entries for provider X → fall back to global.
* Writes (write_credential_pool, _save_auth_store) still target the profile.
* Classic mode (HERMES_HOME == global root) skips the fallback entirely —
  _global_auth_file_path() returns None.

Also mirrors the fallback in get_provider_auth_state so OAuth singletons
(nous, minimax-oauth, openai-codex, spotify) inherit cleanly — the Nous
shared-token store (PR #19712) remains the authoritative path for Nous
OAuth rotation, this just makes the read side consistent with it.

Seat belt: _load_global_auth_store() refuses to read the real user's
~/.hermes/auth.json under PYTEST_CURRENT_TEST even when HERMES_HOME points
to a profile-shaped path. Guard uses $HOME (stable across fixtures) rather
than Path.home() (which fixtures often monkeypatch to a tmp root).

Reported by @SeedsForbidden on Twitter as the credential_pool shadowing
follow-up to the #18594 fix.

* feat(gateway): per-platform gateway_restart_notification flag

Adds an opt-out toggle on PlatformConfig that gates both restart
lifecycle pings: the "♻ Gateway restarted" message sent to the chat
that issued /restart, and the "♻️ Gateway online" home-channel
startup notification. Defaults to True so existing deployments are
unaffected.

The motivating split is operator vs. end-user surfaces: a back-channel
like Telegram should keep these pings, while a Slack workspace shared
with end users should not surface gateway lifecycle noise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(gateway): also gate pre-restart "Gateway restarting" notification

Extend the gateway_restart_notification flag to cover
_notify_active_sessions_of_shutdown — the message that fires just
before drain ("⚠️ Gateway restarting — Your current task will be
interrupted. Send any message after restart and I'll try to resume
where you left off.") sent to active sessions and home channels.

Same operator/end-user reasoning: on a Slack workspace shared with
end users, "Gateway restarting" reads as "the bot is broken" — the
operator should be able to suppress it consistently with the other
two lifecycle pings rather than having a partial opt-out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: add guillaumemeyer to AUTHOR_MAP

For cherry-picked commits in PR #20801.

* fix(cli): submit LF enter in thin PTYs (#20896)

* fix(tui): refresh virtual offsets after row resize (#20898)

* fix(tui): honor skin highlight colors (#20895)

* fix(tui): steady transcript scrollbar (#20917)

* fix(tui): steady transcript scrollbar

Keep the visible scrollbar tied to committed viewport position while virtual history can still prefetch against pending scroll targets, and preserve drag grab offset synchronously for native-feeling scrollbar drags.

* fix(tui): smooth precision wheel scroll

Replace the opt-scroll throttle with frame-sized coalescing so modifier wheel gestures stay line-precise without stepping.

* fix(tui): restore voice push-to-talk parity (#20897)

* fix(tui): restore classic CLI voice push-to-talk parity

(cherry picked from commit 93b9ae301bb89f5b5e01b4b9f8ac91ffa74fbd9d)

* fix(tui): harden voice push-to-talk stop flow

Address review feedback from PR #16189 by stopping the active recorder before background transcription, documenting single-shot voice capture, and covering the TUI gateway flags with regression tests.

* fix(tui): preserve silent voice strike tracking

Keep single-shot voice recording's no-speech counter alive across starts so the TUI can still emit the three-strikes auto-disable event, and bind the auto-restart state at module scope for type checking.

* fix(tui): clean up voice stop failure path

Address follow-up review by naming the TUI flow as single-shot push-to-talk and cancelling the recorder when forced stop cannot produce a WAV.

* fix(tui): report busy voice capture starts

Return explicit start state from the voice wrapper so the TUI gateway does not report recording while forced-stop transcription is still cleaning up.

* fix(tui): handle busy voice record responses

Apply the gateway busy status immediately in the TUI and route forced-stop voice events to the session that sent the stop request.

* fix(tui): clear voice recording on null response

Treat a null voice.record RPC result as a failed optimistic start so the REC badge cannot stick after gateway-side errors.

* fix(tui): count silent manual voice stops

Preserve single-shot voice no-speech strikes through forced stop transcription so empty push-to-talk captures still trigger the three-strikes guard.

---------

Co-authored-by: Montbra <montbra@gmail.com>

* fix(gateway): don't dead-end setup wizard when only system-scope unit is installed

The setup wizard dropped non-root users at a bare shell prompt when
trying to start a system-scope gateway service. Previously
_require_root_for_system_service called sys.exit(1), which the
wizard's `except Exception` guards cannot catch (SystemExit is a
BaseException). Users with a pre-existing /etc/systemd/system unit
(e.g. from an earlier `sudo hermes setup` run) hit this whenever
they re-ran `hermes setup` as a regular user.

- Convert _require_root_for_system_service to raise a typed
  SystemScopeRequiresRootError (RuntimeError subclass) instead of
  sys.exit(1). The direct CLI path (`hermes gateway install|start|stop|
  restart|uninstall` without sudo) still exits 1 cleanly via a new
  catch at the top of gateway_command, matching the existing
  UserSystemdUnavailableError pattern.
- Add _system_scope_wizard_would_need_root() pre-check and
  _print_system_scope_remediation() helper. Both setup wizards
  (hermes_cli/setup.py and hermes_cli/gateway.py::gateway_setup) now
  detect the dead-end before prompting and print actionable guidance:
  either `sudo systemctl start <service>` this time, or uninstall the
  system unit and install a per-user one.
- Defense-in-depth: all 5 wizard prompt sites also catch
  SystemScopeRequiresRootError and fall back to the remediation
  helper if the pre-check is bypassed (race, etc.).

Tests: 12 new tests in TestSystemScopeRequiresRootError,
TestSystemScopeWizardPreCheck, TestSystemScopeRemediationOutput, and
TestGatewayCommandCatchesSystemScopeError covering the exception
contract, pre-check matrix (root vs non-root, system-only vs
user-present vs none vs explicit system=True), remediation output
for each action, and the direct-CLI exit-1 path.

* fix(tui): preserve session when switching personality

Previously, /personality in the TUI called _reset_session_agent() which
destroyed the agent, cleared conversation history, and effectively started
a new session. This made personality switching disruptive — users lost
their entire conversation context.

Now /personality updates the agent's ephemeral_system_prompt in-place and
injects a pivot marker into the conversation history. The marker tells
the model to adopt the new persona from that point forward, which is
necessary because LLMs tend to pattern-match their prior responses and
continue the established tone without an explicit signal.

Changes:
- tui_gateway/server.py: Rewrite _apply_personality_to_session to update
  the agent in-place instead of resetting. Inject a user-role pivot
  marker so the model actually switches style mid-conversation.
- ui-tui/src/app/slash/commands/session.ts: Update help text (no longer
  mentions history reset).
- tests/test_tui_gateway_server.py: Update test to verify history is
  preserved, pivot marker is injected, and ephemeral prompt is set.

* fix(gateway): wait for systemd restart readiness

* fix(discord): narrow rate-limit catch and move sync state under gateway/

Two follow-ups on top of helix4u's slash-command sync hardening:

- Only suppress exceptions that are actually Discord 429 rate limits
  (discord.RateLimited, HTTPException with status 429, or a clearly
  rate-limit-named duck type). Arbitrary failures that happen to expose
  a retry_after attribute now re-raise to the outer handler instead of
  silently swallowing a cooldown.
- Move the sync-state JSON under $HERMES_HOME/gateway/ so the home root
  stops collecting ad-hoc runtime files.

Added a test verifying unrelated exceptions don't get misclassified as
rate limits.

* docs(kanban): fix orchestrator skill setup instructions (#20958)

* docs(kanban): fix worker skill setup instructions too (#20960)

Follow-up to #20958. The worker skill section had the same stale
'hermes skills install devops/kanban-worker' command — kanban-worker
is also bundled, so that command fails with 'Could not fetch from any
source.'

Replace with bundled-skill verification + restore pattern, matching
the orchestrator section. Uses <your-worker-profile> placeholder since
assignees vary (researcher, writer, ops, linguist, reviewer, etc.)
rather than a single fixed 'worker' profile.

* feat(profiles): --no-skills flag for empty profile creation (#20986)

Adds `hermes profile create <name> --no-skills` to create a profile with
zero bundled skills. Writes a `.no-bundled-skills` marker file in the
profile root so `hermes update`'s all-profile skill sync loop also skips
the profile — without the marker, every update would re-seed skills and
the user would have to delete them again.

Use case (from @hiut1u): orchestrator profiles and narrow-task profiles
don't need 100+ bundled skills polluting their system prompt.

- create_profile() gains a `no_skills` param, mutually exclusive with
  `--clone` / `--clone-all` (cloning explicitly copies skills).
- seed_profile_skills() no-ops on opted-out profiles and returns
  `{skipped_opt_out: True}` so callers can report cleanly.
- Web API (POST /api/profiles) accepts `no_skills: bool`.
- Delete `.no-bundled-skills` to opt back in — next `hermes update`
  re-seeds normally.

6 new tests in TestNoSkillsOptOut cover marker write, mutual exclusion
with clone, seed_profile_skills opt-out, fresh profile unaffected, and
delete-marker-re-enables-seeding.

* fix: route Telegram image documents through photo handling

* chore: AUTHOR_MAP entry for mrcoferland

* test(docker): align Dockerfile contract tests with simplified TUI flow

The Dockerfile dropped the manual `@hermes/ink` materialisation gymnastics
in favour of letting npm workspaces resolve the bundled package
naturally. Two contract tests still asserted the older flow:

`test_dockerfile_installs_tui_dependencies` required:
    'ui-tui/packages/hermes-ink/package-lock.json' in dockerfile_text

…but the lockfile is no longer COPIED individually \u2014 the entire
`ui-tui/packages/hermes-ink/` tree is COPIED instead (the workspace
reference from `ui-tui/package.json` is `file:` so npm needs the
real source, not just a manifest stub).

`test_dockerfile_materializes_local_tui_ink_package` required a 7-clause
conjunction matching specific `rm -rf` / `npm install --omit=dev`
`--prefix node_modules/@hermes/ink` / `rm -rf .../react` invocations
that were stripped out when the workspace resolution was simplified.

Update the assertions to pin the *contract* the image actually has to
carry rather than the *exact shell incantations* the old flow used:

* TUI deps install: ui-tui/package.json + ui-tui/package-lock.json +
  ui-tui/packages/hermes-ink/ tree are all COPIED, and an npm
  install/ci step runs in ui-tui.
* Bundled hermes-ink: the workspace package source is COPIED (so
  `await import('@hermes/ink')` resolves at runtime).

This keeps the spirit of #15012 / #16690 (zombie reaping + bundled
workspace materialisation must continue to work) without locking the
Dockerfile into one specific implementation flavour.

Validation:

    $ pytest tests/tools/test_dockerfile_pid1_reaping.py -q
    6 passed in 1.43s

No production code change. Fixes the two failures observed on `main`
(run 25250051126):

`tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_installs_tui_dependencies`
`tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_materializes_local_tui_ink_package`

* test(update): patch isatty on real streams to fix xdist-flaky --yes tests

Two CI tests for the new `--yes` update flag (#18261) flaked under
`pytest-xdist` on Linux/Python 3.11 even though they passed every
local run on macOS/Python 3.14.4:

  FAILED tests/hermes_cli/test_update_yes_flag.py
    ::TestUpdateYesConfigMigration::test_no_yes_flag_still_prompts_in_tty
      `AssertionError: assert <MagicMock 'input'>.called is False`
  FAILED tests/hermes_cli/test_update_yes_flag.py
    ::TestUpdateYesStashRestore::test_yes_restores_stash_without_prompting
      `AssertionError: assert <MagicMock '_restore_stashed_changes'>.called is False`

Captured stdout for the first failure shows `cmd_update` taking the
"Non-interactive session \u2014 skipping config migration prompt." branch
\u2014 i.e. the `sys.stdin.isatty() and sys.stdout.isatty()` check at
`hermes_cli/main.py:7118` evaluated to `False` despite the test doing:

    with patch("hermes_cli.main.sys") as mock_sys:
        mock_sys.stdin.isatty.return_value = True
        mock_sys.stdout.isatty.return_value = True

The whole-module mock is fragile under xdist worker reuse: a sibling
test that imports `hermes_cli.main` first can leave another `sys`
reference resolved inside the function (re-import in a helper, etc.),
and the wholesale module replacement never gets consulted.

Switch to `patch.object(_sys.stdin, "isatty", return_value=True)` (and
the same for `stdout`). That patches the *attribute on the real stream
object* \u2014 every call site, no matter how it reached `sys.stdin`,
hits the patched method. Same fix applied to the stash-restore test
(it took the "non-TTY \u2192 skip restore prompt" branch for the same reason).

Validation:

    $ pytest tests/hermes_cli/test_update_yes_flag.py -q
    3 passed in 5.47s

No production code change. Fixes the two failures observed on `main`
(run 25250051126):

`tests/hermes_cli/test_update_yes_flag.py::TestUpdateYesConfigMigration::test_no_yes_flag_still_prompts_in_tty`
`tests/hermes_cli/test_update_yes_flag.py::TestUpdateYesStashRestore::test_yes_restores_stash_without_prompting`

Refs: #18261 (added the `--yes` flag + these tests).

* fix(web): force light color-scheme on docs iframe

The Documentation tab embeds the public Hermes Agent docs site via an
<iframe>. On any system where the browser's prefers-color-scheme
resolves to dark — the default on macOS with system dark mode, and
common on Linux/Windows too — the docs body text rendered nearly
invisible against its own background.

Cause: Docusaurus intentionally leaves <html> and <body> transparent
and relies on the browser's Canvas color to fill the viewport. Inside
our iframe, the iframe element had bg-background (the dashboard's dark
canvas) AND inherited the dashboard's dark color-scheme, so the
browser set the iframe's Canvas to a dark value. Docusaurus's
transparent body exposed that dark Canvas, and the docs body text
(tuned for a light Canvas) became near-illegible. Affects every
built-in dashboard theme.

Fix: replace bg-background on the iframe with [color-scheme:light]
(spec-blessed cross-origin override of the inherited color-scheme;
forces the iframe's Canvas to light) and bg-white (belt-and-suspenders
fallback during the brief paint window before content loads). The
docs site's own theme toggle keeps working — Docusaurus stores its
choice in localStorage and applies opaque dark backgrounds to its
layout elements that cover the white Canvas we forced.

* fix(security): close TOCTOU window when saving MCP OAuth credentials

_write_json (the persistence helper used by HermesTokenStorage for both
tokens and client_info) created the temp file via Path.write_text and
only chmod'd it to 0o600 afterward. Between create and chmod the file
existed on disk at the process umask (commonly 0o644 = world-readable),
briefly exposing MCP OAuth access/refresh tokens to other local users.

Use os.open with O_WRONLY|O_CREAT|O_EXCL and an explicit S_IRUSR|S_IWUSR
mode so the file is created atomically at 0o600, plus tighten the parent
dir to 0o700 so siblings can't traverse to the creds file. The temp name
also gains a per-process random suffix to avoid collisions between
concurrent writers and stale leftovers from a crashed prior write.

Mirrors the fix shipped for agent/google_oauth.py in #19673.

Adds a regression test asserting the resulting file mode is 0o600 and
the parent directory is 0o700 (skipped on Windows where POSIX mode bits
aren't enforced).

* chore(release): add Gutslabs to AUTHOR_MAP for PR #21148 salvage

* test(update): teach restart-mocks about the post-update survivor sweep

Issue #17648 added a post-update SIGTERM-survivor sweep to `cmd_update`:
~3s after issuing graceful/SIGTERM restarts, the code re-queries
`find_gateway_pids` and SIGKILLs anything still alive. That's the
right fix for stuck-drain gateways in production, but it broke three
unit tests that assumed `find_gateway_pids` would keep returning the
same PIDs forever:

  FAILED ::TestCmdUpdateLaunchdRestart::test_update_restarts_profile_manual_gateways
    AssertionError: Expected 'kill' to not have been called. Called 1 times.
    Calls: [call(12345, <Signals.SIGKILL: 9>)].

  FAILED ::TestCmdUpdateLaunchdRestart::test_update_profile_manual_gateway_falls_back_to_sigterm
    AssertionError: Expected 'kill' to have been called once. Called 2 times.
    Calls: [call(12345, SIGTERM), call(12345, SIGKILL)].

  FAILED ::TestServicePidExclusion::test_update_kills_manual_pid_but_not_service_pid
    assert 2 == 1
      manual_kills = [call(42999, SIGTERM), call(42999, SIGKILL)]

In each test `os.kill` is mocked, so the simulated PID never actually
exits \u2014 the sweep finds it again and escalates. The production code
is correct; the tests just need to model OS behaviour properly.

Two-test fix (profile-manual restart cases): use
`side_effect=[[12345], []]` so the first `find_gateway_pids` call
returns the live PID and the second (the sweep) returns nothing, as if
the OS had reaped the process.

Service-PID-exclusion fix: track which PIDs got killed in a closure
set, and exclude them on subsequent `fake_find` calls. `os.kill`
gets a `side_effect` that records the kill instead of swallowing it
silently. Now the sweep doesn't re-find the manual PID, no SIGKILL
escalation, `manual_kills == 1`.

Validation:

    $ pytest tests/hermes_cli/test_update_gateway_restart.py -q
    43 passed in 4.13s

No production code change. Fixes the three failures observed on `main`
(run 25250051126):

  test_update_restarts_profile_manual_gateways
  test_update_profile_manual_gateway_falls_back_to_sigterm
  test_update_kills_manual_pid_but_not_service_pid

Refs: #17648 (post-update survivor sweep that the tests didn't model).

* fix(image-routing): expose attached image paths in native multimodal text part

In native image mode (vision-capable models like gpt-4o, claude-sonnet-4),
build_native_content_parts() previously emitted only the user's caption
plus image_url parts. The local file path of each attached image never
appeared in the conversation text, so the model could see the pixels but
had no string handle for tools that take image_url: str (custom MCP
tools, vision_analyze on a re-look, attach-to-tracker workflows).

The text-mode path already injects an equivalent hint via
Runner._enrich_message_with_vision ("...vision_analyze using image_url:
<path>..."). This brings native mode to parity by appending one
"[Image attached at: <path>]" line per successfully attached image to
the user-text part of the multimodal turn. Skipped (unreadable) paths
are NOT advertised, so the model is never told a non-existent file is
attached.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(optional-skills): port Anthropic financial-services skills as optional finance bundle (#21180)

Adds 7 optional skills under optional-skills/finance/ adapted from
anthropics/financial-services (Apache-2.0):

  excel-author        — openpyxl conventions: blue/black/green cells,
                        formulas over hardcodes, named ranges, balance
                        checks, sensitivity tables. Ships recalc.py.
  pptx-author         — python-pptx for model-backed decks (pitch,
                        IC memo, earnings note) that bind every number
                        to a source workbook cell.
  dcf-model           — institutional DCF (49KB skill): projections,
                        WACC, terminal value, Bear/Base/Bull scenarios,
                        5x5 sensitivity tables. Ships validate_dcf.py.
  comps-analysis      — comparable company analysis: operating metrics,
                        multiples, statistical benchmarking.
  lbo-model           — leveraged buyout: S&U, debt schedule, cash
                        sweep, exit multiple, IRR/MOIC sensitivity.
  3-statement-model   — fully-integrated IS/BS/CF with balance-check
                        plugs. Ships references/ for formatting,
                        formulas, SEC filings.
  merger-model        — accretion/dilution analysis for M&A.

All seven are optional (not active by default). Users install via
'hermes skills install official/finance/<skill>'.

Hermesification:
- Stripped every Office JS / Office Add-in / mcp__office__*
  branch — skills assume headless openpyxl only.
- Replaced Cowork MCP data-source instructions with 'MCP first (via
  native-mcp), fall back to web_search/web_extract against SEC EDGAR
  and user-provided data'.
- Swapped Claude tool references (Bash, Read, Write, Edit, mcp__*)
  for Hermes-native equivalents and Python library calls.
- Canonical Hermes frontmatter (name/description/version/author/
  license/metadata.hermes.{tags,related_skills}).
- Descriptions tightened to 187-238 chars, trigger-first.
- Attribution preserved: author field credits 'Anthropic (adapted by
  Nous Research)', license: Apache-2.0, each SKILL.md links back to
  the upstream source directory.

Verification:
- All 7 discovered by OptionalSkillSource with source_id='official'
- Bundle fetch includes support files (scripts, references, troubleshooting)
- related_skills cross-refs all resolve within the bundle
- No Claude product / Cowork / Office JS / /mnt/skills leakage
  remains in body text (bounded mentions only in attribution blocks)

Source: https://github.com/anthropics/financial-services (Apache-2.0)

* test(skills): cover additional rescan paths in skill_commands cache (#14536)

The rescan-on-platform-change fix landed in #18739 ships one regression
test that exercises the HERMES_PLATFORM env-var path. Three other code
paths in get_skill_commands / _resolve_skill_commands_platform have no
direct coverage; this commit adds a regression test for each.

- Gateway session context (HERMES_SESSION_PLATFORM via ContextVar): the
  resolver consults get_session_env after HERMES_PLATFORM, and the
  gateway sets that variable through set_session_vars (a ContextVar),
  not os.environ. The test uses set_session_vars / clear_session_vars
  to drive the actual gateway signal, and the disabled-skill stub reads
  the same value via get_session_env. A regression that swapped
  get_session_env for plain os.getenv would still pass an env-var-based
  test but break concurrent gateway sessions, which is the bug the
  ContextVar plumbing exists to prevent.
- Returning to no-platform-scope (CLI / cron / RL rollouts after a
  gateway session): the cached telegram view must be dropped and the
  unfiltered scan repopulated when HERMES_PLATFORM is unset again.
- Same-platform cache hit: consecutive calls under the same platform
  scope must NOT rescan. The rescan trigger is change in scope, not
  "always re-resolve" — a gateway serving many consecutive telegram
  requests should pay the scan cost once, not per request.

The third test wraps scan_skill_commands with a spy after the cache is
primed, so the assertion is on call_count == 0 across three subsequent
get_skill_commands() calls.

All 39 tests in tests/agent/test_skill_commands.py pass under
scripts/run_tests.sh.

* fix(gateway): translate inbound document host paths to container paths for Docker backend

When terminal.backend is docker, inbound documents uploaded via messaging
platforms (Telegram, Slack, Discord, Feishu, Email, etc.) are cached at a host
path under ~/.hermes/cache/documents, but the container sandbox only sees them
at the auto-mounted /root/.hermes/cache/documents path.

This PR adds to_agent_visible_cache_path() in tools/credential_files.py (the
natural sibling to get_cache_directory_mounts()) and calls it at the
document-context-injection site in gateway/run.py so the agent always receives
a path it can open directly, matching the mount layout already established
by get_cache_directory_mounts() (#4846).

Scope: only Docker backend for now; other backends use different mount
semantics and are left unchanged until verified.

Fixes #18787

* feat(gateway): opt-in cleanup of temporary progress bubbles (#21186)

When display.cleanup_progress (or display.platforms.<plat>.cleanup_progress)
is true, the gateway deletes tool-progress bubbles, long-running '⏳ Still
working...' notices, and status-callback messages after the final response
is delivered successfully. Currently effective on adapters that implement
delete_message (Telegram); silently no-ops elsewhere. Off by default.
Failed runs skip cleanup so bubbles stay as breadcrumbs.

Minimal plumbing: base.py's existing post_delivery_callback slot now chains
new registrations onto any existing callback (with per-callback exception
isolation) rather than clobbering. Stale-generation registrations are
rejected so they can't step on a fresher run's callbacks. This lets the
cleanup callback coexist with the background-review release hook already
registered on the same slot.

Co-authored-by: mrcharlesiv <Mrcharlesiv@gmail.com>

* fix(kanban): heartbeat tool extends claim TTL, not just last_heartbeat_at

The kanban_heartbeat tool called heartbeat_worker but never
heartbeat_claim, so a worker that loops the tool while a single tool
call blocks the agent for >DEFAULT_CLAIM_TTL_SECONDS still got
reclaimed by release_stale_claims. The function name and
heartbeat_claim's own docstring imply otherwise:

  "Workers that know they'll exceed 15 minutes should call this
   every few minutes to keep ownership."

But there was no caller in the worker tool path. Workers couldn't
invoke heartbeat_claim themselves either — it isn't exposed as a tool.

Fix: _handle_heartbeat now calls heartbeat_claim first, reading
HERMES_KANBAN_CLAIM_LOCK from the worker env (the dispatcher pins
this in _default_spawn). Falls back to _claimer_id() for locally-
driven workers that didn't go through dispatcher spawn.

Test: tests/tools/test_kanban_tools.py::test_heartbeat_extends_claim_expires
rewinds claim_expires into the past, calls the tool, and asserts the
new value is at least now + DEFAULT_CLAIM_TTL_SECONDS // 2. Verified to
fail against the unfixed code (claim_expires stays at the rewound
value).

Closes the root cause underlying the symptom in #21141 (15-min
respawns of long-running workers). #21141 separately addresses
post-reclaim cleanup; this fixes the upstream "shouldn't have been
reclaimed in the first place" half.

* chore(release): map stephen0110 noreply email

* fix(kanban): stop reclaimed workers before retry

* fix(kanban): reap completed worker children in dispatch_once

The gateway-embedded dispatcher (default since `kanban.dispatch_in_gateway
= true`) is the parent of every spawned kanban worker. `_default_spawn`
calls `subprocess.Popen(..., start_new_session=True)` and returns the
pid — `start_new_session` detaches the controlling tty but does not
reparent to init, so the gateway keeps each worker as a child until it
`wait()`s for them.

Nothing in the dispatch loop ever calls `waitpid`. Result: every
completed worker becomes a `<defunct>` zombie that lingers until the
gateway exits. We hit ~430 zombies on a single hermes-agent container
after ~40 days of steady kanban traffic, approaching process-table
exhaustion on the host.

Fix: add a non-blocking reap loop at the top of `dispatch_once`, so
every dispatcher tick (default 60s) drains zombies that accumulated
since the last tick. WNOHANG keeps the call non-blocking; ChildProcessError
means no children to reap.

Why here, not a SIGCHLD handler:
- signal.signal requires the main thread; gateway threading model makes
  that placement non-trivial.
- Bounded staleness: at default interval=60s the maximum live zombie
  count is one tick's worth of worker completions.
- No interaction with detect_crashed_workers: that function only inspects
  rows where status='running', and rows reach 'done' (and stop being
  inspected) before their workers exit.

* chore(release): map sonic-netizen noreply email

* fix: auto-block repeated kanban retries

* chore(release): map mwnickerson noreply email

* feat(gateway): auto-resume interrupted sessions after restart

* fix(gateway): preserve resume marker on interrupted restart

* refactor(gateway): simplify auto-resume + extend to crash recovery

Follow-up on top of @kyan12's PR #20888 — same feature, cleaner shape,
wider coverage.

Changes:
- Drop the synthetic '[System note: ...]' in the internal MessageEvent.
  The existing _is_resume_pending branch in _handle_message_with_agent
  (run.py ~L13738) already injects a reason-aware recovery system note
  on the next turn.  With kyan's text in place the model saw two stacked
  system notes.  Now the event text is empty and the existing injection
  path owns the wording.
- Drop SessionStore.list_resume_pending() as a new public method.  The
  filter is 8 lines inline in _schedule_resume_pending_sessions() —
  one caller, no other pluggability need.
- Add 'restart_interrupted' to the auto-resume reason set.  That's the
  reason SessionStore.suspend_recently_active() stamps on sessions
  recovered from a crash/OOM/SIGKILL (no .clean_shutdown marker).
  Previously those sessions had to wait for a real user message to
  auto-resume; now they continue automatically at startup like
  drain-timeout interruptions do.
- Reasons live in a _AUTO_RESUME_REASONS frozenset at class scope so
  future reasons (e.g. 'manual_resume_request') can be opted in with
  one line.

Test coverage added:
- drain-timeout + crash-recovery both scheduled
- stale entries skipped (outside freshness window)
- suspended entries skipped (suspended > resume_pending)
- originless entries skipped (no routing target)
- disallowed reasons skipped (graceful forward-compat)

E2E verified end-to-end with a real on-disk SessionStore: 2 eligible
sessions scheduled, 2 ineligible skipped, empty-text internal events
delivered to the adapter.

Co-authored-by: Kevin Yan <kevyan1998@gmail.com>

* fix(auth): sync shared Nous refresh tokens

* refactor(auth): dedupe file-lock helper; document Nous lock order

Extract the shared flock/msvcrt boilerplate from _auth_store_lock and
_nous_shared_store_lock into a single _file_lock(lock_path, holder,
timeout, message) helper. Each caller keeps its own threading.local
holder so reentrancy state stays per-lock.

Also document the lock-ordering invariant on both wrappers:
_auth_store_lock is OUTER, _nous_shared_store_lock is INNER for all
runtime refresh paths. The one exception is _try_import_shared_nous_state,
which holds the shared lock alone across the full HTTP refresh+mint
cycle to prevent concurrent sibling imports from racing on the single-
use shared refresh token; that helper must not be called with the auth
lock already held.

* fix(gateway): avoid duplicated responses history

* chore: add AUTHOR_MAP entries for thelumiereguy and counterposition

* fix(gateway): use monotonic deadlines in QR onboarding flows

* fix(oauth,gateway): monotonic deadlines for polling/timeout loops

Widen PR #20314's fix to the other timeout-polling sites in the codebase
that share the same wall-clock-jump bug class. All of these measure elapsed
timeout duration, not civil time, so they belong on time.monotonic().

- hermes_cli/auth.py: auth-store file-lock timeout, Spotify OAuth callback
  wait, Nous portal device-auth token poll.
- hermes_cli/copilot_auth.py: Copilot OAuth device-flow token poll.
- hermes_cli/gateway.py: gateway systemd restart wait.
- hermes_cli/web_server.py: dashboard Codex device-auth user_code wait,
  dashboard Nous device-auth token poll. (sess["expires_at"] stays on
  time.time() — it's a persisted absolute timestamp, not a local
  deadline-polling variable.)
- agent/copilot_acp_client.py: Copilot ACP JSON-RPC request timeout.

* fix(weixin): replace all aiohttp ClientTimeout with asyncio.wait_for()

aiohttp ClientTimeout uses BaseTimerContext which calls
loop.call_later() internally. When invoked via
asyncio.run_coroutine_threadsafe() from cron jobs, this
triggers "Timeout context manager should be used inside a task"
errors, causing message delivery failures.

Replace all direct ClientTimeout usage with asyncio.wait_for():
- _upload_ciphertext: CDN upload (120s timeout)
- _download_bytes: CDN download (configurable timeout)
- _download_remote_media: remote media fetch (30s timeout)

Also set total=None on _send_session to disable aiohttp built-in
timeout, and change trust_env=True to False to bypass proxy for
WeChat CDN connections.

* test(weixin): update timeout assertion for asyncio.wait_for migration

* chore: AUTHOR_MAP entry for chenlinfeng@ruije / @noOne-list

* feat(security): enable secret redaction by default (#17691, #20785) (#21193)

Flip the default for HERMES_REDACT_SECRETS from off to on so the redactor
already wired into send_message_tool, logs, and tool output actually runs
on a fresh install.

- agent/redact.py: env-var default "" → "true"
- hermes_cli/config.py: DEFAULT_CONFIG security.redact_secrets True;
  two config-template comments rewritten
- gateway/run.py + cli.py: startup log / banner warning when the user
  has explicitly opted out, so the downgrade is visible in agent.log
  and at CLI banner time
- docs/reference/environment-variables.md: description reconciled
- tests: flipped the default-pin, restructured the force=True
  regression test to explicit-false instead of unset

Users who need raw credential values (redactor development) can still
opt out via security.redact_secrets: false in config.yaml or
HERMES_REDACT_SECRETS=false in .env.

Closes #17691.
Addresses #20785 (short-term output-pipeline recommendation).

* feat: add Discord message deletion action

* chore: AUTHOR_MAP entry for @likejudy

* fix(security): close TOCTOU window in hermes_cli/auth.py credential writers (#21194)

`_save_auth_store`, `_save_qwen_cli_tokens`, and `_write_shared_nous_state`
all created the temp file via `Path.open('w')` / `Path.write_text` and only
tightened permissions to 0o600 afterward. Between create and chmod the file
existed at the process umask (commonly 0o644 = world-readable on multi-user
hosts), briefly exposing OAuth access/refresh tokens for Nous, Codex,
Copilot, Claude, Qwen, Gemini, and every other native OAuth provider that
flows through auth.json.

Switch all three to `os.open(O_WRONLY|O_CREAT|O_EXCL, 0o600)` + `os.fdopen`
+ `fsync` so the file is atomic at 0o600 on creation. Tighten each parent
directory (`~/.hermes/`, Qwen auth dir, Nous shared auth dir) to 0o700 so
siblings can't traverse to the creds. `_save_auth_store` also gains a
per-process random temp suffix to match `agent/google_oauth.py` (#19673)
and `tools/mcp_oauth.py` (#21148).

Adds `tests/hermes_cli/test_auth_toctou_file_modes.py` asserting final
file mode 0o600 and parent dir mode 0o700 across all three writers, plus
an explicit `os.open(flags, mode)` check on the main auth.json writer
that would fail if anyone reintroduces the `Path.open('w')` pattern.
POSIX-only (mode bits skipped on Windows).

* fix(delegate): correct ACP docs — Claude Code CLI has no --acp flag

The delegate_task tool schema descriptions referenced 'claude --acp --stdio'
as an example, but Claude Code CLI does not support --acp or --stdio flags.

The ACP subprocess transport (agent/copilot_acp_client.py) is specifically
built for GitHub Copilot CLI ('copilot --acp --stdio').

Changes:
- Per-task acp_command example: 'claude' → 'copilot'
- Top-level acp_command description: remove 'Claude Code' reference,
  clarify requirement for ACP-compatible CLI (currently Copilot only)
- acp_args description: remove misleading claude-opus-4-6 example

Fixes #19055

* fix: exclude hidden and archive dirs from _find_skill rglob

* fix(gateway): preserve thread routing from cached live session sources

* fix(gateway): cap cached session sources with LRU eviction

Follow-up on top of Zyproth's session-source cache: swap the unbounded
dict for an OrderedDict with a 512-entry LRU cap so long-running
gateways can't accumulate stale entries for dead sessions forever.

- self._session_sources is now an OrderedDict
- _cache_session_source() move_to_end + popitem(last=False) above cap
- _get_cached_session_source() move_to_end on hit (LRU read bump)
- restart_test_helpers.py wires OrderedDict + _session_sources_max

* fix(mcp): give 'mcp add --command' a distinct argparse dest

The --command flag of `hermes mcp add` shared its argparse dest with the
top-level subparser (`dest="command"` in `hermes_cli/_parser.py`). When
the flag was omitted, argparse still wrote `args.command = None`,
clobbering the top-level value of `"mcp"`. The dispatcher then saw
`args.command is None` and fell through to interactive chat, so
`hermes mcp add ...` silently launched chat instead of registering the
server. `cmd_mcp_add` was never reached.

Use `dest="mcp_command"` on the flag and read it from `cmd_mcp_add`.
The user-facing CLI flag `--command` is unchanged; only the in-memory
namespace attribute moves. Also updates the `_make_args` helper in
`tests/hermes_cli/test_mcp_config.py` to populate the new dest, and
adds `tests/hermes_cli/test_mcp_add_command_dest.py` with a parser-
level regression test.

Closes #19785.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: add discodirector email to AUTHOR_MAP

* fix(bedrock): preserve reasoningContent across converse normalization

* feat(gateway): support [[as_document]] directive for skill media routing

Skills that produce large/lossless images (e.g. info-graph, where a
rendered JPG is 1-2 MB) currently lose quality in Telegram delivery
because `_IMAGE_EXTS` membership routes the file through
`send_multiple_images` → `sendMediaGroup`, which Telegram's server
re-encodes to JPEG @ 1280px max edge. The original bytes only survive
when the file goes through `send_document`, which the dispatch tables
in three places (`_process_message_background`, `_deliver_media_from_response`,
and the `send_message` tool's telegram path) only reach for files
whose extension is NOT in `_IMAGE_EXTS`.

This commit adds an `[[as_document]]` directive that mirrors the
existing `[[audio_as_voice]]` shape: a skill emits the directive once
in its response, and every image-extension MEDIA: file in that response
is delivered via `send_document` instead of `send_multiple_images` /
`sendPhoto`. The directive is detected at the dispatch sites (which see
the raw response) and the directive string is stripped from the
user-visible cleaned text in `extract_media` so it never leaks.

Granularity is intentionally all-or-nothing per response, matching
[[audio_as_voice]]'s scope. Skills that need fine control can split into
two responses.

Verified the targeted use case: info-graph emits

    信息图已生成(...)
    [[as_document]]
    MEDIA:/tmp/info-graph-x/infographic.jpg

→ Telegram receives `infographic.jpg` via sendDocument, original 1MB
JPEG bytes preserved, no recompression. Forwarding and download
filenames stay clean (`infographic.jpg`).

Tests: +3 cases in TestExtractMedia covering directive strip, isolation
from voice flag, and coexistence with [[audio_as_voice]]. All
113 pre-existing media/extract/send tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: update send_message_tool mocks for force_document kwarg

* chore: AUTHOR_MAP entry for @leon7609

* fix(model_switch): live model discovery for custom_providers in /model picker

custom_providers entries (section 4 of list_authenticated_providers) only
read the static models: dict from config.yaml, ignoring the live /v1/models
endpoint.  This means gateways like Bifrost that expose hundreds of models
only show the handful explicitly listed in config.

Add live discovery via fetch_api_models() for custom_providers entries
that have api_key + base_url, matching the existing behavior for user
providers: entries (section 3).  When the endpoint is reachable and
returns models, the live list replaces the static subset.

Fixes: /model picker showing only 9 models from a Bifrost gateway that
actually exposes 581.

* fix(memory): support OpenViking local resource uploads

* test(memory): harden OpenViking local upload coverage

* fix(memory): harden OpenViking local path uploads

* chore(release): add AUTHOR_MAP entries for ggnnggez and ehz0ah

Contributors to OpenViking local resource upload fix (#19569).

* docs(readme): drop misleading RL install-extras claim, defer to CONTRIBUTING

README.md:163 said atroposlib and tinker were pulled in by .[all,dev], but
.[all] does not include .[rl] — those dependencies live in pyproject.toml's
[rl] extra (lines 95-101). With the original wording, a contributor running
uv pip install -e ".[all,dev]" would not have atroposlib or tinker
installed.

Rather than swap one extra for another (which paths users to either of two
parallel install conventions — pip [rl] extra vs tinker-atropos submodule —
without saying which the project considers canonical), this PR drops the
specific install command from the README and links to CONTRIBUTING.md,
which already documents the actual development setup.

* fix(kanban): auto-block workers that exit without completing (#20894) (#21214)

When a kanban worker subprocess exits rc=0 but its task is still in
status='running', the agent almost certainly answered the task
conversationally without calling kanban_complete or kanban_block. The
dispatcher used to classify this as a generic crash and respawn, which
loops forever on small local models (gemma4-e2b q4 etc.) that keep
returning clean but unproductive output.

Dispatcher changes:
- The waitpid reap loop at the top of dispatch_once now records each
  reaped child's raw exit status in a bounded module registry
  (_recent_worker_exits, TTL 600s, size cap 4096).
- _classify_worker_exit distinguishes clean_exit / nonzero_exit /
  signaled / unknown using os.WIFEXITED / WIFSIGNALED.
- detect_crashed_workers consults the classification when a worker
  is found dead. clean_exit → protocol_violation event + immediate
  circuit-breaker trip (failure_limit=1). Everything else keeps the
  existing crashed-event + counter behavior.
- DispatchResult.auto_blocked now includes protocol-violation trips.

Gateway fix (Bug A in #20894):
- gateway.run._notify_active_sessions_of_shutdown snapshots
  self.adapters with list(...) before iterating. adapter.send() can
  hit a fatal-error path that pops the adapter from the dict, which
  was raising 'RuntimeError: dictionary changed size during iteration'
  during shutdown.

Regression tests:
- test_detect_crashed_workers_protocol_violation_auto_blocks verifies
  rc=0 + still-running → status=blocked on first occurrence with
  protocol_violation + gave_up events and NO crashed event.
- test_detect_crashed_workers_nonzero_exit_uses_default_limit verifies
  non-zero exits keep the existing 2-strike behavior.

Closes #20894.

* fix(dashboard): stabilize embedded chat resume and scrollback

* fix(dashboard): let embedded chat use a single scroll system

* fix(dashboard): route browser wheel into inner TUI scrolling

* chore: AUTHOR_MAP entry for @nouseman666

* fix(cli): honor positive tool preview length

* chore: AUTHOR_MAP entry for @GinWU05

* fix(credential_pool): resolve key mix-up when custom providers share base_url

When multiple custom_providers share the same base_url but have different API keys,

get_custom_provider_pool_key() always returned the first match, causing wrong-key

unauthorized errors. Add provider_name parameter to prefer exact name matches

over base_url-only matching, with fallback for backward compatibility.

Fixes #19083

* feat(cli): show context compression count in status bar

Display the number of context compressions in the CLI status bar when
compressions > 0, helping users understand conversation compression
pressure during long sessions.

- Wide layout (>=76 cols): shows 'cmp N' between context percent and duration
- Medium layout (52-75 cols): shows 'cmp N' between percent and duration
- Narrow layout (<52 cols): omitted to save space
- Color-coded: dim for 1-4, warn for 5-9, bad for 10+
- Hidden when zero to keep the bar clean for new sessions

Closes #18564

* refactor: replace 'cmp' text with 🗜️ emoji in status bar

Address review feedback to use the clamp emoji (��️) instead of
the plain text 'cmp' prefix for the compression count indicator.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(tui): surface compression count in Ink status bar

Parity with the classic CLI status bar (PR #18579). The Python backend
already exposes 'compressions' on SessionUsageResponse; this wires it
through the Ink Usage type and renders 'cmp N' next to the duration
segment of StatusRule.

- types.ts Usage: add optional compressions field
- appChrome.tsx StatusRule: render 'cmp N' when > 0, color-tiered by
  pressure (muted <5, warn 5-9, error 10+)
- Plain text 'cmp' token (no emoji) matches PR #18579's original author
  rationale and avoids Ink layout drift from VS16 emoji width

* chore(release): map altriatree@gmail.com -> @TruaShamu

* fix(curator): make manual runs synchronous

* docs(curator): update CLI docs for synchronous-by-default manual run

Follow-up to the previous commit which flipped 'hermes curator run'
default from async to sync. Updates the curator.md feature page and
cli-commands.md reference to show --background as the opt-in async
flag and note that the default now blocks until the LLM pass finishes.

* fix(install): remove uv exclude-newer cutoff

* docs: clarify API server tool execution locality

* fix(kanban): treat dashboard event-stream cancellation as normal shutdown

Stopping `hermes dashboard` with Ctrl-C while the Kanban dashboard is
open prints an ASGI traceback ending in
`plugins/kanban/dashboard/plugin_api.py::stream_events` at the
`asyncio.sleep(_EVENT_POLL_SECONDS)` line. This is a normal shutdown
path: Uvicorn cancels the open websocket task while it is sleeping in
the 300 ms poll loop. `asyncio.CancelledError` is a `BaseException` in
Python 3.8+ — the bare `except Exception:` handler below the existing
`WebSocketDisconnect:` clause does NOT catch it, so the cancellation
surfaces as an application traceback and routine dashboard exit looks
like a runtime failure.

Add an explicit `except asyncio.CancelledError: return` clause beside
the existing `WebSocketDisconnect` handler. Disconnection (client
closed the tab) and shutdown cancellation (dashboard process exiting)
are conceptually different paths but both warrant a quiet return; the
two clauses are kept separate to keep that intent explicit.

`asyncio` is already imported and used in this scope, so no new
import is needed. The bare `except Exception:` handler is preserved
verbatim, so genuine runtime failures still log a warning and close
the socket cleanly.

Closes #20790.

* chore(release): map SandroHub013 email

* test(kanban): regression for CancelledError swallow in stream_events

Drives stream_events directly and cancels the task while it is sleeping
in the poll loop, asserting the coroutine returns cleanly instead of
letting CancelledError bubble. Regression coverage for the Uvicorn
application traceback on dashboard Ctrl-C fixed by the preceding commit.

* fix(model_tools): log plugin hook exceptions instead of silently swallowing them

* feat(gateway): add `hermes gateway list` to show all profiles' gateway status

Add a new `hermes gateway list` subcommand that shows the running
status of gateways across all profiles in a single view:

    Gateways:
      ✓ default (current)        — PID 155469
      ✓ wx1                      — PID 166893
      ✗ dev                      — not running

Also includes `_print_other_profiles_gateway_status()` which appends
an "Other profiles" section to `hermes gateway status` output when
other profile gateways are running.

Both use existing `list_profiles()` and `find_profile_gateway_processes()`
— no new dependencies.

Closes #19127
Related: #19113, #4402, #4587

* fix(mcp-oauth): persist OAuth server metadata across process restarts (#21226)

The MCP SDK discovers OAuth server metadata (token_endpoint, etc.) on
demand and keeps it in memory only. Without disk persistence, a restart
with valid cached refresh tokens forces the SDK to fall back to the
guessed '{server_url}/token' path — which returns 404 on most real
providers (Notion, Atlassian, GitHub remote MCP, etc.) and triggers a
full browser re-authorization even though the refresh token is fine.

Add a .meta.json file next to the existing tokens/client_info files:

  HERMES_HOME/mcp-tokens/<server>.json        -- tokens (existing)
  HERMES_HOME/mcp-tokens/<server>.client.json -- client info (existing)
  HERMES_HOME/mcp-tokens/<server>.meta.json   -- oauth metadata (new)

Changes:
- HermesTokenStorage.save_oauth_metadata / load_oauth_metadata / _meta_path
  — disk layer for the discovered OAuthMetadata.
- HermesTokenStorage.remove() now also clears .meta.json so
  'hermes mcp remove <name>' and the manager's remove() path clean up fully.
- HermesMCPOAuthProvider._initialize cold-restores from disk before the
  existing pre-flight discovery runs. If disk has metadata we skip the
  discovery HTTP round-trips entirely.
- HermesMCPOAuthProvider._prefetch_oauth_metadata now persists ASM as
  soon as it's discovered, so even the first pre-flight run seeds disk.
- HermesMCPOAuthProvider._persist_oauth_metadata_if_changed() is called
  at the end of async_auth_flow so metadata discovered via the SDK's
  lazy 401-branch (not pre-flight) is also saved for next time.

Tests cover the storage roundtrip (save/load/missing/corrupt/remove) and
the manager provider path (cold-load restore, skip-when-in-memory,
persist-on-discover, noop-when-unchanged, end-to-end async_auth_flow).

Co-authored-by: nocturnum91 <50326054+nocturnum91@users.noreply.github.com>

* feat: add SSE transport support for MCP client

Add support for MCP servers using the SSE transport protocol
(SseServerTransport) alongside the existing Streamable HTTP and stdio
transports. Many MCP servers use SSE (GET /sse + POST /messages/)
which was previously unsupported -- the client silently fell back to
Streamable HTTP, causing 10s connection timeouts.

Changes:
- Import mcp.client.sse.sse_client with graceful fallback
- Check config.get('transport') == 'sse' in _run_http() to select
  the SSE transport path with proper timeout handling
- Read transport type from config in get_mcp_status() instead of
  hardcoding 'http' for URL-based servers
- Update docstring, example config, and feature list

* fix(browser): enforce cloud-metadata SSRF floor in hybrid routing (#16234) (#21228)

Cloud metadata endpoints (169.254.169.254 etc.) are now always blocked
by browser_navigate regardless of hybrid routing, allow_private_urls,
or backend.

Bug: commit 42c076d3 (#16136) added hybrid routing that flips
auto_local_this_nav=True for private URLs and short-circuits
_is_safe_url(). IMDS endpoints are technically private (169.254/16
link-local), so the sidecar happily routed them to a local Chromium,
and the agent could read IAM credentials via browser_snapshot. On
EC2/GCP/Azure this is a full SSRF-to-credential-theft.

Fix: new is_always_blocked_url() in url_safety.py — a narrow floor
that checks _BLOCKED_HOSTNAMES, _ALWAYS_BLOCKED_IPS,
_ALWAYS_BLOCKED_NETWORKS only. Applied as an independent gate in
browser_navigate's pre-nav and post-redirect checks, BEFORE
auto_local_this_nav gets a chance to short-circuit. Ordinary private
URLs (localhost, 192.168.x, 10.x, .local, CGNAT) still route to the
local sidecar as the #16136 feature intends.

Secondary fix (reporter's finding): _url_is_private() now explicitly
checks 172.16.0.0/12. ipaddress.is_private only covers that range on
Python ≥3.11 (bpo-40791), so on 3.10 runtimes those URLs were routed
to cloud instead of the local sidecar. No security impact — just a
correctness fix for the hybrid-routing feature.

Closes #16234.

* fix: WhatsApp bridge process leak and disable config asymmetry

- Add PID file mechanism to track bridge processes and kill stale ones on startup
- Improve _kill_port_process() with lsof fallback when fuser is not available
- Support explicit WhatsApp disable via config.yaml (whatsapp.enabled: false)
- Respect WHATSAPP_ENABLED=false env var to disable WhatsApp

Fixes #19124

* docs(contributing): align tool discovery and test runner with AGENTS.md

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(kanban): make dashboard board pin authoritative over server current file (#21230)

When the user created a new board via the dashboard with "switch" checked,
the server-side `current` file was flipped to the new board. Clicking the
original board's tab then showed no card…
RationallyPrime pushed a commit to RationallyPrime/hermes-agent that referenced this pull request May 8, 2026
…arch#20986)

Adds `hermes profile create <name> --no-skills` to create a profile with
zero bundled skills. Writes a `.no-bundled-skills` marker file in the
profile root so `hermes update`'s all-profile skill sync loop also skips
the profile — without the marker, every update would re-seed skills and
the user would have to delete them again.

Use case (from @hiut1u): orchestrator profiles and narrow-task profiles
don't need 100+ bundled skills polluting their system prompt.

- create_profile() gains a `no_skills` param, mutually exclusive with
  `--clone` / `--clone-all` (cloning explicitly copies skills).
- seed_profile_skills() no-ops on opted-out profiles and returns
  `{skipped_opt_out: True}` so callers can report cleanly.
- Web API (POST /api/profiles) accepts `no_skills: bool`.
- Delete `.no-bundled-skills` to opt back in — next `hermes update`
  re-seeds normally.

6 new tests in TestNoSkillsOptOut cover marker write, mutual exclusion
with clone, seed_profile_skills opt-out, fresh profile unaffected, and
delete-marker-re-enables-seeding.
rmulligan pushed a commit to rmulligan/hermes-agent that referenced this pull request May 11, 2026
…arch#20986)

Adds `hermes profile create <name> --no-skills` to create a profile with
zero bundled skills. Writes a `.no-bundled-skills` marker file in the
profile root so `hermes update`'s all-profile skill sync loop also skips
the profile — without the marker, every update would re-seed skills and
the user would have to delete them again.

Use case (from @hiut1u): orchestrator profiles and narrow-task profiles
don't need 100+ bundled skills polluting their system prompt.

- create_profile() gains a `no_skills` param, mutually exclusive with
  `--clone` / `--clone-all` (cloning explicitly copies skills).
- seed_profile_skills() no-ops on opted-out profiles and returns
  `{skipped_opt_out: True}` so callers can report cleanly.
- Web API (POST /api/profiles) accepts `no_skills: bool`.
- Delete `.no-bundled-skills` to opt back in — next `hermes update`
  re-seeds normally.

6 new tests in TestNoSkillsOptOut cover marker write, mutual exclusion
with clone, seed_profile_skills opt-out, fresh profile unaffected, and
delete-marker-re-enables-seeding.
JinyuID pushed a commit to JinyuID/hermes-agent that referenced this pull request May 11, 2026
…arch#20986)

Adds `hermes profile create <name> --no-skills` to create a profile with
zero bundled skills. Writes a `.no-bundled-skills` marker file in the
profile root so `hermes update`'s all-profile skill sync loop also skips
the profile — without the marker, every update would re-seed skills and
the user would have to delete them again.

Use case (from @hiut1u): orchestrator profiles and narrow-task profiles
don't need 100+ bundled skills polluting their system prompt.

- create_profile() gains a `no_skills` param, mutually exclusive with
  `--clone` / `--clone-all` (cloning explicitly copies skills).
- seed_profile_skills() no-ops on opted-out profiles and returns
  `{skipped_opt_out: True}` so callers can report cleanly.
- Web API (POST /api/profiles) accepts `no_skills: bool`.
- Delete `.no-bundled-skills` to opt back in — next `hermes update`
  re-seeds normally.

6 new tests in TestNoSkillsOptOut cover marker write, mutual exclusion
with clone, seed_profile_skills opt-out, fresh profile unaffected, and
delete-marker-re-enables-seeding.
jsboige pushed a commit to jsboige/hermes-agent that referenced this pull request May 14, 2026
…arch#20986)

Adds `hermes profile create <name> --no-skills` to create a profile with
zero bundled skills. Writes a `.no-bundled-skills` marker file in the
profile root so `hermes update`'s all-profile skill sync loop also skips
the profile — without the marker, every update would re-seed skills and
the user would have to delete them again.

Use case (from @hiut1u): orchestrator profiles and narrow-task profiles
don't need 100+ bundled skills polluting their system prompt.

- create_profile() gains a `no_skills` param, mutually exclusive with
  `--clone` / `--clone-all` (cloning explicitly copies skills).
- seed_profile_skills() no-ops on opted-out profiles and returns
  `{skipped_opt_out: True}` so callers can report cleanly.
- Web API (POST /api/profiles) accepts `no_skills: bool`.
- Delete `.no-bundled-skills` to opt back in — next `hermes update`
  re-seeds normally.

6 new tests in TestNoSkillsOptOut cover marker write, mutual exclusion
with clone, seed_profile_skills opt-out, fresh profile unaffected, and
delete-marker-re-enables-seeding.
teknium1 added a commit that referenced this pull request May 19, 2026
Catch the website docs up to two weeks of merged work (May 4 – May 18, 2026,
roughly 1,080 PRs). The audit found ~50 user-visible features that had landed
in code with no docs footprint, plus a handful of stale pages. This PR closes
every gap the scan turned up.

New pages
- user-guide/features/deliverable-mode.md — extension list, agent triggers,
  kanban_complete artifacts pattern, [[as_document]] override (PR #27813).
- developer-guide/web-search-provider-plugin.md — authoring guide modeled on
  image-gen-provider-plugin, covering brave_free / ddgs / etc. (PR #25448).

Providers / auth
- Rename "Alibaba Cloud" → "Qwen Cloud (Alibaba DashScope)" everywhere the
  display label shows up; provider id stays `alibaba` (PR #24835).
- Document OAuth refresh-token quarantine for xAI / MiniMax / Codex (PRs
  #28116 / #28118 / #28119).
- Document Nous JWT minting from refresh token + invalid-refresh quarantine
  + cross-profile shared token store (PRs #27663 / #19712).
- Add `## Microsoft Entra ID authentication (keyless)` section to
  azure-foundry guide — DefaultAzureCredential, RBAC, OpenAI + Anthropic
  routing details (PR #28101 / #9df9816da).
- Custom providers `api_mode` is now prompted-and-persisted, not just URL
  autodetected (PR #25068).
- Delegation honours `api_mode` + auto-detects anthropic_messages base URLs
  (PR #26824).
- `x_search` auto-enables when xAI credentials are present (PR #27376).
- Add `xAI Grok OAuth (SuperGrok)` row to providers headline table (PR
  #26534).
- NVIDIA NIM billing-origin header is set automatically (PR #26585).

Windows / installer
- `install.ps1`: document `-Commit <sha>` and `-Tag <v>` pin params plus
  the BOM-strip / git-retry hardening (PR #28169).
- Document Hermes Desktop thin installer + first-launch bootstrap (PR
  #27822).
- Document `dep_ensure` Windows bootstrap (PR #27845).
- Document install-method auto-detection (pip / git / homebrew / nixos) and
  the matching update command (PR #27843).

Gateway / messaging
- `/platform list|pause|resume` full description + circuit-breaker
  semantics (PR #26600).
- Slack / Matrix / Mattermost get parallel `allowed_channels` /
  `allowed_rooms` allowlist sections matching Telegram/Discord/DingTalk
  (PR #21251).
- Discord `allow_any_attachment` + `max_attachment_bytes` (config and env
  vars) (PR #27245).
- Discord clarify-choice button rendering (PR #25485).
- Telegram `guest_mode` @mention bypass for allowlisted groups (PR
  #22759).
- Telegram `notifications` mode (`important` vs `all`) (PR #22793).
- `[[as_document]]` skill / response directive for forcing
  document-style media delivery (PR #21210).

CLI / TUI
- `/new [name]` argument (PR #19637).
- `/subgoal` user-supplied criteria appended to `/goal` (PR #25449).
- `/exit --delete` flag confirmation prompts for destructive slash
  commands (PR #22687).
- Status-bar additions: ▶ N background indicator (PR #27175), context
  compression count (PR #21218), YOLO mode banner+statusbar warning (PR
  #26238).
- `display.timestamps` + `docker_extra_args` config keys (PR #23599).
- TUI collapsible startup banner sections (PR #20625).
- `HERMES_SESSION_ID` exported to tool subprocesses (PR #23847).

i18n
- Refresh display.language locale list from 8 → 16 (en, zh, zh-hant, ja,
  de, es, fr, tr, uk, af, ko, it, ga, pt, ru, hu) — matches
  `agent/i18n.py:SUPPORTED_LANGUAGES`.

Tools / features
- `vision_analyze` native-pixel passthrough for vision-capable callers,
  with auxiliary text-describer fallback (PR #22955).
- `session_search` rewrite to the single-shape tool (discovery / scroll /
  browse modes) (PRs #27590 / #27840).
- Clarify MCP transport scope: client supports stdio + SSE; embedded
  `hermes mcp serve` is stdio-only (PR #21227).
- Web search backends table: add Brave Search (free tier) and DDGS rows
  (PR #21337).
- ACP session-scoped edit auto-approval modes (PR #27862).
- Curator rename map in the user-visible per-run summary (PR #22910).
- Prompt caching feature page reference in features/overview.md — Claude
  cross-session 1-hour prefix cache on native Anthropic / OpenRouter /
  Nous Portal (PR #23828).
- Cron per-job profile parameter (PR #28124).
- `--no-skills` flag for `hermes profile create` (PR #20986).

Build
- Verified with `npm run build` in `website/`; both `en` and `zh-Hans`
  locales compile. Remaining broken-link/anchor warnings are pre-existing
  (`rl-training.md` from learning-path / overview; the
  zh-Hans translation lag the docs skill already calls out).
bot-ted added a commit to bot-ted/hermes-agent that referenced this pull request May 20, 2026
* fix(kanban): ignore stale HERMES_KANBAN_BOARD for removed boards

* fix(kanban): keep board-management commands independent from board override

* fix(kanban): preserve notifier_profile for dashboard home subscriptions

* fix(kanban): promote dependents when a parent is archived

* fix(cli): make kanban specify max_tokens configurable

* fix(kanban): sync slash subcommands with live parser

* fix(kanban): promote blocked tasks when parent dependencies complete

recompute_ready only scanned 'todo' tasks for promotion, ignoring
'blocked' tasks entirely. When a task was blocked (e.g. by the circuit
breaker) and its parent dependencies later completed, the task stayed
stuck in 'blocked' forever unless manually unblocked.

Now recompute_ready also scans 'blocked' tasks. When all parents are
done/archived, the blocked task is promoted to 'ready' with failure
counters reset — equivalent to an automatic unblock.

Includes a regression test for the blocked-parent-done promotion path.

* fix(kanban): use 'is not None' check for max_runtime_seconds in create_task

max_runtime_seconds=0 was being silently coerced to None due to a falsy
check (if max_runtime_seconds). Zero is a valid value that causes the
dispatcher to immediately time out a task. The adjacent max_retries
parameter already used the correct 'is not None' pattern.

Fixes the inconsistency by aligning max_runtime_seconds with max_retries.

* fix(kanban): reset failure counters on unblock_task

When a task is manually unblocked (blocked → ready/todo), the
consecutive_failures counter and last_failure_error were left intact.
The next failure would immediately re-trip the circuit breaker because
the counter was still at or above the failure limit.

Reset both fields on unblock so the task gets a fresh retry budget.

Includes a regression test that verifies counters are zeroed.

* fix(kanban): fingerprint crash errors to prevent fleet-wide retry exhaustion

When a systemic failure (provider outage, auth expiry, OOM) crashes
multiple workers simultaneously, detect_crashed_workers increments
each task failure counter independently. The circuit breaker only
trips after N × failure_limit retries across the fleet.

Fingerprint crash errors by normalizing host-specific details (PIDs,
timestamps). When 3+ tasks crash with the same fingerprint in a
single detection cycle, immediately trip the circuit breaker
(failure_limit=1) instead of waiting for repeated failures.

Isolated crashes (unique fingerprints) retain their normal retry
budget. Protocol violations continue to trip immediately.

Includes regression tests for systemic and isolated crash paths.

* fix(kanban): align board_exists with board discovery rules

* fix(kanban): demote ready children when a parent is reopened

* fix(kanban): serialize DB initialization

* fix(kanban): task_age() tolerates ISO-8601 timestamps

Prevents ValueError crash in dashboard get_board() when a task has
an ISO timestamp (e.g. "2026-05-10T15:00:00Z") instead of a unix epoch
int. Adds _to_epoch() helper that normalises both formats.

* Fix Kanban dashboard initial board selection

* fix(kanban): persist worker session metadata on completion

Salvages #25579 by @wesleysimplicio. Stamps task_runs.metadata.worker_session_id
from HERMES_SESSION_ID on kanban_complete. Cherry-picked the substantive
commit (not the AUTHOR_MAP fixup tip) onto current main.

* fix(kanban): make claim ttl configurable

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(kanban): pass accept-hooks to worker chat subprocess

* feat(kanban): add board-level default workdir (#25430)

* docs(kanban-worker): document notification routing configuration

* fix(kanban): preserve worker tools with restricted toolsets

* fix(kanban): make legacy task migration idempotent

(cherry picked from commit 293f1c3a7241b0117669e049d9aa746c9645ac90)

* fix: harden Kanban worker Hermes command resolution

* feat(kanban): allow trimmed task comments

SS-1647 live SHIP validation: real code + tests for kanban comment --max-len.

* fix: show scheduled kanban tasks in dashboard

* fix: assign single-task kanban decompositions

* fix(kanban-dashboard): make Orchestration mode checkbox label static

The checkbox label echoed its state ("Auto (default)" / "Manual") instead
of describing the action, so a checked box reading "Auto" parsed as a
status indicator rather than a control. The accompanying sub-description
was also static and started with "When on, ...", which read awkwardly
when the box was unchecked.

Replace the dynamic label with a static action label
("Auto-decompose triage tasks") and flip the sub-description between the
two modes so it stays accurate either way. The top-of-page Orchestration
pill is unchanged — that one is intentionally a status badge / toggle.

Fixes #28178

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(env): add HERMES_KANBAN_DISPATCH_IN_GATEWAY override (#21956)

Salvages the env-vars docs portion of #21956 by @Bartok9.
The ascii-guard-ignore tags from the original PR already landed on main.

* fix(kanban): close sqlite connection on init failure to prevent fd leak

Salvages #28301 by @Ade5954. If WAL setup, PRAGMA application, or schema
init raises after sqlite3.connect() succeeds, the new connection was
leaking. Wrap the body in try/except so the connection is closed before
the exception propagates.

* fix(kanban): don't crash dispatched workers when kanban-worker skill is absent

Salvages #27372 by @oemtalks. The dispatcher unconditionally injected
`--skills kanban-worker` into every worker spawn, but worker profiles
sometimes don't have that bundled skill in their skills dir, which is
fatal at CLI startup (`ValueError: Unknown skill(s): kanban-worker`).

Adds `_kanban_worker_skill_available(hermes_home)` and only injects the
flag when the skill resolves. The MANDATORY lifecycle still ships via
KANBAN_GUIDANCE in the system prompt, so omitting the flag is safe.

* fix(packaging): ship dashboard plugin assets in wheel

Salvages #23737 by @LeonSGP43. Adds plugins/* manifest.json and dist/
glob entries to setuptools package-data so wheel installs ship the
bundled dashboard plugin assets (kanban, achievements, etc.). Without
these, /api/dashboard/plugins can't discover plugin assets outside a
source checkout.

* docs(kanban): document worker protocol auto-blocks

Salvages #21585 by @helix4u. Documents the protocol_violation event
(worker exits successfully while task is still running), adds
--max-retries to the create flag list and --failure-limit to dispatch.

* fix(oneshot): pass fallback_providers from profile config to AIAgent

Salvages #23368 by @uzunkuyruk. Oneshot workers (e.g. kanban workers
spawned via 'hermes -p <profile> chat -q ...') were not honouring the
profile's fallback_providers / fallback_model chain because oneshot.py
never read the config and never passed fallback_model= to AIAgent.

Reads cfg.get('fallback_providers') (new list format) or
cfg.get('fallback_model') (legacy single-dict) with the same
normalization cli.py applies, then forwards as fallback_model=_fb.

* fix(kanban): reject direct running transitions in dashboard bulk updates

Salvages #24050 by @kronexoi. The single-task PATCH already rejects
direct status='running' since it bypasses the dispatcher/claim invariant,
but the bulk-update endpoint still accepted it. Aligns bulk with single
by emitting an error result row for any 'running' entry.

* feat(kanban): add initial-status for human-ops cards

Salvages #27526 by @shunsuke-hikiyama. Adds an --initial-status flag
(running|blocked, default running) to 'kanban create', threaded through
kanban_db.create_task() and the kanban_create tool schema. 'blocked'
parks the task directly in the blocked column for R3 human-ops review,
skipping the brief running-to-blocked transition.

Dropped the unrelated 'add' alias, WIFEXITED Windows compat, and
slash-handler error formatting changes that were bundled in the
original PR — those should ship as their own focused changes if still
wanted.

* fix(kanban): release scratch workspace and tmux session on task completion

Salvages #27369 by @LeonJS. complete_task() now calls _cleanup_workspace()
and _cleanup_worker_tmux() after marking a task complete.

Scratch workspaces (used by swarm agents) accumulate on disk — hundreds
of MB per task, never released. Stale tmux sessions from completed
agents also persist indefinitely.

Both gates are safe:
- workspace_kind == 'scratch' gate preserves user worktree/dir workspaces
- tmux #{pane_dead} == 1 gate only kills sessions where the worker has
  already exited
- best-effort: cleanup failures never block task completion

* fix(kanban): honor severity thresholds in diagnostics

Salvages #26431 by @LeonSGP43. Dashboard plugin_api list_diagnostics
was using exact-match (severity == filter), so '--severity warning'
hid 'error' and 'critical' diagnostics. Adds severity_at_or_above()
helper to kanban_diagnostics and uses it in the dashboard endpoint
(CLI already used SEVERITY_ORDER comparison correctly).

* test: isolate Kanban env pins in hermetic fixture

Salvages the substantive part of #22295 by @steezkelly. Adds the
missing HERMES_KANBAN_HOME, HERMES_KANBAN_RUN_ID, HERMES_KANBAN_CLAIM_LOCK,
HERMES_KANBAN_DISPATCH_IN_GATEWAY entries to _HERMES_BEHAVIORAL_VARS so
ambient developer-shell pins on those vars don't bleed into pytest runs.

The frozenset extraction + standalone regression test from the original
PR were dropped to keep the change minimal — main already maintains the
list inline.

* feat(kanban): add max_in_progress config to cap concurrent running tasks

Salvages #22981 by @SimbaKingjoe. Adds 'kanban.max_in_progress' config
that caps simultaneously running tasks. When the board already has N
running, dispatcher skips spawning so slow workers (local LLMs,
resource-constrained hosts) don't pile up and time out.

Threads through dispatch_once(max_in_progress=) and gateway dispatcher
config parsing with validation (warns on invalid/below-1 values).

* fix(packaging): ship bundled skills in wheel

Salvages #23738 by @LeonSGP43. Wheel installs were missing skills/ and
optional-skills/ because pyproject's [tool.setuptools.packages.find]
only includes Python packages — the skills directories don't have
__init__.py so they were silently dropped from the wheel.

Adds setup.py with data_files spec emitting skills/* and optional-skills/*
under hermes_agent-<v>.data/data/, and a get_bundled_skills_dir() helper
in hermes_constants that discovers the wheel-installed location via
sysconfig before falling back to a source-checkout path. tools/skills_sync
uses the helper so 'hermes update' works for pip-installed users.

* fix: 4 small surgical bugs

Salvages #23302 by @Bartok9. Four independent one-area fixes:

1. kanban boards delete alias now hard-deletes (not archives) — the
   alias didn't carry --delete, so getattr(args, 'delete', False)
   returned False. Detect boards_action=='delete' explicitly.
2. Gateway auto-title failures no longer leak as user-visible
   warnings — debug-log only since they're not actionable.
3. Background process completion notification snaps truncation to
   the next newline boundary, prepends a marker when content is
   dropped.
4. _cprint() schedules the run_in_terminal coroutine via
   asyncio.ensure_future so output isn't silently dropped from
   background threads (fixes #23185 Bug A). Skips the
   double-print fallback that would fire for mock paths.

* perf(prompt): cache kanban worker guidance at session init

Salvages #24402 by @RyanRana. The KANBAN_GUIDANCE block (~835 tokens)
is session-static — the dispatcher decides at spawn time whether the
process is a kanban worker via the kanban_show tool's check_fn (gated
on HERMES_KANBAN_TASK env var). Re-checking 'kanban_show' in
valid_tool_names and re-loading the reference on every system-prompt
rebuild (init + each context compression) is wasted work.

Caches the resolved string on agent._kanban_worker_guidance once in
agent_init and consumes it in system_prompt.build_system_prompt(),
with a getattr fallback for code paths that bypass agent_init.

* feat(kanban): add --sort option to 'hermes kanban list'

Salvages #25745 by @LizerAIDev. Adds --sort {created,created-desc,
priority,priority-desc,status,assignee,title,updated} to 'hermes kanban
list'. Validated against VALID_SORT_ORDERS map; invalid values raise
ValueError. Default behaviour (priority DESC, created ASC) is unchanged
when --sort is omitted.

* docs: add kanban codex lane skill

* feat(kanban): worker visibility endpoints (workers/active, runs/{id}, inspect)

Adds three read-only endpoints to the kanban dashboard plugin so the
SwitchUI workspace (and any other dashboard consumer) can track
workers across tasks without N+1 round-trips through /tasks/{task_id}.

- GET /workers/active
  Single SQL JOIN of task_runs + tasks where ended_at IS NULL,
  worker_pid IS NOT NULL, status='running'. Returns
  {workers: [...], count, checked_at}.

- GET /runs/{run_id}
  Direct lookup of any task_run row by id. Reuses existing
  kanban_db.get_run() helper and _run_dict() serialiser. 404 when
  not found. Mirrors GET /tasks/{task_id} 404 shape.

- GET /runs/{run_id}/inspect
  Live PID stats via psutil.Process.as_dict() — cpu_percent,
  memory_rss_bytes, memory_vms_bytes, num_threads, num_fds, status,
  create_time, cmdline. Short-circuits with alive:false when run
  has ended, has no worker_pid, the pid is gone, or psutil is
  unavailable. AccessDenied surfaces as alive:true with error
  rather than a 500.

11 new tests in tests/plugins/test_kanban_worker_runs.py cover the
empty-board case, running-task case, ended-run filtering,
missing-pid filtering, 404 paths, already-ended inspect, no-pid
inspect, dead-pid inspect, and live-pid inspect (psutil mocked).
All pass.

Companion termination endpoint (POST /runs/{run_id}/terminate) is
intentionally out of scope here — opening a separate issue first
since the RBAC and dispatcher-mediated soft-cancel design needs
maintainer input before code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): map contributor email for attribution check

* test(kanban-dashboard): pin enriched 409 detail and inline error wiring (#26744)

- Existing ``test_patch_drag_drop_move_todo_to_ready`` now asserts the
  enriched 409 detail names the blocking parent (id, quoted title, and
  current status), so the dashboard always has something actionable to
  render.
- New bundle-assertion test ``test_dashboard_surfaces_ready_blocked_error_inline``
  pins the frontend wiring: the ``parseApiErrorMessage`` helper exists,
  the drag/drop banner runs through it, and the drawer maintains a
  visible ``patchErr`` state that's cleared between PATCHes and tasks.

* docs(codex_app_server): document multi-root Kanban writable_roots (#27941)

Update the Codex app-server runtime guide's Kanban section to reflect
the new behaviour:

  * The sandbox override now adds the board DB directory plus every
    Kanban path the dispatcher pinned (HERMES_KANBAN_WORKSPACES_ROOT,
    HERMES_KANBAN_WORKSPACE, legacy HERMES_KANBAN_ROOT) -- deduplicated,
    DB-dir first.
  * The motivation note now includes the cross-mount artifact-write
    scenario (e.g. ``/media/.../kanban-workspaces/...`` on a separate
    drive) and links to issue #27941 so readers can find the original
    bug report.

* fix(gateway): quiet corrupt kanban dispatcher boards

Salvages substantive part of #26490 by @aqilaziz. Detects corrupt board
DBs ("file is not a database" / "database disk image is malformed")
and disables them by fingerprint until they're repaired, instead of
flooding the gateway log with repeated logger.exception tracebacks every
tick.

Cherry-picked the substantive commit (ea5b4ec2a); the tip commit was
an unrelated _is_dir OSError fix for service-path lookup. Dropped a
small test reformat that was bundled in the same commit.

* docs: align kanban readiness docs and smoke tests

Salvages #28199 by @bensargotest-sys. Aligns Kanban docs with current
tool registration: dispatcher-spawned task workers get task tools,
profiles that explicitly enable the kanban toolset get orchestrator
routing tools (kanban_list, kanban_unblock). Corrects failure-limit
text to current default of 2. Hardens the e2e subprocess script to
resolve repo root and use the spawnable default assignee. Updates the
diagnostics severity fixture to assert error below the critical
threshold.

* feat(kanban): surface per-task model_override in show + tool output

Salvages #26897 by @loicnico96. The per-task model_override DB column
already exists on main, but it wasn't exposed in user-facing surfaces.
This adds:
- 'kanban show' prints 'model: <name>' when model_override is set
- kanban_show / kanban_list tool responses include the model_override field

Original branch was stale (PR was authored against an older field name
'model'); applied the substantive surface exposure manually using the
current 'model_override' field name.

* feat(cli): add kanban swarm topology helper

Salvages #26791 by @Niraven. Adds 'hermes kanban swarm' to create a
durable Kanban Swarm v1 graph: a completed root/blackboard card,
parallel worker cards, a verifier gated on all workers, and a
synthesizer gated on the verifier. Stores shared swarm blackboard
updates as structured JSON comments on the root card.

Self-contained: new hermes_cli/kanban_swarm.py module + CLI wiring +
unit tests.

* feat(kanban): add optional board parameter to all MCP tools

Salvages #27598 by @nnnet. Adds optional 'board' parameter to all 9
kanban_* MCP tools via shared _connect helper. Backwards compatible —
omitting board keeps current pinned-board behavior. Useful for
orchestrator profiles that route across multiple boards.

Two-file scope: tools/kanban_tools.py + tests.

* feat(kanban): stamp originating ACP session_id on tasks

Salvages #23208 by @awizemann. Tracks which chat session created a
kanban task so clients can render a per-session board without falling
back to tenant + time-window heuristics.

- Schema: tasks gains nullable session_id TEXT column with index
  (additive migration in _migrate_add_optional_columns).
- ACP: server.py exposes the originating session id via HERMES_SESSION_ID
  with save/restore around the agent loop.
- Tool: kanban_create reads HERMES_SESSION_ID (with explicit override).
- CLI: 'hermes kanban list --session <id>' filter; JSON output exposes
  session_id.

* feat(kanban): wire dispatcher to dispatch review agents from review column

Salvages #23772 by @thewillhuang. Adds 'review' as a valid kanban task
status and extends dispatch_once to monitor the review column as a
second dispatch source (in addition to the existing ready column).

- Adds 'review' to VALID_STATUSES
- Adds claim_review_task() — atomically transitions review → running
- Adds has_spawnable_review() — health telemetry mirror
- Extends dispatch_once with a review column dispatch loop
- Review agents get 'sdlc-review' skill auto-loaded

Resolved 2 conflicts (VALID_STATUSES merge with main's 'scheduled' state,
test file additions). Adapted claim_review_task to main's
ttl_seconds: Optional[int] = None convention (matches claim_task).

* feat(kanban): stale detection for running tasks in dispatcher

Salvages #23790 by @thewillhuang. Adds detect_stale_running() to
the dispatcher cycle. Running tasks that have been started for longer
than dispatch_stale_timeout_seconds (default 14400 = 4h) without a
heartbeat in the last hour are auto-reclaimed to ready.

- New config kanban.dispatch_stale_timeout_seconds (default 14400, 0 disables)
- New 'stale' field on DispatchResult
- detect_stale_running() in kanban_db.py with heartbeat freshness check
- Records outcome='stale' on run close + 'stale' event; ticks failure counter
- Wires config through gateway embedded dispatcher
- Updates _cmd_dispatch verbose/JSON output and daemon logging

Resolved test-file end-of-file conflict by appending both halves.

* feat(kanban): filter tasks by workflow fields and runs by status/outcome

Salvages #26745 by @nehaaprasaad. Exposes filtering for the existing
workflow_template_id and current_step_key columns:

- list_tasks() accepts workflow_template_id and current_step_key kwargs
- 'hermes kanban list' adds matching CLI flags
- dashboard plugin_api also exposes the filters

Resolved a small conflict in list_tasks signature alongside main's
session_id and order_by additions; combined all three into the single
filter list.

* feat(kanban): add respawn guard to block repeat worker storms

Salvages #27484 by @fardoche6. Adds a respawn guard that skips worker
spawn for tasks where:
- a recent run already succeeded (recent_success — within guard window)
- the previous run hit a quota/auth error (blocker_auth, also auto-blocks)
- a recent task comment includes a GitHub PR URL (active_pr)

The guard prevents repeat worker storms on the same bug/task. Includes
the contributor's review-findings fixup (regex hardening, observability,
auth coverage).

Resolved a small DispatchResult conflict alongside main's 'stale' field;
kept both. Authorship preserved via rebase merge.

* feat(kanban): show dashboard cron jobs across profiles

Salvages #27568 by @SerenityTn. Dashboard cron page now lists cron
jobs from all profiles, with profile-aware filter UI and storage
routing. Includes test coverage for cross-profile listing, mutation,
deletion, and validation.

Also fixes orphan conflict markers in config.py left by an earlier
salvage merge (kanban.dispatch_stale_timeout_seconds was double-nested
in HEAD/PR markers from #28452 salvage of #23790).

* fix(kanban): remove orphan conflict markers from config.py (#28458)

PR #28452 (salvage of #23790, stale detection) merged with leftover
git conflict markers in hermes_cli/config.py around the
`dispatch_stale_timeout_seconds` config block, breaking config import
and any code path that loads it. Cleans up the markers and keeps both
config blocks (worker log rotation/orchestrator + stale detection).

Resolves a self-introduced regression.

* fix(kanban): remove orphan conflict markers from kanban.py (#28459)

PR #28454 (salvage of #26745, workflow filter) merged with leftover
git conflict markers in hermes_cli/kanban.py at three sites:
- _task_to_dict() (session_id alongside workflow_template_id/current_step_key)
- p_list parser (--sort alongside --workflow-template-id/--step-key)
- _cmd_list (order_by alongside the new filter kwargs)

Cleans up the markers and keeps both halves at each site.

Resolves a self-introduced regression.

* feat(kanban): configure worktree paths and branches

Salvages #26496 by @aqilaziz. Adds branch_name column + CLI flag so
tasks with workspace_kind='worktree' can pin a target branch on
create. Schema migration added to _migrate_add_optional_columns.

- Task.branch_name field + DB column + migration
- create_task accepts branch_name kwarg
- hermes kanban create --branch <name> flag
- kanban show output includes 'Branch: <name>' when set

Cherry-picked the substantive commit (a7558cf27); the PR's tip was
an unrelated service-path-dirs commit. Resolved 2 INSERT-column-list
and show-output conflicts alongside main's session_id and
max_runtime_seconds additions; kept all three.

* feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373)

Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that
group several skills under one slash command. Invoking /<bundle-name>
from any surface (CLI, TUI, dashboard, any gateway platform) loads
every referenced skill into a single combined user message.

Use cases:
- /backend-dev → loads github-code-review + test-driven-development
  + github-pr-workflow as one bundle.
- /research → loads several research skills together.
- Team task profiles shared via dotfiles.

Behavior:
- Bundles take precedence over individual skills when slugs collide.
- Missing skills are skipped with a note, not fatal.
- No system-prompt mutation — bundles generate a fresh user message
  at invocation time, the same way /<skill> does. Prompt cache stays
  intact.
- Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI),
  /help display.

Schema (~/.hermes/skill-bundles/<slug>.yaml):
    name: backend-dev
    description: Backend feature work.
    skills:
      - github-code-review
      - test-driven-development
    instruction: |
      Optional extra guidance prepended to the loaded skills.

New module: agent/skill_bundles.py — load, scan, resolve, build
invocation message, save, delete. yaml.safe_load only; broken
bundles log a warning and are skipped, never raise.

New CLI subcommand: hermes bundles {list,show,create,delete,reload}.
Implementation in hermes_cli/bundles.py; wired in hermes_cli/main.py.
'bundles' added to _BUILTIN_SUBCOMMANDS so plugin discovery skips it.

New in-session slash command: /bundles lists installed bundles in
both CLI and gateway. /<bundle-name> dispatch added to CLI (cli.py)
and gateway (gateway/run.py) before the existing /<skill-name> path.

Autocomplete: SlashCommandCompleter gained an optional
skill_bundles_provider parameter that defaults to None — the prompt
shows '▣ <description> (N skills)' for bundles vs '⚡' for skills.

Tests:
- tests/agent/test_skill_bundles.py — 33 tests covering slugify,
  scan/cache freshness, resolve (including underscore→hyphen
  Telegram alias), build_bundle_invocation_message (loading, missing
  skills, user/bundle instruction injection, dedup), save/delete,
  reload diff, list sort.
- tests/hermes_cli/test_bundles.py — 8 tests for the CLI
  subcommand (create/list/show/delete/reload, --force, missing
  bundle errors).
- tests/gateway/test_bundles_command.py — 4 tests for the gateway
  handler and bundle resolution priority.

Live E2E: verified subprocess invocations of hermes bundles
{list,create,show,reload,delete} round-trip correctly against an
isolated HERMES_HOME.

Docs:
- website/docs/user-guide/features/skills.md — new 'Skill Bundles'
  section with quick example, YAML schema, management commands,
  behavior notes.
- website/docs/reference/cli-commands.md — 'hermes bundles' added to
  the top-level command table and given its own subcommand section.

* feat(kanban): add scheduled status for delayed follow-ups

Salvages #24533 by @roycepersonalassistant. Adds a first-class
'scheduled' Kanban status for time-delay follow-ups that aren't
waiting on human input.

- hermes kanban schedule <task_id> [reason] CLI command
- Dashboard/API transitions to/from Scheduled
- unblock_task() now releases both 'blocked' AND 'scheduled' tasks
  (re-checking parent dependencies before moving to ready/todo)
- i18n + docs updates

Resolved conflicts: kept HEAD's failure-counter reset on unblock
alongside the PR's scheduled state, kept HEAD's 'running' direct-set
rejection, combined both bulk-status branches. Dropped the dist/
bundle changes (months-stale; would need rebuild from source).

* feat(kanban): drag-to-delete trash zone + bulk delete for task cards

Salvages #28125 by @Jpalmer95. Adds:
- Drag-to-delete trash zone in the kanban dashboard
- Bulk delete endpoint with cascading delete_task cleanup
- Frontend updates (drag visual + drop handler)
- Confirmation prompt before delete

Resolved end-of-file test conflict by appending both halves.

* docs: add Korean Kanban documentation

Salvages #21823 by @pochi-gio. Adds Korean (ko) Docusaurus locale and
translates Kanban documentation (kanban.md, kanban-tutorial.md) and the
two related skills (devops-kanban-orchestrator, devops-kanban-worker).

Purely additive — adds ko to the locales list in docusaurus.config.ts
and creates the website/i18n/ko/ tree.

* fix(tests): catch up six stale tests after compression/aux/kanban changes (#28465)

- aux_config: drop session_search from _AUX_TASKS and remove stale test
  (PR #27590 removed auxiliary.session_search from DEFAULT_CONFIG)
- compression_boundary_hook: set compressor._last_compress_aborted=False
  on MagicMock so the post-compress abort branch (PR #28117) doesn't
  short-circuit before the session-id rotation under test
- kanban_dashboard_plugin: use consecutive_failures=3 so severity stays
  'error' (failure_threshold default dropped from 3 to 2 in d9fef0c8a,
  so failures=5 now crosses the critical floor of 2*2=4)
- cli_manual_compress: accept force kwarg on DummyAgent._compress_context
  (cli._manual_compress now passes force=True)

* fix(telegram): render full clarify choice text in message body, use short button labels

When Telegram clarify prompts offer long choices, mobile clients
truncate the inline button labels, making options unreadable.
Previously only the question was shown in the message body with
truncated choice text in button labels.

Fix: append the full numbered option list to the message body
so users can read complete choice text on any client.  Buttons
now use short numeric labels (1, 2, ...) to avoid Telegram
truncation.  The 'Other (type answer)' button is unchanged.

Long choice labels are now rendered in full (not truncated to
57 chars + '...') since they appear in the body instead of
button labels.

Closes: #27497

* chore(release): map @asdlem for PR #27852 salvage

* fix(telegram): default streaming transport to edit

* fix(telegram): respect reply_to_mode for DM topic reply fallback

The DM topic reply fallback code in send() hardcoded should_thread=True
when telegram_dm_topic_reply_fallback metadata was present, bypassing
_should_thread_reply() and ignoring reply_to_mode config. This caused
quote bubbles on every response even with reply_to_mode: 'off'.

Fix:
- Add reply_to_mode param to _reply_to_message_id_for_send() and
  _thread_kwargs_for_send() classmethods
- In send(), check self._reply_to_mode != 'off' for DM topic fallback
- Suppress reply anchor and reply_to_message_id when mode is 'off'
  while preserving message_thread_id for correct topic routing
- Thread reply_to_mode through all 29 call sites

Regression coverage: 10 new tests in test_telegram_reply_mode.py
covering classmethod behavior, send() integration, and backward
compatibility.

Fixes reply_to_mode: 'off' ignored by Telegram DM topic reply fallback code #23994

* fix(gateway): route Telegram audio file attachments away from STT pipeline (#24870)

Telegram distinguishes three kinds of audio payloads:
  - message.voice  → Opus/OGG voice messages  → STT pipeline  ✓
  - message.audio  → audio file attachments   → bypasses STT  ← was broken
  - message.document (audio mime) → generic file route

**Root cause** — the inbound message routing block in gateway/run.py
matched both MessageType.VOICE *and* MessageType.AUDIO into audio_paths,
which were then fed unconditionally to _enrich_message_with_transcription.
Audio file attachments (.mp3, .m4a, etc.) were therefore auto-transcribed
instead of being treated as files, making the transcribe skill unusable
from Telegram because the path it needed was never surfaced.

**Fix**
- Introduce a new audio_file_paths list populated exclusively by
  MessageType.AUDIO events.
- Narrow the audio_paths selector to MessageType.VOICE (and bare
  audio/ mime-type events that are not explicitly AUDIO or DOCUMENT).
- After the STT block, inject a document-style context note for each
  audio_file_path, giving the agent the file path and asking what to do
  with it (consistent with how plain documents are handled).

**Tests** — 5 new tests in test_telegram_audio_vs_voice.py:
  - voice message still transcribed (regression guard)
  - audio attachment skips STT (core fix)
  - audio attachment context note format
  - STT disabled still produces file note (not STT-disabled notice)
  - MessageType.AUDIO != MessageType.VOICE sanity check

Fixes #24870

* chore(release): map bartok9 noreply for PR #24879 salvage

* fix(send_message): route standalone Telegram sends through TELEGRAM_PROXY

When the send_message tool runs outside the gateway process (agent loop,
TUI, cron, etc.), _gateway_runner_ref() returns None and the standalone
path in _send_telegram constructs Bot(token=token) directly, bypassing
any configured proxy. In regions where api.telegram.org is blocked, the
send times out after ~5s with 'Telegram send failed: Timed out' and
nothing ever shows up in gateway.log because the request never reaches
the gateway.

Resolve TELEGRAM_PROXY (via gateway.platforms.base.resolve_proxy_url,
which also honours HTTPS_PROXY/HTTP_PROXY/ALL_PROXY and NO_PROXY) just
before constructing the Bot. When a proxy is found, attach an
HTTPXRequest(proxy=...) for both 'request' and 'get_updates_request',
matching what gateway/platforms/telegram.py already does for in-gateway
sends and what the Discord standalone sender already does. Any
exception attaching the proxy falls back cleanly to a direct connection,
preserving prior behaviour for users without a proxy configured.

Adds tests/tools/test_send_message_telegram_proxy.py covering both the
proxy-configured and no-proxy cases.

* chore(release): map @pepelax for PR #25419 salvage

* fix(kanban-dashboard): restore implementations dropped during salvages (#28481)

Four kanban dashboard test failures, all from PR salvages that picked up
the test additions but dropped the corresponding implementations.

- BOARD_COLUMNS: add 'review' (status added by PR f55d94a1e but the
  board API never grew the column → test_board_empty failed because
  VALID_STATUSES - {archived} mismatched the rendered columns).
- update_task: enrich the 'ready' 409 detail with the blocking parent
  list (id, title, status) and add _parents_blocking_ready helper.
  Implementation lost in the #26744 salvage (commit e215558ba) which
  pinned the test but not the server-side code.
- dist/index.js: add parseApiErrorMessage helper, wire it through the
  drag/drop banner, add patchErr state to the TaskDrawer and surface
  it inline by the action row. Lost in the same #26744 salvage.
- test_diagnostics_endpoint_severity_filter: update to at-or-above
  semantics (PR a94ddd807 changed the filter from exact-match so the
  warning filter now correctly includes error+critical too).

* fix(gateway): roll over Telegram tool progress bubbles

* fix(gateway): scope audio_file_paths outside media_urls guard

The audio-file-paths handling block at line 7334 references the variable
unconditionally, but #24879 initialized it inside the 'if event.media_urls'
block — so events without media_urls hit UnboundLocalError.

Found via test_run_agent_queued_message_does_not_treat_commentary_as_final
after PR #28478 landed.

* fix(gateway): keep tool-progress edits alive after Telegram flood control

When a progress-message edit hits Telegram flood control (RetryAfter),
can_edit was unconditionally set to False, permanently disabling coalescing
for the rest of the run. Subsequent tool updates were posted as separate
new messages instead of updating the existing progress bubble.

Fix: only set can_edit=False for non-recoverable edit errors. On flood
control, back off by resetting _last_edit_ts so the throttle interval is
respected before the next edit attempt.

Fixes #25188

* chore(release): map @erhnysr for PR #25198 salvage

* fix(telegram): preserve can_edit after transient network errors in progress edits (#27828)

When edit_message_text fails with a transient error (httpx.ConnectError,
NetworkError, server disconnected, timeouts), the progress-message sender
must not permanently set can_edit = False — that would convert a single
Telegram network hiccup into separate per-tool bubbles for the rest of the run.

Changes:
- gateway/platforms/telegram.py: edit_message now returns retryable=True for
  transient network errors (ConnectError, NetworkError, timeouts, server
  disconnects, temporarily unavailable). Permanent failures (flood control,
  message-not-found, permissions) remain retryable=False.
- gateway/run.py: send_progress_messages checks result.retryable before
  setting can_edit = False. Transient failures skip the fallback-send and
  continue — the next edit cycle catches up with the accumulated lines.
  Permanent failures (flood, message-not-found, etc.) still disable editing.

Tests: 22 new tests in test_telegram_progress_edit_transient.py covering
transient vs permanent error classification, SendResult.retryable semantics,
and the can_edit decision logic.

Fixes #27828

* fix(telegram): recover from post-update polling conflict without entering limbo

* fix(test+release): update conflict retry count for MAX=5; map @CryptoByz

* fix(gateway): route background-process notifications into Telegram DM topics

Background-process completion notifications (notify_on_complete) and
watch-pattern notifications were always delivered to the Telegram main
chat instead of the originating private-chat topic.

Hermes-created Telegram DM topic lanes only render a send when it carries
both message_thread_id and a reply anchor. The synthetic MessageEvent
injected on process completion had no message_id, so _reply_anchor_for_event
returned None and _thread_kwargs_for_send dropped message_thread_id
entirely — routing the notification to the main chat.

Capture the triggering message id at spawn time and thread it through to
the synthetic event so it can be reply-anchored back into the topic:

- session_context: add HERMES_SESSION_MESSAGE_ID context var
- telegram adapter: populate SessionSource.message_id on inbound messages
- terminal tool: persist watcher_message_id on the process session
- process registry: carry/persist message_id on watcher dicts + checkpoint
- gateway: set MessageEvent.message_id on injected notifications

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): map @fabiosiqueira for PR #27212 salvage

* fix(telegram): route resumed DM topic sends directly

* fix(telegram): enforce TELEGRAM_ALLOWED_USERS allowlist on inbound messages

TELEGRAM_ALLOWED_USERS was only checked for callback/inline-button
actions but not for inbound messages. Unauthorized users triggered an
'Unauthorized user' log warning but their messages were still processed
by the agent — a P0 security bypass (issue #23778).

Fix: add allowlist check in _should_process_message() which is called
for all message types (text, command, media, location). If the sender
is not in TELEGRAM_ALLOWED_USERS, the message is dropped immediately
with a warning log. Empty TELEGRAM_ALLOWED_USERS continues to allow
all users (existing behavior).

Fixes #23778

* fix(telegram): fail-closed auth fallback when TELEGRAM_ALLOWED_USERS is empty

The _is_callback_user_authorized fallback returned True when
TELEGRAM_ALLOWED_USERS was not set, allowing any Telegram user
to interact with the bot. Change to fail-closed: deny by default
unless GATEWAY_ALLOW_ALL_USERS=true is explicitly set.

Fixes #24457

* test(telegram): stub _is_callback_user_authorized in trigger-gating fixture

After PR #24468 made the empty-allowlist callback auth fail-closed
(and #23795 wired _is_callback_user_authorized into _should_process_message),
trigger-gating tests started failing because their fake messages from
user 111 hit the new deny-by-default path before trigger evaluation.

Force-authorize all senders in _make_adapter() so the trigger logic
under test runs.  The fail-closed behavior itself is covered by
test_telegram_callback_auth_fail_closed.py.

* fix(telegram): reset sticky fallback IP on connect failure, retry primary DNS

When a sticky fallback IP (from DoH discovery) becomes unreachable,
the transport previously got stuck in an attempt_order that only
tried the dead IP.  This prevented the gateway from recovering
until the service was restarted.

Changes:
- Always include primary DNS path (None) after the sticky IP in the
  attempt_order so that a primary-path retry happens on sticky failure.
- Reset self._sticky_ip to None when the currently sticky IP hits
  a connect timeout / connect error, allowing the next request to
  retry from scratch.

Fixes silent Telegram disconnection when discovered fallback IPs
are transiently or permanently unreachable.

* test+release: align stale sticky-IP test for #24511; map @falconexe

* fix(telegram): propagate extra base_url config

* feat(send_message): auto-detect @username mentions and create Telegram entities

When sending messages containing @username patterns, auto-generate
MessageEntity(type='mention') entries so that the receiving bot's
require_mention filter can trigger. This enables proper bot-to-bot
interop where mention-based routing is used.

* test+release: align send_message mocks for MessageEntity import; map @fonhal

* fix(telegram): resume typing indicator after inline approval click (#27853)

The text /approve and /deny paths in gateway/run.py call
resume_typing_for_chat() after resolve_gateway_approval() succeeds, but
the Telegram inline-button (ea:*) callback in _handle_callback_query did
not. Typing is paused when the approval is sent (gateway/run.py:15658),
so without a matching resume the typing indicator stayed gone for the
remainder of a long-running turn after a button click.

Symmetry-match the text path: after a successful resolve, call
self.resume_typing_for_chat(str(query_chat_id)). Guarded by count > 0
to match /approve's "if not count" early-return — if nothing was
actually resolved, the agent thread was never unblocked, so typing
should remain paused.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gateway): mark final voice reply as notify-worthy so Telegram delivers it audibly

In Telegram "important" notifications mode (default), TelegramPlatformAdapter
sets ``disable_notification=True`` on every send unless metadata carries
``notify=True``.  GatewayRunner._send_voice_reply already passes thread
metadata through to ``adapter.send_voice``, but never marks the final
auto-TTS voice reply as notify-worthy — so users with the default mode get
the final voice note delivered silently with no push notification.

Mirror the final-text path in gateway/platforms/base.py (the existing
text-response final send already adds ``metadata["notify"] = True``).

Issue #27970 Bug 2.  Bug 1 (MP3 vs. native OGG voice-note) is being
addressed by existing PRs #20182 / #20878 — this PR is intentionally
scoped to the silent-delivery bug only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: avoid Telegram group reply thread session splits

* chore(release): map @eliteworkstation94-ai for PR #28157 salvage

* fix(gateway): avoid duplicate Telegram text after auto-TTS voice replies

* chore(release): map @Zyrixtrex for PR #26754 salvage

* fix(telegram): escape send_slash_confirm preview with format_message

send_slash_confirm() sent the raw command preview with ParseMode.MARKDOWN,
skipping the format_message() conversion applied to every other dynamic
send in the adapter. Commands with underscores, dots, brackets, or other
MarkdownV2-sensitive characters raised BadRequest: Can't parse entities;
the exception was swallowed by the outer try/except, so the confirmation
prompt silently never appeared.

Fix: wrap preview through format_message() and switch to MARKDOWN_V2,
symmetric with send_update_prompt and the callback sends fixed in
a69404052.

* chore(release): map @nftpoetrist for PR #25856 salvage

* fix(telegram): retry wrapped connect timeouts

* chore(release): map @samahn0601 for PR #27887 salvage

* fix(tts): keep native audio outside Telegram voice delivery

* chore(release): map @aqilaziz for PR #26406 salvage

* fix(gateway): pin Telegram DM-topic routing to user's current topic

Topic-mode DM replies were fragmenting one conversation across many sessions: a Reply on a message in another topic delivered Telegram's message_thread_id for *that* topic, and #3206's strip routed plain replies to the lobby. Both pulled the user away from their current session. Fix: when topic mode is on, rewrite source.thread_id to the user's most-recent binding if the inbound id is missing/General or not a known topic. Non-topic-mode users unchanged.

* chore(release): map @karthikeyann for PR #26609 salvage

* fix(send_message): add thread-not-found retry for Telegram forum topic sends

The standalone _send_telegram path in send_message_tool lacked the
thread-not-found fallback that the gateway adapter has. When a forum
topic thread_id was stale or deleted, the send would fail entirely
instead of retrying to the General topic.

Changes:
- Add _is_telegram_thread_not_found() helper matching gateway adapter
- Add thread-not-found retry in text send path
- Add thread-not-found retry in media send path (with f.seek(0))
- Separate text_kwargs from thread_kwargs to prevent
  disable_web_page_preview leaking into send_photo/send_video calls

Closes #27012

* test(send_message): add thread-not-found retry tests for Telegram forum topics

Adds two tests to TestSendTelegramThreadIdMapping:
- test_thread_not_found_retries_without_message_thread_id
- test_thread_not_found_for_media_retries_without_message_thread_id

Refs #27012

* test(send_message): add thread-not-found retry tests for Telegram topics

Three tests covering the #27012 fix:
- test_is_thread_not_found_matches_expected_errors
- test_text_send_retries_without_thread_id_on_thread_not_found
- test_disable_web_page_preview_not_leaked_to_media_sends

116/116 existing tests still pass (no regressions).

* chore(release): map @kunci115 for PR #27098 salvage

* fix(gateway): register Telegram commands for groups

Register Telegram bot commands across default, private, and group scopes so
the slash-command menu is available outside DMs.

Changes from review feedback:
- Add asyncio.Lock to prevent race condition in _ensure_forum_commands
- Extract MAX_COMMANDS_PER_SCOPE constant (30) to avoid magic number
- Upgrade error logging from debug->warning in forum registration
- Add tests covering lazy forum registration and concurrent safety
- Remove /start handler from this PR (separate feature)

Fixes review: needs_work (race, magic number, log levels, missing tests)

* test+release: fix test fixture for forum_commands; map @chromalinx

* fix(telegram): gate profile bots by allowed topics

* chore(release): map @booker1207 for PR #25132 salvage

* fix(cron): route Telegram cron deliveries to a dedicated topic via TELEGRAM_CRON_THREAD_ID

When Telegram topic mode is enabled, cron messages delivered to the bot's
root DM (TELEGRAM_HOME_CHANNEL without a thread id) land in the system
lobby — replies there are rebuffed with the lobby reminder and
reply_to_message_id is dropped, so users cannot interact with the cron
output (#24409).

Add an optional TELEGRAM_CRON_THREAD_ID env var that overrides
TELEGRAM_HOME_CHANNEL_THREAD_ID for cron deliveries only. Operators can
create a "Cron" forum topic in the DM, point this var at its thread id,
and replies to cron messages will land in that topic's existing session
instead of the lobby. The home-channel thread id (used elsewhere, e.g.
restart notifications) is unchanged, and explicit
deliver="telegram:chat:thread" targets continue to win over the env var.

Per the reporter's clarification on 2026-05-13, option (a) (cron-side
route to a dedicated topic + config knob) was chosen.

Fixes #24409

* fix(telegram): route image documents (.png/.jpg/.webp/.gif) through vision pipeline

When users send images as documents (Telegram file picker), they were
rejected with "Unsupported document type" because SUPPORTED_DOCUMENT_TYPES
only includes text/office formats. Add SUPPORTED_IMAGE_DOCUMENT_TYPES
to base.py and handle them in telegram.py before the document check.

- Add SUPPORTED_IMAGE_DOCUMENT_TYPES constant to base.py
- Add MIME reverse-lookup for image types in telegram.py
- Route image documents through cache_image_from_bytes + vision pipeline
- Handle media groups for image documents

Closes: #20128, #18620

* test+release: stub auth in test_telegram_documents fixture; map @kiranvk-2011

* fix(gateway): prevent Windows Telegram /restart leaving gateway stopped

* chore(release): map @rak135 for PR #25960 salvage

* fix(telegram): preserve topic metadata on overflow edits

* feat(telegram): add disable_topic_auto_rename gateway flag

When Hermes auto-titles a session in a Telegram DM topic it currently
renames the topic itself to the generated title. That works for
operator-managed lanes (extra.dm_topics) but is disruptive for
ad-hoc Threaded-Mode topics that users name by hand — every first
exchange overwrites their chosen title.

Add gateway.platforms.telegram.extra.disable_topic_auto_rename (default
False, preserving prior behaviour). When set, both
_schedule_telegram_topic_title_rename and the underlying
_rename_telegram_topic_for_session_title short-circuit before touching
the Telegram API. Internal session titles (sessions list, TUI) keep
working unchanged.

Also bridge the legacy top-level telegram.disable_topic_auto_rename key
through to gateway.platforms.telegram.extra so users on the older
config layout don't have to migrate to enable it.

- Tests cover the runtime flag, the scheduling entry-point, and string
  truthiness coercion for YAML-loaded values.
- Docs updated in messaging/telegram.md with an example block.

* chore(release): map @B0Tch1 for PR #27634 salvage

* fix(gateway): restore Telegram DM topic thread_id after session split (#27166)

When context compression triggers a mid-turn session split, source.thread_id
can be None on synthetic/recovered events. _thread_metadata_for_source then
returns None, causing the Telegram adapter to send with no message_thread_id
and the response lands in the General thread instead of the active DM topic.

Fix:
- hermes_state.py: Add get_telegram_topic_binding_by_session() for reverse
  lookup by session_id (enabled by the existing UNIQUE INDEX on session_id).
- gateway/run.py: After session-split detection, if source is a Telegram DM
  and source.thread_id is None, recover it from the binding via the new
  method so _thread_metadata_for_source produces the correct thread routing.
- tests/: Coverage for the new lookup method and the recovery flow.

* chore(release): map @jackjin1997 for PR #27239 salvage

* fix(gateway): allow chat-scoped telegram auth without sender user_id

* chore(release): map @soynchux for PR #27806 salvage

* fix(telegram): add DM topic typing fallback when message_thread_id rejected

When a DM topic lane's message_thread_id is rejected by Telegram
(e.g. stale or deleted topic), send_typing now falls back to sending
the typing indicator without thread_id so it at least appears in the
main DM view, rather than being silently swallowed.

Also adds test for the fallback behavior.

* fix(telegram): report cron topic fallback

* chore(release): map @el-analista for PR #25368 salvage

* fix(telegram): wire gt: callback dispatch for gmail-triage buttons

The gmail-triage skill's Telegram inline buttons emit callback_data of the
form `gt:<verb>:<arg>`, but `_handle_callback_query` had no `gt:` branch —
taps fell through silently and the spinner sat there until Telegram timed it
out.

Add `_handle_gmail_triage_callback`, dispatched from the existing callback
router, that:

- Authorizes the caller via the same `_is_callback_user_authorized` path as
  the approval / slash-confirm / clarify handlers.
- Maps each verb to a script under `~/.hermes/scripts/gmail-triage/` and runs
  it async with a 60s timeout.
- Splits verbs into one-shots (send / archive / draft / spam) — append the
  confirmation and strip the keyboard so the action can't fire twice — and
  sticky-state changes (mute / trust / vip ± -domain) — append the
  confirmation but leave the keyboard tappable so the user can stack actions
  on one email.
- On failure: toast only, keyboard preserved so the user can retry.
- Logs every callback outcome to gateway.log for debugging.

* chore(release): map @khungate for PR #25829 salvage

* feat(telegram): support quick-command-only menus

* chore(release): map @stevehq26-bot for PR #28015 salvage

* fix(telegram): handle channel post updates

* test: address telegram channel post review

* test+release: stub auth in channel_posts fixture; map @brndnsvr

* Quiet noisy Telegram gateway errors

* chore(release): map oracle@jarviss-mbp.home for PR #24014 salvage

* Route Telegram multi-bot mentions exclusively

* Document Telegram multi-profile gateway commands

* fix: ignore Telegram messages for other bots

* chore(release): map @OCWC22 for PR #24581 salvage

* feat(telegram): ignore_root_dm with system command lobby

* docs(telegram): document ignore_root_dm feature

* chore(release): map @ai-hana-ai for PR #23928 salvage

* feat(telegram): pin incoming user message for duration of agent turn

When a user sends a message on Telegram, the incoming message is now
automatically pinned at the start of processing and unpinned when the
agent finishes its turn. This gives the user a visual indicator that
their message is being worked on, and keeps the conversation anchored.

Changes:
- telegram.py: Added pinChatMessage in on_processing_start and
  unpinChatMessage in on_processing_complete. Restructured both
  hooks so pin/unpin runs independently of the reactions feature
  (reactions are optional; pinning is always on).
- telegram.py: Pass message_id through SessionSource so it's
  available in the session context.
- session_context.py: Added HERMES_SESSION_MESSAGE_ID context var.
- run.py: Pass source.message_id through set_session_vars.

Pinning is silent (disable_notification=True) and failures are
logged at debug level without interrupting message processing.
Only the user's incoming message is pinned -- never the agent's
replies. Auto-resume events (which have no message_id) are
correctly skipped.

* chore(release): map @indigokarasu for PR #26636 salvage

* feat(telegram): skip-STT audio path + 2GB cap via local Bot API server

Two coordinated changes that unblock downstream audio pipelines
(diarization, custom transcription, archival) on attachments larger
than the public Bot API's 20MB getFile ceiling.

- `stt.enabled: false` no longer drops voice/audio with a generic
  "transcription disabled" note. The gateway probes the cached file's
  duration (wave → mutagen → ffprobe ladder) and surfaces
  `[The user sent a voice message: <abs path> (duration: M:SS)]` to
  the agent so a skill or tool can pick up the raw file. The previous
  placeholder is replaced rather than appended when present.

- `platforms.telegram.extra.base_url` set → adapter auto-lifts its
  document size cap from 20MB to 2GB (the local telegram-bot-api
  `--local` ceiling) and the "too large" reply reports the active
  limit dynamically. No new config knob; presence of `base_url` is the
  opt-in.

- `platforms.telegram.extra.local_mode: true` wires
  `Application.builder().local_mode(True)` on the python-telegram-bot
  builder. PTB then reads files from disk instead of HTTP, which is
  required when telegram-bot-api runs in `--local` mode (the server
  returns absolute filesystem paths, not `/file/bot...` URLs).

- gateway/run.py: rewrites the `stt.enabled: false` branch of
  `_enrich_message_with_transcription`. New `_format_duration` +
  `_probe_audio_duration` helpers.
- gateway/platforms/telegram.py: `_max_doc_bytes` instance attribute
  derived from `extra.base_url`; `local_mode` builder wiring;
  dynamic "too large" message.
- tests/gateway/test_stt_config.py: covers path-surfacing with and
  without an existing user message, and placeholder replacement.
- tests/gateway/test_telegram_max_doc_bytes.py: 3 cases — default 20MB
  without base_url, 2GB when set, empty-string base_url keeps default.
- website/docs/user-guide/messaging/telegram.md: new "Skipping STT"
  subsection under Voice Messages and a full "Large Files (>20MB) via
  Local Bot API Server" walkthrough (api_id/api_hash, docker-compose,
  one-time `logOut` migration, `platforms.telegram.extra` config, the
  `local_mode` disk-access requirement, the silent HTTP-fallback 404).
- website/docs/user-guide/features/voice-mode.md: documents the
  `stt.enabled` knob in the config reference.

- `pytest tests/gateway/test_telegram_max_doc_bytes.py
  tests/gateway/test_stt_config.py` → 9/9 passing.
- Verified end-to-end on a live deployment: gateway log shows
  `Using custom Telegram base_url: http://...` and
  `Using Telegram local_mode (read files from disk)` on startup;
  voice messages above 20MB cache to disk and surface their path to
  the agent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(release): map @alber70g for PR #25280 salvage

* fix(web): add scheduled column to i18n type definitions (#28549)

columnLabels and columnHelp in en.ts include a scheduled entry but the
Translations interface in types.ts did not declare it, causing a
TypeScript build failure in the Nix derivation. Made the field optional
since only en.ts provides it currently.

* docs: comprehensive 2-week sweep of feature/PR coverage gaps (#28497)

Catch the website docs up to two weeks of merged work (May 4 – May 18, 2026,
roughly 1,080 PRs). The audit found ~50 user-visible features that had landed
in code with no docs footprint, plus a handful of stale pages. This PR closes
every gap the scan turned up.

New pages
- user-guide/features/deliverable-mode.md — extension list, agent triggers,
  kanban_complete artifacts pattern, [[as_document]] override (PR #27813).
- developer-guide/web-search-provider-plugin.md — authoring guide modeled on
  image-gen-provider-plugin, covering brave_free / ddgs / etc. (PR #25448).

Providers / auth
- Rename "Alibaba Cloud" → "Qwen Cloud (Alibaba DashScope)" everywhere the
  display label shows up; provider id stays `alibaba` (PR #24835).
- Document OAuth refresh-token quarantine for xAI / MiniMax / Codex (PRs
  #28116 / #28118 / #28119).
- Document Nous JWT minting from refresh token + invalid-refresh quarantine
  + cross-profile shared token store (PRs #27663 / #19712).
- Add `## Microsoft Entra ID authentication (keyless)` section to
  azure-foundry guide — DefaultAzureCredential, RBAC, OpenAI + Anthropic
  routing details (PR #28101 / #9df9816da).
- Custom providers `api_mode` is now prompted-and-persisted, not just URL
  autodetected (PR #25068).
- Delegation honours `api_mode` + auto-detects anthropic_messages base URLs
  (PR #26824).
- `x_search` auto-enables when xAI credentials are present (PR #27376).
- Add `xAI Grok OAuth (SuperGrok)` row to providers headline table (PR
  #26534).
- NVIDIA NIM billing-origin header is set automatically (PR #26585).

Windows / installer
- `install.ps1`: document `-Commit <sha>` and `-Tag <v>` pin params plus
  the BOM-strip / git-retry hardening (PR #28169).
- Document Hermes Desktop thin installer + first-launch bootstrap (PR
  #27822).
- Document `dep_ensure` Windows bootstrap (PR #27845).
- Document install-method auto-detection (pip / git / homebrew / nixos) and
  the matching update command (PR #27843).

Gateway / messaging
- `/platform list|pause|resume` full description + circuit-breaker
  semantics (PR #26600).
- Slack / Matrix / Mattermost get parallel `allowed_channels` /
  `allowed_rooms` allowlist sections matching Telegram/Discord/DingTalk
  (PR #21251).
- Discord `allow_any_attachment` + `max_attachment_bytes` (config and env
  vars) (PR #27245).
- Discord clarify-choice button rendering (PR #25485).
- Telegram `guest_mode` @mention bypass for allowlisted groups (PR
  #22759).
- Telegram `notifications` mode (`important` vs `all`) (PR #22793).
- `[[as_document]]` skill / response directive for forcing
  document-style media delivery (PR #21210).

CLI / TUI
- `/new [name]` argument (PR #19637).
- `/subgoal` user-supplied criteria appended to `/goal` (PR #25449).
- `/exit --delete` flag confirmation prompts for destructive slash
  commands (PR #22687).
- Status-bar additions: ▶ N background indicator (PR #27175), context
  compression count (PR #21218), YOLO mode banner+statusbar warning (PR
  #26238).
- `display.timestamps` + `docker_extra_args` config keys (PR #23599).
- TUI collapsible startup banner sections (PR #20625).
- `HERMES_SESSION_ID` exported to tool subprocesses (PR #23847).

i18n
- Refresh display.language locale list from 8 → 16 (en, zh, zh-hant, ja,
  de, es, fr, tr, uk, af, ko, it, ga, pt, ru, hu) — matches
  `agent/i18n.py:SUPPORTED_LANGUAGES`.

Tools / features
- `vision_analyze` native-pixel passthrough for vision-capable callers,
  with auxiliary text-describer fallback (PR #22955).
- `session_search` rewrite to the single-shape tool (discovery / scroll /
  browse modes) (PRs #27590 / #27840).
- Clarify MCP transport scope: client supports stdio + SSE; embedded
  `hermes mcp serve` is stdio-only (PR #21227).
- Web search backends table: add Brave Search (free tier) and DDGS rows
  (PR #21337).
- ACP session-scoped edit auto-approval modes (PR #27862).
- Curator rename map in the user-visible per-run summary (PR #22910).
- Prompt caching feature page reference in features/overview.md — Claude
  cross-session 1-hour prefix cache on native Anthropic / OpenRouter /
  Nous Portal (PR #23828).
- Cron per-job profile parameter (PR #28124).
- `--no-skills` flag for `hermes profile create` (PR #20986).

Build
- Verified with `npm run build` in `website/`; both `en` and `zh-Hans`
  locales compile. Remaining broken-link/anchor warnings are pre-existing
  (`rl-training.md` from learning-path / overview; the
  zh-Hans translation lag the docs skill already calls out).

* chore(release): pre-stage AUTHOR_MAP for May 2026 LHF batch group 9 (#28571)

Pre-stages AUTHOR_MAP entries for 9 new/under-mapped contributors whose
PRs are being salvaged in the May 2026 LHF batch group 9.

Contributors:
- jdelmerico (#28278 — signal require_mention filter)
- justemu (#27996 — matrix thread_require_mention)
- YuanHanzhong (#28029 — dashboard browser scrollback)
- noctilust (#28080 — drop stale TUI resume env)
- MoonJuhan (#28288 — tolerate unreadable JSONL transcripts)
- outsourc-e (#28164 — cron emoji ZWJ sequences)
- Zyrixtrex (#28275 — Google OAuth urlopen timeout)
- ooovenenoso (#28256 — tool loop recovery hints)
- vanthinh6886 (#28018 — yaml/flock/atomic write guards; non-noreply email)

Per references/batch-pr-salvage-may14-additions.md.

* feat(signal): add require_mention filter for group chats

Add a configurable mention filter to the Signal adapter so the bot
only responds in groups when it is explicitly @mentioned.

Changes:
- gateway/platforms/signal.py: read require_mention from adapter
  extra config or SIGNAL_REQUIRE_MENTION env var; skip group messages
  that don't mention the bot account (checked in rendered text and
  raw mention metadata)
- gateway/config.py: map signal.require_mention YAML key to the
  SIGNAL_REQUIRE_MENTION env var (env var takes precedence)

Config example:
  signal:
    require_mention: true

Or via env var:
  SIGNAL_REQUIRE_MENTION=true

* Revert "feat(telegram): pin incoming user message for duration of agent turn"

This reverts commit a724c3b9cf5f01e28365322ae5ae3a9579567806.

* Revert "feat(telegram): support quick-command-only menus"

This reverts commit b1acf80e17858e2e5ae7c0d412a3a573d7fcbca4.

* Revert "feat(send_message): auto-detect @username mentions and create Telegram entities"

This reverts commit cf814c96f613b38bd891ac941c32da653e81c7ad.

* Revert "fix(telegram): enforce TELEGRAM_ALLOWED_USERS allowlist on inbound messages"

This reverts commit db50af910be6b4171ea9cf54f4cc38be27ac1da6.

* fix(gateway): pre-mark sessions as resume_pending before drain to prevent data loss (#27856)

Pre-mark all running agent sessions as resume_pending BEFORE the drain
wait begins. If the service manager kills the process during the drain
(window), the durable marker is already written so the next gateway boot
can recover in-flight sessions. On graceful drain completion, clear the
early markers for sessions that finished successfully.

* fix(matrix): implement thread_require_mention to prevent multi-agent reply loops

In multi-agent shared Matrix rooms, multiple bots all participating in the
same thread could trigger infinite reply loops — each bot's reply re-engaged
the others because they were all in the bot-thread set. Discord has a
`thread_require_mention` opt-in for this; Matrix didn't.

Add `_parse_thread_require_mention(config)` (mirrors Discord's pattern).
In `_resolve_message_context`, when enabled and the message is in a
bot-participated thread (not a free-response room), require @mention
before processing.

Salvage of @justemu's 2-commit stack (#27996). Fixes #27995.

* fix(cli): show active profile in TUI prompt

* fix(tui): preserve dunder identifiers in markdown

* test(file_ops): add regression tests for git baseline warning in write_file

Adds TestGitBaselineCheck with 6 unit tests covering _check_git_baseline
and the warning field in write_file result:
- Git not available → None
- Not in a git repo → None
- Clean repo → None
- Dirty repo → returns warning string with branch name
- write_file result includes warning when dirty
- write_file result omits warning when clean

* fix(dashboard): use browser scrollback for chat wheel

* fix(cli): ignore stale HERMES_TUI_RESUME env

HERMES_TUI_RESUME is an internal env var the Python wrapper exports to hand
a session ID off to the Ink TUI. Because…
dimavrem22 pushed a commit to inkbox-ai/hermes-agent that referenced this pull request May 20, 2026
* fix(kanban): seed bundled skills (e.g. kanban-worker) on kanban init

Closes #23725

* fix(kanban): ignore stale HERMES_KANBAN_BOARD for removed boards

* fix(kanban): keep board-management commands independent from board override

* fix(kanban): preserve notifier_profile for dashboard home subscriptions

* fix(kanban): promote dependents when a parent is archived

* fix(cli): make kanban specify max_tokens configurable

* fix(kanban): sync slash subcommands with live parser

* fix(kanban): promote blocked tasks when parent dependencies complete

recompute_ready only scanned 'todo' tasks for promotion, ignoring
'blocked' tasks entirely. When a task was blocked (e.g. by the circuit
breaker) and its parent dependencies later completed, the task stayed
stuck in 'blocked' forever unless manually unblocked.

Now recompute_ready also scans 'blocked' tasks. When all parents are
done/archived, the blocked task is promoted to 'ready' with failure
counters reset — equivalent to an automatic unblock.

Includes a regression test for the blocked-parent-done promotion path.

* fix(kanban): use 'is not None' check for max_runtime_seconds in create_task

max_runtime_seconds=0 was being silently coerced to None due to a falsy
check (if max_runtime_seconds). Zero is a valid value that causes the
dispatcher to immediately time out a task. The adjacent max_retries
parameter already used the correct 'is not None' pattern.

Fixes the inconsistency by aligning max_runtime_seconds with max_retries.

* fix(kanban): reset failure counters on unblock_task

When a task is manually unblocked (blocked → ready/todo), the
consecutive_failures counter and last_failure_error were left intact.
The next failure would immediately re-trip the circuit breaker because
the counter was still at or above the failure limit.

Reset both fields on unblock so the task gets a fresh retry budget.

Includes a regression test that verifies counters are zeroed.

* fix(kanban): fingerprint crash errors to prevent fleet-wide retry exhaustion

When a systemic failure (provider outage, auth expiry, OOM) crashes
multiple workers simultaneously, detect_crashed_workers increments
each task failure counter independently. The circuit breaker only
trips after N × failure_limit retries across the fleet.

Fingerprint crash errors by normalizing host-specific details (PIDs,
timestamps). When 3+ tasks crash with the same fingerprint in a
single detection cycle, immediately trip the circuit breaker
(failure_limit=1) instead of waiting for repeated failures.

Isolated crashes (unique fingerprints) retain their normal retry
budget. Protocol violations continue to trip immediately.

Includes regression tests for systemic and isolated crash paths.

* fix(kanban): align board_exists with board discovery rules

* fix(kanban): demote ready children when a parent is reopened

* fix(kanban): serialize DB initialization

* fix(kanban): task_age() tolerates ISO-8601 timestamps

Prevents ValueError crash in dashboard get_board() when a task has
an ISO timestamp (e.g. "2026-05-10T15:00:00Z") instead of a unix epoch
int. Adds _to_epoch() helper that normalises both formats.

* Fix Kanban dashboard initial board selection

* fix(kanban): persist worker session metadata on completion

Salvages #25579 by @wesleysimplicio. Stamps task_runs.metadata.worker_session_id
from HERMES_SESSION_ID on kanban_complete. Cherry-picked the substantive
commit (not the AUTHOR_MAP fixup tip) onto current main.

* fix(kanban): make claim ttl configurable

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(kanban): pass accept-hooks to worker chat subprocess

* feat(kanban): add board-level default workdir (#25430)

* docs(kanban-worker): document notification routing configuration

* fix(kanban): preserve worker tools with restricted toolsets

* fix(kanban): make legacy task migration idempotent

(cherry picked from commit 293f1c3a7241b0117669e049d9aa746c9645ac90)

* fix: harden Kanban worker Hermes command resolution

* feat(kanban): allow trimmed task comments

SS-1647 live SHIP validation: real code + tests for kanban comment --max-len.

* fix: show scheduled kanban tasks in dashboard

* fix: assign single-task kanban decompositions

* fix(kanban-dashboard): make Orchestration mode checkbox label static

The checkbox label echoed its state ("Auto (default)" / "Manual") instead
of describing the action, so a checked box reading "Auto" parsed as a
status indicator rather than a control. The accompanying sub-description
was also static and started with "When on, ...", which read awkwardly
when the box was unchecked.

Replace the dynamic label with a static action label
("Auto-decompose triage tasks") and flip the sub-description between the
two modes so it stays accurate either way. The top-of-page Orchestration
pill is unchanged — that one is intentionally a status badge / toggle.

Fixes #28178

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(env): add HERMES_KANBAN_DISPATCH_IN_GATEWAY override (#21956)

Salvages the env-vars docs portion of #21956 by @Bartok9.
The ascii-guard-ignore tags from the original PR already landed on main.

* fix(kanban): close sqlite connection on init failure to prevent fd leak

Salvages #28301 by @Ade5954. If WAL setup, PRAGMA application, or schema
init raises after sqlite3.connect() succeeds, the new connection was
leaking. Wrap the body in try/except so the connection is closed before
the exception propagates.

* fix(kanban): don't crash dispatched workers when kanban-worker skill is absent

Salvages #27372 by @oemtalks. The dispatcher unconditionally injected
`--skills kanban-worker` into every worker spawn, but worker profiles
sometimes don't have that bundled skill in their skills dir, which is
fatal at CLI startup (`ValueError: Unknown skill(s): kanban-worker`).

Adds `_kanban_worker_skill_available(hermes_home)` and only injects the
flag when the skill resolves. The MANDATORY lifecycle still ships via
KANBAN_GUIDANCE in the system prompt, so omitting the flag is safe.

* fix(packaging): ship dashboard plugin assets in wheel

Salvages #23737 by @LeonSGP43. Adds plugins/* manifest.json and dist/
glob entries to setuptools package-data so wheel installs ship the
bundled dashboard plugin assets (kanban, achievements, etc.). Without
these, /api/dashboard/plugins can't discover plugin assets outside a
source checkout.

* docs(kanban): document worker protocol auto-blocks

Salvages #21585 by @helix4u. Documents the protocol_violation event
(worker exits successfully while task is still running), adds
--max-retries to the create flag list and --failure-limit to dispatch.

* fix(oneshot): pass fallback_providers from profile config to AIAgent

Salvages #23368 by @uzunkuyruk. Oneshot workers (e.g. kanban workers
spawned via 'hermes -p <profile> chat -q ...') were not honouring the
profile's fallback_providers / fallback_model chain because oneshot.py
never read the config and never passed fallback_model= to AIAgent.

Reads cfg.get('fallback_providers') (new list format) or
cfg.get('fallback_model') (legacy single-dict) with the same
normalization cli.py applies, then forwards as fallback_model=_fb.

* fix(kanban): reject direct running transitions in dashboard bulk updates

Salvages #24050 by @kronexoi. The single-task PATCH already rejects
direct status='running' since it bypasses the dispatcher/claim invariant,
but the bulk-update endpoint still accepted it. Aligns bulk with single
by emitting an error result row for any 'running' entry.

* feat(kanban): add initial-status for human-ops cards

Salvages #27526 by @shunsuke-hikiyama. Adds an --initial-status flag
(running|blocked, default running) to 'kanban create', threaded through
kanban_db.create_task() and the kanban_create tool schema. 'blocked'
parks the task directly in the blocked column for R3 human-ops review,
skipping the brief running-to-blocked transition.

Dropped the unrelated 'add' alias, WIFEXITED Windows compat, and
slash-handler error formatting changes that were bundled in the
original PR — those should ship as their own focused changes if still
wanted.

* fix(kanban): release scratch workspace and tmux session on task completion

Salvages #27369 by @LeonJS. complete_task() now calls _cleanup_workspace()
and _cleanup_worker_tmux() after marking a task complete.

Scratch workspaces (used by swarm agents) accumulate on disk — hundreds
of MB per task, never released. Stale tmux sessions from completed
agents also persist indefinitely.

Both gates are safe:
- workspace_kind == 'scratch' gate preserves user worktree/dir workspaces
- tmux #{pane_dead} == 1 gate only kills sessions where the worker has
  already exited
- best-effort: cleanup failures never block task completion

* fix(kanban): honor severity thresholds in diagnostics

Salvages #26431 by @LeonSGP43. Dashboard plugin_api list_diagnostics
was using exact-match (severity == filter), so '--severity warning'
hid 'error' and 'critical' diagnostics. Adds severity_at_or_above()
helper to kanban_diagnostics and uses it in the dashboard endpoint
(CLI already used SEVERITY_ORDER comparison correctly).

* test: isolate Kanban env pins in hermetic fixture

Salvages the substantive part of #22295 by @steezkelly. Adds the
missing HERMES_KANBAN_HOME, HERMES_KANBAN_RUN_ID, HERMES_KANBAN_CLAIM_LOCK,
HERMES_KANBAN_DISPATCH_IN_GATEWAY entries to _HERMES_BEHAVIORAL_VARS so
ambient developer-shell pins on those vars don't bleed into pytest runs.

The frozenset extraction + standalone regression test from the original
PR were dropped to keep the change minimal — main already maintains the
list inline.

* feat(kanban): add max_in_progress config to cap concurrent running tasks

Salvages #22981 by @SimbaKingjoe. Adds 'kanban.max_in_progress' config
that caps simultaneously running tasks. When the board already has N
running, dispatcher skips spawning so slow workers (local LLMs,
resource-constrained hosts) don't pile up and time out.

Threads through dispatch_once(max_in_progress=) and gateway dispatcher
config parsing with validation (warns on invalid/below-1 values).

* fix(packaging): ship bundled skills in wheel

Salvages #23738 by @LeonSGP43. Wheel installs were missing skills/ and
optional-skills/ because pyproject's [tool.setuptools.packages.find]
only includes Python packages — the skills directories don't have
__init__.py so they were silently dropped from the wheel.

Adds setup.py with data_files spec emitting skills/* and optional-skills/*
under hermes_agent-<v>.data/data/, and a get_bundled_skills_dir() helper
in hermes_constants that discovers the wheel-installed location via
sysconfig before falling back to a source-checkout path. tools/skills_sync
uses the helper so 'hermes update' works for pip-installed users.

* fix: 4 small surgical bugs

Salvages #23302 by @Bartok9. Four independent one-area fixes:

1. kanban boards delete alias now hard-deletes (not archives) — the
   alias didn't carry --delete, so getattr(args, 'delete', False)
   returned False. Detect boards_action=='delete' explicitly.
2. Gateway auto-title failures no longer leak as user-visible
   warnings — debug-log only since they're not actionable.
3. Background process completion notification snaps truncation to
   the next newline boundary, prepends a marker when content is
   dropped.
4. _cprint() schedules the run_in_terminal coroutine via
   asyncio.ensure_future so output isn't silently dropped from
   background threads (fixes #23185 Bug A). Skips the
   double-print fallback that would fire for mock paths.

* perf(prompt): cache kanban worker guidance at session init

Salvages #24402 by @RyanRana. The KANBAN_GUIDANCE block (~835 tokens)
is session-static — the dispatcher decides at spawn time whether the
process is a kanban worker via the kanban_show tool's check_fn (gated
on HERMES_KANBAN_TASK env var). Re-checking 'kanban_show' in
valid_tool_names and re-loading the reference on every system-prompt
rebuild (init + each context compression) is wasted work.

Caches the resolved string on agent._kanban_worker_guidance once in
agent_init and consumes it in system_prompt.build_system_prompt(),
with a getattr fallback for code paths that bypass agent_init.

* feat(kanban): add --sort option to 'hermes kanban list'

Salvages #25745 by @LizerAIDev. Adds --sort {created,created-desc,
priority,priority-desc,status,assignee,title,updated} to 'hermes kanban
list'. Validated against VALID_SORT_ORDERS map; invalid values raise
ValueError. Default behaviour (priority DESC, created ASC) is unchanged
when --sort is omitted.

* docs: add kanban codex lane skill

* feat(kanban): worker visibility endpoints (workers/active, runs/{id}, inspect)

Adds three read-only endpoints to the kanban dashboard plugin so the
SwitchUI workspace (and any other dashboard consumer) can track
workers across tasks without N+1 round-trips through /tasks/{task_id}.

- GET /workers/active
  Single SQL JOIN of task_runs + tasks where ended_at IS NULL,
  worker_pid IS NOT NULL, status='running'. Returns
  {workers: [...], count, checked_at}.

- GET /runs/{run_id}
  Direct lookup of any task_run row by id. Reuses existing
  kanban_db.get_run() helper and _run_dict() serialiser. 404 when
  not found. Mirrors GET /tasks/{task_id} 404 shape.

- GET /runs/{run_id}/inspect
  Live PID stats via psutil.Process.as_dict() — cpu_percent,
  memory_rss_bytes, memory_vms_bytes, num_threads, num_fds, status,
  create_time, cmdline. Short-circuits with alive:false when run
  has ended, has no worker_pid, the pid is gone, or psutil is
  unavailable. AccessDenied surfaces as alive:true with error
  rather than a 500.

11 new tests in tests/plugins/test_kanban_worker_runs.py cover the
empty-board case, running-task case, ended-run filtering,
missing-pid filtering, 404 paths, already-ended inspect, no-pid
inspect, dead-pid inspect, and live-pid inspect (psutil mocked).
All pass.

Companion termination endpoint (POST /runs/{run_id}/terminate) is
intentionally out of scope here — opening a separate issue first
since the RBAC and dispatcher-mediated soft-cancel design needs
maintainer input before code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): map contributor email for attribution check

* test(kanban-dashboard): pin enriched 409 detail and inline error wiring (#26744)

- Existing ``test_patch_drag_drop_move_todo_to_ready`` now asserts the
  enriched 409 detail names the blocking parent (id, quoted title, and
  current status), so the dashboard always has something actionable to
  render.
- New bundle-assertion test ``test_dashboard_surfaces_ready_blocked_error_inline``
  pins the frontend wiring: the ``parseApiErrorMessage`` helper exists,
  the drag/drop banner runs through it, and the drawer maintains a
  visible ``patchErr`` state that's cleared between PATCHes and tasks.

* docs(codex_app_server): document multi-root Kanban writable_roots (#27941)

Update the Codex app-server runtime guide's Kanban section to reflect
the new behaviour:

  * The sandbox override now adds the board DB directory plus every
    Kanban path the dispatcher pinned (HERMES_KANBAN_WORKSPACES_ROOT,
    HERMES_KANBAN_WORKSPACE, legacy HERMES_KANBAN_ROOT) -- deduplicated,
    DB-dir first.
  * The motivation note now includes the cross-mount artifact-write
    scenario (e.g. ``/media/.../kanban-workspaces/...`` on a separate
    drive) and links to issue #27941 so readers can find the original
    bug report.

* fix(gateway): quiet corrupt kanban dispatcher boards

Salvages substantive part of #26490 by @aqilaziz. Detects corrupt board
DBs ("file is not a database" / "database disk image is malformed")
and disables them by fingerprint until they're repaired, instead of
flooding the gateway log with repeated logger.exception tracebacks every
tick.

Cherry-picked the substantive commit (ea5b4ec2a); the tip commit was
an unrelated _is_dir OSError fix for service-path lookup. Dropped a
small test reformat that was bundled in the same commit.

* docs: align kanban readiness docs and smoke tests

Salvages #28199 by @bensargotest-sys. Aligns Kanban docs with current
tool registration: dispatcher-spawned task workers get task tools,
profiles that explicitly enable the kanban toolset get orchestrator
routing tools (kanban_list, kanban_unblock). Corrects failure-limit
text to current default of 2. Hardens the e2e subprocess script to
resolve repo root and use the spawnable default assignee. Updates the
diagnostics severity fixture to assert error below the critical
threshold.

* feat(kanban): surface per-task model_override in show + tool output

Salvages #26897 by @loicnico96. The per-task model_override DB column
already exists on main, but it wasn't exposed in user-facing surfaces.
This adds:
- 'kanban show' prints 'model: <name>' when model_override is set
- kanban_show / kanban_list tool responses include the model_override field

Original branch was stale (PR was authored against an older field name
'model'); applied the substantive surface exposure manually using the
current 'model_override' field name.

* feat(cli): add kanban swarm topology helper

Salvages #26791 by @Niraven. Adds 'hermes kanban swarm' to create a
durable Kanban Swarm v1 graph: a completed root/blackboard card,
parallel worker cards, a verifier gated on all workers, and a
synthesizer gated on the verifier. Stores shared swarm blackboard
updates as structured JSON comments on the root card.

Self-contained: new hermes_cli/kanban_swarm.py module + CLI wiring +
unit tests.

* feat(kanban): add optional board parameter to all MCP tools

Salvages #27598 by @nnnet. Adds optional 'board' parameter to all 9
kanban_* MCP tools via shared _connect helper. Backwards compatible —
omitting board keeps current pinned-board behavior. Useful for
orchestrator profiles that route across multiple boards.

Two-file scope: tools/kanban_tools.py + tests.

* feat(kanban): stamp originating ACP session_id on tasks

Salvages #23208 by @awizemann. Tracks which chat session created a
kanban task so clients can render a per-session board without falling
back to tenant + time-window heuristics.

- Schema: tasks gains nullable session_id TEXT column with index
  (additive migration in _migrate_add_optional_columns).
- ACP: server.py exposes the originating session id via HERMES_SESSION_ID
  with save/restore around the agent loop.
- Tool: kanban_create reads HERMES_SESSION_ID (with explicit override).
- CLI: 'hermes kanban list --session <id>' filter; JSON output exposes
  session_id.

* feat(kanban): wire dispatcher to dispatch review agents from review column

Salvages #23772 by @thewillhuang. Adds 'review' as a valid kanban task
status and extends dispatch_once to monitor the review column as a
second dispatch source (in addition to the existing ready column).

- Adds 'review' to VALID_STATUSES
- Adds claim_review_task() — atomically transitions review → running
- Adds has_spawnable_review() — health telemetry mirror
- Extends dispatch_once with a review column dispatch loop
- Review agents get 'sdlc-review' skill auto-loaded

Resolved 2 conflicts (VALID_STATUSES merge with main's 'scheduled' state,
test file additions). Adapted claim_review_task to main's
ttl_seconds: Optional[int] = None convention (matches claim_task).

* feat(kanban): stale detection for running tasks in dispatcher

Salvages #23790 by @thewillhuang. Adds detect_stale_running() to
the dispatcher cycle. Running tasks that have been started for longer
than dispatch_stale_timeout_seconds (default 14400 = 4h) without a
heartbeat in the last hour are auto-reclaimed to ready.

- New config kanban.dispatch_stale_timeout_seconds (default 14400, 0 disables)
- New 'stale' field on DispatchResult
- detect_stale_running() in kanban_db.py with heartbeat freshness check
- Records outcome='stale' on run close + 'stale' event; ticks failure counter
- Wires config through gateway embedded dispatcher
- Updates _cmd_dispatch verbose/JSON output and daemon logging

Resolved test-file end-of-file conflict by appending both halves.

* feat(kanban): filter tasks by workflow fields and runs by status/outcome

Salvages #26745 by @nehaaprasaad. Exposes filtering for the existing
workflow_template_id and current_step_key columns:

- list_tasks() accepts workflow_template_id and current_step_key kwargs
- 'hermes kanban list' adds matching CLI flags
- dashboard plugin_api also exposes the filters

Resolved a small conflict in list_tasks signature alongside main's
session_id and order_by additions; combined all three into the single
filter list.

* feat(kanban): add respawn guard to block repeat worker storms

Salvages #27484 by @fardoche6. Adds a respawn guard that skips worker
spawn for tasks where:
- a recent run already succeeded (recent_success — within guard window)
- the previous run hit a quota/auth error (blocker_auth, also auto-blocks)
- a recent task comment includes a GitHub PR URL (active_pr)

The guard prevents repeat worker storms on the same bug/task. Includes
the contributor's review-findings fixup (regex hardening, observability,
auth coverage).

Resolved a small DispatchResult conflict alongside main's 'stale' field;
kept both. Authorship preserved via rebase merge.

* feat(kanban): show dashboard cron jobs across profiles

Salvages #27568 by @SerenityTn. Dashboard cron page now lists cron
jobs from all profiles, with profile-aware filter UI and storage
routing. Includes test coverage for cross-profile listing, mutation,
deletion, and validation.

Also fixes orphan conflict markers in config.py left by an earlier
salvage merge (kanban.dispatch_stale_timeout_seconds was double-nested
in HEAD/PR markers from #28452 salvage of #23790).

* fix(kanban): remove orphan conflict markers from config.py (#28458)

PR #28452 (salvage of #23790, stale detection) merged with leftover
git conflict markers in hermes_cli/config.py around the
`dispatch_stale_timeout_seconds` config block, breaking config import
and any code path that loads it. Cleans up the markers and keeps both
config blocks (worker log rotation/orchestrator + stale detection).

Resolves a self-introduced regression.

* fix(kanban): remove orphan conflict markers from kanban.py (#28459)

PR #28454 (salvage of #26745, workflow filter) merged with leftover
git conflict markers in hermes_cli/kanban.py at three sites:
- _task_to_dict() (session_id alongside workflow_template_id/current_step_key)
- p_list parser (--sort alongside --workflow-template-id/--step-key)
- _cmd_list (order_by alongside the new filter kwargs)

Cleans up the markers and keeps both halves at each site.

Resolves a self-introduced regression.

* feat(kanban): configure worktree paths and branches

Salvages #26496 by @aqilaziz. Adds branch_name column + CLI flag so
tasks with workspace_kind='worktree' can pin a target branch on
create. Schema migration added to _migrate_add_optional_columns.

- Task.branch_name field + DB column + migration
- create_task accepts branch_name kwarg
- hermes kanban create --branch <name> flag
- kanban show output includes 'Branch: <name>' when set

Cherry-picked the substantive commit (a7558cf27); the PR's tip was
an unrelated service-path-dirs commit. Resolved 2 INSERT-column-list
and show-output conflicts alongside main's session_id and
max_runtime_seconds additions; kept all three.

* feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373)

Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that
group several skills under one slash command. Invoking /<bundle-name>
from any surface (CLI, TUI, dashboard, any gateway platform) loads
every referenced skill into a single combined user message.

Use cases:
- /backend-dev → loads github-code-review + test-driven-development
  + github-pr-workflow as one bundle.
- /research → loads several research skills together.
- Team task profiles shared via dotfiles.

Behavior:
- Bundles take precedence over individual skills when slugs collide.
- Missing skills are skipped with a note, not fatal.
- No system-prompt mutation — bundles generate a fresh user message
  at invocation time, the same way /<skill> does. Prompt cache stays
  intact.
- Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI),
  /help display.

Schema (~/.hermes/skill-bundles/<slug>.yaml):
    name: backend-dev
    description: Backend feature work.
    skills:
      - github-code-review
      - test-driven-development
    instruction: |
      Optional extra guidance prepended to the loaded skills.

New module: agent/skill_bundles.py — load, scan, resolve, build
invocation message, save, delete. yaml.safe_load only; broken
bundles log a warning and are skipped, never raise.

New CLI subcommand: hermes bundles {list,show,create,delete,reload}.
Implementation in hermes_cli/bundles.py; wired in hermes_cli/main.py.
'bundles' added to _BUILTIN_SUBCOMMANDS so plugin discovery skips it.

New in-session slash command: /bundles lists installed bundles in
both CLI and gateway. /<bundle-name> dispatch added to CLI (cli.py)
and gateway (gateway/run.py) before the existing /<skill-name> path.

Autocomplete: SlashCommandCompleter gained an optional
skill_bundles_provider parameter that defaults to None — the prompt
shows '▣ <description> (N skills)' for bundles vs '⚡' for skills.

Tests:
- tests/agent/test_skill_bundles.py — 33 tests covering slugify,
  scan/cache freshness, resolve (including underscore→hyphen
  Telegram alias), build_bundle_invocation_message (loading, missing
  skills, user/bundle instruction injection, dedup), save/delete,
  reload diff, list sort.
- tests/hermes_cli/test_bundles.py — 8 tests for the CLI
  subcommand (create/list/show/delete/reload, --force, missing
  bundle errors).
- tests/gateway/test_bundles_command.py — 4 tests for the gateway
  handler and bundle resolution priority.

Live E2E: verified subprocess invocations of hermes bundles
{list,create,show,reload,delete} round-trip correctly against an
isolated HERMES_HOME.

Docs:
- website/docs/user-guide/features/skills.md — new 'Skill Bundles'
  section with quick example, YAML schema, management commands,
  behavior notes.
- website/docs/reference/cli-commands.md — 'hermes bundles' added to
  the top-level command table and given its own subcommand section.

* feat(kanban): add scheduled status for delayed follow-ups

Salvages #24533 by @roycepersonalassistant. Adds a first-class
'scheduled' Kanban status for time-delay follow-ups that aren't
waiting on human input.

- hermes kanban schedule <task_id> [reason] CLI command
- Dashboard/API transitions to/from Scheduled
- unblock_task() now releases both 'blocked' AND 'scheduled' tasks
  (re-checking parent dependencies before moving to ready/todo)
- i18n + docs updates

Resolved conflicts: kept HEAD's failure-counter reset on unblock
alongside the PR's scheduled state, kept HEAD's 'running' direct-set
rejection, combined both bulk-status branches. Dropped the dist/
bundle changes (months-stale; would need rebuild from source).

* feat(kanban): drag-to-delete trash zone + bulk delete for task cards

Salvages #28125 by @Jpalmer95. Adds:
- Drag-to-delete trash zone in the kanban dashboard
- Bulk delete endpoint with cascading delete_task cleanup
- Frontend updates (drag visual + drop handler)
- Confirmation prompt before delete

Resolved end-of-file test conflict by appending both halves.

* docs: add Korean Kanban documentation

Salvages #21823 by @pochi-gio. Adds Korean (ko) Docusaurus locale and
translates Kanban documentation (kanban.md, kanban-tutorial.md) and the
two related skills (devops-kanban-orchestrator, devops-kanban-worker).

Purely additive — adds ko to the locales list in docusaurus.config.ts
and creates the website/i18n/ko/ tree.

* fix(tests): catch up six stale tests after compression/aux/kanban changes (#28465)

- aux_config: drop session_search from _AUX_TASKS and remove stale test
  (PR #27590 removed auxiliary.session_search from DEFAULT_CONFIG)
- compression_boundary_hook: set compressor._last_compress_aborted=False
  on MagicMock so the post-compress abort branch (PR #28117) doesn't
  short-circuit before the session-id rotation under test
- kanban_dashboard_plugin: use consecutive_failures=3 so severity stays
  'error' (failure_threshold default dropped from 3 to 2 in d9fef0c8a,
  so failures=5 now crosses the critical floor of 2*2=4)
- cli_manual_compress: accept force kwarg on DummyAgent._compress_context
  (cli._manual_compress now passes force=True)

* fix(telegram): render full clarify choice text in message body, use short button labels

When Telegram clarify prompts offer long choices, mobile clients
truncate the inline button labels, making options unreadable.
Previously only the question was shown in the message body with
truncated choice text in button labels.

Fix: append the full numbered option list to the message body
so users can read complete choice text on any client.  Buttons
now use short numeric labels (1, 2, ...) to avoid Telegram
truncation.  The 'Other (type answer)' button is unchanged.

Long choice labels are now rendered in full (not truncated to
57 chars + '...') since they appear in the body instead of
button labels.

Closes: #27497

* chore(release): map @asdlem for PR #27852 salvage

* fix(telegram): default streaming transport to edit

* fix(telegram): respect reply_to_mode for DM topic reply fallback

The DM topic reply fallback code in send() hardcoded should_thread=True
when telegram_dm_topic_reply_fallback metadata was present, bypassing
_should_thread_reply() and ignoring reply_to_mode config. This caused
quote bubbles on every response even with reply_to_mode: 'off'.

Fix:
- Add reply_to_mode param to _reply_to_message_id_for_send() and
  _thread_kwargs_for_send() classmethods
- In send(), check self._reply_to_mode != 'off' for DM topic fallback
- Suppress reply anchor and reply_to_message_id when mode is 'off'
  while preserving message_thread_id for correct topic routing
- Thread reply_to_mode through all 29 call sites

Regression coverage: 10 new tests in test_telegram_reply_mode.py
covering classmethod behavior, send() integration, and backward
compatibility.

Fixes reply_to_mode: 'off' ignored by Telegram DM topic reply fallback code #23994

* fix(gateway): route Telegram audio file attachments away from STT pipeline (#24870)

Telegram distinguishes three kinds of audio payloads:
  - message.voice  → Opus/OGG voice messages  → STT pipeline  ✓
  - message.audio  → audio file attachments   → bypasses STT  ← was broken
  - message.document (audio mime) → generic file route

**Root cause** — the inbound message routing block in gateway/run.py
matched both MessageType.VOICE *and* MessageType.AUDIO into audio_paths,
which were then fed unconditionally to _enrich_message_with_transcription.
Audio file attachments (.mp3, .m4a, etc.) were therefore auto-transcribed
instead of being treated as files, making the transcribe skill unusable
from Telegram because the path it needed was never surfaced.

**Fix**
- Introduce a new audio_file_paths list populated exclusively by
  MessageType.AUDIO events.
- Narrow the audio_paths selector to MessageType.VOICE (and bare
  audio/ mime-type events that are not explicitly AUDIO or DOCUMENT).
- After the STT block, inject a document-style context note for each
  audio_file_path, giving the agent the file path and asking what to do
  with it (consistent with how plain documents are handled).

**Tests** — 5 new tests in test_telegram_audio_vs_voice.py:
  - voice message still transcribed (regression guard)
  - audio attachment skips STT (core fix)
  - audio attachment context note format
  - STT disabled still produces file note (not STT-disabled notice)
  - MessageType.AUDIO != MessageType.VOICE sanity check

Fixes #24870

* chore(release): map bartok9 noreply for PR #24879 salvage

* fix(send_message): route standalone Telegram sends through TELEGRAM_PROXY

When the send_message tool runs outside the gateway process (agent loop,
TUI, cron, etc.), _gateway_runner_ref() returns None and the standalone
path in _send_telegram constructs Bot(token=token) directly, bypassing
any configured proxy. In regions where api.telegram.org is blocked, the
send times out after ~5s with 'Telegram send failed: Timed out' and
nothing ever shows up in gateway.log because the request never reaches
the gateway.

Resolve TELEGRAM_PROXY (via gateway.platforms.base.resolve_proxy_url,
which also honours HTTPS_PROXY/HTTP_PROXY/ALL_PROXY and NO_PROXY) just
before constructing the Bot. When a proxy is found, attach an
HTTPXRequest(proxy=...) for both 'request' and 'get_updates_request',
matching what gateway/platforms/telegram.py already does for in-gateway
sends and what the Discord standalone sender already does. Any
exception attaching the proxy falls back cleanly to a direct connection,
preserving prior behaviour for users without a proxy configured.

Adds tests/tools/test_send_message_telegram_proxy.py covering both the
proxy-configured and no-proxy cases.

* chore(release): map @pepelax for PR #25419 salvage

* fix(kanban-dashboard): restore implementations dropped during salvages (#28481)

Four kanban dashboard test failures, all from PR salvages that picked up
the test additions but dropped the corresponding implementations.

- BOARD_COLUMNS: add 'review' (status added by PR f55d94a1e but the
  board API never grew the column → test_board_empty failed because
  VALID_STATUSES - {archived} mismatched the rendered columns).
- update_task: enrich the 'ready' 409 detail with the blocking parent
  list (id, title, status) and add _parents_blocking_ready helper.
  Implementation lost in the #26744 salvage (commit e215558ba) which
  pinned the test but not the server-side code.
- dist/index.js: add parseApiErrorMessage helper, wire it through the
  drag/drop banner, add patchErr state to the TaskDrawer and surface
  it inline by the action row. Lost in the same #26744 salvage.
- test_diagnostics_endpoint_severity_filter: update to at-or-above
  semantics (PR a94ddd807 changed the filter from exact-match so the
  warning filter now correctly includes error+critical too).

* fix(gateway): roll over Telegram tool progress bubbles

* fix(gateway): scope audio_file_paths outside media_urls guard

The audio-file-paths handling block at line 7334 references the variable
unconditionally, but #24879 initialized it inside the 'if event.media_urls'
block — so events without media_urls hit UnboundLocalError.

Found via test_run_agent_queued_message_does_not_treat_commentary_as_final
after PR #28478 landed.

* fix(gateway): keep tool-progress edits alive after Telegram flood control

When a progress-message edit hits Telegram flood control (RetryAfter),
can_edit was unconditionally set to False, permanently disabling coalescing
for the rest of the run. Subsequent tool updates were posted as separate
new messages instead of updating the existing progress bubble.

Fix: only set can_edit=False for non-recoverable edit errors. On flood
control, back off by resetting _last_edit_ts so the throttle interval is
respected before the next edit attempt.

Fixes #25188

* chore(release): map @erhnysr for PR #25198 salvage

* fix(telegram): preserve can_edit after transient network errors in progress edits (#27828)

When edit_message_text fails with a transient error (httpx.ConnectError,
NetworkError, server disconnected, timeouts), the progress-message sender
must not permanently set can_edit = False — that would convert a single
Telegram network hiccup into separate per-tool bubbles for the rest of the run.

Changes:
- gateway/platforms/telegram.py: edit_message now returns retryable=True for
  transient network errors (ConnectError, NetworkError, timeouts, server
  disconnects, temporarily unavailable). Permanent failures (flood control,
  message-not-found, permissions) remain retryable=False.
- gateway/run.py: send_progress_messages checks result.retryable before
  setting can_edit = False. Transient failures skip the fallback-send and
  continue — the next edit cycle catches up with the accumulated lines.
  Permanent failures (flood, message-not-found, etc.) still disable editing.

Tests: 22 new tests in test_telegram_progress_edit_transient.py covering
transient vs permanent error classification, SendResult.retryable semantics,
and the can_edit decision logic.

Fixes #27828

* fix(telegram): recover from post-update polling conflict without entering limbo

* fix(test+release): update conflict retry count for MAX=5; map @CryptoByz

* fix(gateway): route background-process notifications into Telegram DM topics

Background-process completion notifications (notify_on_complete) and
watch-pattern notifications were always delivered to the Telegram main
chat instead of the originating private-chat topic.

Hermes-created Telegram DM topic lanes only render a send when it carries
both message_thread_id and a reply anchor. The synthetic MessageEvent
injected on process completion had no message_id, so _reply_anchor_for_event
returned None and _thread_kwargs_for_send dropped message_thread_id
entirely — routing the notification to the main chat.

Capture the triggering message id at spawn time and thread it through to
the synthetic event so it can be reply-anchored back into the topic:

- session_context: add HERMES_SESSION_MESSAGE_ID context var
- telegram adapter: populate SessionSource.message_id on inbound messages
- terminal tool: persist watcher_message_id on the process session
- process registry: carry/persist message_id on watcher dicts + checkpoint
- gateway: set MessageEvent.message_id on injected notifications

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): map @fabiosiqueira for PR #27212 salvage

* fix(telegram): route resumed DM topic sends directly

* fix(telegram): enforce TELEGRAM_ALLOWED_USERS allowlist on inbound messages

TELEGRAM_ALLOWED_USERS was only checked for callback/inline-button
actions but not for inbound messages. Unauthorized users triggered an
'Unauthorized user' log warning but their messages were still processed
by the agent — a P0 security bypass (issue #23778).

Fix: add allowlist check in _should_process_message() which is called
for all message types (text, command, media, location). If the sender
is not in TELEGRAM_ALLOWED_USERS, the message is dropped immediately
with a warning log. Empty TELEGRAM_ALLOWED_USERS continues to allow
all users (existing behavior).

Fixes #23778

* fix(telegram): fail-closed auth fallback when TELEGRAM_ALLOWED_USERS is empty

The _is_callback_user_authorized fallback returned True when
TELEGRAM_ALLOWED_USERS was not set, allowing any Telegram user
to interact with the bot. Change to fail-closed: deny by default
unless GATEWAY_ALLOW_ALL_USERS=true is explicitly set.

Fixes #24457

* test(telegram): stub _is_callback_user_authorized in trigger-gating fixture

After PR #24468 made the empty-allowlist callback auth fail-closed
(and #23795 wired _is_callback_user_authorized into _should_process_message),
trigger-gating tests started failing because their fake messages from
user 111 hit the new deny-by-default path before trigger evaluation.

Force-authorize all senders in _make_adapter() so the trigger logic
under test runs.  The fail-closed behavior itself is covered by
test_telegram_callback_auth_fail_closed.py.

* fix(telegram): reset sticky fallback IP on connect failure, retry primary DNS

When a sticky fallback IP (from DoH discovery) becomes unreachable,
the transport previously got stuck in an attempt_order that only
tried the dead IP.  This prevented the gateway from recovering
until the service was restarted.

Changes:
- Always include primary DNS path (None) after the sticky IP in the
  attempt_order so that a primary-path retry happens on sticky failure.
- Reset self._sticky_ip to None when the currently sticky IP hits
  a connect timeout / connect error, allowing the next request to
  retry from scratch.

Fixes silent Telegram disconnection when discovered fallback IPs
are transiently or permanently unreachable.

* test+release: align stale sticky-IP test for #24511; map @falconexe

* fix(telegram): propagate extra base_url config

* feat(send_message): auto-detect @username mentions and create Telegram entities

When sending messages containing @username patterns, auto-generate
MessageEntity(type='mention') entries so that the receiving bot's
require_mention filter can trigger. This enables proper bot-to-bot
interop where mention-based routing is used.

* test+release: align send_message mocks for MessageEntity import; map @fonhal

* fix(telegram): resume typing indicator after inline approval click (#27853)

The text /approve and /deny paths in gateway/run.py call
resume_typing_for_chat() after resolve_gateway_approval() succeeds, but
the Telegram inline-button (ea:*) callback in _handle_callback_query did
not. Typing is paused when the approval is sent (gateway/run.py:15658),
so without a matching resume the typing indicator stayed gone for the
remainder of a long-running turn after a button click.

Symmetry-match the text path: after a successful resolve, call
self.resume_typing_for_chat(str(query_chat_id)). Guarded by count > 0
to match /approve's "if not count" early-return — if nothing was
actually resolved, the agent thread was never unblocked, so typing
should remain paused.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gateway): mark final voice reply as notify-worthy so Telegram delivers it audibly

In Telegram "important" notifications mode (default), TelegramPlatformAdapter
sets ``disable_notification=True`` on every send unless metadata carries
``notify=True``.  GatewayRunner._send_voice_reply already passes thread
metadata through to ``adapter.send_voice``, but never marks the final
auto-TTS voice reply as notify-worthy — so users with the default mode get
the final voice note delivered silently with no push notification.

Mirror the final-text path in gateway/platforms/base.py (the existing
text-response final send already adds ``metadata["notify"] = True``).

Issue #27970 Bug 2.  Bug 1 (MP3 vs. native OGG voice-note) is being
addressed by existing PRs #20182 / #20878 — this PR is intentionally
scoped to the silent-delivery bug only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: avoid Telegram group reply thread session splits

* chore(release): map @eliteworkstation94-ai for PR #28157 salvage

* fix(gateway): avoid duplicate Telegram text after auto-TTS voice replies

* chore(release): map @Zyrixtrex for PR #26754 salvage

* fix(telegram): escape send_slash_confirm preview with format_message

send_slash_confirm() sent the raw command preview with ParseMode.MARKDOWN,
skipping the format_message() conversion applied to every other dynamic
send in the adapter. Commands with underscores, dots, brackets, or other
MarkdownV2-sensitive characters raised BadRequest: Can't parse entities;
the exception was swallowed by the outer try/except, so the confirmation
prompt silently never appeared.

Fix: wrap preview through format_message() and switch to MARKDOWN_V2,
symmetric with send_update_prompt and the callback sends fixed in
a69404052.

* chore(release): map @nftpoetrist for PR #25856 salvage

* fix(telegram): retry wrapped connect timeouts

* chore(release): map @samahn0601 for PR #27887 salvage

* fix(tts): keep native audio outside Telegram voice delivery

* chore(release): map @aqilaziz for PR #26406 salvage

* fix(gateway): pin Telegram DM-topic routing to user's current topic

Topic-mode DM replies were fragmenting one conversation across many sessions: a Reply on a message in another topic delivered Telegram's message_thread_id for *that* topic, and #3206's strip routed plain replies to the lobby. Both pulled the user away from their current session. Fix: when topic mode is on, rewrite source.thread_id to the user's most-recent binding if the inbound id is missing/General or not a known topic. Non-topic-mode users unchanged.

* chore(release): map @karthikeyann for PR #26609 salvage

* fix(send_message): add thread-not-found retry for Telegram forum topic sends

The standalone _send_telegram path in send_message_tool lacked the
thread-not-found fallback that the gateway adapter has. When a forum
topic thread_id was stale or deleted, the send would fail entirely
instead of retrying to the General topic.

Changes:
- Add _is_telegram_thread_not_found() helper matching gateway adapter
- Add thread-not-found retry in text send path
- Add thread-not-found retry in media send path (with f.seek(0))
- Separate text_kwargs from thread_kwargs to prevent
  disable_web_page_preview leaking into send_photo/send_video calls

Closes #27012

* test(send_message): add thread-not-found retry tests for Telegram forum topics

Adds two tests to TestSendTelegramThreadIdMapping:
- test_thread_not_found_retries_without_message_thread_id
- test_thread_not_found_for_media_retries_without_message_thread_id

Refs #27012

* test(send_message): add thread-not-found retry tests for Telegram topics

Three tests covering the #27012 fix:
- test_is_thread_not_found_matches_expected_errors
- test_text_send_retries_without_thread_id_on_thread_not_found
- test_disable_web_page_preview_not_leaked_to_media_sends

116/116 existing tests still pass (no regressions).

* chore(release): map @kunci115 for PR #27098 salvage

* fix(gateway): register Telegram commands for groups

Register Telegram bot commands across default, private, and group scopes so
the slash-command menu is available outside DMs.

Changes from review feedback:
- Add asyncio.Lock to prevent race condition in _ensure_forum_commands
- Extract MAX_COMMANDS_PER_SCOPE constant (30) to avoid magic number
- Upgrade error logging from debug->warning in forum registration
- Add tests covering lazy forum registration and concurrent safety
- Remove /start handler from this PR (separate feature)

Fixes review: needs_work (race, magic number, log levels, missing tests)

* test+release: fix test fixture for forum_commands; map @chromalinx

* fix(telegram): gate profile bots by allowed topics

* chore(release): map @booker1207 for PR #25132 salvage

* fix(cron): route Telegram cron deliveries to a dedicated topic via TELEGRAM_CRON_THREAD_ID

When Telegram topic mode is enabled, cron messages delivered to the bot's
root DM (TELEGRAM_HOME_CHANNEL without a thread id) land in the system
lobby — replies there are rebuffed with the lobby reminder and
reply_to_message_id is dropped, so users cannot interact with the cron
output (#24409).

Add an optional TELEGRAM_CRON_THREAD_ID env var that overrides
TELEGRAM_HOME_CHANNEL_THREAD_ID for cron deliveries only. Operators can
create a "Cron" forum topic in the DM, point this var at its thread id,
and replies to cron messages will land in that topic's existing session
instead of the lobby. The home-channel thread id (used elsewhere, e.g.
restart notifications) is unchanged, and explicit
deliver="telegram:chat:thread" targets continue to win over the env var.

Per the reporter's clarification on 2026-05-13, option (a) (cron-side
route to a dedicated topic + config knob) was chosen.

Fixes #24409

* fix(telegram): route image documents (.png/.jpg/.webp/.gif) through vision pipeline

When users send images as documents (Telegram file picker), they were
rejected with "Unsupported document type" because SUPPORTED_DOCUMENT_TYPES
only includes text/office formats. Add SUPPORTED_IMAGE_DOCUMENT_TYPES
to base.py and handle them in telegram.py before the document check.

- Add SUPPORTED_IMAGE_DOCUMENT_TYPES constant to base.py
- Add MIME reverse-lookup for image types in telegram.py
- Route image documents through cache_image_from_bytes + vision pipeline
- Handle media groups for image documents

Closes: #20128, #18620

* test+release: stub auth in test_telegram_documents fixture; map @kiranvk-2011

* fix(gateway): prevent Windows Telegram /restart leaving gateway stopped

* chore(release): map @rak135 for PR #25960 salvage

* fix(telegram): preserve topic metadata on overflow edits

* feat(telegram): add disable_topic_auto_rename gateway flag

When Hermes auto-titles a session in a Telegram DM topic it currently
renames the topic itself to the generated title. That works for
operator-managed lanes (extra.dm_topics) but is disruptive for
ad-hoc Threaded-Mode topics that users name by hand — every first
exchange overwrites their chosen title.

Add gateway.platforms.telegram.extra.disable_topic_auto_rename (default
False, preserving prior behaviour). When set, both
_schedule_telegram_topic_title_rename and the underlying
_rename_telegram_topic_for_session_title short-circuit before touching
the Telegram API. Internal session titles (sessions list, TUI) keep
working unchanged.

Also bridge the legacy top-level telegram.disable_topic_auto_rename key
through to gateway.platforms.telegram.extra so users on the older
config layout don't have to migrate to enable it.

- Tests cover the runtime flag, the scheduling entry-point, and string
  truthiness coercion for YAML-loaded values.
- Docs updated in messaging/telegram.md with an example block.

* chore(release): map @B0Tch1 for PR #27634 salvage

* fix(gateway): restore Telegram DM topic thread_id after session split (#27166)

When context compression triggers a mid-turn session split, source.thread_id
can be None on synthetic/recovered events. _thread_metadata_for_source then
returns None, causing the Telegram adapter to send with no message_thread_id
and the response lands in the General thread instead of the active DM topic.

Fix:
- hermes_state.py: Add get_telegram_topic_binding_by_session() for reverse
  lookup by session_id (enabled by the existing UNIQUE INDEX on session_id).
- gateway/run.py: After session-split detection, if source is a Telegram DM
  and source.thread_id is None, recover it from the binding via the new
  method so _thread_metadata_for_source produces the correct thread routing.
- tests/: Coverage for the new lookup method and the recovery flow.

* chore(release): map @jackjin1997 for PR #27239 salvage

* fix(gateway): allow chat-scoped telegram auth without sender user_id

* chore(release): map @soynchux for PR #27806 salvage

* fix(telegram): add DM topic typing fallback when message_thread_id rejected

When a DM topic lane's message_thread_id is rejected by Telegram
(e.g. stale or deleted topic), send_typing now falls back to sending
the typing indicator without thread_id so it at least appears in the
main DM view, rather than being silently swallowed.

Also adds test for the fallback behavior.

* fix(telegram): report cron topic fallback

* chore(release): map @el-analista for PR #25368 salvage

* fix(telegram): wire gt: callback dispatch for gmail-triage buttons

The gmail-triage skill's Telegram inline buttons emit callback_data of the
form `gt:<verb>:<arg>`, but `_handle_callback_query` had no `gt:` branch —
taps fell through silently and the spinner sat there until Telegram timed it
out.

Add `_handle_gmail_triage_callback`, dispatched from the existing callback
router, that:

- Authorizes the caller via the same `_is_callback_user_authorized` path as
  the approval / slash-confirm / clarify handlers.
- Maps each verb to a script under `~/.hermes/scripts/gmail-triage/` and runs
  it async with a 60s timeout.
- Splits verbs into one-shots (send / archive / draft / spam) — append the
  confirmation and strip the keyboard so the action can't fire twice — and
  sticky-state changes (mute / trust / vip ± -domain) — append the
  confirmation but leave the keyboard tappable so the user can stack actions
  on one email.
- On failure: toast only, keyboard preserved so the user can retry.
- Logs every callback outcome to gateway.log for debugging.

* chore(release): map @khungate for PR #25829 salvage

* feat(telegram): support quick-command-only menus

* chore(release): map @stevehq26-bot for PR #28015 salvage

* fix(telegram): handle channel post updates

* test: address telegram channel post review

* test+release: stub auth in channel_posts fixture; map @brndnsvr

* Quiet noisy Telegram gateway errors

* chore(release): map oracle@jarviss-mbp.home for PR #24014 salvage

* Route Telegram multi-bot mentions exclusively

* Document Telegram multi-profile gateway commands

* fix: ignore Telegram messages for other bots

* chore(release): map @OCWC22 for PR #24581 salvage

* feat(telegram): ignore_root_dm with system command lobby

* docs(telegram): document ignore_root_dm feature

* chore(release): map @ai-hana-ai for PR #23928 salvage

* feat(telegram): pin incoming user message for duration of agent turn

When a user sends a message on Telegram, the incoming message is now
automatically pinned at the start of processing and unpinned when the
agent finishes its turn. This gives the user a visual indicator that
their message is being worked on, and keeps the conversation anchored.

Changes:
- telegram.py: Added pinChatMessage in on_processing_start and
  unpinChatMessage in on_processing_complete. Restructured both
  hooks so pin/unpin runs independently of the reactions feature
  (reactions are optional; pinning is always on).
- telegram.py: Pass message_id through SessionSource so it's
  available in the session context.
- session_context.py: Added HERMES_SESSION_MESSAGE_ID context var.
- run.py: Pass source.message_id through set_session_vars.

Pinning is silent (disable_notification=True) and failures are
logged at debug level without interrupting message processing.
Only the user's incoming message is pinned -- never the agent's
replies. Auto-resume events (which have no message_id) are
correctly skipped.

* chore(release): map @indigokarasu for PR #26636 salvage

* feat(telegram): skip-STT audio path + 2GB cap via local Bot API server

Two coordinated changes that unblock downstream audio pipelines
(diarization, custom transcription, archival) on attachments larger
than the public Bot API's 20MB getFile ceiling.

- `stt.enabled: false` no longer drops voice/audio with a generic
  "transcription disabled" note. The gateway probes the cached file's
  duration (wave → mutagen → ffprobe ladder) and surfaces
  `[The user sent a voice message: <abs path> (duration: M:SS)]` to
  the agent so a skill or tool can pick up the raw file. The previous
  placeholder is replaced rather than appended when present.

- `platforms.telegram.extra.base_url` set → adapter auto-lifts its
  document size cap from 20MB to 2GB (the local telegram-bot-api
  `--local` ceiling) and the "too large" reply reports the active
  limit dynamically. No new config knob; presence of `base_url` is the
  opt-in.

- `platforms.telegram.extra.local_mode: true` wires
  `Application.builder().local_mode(True)` on the python-telegram-bot
  builder. PTB then reads files from disk instead of HTTP, which is
  required when telegram-bot-api runs in `--local` mode (the server
  returns absolute filesystem paths, not `/file/bot...` URLs).

- gateway/run.py: rewrites the `stt.enabled: false` branch of
  `_enrich_message_with_transcription`. New `_format_duration` +
  `_probe_audio_duration` helpers.
- gateway/platforms/telegram.py: `_max_doc_bytes` instance attribute
  derived from `extra.base_url`; `local_mode` builder wiring;
  dynamic "too large" message.
- tests/gateway/test_stt_config.py: covers path-surfacing with and
  without an existing user message, and placeholder replacement.
- tests/gateway/test_telegram_max_doc_bytes.py: 3 cases — default 20MB
  without base_url, 2GB when set, empty-string base_url keeps default.
- website/docs/user-guide/messaging/telegram.md: new "Skipping STT"
  subsection under Voice Messages and a full "Large Files (>20MB) via
  Local Bot API Server" walkthrough (api_id/api_hash, docker-compose,
  one-time `logOut` migration, `platforms.telegram.extra` config, the
  `local_mode` disk-access requirement, the silent HTTP-fallback 404).
- website/docs/user-guide/features/voice-mode.md: documents the
  `stt.enabled` knob in the config reference.

- `pytest tests/gateway/test_telegram_max_doc_bytes.py
  tests/gateway/test_stt_config.py` → 9/9 passing.
- Verified end-to-end on a live deployment: gateway log shows
  `Using custom Telegram base_url: http://...` and
  `Using Telegram local_mode (read files from disk)` on startup;
  voice messages above 20MB cache to disk and surface their path to
  the agent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(release): map @alber70g for PR #25280 salvage

* fix(web): add scheduled column to i18n type definitions (#28549)

columnLabels and columnHelp in en.ts include a scheduled entry but the
Translations interface in types.ts did not declare it, causing a
TypeScript build failure in the Nix derivation. Made the field optional
since only en.ts provides it currently.

* docs: comprehensive 2-week sweep of feature/PR coverage gaps (#28497)

Catch the website docs up to two weeks of merged work (May 4 – May 18, 2026,
roughly 1,080 PRs). The audit found ~50 user-visible features that had landed
in code with no docs footprint, plus a handful of stale pages. This PR closes
every gap the scan turned up.

New pages
- user-guide/features/deliverable-mode.md — extension list, agent triggers,
  kanban_complete artifacts pattern, [[as_document]] override (PR #27813).
- developer-guide/web-search-provider-plugin.md — authoring guide modeled on
  image-gen-provider-plugin, covering brave_free / ddgs / etc. (PR #25448).

Providers / auth
- Rename "Alibaba Cloud" → "Qwen Cloud (Alibaba DashScope)" everywhere the
  display label shows up; provider id stays `alibaba` (PR #24835).
- Document OAuth refresh-token quarantine for xAI / MiniMax / Codex (PRs
  #28116 / #28118 / #28119).
- Document Nous JWT minting from refresh token + invalid-refresh quarantine
  + cross-profile shared token store (PRs #27663 / #19712).
- Add `## Microsoft Entra ID authentication (keyless)` section to
  azure-foundry guide — DefaultAzureCredential, RBAC, OpenAI + Anthropic
  routing details (PR #28101 / #9df9816da).
- Custom providers `api_mode` is now prompted-and-persisted, not just URL
  autodetected (PR #25068).
- Delegation honours `api_mode` + auto-detects anthropic_messages base URLs
  (PR #26824).
- `x_search` auto-enables when xAI credentials are present (PR #27376).
- Add `xAI Grok OAuth (SuperGrok)` row to providers headline table (PR
  #26534).
- NVIDIA NIM billing-origin header is set automatically (PR #26585).

Windows / installer
- `install.ps1`: document `-Commit <sha>` and `-Tag <v>` pin params plus
  the BOM-strip / git-retry hardening (PR #28169).
- Document Hermes Desktop thin installer + first-launch bootstrap (PR
  #27822).
- Document `dep_ensure` Windows bootstrap (PR #27845).
- Document install-method auto-detection (pip / git / homebrew / nixos) and
  the matching update command (PR #27843).

Gateway / messaging
- `/platform list|pause|resume` full description + circuit-breaker
  semantics (PR #26600).
- Slack / Matrix / Mattermost get parallel `allowed_channels` /
  `allowed_rooms` allowlist sections matching Telegram/Discord/DingTalk
  (PR #21251).
- Discord `allow_any_attachment` + `max_attachment_bytes` (config and env
  vars) (PR #27245).
- Discord clarify-choice button rendering (PR #25485).
- Telegram `guest_mode` @mention bypass for allowlisted groups (PR
  #22759).
- Telegram `notifications` mode (`important` vs `all`) (PR #22793).
- `[[as_document]]` skill / response directive for forcing
  document-style media delivery (PR #21210).

CLI / TUI
- `/new [name]` argument (PR #19637).
- `/subgoal` user-supplied criteria appended to `/goal` (PR #25449).
- `/exit --delete` flag confirmation prompts for destructive slash
  commands (PR #22687).
- Status-bar additions: ▶ N background indicator (PR #27175), context
  compression count (PR #21218), YOLO mode banner+statusbar warning (PR
  #26238).
- `display.timestamps` + `docker_extra_args` config keys (PR #23599).
- TUI collapsible startup banner sections (PR #20625).
- `HERMES_SESSION_ID` exported to tool subprocesses (PR #23847).

i18n
- Refresh display.language locale list from 8 → 16 (en, zh, zh-hant, ja,
  de, es, fr, tr, uk, af, ko, it, ga, pt, ru, hu) — matches
  `agent/i18n.py:SUPPORTED_LANGUAGES`.

Tools / features
- `vision_analyze` native-pixel passthrough for vision-capable callers,
  with auxiliary text-describer fallback (PR #22955).
- `session_search` rewrite to the single-shape tool (discovery / scroll /
  browse modes) (PRs #27590 / #27840).
- Clarify MCP transport scope: client supports stdio + SSE; embedded
  `hermes mcp serve` is stdio-only (PR #21227).
- Web search backends table: add Brave Search (free tier) and DDGS rows
  (PR #21337).
- ACP session-scoped edit auto-approval modes (PR #27862).
- Curator rename map in the user-visible per-run summary (PR #22910).
- Prompt caching feature page reference in features/overview.md — Claude
  cross-session 1-hour prefix cache on native Anthropic / OpenRouter /
  Nous Portal (PR #23828).
- Cron per-job profile parameter (PR #28124).
- `--no-skills` flag for `hermes profile create` (PR #20986).

Build
- Verified with `npm run build` in `website/`; both `en` and `zh-Hans`
  locales compile. Remaining broken-link/anchor warnings are pre-existing
  (`rl-training.md` from learning-path / overview; the
  zh-Hans translation lag the docs skill already calls out).

* chore(release): pre-stage AUTHOR_MAP for May 2026 LHF batch group 9 (#28571)

Pre-stages AUTHOR_MAP entries for 9 new/under-mapped contributors whose
PRs are being salvaged in the May 2026 LHF batch group 9.

Contributors:
- jdelmerico (#28278 — signal require_mention filter)
- justemu (#27996 — matrix thread_require_mention)
- YuanHanzhong (#28029 — dashboard browser scrollback)
- noctilust (#28080 — drop stale TUI resume env)
- MoonJuhan (#28288 — tolerate unreadable JSONL transcripts)
- outsourc-e (#28164 — cron emoji ZWJ sequences)
- Zyrixtrex (#28275 — Google OAuth urlopen timeout)
- ooovenenoso (#28256 — tool loop recovery hints)
- vanthinh6886 (#28018 — yaml/flock/atomic write guards; non-noreply email)

Per references/batch-pr-salvage-may14-additions.md.

* feat(signal): add require_mention filter for group chats

Add a configurable mention filter to the Signal adapter so the bot
only responds in groups when it is explicitly @mentioned.

Changes:
- gateway/platforms/signal.py: read require_mention from adapter
  extra config or SIGNAL_REQUIRE_MENTION env var; skip group messages
  that don't mention the bot account (checked in rendered text and
  raw mention metadata)
- gateway/config.py: map signal.require_mention YAML key to the
  SIGNAL_REQUIRE_MENTION env var (env var takes precedence)

Config example:
  signal:
    require_mention: true

Or via env var:
  SIGNAL_REQUIRE_MENTION=true

* Revert "feat(telegram): pin incoming user message for duration of agent turn"

This reverts commit a724c3b9cf5f01e28365322ae5ae3a9579567806.

* Revert "feat(telegram): support quick-command-only menus"

This reverts commit b1acf80e17858e2e5ae7c0d412a3a573d7fcbca4.

* Revert "feat(send_message): auto-detect @username mentions and create Telegram entities"

This reverts commit cf814c96f613b38bd891ac941c32da653e81c7ad.

* Revert "fix(telegram): enforce TELEGRAM_ALLOWED_USERS allowlist on inbound messages"

This reverts commit db50af910be6b4171ea9cf54f4cc38be27ac1da6.

* fix(gateway): pre-mark sessions as resume_pending before drain to prevent data loss (#27856)

Pre-mark all running agent sessions as resume_pending BEFORE the drain
wait begins. If the service manager kills the process during the drain
(window), the durable marker is already written so the next gateway boot
can recover in-flight sessions. On graceful drain completion, clear the
early markers for sessions that finished successfully.

* fix(matrix): implement thread_require_mention to prevent multi-agent reply loops

In multi-agent shared Matrix rooms, multiple bots all participating in the
same thread could trigger infinite reply loops — each bot's reply re-engaged
the others because they were all in the bot-thread set. Discord has a
`thread_require_mention` opt-in for this; Matrix didn't.

Add `_parse_thread_require_mention(config)` (mirrors Discord's pattern).
In `_resolve_message_context`, when enabled and the message is in a
bot-participated thread (not a free-response room), require @mention
before processing.

Salvage of @justemu's 2-commit stack (#27996). Fixes #27995.

* fix(cli): show active profile in TUI prompt

* fix(tui): preserve dunder identifiers in markdown

* test(file_ops): add regression tests for git baseline warning in write_file

Adds TestGitBaselineCheck with 6 unit tests covering _check_git_baseline
and the warning field in write_file result:
- Git not available → None
- Not in a git repo → None
- Clean repo → None
- Dirty repo → returns warning string with branch name
- write_file result includes warning when dirty
- write_file result omits warning when clean

* fix(dashboard): use browser scrollback for chat wheel

* fix(cli): ignore stale HERMES_TUI_RESUME env

HERMES_TUI_RESUME is an internal env va…
Arvuno added a commit to Arvuno/hermes-agent that referenced this pull request May 20, 2026
…ecture decisions (#4)

* docs(kanban): document worker protocol auto-blocks

Salvages #21585 by @helix4u. Documents the protocol_violation event
(worker exits successfully while task is still running), adds
--max-retries to the create flag list and --failure-limit to dispatch.

* fix(oneshot): pass fallback_providers from profile config to AIAgent

Salvages #23368 by @uzunkuyruk. Oneshot workers (e.g. kanban workers
spawned via 'hermes -p <profile> chat -q ...') were not honouring the
profile's fallback_providers / fallback_model chain because oneshot.py
never read the config and never passed fallback_model= to AIAgent.

Reads cfg.get('fallback_providers') (new list format) or
cfg.get('fallback_model') (legacy single-dict) with the same
normalization cli.py applies, then forwards as fallback_model=_fb.

* fix(kanban): reject direct running transitions in dashboard bulk updates

Salvages #24050 by @kronexoi. The single-task PATCH already rejects
direct status='running' since it bypasses the dispatcher/claim invariant,
but the bulk-update endpoint still accepted it. Aligns bulk with single
by emitting an error result row for any 'running' entry.

* feat(kanban): add initial-status for human-ops cards

Salvages #27526 by @shunsuke-hikiyama. Adds an --initial-status flag
(running|blocked, default running) to 'kanban create', threaded through
kanban_db.create_task() and the kanban_create tool schema. 'blocked'
parks the task directly in the blocked column for R3 human-ops review,
skipping the brief running-to-blocked transition.

Dropped the unrelated 'add' alias, WIFEXITED Windows compat, and
slash-handler error formatting changes that were bundled in the
original PR — those should ship as their own focused changes if still
wanted.

* fix(kanban): release scratch workspace and tmux session on task completion

Salvages #27369 by @LeonJS. complete_task() now calls _cleanup_workspace()
and _cleanup_worker_tmux() after marking a task complete.

Scratch workspaces (used by swarm agents) accumulate on disk — hundreds
of MB per task, never released. Stale tmux sessions from completed
agents also persist indefinitely.

Both gates are safe:
- workspace_kind == 'scratch' gate preserves user worktree/dir workspaces
- tmux #{pane_dead} == 1 gate only kills sessions where the worker has
  already exited
- best-effort: cleanup failures never block task completion

* fix(kanban): honor severity thresholds in diagnostics

Salvages #26431 by @LeonSGP43. Dashboard plugin_api list_diagnostics
was using exact-match (severity == filter), so '--severity warning'
hid 'error' and 'critical' diagnostics. Adds severity_at_or_above()
helper to kanban_diagnostics and uses it in the dashboard endpoint
(CLI already used SEVERITY_ORDER comparison correctly).

* test: isolate Kanban env pins in hermetic fixture

Salvages the substantive part of #22295 by @steezkelly. Adds the
missing HERMES_KANBAN_HOME, HERMES_KANBAN_RUN_ID, HERMES_KANBAN_CLAIM_LOCK,
HERMES_KANBAN_DISPATCH_IN_GATEWAY entries to _HERMES_BEHAVIORAL_VARS so
ambient developer-shell pins on those vars don't bleed into pytest runs.

The frozenset extraction + standalone regression test from the original
PR were dropped to keep the change minimal — main already maintains the
list inline.

* feat(kanban): add max_in_progress config to cap concurrent running tasks

Salvages #22981 by @SimbaKingjoe. Adds 'kanban.max_in_progress' config
that caps simultaneously running tasks. When the board already has N
running, dispatcher skips spawning so slow workers (local LLMs,
resource-constrained hosts) don't pile up and time out.

Threads through dispatch_once(max_in_progress=) and gateway dispatcher
config parsing with validation (warns on invalid/below-1 values).

* fix(packaging): ship bundled skills in wheel

Salvages #23738 by @LeonSGP43. Wheel installs were missing skills/ and
optional-skills/ because pyproject's [tool.setuptools.packages.find]
only includes Python packages — the skills directories don't have
__init__.py so they were silently dropped from the wheel.

Adds setup.py with data_files spec emitting skills/* and optional-skills/*
under hermes_agent-<v>.data/data/, and a get_bundled_skills_dir() helper
in hermes_constants that discovers the wheel-installed location via
sysconfig before falling back to a source-checkout path. tools/skills_sync
uses the helper so 'hermes update' works for pip-installed users.

* fix: 4 small surgical bugs

Salvages #23302 by @Bartok9. Four independent one-area fixes:

1. kanban boards delete alias now hard-deletes (not archives) — the
   alias didn't carry --delete, so getattr(args, 'delete', False)
   returned False. Detect boards_action=='delete' explicitly.
2. Gateway auto-title failures no longer leak as user-visible
   warnings — debug-log only since they're not actionable.
3. Background process completion notification snaps truncation to
   the next newline boundary, prepends a marker when content is
   dropped.
4. _cprint() schedules the run_in_terminal coroutine via
   asyncio.ensure_future so output isn't silently dropped from
   background threads (fixes #23185 Bug A). Skips the
   double-print fallback that would fire for mock paths.

* perf(prompt): cache kanban worker guidance at session init

Salvages #24402 by @RyanRana. The KANBAN_GUIDANCE block (~835 tokens)
is session-static — the dispatcher decides at spawn time whether the
process is a kanban worker via the kanban_show tool's check_fn (gated
on HERMES_KANBAN_TASK env var). Re-checking 'kanban_show' in
valid_tool_names and re-loading the reference on every system-prompt
rebuild (init + each context compression) is wasted work.

Caches the resolved string on agent._kanban_worker_guidance once in
agent_init and consumes it in system_prompt.build_system_prompt(),
with a getattr fallback for code paths that bypass agent_init.

* feat(kanban): add --sort option to 'hermes kanban list'

Salvages #25745 by @LizerAIDev. Adds --sort {created,created-desc,
priority,priority-desc,status,assignee,title,updated} to 'hermes kanban
list'. Validated against VALID_SORT_ORDERS map; invalid values raise
ValueError. Default behaviour (priority DESC, created ASC) is unchanged
when --sort is omitted.

* docs: add kanban codex lane skill

* feat(kanban): worker visibility endpoints (workers/active, runs/{id}, inspect)

Adds three read-only endpoints to the kanban dashboard plugin so the
SwitchUI workspace (and any other dashboard consumer) can track
workers across tasks without N+1 round-trips through /tasks/{task_id}.

- GET /workers/active
  Single SQL JOIN of task_runs + tasks where ended_at IS NULL,
  worker_pid IS NOT NULL, status='running'. Returns
  {workers: [...], count, checked_at}.

- GET /runs/{run_id}
  Direct lookup of any task_run row by id. Reuses existing
  kanban_db.get_run() helper and _run_dict() serialiser. 404 when
  not found. Mirrors GET /tasks/{task_id} 404 shape.

- GET /runs/{run_id}/inspect
  Live PID stats via psutil.Process.as_dict() — cpu_percent,
  memory_rss_bytes, memory_vms_bytes, num_threads, num_fds, status,
  create_time, cmdline. Short-circuits with alive:false when run
  has ended, has no worker_pid, the pid is gone, or psutil is
  unavailable. AccessDenied surfaces as alive:true with error
  rather than a 500.

11 new tests in tests/plugins/test_kanban_worker_runs.py cover the
empty-board case, running-task case, ended-run filtering,
missing-pid filtering, 404 paths, already-ended inspect, no-pid
inspect, dead-pid inspect, and live-pid inspect (psutil mocked).
All pass.

Companion termination endpoint (POST /runs/{run_id}/terminate) is
intentionally out of scope here — opening a separate issue first
since the RBAC and dispatcher-mediated soft-cancel design needs
maintainer input before code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): map contributor email for attribution check

* test(kanban-dashboard): pin enriched 409 detail and inline error wiring (#26744)

- Existing ``test_patch_drag_drop_move_todo_to_ready`` now asserts the
  enriched 409 detail names the blocking parent (id, quoted title, and
  current status), so the dashboard always has something actionable to
  render.
- New bundle-assertion test ``test_dashboard_surfaces_ready_blocked_error_inline``
  pins the frontend wiring: the ``parseApiErrorMessage`` helper exists,
  the drag/drop banner runs through it, and the drawer maintains a
  visible ``patchErr`` state that's cleared between PATCHes and tasks.

* docs(codex_app_server): document multi-root Kanban writable_roots (#27941)

Update the Codex app-server runtime guide's Kanban section to reflect
the new behaviour:

  * The sandbox override now adds the board DB directory plus every
    Kanban path the dispatcher pinned (HERMES_KANBAN_WORKSPACES_ROOT,
    HERMES_KANBAN_WORKSPACE, legacy HERMES_KANBAN_ROOT) -- deduplicated,
    DB-dir first.
  * The motivation note now includes the cross-mount artifact-write
    scenario (e.g. ``/media/.../kanban-workspaces/...`` on a separate
    drive) and links to issue #27941 so readers can find the original
    bug report.

* fix(gateway): quiet corrupt kanban dispatcher boards

Salvages substantive part of #26490 by @aqilaziz. Detects corrupt board
DBs ("file is not a database" / "database disk image is malformed")
and disables them by fingerprint until they're repaired, instead of
flooding the gateway log with repeated logger.exception tracebacks every
tick.

Cherry-picked the substantive commit (ea5b4ec2a); the tip commit was
an unrelated _is_dir OSError fix for service-path lookup. Dropped a
small test reformat that was bundled in the same commit.

* docs: align kanban readiness docs and smoke tests

Salvages #28199 by @bensargotest-sys. Aligns Kanban docs with current
tool registration: dispatcher-spawned task workers get task tools,
profiles that explicitly enable the kanban toolset get orchestrator
routing tools (kanban_list, kanban_unblock). Corrects failure-limit
text to current default of 2. Hardens the e2e subprocess script to
resolve repo root and use the spawnable default assignee. Updates the
diagnostics severity fixture to assert error below the critical
threshold.

* feat(kanban): surface per-task model_override in show + tool output

Salvages #26897 by @loicnico96. The per-task model_override DB column
already exists on main, but it wasn't exposed in user-facing surfaces.
This adds:
- 'kanban show' prints 'model: <name>' when model_override is set
- kanban_show / kanban_list tool responses include the model_override field

Original branch was stale (PR was authored against an older field name
'model'); applied the substantive surface exposure manually using the
current 'model_override' field name.

* feat(cli): add kanban swarm topology helper

Salvages #26791 by @Niraven. Adds 'hermes kanban swarm' to create a
durable Kanban Swarm v1 graph: a completed root/blackboard card,
parallel worker cards, a verifier gated on all workers, and a
synthesizer gated on the verifier. Stores shared swarm blackboard
updates as structured JSON comments on the root card.

Self-contained: new hermes_cli/kanban_swarm.py module + CLI wiring +
unit tests.

* feat(kanban): add optional board parameter to all MCP tools

Salvages #27598 by @nnnet. Adds optional 'board' parameter to all 9
kanban_* MCP tools via shared _connect helper. Backwards compatible —
omitting board keeps current pinned-board behavior. Useful for
orchestrator profiles that route across multiple boards.

Two-file scope: tools/kanban_tools.py + tests.

* feat(kanban): stamp originating ACP session_id on tasks

Salvages #23208 by @awizemann. Tracks which chat session created a
kanban task so clients can render a per-session board without falling
back to tenant + time-window heuristics.

- Schema: tasks gains nullable session_id TEXT column with index
  (additive migration in _migrate_add_optional_columns).
- ACP: server.py exposes the originating session id via HERMES_SESSION_ID
  with save/restore around the agent loop.
- Tool: kanban_create reads HERMES_SESSION_ID (with explicit override).
- CLI: 'hermes kanban list --session <id>' filter; JSON output exposes
  session_id.

* feat(kanban): wire dispatcher to dispatch review agents from review column

Salvages #23772 by @thewillhuang. Adds 'review' as a valid kanban task
status and extends dispatch_once to monitor the review column as a
second dispatch source (in addition to the existing ready column).

- Adds 'review' to VALID_STATUSES
- Adds claim_review_task() — atomically transitions review → running
- Adds has_spawnable_review() — health telemetry mirror
- Extends dispatch_once with a review column dispatch loop
- Review agents get 'sdlc-review' skill auto-loaded

Resolved 2 conflicts (VALID_STATUSES merge with main's 'scheduled' state,
test file additions). Adapted claim_review_task to main's
ttl_seconds: Optional[int] = None convention (matches claim_task).

* feat(kanban): stale detection for running tasks in dispatcher

Salvages #23790 by @thewillhuang. Adds detect_stale_running() to
the dispatcher cycle. Running tasks that have been started for longer
than dispatch_stale_timeout_seconds (default 14400 = 4h) without a
heartbeat in the last hour are auto-reclaimed to ready.

- New config kanban.dispatch_stale_timeout_seconds (default 14400, 0 disables)
- New 'stale' field on DispatchResult
- detect_stale_running() in kanban_db.py with heartbeat freshness check
- Records outcome='stale' on run close + 'stale' event; ticks failure counter
- Wires config through gateway embedded dispatcher
- Updates _cmd_dispatch verbose/JSON output and daemon logging

Resolved test-file end-of-file conflict by appending both halves.

* feat(kanban): filter tasks by workflow fields and runs by status/outcome

Salvages #26745 by @nehaaprasaad. Exposes filtering for the existing
workflow_template_id and current_step_key columns:

- list_tasks() accepts workflow_template_id and current_step_key kwargs
- 'hermes kanban list' adds matching CLI flags
- dashboard plugin_api also exposes the filters

Resolved a small conflict in list_tasks signature alongside main's
session_id and order_by additions; combined all three into the single
filter list.

* feat(kanban): add respawn guard to block repeat worker storms

Salvages #27484 by @fardoche6. Adds a respawn guard that skips worker
spawn for tasks where:
- a recent run already succeeded (recent_success — within guard window)
- the previous run hit a quota/auth error (blocker_auth, also auto-blocks)
- a recent task comment includes a GitHub PR URL (active_pr)

The guard prevents repeat worker storms on the same bug/task. Includes
the contributor's review-findings fixup (regex hardening, observability,
auth coverage).

Resolved a small DispatchResult conflict alongside main's 'stale' field;
kept both. Authorship preserved via rebase merge.

* feat(kanban): show dashboard cron jobs across profiles

Salvages #27568 by @SerenityTn. Dashboard cron page now lists cron
jobs from all profiles, with profile-aware filter UI and storage
routing. Includes test coverage for cross-profile listing, mutation,
deletion, and validation.

Also fixes orphan conflict markers in config.py left by an earlier
salvage merge (kanban.dispatch_stale_timeout_seconds was double-nested
in HEAD/PR markers from #28452 salvage of #23790).

* fix(kanban): remove orphan conflict markers from config.py (#28458)

PR #28452 (salvage of #23790, stale detection) merged with leftover
git conflict markers in hermes_cli/config.py around the
`dispatch_stale_timeout_seconds` config block, breaking config import
and any code path that loads it. Cleans up the markers and keeps both
config blocks (worker log rotation/orchestrator + stale detection).

Resolves a self-introduced regression.

* fix(kanban): remove orphan conflict markers from kanban.py (#28459)

PR #28454 (salvage of #26745, workflow filter) merged with leftover
git conflict markers in hermes_cli/kanban.py at three sites:
- _task_to_dict() (session_id alongside workflow_template_id/current_step_key)
- p_list parser (--sort alongside --workflow-template-id/--step-key)
- _cmd_list (order_by alongside the new filter kwargs)

Cleans up the markers and keeps both halves at each site.

Resolves a self-introduced regression.

* feat(kanban): configure worktree paths and branches

Salvages #26496 by @aqilaziz. Adds branch_name column + CLI flag so
tasks with workspace_kind='worktree' can pin a target branch on
create. Schema migration added to _migrate_add_optional_columns.

- Task.branch_name field + DB column + migration
- create_task accepts branch_name kwarg
- hermes kanban create --branch <name> flag
- kanban show output includes 'Branch: <name>' when set

Cherry-picked the substantive commit (a7558cf27); the PR's tip was
an unrelated service-path-dirs commit. Resolved 2 INSERT-column-list
and show-output conflicts alongside main's session_id and
max_runtime_seconds additions; kept all three.

* feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373)

Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that
group several skills under one slash command. Invoking /<bundle-name>
from any surface (CLI, TUI, dashboard, any gateway platform) loads
every referenced skill into a single combined user message.

Use cases:
- /backend-dev → loads github-code-review + test-driven-development
  + github-pr-workflow as one bundle.
- /research → loads several research skills together.
- Team task profiles shared via dotfiles.

Behavior:
- Bundles take precedence over individual skills when slugs collide.
- Missing skills are skipped with a note, not fatal.
- No system-prompt mutation — bundles generate a fresh user message
  at invocation time, the same way /<skill> does. Prompt cache stays
  intact.
- Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI),
  /help display.

Schema (~/.hermes/skill-bundles/<slug>.yaml):
    name: backend-dev
    description: Backend feature work.
    skills:
      - github-code-review
      - test-driven-development
    instruction: |
      Optional extra guidance prepended to the loaded skills.

New module: agent/skill_bundles.py — load, scan, resolve, build
invocation message, save, delete. yaml.safe_load only; broken
bundles log a warning and are skipped, never raise.

New CLI subcommand: hermes bundles {list,show,create,delete,reload}.
Implementation in hermes_cli/bundles.py; wired in hermes_cli/main.py.
'bundles' added to _BUILTIN_SUBCOMMANDS so plugin discovery skips it.

New in-session slash command: /bundles lists installed bundles in
both CLI and gateway. /<bundle-name> dispatch added to CLI (cli.py)
and gateway (gateway/run.py) before the existing /<skill-name> path.

Autocomplete: SlashCommandCompleter gained an optional
skill_bundles_provider parameter that defaults to None — the prompt
shows '▣ <description> (N skills)' for bundles vs '⚡' for skills.

Tests:
- tests/agent/test_skill_bundles.py — 33 tests covering slugify,
  scan/cache freshness, resolve (including underscore→hyphen
  Telegram alias), build_bundle_invocation_message (loading, missing
  skills, user/bundle instruction injection, dedup), save/delete,
  reload diff, list sort.
- tests/hermes_cli/test_bundles.py — 8 tests for the CLI
  subcommand (create/list/show/delete/reload, --force, missing
  bundle errors).
- tests/gateway/test_bundles_command.py — 4 tests for the gateway
  handler and bundle resolution priority.

Live E2E: verified subprocess invocations of hermes bundles
{list,create,show,reload,delete} round-trip correctly against an
isolated HERMES_HOME.

Docs:
- website/docs/user-guide/features/skills.md — new 'Skill Bundles'
  section with quick example, YAML schema, management commands,
  behavior notes.
- website/docs/reference/cli-commands.md — 'hermes bundles' added to
  the top-level command table and given its own subcommand section.

* feat(kanban): add scheduled status for delayed follow-ups

Salvages #24533 by @roycepersonalassistant. Adds a first-class
'scheduled' Kanban status for time-delay follow-ups that aren't
waiting on human input.

- hermes kanban schedule <task_id> [reason] CLI command
- Dashboard/API transitions to/from Scheduled
- unblock_task() now releases both 'blocked' AND 'scheduled' tasks
  (re-checking parent dependencies before moving to ready/todo)
- i18n + docs updates

Resolved conflicts: kept HEAD's failure-counter reset on unblock
alongside the PR's scheduled state, kept HEAD's 'running' direct-set
rejection, combined both bulk-status branches. Dropped the dist/
bundle changes (months-stale; would need rebuild from source).

* feat(kanban): drag-to-delete trash zone + bulk delete for task cards

Salvages #28125 by @Jpalmer95. Adds:
- Drag-to-delete trash zone in the kanban dashboard
- Bulk delete endpoint with cascading delete_task cleanup
- Frontend updates (drag visual + drop handler)
- Confirmation prompt before delete

Resolved end-of-file test conflict by appending both halves.

* docs: add Korean Kanban documentation

Salvages #21823 by @pochi-gio. Adds Korean (ko) Docusaurus locale and
translates Kanban documentation (kanban.md, kanban-tutorial.md) and the
two related skills (devops-kanban-orchestrator, devops-kanban-worker).

Purely additive — adds ko to the locales list in docusaurus.config.ts
and creates the website/i18n/ko/ tree.

* fix(tests): catch up six stale tests after compression/aux/kanban changes (#28465)

- aux_config: drop session_search from _AUX_TASKS and remove stale test
  (PR #27590 removed auxiliary.session_search from DEFAULT_CONFIG)
- compression_boundary_hook: set compressor._last_compress_aborted=False
  on MagicMock so the post-compress abort branch (PR #28117) doesn't
  short-circuit before the session-id rotation under test
- kanban_dashboard_plugin: use consecutive_failures=3 so severity stays
  'error' (failure_threshold default dropped from 3 to 2 in d9fef0c8a,
  so failures=5 now crosses the critical floor of 2*2=4)
- cli_manual_compress: accept force kwarg on DummyAgent._compress_context
  (cli._manual_compress now passes force=True)

* fix(telegram): render full clarify choice text in message body, use short button labels

When Telegram clarify prompts offer long choices, mobile clients
truncate the inline button labels, making options unreadable.
Previously only the question was shown in the message body with
truncated choice text in button labels.

Fix: append the full numbered option list to the message body
so users can read complete choice text on any client.  Buttons
now use short numeric labels (1, 2, ...) to avoid Telegram
truncation.  The 'Other (type answer)' button is unchanged.

Long choice labels are now rendered in full (not truncated to
57 chars + '...') since they appear in the body instead of
button labels.

Closes: #27497

* chore(release): map @asdlem for PR #27852 salvage

* fix(telegram): default streaming transport to edit

* fix(telegram): respect reply_to_mode for DM topic reply fallback

The DM topic reply fallback code in send() hardcoded should_thread=True
when telegram_dm_topic_reply_fallback metadata was present, bypassing
_should_thread_reply() and ignoring reply_to_mode config. This caused
quote bubbles on every response even with reply_to_mode: 'off'.

Fix:
- Add reply_to_mode param to _reply_to_message_id_for_send() and
  _thread_kwargs_for_send() classmethods
- In send(), check self._reply_to_mode != 'off' for DM topic fallback
- Suppress reply anchor and reply_to_message_id when mode is 'off'
  while preserving message_thread_id for correct topic routing
- Thread reply_to_mode through all 29 call sites

Regression coverage: 10 new tests in test_telegram_reply_mode.py
covering classmethod behavior, send() integration, and backward
compatibility.

Fixes reply_to_mode: 'off' ignored by Telegram DM topic reply fallback code #23994

* fix(gateway): route Telegram audio file attachments away from STT pipeline (#24870)

Telegram distinguishes three kinds of audio payloads:
  - message.voice  → Opus/OGG voice messages  → STT pipeline  ✓
  - message.audio  → audio file attachments   → bypasses STT  ← was broken
  - message.document (audio mime) → generic file route

**Root cause** — the inbound message routing block in gateway/run.py
matched both MessageType.VOICE *and* MessageType.AUDIO into audio_paths,
which were then fed unconditionally to _enrich_message_with_transcription.
Audio file attachments (.mp3, .m4a, etc.) were therefore auto-transcribed
instead of being treated as files, making the transcribe skill unusable
from Telegram because the path it needed was never surfaced.

**Fix**
- Introduce a new audio_file_paths list populated exclusively by
  MessageType.AUDIO events.
- Narrow the audio_paths selector to MessageType.VOICE (and bare
  audio/ mime-type events that are not explicitly AUDIO or DOCUMENT).
- After the STT block, inject a document-style context note for each
  audio_file_path, giving the agent the file path and asking what to do
  with it (consistent with how plain documents are handled).

**Tests** — 5 new tests in test_telegram_audio_vs_voice.py:
  - voice message still transcribed (regression guard)
  - audio attachment skips STT (core fix)
  - audio attachment context note format
  - STT disabled still produces file note (not STT-disabled notice)
  - MessageType.AUDIO != MessageType.VOICE sanity check

Fixes #24870

* chore(release): map bartok9 noreply for PR #24879 salvage

* fix(send_message): route standalone Telegram sends through TELEGRAM_PROXY

When the send_message tool runs outside the gateway process (agent loop,
TUI, cron, etc.), _gateway_runner_ref() returns None and the standalone
path in _send_telegram constructs Bot(token=token) directly, bypassing
any configured proxy. In regions where api.telegram.org is blocked, the
send times out after ~5s with 'Telegram send failed: Timed out' and
nothing ever shows up in gateway.log because the request never reaches
the gateway.

Resolve TELEGRAM_PROXY (via gateway.platforms.base.resolve_proxy_url,
which also honours HTTPS_PROXY/HTTP_PROXY/ALL_PROXY and NO_PROXY) just
before constructing the Bot. When a proxy is found, attach an
HTTPXRequest(proxy=...) for both 'request' and 'get_updates_request',
matching what gateway/platforms/telegram.py already does for in-gateway
sends and what the Discord standalone sender already does. Any
exception attaching the proxy falls back cleanly to a direct connection,
preserving prior behaviour for users without a proxy configured.

Adds tests/tools/test_send_message_telegram_proxy.py covering both the
proxy-configured and no-proxy cases.

* chore(release): map @pepelax for PR #25419 salvage

* fix(kanban-dashboard): restore implementations dropped during salvages (#28481)

Four kanban dashboard test failures, all from PR salvages that picked up
the test additions but dropped the corresponding implementations.

- BOARD_COLUMNS: add 'review' (status added by PR f55d94a1e but the
  board API never grew the column → test_board_empty failed because
  VALID_STATUSES - {archived} mismatched the rendered columns).
- update_task: enrich the 'ready' 409 detail with the blocking parent
  list (id, title, status) and add _parents_blocking_ready helper.
  Implementation lost in the #26744 salvage (commit e215558ba) which
  pinned the test but not the server-side code.
- dist/index.js: add parseApiErrorMessage helper, wire it through the
  drag/drop banner, add patchErr state to the TaskDrawer and surface
  it inline by the action row. Lost in the same #26744 salvage.
- test_diagnostics_endpoint_severity_filter: update to at-or-above
  semantics (PR a94ddd807 changed the filter from exact-match so the
  warning filter now correctly includes error+critical too).

* fix(gateway): roll over Telegram tool progress bubbles

* fix(gateway): scope audio_file_paths outside media_urls guard

The audio-file-paths handling block at line 7334 references the variable
unconditionally, but #24879 initialized it inside the 'if event.media_urls'
block — so events without media_urls hit UnboundLocalError.

Found via test_run_agent_queued_message_does_not_treat_commentary_as_final
after PR #28478 landed.

* fix(gateway): keep tool-progress edits alive after Telegram flood control

When a progress-message edit hits Telegram flood control (RetryAfter),
can_edit was unconditionally set to False, permanently disabling coalescing
for the rest of the run. Subsequent tool updates were posted as separate
new messages instead of updating the existing progress bubble.

Fix: only set can_edit=False for non-recoverable edit errors. On flood
control, back off by resetting _last_edit_ts so the throttle interval is
respected before the next edit attempt.

Fixes #25188

* chore(release): map @erhnysr for PR #25198 salvage

* fix(telegram): preserve can_edit after transient network errors in progress edits (#27828)

When edit_message_text fails with a transient error (httpx.ConnectError,
NetworkError, server disconnected, timeouts), the progress-message sender
must not permanently set can_edit = False — that would convert a single
Telegram network hiccup into separate per-tool bubbles for the rest of the run.

Changes:
- gateway/platforms/telegram.py: edit_message now returns retryable=True for
  transient network errors (ConnectError, NetworkError, timeouts, server
  disconnects, temporarily unavailable). Permanent failures (flood control,
  message-not-found, permissions) remain retryable=False.
- gateway/run.py: send_progress_messages checks result.retryable before
  setting can_edit = False. Transient failures skip the fallback-send and
  continue — the next edit cycle catches up with the accumulated lines.
  Permanent failures (flood, message-not-found, etc.) still disable editing.

Tests: 22 new tests in test_telegram_progress_edit_transient.py covering
transient vs permanent error classification, SendResult.retryable semantics,
and the can_edit decision logic.

Fixes #27828

* fix(telegram): recover from post-update polling conflict without entering limbo

* fix(test+release): update conflict retry count for MAX=5; map @CryptoByz

* fix(gateway): route background-process notifications into Telegram DM topics

Background-process completion notifications (notify_on_complete) and
watch-pattern notifications were always delivered to the Telegram main
chat instead of the originating private-chat topic.

Hermes-created Telegram DM topic lanes only render a send when it carries
both message_thread_id and a reply anchor. The synthetic MessageEvent
injected on process completion had no message_id, so _reply_anchor_for_event
returned None and _thread_kwargs_for_send dropped message_thread_id
entirely — routing the notification to the main chat.

Capture the triggering message id at spawn time and thread it through to
the synthetic event so it can be reply-anchored back into the topic:

- session_context: add HERMES_SESSION_MESSAGE_ID context var
- telegram adapter: populate SessionSource.message_id on inbound messages
- terminal tool: persist watcher_message_id on the process session
- process registry: carry/persist message_id on watcher dicts + checkpoint
- gateway: set MessageEvent.message_id on injected notifications

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): map @fabiosiqueira for PR #27212 salvage

* fix(telegram): route resumed DM topic sends directly

* fix(telegram): enforce TELEGRAM_ALLOWED_USERS allowlist on inbound messages

TELEGRAM_ALLOWED_USERS was only checked for callback/inline-button
actions but not for inbound messages. Unauthorized users triggered an
'Unauthorized user' log warning but their messages were still processed
by the agent — a P0 security bypass (issue #23778).

Fix: add allowlist check in _should_process_message() which is called
for all message types (text, command, media, location). If the sender
is not in TELEGRAM_ALLOWED_USERS, the message is dropped immediately
with a warning log. Empty TELEGRAM_ALLOWED_USERS continues to allow
all users (existing behavior).

Fixes #23778

* fix(telegram): fail-closed auth fallback when TELEGRAM_ALLOWED_USERS is empty

The _is_callback_user_authorized fallback returned True when
TELEGRAM_ALLOWED_USERS was not set, allowing any Telegram user
to interact with the bot. Change to fail-closed: deny by default
unless GATEWAY_ALLOW_ALL_USERS=true is explicitly set.

Fixes #24457

* test(telegram): stub _is_callback_user_authorized in trigger-gating fixture

After PR #24468 made the empty-allowlist callback auth fail-closed
(and #23795 wired _is_callback_user_authorized into _should_process_message),
trigger-gating tests started failing because their fake messages from
user 111 hit the new deny-by-default path before trigger evaluation.

Force-authorize all senders in _make_adapter() so the trigger logic
under test runs.  The fail-closed behavior itself is covered by
test_telegram_callback_auth_fail_closed.py.

* fix(telegram): reset sticky fallback IP on connect failure, retry primary DNS

When a sticky fallback IP (from DoH discovery) becomes unreachable,
the transport previously got stuck in an attempt_order that only
tried the dead IP.  This prevented the gateway from recovering
until the service was restarted.

Changes:
- Always include primary DNS path (None) after the sticky IP in the
  attempt_order so that a primary-path retry happens on sticky failure.
- Reset self._sticky_ip to None when the currently sticky IP hits
  a connect timeout / connect error, allowing the next request to
  retry from scratch.

Fixes silent Telegram disconnection when discovered fallback IPs
are transiently or permanently unreachable.

* test+release: align stale sticky-IP test for #24511; map @falconexe

* fix(telegram): propagate extra base_url config

* feat(send_message): auto-detect @username mentions and create Telegram entities

When sending messages containing @username patterns, auto-generate
MessageEntity(type='mention') entries so that the receiving bot's
require_mention filter can trigger. This enables proper bot-to-bot
interop where mention-based routing is used.

* test+release: align send_message mocks for MessageEntity import; map @fonhal

* fix(telegram): resume typing indicator after inline approval click (#27853)

The text /approve and /deny paths in gateway/run.py call
resume_typing_for_chat() after resolve_gateway_approval() succeeds, but
the Telegram inline-button (ea:*) callback in _handle_callback_query did
not. Typing is paused when the approval is sent (gateway/run.py:15658),
so without a matching resume the typing indicator stayed gone for the
remainder of a long-running turn after a button click.

Symmetry-match the text path: after a successful resolve, call
self.resume_typing_for_chat(str(query_chat_id)). Guarded by count > 0
to match /approve's "if not count" early-return — if nothing was
actually resolved, the agent thread was never unblocked, so typing
should remain paused.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gateway): mark final voice reply as notify-worthy so Telegram delivers it audibly

In Telegram "important" notifications mode (default), TelegramPlatformAdapter
sets ``disable_notification=True`` on every send unless metadata carries
``notify=True``.  GatewayRunner._send_voice_reply already passes thread
metadata through to ``adapter.send_voice``, but never marks the final
auto-TTS voice reply as notify-worthy — so users with the default mode get
the final voice note delivered silently with no push notification.

Mirror the final-text path in gateway/platforms/base.py (the existing
text-response final send already adds ``metadata["notify"] = True``).

Issue #27970 Bug 2.  Bug 1 (MP3 vs. native OGG voice-note) is being
addressed by existing PRs #20182 / #20878 — this PR is intentionally
scoped to the silent-delivery bug only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: avoid Telegram group reply thread session splits

* chore(release): map @eliteworkstation94-ai for PR #28157 salvage

* fix(gateway): avoid duplicate Telegram text after auto-TTS voice replies

* chore(release): map @Zyrixtrex for PR #26754 salvage

* fix(telegram): escape send_slash_confirm preview with format_message

send_slash_confirm() sent the raw command preview with ParseMode.MARKDOWN,
skipping the format_message() conversion applied to every other dynamic
send in the adapter. Commands with underscores, dots, brackets, or other
MarkdownV2-sensitive characters raised BadRequest: Can't parse entities;
the exception was swallowed by the outer try/except, so the confirmation
prompt silently never appeared.

Fix: wrap preview through format_message() and switch to MARKDOWN_V2,
symmetric with send_update_prompt and the callback sends fixed in
a69404052.

* chore(release): map @nftpoetrist for PR #25856 salvage

* fix(telegram): retry wrapped connect timeouts

* chore(release): map @samahn0601 for PR #27887 salvage

* fix(tts): keep native audio outside Telegram voice delivery

* chore(release): map @aqilaziz for PR #26406 salvage

* fix(gateway): pin Telegram DM-topic routing to user's current topic

Topic-mode DM replies were fragmenting one conversation across many sessions: a Reply on a message in another topic delivered Telegram's message_thread_id for *that* topic, and #3206's strip routed plain replies to the lobby. Both pulled the user away from their current session. Fix: when topic mode is on, rewrite source.thread_id to the user's most-recent binding if the inbound id is missing/General or not a known topic. Non-topic-mode users unchanged.

* chore(release): map @karthikeyann for PR #26609 salvage

* fix(send_message): add thread-not-found retry for Telegram forum topic sends

The standalone _send_telegram path in send_message_tool lacked the
thread-not-found fallback that the gateway adapter has. When a forum
topic thread_id was stale or deleted, the send would fail entirely
instead of retrying to the General topic.

Changes:
- Add _is_telegram_thread_not_found() helper matching gateway adapter
- Add thread-not-found retry in text send path
- Add thread-not-found retry in media send path (with f.seek(0))
- Separate text_kwargs from thread_kwargs to prevent
  disable_web_page_preview leaking into send_photo/send_video calls

Closes #27012

* test(send_message): add thread-not-found retry tests for Telegram forum topics

Adds two tests to TestSendTelegramThreadIdMapping:
- test_thread_not_found_retries_without_message_thread_id
- test_thread_not_found_for_media_retries_without_message_thread_id

Refs #27012

* test(send_message): add thread-not-found retry tests for Telegram topics

Three tests covering the #27012 fix:
- test_is_thread_not_found_matches_expected_errors
- test_text_send_retries_without_thread_id_on_thread_not_found
- test_disable_web_page_preview_not_leaked_to_media_sends

116/116 existing tests still pass (no regressions).

* chore(release): map @kunci115 for PR #27098 salvage

* fix(gateway): register Telegram commands for groups

Register Telegram bot commands across default, private, and group scopes so
the slash-command menu is available outside DMs.

Changes from review feedback:
- Add asyncio.Lock to prevent race condition in _ensure_forum_commands
- Extract MAX_COMMANDS_PER_SCOPE constant (30) to avoid magic number
- Upgrade error logging from debug->warning in forum registration
- Add tests covering lazy forum registration and concurrent safety
- Remove /start handler from this PR (separate feature)

Fixes review: needs_work (race, magic number, log levels, missing tests)

* test+release: fix test fixture for forum_commands; map @chromalinx

* fix(telegram): gate profile bots by allowed topics

* chore(release): map @booker1207 for PR #25132 salvage

* fix(cron): route Telegram cron deliveries to a dedicated topic via TELEGRAM_CRON_THREAD_ID

When Telegram topic mode is enabled, cron messages delivered to the bot's
root DM (TELEGRAM_HOME_CHANNEL without a thread id) land in the system
lobby — replies there are rebuffed with the lobby reminder and
reply_to_message_id is dropped, so users cannot interact with the cron
output (#24409).

Add an optional TELEGRAM_CRON_THREAD_ID env var that overrides
TELEGRAM_HOME_CHANNEL_THREAD_ID for cron deliveries only. Operators can
create a "Cron" forum topic in the DM, point this var at its thread id,
and replies to cron messages will land in that topic's existing session
instead of the lobby. The home-channel thread id (used elsewhere, e.g.
restart notifications) is unchanged, and explicit
deliver="telegram:chat:thread" targets continue to win over the env var.

Per the reporter's clarification on 2026-05-13, option (a) (cron-side
route to a dedicated topic + config knob) was chosen.

Fixes #24409

* fix(telegram): route image documents (.png/.jpg/.webp/.gif) through vision pipeline

When users send images as documents (Telegram file picker), they were
rejected with "Unsupported document type" because SUPPORTED_DOCUMENT_TYPES
only includes text/office formats. Add SUPPORTED_IMAGE_DOCUMENT_TYPES
to base.py and handle them in telegram.py before the document check.

- Add SUPPORTED_IMAGE_DOCUMENT_TYPES constant to base.py
- Add MIME reverse-lookup for image types in telegram.py
- Route image documents through cache_image_from_bytes + vision pipeline
- Handle media groups for image documents

Closes: #20128, #18620

* test+release: stub auth in test_telegram_documents fixture; map @kiranvk-2011

* fix(gateway): prevent Windows Telegram /restart leaving gateway stopped

* chore(release): map @rak135 for PR #25960 salvage

* fix(telegram): preserve topic metadata on overflow edits

* feat(telegram): add disable_topic_auto_rename gateway flag

When Hermes auto-titles a session in a Telegram DM topic it currently
renames the topic itself to the generated title. That works for
operator-managed lanes (extra.dm_topics) but is disruptive for
ad-hoc Threaded-Mode topics that users name by hand — every first
exchange overwrites their chosen title.

Add gateway.platforms.telegram.extra.disable_topic_auto_rename (default
False, preserving prior behaviour). When set, both
_schedule_telegram_topic_title_rename and the underlying
_rename_telegram_topic_for_session_title short-circuit before touching
the Telegram API. Internal session titles (sessions list, TUI) keep
working unchanged.

Also bridge the legacy top-level telegram.disable_topic_auto_rename key
through to gateway.platforms.telegram.extra so users on the older
config layout don't have to migrate to enable it.

- Tests cover the runtime flag, the scheduling entry-point, and string
  truthiness coercion for YAML-loaded values.
- Docs updated in messaging/telegram.md with an example block.

* chore(release): map @B0Tch1 for PR #27634 salvage

* fix(gateway): restore Telegram DM topic thread_id after session split (#27166)

When context compression triggers a mid-turn session split, source.thread_id
can be None on synthetic/recovered events. _thread_metadata_for_source then
returns None, causing the Telegram adapter to send with no message_thread_id
and the response lands in the General thread instead of the active DM topic.

Fix:
- hermes_state.py: Add get_telegram_topic_binding_by_session() for reverse
  lookup by session_id (enabled by the existing UNIQUE INDEX on session_id).
- gateway/run.py: After session-split detection, if source is a Telegram DM
  and source.thread_id is None, recover it from the binding via the new
  method so _thread_metadata_for_source produces the correct thread routing.
- tests/: Coverage for the new lookup method and the recovery flow.

* chore(release): map @jackjin1997 for PR #27239 salvage

* fix(gateway): allow chat-scoped telegram auth without sender user_id

* chore(release): map @soynchux for PR #27806 salvage

* fix(telegram): add DM topic typing fallback when message_thread_id rejected

When a DM topic lane's message_thread_id is rejected by Telegram
(e.g. stale or deleted topic), send_typing now falls back to sending
the typing indicator without thread_id so it at least appears in the
main DM view, rather than being silently swallowed.

Also adds test for the fallback behavior.

* fix(telegram): report cron topic fallback

* chore(release): map @el-analista for PR #25368 salvage

* fix(telegram): wire gt: callback dispatch for gmail-triage buttons

The gmail-triage skill's Telegram inline buttons emit callback_data of the
form `gt:<verb>:<arg>`, but `_handle_callback_query` had no `gt:` branch —
taps fell through silently and the spinner sat there until Telegram timed it
out.

Add `_handle_gmail_triage_callback`, dispatched from the existing callback
router, that:

- Authorizes the caller via the same `_is_callback_user_authorized` path as
  the approval / slash-confirm / clarify handlers.
- Maps each verb to a script under `~/.hermes/scripts/gmail-triage/` and runs
  it async with a 60s timeout.
- Splits verbs into one-shots (send / archive / draft / spam) — append the
  confirmation and strip the keyboard so the action can't fire twice — and
  sticky-state changes (mute / trust / vip ± -domain) — append the
  confirmation but leave the keyboard tappable so the user can stack actions
  on one email.
- On failure: toast only, keyboard preserved so the user can retry.
- Logs every callback outcome to gateway.log for debugging.

* chore(release): map @khungate for PR #25829 salvage

* feat(telegram): support quick-command-only menus

* chore(release): map @stevehq26-bot for PR #28015 salvage

* fix(telegram): handle channel post updates

* test: address telegram channel post review

* test+release: stub auth in channel_posts fixture; map @brndnsvr

* Quiet noisy Telegram gateway errors

* chore(release): map oracle@jarviss-mbp.home for PR #24014 salvage

* Route Telegram multi-bot mentions exclusively

* Document Telegram multi-profile gateway commands

* fix: ignore Telegram messages for other bots

* chore(release): map @OCWC22 for PR #24581 salvage

* feat(telegram): ignore_root_dm with system command lobby

* docs(telegram): document ignore_root_dm feature

* chore(release): map @ai-hana-ai for PR #23928 salvage

* feat(telegram): pin incoming user message for duration of agent turn

When a user sends a message on Telegram, the incoming message is now
automatically pinned at the start of processing and unpinned when the
agent finishes its turn. This gives the user a visual indicator that
their message is being worked on, and keeps the conversation anchored.

Changes:
- telegram.py: Added pinChatMessage in on_processing_start and
  unpinChatMessage in on_processing_complete. Restructured both
  hooks so pin/unpin runs independently of the reactions feature
  (reactions are optional; pinning is always on).
- telegram.py: Pass message_id through SessionSource so it's
  available in the session context.
- session_context.py: Added HERMES_SESSION_MESSAGE_ID context var.
- run.py: Pass source.message_id through set_session_vars.

Pinning is silent (disable_notification=True) and failures are
logged at debug level without interrupting message processing.
Only the user's incoming message is pinned -- never the agent's
replies. Auto-resume events (which have no message_id) are
correctly skipped.

* chore(release): map @indigokarasu for PR #26636 salvage

* feat(telegram): skip-STT audio path + 2GB cap via local Bot API server

Two coordinated changes that unblock downstream audio pipelines
(diarization, custom transcription, archival) on attachments larger
than the public Bot API's 20MB getFile ceiling.

- `stt.enabled: false` no longer drops voice/audio with a generic
  "transcription disabled" note. The gateway probes the cached file's
  duration (wave → mutagen → ffprobe ladder) and surfaces
  `[The user sent a voice message: <abs path> (duration: M:SS)]` to
  the agent so a skill or tool can pick up the raw file. The previous
  placeholder is replaced rather than appended when present.

- `platforms.telegram.extra.base_url` set → adapter auto-lifts its
  document size cap from 20MB to 2GB (the local telegram-bot-api
  `--local` ceiling) and the "too large" reply reports the active
  limit dynamically. No new config knob; presence of `base_url` is the
  opt-in.

- `platforms.telegram.extra.local_mode: true` wires
  `Application.builder().local_mode(True)` on the python-telegram-bot
  builder. PTB then reads files from disk instead of HTTP, which is
  required when telegram-bot-api runs in `--local` mode (the server
  returns absolute filesystem paths, not `/file/bot...` URLs).

- gateway/run.py: rewrites the `stt.enabled: false` branch of
  `_enrich_message_with_transcription`. New `_format_duration` +
  `_probe_audio_duration` helpers.
- gateway/platforms/telegram.py: `_max_doc_bytes` instance attribute
  derived from `extra.base_url`; `local_mode` builder wiring;
  dynamic "too large" message.
- tests/gateway/test_stt_config.py: covers path-surfacing with and
  without an existing user message, and placeholder replacement.
- tests/gateway/test_telegram_max_doc_bytes.py: 3 cases — default 20MB
  without base_url, 2GB when set, empty-string base_url keeps default.
- website/docs/user-guide/messaging/telegram.md: new "Skipping STT"
  subsection under Voice Messages and a full "Large Files (>20MB) via
  Local Bot API Server" walkthrough (api_id/api_hash, docker-compose,
  one-time `logOut` migration, `platforms.telegram.extra` config, the
  `local_mode` disk-access requirement, the silent HTTP-fallback 404).
- website/docs/user-guide/features/voice-mode.md: documents the
  `stt.enabled` knob in the config reference.

- `pytest tests/gateway/test_telegram_max_doc_bytes.py
  tests/gateway/test_stt_config.py` → 9/9 passing.
- Verified end-to-end on a live deployment: gateway log shows
  `Using custom Telegram base_url: http://...` and
  `Using Telegram local_mode (read files from disk)` on startup;
  voice messages above 20MB cache to disk and surface their path to
  the agent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(release): map @alber70g for PR #25280 salvage

* fix(web): add scheduled column to i18n type definitions (#28549)

columnLabels and columnHelp in en.ts include a scheduled entry but the
Translations interface in types.ts did not declare it, causing a
TypeScript build failure in the Nix derivation. Made the field optional
since only en.ts provides it currently.

* docs: comprehensive 2-week sweep of feature/PR coverage gaps (#28497)

Catch the website docs up to two weeks of merged work (May 4 – May 18, 2026,
roughly 1,080 PRs). The audit found ~50 user-visible features that had landed
in code with no docs footprint, plus a handful of stale pages. This PR closes
every gap the scan turned up.

New pages
- user-guide/features/deliverable-mode.md — extension list, agent triggers,
  kanban_complete artifacts pattern, [[as_document]] override (PR #27813).
- developer-guide/web-search-provider-plugin.md — authoring guide modeled on
  image-gen-provider-plugin, covering brave_free / ddgs / etc. (PR #25448).

Providers / auth
- Rename "Alibaba Cloud" → "Qwen Cloud (Alibaba DashScope)" everywhere the
  display label shows up; provider id stays `alibaba` (PR #24835).
- Document OAuth refresh-token quarantine for xAI / MiniMax / Codex (PRs
  #28116 / #28118 / #28119).
- Document Nous JWT minting from refresh token + invalid-refresh quarantine
  + cross-profile shared token store (PRs #27663 / #19712).
- Add `## Microsoft Entra ID authentication (keyless)` section to
  azure-foundry guide — DefaultAzureCredential, RBAC, OpenAI + Anthropic
  routing details (PR #28101 / #9df9816da).
- Custom providers `api_mode` is now prompted-and-persisted, not just URL
  autodetected (PR #25068).
- Delegation honours `api_mode` + auto-detects anthropic_messages base URLs
  (PR #26824).
- `x_search` auto-enables when xAI credentials are present (PR #27376).
- Add `xAI Grok OAuth (SuperGrok)` row to providers headline table (PR
  #26534).
- NVIDIA NIM billing-origin header is set automatically (PR #26585).

Windows / installer
- `install.ps1`: document `-Commit <sha>` and `-Tag <v>` pin params plus
  the BOM-strip / git-retry hardening (PR #28169).
- Document Hermes Desktop thin installer + first-launch bootstrap (PR
  #27822).
- Document `dep_ensure` Windows bootstrap (PR #27845).
- Document install-method auto-detection (pip / git / homebrew / nixos) and
  the matching update command (PR #27843).

Gateway / messaging
- `/platform list|pause|resume` full description + circuit-breaker
  semantics (PR #26600).
- Slack / Matrix / Mattermost get parallel `allowed_channels` /
  `allowed_rooms` allowlist sections matching Telegram/Discord/DingTalk
  (PR #21251).
- Discord `allow_any_attachment` + `max_attachment_bytes` (config and env
  vars) (PR #27245).
- Discord clarify-choice button rendering (PR #25485).
- Telegram `guest_mode` @mention bypass for allowlisted groups (PR
  #22759).
- Telegram `notifications` mode (`important` vs `all`) (PR #22793).
- `[[as_document]]` skill / response directive for forcing
  document-style media delivery (PR #21210).

CLI / TUI
- `/new [name]` argument (PR #19637).
- `/subgoal` user-supplied criteria appended to `/goal` (PR #25449).
- `/exit --delete` flag confirmation prompts for destructive slash
  commands (PR #22687).
- Status-bar additions: ▶ N background indicator (PR #27175), context
  compression count (PR #21218), YOLO mode banner+statusbar warning (PR
  #26238).
- `display.timestamps` + `docker_extra_args` config keys (PR #23599).
- TUI collapsible startup banner sections (PR #20625).
- `HERMES_SESSION_ID` exported to tool subprocesses (PR #23847).

i18n
- Refresh display.language locale list from 8 → 16 (en, zh, zh-hant, ja,
  de, es, fr, tr, uk, af, ko, it, ga, pt, ru, hu) — matches
  `agent/i18n.py:SUPPORTED_LANGUAGES`.

Tools / features
- `vision_analyze` native-pixel passthrough for vision-capable callers,
  with auxiliary text-describer fallback (PR #22955).
- `session_search` rewrite to the single-shape tool (discovery / scroll /
  browse modes) (PRs #27590 / #27840).
- Clarify MCP transport scope: client supports stdio + SSE; embedded
  `hermes mcp serve` is stdio-only (PR #21227).
- Web search backends table: add Brave Search (free tier) and DDGS rows
  (PR #21337).
- ACP session-scoped edit auto-approval modes (PR #27862).
- Curator rename map in the user-visible per-run summary (PR #22910).
- Prompt caching feature page reference in features/overview.md — Claude
  cross-session 1-hour prefix cache on native Anthropic / OpenRouter /
  Nous Portal (PR #23828).
- Cron per-job profile parameter (PR #28124).
- `--no-skills` flag for `hermes profile create` (PR #20986).

Build
- Verified with `npm run build` in `website/`; both `en` and `zh-Hans`
  locales compile. Remaining broken-link/anchor warnings are pre-existing
  (`rl-training.md` from learning-path / overview; the
  zh-Hans translation lag the docs skill already calls out).

* chore(release): pre-stage AUTHOR_MAP for May 2026 LHF batch group 9 (#28571)

Pre-stages AUTHOR_MAP entries for 9 new/under-mapped contributors whose
PRs are being salvaged in the May 2026 LHF batch group 9.

Contributors:
- jdelmerico (#28278 — signal require_mention filter)
- justemu (#27996 — matrix thread_require_mention)
- YuanHanzhong (#28029 — dashboard browser scrollback)
- noctilust (#28080 — drop stale TUI resume env)
- MoonJuhan (#28288 — tolerate unreadable JSONL transcripts)
- outsourc-e (#28164 — cron emoji ZWJ sequences)
- Zyrixtrex (#28275 — Google OAuth urlopen timeout)
- ooovenenoso (#28256 — tool loop recovery hints)
- vanthinh6886 (#28018 — yaml/flock/atomic write guards; non-noreply email)

Per references/batch-pr-salvage-may14-additions.md.

* feat(signal): add require_mention filter for group chats

Add a configurable mention filter to the Signal adapter so the bot
only responds in groups when it is explicitly @mentioned.

Changes:
- gateway/platforms/signal.py: read require_mention from adapter
  extra config or SIGNAL_REQUIRE_MENTION env var; skip group messages
  that don't mention the bot account (checked in rendered text and
  raw mention metadata)
- gateway/config.py: map signal.require_mention YAML key to the
  SIGNAL_REQUIRE_MENTION env var (env var takes precedence)

Config example:
  signal:
    require_mention: true

Or via env var:
  SIGNAL_REQUIRE_MENTION=true

* Revert "feat(telegram): pin incoming user message for duration of agent turn"

This reverts commit a724c3b9cf5f01e28365322ae5ae3a9579567806.

* Revert "feat(telegram): support quick-command-only menus"

This reverts commit b1acf80e17858e2e5ae7c0d412a3a573d7fcbca4.

* Revert "feat(send_message): auto-detect @username mentions and create Telegram entities"

This reverts commit cf814c96f613b38bd891ac941c32da653e81c7ad.

* Revert "fix(telegram): enforce TELEGRAM_ALLOWED_USERS allowlist on inbound messages"

This reverts commit db50af910be6b4171ea9cf54f4cc38be27ac1da6.

* fix(gateway): pre-mark sessions as resume_pending before drain to prevent data loss (#27856)

Pre-mark all running agent sessions as resume_pending BEFORE the drain
wait begins. If the service manager kills the process during the drain
(window), the durable marker is already written so the next gateway boot
can recover in-flight sessions. On graceful drain completion, clear the
early markers for sessions that finished successfully.

* fix(matrix): implement thread_require_mention to prevent multi-agent reply loops

In multi-agent shared Matrix rooms, multiple bots all participating in the
same thread could trigger infinite reply loops — each bot's reply re-engaged
the others because they were all in the bot-thread set. Discord has a
`thread_require_mention` opt-in for this; Matrix didn't.

Add `_parse_thread_require_mention(config)` (mirrors Discord's pattern).
In `_resolve_message_context`, when enabled and the message is in a
bot-participated thread (not a free-response room), require @mention
before processing.

Salvage of @justemu's 2-commit stack (#27996). Fixes #27995.

* fix(cli): show active profile in TUI prompt

* fix(tui): preserve dunder identifiers in markdown

* test(file_ops): add regression tests for git baseline warning in write_file

Adds TestGitBaselineCheck with 6 unit tests covering _check_git_baseline
and the warning field in write_file result:
- Git not available → None
- Not in a git repo → None
- Clean repo → None
- Dirty repo → returns warning string with branch name
- write_file result includes warning when dirty
- write_file result omits warning when clean

* fix(dashboard): use browser scrollback for chat wheel

* fix(cli): ignore stale HERMES_TUI_RESUME env

HERMES_TUI_RESUME is an internal env var the Python wrapper exports to hand
a session ID off to the Ink TUI. Because _launch_tui started from
os.environ.copy(), any exported/stale value in the user's shell leaked
through — so plain `hermes --tui` would try to resume a missing session
and leave the UI at 'error: session not found' with no live session.

Drop HERMES_TUI_RESUME from the env before conditionally re-setting it
from the argparse-resolved resume_session_id. Tests cover both the drop
path and the set-from-arg path.

Salvage of #28080 by @noctilust.

* fix(cron): allow emoji ZWJ sequences in prompts

* fix: tolerate unreadable gateway JSONL transcripts

* fix(skills): add timeout to Google OAuth urlopen calls

* fix: add recovery hints to loop guard warnings

* fix: guard yaml.safe_load, flock unlock, TOCTOU races, and atomic writes

1. trajectory_compressor.py: yaml.safe_load() returns None on empty
   files, crashing with TypeError on `if 'tokenizer' in data`. Fix by
   adding `or {}` fallback. (HIGH — blocks startup with empty config)

2. 6 files with fcntl.flock(LOCK_UN) in finally blocks without
   try/except: cron/scheduler.py, hermes_cli/auth.py,
   agent/shell_hooks.py, tools/skill_usage.py,
   tools/environments/file_sync.py, tools/memory_tool.py. If unlock
   raises OSError, fd.close() is skipped and the lock is held forever.
   The msvcrt branches already had try/except; the fcntl branches did
   not. Fix by wrapping in try/except (OSError, IOError): pass.

3. agent/copilot_acp_client.py line 639: TOCTOU race — path.exists()
   followed by path.read_text() with no try/except. If file is deleted
   between the check and the read, FileNotFoundError propagates. Fix
   by using try/except FileNotFoundError.

4. gateway/sticker_cache.py: non-atomic write via Path.write_text()
   can leave truncated JSON on crash, causing JSONDecodeError on next
   load. Fix by writing to tempfile + fsync + os.replace (atomic).

* chore(release): alias xxxigm noreply for upcoming #27986 salvage (#28594)

Adds the canonical noreply form (54813621+xxxigm@users.noreply.github.com)
alongside the existing plain-email mapping so the salvage commit for
@xxxigm's codex doctor PR doesn't fail AUTHOR_MAP CI.

* fix(doctor): attach codex CLI hint to OpenAI Codex auth warning for #27975

`hermes doctor` printed 'codex CLI not installed (optional — ...)' as a
generic info line at the bottom of the auth section, several rows below
'OpenAI Codex auth (not logged in)' and after MiniMax/Gemini auth checks.
Users reading sequentially mistook it for MiniMax-related advice.

Move the hint up under the Codex auth warning so it's adjacent to the
row it actually pertains to. Behavior unchanged when the codex CLI is
installed (success path keeps its 'codex CLI ✓' row at the bottom).
Tests cover both placement and suppression cases.

Salvage of @xxxigm's 3-commit stack (#27986).
Closes #27975.

* fix(tests): catch up 25 stale tests after recent merges (#28626)

Sweep of all CI failures on origin/main, grouped by drift source:

Telegram allowlist gate (db50af910 added user-authz to _should_process_message):
- Hardcoded "[Telegram]" prefix in the logger.warning so the call no
  longer dereferences self.name → self.platform, which test fixtures
  built via object.__new__ never set.
- test_telegram_format / test_allowed_channels_widening fixtures stub
  _is_callback_user_authorized → True so the new gate doesn't reject
  guest-mode / allowed-channels test messages.
- test_telegram_approval_buttons::test_update_prompt_callback_not_affected
  sets TELEGRAM_ALLOWED_USERS="*" so the fail-closed default doesn't
  reject the callback before it writes .update_response.

Approval surface (6d495d9e7 renamed status, 214b95392 detached stdin):
- test_no_callback_returns_approval_required: status is now
  "pending_approval" (was "approval_required").
- test_close_stdin_allows_eof_driven_process_to_finish: switch to
  use_pty=True; non-PTY now uses stdin=DEVNULL.

Mattermost (send() now resolves root_id via _api_get first):
- test_send_with_thread_reply mocks _session.get with a thread-root
  response so the new resolver doesn't TypeError on a bare AsyncMock.

Kanban (d8ad431de rename, f55d94a1e review column, _kanban_worker_skill_available):
- _safe_int → _to_epoch in the two test_kanban_db tests.
- Spawn-skills tests (×3) monkey-patch _kanban_worker_skill_available
  to True since the isolated kanban_home fixture has no devops/kanban-worker tree.
- test_gateway_dispatcher_disables_corrupt_board: connect count
  3 → 5 (review-column probe now also runs per tick).

Aux-config severity at_or_above (a94ddd807):
- test_diagnostics_endpoint_severity_filter expects warning filter to
  include error+critical now (was exact-match).

Anthropic error handling (conversation loop extracted from run_agent):
- _no_backoff_wait fixture patches BOTH run_agent.jittered_backoff AND
  agent.conversation_loop.jittered_backoff. The latter is the actual
  call site; without the second patch tests burn ~2s per retry and
  hit the 30s SIGALRM timeout on CI.

Other test pollution / drift:
- test_auto_does_not_select_copilot_from_github_token: patch
  agent.bedrock_adapter.has_aws_credentials → False so boto3's
  credential chain can't auto-pick Bedrock from developer ~/.aws.
- test_setup_openclaw_migration: patch hermes_cli.gateway.get_env_value
  in addition to setup_mod.get_env_value — _platform_status reads
  through the gateway module's binding.
- test_gateway_prefix: COMPONENT_PREFIXES["gateway"] now includes
  "hermes_plugins" too.
- test_recommended_update_command_defaults_to_hermes_update: also
  short-circuit get_managed_update_command in case a stray
  ~/.hermes/.managed marker is present.
- test_user_id_is_not_explicit: _parse_target_ref now returns
  is_explicit=False for Slack U.../W... IDs (chat.postMessage rejects
  them — a DM must be opened first via conversations.open).

* feat(update): syntax-validate critical files post-pull, auto-rollback on failure (#28669)

Catch the PR #28452 failure mode (orphan merge-conflict markers in
hermes_cli/config.py) on the user side: after git pull succeeds, compile
the files every 'hermes' invocation imports at startup. If any has a
syntax error, git reset --hard back to the pre-pull SHA so the install
stays bootable. User can retry once a fix lands upstream.

- New _capture…
Lillard01 pushed a commit to Lillard01/hermes-agent that referenced this pull request May 21, 2026
…earch#28497)

Catch the website docs up to two weeks of merged work (May 4 – May 18, 2026,
roughly 1,080 PRs). The audit found ~50 user-visible features that had landed
in code with no docs footprint, plus a handful of stale pages. This PR closes
every gap the scan turned up.

New pages
- user-guide/features/deliverable-mode.md — extension list, agent triggers,
  kanban_complete artifacts pattern, [[as_document]] override (PR NousResearch#27813).
- developer-guide/web-search-provider-plugin.md — authoring guide modeled on
  image-gen-provider-plugin, covering brave_free / ddgs / etc. (PR NousResearch#25448).

Providers / auth
- Rename "Alibaba Cloud" → "Qwen Cloud (Alibaba DashScope)" everywhere the
  display label shows up; provider id stays `alibaba` (PR NousResearch#24835).
- Document OAuth refresh-token quarantine for xAI / MiniMax / Codex (PRs
  NousResearch#28116 / NousResearch#28118 / NousResearch#28119).
- Document Nous JWT minting from refresh token + invalid-refresh quarantine
  + cross-profile shared token store (PRs NousResearch#27663 / NousResearch#19712).
- Add `## Microsoft Entra ID authentication (keyless)` section to
  azure-foundry guide — DefaultAzureCredential, RBAC, OpenAI + Anthropic
  routing details (PR NousResearch#28101 / #9df9816da).
- Custom providers `api_mode` is now prompted-and-persisted, not just URL
  autodetected (PR NousResearch#25068).
- Delegation honours `api_mode` + auto-detects anthropic_messages base URLs
  (PR NousResearch#26824).
- `x_search` auto-enables when xAI credentials are present (PR NousResearch#27376).
- Add `xAI Grok OAuth (SuperGrok)` row to providers headline table (PR
  NousResearch#26534).
- NVIDIA NIM billing-origin header is set automatically (PR NousResearch#26585).

Windows / installer
- `install.ps1`: document `-Commit <sha>` and `-Tag <v>` pin params plus
  the BOM-strip / git-retry hardening (PR NousResearch#28169).
- Document Hermes Desktop thin installer + first-launch bootstrap (PR
  NousResearch#27822).
- Document `dep_ensure` Windows bootstrap (PR NousResearch#27845).
- Document install-method auto-detection (pip / git / homebrew / nixos) and
  the matching update command (PR NousResearch#27843).

Gateway / messaging
- `/platform list|pause|resume` full description + circuit-breaker
  semantics (PR NousResearch#26600).
- Slack / Matrix / Mattermost get parallel `allowed_channels` /
  `allowed_rooms` allowlist sections matching Telegram/Discord/DingTalk
  (PR NousResearch#21251).
- Discord `allow_any_attachment` + `max_attachment_bytes` (config and env
  vars) (PR NousResearch#27245).
- Discord clarify-choice button rendering (PR NousResearch#25485).
- Telegram `guest_mode` @mention bypass for allowlisted groups (PR
  NousResearch#22759).
- Telegram `notifications` mode (`important` vs `all`) (PR NousResearch#22793).
- `[[as_document]]` skill / response directive for forcing
  document-style media delivery (PR NousResearch#21210).

CLI / TUI
- `/new [name]` argument (PR NousResearch#19637).
- `/subgoal` user-supplied criteria appended to `/goal` (PR NousResearch#25449).
- `/exit --delete` flag confirmation prompts for destructive slash
  commands (PR NousResearch#22687).
- Status-bar additions: ▶ N background indicator (PR NousResearch#27175), context
  compression count (PR NousResearch#21218), YOLO mode banner+statusbar warning (PR
  NousResearch#26238).
- `display.timestamps` + `docker_extra_args` config keys (PR NousResearch#23599).
- TUI collapsible startup banner sections (PR NousResearch#20625).
- `HERMES_SESSION_ID` exported to tool subprocesses (PR NousResearch#23847).

i18n
- Refresh display.language locale list from 8 → 16 (en, zh, zh-hant, ja,
  de, es, fr, tr, uk, af, ko, it, ga, pt, ru, hu) — matches
  `agent/i18n.py:SUPPORTED_LANGUAGES`.

Tools / features
- `vision_analyze` native-pixel passthrough for vision-capable callers,
  with auxiliary text-describer fallback (PR NousResearch#22955).
- `session_search` rewrite to the single-shape tool (discovery / scroll /
  browse modes) (PRs NousResearch#27590 / NousResearch#27840).
- Clarify MCP transport scope: client supports stdio + SSE; embedded
  `hermes mcp serve` is stdio-only (PR NousResearch#21227).
- Web search backends table: add Brave Search (free tier) and DDGS rows
  (PR NousResearch#21337).
- ACP session-scoped edit auto-approval modes (PR NousResearch#27862).
- Curator rename map in the user-visible per-run summary (PR NousResearch#22910).
- Prompt caching feature page reference in features/overview.md — Claude
  cross-session 1-hour prefix cache on native Anthropic / OpenRouter /
  Nous Portal (PR NousResearch#23828).
- Cron per-job profile parameter (PR NousResearch#28124).
- `--no-skills` flag for `hermes profile create` (PR NousResearch#20986).

Build
- Verified with `npm run build` in `website/`; both `en` and `zh-Hans`
  locales compile. Remaining broken-link/anchor warnings are pre-existing
  (`rl-training.md` from learning-path / overview; the
  zh-Hans translation lag the docs skill already calls out).
bot-ted added a commit to bot-ted/hermes-agent that referenced this pull request May 21, 2026
* feat(kanban): configure worktree paths and branches

Salvages #26496 by @aqilaziz. Adds branch_name column + CLI flag so
tasks with workspace_kind='worktree' can pin a target branch on
create. Schema migration added to _migrate_add_optional_columns.

- Task.branch_name field + DB column + migration
- create_task accepts branch_name kwarg
- hermes kanban create --branch <name> flag
- kanban show output includes 'Branch: <name>' when set

Cherry-picked the substantive commit (a7558cf27); the PR's tip was
an unrelated service-path-dirs commit. Resolved 2 INSERT-column-list
and show-output conflicts alongside main's session_id and
max_runtime_seconds additions; kept all three.

* feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373)

Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that
group several skills under one slash command. Invoking /<bundle-name>
from any surface (CLI, TUI, dashboard, any gateway platform) loads
every referenced skill into a single combined user message.

Use cases:
- /backend-dev → loads github-code-review + test-driven-development
  + github-pr-workflow as one bundle.
- /research → loads several research skills together.
- Team task profiles shared via dotfiles.

Behavior:
- Bundles take precedence over individual skills when slugs collide.
- Missing skills are skipped with a note, not fatal.
- No system-prompt mutation — bundles generate a fresh user message
  at invocation time, the same way /<skill> does. Prompt cache stays
  intact.
- Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI),
  /help display.

Schema (~/.hermes/skill-bundles/<slug>.yaml):
    name: backend-dev
    description: Backend feature work.
    skills:
      - github-code-review
      - test-driven-development
    instruction: |
      Optional extra guidance prepended to the loaded skills.

New module: agent/skill_bundles.py — load, scan, resolve, build
invocation message, save, delete. yaml.safe_load only; broken
bundles log a warning and are skipped, never raise.

New CLI subcommand: hermes bundles {list,show,create,delete,reload}.
Implementation in hermes_cli/bundles.py; wired in hermes_cli/main.py.
'bundles' added to _BUILTIN_SUBCOMMANDS so plugin discovery skips it.

New in-session slash command: /bundles lists installed bundles in
both CLI and gateway. /<bundle-name> dispatch added to CLI (cli.py)
and gateway (gateway/run.py) before the existing /<skill-name> path.

Autocomplete: SlashCommandCompleter gained an optional
skill_bundles_provider parameter that defaults to None — the prompt
shows '▣ <description> (N skills)' for bundles vs '⚡' for skills.

Tests:
- tests/agent/test_skill_bundles.py — 33 tests covering slugify,
  scan/cache freshness, resolve (including underscore→hyphen
  Telegram alias), build_bundle_invocation_message (loading, missing
  skills, user/bundle instruction injection, dedup), save/delete,
  reload diff, list sort.
- tests/hermes_cli/test_bundles.py — 8 tests for the CLI
  subcommand (create/list/show/delete/reload, --force, missing
  bundle errors).
- tests/gateway/test_bundles_command.py — 4 tests for the gateway
  handler and bundle resolution priority.

Live E2E: verified subprocess invocations of hermes bundles
{list,create,show,reload,delete} round-trip correctly against an
isolated HERMES_HOME.

Docs:
- website/docs/user-guide/features/skills.md — new 'Skill Bundles'
  section with quick example, YAML schema, management commands,
  behavior notes.
- website/docs/reference/cli-commands.md — 'hermes bundles' added to
  the top-level command table and given its own subcommand section.

* feat(kanban): add scheduled status for delayed follow-ups

Salvages #24533 by @roycepersonalassistant. Adds a first-class
'scheduled' Kanban status for time-delay follow-ups that aren't
waiting on human input.

- hermes kanban schedule <task_id> [reason] CLI command
- Dashboard/API transitions to/from Scheduled
- unblock_task() now releases both 'blocked' AND 'scheduled' tasks
  (re-checking parent dependencies before moving to ready/todo)
- i18n + docs updates

Resolved conflicts: kept HEAD's failure-counter reset on unblock
alongside the PR's scheduled state, kept HEAD's 'running' direct-set
rejection, combined both bulk-status branches. Dropped the dist/
bundle changes (months-stale; would need rebuild from source).

* feat(kanban): drag-to-delete trash zone + bulk delete for task cards

Salvages #28125 by @Jpalmer95. Adds:
- Drag-to-delete trash zone in the kanban dashboard
- Bulk delete endpoint with cascading delete_task cleanup
- Frontend updates (drag visual + drop handler)
- Confirmation prompt before delete

Resolved end-of-file test conflict by appending both halves.

* docs: add Korean Kanban documentation

Salvages #21823 by @pochi-gio. Adds Korean (ko) Docusaurus locale and
translates Kanban documentation (kanban.md, kanban-tutorial.md) and the
two related skills (devops-kanban-orchestrator, devops-kanban-worker).

Purely additive — adds ko to the locales list in docusaurus.config.ts
and creates the website/i18n/ko/ tree.

* fix(tests): catch up six stale tests after compression/aux/kanban changes (#28465)

- aux_config: drop session_search from _AUX_TASKS and remove stale test
  (PR #27590 removed auxiliary.session_search from DEFAULT_CONFIG)
- compression_boundary_hook: set compressor._last_compress_aborted=False
  on MagicMock so the post-compress abort branch (PR #28117) doesn't
  short-circuit before the session-id rotation under test
- kanban_dashboard_plugin: use consecutive_failures=3 so severity stays
  'error' (failure_threshold default dropped from 3 to 2 in d9fef0c8a,
  so failures=5 now crosses the critical floor of 2*2=4)
- cli_manual_compress: accept force kwarg on DummyAgent._compress_context
  (cli._manual_compress now passes force=True)

* fix(telegram): render full clarify choice text in message body, use short button labels

When Telegram clarify prompts offer long choices, mobile clients
truncate the inline button labels, making options unreadable.
Previously only the question was shown in the message body with
truncated choice text in button labels.

Fix: append the full numbered option list to the message body
so users can read complete choice text on any client.  Buttons
now use short numeric labels (1, 2, ...) to avoid Telegram
truncation.  The 'Other (type answer)' button is unchanged.

Long choice labels are now rendered in full (not truncated to
57 chars + '...') since they appear in the body instead of
button labels.

Closes: #27497

* chore(release): map @asdlem for PR #27852 salvage

* fix(telegram): default streaming transport to edit

* fix(telegram): respect reply_to_mode for DM topic reply fallback

The DM topic reply fallback code in send() hardcoded should_thread=True
when telegram_dm_topic_reply_fallback metadata was present, bypassing
_should_thread_reply() and ignoring reply_to_mode config. This caused
quote bubbles on every response even with reply_to_mode: 'off'.

Fix:
- Add reply_to_mode param to _reply_to_message_id_for_send() and
  _thread_kwargs_for_send() classmethods
- In send(), check self._reply_to_mode != 'off' for DM topic fallback
- Suppress reply anchor and reply_to_message_id when mode is 'off'
  while preserving message_thread_id for correct topic routing
- Thread reply_to_mode through all 29 call sites

Regression coverage: 10 new tests in test_telegram_reply_mode.py
covering classmethod behavior, send() integration, and backward
compatibility.

Fixes reply_to_mode: 'off' ignored by Telegram DM topic reply fallback code #23994

* fix(gateway): route Telegram audio file attachments away from STT pipeline (#24870)

Telegram distinguishes three kinds of audio payloads:
  - message.voice  → Opus/OGG voice messages  → STT pipeline  ✓
  - message.audio  → audio file attachments   → bypasses STT  ← was broken
  - message.document (audio mime) → generic file route

**Root cause** — the inbound message routing block in gateway/run.py
matched both MessageType.VOICE *and* MessageType.AUDIO into audio_paths,
which were then fed unconditionally to _enrich_message_with_transcription.
Audio file attachments (.mp3, .m4a, etc.) were therefore auto-transcribed
instead of being treated as files, making the transcribe skill unusable
from Telegram because the path it needed was never surfaced.

**Fix**
- Introduce a new audio_file_paths list populated exclusively by
  MessageType.AUDIO events.
- Narrow the audio_paths selector to MessageType.VOICE (and bare
  audio/ mime-type events that are not explicitly AUDIO or DOCUMENT).
- After the STT block, inject a document-style context note for each
  audio_file_path, giving the agent the file path and asking what to do
  with it (consistent with how plain documents are handled).

**Tests** — 5 new tests in test_telegram_audio_vs_voice.py:
  - voice message still transcribed (regression guard)
  - audio attachment skips STT (core fix)
  - audio attachment context note format
  - STT disabled still produces file note (not STT-disabled notice)
  - MessageType.AUDIO != MessageType.VOICE sanity check

Fixes #24870

* chore(release): map bartok9 noreply for PR #24879 salvage

* fix(send_message): route standalone Telegram sends through TELEGRAM_PROXY

When the send_message tool runs outside the gateway process (agent loop,
TUI, cron, etc.), _gateway_runner_ref() returns None and the standalone
path in _send_telegram constructs Bot(token=token) directly, bypassing
any configured proxy. In regions where api.telegram.org is blocked, the
send times out after ~5s with 'Telegram send failed: Timed out' and
nothing ever shows up in gateway.log because the request never reaches
the gateway.

Resolve TELEGRAM_PROXY (via gateway.platforms.base.resolve_proxy_url,
which also honours HTTPS_PROXY/HTTP_PROXY/ALL_PROXY and NO_PROXY) just
before constructing the Bot. When a proxy is found, attach an
HTTPXRequest(proxy=...) for both 'request' and 'get_updates_request',
matching what gateway/platforms/telegram.py already does for in-gateway
sends and what the Discord standalone sender already does. Any
exception attaching the proxy falls back cleanly to a direct connection,
preserving prior behaviour for users without a proxy configured.

Adds tests/tools/test_send_message_telegram_proxy.py covering both the
proxy-configured and no-proxy cases.

* chore(release): map @pepelax for PR #25419 salvage

* fix(kanban-dashboard): restore implementations dropped during salvages (#28481)

Four kanban dashboard test failures, all from PR salvages that picked up
the test additions but dropped the corresponding implementations.

- BOARD_COLUMNS: add 'review' (status added by PR f55d94a1e but the
  board API never grew the column → test_board_empty failed because
  VALID_STATUSES - {archived} mismatched the rendered columns).
- update_task: enrich the 'ready' 409 detail with the blocking parent
  list (id, title, status) and add _parents_blocking_ready helper.
  Implementation lost in the #26744 salvage (commit e215558ba) which
  pinned the test but not the server-side code.
- dist/index.js: add parseApiErrorMessage helper, wire it through the
  drag/drop banner, add patchErr state to the TaskDrawer and surface
  it inline by the action row. Lost in the same #26744 salvage.
- test_diagnostics_endpoint_severity_filter: update to at-or-above
  semantics (PR a94ddd807 changed the filter from exact-match so the
  warning filter now correctly includes error+critical too).

* fix(gateway): roll over Telegram tool progress bubbles

* fix(gateway): scope audio_file_paths outside media_urls guard

The audio-file-paths handling block at line 7334 references the variable
unconditionally, but #24879 initialized it inside the 'if event.media_urls'
block — so events without media_urls hit UnboundLocalError.

Found via test_run_agent_queued_message_does_not_treat_commentary_as_final
after PR #28478 landed.

* fix(gateway): keep tool-progress edits alive after Telegram flood control

When a progress-message edit hits Telegram flood control (RetryAfter),
can_edit was unconditionally set to False, permanently disabling coalescing
for the rest of the run. Subsequent tool updates were posted as separate
new messages instead of updating the existing progress bubble.

Fix: only set can_edit=False for non-recoverable edit errors. On flood
control, back off by resetting _last_edit_ts so the throttle interval is
respected before the next edit attempt.

Fixes #25188

* chore(release): map @erhnysr for PR #25198 salvage

* fix(telegram): preserve can_edit after transient network errors in progress edits (#27828)

When edit_message_text fails with a transient error (httpx.ConnectError,
NetworkError, server disconnected, timeouts), the progress-message sender
must not permanently set can_edit = False — that would convert a single
Telegram network hiccup into separate per-tool bubbles for the rest of the run.

Changes:
- gateway/platforms/telegram.py: edit_message now returns retryable=True for
  transient network errors (ConnectError, NetworkError, timeouts, server
  disconnects, temporarily unavailable). Permanent failures (flood control,
  message-not-found, permissions) remain retryable=False.
- gateway/run.py: send_progress_messages checks result.retryable before
  setting can_edit = False. Transient failures skip the fallback-send and
  continue — the next edit cycle catches up with the accumulated lines.
  Permanent failures (flood, message-not-found, etc.) still disable editing.

Tests: 22 new tests in test_telegram_progress_edit_transient.py covering
transient vs permanent error classification, SendResult.retryable semantics,
and the can_edit decision logic.

Fixes #27828

* fix(telegram): recover from post-update polling conflict without entering limbo

* fix(test+release): update conflict retry count for MAX=5; map @CryptoByz

* fix(gateway): route background-process notifications into Telegram DM topics

Background-process completion notifications (notify_on_complete) and
watch-pattern notifications were always delivered to the Telegram main
chat instead of the originating private-chat topic.

Hermes-created Telegram DM topic lanes only render a send when it carries
both message_thread_id and a reply anchor. The synthetic MessageEvent
injected on process completion had no message_id, so _reply_anchor_for_event
returned None and _thread_kwargs_for_send dropped message_thread_id
entirely — routing the notification to the main chat.

Capture the triggering message id at spawn time and thread it through to
the synthetic event so it can be reply-anchored back into the topic:

- session_context: add HERMES_SESSION_MESSAGE_ID context var
- telegram adapter: populate SessionSource.message_id on inbound messages
- terminal tool: persist watcher_message_id on the process session
- process registry: carry/persist message_id on watcher dicts + checkpoint
- gateway: set MessageEvent.message_id on injected notifications

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): map @fabiosiqueira for PR #27212 salvage

* fix(telegram): route resumed DM topic sends directly

* fix(telegram): enforce TELEGRAM_ALLOWED_USERS allowlist on inbound messages

TELEGRAM_ALLOWED_USERS was only checked for callback/inline-button
actions but not for inbound messages. Unauthorized users triggered an
'Unauthorized user' log warning but their messages were still processed
by the agent — a P0 security bypass (issue #23778).

Fix: add allowlist check in _should_process_message() which is called
for all message types (text, command, media, location). If the sender
is not in TELEGRAM_ALLOWED_USERS, the message is dropped immediately
with a warning log. Empty TELEGRAM_ALLOWED_USERS continues to allow
all users (existing behavior).

Fixes #23778

* fix(telegram): fail-closed auth fallback when TELEGRAM_ALLOWED_USERS is empty

The _is_callback_user_authorized fallback returned True when
TELEGRAM_ALLOWED_USERS was not set, allowing any Telegram user
to interact with the bot. Change to fail-closed: deny by default
unless GATEWAY_ALLOW_ALL_USERS=true is explicitly set.

Fixes #24457

* test(telegram): stub _is_callback_user_authorized in trigger-gating fixture

After PR #24468 made the empty-allowlist callback auth fail-closed
(and #23795 wired _is_callback_user_authorized into _should_process_message),
trigger-gating tests started failing because their fake messages from
user 111 hit the new deny-by-default path before trigger evaluation.

Force-authorize all senders in _make_adapter() so the trigger logic
under test runs.  The fail-closed behavior itself is covered by
test_telegram_callback_auth_fail_closed.py.

* fix(telegram): reset sticky fallback IP on connect failure, retry primary DNS

When a sticky fallback IP (from DoH discovery) becomes unreachable,
the transport previously got stuck in an attempt_order that only
tried the dead IP.  This prevented the gateway from recovering
until the service was restarted.

Changes:
- Always include primary DNS path (None) after the sticky IP in the
  attempt_order so that a primary-path retry happens on sticky failure.
- Reset self._sticky_ip to None when the currently sticky IP hits
  a connect timeout / connect error, allowing the next request to
  retry from scratch.

Fixes silent Telegram disconnection when discovered fallback IPs
are transiently or permanently unreachable.

* test+release: align stale sticky-IP test for #24511; map @falconexe

* fix(telegram): propagate extra base_url config

* feat(send_message): auto-detect @username mentions and create Telegram entities

When sending messages containing @username patterns, auto-generate
MessageEntity(type='mention') entries so that the receiving bot's
require_mention filter can trigger. This enables proper bot-to-bot
interop where mention-based routing is used.

* test+release: align send_message mocks for MessageEntity import; map @fonhal

* fix(telegram): resume typing indicator after inline approval click (#27853)

The text /approve and /deny paths in gateway/run.py call
resume_typing_for_chat() after resolve_gateway_approval() succeeds, but
the Telegram inline-button (ea:*) callback in _handle_callback_query did
not. Typing is paused when the approval is sent (gateway/run.py:15658),
so without a matching resume the typing indicator stayed gone for the
remainder of a long-running turn after a button click.

Symmetry-match the text path: after a successful resolve, call
self.resume_typing_for_chat(str(query_chat_id)). Guarded by count > 0
to match /approve's "if not count" early-return — if nothing was
actually resolved, the agent thread was never unblocked, so typing
should remain paused.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gateway): mark final voice reply as notify-worthy so Telegram delivers it audibly

In Telegram "important" notifications mode (default), TelegramPlatformAdapter
sets ``disable_notification=True`` on every send unless metadata carries
``notify=True``.  GatewayRunner._send_voice_reply already passes thread
metadata through to ``adapter.send_voice``, but never marks the final
auto-TTS voice reply as notify-worthy — so users with the default mode get
the final voice note delivered silently with no push notification.

Mirror the final-text path in gateway/platforms/base.py (the existing
text-response final send already adds ``metadata["notify"] = True``).

Issue #27970 Bug 2.  Bug 1 (MP3 vs. native OGG voice-note) is being
addressed by existing PRs #20182 / #20878 — this PR is intentionally
scoped to the silent-delivery bug only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: avoid Telegram group reply thread session splits

* chore(release): map @eliteworkstation94-ai for PR #28157 salvage

* fix(gateway): avoid duplicate Telegram text after auto-TTS voice replies

* chore(release): map @Zyrixtrex for PR #26754 salvage

* fix(telegram): escape send_slash_confirm preview with format_message

send_slash_confirm() sent the raw command preview with ParseMode.MARKDOWN,
skipping the format_message() conversion applied to every other dynamic
send in the adapter. Commands with underscores, dots, brackets, or other
MarkdownV2-sensitive characters raised BadRequest: Can't parse entities;
the exception was swallowed by the outer try/except, so the confirmation
prompt silently never appeared.

Fix: wrap preview through format_message() and switch to MARKDOWN_V2,
symmetric with send_update_prompt and the callback sends fixed in
a69404052.

* chore(release): map @nftpoetrist for PR #25856 salvage

* fix(telegram): retry wrapped connect timeouts

* chore(release): map @samahn0601 for PR #27887 salvage

* fix(tts): keep native audio outside Telegram voice delivery

* chore(release): map @aqilaziz for PR #26406 salvage

* fix(gateway): pin Telegram DM-topic routing to user's current topic

Topic-mode DM replies were fragmenting one conversation across many sessions: a Reply on a message in another topic delivered Telegram's message_thread_id for *that* topic, and #3206's strip routed plain replies to the lobby. Both pulled the user away from their current session. Fix: when topic mode is on, rewrite source.thread_id to the user's most-recent binding if the inbound id is missing/General or not a known topic. Non-topic-mode users unchanged.

* chore(release): map @karthikeyann for PR #26609 salvage

* fix(send_message): add thread-not-found retry for Telegram forum topic sends

The standalone _send_telegram path in send_message_tool lacked the
thread-not-found fallback that the gateway adapter has. When a forum
topic thread_id was stale or deleted, the send would fail entirely
instead of retrying to the General topic.

Changes:
- Add _is_telegram_thread_not_found() helper matching gateway adapter
- Add thread-not-found retry in text send path
- Add thread-not-found retry in media send path (with f.seek(0))
- Separate text_kwargs from thread_kwargs to prevent
  disable_web_page_preview leaking into send_photo/send_video calls

Closes #27012

* test(send_message): add thread-not-found retry tests for Telegram forum topics

Adds two tests to TestSendTelegramThreadIdMapping:
- test_thread_not_found_retries_without_message_thread_id
- test_thread_not_found_for_media_retries_without_message_thread_id

Refs #27012

* test(send_message): add thread-not-found retry tests for Telegram topics

Three tests covering the #27012 fix:
- test_is_thread_not_found_matches_expected_errors
- test_text_send_retries_without_thread_id_on_thread_not_found
- test_disable_web_page_preview_not_leaked_to_media_sends

116/116 existing tests still pass (no regressions).

* chore(release): map @kunci115 for PR #27098 salvage

* fix(gateway): register Telegram commands for groups

Register Telegram bot commands across default, private, and group scopes so
the slash-command menu is available outside DMs.

Changes from review feedback:
- Add asyncio.Lock to prevent race condition in _ensure_forum_commands
- Extract MAX_COMMANDS_PER_SCOPE constant (30) to avoid magic number
- Upgrade error logging from debug->warning in forum registration
- Add tests covering lazy forum registration and concurrent safety
- Remove /start handler from this PR (separate feature)

Fixes review: needs_work (race, magic number, log levels, missing tests)

* test+release: fix test fixture for forum_commands; map @chromalinx

* fix(telegram): gate profile bots by allowed topics

* chore(release): map @booker1207 for PR #25132 salvage

* fix(cron): route Telegram cron deliveries to a dedicated topic via TELEGRAM_CRON_THREAD_ID

When Telegram topic mode is enabled, cron messages delivered to the bot's
root DM (TELEGRAM_HOME_CHANNEL without a thread id) land in the system
lobby — replies there are rebuffed with the lobby reminder and
reply_to_message_id is dropped, so users cannot interact with the cron
output (#24409).

Add an optional TELEGRAM_CRON_THREAD_ID env var that overrides
TELEGRAM_HOME_CHANNEL_THREAD_ID for cron deliveries only. Operators can
create a "Cron" forum topic in the DM, point this var at its thread id,
and replies to cron messages will land in that topic's existing session
instead of the lobby. The home-channel thread id (used elsewhere, e.g.
restart notifications) is unchanged, and explicit
deliver="telegram:chat:thread" targets continue to win over the env var.

Per the reporter's clarification on 2026-05-13, option (a) (cron-side
route to a dedicated topic + config knob) was chosen.

Fixes #24409

* fix(telegram): route image documents (.png/.jpg/.webp/.gif) through vision pipeline

When users send images as documents (Telegram file picker), they were
rejected with "Unsupported document type" because SUPPORTED_DOCUMENT_TYPES
only includes text/office formats. Add SUPPORTED_IMAGE_DOCUMENT_TYPES
to base.py and handle them in telegram.py before the document check.

- Add SUPPORTED_IMAGE_DOCUMENT_TYPES constant to base.py
- Add MIME reverse-lookup for image types in telegram.py
- Route image documents through cache_image_from_bytes + vision pipeline
- Handle media groups for image documents

Closes: #20128, #18620

* test+release: stub auth in test_telegram_documents fixture; map @kiranvk-2011

* fix(gateway): prevent Windows Telegram /restart leaving gateway stopped

* chore(release): map @rak135 for PR #25960 salvage

* fix(telegram): preserve topic metadata on overflow edits

* feat(telegram): add disable_topic_auto_rename gateway flag

When Hermes auto-titles a session in a Telegram DM topic it currently
renames the topic itself to the generated title. That works for
operator-managed lanes (extra.dm_topics) but is disruptive for
ad-hoc Threaded-Mode topics that users name by hand — every first
exchange overwrites their chosen title.

Add gateway.platforms.telegram.extra.disable_topic_auto_rename (default
False, preserving prior behaviour). When set, both
_schedule_telegram_topic_title_rename and the underlying
_rename_telegram_topic_for_session_title short-circuit before touching
the Telegram API. Internal session titles (sessions list, TUI) keep
working unchanged.

Also bridge the legacy top-level telegram.disable_topic_auto_rename key
through to gateway.platforms.telegram.extra so users on the older
config layout don't have to migrate to enable it.

- Tests cover the runtime flag, the scheduling entry-point, and string
  truthiness coercion for YAML-loaded values.
- Docs updated in messaging/telegram.md with an example block.

* chore(release): map @B0Tch1 for PR #27634 salvage

* fix(gateway): restore Telegram DM topic thread_id after session split (#27166)

When context compression triggers a mid-turn session split, source.thread_id
can be None on synthetic/recovered events. _thread_metadata_for_source then
returns None, causing the Telegram adapter to send with no message_thread_id
and the response lands in the General thread instead of the active DM topic.

Fix:
- hermes_state.py: Add get_telegram_topic_binding_by_session() for reverse
  lookup by session_id (enabled by the existing UNIQUE INDEX on session_id).
- gateway/run.py: After session-split detection, if source is a Telegram DM
  and source.thread_id is None, recover it from the binding via the new
  method so _thread_metadata_for_source produces the correct thread routing.
- tests/: Coverage for the new lookup method and the recovery flow.

* chore(release): map @jackjin1997 for PR #27239 salvage

* fix(gateway): allow chat-scoped telegram auth without sender user_id

* chore(release): map @soynchux for PR #27806 salvage

* fix(telegram): add DM topic typing fallback when message_thread_id rejected

When a DM topic lane's message_thread_id is rejected by Telegram
(e.g. stale or deleted topic), send_typing now falls back to sending
the typing indicator without thread_id so it at least appears in the
main DM view, rather than being silently swallowed.

Also adds test for the fallback behavior.

* fix(telegram): report cron topic fallback

* chore(release): map @el-analista for PR #25368 salvage

* fix(telegram): wire gt: callback dispatch for gmail-triage buttons

The gmail-triage skill's Telegram inline buttons emit callback_data of the
form `gt:<verb>:<arg>`, but `_handle_callback_query` had no `gt:` branch —
taps fell through silently and the spinner sat there until Telegram timed it
out.

Add `_handle_gmail_triage_callback`, dispatched from the existing callback
router, that:

- Authorizes the caller via the same `_is_callback_user_authorized` path as
  the approval / slash-confirm / clarify handlers.
- Maps each verb to a script under `~/.hermes/scripts/gmail-triage/` and runs
  it async with a 60s timeout.
- Splits verbs into one-shots (send / archive / draft / spam) — append the
  confirmation and strip the keyboard so the action can't fire twice — and
  sticky-state changes (mute / trust / vip ± -domain) — append the
  confirmation but leave the keyboard tappable so the user can stack actions
  on one email.
- On failure: toast only, keyboard preserved so the user can retry.
- Logs every callback outcome to gateway.log for debugging.

* chore(release): map @khungate for PR #25829 salvage

* feat(telegram): support quick-command-only menus

* chore(release): map @stevehq26-bot for PR #28015 salvage

* fix(telegram): handle channel post updates

* test: address telegram channel post review

* test+release: stub auth in channel_posts fixture; map @brndnsvr

* Quiet noisy Telegram gateway errors

* chore(release): map oracle@jarviss-mbp.home for PR #24014 salvage

* Route Telegram multi-bot mentions exclusively

* Document Telegram multi-profile gateway commands

* fix: ignore Telegram messages for other bots

* chore(release): map @OCWC22 for PR #24581 salvage

* feat(telegram): ignore_root_dm with system command lobby

* docs(telegram): document ignore_root_dm feature

* chore(release): map @ai-hana-ai for PR #23928 salvage

* feat(telegram): pin incoming user message for duration of agent turn

When a user sends a message on Telegram, the incoming message is now
automatically pinned at the start of processing and unpinned when the
agent finishes its turn. This gives the user a visual indicator that
their message is being worked on, and keeps the conversation anchored.

Changes:
- telegram.py: Added pinChatMessage in on_processing_start and
  unpinChatMessage in on_processing_complete. Restructured both
  hooks so pin/unpin runs independently of the reactions feature
  (reactions are optional; pinning is always on).
- telegram.py: Pass message_id through SessionSource so it's
  available in the session context.
- session_context.py: Added HERMES_SESSION_MESSAGE_ID context var.
- run.py: Pass source.message_id through set_session_vars.

Pinning is silent (disable_notification=True) and failures are
logged at debug level without interrupting message processing.
Only the user's incoming message is pinned -- never the agent's
replies. Auto-resume events (which have no message_id) are
correctly skipped.

* chore(release): map @indigokarasu for PR #26636 salvage

* feat(telegram): skip-STT audio path + 2GB cap via local Bot API server

Two coordinated changes that unblock downstream audio pipelines
(diarization, custom transcription, archival) on attachments larger
than the public Bot API's 20MB getFile ceiling.

- `stt.enabled: false` no longer drops voice/audio with a generic
  "transcription disabled" note. The gateway probes the cached file's
  duration (wave → mutagen → ffprobe ladder) and surfaces
  `[The user sent a voice message: <abs path> (duration: M:SS)]` to
  the agent so a skill or tool can pick up the raw file. The previous
  placeholder is replaced rather than appended when present.

- `platforms.telegram.extra.base_url` set → adapter auto-lifts its
  document size cap from 20MB to 2GB (the local telegram-bot-api
  `--local` ceiling) and the "too large" reply reports the active
  limit dynamically. No new config knob; presence of `base_url` is the
  opt-in.

- `platforms.telegram.extra.local_mode: true` wires
  `Application.builder().local_mode(True)` on the python-telegram-bot
  builder. PTB then reads files from disk instead of HTTP, which is
  required when telegram-bot-api runs in `--local` mode (the server
  returns absolute filesystem paths, not `/file/bot...` URLs).

- gateway/run.py: rewrites the `stt.enabled: false` branch of
  `_enrich_message_with_transcription`. New `_format_duration` +
  `_probe_audio_duration` helpers.
- gateway/platforms/telegram.py: `_max_doc_bytes` instance attribute
  derived from `extra.base_url`; `local_mode` builder wiring;
  dynamic "too large" message.
- tests/gateway/test_stt_config.py: covers path-surfacing with and
  without an existing user message, and placeholder replacement.
- tests/gateway/test_telegram_max_doc_bytes.py: 3 cases — default 20MB
  without base_url, 2GB when set, empty-string base_url keeps default.
- website/docs/user-guide/messaging/telegram.md: new "Skipping STT"
  subsection under Voice Messages and a full "Large Files (>20MB) via
  Local Bot API Server" walkthrough (api_id/api_hash, docker-compose,
  one-time `logOut` migration, `platforms.telegram.extra` config, the
  `local_mode` disk-access requirement, the silent HTTP-fallback 404).
- website/docs/user-guide/features/voice-mode.md: documents the
  `stt.enabled` knob in the config reference.

- `pytest tests/gateway/test_telegram_max_doc_bytes.py
  tests/gateway/test_stt_config.py` → 9/9 passing.
- Verified end-to-end on a live deployment: gateway log shows
  `Using custom Telegram base_url: http://...` and
  `Using Telegram local_mode (read files from disk)` on startup;
  voice messages above 20MB cache to disk and surface their path to
  the agent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(release): map @alber70g for PR #25280 salvage

* fix(web): add scheduled column to i18n type definitions (#28549)

columnLabels and columnHelp in en.ts include a scheduled entry but the
Translations interface in types.ts did not declare it, causing a
TypeScript build failure in the Nix derivation. Made the field optional
since only en.ts provides it currently.

* docs: comprehensive 2-week sweep of feature/PR coverage gaps (#28497)

Catch the website docs up to two weeks of merged work (May 4 – May 18, 2026,
roughly 1,080 PRs). The audit found ~50 user-visible features that had landed
in code with no docs footprint, plus a handful of stale pages. This PR closes
every gap the scan turned up.

New pages
- user-guide/features/deliverable-mode.md — extension list, agent triggers,
  kanban_complete artifacts pattern, [[as_document]] override (PR #27813).
- developer-guide/web-search-provider-plugin.md — authoring guide modeled on
  image-gen-provider-plugin, covering brave_free / ddgs / etc. (PR #25448).

Providers / auth
- Rename "Alibaba Cloud" → "Qwen Cloud (Alibaba DashScope)" everywhere the
  display label shows up; provider id stays `alibaba` (PR #24835).
- Document OAuth refresh-token quarantine for xAI / MiniMax / Codex (PRs
  #28116 / #28118 / #28119).
- Document Nous JWT minting from refresh token + invalid-refresh quarantine
  + cross-profile shared token store (PRs #27663 / #19712).
- Add `## Microsoft Entra ID authentication (keyless)` section to
  azure-foundry guide — DefaultAzureCredential, RBAC, OpenAI + Anthropic
  routing details (PR #28101 / #9df9816da).
- Custom providers `api_mode` is now prompted-and-persisted, not just URL
  autodetected (PR #25068).
- Delegation honours `api_mode` + auto-detects anthropic_messages base URLs
  (PR #26824).
- `x_search` auto-enables when xAI credentials are present (PR #27376).
- Add `xAI Grok OAuth (SuperGrok)` row to providers headline table (PR
  #26534).
- NVIDIA NIM billing-origin header is set automatically (PR #26585).

Windows / installer
- `install.ps1`: document `-Commit <sha>` and `-Tag <v>` pin params plus
  the BOM-strip / git-retry hardening (PR #28169).
- Document Hermes Desktop thin installer + first-launch bootstrap (PR
  #27822).
- Document `dep_ensure` Windows bootstrap (PR #27845).
- Document install-method auto-detection (pip / git / homebrew / nixos) and
  the matching update command (PR #27843).

Gateway / messaging
- `/platform list|pause|resume` full description + circuit-breaker
  semantics (PR #26600).
- Slack / Matrix / Mattermost get parallel `allowed_channels` /
  `allowed_rooms` allowlist sections matching Telegram/Discord/DingTalk
  (PR #21251).
- Discord `allow_any_attachment` + `max_attachment_bytes` (config and env
  vars) (PR #27245).
- Discord clarify-choice button rendering (PR #25485).
- Telegram `guest_mode` @mention bypass for allowlisted groups (PR
  #22759).
- Telegram `notifications` mode (`important` vs `all`) (PR #22793).
- `[[as_document]]` skill / response directive for forcing
  document-style media delivery (PR #21210).

CLI / TUI
- `/new [name]` argument (PR #19637).
- `/subgoal` user-supplied criteria appended to `/goal` (PR #25449).
- `/exit --delete` flag confirmation prompts for destructive slash
  commands (PR #22687).
- Status-bar additions: ▶ N background indicator (PR #27175), context
  compression count (PR #21218), YOLO mode banner+statusbar warning (PR
  #26238).
- `display.timestamps` + `docker_extra_args` config keys (PR #23599).
- TUI collapsible startup banner sections (PR #20625).
- `HERMES_SESSION_ID` exported to tool subprocesses (PR #23847).

i18n
- Refresh display.language locale list from 8 → 16 (en, zh, zh-hant, ja,
  de, es, fr, tr, uk, af, ko, it, ga, pt, ru, hu) — matches
  `agent/i18n.py:SUPPORTED_LANGUAGES`.

Tools / features
- `vision_analyze` native-pixel passthrough for vision-capable callers,
  with auxiliary text-describer fallback (PR #22955).
- `session_search` rewrite to the single-shape tool (discovery / scroll /
  browse modes) (PRs #27590 / #27840).
- Clarify MCP transport scope: client supports stdio + SSE; embedded
  `hermes mcp serve` is stdio-only (PR #21227).
- Web search backends table: add Brave Search (free tier) and DDGS rows
  (PR #21337).
- ACP session-scoped edit auto-approval modes (PR #27862).
- Curator rename map in the user-visible per-run summary (PR #22910).
- Prompt caching feature page reference in features/overview.md — Claude
  cross-session 1-hour prefix cache on native Anthropic / OpenRouter /
  Nous Portal (PR #23828).
- Cron per-job profile parameter (PR #28124).
- `--no-skills` flag for `hermes profile create` (PR #20986).

Build
- Verified with `npm run build` in `website/`; both `en` and `zh-Hans`
  locales compile. Remaining broken-link/anchor warnings are pre-existing
  (`rl-training.md` from learning-path / overview; the
  zh-Hans translation lag the docs skill already calls out).

* chore(release): pre-stage AUTHOR_MAP for May 2026 LHF batch group 9 (#28571)

Pre-stages AUTHOR_MAP entries for 9 new/under-mapped contributors whose
PRs are being salvaged in the May 2026 LHF batch group 9.

Contributors:
- jdelmerico (#28278 — signal require_mention filter)
- justemu (#27996 — matrix thread_require_mention)
- YuanHanzhong (#28029 — dashboard browser scrollback)
- noctilust (#28080 — drop stale TUI resume env)
- MoonJuhan (#28288 — tolerate unreadable JSONL transcripts)
- outsourc-e (#28164 — cron emoji ZWJ sequences)
- Zyrixtrex (#28275 — Google OAuth urlopen timeout)
- ooovenenoso (#28256 — tool loop recovery hints)
- vanthinh6886 (#28018 — yaml/flock/atomic write guards; non-noreply email)

Per references/batch-pr-salvage-may14-additions.md.

* feat(signal): add require_mention filter for group chats

Add a configurable mention filter to the Signal adapter so the bot
only responds in groups when it is explicitly @mentioned.

Changes:
- gateway/platforms/signal.py: read require_mention from adapter
  extra config or SIGNAL_REQUIRE_MENTION env var; skip group messages
  that don't mention the bot account (checked in rendered text and
  raw mention metadata)
- gateway/config.py: map signal.require_mention YAML key to the
  SIGNAL_REQUIRE_MENTION env var (env var takes precedence)

Config example:
  signal:
    require_mention: true

Or via env var:
  SIGNAL_REQUIRE_MENTION=true

* Revert "feat(telegram): pin incoming user message for duration of agent turn"

This reverts commit a724c3b9cf5f01e28365322ae5ae3a9579567806.

* Revert "feat(telegram): support quick-command-only menus"

This reverts commit b1acf80e17858e2e5ae7c0d412a3a573d7fcbca4.

* Revert "feat(send_message): auto-detect @username mentions and create Telegram entities"

This reverts commit cf814c96f613b38bd891ac941c32da653e81c7ad.

* Revert "fix(telegram): enforce TELEGRAM_ALLOWED_USERS allowlist on inbound messages"

This reverts commit db50af910be6b4171ea9cf54f4cc38be27ac1da6.

* fix(gateway): pre-mark sessions as resume_pending before drain to prevent data loss (#27856)

Pre-mark all running agent sessions as resume_pending BEFORE the drain
wait begins. If the service manager kills the process during the drain
(window), the durable marker is already written so the next gateway boot
can recover in-flight sessions. On graceful drain completion, clear the
early markers for sessions that finished successfully.

* fix(matrix): implement thread_require_mention to prevent multi-agent reply loops

In multi-agent shared Matrix rooms, multiple bots all participating in the
same thread could trigger infinite reply loops — each bot's reply re-engaged
the others because they were all in the bot-thread set. Discord has a
`thread_require_mention` opt-in for this; Matrix didn't.

Add `_parse_thread_require_mention(config)` (mirrors Discord's pattern).
In `_resolve_message_context`, when enabled and the message is in a
bot-participated thread (not a free-response room), require @mention
before processing.

Salvage of @justemu's 2-commit stack (#27996). Fixes #27995.

* fix(cli): show active profile in TUI prompt

* fix(tui): preserve dunder identifiers in markdown

* test(file_ops): add regression tests for git baseline warning in write_file

Adds TestGitBaselineCheck with 6 unit tests covering _check_git_baseline
and the warning field in write_file result:
- Git not available → None
- Not in a git repo → None
- Clean repo → None
- Dirty repo → returns warning string with branch name
- write_file result includes warning when dirty
- write_file result omits warning when clean

* fix(dashboard): use browser scrollback for chat wheel

* fix(cli): ignore stale HERMES_TUI_RESUME env

HERMES_TUI_RESUME is an internal env var the Python wrapper exports to hand
a session ID off to the Ink TUI. Because _launch_tui started from
os.environ.copy(), any exported/stale value in the user's shell leaked
through — so plain `hermes --tui` would try to resume a missing session
and leave the UI at 'error: session not found' with no live session.

Drop HERMES_TUI_RESUME from the env before conditionally re-setting it
from the argparse-resolved resume_session_id. Tests cover both the drop
path and the set-from-arg path.

Salvage of #28080 by @noctilust.

* fix(cron): allow emoji ZWJ sequences in prompts

* fix: tolerate unreadable gateway JSONL transcripts

* fix(skills): add timeout to Google OAuth urlopen calls

* fix: add recovery hints to loop guard warnings

* fix: guard yaml.safe_load, flock unlock, TOCTOU races, and atomic writes

1. trajectory_compressor.py: yaml.safe_load() returns None on empty
   files, crashing with TypeError on `if 'tokenizer' in data`. Fix by
   adding `or {}` fallback. (HIGH — blocks startup with empty config)

2. 6 files with fcntl.flock(LOCK_UN) in finally blocks without
   try/except: cron/scheduler.py, hermes_cli/auth.py,
   agent/shell_hooks.py, tools/skill_usage.py,
   tools/environments/file_sync.py, tools/memory_tool.py. If unlock
   raises OSError, fd.close() is skipped and the lock is held forever.
   The msvcrt branches already had try/except; the fcntl branches did
   not. Fix by wrapping in try/except (OSError, IOError): pass.

3. agent/copilot_acp_client.py line 639: TOCTOU race — path.exists()
   followed by path.read_text() with no try/except. If file is deleted
   between the check and the read, FileNotFoundError propagates. Fix
   by using try/except FileNotFoundError.

4. gateway/sticker_cache.py: non-atomic write via Path.write_text()
   can leave truncated JSON on crash, causing JSONDecodeError on next
   load. Fix by writing to tempfile + fsync + os.replace (atomic).

* chore(release): alias xxxigm noreply for upcoming #27986 salvage (#28594)

Adds the canonical noreply form (54813621+xxxigm@users.noreply.github.com)
alongside the existing plain-email mapping so the salvage commit for
@xxxigm's codex doctor PR doesn't fail AUTHOR_MAP CI.

* fix(doctor): attach codex CLI hint to OpenAI Codex auth warning for #27975

`hermes doctor` printed 'codex CLI not installed (optional — ...)' as a
generic info line at the bottom of the auth section, several rows below
'OpenAI Codex auth (not logged in)' and after MiniMax/Gemini auth checks.
Users reading sequentially mistook it for MiniMax-related advice.

Move the hint up under the Codex auth warning so it's adjacent to the
row it actually pertains to. Behavior unchanged when the codex CLI is
installed (success path keeps its 'codex CLI ✓' row at the bottom).
Tests cover both placement and suppression cases.

Salvage of @xxxigm's 3-commit stack (#27986).
Closes #27975.

* fix(tests): catch up 25 stale tests after recent merges (#28626)

Sweep of all CI failures on origin/main, grouped by drift source:

Telegram allowlist gate (db50af910 added user-authz to _should_process_message):
- Hardcoded "[Telegram]" prefix in the logger.warning so the call no
  longer dereferences self.name → self.platform, which test fixtures
  built via object.__new__ never set.
- test_telegram_format / test_allowed_channels_widening fixtures stub
  _is_callback_user_authorized → True so the new gate doesn't reject
  guest-mode / allowed-channels test messages.
- test_telegram_approval_buttons::test_update_prompt_callback_not_affected
  sets TELEGRAM_ALLOWED_USERS="*" so the fail-closed default doesn't
  reject the callback before it writes .update_response.

Approval surface (6d495d9e7 renamed status, 214b95392 detached stdin):
- test_no_callback_returns_approval_required: status is now
  "pending_approval" (was "approval_required").
- test_close_stdin_allows_eof_driven_process_to_finish: switch to
  use_pty=True; non-PTY now uses stdin=DEVNULL.

Mattermost (send() now resolves root_id via _api_get first):
- test_send_with_thread_reply mocks _session.get with a thread-root
  response so the new resolver doesn't TypeError on a bare AsyncMock.

Kanban (d8ad431de rename, f55d94a1e review column, _kanban_worker_skill_available):
- _safe_int → _to_epoch in the two test_kanban_db tests.
- Spawn-skills tests (×3) monkey-patch _kanban_worker_skill_available
  to True since the isolated kanban_home fixture has no devops/kanban-worker tree.
- test_gateway_dispatcher_disables_corrupt_board: connect count
  3 → 5 (review-column probe now also runs per tick).

Aux-config severity at_or_above (a94ddd807):
- test_diagnostics_endpoint_severity_filter expects warning filter to
  include error+critical now (was exact-match).

Anthropic error handling (conversation loop extracted from run_agent):
- _no_backoff_wait fixture patches BOTH run_agent.jittered_backoff AND
  agent.conversation_loop.jittered_backoff. The latter is the actual
  call site; without the second patch tests burn ~2s per retry and
  hit the 30s SIGALRM timeout on CI.

Other test pollution / drift:
- test_auto_does_not_select_copilot_from_github_token: patch
  agent.bedrock_adapter.has_aws_credentials → False so boto3's
  credential chain can't auto-pick Bedrock from developer ~/.aws.
- test_setup_openclaw_migration: patch hermes_cli.gateway.get_env_value
  in addition to setup_mod.get_env_value — _platform_status reads
  through the gateway module's binding.
- test_gateway_prefix: COMPONENT_PREFIXES["gateway"] now includes
  "hermes_plugins" too.
- test_recommended_update_command_defaults_to_hermes_update: also
  short-circuit get_managed_update_command in case a stray
  ~/.hermes/.managed marker is present.
- test_user_id_is_not_explicit: _parse_target_ref now returns
  is_explicit=False for Slack U.../W... IDs (chat.postMessage rejects
  them — a DM must be opened first via conversations.open).

* feat(update): syntax-validate critical files post-pull, auto-rollback on failure (#28669)

Catch the PR #28452 failure mode (orphan merge-conflict markers in
hermes_cli/config.py) on the user side: after git pull succeeds, compile
the files every 'hermes' invocation imports at startup. If any has a
syntax error, git reset --hard back to the pre-pull SHA so the install
stays bootable. User can retry once a fix lands upstream.

- New _capture_head_sha() + _validate_critical_files_syntax() helpers
- Wires both into _cmd_update_impl after the pull/reset succeeds
- Tests cover the helpers, the rollback flow, and a production-tree
  invariant (CI fails if main itself has a syntax error in a critical
  file — catches future broken commits before users hit them)

* feat: show names of user-modified skills in bundled skill sync summary

When 'hermes update' syncs bundled skills, the summary line only shows
the count of user-modified skills that were kept (e.g. '3 user-modified
(kept)'), but not *which* skills. Once the update finishes, the user
has no way to know which skills need triage.

Append the skill names to the summary line, truncated to 5 with a
'+N more' suffix for long lists:

  Done: 12 new, 3 updated, 7 unchanged, 3 user-modified (kept):
  hermes-agent, debugging-hermes-tui-commands, system-health.
  25 total bundled.

Closes #28121

* fix(acp): use tempfile.gettempdir() in workspace auto-approve

#28063 fixed the macOS `/tmp`→`/private/tmp` symlink issue by checking
the RAW path (pre-resolve) against startswith('/tmp/'). That works on
Linux + macOS but not on Windows — Path('/tmp/foo').resolve() returns
C:\\tmp\\foo and isn't the real Windows temp anyway.

Replace the hardcoded '/tmp/' prefix with Path(tempfile.gettempdir()).
resolve() + Path.relative_to() — same idiom as the cwd branch just
below. Works correctly on Linux (/tmp), macOS (/private/var/folders/...),
and Windows (%LOCALAPPDATA%\\Temp).

Test rewritten to use tempfile.gettempdir() so the assertion exercises
the same code path on every platform.

Conflict against the just-merged #28063 (raw_path approach) resolved
by replacing the whole raw_path block — tempfile.gettempdir() is
strictly better than that intermediate fix.

Salvage of #28262 by @Zyrixtrex.

* fix(kanban): stale reclaim must not tick failure counter (#28680)

Follow-up to #28452. detect_stale_running() was calling
_record_task_failure() on every reclaim, which ticked the
consecutive_failures counter. With the default failure_limit=2,
two legitimately long-running tasks (>4 h without explicit
heartbeat) would auto-block via the spawn-failure circuit
breaker — even though no worker actually failed.

Stale reclaim is dispatcher-side absence-of-heartbeat detection,
not a worker fault. Removed the _record_task_failure() call;
the 'stale' event in task_events is still the audit surface,
but the failure counter is now reserved for spawn_failed /
timed_out / crashed (real failures).

Also documents the heartbeat requirement:
- KANBAN_GUIDANCE in agent/prompt_builder.py now states the
  rule ('call kanban_heartbeat at least once an hour for tasks
  running longer than 1 hour') so workers learn the contract.
- kanban.md adds the stale event row to the events table and
  flags the heartbeat requirement in the worker lifecycle list.

New regression test: test_detect_stale_does_not_tick_failure_counter
locks in the new behaviour.

* fix(telegram): address post-merge audit follow-ups (#28670, #28672, #28674, #28676, #28678)

Five small fixes against issues filed during the post-merge salvage audit:

* #28670: `_GATEWAY_PROVIDER_ERROR_RE` false-positives on legitimate prose.
  Replace the regex with an anchored `_GATEWAY_PROVIDER_ERROR_SHAPE_RE` and
  add a length-cap heuristic to `_looks_like_gateway_provider_error`:
  short envelope at the start of the message → real provider error; long
  prose containing 'HTTP 404' → assistant answer, leave alone.

* #28672: drop the pointless 1s asyncio.sleep on Telegram thread-not-found
  retries. The same-thread retry is preserved (catches Telegram's
  occasional transient flake exercised by
  test_send_retries_transient_thread_not_found_before_fallback) but with
  no artificial delay.

* #28674: broaden `_should_retry_without_dm_topic_reply_anchor` to also
  fire when Bot API rejects `direct_messages_topic_id` for synthetic /
  resumed sends that have no reply anchor. Avoids dropping post-resume
  background notifications if the topic id goes stale.

* #28676: delete the dead image-document branch superseded by bd0c54d17
  (which returns early on the same extension set).

* #28678: extend chat-scoped allowlist (`TELEGRAM_GROUP_ALLOWED_CHATS`)
  to also cover `chat_type == 'channel'`, so operators can authorize
  channel posts by chat id without falling back to per-user allowlists.

Tests:
- scripts/run_tests.sh tests/gateway/test_telegram_thread_fallback.py -q  → 41/41
- scripts/run_tests.sh tests/cron/test_scheduler.py -q                    → 127/127
- broader test set: same 3 pre-existing test-pollution failures reproduce
  on plain main.

* chore(actions)(deps): bump the actions-minor-patch group across 1 directory with 2 updates

Bumps the actions-minor-patch group with 2 updates in the / directory: [google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml](https://github.com/google/osv-scanner-action) and [sigstore/gh-action-sigstore-python](https://github.com/sigstore/gh-action-sigstore-python).


Updates `google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml` from 2.3.5 to 2.3.8
- [Release notes](https://github.com/google/osv-scanner-action/releases)
- [Commits](https://github.com/google/osv-scanner-action/compare/c51854704019a247608d928f370c98740469d4b5...9a498708959aeaef5ef730655706c5a1df1edbc2)

Updates `sigstore/gh-action-sigstore-python` from 3.0.0 to 3.3.0
- [Release notes](https://github.com/sigstore/gh-action-sigstore-python/releases)
- [Changelog](https://github.com/sigstore/gh-action-sigstore-python/blob/main/CHANGELOG.md)
- [Commits](https://github.com/sigstore/gh-action-sigstore-python/compare/f514d46b907ebcd5bedc05145c03b69c1edd8b46...04cffa1d795717b140764e8b640de88853c92acc)

---
updated-dependencies:
- dependency-name: google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml
  dependency-version: 2.3.8
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-minor-patch
- dependency-name: sigstore/gh-action-sigstore-python
  dependency-version: 3.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions-minor-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(actions)(deps): bump docker/login-action from 3.7.0 to 4.1.0

Bumps [docker/login-action](https://github.com/docker/login-action) from 3.7.0 to 4.1.0.
- [Release notes](https://github.com/docker/login-action/releases)
- [Commits](https://github.com/docker/login-action/compare/c94ce9fb468520275223c153574b00df6fe4bcc9...4907a6ddec9925e35a0a9e82d7399ccc52663121)

---
updated-dependencies:
- dependency-name: docker/login-action
  dependency-version: 4.1.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(actions)(deps): bump docker/build-push-action from 6.19.2 to 7.1.0

Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6.19.2 to 7.1.0.
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](https://github.com/docker/build-push-action/compare/10e90e3645eae34f1e60eeb005ba3a3d33f178e8...bcafcacb16a39f128d818304e6c9c0c18556b85f)

---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-version: 7.1.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(actions)(deps): bump actions/setup-python from 5.3.0 to 6.2.0

Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.3.0 to 6.2.0.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/v5.3.0...a309ff8b426b58ec0e2a45f0f869d46889d02405)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-version: 6.2.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix(kanban): respawn guard defers blocker_auth instead of auto-blocking (#28683)

Follow-up to #28455. The respawn guard's blocker_auth rule (last error
matched a quota/auth/429 pattern) was auto-blocking the task on first
occurrence. That's too aggressive: transient rate limits typically
clear in seconds to minutes, but the auto-block puts the task in
'blocked' status which requires manual unblock.

Now treats blocker_auth the same as recent_success and active_pr:
defer the spawn this tick, leave the task in 'ready', let the next
tick try again. If the auth error genuinely persists, the existing
consecutive_failures counter trips the auto-block circuit breaker
after failure_limit failures via the normal path — so a persistent
401/403/quota-exhausted still ends up blocked, just not on first hit.

Also documents the respawn_guarded event in kanban.md's events table
with the three guard reasons.

Updated test_dispatch_respawn_guard_auto_blocks_auth_error → renamed
to test_dispatch_respawn_guard_defers_auth_error_without_auto_block;
asserts task stays in 'ready' and the guard reason is recorded.

* chore(actions)(deps): bump actions/checkout from 4.3.1 to 6.0.2

Bumps [actions/checkout](https://github.com/actions/checkout) from 4.3.1 to 6.0.2.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/34e114876b0b11c390a56381ad16ebd13914f8d5...de0fac2e4500dabe0009e67214ff5f5447ce83dd)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: 6.0.2
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix(dashboard): add scheduled kanban i18n strings (#28534)

Co-authored-by: Austin Pickett <pickett.austin@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(cli): exit prompt_toolkit cleanly on SIGTERM/SIGHUP instead of raising KeyboardInterrupt (#28688)

The SIGTERM/SIGHUP handler raised KeyboardInterrupt() at the end of its
agent-interrupt + grace-window sequence. Python delivers signals between
bytecodes on the main thread, so when the signal hit mid-event-loop
(typically inside prompt_toolkit's '_poll_output_size' coroutine's
'await asyncio.sleep()'), the KeyboardInterrupt unwound INTO that
coroutine. prompt_toolkit's Task captured it as a BaseException;
prompt_toolkit's '_handle_exception' then printed 'Unhandled exception
in event loop' + the full asyncio traceback and parked the terminal on
'Press ENTER to continue...' before exiting.

Same root cause as #13710, different surface: there the failure was an
EIO cascade after a logging-cache KeyError escaped the handler; here
it's the KBI raise itself landing inside an asyncio Task. The fix is
the same shape — let the event loop unwind on its own terms.

Now: schedule 'app.exit()' via 'loop.call_soon_threadsafe()'. The
prompt_toolkit Application returns normally from 'app.run()' and the
existing '(EOFError, KeyboardInterrupt, BrokenPipeError)' handler in
the input loop catches everything else. Fallback to 'raise
KeyboardInterrupt()' preserved for contexts where prompt_toolkit isn't
the active app (e.g. -q one-shot mode).

The agent interrupt + 1.5 s grace window run unchanged before the new
exit path, so subprocess-group cleanup ('os.killpg' on Linux) still
gets its window.

Tested live: external SIGTERM to the CLI (with 'kill <pid>') now exits
cleanly with no traceback dump and no ENTER pause.

* chore(deps): bump dompurify from 3.3.3 to 3.4.2 in /website

Bumps [dompurify](https://github.com/cure53/DOMPurify) from 3.3.3 to 3.4.2.
- [Release notes](https://github.com/cure53/DOMPurify/releases)
- [Commits](https://github.com/cure53/DOMPurify/compare/3.3.3...3.4.2)

---
updated-dependencies:
- dependency-name: dompurify
  dependency-version: 3.4.2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump follow-redirects from 1.15.11 to 1.16.0 in /website

Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.15.11 to 1.16.0.
- [Release notes](https://github.com/follow-redirects/follow-redirects/releases)
- [Commits](https://github.com/follow-redirects/follow-redirects/compare/v1.15.11...v1.16.0)

---
updated-dependencies:
- dependency-name: follow-redirects
  dependency-version: 1.16.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump lodash from 4.17.23 to 4.18.1 in /website

Bumps [lodash](https://github.com/lodash/lodash) from 4.17.23 to 4.18.1.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/compare/4.17.23...4.18.1)

---
updated-dependencies:
- dependency-name: lodash
  dependency-version: 4.18.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump lodash-es and langium in /website

Bumps [lodash-es](https://github.com/lodash/lodash) and [langium](https://github.com/eclipse-langium/langium/tree/HEAD/packages/langium). These dependencies needed to be updated together.

Updates `lodash-es` from 4.17.23 to 4.18.1
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/compare/4.17.23...4.18.1)

Updates `langium` from 4.2.1 to 4.2.3
- [Release notes](https://github.com/eclipse-langium/langium/releases)
- [Changelog](https://github.com/eclipse-langium/langium/blob/main/packages/langium/CHANGELOG.md)
- [Commits](https://github.com/eclipse-langium/langium/commits/HEAD/packages/langium)

---
updated-dependencies:
- dependency-name: lodash-es
  dependency-version: 4.18.1
  dependency-type: indirect
- dependency-name: langium
  dependency-version: 4.2.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump python-multipart from 0.0.22 to 0.0.27

Bumps [python-multipart](https://github.com/Kludex/python-multipart) from 0.0.22 to 0.0.27.
- [Release notes](https://github.com/Kludex/python-multipart/releases)
- [Changelog](https://github.com/Kludex/python-multipart/blob/main/CHANGELOG.md)
- [Commits](https://github.com/Kludex/python-multipart/compare/0.0.22...0.0.27)

---
updated-dependencies:
- dependency-name: python-multipart
  dependency-version: 0.0.27
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump python-dotenv from 1.2.1 to 1.2.2

Bumps [python-dotenv](https://github.com/theskumar/python-dotenv) from 1.2.1 to 1.2.2.
- [Release notes](https://github.com/theskumar/python-dotenv/releases)
- [Changelog](https://github.com/theskumar/python-dotenv/blob/main/CHANGELOG.md)
- [Commits](https://github.com/theskumar/python-dotenv/compare/v1.2.1...v1.2.2)

---
updated-dependencies:
- dependency-name: python-dotenv
  dependency-version: 1.2.2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump fast-uri from 3.1.0 to 3.1.2 in /website

Bumps [fast-uri](https://github.com/fastify/fast-uri) from 3.1.0 to 3.1.2.
- [Release notes](https://github.com/fastify/fast-uri/releases)
- [Commits](https://github.com/fastify/fast-uri/compare/v3.1.0...v3.1.2)

---
updated-dependencies:
- dependency-name: fast-uri
  dependency-version: 3.1.2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump @babel/plugin-transform-modules-systemjs in /website

Bumps [@babel/plugin-transform-modules-systemjs](https://github.com/babel/babel/tree/HEAD/packages/babel-plugin-transform-modules-systemjs) from 7.29.0 to 7.29.4.
- [Release notes](https://github.com/babel/babel/releases)
- [Changelog](https://github.com/babel/babel/blob/main/CHANGELOG.md)
- [Commits](https://github.com/babel/babel/commits/v7.29.4/packages/babel-plugin-transform-modules-systemjs)

---
updated-dependencies:
- dependency-name: "@babel/plugin-transform-modules-systemjs"
  dependency-version: 7.29.4
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix(web): consume bundled design system assets (#26391)

* fix: update design system…
bot-ted added a commit to bot-ted/hermes-agent that referenced this pull request May 22, 2026
* fix(gateway): allow chat-scoped telegram auth without sender user_id

* chore(release): map @soynchux for PR #27806 salvage

* fix(telegram): add DM topic typing fallback when message_thread_id rejected

When a DM topic lane's message_thread_id is rejected by Telegram
(e.g. stale or deleted topic), send_typing now falls back to sending
the typing indicator without thread_id so it at least appears in the
main DM view, rather than being silently swallowed.

Also adds test for the fallback behavior.

* fix(telegram): report cron topic fallback

* chore(release): map @el-analista for PR #25368 salvage

* fix(telegram): wire gt: callback dispatch for gmail-triage buttons

The gmail-triage skill's Telegram inline buttons emit callback_data of the
form `gt:<verb>:<arg>`, but `_handle_callback_query` had no `gt:` branch —
taps fell through silently and the spinner sat there until Telegram timed it
out.

Add `_handle_gmail_triage_callback`, dispatched from the existing callback
router, that:

- Authorizes the caller via the same `_is_callback_user_authorized` path as
  the approval / slash-confirm / clarify handlers.
- Maps each verb to a script under `~/.hermes/scripts/gmail-triage/` and runs
  it async with a 60s timeout.
- Splits verbs into one-shots (send / archive / draft / spam) — append the
  confirmation and strip the keyboard so the action can't fire twice — and
  sticky-state changes (mute / trust / vip ± -domain) — append the
  confirmation but leave the keyboard tappable so the user can stack actions
  on one email.
- On failure: toast only, keyboard preserved so the user can retry.
- Logs every callback outcome to gateway.log for debugging.

* chore(release): map @khungate for PR #25829 salvage

* feat(telegram): support quick-command-only menus

* chore(release): map @stevehq26-bot for PR #28015 salvage

* fix(telegram): handle channel post updates

* test: address telegram channel post review

* test+release: stub auth in channel_posts fixture; map @brndnsvr

* Quiet noisy Telegram gateway errors

* chore(release): map oracle@jarviss-mbp.home for PR #24014 salvage

* Route Telegram multi-bot mentions exclusively

* Document Telegram multi-profile gateway commands

* fix: ignore Telegram messages for other bots

* chore(release): map @OCWC22 for PR #24581 salvage

* feat(telegram): ignore_root_dm with system command lobby

* docs(telegram): document ignore_root_dm feature

* chore(release): map @ai-hana-ai for PR #23928 salvage

* feat(telegram): pin incoming user message for duration of agent turn

When a user sends a message on Telegram, the incoming message is now
automatically pinned at the start of processing and unpinned when the
agent finishes its turn. This gives the user a visual indicator that
their message is being worked on, and keeps the conversation anchored.

Changes:
- telegram.py: Added pinChatMessage in on_processing_start and
  unpinChatMessage in on_processing_complete. Restructured both
  hooks so pin/unpin runs independently of the reactions feature
  (reactions are optional; pinning is always on).
- telegram.py: Pass message_id through SessionSource so it's
  available in the session context.
- session_context.py: Added HERMES_SESSION_MESSAGE_ID context var.
- run.py: Pass source.message_id through set_session_vars.

Pinning is silent (disable_notification=True) and failures are
logged at debug level without interrupting message processing.
Only the user's incoming message is pinned -- never the agent's
replies. Auto-resume events (which have no message_id) are
correctly skipped.

* chore(release): map @indigokarasu for PR #26636 salvage

* feat(telegram): skip-STT audio path + 2GB cap via local Bot API server

Two coordinated changes that unblock downstream audio pipelines
(diarization, custom transcription, archival) on attachments larger
than the public Bot API's 20MB getFile ceiling.

- `stt.enabled: false` no longer drops voice/audio with a generic
  "transcription disabled" note. The gateway probes the cached file's
  duration (wave → mutagen → ffprobe ladder) and surfaces
  `[The user sent a voice message: <abs path> (duration: M:SS)]` to
  the agent so a skill or tool can pick up the raw file. The previous
  placeholder is replaced rather than appended when present.

- `platforms.telegram.extra.base_url` set → adapter auto-lifts its
  document size cap from 20MB to 2GB (the local telegram-bot-api
  `--local` ceiling) and the "too large" reply reports the active
  limit dynamically. No new config knob; presence of `base_url` is the
  opt-in.

- `platforms.telegram.extra.local_mode: true` wires
  `Application.builder().local_mode(True)` on the python-telegram-bot
  builder. PTB then reads files from disk instead of HTTP, which is
  required when telegram-bot-api runs in `--local` mode (the server
  returns absolute filesystem paths, not `/file/bot...` URLs).

- gateway/run.py: rewrites the `stt.enabled: false` branch of
  `_enrich_message_with_transcription`. New `_format_duration` +
  `_probe_audio_duration` helpers.
- gateway/platforms/telegram.py: `_max_doc_bytes` instance attribute
  derived from `extra.base_url`; `local_mode` builder wiring;
  dynamic "too large" message.
- tests/gateway/test_stt_config.py: covers path-surfacing with and
  without an existing user message, and placeholder replacement.
- tests/gateway/test_telegram_max_doc_bytes.py: 3 cases — default 20MB
  without base_url, 2GB when set, empty-string base_url keeps default.
- website/docs/user-guide/messaging/telegram.md: new "Skipping STT"
  subsection under Voice Messages and a full "Large Files (>20MB) via
  Local Bot API Server" walkthrough (api_id/api_hash, docker-compose,
  one-time `logOut` migration, `platforms.telegram.extra` config, the
  `local_mode` disk-access requirement, the silent HTTP-fallback 404).
- website/docs/user-guide/features/voice-mode.md: documents the
  `stt.enabled` knob in the config reference.

- `pytest tests/gateway/test_telegram_max_doc_bytes.py
  tests/gateway/test_stt_config.py` → 9/9 passing.
- Verified end-to-end on a live deployment: gateway log shows
  `Using custom Telegram base_url: http://...` and
  `Using Telegram local_mode (read files from disk)` on startup;
  voice messages above 20MB cache to disk and surface their path to
  the agent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(release): map @alber70g for PR #25280 salvage

* fix(web): add scheduled column to i18n type definitions (#28549)

columnLabels and columnHelp in en.ts include a scheduled entry but the
Translations interface in types.ts did not declare it, causing a
TypeScript build failure in the Nix derivation. Made the field optional
since only en.ts provides it currently.

* docs: comprehensive 2-week sweep of feature/PR coverage gaps (#28497)

Catch the website docs up to two weeks of merged work (May 4 – May 18, 2026,
roughly 1,080 PRs). The audit found ~50 user-visible features that had landed
in code with no docs footprint, plus a handful of stale pages. This PR closes
every gap the scan turned up.

New pages
- user-guide/features/deliverable-mode.md — extension list, agent triggers,
  kanban_complete artifacts pattern, [[as_document]] override (PR #27813).
- developer-guide/web-search-provider-plugin.md — authoring guide modeled on
  image-gen-provider-plugin, covering brave_free / ddgs / etc. (PR #25448).

Providers / auth
- Rename "Alibaba Cloud" → "Qwen Cloud (Alibaba DashScope)" everywhere the
  display label shows up; provider id stays `alibaba` (PR #24835).
- Document OAuth refresh-token quarantine for xAI / MiniMax / Codex (PRs
  #28116 / #28118 / #28119).
- Document Nous JWT minting from refresh token + invalid-refresh quarantine
  + cross-profile shared token store (PRs #27663 / #19712).
- Add `## Microsoft Entra ID authentication (keyless)` section to
  azure-foundry guide — DefaultAzureCredential, RBAC, OpenAI + Anthropic
  routing details (PR #28101 / #9df9816da).
- Custom providers `api_mode` is now prompted-and-persisted, not just URL
  autodetected (PR #25068).
- Delegation honours `api_mode` + auto-detects anthropic_messages base URLs
  (PR #26824).
- `x_search` auto-enables when xAI credentials are present (PR #27376).
- Add `xAI Grok OAuth (SuperGrok)` row to providers headline table (PR
  #26534).
- NVIDIA NIM billing-origin header is set automatically (PR #26585).

Windows / installer
- `install.ps1`: document `-Commit <sha>` and `-Tag <v>` pin params plus
  the BOM-strip / git-retry hardening (PR #28169).
- Document Hermes Desktop thin installer + first-launch bootstrap (PR
  #27822).
- Document `dep_ensure` Windows bootstrap (PR #27845).
- Document install-method auto-detection (pip / git / homebrew / nixos) and
  the matching update command (PR #27843).

Gateway / messaging
- `/platform list|pause|resume` full description + circuit-breaker
  semantics (PR #26600).
- Slack / Matrix / Mattermost get parallel `allowed_channels` /
  `allowed_rooms` allowlist sections matching Telegram/Discord/DingTalk
  (PR #21251).
- Discord `allow_any_attachment` + `max_attachment_bytes` (config and env
  vars) (PR #27245).
- Discord clarify-choice button rendering (PR #25485).
- Telegram `guest_mode` @mention bypass for allowlisted groups (PR
  #22759).
- Telegram `notifications` mode (`important` vs `all`) (PR #22793).
- `[[as_document]]` skill / response directive for forcing
  document-style media delivery (PR #21210).

CLI / TUI
- `/new [name]` argument (PR #19637).
- `/subgoal` user-supplied criteria appended to `/goal` (PR #25449).
- `/exit --delete` flag confirmation prompts for destructive slash
  commands (PR #22687).
- Status-bar additions: ▶ N background indicator (PR #27175), context
  compression count (PR #21218), YOLO mode banner+statusbar warning (PR
  #26238).
- `display.timestamps` + `docker_extra_args` config keys (PR #23599).
- TUI collapsible startup banner sections (PR #20625).
- `HERMES_SESSION_ID` exported to tool subprocesses (PR #23847).

i18n
- Refresh display.language locale list from 8 → 16 (en, zh, zh-hant, ja,
  de, es, fr, tr, uk, af, ko, it, ga, pt, ru, hu) — matches
  `agent/i18n.py:SUPPORTED_LANGUAGES`.

Tools / features
- `vision_analyze` native-pixel passthrough for vision-capable callers,
  with auxiliary text-describer fallback (PR #22955).
- `session_search` rewrite to the single-shape tool (discovery / scroll /
  browse modes) (PRs #27590 / #27840).
- Clarify MCP transport scope: client supports stdio + SSE; embedded
  `hermes mcp serve` is stdio-only (PR #21227).
- Web search backends table: add Brave Search (free tier) and DDGS rows
  (PR #21337).
- ACP session-scoped edit auto-approval modes (PR #27862).
- Curator rename map in the user-visible per-run summary (PR #22910).
- Prompt caching feature page reference in features/overview.md — Claude
  cross-session 1-hour prefix cache on native Anthropic / OpenRouter /
  Nous Portal (PR #23828).
- Cron per-job profile parameter (PR #28124).
- `--no-skills` flag for `hermes profile create` (PR #20986).

Build
- Verified with `npm run build` in `website/`; both `en` and `zh-Hans`
  locales compile. Remaining broken-link/anchor warnings are pre-existing
  (`rl-training.md` from learning-path / overview; the
  zh-Hans translation lag the docs skill already calls out).

* chore(release): pre-stage AUTHOR_MAP for May 2026 LHF batch group 9 (#28571)

Pre-stages AUTHOR_MAP entries for 9 new/under-mapped contributors whose
PRs are being salvaged in the May 2026 LHF batch group 9.

Contributors:
- jdelmerico (#28278 — signal require_mention filter)
- justemu (#27996 — matrix thread_require_mention)
- YuanHanzhong (#28029 — dashboard browser scrollback)
- noctilust (#28080 — drop stale TUI resume env)
- MoonJuhan (#28288 — tolerate unreadable JSONL transcripts)
- outsourc-e (#28164 — cron emoji ZWJ sequences)
- Zyrixtrex (#28275 — Google OAuth urlopen timeout)
- ooovenenoso (#28256 — tool loop recovery hints)
- vanthinh6886 (#28018 — yaml/flock/atomic write guards; non-noreply email)

Per references/batch-pr-salvage-may14-additions.md.

* feat(signal): add require_mention filter for group chats

Add a configurable mention filter to the Signal adapter so the bot
only responds in groups when it is explicitly @mentioned.

Changes:
- gateway/platforms/signal.py: read require_mention from adapter
  extra config or SIGNAL_REQUIRE_MENTION env var; skip group messages
  that don't mention the bot account (checked in rendered text and
  raw mention metadata)
- gateway/config.py: map signal.require_mention YAML key to the
  SIGNAL_REQUIRE_MENTION env var (env var takes precedence)

Config example:
  signal:
    require_mention: true

Or via env var:
  SIGNAL_REQUIRE_MENTION=true

* Revert "feat(telegram): pin incoming user message for duration of agent turn"

This reverts commit a724c3b9cf5f01e28365322ae5ae3a9579567806.

* Revert "feat(telegram): support quick-command-only menus"

This reverts commit b1acf80e17858e2e5ae7c0d412a3a573d7fcbca4.

* Revert "feat(send_message): auto-detect @username mentions and create Telegram entities"

This reverts commit cf814c96f613b38bd891ac941c32da653e81c7ad.

* Revert "fix(telegram): enforce TELEGRAM_ALLOWED_USERS allowlist on inbound messages"

This reverts commit db50af910be6b4171ea9cf54f4cc38be27ac1da6.

* fix(gateway): pre-mark sessions as resume_pending before drain to prevent data loss (#27856)

Pre-mark all running agent sessions as resume_pending BEFORE the drain
wait begins. If the service manager kills the process during the drain
(window), the durable marker is already written so the next gateway boot
can recover in-flight sessions. On graceful drain completion, clear the
early markers for sessions that finished successfully.

* fix(matrix): implement thread_require_mention to prevent multi-agent reply loops

In multi-agent shared Matrix rooms, multiple bots all participating in the
same thread could trigger infinite reply loops — each bot's reply re-engaged
the others because they were all in the bot-thread set. Discord has a
`thread_require_mention` opt-in for this; Matrix didn't.

Add `_parse_thread_require_mention(config)` (mirrors Discord's pattern).
In `_resolve_message_context`, when enabled and the message is in a
bot-participated thread (not a free-response room), require @mention
before processing.

Salvage of @justemu's 2-commit stack (#27996). Fixes #27995.

* fix(cli): show active profile in TUI prompt

* fix(tui): preserve dunder identifiers in markdown

* test(file_ops): add regression tests for git baseline warning in write_file

Adds TestGitBaselineCheck with 6 unit tests covering _check_git_baseline
and the warning field in write_file result:
- Git not available → None
- Not in a git repo → None
- Clean repo → None
- Dirty repo → returns warning string with branch name
- write_file result includes warning when dirty
- write_file result omits warning when clean

* fix(dashboard): use browser scrollback for chat wheel

* fix(cli): ignore stale HERMES_TUI_RESUME env

HERMES_TUI_RESUME is an internal env var the Python wrapper exports to hand
a session ID off to the Ink TUI. Because _launch_tui started from
os.environ.copy(), any exported/stale value in the user's shell leaked
through — so plain `hermes --tui` would try to resume a missing session
and leave the UI at 'error: session not found' with no live session.

Drop HERMES_TUI_RESUME from the env before conditionally re-setting it
from the argparse-resolved resume_session_id. Tests cover both the drop
path and the set-from-arg path.

Salvage of #28080 by @noctilust.

* fix(cron): allow emoji ZWJ sequences in prompts

* fix: tolerate unreadable gateway JSONL transcripts

* fix(skills): add timeout to Google OAuth urlopen calls

* fix: add recovery hints to loop guard warnings

* fix: guard yaml.safe_load, flock unlock, TOCTOU races, and atomic writes

1. trajectory_compressor.py: yaml.safe_load() returns None on empty
   files, crashing with TypeError on `if 'tokenizer' in data`. Fix by
   adding `or {}` fallback. (HIGH — blocks startup with empty config)

2. 6 files with fcntl.flock(LOCK_UN) in finally blocks without
   try/except: cron/scheduler.py, hermes_cli/auth.py,
   agent/shell_hooks.py, tools/skill_usage.py,
   tools/environments/file_sync.py, tools/memory_tool.py. If unlock
   raises OSError, fd.close() is skipped and the lock is held forever.
   The msvcrt branches already had try/except; the fcntl branches did
   not. Fix by wrapping in try/except (OSError, IOError): pass.

3. agent/copilot_acp_client.py line 639: TOCTOU race — path.exists()
   followed by path.read_text() with no try/except. If file is deleted
   between the check and the read, FileNotFoundError propagates. Fix
   by using try/except FileNotFoundError.

4. gateway/sticker_cache.py: non-atomic write via Path.write_text()
   can leave truncated JSON on crash, causing JSONDecodeError on next
   load. Fix by writing to tempfile + fsync + os.replace (atomic).

* chore(release): alias xxxigm noreply for upcoming #27986 salvage (#28594)

Adds the canonical noreply form (54813621+xxxigm@users.noreply.github.com)
alongside the existing plain-email mapping so the salvage commit for
@xxxigm's codex doctor PR doesn't fail AUTHOR_MAP CI.

* fix(doctor): attach codex CLI hint to OpenAI Codex auth warning for #27975

`hermes doctor` printed 'codex CLI not installed (optional — ...)' as a
generic info line at the bottom of the auth section, several rows below
'OpenAI Codex auth (not logged in)' and after MiniMax/Gemini auth checks.
Users reading sequentially mistook it for MiniMax-related advice.

Move the hint up under the Codex auth warning so it's adjacent to the
row it actually pertains to. Behavior unchanged when the codex CLI is
installed (success path keeps its 'codex CLI ✓' row at the bottom).
Tests cover both placement and suppression cases.

Salvage of @xxxigm's 3-commit stack (#27986).
Closes #27975.

* fix(tests): catch up 25 stale tests after recent merges (#28626)

Sweep of all CI failures on origin/main, grouped by drift source:

Telegram allowlist gate (db50af910 added user-authz to _should_process_message):
- Hardcoded "[Telegram]" prefix in the logger.warning so the call no
  longer dereferences self.name → self.platform, which test fixtures
  built via object.__new__ never set.
- test_telegram_format / test_allowed_channels_widening fixtures stub
  _is_callback_user_authorized → True so the new gate doesn't reject
  guest-mode / allowed-channels test messages.
- test_telegram_approval_buttons::test_update_prompt_callback_not_affected
  sets TELEGRAM_ALLOWED_USERS="*" so the fail-closed default doesn't
  reject the callback before it writes .update_response.

Approval surface (6d495d9e7 renamed status, 214b95392 detached stdin):
- test_no_callback_returns_approval_required: status is now
  "pending_approval" (was "approval_required").
- test_close_stdin_allows_eof_driven_process_to_finish: switch to
  use_pty=True; non-PTY now uses stdin=DEVNULL.

Mattermost (send() now resolves root_id via _api_get first):
- test_send_with_thread_reply mocks _session.get with a thread-root
  response so the new resolver doesn't TypeError on a bare AsyncMock.

Kanban (d8ad431de rename, f55d94a1e review column, _kanban_worker_skill_available):
- _safe_int → _to_epoch in the two test_kanban_db tests.
- Spawn-skills tests (×3) monkey-patch _kanban_worker_skill_available
  to True since the isolated kanban_home fixture has no devops/kanban-worker tree.
- test_gateway_dispatcher_disables_corrupt_board: connect count
  3 → 5 (review-column probe now also runs per tick).

Aux-config severity at_or_above (a94ddd807):
- test_diagnostics_endpoint_severity_filter expects warning filter to
  include error+critical now (was exact-match).

Anthropic error handling (conversation loop extracted from run_agent):
- _no_backoff_wait fixture patches BOTH run_agent.jittered_backoff AND
  agent.conversation_loop.jittered_backoff. The latter is the actual
  call site; without the second patch tests burn ~2s per retry and
  hit the 30s SIGALRM timeout on CI.

Other test pollution / drift:
- test_auto_does_not_select_copilot_from_github_token: patch
  agent.bedrock_adapter.has_aws_credentials → False so boto3's
  credential chain can't auto-pick Bedrock from developer ~/.aws.
- test_setup_openclaw_migration: patch hermes_cli.gateway.get_env_value
  in addition to setup_mod.get_env_value — _platform_status reads
  through the gateway module's binding.
- test_gateway_prefix: COMPONENT_PREFIXES["gateway"] now includes
  "hermes_plugins" too.
- test_recommended_update_command_defaults_to_hermes_update: also
  short-circuit get_managed_update_command in case a stray
  ~/.hermes/.managed marker is present.
- test_user_id_is_not_explicit: _parse_target_ref now returns
  is_explicit=False for Slack U.../W... IDs (chat.postMessage rejects
  them — a DM must be opened first via conversations.open).

* feat(update): syntax-validate critical files post-pull, auto-rollback on failure (#28669)

Catch the PR #28452 failure mode (orphan merge-conflict markers in
hermes_cli/config.py) on the user side: after git pull succeeds, compile
the files every 'hermes' invocation imports at startup. If any has a
syntax error, git reset --hard back to the pre-pull SHA so the install
stays bootable. User can retry once a fix lands upstream.

- New _capture_head_sha() + _validate_critical_files_syntax() helpers
- Wires both into _cmd_update_impl after the pull/reset succeeds
- Tests cover the helpers, the rollback flow, and a production-tree
  invariant (CI fails if main itself has a syntax error in a critical
  file — catches future broken commits before users hit them)

* feat: show names of user-modified skills in bundled skill sync summary

When 'hermes update' syncs bundled skills, the summary line only shows
the count of user-modified skills that were kept (e.g. '3 user-modified
(kept)'), but not *which* skills. Once the update finishes, the user
has no way to know which skills need triage.

Append the skill names to the summary line, truncated to 5 with a
'+N more' suffix for long lists:

  Done: 12 new, 3 updated, 7 unchanged, 3 user-modified (kept):
  hermes-agent, debugging-hermes-tui-commands, system-health.
  25 total bundled.

Closes #28121

* fix(acp): use tempfile.gettempdir() in workspace auto-approve

#28063 fixed the macOS `/tmp`→`/private/tmp` symlink issue by checking
the RAW path (pre-resolve) against startswith('/tmp/'). That works on
Linux + macOS but not on Windows — Path('/tmp/foo').resolve() returns
C:\\tmp\\foo and isn't the real Windows temp anyway.

Replace the hardcoded '/tmp/' prefix with Path(tempfile.gettempdir()).
resolve() + Path.relative_to() — same idiom as the cwd branch just
below. Works correctly on Linux (/tmp), macOS (/private/var/folders/...),
and Windows (%LOCALAPPDATA%\\Temp).

Test rewritten to use tempfile.gettempdir() so the assertion exercises
the same code path on every platform.

Conflict against the just-merged #28063 (raw_path approach) resolved
by replacing the whole raw_path block — tempfile.gettempdir() is
strictly better than that intermediate fix.

Salvage of #28262 by @Zyrixtrex.

* fix(kanban): stale reclaim must not tick failure counter (#28680)

Follow-up to #28452. detect_stale_running() was calling
_record_task_failure() on every reclaim, which ticked the
consecutive_failures counter. With the default failure_limit=2,
two legitimately long-running tasks (>4 h without explicit
heartbeat) would auto-block via the spawn-failure circuit
breaker — even though no worker actually failed.

Stale reclaim is dispatcher-side absence-of-heartbeat detection,
not a worker fault. Removed the _record_task_failure() call;
the 'stale' event in task_events is still the audit surface,
but the failure counter is now reserved for spawn_failed /
timed_out / crashed (real failures).

Also documents the heartbeat requirement:
- KANBAN_GUIDANCE in agent/prompt_builder.py now states the
  rule ('call kanban_heartbeat at least once an hour for tasks
  running longer than 1 hour') so workers learn the contract.
- kanban.md adds the stale event row to the events table and
  flags the heartbeat requirement in the worker lifecycle list.

New regression test: test_detect_stale_does_not_tick_failure_counter
locks in the new behaviour.

* fix(telegram): address post-merge audit follow-ups (#28670, #28672, #28674, #28676, #28678)

Five small fixes against issues filed during the post-merge salvage audit:

* #28670: `_GATEWAY_PROVIDER_ERROR_RE` false-positives on legitimate prose.
  Replace the regex with an anchored `_GATEWAY_PROVIDER_ERROR_SHAPE_RE` and
  add a length-cap heuristic to `_looks_like_gateway_provider_error`:
  short envelope at the start of the message → real provider error; long
  prose containing 'HTTP 404' → assistant answer, leave alone.

* #28672: drop the pointless 1s asyncio.sleep on Telegram thread-not-found
  retries. The same-thread retry is preserved (catches Telegram's
  occasional transient flake exercised by
  test_send_retries_transient_thread_not_found_before_fallback) but with
  no artificial delay.

* #28674: broaden `_should_retry_without_dm_topic_reply_anchor` to also
  fire when Bot API rejects `direct_messages_topic_id` for synthetic /
  resumed sends that have no reply anchor. Avoids dropping post-resume
  background notifications if the topic id goes stale.

* #28676: delete the dead image-document branch superseded by bd0c54d17
  (which returns early on the same extension set).

* #28678: extend chat-scoped allowlist (`TELEGRAM_GROUP_ALLOWED_CHATS`)
  to also cover `chat_type == 'channel'`, so operators can authorize
  channel posts by chat id without falling back to per-user allowlists.

Tests:
- scripts/run_tests.sh tests/gateway/test_telegram_thread_fallback.py -q  → 41/41
- scripts/run_tests.sh tests/cron/test_scheduler.py -q                    → 127/127
- broader test set: same 3 pre-existing test-pollution failures reproduce
  on plain main.

* chore(actions)(deps): bump the actions-minor-patch group across 1 directory with 2 updates

Bumps the actions-minor-patch group with 2 updates in the / directory: [google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml](https://github.com/google/osv-scanner-action) and [sigstore/gh-action-sigstore-python](https://github.com/sigstore/gh-action-sigstore-python).


Updates `google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml` from 2.3.5 to 2.3.8
- [Release notes](https://github.com/google/osv-scanner-action/releases)
- [Commits](https://github.com/google/osv-scanner-action/compare/c51854704019a247608d928f370c98740469d4b5...9a498708959aeaef5ef730655706c5a1df1edbc2)

Updates `sigstore/gh-action-sigstore-python` from 3.0.0 to 3.3.0
- [Release notes](https://github.com/sigstore/gh-action-sigstore-python/releases)
- [Changelog](https://github.com/sigstore/gh-action-sigstore-python/blob/main/CHANGELOG.md)
- [Commits](https://github.com/sigstore/gh-action-sigstore-python/compare/f514d46b907ebcd5bedc05145c03b69c1edd8b46...04cffa1d795717b140764e8b640de88853c92acc)

---
updated-dependencies:
- dependency-name: google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml
  dependency-version: 2.3.8
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-minor-patch
- dependency-name: sigstore/gh-action-sigstore-python
  dependency-version: 3.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions-minor-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(actions)(deps): bump docker/login-action from 3.7.0 to 4.1.0

Bumps [docker/login-action](https://github.com/docker/login-action) from 3.7.0 to 4.1.0.
- [Release notes](https://github.com/docker/login-action/releases)
- [Commits](https://github.com/docker/login-action/compare/c94ce9fb468520275223c153574b00df6fe4bcc9...4907a6ddec9925e35a0a9e82d7399ccc52663121)

---
updated-dependencies:
- dependency-name: docker/login-action
  dependency-version: 4.1.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(actions)(deps): bump docker/build-push-action from 6.19.2 to 7.1.0

Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6.19.2 to 7.1.0.
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](https://github.com/docker/build-push-action/compare/10e90e3645eae34f1e60eeb005ba3a3d33f178e8...bcafcacb16a39f128d818304e6c9c0c18556b85f)

---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-version: 7.1.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(actions)(deps): bump actions/setup-python from 5.3.0 to 6.2.0

Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.3.0 to 6.2.0.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/v5.3.0...a309ff8b426b58ec0e2a45f0f869d46889d02405)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-version: 6.2.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix(kanban): respawn guard defers blocker_auth instead of auto-blocking (#28683)

Follow-up to #28455. The respawn guard's blocker_auth rule (last error
matched a quota/auth/429 pattern) was auto-blocking the task on first
occurrence. That's too aggressive: transient rate limits typically
clear in seconds to minutes, but the auto-block puts the task in
'blocked' status which requires manual unblock.

Now treats blocker_auth the same as recent_success and active_pr:
defer the spawn this tick, leave the task in 'ready', let the next
tick try again. If the auth error genuinely persists, the existing
consecutive_failures counter trips the auto-block circuit breaker
after failure_limit failures via the normal path — so a persistent
401/403/quota-exhausted still ends up blocked, just not on first hit.

Also documents the respawn_guarded event in kanban.md's events table
with the three guard reasons.

Updated test_dispatch_respawn_guard_auto_blocks_auth_error → renamed
to test_dispatch_respawn_guard_defers_auth_error_without_auto_block;
asserts task stays in 'ready' and the guard reason is recorded.

* chore(actions)(deps): bump actions/checkout from 4.3.1 to 6.0.2

Bumps [actions/checkout](https://github.com/actions/checkout) from 4.3.1 to 6.0.2.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/34e114876b0b11c390a56381ad16ebd13914f8d5...de0fac2e4500dabe0009e67214ff5f5447ce83dd)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: 6.0.2
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix(dashboard): add scheduled kanban i18n strings (#28534)

Co-authored-by: Austin Pickett <pickett.austin@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(cli): exit prompt_toolkit cleanly on SIGTERM/SIGHUP instead of raising KeyboardInterrupt (#28688)

The SIGTERM/SIGHUP handler raised KeyboardInterrupt() at the end of its
agent-interrupt + grace-window sequence. Python delivers signals between
bytecodes on the main thread, so when the signal hit mid-event-loop
(typically inside prompt_toolkit's '_poll_output_size' coroutine's
'await asyncio.sleep()'), the KeyboardInterrupt unwound INTO that
coroutine. prompt_toolkit's Task captured it as a BaseException;
prompt_toolkit's '_handle_exception' then printed 'Unhandled exception
in event loop' + the full asyncio traceback and parked the terminal on
'Press ENTER to continue...' before exiting.

Same root cause as #13710, different surface: there the failure was an
EIO cascade after a logging-cache KeyError escaped the handler; here
it's the KBI raise itself landing inside an asyncio Task. The fix is
the same shape — let the event loop unwind on its own terms.

Now: schedule 'app.exit()' via 'loop.call_soon_threadsafe()'. The
prompt_toolkit Application returns normally from 'app.run()' and the
existing '(EOFError, KeyboardInterrupt, BrokenPipeError)' handler in
the input loop catches everything else. Fallback to 'raise
KeyboardInterrupt()' preserved for contexts where prompt_toolkit isn't
the active app (e.g. -q one-shot mode).

The agent interrupt + 1.5 s grace window run unchanged before the new
exit path, so subprocess-group cleanup ('os.killpg' on Linux) still
gets its window.

Tested live: external SIGTERM to the CLI (with 'kill <pid>') now exits
cleanly with no traceback dump and no ENTER pause.

* chore(deps): bump dompurify from 3.3.3 to 3.4.2 in /website

Bumps [dompurify](https://github.com/cure53/DOMPurify) from 3.3.3 to 3.4.2.
- [Release notes](https://github.com/cure53/DOMPurify/releases)
- [Commits](https://github.com/cure53/DOMPurify/compare/3.3.3...3.4.2)

---
updated-dependencies:
- dependency-name: dompurify
  dependency-version: 3.4.2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump follow-redirects from 1.15.11 to 1.16.0 in /website

Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.15.11 to 1.16.0.
- [Release notes](https://github.com/follow-redirects/follow-redirects/releases)
- [Commits](https://github.com/follow-redirects/follow-redirects/compare/v1.15.11...v1.16.0)

---
updated-dependencies:
- dependency-name: follow-redirects
  dependency-version: 1.16.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump lodash from 4.17.23 to 4.18.1 in /website

Bumps [lodash](https://github.com/lodash/lodash) from 4.17.23 to 4.18.1.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/compare/4.17.23...4.18.1)

---
updated-dependencies:
- dependency-name: lodash
  dependency-version: 4.18.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump lodash-es and langium in /website

Bumps [lodash-es](https://github.com/lodash/lodash) and [langium](https://github.com/eclipse-langium/langium/tree/HEAD/packages/langium). These dependencies needed to be updated together.

Updates `lodash-es` from 4.17.23 to 4.18.1
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/compare/4.17.23...4.18.1)

Updates `langium` from 4.2.1 to 4.2.3
- [Release notes](https://github.com/eclipse-langium/langium/releases)
- [Changelog](https://github.com/eclipse-langium/langium/blob/main/packages/langium/CHANGELOG.md)
- [Commits](https://github.com/eclipse-langium/langium/commits/HEAD/packages/langium)

---
updated-dependencies:
- dependency-name: lodash-es
  dependency-version: 4.18.1
  dependency-type: indirect
- dependency-name: langium
  dependency-version: 4.2.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump python-multipart from 0.0.22 to 0.0.27

Bumps [python-multipart](https://github.com/Kludex/python-multipart) from 0.0.22 to 0.0.27.
- [Release notes](https://github.com/Kludex/python-multipart/releases)
- [Changelog](https://github.com/Kludex/python-multipart/blob/main/CHANGELOG.md)
- [Commits](https://github.com/Kludex/python-multipart/compare/0.0.22...0.0.27)

---
updated-dependencies:
- dependency-name: python-multipart
  dependency-version: 0.0.27
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump python-dotenv from 1.2.1 to 1.2.2

Bumps [python-dotenv](https://github.com/theskumar/python-dotenv) from 1.2.1 to 1.2.2.
- [Release notes](https://github.com/theskumar/python-dotenv/releases)
- [Changelog](https://github.com/theskumar/python-dotenv/blob/main/CHANGELOG.md)
- [Commits](https://github.com/theskumar/python-dotenv/compare/v1.2.1...v1.2.2)

---
updated-dependencies:
- dependency-name: python-dotenv
  dependency-version: 1.2.2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump fast-uri from 3.1.0 to 3.1.2 in /website

Bumps [fast-uri](https://github.com/fastify/fast-uri) from 3.1.0 to 3.1.2.
- [Release notes](https://github.com/fastify/fast-uri/releases)
- [Commits](https://github.com/fastify/fast-uri/compare/v3.1.0...v3.1.2)

---
updated-dependencies:
- dependency-name: fast-uri
  dependency-version: 3.1.2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump @babel/plugin-transform-modules-systemjs in /website

Bumps [@babel/plugin-transform-modules-systemjs](https://github.com/babel/babel/tree/HEAD/packages/babel-plugin-transform-modules-systemjs) from 7.29.0 to 7.29.4.
- [Release notes](https://github.com/babel/babel/releases)
- [Changelog](https://github.com/babel/babel/blob/main/CHANGELOG.md)
- [Commits](https://github.com/babel/babel/commits/v7.29.4/packages/babel-plugin-transform-modules-systemjs)

---
updated-dependencies:
- dependency-name: "@babel/plugin-transform-modules-systemjs"
  dependency-version: 7.29.4
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix(web): consume bundled design system assets (#26391)

* fix: update design system package, replace bg image, remove sync assets

* fix(web): update bundled asset metadata

* fix(web): normalize npm lockfile metadata

* fix(nix): refresh npm lockfile hashes

* chore(ci): trigger PR checks

* fix(web): declare motion peer dependency

* fix(nix): refresh npm lockfile hashes

* chore(ci): trigger PR checks after dependency update

* fix(web): restore cross-platform lockfile entries

* fix(nix): refresh npm lockfile hashes

* chore(ci): trigger PR checks after lockfile restore

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* chore(deps): bump webpack-dev-server from 5.2.3 to 5.2.4 in /website (#28104)

Bumps [webpack-dev-server](https://github.com/webpack/webpack-dev-server) from 5.2.3 to 5.2.4.
- [Release notes](https://github.com/webpack/webpack-dev-server/releases)
- [Changelog](https://github.com/webpack/webpack-dev-server/blob/main/CHANGELOG.md)
- [Commits](https://github.com/webpack/webpack-dev-server/compare/v5.2.3...v5.2.4)

---
updated-dependencies:
- dependency-name: webpack-dev-server
  dependency-version: 5.2.4
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump ws in /ui-tui/packages/hermes-ink (#28183)

Bumps [ws](https://github.com/websockets/ws) from 8.20.0 to 8.20.1.
- [Release notes](https://github.com/websockets/ws/releases)
- [Commits](https://github.com/websockets/ws/compare/8.20.0...8.20.1)

---
updated-dependencies:
- dependency-name: ws
  dependency-version: 8.20.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump mermaid from 11.13.0 to 11.15.0 in /website (#24011)

Bumps [mermaid](https://github.com/mermaid-js/mermaid) from 11.13.0 to 11.15.0.
- [Release notes](https://github.com/mermaid-js/mermaid/releases)
- [Commits](https://github.com/mermaid-js/mermaid/compare/mermaid@11.13.0...mermaid@11.15.0)

---
updated-dependencies:
- dependency-name: mermaid
  dependency-version: 11.15.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix(kanban): migrate task session index after columns

* fix(kanban): also hoist idx_events_run + drop redundant inner create

Extends the previous commit to cover the remaining additive-column index
that sits on the same migration trap:

- ``task_events.run_id`` -> ``idx_events_run`` was still in SCHEMA_SQL.
  A legacy ``task_events`` table predating #17805 (no ``run_id``) would
  still abort ``executescript`` before ``_migrate_add_optional_columns``
  could add the column. Hoisted out of SCHEMA_SQL and made unconditional
  in the migration alongside the other three indexes.

- Removed the now-redundant ``CREATE INDEX idx_tasks_idempotency`` that
  was nested inside the ``if "idempotency_key" not in cols`` branch.
  The unconditional create lower in the function makes it idempotent
  on both fresh and legacy DBs.

- Strengthened the regression test to cover all four indexes
  (``idx_tasks_session_id``, ``idx_tasks_tenant``, ``idx_tasks_idempotency``,
  ``idx_events_run``) and to seed a pre-#17805 ``task_events`` shape that
  exercises the ``run_id`` migration path.

The result: every ``CREATE INDEX`` that depends on an additive column now
runs after the migration ensures the column exists. Verified against a
realistic pre-#16081 board fixture (tasks + task_events both legacy
shape) — origin/main reproduces ``no such column: session_id``; this
branch migrates cleanly and creates all four indexes.

* fix(discord): define view classes after lazy discord.py install

When discord.py is not installed at import time, DISCORD_AVAILABLE=False
and the view class definitions at module bottom are skipped.
check_discord_requirements() performs a lazy install and sets
DISCORD_AVAILABLE=True but never re-ran the class definitions, causing
NameError on the first button interaction (exec approval, slash confirm, etc.).

Extract the five ui.View subclasses into _define_discord_view_classes() and
call it both at module load (when discord.py is pre-installed) and inside
check_discord_requirements() after a successful lazy install.

* Merge pull request #28829 from NousResearch/bb/tui-no-history-truncation

fix(tui): render full assistant text in scrollback (no history truncation)

* chore: add erikengervall to AUTHOR_MAP (#28855)

For PR #28774 (firecrawl integration tag).

Co-authored-by: alt-glitch <balyan.sid@gmail.com>

* feat(firecrawl): add integration tag for Hermes usage in browser and web providers

* fix(model-switch): mark bare custom provider as current

* Revert "feat(firecrawl): add integration tag for Hermes usage in browser and web providers" (#28862)

This reverts commit 273ff5c4a47af4499bbe5e3b1139efd313995554.

* fix(update): quarantine hermes.exe vs concurrent Windows instance (#26670) (#26677)

* fix(update): detect concurrent hermes.exe on Windows; retry + restart-defer quarantine

Closes #26670.

When 'hermes update' runs on Windows with another hermes.exe alive (most
commonly the Hermes Desktop Electron app's spawned backend) _quarantine_running_hermes_exe()
fails to rename the venv shim with [WinError 32]. uv pip install -e .
then exits 2, the git-pull fast path is silently abandoned, and the ZIP
fallback runs (and fails the same way) before eventually succeeding.

This change implements three of the five proposed fixes from the issue:

1. Concurrent-instance detection (preferred fix). _detect_concurrent_hermes_instances()
   uses psutil to enumerate processes whose .exe is one of our venv shims
   (hermes.exe / hermes-gateway.exe), excluding the caller's PID. When any
   match exists, cmd_update prints an actionable message naming the
   blocking PIDs and exits 2 BEFORE any destructive work. New --force flag
   bypasses the gate.

2. Retry + restart-deferred fallback. _quarantine_running_hermes_exe()
   now retries the rename up to 4 times with 100/250/500/1000 ms backoff
   (covers the transient AV-scanner-handle case). If all retries fail,
   it schedules the replacement via MoveFileExW with the OS deferred-rename
   flag so the new shim can land at the original path and the update
   completes; the old image is fully unloaded after the user's next
   system restart.

3. Actionable warning text. The old 'Could not quarantine: [WinError 32]'
   warning is replaced with one that names the likely culprits (Hermes
   Desktop, REPLs, gateway, AV) and points to the new --force flag.

Tests:
- 13 new tests in tests/hermes_cli/test_update_concurrent_quarantine.py
  covering: psutil-based enumeration, self-pid exclusion, case-insensitive
  matching of .EXE, no-psutil graceful degradation, off-Windows no-op,
  helpful warning formatting, retry-then-succeed, restart-deferred fallback,
  cmd_update abort + exit code 2, and --force bypass.
- New autouse fixture in tests/hermes_cli/conftest.py defaults
  _detect_concurrent_hermes_instances to [] so the rest of the suite
  isn't tripped by the developer's own running hermes.exe. Opt-out marker
  'real_concurrent_gate' registered in pyproject.toml.
- Updating docs page (website/docs/getting-started/updating.md) gains a
  short section explaining the new Windows error and remediation.

* chore: refresh uv.lock to match pyproject.toml exact pins

aiohttp 3.13.4 -> 3.13.3 (matches pyproject pin: aiohttp==3.13.3)
anthropic 0.87.0 -> 0.86.0 (matches pyproject pin: anthropic==0.86.0)
hermes-agent 0.13.0 -> 0.14.0 (matches pyproject version)

CI's uv lock --check was failing on the merged state because main
drifted: pyproject.toml uses exact == pins for those two deps and the
hermes-agent version was bumped to 0.14.0 but the lockfile still had
0.13.0.

* fix(windows): hide cron script subprocess consoles

Apply CREATE_NO_WINDOW flags when the cron scheduler launches job scripts on Windows so gateway-managed no-agent cron jobs do not flash cmd or python console windows every tick.

* fix(windows): hide local subprocess consoles

Apply Windows CREATE_NO_WINDOW flags to foreground local terminal subprocesses and tracked background processes so Hermes operations do not flash or steal focus with extra console windows.

* fix(gateway): harden Windows gateway install lifecycle

Preserve Windows profile install decisions across UAC handoff, avoid visible console windows by launching via pythonw, make repeated install/start idempotent, recreate stale Scheduled Tasks, and separate start-now from login auto-start behavior. Add Windows gateway regression coverage and systemd setup tests for the shared install flow.

* test(gateway-windows): make ctypes.windll monkeypatch tolerant on non-Windows

Linux/macOS CI runners don't have ctypes.windll, so the elevated-gateway
test fails at module load. Adding raising=False lets monkeypatch install
the mock attribute without first requiring it to exist.

* fix(agent): set tool_name on tool-result messages at construction time

Introduces make_tool_result_message() in tool_dispatch_helpers.py as the
single place where tool-result message dicts are built. All six construction
sites in tool_executor.py, agent_runtime_helpers.py, and mini_swe_runner.py
now use it, so tool_name is set in memory from the moment a message is
created rather than relying on fallback logic in the flush paths.

Fixes blank tool_name in both state.db and JSON session logs.

Adds tests.

* fix(tui): termux-gate scrollback preservation, touch-friendly defaults

Adds a Termux runtime detection helper and gates three TUI defaults on it:

- Skip the startup scrollback clear on Termux so users can review/copy
  earlier output after reopening the app. Desktop keeps the existing
  \x1b[2J\x1b[H\x1b[3J slate (AlternateScreen takes over there anyway).
- Default INLINE_MODE on under Termux: primary-buffer rendering makes
  long-thread review and copy/paste much less fragile when users
  background/foreground the app. Override with HERMES_TUI_INLINE=0/1.
- Default mouse tracking off under Termux so touch selection isn't
  intercepted by terminal mouse protocols. Explicit override via
  HERMES_TUI_MOUSE_TRACKING=0/1; legacy HERMES_TUI_DISABLE_MOUSE still
  works on desktop.

Detection is purely env-based (TERMUX_VERSION or PREFIX path) with an
explicit opt-out HERMES_TUI_TERMUX_MODE=0 for debugging. Non-Termux
platforms keep every existing default.

Co-authored-by: adybag14-cyber <252811164+adybag14-cyber@users.noreply.github.com>

* feat: add BrowseShSource adapter for browse.sh skills catalog

Adds BrowseShSource — a new skill source adapter that integrates
Browserbase's browse.sh catalog (169+ site-specific SKILL.md files)
into the Hermes Skills Hub.

- BrowseShSource class in tools/skills_hub.py implementing SkillSource ABC
- Fetches browse.sh catalog API with 1h TTL cache
- Full-text search across name, title, description, hostname, category, tags
- fetch() downloads SKILL.md via sourceUrl (GitHub HTML -> raw URL conversion)
- Registered in create_source_router() after LobeHubSource
- Tests in tests/tools/test_skills_hub_browse_sh.py (7 tests, all passing)

* fix: register browse-sh in per-source limits and --source choices

- Add 'browse-sh' to _PER_SOURCE_LIMIT in both do_browse() and
  browse_skills() with limit=500 (covers full 171-skill catalog)
- Add 'browse-sh' to --source argparse choices for both
  'hermes skills browse' and 'hermes skills search'

Without these, browse-sh fell back to the default cap of 50 results
and was not filterable via --source.

* fix(browse-sh): fetch SKILL.md via /api/skills/{slug}+skillMdUrl

The catalog's sourceUrl points at github.com/browserbase/browse.sh,
whose underlying repository is not always public — most raw URLs derived
from it 404. Use the per-skill detail endpoint instead, which returns a
skillMdUrl CDN blob that reliably resolves to the SKILL.md text. Fall
back to a raw.githubusercontent.com sourceUrl if the detail call fails.

- tools/skills_hub.py: rewrite BrowseShSource.fetch() to resolve via
  /api/skills/{slug} -> skillMdUrl; drop the unreachable _to_raw_url
  helper; expose the resolved URL in bundle.metadata.skill_md_url.
- tests/tools/test_skills_hub_browse_sh.py: match the real catalog
  shape (name = task name, slug = host/task-id), exercise the
  detail-endpoint -> blob two-call flow, and add a fallback test.
- scripts/release.py: map kylejeong21@gmail.com -> Kylejeong2.

* docs(skills): document browse.sh source (#28939)

Add browse.sh (browse-sh) to the supported-sources table and
integrated-hubs section in user-guide/features/skills.md, and to the
--source notes in reference/cli-commands.md. Companion to the
BrowseShSource adapter merged in #28936.

* fix(cli): preserve setup config picker writes

Resync the setup wizard's in-memory config after the shared model picker writes to disk so the wizard's final save does not overwrite auxiliary choices or other provider updates.\n\nAdds a regression test for auxiliary task choices saved by the picker.

* fix(runtime): treat 'ollama'/'vllm'/'llamacpp' aliases like 'custom' for base_url trust (#27132)

When config.yaml has provider: ollama (or vllm/llamacpp/llama-cpp) with a
non-loopback base_url, auth.py's resolve_provider() correctly normalises
the alias to 'custom' at the top level, but two sites in runtime_provider.py
were still comparing the *original* string against the literal 'custom':

  - _config_base_url_trustworthy_for_bare_custom() rejected non-loopback
    URLs because cfg_provider_norm was 'ollama', not 'custom'.
  - _resolve_openrouter_runtime() only entered the trust branch when
    requested_norm == 'custom'.

Both sites now consult resolve_provider() and treat any alias that
resolves to 'custom' identically. Result: provider: ollama + LAN IP no
longer silently falls through to OpenRouter (HTTP 401), matching the
behaviour of provider: custom with the same base_url.

E2E verified across 6 cases (ollama/vllm/llamacpp/custom + LAN; ollama +
loopback; openrouter + cloud) — all route to the configured endpoint;
'frobnicate' + LAN still rejects with AuthError as before.

Also adds scripts/release.py AUTHOR_MAP entry for @stepanov1975
(PR #22074 — wizard config picker preservation, cherry-picked into the
preceding commit).

* perf(cli): defer openai._base_client import via sys.meta_path finder (#28864)

`cli.py` was eager-importing `openai._base_client` at module-load time
purely to monkeypatch `AsyncHttpxClientWrapper.__del__` (defense against
"Press ENTER to continue..." errors when AsyncOpenAI clients are GC'd
against dead event loops). That import cost ~166ms / ~30MB on every
cold CLI start because openai's type tree (responses/*, graders/*) is huge.

Replace with a `sys.meta_path` finder that intercepts the first import
of `openai._base_client` from anywhere in the codebase, lets the normal
load run, then applies the `__del__ = lambda self: None` patch before
control returns to the caller. Same correctness guarantee (patch
applies before any AsyncOpenAI instance can be constructed), zero cost
until the SDK is actually needed.

Hot path: every hermes chat / gateway boot / cron tick / subagent spawn.

A/B benchmark, 10 runs each, fresh subprocess:
                     BEFORE  AFTER   delta
  import cli wall    0.86s   0.62s   -28% (median)
  import cli wall    0.85s   0.59s   -31% (min)
  import cli RSS     91.2MB  74.0MB  -19% (median)

The `neuter_async_httpx_del` function in agent/auxiliary_client.py is
unchanged; its tests still pass and any future callers can still invoke
it directly.

Verified:
- import cli no longer pulls openai into sys.modules
- first 'from openai._base_client import AsyncHttpxClientWrapper'
  triggers the patch; __del__.__name__ == '<lambda>'
- tests/run_agent/test_async_httpx_del_neuter.py: 9/9 pass
- tests/agent/test_auxiliary_client.py: 159/159 pass
- tests/cli/: 715/715 pass

* perf(agent-loop): cut 47% of per-conversation function calls via 3 targeted hot-path optimizations (#28866)

* perf(config): add load_config_readonly() fast path for hot agent loop

`load_config()` is called from the agent loop's per-API-call hot path via
`get_provider_request_timeout()` and `get_provider_stale_timeout()` —
both invoked once per turn from `_resolved_api_call_timeout()` in
run_agent.py.

Profiling a synthetic 20-tool-call agent run revealed:
- 21 invocations of `load_config()` cumulating 56ms (~17% of agent loop)
- 34,398 deepcopy calls totaling 37ms (config defensive deepcopy + chain)
- 8,652 `_expand_env_vars` invocations (~412 per turn)

Microbench (cache-hit, real config.yaml present):
  load_config()          265us/call  (125us deepcopy + 140us infra)
  load_config_readonly() 138us/call  (~48% faster)

`load_config_readonly()` returns the cached dict directly without the
defensive deepcopy. Documented contract: caller must not mutate. Returns
plain dict (not MappingProxyType) so downstream `isinstance(x, dict)`
guards keep working — caught during initial implementation when
MappingProxyType broke get_provider_request_timeout's guard logic.

Wired into hermes_cli/timeouts.py (the two functions called per agent
turn). load_config() is unchanged for the 263 other call sites that
mutate the result before save_config(), are not in the hot path, or
where the safety guarantee matters more than the perf.

Profile A/B (cached config, 21-turn agent loop):
                                BEFORE  AFTER   delta
  get_provider_request_timeout  55ms    16ms    -71%
  total function calls          399k    160k    -60%
  deepcopy calls (in hotspots)  34,398  ~0      ~elim

Verified:
- isinstance(load_config_readonly(), dict) is True
- timeout/stale resolutions correct
- load_config() still returns isolated mutable deepcopies
- tests/hermes_cli/test_config*.py / test_timeouts.py: 102/102 pass
- tests/cli/ + tests/agent/test_auxiliary_client.py: 883/883 pass

* perf(redact): substring pre-screens skip non-matching regex chains

Every log record passes through `RedactingFormatter.format` which calls
`redact_sensitive_text`, which historically ran ALL 13 secret-pattern
regexes against every line — including DB connection strings, JWTs,
Discord mentions, Signal phone numbers, etc. — even for typical clean
log records like 'INFO run_agent: API call completed'.

Add cheap substring pre-checks before each regex pass. False positives
still run the regex (which then matches nothing); false negatives are
impossible because every pattern requires the gated substring to match
its leading anchor:

- `_PREFIX_RE`        gated on any of 33 known credential prefix substrings
- `_ENV_ASSIGN_RE`    gated on `=` in text
- `_JSON_FIELD_RE`    gated on `:` and `"` in text
- `_AUTH_HEADER_RE`   gated on `uthorization`/`UTHORIZATION` in text
- `_TELEGRAM_RE`      gated on `:` in text
- `_PRIVATE_KEY_RE`   gated on `BEGIN` and `-----`
- `_DB_CONNSTR_RE`    gated on `://` in text
- `_JWT_RE`           gated on `eyJ` in text
- URL userinfo/query  gated on `://`
- `_redact_form_body` gated on `&` and `=`
- `_DISCORD_MENTION_RE` gated on `<@`
- `_SIGNAL_PHONE_RE`  gated on `+`

Microbench (5 typical log records, 20k iterations each):
                              BEFORE  AFTER  delta
  redact_sensitive_text per call  5.63us  1.79us  -68%

Real-world impact: ~244 log records emitted in a 30-turn agent loop, so
the chain saves ~1ms of CPU per conversation. Bigger win is the
reduction in regex execution and GC pressure during heavy logging
sessions (verbose logging, gateway message processing).

Security regression test: 30 secret-containing inputs (sk-/ghp_/JWT/DB
connstr/Auth-Bearer/private key/URL userinfo/Discord/Signal/etc.)
verified to produce identical redacted output before/after. All 75
existing tests/agent/test_redact.py cases pass.

The `?access_token=foo&code=bar` (bare query string, no scheme) case
that 'leaks' is pre-existing behavior — the URL query redaction
requires a well-formed URL with scheme+host. Not a regression.

* perf(run_agent): cache _needs_thinking_reasoning_pad result per (provider, model, base_url)

Profile of a 31-turn synthetic agent run shows `_needs_thinking_reasoning_pad`
fires 495 times (~16 per turn) and each call ran 3 helper methods, each
hitting `base_url_host_matches` 1-4 times via `urlparse`. Total cost:
3,342 base_url_host_matches calls + 3,373 urlparse calls accounting for
~36ms of agent-loop overhead (~7% of the entire post-network work).

Provider / model / base_url don't change during a conversation except via
`switch_model` and fallback activation — both of which already overwrite
those attributes atomically. Cache the result on a tuple key; since the
key is derived from the very fields that would change, the cache
auto-invalidates on the next read after a switch. No manual invalidation
needed in switch_model / _try_activate_fallback.

Profile A/B (31-turn cached-config agent run):
                                      BEFORE  AFTER  delta
  _needs_thinking_reasoning_pad cum    18ms    1ms    -94%
  _copy_reasoning_content_for_api cum  17ms    1ms    -94%
  base_url_host_matches calls          3,342   372    -89%
  urlparse calls                       3,373   403    -88%
  total function calls                 296k    223k   -25%

Verified:
- tests/run_agent/test_deepseek_reasoning_content_echo.py: 36/36 pass
- tests/run_agent/ (full): 1383/1383 pass + 3 skipped

* chore(deps): bump ws from 8.20.0 to 8.20.1 in /ui-tui

Bumps [ws](https://github.com/websockets/ws) from 8.20.0 to 8.20.1.
- [Release notes](https://github.com/websockets/ws/releases)
- [Commits](https://github.com/websockets/ws/compare/8.20.0...8.20.1)

---
updated-dependencies:
- dependency-name: ws
  dependency-version: 8.20.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix(install.ps1): pin PortableGit instead of hitting rate-limited GitHub API (#28943)

The Windows installer fetched the latest git-for-windows release via
api.github.com/repos/git-for-windows/git/releases/latest, which is
rate-limited to 60 requests/hour/IP for unauthenticated callers. Users
behind CGNAT, corporate NAT, dorm WiFi, or shared ISP routinely hit the
limit, and the installer aborts asking them to install Git manually.

Switch to a pinned release tag (v2.54.0.windows.1) and a static
github.com/.../releases/download/<tag>/<asset> URL. Static download
URLs are served by GitHub's blob storage and are not subject to the
API rate limit.

Trade-offs:
- We have to bump the pin when we want a newer Git for Windows. The
  installer doesn't depend on Git features beyond 'works', so this is
  a once-a-year maintenance cost at most.
- Loses the (cosmetic) MB size display, since we no longer have asset
  metadata. Replaced with the version string in the 'Downloading ...'
  line instead.

* fix(model): match custom provider by active base url

* 🐛 fix(cli): handle no-remote worktree cleanup

* 🐛 fix(cli): handle missing remote tracking refs

* fix(xai-oauth): pin inference base_url to x.ai origin (#28952)

XAI_BASE_URL / HERMES_XAI_BASE_URL let users repoint the OAuth-authenticated
inference endpoint, but the env override was an unguarded credential-leak
vector: a tampered .env or hostile shell init setting
XAI_BASE_URL=https://attacker.example/v1 would silently ship the SuperGrok
OAuth bearer to a third party on every request.

Add _xai_validate_inference_base_url() that pins the host to x.ai or a
*.x.ai subdomain and rejects non-HTTPS. On rejection, fall back to the
default with a warning rather than raise — a bad env var should not
deadlock auth, but should never leak the bearer either.

Apply at all three sites that read the env override for xai-oauth:
- hermes_cli/auth.py resolve_xai_oauth_runtime_credentials (main path)
- hermes_cli/auth.py _xai_oauth_loopback_login (initial login)
- agent/auxiliary_client.py _resolve_xai_oauth_for_aux (aux client)

E2E validated against four scenarios: attacker.example, lookalike
api.x.ai.evil.com, http:// downgrade on api.x.ai, and legit custom.x.ai
subdomain (which still resolves correctly).

Discovered while comparing against the opencode-grok-auth plugin
(github.com/ysnock404/opencode-grok-auth), which highlighted the same
guard on the OpenCode side.

* fix(kanban): worker-initiated block must not be auto-promoted (#28712)

When a worker calls ``kanban_block(reason="review-required: ...")`` to
hand a task off for human review, the dispatcher's ``recompute_ready``
was treating the resulting ``blocked`` status as eligible for
auto-promotion — exactly the same as a circuit-breaker block.  On the
next tick the task flipped back to ``ready``, a fresh worker spawned,
found nothing to do (work already applied, review-required comment
already posted), exited cleanly, got recorded as ``protocol_violation``
→ ``gave_up`` → ``blocked``, and the dispatcher promoted again.
Infinite loop until manual ``hermes kanban reclaim`` + ``kanban block``.

Add ``_has_sticky_block`` which distinguishes the two block sources
using the cheapest available signal: the most recent
``"blocked"``/``"unblocked"`` event in ``task_events``.

* Worker / operator ``kanban_block`` emits ``"blocked"`` →
  ``_has_sticky_block`` returns True → ``recompute_ready`` skips the
  task entirely.  ``unblock_task`` emits ``"unblocked"`` which flips
  the predicate back, so the only legitimate exit is the documented
  human-in-the-loop path.
* Circuit-breaker ``_record_task_failure`` emits ``"gave_up"`` (not
  ``"blocked"``) → predicate stays False → original
  parent-completion-recovery semantics from #40c1decb3 are preserved.
* Tasks blocked purely by direct DB manipulation also recover, since
  they have no ``"blocked"`` event row at all — matches the existing
  ``test_recompute_ready_promotes_blocked_with_done_parents`` fixture
  behaviour.

* test(kanban): cover sticky blocks for worker-initiated kanban_block (#28712)

Six regression tests pinning the dispatcher contract that was broken
in #28712:

* test_worker_block_is_not_auto_promoted_by_recompute_ready —
  kanban_block survives five back-to-back ticks (compressed dispatcher
  loop).
* test_worker_block_on_child_with_done_parents_is_still_sticky —
  the parent-completion code path was the worst false-positive; even
  when every parent is done, an explicit worker block stays blocked.
* test_circuit_breaker_block_still_auto_promotes — preserves the
  pre-#28712 recovery semantics for circuit-breaker blocks (direct
  UPDATE + no "blocked" event).
* test_gave_up_event_alone_does_not_make_block_sticky — explicit
  guard so the gave_up event is never accidentally treated as
  sticky; covers the second leg of the protocol_violation loop.
* test_unblock_clears_sticky_state_and_lets_block_recover — only
  unblock_task resolves the sticky state; subsequent circuit-breaker
  blocks recover normally.
* test_protocol_violation_loop_is_broken — full bug-shaped
  reproduction: block → tick → (would-be) crash + gave_up → next tick
  still blocked.  Without the fix this would loop indefinitely.

The seventh test from the original PR (legacy-DB init recovery) was
dropped during salvage — the schema-init half of #28712 is already
fixed on main by #28754 and #28781, and the contract is covered by
test_kanban_db.py::test_connect_migrates_legacy_db_before_optional_column_indexes.

* fix(discord): transcribe n…
Mucky010 pushed a commit to Mucky010/hermes-agent that referenced this pull request May 24, 2026
…earch#28497)

Catch the website docs up to two weeks of merged work (May 4 – May 18, 2026,
roughly 1,080 PRs). The audit found ~50 user-visible features that had landed
in code with no docs footprint, plus a handful of stale pages. This PR closes
every gap the scan turned up.

New pages
- user-guide/features/deliverable-mode.md — extension list, agent triggers,
  kanban_complete artifacts pattern, [[as_document]] override (PR NousResearch#27813).
- developer-guide/web-search-provider-plugin.md — authoring guide modeled on
  image-gen-provider-plugin, covering brave_free / ddgs / etc. (PR NousResearch#25448).

Providers / auth
- Rename "Alibaba Cloud" → "Qwen Cloud (Alibaba DashScope)" everywhere the
  display label shows up; provider id stays `alibaba` (PR NousResearch#24835).
- Document OAuth refresh-token quarantine for xAI / MiniMax / Codex (PRs
  NousResearch#28116 / NousResearch#28118 / NousResearch#28119).
- Document Nous JWT minting from refresh token + invalid-refresh quarantine
  + cross-profile shared token store (PRs NousResearch#27663 / NousResearch#19712).
- Add `## Microsoft Entra ID authentication (keyless)` section to
  azure-foundry guide — DefaultAzureCredential, RBAC, OpenAI + Anthropic
  routing details (PR NousResearch#28101 / #9df9816da).
- Custom providers `api_mode` is now prompted-and-persisted, not just URL
  autodetected (PR NousResearch#25068).
- Delegation honours `api_mode` + auto-detects anthropic_messages base URLs
  (PR NousResearch#26824).
- `x_search` auto-enables when xAI credentials are present (PR NousResearch#27376).
- Add `xAI Grok OAuth (SuperGrok)` row to providers headline table (PR
  NousResearch#26534).
- NVIDIA NIM billing-origin header is set automatically (PR NousResearch#26585).

Windows / installer
- `install.ps1`: document `-Commit <sha>` and `-Tag <v>` pin params plus
  the BOM-strip / git-retry hardening (PR NousResearch#28169).
- Document Hermes Desktop thin installer + first-launch bootstrap (PR
  NousResearch#27822).
- Document `dep_ensure` Windows bootstrap (PR NousResearch#27845).
- Document install-method auto-detection (pip / git / homebrew / nixos) and
  the matching update command (PR NousResearch#27843).

Gateway / messaging
- `/platform list|pause|resume` full description + circuit-breaker
  semantics (PR NousResearch#26600).
- Slack / Matrix / Mattermost get parallel `allowed_channels` /
  `allowed_rooms` allowlist sections matching Telegram/Discord/DingTalk
  (PR NousResearch#21251).
- Discord `allow_any_attachment` + `max_attachment_bytes` (config and env
  vars) (PR NousResearch#27245).
- Discord clarify-choice button rendering (PR NousResearch#25485).
- Telegram `guest_mode` @mention bypass for allowlisted groups (PR
  NousResearch#22759).
- Telegram `notifications` mode (`important` vs `all`) (PR NousResearch#22793).
- `[[as_document]]` skill / response directive for forcing
  document-style media delivery (PR NousResearch#21210).

CLI / TUI
- `/new [name]` argument (PR NousResearch#19637).
- `/subgoal` user-supplied criteria appended to `/goal` (PR NousResearch#25449).
- `/exit --delete` flag confirmation prompts for destructive slash
  commands (PR NousResearch#22687).
- Status-bar additions: ▶ N background indicator (PR NousResearch#27175), context
  compression count (PR NousResearch#21218), YOLO mode banner+statusbar warning (PR
  NousResearch#26238).
- `display.timestamps` + `docker_extra_args` config keys (PR NousResearch#23599).
- TUI collapsible startup banner sections (PR NousResearch#20625).
- `HERMES_SESSION_ID` exported to tool subprocesses (PR NousResearch#23847).

i18n
- Refresh display.language locale list from 8 → 16 (en, zh, zh-hant, ja,
  de, es, fr, tr, uk, af, ko, it, ga, pt, ru, hu) — matches
  `agent/i18n.py:SUPPORTED_LANGUAGES`.

Tools / features
- `vision_analyze` native-pixel passthrough for vision-capable callers,
  with auxiliary text-describer fallback (PR NousResearch#22955).
- `session_search` rewrite to the single-shape tool (discovery / scroll /
  browse modes) (PRs NousResearch#27590 / NousResearch#27840).
- Clarify MCP transport scope: client supports stdio + SSE; embedded
  `hermes mcp serve` is stdio-only (PR NousResearch#21227).
- Web search backends table: add Brave Search (free tier) and DDGS rows
  (PR NousResearch#21337).
- ACP session-scoped edit auto-approval modes (PR NousResearch#27862).
- Curator rename map in the user-visible per-run summary (PR NousResearch#22910).
- Prompt caching feature page reference in features/overview.md — Claude
  cross-session 1-hour prefix cache on native Anthropic / OpenRouter /
  Nous Portal (PR NousResearch#23828).
- Cron per-job profile parameter (PR NousResearch#28124).
- `--no-skills` flag for `hermes profile create` (PR NousResearch#20986).

Build
- Verified with `npm run build` in `website/`; both `en` and `zh-Hans`
  locales compile. Remaining broken-link/anchor warnings are pre-existing
  (`rl-training.md` from learning-path / overview; the
  zh-Hans translation lag the docs skill already calls out).
Bryce-huang pushed a commit to wbkunlun/hermes-agent that referenced this pull request May 29, 2026
…earch#28497)

Catch the website docs up to two weeks of merged work (May 4 – May 18, 2026,
roughly 1,080 PRs). The audit found ~50 user-visible features that had landed
in code with no docs footprint, plus a handful of stale pages. This PR closes
every gap the scan turned up.

New pages
- user-guide/features/deliverable-mode.md — extension list, agent triggers,
  kanban_complete artifacts pattern, [[as_document]] override (PR NousResearch#27813).
- developer-guide/web-search-provider-plugin.md — authoring guide modeled on
  image-gen-provider-plugin, covering brave_free / ddgs / etc. (PR NousResearch#25448).

Providers / auth
- Rename "Alibaba Cloud" → "Qwen Cloud (Alibaba DashScope)" everywhere the
  display label shows up; provider id stays `alibaba` (PR NousResearch#24835).
- Document OAuth refresh-token quarantine for xAI / MiniMax / Codex (PRs
  NousResearch#28116 / NousResearch#28118 / NousResearch#28119).
- Document Nous JWT minting from refresh token + invalid-refresh quarantine
  + cross-profile shared token store (PRs NousResearch#27663 / NousResearch#19712).
- Add `## Microsoft Entra ID authentication (keyless)` section to
  azure-foundry guide — DefaultAzureCredential, RBAC, OpenAI + Anthropic
  routing details (PR NousResearch#28101 / #9df9816da).
- Custom providers `api_mode` is now prompted-and-persisted, not just URL
  autodetected (PR NousResearch#25068).
- Delegation honours `api_mode` + auto-detects anthropic_messages base URLs
  (PR NousResearch#26824).
- `x_search` auto-enables when xAI credentials are present (PR NousResearch#27376).
- Add `xAI Grok OAuth (SuperGrok)` row to providers headline table (PR
  NousResearch#26534).
- NVIDIA NIM billing-origin header is set automatically (PR NousResearch#26585).

Windows / installer
- `install.ps1`: document `-Commit <sha>` and `-Tag <v>` pin params plus
  the BOM-strip / git-retry hardening (PR NousResearch#28169).
- Document Hermes Desktop thin installer + first-launch bootstrap (PR
  NousResearch#27822).
- Document `dep_ensure` Windows bootstrap (PR NousResearch#27845).
- Document install-method auto-detection (pip / git / homebrew / nixos) and
  the matching update command (PR NousResearch#27843).

Gateway / messaging
- `/platform list|pause|resume` full description + circuit-breaker
  semantics (PR NousResearch#26600).
- Slack / Matrix / Mattermost get parallel `allowed_channels` /
  `allowed_rooms` allowlist sections matching Telegram/Discord/DingTalk
  (PR NousResearch#21251).
- Discord `allow_any_attachment` + `max_attachment_bytes` (config and env
  vars) (PR NousResearch#27245).
- Discord clarify-choice button rendering (PR NousResearch#25485).
- Telegram `guest_mode` @mention bypass for allowlisted groups (PR
  NousResearch#22759).
- Telegram `notifications` mode (`important` vs `all`) (PR NousResearch#22793).
- `[[as_document]]` skill / response directive for forcing
  document-style media delivery (PR NousResearch#21210).

CLI / TUI
- `/new [name]` argument (PR NousResearch#19637).
- `/subgoal` user-supplied criteria appended to `/goal` (PR NousResearch#25449).
- `/exit --delete` flag confirmation prompts for destructive slash
  commands (PR NousResearch#22687).
- Status-bar additions: ▶ N background indicator (PR NousResearch#27175), context
  compression count (PR NousResearch#21218), YOLO mode banner+statusbar warning (PR
  NousResearch#26238).
- `display.timestamps` + `docker_extra_args` config keys (PR NousResearch#23599).
- TUI collapsible startup banner sections (PR NousResearch#20625).
- `HERMES_SESSION_ID` exported to tool subprocesses (PR NousResearch#23847).

i18n
- Refresh display.language locale list from 8 → 16 (en, zh, zh-hant, ja,
  de, es, fr, tr, uk, af, ko, it, ga, pt, ru, hu) — matches
  `agent/i18n.py:SUPPORTED_LANGUAGES`.

Tools / features
- `vision_analyze` native-pixel passthrough for vision-capable callers,
  with auxiliary text-describer fallback (PR NousResearch#22955).
- `session_search` rewrite to the single-shape tool (discovery / scroll /
  browse modes) (PRs NousResearch#27590 / NousResearch#27840).
- Clarify MCP transport scope: client supports stdio + SSE; embedded
  `hermes mcp serve` is stdio-only (PR NousResearch#21227).
- Web search backends table: add Brave Search (free tier) and DDGS rows
  (PR NousResearch#21337).
- ACP session-scoped edit auto-approval modes (PR NousResearch#27862).
- Curator rename map in the user-visible per-run summary (PR NousResearch#22910).
- Prompt caching feature page reference in features/overview.md — Claude
  cross-session 1-hour prefix cache on native Anthropic / OpenRouter /
  Nous Portal (PR NousResearch#23828).
- Cron per-job profile parameter (PR NousResearch#28124).
- `--no-skills` flag for `hermes profile create` (PR NousResearch#20986).

Build
- Verified with `npm run build` in `website/`; both `en` and `zh-Hans`
  locales compile. Remaining broken-link/anchor warnings are pre-existing
  (`rl-training.md` from learning-path / overview; the
  zh-Hans translation lag the docs skill already calls out).
#AI commit#
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…arch#20986)

Adds `hermes profile create <name> --no-skills` to create a profile with
zero bundled skills. Writes a `.no-bundled-skills` marker file in the
profile root so `hermes update`'s all-profile skill sync loop also skips
the profile — without the marker, every update would re-seed skills and
the user would have to delete them again.

Use case (from @hiut1u): orchestrator profiles and narrow-task profiles
don't need 100+ bundled skills polluting their system prompt.

- create_profile() gains a `no_skills` param, mutually exclusive with
  `--clone` / `--clone-all` (cloning explicitly copies skills).
- seed_profile_skills() no-ops on opted-out profiles and returns
  `{skipped_opt_out: True}` so callers can report cleanly.
- Web API (POST /api/profiles) accepts `no_skills: bool`.
- Delete `.no-bundled-skills` to opt back in — next `hermes update`
  re-seeds normally.

6 new tests in TestNoSkillsOptOut cover marker write, mutual exclusion
with clone, seed_profile_skills opt-out, fresh profile unaffected, and
delete-marker-re-enables-seeding.
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…earch#28497)

Catch the website docs up to two weeks of merged work (May 4 – May 18, 2026,
roughly 1,080 PRs). The audit found ~50 user-visible features that had landed
in code with no docs footprint, plus a handful of stale pages. This PR closes
every gap the scan turned up.

New pages
- user-guide/features/deliverable-mode.md — extension list, agent triggers,
  kanban_complete artifacts pattern, [[as_document]] override (PR NousResearch#27813).
- developer-guide/web-search-provider-plugin.md — authoring guide modeled on
  image-gen-provider-plugin, covering brave_free / ddgs / etc. (PR NousResearch#25448).

Providers / auth
- Rename "Alibaba Cloud" → "Qwen Cloud (Alibaba DashScope)" everywhere the
  display label shows up; provider id stays `alibaba` (PR NousResearch#24835).
- Document OAuth refresh-token quarantine for xAI / MiniMax / Codex (PRs
  NousResearch#28116 / NousResearch#28118 / NousResearch#28119).
- Document Nous JWT minting from refresh token + invalid-refresh quarantine
  + cross-profile shared token store (PRs NousResearch#27663 / NousResearch#19712).
- Add `## Microsoft Entra ID authentication (keyless)` section to
  azure-foundry guide — DefaultAzureCredential, RBAC, OpenAI + Anthropic
  routing details (PR NousResearch#28101 / #9df9816da).
- Custom providers `api_mode` is now prompted-and-persisted, not just URL
  autodetected (PR NousResearch#25068).
- Delegation honours `api_mode` + auto-detects anthropic_messages base URLs
  (PR NousResearch#26824).
- `x_search` auto-enables when xAI credentials are present (PR NousResearch#27376).
- Add `xAI Grok OAuth (SuperGrok)` row to providers headline table (PR
  NousResearch#26534).
- NVIDIA NIM billing-origin header is set automatically (PR NousResearch#26585).

Windows / installer
- `install.ps1`: document `-Commit <sha>` and `-Tag <v>` pin params plus
  the BOM-strip / git-retry hardening (PR NousResearch#28169).
- Document Hermes Desktop thin installer + first-launch bootstrap (PR
  NousResearch#27822).
- Document `dep_ensure` Windows bootstrap (PR NousResearch#27845).
- Document install-method auto-detection (pip / git / homebrew / nixos) and
  the matching update command (PR NousResearch#27843).

Gateway / messaging
- `/platform list|pause|resume` full description + circuit-breaker
  semantics (PR NousResearch#26600).
- Slack / Matrix / Mattermost get parallel `allowed_channels` /
  `allowed_rooms` allowlist sections matching Telegram/Discord/DingTalk
  (PR NousResearch#21251).
- Discord `allow_any_attachment` + `max_attachment_bytes` (config and env
  vars) (PR NousResearch#27245).
- Discord clarify-choice button rendering (PR NousResearch#25485).
- Telegram `guest_mode` @mention bypass for allowlisted groups (PR
  NousResearch#22759).
- Telegram `notifications` mode (`important` vs `all`) (PR NousResearch#22793).
- `[[as_document]]` skill / response directive for forcing
  document-style media delivery (PR NousResearch#21210).

CLI / TUI
- `/new [name]` argument (PR NousResearch#19637).
- `/subgoal` user-supplied criteria appended to `/goal` (PR NousResearch#25449).
- `/exit --delete` flag confirmation prompts for destructive slash
  commands (PR NousResearch#22687).
- Status-bar additions: ▶ N background indicator (PR NousResearch#27175), context
  compression count (PR NousResearch#21218), YOLO mode banner+statusbar warning (PR
  NousResearch#26238).
- `display.timestamps` + `docker_extra_args` config keys (PR NousResearch#23599).
- TUI collapsible startup banner sections (PR NousResearch#20625).
- `HERMES_SESSION_ID` exported to tool subprocesses (PR NousResearch#23847).

i18n
- Refresh display.language locale list from 8 → 16 (en, zh, zh-hant, ja,
  de, es, fr, tr, uk, af, ko, it, ga, pt, ru, hu) — matches
  `agent/i18n.py:SUPPORTED_LANGUAGES`.

Tools / features
- `vision_analyze` native-pixel passthrough for vision-capable callers,
  with auxiliary text-describer fallback (PR NousResearch#22955).
- `session_search` rewrite to the single-shape tool (discovery / scroll /
  browse modes) (PRs NousResearch#27590 / NousResearch#27840).
- Clarify MCP transport scope: client supports stdio + SSE; embedded
  `hermes mcp serve` is stdio-only (PR NousResearch#21227).
- Web search backends table: add Brave Search (free tier) and DDGS rows
  (PR NousResearch#21337).
- ACP session-scoped edit auto-approval modes (PR NousResearch#27862).
- Curator rename map in the user-visible per-run summary (PR NousResearch#22910).
- Prompt caching feature page reference in features/overview.md — Claude
  cross-session 1-hour prefix cache on native Anthropic / OpenRouter /
  Nous Portal (PR NousResearch#23828).
- Cron per-job profile parameter (PR NousResearch#28124).
- `--no-skills` flag for `hermes profile create` (PR NousResearch#20986).

Build
- Verified with `npm run build` in `website/`; both `en` and `zh-Hans`
  locales compile. Remaining broken-link/anchor warnings are pre-existing
  (`rl-training.md` from learning-path / overview; the
  zh-Hans translation lag the docs skill already calls out).
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…arch#20986)

Adds `hermes profile create <name> --no-skills` to create a profile with
zero bundled skills. Writes a `.no-bundled-skills` marker file in the
profile root so `hermes update`'s all-profile skill sync loop also skips
the profile — without the marker, every update would re-seed skills and
the user would have to delete them again.

Use case (from @hiut1u): orchestrator profiles and narrow-task profiles
don't need 100+ bundled skills polluting their system prompt.

- create_profile() gains a `no_skills` param, mutually exclusive with
  `--clone` / `--clone-all` (cloning explicitly copies skills).
- seed_profile_skills() no-ops on opted-out profiles and returns
  `{skipped_opt_out: True}` so callers can report cleanly.
- Web API (POST /api/profiles) accepts `no_skills: bool`.
- Delete `.no-bundled-skills` to opt back in — next `hermes update`
  re-seeds normally.

6 new tests in TestNoSkillsOptOut cover marker write, mutual exclusion
with clone, seed_profile_skills opt-out, fresh profile unaffected, and
delete-marker-re-enables-seeding.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/cli CLI entry point, hermes_cli/, setup wizard P3 Low — cosmetic, nice to have type/feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants