feat(kanban): configure worktree paths and branches by aqilaziz · Pull Request #26496 · NousResearch/hermes-agent

aqilaziz · 2026-05-15T18:04:52Z

Summary

allow kanban create --workspace worktree: to persist an explicit worktree target
add --branch for worktree tasks and expose it via task JSON, show output, worker context, and HERMES_KANBAN_BRANCH
migrate legacy kanban DBs with a nullable branch_name column and document the worker workflow
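The legacy-DB migration in the last item can be sketched with the standard-library sqlite3 module. This is a minimal sketch under stated assumptions, not the PR's code: the `tasks` table name and the helper name `migrate_add_branch_name` are assumptions. The point it illustrates is that a nullable `TEXT` column with no default lets legacy rows read back as NULL and makes the step safe to re-run.

```python
import sqlite3


def migrate_add_branch_name(conn: sqlite3.Connection) -> bool:
    """Add a nullable branch_name column to tasks if it is missing.

    Hypothetical stand-in for the PR's migration step; the table name
    "tasks" is an assumption. Returns True if the column was added.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(tasks)")}
    if "branch_name" in existing:
        return False  # already migrated: re-running is a no-op
    # No NOT NULL and no DEFAULT, so existing rows simply read back as NULL.
    conn.execute("ALTER TABLE tasks ADD COLUMN branch_name TEXT")
    conn.commit()
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("INSERT INTO tasks (title) VALUES ('legacy task')")
    print(migrate_add_branch_name(conn))  # True
    print(migrate_add_branch_name(conn))  # False
    print(conn.execute("SELECT branch_name FROM tasks").fetchone())  # (None,)
```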

Tests

python -m py_compile hermes_cli/kanban.py hermes_cli/kanban_db.py tests/hermes_cli/test_kanban_cli.py tests/hermes_cli/test_kanban_db.py tests/e2e/conftest.py tests/run_agent/test_provider_parity.py
python -m ruff check hermes_cli/kanban.py hermes_cli/kanban_db.py tests/hermes_cli/test_kanban_cli.py tests/hermes_cli/test_kanban_db.py tests/e2e/conftest.py tests/run_agent/test_provider_parity.py
python -m pytest -o addopts='' tests/hermes_cli/test_kanban_cli.py tests/hermes_cli/test_kanban_db.py -q
python -m pytest -o addopts='' tests/e2e/test_discord_adapter.py -q
python -m pytest -o addopts='' tests/run_agent/test_provider_parity.py::TestDeveloperRoleSwap::test_developer_role_via_nous_portal tests/run_agent/test_provider_parity.py::TestBuildApiKwargsNousPortal::test_includes_nous_product_tags tests/run_agent/test_provider_parity.py::TestBuildApiKwargsNousPortal::test_uses_chat_completions_format -q
git diff --check

@aqilaziz

Salvages #26496 by @aqilaziz. Adds branch_name column + CLI flag so tasks with workspace_kind='worktree' can pin a target branch on create. Schema migration added to _migrate_add_optional_columns.
- Task.branch_name field + DB column + migration
- create_task accepts branch_name kwarg
- hermes kanban create --branch <name> flag
- kanban show output includes 'Branch: <name>' when set
Cherry-picked the substantive commit (a7558cf); the PR's tip was an unrelated service-path-dirs commit. Resolved 2 INSERT-column-list and show-output conflicts alongside main's session_id and max_runtime_seconds additions; kept all three.
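A minimal sketch of how a task's `branch_name` could reach the worker and the `kanban show` output, assuming a simplified `Task` dataclass. Only the `HERMES_KANBAN_BRANCH` variable and the `Branch: <name>` line come from the PR; `worker_env`, `show_lines`, and the other output lines are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class Task:
    # Simplified, hypothetical stand-in for the real Task class.
    id: int
    title: str
    workspace_kind: str
    branch_name: Optional[str] = None


def worker_env(task: Task, base_env: Mapping[str, str]) -> dict:
    """Copy base_env and export the task's branch (if any) to the worker."""
    env = dict(base_env)
    if task.branch_name:
        env["HERMES_KANBAN_BRANCH"] = task.branch_name
    return env


def show_lines(task: Task) -> List[str]:
    """Render a show-style summary; the 'Branch:' line appears only when set."""
    lines = [f"Task {task.id}: {task.title}", f"Workspace: {task.workspace_kind}"]
    if task.branch_name:
        lines.append(f"Branch: {task.branch_name}")
    return lines
```

Keeping the variable unset (rather than exporting an empty string) when no branch is pinned lets a worker distinguish "no branch requested" with a simple presence check.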

teknium1 · 2026-05-19T04:33:14Z

Merged via PR #28462. Cherry-picked the substantive commit (a7558cf27); your tip was an unrelated service-path fix that doesn't belong with this change. Resolved 2 conflicts where main had landed sibling features touching the same lines (session_id, max_runtime_seconds) and kept all three. Authorship preserved via rebase merge. Thanks for the contribution!

* fix(wecom): handle WSMsgType.CLOSING to prevent CPU spin
  The WeCom adapter's _read_events() loop only handled CLOSE, CLOSED, and ERROR websocket message types. When the server initiates a graceful shutdown, aiohttp returns WSMsgType.CLOSING before the connection is fully closed. This message type was not handled, causing the receive() call to return immediately in a tight loop while self._ws.closed remained False. The result was 100% CPU usage on the asyncio event loop.
  Add WSMsgType.CLOSING to the set of terminal message types that raise RuntimeError("WeCom websocket closed"), allowing _listen_loop() to enter its normal reconnect backoff path.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(auth): treat empty credential pool entries as unauthenticated
  Fixes #28140
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: include hermes_plugins in gateway.log component filter
  gateway.log uses a _ComponentFilter that only passes records from loggers starting with ('gateway',). Plugin modules are loaded under the hermes_plugins.* namespace, so all plugin log output is silently dropped from gateway.log. This makes plugin registration, which directly affects gateway hooks (pre_gateway_dispatch, transform_llm_output, etc.), invisible in the gateway-specific log. Operators debugging gateway behavior check gateway.log and see no plugin activity, even when plugins are working correctly.
  Add 'hermes_plugins' to the gateway component prefixes tuple so plugin log messages appear in gateway.log.
  Closes #28138
* fix(gateway): align kanban artifact _IMAGE_EXTS with response dispatch
  _deliver_kanban_artifacts used a broader _IMAGE_EXTS that included .bmp, .tiff, and .svg. These three extensions are absent from the equivalent set in _deliver_media_from_response (line 10661), which intentionally routes them through send_document rather than send_multiple_images (comment near line 10522 notes that Telegram sendPhoto recompresses and rejects non-raster formats). Routing .svg (XML text), .bmp, or .tiff through the photo API causes send_multiple_images to raise on most platforms; the exception is caught and logged as a warning, silently dropping the artifact. Aligning the two sets ensures kanban deliverables with these extensions follow the same send_document path as regular agent responses. No behaviour change for .png/.jpg/.jpeg/.gif/.webp.
* fix(process-registry): detach stdin from background subprocesses to prevent keyboard freeze
  Background process non-PTY path used stdin=subprocess.PIPE unconditionally, creating an orphan pipe that was never written to and never closed. Child processes that read stdin would block indefinitely, competing with the parent's prompt_toolkit event loop for terminal ownership and causing complete keyboard lockout.
  Change to stdin=subprocess.DEVNULL so children get immediate EOF on stdin reads instead of blocking forever. For interactive stdin, the PTY path (which has its own independent PTY via ptyprocess.PtyProcess.spawn) should be used instead.
  Fixes #17959
* chore(release): alias stale-ID salvage commit for @LifeJiggy (#28317)
  * fix(process-registry): detach stdin from background subprocesses to prevent keyboard freeze
    Background process non-PTY path used stdin=subprocess.PIPE unconditionally, creating an orphan pipe that was never written to and never closed. Child processes that read stdin would block indefinitely, competing with the parent's prompt_toolkit event loop for terminal ownership and causing complete keyboard lockout.
    Change to stdin=subprocess.DEVNULL so children get immediate EOF on stdin reads instead of blocking forever. For interactive stdin, the PTY path (which has its own independent PTY via ptyprocess.PtyProcess.spawn) should be used instead.
    Fixes #17959
  * chore(release): alias stale-ID salvage commit for LifeJiggy
    PR #28315 was salvaged with a wrong noreply numeric ID (192385615 vs the correct 141562589). The commit on main is correctly authored to LifeJiggy by username, but the noreply email doesn't match AUTHOR_MAP. Adds an alias so release-notes generation maps both forms to the same contributor.
  ---------
  Co-authored-by: LifeJiggy <192385615+LifeJiggy@users.noreply.github.com>
* fix: elevate plugin discovery failures from debug to warning
  Plugin discovery exceptions in gateway startup (gateway/run.py) and CLI startup (hermes_cli/main.py) are caught and logged at DEBUG level, making them invisible at the default INFO log level. If any plugin import fails (syntax error, missing dependency, import cycle), operators get zero indication unless they bump the log level to DEBUG. This makes broken plugins appear enabled but silently non-functional.
  Change both locations to logger.warning() so failures are visible at production log levels.
  Closes #28137
* fix: treat inline-shell timeout guard as timeout
* fix(acp): resolve /tmp symlink before workspace auto-approve check on macOS
  Path.resolve() follows the /tmp -> /private/tmp symlink on macOS, so str(path).startswith("/tmp/") is always False for temp-dir paths. The "Accept Edits" (workspace_session) mode silently refused to auto-approve every /tmp write on macOS, breaking the documented behaviour and making the existing test fail on this platform.
  Fix: keep the raw expanded path (pre-resolve) for the /tmp prefix check and continue using the resolved form only for the cwd relative_to() call where symlink resolution is correct behaviour.
* fix(kanban): single-row horizontal scroll for board columns
  Switch .hermes-kanban-columns from auto-fit CSS grid to a flex row with overflow-x: auto and a hidden scrollbar (scrollbar-width / ::-webkit-scrollbar), and pin .hermes-kanban-column to flex: 0 0 280px so columns sit side-by-side at a fixed width instead of wrapping into a 2xN grid.
  Page vertical scroll is unaffected: each column already caps at max-height: calc(100vh - 220px), so the container never grows tall enough to introduce its own vertical scrollbar.
* fix(approval): surface pending-approval state with explicit marker visible to LLM
  When a tool call requires user approval in the non-blocking gateway path, the LLM previously received a result that was indistinguishable from a failed tool call (exit_code=-1, error=message). The LLM could not tell whether the tool was pending approval, had returned empty results, or had failed silently, causing it to burn context on wrong hypotheses.
  Fix changes the result format to include:
  - status: pending_approval (clear state name)
  - approval_pending: True (explicit boolean for LLMs to detect)
  - error: cleared to empty string (removes misleading error signal)
  This lets the LLM reason about approval latency vs actual errors, short-circuiting the previous silent failure mode.
  Fixes #14806
* fix: recognize emoji and caret as natural response endings
  GLM models via Ollama report finish_reason='stop' even when the response was truncated by max_tokens. The continuation mechanism uses _has_natural_response_ending() as one of the heuristics to detect whether the response was genuinely finished.
  Currently only ASCII punctuation and CJK punctuation are recognized. This means any response ending with an emoji (e.g. ⚡, 👍) or the caret character ^ (common in French ^^ smiley) is not recognized as naturally ended, triggering a false-positive continuation where the model receives 'Continue where you left off' and produces garbled output.
  Add:
  - ^ (caret) to the punctuation set
  - Unicode emoji range (codepoint >= 0x1F300) as natural ending
  This only affects GLM/Ollama users but the fix is safe for all backends since _has_natural_response_ending() is only consulted inside the continuation flow.
* chore(release): pre-stage AUTHOR_MAP for May 2026 LHF batch group 8 (#28328)
  Pre-stages AUTHOR_MAP entries for 10 new contributors whose PRs are being salvaged in the May 2026 low-hanging-fruit batch (group 8). Lands ahead of the per-PR salvage PRs so they don't get blocked by AUTHOR_MAP CI.
  Contributors:
  - AceWattGit (#28159: _pool_may_recover_from_rate_limit NameError)
  - YuanHanzhong (#28032: x.com/status fallbacks link-like)
  - colin-chang (#28245, #28249, #28251: gateway + mattermost fixes)
  - felix-windsor (#28019: preserve cron asterisks in strip mode)
  - houenyang-momo (#28205: charizard completion menu contrast)
  - iqdoctor (#28095: windows installer docs)
  - joe102084 (#28151: whitespace-only cron responses)
  - jvinals (#27936: Slack U-IDs → DM channel)
  - maxmilian (#28267: ModelPickerDialog portal)
  - samggggflynn (#27952: dingtalk pre_start)
  Per references/batch-pr-salvage-may14-additions.md.
* fix: add pre_start() to _IncomingHandler for dingtalk SDK compatibility
  The dingtalk-stream SDK calls pre_start() on every registered handler before opening the WebSocket connection. Without this method, the SDK raises AttributeError and kills the stream connection, causing DingTalk to be unable to connect via Stream Mode.
* fix(windows): handle redirected stdout in _cprint fallback
  Wraps _pt_print in try/except with a print() fallback. When a kanban worker's stdout is piped to a log file, prompt_toolkit raises NoConsoleScreenBufferError (Windows) or OSError (other) because there is no real console buffer. The fallback keeps worker output flowing instead of crashing.
* chore(release): alias stale-ID salvage commit for @Grogger (#28334)
  PR #28330 was salvaged with a wrong noreply numeric ID (18091625 vs the correct 7065068). The commit on main is correctly authored to Grogger by username, but neither noreply form was in AUTHOR_MAP. Adds both so release-notes generation maps them to @Grogger.
* fix(aux): remove stale session_search model menu entry
* fix(tui): keep x status citation fallbacks link-like
* fix(xai-oauth): quarantine dead tokens on terminal refresh failure
  resolve_xai_oauth_runtime_credentials() called _refresh_xai_oauth_tokens() with no try/except. A terminal refresh failure (HTTP 400/401/403: invalid_grant, token revoked) propagated without clearing the dead access_token / refresh_token from auth.json, causing every subsequent session to retry the same doomed network request.
  Add a try/except around the refresh call that mirrors the existing credential_pool.py quarantine: when _is_terminal_xai_oauth_refresh_error identifies a non-retryable failure, clear the dead token fields from auth.json and write a last_auth_error diagnostic marker so future calls fail fast with a clear relogin_required error instead of hitting the network. active_provider is preserved (set_active=False) so multi-provider users whose chosen provider is not xai-oauth are unaffected.
  Tests: two new cases in test_auth_xai_oauth_provider.py cover terminal quarantine and transient pass-through.
* feat(bg-review): add bundled/pinned skill protection rules to review prompts (#27644)
  The background review prompts (_SKILL_REVIEW_PROMPT and _COMBINED_REVIEW_PROMPT) now include explicit protection rules for bundled, hub-installed, and pinned skills, aligning with the curator's existing policy at curator.py L345/350.
  Before this change, bg-review could freely rewrite bundled skills like 'hermes-agent' or pinned skills, while the 7-day curator explicitly skips them. The review agent now sees:
  - Bundled skills (shipped with Hermes)
  - Hub-installed skills (installed via hermes skills install)
  - Pinned skills (marked via hermes curator pin)
  If only protected skills need updating, the review says 'Nothing to save.' and stops.
  Fixes #27644
* fix(web): portal Change Model modal so it renders above the app sidebar
  The dashboard's main column is `relative z-2` (App.tsx), which creates a stacking context that traps fixed descendants below the app sidebar (`z-50`). `ModelPickerDialog` renders `fixed inset-0 z-[100]` inline, so its z-100 is scoped to z-2 and the sidebar covers its left edge. The bug is visible across all themes but only obvious in the Large theme variants (Hermes Teal (Large), etc.) where the larger root font widens the dialog into the sidebar's column.
  Toast.tsx already documents the same trap and uses the same `createPortal(..., document.body)` escape. This commit ports the picker; the same pattern affects other inline z-[100] modals in the dashboard (OAuthLoginModal, Cron / Models / Profiles page modals) and is left for a follow-up, keeping this PR scoped to the reporter's specific case.
  Fixes #28103
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(gateway): exit code 75 on service restart so launchd relaunches
  When the gateway receives SIGUSR1 (graceful restart via launchd_restart), the SIGUSR1 handler calls request_restart(via_service=True) and the gateway shuts down cleanly with exit code 0. However, the generated launchd plist uses KeepAlive → SuccessfulExit → false, meaning launchd only relaunches on *non-zero* exit codes. A clean exit(0) is treated as "successful, don't restart", so the gateway stays down after /restart, /update, or SIGUSR1.
  The systemd unit template already uses RestartForceExitStatus=75 for the same scenario. Mirror that convention: when _restart_via_service is True, raise SystemExit(75) so launchd's SuccessfulExit=false policy triggers a relaunch.
  Closes #28135
* fix: guard json.loads() against invalid TTS and skill_view responses
  Two code paths call json.loads() on output from external tools without catching JSONDecodeError. If the tool returns a non-JSON string (error message, empty string, or None), the entire call path crashes.
  1. gateway/run.py: text_to_speech_tool() result in voice reply path. A TTS failure that returns an error string instead of JSON crashes the voice reply handler, killing the message response entirely.
  2. cron/scheduler.py: skill_view() result when loading skills for cron jobs. A corrupted or missing skill file that returns an error string instead of JSON crashes the cron tick, preventing all jobs from executing that cycle.
  Both fixes catch (json.JSONDecodeError, TypeError), log a warning, and gracefully skip the failed operation instead of crashing.
* fix(gateway): bridge gateway_restart_notification from YAML platform sections
  Two related bugs in gateway/config.py prevented per-platform gateway_restart_notification from working through config.yaml:
  1. The shared-key bridging loop (load_gateway_config) omitted 'gateway_restart_notification', so the key never landed in platform_data['extra'] even when set under e.g. 'discord:' or 'mattermost:' sections.
  2. PlatformConfig.from_dict() only read gateway_restart_notification from the top-level data dict, ignoring the 'extra' sub-dict where bridged keys are stored.
  Fix: add the key to the bridging loop, and add an 'extra' fallback in from_dict() so that round-tripped values (YAML → bridged → extra → from_dict) resolve correctly.
  Impact: users can now set gateway_restart_notification: false per platform in config.yaml instead of relying on env vars or the global platforms: block.
* feat(kanban): add auto_promote_children config toggle
  When the kanban auto-decomposer fans a triage task into child tasks, recompute_ready() immediately promotes parent-free children to 'ready' so the dispatcher picks them up. Some users want a manual workflow where children stay in 'todo' for review before dispatch.
  Add 'kanban.auto_promote_children' config key (default: true):
  - false: children stay in 'todo' after decomposition
  - true: existing behavior (auto-promote to 'ready')
  Changes:
  - kanban_db.py: decompose_triage_task() gains auto_promote param
  - kanban_decompose.py: reads auto_promote_children from config
  - kanban dashboard API: exposes the new setting in GET/PUT /orchestration
  Closes #28016
* fix: wrap _pool_may_recover_from_rate_limit call through run_agent namespace
  The conversation_loop.py references _pool_may_recover_from_rate_limit which was defined in run_agent.py. After the conversation-loop extraction refactor, the helper was no longer in the same module scope. Wrap the call as _ra()._pool_may_recover_from_rate_limit() to route through the run_agent monkeypatch namespace where the helper is available.
  Adds regression test in test_gemini_fast_fallback.py.
  Fixes: MAILROOM Email Triage NameError, OPS Execution Monitor NameError.
* fix(tui): improve charizard completion menu contrast
* docs(windows): avoid piping installer directly into iex
* fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS
  Qwen3.x and DeepSeek-V3.x default to chatty/hallucinatory tool use without enforcement steering: agents narrate "calling tool X" without actually emitting a tool call, or run partial loops. Both model families fit the same failure pattern TOOL_USE_ENFORCEMENT_GUIDANCE was already injected for (gpt, codex, gemini, gemma, grok, glm).
  Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>
  Squashed salvage of:
  - 403e567ce fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS
  - 9433eabe7 test(agent): use realistic qwen-plus identifier in enforcement test
  Fixes #28079.
* fix(send_message): resolve Slack user IDs to DM channel IDs
  The _SLACK_TARGET_RE regex only matched IDs starting with C (channel), G (group), or D (direct message). Slack user IDs start with U, causing 'Could not resolve' errors when trying to send DMs to specific users.
  Changes:
  - Expand _SLACK_TARGET_RE to accept U-prefixed IDs (user IDs)
  - Add conversations.open fallback to resolve user IDs to DM channel IDs before sending, since chat.postMessage requires a conversation ID
  Fixes #ISSUE_NUMBER
* fix(gateway): tighten MEDIA extraction regex + silent skip on file-not-found
  Three related fixes for the MEDIA:<path> extraction pipeline that caused 'file not found' noise in platform channels:
  1. run.py: tighten tool-result MEDIA regex from \S+ (any non-whitespace) to require a path pattern with known extensions. Prevents LLM-generated placeholder paths like 'MEDIA:/path/to/example.mp4' from being captured as real media.
  2. base.py: remove the |\S+ fallback in extract_media() that catches anything non-whitespace as a potential MEDIA path. This was the primary cause of false positives: strings like '' in tool output were captured as MEDIA: paths.
  3. mattermost.py: replace the file-not-found error message sent to the channel with a silent logger.warning() skip. When a path extracted by MEDIA doesn't exist on disk, the channel no longer gets a noisy '(file not found: ...)' message.
  Impact: eliminates the persistent 'file not found' spam in Mattermost channels caused by over-broad MEDIA regex patterns matching non-path text in tool output.
* fix(xai-oauth): split 403 (tier/entitlement) from 400/401 in token endpoint
  xAI's token endpoint returns HTTP 403 to the OAuth grant when the account isn't on the allowlist for API access (e.g. standard SuperGrok subscribers; see #26847). Treating it like a stale-token 400/401 made ``format_auth_error`` append "Run ``hermes model`` to re-authenticate", which is misleading because re-login can't change xAI's tier decision.
  Split 403 off in both ``refresh_xai_oauth_pure`` and the loopback login token exchange:
  - New error code ``xai_oauth_tier_denied`` with ``relogin_required=False``
  - Message explains the entitlement gate and points at the ``XAI_API_KEY`` + ``provider: xai`` fallback
  - 400/401 still set ``relogin_required=True`` as before
  - 5xx still set ``relogin_required=False`` as before
* fix(run-agent): treat any 403 on xai-oauth as entitlement to stop refresh-loop
  The existing ``_is_entitlement_failure`` heuristic only fires when the response body contains specific substrings ("do not have an active Grok subscription", etc.). xAI has been seen to 403 standard SuperGrok subscribers with a terser body that doesn't match those keywords (#26847), and the recovery path would then mint a fresh token, get a fresh 403, and loop until Ctrl+C.
  Add a defense-in-depth check at the recovery call site: any 403 on ``provider == "xai-oauth"`` short-circuits ``try_refresh_current`` so the error surfaces immediately with the friendly hint from ``_summarize_api_error``. Keeps the existing keyword path for all other providers untouched.
* test(xai-oauth): pin tier-denied 403 behavior + docs warning for #26847
  Tests:
  - ``test_refresh_xai_oauth_pure_403_marked_tier_denied_not_relogin``: refresh-403 raises ``xai_oauth_tier_denied`` with ``relogin_required=False`` and the API-key fallback hint in body.
  - ``test_format_auth_error_tier_denied_does_not_suggest_relogin``: the renderer does not append "Run ``hermes model``" for the new code.
  - ``test_recover_with_credential_pool_skips_refresh_on_bare_403_for_xai_oauth``: bare ``{"reason":"forbidden","message":"Forbidden"}`` body (which does not match the existing keyword heuristic) still short-circuits ``try_refresh_current`` on xai-oauth.
  Docs:
  - Drop the "(any active tier)" claim from the xai-grok-oauth guide, add a top-of-page warning callout, and a Troubleshooting section for the 403-after-login case pointing at ``XAI_API_KEY`` + ``provider: xai`` as the documented fallback.
* fix: handle whitespace-only cron responses
* fix(cli): preserve cron asterisks in strip mode
* fix(mattermost): resolve thread root_id and route progress to threads
  Two Mattermost thread-related bugs:
  1. _resolve_root_id(): Mattermost CRT requires root_id to be the thread root post. Using any reply's own ID as root_id causes '400 Invalid RootId'. Add _resolve_root_id() that walks up the post chain via API to find the actual root, and apply it in send(), _send_url_as_file(), and _send_local_file().
  2. _progress_reply_to: the condition in run.py only checked Platform.FEISHU, missing Mattermost entirely. This caused tool progress messages to always land in the main channel instead of the thread. Add Platform.MATTERMOST to the condition so progress messages are routed to threads when reply_mode=thread.
  Impact: Tool progress messages now appear in Mattermost threads instead of flooding the main channel; thread replies no longer fail with Invalid RootId when the reply target is itself a reply.
* feat(kanban): archive --rm to hard-delete archived tasks
  Salvages #19964 by @Beandon13. Adds `hermes kanban archive --rm` to permanently remove already-archived tasks with cascading cleanup of links, comments, events, runs, and notify-subs. Safety guard: only archived tasks can be deleted; active/blocked/done must be archived first.
  Cherry-picked from #19964 onto current main (severe stale base, applied manually to preserve substance only).
* feat(proxy): add xai upstream adapter for Grok via OAuth * chore(release): map @yannsunn for PR #28064 xai proxy adapter salvage * docs(skill): align kanban dispatcher failure_limit text with current default * fix(oauth): add manual-paste fallback for browser-only remote consoles xAI Grok OAuth (and Spotify) use a loopback redirect to ``http://127.0.0.1:<port>/callback`` to capture the authorization code. That works when the browser and Hermes run on the same machine, and the SSH tunnel recipe handles the regular remote case. It breaks completely on **browser-only remote consoles** (GCP Cloud Shell, GitHub Codespaces, AWS EC2 Instance Connect, Gitpod, Replit, …) where the user has a browser but no real SSH client to forward a port — the redirect to 127.0.0.1 on the remote VM simply isn't reachable from the laptop, and there's nothing the existing flow can do about it (#26923). This commit adds the foundation for a manual-paste fallback: * ``_is_remote_session`` now also recognises Cloud Shell, Codespaces, Gitpod, Replit, StackBlitz (in addition to SSH), so the existing tunnel hint at least fires in those environments. * ``_parse_pasted_callback`` accepts any of: a full ``http(s)://...?code=...&state=...`` URL, a bare ``?code=...`` query string, a bare ``code=...&state=...`` fragment, or a bare opaque code value. Returns the same dict shape the HTTP callback handler produces, so the caller's state / error validation works unchanged (no CSRF bypass). * ``_prompt_manual_callback_paste`` reads stdin with a clear multi-line explanation of what's happening and what to paste. * ``_xai_oauth_loopback_login`` gains a ``manual_paste`` kwarg that skips the HTTP listener entirely. The redirect_uri, PKCE verifier, state, and nonce are byte-identical to the loopback path so xAI's token endpoint can't tell the difference at the protocol level. 
* ``_print_loopback_ssh_hint`` now also mentions ``--manual-paste`` so users without a real SSH client see a path forward instead of a dead-end tunnel recipe. * ``_login_xai_oauth`` threads ``args.manual_paste`` into the loopback helper. * feat(cli): wire --manual-paste into ``hermes auth add`` and ``hermes model`` Register the new ``--manual-paste`` flag on both entry points and thread it through to the xAI loopback login: * ``hermes auth add xai-oauth --manual-paste`` — pool-add path, forwarded inside ``auth_commands.handle_auth_add``. * ``hermes model --manual-paste`` — model-picker path, forwarded by ``_model_flow_xai_oauth`` into the synthetic ``argparse.Namespace`` it passes to ``_login_xai_oauth``. The picker also now forwards ``--no-browser`` and ``--timeout`` for consistency (previously hardcoded to defaults regardless of CLI flags). Help text on both flags points at #26923 and names the browser-only remote consoles (Cloud Shell, Codespaces, EC2 Instance Connect) so users searching ``hermes --help`` can find the workaround. * test+docs(oauth): pin manual-paste semantics and document browser-only path (#26923) Tests (``tests/hermes_cli/test_auth_manual_paste.py``): * 9 parametrised + scalar cases for ``_is_remote_session`` covering the new Cloud Shell / Codespaces / Gitpod / Replit / StackBlitz env vars (plus the existing SSH ones). * 9 cases for ``_parse_pasted_callback`` covering every paste form (full URL, https URL with extra params, bare ``?code=...``, bare ``code=...`` fragment, bare opaque value, error+description, empty, whitespace-only, malformed URL). * 3 cases for ``_prompt_manual_callback_paste`` (happy path, EOF, Ctrl-C). * 3 end-to-end ``_xai_oauth_loopback_login(manual_paste=True)`` cases: the HTTP server MUST NOT be started (asserted via a callable that raises if invoked), wrong state still rejected with ``xai_state_mismatch`` (no CSRF bypass), and empty paste surfaces ``xai_code_missing``. 
* SSH-hint mention test ensures the ``--manual-paste`` instruction is printed in the remote-session hint. Docs: * ``oauth-over-ssh.md`` — new "Browser-only remote (Cloud Shell / Codespaces / EC2 Instance Connect)" section with the ``--manual-paste`` recipe, plus a TL;DR note for the new flag. * ``xai-grok-oauth.md`` — short subsection pointing at the same recipe and the OAuth-over-SSH guide anchor. * docs(kanban): document max-retries task override * docs(kanban): document inline create shortcuts * test(kanban): cover default board dashboard pin * docs: ignore box diagrams in ascii guard Wrap existing box-drawing diagrams with ascii-guard markers so docs-site checks pass when website docs are touched. Co-authored-by: Cursor <cursoragent@cursor.com> * feat: per-task model override for kanban workers - Add model_override field to Task class and tasks schema - Add migration for existing databases - Spawn worker with -m model when model_override is set * test(kanban-dashboard): cover _task_dict task_age fallback The fix in 061a1830 added an outer try/except in plugin_api._task_dict so that a future failure mode in kanban_db.task_age (anything _safe_int doesn't already absorb) cannot 500 the GET /board response. The _safe_int / task_age corruption paths got regression coverage in tests/hermes_cli/test_kanban_db.py, but the OUTER fallback contract remained untested -- meaning a refactor that drops the try/except would not be caught by CI. 
Pin that contract from both consumers of _task_dict:

- GET /board returns 200 with the literal fallback age dict for the affected card (other cards continue to render via the same path)
- GET /tasks/:id (drawer view) returns 200 with the same fallback, so a single corrupt task can't block its own drawer

Both tests force task_age to raise RuntimeError rather than ValueError on '%s', because ValueError is absorbed by _safe_int and never reaches the outer try/except -- testing that path would only re-cover what test_kanban_db.py already pins.

Manually verified the regression discipline:

    git checkout 061a1830^ -- plugins/kanban/dashboard/plugin_api.py
    pytest -k task_age_exception   # both FAIL with 500
    git checkout HEAD -- plugins/kanban/dashboard/plugin_api.py
    pytest -k task_age_exception   # both PASS

* fix(kanban): clear _INITIALIZED_PATHS in remove_board so recycled DBs re-init schema

Archiving or deleting a board via remove_board() leaves the path's "schema already initialized" entry in the module-level cache. A concurrent connect(board=<slug>) call (e.g. the dashboard event-stream poll loop) then:

1. resolves the same kanban.db path,
2. recreates the directory + an empty sqlite file because connect() does mkdir(parents=True, exist_ok=True),
3. skips the CREATE TABLE pass because the cache entry says the schema is already in place,
4. errors on the next read with `no such table: task_events`.

Drop the cache entry before mutating the filesystem so the fresh file gets a proper schema init on next connect(). Applies to both archive=True (rename) and archive=False (rmtree) branches.

Fixes #23833.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(web): add Cache-Control: no-store to plugin static file serving

Prevents browser caching of stale dashboard plugin JS files that may contain bugs already fixed upstream (e.g. COLUMN_LABEL undefined).

* fix(kanban): seed bundled skills (e.g. kanban-worker) on kanban init

Closes #23725

* fix(kanban): ignore stale HERMES_KANBAN_BOARD for removed boards

* fix(kanban): keep board-management commands independent from board override

* fix(kanban): preserve notifier_profile for dashboard home subscriptions

* fix(kanban): promote dependents when a parent is archived

* fix(cli): make kanban specify max_tokens configurable

* fix(kanban): sync slash subcommands with live parser

* fix(kanban): promote blocked tasks when parent dependencies complete

recompute_ready only scanned 'todo' tasks for promotion, ignoring 'blocked' tasks entirely. When a task was blocked (e.g. by the circuit breaker) and its parent dependencies later completed, the task stayed stuck in 'blocked' forever unless manually unblocked.

Now recompute_ready also scans 'blocked' tasks. When all parents are done/archived, the blocked task is promoted to 'ready' with failure counters reset — equivalent to an automatic unblock.

Includes a regression test for the blocked-parent-done promotion path.

* fix(kanban): use 'is not None' check for max_runtime_seconds in create_task

max_runtime_seconds=0 was being silently coerced to None due to a falsy check (if max_runtime_seconds). Zero is a valid value that causes the dispatcher to immediately time out a task. The adjacent max_retries parameter already used the correct 'is not None' pattern. Fixes the inconsistency by aligning max_runtime_seconds with max_retries.

* fix(kanban): reset failure counters on unblock_task

When a task is manually unblocked (blocked → ready/todo), the consecutive_failures counter and last_failure_error were left intact. The next failure would immediately re-trip the circuit breaker because the counter was still at or above the failure limit. Reset both fields on unblock so the task gets a fresh retry budget.

Includes a regression test that verifies counters are zeroed.
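The blocked-task promotion from the `promote blocked tasks when parent dependencies complete` entry can be sketched in plain Python. This is a minimal in-memory model, not kanban_db's SQL: the `Task` dataclass, the `SATISFIED_PARENT_STATES` set, and the rule that a parentless blocked task is left alone are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumption: parents in either of these states count as satisfied.
SATISFIED_PARENT_STATES = {"done", "archived"}


@dataclass
class Task:
    id: str
    status: str
    parents: list[str] = field(default_factory=list)
    consecutive_failures: int = 0
    last_failure_error: Optional[str] = None


def recompute_ready(tasks: dict[str, Task]) -> list[str]:
    """Promote 'todo' and 'blocked' tasks whose parents are all satisfied.

    Returns the ids of promoted tasks, in iteration order.
    """
    promoted = []
    for task in tasks.values():
        if task.status not in ("todo", "blocked"):
            continue
        # Assumption: a parentless blocked task was blocked for some other
        # reason (e.g. by hand), so dependency completion can't release it.
        if task.status == "blocked" and not task.parents:
            continue
        if all(tasks[p].status in SATISFIED_PARENT_STATES for p in task.parents):
            if task.status == "blocked":
                # Automatic unblock: give the task a fresh retry budget.
                task.consecutive_failures = 0
                task.last_failure_error = None
            task.status = "ready"
            promoted.append(task.id)
    return promoted
```

Resetting `consecutive_failures` and `last_failure_error` on promotion mirrors the `unblock_task` fix listed alongside it, so an auto-promoted task does not re-trip the circuit breaker on its first new failure.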
* fix(kanban): fingerprint crash errors to prevent fleet-wide retry exhaustion

When a systemic failure (provider outage, auth expiry, OOM) crashes multiple workers simultaneously, detect_crashed_workers increments each task failure counter independently. The circuit breaker only trips after N × failure_limit retries across the fleet.

Fingerprint crash errors by normalizing host-specific details (PIDs, timestamps). When 3+ tasks crash with the same fingerprint in a single detection cycle, immediately trip the circuit breaker (failure_limit=1) instead of waiting for repeated failures.

Isolated crashes (unique fingerprints) retain their normal retry budget. Protocol violations continue to trip immediately.

Includes regression tests for systemic and isolated crash paths.

* fix(kanban): align board_exists with board discovery rules

* fix(kanban): demote ready children when a parent is reopened

* fix(kanban): serialize DB initialization

* fix(kanban): task_age() tolerates ISO-8601 timestamps

Prevents ValueError crash in dashboard get_board() when a task has an ISO timestamp (e.g. "2026-05-10T15:00:00Z") instead of a unix epoch int. Adds _to_epoch() helper that normalises both formats.

* Fix Kanban dashboard initial board selection

* fix(kanban): persist worker session metadata on completion

Salvages #25579 by @wesleysimplicio. Stamps task_runs.metadata.worker_session_id from HERMES_SESSION_ID on kanban_complete.

Cherry-picked the substantive commit (not the AUTHOR_MAP fixup tip) onto current main.
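The crash-fingerprinting idea from the `fingerprint crash errors` entry can be illustrated as follows. The regexes, the `fingerprint()` / `failure_limits()` names, and the mapping-based interface are hypothetical; only the threshold of three matching fingerprints per detection cycle and the failure_limit=1 trip come from the commit text.

```python
import re
from collections import Counter

# Hypothetical normalisation rules: strip details that differ per host/process
# so the same underlying failure yields the same fingerprint.
_NORMALISERS = [
    (re.compile(r"\bpid[ =:]?\d+", re.IGNORECASE), "pid=<N>"),
    (re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?"), "<TS>"),
    (re.compile(r"\b\d{10,}\b"), "<EPOCH>"),  # epoch-like integers
]

SYSTEMIC_THRESHOLD = 3  # "3+ tasks with the same fingerprint in one cycle"


def fingerprint(error: str) -> str:
    """Normalise an error string so host-specific noise doesn't split groups."""
    text = error.strip()
    for pattern, replacement in _NORMALISERS:
        text = pattern.sub(replacement, text)
    return text


def failure_limits(crashes: dict[str, str], default_limit: int) -> dict[str, int]:
    """Map task id -> failure limit to apply in this detection cycle."""
    counts = Counter(fingerprint(err) for err in crashes.values())
    return {
        # Systemic crash: trip immediately. Isolated crash: normal budget.
        task_id: 1 if counts[fingerprint(err)] >= SYSTEMIC_THRESHOLD else default_limit
        for task_id, err in crashes.items()
    }
```

Grouping within a single cycle is what separates a provider outage (many workers, one cause, same moment) from unrelated bugs that merely happen to crash close together.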
* fix(kanban): make claim ttl configurable

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(kanban): pass accept-hooks to worker chat subprocess

* feat(kanban): add board-level default workdir (#25430)

* docs(kanban-worker): document notification routing configuration

* fix(kanban): preserve worker tools with restricted toolsets

* fix(kanban): make legacy task migration idempotent

(cherry picked from commit 293f1c3a7241b0117669e049d9aa746c9645ac90)

* fix: harden Kanban worker Hermes command resolution

* feat(kanban): allow trimmed task comments

SS-1647 live SHIP validation: real code + tests for kanban comment --max-len.

* fix: show scheduled kanban tasks in dashboard

* fix: assign single-task kanban decompositions

* fix(kanban-dashboard): make Orchestration mode checkbox label static

The checkbox label echoed its state ("Auto (default)" / "Manual") instead of describing the action, so a checked box reading "Auto" parsed as a status indicator rather than a control. The accompanying sub-description was also static and started with "When on, ...", which read awkwardly when the box was unchecked.

Replace the dynamic label with a static action label ("Auto-decompose triage tasks") and flip the sub-description between the two modes so it stays accurate either way. The top-of-page Orchestration pill is unchanged — that one is intentionally a status badge / toggle.

Fixes #28178

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(env): add HERMES_KANBAN_DISPATCH_IN_GATEWAY override (#21956)

Salvages the env-vars docs portion of #21956 by @Bartok9. The ascii-guard-ignore tags from the original PR already landed on main.

* fix(kanban): close sqlite connection on init failure to prevent fd leak

Salvages #28301 by @Ade5954. If WAL setup, PRAGMA application, or schema init raised after sqlite3.connect() succeeded, the new connection leaked. Wrap the body in try/except so the connection is closed before the exception propagates.
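The fd-leak fix in `close sqlite connection on init failure` follows a common try/except/re-raise shape. A minimal sketch with the standard-library `sqlite3` module; `connect()`, `_init_schema()`, and the injectable `init` callback are illustrative names, and the one-table schema is a placeholder.

```python
import sqlite3
from typing import Callable


def _init_schema(conn: sqlite3.Connection) -> None:
    # Placeholder: the real kanban schema has many more tables and columns.
    conn.execute("CREATE TABLE IF NOT EXISTS tasks (id TEXT PRIMARY KEY, status TEXT)")


def connect(
    path: str,
    init: Callable[[sqlite3.Connection], None] = _init_schema,
) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    try:
        conn.execute("PRAGMA journal_mode=WAL")  # WAL setup
        conn.execute("PRAGMA foreign_keys=ON")   # other PRAGMAs
        init(conn)                               # schema init
    except BaseException:
        # Close before re-raising so a failed init doesn't leak the fd.
        conn.close()
        raise
    return conn
```

Catching `BaseException` and immediately re-raising means even a `KeyboardInterrupt` during init releases the handle, while callers still see the original exception unchanged.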
* fix(kanban): don't crash dispatched workers when kanban-worker skill is absent

Salvages #27372 by @oemtalks. The dispatcher unconditionally injected `--skills kanban-worker` into every worker spawn, but worker profiles sometimes don't have that bundled skill in their skills dir, which is fatal at CLI startup (`ValueError: Unknown skill(s): kanban-worker`).

Adds `_kanban_worker_skill_available(hermes_home)` and only injects the flag when the skill resolves. The MANDATORY lifecycle still ships via KANBAN_GUIDANCE in the system prompt, so omitting the flag is safe.

* fix(packaging): ship dashboard plugin assets in wheel

Salvages #23737 by @LeonSGP43. Adds plugins/* manifest.json and dist/ glob entries to setuptools package-data so wheel installs ship the bundled dashboard plugin assets (kanban, achievements, etc.). Without these, /api/dashboard/plugins can't discover plugin assets outside a source checkout.

* docs(kanban): document worker protocol auto-blocks

Salvages #21585 by @helix4u. Documents the protocol_violation event (worker exits successfully while task is still running), adds --max-retries to the create flag list and --failure-limit to dispatch.

* fix(oneshot): pass fallback_providers from profile config to AIAgent

Salvages #23368 by @uzunkuyruk. Oneshot workers (e.g. kanban workers spawned via 'hermes -p <profile> chat -q ...') were not honouring the profile's fallback_providers / fallback_model chain because oneshot.py never read the config and never passed fallback_model= to AIAgent.

Reads cfg.get('fallback_providers') (new list format) or cfg.get('fallback_model') (legacy single-dict) with the same normalization cli.py applies, then forwards as fallback_model=_fb.

* fix(kanban): reject direct running transitions in dashboard bulk updates

Salvages #24050 by @kronexoi. The single-task PATCH already rejects direct status='running' since it bypasses the dispatcher/claim invariant, but the bulk-update endpoint still accepted it.
Aligns bulk with single by emitting an error result row for any 'running' entry.

* feat(kanban): add initial-status for human-ops cards

Salvages #27526 by @shunsuke-hikiyama. Adds an --initial-status flag (running|blocked, default running) to 'kanban create', threaded through kanban_db.create_task() and the kanban_create tool schema. 'blocked' parks the task directly in the blocked column for R3 human-ops review, skipping the brief running-to-blocked transition.

Dropped the unrelated 'add' alias, WIFEXITED Windows compat, and slash-handler error formatting changes that were bundled in the original PR — those should ship as their own focused changes if still wanted.

* fix(kanban): release scratch workspace and tmux session on task completion

Salvages #27369 by @LeonJS. complete_task() now calls _cleanup_workspace() and _cleanup_worker_tmux() after marking a task complete.

Scratch workspaces (used by swarm agents) accumulate on disk — hundreds of MB per task, never released. Stale tmux sessions from completed agents also persist indefinitely.

Both gates are safe:

- workspace_kind == 'scratch' gate preserves user worktree/dir workspaces
- tmux #{pane_dead} == 1 gate only kills sessions where the worker has already exited
- best-effort: cleanup failures never block task completion

* fix(kanban): honor severity thresholds in diagnostics

Salvages #26431 by @LeonSGP43. Dashboard plugin_api list_diagnostics was using exact-match (severity == filter), so '--severity warning' hid 'error' and 'critical' diagnostics. Adds severity_at_or_above() helper to kanban_diagnostics and uses it in the dashboard endpoint (CLI already used SEVERITY_ORDER comparison correctly).

* test: isolate Kanban env pins in hermetic fixture

Salvages the substantive part of #22295 by @steezkelly.
Adds the missing HERMES_KANBAN_HOME, HERMES_KANBAN_RUN_ID, HERMES_KANBAN_CLAIM_LOCK, HERMES_KANBAN_DISPATCH_IN_GATEWAY entries to _HERMES_BEHAVIORAL_VARS so ambient developer-shell pins on those vars don't bleed into pytest runs. The frozenset extraction + standalone regression test from the original PR were dropped to keep the change minimal — main already maintains the list inline.

* feat(kanban): add max_in_progress config to cap concurrent running tasks

Salvages #22981 by @SimbaKingjoe. Adds 'kanban.max_in_progress' config that caps simultaneously running tasks. When the board already has N running, dispatcher skips spawning so slow workers (local LLMs, resource-constrained hosts) don't pile up and time out.

Threads through dispatch_once(max_in_progress=) and gateway dispatcher config parsing with validation (warns on invalid/below-1 values).

* fix(packaging): ship bundled skills in wheel

Salvages #23738 by @LeonSGP43. Wheel installs were missing skills/ and optional-skills/ because pyproject's [tool.setuptools.packages.find] only includes Python packages — the skills directories don't have __init__.py so they were silently dropped from the wheel.

Adds setup.py with data_files spec emitting skills/* and optional-skills/* under hermes_agent-<v>.data/data/, and a get_bundled_skills_dir() helper in hermes_constants that discovers the wheel-installed location via sysconfig before falling back to a source-checkout path. tools/skills_sync uses the helper so 'hermes update' works for pip-installed users.

* fix: 4 small surgical bugs

Salvages #23302 by @Bartok9. Four independent one-area fixes:

1. kanban boards delete alias now hard-deletes (not archives) — the alias didn't carry --delete, so getattr(args, 'delete', False) returned False. Detect boards_action=='delete' explicitly.
2. Gateway auto-title failures no longer leak as user-visible warnings — debug-log only since they're not actionable.
3. Background process completion notification snaps truncation to the next newline boundary, prepends a marker when content is dropped.
4. _cprint() schedules the run_in_terminal coroutine via asyncio.ensure_future so output isn't silently dropped from background threads (fixes #23185 Bug A). Skips the double-print fallback that would fire for mock paths.

* perf(prompt): cache kanban worker guidance at session init

Salvages #24402 by @RyanRana. The KANBAN_GUIDANCE block (~835 tokens) is session-static — the dispatcher decides at spawn time whether the process is a kanban worker via the kanban_show tool's check_fn (gated on HERMES_KANBAN_TASK env var). Re-checking 'kanban_show' in valid_tool_names and re-loading the reference on every system-prompt rebuild (init + each context compression) is wasted work.

Caches the resolved string on agent._kanban_worker_guidance once in agent_init and consumes it in system_prompt.build_system_prompt(), with a getattr fallback for code paths that bypass agent_init.

* feat(kanban): add --sort option to 'hermes kanban list'

Salvages #25745 by @LizerAIDev. Adds --sort {created,created-desc,priority,priority-desc,status,assignee,title,updated} to 'hermes kanban list'. Validated against VALID_SORT_ORDERS map; invalid values raise ValueError. Default behaviour (priority DESC, created ASC) is unchanged when --sort is omitted.

* docs: add kanban codex lane skill

* feat(kanban): worker visibility endpoints (workers/active, runs/{id}, inspect)

Adds three read-only endpoints to the kanban dashboard plugin so the SwitchUI workspace (and any other dashboard consumer) can track workers across tasks without N+1 round-trips through /tasks/{task_id}.

- GET /workers/active
  Single SQL JOIN of task_runs + tasks where ended_at IS NULL, worker_pid IS NOT NULL, status='running'. Returns {workers: [...], count, checked_at}.
- GET /runs/{run_id}
  Direct lookup of any task_run row by id. Reuses existing kanban_db.get_run() helper and _run_dict() serialiser. 404 when not found. Mirrors GET /tasks/{task_id} 404 shape.
- GET /runs/{run_id}/inspect
  Live PID stats via psutil.Process.as_dict() — cpu_percent, memory_rss_bytes, memory_vms_bytes, num_threads, num_fds, status, create_time, cmdline. Short-circuits with alive:false when run has ended, has no worker_pid, the pid is gone, or psutil is unavailable. AccessDenied surfaces as alive:true with error rather than a 500.

11 new tests in tests/plugins/test_kanban_worker_runs.py cover the empty-board case, running-task case, ended-run filtering, missing-pid filtering, 404 paths, already-ended inspect, no-pid inspect, dead-pid inspect, and live-pid inspect (psutil mocked). All pass.

Companion termination endpoint (POST /runs/{run_id}/terminate) is intentionally out of scope here — opening a separate issue first since the RBAC and dispatcher-mediated soft-cancel design needs maintainer input before code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): map contributor email for attribution check

* test(kanban-dashboard): pin enriched 409 detail and inline error wiring (#26744)

- Existing ``test_patch_drag_drop_move_todo_to_ready`` now asserts the enriched 409 detail names the blocking parent (id, quoted title, and current status), so the dashboard always has something actionable to render.
- New bundle-assertion test ``test_dashboard_surfaces_ready_blocked_error_inline`` pins the frontend wiring: the ``parseApiErrorMessage`` helper exists, the drag/drop banner runs through it, and the drawer maintains a visible ``patchErr`` state that's cleared between PATCHes and tasks.
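The single-JOIN query behind `GET /workers/active` in the `worker visibility endpoints` entry might look roughly like this. The trimmed `tasks` / `task_runs` schema, the selected columns, and the `active_workers()` helper are assumptions; the three filter conditions and the `{workers, count, checked_at}` response shape come from the commit text.

```python
import sqlite3
import time

# Assumed, trimmed-down schema: only the columns the query touches.
SCHEMA = """
CREATE TABLE tasks (id TEXT PRIMARY KEY, title TEXT, status TEXT);
CREATE TABLE task_runs (
    id INTEGER PRIMARY KEY,
    task_id TEXT REFERENCES tasks(id),
    worker_pid INTEGER,
    started_at INTEGER,
    ended_at INTEGER
);
"""

ACTIVE_WORKERS_SQL = """
SELECT r.id AS run_id, r.task_id AS task_id, r.worker_pid AS worker_pid,
       r.started_at AS started_at, t.title AS title
FROM task_runs AS r
JOIN tasks AS t ON t.id = r.task_id
WHERE r.ended_at IS NULL
  AND r.worker_pid IS NOT NULL
  AND t.status = 'running'
ORDER BY r.started_at
"""


def active_workers(conn: sqlite3.Connection) -> dict:
    cur = conn.cursor()
    cur.row_factory = sqlite3.Row  # rows become name-addressable
    workers = [dict(row) for row in cur.execute(ACTIVE_WORKERS_SQL)]
    return {"workers": workers, "count": len(workers), "checked_at": int(time.time())}
```

Filtering on `t.status = 'running'` as well as `r.ended_at IS NULL` keeps a run out of the result once its task has moved on, even if the run row itself was never closed.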
* docs(codex_app_server): document multi-root Kanban writable_roots (#27941)

Update the Codex app-server runtime guide's Kanban section to reflect the new behaviour:

  * The sandbox override now adds the board DB directory plus every Kanban path the dispatcher pinned (HERMES_KANBAN_WORKSPACES_ROOT, HERMES_KANBAN_WORKSPACE, legacy HERMES_KANBAN_ROOT) -- deduplicated, DB-dir first.
  * The motivation note now includes the cross-mount artifact-write scenario (e.g. ``/media/.../kanban-workspaces/...`` on a separate drive) and links to issue #27941 so readers can find the original bug report.

* fix(gateway): quiet corrupt kanban dispatcher boards

Salvages the substantive part of #26490 by @aqilaziz. Detects corrupt board DBs ("file is not a database" / "database disk image is malformed") and disables them by fingerprint until they're repaired, instead of flooding the gateway log with repeated logger.exception tracebacks every tick.

Cherry-picked the substantive commit (ea5b4ec2a); the tip commit was an unrelated _is_dir OSError fix for service-path lookup. Dropped a small test reformat that was bundled in the same commit.

* docs: align kanban readiness docs and smoke tests

Salvages #28199 by @bensargotest-sys. Aligns Kanban docs with current tool registration: dispatcher-spawned task workers get task tools, profiles that explicitly enable the kanban toolset get orchestrator routing tools (kanban_list, kanban_unblock). Corrects failure-limit text to current default of 2. Hardens the e2e subprocess script to resolve repo root and use the spawnable default assignee. Updates the diagnostics severity fixture to assert error below the critical threshold.

* feat(kanban): surface per-task model_override in show + tool output

Salvages #26897 by @loicnico96. The per-task model_override DB column already exists on main, but it wasn't exposed in user-facing surfaces.
This adds:

- 'kanban show' prints 'model: <name>' when model_override is set
- kanban_show / kanban_list tool responses include the model_override field

Original branch was stale (PR was authored against an older field name 'model'); applied the substantive surface exposure manually using the current 'model_override' field name.

* feat(cli): add kanban swarm topology helper

Salvages #26791 by @Niraven. Adds 'hermes kanban swarm' to create a durable Kanban Swarm v1 graph: a completed root/blackboard card, parallel worker cards, a verifier gated on all workers, and a synthesizer gated on the verifier. Stores shared swarm blackboard updates as structured JSON comments on the root card.

Self-contained: new hermes_cli/kanban_swarm.py module + CLI wiring + unit tests.

* feat(kanban): add optional board parameter to all MCP tools

Salvages #27598 by @nnnet. Adds optional 'board' parameter to all 9 kanban_* MCP tools via shared _connect helper. Backwards compatible — omitting board keeps current pinned-board behavior. Useful for orchestrator profiles that route across multiple boards.

Two-file scope: tools/kanban_tools.py + tests.

* feat(kanban): stamp originating ACP session_id on tasks

Salvages #23208 by @awizemann. Tracks which chat session created a kanban task so clients can render a per-session board without falling back to tenant + time-window heuristics.

- Schema: tasks gains nullable session_id TEXT column with index (additive migration in _migrate_add_optional_columns).
- ACP: server.py exposes the originating session id via HERMES_SESSION_ID with save/restore around the agent loop.
- Tool: kanban_create reads HERMES_SESSION_ID (with explicit override).
- CLI: 'hermes kanban list --session <id>' filter; JSON output exposes session_id.

* feat(kanban): wire dispatcher to dispatch review agents from review column

Salvages #23772 by @thewillhuang.
Adds 'review' as a valid kanban task status and extends dispatch_once to monitor the review column as a second dispatch source (in addition to the existing ready column).

- Adds 'review' to VALID_STATUSES
- Adds claim_review_task() — atomically transitions review → running
- Adds has_spawnable_review() — health telemetry mirror
- Extends dispatch_once with a review column dispatch loop
- Review agents get 'sdlc-review' skill auto-loaded

Resolved 2 conflicts (VALID_STATUSES merge with main's 'scheduled' state, test file additions). Adapted claim_review_task to main's ttl_seconds: Optional[int] = None convention (matches claim_task).

* feat(kanban): stale detection for running tasks in dispatcher

Salvages #23790 by @thewillhuang. Adds detect_stale_running() to the dispatcher cycle. Running tasks that have been running for longer than dispatch_stale_timeout_seconds (default 14400 = 4h) with no heartbeat in the last hour are auto-reclaimed to ready.

- New config kanban.dispatch_stale_timeout_seconds (default 14400, 0 disables)
- New 'stale' field on DispatchResult
- detect_stale_running() in kanban_db.py with heartbeat freshness check
- Records outcome='stale' on run close + 'stale' event; ticks failure counter
- Wires config through gateway embedded dispatcher
- Updates _cmd_dispatch verbose/JSON output and daemon logging

Resolved test-file end-of-file conflict by appending both halves.

* feat(kanban): filter tasks by workflow fields and runs by status/outcome

Salvages #26745 by @nehaaprasaad. Exposes filtering for the existing workflow_template_id and current_step_key columns:

- list_tasks() accepts workflow_template_id and current_step_key kwargs
- 'hermes kanban list' adds matching CLI flags
- dashboard plugin_api also exposes the filters

Resolved a small conflict in list_tasks signature alongside main's session_id and order_by additions; combined all three into the single filter list.
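The stale rule in `stale detection for running tasks in dispatcher` combines two clocks: run age and heartbeat freshness. A hypothetical pure-function sketch (the `RunningTask` shape and field names are assumed); it only selects candidates and does not reclaim them, record outcome='stale', or tick failure counters as the real dispatcher does.

```python
from dataclasses import dataclass
from typing import Optional

HEARTBEAT_FRESHNESS_SECONDS = 3600  # "without a heartbeat in the last hour"


@dataclass
class RunningTask:
    id: str
    started_at: int                   # unix seconds
    last_heartbeat_at: Optional[int]  # None if the worker never heartbeated


def detect_stale_running(tasks: list[RunningTask], now: int,
                         timeout_seconds: int = 14400) -> list[str]:
    """Return ids of running tasks that look abandoned."""
    if timeout_seconds <= 0:
        return []  # 0 disables stale detection
    stale = []
    for task in tasks:
        if now - task.started_at <= timeout_seconds:
            continue  # not old enough yet
        hb = task.last_heartbeat_at
        if hb is not None and now - hb < HEARTBEAT_FRESHNESS_SECONDS:
            continue  # old, but still heartbeating
        stale.append(task.id)
    return stale
```

Requiring both conditions lets a legitimately long task survive past the 4h timeout as long as its worker keeps heartbeating.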
* feat(kanban): add respawn guard to block repeat worker storms

Salvages #27484 by @fardoche6. Adds a respawn guard that skips worker spawn for tasks where:

- a recent run already succeeded (recent_success — within guard window)
- the previous run hit a quota/auth error (blocker_auth, also auto-blocks)
- a recent task comment includes a GitHub PR URL (active_pr)

The guard prevents repeat worker storms on the same bug/task. Includes the contributor's review-findings fixup (regex hardening, observability, auth coverage). Resolved a small DispatchResult conflict alongside main's 'stale' field; kept both. Authorship preserved via rebase merge.

* feat(kanban): show dashboard cron jobs across profiles

Salvages #27568 by @SerenityTn. Dashboard cron page now lists cron jobs from all profiles, with profile-aware filter UI and storage routing. Includes test coverage for cross-profile listing, mutation, deletion, and validation.

Also fixes orphan conflict markers in config.py left by an earlier salvage merge (kanban.dispatch_stale_timeout_seconds was double-nested in HEAD/PR markers from #28452 salvage of #23790).

* fix(kanban): remove orphan conflict markers from config.py (#28458)

PR #28452 (salvage of #23790, stale detection) merged with leftover git conflict markers in hermes_cli/config.py around the `dispatch_stale_timeout_seconds` config block, breaking config import and any code path that loads it.

Cleans up the markers and keeps both config blocks (worker log rotation/orchestrator + stale detection). Resolves a self-introduced regression.
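The three conditions in the `add respawn guard` entry can be expressed as an ordered check. This is a sketch only: the `respawn_guard_reason()` signature, the `"completed"` outcome value, the one-hour default window, and both regexes are assumptions (the commit mentions regex hardening but not the patterns themselves).

```python
import re
from typing import Optional, Sequence

# Hypothetical patterns; the merged guard's hardened regexes are not shown here.
GITHUB_PR_URL = re.compile(r"https://github\.com/[\w.-]+/[\w.-]+/pull/\d+")
AUTH_OR_QUOTA_ERROR = re.compile(
    r"\b(401|403|429|quota|rate[ _-]?limit|unauthori[sz]ed|invalid api key)\b",
    re.IGNORECASE,
)


def respawn_guard_reason(last_outcome: Optional[str], last_ended_at: Optional[int],
                         last_error: Optional[str], recent_comments: Sequence[str],
                         now: int, window_seconds: int = 3600) -> Optional[str]:
    """Return why a spawn should be skipped, or None to allow it."""
    if (last_outcome == "completed" and last_ended_at is not None
            and now - last_ended_at < window_seconds):
        return "recent_success"
    if last_error and AUTH_OR_QUOTA_ERROR.search(last_error):
        return "blocker_auth"  # per the commit, the real guard also auto-blocks
    if any(GITHUB_PR_URL.search(comment) for comment in recent_comments):
        return "active_pr"
    return None
```

Returning a reason string rather than a bare boolean matches the named outcomes in the commit text and gives the dispatcher something to log.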
* fix(kanban): remove orphan conflict markers from kanban.py (#28459)

PR #28454 (salvage of #26745, workflow filter) merged with leftover git conflict markers in hermes_cli/kanban.py at three sites:

- _task_to_dict() (session_id alongside workflow_template_id/current_step_key)
- p_list parser (--sort alongside --workflow-template-id/--step-key)
- _cmd_list (order_by alongside the new filter kwargs)

Cleans up the markers and keeps both halves at each site. Resolves a self-introduced regression.

* feat(kanban): configure worktree paths and branches

Salvages #26496 by @aqilaziz. Adds branch_name column + CLI flag so tasks with workspace_kind='worktree' can pin a target branch on create. Schema migration added to _migrate_add_optional_columns.

- Task.branch_name field + DB column + migration
- create_task accepts branch_name kwarg
- hermes kanban create --branch <name> flag
- kanban show output includes 'Branch: <name>' when set

Cherry-picked the substantive commit (a7558cf27); the PR's tip was an unrelated service-path-dirs commit. Resolved 2 INSERT-column-list and show-output conflicts alongside main's session_id and max_runtime_seconds additions; kept all three.

* feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373)

Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that group several skills under one slash command. Invoking /<bundle-name> from any surface (CLI, TUI, dashboard, any gateway platform) loads every referenced skill into a single combined user message.

Use cases:

- /backend-dev → loads github-code-review + test-driven-development + github-pr-workflow as one bundle.
- /research → loads several research skills together.
- Team task profiles shared via dotfiles.

Behavior:

- Bundles take precedence over individual skills when slugs collide.
- Missing skills are skipped with a note, not fatal.
- No system-prompt mutation — bundles generate a fresh user message at invocation time, the same way /<skill> does. Prompt cache stays intact.
- Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI), /help display.

Schema (~/.hermes/skill-bundles/<slug>.yaml):

    name: backend-dev
    description: Backend feature work.
    skills:
      - github-code-review
      - test-driven-development
    instruction: |
      Optional extra guidance prepended to the loaded skills.

New module: agent/skill_bundles.py — load, scan, resolve, build invocation message, save, delete. yaml.safe_load only; broken bundles log a warning and are skipped, never raise.

New CLI subcommand: hermes bundles {list,show,create,delete,reload}. Implementation in hermes_cli/bundles.py; wired in hermes_cli/main.py. 'bundles' added to _BUILTIN_SUBCOMMANDS so plugin discovery skips it.

New in-session slash command: /bundles lists installed bundles in both CLI and gateway. /<bundle-name> dispatch added to CLI (cli.py) and gateway (gateway/run.py) before the existing /<skill-name> path.

Autocomplete: SlashCommandCompleter gained an optional skill_bundles_provider parameter that defaults to None — the prompt shows '▣ <description> (N skills)' for bundles vs '⚡' for skills.

Tests:

- tests/agent/test_skill_bundles.py — 33 tests covering slugify, scan/cache freshness, resolve (including underscore→hyphen Telegram alias), build_bundle_invocation_message (loading, missing skills, user/bundle instruction injection, dedup), save/delete, reload diff, list sort.
- tests/hermes_cli/test_bundles.py — 8 tests for the CLI subcommand (create/list/show/delete/reload, --force, missing bundle errors).
- tests/gateway/test_bundles_command.py — 4 tests for the gateway handler and bundle resolution priority.

Live E2E: verified subprocess invocations of hermes bundles {list,create,show,reload,delete} round-trip correctly against an isolated HERMES_HOME.

Docs:

- website/docs/user-guide/features/skills.md — new 'Skill Bundles' section with quick example, YAML schema, management commands, behavior notes.
- website/docs/reference/cli-commands.md — 'hermes bundles' added to the top-level command table and given its own subcommand section.

* feat(kanban): add scheduled status for delayed follow-ups

Salvages #24533 by @roycepersonalassistant. Adds a first-class 'scheduled' Kanban status for time-delay follow-ups that aren't waiting on human input.

- hermes kanban schedule <task_id> [reason] CLI command
- Dashboard/API transitions to/from Scheduled
- unblock_task() now releases both 'blocked' AND 'scheduled' tasks (re-checking parent dependencies before moving to ready/todo)
- i18n + docs updates

Resolved conflicts: kept HEAD's failure-counter reset on unblock alongside the PR's scheduled state, kept HEAD's 'running' direct-set rejection, combined both bulk-status branches. Dropped the dist/ bundle changes (months-stale; would need rebuild from source).

* feat(kanban): drag-to-delete trash zone + bulk delete for task cards

Salvages #28125 by @Jpalmer95. Adds:

- Drag-to-delete trash zone in the kanban dashboard
- Bulk delete endpoint with cascading delete_task cleanup
- Frontend updates (drag visual + drop handler)
- Confirmation prompt before delete

Resolved end-of-file test conflict by appending both halves.

* docs: add Korean Kanban documentation

Salvages #21823 by @pochi-gio. Adds Korean (ko) Docusaurus locale and translates Kanban documentation (kanban.md, kanban-tutorial.md) and the two related skills (devops-kanban-orchestrator, devops-kanban-worker). Purely additive — adds ko to the locales list in docusaurus.config.ts and creates the website/i18n/ko/ tree.
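For the `configure worktree paths and branches` entry, the additive migration that gives legacy DBs a nullable `branch_name` column can be sketched with `PRAGMA table_info` plus `ALTER TABLE ... ADD COLUMN`. The helper name and the `OPTIONAL_TASK_COLUMNS` mapping are hypothetical stand-ins for `_migrate_add_optional_columns`; the point illustrated is adding only the columns that are missing.

```python
import sqlite3

# Hypothetical stand-in for _migrate_add_optional_columns: column name -> SQL type.
# Names come from this constant, never from user input, so the f-string is safe.
OPTIONAL_TASK_COLUMNS = {
    "session_id": "TEXT",
    "branch_name": "TEXT",
}


def migrate_add_optional_columns(conn: sqlite3.Connection) -> list[str]:
    """Add any missing nullable columns to tasks; return the ones added."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute("PRAGMA table_info(tasks)")}
    added = []
    for name, sql_type in OPTIONAL_TASK_COLUMNS.items():
        if name not in existing:
            # Nullable with no default, so legacy rows read back as NULL.
            conn.execute(f"ALTER TABLE tasks ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added
```

Running it a second time adds nothing, so a legacy DB can be opened repeatedly without "duplicate column" errors, and pre-existing worktree tasks simply have no pinned branch.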
* fix(tests): catch up six stale tests after compression/aux/kanban changes (#28465)

- aux_config: drop session_search from _AUX_TASKS and remove stale test (PR #27590 removed auxiliary.session_search from DEFAULT_CONFIG)
- compression_boundary_hook: set compressor._last_compress_aborted=False on MagicMock so the post-compress abort branch (PR #28117) doesn't short-circuit before the session-id rotation under test
- kanban_dashboard_plugin: use consecutive_failures=3 so severity stays 'error' (failure_threshold default dropped from 3 to 2 in d9fef0c8a, so failures=5 now crosses the critical floor of 2*2=4)
- cli_manual_compress: accept force kwarg on DummyAgent._compress_context (cli._manual_compress now passes force=True)

* fix(telegram): render full clarify choice text in message body, use short button labels

When Telegram clarify prompts offer long choices, mobile clients truncate the inline button labels, making options unreadable. Previously only the question was shown in the message body with truncated choice text in button labels.

Fix: append the full numbered option list to the message body so users can read complete choice text on any client. Buttons now use short numeric labels (1, 2, ...) to avoid Telegram truncation. The 'Other (type answer)' button is unchanged.

Long choice labels are now rendered in full (not truncated to 57 chars + '...') since they appear in the body instead of button labels.

Closes: #27497

* chore(release): map @asdlem for PR #27852 salvage

* fix(telegram): default streaming transport to edit

* fix(telegram): respect reply_to_mode for DM topic reply fallback

The DM topic reply fallback code in send() hardcoded should_thread=True when telegram_dm_topic_reply_fallback metadata was present, bypassing _should_thread_reply() and ignoring reply_to_mode config. This caused quote bubbles on every response even with reply_to_mode: 'off'.
Fix:

- Add reply_to_mode param to _reply_to_message_id_for_send() and _thread_kwargs_for_send() classmethods
- In send(), check self._reply_to_mode != 'off' for DM topic fallback
- Suppress reply anchor and reply_to_message_id when mode is 'off' while preserving message_thread_id for correct topic routing
- Thread reply_to_mode through all 29 call sites

Regression coverage: 10 new tests in test_telegram_reply_mode.py covering classmethod behavior, send() integration, and backward compatibility.

Fixes #23994 (reply_to_mode: 'off' ignored by Telegram DM topic reply fallback code)

* fix(gateway): route Telegram audio file attachments away from STT pipeline (#24870)

Telegram distinguishes three kinds of audio payloads:

- message.voice → Opus/OGG voice messages → STT pipeline ✓
- message.audio → audio file attachments → bypasses STT ← was broken
- message.document (audio mime) → generic file route

**Root cause** — the inbound message routing block in gateway/run.py matched both MessageType.VOICE *and* MessageType.AUDIO into audio_paths, which were then fed unconditionally to _enrich_message_with_transcription. Audio file attachments (.mp3, .m4a, etc.) were therefore auto-transcribed instead of being treated as files, making the transcribe skill unusable from Telegram because the path it needed was never surfaced.

**Fix**

- Introduce a new audio_file_paths list populated exclusively by MessageType.AUDIO events.
- Narrow the audio_paths selector to MessageType.VOICE (and bare audio/ mime-type events that are not explicitly AUDIO or DOCUMENT).
- After the STT block, inject a document-style context note for each audio_file_path, giving the agent the file path and asking what to do with it (consistent with how plain documents are handled).
**Tests** — 5 new tests in test_telegram_audio_vs_voice.py:

- voice message still transcribed (regression guard)
- audio attachment skips STT (core fix)
- audio attachment context note format
- STT disabled still produces file note (not STT-disabled notice)
- MessageType.AUDIO != MessageType.VOICE sanity check

Fixes #24870

* chore(release): map bartok9 noreply for PR #24879 salvage

* fix(send_message): route standalone Telegram sends through TELEGRAM_PROXY

When the send_message tool runs outside the gateway process (agent loop, TUI, cron, etc.), _gateway_runner_ref() returns None and the standalone path in _send_telegram constructs Bot(token=token) directly, bypassing any configured proxy. In regions where api.telegram.org is blocked, the send times out after ~5s with 'Telegram send failed: Timed out' and nothing ever shows up in gateway.log because the request never reaches the gateway.

Resolve TELEGRAM_PROXY (via gateway.platforms.base.resolve_proxy_url, which also honours HTTPS_PROXY/HTTP_PROXY/ALL_PROXY and NO_PROXY) just before constructing the Bot. When a proxy is found, attach an HTTPXRequest(proxy=...) for both 'request' and 'get_updates_request', matching what gateway/platforms/telegram.py already does for in-gateway sends and what the Discord standalone sender already does. Any exception attaching the proxy falls back cleanly to a direct connection, preserving prior behaviour for users without a proxy configured.

Adds tests/tools/test_send_message_telegram_proxy.py covering both the proxy-configured and no-proxy cases.

* chore(release): map @pepelax for PR #25419 salvage

* fix(kanban-dashboard): restore implementations dropped during salvages (#28481)

Four kanban dashboard test failures, all from PR salvages that picked up the test additions but dropped the corresponding implementations.
- BOARD_COLUMNS: add 'review' (status added by PR f55d94a1e but the board API never grew the column → test_board_empty failed because VALID_STATUSES - {archived} mismatched the rendered columns). - update_task: enrich the 'ready' 409 detail with the blocking parent list (id, title, status) and add _parents_blocking_ready helper. Implementation lost in the #26744 salvage (commit e215558ba) which pinned the test but not the server-side code. - dist/index.js: add parseApiErrorMessage helper, wire it through the drag/drop banner, add patchErr state to the TaskDrawer and surface it inline by the action row. Lost in the same #26744 salvage. - test_diagnostics_endpoint_severity_filter: update to at-or-above semantics (PR a94ddd807 changed the filter from exact-match so the warning filter now correctly includes error+critical too). * fix(gateway): roll over Telegram tool progress bubbles * fix(gateway): scope audio_file_paths outside media_urls guard The audio-file-paths handling block at line 7334 references the variable unconditionally, but #24879 initialized it inside the 'if event.media_urls' block — so events without media_urls hit UnboundLocalError. Found via test_run_agent_queued_message_does_not_treat_commentary_as_final after PR #28478 landed. * fix(gateway): keep tool-progress edits alive after Telegram flood control When a progress-message edit hits Telegram flood control (RetryAfter), can_edit was unconditionally set to False, permanently disabling coalescing for the rest of the run. Subsequent tool updates were posted as separate new messages instead of updating the existing progress bubble. Fix: only set can_edit=False for non-recoverable edit errors. On flood control, back off by resetting _last_edit_ts so the throttle interval is respected before the next edit attempt. Fixes #25188 * chore(release): map @erhnysr for PR #25198 salvage * fix(telegram): preser…
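The flood-control fix for tool-progress bubbles can be sketched as follows. This is a minimal illustration, assuming an injectable edit callable and clock. `RetryAfter` here is a local stand-in for a flood-control error, and `ProgressBubble` is hypothetical, not the gateway's class. A recoverable flood-control error only restarts the throttle window. Any other edit error disables editing for the rest of the run.

```python
import time


class RetryAfter(Exception):
    """Hypothetical stand-in for a flood-control error carrying a retry delay."""

    def __init__(self, retry_after):
        super().__init__(f"flood control: retry in {retry_after}s")
        self.retry_after = retry_after


class ProgressBubble:
    """Coalesces tool-progress updates into one editable message (sketch)."""

    def __init__(self, edit_fn, min_interval=1.0, clock=time.monotonic):
        self._edit_fn = edit_fn
        self._min_interval = min_interval
        self._clock = clock
        self._last_edit_ts = 0.0
        self.can_edit = True

    def update(self, text):
        """Try to edit the bubble; return True if the edit was sent."""
        if not self.can_edit:
            return False
        now = self._clock()
        if now - self._last_edit_ts < self._min_interval:
            return False  # throttled; a later update will carry newer text
        try:
            self._edit_fn(text)
        except RetryAfter:
            # Recoverable: restart the throttle window instead of giving up.
            self._last_edit_ts = now
            return False
        except Exception:
            # Non-recoverable edit error: stop editing for the rest of the run.
            self.can_edit = False
            return False
        self._last_edit_ts = now
        return True
```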

* fix(approval): surface pending-approval state with explicit marker visible to LLM When a tool call requires user approval in the non-blocking gateway path, the LLM previously received a result that was indistinguishable from a failed tool call (exit_code=-1, error=message). The LLM could not tell whether the tool was pending approval, had returned empty results, or had failed silently — causing it to burn context on wrong hypotheses. Fix changes the result format to include: - status: pending_approval (clear state name) - approval_pending: True (explicit boolean for LLMs to detect) - error: cleared to empty string (removes misleading error signal) This lets the LLM reason about approval latency vs actual errors, short-circuiting the previous silent failure mode. Fixes #14806 * fix: recognize emoji and caret as natural response endings GLM models via Ollama report finish_reason='stop' even when the response was truncated by max_tokens. The continuation mechanism uses _has_natural_response_ending() as one of the heuristics to detect whether the response was genuinely finished. Currently only ASCII punctuation and CJK punctuation are recognized. This means any response ending with an emoji (e.g. ⚡, 👍) or the caret character ^ (common in French ^^ smiley) is not recognized as naturally ended, triggering a false-positive continuation where the model receives 'Continue where you left off' and produces garbled output. Add: - ^ (caret) to the punctuation set - Unicode emoji range (codepoint >= 0x1F300) as natural ending This only affects GLM/Ollama users but the fix is safe for all backends since _has_natural_response_ending() is only consulted inside the continuation flow. * chore(release): pre-stage AUTHOR_MAP for May 2026 LHF batch group 8 (#28328) Pre-stages AUTHOR_MAP entries for 10 new contributors whose PRs are being salvaged in the May 2026 low-hanging-fruit batch (group 8). Lands ahead of the per-PR salvage PRs so they don't get blocked by AUTHOR_MAP CI. 
Contributors: - AceWattGit (#28159 — _pool_may_recover_from_rate_limit NameError) - YuanHanzhong (#28032 — x.com/status fallbacks link-like) - colin-chang (#28245, #28249, #28251 — gateway + mattermost fixes) - felix-windsor (#28019 — preserve cron asterisks in strip mode) - houenyang-momo (#28205 — charizard completion menu contrast) - iqdoctor (#28095 — windows installer docs) - joe102084 (#28151 — whitespace-only cron responses) - jvinals (#27936 — Slack U-IDs → DM channel) - maxmilian (#28267 — ModelPickerDialog portal) - samggggflynn (#27952 — dingtalk pre_start) Per references/batch-pr-salvage-may14-additions.md. * fix: add pre_start() to _IncomingHandler for dingtalk SDK compatibility The dingtalk-stream SDK calls pre_start() on every registered handler before opening the WebSocket connection. Without this method, the SDK raises AttributeError and kills the stream connection, causing DingTalk to be unable to connect via Stream Mode. * fix(windows): handle redirected stdout in _cprint fallback Wraps _pt_print in try/except with a print() fallback. When a kanban worker's stdout is piped to a log file, prompt_toolkit raises NoConsoleScreenBufferError (Windows) or OSError (other) because there is no real console buffer. The fallback keeps worker output flowing instead of crashing. * chore(release): alias stale-ID salvage commit for @Grogger (#28334) PR #28330 was salvaged with a wrong noreply numeric ID (18091625 vs the correct 7065068). The commit on main is correctly authored to Grogger by username, but neither noreply form was in AUTHOR_MAP. Adds both so release-notes generation maps them to @Grogger. * fix(aux): remove stale session_search model menu entry * fix(tui): keep x status citation fallbacks link-like * fix(xai-oauth): quarantine dead tokens on terminal refresh failure resolve_xai_oauth_runtime_credentials() called _refresh_xai_oauth_tokens() with no try/except. 
A terminal refresh failure (HTTP 400/401/403 — invalid_grant, token revoked) propagated without clearing the dead access_token / refresh_token from auth.json, causing every subsequent session to retry the same doomed network request. Add a try/except around the refresh call that mirrors the existing credential_pool.py quarantine: when _is_terminal_xai_oauth_refresh_error identifies a non-retryable failure, clear the dead token fields from auth.json and write a last_auth_error diagnostic marker so future calls fail fast with a clear relogin_required error instead of hitting the network. active_provider is preserved (set_active=False) so multi-provider users whose chosen provider is not xai-oauth are unaffected. Tests: two new cases in test_auth_xai_oauth_provider.py cover terminal quarantine and transient pass-through. * feat(bg-review): add bundled/pinned skill protection rules to review prompts (#27644) The background review prompts (_SKILL_REVIEW_PROMPT and _COMBINED_REVIEW_PROMPT) now include explicit protection rules for bundled, hub-installed, and pinned skills — aligning with the curator's existing policy at curator.py L345/350. Before this change, bg-review could freely rewrite bundled skills like 'hermes-agent' or pinned skills, while the 7-day curator explicitly skips them. The review agent now sees: • Bundled skills (shipped with Hermes) • Hub-installed skills (installed via hermes skills install) • Pinned skills (marked via hermes curator pin) If only protected skills need updating, the review says 'Nothing to save.' and stops. Fixes #27644 * fix(web): portal Change Model modal so it renders above the app sidebar The dashboard's main column is `relative z-2` (App.tsx), which creates a stacking context that traps fixed descendants below the app sidebar (`z-50`). `ModelPickerDialog` renders `fixed inset-0 z-[100]` inline, so its z-100 is scoped to z-2 and the sidebar covers its left edge. 
The bug is visible across all themes but only obvious in the Large theme variants (Hermes Teal (Large), etc.) where the larger root font widens the dialog into the sidebar's column. Toast.tsx already documents the same trap and uses the same `createPortal(..., document.body)` escape. This commit ports the picker; the same pattern affects other inline z-[100] modals in the dashboard (OAuthLoginModal, Cron / Models / Profiles page modals) and is left for a follow-up — keeping this PR scoped to the reporter's specific case. Fixes #28103 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gateway): exit code 75 on service restart so launchd relaunches When the gateway receives SIGUSR1 (graceful restart via launchd_restart), the SIGUSR1 handler calls request_restart(via_service=True) and the gateway shuts down cleanly with exit code 0. However, the generated launchd plist uses KeepAlive → SuccessfulExit → false, meaning launchd only relaunches on *non-zero* exit codes. A clean exit(0) is treated as "successful, don't restart", so the gateway stays down after /restart, /update, or SIGUSR1. The systemd unit template already uses RestartForceExitStatus=75 for the same scenario. Mirror that convention: when _restart_via_service is True, raise SystemExit(75) so launchd's SuccessfulExit=false policy triggers a relaunch. Closes #28135 * fix: guard json.loads() against invalid TTS and skill_view responses Two code paths call json.loads() on output from external tools without catching JSONDecodeError. If the tool returns a non-JSON string (error message, empty string, or None), the entire call path crashes. 1. gateway/run.py — text_to_speech_tool() result in voice reply path. A TTS failure that returns an error string instead of JSON crashes the voice reply handler, killing the message response entirely. 2. cron/scheduler.py — skill_view() result when loading skills for cron jobs. 
A corrupted or missing skill file that returns an error string instead of JSON crashes the cron tick, preventing all jobs from executing that cycle. Both fixes catch (json.JSONDecodeError, TypeError), log a warning, and gracefully skip the failed operation instead of crashing. * fix(gateway): bridge gateway_restart_notification from YAML platform sections Two related bugs in gateway/config.py prevented per-platform gateway_restart_notification from working through config.yaml: 1. The shared-key bridging loop (load_gateway_config) omitted 'gateway_restart_notification', so the key never landed in platform_data['extra'] even when set under e.g. 'discord:' or 'mattermost:' sections. 2. PlatformConfig.from_dict() only read gateway_restart_notification from the top-level data dict, ignoring the 'extra' sub-dict where bridged keys are stored. Fix: add the key to the bridging loop, and add an 'extra' fallback in from_dict() so that round-tripped values (YAML → bridged → extra → from_dict) resolve correctly. Impact: users can now set gateway_restart_notification: false per platform in config.yaml instead of relying on env vars or the global platforms: block. * feat(kanban): add auto_promote_children config toggle When the kanban auto-decomposer fans a triage task into child tasks, recompute_ready() immediately promotes parent-free children to 'ready' so the dispatcher picks them up. Some users want a manual workflow where children stay in 'todo' for review before dispatch. 
Add 'kanban.auto_promote_children' config key (default: true): - false: children stay in 'todo' after decomposition - true: existing behavior (auto-promote to 'ready') Changes: - kanban_db.py: decompose_triage_task() gains auto_promote param - kanban_decompose.py: reads auto_promote_children from config - kanban dashboard API: exposes the new setting in GET/PUT /orchestration Closes #28016 * fix: wrap _pool_may_recover_from_rate_limit call through run_agent namespace The conversation_loop.py references _pool_may_recover_from_rate_limit which was defined in run_agent.py. After the conversation-loop extraction refactor, the helper was no longer in the same module scope. Wrap the call as _ra()._pool_may_recover_from_rate_limit() to route through the run_agent monkeypatch namespace where the helper is available. Adds regression test in test_gemini_fast_fallback.py. Fixes: MAILROOM Email Triage NameError, OPS Execution Monitor NameError. * fix(tui): improve charizard completion menu contrast * docs(windows): avoid piping installer directly into iex * fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS Qwen3.x and DeepSeek-V3.x default to chatty/hallucinatory tool use without enforcement steering — agents narrate "calling tool X" without actually emitting a tool call, or run partial loops. Both model families fit the same failure pattern TOOL_USE_ENFORCEMENT_GUIDANCE was already injected for (gpt, codex, gemini, gemma, grok, glm). Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com> Squashed salvage of: - 403e567ce fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS - 9433eabe7 test(agent): use realistic qwen-plus identifier in enforcement test Fixes #28079. * fix(send_message): resolve Slack user IDs to DM channel IDs The _SLACK_TARGET_RE regex only matched IDs starting with C (channel), G (group), or D (direct message). 
Slack user IDs start with U, causing 'Could not resolve' errors when trying to send DMs to specific users. Changes: - Expand _SLACK_TARGET_RE to accept U-prefixed IDs (user IDs) - Add conversations.open fallback to resolve user IDs to DM channel IDs before sending, since chat.postMessage requires a conversation ID Fixes #ISSUE_NUMBER * fix(gateway): tighten MEDIA extraction regex + silent skip on file-not-found Three related fixes for the MEDIA:<path> extraction pipeline that caused 'file not found' noise in platform channels: 1. run.py — tighten tool-result MEDIA regex from \S+ (any non-whitespace) to require a path pattern with known extensions. Prevents LLM-generated placeholder paths like 'MEDIA:/path/to/example.mp4' from being captured as real media. 2. base.py — remove the |\S+ fallback in extract_media() that catches anything non-whitespace as a potential MEDIA path. This was the primary cause of false positives — strings like '' in tool output were captured as MEDIA: paths. 3. mattermost.py — replace the file-not-found error message sent to the channel with a silent logger.warning() skip. When a path extracted by MEDIA doesn't exist on disk, the channel no longer gets a noisy '(file not found: ...)' message. Impact: eliminates the persistent 'file not found' spam in Mattermost channels caused by over-broad MEDIA regex patterns matching non-path text in tool output. * fix(xai-oauth): split 403 (tier/entitlement) from 400/401 in token endpoint xAI's token endpoint returns HTTP 403 to the OAuth grant when the account isn't on the allowlist for API access (e.g. standard SuperGrok subscribers — see #26847). Treating it like a stale-token 400/401 made ``format_auth_error`` append "Run ``hermes model`` to re-authenticate", which is misleading because re-login can't change xAI's tier decision.
Split 403 off in both ``refresh_xai_oauth_pure`` and the loopback login token exchange: * New error code ``xai_oauth_tier_denied`` with ``relogin_required=False`` * Message explains the entitlement gate and points at the ``XAI_API_KEY`` + ``provider: xai`` fallback * 400/401 still set ``relogin_required=True`` as before * 5xx still set ``relogin_required=False`` as before * fix(run-agent): treat any 403 on xai-oauth as entitlement to stop refresh-loop The existing ``_is_entitlement_failure`` heuristic only fires when the response body contains specific substrings ("do not have an active Grok subscription", etc.). xAI has been seen to 403 standard SuperGrok subscribers with a terser body that doesn't match those keywords (#26847), and the recovery path would then mint a fresh token, get a fresh 403, and loop until Ctrl+C. Add a defense-in-depth check at the recovery call site: any 403 on ``provider == "xai-oauth"`` short-circuits ``try_refresh_current`` so the error surfaces immediately with the friendly hint from ``_summarize_api_error``. Keeps the existing keyword path for all other providers untouched. * test(xai-oauth): pin tier-denied 403 behavior + docs warning for #26847 Tests: * ``test_refresh_xai_oauth_pure_403_marked_tier_denied_not_relogin`` — refresh-403 raises ``xai_oauth_tier_denied`` with ``relogin_required=False`` and the API-key fallback hint in body. * ``test_format_auth_error_tier_denied_does_not_suggest_relogin`` — the renderer does not append "Run ``hermes model``" for the new code. * ``test_recover_with_credential_pool_skips_refresh_on_bare_403_for_xai_oauth`` — bare ``{"reason":"forbidden","message":"Forbidden"}`` body (which does not match the existing keyword heuristic) still short-circuits ``try_refresh_current`` on xai-oauth. 
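The 403 / 400-401 / 5xx split for the xAI token endpoint can be sketched as a small classifier plus renderer. Only `xai_oauth_tier_denied` and the `relogin_required` semantics come from the commits above. The other error codes, `classify_token_error`, and `render_auth_error` are hypothetical names for illustration, not the real `format_auth_error`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenError:
    code: str
    relogin_required: bool
    message: str


def classify_token_error(status):
    """Map an xAI token-endpoint HTTP status to an error (sketch)."""
    if status == 403:
        # Entitlement gate: logging in again cannot change the account tier.
        return TokenError(
            "xai_oauth_tier_denied",
            False,
            "Account is not entitled to API access via OAuth; set XAI_API_KEY "
            "and use provider: xai instead.",
        )
    if status in (400, 401):
        # Hypothetical code name: a stale or revoked token, fixable by re-login.
        return TokenError("xai_oauth_refresh_failed", True, "Token rejected.")
    if status >= 500:
        # Hypothetical code name: transient upstream failure.
        return TokenError("xai_oauth_server_error", False, "Transient server error.")
    return TokenError("xai_oauth_unexpected_status", False, f"Unexpected HTTP {status}.")


def render_auth_error(err):
    # Only errors that re-login can actually fix get the re-auth hint.
    text = err.message
    if err.relogin_required:
        text += " Run `hermes model` to re-authenticate."
    return text
```

Keying the hint off `relogin_required`, instead of the status code, keeps the renderer from having to know about individual HTTP statuses.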
Docs: * Drop the "(any active tier)" claim from the xai-grok-oauth guide, add a top-of-page warning callout, and a Troubleshooting section for the 403-after-login case pointing at ``XAI_API_KEY`` + ``provider: xai`` as the documented fallback. * fix: handle whitespace-only cron responses * fix(cli): preserve cron asterisks in strip mode * fix(mattermost): resolve thread root_id and route progress to threads Two Mattermost thread-related bugs: 1. _resolve_root_id() — Mattermost CRT requires root_id to be the thread root post. Using any reply's own ID as root_id causes '400 Invalid RootId'. Add _resolve_root_id() that walks up the post chain via API to find the actual root, and apply it in send(), _send_url_as_file(), and _send_local_file(). 2. _progress_reply_to — The condition in run.py only checked Platform.FEISHU, missing Mattermost entirely. This caused tool progress messages to always land in the main channel instead of the thread. Add Platform.MATTERMOST to the condition so progress messages are routed to threads when reply_mode=thread. Impact: Tool progress messages now appear in Mattermost threads instead of flooding the main channel; thread replies no longer fail with Invalid RootId when the reply target is itself a reply. * feat(kanban): archive --rm to hard-delete archived tasks Salvages #19964 by @Beandon13. Adds `hermes kanban archive --rm` to permanently remove already-archived tasks with cascading cleanup of links, comments, events, runs, and notify-subs. Safety guard: only archived tasks can be deleted; active/blocked/done must be archived first. Cherry-picked from #19964 onto current main (severe stale base, applied manually to preserve substance only). 
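The `archive --rm` safety guard and cascade can be sketched with the standard `sqlite3` module. The schema below is a hypothetical minimal subset: the real kanban DB also has runs and notify-subs tables, which would get the same per-task DELETE. `hard_delete_archived` is an illustrative name, not the function in kanban_db.py.

```python
import sqlite3

# Hypothetical minimal schema; the real kanban DB has more tables and columns.
SCHEMA = """
CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE comments (task_id TEXT, body TEXT);
CREATE TABLE task_events (task_id TEXT, kind TEXT);
CREATE TABLE links (parent_id TEXT, child_id TEXT);
"""


def hard_delete_archived(conn, task_id):
    """Permanently delete an archived task and its dependent rows (sketch)."""
    row = conn.execute("SELECT status FROM tasks WHERE id = ?", (task_id,)).fetchone()
    if row is None:
        raise KeyError(task_id)
    if row[0] != "archived":
        # Safety guard: active/blocked/done tasks must be archived first.
        raise ValueError(f"task {task_id} is {row[0]!r}; archive it before deleting")
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM comments WHERE task_id = ?", (task_id,))
        conn.execute("DELETE FROM task_events WHERE task_id = ?", (task_id,))
        conn.execute(
            "DELETE FROM links WHERE parent_id = ? OR child_id = ?", (task_id, task_id)
        )
        conn.execute("DELETE FROM tasks WHERE id = ?", (task_id,))
```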
* feat(proxy): add xai upstream adapter for Grok via OAuth * chore(release): map @yannsunn for PR #28064 xai proxy adapter salvage * docs(skill): align kanban dispatcher failure_limit text with current default * fix(oauth): add manual-paste fallback for browser-only remote consoles xAI Grok OAuth (and Spotify) use a loopback redirect to ``http://127.0.0.1:<port>/callback`` to capture the authorization code. That works when the browser and Hermes run on the same machine, and the SSH tunnel recipe handles the regular remote case. It breaks completely on **browser-only remote consoles** (GCP Cloud Shell, GitHub Codespaces, AWS EC2 Instance Connect, Gitpod, Replit, …) where the user has a browser but no real SSH client to forward a port — the redirect to 127.0.0.1 on the remote VM simply isn't reachable from the laptop, and there's nothing the existing flow can do about it (#26923). This commit adds the foundation for a manual-paste fallback: * ``_is_remote_session`` now also recognises Cloud Shell, Codespaces, Gitpod, Replit, StackBlitz (in addition to SSH), so the existing tunnel hint at least fires in those environments. * ``_parse_pasted_callback`` accepts any of: a full ``http(s)://...?code=...&state=...`` URL, a bare ``?code=...`` query string, a bare ``code=...&state=...`` fragment, or a bare opaque code value. Returns the same dict shape the HTTP callback handler produces, so the caller's state / error validation works unchanged (no CSRF bypass). * ``_prompt_manual_callback_paste`` reads stdin with a clear multi-line explanation of what's happening and what to paste. * ``_xai_oauth_loopback_login`` gains a ``manual_paste`` kwarg that skips the HTTP listener entirely. The redirect_uri, PKCE verifier, state, and nonce are byte-identical to the loopback path so xAI's token endpoint can't tell the difference at the protocol level. 
* ``_print_loopback_ssh_hint`` now also mentions ``--manual-paste`` so users without a real SSH client see a path forward instead of a dead-end tunnel recipe. * ``_login_xai_oauth`` threads ``args.manual_paste`` into the loopback helper. * feat(cli): wire --manual-paste into ``hermes auth add`` and ``hermes model`` Register the new ``--manual-paste`` flag on both entry points and thread it through to the xAI loopback login: * ``hermes auth add xai-oauth --manual-paste`` — pool-add path, forwarded inside ``auth_commands.handle_auth_add``. * ``hermes model --manual-paste`` — model-picker path, forwarded by ``_model_flow_xai_oauth`` into the synthetic ``argparse.Namespace`` it passes to ``_login_xai_oauth``. The picker also now forwards ``--no-browser`` and ``--timeout`` for consistency (previously hardcoded to defaults regardless of CLI flags). Help text on both flags points at #26923 and names the browser-only remote consoles (Cloud Shell, Codespaces, EC2 Instance Connect) so users searching ``hermes --help`` can find the workaround. * test+docs(oauth): pin manual-paste semantics and document browser-only path (#26923) Tests (``tests/hermes_cli/test_auth_manual_paste.py``): * 9 parametrised + scalar cases for ``_is_remote_session`` covering the new Cloud Shell / Codespaces / Gitpod / Replit / StackBlitz env vars (plus the existing SSH ones). * 9 cases for ``_parse_pasted_callback`` covering every paste form (full URL, https URL with extra params, bare ``?code=...``, bare ``code=...`` fragment, bare opaque value, error+description, empty, whitespace-only, malformed URL). * 3 cases for ``_prompt_manual_callback_paste`` (happy path, EOF, Ctrl-C). * 3 end-to-end ``_xai_oauth_loopback_login(manual_paste=True)`` cases: the HTTP server MUST NOT be started (asserted via a callable that raises if invoked), wrong state still rejected with ``xai_state_mismatch`` (no CSRF bypass), and empty paste surfaces ``xai_code_missing``. 
* SSH-hint mention test ensures the ``--manual-paste`` instruction is printed in the remote-session hint. Docs: * ``oauth-over-ssh.md`` — new "Browser-only remote (Cloud Shell / Codespaces / EC2 Instance Connect)" section with the ``--manual-paste`` recipe, plus a TL;DR note for the new flag. * ``xai-grok-oauth.md`` — short subsection pointing at the same recipe and the OAuth-over-SSH guide anchor. * docs(kanban): document max-retries task override * docs(kanban): document inline create shortcuts * test(kanban): cover default board dashboard pin * docs: ignore box diagrams in ascii guard Wrap existing box-drawing diagrams with ascii-guard markers so docs-site checks pass when website docs are touched. Co-authored-by: Cursor <cursoragent@cursor.com> * feat: per-task model override for kanban workers - Add model_override field to Task class and tasks schema - Add migration for existing databases - Spawn worker with -m model when model_override is set * test(kanban-dashboard): cover _task_dict task_age fallback The fix in 061a1830 added an outer try/except in plugin_api._task_dict so that a future failure mode in kanban_db.task_age (anything _safe_int doesn't already absorb) cannot 500 the GET /board response. The _safe_int / task_age corruption paths got regression coverage in tests/hermes_cli/test_kanban_db.py, but the OUTER fallback contract remained untested -- meaning a refactor that drops the try/except would not be caught by CI. 
Pin that contract from both consumers of _task_dict: - GET /board returns 200 with the literal fallback age dict for the affected card (other cards continue to render via the same path) - GET /tasks/:id (drawer view) returns 200 with the same fallback, so a single corrupt task can't block its own drawer Both tests force task_age to raise RuntimeError rather than ValueError on '%s', because ValueError is absorbed by _safe_int and never reaches the outer try/except -- testing that path would only re-cover what test_kanban_db.py already pins. Manually verified the regression discipline: git checkout 061a1830^ -- plugins/kanban/dashboard/plugin_api.py pytest -k task_age_exception # both FAIL with 500 git checkout HEAD -- plugins/kanban/dashboard/plugin_api.py pytest -k task_age_exception # both PASS * fix(kanban): clear _INITIALIZED_PATHS in remove_board so recycled DBs re-init schema Archiving or deleting a board via remove_board() leaves the path's "schema already initialized" entry in the module-level cache. A concurrent connect(board=<slug>) call (e.g. the dashboard event-stream poll loop) then: 1. resolves the same kanban.db path, 2. recreates the directory + an empty sqlite file because connect() does mkdir(parents=True, exist_ok=True), 3. skips the CREATE TABLE pass because the cache entry says the schema is already in place, 4. errors on the next read with `no such table: task_events`. Drop the cache entry before mutating the filesystem so the fresh file gets a proper schema init on next connect(). Applies to both archive=True (rename) and archive=False (rmtree) branches. Fixes #23833. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(web): add Cache-Control: no-store to plugin static file serving Prevents browser caching of stale dashboard plugin JS files that may contain bugs already fixed upstream (e.g. COLUMN_LABEL undefined). * fix(kanban): seed bundled skills (e.g. 
kanban-worker) on kanban init Closes #23725 * fix(kanban): ignore stale HERMES_KANBAN_BOARD for removed boards * fix(kanban): keep board-management commands independent from board override * fix(kanban): preserve notifier_profile for dashboard home subscriptions * fix(kanban): promote dependents when a parent is archived * fix(cli): make kanban specify max_tokens configurable * fix(kanban): sync slash subcommands with live parser * fix(kanban): promote blocked tasks when parent dependencies complete recompute_ready only scanned 'todo' tasks for promotion, ignoring 'blocked' tasks entirely. When a task was blocked (e.g. by the circuit breaker) and its parent dependencies later completed, the task stayed stuck in 'blocked' forever unless manually unblocked. Now recompute_ready also scans 'blocked' tasks. When all parents are done/archived, the blocked task is promoted to 'ready' with failure counters reset — equivalent to an automatic unblock. Includes a regression test for the blocked-parent-done promotion path. * fix(kanban): use 'is not None' check for max_runtime_seconds in create_task max_runtime_seconds=0 was being silently coerced to None due to a falsy check (if max_runtime_seconds). Zero is a valid value that causes the dispatcher to immediately time out a task. The adjacent max_retries parameter already used the correct 'is not None' pattern. Fixes the inconsistency by aligning max_runtime_seconds with max_retries. * fix(kanban): reset failure counters on unblock_task When a task is manually unblocked (blocked → ready/todo), the consecutive_failures counter and last_failure_error were left intact. The next failure would immediately re-trip the circuit breaker because the counter was still at or above the failure limit. Reset both fields on unblock so the task gets a fresh retry budget. Includes a regression test that verifies counters are zeroed. 
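The `recompute_ready` change (scanning 'blocked' as well as 'todo', and resetting failure counters on promotion) can be sketched over an in-memory task map. This is an illustration, not the kanban_db.py code. One assumption is stated explicitly: blocked tasks with no parents are skipped, so a task parked by the circuit breaker or for human review is not silently auto-unblocked.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DONE_STATES = {"done", "archived"}


@dataclass
class Task:
    id: str
    status: str
    parents: List[str] = field(default_factory=list)
    consecutive_failures: int = 0
    last_failure_error: Optional[str] = None


def recompute_ready(tasks):
    """Promote todo/blocked tasks whose parents are all done or archived.

    Returns the ids promoted to 'ready'. Promoting a blocked task also resets
    its failure counters, equivalent to a manual unblock.
    """
    promoted = []
    for task in tasks.values():
        if task.status not in ("todo", "blocked"):
            continue
        if task.status == "blocked" and not task.parents:
            continue  # assumption: only dependency-blocked tasks auto-promote
        if all(tasks[p].status in DONE_STATES for p in task.parents):
            if task.status == "blocked":
                task.consecutive_failures = 0
                task.last_failure_error = None
            task.status = "ready"
            promoted.append(task.id)
    return promoted
```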
* fix(kanban): fingerprint crash errors to prevent fleet-wide retry exhaustion When a systemic failure (provider outage, auth expiry, OOM) crashes multiple workers simultaneously, detect_crashed_workers increments each task failure counter independently. The circuit breaker only trips after N × failure_limit retries across the fleet. Fingerprint crash errors by normalizing host-specific details (PIDs, timestamps). When 3+ tasks crash with the same fingerprint in a single detection cycle, immediately trip the circuit breaker (failure_limit=1) instead of waiting for repeated failures. Isolated crashes (unique fingerprints) retain their normal retry budget. Protocol violations continue to trip immediately. Includes regression tests for systemic and isolated crash paths. * fix(kanban): align board_exists with board discovery rules * fix(kanban): demote ready children when a parent is reopened * fix(kanban): serialize DB initialization * fix(kanban): task_age() tolerates ISO-8601 timestamps Prevents ValueError crash in dashboard get_board() when a task has an ISO timestamp (e.g. "2026-05-10T15:00:00Z") instead of a unix epoch int. Adds _to_epoch() helper that normalises both formats. * Fix Kanban dashboard initial board selection * fix(kanban): persist worker session metadata on completion Salvages #25579 by @wesleysimplicio. Stamps task_runs.metadata.worker_session_id from HERMES_SESSION_ID on kanban_complete. Cherry-picked the substantive commit (not the AUTHOR_MAP fixup tip) onto current main. 
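The crash-fingerprinting idea (normalize host-specific details, then trip immediately when 3+ tasks share a fingerprint in one detection cycle) can be sketched as follows. The normalization regexes, `fingerprint`, and `failure_limits` are hypothetical. The actual rules in detect_crashed_workers may differ.

```python
import re
from collections import Counter

# Hypothetical normalisation rules; the real implementation may differ.
_ISO_TS = re.compile(
    r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?"
)
_PID = re.compile(r"\bpid[ =:]*\d+", re.IGNORECASE)
_NUM = re.compile(r"\b\d{5,}\b")  # long bare numbers: epochs, ports, ids


def fingerprint(error):
    """Strip host-specific details so identical failures compare equal."""
    text = _ISO_TS.sub("<ts>", error)
    text = _PID.sub("pid <n>", text)
    text = _NUM.sub("<n>", text)
    return " ".join(text.split()).lower()


def failure_limits(crashes, default_limit, systemic_threshold=3):
    """Return per-task failure limits for one detection cycle.

    crashes maps task id -> error text. Tasks whose fingerprint is shared by
    at least systemic_threshold crashes get limit 1 (trip immediately);
    isolated crashes keep the normal retry budget.
    """
    prints = {task_id: fingerprint(err) for task_id, err in crashes.items()}
    counts = Counter(prints.values())
    return {
        task_id: 1 if counts[fp] >= systemic_threshold else default_limit
        for task_id, fp in prints.items()
    }
```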
* fix(kanban): make claim ttl configurable Co-Authored-By: Paperclip <noreply@paperclip.ing> * fix(kanban): pass accept-hooks to worker chat subprocess * feat(kanban): add board-level default workdir (#25430) * docs(kanban-worker): document notification routing configuration * fix(kanban): preserve worker tools with restricted toolsets * fix(kanban): make legacy task migration idempotent (cherry picked from commit 293f1c3a7241b0117669e049d9aa746c9645ac90) * fix: harden Kanban worker Hermes command resolution * feat(kanban): allow trimmed task comments SS-1647 live SHIP validation: real code + tests for kanban comment --max-len. * fix: show scheduled kanban tasks in dashboard * fix: assign single-task kanban decompositions * fix(kanban-dashboard): make Orchestration mode checkbox label static The checkbox label echoed its state ("Auto (default)" / "Manual") instead of describing the action, so a checked box reading "Auto" parsed as a status indicator rather than a control. The accompanying sub-description was also static and started with "When on, ...", which read awkwardly when the box was unchecked. Replace the dynamic label with a static action label ("Auto-decompose triage tasks") and flip the sub-description between the two modes so it stays accurate either way. The top-of-page Orchestration pill is unchanged — that one is intentionally a status badge / toggle. Fixes #28178 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(env): add HERMES_KANBAN_DISPATCH_IN_GATEWAY override (#21956) Salvages the env-vars docs portion of #21956 by @Bartok9. The ascii-guard-ignore tags from the original PR already landed on main. * fix(kanban): close sqlite connection on init failure to prevent fd leak Salvages #28301 by @Ade5954. If WAL setup, PRAGMA application, or schema init raises after sqlite3.connect() succeeds, the new connection was leaking. Wrap the body in try/except so the connection is closed before the exception propagates. 
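The fd-leak fix for `connect()` follows the usual close-on-failure pattern. Here is a minimal sketch with the standard `sqlite3` module. The `_init_schema` body and the specific PRAGMAs are assumptions standing in for the real WAL/PRAGMA/schema setup.

```python
import sqlite3


def _init_schema(conn):
    # Hypothetical schema init; the real one creates the full kanban schema.
    conn.execute("CREATE TABLE IF NOT EXISTS tasks (id TEXT PRIMARY KEY, status TEXT)")


def connect(path, init=_init_schema):
    """Open a connection, closing it if any setup step fails (sketch)."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("PRAGMA foreign_keys=ON")
        init(conn)
    except BaseException:
        conn.close()  # release the file descriptor before re-raising
        raise
    return conn
```

Catching `BaseException` also covers interrupts during setup. The exception still propagates unchanged, so callers see the same error as before.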
* fix(kanban): don't crash dispatched workers when kanban-worker skill is absent Salvages #27372 by @oemtalks. The dispatcher unconditionally injected `--skills kanban-worker` into every worker spawn, but worker profiles sometimes don't have that bundled skill in their skills dir, which is fatal at CLI startup (`ValueError: Unknown skill(s): kanban-worker`). Adds `_kanban_worker_skill_available(hermes_home)` and only injects the flag when the skill resolves. The MANDATORY lifecycle still ships via KANBAN_GUIDANCE in the system prompt, so omitting the flag is safe. * fix(packaging): ship dashboard plugin assets in wheel Salvages #23737 by @LeonSGP43. Adds plugins/* manifest.json and dist/ glob entries to setuptools package-data so wheel installs ship the bundled dashboard plugin assets (kanban, achievements, etc.). Without these, /api/dashboard/plugins can't discover plugin assets outside a source checkout. * docs(kanban): document worker protocol auto-blocks Salvages #21585 by @helix4u. Documents the protocol_violation event (worker exits successfully while task is still running), adds --max-retries to the create flag list and --failure-limit to dispatch. * fix(oneshot): pass fallback_providers from profile config to AIAgent Salvages #23368 by @uzunkuyruk. Oneshot workers (e.g. kanban workers spawned via 'hermes -p <profile> chat -q ...') were not honouring the profile's fallback_providers / fallback_model chain because oneshot.py never read the config and never passed fallback_model= to AIAgent. Reads cfg.get('fallback_providers') (new list format) or cfg.get('fallback_model') (legacy single-dict) with the same normalization cli.py applies, then forwards as fallback_model=_fb. * fix(kanban): reject direct running transitions in dashboard bulk updates Salvages #24050 by @kronexoi. The single-task PATCH already rejects direct status='running' since it bypasses the dispatcher/claim invariant, but the bulk-update endpoint still accepted it. 
  Aligns bulk with single by emitting an error result row for any 'running' entry.
* feat(kanban): add initial-status for human-ops cards
  Salvages #27526 by @shunsuke-hikiyama. Adds an --initial-status flag (running|blocked, default running) to 'kanban create', threaded through kanban_db.create_task() and the kanban_create tool schema. 'blocked' parks the task directly in the blocked column for R3 human-ops review, skipping the brief running-to-blocked transition.
  Dropped the unrelated 'add' alias, WIFEXITED Windows compat, and slash-handler error formatting changes that were bundled in the original PR — those should ship as their own focused changes if still wanted.
* fix(kanban): release scratch workspace and tmux session on task completion
  Salvages #27369 by @LeonJS. complete_task() now calls _cleanup_workspace() and _cleanup_worker_tmux() after marking a task complete. Previously, scratch workspaces (used by swarm agents) accumulated on disk — hundreds of MB per task, never released. Stale tmux sessions from completed agents also persisted indefinitely.
  Both gates are safe:
  - workspace_kind == 'scratch' gate preserves user worktree/dir workspaces
  - tmux #{pane_dead} == 1 gate only kills sessions where the worker has already exited
  - best-effort: cleanup failures never block task completion
* fix(kanban): honor severity thresholds in diagnostics
  Salvages #26431 by @LeonSGP43. Dashboard plugin_api list_diagnostics was using exact-match (severity == filter), so '--severity warning' hid 'error' and 'critical' diagnostics. Adds severity_at_or_above() helper to kanban_diagnostics and uses it in the dashboard endpoint (CLI already used SEVERITY_ORDER comparison correctly).
* test: isolate Kanban env pins in hermetic fixture
  Salvages the substantive part of #22295 by @steezkelly.
  Adds the missing HERMES_KANBAN_HOME, HERMES_KANBAN_RUN_ID, HERMES_KANBAN_CLAIM_LOCK, HERMES_KANBAN_DISPATCH_IN_GATEWAY entries to _HERMES_BEHAVIORAL_VARS so ambient developer-shell pins on those vars don't bleed into pytest runs. The frozenset extraction + standalone regression test from the original PR were dropped to keep the change minimal — main already maintains the list inline.
* feat(kanban): add max_in_progress config to cap concurrent running tasks
  Salvages #22981 by @SimbaKingjoe. Adds 'kanban.max_in_progress' config that caps simultaneously running tasks. When the board already has N running, dispatcher skips spawning so slow workers (local LLMs, resource-constrained hosts) don't pile up and time out. Threads through dispatch_once(max_in_progress=) and gateway dispatcher config parsing with validation (warns on invalid/below-1 values).
* fix(packaging): ship bundled skills in wheel
  Salvages #23738 by @LeonSGP43. Wheel installs were missing skills/ and optional-skills/ because pyproject's [tool.setuptools.packages.find] only includes Python packages — the skills directories don't have __init__.py so they were silently dropped from the wheel.
  Adds setup.py with data_files spec emitting skills/* and optional-skills/* under hermes_agent-<v>.data/data/, and a get_bundled_skills_dir() helper in hermes_constants that discovers the wheel-installed location via sysconfig before falling back to a source-checkout path. tools/skills_sync uses the helper so 'hermes update' works for pip-installed users.
* fix: 4 small surgical bugs
  Salvages #23302 by @Bartok9. Four independent one-area fixes:
  1. kanban boards delete alias now hard-deletes (not archives) — the alias didn't carry --delete, so getattr(args, 'delete', False) returned False. Detect boards_action=='delete' explicitly.
  2. Gateway auto-title failures no longer leak as user-visible warnings — debug-log only since they're not actionable.
  3.
     Background process completion notification snaps truncation to the next newline boundary and prepends a marker when content is dropped.
  4. _cprint() schedules the run_in_terminal coroutine via asyncio.ensure_future so output isn't silently dropped from background threads (fixes #23185 Bug A). Skips the double-print fallback that would fire for mock paths.
* perf(prompt): cache kanban worker guidance at session init
  Salvages #24402 by @RyanRana. The KANBAN_GUIDANCE block (~835 tokens) is session-static — the dispatcher decides at spawn time whether the process is a kanban worker via the kanban_show tool's check_fn (gated on HERMES_KANBAN_TASK env var). Re-checking 'kanban_show' in valid_tool_names and re-loading the reference on every system-prompt rebuild (init + each context compression) is wasted work. Caches the resolved string on agent._kanban_worker_guidance once in agent_init and consumes it in system_prompt.build_system_prompt(), with a getattr fallback for code paths that bypass agent_init.
* feat(kanban): add --sort option to 'hermes kanban list'
  Salvages #25745 by @LizerAIDev. Adds --sort {created,created-desc,priority,priority-desc,status,assignee,title,updated} to 'hermes kanban list'. Validated against VALID_SORT_ORDERS map; invalid values raise ValueError. Default behaviour (priority DESC, created ASC) is unchanged when --sort is omitted.
* docs: add kanban codex lane skill
* feat(kanban): worker visibility endpoints (workers/active, runs/{id}, inspect)
  Adds three read-only endpoints to the kanban dashboard plugin so the SwitchUI workspace (and any other dashboard consumer) can track workers across tasks without N+1 round-trips through /tasks/{task_id}.
  - GET /workers/active
    Single SQL JOIN of task_runs + tasks where ended_at IS NULL, worker_pid IS NOT NULL, status='running'. Returns {workers: [...], count, checked_at}.
  - GET /runs/{run_id}
    Direct lookup of any task_run row by id. Reuses existing kanban_db.get_run() helper and _run_dict() serialiser.
    404 when not found. Mirrors GET /tasks/{task_id} 404 shape.
  - GET /runs/{run_id}/inspect
    Live PID stats via psutil.Process.as_dict() — cpu_percent, memory_rss_bytes, memory_vms_bytes, num_threads, num_fds, status, create_time, cmdline. Short-circuits with alive:false when run has ended, has no worker_pid, the pid is gone, or psutil is unavailable. AccessDenied surfaces as alive:true with error rather than a 500.
  11 new tests in tests/plugins/test_kanban_worker_runs.py cover the empty-board case, running-task case, ended-run filtering, missing-pid filtering, 404 paths, already-ended inspect, no-pid inspect, dead-pid inspect, and live-pid inspect (psutil mocked). All pass.
  Companion termination endpoint (POST /runs/{run_id}/terminate) is intentionally out of scope here — opening a separate issue first since the RBAC and dispatcher-mediated soft-cancel design needs maintainer input before code.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(release): map contributor email for attribution check
* test(kanban-dashboard): pin enriched 409 detail and inline error wiring (#26744)
  - Existing ``test_patch_drag_drop_move_todo_to_ready`` now asserts the enriched 409 detail names the blocking parent (id, quoted title, and current status), so the dashboard always has something actionable to render.
  - New bundle-assertion test ``test_dashboard_surfaces_ready_blocked_error_inline`` pins the frontend wiring: the ``parseApiErrorMessage`` helper exists, the drag/drop banner runs through it, and the drawer maintains a visible ``patchErr`` state that's cleared between PATCHes and tasks.
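The `GET /workers/active` query from the worker-visibility entry above can be sketched against a deliberately simplified schema. The table and column names below (`task_runs.id`, `task_id`, `worker_pid`, `ended_at`; `tasks.id`, `title`, `status`) and the `active_workers()` name are assumptions for illustration; the returned dict only mimics the `{workers, count, checked_at}` shape the commit describes.

```python
import sqlite3
from datetime import datetime, timezone


def active_workers(conn: sqlite3.Connection) -> dict:
    """Return live workers with one JOIN instead of N per-task lookups (simplified schema)."""
    rows = conn.execute(
        """
        SELECT r.id, r.task_id, r.worker_pid, t.title
        FROM task_runs AS r
        JOIN tasks AS t ON t.id = r.task_id
        WHERE r.ended_at IS NULL          -- run still open
          AND r.worker_pid IS NOT NULL    -- a worker process was actually spawned
          AND t.status = 'running'
        ORDER BY r.id
        """
    ).fetchall()
    workers = [
        {"run_id": run_id, "task_id": task_id, "worker_pid": pid, "title": title}
        for run_id, task_id, pid, title in rows
    ]
    return {
        "workers": workers,
        "count": len(workers),
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }
```

Pushing all three filters into one query is what removes the N+1 pattern: the dashboard gets every open run and its task row in a single round-trip.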
* docs(codex_app_server): document multi-root Kanban writable_roots (#27941)
  Update the Codex app-server runtime guide's Kanban section to reflect the new behaviour:
  * The sandbox override now adds the board DB directory plus every Kanban path the dispatcher pinned (HERMES_KANBAN_WORKSPACES_ROOT, HERMES_KANBAN_WORKSPACE, legacy HERMES_KANBAN_ROOT) -- deduplicated, DB-dir first.
  * The motivation note now includes the cross-mount artifact-write scenario (e.g. ``/media/.../kanban-workspaces/...`` on a separate drive) and links to issue #27941 so readers can find the original bug report.
* fix(gateway): quiet corrupt kanban dispatcher boards
  Salvages the substantive part of #26490 by @aqilaziz. Detects corrupt board DBs ("file is not a database" / "database disk image is malformed") and disables them by fingerprint until they're repaired, instead of flooding the gateway log with repeated logger.exception tracebacks every tick.
  Cherry-picked the substantive commit (ea5b4ec2a); the tip commit was an unrelated _is_dir OSError fix for service-path lookup. Dropped a small test reformat that was bundled in the same commit.
* docs: align kanban readiness docs and smoke tests
  Salvages #28199 by @bensargotest-sys. Aligns Kanban docs with current tool registration: dispatcher-spawned task workers get task tools, profiles that explicitly enable the kanban toolset get orchestrator routing tools (kanban_list, kanban_unblock). Corrects failure-limit text to current default of 2. Hardens the e2e subprocess script to resolve repo root and use the spawnable default assignee. Updates the diagnostics severity fixture to assert error below the critical threshold.
* feat(kanban): surface per-task model_override in show + tool output
  Salvages #26897 by @loicnico96. The per-task model_override DB column already exists on main, but it wasn't exposed in user-facing surfaces.
  This adds:
  - 'kanban show' prints 'model: <name>' when model_override is set
  - kanban_show / kanban_list tool responses include the model_override field
  Original branch was stale (PR was authored against an older field name 'model'); applied the substantive surface exposure manually using the current 'model_override' field name.
* feat(cli): add kanban swarm topology helper
  Salvages #26791 by @Niraven. Adds 'hermes kanban swarm' to create a durable Kanban Swarm v1 graph: a completed root/blackboard card, parallel worker cards, a verifier gated on all workers, and a synthesizer gated on the verifier. Stores shared swarm blackboard updates as structured JSON comments on the root card. Self-contained: new hermes_cli/kanban_swarm.py module + CLI wiring + unit tests.
* feat(kanban): add optional board parameter to all MCP tools
  Salvages #27598 by @nnnet. Adds optional 'board' parameter to all 9 kanban_* MCP tools via shared _connect helper. Backwards compatible — omitting board keeps current pinned-board behavior. Useful for orchestrator profiles that route across multiple boards. Two-file scope: tools/kanban_tools.py + tests.
* feat(kanban): stamp originating ACP session_id on tasks
  Salvages #23208 by @awizemann. Tracks which chat session created a kanban task so clients can render a per-session board without falling back to tenant + time-window heuristics.
  - Schema: tasks gains nullable session_id TEXT column with index (additive migration in _migrate_add_optional_columns).
  - ACP: server.py exposes the originating session id via HERMES_SESSION_ID with save/restore around the agent loop.
  - Tool: kanban_create reads HERMES_SESSION_ID (with explicit override).
  - CLI: 'hermes kanban list --session <id>' filter; JSON output exposes session_id.
* feat(kanban): wire dispatcher to dispatch review agents from review column
  Salvages #23772 by @thewillhuang.
  Adds 'review' as a valid kanban task status and extends dispatch_once to monitor the review column as a second dispatch source (in addition to the existing ready column).
  - Adds 'review' to VALID_STATUSES
  - Adds claim_review_task() — atomically transitions review → running
  - Adds has_spawnable_review() — health telemetry mirror
  - Extends dispatch_once with a review column dispatch loop
  - Review agents get 'sdlc-review' skill auto-loaded
  Resolved 2 conflicts (VALID_STATUSES merge with main's 'scheduled' state, test file additions). Adapted claim_review_task to main's ttl_seconds: Optional[int] = None convention (matches claim_task).
* feat(kanban): stale detection for running tasks in dispatcher
  Salvages #23790 by @thewillhuang. Adds detect_stale_running() to the dispatcher cycle. Running tasks that started more than dispatch_stale_timeout_seconds ago (default 14400 = 4h) and have had no heartbeat in the last hour are auto-reclaimed to ready.
  - New config kanban.dispatch_stale_timeout_seconds (default 14400, 0 disables)
  - New 'stale' field on DispatchResult
  - detect_stale_running() in kanban_db.py with heartbeat freshness check
  - Records outcome='stale' on run close + 'stale' event; ticks failure counter
  - Wires config through gateway embedded dispatcher
  - Updates _cmd_dispatch verbose/JSON output and daemon logging
  Resolved test-file end-of-file conflict by appending both halves.
* feat(kanban): filter tasks by workflow fields and runs by status/outcome
  Salvages #26745 by @nehaaprasaad. Exposes filtering for the existing workflow_template_id and current_step_key columns:
  - list_tasks() accepts workflow_template_id and current_step_key kwargs
  - 'hermes kanban list' adds matching CLI flags
  - dashboard plugin_api also exposes the filters
  Resolved a small conflict in the list_tasks signature alongside main's session_id and order_by additions; combined all three into the single filter list.
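The stale-detection rule for `detect_stale_running()` (started longer ago than `dispatch_stale_timeout_seconds`, and no heartbeat in the last hour) reduces to a small predicate. This is a sketch of that rule only: `is_stale()` and its parameters are hypothetical, and the real function also reclaims the task to ready, records `outcome='stale'`, and ticks the failure counter.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Defaults as described in the commit text.
DEFAULT_STALE_TIMEOUT_SECONDS = 14400  # 4h; 0 disables detection
HEARTBEAT_FRESHNESS = timedelta(hours=1)


def is_stale(
    started_at: datetime,
    last_heartbeat_at: Optional[datetime],
    now: datetime,
    stale_timeout_seconds: int = DEFAULT_STALE_TIMEOUT_SECONDS,
) -> bool:
    """Hypothetical predicate mirroring the rule described for detect_stale_running()."""
    if stale_timeout_seconds <= 0:
        return False  # 0 disables stale detection
    if now - started_at <= timedelta(seconds=stale_timeout_seconds):
        return False  # has not been running long enough yet
    # A heartbeat within the last hour means the worker is still alive.
    if last_heartbeat_at is not None and now - last_heartbeat_at < HEARTBEAT_FRESHNESS:
        return False
    return True
```

Requiring both conditions means a long but actively heartbeating run (e.g. a slow local model) is left alone; only runs that are both old and silent get reclaimed.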
* feat(kanban): add respawn guard to block repeat worker storms
  Salvages #27484 by @fardoche6. Adds a respawn guard that skips worker spawn for tasks where:
  - a recent run already succeeded (recent_success — within guard window)
  - the previous run hit a quota/auth error (blocker_auth, also auto-blocks)
  - a recent task comment includes a GitHub PR URL (active_pr)
  The guard prevents repeat worker storms on the same bug/task. Includes the contributor's review-findings fixup (regex hardening, observability, auth coverage).
  Resolved a small DispatchResult conflict alongside main's 'stale' field; kept both. Authorship preserved via rebase merge.
* feat(kanban): show dashboard cron jobs across profiles
  Salvages #27568 by @SerenityTn. Dashboard cron page now lists cron jobs from all profiles, with profile-aware filter UI and storage routing. Includes test coverage for cross-profile listing, mutation, deletion, and validation.
  Also fixes orphan conflict markers in config.py left by an earlier salvage merge (kanban.dispatch_stale_timeout_seconds was double-nested in HEAD/PR markers from #28452 salvage of #23790).
* fix(kanban): remove orphan conflict markers from config.py (#28458)
  PR #28452 (salvage of #23790, stale detection) merged with leftover git conflict markers in hermes_cli/config.py around the `dispatch_stale_timeout_seconds` config block, breaking config import and any code path that loads it.
  Cleans up the markers and keeps both config blocks (worker log rotation/orchestrator + stale detection). Resolves a self-introduced regression.
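A sketch of the three skip conditions listed for the respawn guard. The function name, its inputs, and the order in which conditions are checked are assumptions; in particular the GitHub PR URL pattern is illustrative only, since the commit says the regex was hardened but not what it matches.

```python
import re
from typing import Iterable, Optional

# Hypothetical pattern: https://github.com/<owner>/<repo>/pull/<number>
GITHUB_PR_URL = re.compile(
    r"https://github\.com/[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+/pull/(\d+)\b"
)


def respawn_block_reason(
    recent_success: bool,
    last_error_was_auth: bool,
    recent_comments: Iterable[str],
) -> Optional[str]:
    """Return why a spawn should be skipped, or None to allow it (sketch, not the real guard)."""
    if recent_success:
        return "recent_success"  # a run already succeeded inside the guard window
    if last_error_was_auth:
        return "blocker_auth"  # per the commit, the real guard also auto-blocks here
    for body in recent_comments:
        if GITHUB_PR_URL.search(body):
            return "active_pr"  # work is already out for review
    return None
```

Returning a reason string instead of a bare boolean matches the named outcomes in the commit and gives the dispatcher something to log for observability.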
* fix(kanban): remove orphan conflict markers from kanban.py (#28459)
  PR #28454 (salvage of #26745, workflow filter) merged with leftover git conflict markers in hermes_cli/kanban.py at three sites:
  - _task_to_dict() (session_id alongside workflow_template_id/current_step_key)
  - p_list parser (--sort alongside --workflow-template-id/--step-key)
  - _cmd_list (order_by alongside the new filter kwargs)
  Cleans up the markers and keeps both halves at each site. Resolves a self-introduced regression.
* feat(kanban): configure worktree paths and branches
  Salvages #26496 by @aqilaziz. Adds branch_name column + CLI flag so tasks with workspace_kind='worktree' can pin a target branch on create. Schema migration added to _migrate_add_optional_columns.
  - Task.branch_name field + DB column + migration
  - create_task accepts branch_name kwarg
  - hermes kanban create --branch <name> flag
  - kanban show output includes 'Branch: <name>' when set
  Cherry-picked the substantive commit (a7558cf27); the PR's tip was an unrelated service-path-dirs commit. Resolved two conflicts (the INSERT column list and the show output) alongside main's session_id and max_runtime_seconds additions; kept all three fields.
* feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373)
  Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that group several skills under one slash command. Invoking /<bundle-name> from any surface (CLI, TUI, dashboard, any gateway platform) loads every referenced skill into a single combined user message.
  Use cases:
  - /backend-dev → loads github-code-review + test-driven-development + github-pr-workflow as one bundle.
  - /research → loads several research skills together.
  - Team task profiles shared via dotfiles.
  Behavior:
  - Bundles take precedence over individual skills when slugs collide.
  - Missing skills are skipped with a note, not fatal.
  - No system-prompt mutation — bundles generate a fresh user message at invocation time, the same way /<skill> does.
    Prompt cache stays intact.
  - Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI), /help display.
  Schema (~/.hermes/skill-bundles/<slug>.yaml):
      name: backend-dev
      description: Backend feature work.
      skills:
        - github-code-review
        - test-driven-development
      instruction: |
        Optional extra guidance prepended to the loaded skills.
  New module: agent/skill_bundles.py — load, scan, resolve, build invocation message, save, delete. yaml.safe_load only; broken bundles log a warning and are skipped, never raise.
  New CLI subcommand: hermes bundles {list,show,create,delete,reload}. Implementation in hermes_cli/bundles.py; wired in hermes_cli/main.py. 'bundles' added to _BUILTIN_SUBCOMMANDS so plugin discovery skips it.
  New in-session slash command: /bundles lists installed bundles in both CLI and gateway.
  /<bundle-name> dispatch added to CLI (cli.py) and gateway (gateway/run.py) before the existing /<skill-name> path.
  Autocomplete: SlashCommandCompleter gained an optional skill_bundles_provider parameter that defaults to None — the prompt shows '▣ <description> (N skills)' for bundles vs '⚡' for skills.
  Tests:
  - tests/agent/test_skill_bundles.py — 33 tests covering slugify, scan/cache freshness, resolve (including underscore→hyphen Telegram alias), build_bundle_invocation_message (loading, missing skills, user/bundle instruction injection, dedup), save/delete, reload diff, list sort.
  - tests/hermes_cli/test_bundles.py — 8 tests for the CLI subcommand (create/list/show/delete/reload, --force, missing bundle errors).
  - tests/gateway/test_bundles_command.py — 4 tests for the gateway handler and bundle resolution priority.
  Live E2E: verified subprocess invocations of hermes bundles {list,create,show,reload,delete} round-trip correctly against an isolated HERMES_HOME.
  Docs:
  - website/docs/user-guide/features/skills.md — new 'Skill Bundles' section with quick example, YAML schema, management commands, behavior notes.
  - website/docs/reference/cli-commands.md — 'hermes bundles' added to the top-level command table and given its own subcommand section.
* feat(kanban): add scheduled status for delayed follow-ups
  Salvages #24533 by @roycepersonalassistant. Adds a first-class 'scheduled' Kanban status for time-delay follow-ups that aren't waiting on human input.
  - hermes kanban schedule <task_id> [reason] CLI command
  - Dashboard/API transitions to/from Scheduled
  - unblock_task() now releases both 'blocked' AND 'scheduled' tasks (re-checking parent dependencies before moving to ready/todo)
  - i18n + docs updates
  Resolved conflicts: kept HEAD's failure-counter reset on unblock alongside the PR's scheduled state, kept HEAD's 'running' direct-set rejection, combined both bulk-status branches. Dropped the dist/ bundle changes (months-stale; would need rebuild from source).
* feat(kanban): drag-to-delete trash zone + bulk delete for task cards
  Salvages #28125 by @Jpalmer95. Adds:
  - Drag-to-delete trash zone in the kanban dashboard
  - Bulk delete endpoint with cascading delete_task cleanup
  - Frontend updates (drag visual + drop handler)
  - Confirmation prompt before delete
  Resolved end-of-file test conflict by appending both halves.
* docs: add Korean Kanban documentation
  Salvages #21823 by @pochi-gio. Adds Korean (ko) Docusaurus locale and translates Kanban documentation (kanban.md, kanban-tutorial.md) and the two related skills (devops-kanban-orchestrator, devops-kanban-worker). Purely additive — adds ko to the locales list in docusaurus.config.ts and creates the website/i18n/ko/ tree.
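The worktree entry adds `branch_name` as a nullable column via `_migrate_add_optional_columns`. A minimal, idempotent version of that kind of additive SQLite migration might look like the following; `migrate_add_optional_columns()` is a hypothetical stand-in for the real helper, and only the `branch_name TEXT` column comes from the commit.

```python
import sqlite3


def migrate_add_optional_columns(conn: sqlite3.Connection) -> list:
    """Add nullable columns that older board DBs lack; safe to run repeatedly (sketch)."""
    # Only branch_name comes from this PR; a real migration lists every optional column.
    # Names are a fixed allowlist, so interpolating them into DDL below is safe.
    optional_columns = {"branch_name": "TEXT"}
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute("PRAGMA table_info(tasks)")}
    added = []
    for name, sql_type in optional_columns.items():
        if name not in existing:
            # No DEFAULT / NOT NULL: legacy rows simply read back NULL.
            conn.execute(f"ALTER TABLE tasks ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added
```

Because the column is nullable with no default, legacy rows need no backfill and read back `NULL`, which is presumably how tasks created without `hermes kanban create --branch <name>` would appear.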
* fix(tests): catch up six stale tests after compression/aux/kanban changes (#28465)
  - aux_config: drop session_search from _AUX_TASKS and remove stale test (PR #27590 removed auxiliary.session_search from DEFAULT_CONFIG)
  - compression_boundary_hook: set compressor._last_compress_aborted=False on MagicMock so the post-compress abort branch (PR #28117) doesn't short-circuit before the session-id rotation under test
  - kanban_dashboard_plugin: use consecutive_failures=3 so severity stays 'error' (failure_threshold default dropped from 3 to 2 in d9fef0c8a, so failures=5 now crosses the critical floor of 2*2=4)
  - cli_manual_compress: accept force kwarg on DummyAgent._compress_context (cli._manual_compress now passes force=True)
* fix(telegram): render full clarify choice text in message body, use short button labels
  When Telegram clarify prompts offer long choices, mobile clients truncate the inline button labels, making options unreadable. Previously only the question was shown in the message body with truncated choice text in button labels.
  Fix: append the full numbered option list to the message body so users can read complete choice text on any client. Buttons now use short numeric labels (1, 2, ...) to avoid Telegram truncation. The 'Other (type answer)' button is unchanged.
  Long choice labels are now rendered in full (not truncated to 57 chars + '...') since they appear in the body instead of button labels.
  Closes: #27497
* chore(release): map @asdlem for PR #27852 salvage
* fix(telegram): default streaming transport to edit
* fix(telegram): respect reply_to_mode for DM topic reply fallback
  The DM topic reply fallback code in send() hardcoded should_thread=True when telegram_dm_topic_reply_fallback metadata was present, bypassing _should_thread_reply() and ignoring reply_to_mode config. This caused quote bubbles on every response even with reply_to_mode: 'off'.
  Fix:
  - Add reply_to_mode param to _reply_to_message_id_for_send() and _thread_kwargs_for_send() classmethods
  - In send(), check self._reply_to_mode != 'off' for DM topic fallback
  - Suppress reply anchor and reply_to_message_id when mode is 'off' while preserving message_thread_id for correct topic routing
  - Thread reply_to_mode through all 29 call sites
  Regression coverage: 10 new tests in test_telegram_reply_mode.py covering classmethod behavior, send() integration, and backward compatibility.
  Fixes #23994 (reply_to_mode: 'off' ignored by Telegram DM topic reply fallback code)
* fix(gateway): route Telegram audio file attachments away from STT pipeline (#24870)
  Telegram distinguishes three kinds of audio payloads:
  - message.voice → Opus/OGG voice messages → STT pipeline ✓
  - message.audio → audio file attachments → bypasses STT ← was broken
  - message.document (audio mime) → generic file route
  **Root cause** — the inbound message routing block in gateway/run.py matched both MessageType.VOICE *and* MessageType.AUDIO into audio_paths, which were then fed unconditionally to _enrich_message_with_transcription. Audio file attachments (.mp3, .m4a, etc.) were therefore auto-transcribed instead of being treated as files, making the transcribe skill unusable from Telegram because the path it needed was never surfaced.
  **Fix**
  - Introduce a new audio_file_paths list populated exclusively by MessageType.AUDIO events.
  - Narrow the audio_paths selector to MessageType.VOICE (and bare audio/ mime-type events that are not explicitly AUDIO or DOCUMENT).
  - After the STT block, inject a document-style context note for each audio_file_path, giving the agent the file path and asking what to do with it (consistent with how plain documents are handled).
  **Tests** — 5 new tests in test_telegram_audio_vs_voice.py:
  - voice message still transcribed (regression guard)
  - audio attachment skips STT (core fix)
  - audio attachment context note format
  - STT disabled still produces file note (not STT-disabled notice)
  - MessageType.AUDIO != MessageType.VOICE sanity check
  Fixes #24870
* chore(release): map bartok9 noreply for PR #24879 salvage
* fix(send_message): route standalone Telegram sends through TELEGRAM_PROXY
  When the send_message tool runs outside the gateway process (agent loop, TUI, cron, etc.), _gateway_runner_ref() returns None and the standalone path in _send_telegram constructs Bot(token=token) directly, bypassing any configured proxy. In regions where api.telegram.org is blocked, the send times out after ~5s with 'Telegram send failed: Timed out' and nothing ever shows up in gateway.log because the request never reaches the gateway.
  Resolve TELEGRAM_PROXY (via gateway.platforms.base.resolve_proxy_url, which also honours HTTPS_PROXY/HTTP_PROXY/ALL_PROXY and NO_PROXY) just before constructing the Bot. When a proxy is found, attach an HTTPXRequest(proxy=...) for both 'request' and 'get_updates_request', matching what gateway/platforms/telegram.py already does for in-gateway sends and what the Discord standalone sender already does. Any exception attaching the proxy falls back cleanly to a direct connection, preserving prior behaviour for users without a proxy configured.
  Adds tests/tools/test_send_message_telegram_proxy.py covering both the proxy-configured and no-proxy cases.
* chore(release): map @pepelax for PR #25419 salvage
* fix(kanban-dashboard): restore implementations dropped during salvages (#28481)
  Four kanban dashboard test failures, all from PR salvages that picked up the test additions but dropped the corresponding implementations.
  - BOARD_COLUMNS: add 'review' (status added by PR f55d94a1e but the board API never grew the column → test_board_empty failed because VALID_STATUSES - {archived} mismatched the rendered columns).
  - update_task: enrich the 'ready' 409 detail with the blocking parent list (id, title, status) and add _parents_blocking_ready helper. Implementation lost in the #26744 salvage (commit e215558ba) which pinned the test but not the server-side code.
  - dist/index.js: add parseApiErrorMessage helper, wire it through the drag/drop banner, add patchErr state to the TaskDrawer and surface it inline by the action row. Lost in the same #26744 salvage.
  - test_diagnostics_endpoint_severity_filter: update to at-or-above semantics (PR a94ddd807 changed the filter from exact-match so the warning filter now correctly includes error+critical too).
* fix(gateway): roll over Telegram tool progress bubbles
* fix(gateway): scope audio_file_paths outside media_urls guard
  The audio-file-paths handling block at line 7334 references the variable unconditionally, but #24879 initialized it inside the 'if event.media_urls' block — so events without media_urls hit UnboundLocalError. Found via test_run_agent_queued_message_does_not_treat_commentary_as_final after PR #28478 landed.
* fix(gateway): keep tool-progress edits alive after Telegram flood control
  When a progress-message edit hits Telegram flood control (RetryAfter), can_edit was unconditionally set to False, permanently disabling coalescing for the rest of the run. Subsequent tool updates were posted as separate new messages instead of updating the existing progress bubble.
  Fix: only set can_edit=False for non-recoverable edit errors. On flood control, back off by resetting _last_edit_ts so the throttle interval is respected before the next edit attempt.
  Fixes #25188
* chore(release): map @erhnysr for PR #25198 salvage
* fix(telegram): preserve can_edit after transient network errors in progress edits (#27828)
  When edit_message_text fails with a transient error (httpx.ConnectError, NetworkError, server disconnected, timeouts), the progress-message sender must not permanently set can_edit = False — that would convert a single Telegram network hiccup into separate per-tool bubbles for the rest of the run.
  Changes:
  - gateway/platforms/telegram.py: edit_message now returns retryable=True for transient network errors (ConnectError, NetworkError, timeouts, server disconnects, temporarily unavailable). Permanent failures (flood control, message-not-found, permissions) remain retryable=False.
  - gateway/run.py: send_progress_messages checks result.retryable before setting can_edit = False. Transient failures skip the fallback-send and continue — the next edit cycle catches up with the accumulated lines. Permanent failures (flood, message-not-found, etc.) still disable editing.
  Tests: 22 new tests in test_telegram_progress_edit_transient.py covering transient vs permanent error classification, SendResult.retryable semantics, and the can_edit decision logic.
  Fixes #27828
* fix(telegram): recover from post-update polling conflict without entering limbo
* fix(test+release): update conflict retry count for MAX=5; map @CryptoByz
* fix(gateway): route background-process notifications into Telegram DM topics
  Background-process completion notifications (notify_on_complete) and watch-pattern notifications were always delivered to the Telegram main chat instead of the originating private-chat topic.
  Hermes-created Telegram DM topic lanes only render a send when it carries both message_thread_id and a reply anchor.
  The synthetic MessageEvent injected on process completion had no message_id, so _reply_anchor_for_event returned None and _thread_kwargs_for_send dropped message_thread_id entirely — routing the notification to the main chat.
  Capture the triggering message id at spawn time and thread it through to the synthetic event so it can be reply-anchored back into the topic:
  - session_context: add HERMES_SESSION_MESSAGE_ID context var
  - telegram adapter: populate SessionSource.message_id on inbound messages
  - terminal tool: persist watcher_message_id on the process session
  - process registry: carry/persist message_id on watcher dicts + checkpoint
  - gateway: set MessageEvent.message_id on injected notifications
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(release): map @fabiosiqueira for PR #27212 salvage
* fix(telegram): route resumed DM topic sends directly
* fix(telegram): enforce TELEGRAM_ALLOWED_USERS allowlist on inbound messages
  TELEGRAM_ALLOWED_USERS was only checked for callback/inline-button actions but not for inbound messages. Unauthorized users triggered an 'Unauthorized user' log warning but their messages were still processed by the agent — a P0 security bypass (issue #23778).
  Fix: add allowlist check in _should_process_message() which is called for all message types (text, command, media, location). If the sender is not in TELEGRAM_ALLOWED_USERS, the message is dropped immediately with a warning log. Empty TELEGRAM_ALLOWED_USERS continues to allow all users (existing behavior).
  Fixes #23778
* fix(telegram): fail-closed auth fallback when TELEGRAM_ALLOWED_USERS is empty
  The _is_callback_user_authorized fallback returned True when TELEGRAM_ALLOWED_USERS was not set, allowing any Telegram user to interact with the bot. Change to fail-closed: deny by default unless GATEWAY_ALLOW_ALL_USERS=true is explicitly set.
  Fixes #24457
* test(telegram): stub _is_callback_user_authorized in trigger-gating fixture
  After PR #24468 made the empty-allowlist callback auth fail-closed (and #23795 wired _is_callback_user_authorized into _should_process_message), trigger-gating tests started failing because their fake messages from user 111 hit the new deny-by-default path before trigger evaluation.
  Force-authorize all senders in _make_adapter() so the trigger logic under test runs. The fail-closed behavior itself is covered by test_telegram_callback_auth_fail_closed.py.
* fix(telegram): reset sticky fallback IP on connect failure, retry primary DNS
  When a sticky fallback IP (from DoH discovery) becomes unreachable, the transport previously got stuck in an attempt_order that only tried the dead IP. This prevented the gateway from recovering until the service was restarted.
  Changes:
  - Always include primary DNS path (None) after the sticky IP in the attempt_order so that a primary-path retry happens on sticky failure.
  - Reset self._sticky_ip to None when the currently sticky IP hits a connect timeout / connect error, allowing the next request to retry from scratch.
  Fixes silent Telegram disconnection when discovered fallback IPs are transiently or permanently unreachable.
* test+release: align stale sticky-IP test for #24511; map @falconexe
* fix(telegram): propagate extra base_url config
* feat(send_message): auto-detect @username mentions and create Telegram entities
  When sending messages containing @username patterns, auto-generate MessageEntity(type='mention') entries so that the receiving bot's require_mention filter can trigger. This enables proper bot-to-bot interop where mention-based routing is used.
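For the `@username` auto-mention change, one detail worth pinning down is that the Telegram Bot API measures `MessageEntity` `offset` and `length` in UTF-16 code units, not Python characters. The sketch below returns plain dicts rather than `telegram.MessageEntity` objects to stay dependency-free; `mention_entities()` is a hypothetical name, and its username pattern (5 to 32 letters, digits, or underscores, not preceded by a word character or `@`) is an assumption, not the regex the commit uses.

```python
import re

# Assumed username shape: 5-32 letters, digits, or underscores after '@'.
# The lookbehind stops e-mail addresses like me@example.com from matching.
MENTION_RE = re.compile(r"(?<![\w@])@[A-Za-z0-9_]{5,32}\b")


def utf16_len(text: str) -> int:
    """Length of text in UTF-16 code units, the unit Telegram uses for entity offsets."""
    return len(text.encode("utf-16-le")) // 2


def mention_entities(text: str) -> list:
    """Build mention entity dicts (type/offset/length) for each @username in text."""
    entities = []
    for match in MENTION_RE.finditer(text):
        entities.append(
            {
                "type": "mention",
                # Convert Python str indices to UTF-16 offsets before handing to Telegram.
                "offset": utf16_len(text[: match.start()]),
                "length": utf16_len(match.group()),
            }
        )
    return entities
```

Counting Python characters would only go wrong once astral-plane characters such as emoji appear before a mention: each one occupies two UTF-16 units, so a naive offset would point one unit too early per emoji.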
* test+release: align send_message mocks for MessageEntity import; map @fonhal
* fix(telegram): resume typing indicator after inline approval click (#27853)
The text /approve and /deny paths in gateway/run.py call resume_typing_for_chat() after resolve_gateway_approval() succeeds, but the Telegram inline-button (ea:*) callback in _handle_callback_query did not. Typing is paused when the approval is sent (gateway/run.py:15658), so without a matching resume the typing indicator stayed gone for the remainder of a long-running turn after a button click. Symmetry-match the text path: after a successful resolve, call self.resume_typing_for_chat(str(query_chat_id)). Guarded by count > 0 to matc…

* fix(acp): treat polished tool error payloads as failed
* fix(acp): also mark raised-exception tool results as failed
Extends #26573 to also catch the case the original PR deliberately left out: when a tool raises an exception, the agent's tool executor wraps it in a canonical 'Error executing tool '<name>': ...' string prefix (see agent/tool_executor.py around the try/except). That prefix is unique to the wrapper and cannot legitimately appear in well-behaved tool output, so it is a safe signal that the tool blew up.
Without this, the canonical 'tool raised' case still rendered as a green 'completed' row in Zed despite being a runtime failure — exactly the class of bug #26573 set out to fix.
Adds a positive test (raised-exception prefix -> failed) and a negative test (bare 'Error:' word in legit tool output stays completed) so a future contributor doesn't accidentally widen the rule to false-positive on compiler/linter diagnostics.
* fix(acp): refresh session info after auto-title
* fix(acp): use refresh moment as updated_at on session info push
Follow-up to #26543. The sessions table does not have an updated_at column (see hermes_state.py — only started_at/ended_at), so row.get('updated_at') always returned None and the str() coercion was dead code. Use datetime.now(UTC).isoformat() instead, which reflects exactly what the field means here: 'the title was refreshed at this moment'. Drop the dead coercion.
* feat(acp): enrich permission request cards
* feat(web): mobile dashboard UX polish (#28127)
* feat(web): mobile dashboard UX polish
Bottom sheets for sidebar theme/language pickers on narrow viewports with enter/exit animation and drag-to-close; inline header badges beside titles; bottom padding on the route outlet for scroll clearance; profiles loading uses a unicode braille spinner; align profile/cron card actions to the top; viewport-fit cover and supporting layout tweaks across dashboard pages.
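The raised-exception rule from the ACP fix above can be sketched as a narrow prefix match. This assumes the executor's wrapper text has the exact shape `Error executing tool '<name>': ...`; the function name `tool_call_status` is hypothetical, and the "polished error payload" case from #26573 is not modelled because its format is not described here.

```python
import re

# Assumption: only the executor's canonical wrapper prefix signals a raised tool.
# A bare "Error:" inside ordinary output (compiler/linter text) must not match.
_RAISED_PREFIX = re.compile(r"Error executing tool '[^']+':")


def tool_call_status(result_text: str) -> str:
    """Hypothetical classifier mapping a tool result string to an ACP row status."""
    if _RAISED_PREFIX.match(result_text):  # match() anchors at the start of the string
        return "failed"
    return "completed"
```

Keeping the pattern anchored to the full wrapper prefix is what prevents the false positives the negative test guards against.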
Co-authored-by: Cursor <cursoragent@cursor.com>
* Fix Nix web npm hash and mobile sheet accessibility.
Align fetchNpmDeps in nix/web.nix with web/package-lock.json for CI. Improve BottomPickSheet backdrop labeling, avoid aria-hidden on the dialog during exit animation, and wire theme/language sheets with listbox semantics and localized dismiss labels.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(install.ps1): strip BOM, add -Commit/-Tag pin params, harden git ops
Three install.ps1 improvements pulled from the thin-installer work on bb/gui (PR #27822) that benefit the canonical CLI install flow on main:
1. Strip UTF-8 BOM from scripts/install.ps1. The canonical 'irm <raw URL> | iex' install flow has been broken since commit 4279da4db re-introduced a UTF-8 BOM that PR #27224 had explicitly stripped. PowerShell 5.1's 'irm' returns the response body as a string with the BOM surviving as a leading \ufeff character; 'iex' then evaluates that string and the parser chokes on the invisible character before param(), surfacing as a cascade of 'The assignment expression is not valid' errors at every param default value. File body is verified pure ASCII (no character above byte 127), so PS 5.1 with no BOM falls back to Windows-1252 decoding which is identical to ASCII for our content. Both install paths work:
   - 'irm ... | iex' (canonical one-liner)
   - 'powershell -File install.ps1' (programmatic / desktop bootstrap)
2. New -Commit and -Tag string params for reproducible pinning. Higher-precedence variants of -Branch. When set, the repository stage clones $Branch (fast partial fetch) and then 'git checkout's the exact ref. Precedence: Commit > Tag > Branch. Honoured by all three code paths:
   - Update path (existing valid checkout): fetch + checkout --detach <commit|tag> instead of checkout + pull.
   - Fresh clone: clone --branch $Branch, then post-clone 'git checkout --detach' to the requested ref.
   - ZIP fallback: pick archive URL for the most-specific ref (commit -> archive/<sha>.zip, tag -> archive/refs/tags/<tag>.zip, else archive/refs/heads/<branch>.zip).
   Used by the Hermes desktop's first-launch bootstrap to pin the .exe to the exact commit it was built against, so the cloned Hermes Agent tree always matches what the .exe was tested with. Also enables release-bundle pinning (e.g. Microsoft Store builds pinning to a release tag) and CI reproducibility.
3. EAP=Continue wrap around the new pin-step git invocations. 'git fetch origin <commit>' writes the routine 'From <url>' info line to stderr. Under the script's global $ErrorActionPreference = 'Stop' that stderr line is wrapped as an ErrorRecord and terminates the script even though fetch+checkout actually succeed. Same EAP=Stop + native-stderr footgun we hit during the install.ps1 hardening pass in Install-Uv, Test-Python, _Run-NpmInstall. Wrap both the update-path fetch/checkout block AND the post-clone pin block in $ErrorActionPreference = 'Continue' (restored in finally). Real failures still caught by $LASTEXITCODE checks.
* fix: add default base_url_override for ollama-cloud provider
* chore(release): add AUTHOR_MAP entry for falasi
* feat(cli): add /update slash command to CLI and TUI (#23854)
* feat: add /update slash command to CLI and TUI
* test(cli): add Python tests for /update slash command
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(cli): address Copilot review for /update slash command
Route classic CLI /update through prompt_toolkit modal confirmation and defer relaunch to the main-thread cleanup path after app.exit(). Tighten Y/n semantics, add Python wrapper and catalog coverage tests, and assert /update stays visible in the TUI command catalog.
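The ref precedence and ZIP-fallback URL choice from the install.ps1 change above, mirrored in Python purely for illustration (the installer itself is PowerShell). The function names are hypothetical; the archive URL shapes are the ones listed in the commit message, and the repository URL used in usage is an assumption.

```python
from typing import Optional


def pin_ref(branch: str, tag: Optional[str] = None, commit: Optional[str] = None) -> str:
    """Hypothetical: the ref to check out, using Commit > Tag > Branch precedence."""
    return commit or tag or branch


def archive_url(repo_url: str, branch: str, tag: Optional[str] = None,
                commit: Optional[str] = None) -> str:
    """Hypothetical: ZIP fallback URL for the most specific ref that was requested."""
    base = repo_url.rstrip("/")
    if commit:
        return f"{base}/archive/{commit}.zip"
    if tag:
        return f"{base}/archive/refs/tags/{tag}.zip"
    return f"{base}/archive/refs/heads/{branch}.zip"
```

Passing both a tag and a commit yields the commit archive, matching the stated precedence.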
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(cli): address review feedback on /update command
  - Replace raw input() with _prompt_text_input_modal in _handle_update_command to avoid EOF/hang/keystroke-leak races with prompt_toolkit's stdin ownership
  - Fix confirmation logic: only proceed on recognized affirmative aliases (y/yes/1/ok); cancel on everything else including empty string, typos, and unrecognized input — matches all other [Y/n] prompts in the codebase
  - Route relaunch through main-thread shutdown path: set _pending_relaunch and return False from process_command so process_loop triggers app.exit(); run() then calls relaunch() after prompt_toolkit has restored terminal modes and after cleanup — safe on both POSIX (execvp) and Windows (subprocess+exit)
  - Fix misleading docstring in test_update_command.py: the Vitest only covers the TypeScript slash handler that emits code 42, not the Python wrapper branch that acts on it
  - Rewrite tests to use SimpleNamespace pattern (like test_destructive_slash_confirm) so _prompt_text_input_modal can be stubbed directly
  - Add Python test for _launch_tui exit-code-42 → relaunch branch in main.py
Agent-Logs-Url: https://github.com/NousResearch/hermes-agent/sessions/f6da68cf-e7b1-4b7a-aed6-3d4b0f523bdb
Co-authored-by: austinpickett <260188+austinpickett@users.noreply.github.com>
* fix(cli): polish test fixtures for /update command
  - Remove unused _prompt_text_input from SimpleNamespace stub
  - Use pytest.fail sentinel in managed-install guard test to catch unexpected modal invocations
Agent-Logs-Url: https://github.com/NousResearch/hermes-agent/sessions/f6da68cf-e7b1-4b7a-aed6-3d4b0f523bdb
Co-authored-by: austinpickett <260188+austinpickett@users.noreply.github.com>
* chore: re-trigger CI after Copilot review fixes
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: austinpickett <260188+austinpickett@users.noreply.github.com>
* feat(skills): add baoyu-article-illustrator skill
* feat(skills): adapt baoyu-article-illustrator for Hermes
Adapts the upstream baoyu-article-illustrator skill (verbatim-copied in the previous commit) to Hermes' tool ecosystem, matching the pattern used by baoyu-infographic.
  - Metadata: openclaw → hermes; add author, license, tags, category
  - Triggering: slash command + CLI flags → natural language
  - User config: remove EXTEND.md, first-time-setup, preferences-schema
  - User prompts: AskUserQuestion (batched) → clarify (one at a time)
  - Image gen: baoyu-imagine → image_generate (describe refs in prompt text)
  - Platform: drop Windows/PowerShell; Linux/macOS only
  - File ops: switch to write_file / read_file
  - Watermark: opt-in per-article instead of EXTEND.md-driven
  - Add PORT_NOTES.md describing the adaptation and sync procedure
Style, palette, and prompt/system.md reference files are verbatim copies and are the sync points with upstream.
* fix(skills): align article-illustrator with real Hermes tool capabilities
Addresses review feedback on #13193:
1. Reference-image flow no longer assumes write_file/read_file handle binaries. vision_analyze produces a textual description; the binary is optionally copied via terminal (cp/curl). The description is what gets embedded in prompts.
2. image_generate's URL-only return is now explicit. Step 6 downloads the returned URL to local disk via terminal (curl -sSL -o ...), then verifies non-zero size before proceeding.
3. Removed "Please use nano banana pro..." line from prompts/system.md — the backend is user-configured and not agent-selectable, so routing hints in the prompt are misleading.
PORT_NOTES.md updated: prompts/system.md is no longer verbatim, and the file-ops/backend-selection rows now reflect Hermes' actual tool surface (write_file/read_file for text, terminal for binaries and URL downloads, vision_analyze for reading images).
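The /update confirmation rule from the review feedback above (proceed only on y/yes/1/ok, cancel on everything else including an empty answer) reduces to a whitelist lookup. In this sketch the name `is_confirmed`, the case-insensitive comparison, the whitespace stripping, and treating `None` (EOF or a dismissed modal) as a cancel are all assumptions, not the CLI's actual code.

```python
from typing import Optional

# Affirmative aliases named in the commit message; anything else cancels.
_AFFIRMATIVE = frozenset({"y", "yes", "1", "ok"})


def is_confirmed(answer: Optional[str]) -> bool:
    """Hypothetical: True only for a recognized affirmative alias."""
    if answer is None:  # assumption: EOF / dismissed modal counts as "no"
        return False
    return answer.strip().lower() in _AFFIRMATIVE
```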
* chore(skills/baoyu-article-illustrator): tighten description, add platforms, regen docs
* chore(release): map Jack Yang contributor email
Adds the contributor email mapping for Jack Yang (@0xjackyang) so future release-note generation attributes commits correctly. Salvage of #27964 by @0xjackyang.
* chore(release): pre-stage AUTHOR_MAP for May 2026 LHF batch group 7
Pre-stages AUTHOR_MAP entries for 5 new contributors whose PRs are being salvaged in the May 2026 low-hanging-fruit batch (group 7). Lands ahead of the per-PR salvage PRs so they don't get blocked by AUTHOR_MAP CI.
Contributors:
  - 02356abc (#28286 — wecom WSMsgType.CLOSING)
  - burjorjee (#28201 — inline-shell timeout guard)
  - oseftg (#28168 — natural response ending: emoji + caret)
  - rudi193-cmd (#28241 — empty credential pool entries)
  - sadiksaifi (#27982 — kanban horizontal scroll)
Per references/batch-pr-salvage-may14-additions.md.
* fix(wecom): handle WSMsgType.CLOSING to prevent CPU spin
The WeCom adapter's _read_events() loop only handled CLOSE, CLOSED, and ERROR websocket message types. When the server initiates a graceful shutdown, aiohttp returns WSMsgType.CLOSING before the connection is fully closed. This message type was not handled, causing the receive() call to return immediately in a tight loop while self._ws.closed remained False. The result was 100% CPU usage on the asyncio event loop.
Add WSMsgType.CLOSING to the set of terminal message types that raise RuntimeError("WeCom websocket closed"), allowing _listen_loop() to enter its normal reconnect backoff path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(auth): treat empty credential pool entries as unauthenticated
Fixes #28140
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: include hermes_plugins in gateway.log component filter
gateway.log uses a _ComponentFilter that only passes records from loggers starting with ('gateway',).
Plugin modules are loaded under the hermes_plugins.* namespace, so all plugin log output is silently dropped from gateway.log. This makes plugin registration — which directly affects gateway hooks (pre_gateway_dispatch, transform_llm_output, etc.) — invisible in the gateway-specific log. Operators debugging gateway behavior check gateway.log and see no plugin activity, even when plugins are working correctly.
Add 'hermes_plugins' to the gateway component prefixes tuple so plugin log messages appear in gateway.log.
Closes #28138
* fix(gateway): align kanban artifact _IMAGE_EXTS with response dispatch
_deliver_kanban_artifacts used a broader _IMAGE_EXTS that included .bmp, .tiff, and .svg. These three extensions are absent from the equivalent set in _deliver_media_from_response (line 10661), which intentionally routes them through send_document rather than send_multiple_images (comment near line 10522 notes that Telegram sendPhoto recompresses and rejects non-raster formats). Routing .svg (XML text), .bmp, or .tiff through the photo API causes send_multiple_images to raise on most platforms; the exception is caught and logged as a warning, silently dropping the artifact.
Aligning the two sets ensures kanban deliverables with these extensions follow the same send_document path as regular agent responses. No behaviour change for .png/.jpg/.jpeg/.gif/.webp.
* fix(process-registry): detach stdin from background subprocesses to prevent keyboard freeze
Background process non-PTY path used stdin=subprocess.PIPE unconditionally, creating an orphan pipe that was never written to and never closed. Child processes that read stdin would block indefinitely, competing with the parent's prompt_toolkit event loop for terminal ownership and causing complete keyboard lockout.
Change to stdin=subprocess.DEVNULL so children get immediate EOF on stdin reads instead of blocking forever.
For interactive stdin, the PTY path (which has its own independent PTY via ptyprocess.PtyProcess.spawn) should be used instead.
Fixes #17959
* chore(release): alias stale-ID salvage commit for @LifeJiggy (#28317)
* fix(process-registry): detach stdin from background subprocesses to prevent keyboard freeze
Background process non-PTY path used stdin=subprocess.PIPE unconditionally, creating an orphan pipe that was never written to and never closed. Child processes that read stdin would block indefinitely, competing with the parent's prompt_toolkit event loop for terminal ownership and causing complete keyboard lockout.
Change to stdin=subprocess.DEVNULL so children get immediate EOF on stdin reads instead of blocking forever. For interactive stdin, the PTY path (which has its own independent PTY via ptyprocess.PtyProcess.spawn) should be used instead.
Fixes #17959
* chore(release): alias stale-ID salvage commit for LifeJiggy
PR #28315 was salvaged with a wrong noreply numeric ID (192385615 vs the correct 141562589). The commit on main is correctly authored to LifeJiggy by username, but the noreply email doesn't match AUTHOR_MAP. Adds an alias so release-notes generation maps both forms to the same contributor.
---------
Co-authored-by: LifeJiggy <192385615+LifeJiggy@users.noreply.github.com>
* fix: elevate plugin discovery failures from debug to warning
Plugin discovery exceptions in gateway startup (gateway/run.py) and CLI startup (hermes_cli/main.py) are caught and logged at DEBUG level, making them invisible at the default INFO log level. If any plugin import fails — syntax error, missing dependency, import cycle — operators get zero indication unless they bump the log level to DEBUG. This makes broken plugins appear enabled but silently non-functional.
Change both locations to logger.warning() so failures are visible at production log levels.
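A small demonstration of the stdin change in the process-registry fix above: a child that reads all of stdin gets EOF immediately when spawned with `stdin=subprocess.DEVNULL`. It uses `subprocess.run` for brevity rather than the registry's long-lived background spawn, and the child script is illustrative only.

```python
import subprocess
import sys

# Child that reads stdin to EOF and echoes what it got. With stdin=subprocess.PIPE
# and a parent that never writes to or closes the pipe, this read would hang.
CHILD = "import sys; data = sys.stdin.read(); print(repr(data))"


def run_detached_from_stdin(timeout: float = 10.0) -> str:
    """Spawn the child with stdin=DEVNULL so its stdin read returns EOF at once."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        text=True,
        timeout=timeout,
    )
    return proc.stdout.strip()
```

The child reports an empty string, confirming it never waits on the parent's terminal.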
Closes #28137
* fix: treat inline-shell timeout guard as timeout
* fix(acp): resolve /tmp symlink before workspace auto-approve check on macOS
Path.resolve() follows the /tmp -> /private/tmp symlink on macOS, so str(path).startswith("/tmp/") is always False for temp-dir paths. The "Accept Edits" (workspace_session) mode silently refused to auto-approve every /tmp write on macOS, breaking the documented behaviour and making the existing test fail on this platform.
Fix: keep the raw expanded path (pre-resolve) for the /tmp prefix check and continue using the resolved form only for the cwd relative_to() call where symlink resolution is correct behaviour.
* fix(kanban): single-row horizontal scroll for board columns
Switch .hermes-kanban-columns from auto-fit CSS grid to a flex row with overflow-x: auto and a hidden scrollbar (scrollbar-width / ::-webkit-scrollbar), and pin .hermes-kanban-column to flex: 0 0 280px so columns sit side-by-side at a fixed width instead of wrapping into a 2xN grid.
Page vertical scroll is unaffected: each column already caps at max-height: calc(100vh - 220px), so the container never grows tall enough to introduce its own vertical scrollbar.
* fix(approval): surface pending-approval state with explicit marker visible to LLM
When a tool call requires user approval in the non-blocking gateway path, the LLM previously received a result that was indistinguishable from a failed tool call (exit_code=-1, error=message). The LLM could not tell whether the tool was pending approval, had returned empty results, or had failed silently — causing it to burn context on wrong hypotheses.
Fix changes the result format to include:
  - status: pending_approval (clear state name)
  - approval_pending: True (explicit boolean for LLMs to detect)
  - error: cleared to empty string (removes misleading error signal)
This lets the LLM reason about approval latency vs actual errors, short-circuiting the previous silent failure mode.
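A sketch of the /tmp auto-approve fix above, assuming POSIX paths: the /tmp prefix test runs on the expanded but unresolved path, and symlink resolution is used only for the "inside cwd" containment test. The helper name is hypothetical, and the `os.path.normpath` step (which collapses `..` lexically so `/tmp/../etc/x` cannot pass the prefix test) is an addition of this sketch, not something the commit message states.

```python
import os
from pathlib import Path


def workspace_auto_approves(path_str: str, cwd: str) -> bool:
    """Hypothetical 'Accept Edits' check: allow /tmp writes and writes under cwd."""
    expanded = os.path.expanduser(path_str)
    if not os.path.isabs(expanded):
        expanded = os.path.join(cwd, expanded)
    # Lexical cleanup only (added in this sketch); no symlinks are followed here.
    lexical = os.path.normpath(expanded)
    # Prefix check BEFORE resolve(): on macOS /tmp/x resolves to /private/tmp/x.
    if lexical.startswith("/tmp/"):
        return True
    # Containment check on resolved paths, where following symlinks is correct.
    try:
        Path(lexical).resolve().relative_to(Path(cwd).resolve())
    except ValueError:
        return False
    return True
```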
Fixes #14806
* fix: recognize emoji and caret as natural response endings
GLM models via Ollama report finish_reason='stop' even when the response was truncated by max_tokens. The continuation mechanism uses _has_natural_response_ending() as one of the heuristics to detect whether the response was genuinely finished. Currently only ASCII punctuation and CJK punctuation are recognized. This means any response ending with an emoji (e.g. ⚡, 👍) or the caret character ^ (common in French ^^ smiley) is not recognized as naturally ended, triggering a false-positive continuation where the model receives 'Continue where you left off' and produces garbled output.
Add:
  - ^ (caret) to the punctuation set
  - Unicode emoji range (codepoint >= 0x1F300) as natural ending
This only affects GLM/Ollama users but the fix is safe for all backends since _has_natural_response_ending() is only consulted inside the continuation flow.
* chore(release): pre-stage AUTHOR_MAP for May 2026 LHF batch group 8 (#28328)
Pre-stages AUTHOR_MAP entries for 10 new contributors whose PRs are being salvaged in the May 2026 low-hanging-fruit batch (group 8). Lands ahead of the per-PR salvage PRs so they don't get blocked by AUTHOR_MAP CI.
Contributors:
  - AceWattGit (#28159 — _pool_may_recover_from_rate_limit NameError)
  - YuanHanzhong (#28032 — x.com/status fallbacks link-like)
  - colin-chang (#28245, #28249, #28251 — gateway + mattermost fixes)
  - felix-windsor (#28019 — preserve cron asterisks in strip mode)
  - houenyang-momo (#28205 — charizard completion menu contrast)
  - iqdoctor (#28095 — windows installer docs)
  - joe102084 (#28151 — whitespace-only cron responses)
  - jvinals (#27936 — Slack U-IDs → DM channel)
  - maxmilian (#28267 — ModelPickerDialog portal)
  - samggggflynn (#27952 — dingtalk pre_start)
Per references/batch-pr-salvage-may14-additions.md.
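A sketch of the natural-ending heuristic described above with the caret and the `>= 0x1F300` emoji threshold added. The punctuation sets here are small illustrative subsets, not the real contents of `_has_natural_response_ending()`. One detail worth knowing: ⚡ is U+26A1, which sits below 0x1F300, so a pure code-point threshold covers 👍 (U+1F44D) but not ⚡.

```python
# Illustrative subsets only; the real helper's character sets may differ.
_ASCII_ENDINGS = frozenset(".!?)\"'`]")
_CJK_ENDINGS = frozenset("。！？」』）")
_EXTRA_ENDINGS = frozenset("^")  # caret added by the fix (e.g. the "^^" smiley)

_EMOJI_THRESHOLD = 0x1F300  # threshold named in the commit message


def has_natural_ending(text: str) -> bool:
    """Hypothetical stand-in for _has_natural_response_ending()."""
    stripped = text.rstrip()
    if not stripped:
        return False
    last = stripped[-1]
    if last in _ASCII_ENDINGS or last in _CJK_ENDINGS or last in _EXTRA_ENDINGS:
        return True
    # Code points at or above U+1F300 (most pictographic emoji) count as endings.
    return ord(last) >= _EMOJI_THRESHOLD
```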
* fix: add pre_start() to _IncomingHandler for dingtalk SDK compatibility
The dingtalk-stream SDK calls pre_start() on every registered handler before opening the WebSocket connection. Without this method, the SDK raises AttributeError and kills the stream connection, causing DingTalk to be unable to connect via Stream Mode.
* fix(windows): handle redirected stdout in _cprint fallback
Wraps _pt_print in try/except with a print() fallback. When a kanban worker's stdout is piped to a log file, prompt_toolkit raises NoConsoleScreenBufferError (Windows) or OSError (other) because there is no real console buffer. The fallback keeps worker output flowing instead of crashing.
* chore(release): alias stale-ID salvage commit for @Grogger (#28334)
PR #28330 was salvaged with a wrong noreply numeric ID (18091625 vs the correct 7065068). The commit on main is correctly authored to Grogger by username, but neither noreply form was in AUTHOR_MAP. Adds both so release-notes generation maps them to @Grogger.
* fix(aux): remove stale session_search model menu entry
* fix(tui): keep x status citation fallbacks link-like
* fix(xai-oauth): quarantine dead tokens on terminal refresh failure
resolve_xai_oauth_runtime_credentials() called _refresh_xai_oauth_tokens() with no try/except. A terminal refresh failure (HTTP 400/401/403 — invalid_grant, token revoked) propagated without clearing the dead access_token / refresh_token from auth.json, causing every subsequent session to retry the same doomed network request.
Add a try/except around the refresh call that mirrors the existing credential_pool.py quarantine: when _is_terminal_xai_oauth_refresh_error identifies a non-retryable failure, clear the dead token fields from auth.json and write a last_auth_error diagnostic marker so future calls fail fast with a clear relogin_required error instead of hitting the network.
active_provider is preserved (set_active=False) so multi-provider users whose chosen provider is not xai-oauth are unaffected.
Tests: two new cases in test_auth_xai_oauth_provider.py cover terminal quarantine and transient pass-through.
* feat(bg-review): add bundled/pinned skill protection rules to review prompts (#27644)
The background review prompts (_SKILL_REVIEW_PROMPT and _COMBINED_REVIEW_PROMPT) now include explicit protection rules for bundled, hub-installed, and pinned skills — aligning with the curator's existing policy at curator.py L345/350.
Before this change, bg-review could freely rewrite bundled skills like 'hermes-agent' or pinned skills, while the 7-day curator explicitly skips them. The review agent now sees:
  • Bundled skills (shipped with Hermes)
  • Hub-installed skills (installed via hermes skills install)
  • Pinned skills (marked via hermes curator pin)
If only protected skills need updating, the review says 'Nothing to save.' and stops.
Fixes #27644
* fix(web): portal Change Model modal so it renders above the app sidebar
The dashboard's main column is `relative z-2` (App.tsx), which creates a stacking context that traps fixed descendants below the app sidebar (`z-50`). `ModelPickerDialog` renders `fixed inset-0 z-[100]` inline, so its z-100 is scoped to z-2 and the sidebar covers its left edge.
The bug is visible across all themes but only obvious in the Large theme variants (Hermes Teal (Large), etc.) where the larger root font widens the dialog into the sidebar's column.
Toast.tsx already documents the same trap and uses the same `createPortal(..., document.body)` escape. This commit ports the picker; the same pattern affects other inline z-[100] modals in the dashboard (OAuthLoginModal, Cron / Models / Profiles page modals) and is left for a follow-up — keeping this PR scoped to the reporter's specific case.
Fixes #28103
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(gateway): exit code 75 on service restart so launchd relaunches
When the gateway receives SIGUSR1 (graceful restart via launchd_restart), the SIGUSR1 handler calls request_restart(via_service=True) and the gateway shuts down cleanly with exit code 0. However, the generated launchd plist uses KeepAlive → SuccessfulExit → false, meaning launchd only relaunches on *non-zero* exit codes. A clean exit(0) is treated as "successful, don't restart", so the gateway stays down after /restart, /update, or SIGUSR1.
The systemd unit template already uses RestartForceExitStatus=75 for the same scenario. Mirror that convention: when _restart_via_service is True, raise SystemExit(75) so launchd's SuccessfulExit=false policy triggers a relaunch.
Closes #28135
* fix: guard json.loads() against invalid TTS and skill_view responses
Two code paths call json.loads() on output from external tools without catching JSONDecodeError. If the tool returns a non-JSON string (error message, empty string, or None), the entire call path crashes.
1. gateway/run.py — text_to_speech_tool() result in voice reply path. A TTS failure that returns an error string instead of JSON crashes the voice reply handler, killing the message response entirely.
2. cron/scheduler.py — skill_view() result when loading skills for cron jobs. A corrupted or missing skill file that returns an error string instead of JSON crashes the cron tick, preventing all jobs from executing that cycle.
Both fixes catch (json.JSONDecodeError, TypeError), log a warning, and gracefully skip the failed operation instead of crashing.
* fix(gateway): bridge gateway_restart_notification from YAML platform sections
Two related bugs in gateway/config.py prevented per-platform gateway_restart_notification from working through config.yaml:
1. The shared-key bridging loop (load_gateway_config) omitted 'gateway_restart_notification', so the key never landed in platform_data['extra'] even when set under e.g. 'discord:' or 'mattermost:' sections.
2. PlatformConfig.from_dict() only read gateway_restart_notification from the top-level data dict, ignoring the 'extra' sub-dict where bridged keys are stored.
Fix: add the key to the bridging loop, and add an 'extra' fallback in from_dict() so that round-tripped values (YAML → bridged → extra → from_dict) resolve correctly.
Impact: users can now set gateway_restart_notification: false per platform in config.yaml instead of relying on env vars or the global platforms: block.
* feat(kanban): add auto_promote_children config toggle
When the kanban auto-decomposer fans a triage task into child tasks, recompute_ready() immediately promotes parent-free children to 'ready' so the dispatcher picks them up. Some users want a manual workflow where children stay in 'todo' for review before dispatch.
Add 'kanban.auto_promote_children' config key (default: true):
  - false: children stay in 'todo' after decomposition
  - true: existing behavior (auto-promote to 'ready')
Changes:
  - kanban_db.py: decompose_triage_task() gains auto_promote param
  - kanban_decompose.py: reads auto_promote_children from config
  - kanban dashboard API: exposes the new setting in GET/PUT /orchestration
Closes #28016
* fix: wrap _pool_may_recover_from_rate_limit call through run_agent namespace
The conversation_loop.py references _pool_may_recover_from_rate_limit which was defined in run_agent.py. After the conversation-loop extraction refactor, the helper was no longer in the same module scope. Wrap the call as _ra()._pool_may_recover_from_rate_limit() to route through the run_agent monkeypatch namespace where the helper is available.
Adds regression test in test_gemini_fast_fallback.py.
Fixes: MAILROOM Email Triage NameError, OPS Execution Monitor NameError.
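The `from_dict()` fix for gateway_restart_notification above amounts to a two-level lookup: top-level key first, then the bridged `extra` sub-dict. This sketch uses a hypothetical `PlatformConfigSketch` with only the relevant field; letting the top-level value win when both are present, and defaulting to `None` when neither is set, are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class PlatformConfigSketch:
    """Hypothetical stand-in for PlatformConfig, reduced to the field in question."""
    gateway_restart_notification: Optional[bool] = None
    extra: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "PlatformConfigSketch":
        extra = dict(data.get("extra") or {})
        value = data.get("gateway_restart_notification")
        if value is None:
            # Fallback: keys bridged from YAML platform sections live under 'extra'.
            value = extra.get("gateway_restart_notification")
        return cls(gateway_restart_notification=value, extra=extra)
```

A YAML `discord:` section with `gateway_restart_notification: false`, once bridged into `extra`, now round-trips as `False` instead of being ignored.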
* fix(tui): improve charizard completion menu contrast
* docs(windows): avoid piping installer directly into iex
* fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS
Qwen3.x and DeepSeek-V3.x default to chatty/hallucinatory tool use without enforcement steering — agents narrate "calling tool X" without actually emitting a tool call, or run partial loops. Both model families fit the same failure pattern TOOL_USE_ENFORCEMENT_GUIDANCE was already injected for (gpt, codex, gemini, gemma, grok, glm).
Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>
Squashed salvage of:
  - 403e567ce fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS
  - 9433eabe7 test(agent): use realistic qwen-plus identifier in enforcement test
Fixes #28079.
* fix(send_message): resolve Slack user IDs to DM channel IDs
The _SLACK_TARGET_RE regex only matched IDs starting with C (channel), G (group), or D (direct message). Slack user IDs start with U, causing 'Could not resolve' errors when trying to send DMs to specific users.
Changes:
  - Expand _SLACK_TARGET_RE to accept U-prefixed IDs (user IDs)
  - Add conversations.open fallback to resolve user IDs to DM channel IDs before sending, since chat.postMessage requires a conversation ID
Fixes #ISSUE_NUMBER
* fix(gateway): tighten MEDIA extraction regex + silent skip on file-not-found
Three related fixes for the MEDIA:<path> extraction pipeline that caused 'file not found' noise in platform channels:
1. run.py — tighten tool-result MEDIA regex from \S+ (any non-whitespace) to require a path pattern with known extensions. Prevents LLM-generated placeholder paths like 'MEDIA:/path/to/example.mp4' from being captured as real media.
2. base.py — remove the |\S+ fallback in extract_media() that catches anything non-whitespace as a potential MEDIA path. This was the primary cause of false positives — strings like '' in tool output were captured as MEDIA: paths.
3. mattermost.py — replace the file-not-found error message sent to the channel with a silent logger.warning() skip. When a path extracted by MEDIA doesn't exist on disk, the channel no longer gets a noisy '(file not found: ...)' message.
Impact: eliminates the persistent 'file not found' spam in Mattermost channels caused by over-broad MEDIA regex patterns matching non-path text in tool output.
* fix(xai-oauth): split 403 (tier/entitlement) from 400/401 in token endpoint
xAI's token endpoint returns HTTP 403 to the OAuth grant when the account isn't on the allowlist for API access (e.g. standard SuperGrok subscribers — see #26847). Treating it like a stale-token 400/401 made ``format_auth_error`` append "Run ``hermes model`` to re-authenticate", which is misleading because re-login can't change xAI's tier decision.
Split 403 off in both ``refresh_xai_oauth_pure`` and the loopback login token exchange:
  * New error code ``xai_oauth_tier_denied`` with ``relogin_required=False``
  * Message explains the entitlement gate and points at the ``XAI_API_KEY`` + ``provider: xai`` fallback
  * 400/401 still set ``relogin_required=True`` as before
  * 5xx still set ``relogin_required=False`` as before
* fix(run-agent): treat any 403 on xai-oauth as entitlement to stop refresh-loop
The existing ``_is_entitlement_failure`` heuristic only fires when the response body contains specific substrings ("do not have an active Grok subscription", etc.). xAI has been seen to 403 standard SuperGrok subscribers with a terser body that doesn't match those keywords (#26847), and the recovery path would then mint a fresh token, get a fresh 403, and loop until Ctrl+C.
Add a defense-in-depth check at the recovery call site: any 403 on ``provider == "xai-oauth"`` short-circuits ``try_refresh_current`` so the error surfaces immediately with the friendly hint from ``_summarize_api_error``. Keeps the existing keyword path for all other providers untouched.
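The status split described in the two xai-oauth commits above can be sketched as a lookup from HTTP status to an error code plus a `relogin_required` flag, with a separate guard that skips refresh for any 403 on xai-oauth. Only `xai_oauth_tier_denied` comes from the commit message; the other error codes, the handling of unlisted statuses, and the simplification that other providers always refresh (the real code keeps a keyword heuristic for them) are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenErrorInfo:
    code: str
    relogin_required: bool


def classify_token_endpoint_status(status: int) -> TokenErrorInfo:
    """Hypothetical mapping from xAI token-endpoint status to error metadata."""
    if status == 403:
        # Tier/entitlement gate: logging in again cannot change the outcome.
        return TokenErrorInfo("xai_oauth_tier_denied", relogin_required=False)
    if status in (400, 401):
        # Stale or revoked grant (hypothetical code name): re-login can fix it.
        return TokenErrorInfo("xai_oauth_refresh_failed", relogin_required=True)
    if 500 <= status < 600:
        # Transient server-side failure (hypothetical code name): keep credentials.
        return TokenErrorInfo("xai_oauth_server_error", relogin_required=False)
    return TokenErrorInfo("xai_oauth_unexpected_status", relogin_required=False)


def should_attempt_refresh(provider: str, status: int) -> bool:
    """Defense in depth: never mint-and-retry after a 403 on xai-oauth."""
    return not (provider == "xai-oauth" and status == 403)
```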
* test(xai-oauth): pin tier-denied 403 behavior + docs warning for #26847
Tests:
  * ``test_refresh_xai_oauth_pure_403_marked_tier_denied_not_relogin`` — refresh-403 raises ``xai_oauth_tier_denied`` with ``relogin_required=False`` and the API-key fallback hint in body.
  * ``test_format_auth_error_tier_denied_does_not_suggest_relogin`` — the renderer does not append "Run ``hermes model``" for the new code.
  * ``test_recover_with_credential_pool_skips_refresh_on_bare_403_for_xai_oauth`` — bare ``{"reason":"forbidden","message":"Forbidden"}`` body (which does not match the existing keyword heuristic) still short-circuits ``try_refresh_current`` on xai-oauth.
Docs:
  * Drop the "(any active tier)" claim from the xai-grok-oauth guide, add a top-of-page warning callout, and a Troubleshooting section for the 403-after-login case pointing at ``XAI_API_KEY`` + ``provider: xai`` as the documented fallback.
* fix: handle whitespace-only cron responses
* fix(cli): preserve cron asterisks in strip mode
* fix(mattermost): resolve thread root_id and route progress to threads
Two Mattermost thread-related bugs:
1. _resolve_root_id() — Mattermost CRT requires root_id to be the thread root post. Using any reply's own ID as root_id causes '400 Invalid RootId'. Add _resolve_root_id() that walks up the post chain via API to find the actual root, and apply it in send(), _send_url_as_file(), and _send_local_file().
2. _progress_reply_to — The condition in run.py only checked Platform.FEISHU, missing Mattermost entirely. This caused tool progress messages to always land in the main channel instead of the thread. Add Platform.MATTERMOST to the condition so progress messages are routed to threads when reply_mode=thread.
Impact: Tool progress messages now appear in Mattermost threads instead of flooding the main channel; thread replies no longer fail with Invalid RootId when the reply target is itself a reply.
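A sketch of the root-walking idea behind `_resolve_root_id()` in the Mattermost fix above. Here `fetch_post` is a hypothetical stand-in for a call to Mattermost's `GET /api/v4/posts/{post_id}`, whose post object carries a `root_id` that is empty for a thread root; the `max_hops` guard against cycles or deep chains is an addition of this sketch.

```python
from typing import Callable, Dict

# Hypothetical callable: post id -> post dict (only "root_id" is used here).
FetchPost = Callable[[str], Dict[str, str]]


def resolve_root_id(post_id: str, fetch_post: FetchPost, max_hops: int = 10) -> str:
    """Hypothetical: follow root_id links until reaching a post with no root."""
    current = post_id
    for _ in range(max_hops):
        parent = fetch_post(current).get("root_id") or ""
        if not parent or parent == current:
            return current  # current is the thread root
        current = parent
    return current  # hop limit reached; fall back to the last id seen
```

With posts `root` (no `root_id`), `r1` (root `root`) and `r2` (root `r1`), replying to `r2` resolves to `root`, which is the id the API accepts as `root_id`.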
* feat(kanban): archive --rm to hard-delete archived tasks
  Salvages #19964 by @Beandon13. Adds `hermes kanban archive --rm` to permanently remove already-archived tasks with cascading cleanup of links, comments, events, runs, and notify-subs. Safety guard: only archived tasks can be deleted; active/blocked/done must be archived first. Cherry-picked from #19964 onto current main (severe stale base, applied manually to preserve substance only).
* feat(proxy): add xai upstream adapter for Grok via OAuth
* chore(release): map @yannsunn for PR #28064 xai proxy adapter salvage
* docs(skill): align kanban dispatcher failure_limit text with current default
* fix(oauth): add manual-paste fallback for browser-only remote consoles
  xAI Grok OAuth (and Spotify) use a loopback redirect to ``http://127.0.0.1:<port>/callback`` to capture the authorization code. That works when the browser and Hermes run on the same machine, and the SSH tunnel recipe handles the regular remote case. It breaks completely on **browser-only remote consoles** (GCP Cloud Shell, GitHub Codespaces, AWS EC2 Instance Connect, Gitpod, Replit, …) where the user has a browser but no real SSH client to forward a port — the redirect to 127.0.0.1 on the remote VM simply isn't reachable from the laptop, and there's nothing the existing flow can do about it (#26923).
  This commit adds the foundation for a manual-paste fallback:
  * ``_is_remote_session`` now also recognises Cloud Shell, Codespaces, Gitpod, Replit, StackBlitz (in addition to SSH), so the existing tunnel hint at least fires in those environments.
  * ``_parse_pasted_callback`` accepts any of: a full ``http(s)://...?code=...&state=...`` URL, a bare ``?code=...`` query string, a bare ``code=...&state=...`` fragment, or a bare opaque code value. Returns the same dict shape the HTTP callback handler produces, so the caller's state / error validation works unchanged (no CSRF bypass).
  * ``_prompt_manual_callback_paste`` reads stdin with a clear multi-line explanation of what's happening and what to paste.
  * ``_xai_oauth_loopback_login`` gains a ``manual_paste`` kwarg that skips the HTTP listener entirely. The redirect_uri, PKCE verifier, state, and nonce are byte-identical to the loopback path so xAI's token endpoint can't tell the difference at the protocol level.
  * ``_print_loopback_ssh_hint`` now also mentions ``--manual-paste`` so users without a real SSH client see a path forward instead of a dead-end tunnel recipe.
  * ``_login_xai_oauth`` threads ``args.manual_paste`` into the loopback helper.
* feat(cli): wire --manual-paste into ``hermes auth add`` and ``hermes model``
  Register the new ``--manual-paste`` flag on both entry points and thread it through to the xAI loopback login:
  * ``hermes auth add xai-oauth --manual-paste`` — pool-add path, forwarded inside ``auth_commands.handle_auth_add``.
  * ``hermes model --manual-paste`` — model-picker path, forwarded by ``_model_flow_xai_oauth`` into the synthetic ``argparse.Namespace`` it passes to ``_login_xai_oauth``.
  The picker also now forwards ``--no-browser`` and ``--timeout`` for consistency (previously hardcoded to defaults regardless of CLI flags). Help text on both flags points at #26923 and names the browser-only remote consoles (Cloud Shell, Codespaces, EC2 Instance Connect) so users searching ``hermes --help`` can find the workaround.
* test+docs(oauth): pin manual-paste semantics and document browser-only path (#26923)
  Tests (``tests/hermes_cli/test_auth_manual_paste.py``):
  * 9 parametrised + scalar cases for ``_is_remote_session`` covering the new Cloud Shell / Codespaces / Gitpod / Replit / StackBlitz env vars (plus the existing SSH ones).
  * 9 cases for ``_parse_pasted_callback`` covering every paste form (full URL, https URL with extra params, bare ``?code=...``, bare ``code=...`` fragment, bare opaque value, error+description, empty, whitespace-only, malformed URL).
  * 3 cases for ``_prompt_manual_callback_paste`` (happy path, EOF, Ctrl-C).
  * 3 end-to-end ``_xai_oauth_loopback_login(manual_paste=True)`` cases: the HTTP server MUST NOT be started (asserted via a callable that raises if invoked), wrong state still rejected with ``xai_state_mismatch`` (no CSRF bypass), and empty paste surfaces ``xai_code_missing``.
  * SSH-hint mention test ensures the ``--manual-paste`` instruction is printed in the remote-session hint.
  Docs:
  * ``oauth-over-ssh.md`` — new "Browser-only remote (Cloud Shell / Codespaces / EC2 Instance Connect)" section with the ``--manual-paste`` recipe, plus a TL;DR note for the new flag.
  * ``xai-grok-oauth.md`` — short subsection pointing at the same recipe and the OAuth-over-SSH guide anchor.
* docs(kanban): document max-retries task override
* docs(kanban): document inline create shortcuts
* test(kanban): cover default board dashboard pin
* docs: ignore box diagrams in ascii guard
  Wrap existing box-drawing diagrams with ascii-guard markers so docs-site checks pass when website docs are touched.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* feat: per-task model override for kanban workers
  - Add model_override field to Task class and tasks schema
  - Add migration for existing databases
  - Spawn worker with -m model when model_override is set
* test(kanban-dashboard): cover _task_dict task_age fallback
  The fix in 061a1830 added an outer try/except in plugin_api._task_dict so that a future failure mode in kanban_db.task_age (anything _safe_int doesn't already absorb) cannot 500 the GET /board response. The _safe_int / task_age corruption paths got regression coverage in tests/hermes_cli/test_kanban_db.py, but the OUTER fallback contract remained untested -- meaning a refactor that drops the try/except would not be caught by CI.
  Pin that contract from both consumers of _task_dict:
  - GET /board returns 200 with the literal fallback age dict for the affected card (other cards continue to render via the same path)
  - GET /tasks/:id (drawer view) returns 200 with the same fallback, so a single corrupt task can't block its own drawer
  Both tests force task_age to raise RuntimeError rather than ValueError on '%s', because ValueError is absorbed by _safe_int and never reaches the outer try/except -- testing that path would only re-cover what test_kanban_db.py already pins.
  Manually verified the regression discipline:
    git checkout 061a1830^ -- plugins/kanban/dashboard/plugin_api.py
    pytest -k task_age_exception # both FAIL with 500
    git checkout HEAD -- plugins/kanban/dashboard/plugin_api.py
    pytest -k task_age_exception # both PASS
* fix(kanban): clear _INITIALIZED_PATHS in remove_board so recycled DBs re-init schema
  Archiving or deleting a board via remove_board() leaves the path's "schema already initialized" entry in the module-level cache. A concurrent connect(board=<slug>) call (e.g. the dashboard event-stream poll loop) then:
  1. resolves the same kanban.db path,
  2. recreates the directory + an empty sqlite file because connect() does mkdir(parents=True, exist_ok=True),
  3. skips the CREATE TABLE pass because the cache entry says the schema is already in place,
  4. errors on the next read with `no such table: task_events`.
  Drop the cache entry before mutating the filesystem so the fresh file gets a proper schema init on next connect(). Applies to both archive=True (rename) and archive=False (rmtree) branches.
  Fixes #23833.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(web): add Cache-Control: no-store to plugin static file serving
  Prevents browser caching of stale dashboard plugin JS files that may contain bugs already fixed upstream (e.g. COLUMN_LABEL undefined).
* fix(kanban): seed bundled skills (e.g. kanban-worker) on kanban init
  Closes #23725
* fix(kanban): ignore stale HERMES_KANBAN_BOARD for removed boards
* fix(kanban): keep board-management commands independent from board override
* fix(kanban): preserve notifier_profile for dashboard home subscriptions
* fix(kanban): promote dependents when a parent is archived
* fix(cli): make kanban specify max_tokens configurable
* fix(kanban): sync slash subcommands with live parser
* fix(kanban): promote blocked tasks when parent dependencies complete
  recompute_ready only scanned 'todo' tasks for promotion, ignoring 'blocked' tasks entirely. When a task was blocked (e.g. by the circuit breaker) and its parent dependencies later completed, the task stayed stuck in 'blocked' forever unless manually unblocked.
  Now recompute_ready also scans 'blocked' tasks. When all parents are done/archived, the blocked task is promoted to 'ready' with failure counters reset — equivalent to an automatic unblock.
  Includes a regression test for the blocked-parent-done promotion path.
* fix(kanban): use 'is not None' check for max_runtime_seconds in create_task
  max_runtime_seconds=0 was being silently coerced to None due to a falsy check (if max_runtime_seconds). Zero is a valid value that causes the dispatcher to immediately time out a task. The adjacent max_retries parameter already used the correct 'is not None' pattern. Fixes the inconsistency by aligning max_runtime_seconds with max_retries.
* fix(kanban): reset failure counters on unblock_task
  When a task is manually unblocked (blocked → ready/todo), the consecutive_failures counter and last_failure_error were left intact. The next failure would immediately re-trip the circuit breaker because the counter was still at or above the failure limit. Reset both fields on unblock so the task gets a fresh retry budget.
  Includes a regression test that verifies counters are zeroed.
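A rough sketch of the widened `recompute_ready` scan, using an in-memory model of tasks rather than the SQLite schema. The `Task` dataclass and its fields are simplified stand-ins. One assumption goes beyond the commit text: a blocked task with no parents stays blocked, since `all()` over an empty parent list would otherwise auto-unblock tasks that were blocked for unrelated reasons.

```python
from dataclasses import dataclass, field
from typing import Dict, List

DONE_STATES = {"done", "archived"}


@dataclass
class Task:  # simplified stand-in for the kanban task row
    id: str
    status: str
    parents: List[str] = field(default_factory=list)
    consecutive_failures: int = 0
    last_failure_error: str = ""


def recompute_ready(tasks: Dict[str, Task]) -> List[str]:
    """Promote todo/blocked tasks whose parents are all done or archived."""
    promoted: List[str] = []
    for task in tasks.values():
        if task.status not in ("todo", "blocked"):
            continue
        if task.status == "blocked" and not task.parents:
            continue  # assumption: parentless blocked tasks need a manual unblock
        if all(tasks[p].status in DONE_STATES for p in task.parents):
            if task.status == "blocked":
                # Equivalent to an automatic unblock: give it a fresh retry budget.
                task.consecutive_failures = 0
                task.last_failure_error = ""
            task.status = "ready"
            promoted.append(task.id)
    return promoted
```

Resetting the counters here matters for the same reason as the `unblock_task` fix: without the reset, the first failure after promotion would re-trip the circuit breaker.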
* fix(kanban): fingerprint crash errors to prevent fleet-wide retry exhaustion
  When a systemic failure (provider outage, auth expiry, OOM) crashes multiple workers simultaneously, detect_crashed_workers increments each task failure counter independently. The circuit breaker only trips after N × failure_limit retries across the fleet.
  Fingerprint crash errors by normalizing host-specific details (PIDs, timestamps). When 3+ tasks crash with the same fingerprint in a single detection cycle, immediately trip the circuit breaker (failure_limit=1) instead of waiting for repeated failures. Isolated crashes (unique fingerprints) retain their normal retry budget. Protocol violations continue to trip immediately.
  Includes regression tests for systemic and isolated crash paths.
* fix(kanban): align board_exists with board discovery rules
* fix(kanban): demote ready children when a parent is reopened
* fix(kanban): serialize DB initialization
* fix(kanban): task_age() tolerates ISO-8601 timestamps
  Prevents ValueError crash in dashboard get_board() when a task has an ISO timestamp (e.g. "2026-05-10T15:00:00Z") instead of a unix epoch int. Adds _to_epoch() helper that normalises both formats.
* Fix Kanban dashboard initial board selection
* fix(kanban): persist worker session metadata on completion
  Salvages #25579 by @wesleysimplicio. Stamps task_runs.metadata.worker_session_id from HERMES_SESSION_ID on kanban_complete. Cherry-picked the substantive commit (not the AUTHOR_MAP fixup tip) onto current main.
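A sketch of the crash-fingerprinting idea: strip PIDs and ISO timestamps, count identical fingerprints within one detection cycle, and drop the limit to 1 for any group of 3 or more. The regexes, `fingerprint`, and `effective_failure_limits` are hypothetical and only approximate the commit's "host-specific details". The default limit of 2 matches the failure-limit default mentioned in the docs-alignment commit further down.

```python
import re
from collections import Counter
from typing import Dict

# Assumed patterns for host-specific noise; the real normalizer may differ.
_ISO_TS = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?")
_PID = re.compile(r"\bpid[\s=:]*\d+", re.IGNORECASE)


def fingerprint(error: str) -> str:
    """Normalize an error so the same failure on different workers compares equal."""
    text = _ISO_TS.sub("<ts>", error)
    text = _PID.sub("pid <n>", text)
    return " ".join(text.split()).lower()


def effective_failure_limits(
    crashes: Dict[str, str], default_limit: int = 2, systemic_threshold: int = 3
) -> Dict[str, int]:
    """Per-task failure limit for a single detection cycle."""
    counts = Counter(fingerprint(err) for err in crashes.values())
    return {
        # Systemic failure (3+ identical fingerprints): trip the breaker immediately.
        task_id: 1 if counts[fingerprint(err)] >= systemic_threshold else default_limit
        for task_id, err in crashes.items()
    }
```

An isolated crash keeps its normal budget, while three workers failing with the same normalized "401 Unauthorized" message each get a limit of 1.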
* fix(kanban): make claim ttl configurable
  Co-Authored-By: Paperclip <noreply@paperclip.ing>
* fix(kanban): pass accept-hooks to worker chat subprocess
* feat(kanban): add board-level default workdir (#25430)
* docs(kanban-worker): document notification routing configuration
* fix(kanban): preserve worker tools with restricted toolsets
* fix(kanban): make legacy task migration idempotent
  (cherry picked from commit 293f1c3a7241b0117669e049d9aa746c9645ac90)
* fix: harden Kanban worker Hermes command resolution
* feat(kanban): allow trimmed task comments
  SS-1647 live SHIP validation: real code + tests for kanban comment --max-len.
* fix: show scheduled kanban tasks in dashboard
* fix: assign single-task kanban decompositions
* fix(kanban-dashboard): make Orchestration mode checkbox label static
  The checkbox label echoed its state ("Auto (default)" / "Manual") instead of describing the action, so a checked box reading "Auto" parsed as a status indicator rather than a control. The accompanying sub-description was also static and started with "When on, ...", which read awkwardly when the box was unchecked.
  Replace the dynamic label with a static action label ("Auto-decompose triage tasks") and flip the sub-description between the two modes so it stays accurate either way. The top-of-page Orchestration pill is unchanged — that one is intentionally a status badge / toggle.
  Fixes #28178
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(env): add HERMES_KANBAN_DISPATCH_IN_GATEWAY override (#21956)
  Salvages the env-vars docs portion of #21956 by @Bartok9. The ascii-guard-ignore tags from the original PR already landed on main.
* fix(kanban): close sqlite connection on init failure to prevent fd leak
  Salvages #28301 by @Ade5954. If WAL setup, PRAGMA application, or schema init raises after sqlite3.connect() succeeds, the new connection was leaking. Wrap the body in try/except so the connection is closed before the exception propagates.
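The fd-leak fix follows a standard pattern. Below is a minimal sketch of it with the standard-library `sqlite3` module. `connect` and the injected `init_schema` callable are hypothetical simplifications of the real `connect(board=...)`, and the second PRAGMA is only an example of "PRAGMA application".

```python
import sqlite3
from typing import Callable


def connect(path: str, init_schema: Callable[[sqlite3.Connection], None]) -> sqlite3.Connection:
    """Open a connection, closing it again if any setup step raises."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("PRAGMA journal_mode=WAL")  # WAL setup
        conn.execute("PRAGMA foreign_keys=ON")   # example PRAGMA (assumed)
        init_schema(conn)                        # CREATE TABLE ... pass
    except BaseException:
        conn.close()  # release the file descriptor before propagating
        raise
    return conn
```

Catching `BaseException` rather than `Exception` also closes the handle on `KeyboardInterrupt`, and the bare `raise` re-raises the original error with its traceback intact.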
* fix(kanban): don't crash dispatched workers when kanban-worker skill is absent
  Salvages #27372 by @oemtalks. The dispatcher unconditionally injected `--skills kanban-worker` into every worker spawn, but worker profiles sometimes don't have that bundled skill in their skills dir, which is fatal at CLI startup (`ValueError: Unknown skill(s): kanban-worker`). Adds `_kanban_worker_skill_available(hermes_home)` and only injects the flag when the skill resolves. The MANDATORY lifecycle still ships via KANBAN_GUIDANCE in the system prompt, so omitting the flag is safe.
* fix(packaging): ship dashboard plugin assets in wheel
  Salvages #23737 by @LeonSGP43. Adds plugins/* manifest.json and dist/ glob entries to setuptools package-data so wheel installs ship the bundled dashboard plugin assets (kanban, achievements, etc.). Without these, /api/dashboard/plugins can't discover plugin assets outside a source checkout.
* docs(kanban): document worker protocol auto-blocks
  Salvages #21585 by @helix4u. Documents the protocol_violation event (worker exits successfully while task is still running), adds --max-retries to the create flag list and --failure-limit to dispatch.
* fix(oneshot): pass fallback_providers from profile config to AIAgent
  Salvages #23368 by @uzunkuyruk. Oneshot workers (e.g. kanban workers spawned via 'hermes -p <profile> chat -q ...') were not honouring the profile's fallback_providers / fallback_model chain because oneshot.py never read the config and never passed fallback_model= to AIAgent. Reads cfg.get('fallback_providers') (new list format) or cfg.get('fallback_model') (legacy single-dict) with the same normalization cli.py applies, then forwards as fallback_model=_fb.
* fix(kanban): reject direct running transitions in dashboard bulk updates
  Salvages #24050 by @kronexoi. The single-task PATCH already rejects direct status='running' since it bypasses the dispatcher/claim invariant, but the bulk-update endpoint still accepted it. Aligns bulk with single by emitting an error result row for any 'running' entry.
* feat(kanban): add initial-status for human-ops cards
  Salvages #27526 by @shunsuke-hikiyama. Adds an --initial-status flag (running|blocked, default running) to 'kanban create', threaded through kanban_db.create_task() and the kanban_create tool schema. 'blocked' parks the task directly in the blocked column for R3 human-ops review, skipping the brief running-to-blocked transition.
  Dropped the unrelated 'add' alias, WIFEXITED Windows compat, and slash-handler error formatting changes that were bundled in the original PR — those should ship as their own focused changes if still wanted.
* fix(kanban): release scratch workspace and tmux session on task completion
  Salvages #27369 by @LeonJS. complete_task() now calls _cleanup_workspace() and _cleanup_worker_tmux() after marking a task complete. Scratch workspaces (used by swarm agents) accumulate on disk — hundreds of MB per task, never released. Stale tmux sessions from completed agents also persist indefinitely.
  Both gates are safe:
  - workspace_kind == 'scratch' gate preserves user worktree/dir workspaces
  - tmux #{pane_dead} == 1 gate only kills sessions where the worker has already exited
  - best-effort: cleanup failures never block task completion
* fix(kanban): honor severity thresholds in diagnostics
  Salvages #26431 by @LeonSGP43. Dashboard plugin_api list_diagnostics was using exact-match (severity == filter), so '--severity warning' hid 'error' and 'critical' diagnostics. Adds severity_at_or_above() helper to kanban_diagnostics and uses it in the dashboard endpoint (CLI already used SEVERITY_ORDER comparison correctly).
* test: isolate Kanban env pins in hermetic fixture
  Salvages the substantive part of #22295 by @steezkelly.
  Adds the missing HERMES_KANBAN_HOME, HERMES_KANBAN_RUN_ID, HERMES_KANBAN_CLAIM_LOCK, HERMES_KANBAN_DISPATCH_IN_GATEWAY entries to _HERMES_BEHAVIORAL_VARS so ambient developer-shell pins on those vars don't bleed into pytest runs. The frozenset extraction + standalone regression test from the original PR were dropped to keep the change minimal — main already maintains the list inline.
* feat(kanban): add max_in_progress config to cap concurrent running tasks
  Salvages #22981 by @SimbaKingjoe. Adds 'kanban.max_in_progress' config that caps simultaneously running tasks. When the board already has N running, dispatcher skips spawning so slow workers (local LLMs, resource-constrained hosts) don't pile up and time out. Threads through dispatch_once(max_in_progress=) and gateway dispatcher config parsing with validation (warns on invalid/below-1 values).
* fix(packaging): ship bundled skills in wheel
  Salvages #23738 by @LeonSGP43. Wheel installs were missing skills/ and optional-skills/ because pyproject's [tool.setuptools.packages.find] only includes Python packages — the skills directories don't have __init__.py so they were silently dropped from the wheel. Adds setup.py with data_files spec emitting skills/* and optional-skills/* under hermes_agent-<v>.data/data/, and a get_bundled_skills_dir() helper in hermes_constants that discovers the wheel-installed location via sysconfig before falling back to a source-checkout path. tools/skills_sync uses the helper so 'hermes update' works for pip-installed users.
* fix: 4 small surgical bugs
  Salvages #23302 by @Bartok9. Four independent one-area fixes:
  1. kanban boards delete alias now hard-deletes (not archives) — the alias didn't carry --delete, so getattr(args, 'delete', False) returned False. Detect boards_action=='delete' explicitly.
  2. Gateway auto-title failures no longer leak as user-visible warnings — debug-log only since they're not actionable.
  3. Background process completion notification snaps truncation to the next newline boundary and prepends a marker when content is dropped.
  4. _cprint() schedules the run_in_terminal coroutine via asyncio.ensure_future so output isn't silently dropped from background threads (fixes #23185 Bug A). Skips the double-print fallback that would fire for mock paths.
* perf(prompt): cache kanban worker guidance at session init
  Salvages #24402 by @RyanRana. The KANBAN_GUIDANCE block (~835 tokens) is session-static — the dispatcher decides at spawn time whether the process is a kanban worker via the kanban_show tool's check_fn (gated on HERMES_KANBAN_TASK env var). Re-checking 'kanban_show' in valid_tool_names and re-loading the reference on every system-prompt rebuild (init + each context compression) is wasted work. Caches the resolved string on agent._kanban_worker_guidance once in agent_init and consumes it in system_prompt.build_system_prompt(), with a getattr fallback for code paths that bypass agent_init.
* feat(kanban): add --sort option to 'hermes kanban list'
  Salvages #25745 by @LizerAIDev. Adds --sort {created,created-desc,priority,priority-desc,status,assignee,title,updated} to 'hermes kanban list'. Validated against VALID_SORT_ORDERS map; invalid values raise ValueError. Default behaviour (priority DESC, created ASC) is unchanged when --sort is omitted.
* docs: add kanban codex lane skill
* feat(kanban): worker visibility endpoints (workers/active, runs/{id}, inspect)
  Adds three read-only endpoints to the kanban dashboard plugin so the SwitchUI workspace (and any other dashboard consumer) can track workers across tasks without N+1 round-trips through /tasks/{task_id}.
  - GET /workers/active
    Single SQL JOIN of task_runs + tasks where ended_at IS NULL, worker_pid IS NOT NULL, status='running'. Returns {workers: [...], count, checked_at}.
  - GET /runs/{run_id}
    Direct lookup of any task_run row by id. Reuses existing kanban_db.get_run() helper and _run_dict() serialiser. 404 when not found. Mirrors GET /tasks/{task_id} 404 shape.
  - GET /runs/{run_id}/inspect
    Live PID stats via psutil.Process.as_dict() — cpu_percent, memory_rss_bytes, memory_vms_bytes, num_threads, num_fds, status, create_time, cmdline. Short-circuits with alive:false when run has ended, has no worker_pid, the pid is gone, or psutil is unavailable. AccessDenied surfaces as alive:true with error rather than a 500.
  11 new tests in tests/plugins/test_kanban_worker_runs.py cover the empty-board case, running-task case, ended-run filtering, missing-pid filtering, 404 paths, already-ended inspect, no-pid inspect, dead-pid inspect, and live-pid inspect (psutil mocked). All pass.
  Companion termination endpoint (POST /runs/{run_id}/terminate) is intentionally out of scope here — opening a separate issue first since the RBAC and dispatcher-mediated soft-cancel design needs maintainer input before code.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(release): map contributor email for attribution check
* test(kanban-dashboard): pin enriched 409 detail and inline error wiring (#26744)
  - Existing ``test_patch_drag_drop_move_todo_to_ready`` now asserts the enriched 409 detail names the blocking parent (id, quoted title, and current status), so the dashboard always has something actionable to render.
  - New bundle-assertion test ``test_dashboard_surfaces_ready_blocked_error_inline`` pins the frontend wiring: the ``parseApiErrorMessage`` helper exists, the drag/drop banner runs through it, and the drawer maintains a visible ``patchErr`` state that's cleared between PATCHes and tasks.
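The `GET /workers/active` query can be sketched against a throwaway SQLite schema. Only `ended_at`, `worker_pid`, and `status` are named in the commit. The `task_runs.task_id` join key, the `title` column, and the `active_workers` helper are assumptions, and the response dict mirrors only the `{workers, count, checked_at}` shape.

```python
import sqlite3
import time
from typing import Any, Dict, List

# Assumed minimal columns; the real tables carry many more fields.
ACTIVE_WORKERS_SQL = """
SELECT r.id, r.task_id, r.worker_pid, t.title
FROM task_runs AS r
JOIN tasks AS t ON t.id = r.task_id
WHERE r.ended_at IS NULL
  AND r.worker_pid IS NOT NULL
  AND t.status = 'running'
ORDER BY r.id
"""


def active_workers(conn: sqlite3.Connection) -> Dict[str, Any]:
    """One JOIN instead of an N+1 walk through per-task endpoints."""
    rows = conn.execute(ACTIVE_WORKERS_SQL).fetchall()
    workers: List[Dict[str, Any]] = [
        {"run_id": run_id, "task_id": task_id, "worker_pid": pid, "title": title}
        for run_id, task_id, pid, title in rows
    ]
    return {"workers": workers, "count": len(workers), "checked_at": time.time()}
```

All three filters live in the WHERE clause, so ended runs, runs without a recorded PID, and runs whose task already left `running` are excluded in a single round trip.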
* docs(codex_app_server): document multi-root Kanban writable_roots (#27941)
  Update the Codex app-server runtime guide's Kanban section to reflect the new behaviour:
  * The sandbox override now adds the board DB directory plus every Kanban path the dispatcher pinned (HERMES_KANBAN_WORKSPACES_ROOT, HERMES_KANBAN_WORKSPACE, legacy HERMES_KANBAN_ROOT) -- deduplicated, DB-dir first.
  * The motivation note now includes the cross-mount artifact-write scenario (e.g. ``/media/.../kanban-workspaces/...`` on a separate drive) and links to issue #27941 so readers can find the original bug report.
* fix(gateway): quiet corrupt kanban dispatcher boards
  Salvages the substantive part of #26490 by @aqilaziz. Detects corrupt board DBs ("file is not a database" / "database disk image is malformed") and disables them by fingerprint until they're repaired, instead of flooding the gateway log with repeated logger.exception tracebacks every tick. Cherry-picked the substantive commit (ea5b4ec2a); the tip commit was an unrelated _is_dir OSError fix for service-path lookup. Dropped a small test reformat that was bundled in the same commit.
* docs: align kanban readiness docs and smoke tests
  Salvages #28199 by @bensargotest-sys. Aligns Kanban docs with current tool registration: dispatcher-spawned task workers get task tools, profiles that explicitly enable the kanban toolset get orchestrator routing tools (kanban_list, kanban_unblock). Corrects failure-limit text to current default of 2. Hardens the e2e subprocess script to resolve repo root and use the spawnable default assignee. Updates the diagnostics severity fixture to assert error below the critical threshold.
* feat(kanban): surface per-task model_override in show + tool output
  Salvages #26897 by @loicnico96. The per-task model_override DB column already exists on main, but it wasn't exposed in user-facing surfaces.
  This adds:
  - 'kanban show' prints 'model: <name>' when model_override is set
  - kanban_show / kanban_list tool responses include the model_override field
  Original branch was stale (PR was authored against an older field name 'model'); applied the substantive surface exposure manually using the current 'model_override' field name.
* feat(cli): add kanban swarm topology helper
  Salvages #26791 by @Niraven. Adds 'hermes kanban swarm' to create a durable Kanban Swarm v1 graph: a completed root/blackboard card, parallel worker cards, a verifier gated on all workers, and a synthesizer gated on the verifier. Stores shared swarm blackboard updates as structured JSON comments on the root card. Self-contained: new hermes_cli/kanban_swarm.py module + CLI wiring + unit tests.
* feat(kanban): add optional board parameter to all MCP tools
  Salvages #27598 by @nnnet. Adds optional 'board' parameter to all 9 kanban_* MCP tools via shared _connect helper. Backwards compatible — omitting board keeps current pinned-board behavior. Useful for orchestrator profiles that route across multiple boards. Two-file scope: tools/kanban_tools.py + tests.
* feat(kanban): stamp originating ACP session_id on tasks
  Salvages #23208 by @awizemann. Tracks which chat session created a kanban task so clients can render a per-session board without falling back to tenant + time-window heuristics.
  - Schema: tasks gains nullable session_id TEXT column with index (additive migration in _migrate_add_optional_columns).
  - ACP: server.py exposes the originating session id via HERMES_SESSION_ID with save/restore around the agent loop.
  - Tool: kanban_create reads HERMES_SESSION_ID (with explicit override).
  - CLI: 'hermes kanban list --session <id>' filter; JSON output exposes session_id.
* feat(kanban): wire dispatcher to dispatch review agents from review column
  Salvages #23772 by @thewillhuang.
  Adds 'review' as a valid kanban task status and extends dispatch_once to monitor the review column as a second dispatch source (in addition to the existing ready column).
  - Adds 'review' to VALID_STATUSES
  - Adds claim_review_task() — atomically transitions review → running
  - Adds has_spawnable_review() — health telemetry mirror
  - Extends dispatch_once with a review column dispatch loop
  - Review agents get 'sdlc-review' skill auto-loaded
  Resolved 2 conflicts (VALID_STATUSES merge with main's 'scheduled' state, test file additions). Adapted claim_review_task to main's ttl_seconds: Optional[int] = None convention (matches claim_task).
* feat(kanban): stale detection for running tasks in dispatcher
  Salvages #23790 by @thewillhuang. Adds detect_stale_running() to the dispatcher cycle. Running tasks that have been started for longer than dispatch_stale_timeout_seconds (default 14400 = 4h) without a heartbeat in the last hour are auto-reclaimed to ready.
  - New config kanban.dispatch_stale_timeout_seconds (default 14400, 0 disables)
  - New 'stale' field on DispatchResult
  - detect_stale_running() in kanban_db.py with heartbeat freshness check
  - Records outcome='stale' on run close + 'stale' event; ticks failure counter
  - Wires config through gateway embedded dispatcher
  - Updates _cmd_dispatch verbose/JSON output and daemon logging
  Resolved test-file end-of-file conflict by appending both halves.
* feat(kanban): filter tasks by workflow fields and runs by status/outcome
  Salvages #26745 by @nehaaprasaad. Exposes filtering for the existing workflow_template_id and current_step_key columns:
  - list_tasks() accepts workflow_template_id and current_step_key kwargs
  - 'hermes kanban list' adds matching CLI flags
  - dashboard plugin_api also exposes the filters
  Resolved a small conflict in list_tasks signature alongside main's session_id and order_by additions; combined all three into the single filter list.
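The staleness rule from the stale-detection commit reduces to a small predicate. This sketch assumes epoch-second timestamps. It reads "without a heartbeat in the last hour" as more than 3600 s since the last heartbeat and treats a task that never heartbeated as stale. `is_stale` is a hypothetical helper, not `detect_stale_running()` itself, and it leaves out the reclaim, event, and failure-counter side effects.

```python
from typing import Optional

DEFAULT_STALE_TIMEOUT = 14400  # 4h, the documented default
HEARTBEAT_WINDOW = 3600        # "no heartbeat in the last hour"


def is_stale(
    started_at: float,
    last_heartbeat_at: Optional[float],
    now: float,
    stale_timeout: int = DEFAULT_STALE_TIMEOUT,
    heartbeat_window: int = HEARTBEAT_WINDOW,
) -> bool:
    """True when a running task should be reclaimed to ready."""
    if stale_timeout == 0:
        return False  # 0 disables stale detection
    if now - started_at <= stale_timeout:
        return False  # still inside the allowed runtime
    # Past the timeout: stale only if the worker has not heartbeated recently.
    return last_heartbeat_at is None or now - last_heartbeat_at > heartbeat_window
```

The heartbeat check keeps genuinely long-running but healthy workers (for example, slow local models) from being reclaimed just because they crossed the 4h mark.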
* feat(kanban): add respawn guard to block repeat worker storms
  Salvages #27484 by @fardoche6. Adds a respawn guard that skips worker spawn for tasks where:
  - a recent run already succeeded (recent_success — within guard window)
  - the previous run hit a quota/auth error (blocker_auth, also auto-blocks)
  - a recent task comment includes a GitHub PR URL (active_pr)
  The guard prevents repeat worker storms on the same bug/task. Includes the contributor's review-findings fixup (regex hardening, observability, auth coverage). Resolved a small DispatchResult conflict alongside main's 'stale' field; kept both. Authorship preserved via rebase merge.
* feat(kanban): show dashboard cron jobs across profiles
  Salvages #27568 by @SerenityTn. Dashboard cron page now lists cron jobs from all profiles, with profile-aware filter UI and storage routing. Includes test coverage for cross-profile listing, mutation, deletion, and validation.
  Also fixes orphan conflict markers in config.py left by an earlier salvage merge (kanban.dispatch_stale_timeout_seconds was double-nested in HEAD/PR markers from #28452 salvage of #23790).
* fix(kanban): remove orphan conflict markers from config.py (#28458)
  PR #28452 (salvage of #23790, stale detection) merged with leftover git conflict markers in hermes_cli/config.py around the `dispatch_stale_timeout_seconds` config block, breaking config import and any code path that loads it. Cleans up the markers and keeps both config blocks (worker log rotation/orchestrator + stale detection). Resolves a self-introduced regression.
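The three respawn-guard conditions can be sketched as an ordered check that returns the first matching reason code. Only the reason names `recent_success`, `blocker_auth`, and `active_pr` come from the commit. `RecentActivity`, `respawn_block_reason`, the 900 s guard window, the `"quota"`/`"auth"` error labels, and the PR-URL regex are all hypothetical, and the auto-block that accompanies `blocker_auth` is not modeled.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical pattern for e.g. https://github.com/owner/repo/pull/123
GITHUB_PR_URL = re.compile(r"https://github\.com/[\w.-]+/[\w.-]+/pull/\d+")


@dataclass
class RecentActivity:  # hypothetical summary of a task's recent history
    last_success_at: Optional[float] = None
    last_error_kind: Optional[str] = None  # assumed labels: "quota" or "auth"
    recent_comments: List[str] = field(default_factory=list)


def respawn_block_reason(act: RecentActivity, now: float, guard_window: float = 900) -> Optional[str]:
    """Return why a spawn should be skipped, or None to allow it."""
    if act.last_success_at is not None and now - act.last_success_at <= guard_window:
        return "recent_success"
    if act.last_error_kind in ("quota", "auth"):
        return "blocker_auth"
    if any(GITHUB_PR_URL.search(comment) for comment in act.recent_comments):
        return "active_pr"
    return None
```

Returning a named reason rather than a bare boolean fits the commit's mention of observability: the dispatcher can log or count why each spawn was skipped.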
* fix(kanban): remove orphan conflict markers from kanban.py (#28459)
  PR #28454 (salvage of #26745, workflow filter) merged with leftover git conflict markers in hermes_cli/kanban.py at three sites:
  - _task_to_dict() (session_id alongside workflow_template_id/current_step_key)
  - p_list parser (--sort alongside --workflow-template-id/--step-key)
  - _cmd_list (order_by alongside the new filter kwargs)
  Cleans up the markers and keeps both halves at each site. Resolves a self-introduced regression.
* feat(kanban): configure worktree paths and branches
  Salvages #26496 by @aqilaziz. Adds branch_name column + CLI flag so tasks with workspace_kind='worktree' can pin a target branch on create. Schema migration added to _migrate_add_optional_columns.
  - Task.branch_name field + DB column + migration
  - create_task accepts branch_name kwarg
  - hermes kanban create --branch <name> flag
  - kanban show output includes 'Branch: <name>' when set
  Cherry-picked the substantive commit (a7558cf27); the PR's tip was an unrelated service-path-dirs commit. Resolved 2 INSERT-column-list and show-output conflicts alongside main's session_id and max_runtime_seconds additions; kept all three.
* feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373)
  Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that group several skills under one slash command. Invoking /<bundle-name> from any surface (CLI, TUI, dashboard, any gateway platform) loads every referenced skill into a single combined user message.
  Use cases:
  - /backend-dev → loads github-code-review + test-driven-development + github-pr-workflow as one bundle.
  - /research → loads several research skills together.
  - Team task profiles shared via dotfiles.
  Behavior:
  - Bundles take precedence over individual skills when slugs collide.
  - Missing skills are skipped with a note, not fatal.
  - No system-prompt mutation — bundles generate a fresh user message at invocation time, the same way /<skill> does. Prompt cache stays intact.
  - Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI), /help display.
  Schema (~/.hermes/skill-bundles/<slug>.yaml):
    name: backend-dev
    description: Backend feature work.
    skills:
      - github-code-review
      - test-driven-developme…
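For the worktree-branch change (salvage of #26496), the two moving parts are an additive, idempotent `branch_name` column migration and exposing the pinned branch to the worker as `HERMES_KANBAN_BRANCH`. The sketch below uses plain `sqlite3`. `add_optional_column` and `worker_env` are hypothetical helpers, not `_migrate_add_optional_columns` itself. Setting the variable only for `workspace_kind == "worktree"` tasks is an assumption.

```python
import sqlite3
from typing import Dict, Optional


def add_optional_column(conn: sqlite3.Connection, table: str, column: str, decl: str) -> bool:
    """Add a nullable column if it is missing; safe to run on every startup.

    Identifiers are interpolated directly, so only pass trusted, hard-coded names.
    """
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}  # row[1] = column name
    if column in existing:
        return False  # legacy DB already migrated
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    return True


def worker_env(base: Dict[str, str], workspace_kind: str, branch_name: Optional[str]) -> Dict[str, str]:
    """Copy the spawn environment, adding the pinned branch for worktree tasks."""
    env = dict(base)
    if workspace_kind == "worktree" and branch_name:
        env["HERMES_KANBAN_BRANCH"] = branch_name
    return env
```

Because the column is nullable with no default, existing rows simply read back `NULL` for `branch_name`, which is why the migration can run against legacy DBs without a data backfill.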

* fix(kanban): ignore stale HERMES_KANBAN_BOARD for removed boards * fix(kanban): keep board-management commands independent from board override * fix(kanban): preserve notifier_profile for dashboard home subscriptions * fix(kanban): promote dependents when a parent is archived * fix(cli): make kanban specify max_tokens configurable * fix(kanban): sync slash subcommands with live parser * fix(kanban): promote blocked tasks when parent dependencies complete recompute_ready only scanned 'todo' tasks for promotion, ignoring 'blocked' tasks entirely. When a task was blocked (e.g. by the circuit breaker) and its parent dependencies later completed, the task stayed stuck in 'blocked' forever unless manually unblocked. Now recompute_ready also scans 'blocked' tasks. When all parents are done/archived, the blocked task is promoted to 'ready' with failure counters reset — equivalent to an automatic unblock. Includes a regression test for the blocked-parent-done promotion path. * fix(kanban): use 'is not None' check for max_runtime_seconds in create_task max_runtime_seconds=0 was being silently coerced to None due to a falsy check (if max_runtime_seconds). Zero is a valid value that causes the dispatcher to immediately time out a task. The adjacent max_retries parameter already used the correct 'is not None' pattern. Fixes the inconsistency by aligning max_runtime_seconds with max_retries. * fix(kanban): reset failure counters on unblock_task When a task is manually unblocked (blocked → ready/todo), the consecutive_failures counter and last_failure_error were left intact. The next failure would immediately re-trip the circuit breaker because the counter was still at or above the failure limit. Reset both fields on unblock so the task gets a fresh retry budget. Includes a regression test that verifies counters are zeroed. 
* fix(kanban): fingerprint crash errors to prevent fleet-wide retry exhaustion
  When a systemic failure (provider outage, auth expiry, OOM) crashes multiple workers simultaneously, detect_crashed_workers increments each task's failure counter independently. The circuit breaker only trips after N × failure_limit retries across the fleet.
  Fingerprint crash errors by normalizing host-specific details (PIDs, timestamps). When 3+ tasks crash with the same fingerprint in a single detection cycle, immediately trip the circuit breaker (failure_limit=1) instead of waiting for repeated failures. Isolated crashes (unique fingerprints) retain their normal retry budget. Protocol violations continue to trip immediately.
  Includes regression tests for systemic and isolated crash paths.

* fix(kanban): align board_exists with board discovery rules

* fix(kanban): demote ready children when a parent is reopened

* fix(kanban): serialize DB initialization

* fix(kanban): task_age() tolerates ISO-8601 timestamps
  Prevents a ValueError crash in dashboard get_board() when a task has an ISO timestamp (e.g. "2026-05-10T15:00:00Z") instead of a unix epoch int. Adds a _to_epoch() helper that normalises both formats.

* Fix Kanban dashboard initial board selection

* fix(kanban): persist worker session metadata on completion
  Salvages #25579 by @wesleysimplicio. Stamps task_runs.metadata.worker_session_id from HERMES_SESSION_ID on kanban_complete.
  Cherry-picked the substantive commit (not the AUTHOR_MAP fixup tip) onto current main.
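The crash-fingerprinting idea above can be sketched as: normalize each error, count identical fingerprints within one detection cycle, and drop the effective failure limit to 1 for any fingerprint shared by 3+ tasks. The regexes and function names here are hypothetical; the commit does not show the detector's actual normalization rules.

```python
import re
from collections import Counter
from typing import Dict

# Illustrative normalization rules; the real detector's patterns are not shown.
_TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?")
_PID_RE = re.compile(r"\bpid[=: ]*\d+", re.IGNORECASE)

SYSTEMIC_THRESHOLD = 3  # "3+ tasks crash with the same fingerprint"


def fingerprint(error: str) -> str:
    """Strip host-specific details so identical root causes compare equal."""
    text = _TIMESTAMP_RE.sub("<ts>", error)
    text = _PID_RE.sub("pid <pid>", text)
    return " ".join(text.split())  # collapse whitespace


def failure_limits_for_cycle(crashes: Dict[str, str], failure_limit: int) -> Dict[str, int]:
    """Map task id -> failure limit to apply in this detection cycle."""
    counts = Counter(fingerprint(err) for err in crashes.values())
    return {
        # Systemic crash: trip immediately. Isolated crash: normal budget.
        task_id: 1 if counts[fingerprint(err)] >= SYSTEMIC_THRESHOLD else failure_limit
        for task_id, err in crashes.items()
    }
```

Three workers failing with the same 401 differ only in PID and timestamp, so they collapse to one fingerprint and trip at once, while a lone segfault keeps the normal budget.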
* fix(kanban): make claim ttl configurable
  Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(kanban): pass accept-hooks to worker chat subprocess

* feat(kanban): add board-level default workdir (#25430)

* docs(kanban-worker): document notification routing configuration

* fix(kanban): preserve worker tools with restricted toolsets

* fix(kanban): make legacy task migration idempotent
  (cherry picked from commit 293f1c3a7241b0117669e049d9aa746c9645ac90)

* fix: harden Kanban worker Hermes command resolution

* feat(kanban): allow trimmed task comments
  SS-1647 live SHIP validation: real code + tests for kanban comment --max-len.

* fix: show scheduled kanban tasks in dashboard

* fix: assign single-task kanban decompositions

* fix(kanban-dashboard): make Orchestration mode checkbox label static
  The checkbox label echoed its state ("Auto (default)" / "Manual") instead of describing the action, so a checked box reading "Auto" parsed as a status indicator rather than a control. The accompanying sub-description was also static and started with "When on, ...", which read awkwardly when the box was unchecked.
  Replace the dynamic label with a static action label ("Auto-decompose triage tasks") and flip the sub-description between the two modes so it stays accurate either way. The top-of-page Orchestration pill is unchanged; that one is intentionally a status badge / toggle.
  Fixes #28178
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(env): add HERMES_KANBAN_DISPATCH_IN_GATEWAY override (#21956)
  Salvages the env-vars docs portion of #21956 by @Bartok9. The ascii-guard-ignore tags from the original PR already landed on main.

* fix(kanban): close sqlite connection on init failure to prevent fd leak
  Salvages #28301 by @Ade5954. If WAL setup, PRAGMA application, or schema init raised after sqlite3.connect() succeeded, the new connection leaked. Wrap the body in try/except so the connection is closed before the exception propagates.
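The fd-leak fix above follows a standard pattern: once sqlite3.connect() has succeeded, any failure in later setup must close the connection before the exception propagates. A minimal sketch with the standard-library sqlite3 module; the function name connect_board, the init_schema callback, and the foreign_keys PRAGMA are hypothetical stand-ins for whatever the real initializer runs.

```python
import sqlite3
from typing import Callable


def connect_board(path: str, init_schema: Callable[[sqlite3.Connection], None]) -> sqlite3.Connection:
    """Open a board DB without leaking the connection if setup fails."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("PRAGMA journal_mode=WAL")  # WAL setup
        conn.execute("PRAGMA foreign_keys=ON")   # other PRAGMAs (illustrative)
        init_schema(conn)                        # schema init
    except BaseException:
        # Close before propagating so the file descriptor is released.
        conn.close()
        raise
    return conn
```

Catching BaseException and re-raising means even a KeyboardInterrupt during setup releases the handle, while the caller still sees the original exception.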
* fix(kanban): don't crash dispatched workers when kanban-worker skill is absent
  Salvages #27372 by @oemtalks. The dispatcher unconditionally injected `--skills kanban-worker` into every worker spawn, but worker profiles sometimes don't have that bundled skill in their skills dir, which is fatal at CLI startup (`ValueError: Unknown skill(s): kanban-worker`).
  Adds `_kanban_worker_skill_available(hermes_home)` and only injects the flag when the skill resolves. The MANDATORY lifecycle still ships via KANBAN_GUIDANCE in the system prompt, so omitting the flag is safe.

* fix(packaging): ship dashboard plugin assets in wheel
  Salvages #23737 by @LeonSGP43. Adds plugins/* manifest.json and dist/ glob entries to setuptools package-data so wheel installs ship the bundled dashboard plugin assets (kanban, achievements, etc.). Without these, /api/dashboard/plugins can't discover plugin assets outside a source checkout.

* docs(kanban): document worker protocol auto-blocks
  Salvages #21585 by @helix4u. Documents the protocol_violation event (worker exits successfully while task is still running), adds --max-retries to the create flag list and --failure-limit to dispatch.

* fix(oneshot): pass fallback_providers from profile config to AIAgent
  Salvages #23368 by @uzunkuyruk. Oneshot workers (e.g. kanban workers spawned via 'hermes -p <profile> chat -q ...') were not honouring the profile's fallback_providers / fallback_model chain because oneshot.py never read the config and never passed fallback_model= to AIAgent.
  Reads cfg.get('fallback_providers') (new list format) or cfg.get('fallback_model') (legacy single-dict) with the same normalization cli.py applies, then forwards as fallback_model=_fb.

* fix(kanban): reject direct running transitions in dashboard bulk updates
  Salvages #24050 by @kronexoi. The single-task PATCH already rejects direct status='running' since it bypasses the dispatcher/claim invariant, but the bulk-update endpoint still accepted it. Aligns bulk with single by emitting an error result row for any 'running' entry.

* feat(kanban): add initial-status for human-ops cards
  Salvages #27526 by @shunsuke-hikiyama. Adds an --initial-status flag (running|blocked, default running) to 'kanban create', threaded through kanban_db.create_task() and the kanban_create tool schema. 'blocked' parks the task directly in the blocked column for R3 human-ops review, skipping the brief running-to-blocked transition.
  Dropped the unrelated 'add' alias, WIFEXITED Windows compat, and slash-handler error formatting changes that were bundled in the original PR; those should ship as their own focused changes if still wanted.

* fix(kanban): release scratch workspace and tmux session on task completion
  Salvages #27369 by @LeonJS. complete_task() now calls _cleanup_workspace() and _cleanup_worker_tmux() after marking a task complete. Without this, scratch workspaces (used by swarm agents) accumulate on disk (hundreds of MB per task, never released), and stale tmux sessions from completed agents persist indefinitely.
  Both gates are safe:
  - workspace_kind == 'scratch' gate preserves user worktree/dir workspaces
  - tmux #{pane_dead} == 1 gate only kills sessions where the worker has already exited
  - best-effort: cleanup failures never block task completion

* fix(kanban): honor severity thresholds in diagnostics
  Salvages #26431 by @LeonSGP43. Dashboard plugin_api list_diagnostics was using exact-match (severity == filter), so '--severity warning' hid 'error' and 'critical' diagnostics. Adds a severity_at_or_above() helper to kanban_diagnostics and uses it in the dashboard endpoint (the CLI already used SEVERITY_ORDER comparison correctly).

* test: isolate Kanban env pins in hermetic fixture
  Salvages the substantive part of #22295 by @steezkelly. Adds the missing HERMES_KANBAN_HOME, HERMES_KANBAN_RUN_ID, HERMES_KANBAN_CLAIM_LOCK, HERMES_KANBAN_DISPATCH_IN_GATEWAY entries to _HERMES_BEHAVIORAL_VARS so ambient developer-shell pins on those vars don't bleed into pytest runs.
  The frozenset extraction + standalone regression test from the original PR were dropped to keep the change minimal; main already maintains the list inline.

* feat(kanban): add max_in_progress config to cap concurrent running tasks
  Salvages #22981 by @SimbaKingjoe. Adds a 'kanban.max_in_progress' config that caps simultaneously running tasks. When the board already has N running, the dispatcher skips spawning so slow workers (local LLMs, resource-constrained hosts) don't pile up and time out.
  Threads through dispatch_once(max_in_progress=) and gateway dispatcher config parsing with validation (warns on invalid/below-1 values).

* fix(packaging): ship bundled skills in wheel
  Salvages #23738 by @LeonSGP43. Wheel installs were missing skills/ and optional-skills/ because pyproject's [tool.setuptools.packages.find] only includes Python packages; the skills directories don't have __init__.py, so they were silently dropped from the wheel.
  Adds setup.py with a data_files spec emitting skills/* and optional-skills/* under hermes_agent-<v>.data/data/, and a get_bundled_skills_dir() helper in hermes_constants that discovers the wheel-installed location via sysconfig before falling back to a source-checkout path. tools/skills_sync uses the helper so 'hermes update' works for pip-installed users.

* fix: 4 small surgical bugs
  Salvages #23302 by @Bartok9. Four independent one-area fixes:
  1. The kanban boards delete alias now hard-deletes (not archives). The alias didn't carry --delete, so getattr(args, 'delete', False) returned False; detect boards_action=='delete' explicitly.
  2. Gateway auto-title failures no longer leak as user-visible warnings; they are debug-logged only, since they're not actionable.
  3. Background process completion notification snaps truncation to the next newline boundary and prepends a marker when content is dropped.
  4. _cprint() schedules the run_in_terminal coroutine via asyncio.ensure_future so output isn't silently dropped from background threads (fixes #23185 Bug A). Skips the double-print fallback that would fire for mock paths.

* perf(prompt): cache kanban worker guidance at session init
  Salvages #24402 by @RyanRana. The KANBAN_GUIDANCE block (~835 tokens) is session-static: the dispatcher decides at spawn time whether the process is a kanban worker via the kanban_show tool's check_fn (gated on the HERMES_KANBAN_TASK env var). Re-checking 'kanban_show' in valid_tool_names and re-loading the reference on every system-prompt rebuild (init + each context compression) is wasted work.
  Caches the resolved string on agent._kanban_worker_guidance once in agent_init and consumes it in system_prompt.build_system_prompt(), with a getattr fallback for code paths that bypass agent_init.

* feat(kanban): add --sort option to 'hermes kanban list'
  Salvages #25745 by @LizerAIDev. Adds --sort {created,created-desc,priority,priority-desc,status,assignee,title,updated} to 'hermes kanban list'. Validated against the VALID_SORT_ORDERS map; invalid values raise ValueError. Default behaviour (priority DESC, created ASC) is unchanged when --sort is omitted.

* docs: add kanban codex lane skill

* feat(kanban): worker visibility endpoints (workers/active, runs/{id}, inspect)
  Adds three read-only endpoints to the kanban dashboard plugin so the SwitchUI workspace (and any other dashboard consumer) can track workers across tasks without N+1 round-trips through /tasks/{task_id}.
  - GET /workers/active: single SQL JOIN of task_runs + tasks where ended_at IS NULL, worker_pid IS NOT NULL, status='running'. Returns {workers: [...], count, checked_at}.
  - GET /runs/{run_id}: direct lookup of any task_run row by id. Reuses the existing kanban_db.get_run() helper and _run_dict() serialiser. 404 when not found; mirrors the GET /tasks/{task_id} 404 shape.
  - GET /runs/{run_id}/inspect: live PID stats via psutil.Process.as_dict() (cpu_percent, memory_rss_bytes, memory_vms_bytes, num_threads, num_fds, status, create_time, cmdline). Short-circuits with alive:false when the run has ended, has no worker_pid, the pid is gone, or psutil is unavailable. AccessDenied surfaces as alive:true with error rather than a 500.
  11 new tests in tests/plugins/test_kanban_worker_runs.py cover the empty-board case, running-task case, ended-run filtering, missing-pid filtering, 404 paths, already-ended inspect, no-pid inspect, dead-pid inspect, and live-pid inspect (psutil mocked). All pass.
  The companion termination endpoint (POST /runs/{run_id}/terminate) is intentionally out of scope here; a separate issue is being opened first, since the RBAC and dispatcher-mediated soft-cancel design needs maintainer input before code.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): map contributor email for attribution check

* test(kanban-dashboard): pin enriched 409 detail and inline error wiring (#26744)
  - Existing ``test_patch_drag_drop_move_todo_to_ready`` now asserts the enriched 409 detail names the blocking parent (id, quoted title, and current status), so the dashboard always has something actionable to render.
  - New bundle-assertion test ``test_dashboard_surfaces_ready_blocked_error_inline`` pins the frontend wiring: the ``parseApiErrorMessage`` helper exists, the drag/drop banner runs through it, and the drawer maintains a visible ``patchErr`` state that's cleared between PATCHes and tasks.
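The severity fix above swaps an exact-match filter for an at-or-above comparison. A minimal sketch of that comparison follows; the ordering dict is an assumption ("info" is a hypothetical lowest level not named in the commit), and these functions are illustrative rather than the actual kanban_diagnostics code.

```python
from typing import Dict, List

# Assumed ordering; "info" is hypothetical, the other three are named in the commit.
SEVERITY_ORDER = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def severity_at_or_above(severity: str, threshold: str) -> bool:
    """True when `severity` is at least as severe as `threshold`."""
    # Unknown severities sort below everything rather than raising.
    return SEVERITY_ORDER.get(severity, -1) >= SEVERITY_ORDER[threshold]


def filter_diagnostics(diags: List[Dict[str, str]], threshold: str) -> List[Dict[str, str]]:
    """Keep diagnostics at or above the requested severity."""
    return [d for d in diags if severity_at_or_above(d["severity"], threshold)]
```

With the old `severity == filter` check, a 'warning' filter over info/warning/error/critical diagnostics returned only the warning; the at-or-above version returns warning, error, and critical.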
* docs(codex_app_server): document multi-root Kanban writable_roots (#27941)
  Update the Codex app-server runtime guide's Kanban section to reflect the new behaviour:
  * The sandbox override now adds the board DB directory plus every Kanban path the dispatcher pinned (HERMES_KANBAN_WORKSPACES_ROOT, HERMES_KANBAN_WORKSPACE, legacy HERMES_KANBAN_ROOT) -- deduplicated, DB-dir first.
  * The motivation note now includes the cross-mount artifact-write scenario (e.g. ``/media/.../kanban-workspaces/...`` on a separate drive) and links to issue #27941 so readers can find the original bug report.

* fix(gateway): quiet corrupt kanban dispatcher boards
  Salvages the substantive part of #26490 by @aqilaziz. Detects corrupt board DBs ("file is not a database" / "database disk image is malformed") and disables them by fingerprint until they're repaired, instead of flooding the gateway log with repeated logger.exception tracebacks every tick.
  Cherry-picked the substantive commit (ea5b4ec2a); the tip commit was an unrelated _is_dir OSError fix for service-path lookup. Dropped a small test reformat that was bundled in the same commit.

* docs: align kanban readiness docs and smoke tests
  Salvages #28199 by @bensargotest-sys. Aligns Kanban docs with current tool registration: dispatcher-spawned task workers get task tools, and profiles that explicitly enable the kanban toolset get orchestrator routing tools (kanban_list, kanban_unblock). Corrects failure-limit text to the current default of 2. Hardens the e2e subprocess script to resolve the repo root and use the spawnable default assignee. Updates the diagnostics severity fixture to assert error below the critical threshold.

* feat(kanban): surface per-task model_override in show + tool output
  Salvages #26897 by @loicnico96. The per-task model_override DB column already exists on main, but it wasn't exposed in user-facing surfaces. This adds:
  - 'kanban show' prints 'model: <name>' when model_override is set
  - kanban_show / kanban_list tool responses include the model_override field
  The original branch was stale (the PR was authored against an older field name, 'model'); applied the substantive surface exposure manually using the current 'model_override' field name.

* feat(cli): add kanban swarm topology helper
  Salvages #26791 by @Niraven. Adds 'hermes kanban swarm' to create a durable Kanban Swarm v1 graph: a completed root/blackboard card, parallel worker cards, a verifier gated on all workers, and a synthesizer gated on the verifier. Stores shared swarm blackboard updates as structured JSON comments on the root card.
  Self-contained: new hermes_cli/kanban_swarm.py module + CLI wiring + unit tests.

* feat(kanban): add optional board parameter to all MCP tools
  Salvages #27598 by @nnnet. Adds an optional 'board' parameter to all 9 kanban_* MCP tools via a shared _connect helper. Backwards compatible: omitting board keeps the current pinned-board behavior. Useful for orchestrator profiles that route across multiple boards.
  Two-file scope: tools/kanban_tools.py + tests.

* feat(kanban): stamp originating ACP session_id on tasks
  Salvages #23208 by @awizemann. Tracks which chat session created a kanban task so clients can render a per-session board without falling back to tenant + time-window heuristics.
  - Schema: tasks gains a nullable session_id TEXT column with index (additive migration in _migrate_add_optional_columns).
  - ACP: server.py exposes the originating session id via HERMES_SESSION_ID with save/restore around the agent loop.
  - Tool: kanban_create reads HERMES_SESSION_ID (with explicit override).
  - CLI: 'hermes kanban list --session <id>' filter; JSON output exposes session_id.

* feat(kanban): wire dispatcher to dispatch review agents from review column
  Salvages #23772 by @thewillhuang. Adds 'review' as a valid kanban task status and extends dispatch_once to monitor the review column as a second dispatch source (in addition to the existing ready column).
  - Adds 'review' to VALID_STATUSES
  - Adds claim_review_task(), which atomically transitions review → running
  - Adds has_spawnable_review(), a health telemetry mirror
  - Extends dispatch_once with a review column dispatch loop
  - Review agents get the 'sdlc-review' skill auto-loaded
  Resolved 2 conflicts (VALID_STATUSES merge with main's 'scheduled' state, test file additions). Adapted claim_review_task to main's ttl_seconds: Optional[int] = None convention (matches claim_task).

* feat(kanban): stale detection for running tasks in dispatcher
  Salvages #23790 by @thewillhuang. Adds detect_stale_running() to the dispatcher cycle. Running tasks that started more than dispatch_stale_timeout_seconds ago (default 14400 = 4h) and have had no heartbeat in the last hour are auto-reclaimed to ready.
  - New config kanban.dispatch_stale_timeout_seconds (default 14400, 0 disables)
  - New 'stale' field on DispatchResult
  - detect_stale_running() in kanban_db.py with heartbeat freshness check
  - Records outcome='stale' on run close + 'stale' event; ticks failure counter
  - Wires config through gateway embedded dispatcher
  - Updates _cmd_dispatch verbose/JSON output and daemon logging
  Resolved test-file end-of-file conflict by appending both halves.

* feat(kanban): filter tasks by workflow fields and runs by status/outcome
  Salvages #26745 by @nehaaprasaad. Exposes filtering for the existing workflow_template_id and current_step_key columns:
  - list_tasks() accepts workflow_template_id and current_step_key kwargs
  - 'hermes kanban list' adds matching CLI flags
  - dashboard plugin_api also exposes the filters
  Resolved a small conflict in the list_tasks signature alongside main's session_id and order_by additions; combined all three into the single filter list.
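The stale-detection rule above reduces to a small predicate over three timestamps. A sketch under stated assumptions: times are unix seconds, the function name is_stale is hypothetical, and negative timeouts are treated like 0 (disabled), which the commit does not specify.

```python
from typing import Optional

DEFAULT_STALE_TIMEOUT_SECONDS = 14_400  # 4h default from the commit
HEARTBEAT_WINDOW_SECONDS = 3_600        # "a heartbeat in the last hour"


def is_stale(
    started_at: float,
    last_heartbeat_at: Optional[float],
    now: float,
    timeout_seconds: int = DEFAULT_STALE_TIMEOUT_SECONDS,
) -> bool:
    """Decide whether a running task should be reclaimed to 'ready'."""
    if timeout_seconds <= 0:
        return False  # 0 disables stale detection (negative treated the same here)
    if now - started_at <= timeout_seconds:
        return False  # still within its runtime budget
    # Old enough: stale unless a heartbeat arrived within the last hour.
    return last_heartbeat_at is None or now - last_heartbeat_at > HEARTBEAT_WINDOW_SECONDS
```

Requiring both conditions keeps long but healthy tasks (recent heartbeat) running, and only reclaims ones that are both old and silent.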
* feat(kanban): add respawn guard to block repeat worker storms
  Salvages #27484 by @fardoche6. Adds a respawn guard that skips worker spawn for tasks where:
  - a recent run already succeeded (recent_success, within the guard window)
  - the previous run hit a quota/auth error (blocker_auth; also auto-blocks)
  - a recent task comment includes a GitHub PR URL (active_pr)
  The guard prevents repeat worker storms on the same bug/task. Includes the contributor's review-findings fixup (regex hardening, observability, auth coverage). Resolved a small DispatchResult conflict alongside main's 'stale' field; kept both. Authorship preserved via rebase merge.

* feat(kanban): show dashboard cron jobs across profiles
  Salvages #27568 by @SerenityTn. The dashboard cron page now lists cron jobs from all profiles, with profile-aware filter UI and storage routing. Includes test coverage for cross-profile listing, mutation, deletion, and validation.
  Also fixes orphan conflict markers in config.py left by an earlier salvage merge (kanban.dispatch_stale_timeout_seconds was double-nested in HEAD/PR markers from the #28452 salvage of #23790).

* fix(kanban): remove orphan conflict markers from config.py (#28458)
  PR #28452 (salvage of #23790, stale detection) merged with leftover git conflict markers in hermes_cli/config.py around the `dispatch_stale_timeout_seconds` config block, breaking config import and any code path that loads it.
  Cleans up the markers and keeps both config blocks (worker log rotation/orchestrator + stale detection). Resolves a self-introduced regression.
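The respawn guard's three conditions can be sketched as a single decision function. Everything concrete here is hypothetical: the commit mentions regex hardening but does not show the patterns, the auth/quota markers are guesses, and the order of checks is an assumption.

```python
import re
from typing import Iterable, Optional

# Hypothetical patterns; the guard's real regexes are not shown in the commit.
_PR_URL_RE = re.compile(r"https://github\.com/[\w.-]+/[\w.-]+/pull/\d+")
_AUTH_MARKERS = ("401", "403", "unauthorized", "quota")


def respawn_block_reason(
    recent_success: bool,
    last_error: Optional[str],
    recent_comments: Iterable[str],
) -> Optional[str]:
    """Return the guard reason that suppresses a spawn, or None to allow it."""
    if recent_success:
        return "recent_success"
    if last_error and any(m in last_error.lower() for m in _AUTH_MARKERS):
        return "blocker_auth"  # the commit says this case also auto-blocks the task
    if any(_PR_URL_RE.search(c) for c in recent_comments):
        return "active_pr"
    return None
```

Returning a named reason rather than a bare boolean makes the skip observable in dispatcher output, which fits the observability work mentioned in the fixup.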
* fix(kanban): remove orphan conflict markers from kanban.py (#28459)
  PR #28454 (salvage of #26745, workflow filter) merged with leftover git conflict markers in hermes_cli/kanban.py at three sites:
  - _task_to_dict() (session_id alongside workflow_template_id/current_step_key)
  - p_list parser (--sort alongside --workflow-template-id/--step-key)
  - _cmd_list (order_by alongside the new filter kwargs)
  Cleans up the markers and keeps both halves at each site. Resolves a self-introduced regression.

* feat(kanban): configure worktree paths and branches
  Salvages #26496 by @aqilaziz. Adds a branch_name column + CLI flag so tasks with workspace_kind='worktree' can pin a target branch on create. Schema migration added to _migrate_add_optional_columns.
  - Task.branch_name field + DB column + migration
  - create_task accepts branch_name kwarg
  - hermes kanban create --branch <name> flag
  - kanban show output includes 'Branch: <name>' when set
  Cherry-picked the substantive commit (a7558cf27); the PR's tip was an unrelated service-path-dirs commit. Resolved 2 INSERT-column-list and show-output conflicts alongside main's session_id and max_runtime_seconds additions; kept all three.

* feat(skills): add skill bundles - alias /<name> loads multiple skills (#28373)
  Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that group several skills under one slash command. Invoking /<bundle-name> from any surface (CLI, TUI, dashboard, any gateway platform) loads every referenced skill into a single combined user message.
  Use cases:
  - /backend-dev → loads github-code-review + test-driven-development + github-pr-workflow as one bundle.
  - /research → loads several research skills together.
  - Team task profiles shared via dotfiles.
  Behavior:
  - Bundles take precedence over individual skills when slugs collide.
  - Missing skills are skipped with a note, not fatal.
  - No system-prompt mutation: bundles generate a fresh user message at invocation time, the same way /<skill> does. Prompt cache stays intact.
  - Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI), /help display.
  Schema (~/.hermes/skill-bundles/<slug>.yaml):
      name: backend-dev
      description: Backend feature work.
      skills:
        - github-code-review
        - test-driven-development
      instruction: |
        Optional extra guidance prepended to the loaded skills.
  New module: agent/skill_bundles.py (load, scan, resolve, build invocation message, save, delete). yaml.safe_load only; broken bundles log a warning and are skipped, never raise.
  New CLI subcommand: hermes bundles {list,show,create,delete,reload}. Implementation in hermes_cli/bundles.py; wired in hermes_cli/main.py. 'bundles' added to _BUILTIN_SUBCOMMANDS so plugin discovery skips it.
  New in-session slash command: /bundles lists installed bundles in both CLI and gateway. /<bundle-name> dispatch added to CLI (cli.py) and gateway (gateway/run.py) before the existing /<skill-name> path.
  Autocomplete: SlashCommandCompleter gained an optional skill_bundles_provider parameter that defaults to None; the prompt shows '▣ <description> (N skills)' for bundles vs '⚡' for skills.
  Tests:
  - tests/agent/test_skill_bundles.py: 33 tests covering slugify, scan/cache freshness, resolve (including underscore→hyphen Telegram alias), build_bundle_invocation_message (loading, missing skills, user/bundle instruction injection, dedup), save/delete, reload diff, list sort.
  - tests/hermes_cli/test_bundles.py: 8 tests for the CLI subcommand (create/list/show/delete/reload, --force, missing bundle errors).
  - tests/gateway/test_bundles_command.py: 4 tests for the gateway handler and bundle resolution priority.
  Live E2E: verified subprocess invocations of hermes bundles {list,create,show,reload,delete} round-trip correctly against an isolated HERMES_HOME.
  Docs:
  - website/docs/user-guide/features/skills.md: new 'Skill Bundles' section with quick example, YAML schema, management commands, behavior notes.
  - website/docs/reference/cli-commands.md: 'hermes bundles' added to the top-level command table and given its own subcommand section.

* feat(kanban): add scheduled status for delayed follow-ups
  Salvages #24533 by @roycepersonalassistant. Adds a first-class 'scheduled' Kanban status for time-delay follow-ups that aren't waiting on human input.
  - hermes kanban schedule <task_id> [reason] CLI command
  - Dashboard/API transitions to/from Scheduled
  - unblock_task() now releases both 'blocked' AND 'scheduled' tasks (re-checking parent dependencies before moving to ready/todo)
  - i18n + docs updates
  Resolved conflicts: kept HEAD's failure-counter reset on unblock alongside the PR's scheduled state, kept HEAD's 'running' direct-set rejection, combined both bulk-status branches. Dropped the dist/ bundle changes (months-stale; would need a rebuild from source).

* feat(kanban): drag-to-delete trash zone + bulk delete for task cards
  Salvages #28125 by @Jpalmer95. Adds:
  - Drag-to-delete trash zone in the kanban dashboard
  - Bulk delete endpoint with cascading delete_task cleanup
  - Frontend updates (drag visual + drop handler)
  - Confirmation prompt before delete
  Resolved end-of-file test conflict by appending both halves.

* docs: add Korean Kanban documentation
  Salvages #21823 by @pochi-gio. Adds a Korean (ko) Docusaurus locale and translates the Kanban documentation (kanban.md, kanban-tutorial.md) and the two related skills (devops-kanban-orchestrator, devops-kanban-worker).
  Purely additive: adds ko to the locales list in docusaurus.config.ts and creates the website/i18n/ko/ tree.
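The branch_name change above relies on an additive, nullable-column migration, so legacy boards upgrade in place and old rows read back NULL. A minimal sqlite3 sketch of that idempotent check-then-ALTER pattern; the function and the column dict are hypothetical stand-ins for _migrate_add_optional_columns, which covers more columns than shown.

```python
import sqlite3

# Hypothetical subset of optional columns; the real migration covers more.
OPTIONAL_TASK_COLUMNS = {
    "session_id": "TEXT",
    "branch_name": "TEXT",  # nullable: legacy rows simply read back as NULL
}


def migrate_add_optional_columns(conn: sqlite3.Connection) -> None:
    """Add any missing nullable columns; safe to run on every startup."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute("PRAGMA table_info(tasks)")}
    for name, decl in OPTIONAL_TASK_COLUMNS.items():
        if name not in existing:
            # Names come from the fixed dict above, never from user input.
            conn.execute(f"ALTER TABLE tasks ADD COLUMN {name} {decl}")
    conn.commit()
```

Checking PRAGMA table_info first is what makes the migration idempotent: SQLite rejects ADD COLUMN for a column that already exists.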
* fix(tests): catch up six stale tests after compression/aux/kanban changes (#28465) - aux_config: drop session_search from _AUX_TASKS and remove stale test (PR #27590 removed auxiliary.session_search from DEFAULT_CONFIG) - compression_boundary_hook: set compressor._last_compress_aborted=False on MagicMock so the post-compress abort branch (PR #28117) doesn't short-circuit before the session-id rotation under test - kanban_dashboard_plugin: use consecutive_failures=3 so severity stays 'error' (failure_threshold default dropped from 3 to 2 in d9fef0c8a, so failures=5 now crosses the critical floor of 2*2=4) - cli_manual_compress: accept force kwarg on DummyAgent._compress_context (cli._manual_compress now passes force=True) * fix(telegram): render full clarify choice text in message body, use short button labels When Telegram clarify prompts offer long choices, mobile clients truncate the inline button labels, making options unreadable. Previously only the question was shown in the message body with truncated choice text in button labels. Fix: append the full numbered option list to the message body so users can read complete choice text on any client. Buttons now use short numeric labels (1, 2, ...) to avoid Telegram truncation. The 'Other (type answer)' button is unchanged. Long choice labels are now rendered in full (not truncated to 57 chars + '...') since they appear in the body instead of button labels. Closes: #27497 * chore(release): map @asdlem for PR #27852 salvage * fix(telegram): default streaming transport to edit * fix(telegram): respect reply_to_mode for DM topic reply fallback The DM topic reply fallback code in send() hardcoded should_thread=True when telegram_dm_topic_reply_fallback metadata was present, bypassing _should_thread_reply() and ignoring reply_to_mode config. This caused quote bubbles on every response even with reply_to_mode: 'off'. 
Fix: - Add reply_to_mode param to _reply_to_message_id_for_send() and _thread_kwargs_for_send() classmethods - In send(), check self._reply_to_mode != 'off' for DM topic fallback - Suppress reply anchor and reply_to_message_id when mode is 'off' while preserving message_thread_id for correct topic routing - Thread reply_to_mode through all 29 call sites Regression coverage: 10 new tests in test_telegram_reply_mode.py covering classmethod behavior, send() integration, and backward compatibility. Fixes reply_to_mode: 'off' ignored by Telegram DM topic reply fallback code #23994 * fix(gateway): route Telegram audio file attachments away from STT pipeline (#24870) Telegram distinguishes three kinds of audio payloads: - message.voice → Opus/OGG voice messages → STT pipeline ✓ - message.audio → audio file attachments → bypasses STT ← was broken - message.document (audio mime) → generic file route **Root cause** — the inbound message routing block in gateway/run.py matched both MessageType.VOICE *and* MessageType.AUDIO into audio_paths, which were then fed unconditionally to _enrich_message_with_transcription. Audio file attachments (.mp3, .m4a, etc.) were therefore auto-transcribed instead of being treated as files, making the transcribe skill unusable from Telegram because the path it needed was never surfaced. **Fix** - Introduce a new audio_file_paths list populated exclusively by MessageType.AUDIO events. - Narrow the audio_paths selector to MessageType.VOICE (and bare audio/ mime-type events that are not explicitly AUDIO or DOCUMENT). - After the STT block, inject a document-style context note for each audio_file_path, giving the agent the file path and asking what to do with it (consistent with how plain documents are handled). 
**Tests** — 5 new tests in test_telegram_audio_vs_voice.py: - voice message still transcribed (regression guard) - audio attachment skips STT (core fix) - audio attachment context note format - STT disabled still produces file note (not STT-disabled notice) - MessageType.AUDIO != MessageType.VOICE sanity check Fixes #24870 * chore(release): map bartok9 noreply for PR #24879 salvage * fix(send_message): route standalone Telegram sends through TELEGRAM_PROXY When the send_message tool runs outside the gateway process (agent loop, TUI, cron, etc.), _gateway_runner_ref() returns None and the standalone path in _send_telegram constructs Bot(token=token) directly, bypassing any configured proxy. In regions where api.telegram.org is blocked, the send times out after ~5s with 'Telegram send failed: Timed out' and nothing ever shows up in gateway.log because the request never reaches the gateway. Resolve TELEGRAM_PROXY (via gateway.platforms.base.resolve_proxy_url, which also honours HTTPS_PROXY/HTTP_PROXY/ALL_PROXY and NO_PROXY) just before constructing the Bot. When a proxy is found, attach an HTTPXRequest(proxy=...) for both 'request' and 'get_updates_request', matching what gateway/platforms/telegram.py already does for in-gateway sends and what the Discord standalone sender already does. Any exception attaching the proxy falls back cleanly to a direct connection, preserving prior behaviour for users without a proxy configured. Adds tests/tools/test_send_message_telegram_proxy.py covering both the proxy-configured and no-proxy cases. * chore(release): map @pepelax for PR #25419 salvage * fix(kanban-dashboard): restore implementations dropped during salvages (#28481) Four kanban dashboard test failures, all from PR salvages that picked up the test additions but dropped the corresponding implementations. 
- BOARD_COLUMNS: add 'review' (status added by PR f55d94a1e but the board API never grew the column → test_board_empty failed because VALID_STATUSES - {archived} mismatched the rendered columns). - update_task: enrich the 'ready' 409 detail with the blocking parent list (id, title, status) and add _parents_blocking_ready helper. Implementation lost in the #26744 salvage (commit e215558ba) which pinned the test but not the server-side code. - dist/index.js: add parseApiErrorMessage helper, wire it through the drag/drop banner, add patchErr state to the TaskDrawer and surface it inline by the action row. Lost in the same #26744 salvage. - test_diagnostics_endpoint_severity_filter: update to at-or-above semantics (PR a94ddd807 changed the filter from exact-match so the warning filter now correctly includes error+critical too). * fix(gateway): roll over Telegram tool progress bubbles * fix(gateway): scope audio_file_paths outside media_urls guard The audio-file-paths handling block at line 7334 references the variable unconditionally, but #24879 initialized it inside the 'if event.media_urls' block — so events without media_urls hit UnboundLocalError. Found via test_run_agent_queued_message_does_not_treat_commentary_as_final after PR #28478 landed. * fix(gateway): keep tool-progress edits alive after Telegram flood control When a progress-message edit hits Telegram flood control (RetryAfter), can_edit was unconditionally set to False, permanently disabling coalescing for the rest of the run. Subsequent tool updates were posted as separate new messages instead of updating the existing progress bubble. Fix: only set can_edit=False for non-recoverable edit errors. On flood control, back off by resetting _last_edit_ts so the throttle interval is respected before the next edit attempt. 
Fixes #25188 * chore(release): map @erhnysr for PR #25198 salvage * fix(telegram): preserve can_edit after transient network errors in progress edits (#27828) When edit_message_text fails with a transient error (httpx.ConnectError, NetworkError, server disconnected, timeouts), the progress-message sender must not permanently set can_edit = False — that would convert a single Telegram network hiccup into separate per-tool bubbles for the rest of the run. Changes: - gateway/platforms/telegram.py: edit_message now returns retryable=True for transient network errors (ConnectError, NetworkError, timeouts, server disconnects, temporarily unavailable). Permanent failures (flood control, message-not-found, permissions) remain retryable=False. - gateway/run.py: send_progress_messages checks result.retryable before setting can_edit = False. Transient failures skip the fallback-send and continue — the next edit cycle catches up with the accumulated lines. Permanent failures (flood, message-not-found, etc.) still disable editing. Tests: 22 new tests in test_telegram_progress_edit_transient.py covering transient vs permanent error classification, SendResult.retryable semantics, and the can_edit decision logic. Fixes #27828 * fix(telegram): recover from post-update polling conflict without entering limbo * fix(test+release): update conflict retry count for MAX=5; map @CryptoByz * fix(gateway): route background-process notifications into Telegram DM topics Background-process completion notifications (notify_on_complete) and watch-pattern notifications were always delivered to the Telegram main chat instead of the originating private-chat topic. Hermes-created Telegram DM topic lanes only render a send when it carries both message_thread_id and a reply anchor. 
The synthetic MessageEvent injected on process completion had no message_id, so _reply_anchor_for_event returned None and _thread_kwargs_for_send dropped message_thread_id entirely — routing the notification to the main chat. Capture the triggering message id at spawn time and thread it through to the synthetic event so it can be reply-anchored back into the topic: - session_context: add HERMES_SESSION_MESSAGE_ID context var - telegram adapter: populate SessionSource.message_id on inbound messages - terminal tool: persist watcher_message_id on the process session - process registry: carry/persist message_id on watcher dicts + checkpoint - gateway: set MessageEvent.message_id on injected notifications Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(release): map @fabiosiqueira for PR #27212 salvage * fix(telegram): route resumed DM topic sends directly * fix(telegram): enforce TELEGRAM_ALLOWED_USERS allowlist on inbound messages TELEGRAM_ALLOWED_USERS was only checked for callback/inline-button actions but not for inbound messages. Unauthorized users triggered an 'Unauthorized user' log warning but their messages were still processed by the agent — a P0 security bypass (issue #23778). Fix: add allowlist check in _should_process_message() which is called for all message types (text, command, media, location). If the sender is not in TELEGRAM_ALLOWED_USERS, the message is dropped immediately with a warning log. Empty TELEGRAM_ALLOWED_USERS continues to allow all users (existing behavior). Fixes #23778 * fix(telegram): fail-closed auth fallback when TELEGRAM_ALLOWED_USERS is empty The _is_callback_user_authorized fallback returned True when TELEGRAM_ALLOWED_USERS was not set, allowing any Telegram user to interact with the bot. Change to fail-closed: deny by default unless GATEWAY_ALLOW_ALL_USERS=true is explicitly set. 
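The fail-closed fallback described above can be pictured as the following check. This is a minimal sketch with two assumptions: `TELEGRAM_ALLOWED_USERS` holds comma-separated numeric user IDs, and only the literal `true` (case-insensitive) enables `GATEWAY_ALLOW_ALL_USERS`. `is_user_authorized` is a hypothetical name, not the adapter's `_is_callback_user_authorized`.

```python
import os
from typing import Mapping, Optional


def is_user_authorized(user_id: int, env: Optional[Mapping[str, str]] = None) -> bool:
    """Fail-closed authorization check for a Telegram sender."""
    env = os.environ if env is None else env
    raw = env.get("TELEGRAM_ALLOWED_USERS", "").strip()
    if raw:
        # Allowlist configured: only listed IDs pass (comma-separated is an assumption).
        allowed = {part.strip() for part in raw.split(",") if part.strip()}
        return str(user_id) in allowed
    # No allowlist: deny unless the operator explicitly opted in to open access.
    return env.get("GATEWAY_ALLOW_ALL_USERS", "").strip().lower() == "true"
```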
Fixes #24457 * test(telegram): stub _is_callback_user_authorized in trigger-gating fixture After PR #24468 made the empty-allowlist callback auth fail-closed (and #23795 wired _is_callback_user_authorized into _should_process_message), trigger-gating tests started failing because their fake messages from user 111 hit the new deny-by-default path before trigger evaluation. Force-authorize all senders in _make_adapter() so the trigger logic under test runs. The fail-closed behavior itself is covered by test_telegram_callback_auth_fail_closed.py. * fix(telegram): reset sticky fallback IP on connect failure, retry primary DNS When a sticky fallback IP (from DoH discovery) becomes unreachable, the transport previously got stuck in an attempt_order that only tried the dead IP. This prevented the gateway from recovering until the service was restarted. Changes: - Always include primary DNS path (None) after the sticky IP in the attempt_order so that a primary-path retry happens on sticky failure. - Reset self._sticky_ip to None when the currently sticky IP hits a connect timeout / connect error, allowing the next request to retry from scratch. Fixes silent Telegram disconnection when discovered fallback IPs are transiently or permanently unreachable. * test+release: align stale sticky-IP test for #24511; map @falconexe * fix(telegram): propagate extra base_url config * feat(send_message): auto-detect @username mentions and create Telegram entities When sending messages containing @username patterns, auto-generate MessageEntity(type='mention') entries so that the receiving bot's require_mention filter can trigger. This enables proper bot-to-bot interop where mention-based routing is used. 
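The sticky-IP recovery rule above (primary DNS always follows the sticky IP, and the sticky IP is dropped on a connect failure) can be sketched as a small state holder. `FallbackIPTransport` and its methods are hypothetical. Placing the remaining fallback IPs after the primary path is an assumption. `None` stands for the primary DNS path, as in the commit message.

```python
from typing import Iterable, List, Optional


class FallbackIPTransport:
    """Tracks a sticky fallback IP; None in the attempt order means primary DNS."""

    def __init__(self, fallback_ips: Iterable[str]) -> None:
        self.fallback_ips: List[str] = list(fallback_ips)
        self._sticky_ip: Optional[str] = None

    def attempt_order(self) -> List[Optional[str]]:
        if self._sticky_ip is None:
            return [None, *self.fallback_ips]
        others = [ip for ip in self.fallback_ips if ip != self._sticky_ip]
        # Primary DNS always follows the sticky IP, so a dead IP cannot trap us.
        return [self._sticky_ip, None, *others]

    def record_success(self, ip: Optional[str]) -> None:
        if ip is not None:
            self._sticky_ip = ip

    def record_connect_failure(self, ip: Optional[str]) -> None:
        # Drop stickiness when the sticky IP itself fails to connect.
        if ip is not None and ip == self._sticky_ip:
            self._sticky_ip = None
```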
* test+release: align send_message mocks for MessageEntity import; map @fonhal * fix(telegram): resume typing indicator after inline approval click (#27853) The text /approve and /deny paths in gateway/run.py call resume_typing_for_chat() after resolve_gateway_approval() succeeds, but the Telegram inline-button (ea:*) callback in _handle_callback_query did not. Typing is paused when the approval is sent (gateway/run.py:15658), so without a matching resume the typing indicator stayed gone for the remainder of a long-running turn after a button click. Symmetry-match the text path: after a successful resolve, call self.resume_typing_for_chat(str(query_chat_id)). Guarded by count > 0 to match /approve's "if not count" early-return — if nothing was actually resolved, the agent thread was never unblocked, so typing should remain paused. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gateway): mark final voice reply as notify-worthy so Telegram delivers it audibly In Telegram "important" notifications mode (default), TelegramPlatformAdapter sets ``disable_notification=True`` on every send unless metadata carries ``notify=True``. GatewayRunner._send_voice_reply already passes thread metadata through to ``adapter.send_voice``, but never marks the final auto-TTS voice reply as notify-worthy — so users with the default mode get the final voice note delivered silently with no push notification. Mirror the final-text path in gateway/platforms/base.py (the existing text-response final send already adds ``metadata["notify"] = True``). Issue #27970 Bug 2. Bug 1 (MP3 vs. native OGG voice-note) is being addressed by existing PRs #20182 / #20878 — this PR is intentionally scoped to the silent-delivery bug only. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: avoid Telegram group reply thread session splits * chore(release): map @eliteworkstation94-ai for PR #28157 salvage * fix(gateway): avoid duplicate Telegram text after auto-TTS voice replies * chore(release): map @Zyrixtrex for PR #26754 salvage * fix(telegram): escape send_slash_confirm preview with format_message send_slash_confirm() sent the raw command preview with ParseMode.MARKDOWN, skipping the format_message() conversion applied to every other dynamic send in the adapter. Commands with underscores, dots, brackets, or other MarkdownV2-sensitive characters raised BadRequest: Can't parse entities; the exception was swallowed by the outer try/except, so the confirmation prompt silently never appeared. Fix: wrap preview through format_message() and switch to MARKDOWN_V2, symmetric with send_update_prompt and the callback sends fixed in a69404052. * chore(release): map @nftpoetrist for PR #25856 salvage * fix(telegram): retry wrapped connect timeouts * chore(release): map @samahn0601 for PR #27887 salvage * fix(tts): keep native audio outside Telegram voice delivery * chore(release): map @aqilaziz for PR #26406 salvage * fix(gateway): pin Telegram DM-topic routing to user's current topic Topic-mode DM replies were fragmenting one conversation across many sessions: a Reply on a message in another topic delivered Telegram's message_thread_id for *that* topic, and #3206's strip routed plain replies to the lobby. Both pulled the user away from their current session. Fix: when topic mode is on, rewrite source.thread_id to the user's most-recent binding if the inbound id is missing/General or not a known topic. Non-topic-mode users unchanged. 
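The DM-topic pinning rule above can be written as a pure function. The sketch makes three assumptions: thread IDs are strings, the General topic is identified by a configurable `general_thread_id` (defaulting to `"1"`), and `known_topics` is the set of topics bound to sessions. `pin_thread_id` is a hypothetical name.

```python
from typing import AbstractSet, Optional


def pin_thread_id(
    inbound_thread_id: Optional[str],
    topic_mode: bool,
    known_topics: AbstractSet[str],
    latest_binding: Optional[str],
    general_thread_id: str = "1",
) -> Optional[str]:
    """Return the thread id a DM reply should be routed to."""
    if not topic_mode or latest_binding is None:
        # Non-topic-mode users (or users with no binding yet) are unchanged.
        return inbound_thread_id
    missing_or_general = inbound_thread_id in (None, "", general_thread_id)
    if missing_or_general or inbound_thread_id not in known_topics:
        # Stay in the user's most recent topic instead of the lobby or a stray topic.
        return latest_binding
    return inbound_thread_id
```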
* chore(release): map @karthikeyann for PR #26609 salvage * fix(send_message): add thread-not-found retry for Telegram forum topic sends The standalone _send_telegram path in send_message_tool lacked the thread-not-found fallback that the gateway adapter has. When a forum topic thread_id was stale or deleted, the send would fail entirely instead of retrying to the General topic. Changes: - Add _is_telegram_thread_not_found() helper matching gateway adapter - Add thread-not-found retry in text send path - Add thread-not-found retry in media send path (with f.seek(0)) - Separate text_kwargs from thread_kwargs to prevent disable_web_page_preview leaking into send_photo/send_video calls Closes #27012 * test(send_message): add thread-not-found retry tests for Telegram forum topics Adds two tests to TestSendTelegramThreadIdMapping: - test_thread_not_found_retries_without_message_thread_id - test_thread_not_found_for_media_retries_without_message_thread_id Refs #27012 * test(send_message): add thread-not-found retry tests for Telegram topics Three tests covering the #27012 fix: - test_is_thread_not_found_matches_expected_errors - test_text_send_retries_without_thread_id_on_thread_not_found - test_disable_web_page_preview_not_leaked_to_media_sends 116/116 existing tests still pass (no regressions). * chore(release): map @kunci115 for PR #27098 salvage * fix(gateway): register Telegram commands for groups Register Telegram bot commands across default, private, and group scopes so the slash-command menu is available outside DMs. 
Changes from review feedback: - Add asyncio.Lock to prevent race condition in _ensure_forum_commands - Extract MAX_COMMANDS_PER_SCOPE constant (30) to avoid magic number - Upgrade error logging from debug->warning in forum registration - Add tests covering lazy forum registration and concurrent safety - Remove /start handler from this PR (separate feature) Fixes review: needs_work (race, magic number, log levels, missing tests) * test+release: fix test fixture for forum_commands; map @chromalinx * fix(telegram): gate profile bots by allowed topics * chore(release): map @booker1207 for PR #25132 salvage * fix(cron): route Telegram cron deliveries to a dedicated topic via TELEGRAM_CRON_THREAD_ID When Telegram topic mode is enabled, cron messages delivered to the bot's root DM (TELEGRAM_HOME_CHANNEL without a thread id) land in the system lobby — replies there are rebuffed with the lobby reminder and reply_to_message_id is dropped, so users cannot interact with the cron output (#24409). Add an optional TELEGRAM_CRON_THREAD_ID env var that overrides TELEGRAM_HOME_CHANNEL_THREAD_ID for cron deliveries only. Operators can create a "Cron" forum topic in the DM, point this var at its thread id, and replies to cron messages will land in that topic's existing session instead of the lobby. The home-channel thread id (used elsewhere, e.g. restart notifications) is unchanged, and explicit deliver="telegram:chat:thread" targets continue to win over the env var. Per the reporter's clarification on 2026-05-13, option (a) (cron-side route to a dedicated topic + config knob) was chosen. Fixes #24409 * fix(telegram): route image documents (.png/.jpg/.webp/.gif) through vision pipeline When users send images as documents (Telegram file picker), they were rejected with "Unsupported document type" because SUPPORTED_DOCUMENT_TYPES only includes text/office formats. Add SUPPORTED_IMAGE_DOCUMENT_TYPES to base.py and handle them in telegram.py before the document check. 
- Add SUPPORTED_IMAGE_DOCUMENT_TYPES constant to base.py - Add MIME reverse-lookup for image types in telegram.py - Route image documents through cache_image_from_bytes + vision pipeline - Handle media groups for image documents Closes: #20128, #18620 * test+release: stub auth in test_telegram_documents fixture; map @kiranvk-2011 * fix(gateway): prevent Windows Telegram /restart leaving gateway stopped * chore(release): map @rak135 for PR #25960 salvage * fix(telegram): preserve topic metadata on overflow edits * feat(telegram): add disable_topic_auto_rename gateway flag When Hermes auto-titles a session in a Telegram DM topic it currently renames the topic itself to the generated title. That works for operator-managed lanes (extra.dm_topics) but is disruptive for ad-hoc Threaded-Mode topics that users name by hand — every first exchange overwrites their chosen title. Add gateway.platforms.telegram.extra.disable_topic_auto_rename (default False, preserving prior behaviour). When set, both _schedule_telegram_topic_title_rename and the underlying _rename_telegram_topic_for_session_title short-circuit before touching the Telegram API. Internal session titles (sessions list, TUI) keep working unchanged. Also bridge the legacy top-level telegram.disable_topic_auto_rename key through to gateway.platforms.telegram.extra so users on the older config layout don't have to migrate to enable it. - Tests cover the runtime flag, the scheduling entry-point, and string truthiness coercion for YAML-loaded values. - Docs updated in messaging/telegram.md with an example block. * chore(release): map @B0Tch1 for PR #27634 salvage * fix(gateway): restore Telegram DM topic thread_id after session split (#27166) When context compression triggers a mid-turn session split, source.thread_id can be None on synthetic/recovered events. 
_thread_metadata_for_source then returns None, causing the Telegram adapter to send with no message_thread_id and the response lands in the General thread instead of the active DM topic. Fix: - hermes_state.py: Add get_telegram_topic_binding_by_session() for reverse lookup by session_id (enabled by the existing UNIQUE INDEX on session_id). - gateway/run.py: After session-split detection, if source is a Telegram DM and source.thread_id is None, recover it from the binding via the new method so _thread_metadata_for_source produces the correct thread routing. - tests/: Coverage for the new lookup method and the recovery flow. * chore(release): map @jackjin1997 for PR #27239 salvage * fix(gateway): allow chat-scoped telegram auth without sender user_id * chore(release): map @soynchux for PR #27806 salvage * fix(telegram): add DM topic typing fallback when message_thread_id rejected When a DM topic lane's message_thread_id is rejected by Telegram (e.g. stale or deleted topic), send_typing now falls back to sending the typing indicator without thread_id so it at least appears in the main DM view, rather than being silently swallowed. Also adds test for the fallback behavior. * fix(telegram): report cron topic fallback * chore(release): map @el-analista for PR #25368 salvage * fix(telegram): wire gt: callback dispatch for gmail-triage buttons The gmail-triage skill's Telegram inline buttons emit callback_data of the form `gt:<verb>:<arg>`, but `_handle_callback_query` had no `gt:` branch — taps fell through silently and the spinner sat there until Telegram timed it out. Add `_handle_gmail_triage_callback`, dispatched from the existing callback router, that: - Authorizes the caller via the same `_is_callback_user_authorized` path as the approval / slash-confirm / clarify handlers. - Maps each verb to a script under `~/.hermes/scripts/gmail-triage/` and runs it async with a 60s timeout. 
- Splits verbs into one-shots (send / archive / draft / spam) — append the confirmation and strip the keyboard so the action can't fire twice — and sticky-state changes (mute / trust / vip ± -domain) — append the confirmation but leave the keyboard tappable so the user can stack actions on one email. - On failure: toast only, keyboard preserved so the user can retry. - Logs every callback outcome to gateway.log for debugging. * chore(release): map @khungate for PR #25829 salvage * feat(telegram): support quick-command-only menus * chore(release): map @stevehq26-bot for PR #28015 salvage * fix(telegram): handle channel post updates * test: address telegram channel post review * test+release: stub auth in channel_posts fixture; map @brndnsvr * Quiet noisy Telegram gateway errors * chore(release): map oracle@jarviss-mbp.home for PR #24014 salvage * Route Telegram multi-bot mentions exclusively * Document Telegram multi-profile gateway commands * fix: ignore Telegram messages for other bots * chore(release): map @OCWC22 for PR #24581 salvage * feat(telegram): ignore_root_dm with system command lobby * docs(telegram): document ignore_root_dm feature * chore(release): map @ai-hana-ai for PR #23928 salvage * feat(telegram): pin incoming user message for duration of agent turn When a user sends a message on Telegram, the incoming message is now automatically pinned at the start of processing and unpinned when the agent finishes its turn. This gives the user a visual indicator that their message is being worked on, and keeps the conversation anchored. Changes: - telegram.py: Added pinChatMessage in on_processing_start and unpinChatMessage in on_processing_complete. Restructured both hooks so pin/unpin runs independently of the reactions feature (reactions are optional; pinning is always on). - telegram.py: Pass message_id through SessionSource so it's available in the session context. - session_context.py: Added HERMES_SESSION_MESSAGE_ID context var. 
- run.py: Pass source.message_id through set_session_vars. Pinning is silent (disable_notification=True) and failures are logged at debug level without interrupting message processing. Only the user's incoming message is pinned -- never the agent's replies. Auto-resume events (which have no message_id) are correctly skipped. * chore(release): map @indigokarasu for PR #26636 salvage * feat(telegram): skip-STT audio path + 2GB cap via local Bot API server Two coordinated changes that unblock downstream audio pipelines (diarization, custom transcription, archival) on attachments larger than the public Bot API's 20MB getFile ceiling. - `stt.enabled: false` no longer drops voice/audio with a generic "transcription disabled" note. The gateway probes the cached file's duration (wave → mutagen → ffprobe ladder) and surfaces `[The user sent a voice message: <abs path> (duration: M:SS)]` to the agent so a skill or tool can pick up the raw file. The previous placeholder is replaced rather than appended when present. - `platforms.telegram.extra.base_url` set → adapter auto-lifts its document size cap from 20MB to 2GB (the local telegram-bot-api `--local` ceiling) and the "too large" reply reports the active limit dynamically. No new config knob; presence of `base_url` is the opt-in. - `platforms.telegram.extra.local_mode: true` wires `Application.builder().local_mode(True)` on the python-telegram-bot builder. PTB then reads files from disk instead of HTTP, which is required when telegram-bot-api runs in `--local` mode (the server returns absolute filesystem paths, not `/file/bot...` URLs). - gateway/run.py: rewrites the `stt.enabled: false` branch of `_enrich_message_with_transcription`. New `_format_duration` + `_probe_audio_duration` helpers. - gateway/platforms/telegram.py: `_max_doc_bytes` instance attribute derived from `extra.base_url`; `local_mode` builder wiring; dynamic "too large" message. 
- tests/gateway/test_stt_config.py: covers path-surfacing with and without an existing user message, and placeholder replacement. - tests/gateway/test_telegram_max_doc_bytes.py: 3 cases — default 20MB without base_url, 2GB when set, empty-string base_url keeps default. - website/docs/user-guide/messaging/telegram.md: new "Skipping STT" subsection under Voice Messages and a full "Large Files (>20MB) via Local Bot API Server" walkthrough (api_id/api_hash, docker-compose, one-time `logOut` migration, `platforms.telegram.extra` config, the `local_mode` disk-access requirement, the silent HTTP-fallback 404). - website/docs/user-guide/features/voice-mode.md: documents the `stt.enabled` knob in the config reference. - `pytest tests/gateway/test_telegram_max_doc_bytes.py tests/gateway/test_stt_config.py` → 9/9 passing. - Verified end-to-end on a live deployment: gateway log shows `Using custom Telegram base_url: http://...` and `Using Telegram local_mode (read files from disk)` on startup; voice messages above 20MB cache to disk and surface their path to the agent. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * chore(release): map @alber70g for PR #25280 salvage * fix(web): add scheduled column to i18n type definitions (#28549) columnLabels and columnHelp in en.ts include a scheduled entry but the Translations interface in types.ts did not declare it, causing a TypeScript build failure in the Nix derivation. Made the field optional since only en.ts provides it currently. * docs: comprehensive 2-week sweep of feature/PR coverage gaps (#28497) Catch the website docs up to two weeks of merged work (May 4 – May 18, 2026, roughly 1,080 PRs). The audit found ~50 user-visible features that had landed in code with no docs footprint, plus a handful of stale pages. This PR closes every gap the scan turned up. New pages - user-guide/features/deliverable-mode.md — extension list, agent triggers, kanban_complete artifacts pattern, [[as_document]] override (PR #27813). 
- developer-guide/web-search-provider-plugin.md — authoring guide modeled on image-gen-provider-plugin, covering brave_free / ddgs / etc. (PR #25448). Providers / auth - Rename "Alibaba Cloud" → "Qwen Cloud (Alibaba DashScope)" everywhere the display label shows up; provider id stays `alibaba` (PR #24835). - Document OAuth refresh-token quarantine for xAI / MiniMax / Codex (PRs #28116 / #28118 / #28119). - Document Nous JWT minting from refresh token + invalid-refresh quarantine + cross-profile shared token store (PRs #27663 / #19712). - Add `## Microsoft Entra ID authentication (keyless)` section to azure-foundry guide — DefaultAzureCredential, RBAC, OpenAI + Anthropic routing details (PR #28101 / #9df9816da). - Custom providers `api_mode` is now prompted-and-persisted, not just URL autodetected (PR #25068). - Delegation honours `api_mode` + auto-detects anthropic_messages base URLs (PR #26824). - `x_search` auto-enables when xAI credentials are present (PR #27376). - Add `xAI Grok OAuth (SuperGrok)` row to providers headline table (PR #26534). - NVIDIA NIM billing-origin header is set automatically (PR #26585). Windows / installer - `install.ps1`: document `-Commit <sha>` and `-Tag <v>` pin params plus the BOM-strip / git-retry hardening (PR #28169). - Document Hermes Desktop thin installer + first-launch bootstrap (PR #27822). - Document `dep_ensure` Windows bootstrap (PR #27845). - Document install-method auto-detection (pip / git / homebrew / nixos) and the matching update command (PR #27843). Gateway / messaging - `/platform list|pause|resume` full description + circuit-breaker semantics (PR #26600). - Slack / Matrix / Mattermost get parallel `allowed_channels` / `allowed_rooms` allowlist sections matching Telegram/Discord/DingTalk (PR #21251). - Discord `allow_any_attachment` + `max_attachment_bytes` (config and env vars) (PR #27245). - Discord clarify-choice button rendering (PR #25485). 
- Telegram `guest_mode` @mention bypass for allowlisted groups (PR #22759). - Telegram `notifications` mode (`important` vs `all`) (PR #22793). - `[[as_document]]` skill / response directive for forcing document-style media delivery (PR #21210). CLI / TUI - `/new [name]` argument (PR #19637). - `/subgoal` user-supplied criteria appended to `/goal` (PR #25449). - `/exit --delete` flag confirmation prompts for destructive slash commands (PR #22687). - Status-bar additions: ▶ N background indicator (PR #27175), context compression count (PR #21218), YOLO mode banner+statusbar warning (PR #26238). - `display.timestamps` + `docker_extra_args` config keys (PR #23599). - TUI collapsible startup banner sections (PR #20625). - `HERMES_SESSION_ID` exported to tool subprocesses (PR #23847). i18n - Refresh display.language locale list from 8 → 16 (en, zh, zh-hant, ja, de, es, fr, tr, uk, af, ko, it, ga, pt, ru, hu) — matches `agent/i18n.py:SUPPORTED_LANGUAGES`. Tools / features - `vision_analyze` native-pixel passthrough for vision-capable callers, with auxiliary text-describer fallback (PR #22955). - `session_search` rewrite to the single-shape tool (discovery / scroll / browse modes) (PRs #27590 / #27840). - Clarify MCP transport scope: client supports stdio + SSE; embedded `hermes mcp serve` is stdio-only (PR #21227). - Web search backends table: add Brave Search (free tier) and DDGS rows (PR #21337). - ACP session-scoped edit auto-approval modes (PR #27862). - Curator rename map in the user-visible per-run summary (PR #22910). - Prompt caching feature page reference in features/overview.md — Claude cross-session 1-hour prefix cache on native Anthropic / OpenRouter / Nous Portal (PR #23828). - Cron per-job profile parameter (PR #28124). - `--no-skills` flag for `hermes profile create` (PR #20986). Build - Verified with `npm run build` in `website/`; both `en` and `zh-Hans` locales compile. 
Remaining broken-link/anchor warnings are pre-existing (`rl-training.md` from learning-path / overview; the zh-Hans translation lag the docs skill already calls out). * chore(release): pre-stage AUTHOR_MAP for May 2026 LHF batch group 9 (#28571) Pre-stages AUTHOR_MAP entries for 9 new/under-mapped contributors whose PRs are being salvaged in the May 2026 LHF batch group 9. Contributors: - jdelmerico (#28278 — signal require_mention filter) - justemu (#27996 — matrix thread_require_mention) - YuanHanzhong (#28029 — dashboard browser scrollback) - noctilust (#28080 — drop stale TUI resume env) - MoonJuhan (#28288 — tolerate unreadable JSONL transcripts) - outsourc-e (#28164 — cron emoji ZWJ sequences) - Zyrixtrex (#28275 — Google OAuth urlopen timeout) - ooovenenoso (#28256 — tool loop recovery hints) - vanthinh6886 (#28018 — yaml/flock/atomic write guards; non-noreply email) Per references/batch-pr-salvage-may14-additions.md. * feat(signal): add require_mention filter for group chats Add a configurable mention filter to the Signal adapter so the bot only responds in groups when it is explicitly @mentioned. Changes: - gateway/platforms/signal.py: read require_mention from adapter extra config or SIGNAL_REQUIRE_MENTION env var; skip group messages that don't mention the bot account (checked in rendered text and raw mention metadata) - gateway/config.py: map signal.require_mention YAML key to the SIGNAL_REQUIRE_MENTION env var (env var takes precedence) Config example: signal: require_mention: true Or via env var: SIGNAL_REQUIRE_MENTION=true * Revert "feat(telegram): pin incoming user message for duration of agent turn" This reverts commit a724c3b9cf5f01e28365322ae5ae3a9579567806. * Revert "feat(telegram): support quick-command-only menus" This reverts commit b1acf80e17858e2e5ae7c0d412a3a573d7fcbca4. * Revert "feat(send_message): auto-detect @username mentions and create Telegram entities" This reverts commit cf814c96f613b38bd891ac941c32da653e81c7ad. 
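The Signal `require_mention` filter above combines two pieces: a config lookup in which the env var wins, and a per-message gate. This is a minimal sketch with two assumptions: the set of truthy strings, and that each raw mention dict carries the mentioned account under a `"number"` key. Both function names are hypothetical.

```python
from typing import Any, Mapping, Sequence

_TRUTHY = {"1", "true", "yes", "on"}  # assumption: accepted truthy spellings


def resolve_require_mention(extra: Mapping[str, Any], env: Mapping[str, str]) -> bool:
    """SIGNAL_REQUIRE_MENTION takes precedence over the adapter's extra config."""
    raw = env.get("SIGNAL_REQUIRE_MENTION")
    value: Any = raw if raw is not None else extra.get("require_mention", False)
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in _TRUTHY


def should_process(
    is_group: bool,
    require_mention: bool,
    text: str,
    mentions: Sequence[Mapping[str, Any]],
    bot_account: str,
) -> bool:
    """Drop group messages that don't mention the bot when the filter is on."""
    if not is_group or not require_mention:
        return True
    if bot_account in text:
        return True
    # Assumption: raw mention metadata names the mentioned account as "number".
    return any(m.get("number") == bot_account for m in mentions)
```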
* Revert "fix(telegram): enforce TELEGRAM_ALLOWED_USERS allowlist on inbound messages" This reverts commit db50af910be6b4171ea9cf54f4cc38be27ac1da6. * fix(gateway): pre-mark sessions as resume_pending before drain to prevent data loss (#27856) Pre-mark all running agent sessions as resume_pending BEFORE the drain wait begins. If the service manager kills the process during the drain window, the durable marker has already been written, so the next gateway boot can recover in-flight sessions. On graceful drain completion, clear the early markers for sessions that finished successfully. * fix(matrix): implement thread_require_mention to prevent multi-agent reply loops In multi-agent shared Matrix rooms, multiple bots all participating in the same thread could trigger infinite reply loops: each bot's reply re-engaged the others because they were all in the bot-thread set. Discord has a `thread_require_mention` opt-in for this; Matrix didn't. Add `_parse_thread_require_mention(config)` (mirrors Discord's pattern). In `_resolve_message_context`, when enabled and the message is in a bot-participated thread (not a free-response room), require an @mention before processing. Salvage of @justemu's 2-commit stack (#27996). Fixes #27995. * fix(cli): show active profile in TUI prompt * fix(tui): preserve dunder identifiers in markdown * test(file_ops): add regression tests for git baseline warning in write_file Adds TestGitBaselineCheck with 6 unit tests covering _check_git_baseline and the warning field in the write_file result: - Git not available → None - Not in a git repo → None - Clean repo → None - Dirty repo → returns warning string with branch name - write_file result includes warning when dirty - write_file result omits warning when clean * fix(dashboard): use browser scrollback for chat wheel * fix(cli): ignore stale HERMES_TUI_RESUME env HERMES_TUI_RESUME is an internal env var the Python wrapper exports to hand a session ID off to the Ink TUI. Because…
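The resume_pending pre-marking in #27856 is an ordering guarantee: write markers before waiting, and clear them only for sessions that finished. A minimal sketch of that ordering follows. `ResumeMarkerStore` is an in-memory stand-in for the durable markers, which a real gateway must persist to disk. `drain_with_premarks` and `wait_for_drain` are hypothetical names.

```python
from typing import Callable, Iterable, Set


class ResumeMarkerStore:
    """In-memory stand-in for the durable resume_pending markers."""

    def __init__(self) -> None:
        self.pending: Set[str] = set()

    def mark(self, session_id: str) -> None:
        self.pending.add(session_id)

    def clear(self, session_id: str) -> None:
        self.pending.discard(session_id)


def drain_with_premarks(
    running: Iterable[str],
    store: ResumeMarkerStore,
    wait_for_drain: Callable[[], Set[str]],
) -> None:
    """Write markers before draining; clear only for sessions that finished."""
    for session_id in running:
        store.mark(session_id)  # durable before any kill can land
    finished = wait_for_drain()  # ids of sessions that completed cleanly
    for session_id in finished:
        store.clear(session_id)
```

If the process dies inside `wait_for_drain`, every running session still has a marker, which is the data-loss case the fix targets.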

* fix(kanban): seed bundled skills (e.g. kanban-worker) on kanban init Closes #23725 * fix(kanban): ignore stale HERMES_KANBAN_BOARD for removed boards * fix(kanban): keep board-management commands independent from board override * fix(kanban): preserve notifier_profile for dashboard home subscriptions * fix(kanban): promote dependents when a parent is archived * fix(cli): make kanban specify max_tokens configurable * fix(kanban): sync slash subcommands with live parser * fix(kanban): promote blocked tasks when parent dependencies complete recompute_ready only scanned 'todo' tasks for promotion, ignoring 'blocked' tasks entirely. When a task was blocked (e.g. by the circuit breaker) and its parent dependencies later completed, the task stayed stuck in 'blocked' forever unless manually unblocked. Now recompute_ready also scans 'blocked' tasks. When all parents are done/archived, the blocked task is promoted to 'ready' with failure counters reset — equivalent to an automatic unblock. Includes a regression test for the blocked-parent-done promotion path. * fix(kanban): use 'is not None' check for max_runtime_seconds in create_task max_runtime_seconds=0 was being silently coerced to None due to a falsy check (if max_runtime_seconds). Zero is a valid value that causes the dispatcher to immediately time out a task. The adjacent max_retries parameter already used the correct 'is not None' pattern. Fixes the inconsistency by aligning max_runtime_seconds with max_retries. * fix(kanban): reset failure counters on unblock_task When a task is manually unblocked (blocked → ready/todo), the consecutive_failures counter and last_failure_error were left intact. The next failure would immediately re-trip the circuit breaker because the counter was still at or above the failure limit. Reset both fields on unblock so the task gets a fresh retry budget. Includes a regression test that verifies counters are zeroed. 
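The `recompute_ready` change and the `unblock_task` counter reset combine into one promotion rule. In this sketch, tasks are plain dicts and parents are a mapping from task ID to parent IDs. `recompute_ready` is reused as an illustrative name, not the real kanban_db signature. One assumption is labeled in the code: only blocked tasks that have parents auto-promote, so a circuit-breaker block on a parentless task is left alone.

```python
from typing import Dict, List

DONE_STATUSES = {"done", "archived"}


def recompute_ready(tasks: Dict[str, dict], parents: Dict[str, List[str]]) -> List[str]:
    """Promote todo/blocked tasks whose parents are all done or archived."""
    promoted = []
    for task_id, task in tasks.items():
        deps = parents.get(task_id, [])
        if task["status"] == "todo":
            eligible = True
        elif task["status"] == "blocked":
            # Assumption: only dependency-blocked tasks (with parents) auto-promote.
            eligible = bool(deps)
        else:
            continue
        if eligible and all(tasks[p]["status"] in DONE_STATUSES for p in deps):
            if task["status"] == "blocked":
                # Same effect as a manual unblock: a fresh retry budget.
                task["consecutive_failures"] = 0
                task["last_failure_error"] = None
            task["status"] = "ready"
            promoted.append(task_id)
    return promoted
```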
* fix(kanban): fingerprint crash errors to prevent fleet-wide retry exhaustion When a systemic failure (provider outage, auth expiry, OOM) crashes multiple workers simultaneously, detect_crashed_workers increments each task failure counter independently. The circuit breaker only trips after N × failure_limit retries across the fleet. Fingerprint crash errors by normalizing host-specific details (PIDs, timestamps). When 3+ tasks crash with the same fingerprint in a single detection cycle, immediately trip the circuit breaker (failure_limit=1) instead of waiting for repeated failures. Isolated crashes (unique fingerprints) retain their normal retry budget. Protocol violations continue to trip immediately. Includes regression tests for systemic and isolated crash paths. * fix(kanban): align board_exists with board discovery rules * fix(kanban): demote ready children when a parent is reopened * fix(kanban): serialize DB initialization * fix(kanban): task_age() tolerates ISO-8601 timestamps Prevents ValueError crash in dashboard get_board() when a task has an ISO timestamp (e.g. "2026-05-10T15:00:00Z") instead of a unix epoch int. Adds _to_epoch() helper that normalises both formats. * Fix Kanban dashboard initial board selection * fix(kanban): persist worker session metadata on completion Salvages #25579 by @wesleysimplicio. Stamps task_runs.metadata.worker_session_id from HERMES_SESSION_ID on kanban_complete. Cherry-picked the substantive commit (not the AUTHOR_MAP fixup tip) onto current main. 
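The crash fingerprinting above can be sketched in two steps: normalize host-specific details, then count fingerprints within a single detection cycle. The threshold of 3 and the limit of 1 come from the commit message. The normalization regexes (ISO timestamps, `pid=` values, hex addresses) and the function names are assumptions.

```python
import re
from collections import Counter
from typing import Dict

SYSTEMIC_THRESHOLD = 3  # 3+ tasks sharing a fingerprint trips the breaker

# Assumed normalizations for host-specific details; the real rules may differ.
_NORMALIZERS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?"), "<ts>"),
    (re.compile(r"\bpid[=: ]\s*\d+", re.IGNORECASE), "pid=<n>"),
    (re.compile(r"0x[0-9a-fA-F]+"), "<addr>"),
]


def fingerprint(error: str) -> str:
    """Strip volatile details so identical failures compare equal."""
    text = error.strip()
    for pattern, replacement in _NORMALIZERS:
        text = pattern.sub(replacement, text)
    return text


def failure_limits(crashes: Dict[str, str], default_limit: int) -> Dict[str, int]:
    """Per-task failure limit for one detection cycle."""
    counts = Counter(fingerprint(err) for err in crashes.values())
    return {
        task_id: 1 if counts[fingerprint(err)] >= SYSTEMIC_THRESHOLD else default_limit
        for task_id, err in crashes.items()
    }
```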
* fix(kanban): make claim ttl configurable

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(kanban): pass accept-hooks to worker chat subprocess

* feat(kanban): add board-level default workdir (#25430)

* docs(kanban-worker): document notification routing configuration

* fix(kanban): preserve worker tools with restricted toolsets

* fix(kanban): make legacy task migration idempotent

(cherry picked from commit 293f1c3a7241b0117669e049d9aa746c9645ac90)

* fix: harden Kanban worker Hermes command resolution

* feat(kanban): allow trimmed task comments

SS-1647 live SHIP validation: real code + tests for kanban comment --max-len.

* fix: show scheduled kanban tasks in dashboard

* fix: assign single-task kanban decompositions

* fix(kanban-dashboard): make Orchestration mode checkbox label static

The checkbox label echoed its state ("Auto (default)" / "Manual") instead of describing the action, so a checked box reading "Auto" parsed as a status indicator rather than a control. The accompanying sub-description was also static and started with "When on, ...", which read awkwardly when the box was unchecked.

Replace the dynamic label with a static action label ("Auto-decompose triage tasks") and flip the sub-description between the two modes so it stays accurate either way. The top-of-page Orchestration pill is unchanged — that one is intentionally a status badge / toggle.

Fixes #28178

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(env): add HERMES_KANBAN_DISPATCH_IN_GATEWAY override (#21956)

Salvages the env-vars docs portion of #21956 by @Bartok9. The ascii-guard-ignore tags from the original PR already landed on main.

* fix(kanban): close sqlite connection on init failure to prevent fd leak

Salvages #28301 by @Ade5954. If WAL setup, PRAGMA application, or schema init raised after sqlite3.connect() succeeded, the new connection leaked. Wrap the body in try/except so the connection is closed before the exception propagates.
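A sketch of the close-on-failure pattern from the fd-leak fix, using only the standard `sqlite3` module. The function name `connect_board` and the specific PRAGMAs are illustrative assumptions, not the actual kanban_db code.

```python
import sqlite3


def connect_board(path: str, schema_sql: str) -> sqlite3.Connection:
    """Open a board DB, closing the connection if any setup step fails."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("PRAGMA journal_mode=WAL")  # WAL setup
        conn.execute("PRAGMA foreign_keys=ON")   # PRAGMA application
        conn.executescript(schema_sql)           # schema init
    except BaseException:
        # Without this, the half-initialised connection (and its fd) leaks.
        conn.close()
        raise
    return conn
```

Catching `BaseException` rather than `Exception` also covers `KeyboardInterrupt` during a slow schema migration; the exception is re-raised unchanged either way.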
* fix(kanban): don't crash dispatched workers when kanban-worker skill is absent

Salvages #27372 by @oemtalks. The dispatcher unconditionally injected `--skills kanban-worker` into every worker spawn, but worker profiles sometimes don't have that bundled skill in their skills dir, which is fatal at CLI startup (`ValueError: Unknown skill(s): kanban-worker`).

Adds `_kanban_worker_skill_available(hermes_home)` and only injects the flag when the skill resolves. The MANDATORY lifecycle still ships via KANBAN_GUIDANCE in the system prompt, so omitting the flag is safe.

* fix(packaging): ship dashboard plugin assets in wheel

Salvages #23737 by @LeonSGP43. Adds plugins/* manifest.json and dist/ glob entries to setuptools package-data so wheel installs ship the bundled dashboard plugin assets (kanban, achievements, etc.). Without these, /api/dashboard/plugins can't discover plugin assets outside a source checkout.

* docs(kanban): document worker protocol auto-blocks

Salvages #21585 by @helix4u. Documents the protocol_violation event (worker exits successfully while task is still running), adds --max-retries to the create flag list and --failure-limit to dispatch.

* fix(oneshot): pass fallback_providers from profile config to AIAgent

Salvages #23368 by @uzunkuyruk. Oneshot workers (e.g. kanban workers spawned via 'hermes -p <profile> chat -q ...') were not honouring the profile's fallback_providers / fallback_model chain because oneshot.py never read the config and never passed fallback_model= to AIAgent.

Reads cfg.get('fallback_providers') (new list format) or cfg.get('fallback_model') (legacy single-dict) with the same normalization cli.py applies, then forwards as fallback_model=_fb.

* fix(kanban): reject direct running transitions in dashboard bulk updates

Salvages #24050 by @kronexoi. The single-task PATCH already rejects direct status='running' since it bypasses the dispatcher/claim invariant, but the bulk-update endpoint still accepted it.
Aligns bulk with single by emitting an error result row for any 'running' entry.

* feat(kanban): add initial-status for human-ops cards

Salvages #27526 by @shunsuke-hikiyama. Adds an --initial-status flag (running|blocked, default running) to 'kanban create', threaded through kanban_db.create_task() and the kanban_create tool schema. 'blocked' parks the task directly in the blocked column for R3 human-ops review, skipping the brief running-to-blocked transition.

Dropped the unrelated 'add' alias, WIFEXITED Windows compat, and slash-handler error formatting changes that were bundled in the original PR — those should ship as their own focused changes if still wanted.

* fix(kanban): release scratch workspace and tmux session on task completion

Salvages #27369 by @LeonJS. complete_task() now calls _cleanup_workspace() and _cleanup_worker_tmux() after marking a task complete.

Scratch workspaces (used by swarm agents) accumulate on disk — hundreds of MB per task, never released. Stale tmux sessions from completed agents also persist indefinitely.

Both gates are safe:
- workspace_kind == 'scratch' gate preserves user worktree/dir workspaces
- tmux #{pane_dead} == 1 gate only kills sessions where the worker has already exited
- best-effort: cleanup failures never block task completion

* fix(kanban): honor severity thresholds in diagnostics

Salvages #26431 by @LeonSGP43. Dashboard plugin_api list_diagnostics was using exact-match (severity == filter), so '--severity warning' hid 'error' and 'critical' diagnostics. Adds severity_at_or_above() helper to kanban_diagnostics and uses it in the dashboard endpoint (CLI already used SEVERITY_ORDER comparison correctly).

* test: isolate Kanban env pins in hermetic fixture

Salvages the substantive part of #22295 by @steezkelly.
Adds the missing HERMES_KANBAN_HOME, HERMES_KANBAN_RUN_ID, HERMES_KANBAN_CLAIM_LOCK, HERMES_KANBAN_DISPATCH_IN_GATEWAY entries to _HERMES_BEHAVIORAL_VARS so ambient developer-shell pins on those vars don't bleed into pytest runs.

The frozenset extraction + standalone regression test from the original PR were dropped to keep the change minimal — main already maintains the list inline.

* feat(kanban): add max_in_progress config to cap concurrent running tasks

Salvages #22981 by @SimbaKingjoe. Adds 'kanban.max_in_progress' config that caps simultaneously running tasks. When the board already has N running, the dispatcher skips spawning so slow workers (local LLMs, resource-constrained hosts) don't pile up and time out.

Threads through dispatch_once(max_in_progress=) and gateway dispatcher config parsing with validation (warns on invalid/below-1 values).

* fix(packaging): ship bundled skills in wheel

Salvages #23738 by @LeonSGP43. Wheel installs were missing skills/ and optional-skills/ because pyproject's [tool.setuptools.packages.find] only includes Python packages — the skills directories don't have __init__.py so they were silently dropped from the wheel.

Adds setup.py with data_files spec emitting skills/* and optional-skills/* under hermes_agent-<v>.data/data/, and a get_bundled_skills_dir() helper in hermes_constants that discovers the wheel-installed location via sysconfig before falling back to a source-checkout path. tools/skills_sync uses the helper so 'hermes update' works for pip-installed users.

* fix: 4 small surgical bugs

Salvages #23302 by @Bartok9. Four independent one-area fixes:

1. kanban boards delete alias now hard-deletes (not archives) — the alias didn't carry --delete, so getattr(args, 'delete', False) returned False. Detect boards_action=='delete' explicitly.
2. Gateway auto-title failures no longer leak as user-visible warnings — debug-log only since they're not actionable.
3. Background process completion notification snaps truncation to the next newline boundary and prepends a marker when content is dropped.
4. _cprint() schedules the run_in_terminal coroutine via asyncio.ensure_future so output isn't silently dropped from background threads (fixes #23185 Bug A). Skips the double-print fallback that would fire for mock paths.

* perf(prompt): cache kanban worker guidance at session init

Salvages #24402 by @RyanRana. The KANBAN_GUIDANCE block (~835 tokens) is session-static — the dispatcher decides at spawn time whether the process is a kanban worker via the kanban_show tool's check_fn (gated on HERMES_KANBAN_TASK env var). Re-checking 'kanban_show' in valid_tool_names and re-loading the reference on every system-prompt rebuild (init + each context compression) is wasted work.

Caches the resolved string on agent._kanban_worker_guidance once in agent_init and consumes it in system_prompt.build_system_prompt(), with a getattr fallback for code paths that bypass agent_init.

* feat(kanban): add --sort option to 'hermes kanban list'

Salvages #25745 by @LizerAIDev. Adds --sort {created,created-desc,priority,priority-desc,status,assignee,title,updated} to 'hermes kanban list'. Validated against VALID_SORT_ORDERS map; invalid values raise ValueError. Default behaviour (priority DESC, created ASC) is unchanged when --sort is omitted.

* docs: add kanban codex lane skill

* feat(kanban): worker visibility endpoints (workers/active, runs/{id}, inspect)

Adds three read-only endpoints to the kanban dashboard plugin so the SwitchUI workspace (and any other dashboard consumer) can track workers across tasks without N+1 round-trips through /tasks/{task_id}.

- GET /workers/active
  Single SQL JOIN of task_runs + tasks where ended_at IS NULL, worker_pid IS NOT NULL, status='running'. Returns {workers: [...], count, checked_at}.
- GET /runs/{run_id}
  Direct lookup of any task_run row by id. Reuses existing kanban_db.get_run() helper and _run_dict() serialiser. 404 when not found. Mirrors GET /tasks/{task_id} 404 shape.
- GET /runs/{run_id}/inspect
  Live PID stats via psutil.Process.as_dict() — cpu_percent, memory_rss_bytes, memory_vms_bytes, num_threads, num_fds, status, create_time, cmdline. Short-circuits with alive:false when run has ended, has no worker_pid, the pid is gone, or psutil is unavailable. AccessDenied surfaces as alive:true with error rather than a 500.

11 new tests in tests/plugins/test_kanban_worker_runs.py cover the empty-board case, running-task case, ended-run filtering, missing-pid filtering, 404 paths, already-ended inspect, no-pid inspect, dead-pid inspect, and live-pid inspect (psutil mocked). All pass.

Companion termination endpoint (POST /runs/{run_id}/terminate) is intentionally out of scope here — opening a separate issue first since the RBAC and dispatcher-mediated soft-cancel design needs maintainer input before code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): map contributor email for attribution check

* test(kanban-dashboard): pin enriched 409 detail and inline error wiring (#26744)

- Existing ``test_patch_drag_drop_move_todo_to_ready`` now asserts the enriched 409 detail names the blocking parent (id, quoted title, and current status), so the dashboard always has something actionable to render.
- New bundle-assertion test ``test_dashboard_surfaces_ready_blocked_error_inline`` pins the frontend wiring: the ``parseApiErrorMessage`` helper exists, the drag/drop banner runs through it, and the drawer maintains a visible ``patchErr`` state that's cleared between PATCHes and tasks.
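The GET /workers/active query from the worker-visibility commit might look roughly like this against a pared-down schema. The table columns and the `active_workers` helper are assumptions for illustration; the real tables carry more fields and the endpoint serialises rows its own way.

```python
import sqlite3
import time

# Assumed minimal schema; the real kanban tables have many more columns.
SCHEMA = """
CREATE TABLE tasks (id TEXT PRIMARY KEY, title TEXT, status TEXT);
CREATE TABLE task_runs (
    id INTEGER PRIMARY KEY,
    task_id TEXT REFERENCES tasks(id),
    worker_pid INTEGER,
    started_at REAL,
    ended_at REAL
);
"""

# One JOIN instead of N per-task lookups.
ACTIVE_WORKERS_SQL = """
SELECT r.id AS run_id, r.task_id AS task_id, t.title AS title,
       r.worker_pid AS worker_pid, r.started_at AS started_at
FROM task_runs AS r
JOIN tasks AS t ON t.id = r.task_id
WHERE r.ended_at IS NULL
  AND r.worker_pid IS NOT NULL
  AND t.status = 'running'
ORDER BY r.started_at
"""


def active_workers(conn: sqlite3.Connection) -> dict:
    cur = conn.execute(ACTIVE_WORKERS_SQL)
    columns = [d[0] for d in cur.description]
    workers = [dict(zip(columns, row)) for row in cur.fetchall()]
    return {"workers": workers, "count": len(workers), "checked_at": time.time()}
```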
* docs(codex_app_server): document multi-root Kanban writable_roots (#27941)

Update the Codex app-server runtime guide's Kanban section to reflect the new behaviour:

  * The sandbox override now adds the board DB directory plus every Kanban path the dispatcher pinned (HERMES_KANBAN_WORKSPACES_ROOT, HERMES_KANBAN_WORKSPACE, legacy HERMES_KANBAN_ROOT) -- deduplicated, DB-dir first.
  * The motivation note now includes the cross-mount artifact-write scenario (e.g. ``/media/.../kanban-workspaces/...`` on a separate drive) and links to issue #27941 so readers can find the original bug report.

* fix(gateway): quiet corrupt kanban dispatcher boards

Salvages the substantive part of #26490 by @aqilaziz. Detects corrupt board DBs ("file is not a database" / "database disk image is malformed") and disables them by fingerprint until they're repaired, instead of flooding the gateway log with repeated logger.exception tracebacks every tick.

Cherry-picked the substantive commit (ea5b4ec2a); the tip commit was an unrelated _is_dir OSError fix for service-path lookup. Dropped a small test reformat that was bundled in the same commit.

* docs: align kanban readiness docs and smoke tests

Salvages #28199 by @bensargotest-sys. Aligns Kanban docs with current tool registration: dispatcher-spawned task workers get task tools, profiles that explicitly enable the kanban toolset get orchestrator routing tools (kanban_list, kanban_unblock). Corrects failure-limit text to current default of 2. Hardens the e2e subprocess script to resolve repo root and use the spawnable default assignee. Updates the diagnostics severity fixture to assert error below the critical threshold.

* feat(kanban): surface per-task model_override in show + tool output

Salvages #26897 by @loicnico96. The per-task model_override DB column already exists on main, but it wasn't exposed in user-facing surfaces.
This adds:
- 'kanban show' prints 'model: <name>' when model_override is set
- kanban_show / kanban_list tool responses include the model_override field

Original branch was stale (PR was authored against an older field name 'model'); applied the substantive surface exposure manually using the current 'model_override' field name.

* feat(cli): add kanban swarm topology helper

Salvages #26791 by @Niraven. Adds 'hermes kanban swarm' to create a durable Kanban Swarm v1 graph: a completed root/blackboard card, parallel worker cards, a verifier gated on all workers, and a synthesizer gated on the verifier. Stores shared swarm blackboard updates as structured JSON comments on the root card.

Self-contained: new hermes_cli/kanban_swarm.py module + CLI wiring + unit tests.

* feat(kanban): add optional board parameter to all MCP tools

Salvages #27598 by @nnnet. Adds optional 'board' parameter to all 9 kanban_* MCP tools via shared _connect helper. Backwards compatible — omitting board keeps current pinned-board behavior. Useful for orchestrator profiles that route across multiple boards.

Two-file scope: tools/kanban_tools.py + tests.

* feat(kanban): stamp originating ACP session_id on tasks

Salvages #23208 by @awizemann. Tracks which chat session created a kanban task so clients can render a per-session board without falling back to tenant + time-window heuristics.

- Schema: tasks gains nullable session_id TEXT column with index (additive migration in _migrate_add_optional_columns).
- ACP: server.py exposes the originating session id via HERMES_SESSION_ID with save/restore around the agent loop.
- Tool: kanban_create reads HERMES_SESSION_ID (with explicit override).
- CLI: 'hermes kanban list --session <id>' filter; JSON output exposes session_id.

* feat(kanban): wire dispatcher to dispatch review agents from review column

Salvages #23772 by @thewillhuang.
Adds 'review' as a valid kanban task status and extends dispatch_once to monitor the review column as a second dispatch source (in addition to the existing ready column).

- Adds 'review' to VALID_STATUSES
- Adds claim_review_task() — atomically transitions review → running
- Adds has_spawnable_review() — health telemetry mirror
- Extends dispatch_once with a review column dispatch loop
- Review agents get 'sdlc-review' skill auto-loaded

Resolved 2 conflicts (VALID_STATUSES merge with main's 'scheduled' state, test file additions). Adapted claim_review_task to main's ttl_seconds: Optional[int] = None convention (matches claim_task).

* feat(kanban): stale detection for running tasks in dispatcher

Salvages #23790 by @thewillhuang. Adds detect_stale_running() to the dispatcher cycle. Running tasks that have been started for longer than dispatch_stale_timeout_seconds (default 14400 = 4h) without a heartbeat in the last hour are auto-reclaimed to ready.

- New config kanban.dispatch_stale_timeout_seconds (default 14400, 0 disables)
- New 'stale' field on DispatchResult
- detect_stale_running() in kanban_db.py with heartbeat freshness check
- Records outcome='stale' on run close + 'stale' event; ticks failure counter
- Wires config through gateway embedded dispatcher
- Updates _cmd_dispatch verbose/JSON output and daemon logging

Resolved test-file end-of-file conflict by appending both halves.

* feat(kanban): filter tasks by workflow fields and runs by status/outcome

Salvages #26745 by @nehaaprasaad. Exposes filtering for the existing workflow_template_id and current_step_key columns:
- list_tasks() accepts workflow_template_id and current_step_key kwargs
- 'hermes kanban list' adds matching CLI flags
- dashboard plugin_api also exposes the filters

Resolved a small conflict in list_tasks signature alongside main's session_id and order_by additions; combined all three into the single filter list.
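The stale-detection rule from #23790 reduces to a small predicate. This sketch assumes epoch-second timestamps and treats a task with no recorded heartbeat as not fresh; `is_stale` is a hypothetical name, and the real detect_stale_running() works on SQL rows and also records the 'stale' outcome, event, and failure tick.

```python
import time
from typing import Optional

DEFAULT_STALE_TIMEOUT = 14400  # 4h, kanban.dispatch_stale_timeout_seconds default
HEARTBEAT_FRESHNESS = 3600     # "without a heartbeat in the last hour"


def is_stale(
    started_at: float,
    last_heartbeat_at: Optional[float],
    now: Optional[float] = None,
    timeout: int = DEFAULT_STALE_TIMEOUT,
) -> bool:
    """True when a running task should be reclaimed to ready."""
    if timeout == 0:  # 0 disables stale detection
        return False
    now = time.time() if now is None else now
    if now - started_at <= timeout:
        return False  # still within its allowed runtime
    # Assumption: a missing heartbeat counts as not fresh.
    return last_heartbeat_at is None or now - last_heartbeat_at > HEARTBEAT_FRESHNESS
```

Requiring both conditions means a long-running worker that keeps heartbeating is never reclaimed, however old it is.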
* feat(kanban): add respawn guard to block repeat worker storms

Salvages #27484 by @fardoche6. Adds a respawn guard that skips worker spawn for tasks where:
- a recent run already succeeded (recent_success — within guard window)
- the previous run hit a quota/auth error (blocker_auth, also auto-blocks)
- a recent task comment includes a GitHub PR URL (active_pr)

The guard prevents repeat worker storms on the same bug/task. Includes the contributor's review-findings fixup (regex hardening, observability, auth coverage).

Resolved a small DispatchResult conflict alongside main's 'stale' field; kept both. Authorship preserved via rebase merge.

* feat(kanban): show dashboard cron jobs across profiles

Salvages #27568 by @SerenityTn. Dashboard cron page now lists cron jobs from all profiles, with profile-aware filter UI and storage routing. Includes test coverage for cross-profile listing, mutation, deletion, and validation.

Also fixes orphan conflict markers in config.py left by an earlier salvage merge (kanban.dispatch_stale_timeout_seconds was double-nested in HEAD/PR markers from #28452 salvage of #23790).

* fix(kanban): remove orphan conflict markers from config.py (#28458)

PR #28452 (salvage of #23790, stale detection) merged with leftover git conflict markers in hermes_cli/config.py around the `dispatch_stale_timeout_seconds` config block, breaking config import and any code path that loads it.

Cleans up the markers and keeps both config blocks (worker log rotation/orchestrator + stale detection). Resolves a self-introduced regression.
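A hedged sketch of the three respawn-guard checks from #27484. The auth markers, the PR-URL regex, the 30-minute default window, the check order, and the `respawn_guard_reason` name are all assumptions; the PR's hardened regex and guard window may differ.

```python
import re
from typing import Optional, Sequence

# Assumed PR-URL shape; not the PR's actual regex.
GITHUB_PR_URL = re.compile(r"https://github\.com/[\w.-]+/[\w.-]+/pull/\d+")
# Assumed substrings that indicate a quota/auth failure.
AUTH_MARKERS = ("quota", "rate limit", "401", "403", "unauthorized")


def respawn_guard_reason(
    last_success_at: Optional[float],
    last_error: Optional[str],
    recent_comments: Sequence[str],
    now: float,
    window_seconds: float = 1800.0,
) -> Optional[str]:
    """Return why a spawn should be skipped, or None to allow it."""
    if last_success_at is not None and now - last_success_at < window_seconds:
        return "recent_success"
    if last_error and any(m in last_error.lower() for m in AUTH_MARKERS):
        return "blocker_auth"  # the real guard also auto-blocks the task
    if any(GITHUB_PR_URL.search(c) for c in recent_comments):
        return "active_pr"
    return None
```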
* fix(kanban): remove orphan conflict markers from kanban.py (#28459)

PR #28454 (salvage of #26745, workflow filter) merged with leftover git conflict markers in hermes_cli/kanban.py at three sites:
- _task_to_dict() (session_id alongside workflow_template_id/current_step_key)
- p_list parser (--sort alongside --workflow-template-id/--step-key)
- _cmd_list (order_by alongside the new filter kwargs)

Cleans up the markers and keeps both halves at each site. Resolves a self-introduced regression.

* feat(kanban): configure worktree paths and branches

Salvages #26496 by @aqilaziz. Adds branch_name column + CLI flag so tasks with workspace_kind='worktree' can pin a target branch on create. Schema migration added to _migrate_add_optional_columns.

- Task.branch_name field + DB column + migration
- create_task accepts branch_name kwarg
- hermes kanban create --branch <name> flag
- kanban show output includes 'Branch: <name>' when set

Cherry-picked the substantive commit (a7558cf27); the PR's tip was an unrelated service-path-dirs commit. Resolved 2 INSERT-column-list and show-output conflicts alongside main's session_id and max_runtime_seconds additions; kept all three.

* feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373)

Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that group several skills under one slash command. Invoking /<bundle-name> from any surface (CLI, TUI, dashboard, any gateway platform) loads every referenced skill into a single combined user message.

Use cases:
- /backend-dev → loads github-code-review + test-driven-development + github-pr-workflow as one bundle.
- /research → loads several research skills together.
- Team task profiles shared via dotfiles.

Behavior:
- Bundles take precedence over individual skills when slugs collide.
- Missing skills are skipped with a note, not fatal.
- No system-prompt mutation — bundles generate a fresh user message at invocation time, the same way /<skill> does. Prompt cache stays intact.
- Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI), /help display.

Schema (~/.hermes/skill-bundles/<slug>.yaml):

    name: backend-dev
    description: Backend feature work.
    skills:
      - github-code-review
      - test-driven-development
    instruction: |
      Optional extra guidance prepended to the loaded skills.

New module: agent/skill_bundles.py — load, scan, resolve, build invocation message, save, delete. yaml.safe_load only; broken bundles log a warning and are skipped, never raise.

New CLI subcommand: hermes bundles {list,show,create,delete,reload}. Implementation in hermes_cli/bundles.py; wired in hermes_cli/main.py. 'bundles' added to _BUILTIN_SUBCOMMANDS so plugin discovery skips it.

New in-session slash command: /bundles lists installed bundles in both CLI and gateway.

/<bundle-name> dispatch added to CLI (cli.py) and gateway (gateway/run.py) before the existing /<skill-name> path.

Autocomplete: SlashCommandCompleter gained an optional skill_bundles_provider parameter that defaults to None — the prompt shows '▣ <description> (N skills)' for bundles vs '⚡' for skills.

Tests:
- tests/agent/test_skill_bundles.py — 33 tests covering slugify, scan/cache freshness, resolve (including underscore→hyphen Telegram alias), build_bundle_invocation_message (loading, missing skills, user/bundle instruction injection, dedup), save/delete, reload diff, list sort.
- tests/hermes_cli/test_bundles.py — 8 tests for the CLI subcommand (create/list/show/delete/reload, --force, missing bundle errors).
- tests/gateway/test_bundles_command.py — 4 tests for the gateway handler and bundle resolution priority.

Live E2E: verified subprocess invocations of hermes bundles {list,create,show,reload,delete} round-trip correctly against an isolated HERMES_HOME.

Docs:
- website/docs/user-guide/features/skills.md — new 'Skill Bundles' section with quick example, YAML schema, management commands, behavior notes.
- website/docs/reference/cli-commands.md — 'hermes bundles' added to the top-level command table and given its own subcommand section.

* feat(kanban): add scheduled status for delayed follow-ups

Salvages #24533 by @roycepersonalassistant. Adds a first-class 'scheduled' Kanban status for time-delay follow-ups that aren't waiting on human input.

- hermes kanban schedule <task_id> [reason] CLI command
- Dashboard/API transitions to/from Scheduled
- unblock_task() now releases both 'blocked' AND 'scheduled' tasks (re-checking parent dependencies before moving to ready/todo)
- i18n + docs updates

Resolved conflicts: kept HEAD's failure-counter reset on unblock alongside the PR's scheduled state, kept HEAD's 'running' direct-set rejection, combined both bulk-status branches. Dropped the dist/ bundle changes (months-stale; would need rebuild from source).

* feat(kanban): drag-to-delete trash zone + bulk delete for task cards

Salvages #28125 by @Jpalmer95. Adds:
- Drag-to-delete trash zone in the kanban dashboard
- Bulk delete endpoint with cascading delete_task cleanup
- Frontend updates (drag visual + drop handler)
- Confirmation prompt before delete

Resolved end-of-file test conflict by appending both halves.

* docs: add Korean Kanban documentation

Salvages #21823 by @pochi-gio. Adds Korean (ko) Docusaurus locale and translates Kanban documentation (kanban.md, kanban-tutorial.md) and the two related skills (devops-kanban-orchestrator, devops-kanban-worker). Purely additive — adds ko to the locales list in docusaurus.config.ts and creates the website/i18n/ko/ tree.
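The skill-bundle resolution rules listed for #28373 (bundles win on slug collisions, underscore-to-hyphen Telegram alias, missing skills skipped with a note, dedup) can be sketched without YAML. `Bundle`, `slugify`, `resolve_command`, and `build_bundle_message` here are hypothetical stand-ins for agent/skill_bundles.py, and the message layout is invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Bundle:
    """Hypothetical in-memory form of a skill-bundles/<slug>.yaml file."""
    name: str
    skills: List[str]
    description: str = ""
    instruction: str = ""


def slugify(name: str) -> str:
    # Map underscores/whitespace to hyphens so a Telegram-style /backend_dev
    # resolves to backend-dev, then drop any other punctuation.
    slug = re.sub(r"[\s_]+", "-", name.strip().lower())
    return re.sub(r"[^a-z0-9-]", "", slug).strip("-")


def resolve_command(
    command: str, bundles: Dict[str, Bundle], skills: Dict[str, str]
) -> Tuple[str, Optional[object]]:
    slug = slugify(command.lstrip("/"))
    if slug in bundles:  # bundles take precedence on slug collisions
        return "bundle", bundles[slug]
    if slug in skills:
        return "skill", skills[slug]
    return "unknown", None


def build_bundle_message(bundle: Bundle, skills: Dict[str, str]) -> str:
    parts = [bundle.instruction.strip()] if bundle.instruction.strip() else []
    seen = set()
    for name in bundle.skills:
        if name in seen:
            continue  # dedup repeated entries
        seen.add(name)
        body = skills.get(name)
        if body is None:
            parts.append(f"[note: skill '{name}' not found, skipped]")  # not fatal
        else:
            parts.append(f"## {name}\n{body}")
    return "\n\n".join(parts)
```

Because the combined text is built as a fresh user message at invocation time, nothing here touches the system prompt, which matches the prompt-cache note in the commit.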
* fix(tests): catch up six stale tests after compression/aux/kanban changes (#28465)

- aux_config: drop session_search from _AUX_TASKS and remove stale test (PR #27590 removed auxiliary.session_search from DEFAULT_CONFIG)
- compression_boundary_hook: set compressor._last_compress_aborted=False on MagicMock so the post-compress abort branch (PR #28117) doesn't short-circuit before the session-id rotation under test
- kanban_dashboard_plugin: use consecutive_failures=3 so severity stays 'error' (failure_threshold default dropped from 3 to 2 in d9fef0c8a, so failures=5 now crosses the critical floor of 2*2=4)
- cli_manual_compress: accept force kwarg on DummyAgent._compress_context (cli._manual_compress now passes force=True)

* fix(telegram): render full clarify choice text in message body, use short button labels

When Telegram clarify prompts offer long choices, mobile clients truncate the inline button labels, making options unreadable. Previously only the question was shown in the message body with truncated choice text in button labels.

Fix: append the full numbered option list to the message body so users can read complete choice text on any client. Buttons now use short numeric labels (1, 2, ...) to avoid Telegram truncation. The 'Other (type answer)' button is unchanged.

Long choice labels are now rendered in full (not truncated to 57 chars + '...') since they appear in the body instead of button labels.

Closes: #27497

* chore(release): map @asdlem for PR #27852 salvage

* fix(telegram): default streaming transport to edit

* fix(telegram): respect reply_to_mode for DM topic reply fallback

The DM topic reply fallback code in send() hardcoded should_thread=True when telegram_dm_topic_reply_fallback metadata was present, bypassing _should_thread_reply() and ignoring reply_to_mode config. This caused quote bubbles on every response even with reply_to_mode: 'off'.
Fix:
- Add reply_to_mode param to _reply_to_message_id_for_send() and _thread_kwargs_for_send() classmethods
- In send(), check self._reply_to_mode != 'off' for DM topic fallback
- Suppress reply anchor and reply_to_message_id when mode is 'off' while preserving message_thread_id for correct topic routing
- Thread reply_to_mode through all 29 call sites

Regression coverage: 10 new tests in test_telegram_reply_mode.py covering classmethod behavior, send() integration, and backward compatibility.

Fixes reply_to_mode: 'off' ignored by Telegram DM topic reply fallback code #23994

* fix(gateway): route Telegram audio file attachments away from STT pipeline (#24870)

Telegram distinguishes three kinds of audio payloads:
- message.voice → Opus/OGG voice messages → STT pipeline ✓
- message.audio → audio file attachments → bypasses STT ← was broken
- message.document (audio mime) → generic file route

**Root cause** — the inbound message routing block in gateway/run.py matched both MessageType.VOICE *and* MessageType.AUDIO into audio_paths, which were then fed unconditionally to _enrich_message_with_transcription. Audio file attachments (.mp3, .m4a, etc.) were therefore auto-transcribed instead of being treated as files, making the transcribe skill unusable from Telegram because the path it needed was never surfaced.

**Fix**
- Introduce a new audio_file_paths list populated exclusively by MessageType.AUDIO events.
- Narrow the audio_paths selector to MessageType.VOICE (and bare audio/ mime-type events that are not explicitly AUDIO or DOCUMENT).
- After the STT block, inject a document-style context note for each audio_file_path, giving the agent the file path and asking what to do with it (consistent with how plain documents are handled).
**Tests** — 5 new tests in test_telegram_audio_vs_voice.py:
- voice message still transcribed (regression guard)
- audio attachment skips STT (core fix)
- audio attachment context note format
- STT disabled still produces file note (not STT-disabled notice)
- MessageType.AUDIO != MessageType.VOICE sanity check

Fixes #24870

* chore(release): map bartok9 noreply for PR #24879 salvage

* fix(send_message): route standalone Telegram sends through TELEGRAM_PROXY

When the send_message tool runs outside the gateway process (agent loop, TUI, cron, etc.), _gateway_runner_ref() returns None and the standalone path in _send_telegram constructs Bot(token=token) directly, bypassing any configured proxy. In regions where api.telegram.org is blocked, the send times out after ~5s with 'Telegram send failed: Timed out' and nothing ever shows up in gateway.log because the request never reaches the gateway.

Resolve TELEGRAM_PROXY (via gateway.platforms.base.resolve_proxy_url, which also honours HTTPS_PROXY/HTTP_PROXY/ALL_PROXY and NO_PROXY) just before constructing the Bot. When a proxy is found, attach an HTTPXRequest(proxy=...) for both 'request' and 'get_updates_request', matching what gateway/platforms/telegram.py already does for in-gateway sends and what the Discord standalone sender already does. Any exception attaching the proxy falls back cleanly to a direct connection, preserving prior behaviour for users without a proxy configured.

Adds tests/tools/test_send_message_telegram_proxy.py covering both the proxy-configured and no-proxy cases.

* chore(release): map @pepelax for PR #25419 salvage

* fix(kanban-dashboard): restore implementations dropped during salvages (#28481)

Four kanban dashboard test failures, all from PR salvages that picked up the test additions but dropped the corresponding implementations.
- BOARD_COLUMNS: add 'review' (status added by PR f55d94a1e but the board API never grew the column → test_board_empty failed because VALID_STATUSES - {archived} mismatched the rendered columns).
- update_task: enrich the 'ready' 409 detail with the blocking parent list (id, title, status) and add _parents_blocking_ready helper. Implementation lost in the #26744 salvage (commit e215558ba) which pinned the test but not the server-side code.
- dist/index.js: add parseApiErrorMessage helper, wire it through the drag/drop banner, add patchErr state to the TaskDrawer and surface it inline by the action row. Lost in the same #26744 salvage.
- test_diagnostics_endpoint_severity_filter: update to at-or-above semantics (PR a94ddd807 changed the filter from exact-match so the warning filter now correctly includes error+critical too).

* fix(gateway): roll over Telegram tool progress bubbles

* fix(gateway): scope audio_file_paths outside media_urls guard

The audio-file-paths handling block at line 7334 references the variable unconditionally, but #24879 initialized it inside the 'if event.media_urls' block — so events without media_urls hit UnboundLocalError.

Found via test_run_agent_queued_message_does_not_treat_commentary_as_final after PR #28478 landed.

* fix(gateway): keep tool-progress edits alive after Telegram flood control

When a progress-message edit hits Telegram flood control (RetryAfter), can_edit was unconditionally set to False, permanently disabling coalescing for the rest of the run. Subsequent tool updates were posted as separate new messages instead of updating the existing progress bubble.

Fix: only set can_edit=False for non-recoverable edit errors. On flood control, back off by resetting _last_edit_ts so the throttle interval is respected before the next edit attempt.
Fixes #25188

* chore(release): map @erhnysr for PR #25198 salvage

* fix(telegram): preserve can_edit after transient network errors in progress edits (#27828)

When edit_message_text fails with a transient error (httpx.ConnectError, NetworkError, server disconnected, timeouts), the progress-message sender must not permanently set can_edit = False — that would convert a single Telegram network hiccup into separate per-tool bubbles for the rest of the run.

Changes:
- gateway/platforms/telegram.py: edit_message now returns retryable=True for transient network errors (ConnectError, NetworkError, timeouts, server disconnects, temporarily unavailable). Permanent failures (flood control, message-not-found, permissions) remain retryable=False.
- gateway/run.py: send_progress_messages checks result.retryable before setting can_edit = False. Transient failures skip the fallback-send and continue — the next edit cycle catches up with the accumulated lines. Permanent failures (flood, message-not-found, etc.) still disable editing.

Tests: 22 new tests in test_telegram_progress_edit_transient.py covering transient vs permanent error classification, SendResult.retryable semantics, and the can_edit decision logic.

Fixes #27828

* fix(telegram): recover from post-update polling conflict without entering limbo

* fix(test+release): update conflict retry count for MAX=5; map @CryptoByz

* fix(gateway): route background-process notifications into Telegram DM topics

Background-process completion notifications (notify_on_complete) and watch-pattern notifications were always delivered to the Telegram main chat instead of the originating private-chat topic.

Hermes-created Telegram DM topic lanes only render a send when it carries both message_thread_id and a reply anchor.
  The synthetic MessageEvent injected on process completion had no message_id, so _reply_anchor_for_event returned None and _thread_kwargs_for_send dropped message_thread_id entirely — routing the notification to the main chat.
  Capture the triggering message id at spawn time and thread it through to the synthetic event so it can be reply-anchored back into the topic:
  - session_context: add HERMES_SESSION_MESSAGE_ID context var
  - telegram adapter: populate SessionSource.message_id on inbound messages
  - terminal tool: persist watcher_message_id on the process session
  - process registry: carry/persist message_id on watcher dicts + checkpoint
  - gateway: set MessageEvent.message_id on injected notifications
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(release): map @fabiosiqueira for PR #27212 salvage
* fix(telegram): route resumed DM topic sends directly
* fix(telegram): enforce TELEGRAM_ALLOWED_USERS allowlist on inbound messages
  TELEGRAM_ALLOWED_USERS was only checked for callback/inline-button actions but not for inbound messages. Unauthorized users triggered an 'Unauthorized user' log warning but their messages were still processed by the agent — a P0 security bypass (issue #23778).
  Fix: add allowlist check in _should_process_message() which is called for all message types (text, command, media, location). If the sender is not in TELEGRAM_ALLOWED_USERS, the message is dropped immediately with a warning log. Empty TELEGRAM_ALLOWED_USERS continues to allow all users (existing behavior).
  Fixes #23778
* fix(telegram): fail-closed auth fallback when TELEGRAM_ALLOWED_USERS is empty
  The _is_callback_user_authorized fallback returned True when TELEGRAM_ALLOWED_USERS was not set, allowing any Telegram user to interact with the bot. Change to fail-closed: deny by default unless GATEWAY_ALLOW_ALL_USERS=true is explicitly set.
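A minimal sketch of the fail-closed fallback, assuming TELEGRAM_ALLOWED_USERS is a comma-separated list of numeric user ids. The function names, and the truthy spellings accepted for GATEWAY_ALLOW_ALL_USERS beyond `true`, are assumptions.

```python
import os
from typing import Mapping, Optional

# Assumed truthy spellings; the commit only mentions GATEWAY_ALLOW_ALL_USERS=true.
TRUTHY = {"1", "true", "yes", "on"}


def _is_truthy(value: Optional[str]) -> bool:
    return (value or "").strip().lower() in TRUTHY


def is_user_authorized(user_id: int, env: Mapping[str, str] = os.environ) -> bool:
    """Fail-closed check: an empty allowlist denies unless explicitly opened up."""
    raw = env.get("TELEGRAM_ALLOWED_USERS", "").strip()
    if not raw:
        # No allowlist configured: deny unless the operator opted in to "everyone".
        return _is_truthy(env.get("GATEWAY_ALLOW_ALL_USERS"))
    allowed = {item.strip() for item in raw.split(",") if item.strip()}
    return str(user_id) in allowed
```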
  Fixes #24457
* test(telegram): stub _is_callback_user_authorized in trigger-gating fixture
  After PR #24468 made the empty-allowlist callback auth fail-closed (and #23795 wired _is_callback_user_authorized into _should_process_message), trigger-gating tests started failing because their fake messages from user 111 hit the new deny-by-default path before trigger evaluation. Force-authorize all senders in _make_adapter() so the trigger logic under test runs. The fail-closed behavior itself is covered by test_telegram_callback_auth_fail_closed.py.
* fix(telegram): reset sticky fallback IP on connect failure, retry primary DNS
  When a sticky fallback IP (from DoH discovery) becomes unreachable, the transport previously got stuck in an attempt_order that only tried the dead IP. This prevented the gateway from recovering until the service was restarted.
  Changes:
  - Always include primary DNS path (None) after the sticky IP in the attempt_order so that a primary-path retry happens on sticky failure.
  - Reset self._sticky_ip to None when the currently sticky IP hits a connect timeout / connect error, allowing the next request to retry from scratch.
  Fixes silent Telegram disconnection when discovered fallback IPs are transiently or permanently unreachable.
* test+release: align stale sticky-IP test for #24511; map @falconexe
* fix(telegram): propagate extra base_url config
* feat(send_message): auto-detect @username mentions and create Telegram entities
  When sending messages containing @username patterns, auto-generate MessageEntity(type='mention') entries so that the receiving bot's require_mention filter can trigger. This enables proper bot-to-bot interop where mention-based routing is used.
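A sketch of the sticky-IP attempt order and reset rule, assuming `None` in the list means "resolve via primary DNS" (as the commit describes) and that any other discovered fallbacks are tried after the primary path. `attempt_order` and `StickyIpState` are hypothetical names; the addresses in the usage are documentation IPs.

```python
from typing import List, Optional

PRIMARY_DNS = None  # None stands for "resolve the hostname normally"


def attempt_order(sticky_ip: Optional[str], fallback_ips: List[str]) -> List[Optional[str]]:
    """Sticky IP first (if any), then always the primary DNS path, then other fallbacks."""
    if sticky_ip is None:
        return [PRIMARY_DNS, *fallback_ips]
    others = [ip for ip in fallback_ips if ip != sticky_ip]
    return [sticky_ip, PRIMARY_DNS, *others]


class StickyIpState:
    """Tracks the sticky fallback IP and drops it when it stops connecting."""

    def __init__(self) -> None:
        self.sticky_ip: Optional[str] = None

    def on_success(self, ip: Optional[str]) -> None:
        # A fallback IP that worked becomes sticky; primary DNS success leaves state alone.
        if ip is not PRIMARY_DNS:
            self.sticky_ip = ip

    def on_connect_failure(self, ip: Optional[str]) -> None:
        # Only reset when the IP that failed is the one we were sticking to.
        if ip is not None and ip == self.sticky_ip:
            self.sticky_ip = None
```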
* test+release: align send_message mocks for MessageEntity import; map @fonhal
* fix(telegram): resume typing indicator after inline approval click (#27853)
  The text /approve and /deny paths in gateway/run.py call resume_typing_for_chat() after resolve_gateway_approval() succeeds, but the Telegram inline-button (ea:*) callback in _handle_callback_query did not. Typing is paused when the approval is sent (gateway/run.py:15658), so without a matching resume the typing indicator stayed gone for the remainder of a long-running turn after a button click.
  Symmetry-match the text path: after a successful resolve, call self.resume_typing_for_chat(str(query_chat_id)). Guarded by count > 0 to match /approve's "if not count" early-return — if nothing was actually resolved, the agent thread was never unblocked, so typing should remain paused.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(gateway): mark final voice reply as notify-worthy so Telegram delivers it audibly
  In Telegram "important" notifications mode (default), TelegramPlatformAdapter sets ``disable_notification=True`` on every send unless metadata carries ``notify=True``. GatewayRunner._send_voice_reply already passes thread metadata through to ``adapter.send_voice``, but never marks the final auto-TTS voice reply as notify-worthy — so users with the default mode get the final voice note delivered silently with no push notification.
  Mirror the final-text path in gateway/platforms/base.py (the existing text-response final send already adds ``metadata["notify"] = True``).
  Issue #27970 Bug 2. Bug 1 (MP3 vs. native OGG voice-note) is being addressed by existing PRs #20182 / #20878 — this PR is intentionally scoped to the silent-delivery bug only.
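A minimal sketch of the notification rule described in the voice-reply fix, assuming the mode strings are `important` and `all` and that `all` never silences a send. `send_options` and `final_reply_metadata` are hypothetical helpers, not adapter methods.

```python
from typing import Any, Dict, Optional


def send_options(notifications_mode: str, metadata: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """In 'important' mode, sends are silent unless metadata carries notify=True."""
    metadata = metadata or {}
    silent = notifications_mode == "important" and not metadata.get("notify")
    return {"disable_notification": silent}


def final_reply_metadata(thread_metadata: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Copy the thread metadata and flag the final reply as notify-worthy."""
    metadata = dict(thread_metadata or {})
    metadata["notify"] = True
    return metadata
```

Copying the metadata keeps the thread routing intact while only the final reply gains the notify flag.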
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: avoid Telegram group reply thread session splits
* chore(release): map @eliteworkstation94-ai for PR #28157 salvage
* fix(gateway): avoid duplicate Telegram text after auto-TTS voice replies
* chore(release): map @Zyrixtrex for PR #26754 salvage
* fix(telegram): escape send_slash_confirm preview with format_message
  send_slash_confirm() sent the raw command preview with ParseMode.MARKDOWN, skipping the format_message() conversion applied to every other dynamic send in the adapter. Commands with underscores, dots, brackets, or other MarkdownV2-sensitive characters raised BadRequest: Can't parse entities; the exception was swallowed by the outer try/except, so the confirmation prompt silently never appeared.
  Fix: wrap preview through format_message() and switch to MARKDOWN_V2, symmetric with send_update_prompt and the callback sends fixed in a69404052.
* chore(release): map @nftpoetrist for PR #25856 salvage
* fix(telegram): retry wrapped connect timeouts
* chore(release): map @samahn0601 for PR #27887 salvage
* fix(tts): keep native audio outside Telegram voice delivery
* chore(release): map @aqilaziz for PR #26406 salvage
* fix(gateway): pin Telegram DM-topic routing to user's current topic
  Topic-mode DM replies were fragmenting one conversation across many sessions: a Reply on a message in another topic delivered Telegram's message_thread_id for *that* topic, and #3206's strip routed plain replies to the lobby. Both pulled the user away from their current session.
  Fix: when topic mode is on, rewrite source.thread_id to the user's most-recent binding if the inbound id is missing/General or not a known topic. Non-topic-mode users unchanged.
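For context on why the raw slash-confirm preview broke: MarkdownV2 requires a backslash before each reserved character outside entities. A minimal escaping sketch follows; the character set is the one the Telegram Bot API lists for MarkdownV2, plus the backslash itself. `escape_markdown_v2` is a hypothetical stand-in, and the adapter's `format_message()` likely does more than escape (it also converts markdown).

```python
import re

# Characters the Telegram Bot API says must be escaped in MarkdownV2 text
# outside entities, plus the backslash itself.
MARKDOWN_V2_SPECIAL = "_*[]()~`>#+-=|{}.!\\"
_SPECIAL_RE = re.compile("[" + re.escape(MARKDOWN_V2_SPECIAL) + "]")


def escape_markdown_v2(text: str) -> str:
    """Prefix every MarkdownV2-reserved character with a backslash (single pass)."""
    return _SPECIAL_RE.sub(lambda m: "\\" + m.group(0), text)
```

A single regex pass avoids double-escaping the backslashes it inserts, which a chain of `str.replace` calls would get wrong unless the backslash is handled first.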
* chore(release): map @karthikeyann for PR #26609 salvage
* fix(send_message): add thread-not-found retry for Telegram forum topic sends
  The standalone _send_telegram path in send_message_tool lacked the thread-not-found fallback that the gateway adapter has. When a forum topic thread_id was stale or deleted, the send would fail entirely instead of retrying to the General topic.
  Changes:
  - Add _is_telegram_thread_not_found() helper matching gateway adapter
  - Add thread-not-found retry in text send path
  - Add thread-not-found retry in media send path (with f.seek(0))
  - Separate text_kwargs from thread_kwargs to prevent disable_web_page_preview leaking into send_photo/send_video calls
  Closes #27012
* test(send_message): add thread-not-found retry tests for Telegram forum topics
  Adds two tests to TestSendTelegramThreadIdMapping:
  - test_thread_not_found_retries_without_message_thread_id
  - test_thread_not_found_for_media_retries_without_message_thread_id
  Refs #27012
* test(send_message): add thread-not-found retry tests for Telegram topics
  Three tests covering the #27012 fix:
  - test_is_thread_not_found_matches_expected_errors
  - test_text_send_retries_without_thread_id_on_thread_not_found
  - test_disable_web_page_preview_not_leaked_to_media_sends
  116/116 existing tests still pass (no regressions).
* chore(release): map @kunci115 for PR #27098 salvage
* fix(gateway): register Telegram commands for groups
  Register Telegram bot commands across default, private, and group scopes so the slash-command menu is available outside DMs.
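A sketch of the thread-not-found retry from #27012, assuming the error surfaces as a `BadRequest` whose text contains `message thread not found`. `BadRequest` here is a local stand-in class and `send_with_thread_fallback` is a hypothetical helper; `send` can be any awaitable send call that takes keyword arguments.

```python
import asyncio
from typing import Any, Awaitable, Callable


class BadRequest(Exception):
    """Stand-in for the Telegram client's BadRequest error (hypothetical here)."""


def is_thread_not_found(exc: Exception) -> bool:
    # Assumed error text for a stale or deleted forum topic.
    return "message thread not found" in str(exc).lower()


async def send_with_thread_fallback(
    send: Callable[..., Awaitable[Any]], **kwargs: Any
) -> Any:
    """Try the forum topic first; on thread-not-found, retry without message_thread_id."""
    try:
        return await send(**kwargs)
    except BadRequest as exc:
        if "message_thread_id" not in kwargs or not is_thread_not_found(exc):
            raise
        retry_kwargs = {k: v for k, v in kwargs.items() if k != "message_thread_id"}
        return await send(**retry_kwargs)


def demo() -> tuple:
    """Run the fallback against a fake sender that rejects any thread id."""
    calls = []

    async def fake_send(**kw: Any) -> str:
        calls.append(dict(kw))
        if "message_thread_id" in kw:
            raise BadRequest("Bad Request: message thread not found")
        return "sent"

    result = asyncio.run(
        send_with_thread_fallback(fake_send, chat_id=1, text="hi", message_thread_id=99)
    )
    return result, calls
```

For media sends, the file object has to be rewound (the commit's `f.seek(0)`) before the retry, since the first attempt may already have consumed it.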
  Changes from review feedback:
  - Add asyncio.Lock to prevent race condition in _ensure_forum_commands
  - Extract MAX_COMMANDS_PER_SCOPE constant (30) to avoid magic number
  - Upgrade error logging from debug->warning in forum registration
  - Add tests covering lazy forum registration and concurrent safety
  - Remove /start handler from this PR (separate feature)
  Fixes review: needs_work (race, magic number, log levels, missing tests)
* test+release: fix test fixture for forum_commands; map @chromalinx
* fix(telegram): gate profile bots by allowed topics
* chore(release): map @booker1207 for PR #25132 salvage
* fix(cron): route Telegram cron deliveries to a dedicated topic via TELEGRAM_CRON_THREAD_ID
  When Telegram topic mode is enabled, cron messages delivered to the bot's root DM (TELEGRAM_HOME_CHANNEL without a thread id) land in the system lobby — replies there are rebuffed with the lobby reminder and reply_to_message_id is dropped, so users cannot interact with the cron output (#24409).
  Add an optional TELEGRAM_CRON_THREAD_ID env var that overrides TELEGRAM_HOME_CHANNEL_THREAD_ID for cron deliveries only. Operators can create a "Cron" forum topic in the DM, point this var at its thread id, and replies to cron messages will land in that topic's existing session instead of the lobby. The home-channel thread id (used elsewhere, e.g. restart notifications) is unchanged, and explicit deliver="telegram:chat:thread" targets continue to win over the env var.
  Per the reporter's clarification on 2026-05-13, option (a) (cron-side route to a dedicated topic + config knob) was chosen.
  Fixes #24409
* fix(telegram): route image documents (.png/.jpg/.webp/.gif) through vision pipeline
  When users send images as documents (Telegram file picker), they were rejected with "Unsupported document type" because SUPPORTED_DOCUMENT_TYPES only includes text/office formats. Add SUPPORTED_IMAGE_DOCUMENT_TYPES to base.py and handle them in telegram.py before the document check.
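A minimal sketch of the cron thread precedence from the TELEGRAM_CRON_THREAD_ID fix: an explicit `telegram:chat:thread` target wins, then TELEGRAM_CRON_THREAD_ID, then TELEGRAM_HOME_CHANNEL_THREAD_ID. The target parsing, the helper names, and the assumption that the env fallback also applies when a target names a chat but no thread are illustrative, not taken from the commit.

```python
import os
from typing import Mapping, Optional


def explicit_thread(deliver: str) -> Optional[str]:
    """Return the thread part of a 'telegram:chat:thread' target, if present."""
    parts = deliver.split(":")
    return parts[2] if len(parts) > 2 and parts[2] else None


def resolve_cron_thread_id(deliver: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Explicit target > TELEGRAM_CRON_THREAD_ID > TELEGRAM_HOME_CHANNEL_THREAD_ID."""
    thread = explicit_thread(deliver)
    if thread:
        return thread
    return (
        env.get("TELEGRAM_CRON_THREAD_ID")
        or env.get("TELEGRAM_HOME_CHANNEL_THREAD_ID")
        or None
    )
```

Only cron deliveries would call this resolver; other home-channel sends such as restart notifications keep reading TELEGRAM_HOME_CHANNEL_THREAD_ID directly.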
  - Add SUPPORTED_IMAGE_DOCUMENT_TYPES constant to base.py
  - Add MIME reverse-lookup for image types in telegram.py
  - Route image documents through cache_image_from_bytes + vision pipeline
  - Handle media groups for image documents
  Closes: #20128, #18620
* test+release: stub auth in test_telegram_documents fixture; map @kiranvk-2011
* fix(gateway): prevent Windows Telegram /restart leaving gateway stopped
* chore(release): map @rak135 for PR #25960 salvage
* fix(telegram): preserve topic metadata on overflow edits
* feat(telegram): add disable_topic_auto_rename gateway flag
  When Hermes auto-titles a session in a Telegram DM topic it currently renames the topic itself to the generated title. That works for operator-managed lanes (extra.dm_topics) but is disruptive for ad-hoc Threaded-Mode topics that users name by hand — every first exchange overwrites their chosen title.
  Add gateway.platforms.telegram.extra.disable_topic_auto_rename (default False, preserving prior behaviour). When set, both _schedule_telegram_topic_title_rename and the underlying _rename_telegram_topic_for_session_title short-circuit before touching the Telegram API. Internal session titles (sessions list, TUI) keep working unchanged.
  Also bridge the legacy top-level telegram.disable_topic_auto_rename key through to gateway.platforms.telegram.extra so users on the older config layout don't have to migrate to enable it.
  - Tests cover the runtime flag, the scheduling entry-point, and string truthiness coercion for YAML-loaded values.
  - Docs updated in messaging/telegram.md with an example block.
* chore(release): map @B0Tch1 for PR #27634 salvage
* fix(gateway): restore Telegram DM topic thread_id after session split (#27166)
  When context compression triggers a mid-turn session split, source.thread_id can be None on synthetic/recovered events.
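A sketch of the image-document routing decision, checking the file extension first and falling back to a MIME reverse lookup, assuming a mapping shaped like `SUPPORTED_IMAGE_DOCUMENT_TYPES` (extension to MIME type). The `.jpeg` entry and the helper name are assumptions; the commit lists .png/.jpg/.webp/.gif.

```python
import os
from typing import Dict, Optional

# Hypothetical mirror of SUPPORTED_IMAGE_DOCUMENT_TYPES: extension -> MIME type.
SUPPORTED_IMAGE_DOCUMENT_TYPES: Dict[str, str] = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}

# Reverse lookup (MIME type -> preferred extension); the first entry wins.
EXT_BY_MIME: Dict[str, str] = {}
for _ext, _mime in SUPPORTED_IMAGE_DOCUMENT_TYPES.items():
    EXT_BY_MIME.setdefault(_mime, _ext)


def image_document_extension(file_name: Optional[str], mime_type: Optional[str]) -> Optional[str]:
    """Return an image extension if the document should take the vision path, else None."""
    ext = os.path.splitext(file_name or "")[1].lower()
    if ext in SUPPORTED_IMAGE_DOCUMENT_TYPES:
        return ext
    return EXT_BY_MIME.get((mime_type or "").lower())
```

Running this check before the regular document-type check is what keeps image files from hitting the "Unsupported document type" reply.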
  _thread_metadata_for_source then returns None, causing the Telegram adapter to send with no message_thread_id, so the response lands in the General thread instead of the active DM topic.
  Fix:
  - hermes_state.py: Add get_telegram_topic_binding_by_session() for reverse lookup by session_id (enabled by the existing UNIQUE INDEX on session_id).
  - gateway/run.py: After session-split detection, if source is a Telegram DM and source.thread_id is None, recover it from the binding via the new method so _thread_metadata_for_source produces the correct thread routing.
  - tests/: Coverage for the new lookup method and the recovery flow.
* chore(release): map @jackjin1997 for PR #27239 salvage
* fix(gateway): allow chat-scoped telegram auth without sender user_id
* chore(release): map @soynchux for PR #27806 salvage
* fix(telegram): add DM topic typing fallback when message_thread_id rejected
  When a DM topic lane's message_thread_id is rejected by Telegram (e.g. stale or deleted topic), send_typing now falls back to sending the typing indicator without thread_id so it at least appears in the main DM view, rather than being silently swallowed. Also adds a test for the fallback behavior.
* fix(telegram): report cron topic fallback
* chore(release): map @el-analista for PR #25368 salvage
* fix(telegram): wire gt: callback dispatch for gmail-triage buttons
  The gmail-triage skill's Telegram inline buttons emit callback_data of the form `gt:<verb>:<arg>`, but `_handle_callback_query` had no `gt:` branch — taps fell through silently and the spinner sat there until Telegram timed it out.
  Add `_handle_gmail_triage_callback`, dispatched from the existing callback router, that:
  - Authorizes the caller via the same `_is_callback_user_authorized` path as the approval / slash-confirm / clarify handlers.
  - Maps each verb to a script under `~/.hermes/scripts/gmail-triage/` and runs it async with a 60s timeout.
  - Splits verbs into one-shots (send / archive / draft / spam) — append the confirmation and strip the keyboard so the action can't fire twice — and sticky-state changes (mute / trust / vip ± -domain) — append the confirmation but leave the keyboard tappable so the user can stack actions on one email.
  - On failure: toast only, keyboard preserved so the user can retry.
  - Logs every callback outcome to gateway.log for debugging.
* chore(release): map @khungate for PR #25829 salvage
* feat(telegram): support quick-command-only menus
* chore(release): map @stevehq26-bot for PR #28015 salvage
* fix(telegram): handle channel post updates
* test: address telegram channel post review
* test+release: stub auth in channel_posts fixture; map @brndnsvr
* Quiet noisy Telegram gateway errors
* chore(release): map oracle@jarviss-mbp.home for PR #24014 salvage
* Route Telegram multi-bot mentions exclusively
* Document Telegram multi-profile gateway commands
* fix: ignore Telegram messages for other bots
* chore(release): map @OCWC22 for PR #24581 salvage
* feat(telegram): ignore_root_dm with system command lobby
* docs(telegram): document ignore_root_dm feature
* chore(release): map @ai-hana-ai for PR #23928 salvage
* feat(telegram): pin incoming user message for duration of agent turn
  When a user sends a message on Telegram, the incoming message is now automatically pinned at the start of processing and unpinned when the agent finishes its turn. This gives the user a visual indicator that their message is being worked on, and keeps the conversation anchored.
  Changes:
  - telegram.py: Added pinChatMessage in on_processing_start and unpinChatMessage in on_processing_complete. Restructured both hooks so pin/unpin runs independently of the reactions feature (reactions are optional; pinning is always on).
  - telegram.py: Pass message_id through SessionSource so it's available in the session context.
  - session_context.py: Added HERMES_SESSION_MESSAGE_ID context var.
  - run.py: Pass source.message_id through set_session_vars.
  Pinning is silent (disable_notification=True) and failures are logged at debug level without interrupting message processing. Only the user's incoming message is pinned -- never the agent's replies. Auto-resume events (which have no message_id) are correctly skipped.
* chore(release): map @indigokarasu for PR #26636 salvage
* feat(telegram): skip-STT audio path + 2GB cap via local Bot API server
  Two coordinated changes that unblock downstream audio pipelines (diarization, custom transcription, archival) on attachments larger than the public Bot API's 20MB getFile ceiling.
  - `stt.enabled: false` no longer drops voice/audio with a generic "transcription disabled" note. The gateway probes the cached file's duration (wave → mutagen → ffprobe ladder) and surfaces `[The user sent a voice message: <abs path> (duration: M:SS)]` to the agent so a skill or tool can pick up the raw file. The previous placeholder is replaced rather than appended when present.
  - `platforms.telegram.extra.base_url` set → adapter auto-lifts its document size cap from 20MB to 2GB (the local telegram-bot-api `--local` ceiling) and the "too large" reply reports the active limit dynamically. No new config knob; presence of `base_url` is the opt-in.
  - `platforms.telegram.extra.local_mode: true` wires `Application.builder().local_mode(True)` on the python-telegram-bot builder. PTB then reads files from disk instead of HTTP, which is required when telegram-bot-api runs in `--local` mode (the server returns absolute filesystem paths, not `/file/bot...` URLs).
  - gateway/run.py: rewrites the `stt.enabled: false` branch of `_enrich_message_with_transcription`. New `_format_duration` + `_probe_audio_duration` helpers.
  - gateway/platforms/telegram.py: `_max_doc_bytes` instance attribute derived from `extra.base_url`; `local_mode` builder wiring; dynamic "too large" message.
  - tests/gateway/test_stt_config.py: covers path-surfacing with and without an existing user message, and placeholder replacement.
  - tests/gateway/test_telegram_max_doc_bytes.py: 3 cases — default 20MB without base_url, 2GB when set, empty-string base_url keeps default.
  - website/docs/user-guide/messaging/telegram.md: new "Skipping STT" subsection under Voice Messages and a full "Large Files (>20MB) via Local Bot API Server" walkthrough (api_id/api_hash, docker-compose, one-time `logOut` migration, `platforms.telegram.extra` config, the `local_mode` disk-access requirement, the silent HTTP-fallback 404).
  - website/docs/user-guide/features/voice-mode.md: documents the `stt.enabled` knob in the config reference.
  - `pytest tests/gateway/test_telegram_max_doc_bytes.py tests/gateway/test_stt_config.py` → 9/9 passing.
  - Verified end-to-end on a live deployment: gateway log shows `Using custom Telegram base_url: http://...` and `Using Telegram local_mode (read files from disk)` on startup; voice messages above 20MB cache to disk and surface their path to the agent.
  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* chore(release): map @alber70g for PR #25280 salvage
* fix(web): add scheduled column to i18n type definitions (#28549)
  columnLabels and columnHelp in en.ts include a scheduled entry but the Translations interface in types.ts did not declare it, causing a TypeScript build failure in the Nix derivation. Made the field optional since only en.ts provides it currently.
* docs: comprehensive 2-week sweep of feature/PR coverage gaps (#28497)
  Catch the website docs up to two weeks of merged work (May 4 – May 18, 2026, roughly 1,080 PRs). The audit found ~50 user-visible features that had landed in code with no docs footprint, plus a handful of stale pages. This PR closes every gap the scan turned up.
  New pages
  - user-guide/features/deliverable-mode.md — extension list, agent triggers, kanban_complete artifacts pattern, [[as_document]] override (PR #27813).
  - developer-guide/web-search-provider-plugin.md — authoring guide modeled on image-gen-provider-plugin, covering brave_free / ddgs / etc. (PR #25448).
  Providers / auth
  - Rename "Alibaba Cloud" → "Qwen Cloud (Alibaba DashScope)" everywhere the display label shows up; provider id stays `alibaba` (PR #24835).
  - Document OAuth refresh-token quarantine for xAI / MiniMax / Codex (PRs #28116 / #28118 / #28119).
  - Document Nous JWT minting from refresh token + invalid-refresh quarantine + cross-profile shared token store (PRs #27663 / #19712).
  - Add `## Microsoft Entra ID authentication (keyless)` section to azure-foundry guide — DefaultAzureCredential, RBAC, OpenAI + Anthropic routing details (PR #28101 / #9df9816da).
  - Custom providers `api_mode` is now prompted-and-persisted, not just URL autodetected (PR #25068).
  - Delegation honours `api_mode` + auto-detects anthropic_messages base URLs (PR #26824).
  - `x_search` auto-enables when xAI credentials are present (PR #27376).
  - Add `xAI Grok OAuth (SuperGrok)` row to providers headline table (PR #26534).
  - NVIDIA NIM billing-origin header is set automatically (PR #26585).
  Windows / installer
  - `install.ps1`: document `-Commit <sha>` and `-Tag <v>` pin params plus the BOM-strip / git-retry hardening (PR #28169).
  - Document Hermes Desktop thin installer + first-launch bootstrap (PR #27822).
  - Document `dep_ensure` Windows bootstrap (PR #27845).
  - Document install-method auto-detection (pip / git / homebrew / nixos) and the matching update command (PR #27843).
  Gateway / messaging
  - `/platform list|pause|resume` full description + circuit-breaker semantics (PR #26600).
  - Slack / Matrix / Mattermost get parallel `allowed_channels` / `allowed_rooms` allowlist sections matching Telegram/Discord/DingTalk (PR #21251).
  - Discord `allow_any_attachment` + `max_attachment_bytes` (config and env vars) (PR #27245).
  - Discord clarify-choice button rendering (PR #25485).
  - Telegram `guest_mode` @mention bypass for allowlisted groups (PR #22759).
  - Telegram `notifications` mode (`important` vs `all`) (PR #22793).
  - `[[as_document]]` skill / response directive for forcing document-style media delivery (PR #21210).
  CLI / TUI
  - `/new [name]` argument (PR #19637).
  - `/subgoal` user-supplied criteria appended to `/goal` (PR #25449).
  - `/exit --delete` flag confirmation prompts for destructive slash commands (PR #22687).
  - Status-bar additions: ▶ N background indicator (PR #27175), context compression count (PR #21218), YOLO mode banner+statusbar warning (PR #26238).
  - `display.timestamps` + `docker_extra_args` config keys (PR #23599).
  - TUI collapsible startup banner sections (PR #20625).
  - `HERMES_SESSION_ID` exported to tool subprocesses (PR #23847).
  i18n
  - Refresh display.language locale list from 8 → 16 (en, zh, zh-hant, ja, de, es, fr, tr, uk, af, ko, it, ga, pt, ru, hu) — matches `agent/i18n.py:SUPPORTED_LANGUAGES`.
  Tools / features
  - `vision_analyze` native-pixel passthrough for vision-capable callers, with auxiliary text-describer fallback (PR #22955).
  - `session_search` rewrite to the single-shape tool (discovery / scroll / browse modes) (PRs #27590 / #27840).
  - Clarify MCP transport scope: client supports stdio + SSE; embedded `hermes mcp serve` is stdio-only (PR #21227).
  - Web search backends table: add Brave Search (free tier) and DDGS rows (PR #21337).
  - ACP session-scoped edit auto-approval modes (PR #27862).
  - Curator rename map in the user-visible per-run summary (PR #22910).
  - Prompt caching feature page reference in features/overview.md — Claude cross-session 1-hour prefix cache on native Anthropic / OpenRouter / Nous Portal (PR #23828).
  - Cron per-job profile parameter (PR #28124).
  - `--no-skills` flag for `hermes profile create` (PR #20986).
  Build
  - Verified with `npm run build` in `website/`; both `en` and `zh-Hans` locales compile.
    Remaining broken-link/anchor warnings are pre-existing (`rl-training.md` from learning-path / overview; the zh-Hans translation lag that the docs skill already calls out).
* chore(release): pre-stage AUTHOR_MAP for May 2026 LHF batch group 9 (#28571)
  Pre-stages AUTHOR_MAP entries for 9 new/under-mapped contributors whose PRs are being salvaged in the May 2026 LHF batch group 9.
  Contributors:
  - jdelmerico (#28278 — signal require_mention filter)
  - justemu (#27996 — matrix thread_require_mention)
  - YuanHanzhong (#28029 — dashboard browser scrollback)
  - noctilust (#28080 — drop stale TUI resume env)
  - MoonJuhan (#28288 — tolerate unreadable JSONL transcripts)
  - outsourc-e (#28164 — cron emoji ZWJ sequences)
  - Zyrixtrex (#28275 — Google OAuth urlopen timeout)
  - ooovenenoso (#28256 — tool loop recovery hints)
  - vanthinh6886 (#28018 — yaml/flock/atomic write guards; non-noreply email)
  Per references/batch-pr-salvage-may14-additions.md.
* feat(signal): add require_mention filter for group chats
  Add a configurable mention filter to the Signal adapter so the bot only responds in groups when it is explicitly @mentioned.
  Changes:
  - gateway/platforms/signal.py: read require_mention from adapter extra config or SIGNAL_REQUIRE_MENTION env var; skip group messages that don't mention the bot account (checked in rendered text and raw mention metadata)
  - gateway/config.py: map signal.require_mention YAML key to the SIGNAL_REQUIRE_MENTION env var (env var takes precedence)
  Config example:
      signal:
        require_mention: true
  Or via env var: SIGNAL_REQUIRE_MENTION=true
* Revert "feat(telegram): pin incoming user message for duration of agent turn"
  This reverts commit a724c3b9cf5f01e28365322ae5ae3a9579567806.
* Revert "feat(telegram): support quick-command-only menus"
  This reverts commit b1acf80e17858e2e5ae7c0d412a3a573d7fcbca4.
* Revert "feat(send_message): auto-detect @username mentions and create Telegram entities"
  This reverts commit cf814c96f613b38bd891ac941c32da653e81c7ad.
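A minimal sketch of the Signal `require_mention` resolution and group filter, assuming the env var overrides the YAML value (as the gateway/config.py change describes) and that mention metadata can be reduced to a list of account identifiers. `as_bool`, `signal_require_mention` and `should_process_group_message` are hypothetical names.

```python
import os
from typing import Any, Iterable, Mapping

# Assumed truthy spellings for env/YAML string values.
TRUTHY = {"1", "true", "yes", "on"}


def as_bool(value: Any) -> bool:
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in TRUTHY


def signal_require_mention(extra: Mapping[str, Any], env: Mapping[str, str] = os.environ) -> bool:
    """SIGNAL_REQUIRE_MENTION wins over the YAML/extra setting; default is False."""
    if "SIGNAL_REQUIRE_MENTION" in env:
        return as_bool(env["SIGNAL_REQUIRE_MENTION"])
    return as_bool(extra.get("require_mention", False))


def should_process_group_message(
    text: str, mentioned_accounts: Iterable[str], bot_account: str, require_mention: bool
) -> bool:
    """Skip group messages that mention the bot neither in text nor in mention metadata."""
    if not require_mention:
        return True
    return bot_account in (text or "") or bot_account in set(mentioned_accounts)
```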
* Revert "fix(telegram): enforce TELEGRAM_ALLOWED_USERS allowlist on inbound messages"
  This reverts commit db50af910be6b4171ea9cf54f4cc38be27ac1da6.
* fix(gateway): pre-mark sessions as resume_pending before drain to prevent data loss (#27856)
  Pre-mark all running agent sessions as resume_pending BEFORE the drain wait begins. If the service manager kills the process during the drain window, the durable marker is already written so the next gateway boot can recover in-flight sessions. On graceful drain completion, clear the early markers for sessions that finished successfully.
* fix(matrix): implement thread_require_mention to prevent multi-agent reply loops
  In multi-agent shared Matrix rooms, multiple bots all participating in the same thread could trigger infinite reply loops — each bot's reply re-engaged the others because they were all in the bot-thread set. Discord has a `thread_require_mention` opt-in for this; Matrix didn't.
  Add `_parse_thread_require_mention(config)` (mirrors Discord's pattern). In `_resolve_message_context`, when enabled and the message is in a bot-participated thread (not a free-response room), require @mention before processing.
  Salvage of @justemu's 2-commit stack (#27996). Fixes #27995.
* fix(cli): show active profile in TUI prompt
* fix(tui): preserve dunder identifiers in markdown
* test(file_ops): add regression tests for git baseline warning in write_file
  Adds TestGitBaselineCheck with 6 unit tests covering _check_git_baseline and the warning field in write_file result:
  - Git not available → None
  - Not in a git repo → None
  - Clean repo → None
  - Dirty repo → returns warning string with branch name
  - write_file result includes warning when dirty
  - write_file result omits warning when clean
* fix(dashboard): use browser scrollback for chat wheel
* fix(cli): ignore stale HERMES_TUI_RESUME env
  HERMES_TUI_RESUME is an internal env va…

…ecture decisions (#4)
* docs(kanban): document worker protocol auto-blocks
  Salvages #21585 by @helix4u. Documents the protocol_violation event (worker exits successfully while task is still running), adds --max-retries to the create flag list and --failure-limit to dispatch.
* fix(oneshot): pass fallback_providers from profile config to AIAgent
  Salvages #23368 by @uzunkuyruk. Oneshot workers (e.g. kanban workers spawned via 'hermes -p <profile> chat -q ...') were not honouring the profile's fallback_providers / fallback_model chain because oneshot.py never read the config and never passed fallback_model= to AIAgent.
  Reads cfg.get('fallback_providers') (new list format) or cfg.get('fallback_model') (legacy single-dict) with the same normalization cli.py applies, then forwards as fallback_model=_fb.
* fix(kanban): reject direct running transitions in dashboard bulk updates
  Salvages #24050 by @kronexoi. The single-task PATCH already rejects direct status='running' since it bypasses the dispatcher/claim invariant, but the bulk-update endpoint still accepted it. Aligns bulk with single by emitting an error result row for any 'running' entry.
* feat(kanban): add initial-status for human-ops cards
  Salvages #27526 by @shunsuke-hikiyama. Adds an --initial-status flag (running|blocked, default running) to 'kanban create', threaded through kanban_db.create_task() and the kanban_create tool schema. 'blocked' parks the task directly in the blocked column for R3 human-ops review, skipping the brief running-to-blocked transition.
  Dropped the unrelated 'add' alias, WIFEXITED Windows compat, and slash-handler error formatting changes that were bundled in the original PR — those should ship as their own focused changes if still wanted.
* fix(kanban): release scratch workspace and tmux session on task completion
  Salvages #27369 by @LeonJS. complete_task() now calls _cleanup_workspace() and _cleanup_worker_tmux() after marking a task complete.
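A minimal sketch of the dashboard bulk-update rule from #24050: each `running` entry yields an error row instead of being applied, while other entries go through. The row shape (`id`, `ok`, `error`) and the `apply` callback are assumptions, not the plugin's actual response format.

```python
from typing import Any, Callable, Dict, List


def bulk_update(
    updates: List[Dict[str, Any]], apply: Callable[[str, Dict[str, Any]], None]
) -> List[Dict[str, Any]]:
    """Apply each update, but emit an error row for any direct 'running' transition."""
    results: List[Dict[str, Any]] = []
    for entry in updates:
        task_id = entry["id"]
        if entry.get("status") == "running":
            # Only the dispatcher may move a task to running (claim invariant).
            results.append({
                "id": task_id,
                "ok": False,
                "error": "status 'running' is set by the dispatcher, not by direct updates",
            })
            continue
        apply(task_id, entry)
        results.append({"id": task_id, "ok": True})
    return results
```

Reporting per-row errors instead of failing the whole request keeps the bulk endpoint consistent with the single-task PATCH while still applying the valid entries.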
  Scratch workspaces (used by swarm agents) accumulate on disk — hundreds of MB per task, never released. Stale tmux sessions from completed agents also persist indefinitely.
  Both gates are safe:
  - workspace_kind == 'scratch' gate preserves user worktree/dir workspaces
  - tmux #{pane_dead} == 1 gate only kills sessions where the worker has already exited
  - best-effort: cleanup failures never block task completion
* fix(kanban): honor severity thresholds in diagnostics
  Salvages #26431 by @LeonSGP43. Dashboard plugin_api list_diagnostics was using exact-match (severity == filter), so '--severity warning' hid 'error' and 'critical' diagnostics. Adds severity_at_or_above() helper to kanban_diagnostics and uses it in the dashboard endpoint (CLI already used SEVERITY_ORDER comparison correctly).
* test: isolate Kanban env pins in hermetic fixture
  Salvages the substantive part of #22295 by @steezkelly. Adds the missing HERMES_KANBAN_HOME, HERMES_KANBAN_RUN_ID, HERMES_KANBAN_CLAIM_LOCK, HERMES_KANBAN_DISPATCH_IN_GATEWAY entries to _HERMES_BEHAVIORAL_VARS so ambient developer-shell pins on those vars don't bleed into pytest runs.
  The frozenset extraction + standalone regression test from the original PR were dropped to keep the change minimal — main already maintains the list inline.
* feat(kanban): add max_in_progress config to cap concurrent running tasks
  Salvages #22981 by @SimbaKingjoe. Adds 'kanban.max_in_progress' config that caps simultaneously running tasks. When the board already has N running, dispatcher skips spawning so slow workers (local LLMs, resource-constrained hosts) don't pile up and time out. Threads through dispatch_once(max_in_progress=) and gateway dispatcher config parsing with validation (warns on invalid/below-1 values).
* fix(packaging): ship bundled skills in wheel
  Salvages #23738 by @LeonSGP43.
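A sketch of at-or-above severity filtering as described in the diagnostics fix (#26431), assuming a four-level order `info < warning < error < critical`. `SEVERITY_ORDER` and `filter_diagnostics` here are illustrative, not the kanban_diagnostics definitions.

```python
from typing import Any, Dict, List

# Assumed ordering, lowest to highest.
SEVERITY_ORDER: Dict[str, int] = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def severity_at_or_above(severity: str, threshold: str) -> bool:
    """True when `severity` ranks at or above `threshold`; unknown severities never match."""
    if severity not in SEVERITY_ORDER:
        return False
    return SEVERITY_ORDER[severity] >= SEVERITY_ORDER[threshold]


def filter_diagnostics(diagnostics: List[Dict[str, Any]], threshold: str) -> List[Dict[str, Any]]:
    return [d for d in diagnostics if severity_at_or_above(d["severity"], threshold)]
```

With exact matching, `--severity warning` returned only warnings; ranking severities makes the same filter also include errors and criticals.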
  Wheel installs were missing skills/ and optional-skills/ because pyproject's [tool.setuptools.packages.find] only includes Python packages — the skills directories don't have __init__.py so they were silently dropped from the wheel.
  Adds setup.py with data_files spec emitting skills/* and optional-skills/* under hermes_agent-<v>.data/data/, and a get_bundled_skills_dir() helper in hermes_constants that discovers the wheel-installed location via sysconfig before falling back to a source-checkout path. tools/skills_sync uses the helper so 'hermes update' works for pip-installed users.
* fix: 4 small surgical bugs
  Salvages #23302 by @Bartok9. Four independent one-area fixes:
  1. kanban boards delete alias now hard-deletes (not archives) — the alias didn't carry --delete, so getattr(args, 'delete', False) returned False. Detect boards_action=='delete' explicitly.
  2. Gateway auto-title failures no longer leak as user-visible warnings — debug-log only since they're not actionable.
  3. Background process completion notification snaps truncation to the next newline boundary and prepends a marker when content is dropped.
  4. _cprint() schedules the run_in_terminal coroutine via asyncio.ensure_future so output isn't silently dropped from background threads (fixes #23185 Bug A). Skips the double-print fallback that would fire for mock paths.
* perf(prompt): cache kanban worker guidance at session init
  Salvages #24402 by @RyanRana. The KANBAN_GUIDANCE block (~835 tokens) is session-static — the dispatcher decides at spawn time whether the process is a kanban worker via the kanban_show tool's check_fn (gated on HERMES_KANBAN_TASK env var). Re-checking 'kanban_show' in valid_tool_names and re-loading the reference on every system-prompt rebuild (init + each context compression) is wasted work.
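A sketch of the newline-snapped truncation from fix 3 of #23302, assuming the notification keeps the tail of the output (the prepended marker suggests earlier content is what gets dropped). `truncate_tail_at_newline` and the marker text are hypothetical.

```python
TRUNCATION_MARKER = "[... earlier output truncated ...]\n"


def truncate_tail_at_newline(text: str, limit: int) -> str:
    """Keep roughly the last `limit` characters, starting on a line boundary."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    if len(text) <= limit:
        return text
    start = len(text) - limit
    if text[start - 1] != "\n":
        # Cut landed mid-line: snap forward to the next newline, if one exists
        # and something would remain after it.
        newline = text.find("\n", start)
        if newline != -1 and newline + 1 < len(text):
            start = newline + 1
    return TRUNCATION_MARKER + text[start:]
```

Snapping forward rather than backward keeps the kept portion within the limit, at the cost of dropping a partial first line.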
Caches the resolved string on agent._kanban_worker_guidance once in agent_init and consumes it in system_prompt.build_system_prompt(), with a getattr fallback for code paths that bypass agent_init.

* feat(kanban): add --sort option to 'hermes kanban list'

Salvages #25745 by @LizerAIDev. Adds --sort {created,created-desc,priority,priority-desc,status,assignee,title,updated} to 'hermes kanban list'. Validated against the VALID_SORT_ORDERS map; invalid values raise ValueError. Default behaviour (priority DESC, created ASC) is unchanged when --sort is omitted.

* docs: add kanban codex lane skill

* feat(kanban): worker visibility endpoints (workers/active, runs/{id}, inspect)

Adds three read-only endpoints to the kanban dashboard plugin so the SwitchUI workspace (and any other dashboard consumer) can track workers across tasks without N+1 round-trips through /tasks/{task_id}.

- GET /workers/active
  Single SQL JOIN of task_runs + tasks where ended_at IS NULL, worker_pid IS NOT NULL, status='running'. Returns {workers: [...], count, checked_at}.
- GET /runs/{run_id}
  Direct lookup of any task_run row by id. Reuses existing kanban_db.get_run() helper and _run_dict() serialiser. 404 when not found. Mirrors GET /tasks/{task_id} 404 shape.
- GET /runs/{run_id}/inspect
  Live PID stats via psutil.Process.as_dict() — cpu_percent, memory_rss_bytes, memory_vms_bytes, num_threads, num_fds, status, create_time, cmdline. Short-circuits with alive:false when the run has ended, has no worker_pid, the pid is gone, or psutil is unavailable. AccessDenied surfaces as alive:true with error rather than a 500.

11 new tests in tests/plugins/test_kanban_worker_runs.py cover the empty-board case, running-task case, ended-run filtering, missing-pid filtering, 404 paths, already-ended inspect, no-pid inspect, dead-pid inspect, and live-pid inspect (psutil mocked). All pass.
Companion termination endpoint (POST /runs/{run_id}/terminate) is intentionally out of scope here — opening a separate issue first since the RBAC and dispatcher-mediated soft-cancel design needs maintainer input before code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): map contributor email for attribution check

* test(kanban-dashboard): pin enriched 409 detail and inline error wiring (#26744)

- Existing ``test_patch_drag_drop_move_todo_to_ready`` now asserts the enriched 409 detail names the blocking parent (id, quoted title, and current status), so the dashboard always has something actionable to render.
- New bundle-assertion test ``test_dashboard_surfaces_ready_blocked_error_inline`` pins the frontend wiring: the ``parseApiErrorMessage`` helper exists, the drag/drop banner runs through it, and the drawer maintains a visible ``patchErr`` state that's cleared between PATCHes and tasks.

* docs(codex_app_server): document multi-root Kanban writable_roots (#27941)

Update the Codex app-server runtime guide's Kanban section to reflect the new behaviour:

  * The sandbox override now adds the board DB directory plus every Kanban path the dispatcher pinned (HERMES_KANBAN_WORKSPACES_ROOT, HERMES_KANBAN_WORKSPACE, legacy HERMES_KANBAN_ROOT) -- deduplicated, DB-dir first.
  * The motivation note now includes the cross-mount artifact-write scenario (e.g. ``/media/.../kanban-workspaces/...`` on a separate drive) and links to issue #27941 so readers can find the original bug report.

* fix(gateway): quiet corrupt kanban dispatcher boards

Salvages the substantive part of #26490 by @aqilaziz. Detects corrupt board DBs ("file is not a database" / "database disk image is malformed") and disables them by fingerprint until they're repaired, instead of flooding the gateway log with repeated logger.exception tracebacks every tick.
Cherry-picked the substantive commit (ea5b4ec2a); the tip commit was an unrelated _is_dir OSError fix for service-path lookup. Dropped a small test reformat that was bundled in the same commit.

* docs: align kanban readiness docs and smoke tests

Salvages #28199 by @bensargotest-sys. Aligns Kanban docs with current tool registration: dispatcher-spawned task workers get task tools; profiles that explicitly enable the kanban toolset get orchestrator routing tools (kanban_list, kanban_unblock). Corrects failure-limit text to the current default of 2. Hardens the e2e subprocess script to resolve the repo root and use the spawnable default assignee. Updates the diagnostics severity fixture to assert error below the critical threshold.

* feat(kanban): surface per-task model_override in show + tool output

Salvages #26897 by @loicnico96. The per-task model_override DB column already exists on main, but it wasn't exposed in user-facing surfaces. This adds:

- 'kanban show' prints 'model: <name>' when model_override is set
- kanban_show / kanban_list tool responses include the model_override field

Original branch was stale (PR was authored against an older field name 'model'); applied the substantive surface exposure manually using the current 'model_override' field name.

* feat(cli): add kanban swarm topology helper

Salvages #26791 by @Niraven. Adds 'hermes kanban swarm' to create a durable Kanban Swarm v1 graph: a completed root/blackboard card, parallel worker cards, a verifier gated on all workers, and a synthesizer gated on the verifier. Stores shared swarm blackboard updates as structured JSON comments on the root card. Self-contained: new hermes_cli/kanban_swarm.py module + CLI wiring + unit tests.

* feat(kanban): add optional board parameter to all MCP tools

Salvages #27598 by @nnnet. Adds optional 'board' parameter to all 9 kanban_* MCP tools via shared _connect helper. Backwards compatible — omitting board keeps current pinned-board behavior.
Useful for orchestrator profiles that route across multiple boards. Two-file scope: tools/kanban_tools.py + tests.

* feat(kanban): stamp originating ACP session_id on tasks

Salvages #23208 by @awizemann. Tracks which chat session created a kanban task so clients can render a per-session board without falling back to tenant + time-window heuristics.

- Schema: tasks gains nullable session_id TEXT column with index (additive migration in _migrate_add_optional_columns).
- ACP: server.py exposes the originating session id via HERMES_SESSION_ID with save/restore around the agent loop.
- Tool: kanban_create reads HERMES_SESSION_ID (with explicit override).
- CLI: 'hermes kanban list --session <id>' filter; JSON output exposes session_id.

* feat(kanban): wire dispatcher to dispatch review agents from review column

Salvages #23772 by @thewillhuang. Adds 'review' as a valid kanban task status and extends dispatch_once to monitor the review column as a second dispatch source (in addition to the existing ready column).

- Adds 'review' to VALID_STATUSES
- Adds claim_review_task() — atomically transitions review → running
- Adds has_spawnable_review() — health telemetry mirror
- Extends dispatch_once with a review column dispatch loop
- Review agents get 'sdlc-review' skill auto-loaded

Resolved 2 conflicts (VALID_STATUSES merge with main's 'scheduled' state, test file additions). Adapted claim_review_task to main's ttl_seconds: Optional[int] = None convention (matches claim_task).

* feat(kanban): stale detection for running tasks in dispatcher

Salvages #23790 by @thewillhuang. Adds detect_stale_running() to the dispatcher cycle. Running tasks that started more than dispatch_stale_timeout_seconds ago (default 14400 = 4h) and have had no heartbeat in the last hour are auto-reclaimed to ready.
- New config kanban.dispatch_stale_timeout_seconds (default 14400, 0 disables)
- New 'stale' field on DispatchResult
- detect_stale_running() in kanban_db.py with heartbeat freshness check
- Records outcome='stale' on run close + 'stale' event; ticks failure counter
- Wires config through gateway embedded dispatcher
- Updates _cmd_dispatch verbose/JSON output and daemon logging

Resolved test-file end-of-file conflict by appending both halves.

* feat(kanban): filter tasks by workflow fields and runs by status/outcome

Salvages #26745 by @nehaaprasaad. Exposes filtering for the existing workflow_template_id and current_step_key columns:

- list_tasks() accepts workflow_template_id and current_step_key kwargs
- 'hermes kanban list' adds matching CLI flags
- dashboard plugin_api also exposes the filters

Resolved a small conflict in the list_tasks signature alongside main's session_id and order_by additions; combined all three into the single filter list.

* feat(kanban): add respawn guard to block repeat worker storms

Salvages #27484 by @fardoche6. Adds a respawn guard that skips worker spawn for tasks where:

- a recent run already succeeded (recent_success — within guard window)
- the previous run hit a quota/auth error (blocker_auth, also auto-blocks)
- a recent task comment includes a GitHub PR URL (active_pr)

The guard prevents repeat worker storms on the same bug/task. Includes the contributor's review-findings fixup (regex hardening, observability, auth coverage). Resolved a small DispatchResult conflict alongside main's 'stale' field; kept both. Authorship preserved via rebase merge.

* feat(kanban): show dashboard cron jobs across profiles

Salvages #27568 by @SerenityTn. Dashboard cron page now lists cron jobs from all profiles, with profile-aware filter UI and storage routing. Includes test coverage for cross-profile listing, mutation, deletion, and validation.
Also fixes orphan conflict markers in config.py left by an earlier salvage merge (kanban.dispatch_stale_timeout_seconds was double-nested in HEAD/PR markers from the #28452 salvage of #23790).

* fix(kanban): remove orphan conflict markers from config.py (#28458)

PR #28452 (salvage of #23790, stale detection) merged with leftover git conflict markers in hermes_cli/config.py around the `dispatch_stale_timeout_seconds` config block, breaking config import and any code path that loads it. Cleans up the markers and keeps both config blocks (worker log rotation/orchestrator + stale detection). Resolves a self-introduced regression.

* fix(kanban): remove orphan conflict markers from kanban.py (#28459)

PR #28454 (salvage of #26745, workflow filter) merged with leftover git conflict markers in hermes_cli/kanban.py at three sites:

- _task_to_dict() (session_id alongside workflow_template_id/current_step_key)
- p_list parser (--sort alongside --workflow-template-id/--step-key)
- _cmd_list (order_by alongside the new filter kwargs)

Cleans up the markers and keeps both halves at each site. Resolves a self-introduced regression.

* feat(kanban): configure worktree paths and branches

Salvages #26496 by @aqilaziz. Adds branch_name column + CLI flag so tasks with workspace_kind='worktree' can pin a target branch on create. Schema migration added to _migrate_add_optional_columns.

- Task.branch_name field + DB column + migration
- create_task accepts branch_name kwarg
- hermes kanban create --branch <name> flag
- kanban show output includes 'Branch: <name>' when set

Cherry-picked the substantive commit (a7558cf27); the PR's tip was an unrelated service-path-dirs commit. Resolved 2 conflicts (INSERT column list and show output) alongside main's session_id and max_runtime_seconds additions; kept all three.
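The worktree change above depends on an additive migration for the nullable branch_name column. The sketch below shows that general pattern with Python's sqlite3 module: read the current columns with PRAGMA table_info and issue ALTER TABLE ... ADD COLUMN only for the ones that are missing, so re-running it is a no-op. OPTIONAL_TASK_COLUMNS and migrate_add_optional_columns are hypothetical stand-ins; the real _migrate_add_optional_columns may be organized differently.

```python
import sqlite3
from typing import Dict, List

# Hypothetical registry of optional task columns. Each entry is added only if
# it is missing, so the migration is safe to run on every startup.
OPTIONAL_TASK_COLUMNS: Dict[str, str] = {
    "branch_name": "TEXT",  # nullable, no default: legacy rows read back as NULL
}


def migrate_add_optional_columns(conn: sqlite3.Connection) -> List[str]:
    """Add any missing optional columns to `tasks` and return their names."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) rows.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(tasks)")}
    added: List[str] = []
    for name, decl in OPTIONAL_TASK_COLUMNS.items():
        if name in existing:
            continue
        # Column names come from the constant above, never from user input.
        conn.execute(f"ALTER TABLE tasks ADD COLUMN {name} {decl}")
        added.append(name)
    conn.commit()
    return added


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # A "legacy" schema that predates branch_name.
    conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, workspace_kind TEXT)")
    conn.execute("INSERT INTO tasks VALUES ('t1', 'worktree')")
    print(migrate_add_optional_columns(conn))  # ['branch_name']
    print(migrate_add_optional_columns(conn))  # []
    print(conn.execute("SELECT branch_name FROM tasks").fetchone())  # (None,)
```

Leaving the column nullable with no default keeps pre-migration rows valid, and tasks created without --branch simply store NULL.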
* feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373)

Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that group several skills under one slash command. Invoking /<bundle-name> from any surface (CLI, TUI, dashboard, any gateway platform) loads every referenced skill into a single combined user message.

Use cases:

- /backend-dev → loads github-code-review + test-driven-development + github-pr-workflow as one bundle.
- /research → loads several research skills together.
- Team task profiles shared via dotfiles.

Behavior:

- Bundles take precedence over individual skills when slugs collide.
- Missing skills are skipped with a note, not fatal.
- No system-prompt mutation — bundles generate a fresh user message at invocation time, the same way /<skill> does. Prompt cache stays intact.
- Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI), /help display.

Schema (~/.hermes/skill-bundles/<slug>.yaml):

    name: backend-dev
    description: Backend feature work.
    skills:
      - github-code-review
      - test-driven-development
    instruction: |
      Optional extra guidance prepended to the loaded skills.

New module: agent/skill_bundles.py — load, scan, resolve, build invocation message, save, delete. yaml.safe_load only; broken bundles log a warning and are skipped, never raise.

New CLI subcommand: hermes bundles {list,show,create,delete,reload}. Implementation in hermes_cli/bundles.py; wired in hermes_cli/main.py. 'bundles' added to _BUILTIN_SUBCOMMANDS so plugin discovery skips it.

New in-session slash command: /bundles lists installed bundles in both CLI and gateway. /<bundle-name> dispatch added to CLI (cli.py) and gateway (gateway/run.py) before the existing /<skill-name> path.

Autocomplete: SlashCommandCompleter gained an optional skill_bundles_provider parameter that defaults to None — the prompt shows '▣ <description> (N skills)' for bundles vs '⚡' for skills.
Tests:

- tests/agent/test_skill_bundles.py — 33 tests covering slugify, scan/cache freshness, resolve (including underscore→hyphen Telegram alias), build_bundle_invocation_message (loading, missing skills, user/bundle instruction injection, dedup), save/delete, reload diff, list sort.
- tests/hermes_cli/test_bundles.py — 8 tests for the CLI subcommand (create/list/show/delete/reload, --force, missing bundle errors).
- tests/gateway/test_bundles_command.py — 4 tests for the gateway handler and bundle resolution priority.

Live E2E: verified subprocess invocations of hermes bundles {list,create,show,reload,delete} round-trip correctly against an isolated HERMES_HOME.

Docs:

- website/docs/user-guide/features/skills.md — new 'Skill Bundles' section with quick example, YAML schema, management commands, behavior notes.
- website/docs/reference/cli-commands.md — 'hermes bundles' added to the top-level command table and given its own subcommand section.

* feat(kanban): add scheduled status for delayed follow-ups

Salvages #24533 by @roycepersonalassistant. Adds a first-class 'scheduled' Kanban status for time-delay follow-ups that aren't waiting on human input.

- hermes kanban schedule <task_id> [reason] CLI command
- Dashboard/API transitions to/from Scheduled
- unblock_task() now releases both 'blocked' AND 'scheduled' tasks (re-checking parent dependencies before moving to ready/todo)
- i18n + docs updates

Resolved conflicts: kept HEAD's failure-counter reset on unblock alongside the PR's scheduled state, kept HEAD's 'running' direct-set rejection, combined both bulk-status branches. Dropped the dist/ bundle changes (months-stale; would need rebuild from source).

* feat(kanban): drag-to-delete trash zone + bulk delete for task cards

Salvages #28125 by @Jpalmer95.
Adds:

- Drag-to-delete trash zone in the kanban dashboard
- Bulk delete endpoint with cascading delete_task cleanup
- Frontend updates (drag visual + drop handler)
- Confirmation prompt before delete

Resolved end-of-file test conflict by appending both halves.

* docs: add Korean Kanban documentation

Salvages #21823 by @pochi-gio. Adds Korean (ko) Docusaurus locale and translates Kanban documentation (kanban.md, kanban-tutorial.md) and the two related skills (devops-kanban-orchestrator, devops-kanban-worker). Purely additive — adds ko to the locales list in docusaurus.config.ts and creates the website/i18n/ko/ tree.

* fix(tests): catch up six stale tests after compression/aux/kanban changes (#28465)

- aux_config: drop session_search from _AUX_TASKS and remove stale test (PR #27590 removed auxiliary.session_search from DEFAULT_CONFIG)
- compression_boundary_hook: set compressor._last_compress_aborted=False on MagicMock so the post-compress abort branch (PR #28117) doesn't short-circuit before the session-id rotation under test
- kanban_dashboard_plugin: use consecutive_failures=3 so severity stays 'error' (failure_threshold default dropped from 3 to 2 in d9fef0c8a, so failures=5 now crosses the critical floor of 2*2=4)
- cli_manual_compress: accept force kwarg on DummyAgent._compress_context (cli._manual_compress now passes force=True)

* fix(telegram): render full clarify choice text in message body, use short button labels

When Telegram clarify prompts offer long choices, mobile clients truncate the inline button labels, making options unreadable. Previously only the question was shown in the message body with truncated choice text in button labels.

Fix: append the full numbered option list to the message body so users can read complete choice text on any client. Buttons now use short numeric labels (1, 2, ...) to avoid Telegram truncation. The 'Other (type answer)' button is unchanged.
Long choice labels are now rendered in full (not truncated to 57 chars + '...') since they appear in the body instead of button labels.

Closes: #27497

* chore(release): map @asdlem for PR #27852 salvage

* fix(telegram): default streaming transport to edit

* fix(telegram): respect reply_to_mode for DM topic reply fallback

The DM topic reply fallback code in send() hardcoded should_thread=True when telegram_dm_topic_reply_fallback metadata was present, bypassing _should_thread_reply() and ignoring reply_to_mode config. This caused quote bubbles on every response even with reply_to_mode: 'off'.

Fix:

- Add reply_to_mode param to _reply_to_message_id_for_send() and _thread_kwargs_for_send() classmethods
- In send(), check self._reply_to_mode != 'off' for DM topic fallback
- Suppress reply anchor and reply_to_message_id when mode is 'off' while preserving message_thread_id for correct topic routing
- Thread reply_to_mode through all 29 call sites

Regression coverage: 10 new tests in test_telegram_reply_mode.py covering classmethod behavior, send() integration, and backward compatibility.

Fixes #23994 (reply_to_mode: 'off' ignored by Telegram DM topic reply fallback code)

* fix(gateway): route Telegram audio file attachments away from STT pipeline (#24870)

Telegram distinguishes three kinds of audio payloads:

- message.voice → Opus/OGG voice messages → STT pipeline ✓
- message.audio → audio file attachments → bypasses STT ← was broken
- message.document (audio mime) → generic file route

**Root cause** — the inbound message routing block in gateway/run.py matched both MessageType.VOICE *and* MessageType.AUDIO into audio_paths, which were then fed unconditionally to _enrich_message_with_transcription. Audio file attachments (.mp3, .m4a, etc.) were therefore auto-transcribed instead of being treated as files, making the transcribe skill unusable from Telegram because the path it needed was never surfaced.
**Fix**

- Introduce a new audio_file_paths list populated exclusively by MessageType.AUDIO events.
- Narrow the audio_paths selector to MessageType.VOICE (and bare audio/ mime-type events that are not explicitly AUDIO or DOCUMENT).
- After the STT block, inject a document-style context note for each audio_file_path, giving the agent the file path and asking what to do with it (consistent with how plain documents are handled).

**Tests** — 5 new tests in test_telegram_audio_vs_voice.py:

- voice message still transcribed (regression guard)
- audio attachment skips STT (core fix)
- audio attachment context note format
- STT disabled still produces file note (not STT-disabled notice)
- MessageType.AUDIO != MessageType.VOICE sanity check

Fixes #24870

* chore(release): map bartok9 noreply for PR #24879 salvage

* fix(send_message): route standalone Telegram sends through TELEGRAM_PROXY

When the send_message tool runs outside the gateway process (agent loop, TUI, cron, etc.), _gateway_runner_ref() returns None and the standalone path in _send_telegram constructs Bot(token=token) directly, bypassing any configured proxy. In regions where api.telegram.org is blocked, the send times out after ~5s with 'Telegram send failed: Timed out' and nothing ever shows up in gateway.log because the request never reaches the gateway.

Resolve TELEGRAM_PROXY (via gateway.platforms.base.resolve_proxy_url, which also honours HTTPS_PROXY/HTTP_PROXY/ALL_PROXY and NO_PROXY) just before constructing the Bot. When a proxy is found, attach an HTTPXRequest(proxy=...) for both 'request' and 'get_updates_request', matching what gateway/platforms/telegram.py already does for in-gateway sends and what the Discord standalone sender already does. Any exception attaching the proxy falls back cleanly to a direct connection, preserving prior behaviour for users without a proxy configured.

Adds tests/tools/test_send_message_telegram_proxy.py covering both the proxy-configured and no-proxy cases.
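One way to picture the proxy lookup described above is a small resolver that prefers TELEGRAM_PROXY, falls back to the generic proxy variables, and skips proxying when NO_PROXY covers the Telegram API host. This is a hypothetical sketch: resolve_proxy, bypasses_proxy, the variable precedence, and the NO_PROXY matching rules are all assumptions, not the behaviour of gateway.platforms.base.resolve_proxy_url.

```python
from typing import Mapping, Optional

# Assumed precedence: the Telegram-specific variable first, then generic ones.
PROXY_VARS = ("TELEGRAM_PROXY", "HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY")


def _env(env: Mapping[str, str], name: str) -> str:
    # Accept upper- and lower-case spellings, as many HTTP clients do.
    return env.get(name) or env.get(name.lower()) or ""


def bypasses_proxy(host: str, no_proxy: str) -> bool:
    """Return True if `host` matches an entry in a comma-separated NO_PROXY list."""
    host = host.lower()
    for raw in no_proxy.split(","):
        entry = raw.strip().lower()
        if not entry:
            continue
        if entry == "*":
            return True
        entry = entry.lstrip(".")  # ".telegram.org" and "telegram.org" behave alike
        if host == entry or host.endswith("." + entry):
            return True
    return False


def resolve_proxy(env: Mapping[str, str], host: str = "api.telegram.org") -> Optional[str]:
    """Return the proxy URL to use for `host`, or None for a direct connection."""
    if bypasses_proxy(host, _env(env, "NO_PROXY")):
        return None
    for name in PROXY_VARS:
        value = _env(env, name).strip()
        if value:
            return value
    return None


if __name__ == "__main__":
    env = {"TELEGRAM_PROXY": "socks5://127.0.0.1:1080", "HTTPS_PROXY": "http://corp:3128"}
    print(resolve_proxy(env))  # socks5://127.0.0.1:1080
    print(resolve_proxy({**env, "NO_PROXY": ".telegram.org"}))  # None
```

In this sketch, a non-None result is what would be handed to HTTPXRequest(proxy=...) for both request and get_updates_request, while None keeps the direct connection that users without a proxy already get.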
* chore(release): map @pepelax for PR #25419 salvage

* fix(kanban-dashboard): restore implementations dropped during salvages (#28481)

Four kanban dashboard test failures, all from PR salvages that picked up the test additions but dropped the corresponding implementations.

- BOARD_COLUMNS: add 'review' (status added by PR f55d94a1e but the board API never grew the column → test_board_empty failed because VALID_STATUSES - {archived} mismatched the rendered columns).
- update_task: enrich the 'ready' 409 detail with the blocking parent list (id, title, status) and add _parents_blocking_ready helper. Implementation lost in the #26744 salvage (commit e215558ba) which pinned the test but not the server-side code.
- dist/index.js: add parseApiErrorMessage helper, wire it through the drag/drop banner, add patchErr state to the TaskDrawer and surface it inline by the action row. Lost in the same #26744 salvage.
- test_diagnostics_endpoint_severity_filter: update to at-or-above semantics (PR a94ddd807 changed the filter from exact-match so the warning filter now correctly includes error+critical too).

* fix(gateway): roll over Telegram tool progress bubbles

* fix(gateway): scope audio_file_paths outside media_urls guard

The audio-file-paths handling block at line 7334 references the variable unconditionally, but #24879 initialized it inside the 'if event.media_urls' block — so events without media_urls hit UnboundLocalError. Found via test_run_agent_queued_message_does_not_treat_commentary_as_final after PR #28478 landed.

* fix(gateway): keep tool-progress edits alive after Telegram flood control

When a progress-message edit hits Telegram flood control (RetryAfter), can_edit was unconditionally set to False, permanently disabling coalescing for the rest of the run. Subsequent tool updates were posted as separate new messages instead of updating the existing progress bubble.

Fix: only set can_edit=False for non-recoverable edit errors.
On flood control, back off by resetting _last_edit_ts so the throttle interval is respected before the next edit attempt.

Fixes #25188

* chore(release): map @erhnysr for PR #25198 salvage

* fix(telegram): preserve can_edit after transient network errors in progress edits (#27828)

When edit_message_text fails with a transient error (httpx.ConnectError, NetworkError, server disconnected, timeouts), the progress-message sender must not permanently set can_edit = False — that would convert a single Telegram network hiccup into separate per-tool bubbles for the rest of the run.

Changes:

- gateway/platforms/telegram.py: edit_message now returns retryable=True for transient network errors (ConnectError, NetworkError, timeouts, server disconnects, temporarily unavailable). Permanent failures (flood control, message-not-found, permissions) remain retryable=False.
- gateway/run.py: send_progress_messages checks result.retryable before setting can_edit = False. Transient failures skip the fallback-send and continue — the next edit cycle catches up with the accumulated lines. Permanent failures (flood, message-not-found, etc.) still disable editing.

Tests: 22 new tests in test_telegram_progress_edit_transient.py covering transient vs permanent error classification, SendResult.retryable semantics, and the can_edit decision logic.

Fixes #27828

* fix(telegram): recover from post-update polling conflict without entering limbo

* fix(test+release): update conflict retry count for MAX=5; map @CryptoByz

* fix(gateway): route background-process notifications into Telegram DM topics

Background-process completion notifications (notify_on_complete) and watch-pattern notifications were always delivered to the Telegram main chat instead of the originating private-chat topic. Hermes-created Telegram DM topic lanes only render a send when it carries both message_thread_id and a reply anchor.
The synthetic MessageEvent injected on process completion had no message_id, so _reply_anchor_for_event returned None and _thread_kwargs_for_send dropped message_thread_id entirely — routing the notification to the main chat.

Capture the triggering message id at spawn time and thread it through to the synthetic event so it can be reply-anchored back into the topic:

- session_context: add HERMES_SESSION_MESSAGE_ID context var
- telegram adapter: populate SessionSource.message_id on inbound messages
- terminal tool: persist watcher_message_id on the process session
- process registry: carry/persist message_id on watcher dicts + checkpoint
- gateway: set MessageEvent.message_id on injected notifications

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): map @fabiosiqueira for PR #27212 salvage

* fix(telegram): route resumed DM topic sends directly

* fix(telegram): enforce TELEGRAM_ALLOWED_USERS allowlist on inbound messages

TELEGRAM_ALLOWED_USERS was only checked for callback/inline-button actions but not for inbound messages. Unauthorized users triggered an 'Unauthorized user' log warning but their messages were still processed by the agent — a P0 security bypass (issue #23778).

Fix: add an allowlist check in _should_process_message(), which is called for all message types (text, command, media, location). If the sender is not in TELEGRAM_ALLOWED_USERS, the message is dropped immediately with a warning log. Empty TELEGRAM_ALLOWED_USERS continues to allow all users (existing behavior).

Fixes #23778

* fix(telegram): fail-closed auth fallback when TELEGRAM_ALLOWED_USERS is empty

The _is_callback_user_authorized fallback returned True when TELEGRAM_ALLOWED_USERS was not set, allowing any Telegram user to interact with the bot. Change to fail-closed: deny by default unless GATEWAY_ALLOW_ALL_USERS=true is explicitly set.
Fixes #24457

* test(telegram): stub _is_callback_user_authorized in trigger-gating fixture

After PR #24468 made the empty-allowlist callback auth fail-closed (and #23795 wired _is_callback_user_authorized into _should_process_message), trigger-gating tests started failing because their fake messages from user 111 hit the new deny-by-default path before trigger evaluation. Force-authorize all senders in _make_adapter() so the trigger logic under test runs. The fail-closed behavior itself is covered by test_telegram_callback_auth_fail_closed.py.

* fix(telegram): reset sticky fallback IP on connect failure, retry primary DNS

When a sticky fallback IP (from DoH discovery) becomes unreachable, the transport previously got stuck in an attempt_order that only tried the dead IP. This prevented the gateway from recovering until the service was restarted.

Changes:

- Always include the primary DNS path (None) after the sticky IP in the attempt_order so that a primary-path retry happens on sticky failure.
- Reset self._sticky_ip to None when the currently sticky IP hits a connect timeout / connect error, allowing the next request to retry from scratch.

Fixes silent Telegram disconnection when discovered fallback IPs are transiently or permanently unreachable.

* test+release: align stale sticky-IP test for #24511; map @falconexe

* fix(telegram): propagate extra base_url config

* feat(send_message): auto-detect @username mentions and create Telegram entities

When sending messages containing @username patterns, auto-generate MessageEntity(type='mention') entries so that the receiving bot's require_mention filter can trigger. This enables proper bot-to-bot interop where mention-based routing is used.
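The mention auto-detection above has one subtlety worth showing: the Telegram Bot API measures entity offset and length in UTF-16 code units, so a plain Python str index is wrong once the text contains characters outside the Basic Multilingual Plane, such as most emoji. A minimal sketch, assuming usernames are 5 to 32 letters, digits, or underscores; MENTION_RE and mention_entities are hypothetical names, not the code in send_message.

```python
import re
from typing import Dict, List

# Assumed username shape: 5-32 letters, digits, or underscores. The lookbehind
# rejects an "@" glued to a preceding word character (e.g. inside an email).
MENTION_RE = re.compile(r"(?<![\w@])@[A-Za-z0-9_]{5,32}(?![A-Za-z0-9_])")


def _utf16_len(s: str) -> int:
    # Telegram counts entity offsets and lengths in UTF-16 code units.
    return len(s.encode("utf-16-le")) // 2


def mention_entities(text: str) -> List[Dict[str, object]]:
    """Return one mention entity per @username found in `text`."""
    return [
        {
            "type": "mention",
            "offset": _utf16_len(text[: m.start()]),
            "length": _utf16_len(m.group(0)),
        }
        for m in MENTION_RE.finditer(text)
    ]


if __name__ == "__main__":
    print(mention_entities("ping @hermes_bot"))  # offset 5, length 11
    # The emoji occupies two UTF-16 units, so the offset is 3 (Python index: 2).
    print(mention_entities("\U0001F642 @hermes_bot"))
```

Each dict carries the fields that python-telegram-bot's MessageEntity(type='mention', offset=..., length=...) takes. The lookbehind keeps an address such as me@example.com from producing a mention.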
* test+release: align send_message mocks for MessageEntity import; map @fonhal

* fix(telegram): resume typing indicator after inline approval click (#27853)

The text /approve and /deny paths in gateway/run.py call resume_typing_for_chat() after resolve_gateway_approval() succeeds, but the Telegram inline-button (ea:*) callback in _handle_callback_query did not. Typing is paused when the approval is sent (gateway/run.py:15658), so without a matching resume the typing indicator stayed gone for the remainder of a long-running turn after a button click.

Symmetry-match the text path: after a successful resolve, call self.resume_typing_for_chat(str(query_chat_id)). Guarded by count > 0 to match /approve's "if not count" early-return — if nothing was actually resolved, the agent thread was never unblocked, so typing should remain paused.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gateway): mark final voice reply as notify-worthy so Telegram delivers it audibly

In Telegram "important" notifications mode (default), TelegramPlatformAdapter sets ``disable_notification=True`` on every send unless metadata carries ``notify=True``. GatewayRunner._send_voice_reply already passes thread metadata through to ``adapter.send_voice``, but never marks the final auto-TTS voice reply as notify-worthy — so users with the default mode get the final voice note delivered silently with no push notification.

Mirror the final-text path in gateway/platforms/base.py (the existing text-response final send already adds ``metadata["notify"] = True``).

Issue #27970 Bug 2. Bug 1 (MP3 vs. native OGG voice-note) is being addressed by existing PRs #20182 / #20878 — this PR is intentionally scoped to the silent-delivery bug only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: avoid Telegram group reply thread session splits

* chore(release): map @eliteworkstation94-ai for PR #28157 salvage

* fix(gateway): avoid duplicate Telegram text after auto-TTS voice replies

* chore(release): map @Zyrixtrex for PR #26754 salvage

* fix(telegram): escape send_slash_confirm preview with format_message

send_slash_confirm() sent the raw command preview with ParseMode.MARKDOWN, skipping the format_message() conversion applied to every other dynamic send in the adapter. Commands with underscores, dots, brackets, or other MarkdownV2-sensitive characters raised BadRequest: Can't parse entities; the exception was swallowed by the outer try/except, so the confirmation prompt silently never appeared.

Fix: pass the preview through format_message() and switch to MARKDOWN_V2, symmetric with send_update_prompt and the callback sends fixed in a69404052.

* chore(release): map @nftpoetrist for PR #25856 salvage

* fix(telegram): retry wrapped connect timeouts

* chore(release): map @samahn0601 for PR #27887 salvage

* fix(tts): keep native audio outside Telegram voice delivery

* chore(release): map @aqilaziz for PR #26406 salvage

* fix(gateway): pin Telegram DM-topic routing to user's current topic

Topic-mode DM replies were fragmenting one conversation across many sessions: a Reply on a message in another topic delivered Telegram's message_thread_id for *that* topic, and #3206's strip routed plain replies to the lobby. Both pulled the user away from their current session.

Fix: when topic mode is on, rewrite source.thread_id to the user's most-recent binding if the inbound id is missing/General or not a known topic. Non-topic-mode users are unchanged.
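The routing rule in the DM-topic fix above reduces to a small decision function. A minimal sketch, assuming the General topic never appears in the set of known topic bindings (so it takes the same branch as a missing id); pin_thread_id and its parameters are hypothetical, not the gateway's actual signature.

```python
from typing import Optional, Set


def pin_thread_id(
    inbound_thread_id: Optional[str],
    topic_mode: bool,
    known_topics: Set[str],
    recent_binding: Optional[str],
) -> Optional[str]:
    """Pick the DM thread id an inbound message should be routed to."""
    if not topic_mode:
        return inbound_thread_id  # non-topic-mode users are left unchanged
    if inbound_thread_id is not None and inbound_thread_id in known_topics:
        return inbound_thread_id  # already inside a bound topic: keep it
    # Missing/General (assumed absent from known_topics) or unknown id:
    # fall back to the user's most recent binding when one exists.
    return recent_binding if recent_binding is not None else inbound_thread_id


if __name__ == "__main__":
    bound = {"101", "202"}
    print(pin_thread_id(None, True, bound, "202"))    # 202
    print(pin_thread_id("999", True, bound, "202"))   # 202
    print(pin_thread_id("101", True, bound, "202"))   # 101
    print(pin_thread_id("999", False, bound, "202"))  # 999
```

In this sketch a known inbound topic is left untouched, so only ids the gateway cannot map to a binding are redirected to the user's current session.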
* chore(release): map @karthikeyann for PR #26609 salvage

* fix(send_message): add thread-not-found retry for Telegram forum topic sends

The standalone _send_telegram path in send_message_tool lacked the thread-not-found fallback that the gateway adapter has. When a forum topic thread_id was stale or deleted, the send would fail entirely instead of retrying to the General topic.

Changes:

- Add _is_telegram_thread_not_found() helper matching gateway adapter
- Add thread-not-found retry in text send path
- Add thread-not-found retry in media send path (with f.seek(0))
- Separate text_kwargs from thread_kwargs to prevent disable_web_page_preview leaking into send_photo/send_video calls

Closes #27012

* test(send_message): add thread-not-found retry tests for Telegram forum topics

Adds two tests to TestSendTelegramThreadIdMapping:

- test_thread_not_found_retries_without_message_thread_id
- test_thread_not_found_for_media_retries_without_message_thread_id

Refs #27012

* test(send_message): add thread-not-found retry tests for Telegram topics

Three tests covering the #27012 fix:

- test_is_thread_not_found_matches_expected_errors
- test_text_send_retries_without_thread_id_on_thread_not_found
- test_disable_web_page_preview_not_leaked_to_media_sends

116/116 existing tests still pass (no regressions).

* chore(release): map @kunci115 for PR #27098 salvage

* fix(gateway): register Telegram commands for groups

Register Telegram bot commands across default, private, and group scopes so the slash-command menu is available outside DMs.
  Changes from review feedback:
  - Add asyncio.Lock to prevent race condition in _ensure_forum_commands
  - Extract MAX_COMMANDS_PER_SCOPE constant (30) to avoid magic number
  - Upgrade error logging from debug->warning in forum registration
  - Add tests covering lazy forum registration and concurrent safety
  - Remove /start handler from this PR (separate feature)
  Fixes review: needs_work (race, magic number, log levels, missing tests)
* test+release: fix test fixture for forum_commands; map @chromalinx
* fix(telegram): gate profile bots by allowed topics
* chore(release): map @booker1207 for PR #25132 salvage
* fix(cron): route Telegram cron deliveries to a dedicated topic via TELEGRAM_CRON_THREAD_ID
  When Telegram topic mode is enabled, cron messages delivered to the bot's root DM (TELEGRAM_HOME_CHANNEL without a thread id) land in the system lobby — replies there are rebuffed with the lobby reminder and reply_to_message_id is dropped, so users cannot interact with the cron output (#24409).
  Add an optional TELEGRAM_CRON_THREAD_ID env var that overrides TELEGRAM_HOME_CHANNEL_THREAD_ID for cron deliveries only. Operators can create a "Cron" forum topic in the DM, point this var at its thread id, and replies to cron messages will land in that topic's existing session instead of the lobby. The home-channel thread id (used elsewhere, e.g. restart notifications) is unchanged, and explicit deliver="telegram:chat:thread" targets continue to win over the env var.
  Per the reporter's clarification on 2026-05-13, option (a) (cron-side route to a dedicated topic + config knob) was chosen.
  Fixes #24409
* fix(telegram): route image documents (.png/.jpg/.webp/.gif) through vision pipeline
  When users send images as documents (Telegram file picker), they were rejected with "Unsupported document type" because SUPPORTED_DOCUMENT_TYPES only includes text/office formats. Add SUPPORTED_IMAGE_DOCUMENT_TYPES to base.py and handle them in telegram.py before the document check.
  - Add SUPPORTED_IMAGE_DOCUMENT_TYPES constant to base.py
  - Add MIME reverse-lookup for image types in telegram.py
  - Route image documents through cache_image_from_bytes + vision pipeline
  - Handle media groups for image documents
  Closes: #20128, #18620
* test+release: stub auth in test_telegram_documents fixture; map @kiranvk-2011
* fix(gateway): prevent Windows Telegram /restart leaving gateway stopped
* chore(release): map @rak135 for PR #25960 salvage
* fix(telegram): preserve topic metadata on overflow edits
* feat(telegram): add disable_topic_auto_rename gateway flag
  When Hermes auto-titles a session in a Telegram DM topic it currently renames the topic itself to the generated title. That works for operator-managed lanes (extra.dm_topics) but is disruptive for ad-hoc Threaded-Mode topics that users name by hand — every first exchange overwrites their chosen title.
  Add gateway.platforms.telegram.extra.disable_topic_auto_rename (default False, preserving prior behaviour). When set, both _schedule_telegram_topic_title_rename and the underlying _rename_telegram_topic_for_session_title short-circuit before touching the Telegram API. Internal session titles (sessions list, TUI) keep working unchanged.
  Also bridge the legacy top-level telegram.disable_topic_auto_rename key through to gateway.platforms.telegram.extra so users on the older config layout don't have to migrate to enable it.
  - Tests cover the runtime flag, the scheduling entry-point, and string truthiness coercion for YAML-loaded values.
  - Docs updated in messaging/telegram.md with an example block.
* chore(release): map @B0Tch1 for PR #27634 salvage
* fix(gateway): restore Telegram DM topic thread_id after session split (#27166)
  When context compression triggers a mid-turn session split, source.thread_id can be None on synthetic/recovered events.
  _thread_metadata_for_source then returns None, causing the Telegram adapter to send with no message_thread_id, so the response lands in the General thread instead of the active DM topic.
  Fix:
  - hermes_state.py: Add get_telegram_topic_binding_by_session() for reverse lookup by session_id (enabled by the existing UNIQUE INDEX on session_id).
  - gateway/run.py: After session-split detection, if source is a Telegram DM and source.thread_id is None, recover it from the binding via the new method so _thread_metadata_for_source produces the correct thread routing.
  - tests/: Coverage for the new lookup method and the recovery flow.
* chore(release): map @jackjin1997 for PR #27239 salvage
* fix(gateway): allow chat-scoped telegram auth without sender user_id
* chore(release): map @soynchux for PR #27806 salvage
* fix(telegram): add DM topic typing fallback when message_thread_id rejected
  When a DM topic lane's message_thread_id is rejected by Telegram (e.g. stale or deleted topic), send_typing now falls back to sending the typing indicator without thread_id so it at least appears in the main DM view, rather than being silently swallowed.
  Also adds a test for the fallback behavior.
* fix(telegram): report cron topic fallback
* chore(release): map @el-analista for PR #25368 salvage
* fix(telegram): wire gt: callback dispatch for gmail-triage buttons
  The gmail-triage skill's Telegram inline buttons emit callback_data of the form `gt:<verb>:<arg>`, but `_handle_callback_query` had no `gt:` branch — taps fell through silently and the spinner sat there until Telegram timed it out.
  Add `_handle_gmail_triage_callback`, dispatched from the existing callback router, that:
  - Authorizes the caller via the same `_is_callback_user_authorized` path as the approval / slash-confirm / clarify handlers.
  - Maps each verb to a script under `~/.hermes/scripts/gmail-triage/` and runs it async with a 60s timeout.
  - Splits verbs into one-shots (send / archive / draft / spam) — append the confirmation and strip the keyboard so the action can't fire twice — and sticky-state changes (mute / trust / vip ± -domain) — append the confirmation but leave the keyboard tappable so the user can stack actions on one email.
  - On failure: toast only, keyboard preserved so the user can retry.
  - Logs every callback outcome to gateway.log for debugging.
* chore(release): map @khungate for PR #25829 salvage
* feat(telegram): support quick-command-only menus
* chore(release): map @stevehq26-bot for PR #28015 salvage
* fix(telegram): handle channel post updates
* test: address telegram channel post review
* test+release: stub auth in channel_posts fixture; map @brndnsvr
* Quiet noisy Telegram gateway errors
* chore(release): map oracle@jarviss-mbp.home for PR #24014 salvage
* Route Telegram multi-bot mentions exclusively
* Document Telegram multi-profile gateway commands
* fix: ignore Telegram messages for other bots
* chore(release): map @OCWC22 for PR #24581 salvage
* feat(telegram): ignore_root_dm with system command lobby
* docs(telegram): document ignore_root_dm feature
* chore(release): map @ai-hana-ai for PR #23928 salvage
* feat(telegram): pin incoming user message for duration of agent turn
  When a user sends a message on Telegram, the incoming message is now automatically pinned at the start of processing and unpinned when the agent finishes its turn. This gives the user a visual indicator that their message is being worked on, and keeps the conversation anchored.
  Changes:
  - telegram.py: Added pinChatMessage in on_processing_start and unpinChatMessage in on_processing_complete. Restructured both hooks so pin/unpin runs independently of the reactions feature (reactions are optional; pinning is always on).
  - telegram.py: Pass message_id through SessionSource so it's available in the session context.
  - session_context.py: Added HERMES_SESSION_MESSAGE_ID context var.
  - run.py: Pass source.message_id through set_session_vars.
  Pinning is silent (disable_notification=True) and failures are logged at debug level without interrupting message processing. Only the user's incoming message is pinned -- never the agent's replies. Auto-resume events (which have no message_id) are correctly skipped.
* chore(release): map @indigokarasu for PR #26636 salvage
* feat(telegram): skip-STT audio path + 2GB cap via local Bot API server
  Two coordinated changes that unblock downstream audio pipelines (diarization, custom transcription, archival) on attachments larger than the public Bot API's 20MB getFile ceiling.
  - `stt.enabled: false` no longer drops voice/audio with a generic "transcription disabled" note. The gateway probes the cached file's duration (wave → mutagen → ffprobe ladder) and surfaces `[The user sent a voice message: <abs path> (duration: M:SS)]` to the agent so a skill or tool can pick up the raw file. The previous placeholder is replaced rather than appended when present.
  - `platforms.telegram.extra.base_url` set → adapter auto-lifts its document size cap from 20MB to 2GB (the local telegram-bot-api `--local` ceiling) and the "too large" reply reports the active limit dynamically. No new config knob; presence of `base_url` is the opt-in.
  - `platforms.telegram.extra.local_mode: true` wires `Application.builder().local_mode(True)` on the python-telegram-bot builder. PTB then reads files from disk instead of HTTP, which is required when telegram-bot-api runs in `--local` mode (the server returns absolute filesystem paths, not `/file/bot...` URLs).
  - gateway/run.py: rewrites the `stt.enabled: false` branch of `_enrich_message_with_transcription`. New `_format_duration` + `_probe_audio_duration` helpers.
  - gateway/platforms/telegram.py: `_max_doc_bytes` instance attribute derived from `extra.base_url`; `local_mode` builder wiring; dynamic "too large" message.
  - tests/gateway/test_stt_config.py: covers path-surfacing with and without an existing user message, and placeholder replacement.
  - tests/gateway/test_telegram_max_doc_bytes.py: 3 cases — default 20MB without base_url, 2GB when set, empty-string base_url keeps default.
  - website/docs/user-guide/messaging/telegram.md: new "Skipping STT" subsection under Voice Messages and a full "Large Files (>20MB) via Local Bot API Server" walkthrough (api_id/api_hash, docker-compose, one-time `logOut` migration, `platforms.telegram.extra` config, the `local_mode` disk-access requirement, the silent HTTP-fallback 404).
  - website/docs/user-guide/features/voice-mode.md: documents the `stt.enabled` knob in the config reference.
  - `pytest tests/gateway/test_telegram_max_doc_bytes.py tests/gateway/test_stt_config.py` → 9/9 passing.
  - Verified end-to-end on a live deployment: gateway log shows `Using custom Telegram base_url: http://...` and `Using Telegram local_mode (read files from disk)` on startup; voice messages above 20MB cache to disk and surface their path to the agent.
  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* chore(release): map @alber70g for PR #25280 salvage
* fix(web): add scheduled column to i18n type definitions (#28549)
  columnLabels and columnHelp in en.ts include a scheduled entry but the Translations interface in types.ts did not declare it, causing a TypeScript build failure in the Nix derivation. Made the field optional since only en.ts provides it currently.
* docs: comprehensive 2-week sweep of feature/PR coverage gaps (#28497)
  Catch the website docs up to two weeks of merged work (May 4 – May 18, 2026, roughly 1,080 PRs). The audit found ~50 user-visible features that had landed in code with no docs footprint, plus a handful of stale pages. This PR closes every gap the scan turned up.
  New pages
  - user-guide/features/deliverable-mode.md — extension list, agent triggers, kanban_complete artifacts pattern, [[as_document]] override (PR #27813).
  - developer-guide/web-search-provider-plugin.md — authoring guide modeled on image-gen-provider-plugin, covering brave_free / ddgs / etc. (PR #25448).
  Providers / auth
  - Rename "Alibaba Cloud" → "Qwen Cloud (Alibaba DashScope)" everywhere the display label shows up; provider id stays `alibaba` (PR #24835).
  - Document OAuth refresh-token quarantine for xAI / MiniMax / Codex (PRs #28116 / #28118 / #28119).
  - Document Nous JWT minting from refresh token + invalid-refresh quarantine + cross-profile shared token store (PRs #27663 / #19712).
  - Add `## Microsoft Entra ID authentication (keyless)` section to azure-foundry guide — DefaultAzureCredential, RBAC, OpenAI + Anthropic routing details (PR #28101 / #9df9816da).
  - Custom providers `api_mode` is now prompted-and-persisted, not just URL autodetected (PR #25068).
  - Delegation honours `api_mode` + auto-detects anthropic_messages base URLs (PR #26824).
  - `x_search` auto-enables when xAI credentials are present (PR #27376).
  - Add `xAI Grok OAuth (SuperGrok)` row to providers headline table (PR #26534).
  - NVIDIA NIM billing-origin header is set automatically (PR #26585).
  Windows / installer
  - `install.ps1`: document `-Commit <sha>` and `-Tag <v>` pin params plus the BOM-strip / git-retry hardening (PR #28169).
  - Document Hermes Desktop thin installer + first-launch bootstrap (PR #27822).
  - Document `dep_ensure` Windows bootstrap (PR #27845).
  - Document install-method auto-detection (pip / git / homebrew / nixos) and the matching update command (PR #27843).
  Gateway / messaging
  - `/platform list|pause|resume` full description + circuit-breaker semantics (PR #26600).
  - Slack / Matrix / Mattermost get parallel `allowed_channels` / `allowed_rooms` allowlist sections matching Telegram/Discord/DingTalk (PR #21251).
  - Discord `allow_any_attachment` + `max_attachment_bytes` (config and env vars) (PR #27245).
  - Discord clarify-choice button rendering (PR #25485).
  - Telegram `guest_mode` @mention bypass for allowlisted groups (PR #22759).
  - Telegram `notifications` mode (`important` vs `all`) (PR #22793).
  - `[[as_document]]` skill / response directive for forcing document-style media delivery (PR #21210).
  CLI / TUI
  - `/new [name]` argument (PR #19637).
  - `/subgoal` user-supplied criteria appended to `/goal` (PR #25449).
  - `/exit --delete` flag confirmation prompts for destructive slash commands (PR #22687).
  - Status-bar additions: ▶ N background indicator (PR #27175), context compression count (PR #21218), YOLO mode banner+statusbar warning (PR #26238).
  - `display.timestamps` + `docker_extra_args` config keys (PR #23599).
  - TUI collapsible startup banner sections (PR #20625).
  - `HERMES_SESSION_ID` exported to tool subprocesses (PR #23847).
  i18n
  - Refresh display.language locale list from 8 → 16 (en, zh, zh-hant, ja, de, es, fr, tr, uk, af, ko, it, ga, pt, ru, hu) — matches `agent/i18n.py:SUPPORTED_LANGUAGES`.
  Tools / features
  - `vision_analyze` native-pixel passthrough for vision-capable callers, with auxiliary text-describer fallback (PR #22955).
  - `session_search` rewrite to the single-shape tool (discovery / scroll / browse modes) (PRs #27590 / #27840).
  - Clarify MCP transport scope: client supports stdio + SSE; embedded `hermes mcp serve` is stdio-only (PR #21227).
  - Web search backends table: add Brave Search (free tier) and DDGS rows (PR #21337).
  - ACP session-scoped edit auto-approval modes (PR #27862).
  - Curator rename map in the user-visible per-run summary (PR #22910).
  - Prompt caching feature page reference in features/overview.md — Claude cross-session 1-hour prefix cache on native Anthropic / OpenRouter / Nous Portal (PR #23828).
  - Cron per-job profile parameter (PR #28124).
  - `--no-skills` flag for `hermes profile create` (PR #20986).
  Build
  - Verified with `npm run build` in `website/`; both `en` and `zh-Hans` locales compile.
  Remaining broken-link/anchor warnings are pre-existing (`rl-training.md` from learning-path / overview; the zh-Hans translation lag the docs skill already calls out).
* chore(release): pre-stage AUTHOR_MAP for May 2026 LHF batch group 9 (#28571)
  Pre-stages AUTHOR_MAP entries for 9 new/under-mapped contributors whose PRs are being salvaged in the May 2026 LHF batch group 9.
  Contributors:
  - jdelmerico (#28278 — signal require_mention filter)
  - justemu (#27996 — matrix thread_require_mention)
  - YuanHanzhong (#28029 — dashboard browser scrollback)
  - noctilust (#28080 — drop stale TUI resume env)
  - MoonJuhan (#28288 — tolerate unreadable JSONL transcripts)
  - outsourc-e (#28164 — cron emoji ZWJ sequences)
  - Zyrixtrex (#28275 — Google OAuth urlopen timeout)
  - ooovenenoso (#28256 — tool loop recovery hints)
  - vanthinh6886 (#28018 — yaml/flock/atomic write guards; non-noreply email)
  Per references/batch-pr-salvage-may14-additions.md.
* feat(signal): add require_mention filter for group chats
  Add a configurable mention filter to the Signal adapter so the bot only responds in groups when it is explicitly @mentioned.
  Changes:
  - gateway/platforms/signal.py: read require_mention from adapter extra config or SIGNAL_REQUIRE_MENTION env var; skip group messages that don't mention the bot account (checked in rendered text and raw mention metadata)
  - gateway/config.py: map signal.require_mention YAML key to the SIGNAL_REQUIRE_MENTION env var (env var takes precedence)
  Config example:
    signal:
      require_mention: true
  Or via env var:
    SIGNAL_REQUIRE_MENTION=true
* Revert "feat(telegram): pin incoming user message for duration of agent turn"
  This reverts commit a724c3b9cf5f01e28365322ae5ae3a9579567806.
* Revert "feat(telegram): support quick-command-only menus"
  This reverts commit b1acf80e17858e2e5ae7c0d412a3a573d7fcbca4.
* Revert "feat(send_message): auto-detect @username mentions and create Telegram entities"
  This reverts commit cf814c96f613b38bd891ac941c32da653e81c7ad.
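The Signal commit states two rules: the SIGNAL_REQUIRE_MENTION env var overrides the YAML key, and group messages are dropped unless the bot account appears in the rendered text or in the raw mention metadata. A minimal sketch of that logic, assuming mentions arrive as plain account strings; the function names and the accepted truthy strings are hypothetical, not the adapter's actual code.

```python
import os
from typing import Collection, Mapping

# Assumed set of strings treated as "on" when the value arrives as text.
_TRUTHY = {"1", "true", "yes", "on"}


def resolve_require_mention(extra: Mapping[str, object],
                            env: Mapping[str, str] = os.environ) -> bool:
    """Env var SIGNAL_REQUIRE_MENTION wins over the adapter's extra config."""
    raw = env.get("SIGNAL_REQUIRE_MENTION")
    if raw is None:
        raw = extra.get("require_mention", False)
    if isinstance(raw, bool):
        return raw
    # YAML or env values may arrive as strings such as "true" or "0".
    return str(raw).strip().lower() in _TRUTHY


def should_process_group_message(text: str, mentions: Collection[str],
                                 bot_account: str, require_mention: bool) -> bool:
    """Drop group messages that do not mention the bot when the filter is on."""
    if not require_mention:
        return True
    # Check both the rendered text and the raw mention metadata.
    return bot_account in text or bot_account in mentions
```

Passing `env` explicitly keeps the precedence rule testable without touching the real process environment.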
* Revert "fix(telegram): enforce TELEGRAM_ALLOWED_USERS allowlist on inbound messages"
  This reverts commit db50af910be6b4171ea9cf54f4cc38be27ac1da6.
* fix(gateway): pre-mark sessions as resume_pending before drain to prevent data loss (#27856)
  Pre-mark all running agent sessions as resume_pending BEFORE the drain wait begins. If the service manager kills the process during the drain window, the durable marker is already written so the next gateway boot can recover in-flight sessions. On graceful drain completion, clear the early markers for sessions that finished successfully.
* fix(matrix): implement thread_require_mention to prevent multi-agent reply loops
  In multi-agent shared Matrix rooms, multiple bots all participating in the same thread could trigger infinite reply loops — each bot's reply re-engaged the others because they were all in the bot-thread set. Discord has a `thread_require_mention` opt-in for this; Matrix didn't.
  Add `_parse_thread_require_mention(config)` (mirrors Discord's pattern). In `_resolve_message_context`, when enabled and the message is in a bot-participated thread (not a free-response room), require @mention before processing.
  Salvage of @justemu's 2-commit stack (#27996). Fixes #27995.
* fix(cli): show active profile in TUI prompt
* fix(tui): preserve dunder identifiers in markdown
* test(file_ops): add regression tests for git baseline warning in write_file
  Adds TestGitBaselineCheck with 6 unit tests covering _check_git_baseline and the warning field in write_file result:
  - Git not available → None
  - Not in a git repo → None
  - Clean repo → None
  - Dirty repo → returns warning string with branch name
  - write_file result includes warning when dirty
  - write_file result omits warning when clean
* fix(dashboard): use browser scrollback for chat wheel
* fix(cli): ignore stale HERMES_TUI_RESUME env
  HERMES_TUI_RESUME is an internal env var the Python wrapper exports to hand a session ID off to the Ink TUI.
  Because _launch_tui started from os.environ.copy(), any exported/stale value in the user's shell leaked through — so plain `hermes --tui` would try to resume a missing session and leave the UI at 'error: session not found' with no live session.
  Drop HERMES_TUI_RESUME from the env before conditionally re-setting it from the argparse-resolved resume_session_id. Tests cover both the drop path and the set-from-arg path.
  Salvage of #28080 by @noctilust.
* fix(cron): allow emoji ZWJ sequences in prompts
* fix: tolerate unreadable gateway JSONL transcripts
* fix(skills): add timeout to Google OAuth urlopen calls
* fix: add recovery hints to loop guard warnings
* fix: guard yaml.safe_load, flock unlock, TOCTOU races, and atomic writes
  1. trajectory_compressor.py: yaml.safe_load() returns None on empty files, crashing with TypeError on `if 'tokenizer' in data`. Fix by adding `or {}` fallback. (HIGH — blocks startup with empty config)
  2. 6 files with fcntl.flock(LOCK_UN) in finally blocks without try/except: cron/scheduler.py, hermes_cli/auth.py, agent/shell_hooks.py, tools/skill_usage.py, tools/environments/file_sync.py, tools/memory_tool.py. If unlock raises OSError, fd.close() is skipped and the lock is held forever. The msvcrt branches already had try/except; the fcntl branches did not. Fix by wrapping in try/except (OSError, IOError): pass.
  3. agent/copilot_acp_client.py line 639: TOCTOU race — path.exists() followed by path.read_text() with no try/except. If the file is deleted between the check and the read, FileNotFoundError propagates. Fix by using try/except FileNotFoundError.
  4. gateway/sticker_cache.py: non-atomic write via Path.write_text() can leave truncated JSON on crash, causing JSONDecodeError on next load. Fix by writing to tempfile + fsync + os.replace (atomic).
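Items 3 and 4 of the guard commit name two standard patterns: catch FileNotFoundError instead of checking exists() first, and write through a temporary file, fsync, then os.replace. A minimal standard-library sketch of both follows; the helper names are hypothetical and this is not the code in gateway/sticker_cache.py or agent/copilot_acp_client.py.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def atomic_write_json(path: Path, data: Any) -> None:
    """Write JSON so readers see either the old file or the complete new one."""
    path = Path(path)
    # The temp file lives in the same directory so os.replace stays a same-filesystem rename.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the partial temp file on any failure, then re-raise.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def load_json_or_default(path: Path, default: Any) -> Any:
    """Read without a separate exists() check, so there is no TOCTOU window."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        return default
```

A crash mid-write leaves at most a stray `.tmp` file next to the target; the target itself is never truncated.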
* chore(release): alias xxxigm noreply for upcoming #27986 salvage (#28594)
  Adds the canonical noreply form (54813621+xxxigm@users.noreply.github.com) alongside the existing plain-email mapping so the salvage commit for @xxxigm's codex doctor PR doesn't fail AUTHOR_MAP CI.
* fix(doctor): attach codex CLI hint to OpenAI Codex auth warning for #27975
  `hermes doctor` printed 'codex CLI not installed (optional — ...)' as a generic info line at the bottom of the auth section, several rows below 'OpenAI Codex auth (not logged in)' and after MiniMax/Gemini auth checks. Users reading sequentially mistook it for MiniMax-related advice.
  Move the hint up under the Codex auth warning so it's adjacent to the row it actually pertains to. Behavior unchanged when the codex CLI is installed (success path keeps its 'codex CLI ✓' row at the bottom). Tests cover both placement and suppression cases.
  Salvage of @xxxigm's 3-commit stack (#27986). Closes #27975.
* fix(tests): catch up 25 stale tests after recent merges (#28626)
  Sweep of all CI failures on origin/main, grouped by drift source:
  Telegram allowlist gate (db50af910 added user-authz to _should_process_message):
  - Hardcoded "[Telegram]" prefix in the logger.warning so the call no longer dereferences self.name → self.platform, which test fixtures built via object.__new__ never set.
  - test_telegram_format / test_allowed_channels_widening fixtures stub _is_callback_user_authorized → True so the new gate doesn't reject guest-mode / allowed-channels test messages.
  - test_telegram_approval_buttons::test_update_prompt_callback_not_affected sets TELEGRAM_ALLOWED_USERS="*" so the fail-closed default doesn't reject the callback before it writes .update_response.
  Approval surface (6d495d9e7 renamed status, 214b95392 detached stdin):
  - test_no_callback_returns_approval_required: status is now "pending_approval" (was "approval_required").
  - test_close_stdin_allows_eof_driven_process_to_finish: switch to use_pty=True; non-PTY now uses stdin=DEVNULL.
  Mattermost (send() now resolves root_id via _api_get first):
  - test_send_with_thread_reply mocks _session.get with a thread-root response so the new resolver doesn't TypeError on a bare AsyncMock.
  Kanban (d8ad431de rename, f55d94a1e review column, _kanban_worker_skill_available):
  - _safe_int → _to_epoch in the two test_kanban_db tests.
  - Spawn-skills tests (×3) monkey-patch _kanban_worker_skill_available to True since the isolated kanban_home fixture has no devops/kanban-worker tree.
  - test_gateway_dispatcher_disables_corrupt_board: connect count 3 → 5 (review-column probe now also runs per tick).
  Aux-config severity at_or_above (a94ddd807):
  - test_diagnostics_endpoint_severity_filter expects warning filter to include error+critical now (was exact-match).
  Anthropic error handling (conversation loop extracted from run_agent):
  - _no_backoff_wait fixture patches BOTH run_agent.jittered_backoff AND agent.conversation_loop.jittered_backoff. The latter is the actual call site; without the second patch tests burn ~2s per retry and hit the 30s SIGALRM timeout on CI.
  Other test pollution / drift:
  - test_auto_does_not_select_copilot_from_github_token: patch agent.bedrock_adapter.has_aws_credentials → False so boto3's credential chain can't auto-pick Bedrock from developer ~/.aws.
  - test_setup_openclaw_migration: patch hermes_cli.gateway.get_env_value in addition to setup_mod.get_env_value — _platform_status reads through the gateway module's binding.
  - test_gateway_prefix: COMPONENT_PREFIXES["gateway"] now includes "hermes_plugins" too.
  - test_recommended_update_command_defaults_to_hermes_update: also short-circuit get_managed_update_command in case a stray ~/.hermes/.managed marker is present.
  - test_user_id_is_not_explicit: _parse_target_ref now returns is_explicit=False for Slack U.../W... IDs (chat.postMessage rejects them — a DM must be opened first via conversations.open).
* feat(update): syntax-validate critical files post-pull, auto-rollback on failure (#28669)
  Catch the PR #28452 failure mode (orphan merge-conflict markers in hermes_cli/config.py) on the user side: after git pull succeeds, compile the files every 'hermes' invocation imports at startup. If any has a syntax error, git reset --hard back to the pre-pull SHA so the install stays bootable. User can retry once a fix lands upstream.
  - New _capture…
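The syntax-validation commit compiles the files every hermes invocation imports right after git pull and resets to the pre-pull SHA if any of them fails. The sketch below illustrates that check with the built-in compile(); the function names and the injected rollback callable are hypothetical, and choosing which files count as critical is left to the caller.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def find_syntax_errors(paths: Iterable[Path]) -> List[str]:
    """Compile each file without executing it; collect 'path:line: message' strings."""
    errors: List[str] = []
    for path in paths:
        source = Path(path).read_text(encoding="utf-8")
        try:
            compile(source, str(path), "exec")
        except SyntaxError as exc:
            errors.append(f"{path}:{exc.lineno}: {exc.msg}")
    return errors


def validate_or_rollback(paths: Iterable[Path], pre_pull_sha: str,
                         rollback: Callable[[str], None]) -> bool:
    """Return True if every file compiles; otherwise roll back and return False."""
    errors = find_syntax_errors(paths)
    if errors:
        for line in errors:
            print(f"syntax error after update: {line}")
        rollback(pre_pull_sha)
        return False
    return True
```

In a real updater the rollback callable might run `subprocess.run(["git", "reset", "--hard", sha], check=True)`; injecting it keeps the check testable without a repository. Orphan conflict markers such as `<<<<<<< HEAD` are not valid Python, so compile() rejects them with SyntaxError.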

* feat(kanban): configure worktree paths and branches
  Salvages #26496 by @aqilaziz. Adds branch_name column + CLI flag so tasks with workspace_kind='worktree' can pin a target branch on create. Schema migration added to _migrate_add_optional_columns.
  - Task.branch_name field + DB column + migration
  - create_task accepts branch_name kwarg
  - hermes kanban create --branch <name> flag
  - kanban show output includes 'Branch: <name>' when set
  Cherry-picked the substantive commit (a7558cf27); the PR's tip was an unrelated service-path-dirs commit. Resolved 2 INSERT-column-list and show-output conflicts alongside main's session_id and max_runtime_seconds additions; kept all three.
* feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373)
  Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that group several skills under one slash command. Invoking /<bundle-name> from any surface (CLI, TUI, dashboard, any gateway platform) loads every referenced skill into a single combined user message.
  Use cases:
  - /backend-dev → loads github-code-review + test-driven-development + github-pr-workflow as one bundle.
  - /research → loads several research skills together.
  - Team task profiles shared via dotfiles.
  Behavior:
  - Bundles take precedence over individual skills when slugs collide.
  - Missing skills are skipped with a note, not fatal.
  - No system-prompt mutation — bundles generate a fresh user message at invocation time, the same way /<skill> does. Prompt cache stays intact.
  - Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI), /help display.
  Schema (~/.hermes/skill-bundles/<slug>.yaml):
    name: backend-dev
    description: Backend feature work.
    skills:
      - github-code-review
      - test-driven-development
    instruction: |
      Optional extra guidance prepended to the loaded skills.
  New module: agent/skill_bundles.py — load, scan, resolve, build invocation message, save, delete. yaml.safe_load only; broken bundles log a warning and are skipped, never raise.
  New CLI subcommand: hermes bundles {list,show,create,delete,reload}. Implementation in hermes_cli/bundles.py; wired in hermes_cli/main.py. 'bundles' added to _BUILTIN_SUBCOMMANDS so plugin discovery skips it.
  New in-session slash command: /bundles lists installed bundles in both CLI and gateway.
  /<bundle-name> dispatch added to CLI (cli.py) and gateway (gateway/run.py) before the existing /<skill-name> path.
  Autocomplete: SlashCommandCompleter gained an optional skill_bundles_provider parameter that defaults to None — the prompt shows '▣ <description> (N skills)' for bundles vs '⚡' for skills.
  Tests:
  - tests/agent/test_skill_bundles.py — 33 tests covering slugify, scan/cache freshness, resolve (including underscore→hyphen Telegram alias), build_bundle_invocation_message (loading, missing skills, user/bundle instruction injection, dedup), save/delete, reload diff, list sort.
  - tests/hermes_cli/test_bundles.py — 8 tests for the CLI subcommand (create/list/show/delete/reload, --force, missing bundle errors).
  - tests/gateway/test_bundles_command.py — 4 tests for the gateway handler and bundle resolution priority.
  Live E2E: verified subprocess invocations of hermes bundles {list,create,show,reload,delete} round-trip correctly against an isolated HERMES_HOME.
  Docs:
  - website/docs/user-guide/features/skills.md — new 'Skill Bundles' section with quick example, YAML schema, management commands, behavior notes.
  - website/docs/reference/cli-commands.md — 'hermes bundles' added to the top-level command table and given its own subcommand section.
* feat(kanban): add scheduled status for delayed follow-ups
  Salvages #24533 by @roycepersonalassistant. Adds a first-class 'scheduled' Kanban status for time-delay follow-ups that aren't waiting on human input.
  - hermes kanban schedule <task_id> [reason] CLI command
  - Dashboard/API transitions to/from Scheduled
  - unblock_task() now releases both 'blocked' AND 'scheduled' tasks (re-checking parent dependencies before moving to ready/todo)
  - i18n + docs updates
  Resolved conflicts: kept HEAD's failure-counter reset on unblock alongside the PR's scheduled state, kept HEAD's 'running' direct-set rejection, combined both bulk-status branches. Dropped the dist/ bundle changes (months-stale; would need rebuild from source).
* feat(kanban): drag-to-delete trash zone + bulk delete for task cards
  Salvages #28125 by @Jpalmer95. Adds:
  - Drag-to-delete trash zone in the kanban dashboard
  - Bulk delete endpoint with cascading delete_task cleanup
  - Frontend updates (drag visual + drop handler)
  - Confirmation prompt before delete
  Resolved end-of-file test conflict by appending both halves.
* docs: add Korean Kanban documentation
  Salvages #21823 by @pochi-gio. Adds Korean (ko) Docusaurus locale and translates Kanban documentation (kanban.md, kanban-tutorial.md) and the two related skills (devops-kanban-orchestrator, devops-kanban-worker).
  Purely additive — adds ko to the locales list in docusaurus.config.ts and creates the website/i18n/ko/ tree.
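The worktree-branch commit adds a nullable branch_name column through _migrate_add_optional_columns, passes branch_name into create_task, and prints `Branch: <name>` in show output only when set. Assuming a SQLite store, the sketch below shows one common way to make such a migration idempotent on legacy databases. The `tasks` table layout, the function names, and the `Title:` line are illustrative assumptions, not Hermes' schema.

```python
import sqlite3
from typing import Optional

# Optional columns older databases may lack. All are nullable, so existing
# rows read back NULL after the migration. Names come from this fixed dict,
# never from user input, so interpolating them into DDL is safe here.
OPTIONAL_COLUMNS = {"branch_name": "TEXT"}


def migrate_add_optional_columns(conn: sqlite3.Connection) -> None:
    """Add any missing optional columns to the (assumed) tasks table; safe to rerun."""
    existing = {row[1] for row in conn.execute("PRAGMA table_info(tasks)")}
    for name, decl in OPTIONAL_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE tasks ADD COLUMN {name} {decl}")
    conn.commit()


def create_task(conn: sqlite3.Connection, title: str, workspace_kind: str,
                branch_name: Optional[str] = None) -> int:
    """Insert a task; branch_name is only meaningful for worktree workspaces."""
    cur = conn.execute(
        "INSERT INTO tasks (title, workspace_kind, branch_name) VALUES (?, ?, ?)",
        (title, workspace_kind, branch_name),
    )
    conn.commit()
    return cur.lastrowid


def format_show(conn: sqlite3.Connection, task_id: int) -> str:
    """Render a show-style summary, adding the Branch line only when set."""
    title, branch = conn.execute(
        "SELECT title, branch_name FROM tasks WHERE id = ?", (task_id,)
    ).fetchone()
    lines = [f"Title: {title}"]
    if branch:
        lines.append(f"Branch: {branch}")
    return "\n".join(lines)
```

Checking `PRAGMA table_info` before `ALTER TABLE ... ADD COLUMN` lets the migration run on every startup without failing on databases that already have the column.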
* fix(tests): catch up six stale tests after compression/aux/kanban changes (#28465)
  - aux_config: drop session_search from _AUX_TASKS and remove stale test (PR #27590 removed auxiliary.session_search from DEFAULT_CONFIG)
  - compression_boundary_hook: set compressor._last_compress_aborted=False on MagicMock so the post-compress abort branch (PR #28117) doesn't short-circuit before the session-id rotation under test
  - kanban_dashboard_plugin: use consecutive_failures=3 so severity stays 'error' (failure_threshold default dropped from 3 to 2 in d9fef0c8a, so failures=5 now crosses the critical floor of 2*2=4)
  - cli_manual_compress: accept force kwarg on DummyAgent._compress_context (cli._manual_compress now passes force=True)
* fix(telegram): render full clarify choice text in message body, use short button labels
  When Telegram clarify prompts offer long choices, mobile clients truncate the inline button labels, making options unreadable. Previously only the question was shown in the message body with truncated choice text in button labels.
  Fix: append the full numbered option list to the message body so users can read complete choice text on any client. Buttons now use short numeric labels (1, 2, ...) to avoid Telegram truncation. The 'Other (type answer)' button is unchanged. Long choice labels are now rendered in full (not truncated to 57 chars + '...') since they appear in the body instead of button labels.
  Closes: #27497
* chore(release): map @asdlem for PR #27852 salvage
* fix(telegram): default streaming transport to edit
* fix(telegram): respect reply_to_mode for DM topic reply fallback
  The DM topic reply fallback code in send() hardcoded should_thread=True when telegram_dm_topic_reply_fallback metadata was present, bypassing _should_thread_reply() and ignoring reply_to_mode config. This caused quote bubbles on every response even with reply_to_mode: 'off'.
Fix:
- Add reply_to_mode param to _reply_to_message_id_for_send() and _thread_kwargs_for_send() classmethods
- In send(), check self._reply_to_mode != 'off' for DM topic fallback
- Suppress reply anchor and reply_to_message_id when mode is 'off' while preserving message_thread_id for correct topic routing
- Thread reply_to_mode through all 29 call sites

Regression coverage: 10 new tests in test_telegram_reply_mode.py covering classmethod behavior, send() integration, and backward compatibility.

Fixes #23994 (reply_to_mode: 'off' ignored by Telegram DM topic reply fallback code)

* fix(gateway): route Telegram audio file attachments away from STT pipeline (#24870)

Telegram distinguishes three kinds of audio payloads:
- message.voice → Opus/OGG voice messages → STT pipeline ✓
- message.audio → audio file attachments → bypasses STT ← was broken
- message.document (audio mime) → generic file route

**Root cause**: the inbound message routing block in gateway/run.py matched both MessageType.VOICE *and* MessageType.AUDIO into audio_paths, which were then fed unconditionally to _enrich_message_with_transcription. Audio file attachments (.mp3, .m4a, etc.) were therefore auto-transcribed instead of being treated as files, making the transcribe skill unusable from Telegram because the path it needed was never surfaced.

**Fix**
- Introduce a new audio_file_paths list populated exclusively by MessageType.AUDIO events.
- Narrow the audio_paths selector to MessageType.VOICE (and bare audio/* mime-type events that are not explicitly AUDIO or DOCUMENT).
- After the STT block, inject a document-style context note for each audio_file_path, giving the agent the file path and asking what to do with it (consistent with how plain documents are handled).
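The voice/audio split described in this fix can be sketched as a pure function. The enum member names mirror `MessageType.VOICE` / `MessageType.AUDIO` / `MessageType.DOCUMENT` from the description; the `(msg_type, path, mime)` event tuples, the `OTHER` member, and the note wording are assumptions, not the actual `gateway/run.py` code.

```python
from enum import Enum


class MessageType(Enum):
    VOICE = "voice"
    AUDIO = "audio"
    DOCUMENT = "document"
    OTHER = "other"  # hypothetical catch-all for this sketch


def split_audio(events):
    """Return (audio_paths, audio_file_paths) for inbound media events.

    Each event is a hypothetical (msg_type, path, mime) tuple:
    - VOICE goes to STT (audio_paths).
    - AUDIO file attachments are surfaced as paths, never transcribed.
    - A bare audio/* mime that is neither AUDIO nor DOCUMENT also goes to STT.
    """
    # Both lists exist before any branching, so later code can always read them.
    audio_paths, audio_file_paths = [], []
    for msg_type, path, mime in events:
        if msg_type is MessageType.AUDIO:
            audio_file_paths.append(path)
        elif msg_type is MessageType.VOICE or (
            mime.startswith("audio/")
            and msg_type not in (MessageType.AUDIO, MessageType.DOCUMENT)
        ):
            audio_paths.append(path)
    return audio_paths, audio_file_paths


def file_note(path):
    # Document-style context note; the wording here is illustrative only.
    return f"[The user sent an audio file: {path}. What should I do with it?]"
```

Initialising both lists unconditionally also sidesteps the `UnboundLocalError` that a later commit in this batch fixes for events without `media_urls`.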
**Tests**: 5 new tests in test_telegram_audio_vs_voice.py:
- voice message still transcribed (regression guard)
- audio attachment skips STT (core fix)
- audio attachment context note format
- STT disabled still produces file note (not STT-disabled notice)
- MessageType.AUDIO != MessageType.VOICE sanity check

Fixes #24870

* chore(release): map bartok9 noreply for PR #24879 salvage

* fix(send_message): route standalone Telegram sends through TELEGRAM_PROXY

When the send_message tool runs outside the gateway process (agent loop, TUI, cron, etc.), _gateway_runner_ref() returns None and the standalone path in _send_telegram constructs Bot(token=token) directly, bypassing any configured proxy. In regions where api.telegram.org is blocked, the send times out after ~5s with 'Telegram send failed: Timed out', and nothing ever shows up in gateway.log because the request never reaches the gateway.

Resolve TELEGRAM_PROXY (via gateway.platforms.base.resolve_proxy_url, which also honours HTTPS_PROXY/HTTP_PROXY/ALL_PROXY and NO_PROXY) just before constructing the Bot. When a proxy is found, attach an HTTPXRequest(proxy=...) for both 'request' and 'get_updates_request', matching what gateway/platforms/telegram.py already does for in-gateway sends and what the Discord standalone sender already does. Any exception attaching the proxy falls back cleanly to a direct connection, preserving prior behaviour for users without a proxy configured.

Adds tests/tools/test_send_message_telegram_proxy.py covering both the proxy-configured and no-proxy cases.

* chore(release): map @pepelax for PR #25419 salvage

* fix(kanban-dashboard): restore implementations dropped during salvages (#28481)

Four kanban dashboard test failures, all from PR salvages that picked up the test additions but dropped the corresponding implementations.
- BOARD_COLUMNS: add 'review' (status added by PR f55d94a1e, but the board API never grew the column → test_board_empty failed because VALID_STATUSES - {archived} mismatched the rendered columns).
- update_task: enrich the 'ready' 409 detail with the blocking parent list (id, title, status) and add _parents_blocking_ready helper. Implementation lost in the #26744 salvage (commit e215558ba), which pinned the test but not the server-side code.
- dist/index.js: add parseApiErrorMessage helper, wire it through the drag/drop banner, add patchErr state to the TaskDrawer and surface it inline by the action row. Lost in the same #26744 salvage.
- test_diagnostics_endpoint_severity_filter: update to at-or-above semantics (PR a94ddd807 changed the filter from exact-match, so the warning filter now correctly includes error+critical too).

* fix(gateway): roll over Telegram tool progress bubbles

* fix(gateway): scope audio_file_paths outside media_urls guard

The audio-file-paths handling block at line 7334 references the variable unconditionally, but #24879 initialized it inside the 'if event.media_urls' block, so events without media_urls hit UnboundLocalError. Found via test_run_agent_queued_message_does_not_treat_commentary_as_final after PR #28478 landed.

* fix(gateway): keep tool-progress edits alive after Telegram flood control

When a progress-message edit hits Telegram flood control (RetryAfter), can_edit was unconditionally set to False, permanently disabling coalescing for the rest of the run. Subsequent tool updates were posted as separate new messages instead of updating the existing progress bubble.

Fix: only set can_edit=False for non-recoverable edit errors. On flood control, back off by resetting _last_edit_ts so the throttle interval is respected before the next edit attempt.
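The flood-control policy in this fix amounts to a small state machine over `can_edit` and `_last_edit_ts`. The sketch below is hypothetical: `EditResult`, `ProgressEditor`, `may_edit`, and the `throttle_s` interval are invented for illustration and do not reflect the real `gateway/run.py` structure.

```python
from dataclasses import dataclass


@dataclass
class EditResult:
    ok: bool
    flood_control: bool = False  # e.g. a Telegram RetryAfter
    error: str = ""


class ProgressEditor:
    """Hypothetical sketch of the progress-bubble edit policy."""

    def __init__(self, throttle_s: float = 1.0) -> None:
        self.can_edit = True
        self.throttle_s = throttle_s
        self._last_edit_ts = float("-inf")

    def may_edit(self, now: float) -> bool:
        # Edit only while editing is enabled and the throttle window has passed.
        return self.can_edit and now - self._last_edit_ts >= self.throttle_s

    def on_edit_result(self, result: EditResult, now: float) -> None:
        if result.ok or result.flood_control:
            # Success records the edit; flood control restarts the throttle
            # window so the next attempt waits a full interval, but editing
            # (and therefore coalescing) stays enabled.
            self._last_edit_ts = now
        else:
            # Non-recoverable error: stop editing and fall back to new messages.
            self.can_edit = False
```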
Fixes #25188

* chore(release): map @erhnysr for PR #25198 salvage

* fix(telegram): preserve can_edit after transient network errors in progress edits (#27828)

When edit_message_text fails with a transient error (httpx.ConnectError, NetworkError, server disconnected, timeouts), the progress-message sender must not permanently set can_edit = False; that would convert a single Telegram network hiccup into separate per-tool bubbles for the rest of the run.

Changes:
- gateway/platforms/telegram.py: edit_message now returns retryable=True for transient network errors (ConnectError, NetworkError, timeouts, server disconnects, temporarily unavailable). Permanent failures (flood control, message-not-found, permissions) remain retryable=False.
- gateway/run.py: send_progress_messages checks result.retryable before setting can_edit = False. Transient failures skip the fallback-send and continue; the next edit cycle catches up with the accumulated lines. Permanent failures (flood, message-not-found, etc.) still disable editing.

Tests: 22 new tests in test_telegram_progress_edit_transient.py covering transient vs permanent error classification, SendResult.retryable semantics, and the can_edit decision logic.

Fixes #27828

* fix(telegram): recover from post-update polling conflict without entering limbo

* fix(test+release): update conflict retry count for MAX=5; map @CryptoByz

* fix(gateway): route background-process notifications into Telegram DM topics

Background-process completion notifications (notify_on_complete) and watch-pattern notifications were always delivered to the Telegram main chat instead of the originating private-chat topic. Hermes-created Telegram DM topic lanes only render a send when it carries both message_thread_id and a reply anchor.
The synthetic MessageEvent injected on process completion had no message_id, so _reply_anchor_for_event returned None and _thread_kwargs_for_send dropped message_thread_id entirely, routing the notification to the main chat.

Capture the triggering message id at spawn time and thread it through to the synthetic event so it can be reply-anchored back into the topic:
- session_context: add HERMES_SESSION_MESSAGE_ID context var
- telegram adapter: populate SessionSource.message_id on inbound messages
- terminal tool: persist watcher_message_id on the process session
- process registry: carry/persist message_id on watcher dicts + checkpoint
- gateway: set MessageEvent.message_id on injected notifications

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): map @fabiosiqueira for PR #27212 salvage

* fix(telegram): route resumed DM topic sends directly

* fix(telegram): enforce TELEGRAM_ALLOWED_USERS allowlist on inbound messages

TELEGRAM_ALLOWED_USERS was only checked for callback/inline-button actions but not for inbound messages. Unauthorized users triggered an 'Unauthorized user' log warning, but their messages were still processed by the agent: a P0 security bypass (issue #23778).

Fix: add allowlist check in _should_process_message(), which is called for all message types (text, command, media, location). If the sender is not in TELEGRAM_ALLOWED_USERS, the message is dropped immediately with a warning log. Empty TELEGRAM_ALLOWED_USERS continues to allow all users (existing behavior).

Fixes #23778

* fix(telegram): fail-closed auth fallback when TELEGRAM_ALLOWED_USERS is empty

The _is_callback_user_authorized fallback returned True when TELEGRAM_ALLOWED_USERS was not set, allowing any Telegram user to interact with the bot. Change to fail-closed: deny by default unless GATEWAY_ALLOW_ALL_USERS=true is explicitly set.
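The fail-closed fallback reduces to a small predicate. This sketch assumes `TELEGRAM_ALLOWED_USERS` holds comma-separated numeric IDs and that `GATEWAY_ALLOW_ALL_USERS` counts as set only for the literal `true` (case-insensitive); `is_user_authorized` is a hypothetical name, and the real parsing in `_is_callback_user_authorized` may differ.

```python
import os
from typing import Mapping, Optional


def is_user_authorized(user_id: int, env: Optional[Mapping[str, str]] = None) -> bool:
    """Fail-closed allowlist check (hypothetical sketch)."""
    env = os.environ if env is None else env
    raw = env.get("TELEGRAM_ALLOWED_USERS", "").strip()
    if not raw:
        # Empty allowlist: deny unless the operator explicitly opted in.
        return env.get("GATEWAY_ALLOW_ALL_USERS", "").strip().lower() == "true"
    allowed = {part.strip() for part in raw.split(",") if part.strip()}
    return str(user_id) in allowed
```

Passing `env` explicitly keeps the predicate testable without mutating `os.environ`.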
Fixes #24457

* test(telegram): stub _is_callback_user_authorized in trigger-gating fixture

After PR #24468 made the empty-allowlist callback auth fail-closed (and #23795 wired _is_callback_user_authorized into _should_process_message), trigger-gating tests started failing because their fake messages from user 111 hit the new deny-by-default path before trigger evaluation. Force-authorize all senders in _make_adapter() so the trigger logic under test runs. The fail-closed behavior itself is covered by test_telegram_callback_auth_fail_closed.py.

* fix(telegram): reset sticky fallback IP on connect failure, retry primary DNS

When a sticky fallback IP (from DoH discovery) becomes unreachable, the transport previously got stuck in an attempt_order that only tried the dead IP. This prevented the gateway from recovering until the service was restarted.

Changes:
- Always include primary DNS path (None) after the sticky IP in the attempt_order so that a primary-path retry happens on sticky failure.
- Reset self._sticky_ip to None when the currently sticky IP hits a connect timeout / connect error, allowing the next request to retry from scratch.

Fixes silent Telegram disconnection when discovered fallback IPs are transiently or permanently unreachable.

* test+release: align stale sticky-IP test for #24511; map @falconexe

* fix(telegram): propagate extra base_url config

* feat(send_message): auto-detect @username mentions and create Telegram entities

When sending messages containing @username patterns, auto-generate MessageEntity(type='mention') entries so that the receiving bot's require_mention filter can trigger. This enables proper bot-to-bot interop where mention-based routing is used.
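The sticky-IP attempt order from the DoH fallback fix can be sketched as follows. `None` stands for the primary DNS path, as in the description. The `FallbackTransport` class, `on_success`/`on_connect_failure` hooks, and the choice to append the remaining fallback IPs after the primary path are assumptions made for this sketch.

```python
from typing import List, Optional


class FallbackTransport:
    """Hypothetical sketch of the sticky-IP attempt order.

    None means "use normal (primary) DNS resolution"; other entries are
    fallback IPs discovered via DoH.
    """

    def __init__(self, fallback_ips: List[str]) -> None:
        self.fallback_ips = fallback_ips
        self._sticky_ip: Optional[str] = None

    def attempt_order(self) -> List[Optional[str]]:
        if self._sticky_ip is not None:
            # Sticky IP first, but always keep the primary path as a retry.
            rest = [ip for ip in self.fallback_ips if ip != self._sticky_ip]
            return [self._sticky_ip, None, *rest]
        return [None, *self.fallback_ips]

    def on_success(self, ip: Optional[str]) -> None:
        # A fallback IP that worked becomes sticky for later requests.
        if ip is not None:
            self._sticky_ip = ip

    def on_connect_failure(self, ip: Optional[str]) -> None:
        # A dead sticky IP is forgotten so the next request starts fresh.
        if ip is not None and ip == self._sticky_ip:
            self._sticky_ip = None
```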
* test+release: align send_message mocks for MessageEntity import; map @fonhal

* fix(telegram): resume typing indicator after inline approval click (#27853)

The text /approve and /deny paths in gateway/run.py call resume_typing_for_chat() after resolve_gateway_approval() succeeds, but the Telegram inline-button (ea:*) callback in _handle_callback_query did not. Typing is paused when the approval is sent (gateway/run.py:15658), so without a matching resume the typing indicator stayed gone for the remainder of a long-running turn after a button click.

Match the text path: after a successful resolve, call self.resume_typing_for_chat(str(query_chat_id)). Guarded by count > 0 to match /approve's "if not count" early-return: if nothing was actually resolved, the agent thread was never unblocked, so typing should remain paused.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gateway): mark final voice reply as notify-worthy so Telegram delivers it audibly

In Telegram "important" notifications mode (default), TelegramPlatformAdapter sets ``disable_notification=True`` on every send unless metadata carries ``notify=True``. GatewayRunner._send_voice_reply already passes thread metadata through to ``adapter.send_voice``, but never marks the final auto-TTS voice reply as notify-worthy, so users with the default mode get the final voice note delivered silently with no push notification.

Mirror the final-text path in gateway/platforms/base.py (the existing text-response final send already adds ``metadata["notify"] = True``).

Issue #27970 Bug 2. Bug 1 (MP3 vs. native OGG voice-note) is being addressed by existing PRs #20182 / #20878; this PR is intentionally scoped to the silent-delivery bug only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: avoid Telegram group reply thread session splits

* chore(release): map @eliteworkstation94-ai for PR #28157 salvage

* fix(gateway): avoid duplicate Telegram text after auto-TTS voice replies

* chore(release): map @Zyrixtrex for PR #26754 salvage

* fix(telegram): escape send_slash_confirm preview with format_message

send_slash_confirm() sent the raw command preview with ParseMode.MARKDOWN, skipping the format_message() conversion applied to every other dynamic send in the adapter. Commands with underscores, dots, brackets, or other MarkdownV2-sensitive characters raised BadRequest: Can't parse entities; the exception was swallowed by the outer try/except, so the confirmation prompt silently never appeared.

Fix: pass the preview through format_message() and switch to MARKDOWN_V2, symmetric with send_update_prompt and the callback sends fixed in a69404052.

* chore(release): map @nftpoetrist for PR #25856 salvage

* fix(telegram): retry wrapped connect timeouts

* chore(release): map @samahn0601 for PR #27887 salvage

* fix(tts): keep native audio outside Telegram voice delivery

* chore(release): map @aqilaziz for PR #26406 salvage

* fix(gateway): pin Telegram DM-topic routing to user's current topic

Topic-mode DM replies were fragmenting one conversation across many sessions: a Reply on a message in another topic delivered Telegram's message_thread_id for *that* topic, and #3206's strip routed plain replies to the lobby. Both pulled the user away from their current session.

Fix: when topic mode is on, rewrite source.thread_id to the user's most-recent binding if the inbound id is missing/General or not a known topic. Non-topic-mode users are unaffected.
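The `send_slash_confirm` failure comes from MarkdownV2's reserved characters. The helper below is a simpler stand-in for `format_message()`, not its implementation: it only escapes plain text so it renders literally, whereas `format_message()` also converts Markdown formatting. The character set follows the Telegram Bot API "MarkdownV2 style" rules for text outside code entities, plus the backslash itself; `escape_markdown_v2` is a hypothetical name.

```python
import re

# Characters MarkdownV2 treats as special outside code entities, plus "\".
_MDV2_SPECIAL = re.compile(r"([_*\[\]()~`>#+\-=|{}.!\\])")


def escape_markdown_v2(text: str) -> str:
    """Escape plain text so it renders literally under ParseMode.MARKDOWN_V2."""
    # Prefix every special character with a backslash.
    return _MDV2_SPECIAL.sub(r"\\\1", text)
```

For example, a preview such as `/cron_list v1.2 [x]` becomes `/cron\_list v1\.2 \[x\]`, which Telegram accepts instead of raising "Can't parse entities".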
* chore(release): map @karthikeyann for PR #26609 salvage

* fix(send_message): add thread-not-found retry for Telegram forum topic sends

The standalone _send_telegram path in send_message_tool lacked the thread-not-found fallback that the gateway adapter has. When a forum topic thread_id was stale or deleted, the send would fail entirely instead of retrying to the General topic.

Changes:
- Add _is_telegram_thread_not_found() helper matching gateway adapter
- Add thread-not-found retry in text send path
- Add thread-not-found retry in media send path (with f.seek(0))
- Separate text_kwargs from thread_kwargs to prevent disable_web_page_preview leaking into send_photo/send_video calls

Closes #27012

* test(send_message): add thread-not-found retry tests for Telegram forum topics

Adds two tests to TestSendTelegramThreadIdMapping:
- test_thread_not_found_retries_without_message_thread_id
- test_thread_not_found_for_media_retries_without_message_thread_id

Refs #27012

* test(send_message): add thread-not-found retry tests for Telegram topics

Three tests covering the #27012 fix:
- test_is_thread_not_found_matches_expected_errors
- test_text_send_retries_without_thread_id_on_thread_not_found
- test_disable_web_page_preview_not_leaked_to_media_sends

116/116 existing tests still pass (no regressions).

* chore(release): map @kunci115 for PR #27098 salvage

* fix(gateway): register Telegram commands for groups

Register Telegram bot commands across default, private, and group scopes so the slash-command menu is available outside DMs.
Changes from review feedback:
- Add asyncio.Lock to prevent race condition in _ensure_forum_commands
- Extract MAX_COMMANDS_PER_SCOPE constant (30) to avoid magic number
- Upgrade error logging from debug->warning in forum registration
- Add tests covering lazy forum registration and concurrent safety
- Remove /start handler from this PR (separate feature)

Fixes review: needs_work (race, magic number, log levels, missing tests)

* test+release: fix test fixture for forum_commands; map @chromalinx

* fix(telegram): gate profile bots by allowed topics

* chore(release): map @booker1207 for PR #25132 salvage

* fix(cron): route Telegram cron deliveries to a dedicated topic via TELEGRAM_CRON_THREAD_ID

When Telegram topic mode is enabled, cron messages delivered to the bot's root DM (TELEGRAM_HOME_CHANNEL without a thread id) land in the system lobby; replies there are rebuffed with the lobby reminder and reply_to_message_id is dropped, so users cannot interact with the cron output (#24409).

Add an optional TELEGRAM_CRON_THREAD_ID env var that overrides TELEGRAM_HOME_CHANNEL_THREAD_ID for cron deliveries only. Operators can create a "Cron" forum topic in the DM, point this var at its thread id, and replies to cron messages will land in that topic's existing session instead of the lobby. The home-channel thread id (used elsewhere, e.g. restart notifications) is unchanged, and explicit deliver="telegram:chat:thread" targets continue to win over the env var.

Per the reporter's clarification on 2026-05-13, option (a) (cron-side route to a dedicated topic + config knob) was chosen.

Fixes #24409

* fix(telegram): route image documents (.png/.jpg/.webp/.gif) through vision pipeline

When users send images as documents (Telegram file picker), they were rejected with "Unsupported document type" because SUPPORTED_DOCUMENT_TYPES only includes text/office formats. Add SUPPORTED_IMAGE_DOCUMENT_TYPES to base.py and handle them in telegram.py before the document check.
- Add SUPPORTED_IMAGE_DOCUMENT_TYPES constant to base.py
- Add MIME reverse-lookup for image types in telegram.py
- Route image documents through cache_image_from_bytes + vision pipeline
- Handle media groups for image documents

Closes: #20128, #18620

* test+release: stub auth in test_telegram_documents fixture; map @kiranvk-2011

* fix(gateway): prevent Windows Telegram /restart leaving gateway stopped

* chore(release): map @rak135 for PR #25960 salvage

* fix(telegram): preserve topic metadata on overflow edits

* feat(telegram): add disable_topic_auto_rename gateway flag

When Hermes auto-titles a session in a Telegram DM topic, it currently renames the topic itself to the generated title. That works for operator-managed lanes (extra.dm_topics) but is disruptive for ad-hoc Threaded-Mode topics that users name by hand: every first exchange overwrites their chosen title.

Add gateway.platforms.telegram.extra.disable_topic_auto_rename (default False, preserving prior behaviour). When set, both _schedule_telegram_topic_title_rename and the underlying _rename_telegram_topic_for_session_title short-circuit before touching the Telegram API. Internal session titles (sessions list, TUI) keep working unchanged.

Also bridge the legacy top-level telegram.disable_topic_auto_rename key through to gateway.platforms.telegram.extra so users on the older config layout don't have to migrate to enable it.

- Tests cover the runtime flag, the scheduling entry-point, and string truthiness coercion for YAML-loaded values.
- Docs updated in messaging/telegram.md with an example block.

* chore(release): map @B0Tch1 for PR #27634 salvage

* fix(gateway): restore Telegram DM topic thread_id after session split (#27166)

When context compression triggers a mid-turn session split, source.thread_id can be None on synthetic/recovered events.
_thread_metadata_for_source then returns None, causing the Telegram adapter to send with no message_thread_id, so the response lands in the General thread instead of the active DM topic.

Fix:
- hermes_state.py: Add get_telegram_topic_binding_by_session() for reverse lookup by session_id (enabled by the existing UNIQUE INDEX on session_id).
- gateway/run.py: After session-split detection, if source is a Telegram DM and source.thread_id is None, recover it from the binding via the new method so _thread_metadata_for_source produces the correct thread routing.
- tests/: Coverage for the new lookup method and the recovery flow.

* chore(release): map @jackjin1997 for PR #27239 salvage

* fix(gateway): allow chat-scoped telegram auth without sender user_id

* chore(release): map @soynchux for PR #27806 salvage

* fix(telegram): add DM topic typing fallback when message_thread_id rejected

When a DM topic lane's message_thread_id is rejected by Telegram (e.g. stale or deleted topic), send_typing now falls back to sending the typing indicator without thread_id so it at least appears in the main DM view, rather than being silently swallowed. Also adds a test for the fallback behavior.

* fix(telegram): report cron topic fallback

* chore(release): map @el-analista for PR #25368 salvage

* fix(telegram): wire gt: callback dispatch for gmail-triage buttons

The gmail-triage skill's Telegram inline buttons emit callback_data of the form `gt:<verb>:<arg>`, but `_handle_callback_query` had no `gt:` branch: taps fell through silently and the spinner sat there until Telegram timed it out.

Add `_handle_gmail_triage_callback`, dispatched from the existing callback router, that:
- Authorizes the caller via the same `_is_callback_user_authorized` path as the approval / slash-confirm / clarify handlers.
- Maps each verb to a script under `~/.hermes/scripts/gmail-triage/` and runs it async with a 60s timeout.
- Splits verbs into one-shots (send / archive / draft / spam), which append the confirmation and strip the keyboard so the action can't fire twice, and sticky-state changes (mute / trust / vip ± -domain), which append the confirmation but leave the keyboard tappable so the user can stack actions on one email.
- On failure: toast only, keyboard preserved so the user can retry.
- Logs every callback outcome to gateway.log for debugging.

* chore(release): map @khungate for PR #25829 salvage

* feat(telegram): support quick-command-only menus

* chore(release): map @stevehq26-bot for PR #28015 salvage

* fix(telegram): handle channel post updates

* test: address telegram channel post review

* test+release: stub auth in channel_posts fixture; map @brndnsvr

* Quiet noisy Telegram gateway errors

* chore(release): map oracle@jarviss-mbp.home for PR #24014 salvage

* Route Telegram multi-bot mentions exclusively

* Document Telegram multi-profile gateway commands

* fix: ignore Telegram messages for other bots

* chore(release): map @OCWC22 for PR #24581 salvage

* feat(telegram): ignore_root_dm with system command lobby

* docs(telegram): document ignore_root_dm feature

* chore(release): map @ai-hana-ai for PR #23928 salvage

* feat(telegram): pin incoming user message for duration of agent turn

When a user sends a message on Telegram, the incoming message is now automatically pinned at the start of processing and unpinned when the agent finishes its turn. This gives the user a visual indicator that their message is being worked on, and keeps the conversation anchored.

Changes:
- telegram.py: Added pinChatMessage in on_processing_start and unpinChatMessage in on_processing_complete. Restructured both hooks so pin/unpin runs independently of the reactions feature (reactions are optional; pinning is always on).
- telegram.py: Pass message_id through SessionSource so it's available in the session context.
- session_context.py: Added HERMES_SESSION_MESSAGE_ID context var.
- run.py: Pass source.message_id through set_session_vars.

Pinning is silent (disable_notification=True) and failures are logged at debug level without interrupting message processing. Only the user's incoming message is pinned, never the agent's replies. Auto-resume events (which have no message_id) are correctly skipped.

* chore(release): map @indigokarasu for PR #26636 salvage

* feat(telegram): skip-STT audio path + 2GB cap via local Bot API server

Two coordinated changes that unblock downstream audio pipelines (diarization, custom transcription, archival) on attachments larger than the public Bot API's 20MB getFile ceiling.

- `stt.enabled: false` no longer drops voice/audio with a generic "transcription disabled" note. The gateway probes the cached file's duration (wave → mutagen → ffprobe ladder) and surfaces `[The user sent a voice message: <abs path> (duration: M:SS)]` to the agent so a skill or tool can pick up the raw file. The previous placeholder is replaced rather than appended when present.
- `platforms.telegram.extra.base_url` set → adapter auto-lifts its document size cap from 20MB to 2GB (the local telegram-bot-api `--local` ceiling) and the "too large" reply reports the active limit dynamically. No new config knob; presence of `base_url` is the opt-in.
- `platforms.telegram.extra.local_mode: true` wires `Application.builder().local_mode(True)` on the python-telegram-bot builder. PTB then reads files from disk instead of HTTP, which is required when telegram-bot-api runs in `--local` mode (the server returns absolute filesystem paths, not `/file/bot...` URLs).
- gateway/run.py: rewrites the `stt.enabled: false` branch of `_enrich_message_with_transcription`. New `_format_duration` + `_probe_audio_duration` helpers.
- gateway/platforms/telegram.py: `_max_doc_bytes` instance attribute derived from `extra.base_url`; `local_mode` builder wiring; dynamic "too large" message.
- tests/gateway/test_stt_config.py: covers path-surfacing with and without an existing user message, and placeholder replacement.
- tests/gateway/test_telegram_max_doc_bytes.py: 3 cases: default 20MB without base_url, 2GB when set, empty-string base_url keeps default.
- website/docs/user-guide/messaging/telegram.md: new "Skipping STT" subsection under Voice Messages and a full "Large Files (>20MB) via Local Bot API Server" walkthrough (api_id/api_hash, docker-compose, one-time `logOut` migration, `platforms.telegram.extra` config, the `local_mode` disk-access requirement, the silent HTTP-fallback 404).
- website/docs/user-guide/features/voice-mode.md: documents the `stt.enabled` knob in the config reference.
- `pytest tests/gateway/test_telegram_max_doc_bytes.py tests/gateway/test_stt_config.py` → 9/9 passing.
- Verified end-to-end on a live deployment: gateway log shows `Using custom Telegram base_url: http://...` and `Using Telegram local_mode (read files from disk)` on startup; voice messages above 20MB cache to disk and surface their path to the agent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(release): map @alber70g for PR #25280 salvage

* fix(web): add scheduled column to i18n type definitions (#28549)

columnLabels and columnHelp in en.ts include a scheduled entry, but the Translations interface in types.ts did not declare it, causing a TypeScript build failure in the Nix derivation. Made the field optional since only en.ts provides it currently.

* docs: comprehensive 2-week sweep of feature/PR coverage gaps (#28497)

Catch the website docs up to two weeks of merged work (May 4 – May 18, 2026, roughly 1,080 PRs). The audit found ~50 user-visible features that had landed in code with no docs footprint, plus a handful of stale pages. This PR closes every gap the scan turned up.

New pages
- user-guide/features/deliverable-mode.md: extension list, agent triggers, kanban_complete artifacts pattern, [[as_document]] override (PR #27813).
- developer-guide/web-search-provider-plugin.md: authoring guide modeled on image-gen-provider-plugin, covering brave_free / ddgs / etc. (PR #25448).

Providers / auth
- Rename "Alibaba Cloud" → "Qwen Cloud (Alibaba DashScope)" everywhere the display label shows up; provider id stays `alibaba` (PR #24835).
- Document OAuth refresh-token quarantine for xAI / MiniMax / Codex (PRs #28116 / #28118 / #28119).
- Document Nous JWT minting from refresh token + invalid-refresh quarantine + cross-profile shared token store (PRs #27663 / #19712).
- Add `## Microsoft Entra ID authentication (keyless)` section to azure-foundry guide: DefaultAzureCredential, RBAC, OpenAI + Anthropic routing details (PR #28101 / #9df9816da).
- Custom providers' `api_mode` is now prompted-and-persisted, not just URL-autodetected (PR #25068).
- Delegation honours `api_mode` + auto-detects anthropic_messages base URLs (PR #26824).
- `x_search` auto-enables when xAI credentials are present (PR #27376).
- Add `xAI Grok OAuth (SuperGrok)` row to providers headline table (PR #26534).
- NVIDIA NIM billing-origin header is set automatically (PR #26585).

Windows / installer
- `install.ps1`: document `-Commit <sha>` and `-Tag <v>` pin params plus the BOM-strip / git-retry hardening (PR #28169).
- Document Hermes Desktop thin installer + first-launch bootstrap (PR #27822).
- Document `dep_ensure` Windows bootstrap (PR #27845).
- Document install-method auto-detection (pip / git / homebrew / nixos) and the matching update command (PR #27843).

Gateway / messaging
- `/platform list|pause|resume` full description + circuit-breaker semantics (PR #26600).
- Slack / Matrix / Mattermost get parallel `allowed_channels` / `allowed_rooms` allowlist sections matching Telegram/Discord/DingTalk (PR #21251).
- Discord `allow_any_attachment` + `max_attachment_bytes` (config and env vars) (PR #27245).
- Discord clarify-choice button rendering (PR #25485).
- Telegram `guest_mode` @mention bypass for allowlisted groups (PR #22759).
- Telegram `notifications` mode (`important` vs `all`) (PR #22793).
- `[[as_document]]` skill / response directive for forcing document-style media delivery (PR #21210).

CLI / TUI
- `/new [name]` argument (PR #19637).
- `/subgoal` user-supplied criteria appended to `/goal` (PR #25449).
- `/exit --delete` flag confirmation prompts for destructive slash commands (PR #22687).
- Status-bar additions: ▶ N background indicator (PR #27175), context compression count (PR #21218), YOLO mode banner+statusbar warning (PR #26238).
- `display.timestamps` + `docker_extra_args` config keys (PR #23599).
- TUI collapsible startup banner sections (PR #20625).
- `HERMES_SESSION_ID` exported to tool subprocesses (PR #23847).

i18n
- Refresh display.language locale list from 8 → 16 (en, zh, zh-hant, ja, de, es, fr, tr, uk, af, ko, it, ga, pt, ru, hu), matching `agent/i18n.py:SUPPORTED_LANGUAGES`.

Tools / features
- `vision_analyze` native-pixel passthrough for vision-capable callers, with auxiliary text-describer fallback (PR #22955).
- `session_search` rewrite to the single-shape tool (discovery / scroll / browse modes) (PRs #27590 / #27840).
- Clarify MCP transport scope: client supports stdio + SSE; embedded `hermes mcp serve` is stdio-only (PR #21227).
- Web search backends table: add Brave Search (free tier) and DDGS rows (PR #21337).
- ACP session-scoped edit auto-approval modes (PR #27862).
- Curator rename map in the user-visible per-run summary (PR #22910).
- Prompt caching feature page reference in features/overview.md: Claude cross-session 1-hour prefix cache on native Anthropic / OpenRouter / Nous Portal (PR #23828).
- Cron per-job profile parameter (PR #28124).
- `--no-skills` flag for `hermes profile create` (PR #20986).

Build
- Verified with `npm run build` in `website/`; both `en` and `zh-Hans` locales compile.
Remaining broken-link/anchor warnings are pre-existing (`rl-training.md` from learning-path / overview; the zh-Hans translation lag the docs skill already calls out).

* chore(release): pre-stage AUTHOR_MAP for May 2026 LHF batch group 9 (#28571)

Pre-stages AUTHOR_MAP entries for 9 new/under-mapped contributors whose PRs are being salvaged in the May 2026 LHF batch group 9.

Contributors:
- jdelmerico (#28278: signal require_mention filter)
- justemu (#27996: matrix thread_require_mention)
- YuanHanzhong (#28029: dashboard browser scrollback)
- noctilust (#28080: drop stale TUI resume env)
- MoonJuhan (#28288: tolerate unreadable JSONL transcripts)
- outsourc-e (#28164: cron emoji ZWJ sequences)
- Zyrixtrex (#28275: Google OAuth urlopen timeout)
- ooovenenoso (#28256: tool loop recovery hints)
- vanthinh6886 (#28018: yaml/flock/atomic write guards; non-noreply email)

Per references/batch-pr-salvage-may14-additions.md.

* feat(signal): add require_mention filter for group chats

Add a configurable mention filter to the Signal adapter so the bot only responds in groups when it is explicitly @mentioned.

Changes:
- gateway/platforms/signal.py: read require_mention from adapter extra config or SIGNAL_REQUIRE_MENTION env var; skip group messages that don't mention the bot account (checked in rendered text and raw mention metadata)
- gateway/config.py: map signal.require_mention YAML key to the SIGNAL_REQUIRE_MENTION env var (env var takes precedence)

Config example:

    signal:
      require_mention: true

Or via env var: SIGNAL_REQUIRE_MENTION=true

* Revert "feat(telegram): pin incoming user message for duration of agent turn"

This reverts commit a724c3b9cf5f01e28365322ae5ae3a9579567806.

* Revert "feat(telegram): support quick-command-only menus"

This reverts commit b1acf80e17858e2e5ae7c0d412a3a573d7fcbca4.

* Revert "feat(send_message): auto-detect @username mentions and create Telegram entities"

This reverts commit cf814c96f613b38bd891ac941c32da653e81c7ad.
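The Signal `require_mention` behaviour described in the signal commit can be sketched as two helpers: one resolving the flag with the env var taking precedence over the YAML value, and one gating group messages. Both function names, the accepted truthy spellings, and the idea of matching the bot's account number against text and a mention list are assumptions for illustration; the real adapter in `gateway/platforms/signal.py` may differ.

```python
import os
from typing import Mapping, Optional, Sequence

# Assumed truthy spellings; the real config parser may accept others.
_TRUTHY = {"1", "true", "yes", "on"}


def signal_require_mention(extra: Mapping[str, object],
                           env: Optional[Mapping[str, str]] = None) -> bool:
    """Resolve require_mention; SIGNAL_REQUIRE_MENTION wins over the YAML key."""
    env = os.environ if env is None else env
    raw = env.get("SIGNAL_REQUIRE_MENTION")
    if raw is None:
        raw = extra.get("require_mention", False)
    return str(raw).strip().lower() in _TRUTHY


def should_handle_group_message(text: str, mentions: Sequence[str],
                                bot_account: str, require_mention: bool) -> bool:
    """Skip group messages that don't mention the bot (hypothetical shapes)."""
    if not require_mention:
        return True
    # Check both the rendered text and the raw mention metadata.
    return bot_account in text or bot_account in mentions
```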
* Revert "fix(telegram): enforce TELEGRAM_ALLOWED_USERS allowlist on inbound messages"
  This reverts commit db50af910be6b4171ea9cf54f4cc38be27ac1da6.
* fix(gateway): pre-mark sessions as resume_pending before drain to prevent data loss (#27856)
  Pre-mark all running agent sessions as resume_pending BEFORE the drain wait begins. If the service manager kills the process during the drain window, the durable marker is already written, so the next gateway boot can recover in-flight sessions. On graceful drain completion, clear the early markers for sessions that finished successfully.
* fix(matrix): implement thread_require_mention to prevent multi-agent reply loops
  In multi-agent shared Matrix rooms, multiple bots all participating in the same thread could trigger infinite reply loops: each bot's reply re-engaged the others because they were all in the bot-thread set. Discord has a `thread_require_mention` opt-in for this; Matrix didn't. Add `_parse_thread_require_mention(config)` (mirrors Discord's pattern). In `_resolve_message_context`, when enabled and the message is in a bot-participated thread (not a free-response room), require @mention before processing. Salvage of @justemu's 2-commit stack (#27996). Fixes #27995.
* fix(cli): show active profile in TUI prompt
* fix(tui): preserve dunder identifiers in markdown
* test(file_ops): add regression tests for git baseline warning in write_file
  Adds TestGitBaselineCheck with 6 unit tests covering _check_git_baseline and the warning field in write_file result:
  - Git not available → None
  - Not in a git repo → None
  - Clean repo → None
  - Dirty repo → returns warning string with branch name
  - write_file result includes warning when dirty
  - write_file result omits warning when clean
* fix(dashboard): use browser scrollback for chat wheel
* fix(cli): ignore stale HERMES_TUI_RESUME env
  HERMES_TUI_RESUME is an internal env var the Python wrapper exports to hand a session ID off to the Ink TUI.
  Because _launch_tui started from os.environ.copy(), any exported/stale value in the user's shell leaked through, so plain `hermes --tui` would try to resume a missing session and leave the UI at 'error: session not found' with no live session. Drop HERMES_TUI_RESUME from the env before conditionally re-setting it from the argparse-resolved resume_session_id. Tests cover both the drop path and the set-from-arg path. Salvage of #28080 by @noctilust.
* fix(cron): allow emoji ZWJ sequences in prompts
* fix: tolerate unreadable gateway JSONL transcripts
* fix(skills): add timeout to Google OAuth urlopen calls
* fix: add recovery hints to loop guard warnings
* fix: guard yaml.safe_load, flock unlock, TOCTOU races, and atomic writes
  1. trajectory_compressor.py: yaml.safe_load() returns None on empty files, crashing with TypeError on `if 'tokenizer' in data`. Fix by adding an `or {}` fallback. (HIGH: blocks startup with empty config)
  2. 6 files with fcntl.flock(LOCK_UN) in finally blocks without try/except: cron/scheduler.py, hermes_cli/auth.py, agent/shell_hooks.py, tools/skill_usage.py, tools/environments/file_sync.py, tools/memory_tool.py. If unlock raises OSError, fd.close() is skipped and the lock is held forever. The msvcrt branches already had try/except; the fcntl branches did not. Fix by wrapping in try/except (OSError, IOError): pass.
  3. agent/copilot_acp_client.py line 639: TOCTOU race: path.exists() followed by path.read_text() with no try/except. If the file is deleted between the check and the read, FileNotFoundError propagates. Fix by using try/except FileNotFoundError.
  4. gateway/sticker_cache.py: non-atomic write via Path.write_text() can leave truncated JSON on crash, causing JSONDecodeError on next load. Fix by writing to tempfile + fsync + os.replace (atomic).
* chore(release): alias xxxigm noreply for upcoming #27986 salvage (#28594)
  Adds the canonical noreply form (54813621+xxxigm@users.noreply.github.com) alongside the existing plain-email mapping so the salvage commit for @xxxigm's codex doctor PR doesn't fail AUTHOR_MAP CI.
* fix(doctor): attach codex CLI hint to OpenAI Codex auth warning for #27975
  `hermes doctor` printed 'codex CLI not installed (optional — ...)' as a generic info line at the bottom of the auth section, several rows below 'OpenAI Codex auth (not logged in)' and after MiniMax/Gemini auth checks. Users reading sequentially mistook it for MiniMax-related advice. Move the hint up under the Codex auth warning so it's adjacent to the row it actually pertains to. Behavior unchanged when the codex CLI is installed (success path keeps its 'codex CLI ✓' row at the bottom). Tests cover both placement and suppression cases. Salvage of @xxxigm's 3-commit stack (#27986). Closes #27975.
* fix(tests): catch up 25 stale tests after recent merges (#28626)
  Sweep of all CI failures on origin/main, grouped by drift source:

  Telegram allowlist gate (db50af910 added user-authz to _should_process_message):
  - Hardcoded "[Telegram]" prefix in the logger.warning so the call no longer dereferences self.name → self.platform, which test fixtures built via object.__new__ never set.
  - test_telegram_format / test_allowed_channels_widening fixtures stub _is_callback_user_authorized → True so the new gate doesn't reject guest-mode / allowed-channels test messages.
  - test_telegram_approval_buttons::test_update_prompt_callback_not_affected sets TELEGRAM_ALLOWED_USERS="*" so the fail-closed default doesn't reject the callback before it writes .update_response.

  Approval surface (6d495d9e7 renamed status, 214b95392 detached stdin):
  - test_no_callback_returns_approval_required: status is now "pending_approval" (was "approval_required").
  - test_close_stdin_allows_eof_driven_process_to_finish: switch to use_pty=True; non-PTY now uses stdin=DEVNULL.

  Mattermost (send() now resolves root_id via _api_get first):
  - test_send_with_thread_reply mocks _session.get with a thread-root response so the new resolver doesn't TypeError on a bare AsyncMock.

  Kanban (d8ad431de rename, f55d94a1e review column, _kanban_worker_skill_available):
  - _safe_int → _to_epoch in the two test_kanban_db tests.
  - Spawn-skills tests (×3) monkey-patch _kanban_worker_skill_available to True since the isolated kanban_home fixture has no devops/kanban-worker tree.
  - test_gateway_dispatcher_disables_corrupt_board: connect count 3 → 5 (review-column probe now also runs per tick).

  Aux-config severity at_or_above (a94ddd807):
  - test_diagnostics_endpoint_severity_filter expects warning filter to include error+critical now (was exact-match).

  Anthropic error handling (conversation loop extracted from run_agent):
  - _no_backoff_wait fixture patches BOTH run_agent.jittered_backoff AND agent.conversation_loop.jittered_backoff. The latter is the actual call site; without the second patch tests burn ~2s per retry and hit the 30s SIGALRM timeout on CI.

  Other test pollution / drift:
  - test_auto_does_not_select_copilot_from_github_token: patch agent.bedrock_adapter.has_aws_credentials → False so boto3's credential chain can't auto-pick Bedrock from developer ~/.aws.
  - test_setup_openclaw_migration: patch hermes_cli.gateway.get_env_value in addition to setup_mod.get_env_value; _platform_status reads through the gateway module's binding.
  - test_gateway_prefix: COMPONENT_PREFIXES["gateway"] now includes "hermes_plugins" too.
  - test_recommended_update_command_defaults_to_hermes_update: also short-circuit get_managed_update_command in case a stray ~/.hermes/.managed marker is present.
  - test_user_id_is_not_explicit: _parse_target_ref now returns is_explicit=False for Slack U.../W... IDs (chat.postMessage rejects them; a DM must be opened first via conversations.open).
* feat(update): syntax-validate critical files post-pull, auto-rollback on failure (#28669)
  Catch the PR #28452 failure mode (orphan merge-conflict markers in hermes_cli/config.py) on the user side: after git pull succeeds, compile the files every 'hermes' invocation imports at startup. If any has a syntax error, git reset --hard back to the pre-pull SHA so the install stays bootable. User can retry once a fix lands upstream.
  - New _capture_head_sha() + _validate_critical_files_syntax() helpers
  - Wires both into _cmd_update_impl after the pull/reset succeeds
  - Tests cover the helpers, the rollback flow, and a production-tree invariant (CI fails if main itself has a syntax error in a critical file, catching future broken commits before users hit them)
* feat: show names of user-modified skills in bundled skill sync summary
  When 'hermes update' syncs bundled skills, the summary line only shows the count of user-modified skills that were kept (e.g. '3 user-modified (kept)'), but not *which* skills. Once the update finishes, the user has no way to know which skills need triage. Append the skill names to the summary line, truncated to 5 with a '+N more' suffix for long lists:

      Done: 12 new, 3 updated, 7 unchanged, 3 user-modified (kept): hermes-agent, debugging-hermes-tui-commands, system-health. 25 total bundled.

  Closes #28121
* fix(acp): use tempfile.gettempdir() in workspace auto-approve
  #28063 fixed the macOS `/tmp`→`/private/tmp` symlink issue by checking the RAW path (pre-resolve) against startswith('/tmp/'). That works on Linux + macOS but not on Windows: Path('/tmp/foo').resolve() returns C:\\tmp\\foo and isn't the real Windows temp anyway. Replace the hardcoded '/tmp/' prefix with Path(tempfile.gettempdir()).resolve() + Path.relative_to(), the same idiom as the cwd branch just below.
  Works correctly on Linux (/tmp), macOS (/private/var/folders/...), and Windows (%LOCALAPPDATA%\\Temp). Test rewritten to use tempfile.gettempdir() so the assertion exercises the same code path on every platform. Conflict against the just-merged #28063 (raw_path approach) resolved by replacing the whole raw_path block; tempfile.gettempdir() is strictly better than that intermediate fix. Salvage of #28262 by @Zyrixtrex.
* fix(kanban): stale reclaim must not tick failure counter (#28680)
  Follow-up to #28452. detect_stale_running() was calling _record_task_failure() on every reclaim, which ticked the consecutive_failures counter. With the default failure_limit=2, two legitimately long-running tasks (>4 h without explicit heartbeat) would auto-block via the spawn-failure circuit breaker, even though no worker actually failed. Stale reclaim is dispatcher-side absence-of-heartbeat detection, not a worker fault. Removed the _record_task_failure() call; the 'stale' event in task_events is still the audit surface, but the failure counter is now reserved for spawn_failed / timed_out / crashed (real failures).
  Also documents the heartbeat requirement:
  - KANBAN_GUIDANCE in agent/prompt_builder.py now states the rule ('call kanban_heartbeat at least once an hour for tasks running longer than 1 hour') so workers learn the contract.
  - kanban.md adds the stale event row to the events table and flags the heartbeat requirement in the worker lifecycle list.

  New regression test: test_detect_stale_does_not_tick_failure_counter locks in the new behaviour.
* fix(telegram): address post-merge audit follow-ups (#28670, #28672, #28674, #28676, #28678)
  Five small fixes against issues filed during the post-merge salvage audit:
  * #28670: `_GATEWAY_PROVIDER_ERROR_RE` false-positives on legitimate prose.
    Replace the regex with an anchored `_GATEWAY_PROVIDER_ERROR_SHAPE_RE` and add a length-cap heuristic to `_looks_like_gateway_provider_error`: short envelope at the start of the message → real provider error; long prose containing 'HTTP 404' → assistant answer, leave alone.
  * #28672: drop the pointless 1s asyncio.sleep on Telegram thread-not-found retries. The same-thread retry is preserved (catches Telegram's occasional transient flake exercised by test_send_retries_transient_thread_not_found_before_fallback) but with no artificial delay.
  * #28674: broaden `_should_retry_without_dm_topic_reply_anchor` to also fire when Bot API rejects `direct_messages_topic_id` for synthetic / resumed sends that have no reply anchor. Avoids dropping post-resume background notifications if the topic id goes stale.
  * #28676: delete the dead image-document branch superseded by bd0c54d17 (which returns early on the same extension set).
  * #28678: extend chat-scoped allowlist (`TELEGRAM_GROUP_ALLOWED_CHATS`) to also cover `chat_type == 'channel'`, so operators can authorize channel posts by chat id without falling back to per-user allowlists.

  Tests:
  - scripts/run_tests.sh tests/gateway/test_telegram_thread_fallback.py -q → 41/41
  - scripts/run_tests.sh tests/cron/test_scheduler.py -q → 127/127
  - broader test set: same 3 pre-existing test-pollution failures reproduce on plain main.
* chore(actions)(deps): bump the actions-minor-patch group across 1 directory with 2 updates
  Bumps the actions-minor-patch group with 2 updates in the / directory: [google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml](https://github.com/google/osv-scanner-action) and [sigstore/gh-action-sigstore-python](https://github.com/sigstore/gh-action-sigstore-python).
  Updates `google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml` from 2.3.5 to 2.3.8
  - [Release notes](https://github.com/google/osv-scanner-action/releases)
  - [Commits](https://github.com/google/osv-scanner-action/compare/c51854704019a247608d928f370c98740469d4b5...9a498708959aeaef5ef730655706c5a1df1edbc2)

  Updates `sigstore/gh-action-sigstore-python` from 3.0.0 to 3.3.0
  - [Release notes](https://github.com/sigstore/gh-action-sigstore-python/releases)
  - [Changelog](https://github.com/sigstore/gh-action-sigstore-python/blob/main/CHANGELOG.md)
  - [Commits](https://github.com/sigstore/gh-action-sigstore-python/compare/f514d46b907ebcd5bedc05145c03b69c1edd8b46...04cffa1d795717b140764e8b640de88853c92acc)

  --- updated-dependencies:
  - dependency-name: google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml
    dependency-version: 2.3.8
    dependency-type: direct:production
    update-type: version-update:semver-patch
    dependency-group: actions-minor-patch
  - dependency-name: sigstore/gh-action-sigstore-python
    dependency-version: 3.3.0
    dependency-type: direct:production
    update-type: version-update:semver-minor
    dependency-group: actions-minor-patch
  ...

  Signed-off-by: dependabot[bot] <support@github.com>
* chore(actions)(deps): bump docker/login-action from 3.7.0 to 4.1.0
  Bumps [docker/login-action](https://github.com/docker/login-action) from 3.7.0 to 4.1.0.
  - [Release notes](https://github.com/docker/login-action/releases)
  - [Commits](https://github.com/docker/login-action/compare/c94ce9fb468520275223c153574b00df6fe4bcc9...4907a6ddec9925e35a0a9e82d7399ccc52663121)

  --- updated-dependencies:
  - dependency-name: docker/login-action
    dependency-version: 4.1.0
    dependency-type: direct:production
    update-type: version-update:semver-major
  ...

  Signed-off-by: dependabot[bot] <support@github.com>
* chore(actions)(deps): bump docker/build-push-action from 6.19.2 to 7.1.0
  Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6.19.2 to 7.1.0.
  - [Release notes](https://github.com/docker/build-push-action/releases)
  - [Commits](https://github.com/docker/build-push-action/compare/10e90e3645eae34f1e60eeb005ba3a3d33f178e8...bcafcacb16a39f128d818304e6c9c0c18556b85f)

  --- updated-dependencies:
  - dependency-name: docker/build-push-action
    dependency-version: 7.1.0
    dependency-type: direct:production
    update-type: version-update:semver-major
  ...

  Signed-off-by: dependabot[bot] <support@github.com>
* chore(actions)(deps): bump actions/setup-python from 5.3.0 to 6.2.0
  Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.3.0 to 6.2.0.
  - [Release notes](https://github.com/actions/setup-python/releases)
  - [Commits](https://github.com/actions/setup-python/compare/v5.3.0...a309ff8b426b58ec0e2a45f0f869d46889d02405)

  --- updated-dependencies:
  - dependency-name: actions/setup-python
    dependency-version: 6.2.0
    dependency-type: direct:production
    update-type: version-update:semver-major
  ...

  Signed-off-by: dependabot[bot] <support@github.com>
* fix(kanban): respawn guard defers blocker_auth instead of auto-blocking (#28683)
  Follow-up to #28455. The respawn guard's blocker_auth rule (last error matched a quota/auth/429 pattern) was auto-blocking the task on first occurrence. That's too aggressive: transient rate limits typically clear in seconds to minutes, but the auto-block puts the task in 'blocked' status, which requires manual unblock. Now treats blocker_auth the same as recent_success and active_pr: defer the spawn this tick, leave the task in 'ready', let the next tick try again. If the auth error genuinely persists, the existing consecutive_failures counter trips the auto-block circuit breaker after failure_limit failures via the normal path, so a persistent 401/403/quota-exhausted still ends up blocked, just not on first hit. Also documents the respawn_guarded event in kanban.md's events table with the three guard reasons.
  Renamed test_dispatch_respawn_guard_auto_blocks_auth_error to test_dispatch_respawn_guard_defers_auth_error_without_auto_block; it now asserts that the task stays in 'ready' and the guard reason is recorded.
* chore(actions)(deps): bump actions/checkout from 4.3.1 to 6.0.2
  Bumps [actions/checkout](https://github.com/actions/checkout) from 4.3.1 to 6.0.2.
  - [Release notes](https://github.com/actions/checkout/releases)
  - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
  - [Commits](https://github.com/actions/checkout/compare/34e114876b0b11c390a56381ad16ebd13914f8d5...de0fac2e4500dabe0009e67214ff5f5447ce83dd)

  --- updated-dependencies:
  - dependency-name: actions/checkout
    dependency-version: 6.0.2
    dependency-type: direct:production
    update-type: version-update:semver-major
  ...

  Signed-off-by: dependabot[bot] <support@github.com>
* fix(dashboard): add scheduled kanban i18n strings (#28534)
  Co-authored-by: Austin Pickett <pickett.austin@gmail.com>
  Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(cli): exit prompt_toolkit cleanly on SIGTERM/SIGHUP instead of raising KeyboardInterrupt (#28688)
  The SIGTERM/SIGHUP handler raised KeyboardInterrupt() at the end of its agent-interrupt + grace-window sequence. Python delivers signals between bytecodes on the main thread, so when the signal hit mid-event-loop (typically inside prompt_toolkit's '_poll_output_size' coroutine's 'await asyncio.sleep()'), the KeyboardInterrupt unwound INTO that coroutine. prompt_toolkit's Task captured it as a BaseException; prompt_toolkit's '_handle_exception' then printed 'Unhandled exception in event loop' + the full asyncio traceback and parked the terminal on 'Press ENTER to continue...' before exiting.
  Same root cause as #13710, different surface: there the failure was an EIO cascade after a logging-cache KeyError escaped the handler; here it's the KBI raise itself landing inside an asyncio Task. The fix is the same shape: let the event loop unwind on its own terms.
  Now: schedule 'app.exit()' via 'loop.call_soon_threadsafe()'. The prompt_toolkit Application returns normally from 'app.run()' and the existing '(EOFError, KeyboardInterrupt, BrokenPipeError)' handler in the input loop catches everything else. Fallback to 'raise KeyboardInterrupt()' preserved for contexts where prompt_toolkit isn't the active app (e.g. -q one-shot mode). The agent interrupt + 1.5 s grace window run unchanged before the new exit path, so subprocess-group cleanup ('os.killpg' on Linux) still gets its window.
  Tested live: external SIGTERM to the CLI (with `kill <pid>`) now exits cleanly with no traceback dump and no ENTER pause.
* chore(deps): bump dompurify from 3.3.3 to 3.4.2 in /website
  Bumps [dompurify](https://github.com/cure53/DOMPurify) from 3.3.3 to 3.4.2.
  - [Release notes](https://github.com/cure53/DOMPurify/releases)
  - [Commits](https://github.com/cure53/DOMPurify/compare/3.3.3...3.4.2)

  --- updated-dependencies:
  - dependency-name: dompurify
    dependency-version: 3.4.2
    dependency-type: indirect
  ...

  Signed-off-by: dependabot[bot] <support@github.com>
* chore(deps): bump follow-redirects from 1.15.11 to 1.16.0 in /website
  Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.15.11 to 1.16.0.
  - [Release notes](https://github.com/follow-redirects/follow-redirects/releases)
  - [Commits](https://github.com/follow-redirects/follow-redirects/compare/v1.15.11...v1.16.0)

  --- updated-dependencies:
  - dependency-name: follow-redirects
    dependency-version: 1.16.0
    dependency-type: indirect
  ...

  Signed-off-by: dependabot[bot] <support@github.com>
* chore(deps): bump lodash from 4.17.23 to 4.18.1 in /website
  Bumps [lodash](https://github.com/lodash/lodash) from 4.17.23 to 4.18.1.
  - [Release notes](https://github.com/lodash/lodash/releases)
  - [Commits](https://github.com/lodash/lodash/compare/4.17.23...4.18.1)

  --- updated-dependencies:
  - dependency-name: lodash
    dependency-version: 4.18.1
    dependency-type: indirect
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
* chore(deps): bump lodash-es and langium in /website
  Bumps [lodash-es](https://github.com/lodash/lodash) and [langium](https://github.com/eclipse-langium/langium/tree/HEAD/packages/langium). These dependencies needed to be updated together.

  Updates `lodash-es` from 4.17.23 to 4.18.1
  - [Release notes](https://github.com/lodash/lodash/releases)
  - [Commits](https://github.com/lodash/lodash/compare/4.17.23...4.18.1)

  Updates `langium` from 4.2.1 to 4.2.3
  - [Release notes](https://github.com/eclipse-langium/langium/releases)
  - [Changelog](https://github.com/eclipse-langium/langium/blob/main/packages/langium/CHANGELOG.md)
  - [Commits](https://github.com/eclipse-langium/langium/commits/HEAD/packages/langium)

  --- updated-dependencies:
  - dependency-name: lodash-es
    dependency-version: 4.18.1
    dependency-type: indirect
  - dependency-name: langium
    dependency-version: 4.2.3
    dependency-type: indirect
  ...

  Signed-off-by: dependabot[bot] <support@github.com>
* chore(deps): bump python-multipart from 0.0.22 to 0.0.27
  Bumps [python-multipart](https://github.com/Kludex/python-multipart) from 0.0.22 to 0.0.27.
  - [Release notes](https://github.com/Kludex/python-multipart/releases)
  - [Changelog](https://github.com/Kludex/python-multipart/blob/main/CHANGELOG.md)
  - [Commits](https://github.com/Kludex/python-multipart/compare/0.0.22...0.0.27)

  --- updated-dependencies:
  - dependency-name: python-multipart
    dependency-version: 0.0.27
    dependency-type: indirect
  ...

  Signed-off-by: dependabot[bot] <support@github.com>
* chore(deps): bump python-dotenv from 1.2.1 to 1.2.2
  Bumps [python-dotenv](https://github.com/theskumar/python-dotenv) from 1.2.1 to 1.2.2.
  - [Release notes](https://github.com/theskumar/python-dotenv/releases)
  - [Changelog](https://github.com/theskumar/python-dotenv/blob/main/CHANGELOG.md)
  - [Commits](https://github.com/theskumar/python-dotenv/compare/v1.2.1...v1.2.2)

  --- updated-dependencies:
  - dependency-name: python-dotenv
    dependency-version: 1.2.2
    dependency-type: direct:production
  ...

  Signed-off-by: dependabot[bot] <support@github.com>
* chore(deps): bump fast-uri from 3.1.0 to 3.1.2 in /website
  Bumps [fast-uri](https://github.com/fastify/fast-uri) from 3.1.0 to 3.1.2.
  - [Release notes](https://github.com/fastify/fast-uri/releases)
  - [Commits](https://github.com/fastify/fast-uri/compare/v3.1.0...v3.1.2)

  --- updated-dependencies:
  - dependency-name: fast-uri
    dependency-version: 3.1.2
    dependency-type: indirect
  ...

  Signed-off-by: dependabot[bot] <support@github.com>
* chore(deps): bump @babel/plugin-transform-modules-systemjs in /website
  Bumps [@babel/plugin-transform-modules-systemjs](https://github.com/babel/babel/tree/HEAD/packages/babel-plugin-transform-modules-systemjs) from 7.29.0 to 7.29.4.
  - [Release notes](https://github.com/babel/babel/releases)
  - [Changelog](https://github.com/babel/babel/blob/main/CHANGELOG.md)
  - [Commits](https://github.com/babel/babel/commits/v7.29.4/packages/babel-plugin-transform-modules-systemjs)

  --- updated-dependencies:
  - dependency-name: "@babel/plugin-transform-modules-systemjs"
    dependency-version: 7.29.4
    dependency-type: indirect
  ...

  Signed-off-by: dependabot[bot] <support@github.com>
* fix(web): consume bundled design system assets (#26391)
* fix: update design system…
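Item 4 of the yaml/flock/TOCTOU/atomic-write commit above fixes `gateway/sticker_cache.py` by writing to a temp file, fsyncing, and then calling `os.replace`. A minimal sketch of that pattern for a JSON payload follows; `atomic_write_json` is a hypothetical helper name, and this is an illustration of the technique, not the code that landed in Hermes.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, data: object) -> None:
    """Write `data` as JSON so readers see either the old file or the complete new one."""
    path = Path(path)
    # The temp file must live in the destination directory: os.replace is
    # only atomic when source and target are on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=f".{path.name}.", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh)
            fh.flush()             # drain Python's buffer to the OS
            os.fsync(fh.fileno())  # ask the OS to persist the bytes to disk
        os.replace(tmp_name, path)  # atomic rename over the old file
    except BaseException:
        # Remove the half-written temp file, then re-raise the original error.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        cache = Path(tmp_dir) / "sticker_cache.json"
        atomic_write_json(cache, {"sticker-id": "a waving cat"})
        print(json.loads(cache.read_text(encoding="utf-8")))
```

A crash mid-write leaves only a stray `.tmp` file, never a truncated cache, so the next load cannot hit `JSONDecodeError` from a partial write. The sketch does not fsync the parent directory, so on some filesystems the rename itself may not survive a power loss.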

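The `fix(acp)` commit replaces a hardcoded `'/tmp/'` prefix test with `Path(tempfile.gettempdir()).resolve()` plus `Path.relative_to()`. A minimal sketch of that containment check follows; `is_under_temp_dir` is a hypothetical name, and the real workspace auto-approve code may structure it differently.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def is_under_temp_dir(candidate: str | Path) -> bool:
    """Return True if `candidate` resolves to a path inside the system temp dir."""
    # Resolve both sides so symlinks (e.g. macOS /var -> /private/var) and
    # ".." segments are normalised before comparing.
    temp_root = Path(tempfile.gettempdir()).resolve()
    resolved = Path(candidate).resolve()
    try:
        resolved.relative_to(temp_root)  # raises ValueError if not a descendant
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    tmp = Path(tempfile.gettempdir())
    print(is_under_temp_dir(tmp / "hermes" / "scratch.txt"))
    print(is_under_temp_dir(tmp / ".." / "hermes-outside-temp"))
```

Comparing resolved `Path` objects instead of string prefixes is what makes this portable: the same code covers `/tmp` on Linux, `/private/var/folders/...` on macOS, and `%LOCALAPPDATA%\Temp` on Windows, and a `..` escape is rejected rather than matched by `startswith`.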
@aqilaziz

Salvages NousResearch#26496 by @aqilaziz. Adds a branch_name column + CLI flag so tasks with workspace_kind='worktree' can pin a target branch on create. Schema migration added to _migrate_add_optional_columns.

- Task.branch_name field + DB column + migration
- create_task accepts branch_name kwarg
- `hermes kanban create --branch <name>` flag
- kanban show output includes `Branch: <name>` when set

Cherry-picked the substantive commit (a7558cf); the PR's tip was an unrelated service-path-dirs commit. Resolved 2 conflicts (INSERT column list and show output) against main's session_id and max_runtime_seconds additions; kept all three.
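The migration step can be pictured as an idempotent `ALTER TABLE ... ADD COLUMN` guarded by `PRAGMA table_info`. A minimal sketch follows, assuming the kanban DB is SQLite and that tasks live in a table named `tasks` (both assumptions); `add_missing_optional_columns` and `OPTIONAL_TASK_COLUMNS` are hypothetical stand-ins, since the body of `_migrate_add_optional_columns` is not shown here.

```python
import sqlite3

# Hypothetical map of optional columns to add; the real list in
# hermes_cli/kanban_db.py is not shown in this PR.
OPTIONAL_TASK_COLUMNS = {"branch_name": "TEXT"}


def add_missing_optional_columns(conn: sqlite3.Connection, table: str = "tasks") -> None:
    """Idempotently add nullable columns that legacy databases lack."""
    # Identifiers come from trusted constants only, never from user input.
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) rows.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    for name, sql_type in OPTIONAL_TASK_COLUMNS.items():
        if name not in existing:
            # No NOT NULL and no DEFAULT, so existing rows read back as NULL.
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("INSERT INTO tasks (title) VALUES ('legacy task')")
    add_missing_optional_columns(conn)
    add_missing_optional_columns(conn)  # second run is a no-op
    print(conn.execute("SELECT title, branch_name FROM tasks").fetchall())
    # [('legacy task', None)]
```

Keeping the new column nullable is what lets legacy rows migrate without a backfill: tasks created before the upgrade simply have no pinned branch.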

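On the read side, the branch only surfaces when it is set: `kanban show` prints `Branch: <name>`, and the worker context exposes `HERMES_KANBAN_BRANCH`. A minimal sketch of that conditional exposure follows. The `Task` dataclass, the `show_lines` and `worker_env` helpers, and the non-Branch show-line formats are hypothetical; popping an inherited `HERMES_KANBAN_BRANCH` is a design choice of this sketch, not documented behavior.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    """Hypothetical subset of a kanban task; not the real Task class."""
    id: str
    title: str
    workspace_kind: str
    branch_name: Optional[str] = None


def show_lines(task: Task) -> list[str]:
    """Render show-style lines; only the Branch line mirrors the PR."""
    lines = [f"Task {task.id}: {task.title}", f"Workspace: {task.workspace_kind}"]
    if task.branch_name:  # omitted entirely when no branch was pinned
        lines.append(f"Branch: {task.branch_name}")
    return lines


def worker_env(task: Task, base: dict[str, str] | None = None) -> dict[str, str]:
    """Build a worker environment that carries the task's branch, if any."""
    env = dict(os.environ if base is None else base)
    # Sketch-only choice: drop any inherited value so a stale branch from the
    # parent shell never leaks into a task that has no branch.
    env.pop("HERMES_KANBAN_BRANCH", None)
    if task.branch_name:
        env["HERMES_KANBAN_BRANCH"] = task.branch_name
    return env


if __name__ == "__main__":
    task = Task("t1", "Fix login", "worktree", branch_name="fix/login")
    print("\n".join(show_lines(task)))
    print(worker_env(task, base={})["HERMES_KANBAN_BRANCH"])
```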

alt-glitch added the type/feature (New feature or request), P3 (Low: cosmetic, nice to have), and comp/cli (CLI entry point, hermes_cli/, setup wizard) labels on May 15, 2026

feat(kanban): configure worktree paths and branches (a7558cf)

aqilaziz force-pushed the feat-kanban-worktree-branch branch from f95a405 to a7558cf on May 15, 2026 22:41

fix(gateway): ignore inaccessible service path dirs (6ae191b)

teknium1 mentioned this pull request on May 19, 2026 in #28462: feat(kanban): configure worktree paths and branches (#26496) (Merged)

teknium1 closed this May 19, 2026


feat(kanban): configure worktree paths and branches #26496

aqilaziz wants to merge 2 commits into NousResearch:main from aqilaziz:feat-kanban-worktree-branch

aqilaziz commented May 15, 2026


teknium1 commented May 19, 2026


3 participants
