fix: handle whitespace-only cron responses by joe102084 · Pull Request #28151 · NousResearch/hermes-agent

joe102084 · 2026-05-18T19:04:15Z

Summary

Treat whitespace-only cron final responses the same as empty responses.
Avoid delivering blank cron messages to users.
Mark those runs as soft failures so broken/model-empty jobs remain visible in cron status.
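
The behavior summarized above can be sketched as follows. This is a minimal illustration, not the code in cron/scheduler.py: `CronOutcome` and `classify_cron_response` are hypothetical names, and the sketch does not model the existing silent-delivery path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CronOutcome:
    """Hypothetical result record; not the scheduler's real type."""
    deliver: bool       # whether a message should be sent to the user
    soft_failure: bool  # whether the run is flagged in cron status
    text: str           # the text to deliver, if any


def classify_cron_response(final_response: Optional[str]) -> CronOutcome:
    """Treat None, "", and whitespace-only responses identically."""
    text = final_response or ""
    if not text.strip():
        # Nothing meaningful to send: skip delivery, but keep the run
        # visible as a soft failure instead of reporting success.
        return CronOutcome(deliver=False, soft_failure=True, text="")
    return CronOutcome(deliver=True, soft_failure=False, text=text)
```

The key point is that the emptiness check runs on `text.strip()`, so a response such as `"  \n\t "` takes the same branch as `""` or `None`.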

Test Plan

scripts/run_tests.sh tests/cron/test_scheduler.py::TestSilentDelivery -v --tb=short
scripts/run_tests.sh tests/cron/test_scheduler.py -q

Note: local working tree had a pre-existing uncommitted ui-tui/package-lock.json diff; this PR only includes cron/scheduler.py and tests/cron/test_scheduler.py.

BoardJames-Bot

Reviewed by Hermes Agent. Whitespace-only cron final responses now suppress blank delivery and are marked as soft failures; existing silent-delivery behavior is preserved. Focused local test passed (tests/cron/test_scheduler.py::TestSilentDelivery: 7 passed). No blockers found.

teknium1 · 2026-05-19T03:08:18Z

Merged via PR #28352 (cherry-picked onto current main with your authorship preserved via rebase-merge — commit 6143013). Thanks for the contribution!

* fix(wecom): handle WSMsgType.CLOSING to prevent CPU spin The WeCom adapter's _read_events() loop only handled CLOSE, CLOSED, and ERROR websocket message types. When the server initiates a graceful shutdown, aiohttp returns WSMsgType.CLOSING before the connection is fully closed. This message type was not handled, causing the receive() call to return immediately in a tight loop while self._ws.closed remained False. The result was 100% CPU usage on the asyncio event loop. Add WSMsgType.CLOSING to the set of terminal message types that raise RuntimeError("WeCom websocket closed"), allowing _listen_loop() to enter its normal reconnect backoff path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(auth): treat empty credential pool entries as unauthenticated Fixes #28140 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: include hermes_plugins in gateway.log component filter gateway.log uses a _ComponentFilter that only passes records from loggers starting with ('gateway',). Plugin modules are loaded under the hermes_plugins.* namespace, so all plugin log output is silently dropped from gateway.log. This makes plugin registration — which directly affects gateway hooks (pre_gateway_dispatch, transform_llm_output, etc.) — invisible in the gateway-specific log. Operators debugging gateway behavior check gateway.log and see no plugin activity, even when plugins are working correctly. Add 'hermes_plugins' to the gateway component prefixes tuple so plugin log messages appear in gateway.log. Closes #28138 * fix(gateway): align kanban artifact _IMAGE_EXTS with response dispatch _deliver_kanban_artifacts used a broader _IMAGE_EXTS that included .bmp, .tiff, and .svg. 
These three extensions are absent from the equivalent set in _deliver_media_from_response (line 10661), which intentionally routes them through send_document rather than send_multiple_images (comment near line 10522 notes that Telegram sendPhoto recompresses and rejects non-raster formats). Routing .svg (XML text), .bmp, or .tiff through the photo API causes send_multiple_images to raise on most platforms; the exception is caught and logged as a warning, silently dropping the artifact. Aligning the two sets ensures kanban deliverables with these extensions follow the same send_document path as regular agent responses. No behaviour change for .png/.jpg/.jpeg/.gif/.webp. * fix(process-registry): detach stdin from background subprocesses to prevent keyboard freeze Background process non-PTY path used stdin=subprocess.PIPE unconditionally, creating an orphan pipe that was never written to and never closed. Child processes that read stdin would block indefinitely, competing with the parent's prompt_toolkit event loop for terminal ownership and causing complete keyboard lockout. Change to stdin=subprocess.DEVNULL so children get immediate EOF on stdin reads instead of blocking forever. For interactive stdin, the PTY path (which has its own independent PTY via ptyprocess.PtyProcess.spawn) should be used instead. Fixes #17959 * chore(release): alias stale-ID salvage commit for @LifeJiggy (#28317) * fix(process-registry): detach stdin from background subprocesses to prevent keyboard freeze Background process non-PTY path used stdin=subprocess.PIPE unconditionally, creating an orphan pipe that was never written to and never closed. Child processes that read stdin would block indefinitely, competing with the parent's prompt_toolkit event loop for terminal ownership and causing complete keyboard lockout. Change to stdin=subprocess.DEVNULL so children get immediate EOF on stdin reads instead of blocking forever. 
For interactive stdin, the PTY path (which has its own independent PTY via ptyprocess.PtyProcess.spawn) should be used instead. Fixes #17959 * chore(release): alias stale-ID salvage commit for LifeJiggy PR #28315 was salvaged with a wrong noreply numeric ID (192385615 vs the correct 141562589). The commit on main is correctly authored to LifeJiggy by username, but the noreply email doesn't match AUTHOR_MAP. Adds an alias so release-notes generation maps both forms to the same contributor. --------- Co-authored-by: LifeJiggy <192385615+LifeJiggy@users.noreply.github.com> * fix: elevate plugin discovery failures from debug to warning Plugin discovery exceptions in gateway startup (gateway/run.py) and CLI startup (hermes_cli/main.py) are caught and logged at DEBUG level, making them invisible at the default INFO log level. If any plugin import fails — syntax error, missing dependency, import cycle — operators get zero indication unless they bump the log level to DEBUG. This makes broken plugins appear enabled but silently non-functional. Change both locations to logger.warning() so failures are visible at production log levels. Closes #28137 * fix: treat inline-shell timeout guard as timeout * fix(acp): resolve /tmp symlink before workspace auto-approve check on macOS Path.resolve() follows the /tmp -> /private/tmp symlink on macOS, so str(path).startswith("/tmp/") is always False for temp-dir paths. The "Accept Edits" (workspace_session) mode silently refused to auto-approve every /tmp write on macOS, breaking the documented behaviour and making the existing test fail on this platform. Fix: keep the raw expanded path (pre-resolve) for the /tmp prefix check and continue using the resolved form only for the cwd relative_to() call where symlink resolution is correct behaviour. 
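The /tmp symlink fix above might look roughly like the sketch below. `is_auto_approvable` is a hypothetical helper, not the function in the ACP code, and the use of `os.path.normpath` to collapse `..` segments is an added assumption that the commit message does not mention.

```python
import os
from pathlib import Path


def is_auto_approvable(raw_path: str, cwd: str) -> bool:
    """Hypothetical helper: approve writes under /tmp or under cwd."""
    # Expand "~" and collapse ".." without following symlinks, so
    # "/tmp/x" still starts with "/tmp/" where /tmp is a symlink (macOS).
    expanded = os.path.normpath(os.path.expanduser(raw_path))
    if expanded.startswith("/tmp/"):
        return True

    # Symlink resolution is the right behavior for the workspace check.
    resolved = Path(cwd, expanded).resolve()
    try:
        resolved.relative_to(Path(cwd).resolve())
        return True
    except ValueError:
        return False
```

The prefix check runs on the unresolved path, while the workspace check compares two resolved paths, so both sides of `relative_to()` see the same symlink expansion.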
* fix(kanban): single-row horizontal scroll for board columns Switch .hermes-kanban-columns from auto-fit CSS grid to a flex row with overflow-x: auto and a hidden scrollbar (scrollbar-width / ::-webkit- scrollbar), and pin .hermes-kanban-column to flex: 0 0 280px so columns sit side-by-side at a fixed width instead of wrapping into a 2xN grid. Page vertical scroll is unaffected: each column already caps at max-height: calc(100vh - 220px), so the container never grows tall enough to introduce its own vertical scrollbar. * fix(approval): surface pending-approval state with explicit marker visible to LLM When a tool call requires user approval in the non-blocking gateway path, the LLM previously received a result that was indistinguishable from a failed tool call (exit_code=-1, error=message). The LLM could not tell whether the tool was pending approval, had returned empty results, or had failed silently — causing it to burn context on wrong hypotheses. Fix changes the result format to include: - status: pending_approval (clear state name) - approval_pending: True (explicit boolean for LLMs to detect) - error: cleared to empty string (removes misleading error signal) This lets the LLM reason about approval latency vs actual errors, short-circuiting the previous silent failure mode. Fixes #14806 * fix: recognize emoji and caret as natural response endings GLM models via Ollama report finish_reason='stop' even when the response was truncated by max_tokens. The continuation mechanism uses _has_natural_response_ending() as one of the heuristics to detect whether the response was genuinely finished. Currently only ASCII punctuation and CJK punctuation are recognized. This means any response ending with an emoji (e.g. ⚡, 👍) or the caret character ^ (common in French ^^ smiley) is not recognized as naturally ended, triggering a false-positive continuation where the model receives 'Continue where you left off' and produces garbled output. 
Add: - ^ (caret) to the punctuation set - Unicode emoji range (codepoint >= 0x1F300) as natural ending This only affects GLM/Ollama users but the fix is safe for all backends since _has_natural_response_ending() is only consulted inside the continuation flow. * chore(release): pre-stage AUTHOR_MAP for May 2026 LHF batch group 8 (#28328) Pre-stages AUTHOR_MAP entries for 10 new contributors whose PRs are being salvaged in the May 2026 low-hanging-fruit batch (group 8). Lands ahead of the per-PR salvage PRs so they don't get blocked by AUTHOR_MAP CI. Contributors: - AceWattGit (#28159 — _pool_may_recover_from_rate_limit NameError) - YuanHanzhong (#28032 — x.com/status fallbacks link-like) - colin-chang (#28245, #28249, #28251 — gateway + mattermost fixes) - felix-windsor (#28019 — preserve cron asterisks in strip mode) - houenyang-momo (#28205 — charizard completion menu contrast) - iqdoctor (#28095 — windows installer docs) - joe102084 (#28151 — whitespace-only cron responses) - jvinals (#27936 — Slack U-IDs → DM channel) - maxmilian (#28267 — ModelPickerDialog portal) - samggggflynn (#27952 — dingtalk pre_start) Per references/batch-pr-salvage-may14-additions.md. * fix: add pre_start() to _IncomingHandler for dingtalk SDK compatibility The dingtalk-stream SDK calls pre_start() on every registered handler before opening the WebSocket connection. Without this method, the SDK raises AttributeError and kills the stream connection, causing DingTalk to be unable to connect via Stream Mode. * fix(windows): handle redirected stdout in _cprint fallback Wraps _pt_print in try/except with a print() fallback. When a kanban worker's stdout is piped to a log file, prompt_toolkit raises NoConsoleScreenBufferError (Windows) or OSError (other) because there is no real console buffer. The fallback keeps worker output flowing instead of crashing. 
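A minimal sketch of the `_cprint` fallback pattern described above. The `cprint` function and its `rich_print` parameter are hypothetical stand-ins so the example runs without prompt_toolkit. Catching `Exception` broadly is also an assumption; the actual change may catch a narrower set.

```python
from typing import Callable


def cprint(text: str, rich_print: Callable[[str], None]) -> None:
    """Print via a console-aware printer, falling back to plain print()."""
    try:
        rich_print(text)
    except Exception:
        # Assumption: a failure here means there is no usable console
        # (for example stdout is piped to a log file), so degrade to
        # the built-in print() rather than crash the worker.
        print(text)
```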
* chore(release): alias stale-ID salvage commit for @Grogger (#28334) PR #28330 was salvaged with a wrong noreply numeric ID (18091625 vs the correct 7065068). The commit on main is correctly authored to Grogger by username, but neither noreply form was in AUTHOR_MAP. Adds both so release-notes generation maps them to @Grogger. * fix(aux): remove stale session_search model menu entry * fix(tui): keep x status citation fallbacks link-like * fix(xai-oauth): quarantine dead tokens on terminal refresh failure resolve_xai_oauth_runtime_credentials() called _refresh_xai_oauth_tokens() with no try/except. A terminal refresh failure (HTTP 400/401/403 — invalid_grant, token revoked) propagated without clearing the dead access_token / refresh_token from auth.json, causing every subsequent session to retry the same doomed network request. Add a try/except around the refresh call that mirrors the existing credential_pool.py quarantine: when _is_terminal_xai_oauth_refresh_error identifies a non-retryable failure, clear the dead token fields from auth.json and write a last_auth_error diagnostic marker so future calls fail fast with a clear relogin_required error instead of hitting the network. active_provider is preserved (set_active=False) so multi-provider users whose chosen provider is not xai-oauth are unaffected. Tests: two new cases in test_auth_xai_oauth_provider.py cover terminal quarantine and transient pass-through. * feat(bg-review): add bundled/pinned skill protection rules to review prompts (#27644) The background review prompts (_SKILL_REVIEW_PROMPT and _COMBINED_REVIEW_PROMPT) now include explicit protection rules for bundled, hub-installed, and pinned skills — aligning with the curator's existing policy at curator.py L345/350. Before this change, bg-review could freely rewrite bundled skills like 'hermes-agent' or pinned skills, while the 7-day curator explicitly skips them. 
The review agent now sees: • Bundled skills (shipped with Hermes) • Hub-installed skills (installed via hermes skills install) • Pinned skills (marked via hermes curator pin) If only protected skills need updating, the review says 'Nothing to save.' and stops. Fixes #27644 * fix(web): portal Change Model modal so it renders above the app sidebar The dashboard's main column is `relative z-2` (App.tsx), which creates a stacking context that traps fixed descendants below the app sidebar (`z-50`). `ModelPickerDialog` renders `fixed inset-0 z-[100]` inline, so its z-100 is scoped to z-2 and the sidebar covers its left edge. The bug is visible across all themes but only obvious in the Large theme variants (Hermes Teal (Large), etc.) where the larger root font widens the dialog into the sidebar's column. Toast.tsx already documents the same trap and uses the same `createPortal(..., document.body)` escape. This commit ports the picker; the same pattern affects other inline z-[100] modals in the dashboard (OAuthLoginModal, Cron / Models / Profiles page modals) and is left for a follow-up — keeping this PR scoped to the reporter's specific case. Fixes #28103 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gateway): exit code 75 on service restart so launchd relaunches When the gateway receives SIGUSR1 (graceful restart via launchd_restart), the SIGUSR1 handler calls request_restart(via_service=True) and the gateway shuts down cleanly with exit code 0. However, the generated launchd plist uses KeepAlive → SuccessfulExit → false, meaning launchd only relaunches on *non-zero* exit codes. A clean exit(0) is treated as "successful, don't restart", so the gateway stays down after /restart, /update, or SIGUSR1. The systemd unit template already uses RestartForceExitStatus=75 for the same scenario. Mirror that convention: when _restart_via_service is True, raise SystemExit(75) so launchd's SuccessfulExit=false policy triggers a relaunch. 
Closes #28135 * fix: guard json.loads() against invalid TTS and skill_view responses Two code paths call json.loads() on output from external tools without catching JSONDecodeError. If the tool returns a non-JSON string (error message, empty string, or None), the entire call path crashes. 1. gateway/run.py — text_to_speech_tool() result in voice reply path. A TTS failure that returns an error string instead of JSON crashes the voice reply handler, killing the message response entirely. 2. cron/scheduler.py — skill_view() result when loading skills for cron jobs. A corrupted or missing skill file that returns an error string instead of JSON crashes the cron tick, preventing all jobs from executing that cycle. Both fixes catch (json.JSONDecodeError, TypeError), log a warning, and gracefully skip the failed operation instead of crashing. * fix(gateway): bridge gateway_restart_notification from YAML platform sections Two related bugs in gateway/config.py prevented per-platform gateway_restart_notification from working through config.yaml: 1. The shared-key bridging loop (load_gateway_config) omitted 'gateway_restart_notification', so the key never landed in platform_data['extra'] even when set under e.g. 'discord:' or 'mattermost:' sections. 2. PlatformConfig.from_dict() only read gateway_restart_notification from the top-level data dict, ignoring the 'extra' sub-dict where bridged keys are stored. Fix: add the key to the bridging loop, and add an 'extra' fallback in from_dict() so that round-tripped values (YAML → bridged → extra → from_dict) resolve correctly. Impact: users can now set gateway_restart_notification: false per platform in config.yaml instead of relying on env vars or the global platforms: block. * feat(kanban): add auto_promote_children config toggle When the kanban auto-decomposer fans a triage task into child tasks, recompute_ready() immediately promotes parent-free children to 'ready' so the dispatcher picks them up. 
Some users want a manual workflow where children stay in 'todo' for review before dispatch. Add 'kanban.auto_promote_children' config key (default: true): - false: children stay in 'todo' after decomposition - true: existing behavior (auto-promote to 'ready') Changes: - kanban_db.py: decompose_triage_task() gains auto_promote param - kanban_decompose.py: reads auto_promote_children from config - kanban dashboard API: exposes the new setting in GET/PUT /orchestration Closes #28016 * fix: wrap _pool_may_recover_from_rate_limit call through run_agent namespace The conversation_loop.py references _pool_may_recover_from_rate_limit which was defined in run_agent.py. After the conversation-loop extraction refactor, the helper was no longer in the same module scope. Wrap the call as _ra()._pool_may_recover_from_rate_limit() to route through the run_agent monkeypatch namespace where the helper is available. Adds regression test in test_gemini_fast_fallback.py. Fixes: MAILROOM Email Triage NameError, OPS Execution Monitor NameError. * fix(tui): improve charizard completion menu contrast * docs(windows): avoid piping installer directly into iex * fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS Qwen3.x and DeepSeek-V3.x default to chatty/hallucinatory tool use without enforcement steering — agents narrate "calling tool X" without actually emitting a tool call, or run partial loops. Both model families fit the same failure pattern TOOL_USE_ENFORCEMENT_GUIDANCE was already injected for (gpt, codex, gemini, gemma, grok, glm). Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com> Squashed salvage of: - 403e567ce fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS - 9433eabe7 test(agent): use realistic qwen-plus identifier in enforcement test Fixes #28079. * fix(send_message): resolve Slack user IDs to DM channel IDs The _SLACK_TARGET_RE regex only matched IDs starting with C (channel), G (group), or D (direct message). 
Slack user IDs start with U, causing 'Could not resolve' errors when trying to send DMs to specific users. Changes: - Expand _SLACK_TARGET_RE to accept U-prefixed IDs (user IDs) - Add conversations.open fallback to resolve user IDs to DM channel IDs before sending, since chat.postMessage requires a conversation ID Fixes #ISSUE_NUMBER * fix(gateway): tighten MEDIA extraction regex + silent skip on file-not-found Three related fixes for the MEDIA:<path> extraction pipeline that caused 'file not found' noise in platform channels: 1. run.py — tighten tool-result MEDIA regex from \S+ (any non- whitespace) to require a path pattern with known extensions. Prevents LLM-generated placeholder paths like 'MEDIA:/path/to/example.mp4' from being captured as real media. 2. base.py — remove the |\S+ fallback in extract_media() that catches anything non-whitespace as a potential MEDIA path. This was the primary cause of false positives — strings like '' in tool output were captured as MEDIA: paths. 3. mattermost.py — replace the file-not-found error message sent to the channel with a silent logger.warning() skip. When a path extracted by MEDIA doesn't exist on disk, the channel no longer gets a noisy '(file not found: ...)' message. Impact: eliminates the persistent 'file not found' spam in Mattermost channels caused by over-broad MEDIA regex patterns matching non-path text in tool output. * fix(xai-oauth): split 403 (tier/entitlement) from 400/401 in token endpoint xAI's token endpoint returns HTTP 403 to the OAuth grant when the account isn't on the allowlist for API access (e.g. standard SuperGrok subscribers — see #26847). Treating it like a stale-token 400/401 made ``format_auth_error`` append "Run ``hermes model`` to re-authenticate", which is misleading because re-login can't change xAI's tier decision. 
Split 403 off in both ``refresh_xai_oauth_pure`` and the loopback login token exchange: * New error code ``xai_oauth_tier_denied`` with ``relogin_required=False`` * Message explains the entitlement gate and points at the ``XAI_API_KEY`` + ``provider: xai`` fallback * 400/401 still set ``relogin_required=True`` as before * 5xx still set ``relogin_required=False`` as before * fix(run-agent): treat any 403 on xai-oauth as entitlement to stop refresh-loop The existing ``_is_entitlement_failure`` heuristic only fires when the response body contains specific substrings ("do not have an active Grok subscription", etc.). xAI has been seen to 403 standard SuperGrok subscribers with a terser body that doesn't match those keywords (#26847), and the recovery path would then mint a fresh token, get a fresh 403, and loop until Ctrl+C. Add a defense-in-depth check at the recovery call site: any 403 on ``provider == "xai-oauth"`` short-circuits ``try_refresh_current`` so the error surfaces immediately with the friendly hint from ``_summarize_api_error``. Keeps the existing keyword path for all other providers untouched. * test(xai-oauth): pin tier-denied 403 behavior + docs warning for #26847 Tests: * ``test_refresh_xai_oauth_pure_403_marked_tier_denied_not_relogin`` — refresh-403 raises ``xai_oauth_tier_denied`` with ``relogin_required=False`` and the API-key fallback hint in body. * ``test_format_auth_error_tier_denied_does_not_suggest_relogin`` — the renderer does not append "Run ``hermes model``" for the new code. * ``test_recover_with_credential_pool_skips_refresh_on_bare_403_for_xai_oauth`` — bare ``{"reason":"forbidden","message":"Forbidden"}`` body (which does not match the existing keyword heuristic) still short-circuits ``try_refresh_current`` on xai-oauth. 
Docs: * Drop the "(any active tier)" claim from the xai-grok-oauth guide, add a top-of-page warning callout, and a Troubleshooting section for the 403-after-login case pointing at ``XAI_API_KEY`` + ``provider: xai`` as the documented fallback. * fix: handle whitespace-only cron responses * fix(cli): preserve cron asterisks in strip mode * fix(mattermost): resolve thread root_id and route progress to threads Two Mattermost thread-related bugs: 1. _resolve_root_id() — Mattermost CRT requires root_id to be the thread root post. Using any reply's own ID as root_id causes '400 Invalid RootId'. Add _resolve_root_id() that walks up the post chain via API to find the actual root, and apply it in send(), _send_url_as_file(), and _send_local_file(). 2. _progress_reply_to — The condition in run.py only checked Platform.FEISHU, missing Mattermost entirely. This caused tool progress messages to always land in the main channel instead of the thread. Add Platform.MATTERMOST to the condition so progress messages are routed to threads when reply_mode=thread. Impact: Tool progress messages now appear in Mattermost threads instead of flooding the main channel; thread replies no longer fail with Invalid RootId when the reply target is itself a reply. * feat(kanban): archive --rm to hard-delete archived tasks Salvages #19964 by @Beandon13. Adds `hermes kanban archive --rm` to permanently remove already-archived tasks with cascading cleanup of links, comments, events, runs, and notify-subs. Safety guard: only archived tasks can be deleted; active/blocked/done must be archived first. Cherry-picked from #19964 onto current main (severe stale base, applied manually to preserve substance only). 
* feat(proxy): add xai upstream adapter for Grok via OAuth * chore(release): map @yannsunn for PR #28064 xai proxy adapter salvage * docs(skill): align kanban dispatcher failure_limit text with current default * fix(oauth): add manual-paste fallback for browser-only remote consoles xAI Grok OAuth (and Spotify) use a loopback redirect to ``http://127.0.0.1:<port>/callback`` to capture the authorization code. That works when the browser and Hermes run on the same machine, and the SSH tunnel recipe handles the regular remote case. It breaks completely on **browser-only remote consoles** (GCP Cloud Shell, GitHub Codespaces, AWS EC2 Instance Connect, Gitpod, Replit, …) where the user has a browser but no real SSH client to forward a port — the redirect to 127.0.0.1 on the remote VM simply isn't reachable from the laptop, and there's nothing the existing flow can do about it (#26923). This commit adds the foundation for a manual-paste fallback: * ``_is_remote_session`` now also recognises Cloud Shell, Codespaces, Gitpod, Replit, StackBlitz (in addition to SSH), so the existing tunnel hint at least fires in those environments. * ``_parse_pasted_callback`` accepts any of: a full ``http(s)://...?code=...&state=...`` URL, a bare ``?code=...`` query string, a bare ``code=...&state=...`` fragment, or a bare opaque code value. Returns the same dict shape the HTTP callback handler produces, so the caller's state / error validation works unchanged (no CSRF bypass). * ``_prompt_manual_callback_paste`` reads stdin with a clear multi-line explanation of what's happening and what to paste. * ``_xai_oauth_loopback_login`` gains a ``manual_paste`` kwarg that skips the HTTP listener entirely. The redirect_uri, PKCE verifier, state, and nonce are byte-identical to the loopback path so xAI's token endpoint can't tell the difference at the protocol level. 
* ``_print_loopback_ssh_hint`` now also mentions ``--manual-paste`` so users without a real SSH client see a path forward instead of a dead-end tunnel recipe. * ``_login_xai_oauth`` threads ``args.manual_paste`` into the loopback helper. * feat(cli): wire --manual-paste into ``hermes auth add`` and ``hermes model`` Register the new ``--manual-paste`` flag on both entry points and thread it through to the xAI loopback login: * ``hermes auth add xai-oauth --manual-paste`` — pool-add path, forwarded inside ``auth_commands.handle_auth_add``. * ``hermes model --manual-paste`` — model-picker path, forwarded by ``_model_flow_xai_oauth`` into the synthetic ``argparse.Namespace`` it passes to ``_login_xai_oauth``. The picker also now forwards ``--no-browser`` and ``--timeout`` for consistency (previously hardcoded to defaults regardless of CLI flags). Help text on both flags points at #26923 and names the browser-only remote consoles (Cloud Shell, Codespaces, EC2 Instance Connect) so users searching ``hermes --help`` can find the workaround. * test+docs(oauth): pin manual-paste semantics and document browser-only path (#26923) Tests (``tests/hermes_cli/test_auth_manual_paste.py``): * 9 parametrised + scalar cases for ``_is_remote_session`` covering the new Cloud Shell / Codespaces / Gitpod / Replit / StackBlitz env vars (plus the existing SSH ones). * 9 cases for ``_parse_pasted_callback`` covering every paste form (full URL, https URL with extra params, bare ``?code=...``, bare ``code=...`` fragment, bare opaque value, error+description, empty, whitespace-only, malformed URL). * 3 cases for ``_prompt_manual_callback_paste`` (happy path, EOF, Ctrl-C). * 3 end-to-end ``_xai_oauth_loopback_login(manual_paste=True)`` cases: the HTTP server MUST NOT be started (asserted via a callable that raises if invoked), wrong state still rejected with ``xai_state_mismatch`` (no CSRF bypass), and empty paste surfaces ``xai_code_missing``. 
* SSH-hint mention test ensures the ``--manual-paste`` instruction is printed in the remote-session hint. Docs: * ``oauth-over-ssh.md`` — new "Browser-only remote (Cloud Shell / Codespaces / EC2 Instance Connect)" section with the ``--manual-paste`` recipe, plus a TL;DR note for the new flag. * ``xai-grok-oauth.md`` — short subsection pointing at the same recipe and the OAuth-over-SSH guide anchor. * docs(kanban): document max-retries task override * docs(kanban): document inline create shortcuts * test(kanban): cover default board dashboard pin * docs: ignore box diagrams in ascii guard Wrap existing box-drawing diagrams with ascii-guard markers so docs-site checks pass when website docs are touched. Co-authored-by: Cursor <cursoragent@cursor.com> * feat: per-task model override for kanban workers - Add model_override field to Task class and tasks schema - Add migration for existing databases - Spawn worker with -m model when model_override is set * test(kanban-dashboard): cover _task_dict task_age fallback The fix in 061a1830 added an outer try/except in plugin_api._task_dict so that a future failure mode in kanban_db.task_age (anything _safe_int doesn't already absorb) cannot 500 the GET /board response. The _safe_int / task_age corruption paths got regression coverage in tests/hermes_cli/test_kanban_db.py, but the OUTER fallback contract remained untested -- meaning a refactor that drops the try/except would not be caught by CI. 
Pin that contract from both consumers of _task_dict: - GET /board returns 200 with the literal fallback age dict for the affected card (other cards continue to render via the same path) - GET /tasks/:id (drawer view) returns 200 with the same fallback, so a single corrupt task can't block its own drawer Both tests force task_age to raise RuntimeError rather than ValueError on '%s', because ValueError is absorbed by _safe_int and never reaches the outer try/except -- testing that path would only re-cover what test_kanban_db.py already pins. Manually verified the regression discipline: git checkout 061a1830^ -- plugins/kanban/dashboard/plugin_api.py pytest -k task_age_exception # both FAIL with 500 git checkout HEAD -- plugins/kanban/dashboard/plugin_api.py pytest -k task_age_exception # both PASS * fix(kanban): clear _INITIALIZED_PATHS in remove_board so recycled DBs re-init schema Archiving or deleting a board via remove_board() leaves the path's "schema already initialized" entry in the module-level cache. A concurrent connect(board=<slug>) call (e.g. the dashboard event-stream poll loop) then: 1. resolves the same kanban.db path, 2. recreates the directory + an empty sqlite file because connect() does mkdir(parents=True, exist_ok=True), 3. skips the CREATE TABLE pass because the cache entry says the schema is already in place, 4. errors on the next read with `no such table: task_events`. Drop the cache entry before mutating the filesystem so the fresh file gets a proper schema init on next connect(). Applies to both archive=True (rename) and archive=False (rmtree) branches. Fixes #23833. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(web): add Cache-Control: no-store to plugin static file serving Prevents browser caching of stale dashboard plugin JS files that may contain bugs already fixed upstream (e.g. COLUMN_LABEL undefined). * fix(kanban): seed bundled skills (e.g. 
kanban-worker) on kanban init Closes #23725 * fix(kanban): ignore stale HERMES_KANBAN_BOARD for removed boards * fix(kanban): keep board-management commands independent from board override * fix(kanban): preserve notifier_profile for dashboard home subscriptions * fix(kanban): promote dependents when a parent is archived * fix(cli): make kanban specify max_tokens configurable * fix(kanban): sync slash subcommands with live parser * fix(kanban): promote blocked tasks when parent dependencies complete recompute_ready only scanned 'todo' tasks for promotion, ignoring 'blocked' tasks entirely. When a task was blocked (e.g. by the circuit breaker) and its parent dependencies later completed, the task stayed stuck in 'blocked' forever unless manually unblocked. Now recompute_ready also scans 'blocked' tasks. When all parents are done/archived, the blocked task is promoted to 'ready' with failure counters reset — equivalent to an automatic unblock. Includes a regression test for the blocked-parent-done promotion path. * fix(kanban): use 'is not None' check for max_runtime_seconds in create_task max_runtime_seconds=0 was being silently coerced to None due to a falsy check (if max_runtime_seconds). Zero is a valid value that causes the dispatcher to immediately time out a task. The adjacent max_retries parameter already used the correct 'is not None' pattern. Fixes the inconsistency by aligning max_runtime_seconds with max_retries. * fix(kanban): reset failure counters on unblock_task When a task is manually unblocked (blocked → ready/todo), the consecutive_failures counter and last_failure_error were left intact. The next failure would immediately re-trip the circuit breaker because the counter was still at or above the failure limit. Reset both fields on unblock so the task gets a fresh retry budget. Includes a regression test that verifies counters are zeroed. 
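For the `max_runtime_seconds` fix above, the difference between the falsy check and the `is not None` check can be shown in isolation. Both function names are hypothetical; only the two conditions come from the commit message.

```python
from typing import Optional


def coerce_falsy(max_runtime_seconds: Optional[int]) -> Optional[int]:
    # Buggy pattern: 0 is falsy, so an explicit 0 is silently lost.
    return int(max_runtime_seconds) if max_runtime_seconds else None


def coerce_explicit(max_runtime_seconds: Optional[int]) -> Optional[int]:
    # Fixed pattern: only a missing value maps to None.
    if max_runtime_seconds is not None:
        return int(max_runtime_seconds)
    return None
```

With these definitions, `coerce_falsy(0)` returns `None` while `coerce_explicit(0)` returns `0`, which is the inconsistency the commit removes.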
* fix(kanban): fingerprint crash errors to prevent fleet-wide retry exhaustion When a systemic failure (provider outage, auth expiry, OOM) crashes multiple workers simultaneously, detect_crashed_workers increments each task failure counter independently. The circuit breaker only trips after N × failure_limit retries across the fleet. Fingerprint crash errors by normalizing host-specific details (PIDs, timestamps). When 3+ tasks crash with the same fingerprint in a single detection cycle, immediately trip the circuit breaker (failure_limit=1) instead of waiting for repeated failures. Isolated crashes (unique fingerprints) retain their normal retry budget. Protocol violations continue to trip immediately. Includes regression tests for systemic and isolated crash paths. * fix(kanban): align board_exists with board discovery rules * fix(kanban): demote ready children when a parent is reopened * fix(kanban): serialize DB initialization * fix(kanban): task_age() tolerates ISO-8601 timestamps Prevents ValueError crash in dashboard get_board() when a task has an ISO timestamp (e.g. "2026-05-10T15:00:00Z") instead of a unix epoch int. Adds _to_epoch() helper that normalises both formats. * Fix Kanban dashboard initial board selection * fix(kanban): persist worker session metadata on completion Salvages #25579 by @wesleysimplicio. Stamps task_runs.metadata.worker_session_id from HERMES_SESSION_ID on kanban_complete. Cherry-picked the substantive commit (not the AUTHOR_MAP fixup tip) onto current main. 
* fix(kanban): make claim ttl configurable Co-Authored-By: Paperclip <noreply@paperclip.ing> * fix(kanban): pass accept-hooks to worker chat subprocess * feat(kanban): add board-level default workdir (#25430) * docs(kanban-worker): document notification routing configuration * fix(kanban): preserve worker tools with restricted toolsets * fix(kanban): make legacy task migration idempotent (cherry picked from commit 293f1c3a7241b0117669e049d9aa746c9645ac90) * fix: harden Kanban worker Hermes command resolution * feat(kanban): allow trimmed task comments SS-1647 live SHIP validation: real code + tests for kanban comment --max-len. * fix: show scheduled kanban tasks in dashboard * fix: assign single-task kanban decompositions * fix(kanban-dashboard): make Orchestration mode checkbox label static The checkbox label echoed its state ("Auto (default)" / "Manual") instead of describing the action, so a checked box reading "Auto" parsed as a status indicator rather than a control. The accompanying sub-description was also static and started with "When on, ...", which read awkwardly when the box was unchecked. Replace the dynamic label with a static action label ("Auto-decompose triage tasks") and flip the sub-description between the two modes so it stays accurate either way. The top-of-page Orchestration pill is unchanged — that one is intentionally a status badge / toggle. Fixes #28178 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(env): add HERMES_KANBAN_DISPATCH_IN_GATEWAY override (#21956) Salvages the env-vars docs portion of #21956 by @Bartok9. The ascii-guard-ignore tags from the original PR already landed on main. * fix(kanban): close sqlite connection on init failure to prevent fd leak Salvages #28301 by @Ade5954. If WAL setup, PRAGMA application, or schema init raises after sqlite3.connect() succeeds, the new connection was leaking. Wrap the body in try/except so the connection is closed before the exception propagates. 
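The fd-leak fix above follows a common pattern: once `sqlite3.connect()` has succeeded, every later setup step runs under a guard that closes the connection before re-raising. A minimal sketch, assuming a hypothetical `init_schema` callback and illustrative PRAGMAs rather than the project's actual `connect()` body:

```python
import sqlite3
from typing import Callable


def connect(
    path: str, init_schema: Callable[[sqlite3.Connection], None]
) -> sqlite3.Connection:
    """Open a database, closing the handle if any setup step fails."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("PRAGMA foreign_keys=ON")
        init_schema(conn)  # hypothetical schema-initialisation hook
    except Exception:
        # Release the file descriptor before the error propagates.
        conn.close()
        raise
    return conn
```

The bare `raise` preserves the original exception and traceback, so callers see the same error as before the fix and only the leaked handle goes away.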
* fix(kanban): don't crash dispatched workers when kanban-worker skill is absent Salvages #27372 by @oemtalks. The dispatcher unconditionally injected `--skills kanban-worker` into every worker spawn, but worker profiles sometimes don't have that bundled skill in their skills dir, which is fatal at CLI startup (`ValueError: Unknown skill(s): kanban-worker`). Adds `_kanban_worker_skill_available(hermes_home)` and only injects the flag when the skill resolves. The MANDATORY lifecycle still ships via KANBAN_GUIDANCE in the system prompt, so omitting the flag is safe. * fix(packaging): ship dashboard plugin assets in wheel Salvages #23737 by @LeonSGP43. Adds plugins/* manifest.json and dist/ glob entries to setuptools package-data so wheel installs ship the bundled dashboard plugin assets (kanban, achievements, etc.). Without these, /api/dashboard/plugins can't discover plugin assets outside a source checkout. * docs(kanban): document worker protocol auto-blocks Salvages #21585 by @helix4u. Documents the protocol_violation event (worker exits successfully while task is still running), adds --max-retries to the create flag list and --failure-limit to dispatch. * fix(oneshot): pass fallback_providers from profile config to AIAgent Salvages #23368 by @uzunkuyruk. Oneshot workers (e.g. kanban workers spawned via 'hermes -p <profile> chat -q ...') were not honouring the profile's fallback_providers / fallback_model chain because oneshot.py never read the config and never passed fallback_model= to AIAgent. Reads cfg.get('fallback_providers') (new list format) or cfg.get('fallback_model') (legacy single-dict) with the same normalization cli.py applies, then forwards as fallback_model=_fb. * fix(kanban): reject direct running transitions in dashboard bulk updates Salvages #24050 by @kronexoi. The single-task PATCH already rejects direct status='running' since it bypasses the dispatcher/claim invariant, but the bulk-update endpoint still accepted it. 
Aligns bulk with single by emitting an error result row for any 'running' entry. * feat(kanban): add initial-status for human-ops cards Salvages #27526 by @shunsuke-hikiyama. Adds an --initial-status flag (running|blocked, default running) to 'kanban create', threaded through kanban_db.create_task() and the kanban_create tool schema. 'blocked' parks the task directly in the blocked column for R3 human-ops review, skipping the brief running-to-blocked transition. Dropped the unrelated 'add' alias, WIFEXITED Windows compat, and slash-handler error formatting changes that were bundled in the original PR — those should ship as their own focused changes if still wanted. * fix(kanban): release scratch workspace and tmux session on task completion Salvages #27369 by @LeonJS. complete_task() now calls _cleanup_workspace() and _cleanup_worker_tmux() after marking a task complete. Scratch workspaces (used by swarm agents) accumulate on disk — hundreds of MB per task, never released. Stale tmux sessions from completed agents also persist indefinitely. Both gates are safe: - workspace_kind == 'scratch' gate preserves user worktree/dir workspaces - tmux #{pane_dead} == 1 gate only kills sessions where the worker has already exited - best-effort: cleanup failures never block task completion * fix(kanban): honor severity thresholds in diagnostics Salvages #26431 by @LeonSGP43. Dashboard plugin_api list_diagnostics was using exact-match (severity == filter), so '--severity warning' hid 'error' and 'critical' diagnostics. Adds severity_at_or_above() helper to kanban_diagnostics and uses it in the dashboard endpoint (CLI already used SEVERITY_ORDER comparison correctly). * test: isolate Kanban env pins in hermetic fixture Salvages the substantive part of #22295 by @steezkelly. 
Adds the missing HERMES_KANBAN_HOME, HERMES_KANBAN_RUN_ID, HERMES_KANBAN_CLAIM_LOCK, HERMES_KANBAN_DISPATCH_IN_GATEWAY entries to _HERMES_BEHAVIORAL_VARS so ambient developer-shell pins on those vars don't bleed into pytest runs. The frozenset extraction + standalone regression test from the original PR were dropped to keep the change minimal — main already maintains the list inline. * feat(kanban): add max_in_progress config to cap concurrent running tasks Salvages #22981 by @SimbaKingjoe. Adds 'kanban.max_in_progress' config that caps simultaneously running tasks. When the board already has N running, dispatcher skips spawning so slow workers (local LLMs, resource-constrained hosts) don't pile up and time out. Threads through dispatch_once(max_in_progress=) and gateway dispatcher config parsing with validation (warns on invalid/below-1 values). * fix(packaging): ship bundled skills in wheel Salvages #23738 by @LeonSGP43. Wheel installs were missing skills/ and optional-skills/ because pyproject's [tool.setuptools.packages.find] only includes Python packages — the skills directories don't have __init__.py so they were silently dropped from the wheel. Adds setup.py with data_files spec emitting skills/* and optional-skills/* under hermes_agent-<v>.data/data/, and a get_bundled_skills_dir() helper in hermes_constants that discovers the wheel-installed location via sysconfig before falling back to a source-checkout path. tools/skills_sync uses the helper so 'hermes update' works for pip-installed users. * fix: 4 small surgical bugs Salvages #23302 by @Bartok9. Four independent one-area fixes: 1. kanban boards delete alias now hard-deletes (not archives) — the alias didn't carry --delete, so getattr(args, 'delete', False) returned False. Detect boards_action=='delete' explicitly. 2. Gateway auto-title failures no longer leak as user-visible warnings — debug-log only since they're not actionable. 3. 
Background process completion notification snaps truncation to the next newline boundary, prepends a marker when content is dropped. 4. _cprint() schedules the run_in_terminal coroutine via asyncio.ensure_future so output isn't silently dropped from background threads (fixes #23185 Bug A). Skips the double-print fallback that would fire for mock paths. * perf(prompt): cache kanban worker guidance at session init Salvages #24402 by @RyanRana. The KANBAN_GUIDANCE block (~835 tokens) is session-static — the dispatcher decides at spawn time whether the process is a kanban worker via the kanban_show tool's check_fn (gated on HERMES_KANBAN_TASK env var). Re-checking 'kanban_show' in valid_tool_names and re-loading the reference on every system-prompt rebuild (init + each context compression) is wasted work. Caches the resolved string on agent._kanban_worker_guidance once in agent_init and consumes it in system_prompt.build_system_prompt(), with a getattr fallback for code paths that bypass agent_init. * feat(kanban): add --sort option to 'hermes kanban list' Salvages #25745 by @LizerAIDev. Adds --sort {created,created-desc, priority,priority-desc,status,assignee,title,updated} to 'hermes kanban list'. Validated against VALID_SORT_ORDERS map; invalid values raise ValueError. Default behaviour (priority DESC, created ASC) is unchanged when --sort is omitted. * docs: add kanban codex lane skill * feat(kanban): worker visibility endpoints (workers/active, runs/{id}, inspect) Adds three read-only endpoints to the kanban dashboard plugin so the SwitchUI workspace (and any other dashboard consumer) can track workers across tasks without N+1 round-trips through /tasks/{task_id}. - GET /workers/active Single SQL JOIN of task_runs + tasks where ended_at IS NULL, worker_pid IS NOT NULL, status='running'. Returns {workers: [...], count, checked_at}. - GET /runs/{run_id} Direct lookup of any task_run row by id. Reuses existing kanban_db.get_run() helper and _run_dict() serialiser. 
404 when not found. Mirrors GET /tasks/{task_id} 404 shape. - GET /runs/{run_id}/inspect Live PID stats via psutil.Process.as_dict() — cpu_percent, memory_rss_bytes, memory_vms_bytes, num_threads, num_fds, status, create_time, cmdline. Short-circuits with alive:false when run has ended, has no worker_pid, the pid is gone, or psutil is unavailable. AccessDenied surfaces as alive:true with error rather than a 500. 11 new tests in tests/plugins/test_kanban_worker_runs.py cover the empty-board case, running-task case, ended-run filtering, missing-pid filtering, 404 paths, already-ended inspect, no-pid inspect, dead-pid inspect, and live-pid inspect (psutil mocked). All pass. Companion termination endpoint (POST /runs/{run_id}/terminate) is intentionally out of scope here — opening a separate issue first since the RBAC and dispatcher-mediated soft-cancel design needs maintainer input before code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(release): map contributor email for attribution check * test(kanban-dashboard): pin enriched 409 detail and inline error wiring (#26744) - Existing ``test_patch_drag_drop_move_todo_to_ready`` now asserts the enriched 409 detail names the blocking parent (id, quoted title, and current status), so the dashboard always has something actionable to render. - New bundle-assertion test ``test_dashboard_surfaces_ready_blocked_error_inline`` pins the frontend wiring: the ``parseApiErrorMessage`` helper exists, the drag/drop banner runs through it, and the drawer maintains a visible ``patchErr`` state that's cleared between PATCHes and tasks. 
* docs(codex_app_server): document multi-root Kanban writable_roots (#27941) Update the Codex app-server runtime guide's Kanban section to reflect the new behaviour: * The sandbox override now adds the board DB directory plus every Kanban path the dispatcher pinned (HERMES_KANBAN_WORKSPACES_ROOT, HERMES_KANBAN_WORKSPACE, legacy HERMES_KANBAN_ROOT) -- deduplicated, DB-dir first. * The motivation note now includes the cross-mount artifact-write scenario (e.g. ``/media/.../kanban-workspaces/...`` on a separate drive) and links to issue #27941 so readers can find the original bug report. * fix(gateway): quiet corrupt kanban dispatcher boards Salvages substantive part of #26490 by @aqilaziz. Detects corrupt board DBs ("file is not a database" / "database disk image is malformed") and disables them by fingerprint until they're repaired, instead of flooding the gateway log with repeated logger.exception tracebacks every tick. Cherry-picked the substantive commit (ea5b4ec2a); the tip commit was an unrelated _is_dir OSError fix for service-path lookup. Dropped a small test reformat that was bundled in the same commit. * docs: align kanban readiness docs and smoke tests Salvages #28199 by @bensargotest-sys. Aligns Kanban docs with current tool registration: dispatcher-spawned task workers get task tools, profiles that explicitly enable the kanban toolset get orchestrator routing tools (kanban_list, kanban_unblock). Corrects failure-limit text to current default of 2. Hardens the e2e subprocess script to resolve repo root and use the spawnable default assignee. Updates the diagnostics severity fixture to assert error below the critical threshold. * feat(kanban): surface per-task model_override in show + tool output Salvages #26897 by @loicnico96. The per-task model_override DB column already exists on main, but it wasn't exposed in user-facing surfaces. 
This adds: - 'kanban show' prints 'model: <name>' when model_override is set - kanban_show / kanban_list tool responses include the model_override field Original branch was stale (PR was authored against an older field name 'model'); applied the substantive surface exposure manually using the current 'model_override' field name. * feat(cli): add kanban swarm topology helper Salvages #26791 by @Niraven. Adds 'hermes kanban swarm' to create a durable Kanban Swarm v1 graph: a completed root/blackboard card, parallel worker cards, a verifier gated on all workers, and a synthesizer gated on the verifier. Stores shared swarm blackboard updates as structured JSON comments on the root card. Self-contained: new hermes_cli/kanban_swarm.py module + CLI wiring + unit tests. * feat(kanban): add optional board parameter to all MCP tools Salvages #27598 by @nnnet. Adds optional 'board' parameter to all 9 kanban_* MCP tools via shared _connect helper. Backwards compatible — omitting board keeps current pinned-board behavior. Useful for orchestrator profiles that route across multiple boards. Two-file scope: tools/kanban_tools.py + tests. * feat(kanban): stamp originating ACP session_id on tasks Salvages #23208 by @awizemann. Tracks which chat session created a kanban task so clients can render a per-session board without falling back to tenant + time-window heuristics. - Schema: tasks gains nullable session_id TEXT column with index (additive migration in _migrate_add_optional_columns). - ACP: server.py exposes the originating session id via HERMES_SESSION_ID with save/restore around the agent loop. - Tool: kanban_create reads HERMES_SESSION_ID (with explicit override). - CLI: 'hermes kanban list --session <id>' filter; JSON output exposes session_id. * feat(kanban): wire dispatcher to dispatch review agents from review column Salvages #23772 by @thewillhuang. 
Adds 'review' as a valid kanban task status and extends dispatch_once to monitor the review column as a second dispatch source (in addition to the existing ready column). - Adds 'review' to VALID_STATUSES - Adds claim_review_task() — atomically transitions review → running - Adds has_spawnable_review() — health telemetry mirror - Extends dispatch_once with a review column dispatch loop - Review agents get 'sdlc-review' skill auto-loaded Resolved 2 conflicts (VALID_STATUSES merge with main's 'scheduled' state, test file additions). Adapted claim_review_task to main's ttl_seconds: Optional[int] = None convention (matches claim_task). * feat(kanban): stale detection for running tasks in dispatcher Salvages #23790 by @thewillhuang. Adds detect_stale_running() to the dispatcher cycle. Running tasks that have been started for longer than dispatch_stale_timeout_seconds (default 14400 = 4h) without a heartbeat in the last hour are auto-reclaimed to ready. - New config kanban.dispatch_stale_timeout_seconds (default 14400, 0 disables) - New 'stale' field on DispatchResult - detect_stale_running() in kanban_db.py with heartbeat freshness check - Records outcome='stale' on run close + 'stale' event; ticks failure counter - Wires config through gateway embedded dispatcher - Updates _cmd_dispatch verbose/JSON output and daemon logging Resolved test-file end-of-file conflict by appending both halves. * feat(kanban): filter tasks by workflow fields and runs by status/outcome Salvages #26745 by @nehaaprasaad. Exposes filtering for the existing workflow_template_id and current_step_key columns: - list_tasks() accepts workflow_template_id and current_step_key kwargs - 'hermes kanban list' adds matching CLI flags - dashboard plugin_api also exposes the filters Resolved a small conflict in list_tasks signature alongside main's session_id and order_by additions; combined all three into the single filter list. 
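The workflow filters described above amount to optional keyword arguments that each add one parameterised `WHERE` clause. The sketch below shows that shape under stated assumptions: `build_task_query` is a hypothetical helper, the table layout is simplified, and only the names `list_tasks`, `workflow_template_id`, and `current_step_key` come from the commit message.

```python
import sqlite3
from typing import Any, List, Optional, Tuple


def build_task_query(
    workflow_template_id: Optional[str] = None,
    current_step_key: Optional[str] = None,
) -> Tuple[str, List[Any]]:
    """Build a SELECT with one bound parameter per supplied filter."""
    clauses: List[str] = []
    params: List[Any] = []
    if workflow_template_id is not None:
        clauses.append("workflow_template_id = ?")
        params.append(workflow_template_id)
    if current_step_key is not None:
        clauses.append("current_step_key = ?")
        params.append(current_step_key)
    sql = "SELECT id FROM tasks"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY id", params


def list_tasks(conn: sqlite3.Connection, **filters: Optional[str]) -> List[str]:
    """Return task ids matching every filter that was provided."""
    sql, params = build_task_query(**filters)
    return [row[0] for row in conn.execute(sql, params)]
```

Collecting every filter into one clause list is what lets further filters, such as the `session_id` one mentioned in the conflict note, be combined without branching on each pair.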
* feat(kanban): add respawn guard to block repeat worker storms Salvages #27484 by @fardoche6. Adds a respawn guard that skips worker spawn for tasks where: - a recent run already succeeded (recent_success — within guard window) - the previous run hit a quota/auth error (blocker_auth, also auto-blocks) - a recent task comment includes a GitHub PR URL (active_pr) The guard prevents repeat worker storms on the same bug/task. Includes the contributor's review-findings fixup (regex hardening, observability, auth coverage). Resolved a small DispatchResult conflict alongside main's 'stale' field; kept both. Authorship preserved via rebase merge. * feat(kanban): show dashboard cron jobs across profiles Salvages #27568 by @SerenityTn. Dashboard cron page now lists cron jobs from all profiles, with profile-aware filter UI and storage routing. Includes test coverage for cross-profile listing, mutation, deletion, and validation. Also fixes orphan conflict markers in config.py left by an earlier salvage merge (kanban.dispatch_stale_timeout_seconds was double-nested in HEAD/PR markers from #28452 salvage of #23790). * fix(kanban): remove orphan conflict markers from config.py (#28458) PR #28452 (salvage of #23790, stale detection) merged with leftover git conflict markers in hermes_cli/config.py around the `dispatch_stale_timeout_seconds` config block, breaking config import and any code path that loads it. Cleans up the markers and keeps both config blocks (worker log rotation/orchestrator + stale detection). Resolves a self-introduced regression. 
* fix(kanban): remove orphan conflict markers from kanban.py (#28459) PR #28454 (salvage of #26745, workflow filter) merged with leftover git conflict markers in hermes_cli/kanban.py at three sites: - _task_to_dict() (session_id alongside workflow_template_id/current_step_key) - p_list parser (--sort alongside --workflow-template-id/--step-key) - _cmd_list (order_by alongside the new filter kwargs) Cleans up the markers and keeps both halves at each site. Resolves a self-introduced regression. * feat(kanban): configure worktree paths and branches Salvages #26496 by @aqilaziz. Adds branch_name column + CLI flag so tasks with workspace_kind='worktree' can pin a target branch on create. Schema migration added to _migrate_add_optional_columns. - Task.branch_name field + DB column + migration - create_task accepts branch_name kwarg - hermes kanban create --branch <name> flag - kanban show output includes 'Branch: <name>' when set Cherry-picked the substantive commit (a7558cf27); the PR's tip was an unrelated service-path-dirs commit. Resolved 2 INSERT-column-list and show-output conflicts alongside main's session_id and max_runtime_seconds additions; kept all three. * feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373) Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that group several skills under one slash command. Invoking /<bundle-name> from any surface (CLI, TUI, dashboard, any gateway platform) loads every referenced skill into a single combined user message. Use cases: - /backend-dev → loads github-code-review + test-driven-development + github-pr-workflow as one bundle. - /research → loads several research skills together. - Team task profiles shared via dotfiles. Behavior: - Bundles take precedence over individual skills when slugs collide. - Missing skills are skipped with a note, not fatal. - No system-prompt mutation — bundles generate a fresh user message at invocation time, the same way /<skill> does. 
Prompt cache stays intact. - Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI), /help display. Schema (~/.hermes/skill-bundles/<slug>.yaml): name: backend-dev description: Backend feature work. skills: - github-code-review - test-driven-development instruction: | Optional extra guidance prepended to the loaded skills. New module: agent/skill_bundles.py — load, scan, resolve, build invocation message, save, delete. yaml.safe_load only; broken bundles log a warning and are skipped, never raise. New CLI subcommand: hermes bundles {list,show,create,delete,reload}. Implementation in hermes_cli/bundles.py; wired in hermes_cli/main.py. 'bundles' added to _BUILTIN_SUBCOMMANDS so plugin discovery skips it. New in-session slash command: /bundles lists installed bundles in both CLI and gateway. /<bundle-name> dispatch added to CLI (cli.py) and gateway (gateway/run.py) before the existing /<skill-name> path. Autocomplete: SlashCommandCompleter gained an optional skill_bundles_provider parameter that defaults to None — the prompt shows '▣ <description> (N skills)' for bundles vs '⚡' for skills. Tests: - tests/agent/test_skill_bundles.py — 33 tests covering slugify, scan/cache freshness, resolve (including underscore→hyphen Telegram alias), build_bundle_invocation_message (loading, missing skills, user/bundle instruction injection, dedup), save/delete, reload diff, list sort. - tests/hermes_cli/test_bundles.py — 8 tests for the CLI subcommand (create/list/show/delete/reload, --force, missing bundle errors). - tests/gateway/test_bundles_command.py — 4 tests for the gateway handler and bundle resolution priority. Live E2E: verified subprocess invocations of hermes bundles {list,create,show,reload,delete} round-trip correctly against an isolated HERMES_HOME. Docs: - website/docs/user-guide/features/skills.md — new 'Skill Bundles' section with quick example, YAML schema, management commands, behavior notes. 
- website/docs/reference/cli-commands.md — 'hermes bundles' added to the top-level command table and given its own subcommand section. * feat(kanban): add scheduled status for delayed follow-ups Salvages #24533 by @roycepersonalassistant. Adds a first-class 'scheduled' Kanban status for time-delay follow-ups that aren't waiting on human input. - hermes kanban schedule <task_id> [reason] CLI command - Dashboard/API transitions to/from Scheduled - unblock_task() now releases both 'blocked' AND 'scheduled' tasks (re-checking parent dependencies before moving to ready/todo) - i18n + docs updates Resolved conflicts: kept HEAD's failure-counter reset on unblock alongside the PR's scheduled state, kept HEAD's 'running' direct-set rejection, combined both bulk-status branches. Dropped the dist/ bundle changes (months-stale; would need rebuild from source). * feat(kanban): drag-to-delete trash zone + bulk delete for task cards Salvages #28125 by @Jpalmer95. Adds: - Drag-to-delete trash zone in the kanban dashboard - Bulk delete endpoint with cascading delete_task cleanup - Frontend updates (drag visual + drop handler) - Confirmation prompt before delete Resolved end-of-file test conflict by appending both halves. * docs: add Korean Kanban documentation Salvages #21823 by @pochi-gio. Adds Korean (ko) Docusaurus locale and translates Kanban documentation (kanban.md, kanban-tutorial.md) and the two related skills (devops-kanban-orchestrator, devops-kanban-worker). Purely additive — adds ko to the locales list in docusaurus.config.ts and creates the website/i18n/ko/ tree. 
* fix(tests): catch up six stale tests after compression/aux/kanban changes (#28465) - aux_config: drop session_search from _AUX_TASKS and remove stale test (PR #27590 removed auxiliary.session_search from DEFAULT_CONFIG) - compression_boundary_hook: set compressor._last_compress_aborted=False on MagicMock so the post-compress abort branch (PR #28117) doesn't short-circuit before the session-id rotation under test - kanban_dashboard_plugin: use consecutive_failures=3 so severity stays 'error' (failure_threshold default dropped from 3 to 2 in d9fef0c8a, so failures=5 now crosses the critical floor of 2*2=4) - cli_manual_compress: accept force kwarg on DummyAgent._compress_context (cli._manual_compress now passes force=True) * fix(telegram): render full clarify choice text in message body, use short button labels When Telegram clarify prompts offer long choices, mobile clients truncate the inline button labels, making options unreadable. Previously only the question was shown in the message body with truncated choice text in button labels. Fix: append the full numbered option list to the message body so users can read complete choice text on any client. Buttons now use short numeric labels (1, 2, ...) to avoid Telegram truncation. The 'Other (type answer)' button is unchanged. Long choice labels are now rendered in full (not truncated to 57 chars + '...') since they appear in the body instead of button labels. Closes: #27497 * chore(release): map @asdlem for PR #27852 salvage * fix(telegram): default streaming transport to edit * fix(telegram): respect reply_to_mode for DM topic reply fallback The DM topic reply fallback code in send() hardcoded should_thread=True when telegram_dm_topic_reply_fallback metadata was present, bypassing _should_thread_reply() and ignoring reply_to_mode config. This caused quote bubbles on every response even with reply_to_mode: 'off'. 
Fix: - Add reply_to_mode param to _reply_to_message_id_for_send() and _thread_kwargs_for_send() classmethods - In send(), check self._reply_to_mode != 'off' for DM topic fallback - Suppress reply anchor and reply_to_message_id when mode is 'off' while preserving message_thread_id for correct topic routing - Thread reply_to_mode through all 29 call sites Regression coverage: 10 new tests in test_telegram_reply_mode.py covering classmethod behavior, send() integration, and backward compatibility. Fixes reply_to_mode: 'off' ignored by Telegram DM topic reply fallback code #23994 * fix(gateway): route Telegram audio file attachments away from STT pipeline (#24870) Telegram distinguishes three kinds of audio payloads: - message.voice → Opus/OGG voice messages → STT pipeline ✓ - message.audio → audio file attachments → bypasses STT ← was broken - message.document (audio mime) → generic file route **Root cause** — the inbound message routing block in gateway/run.py matched both MessageType.VOICE *and* MessageType.AUDIO into audio_paths, which were then fed unconditionally to _enrich_message_with_transcription. Audio file attachments (.mp3, .m4a, etc.) were therefore auto-transcribed instead of being treated as files, making the transcribe skill unusable from Telegram because the path it needed was never surfaced. **Fix** - Introduce a new audio_file_paths list populated exclusively by MessageType.AUDIO events. - Narrow the audio_paths selector to MessageType.VOICE (and bare audio/ mime-type events that are not explicitly AUDIO or DOCUMENT). - After the STT block, inject a document-style context note for each audio_file_path, giving the agent the file path and asking what to do with it (consistent with how plain documents are handled). 
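The routing split in the Fix list above can be sketched as a small classifier. This is an illustration under assumptions, not the code in gateway/run.py: the `MessageType` enum here is a stand-in, and `split_audio` with its `(path, mime_type)` input is hypothetical.

```python
from enum import Enum, auto
from typing import List, Tuple


class MessageType(Enum):  # stand-in for the gateway's own enum
    VOICE = auto()
    AUDIO = auto()
    DOCUMENT = auto()
    TEXT = auto()


def split_audio(
    message_type: MessageType, media: List[Tuple[str, str]]
) -> Tuple[List[str], List[str]]:
    """Split (path, mime_type) pairs into STT inputs and plain audio files."""
    audio_paths: List[str] = []       # fed to the transcription pipeline
    audio_file_paths: List[str] = []  # surfaced to the agent as files
    for path, mime in media:
        if message_type is MessageType.AUDIO:
            audio_file_paths.append(path)
        elif message_type is MessageType.VOICE or (
            mime.startswith("audio/") and message_type is not MessageType.DOCUMENT
        ):
            audio_paths.append(path)
    return audio_paths, audio_file_paths
```

Checking `MessageType.AUDIO` first is what keeps .mp3 or .m4a attachments out of the transcription list even though their MIME type also begins with `audio/`.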
**Tests** — 5 new tests in test_telegram_audio_vs_voice.py: - voice message still transcribed (regression guard) - audio attachment skips STT (core fix) - audio attachment context note format - STT disabled still produces file note (not STT-disabled notice) - MessageType.AUDIO != MessageType.VOICE sanity check Fixes #24870 * chore(release): map bartok9 noreply for PR #24879 salvage * fix(send_message): route standalone Telegram sends through TELEGRAM_PROXY When the send_message tool runs outside the gateway process (agent loop, TUI, cron, etc.), _gateway_runner_ref() returns None and the standalone path in _send_telegram constructs Bot(token=token) directly, bypassing any configured proxy. In regions where api.telegram.org is blocked, the send times out after ~5s with 'Telegram send failed: Timed out' and nothing ever shows up in gateway.log because the request never reaches the gateway. Resolve TELEGRAM_PROXY (via gateway.platforms.base.resolve_proxy_url, which also honours HTTPS_PROXY/HTTP_PROXY/ALL_PROXY and NO_PROXY) just before constructing the Bot. When a proxy is found, attach an HTTPXRequest(proxy=...) for both 'request' and 'get_updates_request', matching what gateway/platforms/telegram.py already does for in-gateway sends and what the Discord standalone sender already does. Any exception attaching the proxy falls back cleanly to a direct connection, preserving prior behaviour for users without a proxy configured. Adds tests/tools/test_send_message_telegram_proxy.py covering both the proxy-configured and no-proxy cases. * chore(release): map @pepelax for PR #25419 salvage * fix(kanban-dashboard): restore implementations dropped during salvages (#28481) Four kanban dashboard test failures, all from PR salvages that picked up the test additions but dropped the corresponding implementations. 
- BOARD_COLUMNS: add 'review' (status added by PR f55d94a1e but the board API never grew the column → test_board_empty failed because VALID_STATUSES - {archived} mismatched the rendered columns). - update_task: enrich the 'ready' 409 detail with the blocking parent list (id, title, status) and add _parents_blocking_ready helper. Implementation lost in the #26744 salvage (commit e215558ba) which pinned the test but not the server-side code. - dist/index.js: add parseApiErrorMessage helper, wire it through the drag/drop banner, add patchErr state to the TaskDrawer and surface it inline by the action row. Lost in the same #26744 salvage. - test_diagnostics_endpoint_severity_filter: update to at-or-above semantics (PR a94ddd807 changed the filter from exact-match so the warning filter now correctly includes error+critical too). * fix(gateway): roll over Telegram tool progress bubbles * fix(gateway): scope audio_file_paths outside media_urls guard The audio-file-paths handling block at line 7334 references the variable unconditionally, but #24879 initialized it inside the 'if event.media_urls' block — so events without media_urls hit UnboundLocalError. Found via test_run_agent_queued_message_does_not_treat_commentary_as_final after PR #28478 landed. * fix(gateway): keep tool-progress edits alive after Telegram flood control When a progress-message edit hits Telegram flood control (RetryAfter), can_edit was unconditionally set to False, permanently disabling coalescing for the rest of the run. Subsequent tool updates were posted as separate new messages instead of updating the existing progress bubble. Fix: only set can_edit=False for non-recoverable edit errors. On flood control, back off by resetting _last_edit_ts so the throttle interval is respected before the next edit attempt. Fixes #25188 * chore(release): map @erhnysr for PR #25198 salvage * fix(telegram): preser…

* fix(approval): surface pending-approval state with explicit marker visible to LLM When a tool call requires user approval in the non-blocking gateway path, the LLM previously received a result that was indistinguishable from a failed tool call (exit_code=-1, error=message). The LLM could not tell whether the tool was pending approval, had returned empty results, or had failed silently — causing it to burn context on wrong hypotheses. Fix changes the result format to include: - status: pending_approval (clear state name) - approval_pending: True (explicit boolean for LLMs to detect) - error: cleared to empty string (removes misleading error signal) This lets the LLM reason about approval latency vs actual errors, short-circuiting the previous silent failure mode. Fixes #14806 * fix: recognize emoji and caret as natural response endings GLM models via Ollama report finish_reason='stop' even when the response was truncated by max_tokens. The continuation mechanism uses _has_natural_response_ending() as one of the heuristics to detect whether the response was genuinely finished. Currently only ASCII punctuation and CJK punctuation are recognized. This means any response ending with an emoji (e.g. ⚡, 👍) or the caret character ^ (common in French ^^ smiley) is not recognized as naturally ended, triggering a false-positive continuation where the model receives 'Continue where you left off' and produces garbled output. Add: - ^ (caret) to the punctuation set - Unicode emoji range (codepoint >= 0x1F300) as natural ending This only affects GLM/Ollama users but the fix is safe for all backends since _has_natural_response_ending() is only consulted inside the continuation flow. * chore(release): pre-stage AUTHOR_MAP for May 2026 LHF batch group 8 (#28328) Pre-stages AUTHOR_MAP entries for 10 new contributors whose PRs are being salvaged in the May 2026 low-hanging-fruit batch (group 8). Lands ahead of the per-PR salvage PRs so they don't get blocked by AUTHOR_MAP CI. 
Contributors: - AceWattGit (#28159 — _pool_may_recover_from_rate_limit NameError) - YuanHanzhong (#28032 — x.com/status fallbacks link-like) - colin-chang (#28245, #28249, #28251 — gateway + mattermost fixes) - felix-windsor (#28019 — preserve cron asterisks in strip mode) - houenyang-momo (#28205 — charizard completion menu contrast) - iqdoctor (#28095 — windows installer docs) - joe102084 (#28151 — whitespace-only cron responses) - jvinals (#27936 — Slack U-IDs → DM channel) - maxmilian (#28267 — ModelPickerDialog portal) - samggggflynn (#27952 — dingtalk pre_start) Per references/batch-pr-salvage-may14-additions.md. * fix: add pre_start() to _IncomingHandler for dingtalk SDK compatibility The dingtalk-stream SDK calls pre_start() on every registered handler before opening the WebSocket connection. Without this method, the SDK raises AttributeError and kills the stream connection, causing DingTalk to be unable to connect via Stream Mode. * fix(windows): handle redirected stdout in _cprint fallback Wraps _pt_print in try/except with a print() fallback. When a kanban worker's stdout is piped to a log file, prompt_toolkit raises NoConsoleScreenBufferError (Windows) or OSError (other) because there is no real console buffer. The fallback keeps worker output flowing instead of crashing. * chore(release): alias stale-ID salvage commit for @Grogger (#28334) PR #28330 was salvaged with a wrong noreply numeric ID (18091625 vs the correct 7065068). The commit on main is correctly authored to Grogger by username, but neither noreply form was in AUTHOR_MAP. Adds both so release-notes generation maps them to @Grogger. * fix(aux): remove stale session_search model menu entry * fix(tui): keep x status citation fallbacks link-like * fix(xai-oauth): quarantine dead tokens on terminal refresh failure resolve_xai_oauth_runtime_credentials() called _refresh_xai_oauth_tokens() with no try/except. 
A terminal refresh failure (HTTP 400/401/403 — invalid_grant, token revoked) propagated without clearing the dead access_token / refresh_token from auth.json, causing every subsequent session to retry the same doomed network request. Add a try/except around the refresh call that mirrors the existing credential_pool.py quarantine: when _is_terminal_xai_oauth_refresh_error identifies a non-retryable failure, clear the dead token fields from auth.json and write a last_auth_error diagnostic marker so future calls fail fast with a clear relogin_required error instead of hitting the network. active_provider is preserved (set_active=False) so multi-provider users whose chosen provider is not xai-oauth are unaffected. Tests: two new cases in test_auth_xai_oauth_provider.py cover terminal quarantine and transient pass-through. * feat(bg-review): add bundled/pinned skill protection rules to review prompts (#27644) The background review prompts (_SKILL_REVIEW_PROMPT and _COMBINED_REVIEW_PROMPT) now include explicit protection rules for bundled, hub-installed, and pinned skills — aligning with the curator's existing policy at curator.py L345/350. Before this change, bg-review could freely rewrite bundled skills like 'hermes-agent' or pinned skills, while the 7-day curator explicitly skips them. The review agent now sees: • Bundled skills (shipped with Hermes) • Hub-installed skills (installed via hermes skills install) • Pinned skills (marked via hermes curator pin) If only protected skills need updating, the review says 'Nothing to save.' and stops. Fixes #27644 * fix(web): portal Change Model modal so it renders above the app sidebar The dashboard's main column is `relative z-2` (App.tsx), which creates a stacking context that traps fixed descendants below the app sidebar (`z-50`). `ModelPickerDialog` renders `fixed inset-0 z-[100]` inline, so its z-100 is scoped to z-2 and the sidebar covers its left edge. 
The bug is visible across all themes but only obvious in the Large theme variants (Hermes Teal (Large), etc.) where the larger root font widens the dialog into the sidebar's column. Toast.tsx already documents the same trap and uses the same `createPortal(..., document.body)` escape. This commit ports the picker; the same pattern affects other inline z-[100] modals in the dashboard (OAuthLoginModal, Cron / Models / Profiles page modals) and is left for a follow-up — keeping this PR scoped to the reporter's specific case. Fixes #28103 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gateway): exit code 75 on service restart so launchd relaunches When the gateway receives SIGUSR1 (graceful restart via launchd_restart), the SIGUSR1 handler calls request_restart(via_service=True) and the gateway shuts down cleanly with exit code 0. However, the generated launchd plist uses KeepAlive → SuccessfulExit → false, meaning launchd only relaunches on *non-zero* exit codes. A clean exit(0) is treated as "successful, don't restart", so the gateway stays down after /restart, /update, or SIGUSR1. The systemd unit template already uses RestartForceExitStatus=75 for the same scenario. Mirror that convention: when _restart_via_service is True, raise SystemExit(75) so launchd's SuccessfulExit=false policy triggers a relaunch. Closes #28135 * fix: guard json.loads() against invalid TTS and skill_view responses Two code paths call json.loads() on output from external tools without catching JSONDecodeError. If the tool returns a non-JSON string (error message, empty string, or None), the entire call path crashes. 1. gateway/run.py — text_to_speech_tool() result in voice reply path. A TTS failure that returns an error string instead of JSON crashes the voice reply handler, killing the message response entirely. 2. cron/scheduler.py — skill_view() result when loading skills for cron jobs. 
A corrupted or missing skill file that returns an error string instead of JSON crashes the cron tick, preventing all jobs from executing that cycle. Both fixes catch (json.JSONDecodeError, TypeError), log a warning, and gracefully skip the failed operation instead of crashing. * fix(gateway): bridge gateway_restart_notification from YAML platform sections Two related bugs in gateway/config.py prevented per-platform gateway_restart_notification from working through config.yaml: 1. The shared-key bridging loop (load_gateway_config) omitted 'gateway_restart_notification', so the key never landed in platform_data['extra'] even when set under e.g. 'discord:' or 'mattermost:' sections. 2. PlatformConfig.from_dict() only read gateway_restart_notification from the top-level data dict, ignoring the 'extra' sub-dict where bridged keys are stored. Fix: add the key to the bridging loop, and add an 'extra' fallback in from_dict() so that round-tripped values (YAML → bridged → extra → from_dict) resolve correctly. Impact: users can now set gateway_restart_notification: false per platform in config.yaml instead of relying on env vars or the global platforms: block. * feat(kanban): add auto_promote_children config toggle When the kanban auto-decomposer fans a triage task into child tasks, recompute_ready() immediately promotes parent-free children to 'ready' so the dispatcher picks them up. Some users want a manual workflow where children stay in 'todo' for review before dispatch. 
Add 'kanban.auto_promote_children' config key (default: true): - false: children stay in 'todo' after decomposition - true: existing behavior (auto-promote to 'ready') Changes: - kanban_db.py: decompose_triage_task() gains auto_promote param - kanban_decompose.py: reads auto_promote_children from config - kanban dashboard API: exposes the new setting in GET/PUT /orchestration Closes #28016 * fix: wrap _pool_may_recover_from_rate_limit call through run_agent namespace The conversation_loop.py references _pool_may_recover_from_rate_limit which was defined in run_agent.py. After the conversation-loop extraction refactor, the helper was no longer in the same module scope. Wrap the call as _ra()._pool_may_recover_from_rate_limit() to route through the run_agent monkeypatch namespace where the helper is available. Adds regression test in test_gemini_fast_fallback.py. Fixes: MAILROOM Email Triage NameError, OPS Execution Monitor NameError. * fix(tui): improve charizard completion menu contrast * docs(windows): avoid piping installer directly into iex * fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS Qwen3.x and DeepSeek-V3.x default to chatty/hallucinatory tool use without enforcement steering — agents narrate "calling tool X" without actually emitting a tool call, or run partial loops. Both model families fit the same failure pattern TOOL_USE_ENFORCEMENT_GUIDANCE was already injected for (gpt, codex, gemini, gemma, grok, glm). Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com> Squashed salvage of: - 403e567ce fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS - 9433eabe7 test(agent): use realistic qwen-plus identifier in enforcement test Fixes #28079. * fix(send_message): resolve Slack user IDs to DM channel IDs The _SLACK_TARGET_RE regex only matched IDs starting with C (channel), G (group), or D (direct message). 
Slack user IDs start with U, causing 'Could not resolve' errors when trying to send DMs to specific users. Changes: - Expand _SLACK_TARGET_RE to accept U-prefixed IDs (user IDs) - Add conversations.open fallback to resolve user IDs to DM channel IDs before sending, since chat.postMessage requires a conversation ID Fixes #ISSUE_NUMBER * fix(gateway): tighten MEDIA extraction regex + silent skip on file-not-found Three related fixes for the MEDIA:<path> extraction pipeline that caused 'file not found' noise in platform channels: 1. run.py — tighten tool-result MEDIA regex from \S+ (any non- whitespace) to require a path pattern with known extensions. Prevents LLM-generated placeholder paths like 'MEDIA:/path/to/example.mp4' from being captured as real media. 2. base.py — remove the |\S+ fallback in extract_media() that catches anything non-whitespace as a potential MEDIA path. This was the primary cause of false positives — strings like '' in tool output were captured as MEDIA: paths. 3. mattermost.py — replace the file-not-found error message sent to the channel with a silent logger.warning() skip. When a path extracted by MEDIA doesn't exist on disk, the channel no longer gets a noisy '(file not found: ...)' message. Impact: eliminates the persistent 'file not found' spam in Mattermost channels caused by over-broad MEDIA regex patterns matching non-path text in tool output. * fix(xai-oauth): split 403 (tier/entitlement) from 400/401 in token endpoint xAI's token endpoint returns HTTP 403 to the OAuth grant when the account isn't on the allowlist for API access (e.g. standard SuperGrok subscribers — see #26847). Treating it like a stale-token 400/401 made ``format_auth_error`` append "Run ``hermes model`` to re-authenticate", which is misleading because re-login can't change xAI's tier decision. 
Split 403 off in both ``refresh_xai_oauth_pure`` and the loopback login token exchange: * New error code ``xai_oauth_tier_denied`` with ``relogin_required=False`` * Message explains the entitlement gate and points at the ``XAI_API_KEY`` + ``provider: xai`` fallback * 400/401 still set ``relogin_required=True`` as before * 5xx still set ``relogin_required=False`` as before * fix(run-agent): treat any 403 on xai-oauth as entitlement to stop refresh-loop The existing ``_is_entitlement_failure`` heuristic only fires when the response body contains specific substrings ("do not have an active Grok subscription", etc.). xAI has been seen to 403 standard SuperGrok subscribers with a terser body that doesn't match those keywords (#26847), and the recovery path would then mint a fresh token, get a fresh 403, and loop until Ctrl+C. Add a defense-in-depth check at the recovery call site: any 403 on ``provider == "xai-oauth"`` short-circuits ``try_refresh_current`` so the error surfaces immediately with the friendly hint from ``_summarize_api_error``. Keeps the existing keyword path for all other providers untouched. * test(xai-oauth): pin tier-denied 403 behavior + docs warning for #26847 Tests: * ``test_refresh_xai_oauth_pure_403_marked_tier_denied_not_relogin`` — refresh-403 raises ``xai_oauth_tier_denied`` with ``relogin_required=False`` and the API-key fallback hint in body. * ``test_format_auth_error_tier_denied_does_not_suggest_relogin`` — the renderer does not append "Run ``hermes model``" for the new code. * ``test_recover_with_credential_pool_skips_refresh_on_bare_403_for_xai_oauth`` — bare ``{"reason":"forbidden","message":"Forbidden"}`` body (which does not match the existing keyword heuristic) still short-circuits ``try_refresh_current`` on xai-oauth. 
Docs:
  * Drop the "(any active tier)" claim from the xai-grok-oauth guide, add a top-of-page warning callout, and a Troubleshooting section for the 403-after-login case pointing at ``XAI_API_KEY`` + ``provider: xai`` as the documented fallback.

* fix: handle whitespace-only cron responses

* fix(cli): preserve cron asterisks in strip mode

* fix(mattermost): resolve thread root_id and route progress to threads

Two Mattermost thread-related bugs:

1. _resolve_root_id(): Mattermost CRT requires root_id to be the thread root post. Using any reply's own ID as root_id causes '400 Invalid RootId'. Add _resolve_root_id() that walks up the post chain via API to find the actual root, and apply it in send(), _send_url_as_file(), and _send_local_file().
2. _progress_reply_to: The condition in run.py only checked Platform.FEISHU, missing Mattermost entirely. This caused tool progress messages to always land in the main channel instead of the thread. Add Platform.MATTERMOST to the condition so progress messages are routed to threads when reply_mode=thread.

Impact: Tool progress messages now appear in Mattermost threads instead of flooding the main channel; thread replies no longer fail with Invalid RootId when the reply target is itself a reply.

* feat(kanban): archive --rm to hard-delete archived tasks

Salvages #19964 by @Beandon13. Adds `hermes kanban archive --rm` to permanently remove already-archived tasks with cascading cleanup of links, comments, events, runs, and notify-subs. Safety guard: only archived tasks can be deleted; active/blocked/done must be archived first.

Cherry-picked from #19964 onto current main (severe stale base, applied manually to preserve substance only).
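For the `fix: handle whitespace-only cron responses` entry above (the change this PR contributes), the behaviour described in the summary can be sketched as below. This is a minimal illustration and not the code in cron/scheduler.py: `CronOutcome` and `classify_final_response` are hypothetical names, and the only rules assumed are the ones the PR states (whitespace-only is treated like empty, nothing is delivered, the run is marked as a soft failure).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CronOutcome:
    """Hypothetical result record; the real scheduler's data model may differ."""
    deliver: bool        # whether a message should be sent to the user
    soft_failure: bool   # whether the run should show up as failed in cron status
    text: str            # the text to deliver, if any


def classify_final_response(final_response: Optional[str]) -> CronOutcome:
    """Treat None, "" and whitespace-only responses identically."""
    if final_response is None or not final_response.strip():
        # Nothing useful to say: suppress delivery, but keep the job visible
        # as a soft failure so an empty-output model is not silently ignored.
        return CronOutcome(deliver=False, soft_failure=True, text="")
    return CronOutcome(deliver=True, soft_failure=False, text=final_response)
```

The point of routing all three cases through one check is that a response such as `" \n\t"` is truthy in Python, so a plain `if not final_response` test would let it through as a blank message.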
* feat(proxy): add xai upstream adapter for Grok via OAuth * chore(release): map @yannsunn for PR #28064 xai proxy adapter salvage * docs(skill): align kanban dispatcher failure_limit text with current default * fix(oauth): add manual-paste fallback for browser-only remote consoles xAI Grok OAuth (and Spotify) use a loopback redirect to ``http://127.0.0.1:<port>/callback`` to capture the authorization code. That works when the browser and Hermes run on the same machine, and the SSH tunnel recipe handles the regular remote case. It breaks completely on **browser-only remote consoles** (GCP Cloud Shell, GitHub Codespaces, AWS EC2 Instance Connect, Gitpod, Replit, …) where the user has a browser but no real SSH client to forward a port — the redirect to 127.0.0.1 on the remote VM simply isn't reachable from the laptop, and there's nothing the existing flow can do about it (#26923). This commit adds the foundation for a manual-paste fallback: * ``_is_remote_session`` now also recognises Cloud Shell, Codespaces, Gitpod, Replit, StackBlitz (in addition to SSH), so the existing tunnel hint at least fires in those environments. * ``_parse_pasted_callback`` accepts any of: a full ``http(s)://...?code=...&state=...`` URL, a bare ``?code=...`` query string, a bare ``code=...&state=...`` fragment, or a bare opaque code value. Returns the same dict shape the HTTP callback handler produces, so the caller's state / error validation works unchanged (no CSRF bypass). * ``_prompt_manual_callback_paste`` reads stdin with a clear multi-line explanation of what's happening and what to paste. * ``_xai_oauth_loopback_login`` gains a ``manual_paste`` kwarg that skips the HTTP listener entirely. The redirect_uri, PKCE verifier, state, and nonce are byte-identical to the loopback path so xAI's token endpoint can't tell the difference at the protocol level. 
* ``_print_loopback_ssh_hint`` now also mentions ``--manual-paste`` so users without a real SSH client see a path forward instead of a dead-end tunnel recipe. * ``_login_xai_oauth`` threads ``args.manual_paste`` into the loopback helper. * feat(cli): wire --manual-paste into ``hermes auth add`` and ``hermes model`` Register the new ``--manual-paste`` flag on both entry points and thread it through to the xAI loopback login: * ``hermes auth add xai-oauth --manual-paste`` — pool-add path, forwarded inside ``auth_commands.handle_auth_add``. * ``hermes model --manual-paste`` — model-picker path, forwarded by ``_model_flow_xai_oauth`` into the synthetic ``argparse.Namespace`` it passes to ``_login_xai_oauth``. The picker also now forwards ``--no-browser`` and ``--timeout`` for consistency (previously hardcoded to defaults regardless of CLI flags). Help text on both flags points at #26923 and names the browser-only remote consoles (Cloud Shell, Codespaces, EC2 Instance Connect) so users searching ``hermes --help`` can find the workaround. * test+docs(oauth): pin manual-paste semantics and document browser-only path (#26923) Tests (``tests/hermes_cli/test_auth_manual_paste.py``): * 9 parametrised + scalar cases for ``_is_remote_session`` covering the new Cloud Shell / Codespaces / Gitpod / Replit / StackBlitz env vars (plus the existing SSH ones). * 9 cases for ``_parse_pasted_callback`` covering every paste form (full URL, https URL with extra params, bare ``?code=...``, bare ``code=...`` fragment, bare opaque value, error+description, empty, whitespace-only, malformed URL). * 3 cases for ``_prompt_manual_callback_paste`` (happy path, EOF, Ctrl-C). * 3 end-to-end ``_xai_oauth_loopback_login(manual_paste=True)`` cases: the HTTP server MUST NOT be started (asserted via a callable that raises if invoked), wrong state still rejected with ``xai_state_mismatch`` (no CSRF bypass), and empty paste surfaces ``xai_code_missing``. 
* SSH-hint mention test ensures the ``--manual-paste`` instruction is printed in the remote-session hint. Docs: * ``oauth-over-ssh.md`` — new "Browser-only remote (Cloud Shell / Codespaces / EC2 Instance Connect)" section with the ``--manual-paste`` recipe, plus a TL;DR note for the new flag. * ``xai-grok-oauth.md`` — short subsection pointing at the same recipe and the OAuth-over-SSH guide anchor. * docs(kanban): document max-retries task override * docs(kanban): document inline create shortcuts * test(kanban): cover default board dashboard pin * docs: ignore box diagrams in ascii guard Wrap existing box-drawing diagrams with ascii-guard markers so docs-site checks pass when website docs are touched. Co-authored-by: Cursor <cursoragent@cursor.com> * feat: per-task model override for kanban workers - Add model_override field to Task class and tasks schema - Add migration for existing databases - Spawn worker with -m model when model_override is set * test(kanban-dashboard): cover _task_dict task_age fallback The fix in 061a1830 added an outer try/except in plugin_api._task_dict so that a future failure mode in kanban_db.task_age (anything _safe_int doesn't already absorb) cannot 500 the GET /board response. The _safe_int / task_age corruption paths got regression coverage in tests/hermes_cli/test_kanban_db.py, but the OUTER fallback contract remained untested -- meaning a refactor that drops the try/except would not be caught by CI. 
Pin that contract from both consumers of _task_dict: - GET /board returns 200 with the literal fallback age dict for the affected card (other cards continue to render via the same path) - GET /tasks/:id (drawer view) returns 200 with the same fallback, so a single corrupt task can't block its own drawer Both tests force task_age to raise RuntimeError rather than ValueError on '%s', because ValueError is absorbed by _safe_int and never reaches the outer try/except -- testing that path would only re-cover what test_kanban_db.py already pins. Manually verified the regression discipline: git checkout 061a1830^ -- plugins/kanban/dashboard/plugin_api.py pytest -k task_age_exception # both FAIL with 500 git checkout HEAD -- plugins/kanban/dashboard/plugin_api.py pytest -k task_age_exception # both PASS * fix(kanban): clear _INITIALIZED_PATHS in remove_board so recycled DBs re-init schema Archiving or deleting a board via remove_board() leaves the path's "schema already initialized" entry in the module-level cache. A concurrent connect(board=<slug>) call (e.g. the dashboard event-stream poll loop) then: 1. resolves the same kanban.db path, 2. recreates the directory + an empty sqlite file because connect() does mkdir(parents=True, exist_ok=True), 3. skips the CREATE TABLE pass because the cache entry says the schema is already in place, 4. errors on the next read with `no such table: task_events`. Drop the cache entry before mutating the filesystem so the fresh file gets a proper schema init on next connect(). Applies to both archive=True (rename) and archive=False (rmtree) branches. Fixes #23833. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(web): add Cache-Control: no-store to plugin static file serving Prevents browser caching of stale dashboard plugin JS files that may contain bugs already fixed upstream (e.g. COLUMN_LABEL undefined). * fix(kanban): seed bundled skills (e.g. 
kanban-worker) on kanban init

Closes #23725

* fix(kanban): ignore stale HERMES_KANBAN_BOARD for removed boards

* fix(kanban): keep board-management commands independent from board override

* fix(kanban): preserve notifier_profile for dashboard home subscriptions

* fix(kanban): promote dependents when a parent is archived

* fix(cli): make kanban specify max_tokens configurable

* fix(kanban): sync slash subcommands with live parser

* fix(kanban): promote blocked tasks when parent dependencies complete

recompute_ready only scanned 'todo' tasks for promotion, ignoring 'blocked' tasks entirely. When a task was blocked (e.g. by the circuit breaker) and its parent dependencies later completed, the task stayed stuck in 'blocked' forever unless manually unblocked.

Now recompute_ready also scans 'blocked' tasks. When all parents are done/archived, the blocked task is promoted to 'ready' with failure counters reset, equivalent to an automatic unblock.

Includes a regression test for the blocked-parent-done promotion path.

* fix(kanban): use 'is not None' check for max_runtime_seconds in create_task

max_runtime_seconds=0 was being silently coerced to None due to a falsy check (if max_runtime_seconds). Zero is a valid value that causes the dispatcher to immediately time out a task. The adjacent max_retries parameter already used the correct 'is not None' pattern.

Fixes the inconsistency by aligning max_runtime_seconds with max_retries.

* fix(kanban): reset failure counters on unblock_task

When a task is manually unblocked (blocked → ready/todo), the consecutive_failures counter and last_failure_error were left intact. The next failure would immediately re-trip the circuit breaker because the counter was still at or above the failure limit.

Reset both fields on unblock so the task gets a fresh retry budget.

Includes a regression test that verifies counters are zeroed.
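The max_runtime_seconds entry above comes down to the difference between a truthiness check and an explicit `is not None` check. The sketch below isolates that difference; both helper names are hypothetical, and the real create_task() takes more parameters than shown here.

```python
from typing import Optional


def normalize_runtime_buggy(max_runtime_seconds: Optional[int]) -> Optional[int]:
    # Falsy check: 0 is falsy in Python, so a caller's explicit 0 is lost
    # and stored as "no limit" (None).
    return int(max_runtime_seconds) if max_runtime_seconds else None


def normalize_runtime_fixed(max_runtime_seconds: Optional[int]) -> Optional[int]:
    # Identity check: only a missing value maps to None; 0 is preserved.
    return int(max_runtime_seconds) if max_runtime_seconds is not None else None
```

With the buggy variant, `normalize_runtime_buggy(0)` returns `None`, while the fixed variant returns `0`; both agree for `None` and for positive values.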
* fix(kanban): fingerprint crash errors to prevent fleet-wide retry exhaustion

When a systemic failure (provider outage, auth expiry, OOM) crashes multiple workers simultaneously, detect_crashed_workers increments each task failure counter independently. The circuit breaker only trips after N × failure_limit retries across the fleet.

Fingerprint crash errors by normalizing host-specific details (PIDs, timestamps). When 3+ tasks crash with the same fingerprint in a single detection cycle, immediately trip the circuit breaker (failure_limit=1) instead of waiting for repeated failures. Isolated crashes (unique fingerprints) retain their normal retry budget. Protocol violations continue to trip immediately.

Includes regression tests for systemic and isolated crash paths.

* fix(kanban): align board_exists with board discovery rules

* fix(kanban): demote ready children when a parent is reopened

* fix(kanban): serialize DB initialization

* fix(kanban): task_age() tolerates ISO-8601 timestamps

Prevents ValueError crash in dashboard get_board() when a task has an ISO timestamp (e.g. "2026-05-10T15:00:00Z") instead of a unix epoch int. Adds _to_epoch() helper that normalises both formats.

* Fix Kanban dashboard initial board selection

* fix(kanban): persist worker session metadata on completion

Salvages #25579 by @wesleysimplicio. Stamps task_runs.metadata.worker_session_id from HERMES_SESSION_ID on kanban_complete.

Cherry-picked the substantive commit (not the AUTHOR_MAP fixup tip) onto current main.
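For the task_age() entry above, a helper that accepts both unix epoch values and ISO-8601 strings could look roughly like this. It is a sketch under assumptions: the commit only names `_to_epoch()` and the two input formats, so the function name `to_epoch`, the treatment of naive timestamps as UTC, and the handling of numeric strings are illustrative choices rather than the project's actual behaviour.

```python
from datetime import datetime, timezone
from typing import Union


def to_epoch(value: Union[int, float, str]) -> int:
    """Normalise an epoch number or an ISO-8601 string to integer epoch seconds."""
    if isinstance(value, (int, float)):
        return int(value)
    text = str(value).strip()
    try:
        # Numeric strings such as "1700000000" are treated as epoch seconds.
        return int(float(text))
    except (ValueError, OverflowError):
        pass
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())
```

A caller computing an age would then subtract the result from the current epoch time regardless of which format the row stored.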
* fix(kanban): make claim ttl configurable

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(kanban): pass accept-hooks to worker chat subprocess

* feat(kanban): add board-level default workdir (#25430)

* docs(kanban-worker): document notification routing configuration

* fix(kanban): preserve worker tools with restricted toolsets

* fix(kanban): make legacy task migration idempotent

(cherry picked from commit 293f1c3a7241b0117669e049d9aa746c9645ac90)

* fix: harden Kanban worker Hermes command resolution

* feat(kanban): allow trimmed task comments

SS-1647 live SHIP validation: real code + tests for kanban comment --max-len.

* fix: show scheduled kanban tasks in dashboard

* fix: assign single-task kanban decompositions

* fix(kanban-dashboard): make Orchestration mode checkbox label static

The checkbox label echoed its state ("Auto (default)" / "Manual") instead of describing the action, so a checked box reading "Auto" parsed as a status indicator rather than a control. The accompanying sub-description was also static and started with "When on, ...", which read awkwardly when the box was unchecked.

Replace the dynamic label with a static action label ("Auto-decompose triage tasks") and flip the sub-description between the two modes so it stays accurate either way. The top-of-page Orchestration pill is unchanged; that one is intentionally a status badge / toggle.

Fixes #28178

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(env): add HERMES_KANBAN_DISPATCH_IN_GATEWAY override (#21956)

Salvages the env-vars docs portion of #21956 by @Bartok9. The ascii-guard-ignore tags from the original PR already landed on main.

* fix(kanban): close sqlite connection on init failure to prevent fd leak

Salvages #28301 by @Ade5954. If WAL setup, PRAGMA application, or schema init raises after sqlite3.connect() succeeds, the new connection was leaking. Wrap the body in try/except so the connection is closed before the exception propagates.
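The connection-leak fix above follows a common pattern: once sqlite3.connect() has succeeded, any failure during setup must close the handle before the exception leaves the function. A minimal sketch, assuming a hypothetical `open_db` wrapper and an `init_schema` callback standing in for the project's real PRAGMA and schema code:

```python
import sqlite3
from typing import Callable


def open_db(path: str, init_schema: Callable[[sqlite3.Connection], None]) -> sqlite3.Connection:
    """Open a database and initialise it, closing the handle if setup fails."""
    conn = sqlite3.connect(path)
    try:
        # WAL setup and schema creation can both raise after connect() succeeded.
        conn.execute("PRAGMA journal_mode=WAL")
        init_schema(conn)
    except Exception:
        # Release the file descriptor before the error propagates to the caller.
        conn.close()
        raise
    return conn
```

The bare `raise` re-raises the original exception unchanged, so callers see the same error as before the fix; only the leaked connection goes away.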
* fix(kanban): don't crash dispatched workers when kanban-worker skill is absent Salvages #27372 by @oemtalks. The dispatcher unconditionally injected `--skills kanban-worker` into every worker spawn, but worker profiles sometimes don't have that bundled skill in their skills dir, which is fatal at CLI startup (`ValueError: Unknown skill(s): kanban-worker`). Adds `_kanban_worker_skill_available(hermes_home)` and only injects the flag when the skill resolves. The MANDATORY lifecycle still ships via KANBAN_GUIDANCE in the system prompt, so omitting the flag is safe. * fix(packaging): ship dashboard plugin assets in wheel Salvages #23737 by @LeonSGP43. Adds plugins/* manifest.json and dist/ glob entries to setuptools package-data so wheel installs ship the bundled dashboard plugin assets (kanban, achievements, etc.). Without these, /api/dashboard/plugins can't discover plugin assets outside a source checkout. * docs(kanban): document worker protocol auto-blocks Salvages #21585 by @helix4u. Documents the protocol_violation event (worker exits successfully while task is still running), adds --max-retries to the create flag list and --failure-limit to dispatch. * fix(oneshot): pass fallback_providers from profile config to AIAgent Salvages #23368 by @uzunkuyruk. Oneshot workers (e.g. kanban workers spawned via 'hermes -p <profile> chat -q ...') were not honouring the profile's fallback_providers / fallback_model chain because oneshot.py never read the config and never passed fallback_model= to AIAgent. Reads cfg.get('fallback_providers') (new list format) or cfg.get('fallback_model') (legacy single-dict) with the same normalization cli.py applies, then forwards as fallback_model=_fb. * fix(kanban): reject direct running transitions in dashboard bulk updates Salvages #24050 by @kronexoi. The single-task PATCH already rejects direct status='running' since it bypasses the dispatcher/claim invariant, but the bulk-update endpoint still accepted it. 
Aligns bulk with single by emitting an error result row for any 'running' entry. * feat(kanban): add initial-status for human-ops cards Salvages #27526 by @shunsuke-hikiyama. Adds an --initial-status flag (running|blocked, default running) to 'kanban create', threaded through kanban_db.create_task() and the kanban_create tool schema. 'blocked' parks the task directly in the blocked column for R3 human-ops review, skipping the brief running-to-blocked transition. Dropped the unrelated 'add' alias, WIFEXITED Windows compat, and slash-handler error formatting changes that were bundled in the original PR — those should ship as their own focused changes if still wanted. * fix(kanban): release scratch workspace and tmux session on task completion Salvages #27369 by @LeonJS. complete_task() now calls _cleanup_workspace() and _cleanup_worker_tmux() after marking a task complete. Scratch workspaces (used by swarm agents) accumulate on disk — hundreds of MB per task, never released. Stale tmux sessions from completed agents also persist indefinitely. Both gates are safe: - workspace_kind == 'scratch' gate preserves user worktree/dir workspaces - tmux #{pane_dead} == 1 gate only kills sessions where the worker has already exited - best-effort: cleanup failures never block task completion * fix(kanban): honor severity thresholds in diagnostics Salvages #26431 by @LeonSGP43. Dashboard plugin_api list_diagnostics was using exact-match (severity == filter), so '--severity warning' hid 'error' and 'critical' diagnostics. Adds severity_at_or_above() helper to kanban_diagnostics and uses it in the dashboard endpoint (CLI already used SEVERITY_ORDER comparison correctly). * test: isolate Kanban env pins in hermetic fixture Salvages the substantive part of #22295 by @steezkelly. 
Adds the missing HERMES_KANBAN_HOME, HERMES_KANBAN_RUN_ID, HERMES_KANBAN_CLAIM_LOCK, HERMES_KANBAN_DISPATCH_IN_GATEWAY entries to _HERMES_BEHAVIORAL_VARS so ambient developer-shell pins on those vars don't bleed into pytest runs. The frozenset extraction + standalone regression test from the original PR were dropped to keep the change minimal — main already maintains the list inline. * feat(kanban): add max_in_progress config to cap concurrent running tasks Salvages #22981 by @SimbaKingjoe. Adds 'kanban.max_in_progress' config that caps simultaneously running tasks. When the board already has N running, dispatcher skips spawning so slow workers (local LLMs, resource-constrained hosts) don't pile up and time out. Threads through dispatch_once(max_in_progress=) and gateway dispatcher config parsing with validation (warns on invalid/below-1 values). * fix(packaging): ship bundled skills in wheel Salvages #23738 by @LeonSGP43. Wheel installs were missing skills/ and optional-skills/ because pyproject's [tool.setuptools.packages.find] only includes Python packages — the skills directories don't have __init__.py so they were silently dropped from the wheel. Adds setup.py with data_files spec emitting skills/* and optional-skills/* under hermes_agent-<v>.data/data/, and a get_bundled_skills_dir() helper in hermes_constants that discovers the wheel-installed location via sysconfig before falling back to a source-checkout path. tools/skills_sync uses the helper so 'hermes update' works for pip-installed users. * fix: 4 small surgical bugs Salvages #23302 by @Bartok9. Four independent one-area fixes: 1. kanban boards delete alias now hard-deletes (not archives) — the alias didn't carry --delete, so getattr(args, 'delete', False) returned False. Detect boards_action=='delete' explicitly. 2. Gateway auto-title failures no longer leak as user-visible warnings — debug-log only since they're not actionable. 3. 
Background process completion notification snaps truncation to the next newline boundary, prepends a marker when content is dropped. 4. _cprint() schedules the run_in_terminal coroutine via asyncio.ensure_future so output isn't silently dropped from background threads (fixes #23185 Bug A). Skips the double-print fallback that would fire for mock paths. * perf(prompt): cache kanban worker guidance at session init Salvages #24402 by @RyanRana. The KANBAN_GUIDANCE block (~835 tokens) is session-static — the dispatcher decides at spawn time whether the process is a kanban worker via the kanban_show tool's check_fn (gated on HERMES_KANBAN_TASK env var). Re-checking 'kanban_show' in valid_tool_names and re-loading the reference on every system-prompt rebuild (init + each context compression) is wasted work. Caches the resolved string on agent._kanban_worker_guidance once in agent_init and consumes it in system_prompt.build_system_prompt(), with a getattr fallback for code paths that bypass agent_init. * feat(kanban): add --sort option to 'hermes kanban list' Salvages #25745 by @LizerAIDev. Adds --sort {created,created-desc, priority,priority-desc,status,assignee,title,updated} to 'hermes kanban list'. Validated against VALID_SORT_ORDERS map; invalid values raise ValueError. Default behaviour (priority DESC, created ASC) is unchanged when --sort is omitted. * docs: add kanban codex lane skill * feat(kanban): worker visibility endpoints (workers/active, runs/{id}, inspect) Adds three read-only endpoints to the kanban dashboard plugin so the SwitchUI workspace (and any other dashboard consumer) can track workers across tasks without N+1 round-trips through /tasks/{task_id}. - GET /workers/active Single SQL JOIN of task_runs + tasks where ended_at IS NULL, worker_pid IS NOT NULL, status='running'. Returns {workers: [...], count, checked_at}. - GET /runs/{run_id} Direct lookup of any task_run row by id. Reuses existing kanban_db.get_run() helper and _run_dict() serialiser. 
404 when not found. Mirrors GET /tasks/{task_id} 404 shape. - GET /runs/{run_id}/inspect Live PID stats via psutil.Process.as_dict() — cpu_percent, memory_rss_bytes, memory_vms_bytes, num_threads, num_fds, status, create_time, cmdline. Short-circuits with alive:false when run has ended, has no worker_pid, the pid is gone, or psutil is unavailable. AccessDenied surfaces as alive:true with error rather than a 500. 11 new tests in tests/plugins/test_kanban_worker_runs.py cover the empty-board case, running-task case, ended-run filtering, missing-pid filtering, 404 paths, already-ended inspect, no-pid inspect, dead-pid inspect, and live-pid inspect (psutil mocked). All pass. Companion termination endpoint (POST /runs/{run_id}/terminate) is intentionally out of scope here — opening a separate issue first since the RBAC and dispatcher-mediated soft-cancel design needs maintainer input before code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(release): map contributor email for attribution check * test(kanban-dashboard): pin enriched 409 detail and inline error wiring (#26744) - Existing ``test_patch_drag_drop_move_todo_to_ready`` now asserts the enriched 409 detail names the blocking parent (id, quoted title, and current status), so the dashboard always has something actionable to render. - New bundle-assertion test ``test_dashboard_surfaces_ready_blocked_error_inline`` pins the frontend wiring: the ``parseApiErrorMessage`` helper exists, the drag/drop banner runs through it, and the drawer maintains a visible ``patchErr`` state that's cleared between PATCHes and tasks. 
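The short-circuit rules listed for GET /runs/{run_id}/inspect above can be sketched roughly as follows. This is an illustration only, not the plugin's code: `inspect_run` and the dict-shaped `run` row are hypothetical, and the response keys simply follow the names in the commit message.

```python
from typing import Any, Dict, Optional

try:
    import psutil  # third-party and optional; its absence is one of the short-circuit cases
except ImportError:
    psutil = None


def inspect_run(run: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: `run` stands in for a task_run row with worker_pid / ended_at."""
    pid: Optional[int] = run.get("worker_pid")
    # Ended run, missing pid, or psutil unavailable: report not alive, never raise.
    if run.get("ended_at") is not None or pid is None or psutil is None:
        return {"alive": False}
    try:
        proc = psutil.Process(pid)
        attrs = ["cpu_percent", "memory_info", "num_threads",
                 "status", "create_time", "cmdline"]
        if hasattr(proc, "num_fds"):  # num_fds only exists on POSIX builds of psutil
            attrs.append("num_fds")
        info = proc.as_dict(attrs=attrs)
    except psutil.NoSuchProcess:
        # The pid is gone.
        return {"alive": False}
    except psutil.AccessDenied as exc:
        # Surface as alive-with-error rather than a server error.
        return {"alive": True, "error": str(exc)}
    mem = info.pop("memory_info", None)
    info["memory_rss_bytes"] = mem.rss if mem else None
    info["memory_vms_bytes"] = mem.vms if mem else None
    return {"alive": True, **info}
```

Keeping every failure mode inside the return value is what lets a dashboard poll this endpoint without special-casing HTTP errors.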
* docs(codex_app_server): document multi-root Kanban writable_roots (#27941) Update the Codex app-server runtime guide's Kanban section to reflect the new behaviour: * The sandbox override now adds the board DB directory plus every Kanban path the dispatcher pinned (HERMES_KANBAN_WORKSPACES_ROOT, HERMES_KANBAN_WORKSPACE, legacy HERMES_KANBAN_ROOT) -- deduplicated, DB-dir first. * The motivation note now includes the cross-mount artifact-write scenario (e.g. ``/media/.../kanban-workspaces/...`` on a separate drive) and links to issue #27941 so readers can find the original bug report. * fix(gateway): quiet corrupt kanban dispatcher boards Salvages substantive part of #26490 by @aqilaziz. Detects corrupt board DBs ("file is not a database" / "database disk image is malformed") and disables them by fingerprint until they're repaired, instead of flooding the gateway log with repeated logger.exception tracebacks every tick. Cherry-picked the substantive commit (ea5b4ec2a); the tip commit was an unrelated _is_dir OSError fix for service-path lookup. Dropped a small test reformat that was bundled in the same commit. * docs: align kanban readiness docs and smoke tests Salvages #28199 by @bensargotest-sys. Aligns Kanban docs with current tool registration: dispatcher-spawned task workers get task tools, profiles that explicitly enable the kanban toolset get orchestrator routing tools (kanban_list, kanban_unblock). Corrects failure-limit text to current default of 2. Hardens the e2e subprocess script to resolve repo root and use the spawnable default assignee. Updates the diagnostics severity fixture to assert error below the critical threshold. * feat(kanban): surface per-task model_override in show + tool output Salvages #26897 by @loicnico96. The per-task model_override DB column already exists on main, but it wasn't exposed in user-facing surfaces. 
This adds: - 'kanban show' prints 'model: <name>' when model_override is set - kanban_show / kanban_list tool responses include the model_override field Original branch was stale (PR was authored against an older field name 'model'); applied the substantive surface exposure manually using the current 'model_override' field name. * feat(cli): add kanban swarm topology helper Salvages #26791 by @Niraven. Adds 'hermes kanban swarm' to create a durable Kanban Swarm v1 graph: a completed root/blackboard card, parallel worker cards, a verifier gated on all workers, and a synthesizer gated on the verifier. Stores shared swarm blackboard updates as structured JSON comments on the root card. Self-contained: new hermes_cli/kanban_swarm.py module + CLI wiring + unit tests. * feat(kanban): add optional board parameter to all MCP tools Salvages #27598 by @nnnet. Adds optional 'board' parameter to all 9 kanban_* MCP tools via shared _connect helper. Backwards compatible — omitting board keeps current pinned-board behavior. Useful for orchestrator profiles that route across multiple boards. Two-file scope: tools/kanban_tools.py + tests. * feat(kanban): stamp originating ACP session_id on tasks Salvages #23208 by @awizemann. Tracks which chat session created a kanban task so clients can render a per-session board without falling back to tenant + time-window heuristics. - Schema: tasks gains nullable session_id TEXT column with index (additive migration in _migrate_add_optional_columns). - ACP: server.py exposes the originating session id via HERMES_SESSION_ID with save/restore around the agent loop. - Tool: kanban_create reads HERMES_SESSION_ID (with explicit override). - CLI: 'hermes kanban list --session <id>' filter; JSON output exposes session_id. * feat(kanban): wire dispatcher to dispatch review agents from review column Salvages #23772 by @thewillhuang. 
Adds 'review' as a valid kanban task status and extends dispatch_once to monitor the review column as a second dispatch source (in addition to the existing ready column). - Adds 'review' to VALID_STATUSES - Adds claim_review_task() — atomically transitions review → running - Adds has_spawnable_review() — health telemetry mirror - Extends dispatch_once with a review column dispatch loop - Review agents get 'sdlc-review' skill auto-loaded Resolved 2 conflicts (VALID_STATUSES merge with main's 'scheduled' state, test file additions). Adapted claim_review_task to main's ttl_seconds: Optional[int] = None convention (matches claim_task). * feat(kanban): stale detection for running tasks in dispatcher Salvages #23790 by @thewillhuang. Adds detect_stale_running() to the dispatcher cycle. Running tasks that have been started for longer than dispatch_stale_timeout_seconds (default 14400 = 4h) without a heartbeat in the last hour are auto-reclaimed to ready. - New config kanban.dispatch_stale_timeout_seconds (default 14400, 0 disables) - New 'stale' field on DispatchResult - detect_stale_running() in kanban_db.py with heartbeat freshness check - Records outcome='stale' on run close + 'stale' event; ticks failure counter - Wires config through gateway embedded dispatcher - Updates _cmd_dispatch verbose/JSON output and daemon logging Resolved test-file end-of-file conflict by appending both halves. * feat(kanban): filter tasks by workflow fields and runs by status/outcome Salvages #26745 by @nehaaprasaad. Exposes filtering for the existing workflow_template_id and current_step_key columns: - list_tasks() accepts workflow_template_id and current_step_key kwargs - 'hermes kanban list' adds matching CLI flags - dashboard plugin_api also exposes the filters Resolved a small conflict in list_tasks signature alongside main's session_id and order_by additions; combined all three into the single filter list. 
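The stale-detection rule from the detect_stale_running() commit above (running longer than the timeout, no heartbeat in the last hour, 0 disables) can be written as a small predicate. This is a minimal sketch under stated assumptions: `is_stale` and its parameter names are hypothetical, timestamps are plain epoch seconds, and the real function works against the board database rather than bare numbers.

```python
from typing import Optional


def is_stale(
    started_at: float,
    last_heartbeat_at: Optional[float],
    now: float,
    stale_timeout_seconds: int = 14400,   # default from the commit message (4h)
    heartbeat_window_seconds: int = 3600,  # "in the last hour"
) -> bool:
    """Hypothetical predicate mirroring the rule described in the commit message."""
    if stale_timeout_seconds <= 0:
        return False  # 0 disables stale detection
    if now - started_at <= stale_timeout_seconds:
        return False  # has not been running long enough
    if last_heartbeat_at is not None and now - last_heartbeat_at < heartbeat_window_seconds:
        return False  # a fresh heartbeat keeps the task alive
    return True


# A task started 5h ago whose last heartbeat was 2h ago would be reclaimed:
print(is_stale(started_at=0, last_heartbeat_at=3 * 3600, now=5 * 3600))
```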
* feat(kanban): add respawn guard to block repeat worker storms Salvages #27484 by @fardoche6. Adds a respawn guard that skips worker spawn for tasks where: - a recent run already succeeded (recent_success — within guard window) - the previous run hit a quota/auth error (blocker_auth, also auto-blocks) - a recent task comment includes a GitHub PR URL (active_pr) The guard prevents repeat worker storms on the same bug/task. Includes the contributor's review-findings fixup (regex hardening, observability, auth coverage). Resolved a small DispatchResult conflict alongside main's 'stale' field; kept both. Authorship preserved via rebase merge. * feat(kanban): show dashboard cron jobs across profiles Salvages #27568 by @SerenityTn. Dashboard cron page now lists cron jobs from all profiles, with profile-aware filter UI and storage routing. Includes test coverage for cross-profile listing, mutation, deletion, and validation. Also fixes orphan conflict markers in config.py left by an earlier salvage merge (kanban.dispatch_stale_timeout_seconds was double-nested in HEAD/PR markers from #28452 salvage of #23790). * fix(kanban): remove orphan conflict markers from config.py (#28458) PR #28452 (salvage of #23790, stale detection) merged with leftover git conflict markers in hermes_cli/config.py around the `dispatch_stale_timeout_seconds` config block, breaking config import and any code path that loads it. Cleans up the markers and keeps both config blocks (worker log rotation/orchestrator + stale detection). Resolves a self-introduced regression. 
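The three respawn-guard conditions described above (recent_success, blocker_auth, active_pr) can be pictured as a single decision function. The sketch below is an assumption-heavy illustration: the function name, the one-hour guard window, the error markers, and the PR URL regex are all hypothetical stand-ins, not the values used in the repository.

```python
import re
from typing import Optional, Sequence

# Assumed pattern for a GitHub pull-request URL inside a task comment.
PR_URL_RE = re.compile(r"https://github\.com/[\w.-]+/[\w.-]+/pull/\d+")
GUARD_WINDOW_SECONDS = 3600  # assumed; the commit message does not state the window
AUTH_MARKERS = ("quota", "unauthorized", "invalid api key")  # assumed markers


def respawn_guard_reason(
    now: float,
    last_success_at: Optional[float],
    last_error: Optional[str],
    recent_comments: Sequence[str],
) -> Optional[str]:
    """Return the reason to skip spawning a worker, or None to allow the spawn."""
    if last_success_at is not None and now - last_success_at <= GUARD_WINDOW_SECONDS:
        return "recent_success"
    if last_error and any(marker in last_error.lower() for marker in AUTH_MARKERS):
        return "blocker_auth"  # the commit also auto-blocks the task in this case
    if any(PR_URL_RE.search(comment) for comment in recent_comments):
        return "active_pr"
    return None
```

Returning a reason string rather than a boolean keeps the skip observable, which matches the commit's mention of observability fixes.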
* fix(kanban): remove orphan conflict markers from kanban.py (#28459) PR #28454 (salvage of #26745, workflow filter) merged with leftover git conflict markers in hermes_cli/kanban.py at three sites: - _task_to_dict() (session_id alongside workflow_template_id/current_step_key) - p_list parser (--sort alongside --workflow-template-id/--step-key) - _cmd_list (order_by alongside the new filter kwargs) Cleans up the markers and keeps both halves at each site. Resolves a self-introduced regression. * feat(kanban): configure worktree paths and branches Salvages #26496 by @aqilaziz. Adds branch_name column + CLI flag so tasks with workspace_kind='worktree' can pin a target branch on create. Schema migration added to _migrate_add_optional_columns. - Task.branch_name field + DB column + migration - create_task accepts branch_name kwarg - hermes kanban create --branch <name> flag - kanban show output includes 'Branch: <name>' when set Cherry-picked the substantive commit (a7558cf27); the PR's tip was an unrelated service-path-dirs commit. Resolved 2 INSERT-column-list and show-output conflicts alongside main's session_id and max_runtime_seconds additions; kept all three. * feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373) Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that group several skills under one slash command. Invoking /<bundle-name> from any surface (CLI, TUI, dashboard, any gateway platform) loads every referenced skill into a single combined user message. Use cases: - /backend-dev → loads github-code-review + test-driven-development + github-pr-workflow as one bundle. - /research → loads several research skills together. - Team task profiles shared via dotfiles. Behavior: - Bundles take precedence over individual skills when slugs collide. - Missing skills are skipped with a note, not fatal. - No system-prompt mutation — bundles generate a fresh user message at invocation time, the same way /<skill> does. 
Prompt cache stays intact. - Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI), /help display. Schema (~/.hermes/skill-bundles/<slug>.yaml): name: backend-dev description: Backend feature work. skills: - github-code-review - test-driven-development instruction: | Optional extra guidance prepended to the loaded skills. New module: agent/skill_bundles.py — load, scan, resolve, build invocation message, save, delete. yaml.safe_load only; broken bundles log a warning and are skipped, never raise. New CLI subcommand: hermes bundles {list,show,create,delete,reload}. Implementation in hermes_cli/bundles.py; wired in hermes_cli/main.py. 'bundles' added to _BUILTIN_SUBCOMMANDS so plugin discovery skips it. New in-session slash command: /bundles lists installed bundles in both CLI and gateway. /<bundle-name> dispatch added to CLI (cli.py) and gateway (gateway/run.py) before the existing /<skill-name> path. Autocomplete: SlashCommandCompleter gained an optional skill_bundles_provider parameter that defaults to None — the prompt shows '▣ <description> (N skills)' for bundles vs '⚡' for skills. Tests: - tests/agent/test_skill_bundles.py — 33 tests covering slugify, scan/cache freshness, resolve (including underscore→hyphen Telegram alias), build_bundle_invocation_message (loading, missing skills, user/bundle instruction injection, dedup), save/delete, reload diff, list sort. - tests/hermes_cli/test_bundles.py — 8 tests for the CLI subcommand (create/list/show/delete/reload, --force, missing bundle errors). - tests/gateway/test_bundles_command.py — 4 tests for the gateway handler and bundle resolution priority. Live E2E: verified subprocess invocations of hermes bundles {list,create,show,reload,delete} round-trip correctly against an isolated HERMES_HOME. Docs: - website/docs/user-guide/features/skills.md — new 'Skill Bundles' section with quick example, YAML schema, management commands, behavior notes. 
- website/docs/reference/cli-commands.md — 'hermes bundles' added to the top-level command table and given its own subcommand section. * feat(kanban): add scheduled status for delayed follow-ups Salvages #24533 by @roycepersonalassistant. Adds a first-class 'scheduled' Kanban status for time-delay follow-ups that aren't waiting on human input. - hermes kanban schedule <task_id> [reason] CLI command - Dashboard/API transitions to/from Scheduled - unblock_task() now releases both 'blocked' AND 'scheduled' tasks (re-checking parent dependencies before moving to ready/todo) - i18n + docs updates Resolved conflicts: kept HEAD's failure-counter reset on unblock alongside the PR's scheduled state, kept HEAD's 'running' direct-set rejection, combined both bulk-status branches. Dropped the dist/ bundle changes (months-stale; would need rebuild from source). * feat(kanban): drag-to-delete trash zone + bulk delete for task cards Salvages #28125 by @Jpalmer95. Adds: - Drag-to-delete trash zone in the kanban dashboard - Bulk delete endpoint with cascading delete_task cleanup - Frontend updates (drag visual + drop handler) - Confirmation prompt before delete Resolved end-of-file test conflict by appending both halves. * docs: add Korean Kanban documentation Salvages #21823 by @pochi-gio. Adds Korean (ko) Docusaurus locale and translates Kanban documentation (kanban.md, kanban-tutorial.md) and the two related skills (devops-kanban-orchestrator, devops-kanban-worker). Purely additive — adds ko to the locales list in docusaurus.config.ts and creates the website/i18n/ko/ tree. 
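The unblock behaviour described for the 'scheduled' status above (release both 'blocked' and 'scheduled', then re-check parents before choosing ready or todo) reduces to a small transition rule. This is a sketch only: `status_after_unblock` is a hypothetical name, and treating 'done' as the parent-completed status is an assumption not confirmed by the commit message.

```python
from typing import Iterable

RELEASABLE = {"blocked", "scheduled"}  # statuses that unblock_task() releases
PARENT_DONE = "done"                   # assumed name of the completed status


def status_after_unblock(status: str, parent_statuses: Iterable[str]) -> str:
    """Return the status a task would move to when unblocked."""
    if status not in RELEASABLE:
        return status  # nothing to release
    # Ready only when every parent has finished; otherwise wait in todo.
    if all(parent == PARENT_DONE for parent in parent_statuses):
        return "ready"
    return "todo"
```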
* fix(tests): catch up six stale tests after compression/aux/kanban changes (#28465) - aux_config: drop session_search from _AUX_TASKS and remove stale test (PR #27590 removed auxiliary.session_search from DEFAULT_CONFIG) - compression_boundary_hook: set compressor._last_compress_aborted=False on MagicMock so the post-compress abort branch (PR #28117) doesn't short-circuit before the session-id rotation under test - kanban_dashboard_plugin: use consecutive_failures=3 so severity stays 'error' (failure_threshold default dropped from 3 to 2 in d9fef0c8a, so failures=5 now crosses the critical floor of 2*2=4) - cli_manual_compress: accept force kwarg on DummyAgent._compress_context (cli._manual_compress now passes force=True) * fix(telegram): render full clarify choice text in message body, use short button labels When Telegram clarify prompts offer long choices, mobile clients truncate the inline button labels, making options unreadable. Previously only the question was shown in the message body with truncated choice text in button labels. Fix: append the full numbered option list to the message body so users can read complete choice text on any client. Buttons now use short numeric labels (1, 2, ...) to avoid Telegram truncation. The 'Other (type answer)' button is unchanged. Long choice labels are now rendered in full (not truncated to 57 chars + '...') since they appear in the body instead of button labels. Closes: #27497 * chore(release): map @asdlem for PR #27852 salvage * fix(telegram): default streaming transport to edit * fix(telegram): respect reply_to_mode for DM topic reply fallback The DM topic reply fallback code in send() hardcoded should_thread=True when telegram_dm_topic_reply_fallback metadata was present, bypassing _should_thread_reply() and ignoring reply_to_mode config. This caused quote bubbles on every response even with reply_to_mode: 'off'. 
Fix: - Add reply_to_mode param to _reply_to_message_id_for_send() and _thread_kwargs_for_send() classmethods - In send(), check self._reply_to_mode != 'off' for DM topic fallback - Suppress reply anchor and reply_to_message_id when mode is 'off' while preserving message_thread_id for correct topic routing - Thread reply_to_mode through all 29 call sites Regression coverage: 10 new tests in test_telegram_reply_mode.py covering classmethod behavior, send() integration, and backward compatibility. Fixes reply_to_mode: 'off' ignored by Telegram DM topic reply fallback code #23994 * fix(gateway): route Telegram audio file attachments away from STT pipeline (#24870) Telegram distinguishes three kinds of audio payloads: - message.voice → Opus/OGG voice messages → STT pipeline ✓ - message.audio → audio file attachments → bypasses STT ← was broken - message.document (audio mime) → generic file route **Root cause** — the inbound message routing block in gateway/run.py matched both MessageType.VOICE *and* MessageType.AUDIO into audio_paths, which were then fed unconditionally to _enrich_message_with_transcription. Audio file attachments (.mp3, .m4a, etc.) were therefore auto-transcribed instead of being treated as files, making the transcribe skill unusable from Telegram because the path it needed was never surfaced. **Fix** - Introduce a new audio_file_paths list populated exclusively by MessageType.AUDIO events. - Narrow the audio_paths selector to MessageType.VOICE (and bare audio/ mime-type events that are not explicitly AUDIO or DOCUMENT). - After the STT block, inject a document-style context note for each audio_file_path, giving the agent the file path and asking what to do with it (consistent with how plain documents are handled). 
**Tests** — 5 new tests in test_telegram_audio_vs_voice.py: - voice message still transcribed (regression guard) - audio attachment skips STT (core fix) - audio attachment context note format - STT disabled still produces file note (not STT-disabled notice) - MessageType.AUDIO != MessageType.VOICE sanity check Fixes #24870 * chore(release): map bartok9 noreply for PR #24879 salvage * fix(send_message): route standalone Telegram sends through TELEGRAM_PROXY When the send_message tool runs outside the gateway process (agent loop, TUI, cron, etc.), _gateway_runner_ref() returns None and the standalone path in _send_telegram constructs Bot(token=token) directly, bypassing any configured proxy. In regions where api.telegram.org is blocked, the send times out after ~5s with 'Telegram send failed: Timed out' and nothing ever shows up in gateway.log because the request never reaches the gateway. Resolve TELEGRAM_PROXY (via gateway.platforms.base.resolve_proxy_url, which also honours HTTPS_PROXY/HTTP_PROXY/ALL_PROXY and NO_PROXY) just before constructing the Bot. When a proxy is found, attach an HTTPXRequest(proxy=...) for both 'request' and 'get_updates_request', matching what gateway/platforms/telegram.py already does for in-gateway sends and what the Discord standalone sender already does. Any exception attaching the proxy falls back cleanly to a direct connection, preserving prior behaviour for users without a proxy configured. Adds tests/tools/test_send_message_telegram_proxy.py covering both the proxy-configured and no-proxy cases. * chore(release): map @pepelax for PR #25419 salvage * fix(kanban-dashboard): restore implementations dropped during salvages (#28481) Four kanban dashboard test failures, all from PR salvages that picked up the test additions but dropped the corresponding implementations. 
- BOARD_COLUMNS: add 'review' (status added by PR f55d94a1e but the board API never grew the column → test_board_empty failed because VALID_STATUSES - {archived} mismatched the rendered columns). - update_task: enrich the 'ready' 409 detail with the blocking parent list (id, title, status) and add _parents_blocking_ready helper. Implementation lost in the #26744 salvage (commit e215558ba) which pinned the test but not the server-side code. - dist/index.js: add parseApiErrorMessage helper, wire it through the drag/drop banner, add patchErr state to the TaskDrawer and surface it inline by the action row. Lost in the same #26744 salvage. - test_diagnostics_endpoint_severity_filter: update to at-or-above semantics (PR a94ddd807 changed the filter from exact-match so the warning filter now correctly includes error+critical too). * fix(gateway): roll over Telegram tool progress bubbles * fix(gateway): scope audio_file_paths outside media_urls guard The audio-file-paths handling block at line 7334 references the variable unconditionally, but #24879 initialized it inside the 'if event.media_urls' block — so events without media_urls hit UnboundLocalError. Found via test_run_agent_queued_message_does_not_treat_commentary_as_final after PR #28478 landed. * fix(gateway): keep tool-progress edits alive after Telegram flood control When a progress-message edit hits Telegram flood control (RetryAfter), can_edit was unconditionally set to False, permanently disabling coalescing for the rest of the run. Subsequent tool updates were posted as separate new messages instead of updating the existing progress bubble. Fix: only set can_edit=False for non-recoverable edit errors. On flood control, back off by resetting _last_edit_ts so the throttle interval is respected before the next edit attempt. 
Fixes #25188 * chore(release): map @erhnysr for PR #25198 salvage * fix(telegram): preserve can_edit after transient network errors in progress edits (#27828) When edit_message_text fails with a transient error (httpx.ConnectError, NetworkError, server disconnected, timeouts), the progress-message sender must not permanently set can_edit = False — that would convert a single Telegram network hiccup into separate per-tool bubbles for the rest of the run. Changes: - gateway/platforms/telegram.py: edit_message now returns retryable=True for transient network errors (ConnectError, NetworkError, timeouts, server disconnects, temporarily unavailable). Permanent failures (flood control, message-not-found, permissions) remain retryable=False. - gateway/run.py: send_progress_messages checks result.retryable before setting can_edit = False. Transient failures skip the fallback-send and continue — the next edit cycle catches up with the accumulated lines. Permanent failures (flood, message-not-found, etc.) still disable editing. Tests: 22 new tests in test_telegram_progress_edit_transient.py covering transient vs permanent error classification, SendResult.retryable semantics, and the can_edit decision logic. Fixes #27828 * fix(telegram): recover from post-update polling conflict without entering limbo * fix(test+release): update conflict retry count for MAX=5; map @CryptoByz * fix(gateway): route background-process notifications into Telegram DM topics Background-process completion notifications (notify_on_complete) and watch-pattern notifications were always delivered to the Telegram main chat instead of the originating private-chat topic. Hermes-created Telegram DM topic lanes only render a send when it carries both message_thread_id and a reply anchor. 
The synthetic MessageEvent injected on process completion had no message_id, so _reply_anchor_for_event returned None and _thread_kwargs_for_send dropped message_thread_id entirely — routing the notification to the main chat. Capture the triggering message id at spawn time and thread it through to the synthetic event so it can be reply-anchored back into the topic: - session_context: add HERMES_SESSION_MESSAGE_ID context var - telegram adapter: populate SessionSource.message_id on inbound messages - terminal tool: persist watcher_message_id on the process session - process registry: carry/persist message_id on watcher dicts + checkpoint - gateway: set MessageEvent.message_id on injected notifications Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(release): map @fabiosiqueira for PR #27212 salvage * fix(telegram): route resumed DM topic sends directly * fix(telegram): enforce TELEGRAM_ALLOWED_USERS allowlist on inbound messages TELEGRAM_ALLOWED_USERS was only checked for callback/inline-button actions but not for inbound messages. Unauthorized users triggered an 'Unauthorized user' log warning but their messages were still processed by the agent — a P0 security bypass (issue #23778). Fix: add allowlist check in _should_process_message() which is called for all message types (text, command, media, location). If the sender is not in TELEGRAM_ALLOWED_USERS, the message is dropped immediately with a warning log. Empty TELEGRAM_ALLOWED_USERS continues to allow all users (existing behavior). Fixes #23778 * fix(telegram): fail-closed auth fallback when TELEGRAM_ALLOWED_USERS is empty The _is_callback_user_authorized fallback returned True when TELEGRAM_ALLOWED_USERS was not set, allowing any Telegram user to interact with the bot. Change to fail-closed: deny by default unless GATEWAY_ALLOW_ALL_USERS=true is explicitly set. 
Fixes #24457 * test(telegram): stub _is_callback_user_authorized in trigger-gating fixture After PR #24468 made the empty-allowlist callback auth fail-closed (and #23795 wired _is_callback_user_authorized into _should_process_message), trigger-gating tests started failing because their fake messages from user 111 hit the new deny-by-default path before trigger evaluation. Force-authorize all senders in _make_adapter() so the trigger logic under test runs. The fail-closed behavior itself is covered by test_telegram_callback_auth_fail_closed.py. * fix(telegram): reset sticky fallback IP on connect failure, retry primary DNS When a sticky fallback IP (from DoH discovery) becomes unreachable, the transport previously got stuck in an attempt_order that only tried the dead IP. This prevented the gateway from recovering until the service was restarted. Changes: - Always include primary DNS path (None) after the sticky IP in the attempt_order so that a primary-path retry happens on sticky failure. - Reset self._sticky_ip to None when the currently sticky IP hits a connect timeout / connect error, allowing the next request to retry from scratch. Fixes silent Telegram disconnection when discovered fallback IPs are transiently or permanently unreachable. * test+release: align stale sticky-IP test for #24511; map @falconexe * fix(telegram): propagate extra base_url config * feat(send_message): auto-detect @username mentions and create Telegram entities When sending messages containing @username patterns, auto-generate MessageEntity(type='mention') entries so that the receiving bot's require_mention filter can trigger. This enables proper bot-to-bot interop where mention-based routing is used. 
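The @username auto-detection described above needs entity offsets that Telegram will accept. The Bot API measures MessageEntity offset and length in UTF-16 code units, not Python string indices, so any text containing emoji shifts the numbers. A minimal sketch follows, assuming a hypothetical `mention_entities` helper that returns plain dicts; the actual change builds MessageEntity objects, and the username pattern here (5 to 32 characters, starting with a letter) is an assumption.

```python
import re
from typing import Dict, List, Union

# Assumed username shape; the lookbehind avoids matching e-mail addresses.
MENTION_RE = re.compile(r"(?<![\w@])@[A-Za-z][A-Za-z0-9_]{4,31}(?![A-Za-z0-9_])")


def utf16_len(text: str) -> int:
    """Length of text in UTF-16 code units, the unit Telegram entities use."""
    return len(text.encode("utf-16-le")) // 2


def mention_entities(text: str) -> List[Dict[str, Union[str, int]]]:
    """Hypothetical helper: one 'mention' entity per @username found in text."""
    return [
        {
            "type": "mention",
            "offset": utf16_len(text[: match.start()]),
            "length": utf16_len(match.group()),
        }
        for match in MENTION_RE.finditer(text)
    ]
```

For "😀 @hermes_bot" the Python index of the mention is 2, but the entity offset is 3 because the emoji occupies two UTF-16 code units.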
* test+release: align send_message mocks for MessageEntity import; map @fonhal * fix(telegram): resume typing indicator after inline approval click (#27853) The text /approve and /deny paths in gateway/run.py call resume_typing_for_chat() after resolve_gateway_approval() succeeds, but the Telegram inline-button (ea:*) callback in _handle_callback_query did not. Typing is paused when the approval is sent (gateway/run.py:15658), so without a matching resume the typing indicator stayed gone for the remainder of a long-running turn after a button click. Symmetry-match the text path: after a successful resolve, call self.resume_typing_for_chat(str(query_chat_id)). Guarded by count > 0 to matc…

* fix(acp): treat polished tool error payloads as failed * fix(acp): also mark raised-exception tool results as failed Extends #26573 to also catch the case the original PR deliberately left out: when a tool raises an exception, the agent's tool executor wraps it in a canonical 'Error executing tool '<name>': ...' string prefix (see agent/tool_executor.py around the try/except). That prefix is unique to the wrapper and cannot legitimately appear in well-behaved tool output, so it is a safe signal that the tool blew up. Without this, the canonical 'tool raised' case still rendered as a green 'completed' row in Zed despite being a runtime failure — exactly the class of bug #26573 set out to fix. Adds a positive test (raised-exception prefix -> failed) and a negative test (bare 'Error:' word in legit tool output stays completed) so a future contributor doesn't accidentally widen the rule to false-positive on compiler/linter diagnostics. * fix(acp): refresh session info after auto-title * fix(acp): use refresh moment as updated_at on session info push Follow-up to #26543. The sessions table does not have an updated_at column (see hermes_state.py — only started_at/ended_at), so row.get('updated_at') always returned None and the str() coercion was dead code. Use datetime.now(UTC).isoformat() instead, which reflects exactly what the field means here: 'the title was refreshed at this moment'. Drop the dead coercion. * feat(acp): enrich permission request cards * feat(web): mobile dashboard UX polish (#28127) * feat(web): mobile dashboard UX polish Bottom sheets for sidebar theme/language pickers on narrow viewports with enter/exit animation and drag-to-close; inline header badges beside titles; bottom padding on the route outlet for scroll clearance; profiles loading uses a unicode braille spinner; align profile/cron card actions to the top; viewport-fit cover and supporting layout tweaks across dashboard pages. 
Co-authored-by: Cursor <cursoragent@cursor.com> * Fix Nix web npm hash and mobile sheet accessibility. Align fetchNpmDeps in nix/web.nix with web/package-lock.json for CI. Improve BottomPickSheet backdrop labeling, avoid aria-hidden on the dialog during exit animation, and wire theme/language sheets with listbox semantics and localized dismiss labels. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> * feat(install.ps1): strip BOM, add -Commit/-Tag pin params, harden git ops Three install.ps1 improvements pulled from the thin-installer work on bb/gui (PR #27822) that benefit the canonical CLI install flow on main: 1. Strip UTF-8 BOM from scripts/install.ps1. The canonical 'irm <raw URL> | iex' install flow has been broken since commit 4279da4db re-introduced a UTF-8 BOM that PR #27224 had explicitly stripped. PowerShell 5.1's 'irm' returns the response body as a string with the BOM surviving as a leading \ufeff character; 'iex' then evaluates that string and the parser chokes on the invisible character before param(), surfacing as a cascade of 'The assignment expression is not valid' errors at every param default value. File body is verified pure ASCII (no character above byte 127), so PS 5.1 with no BOM falls back to Windows-1252 decoding which is identical to ASCII for our content. Both install paths work: - 'irm ... | iex' (canonical one-liner) - 'powershell -File install.ps1' (programmatic / desktop bootstrap) 2. New -Commit and -Tag string params for reproducible pinning. Higher-precedence variants of -Branch. When set, the repository stage clones $Branch (fast partial fetch) and then 'git checkout's the exact ref. Precedence: Commit > Tag > Branch. Honoured by all three code paths: - Update path (existing valid checkout): fetch + checkout --detach <commit|tag> instead of checkout + pull. - Fresh clone: clone --branch $Branch, then post-clone 'git checkout --detach' to the requested ref. 
- ZIP fallback: pick archive URL for the most-specific ref (commit -> archive/<sha>.zip, tag -> archive/refs/tags/<tag>.zip, else archive/refs/heads/<branch>.zip). Used by the Hermes desktop's first-launch bootstrap to pin the .exe to the exact commit it was built against, so the cloned Hermes Agent tree always matches what the .exe was tested with. Also enables release-bundle pinning (e.g. Microsoft Store builds pinning to a release tag) and CI reproducibility. 3. EAP=Continue wrap around the new pin-step git invocations. 'git fetch origin <commit>' writes the routine 'From <url>' info line to stderr. Under the script's global $ErrorActionPreference = 'Stop' that stderr line is wrapped as an ErrorRecord and terminates the script even though fetch+checkout actually succeed. Same EAP=Stop + native-stderr footgun we hit during the install.ps1 hardening pass in Install-Uv, Test-Python, _Run-NpmInstall. Wrap both the update-path fetch/checkout block AND the post-clone pin block in $ErrorActionPreference = 'Continue' (restored in finally). Real failures still caught by $LASTEXITCODE checks. * fix: add default base_url_override for ollama-cloud provider * chore(release): add AUTHOR_MAP entry for falasi * feat(cli): add /update slash command to CLI and TUI (#23854) * feat: add /update slash command to CLI and TUI * test(cli): add Python tests for /update slash command Co-authored-by: Cursor <cursoragent@cursor.com> * fix(cli): address Copilot review for /update slash command Route classic CLI /update through prompt_toolkit modal confirmation and defer relaunch to the main-thread cleanup path after app.exit(). Tighten Y/n semantics, add Python wrapper and catalog coverage tests, and assert /update stays visible in the TUI command catalog. 
Co-authored-by: Cursor <cursoragent@cursor.com> * fix(cli): address review feedback on /update command - Replace raw input() with _prompt_text_input_modal in _handle_update_command to avoid EOF/hang/keystroke-leak races with prompt_toolkit's stdin ownership - Fix confirmation logic: only proceed on recognized affirmative aliases (y/yes/1/ok); cancel on everything else including empty string, typos, and unrecognized input — matches all other [Y/n] prompts in the codebase - Route relaunch through main-thread shutdown path: set _pending_relaunch and return False from process_command so process_loop triggers app.exit(); run() then calls relaunch() after prompt_toolkit has restored terminal modes and after cleanup — safe on both POSIX (execvp) and Windows (subprocess+exit) - Fix misleading docstring in test_update_command.py: the Vitest only covers the TypeScript slash handler that emits code 42, not the Python wrapper branch that acts on it - Rewrite tests to use SimpleNamespace pattern (like test_destructive_slash_confirm) so _prompt_text_input_modal can be stubbed directly - Add Python test for _launch_tui exit-code-42 → relaunch branch in main.py Agent-Logs-Url: https://github.com/NousResearch/hermes-agent/sessions/f6da68cf-e7b1-4b7a-aed6-3d4b0f523bdb Co-authored-by: austinpickett <260188+austinpickett@users.noreply.github.com> * fix(cli): polish test fixtures for /update command - Remove unused _prompt_text_input from SimpleNamespace stub - Use pytest.fail sentinel in managed-install guard test to catch unexpected modal invocations Agent-Logs-Url: https://github.com/NousResearch/hermes-agent/sessions/f6da68cf-e7b1-4b7a-aed6-3d4b0f523bdb Co-authored-by: austinpickett <260188+austinpickett@users.noreply.github.com> * chore: re-trigger CI after Copilot review fixes Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: 
austinpickett <260188+austinpickett@users.noreply.github.com> * feat(skills): add baoyu-article-illustrator skill * feat(skills): adapt baoyu-article-illustrator for Hermes Adapts the upstream baoyu-article-illustrator skill (verbatim-copied in the previous commit) to Hermes' tool ecosystem, matching the pattern used by baoyu-infographic. - Metadata: openclaw → hermes; add author, license, tags, category - Triggering: slash command + CLI flags → natural language - User config: remove EXTEND.md, first-time-setup, preferences-schema - User prompts: AskUserQuestion (batched) → clarify (one at a time) - Image gen: baoyu-imagine → image_generate (describe refs in prompt text) - Platform: drop Windows/PowerShell; Linux/macOS only - File ops: switch to write_file / read_file - Watermark: opt-in per-article instead of EXTEND.md-driven - Add PORT_NOTES.md describing the adaptation and sync procedure Style, palette, and prompt/system.md reference files are verbatim copies and are the sync points with upstream. * fix(skills): align article-illustrator with real Hermes tool capabilities Addresses review feedback on #13193: 1. Reference-image flow no longer assumes write_file/read_file handle binaries. vision_analyze produces a textual description; the binary is optionally copied via terminal (cp/curl). The description is what gets embedded in prompts. 2. image_generate's URL-only return is now explicit. Step 6 downloads the returned URL to local disk via terminal (curl -sSL -o ...), then verifies non-zero size before proceeding. 3. Removed "Please use nano banana pro..." line from prompts/system.md — the backend is user-configured and not agent-selectable, so routing hints in the prompt are misleading. PORT_NOTES.md updated: prompts/system.md is no longer verbatim, and the file-ops/backend-selection rows now reflect Hermes' actual tool surface (write_file/read_file for text, terminal for binaries and URL downloads, vision_analyze for reading images). 
* chore(skills/baoyu-article-illustrator): tighten description, add platforms, regen docs * chore(release): map Jack Yang contributor email Adds the contributor email mapping for Jack Yang (@0xjackyang) so future release-note generation attributes commits correctly. Salvage of #27964 by @0xjackyang. * chore(release): pre-stage AUTHOR_MAP for May 2026 LHF batch group 7 Pre-stages AUTHOR_MAP entries for 5 new contributors whose PRs are being salvaged in the May 2026 low-hanging-fruit batch (group 7). Lands ahead of the per-PR salvage PRs so they don't get blocked by AUTHOR_MAP CI. Contributors: - 02356abc (#28286 — wecom WSMsgType.CLOSING) - burjorjee (#28201 — inline-shell timeout guard) - oseftg (#28168 — natural response ending: emoji + caret) - rudi193-cmd (#28241 — empty credential pool entries) - sadiksaifi (#27982 — kanban horizontal scroll) Per references/batch-pr-salvage-may14-additions.md. * fix(wecom): handle WSMsgType.CLOSING to prevent CPU spin The WeCom adapter's _read_events() loop only handled CLOSE, CLOSED, and ERROR websocket message types. When the server initiates a graceful shutdown, aiohttp returns WSMsgType.CLOSING before the connection is fully closed. This message type was not handled, causing the receive() call to return immediately in a tight loop while self._ws.closed remained False. The result was 100% CPU usage on the asyncio event loop. Add WSMsgType.CLOSING to the set of terminal message types that raise RuntimeError("WeCom websocket closed"), allowing _listen_loop() to enter its normal reconnect backoff path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(auth): treat empty credential pool entries as unauthenticated Fixes #28140 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: include hermes_plugins in gateway.log component filter gateway.log uses a _ComponentFilter that only passes records from loggers starting with ('gateway',). 
Plugin modules are loaded under the hermes_plugins.* namespace, so all plugin log output is silently dropped from gateway.log. This makes plugin registration — which directly affects gateway hooks (pre_gateway_dispatch, transform_llm_output, etc.) — invisible in the gateway-specific log. Operators debugging gateway behavior check gateway.log and see no plugin activity, even when plugins are working correctly. Add 'hermes_plugins' to the gateway component prefixes tuple so plugin log messages appear in gateway.log. Closes #28138 * fix(gateway): align kanban artifact _IMAGE_EXTS with response dispatch _deliver_kanban_artifacts used a broader _IMAGE_EXTS that included .bmp, .tiff, and .svg. These three extensions are absent from the equivalent set in _deliver_media_from_response (line 10661), which intentionally routes them through send_document rather than send_multiple_images (comment near line 10522 notes that Telegram sendPhoto recompresses and rejects non-raster formats). Routing .svg (XML text), .bmp, or .tiff through the photo API causes send_multiple_images to raise on most platforms; the exception is caught and logged as a warning, silently dropping the artifact. Aligning the two sets ensures kanban deliverables with these extensions follow the same send_document path as regular agent responses. No behaviour change for .png/.jpg/.jpeg/.gif/.webp. * fix(process-registry): detach stdin from background subprocesses to prevent keyboard freeze Background process non-PTY path used stdin=subprocess.PIPE unconditionally, creating an orphan pipe that was never written to and never closed. Child processes that read stdin would block indefinitely, competing with the parent's prompt_toolkit event loop for terminal ownership and causing complete keyboard lockout. Change to stdin=subprocess.DEVNULL so children get immediate EOF on stdin reads instead of blocking forever. 
For interactive stdin, the PTY path (which has its own independent PTY via ptyprocess.PtyProcess.spawn) should be used instead. Fixes #17959 * chore(release): alias stale-ID salvage commit for @LifeJiggy (#28317) * fix(process-registry): detach stdin from background subprocesses to prevent keyboard freeze Background process non-PTY path used stdin=subprocess.PIPE unconditionally, creating an orphan pipe that was never written to and never closed. Child processes that read stdin would block indefinitely, competing with the parent's prompt_toolkit event loop for terminal ownership and causing complete keyboard lockout. Change to stdin=subprocess.DEVNULL so children get immediate EOF on stdin reads instead of blocking forever. For interactive stdin, the PTY path (which has its own independent PTY via ptyprocess.PtyProcess.spawn) should be used instead. Fixes #17959 * chore(release): alias stale-ID salvage commit for LifeJiggy PR #28315 was salvaged with a wrong noreply numeric ID (192385615 vs the correct 141562589). The commit on main is correctly authored to LifeJiggy by username, but the noreply email doesn't match AUTHOR_MAP. Adds an alias so release-notes generation maps both forms to the same contributor. --------- Co-authored-by: LifeJiggy <192385615+LifeJiggy@users.noreply.github.com> * fix: elevate plugin discovery failures from debug to warning Plugin discovery exceptions in gateway startup (gateway/run.py) and CLI startup (hermes_cli/main.py) are caught and logged at DEBUG level, making them invisible at the default INFO log level. If any plugin import fails — syntax error, missing dependency, import cycle — operators get zero indication unless they bump the log level to DEBUG. This makes broken plugins appear enabled but silently non-functional. Change both locations to logger.warning() so failures are visible at production log levels. 
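The stdin change in the process-registry fix above can be illustrated with a small standalone sketch. It is not the project's code: `run_detached` and the child snippet are hypothetical names chosen for illustration, and only standard `subprocess` calls are used. It shows that a child spawned with `stdin=subprocess.DEVNULL` sees end-of-file as soon as it reads stdin, rather than waiting on a pipe nobody writes to.

```python
import subprocess
import sys

# Hypothetical child program: read all of stdin, then report what was seen.
CHILD = "import sys; data = sys.stdin.read(); print(repr(data))"


def run_detached(code: str) -> str:
    """Run a Python snippet as a child process with stdin detached."""
    proc = subprocess.Popen(
        [sys.executable, "-c", code],
        stdin=subprocess.DEVNULL,   # child gets EOF on its first stdin read
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    out, _ = proc.communicate(timeout=30)
    return out.strip()


# The child's read returns an empty string right away, so it exits on its own.
output = run_detached(CHILD)
```

With `stdin=subprocess.PIPE` and no writer, the same child would stay blocked in `sys.stdin.read()` until the pipe was closed.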
Closes #28137 * fix: treat inline-shell timeout guard as timeout * fix(acp): resolve /tmp symlink before workspace auto-approve check on macOS Path.resolve() follows the /tmp -> /private/tmp symlink on macOS, so str(path).startswith("/tmp/") is always False for temp-dir paths. The "Accept Edits" (workspace_session) mode silently refused to auto-approve every /tmp write on macOS, breaking the documented behaviour and making the existing test fail on this platform. Fix: keep the raw expanded path (pre-resolve) for the /tmp prefix check and continue using the resolved form only for the cwd relative_to() call where symlink resolution is correct behaviour. * fix(kanban): single-row horizontal scroll for board columns Switch .hermes-kanban-columns from auto-fit CSS grid to a flex row with overflow-x: auto and a hidden scrollbar (scrollbar-width / ::-webkit-scrollbar), and pin .hermes-kanban-column to flex: 0 0 280px so columns sit side-by-side at a fixed width instead of wrapping into a 2xN grid. Page vertical scroll is unaffected: each column already caps at max-height: calc(100vh - 220px), so the container never grows tall enough to introduce its own vertical scrollbar. * fix(approval): surface pending-approval state with explicit marker visible to LLM When a tool call requires user approval in the non-blocking gateway path, the LLM previously received a result that was indistinguishable from a failed tool call (exit_code=-1, error=message). The LLM could not tell whether the tool was pending approval, had returned empty results, or had failed silently — causing it to burn context on wrong hypotheses. Fix changes the result format to include: - status: pending_approval (clear state name) - approval_pending: True (explicit boolean for LLMs to detect) - error: cleared to empty string (removes misleading error signal) This lets the LLM reason about approval latency vs actual errors, short-circuiting the previous silent failure mode. 
Fixes #14806 * fix: recognize emoji and caret as natural response endings GLM models via Ollama report finish_reason='stop' even when the response was truncated by max_tokens. The continuation mechanism uses _has_natural_response_ending() as one of the heuristics to detect whether the response was genuinely finished. Currently only ASCII punctuation and CJK punctuation are recognized. This means any response ending with an emoji (e.g. ⚡, 👍) or the caret character ^ (common in French ^^ smiley) is not recognized as naturally ended, triggering a false-positive continuation where the model receives 'Continue where you left off' and produces garbled output. Add: - ^ (caret) to the punctuation set - Unicode emoji range (codepoint >= 0x1F300) as natural ending This only affects GLM/Ollama users but the fix is safe for all backends since _has_natural_response_ending() is only consulted inside the continuation flow. * chore(release): pre-stage AUTHOR_MAP for May 2026 LHF batch group 8 (#28328) Pre-stages AUTHOR_MAP entries for 10 new contributors whose PRs are being salvaged in the May 2026 low-hanging-fruit batch (group 8). Lands ahead of the per-PR salvage PRs so they don't get blocked by AUTHOR_MAP CI. Contributors: - AceWattGit (#28159 — _pool_may_recover_from_rate_limit NameError) - YuanHanzhong (#28032 — x.com/status fallbacks link-like) - colin-chang (#28245, #28249, #28251 — gateway + mattermost fixes) - felix-windsor (#28019 — preserve cron asterisks in strip mode) - houenyang-momo (#28205 — charizard completion menu contrast) - iqdoctor (#28095 — windows installer docs) - joe102084 (#28151 — whitespace-only cron responses) - jvinals (#27936 — Slack U-IDs → DM channel) - maxmilian (#28267 — ModelPickerDialog portal) - samggggflynn (#27952 — dingtalk pre_start) Per references/batch-pr-salvage-may14-additions.md. 
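The natural-ending heuristic from the emoji/caret fix above can be sketched roughly as follows. This is an illustration under assumptions, not the body of `_has_natural_response_ending()`: the two punctuation sets are made up for the example, and only the added caret and the `>= 0x1F300` codepoint check come from the commit message.

```python
# Hypothetical punctuation sets; the real ones live in the agent code.
ASCII_ENDINGS = set(".!?:;)]}\"'`^")      # '^' is the newly added caret
CJK_ENDINGS = set("。！？；：）】」』")
EMOJI_START = 0x1F300                      # threshold named in the commit


def has_natural_response_ending(text: str) -> bool:
    """Return True when the last visible character looks like a finished reply."""
    stripped = text.rstrip()
    if not stripped:
        return False
    last = stripped[-1]
    if last in ASCII_ENDINGS or last in CJK_ENDINGS:
        return True
    # Treat any codepoint at or above the emoji threshold as a natural ending.
    return ord(last) >= EMOJI_START
```

A plain codepoint comparison like this only covers characters at or above U+1F300; a reply whose last codepoint is lower (for example a trailing variation selector) would need separate handling.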
* fix: add pre_start() to _IncomingHandler for dingtalk SDK compatibility The dingtalk-stream SDK calls pre_start() on every registered handler before opening the WebSocket connection. Without this method, the SDK raises AttributeError and kills the stream connection, causing DingTalk to be unable to connect via Stream Mode. * fix(windows): handle redirected stdout in _cprint fallback Wraps _pt_print in try/except with a print() fallback. When a kanban worker's stdout is piped to a log file, prompt_toolkit raises NoConsoleScreenBufferError (Windows) or OSError (other) because there is no real console buffer. The fallback keeps worker output flowing instead of crashing. * chore(release): alias stale-ID salvage commit for @Grogger (#28334) PR #28330 was salvaged with a wrong noreply numeric ID (18091625 vs the correct 7065068). The commit on main is correctly authored to Grogger by username, but neither noreply form was in AUTHOR_MAP. Adds both so release-notes generation maps them to @Grogger. * fix(aux): remove stale session_search model menu entry * fix(tui): keep x status citation fallbacks link-like * fix(xai-oauth): quarantine dead tokens on terminal refresh failure resolve_xai_oauth_runtime_credentials() called _refresh_xai_oauth_tokens() with no try/except. A terminal refresh failure (HTTP 400/401/403 — invalid_grant, token revoked) propagated without clearing the dead access_token / refresh_token from auth.json, causing every subsequent session to retry the same doomed network request. Add a try/except around the refresh call that mirrors the existing credential_pool.py quarantine: when _is_terminal_xai_oauth_refresh_error identifies a non-retryable failure, clear the dead token fields from auth.json and write a last_auth_error diagnostic marker so future calls fail fast with a clear relogin_required error instead of hitting the network. 
active_provider is preserved (set_active=False) so multi-provider users whose chosen provider is not xai-oauth are unaffected. Tests: two new cases in test_auth_xai_oauth_provider.py cover terminal quarantine and transient pass-through. * feat(bg-review): add bundled/pinned skill protection rules to review prompts (#27644) The background review prompts (_SKILL_REVIEW_PROMPT and _COMBINED_REVIEW_PROMPT) now include explicit protection rules for bundled, hub-installed, and pinned skills — aligning with the curator's existing policy at curator.py L345/350. Before this change, bg-review could freely rewrite bundled skills like 'hermes-agent' or pinned skills, while the 7-day curator explicitly skips them. The review agent now sees: • Bundled skills (shipped with Hermes) • Hub-installed skills (installed via hermes skills install) • Pinned skills (marked via hermes curator pin) If only protected skills need updating, the review says 'Nothing to save.' and stops. Fixes #27644 * fix(web): portal Change Model modal so it renders above the app sidebar The dashboard's main column is `relative z-2` (App.tsx), which creates a stacking context that traps fixed descendants below the app sidebar (`z-50`). `ModelPickerDialog` renders `fixed inset-0 z-[100]` inline, so its z-100 is scoped to z-2 and the sidebar covers its left edge. The bug is visible across all themes but only obvious in the Large theme variants (Hermes Teal (Large), etc.) where the larger root font widens the dialog into the sidebar's column. Toast.tsx already documents the same trap and uses the same `createPortal(..., document.body)` escape. This commit ports the picker; the same pattern affects other inline z-[100] modals in the dashboard (OAuthLoginModal, Cron / Models / Profiles page modals) and is left for a follow-up — keeping this PR scoped to the reporter's specific case. 
Fixes #28103 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gateway): exit code 75 on service restart so launchd relaunches When the gateway receives SIGUSR1 (graceful restart via launchd_restart), the SIGUSR1 handler calls request_restart(via_service=True) and the gateway shuts down cleanly with exit code 0. However, the generated launchd plist uses KeepAlive → SuccessfulExit → false, meaning launchd only relaunches on *non-zero* exit codes. A clean exit(0) is treated as "successful, don't restart", so the gateway stays down after /restart, /update, or SIGUSR1. The systemd unit template already uses RestartForceExitStatus=75 for the same scenario. Mirror that convention: when _restart_via_service is True, raise SystemExit(75) so launchd's SuccessfulExit=false policy triggers a relaunch. Closes #28135 * fix: guard json.loads() against invalid TTS and skill_view responses Two code paths call json.loads() on output from external tools without catching JSONDecodeError. If the tool returns a non-JSON string (error message, empty string, or None), the entire call path crashes. 1. gateway/run.py — text_to_speech_tool() result in voice reply path. A TTS failure that returns an error string instead of JSON crashes the voice reply handler, killing the message response entirely. 2. cron/scheduler.py — skill_view() result when loading skills for cron jobs. A corrupted or missing skill file that returns an error string instead of JSON crashes the cron tick, preventing all jobs from executing that cycle. Both fixes catch (json.JSONDecodeError, TypeError), log a warning, and gracefully skip the failed operation instead of crashing. * fix(gateway): bridge gateway_restart_notification from YAML platform sections Two related bugs in gateway/config.py prevented per-platform gateway_restart_notification from working through config.yaml: 1. 
The shared-key bridging loop (load_gateway_config) omitted 'gateway_restart_notification', so the key never landed in platform_data['extra'] even when set under e.g. 'discord:' or 'mattermost:' sections. 2. PlatformConfig.from_dict() only read gateway_restart_notification from the top-level data dict, ignoring the 'extra' sub-dict where bridged keys are stored. Fix: add the key to the bridging loop, and add an 'extra' fallback in from_dict() so that round-tripped values (YAML → bridged → extra → from_dict) resolve correctly. Impact: users can now set gateway_restart_notification: false per platform in config.yaml instead of relying on env vars or the global platforms: block. * feat(kanban): add auto_promote_children config toggle When the kanban auto-decomposer fans a triage task into child tasks, recompute_ready() immediately promotes parent-free children to 'ready' so the dispatcher picks them up. Some users want a manual workflow where children stay in 'todo' for review before dispatch. Add 'kanban.auto_promote_children' config key (default: true): - false: children stay in 'todo' after decomposition - true: existing behavior (auto-promote to 'ready') Changes: - kanban_db.py: decompose_triage_task() gains auto_promote param - kanban_decompose.py: reads auto_promote_children from config - kanban dashboard API: exposes the new setting in GET/PUT /orchestration Closes #28016 * fix: wrap _pool_may_recover_from_rate_limit call through run_agent namespace The conversation_loop.py references _pool_may_recover_from_rate_limit which was defined in run_agent.py. After the conversation-loop extraction refactor, the helper was no longer in the same module scope. Wrap the call as _ra()._pool_may_recover_from_rate_limit() to route through the run_agent monkeypatch namespace where the helper is available. Adds regression test in test_gemini_fast_fallback.py. Fixes: MAILROOM Email Triage NameError, OPS Execution Monitor NameError. 
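The 'extra' fallback described for PlatformConfig.from_dict() in the gateway_restart_notification fix above might look something like this. It is a sketch only: `PlatformConfigSketch` is a hypothetical stand-in for the real class, and the default of `True` is an assumption rather than something the commit states.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PlatformConfigSketch:
    """Hypothetical stand-in for the gateway's PlatformConfig."""

    gateway_restart_notification: bool = True   # assumed default
    extra: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "PlatformConfigSketch":
        extra = dict(data.get("extra") or {})
        key = "gateway_restart_notification"
        if key in data:
            # A top-level value wins, as before the fix.
            value = data[key]
        else:
            # Fallback: bridged YAML keys are stored under 'extra'.
            value = extra.get(key, True)
        return cls(gateway_restart_notification=bool(value), extra=extra)
```

With this shape, a value bridged from a per-platform YAML section into 'extra' resolves the same way as a top-level one.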
* fix(tui): improve charizard completion menu contrast * docs(windows): avoid piping installer directly into iex * fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS Qwen3.x and DeepSeek-V3.x default to chatty/hallucinatory tool use without enforcement steering — agents narrate "calling tool X" without actually emitting a tool call, or run partial loops. Both model families fit the same failure pattern TOOL_USE_ENFORCEMENT_GUIDANCE was already injected for (gpt, codex, gemini, gemma, grok, glm). Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com> Squashed salvage of: - 403e567ce fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS - 9433eabe7 test(agent): use realistic qwen-plus identifier in enforcement test Fixes #28079. * fix(send_message): resolve Slack user IDs to DM channel IDs The _SLACK_TARGET_RE regex only matched IDs starting with C (channel), G (group), or D (direct message). Slack user IDs start with U, causing 'Could not resolve' errors when trying to send DMs to specific users. Changes: - Expand _SLACK_TARGET_RE to accept U-prefixed IDs (user IDs) - Add conversations.open fallback to resolve user IDs to DM channel IDs before sending, since chat.postMessage requires a conversation ID Fixes #ISSUE_NUMBER * fix(gateway): tighten MEDIA extraction regex + silent skip on file-not-found Three related fixes for the MEDIA:<path> extraction pipeline that caused 'file not found' noise in platform channels: 1. run.py — tighten tool-result MEDIA regex from \S+ (any non- whitespace) to require a path pattern with known extensions. Prevents LLM-generated placeholder paths like 'MEDIA:/path/to/example.mp4' from being captured as real media. 2. base.py — remove the |\S+ fallback in extract_media() that catches anything non-whitespace as a potential MEDIA path. This was the primary cause of false positives — strings like '' in tool output were captured as MEDIA: paths. 3. 
mattermost.py — replace the file-not-found error message sent to the channel with a silent logger.warning() skip. When a path extracted by MEDIA doesn't exist on disk, the channel no longer gets a noisy '(file not found: ...)' message. Impact: eliminates the persistent 'file not found' spam in Mattermost channels caused by over-broad MEDIA regex patterns matching non-path text in tool output. * fix(xai-oauth): split 403 (tier/entitlement) from 400/401 in token endpoint xAI's token endpoint returns HTTP 403 to the OAuth grant when the account isn't on the allowlist for API access (e.g. standard SuperGrok subscribers — see #26847). Treating it like a stale-token 400/401 made ``format_auth_error`` append "Run ``hermes model`` to re-authenticate", which is misleading because re-login can't change xAI's tier decision. Split 403 off in both ``refresh_xai_oauth_pure`` and the loopback login token exchange: * New error code ``xai_oauth_tier_denied`` with ``relogin_required=False`` * Message explains the entitlement gate and points at the ``XAI_API_KEY`` + ``provider: xai`` fallback * 400/401 still set ``relogin_required=True`` as before * 5xx still set ``relogin_required=False`` as before * fix(run-agent): treat any 403 on xai-oauth as entitlement to stop refresh-loop The existing ``_is_entitlement_failure`` heuristic only fires when the response body contains specific substrings ("do not have an active Grok subscription", etc.). xAI has been seen to 403 standard SuperGrok subscribers with a terser body that doesn't match those keywords (#26847), and the recovery path would then mint a fresh token, get a fresh 403, and loop until Ctrl+C. Add a defense-in-depth check at the recovery call site: any 403 on ``provider == "xai-oauth"`` short-circuits ``try_refresh_current`` so the error surfaces immediately with the friendly hint from ``_summarize_api_error``. Keeps the existing keyword path for all other providers untouched. 
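The status split described in the xai-oauth token-endpoint fix above can be summarised in a short sketch. Only the `xai_oauth_tier_denied` code and the `relogin_required` values for 403, 400/401, and 5xx come from the commit messages; `TokenErrorSketch`, `classify_token_status`, the other code strings, and the handling of any remaining status are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenErrorSketch:
    """Hypothetical error record for a failed token-endpoint call."""

    code: str
    relogin_required: bool


def classify_token_status(status: int) -> TokenErrorSketch:
    if status == 403:
        # Tier/entitlement gate: logging in again cannot change the outcome.
        return TokenErrorSketch("xai_oauth_tier_denied", relogin_required=False)
    if status in (400, 401):
        # Stale or revoked token: a fresh login is the remedy.
        return TokenErrorSketch("refresh_rejected", relogin_required=True)  # hypothetical code
    if 500 <= status < 600:
        # Server-side failure: not a credential problem.
        return TokenErrorSketch("server_error", relogin_required=False)  # hypothetical code
    # Anything else is outside what the commits describe (assumed behaviour).
    return TokenErrorSketch("unexpected_status", relogin_required=False)  # hypothetical code
```

Keeping the 403 branch first makes it hard for a later edit to fold it back into the 400/401 path.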
* test(xai-oauth): pin tier-denied 403 behavior + docs warning for #26847 Tests: * ``test_refresh_xai_oauth_pure_403_marked_tier_denied_not_relogin`` — refresh-403 raises ``xai_oauth_tier_denied`` with ``relogin_required=False`` and the API-key fallback hint in body. * ``test_format_auth_error_tier_denied_does_not_suggest_relogin`` — the renderer does not append "Run ``hermes model``" for the new code. * ``test_recover_with_credential_pool_skips_refresh_on_bare_403_for_xai_oauth`` — bare ``{"reason":"forbidden","message":"Forbidden"}`` body (which does not match the existing keyword heuristic) still short-circuits ``try_refresh_current`` on xai-oauth. Docs: * Drop the "(any active tier)" claim from the xai-grok-oauth guide, add a top-of-page warning callout, and a Troubleshooting section for the 403-after-login case pointing at ``XAI_API_KEY`` + ``provider: xai`` as the documented fallback. * fix: handle whitespace-only cron responses * fix(cli): preserve cron asterisks in strip mode * fix(mattermost): resolve thread root_id and route progress to threads Two Mattermost thread-related bugs: 1. _resolve_root_id() — Mattermost CRT requires root_id to be the thread root post. Using any reply's own ID as root_id causes '400 Invalid RootId'. Add _resolve_root_id() that walks up the post chain via API to find the actual root, and apply it in send(), _send_url_as_file(), and _send_local_file(). 2. _progress_reply_to — The condition in run.py only checked Platform.FEISHU, missing Mattermost entirely. This caused tool progress messages to always land in the main channel instead of the thread. Add Platform.MATTERMOST to the condition so progress messages are routed to threads when reply_mode=thread. Impact: Tool progress messages now appear in Mattermost threads instead of flooding the main channel; thread replies no longer fail with Invalid RootId when the reply target is itself a reply. 
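For the whitespace-only cron fix listed above (the subject of this PR), the core check can be sketched as below. The function names and the returned dict shape are hypothetical; the PR summary only states that whitespace-only final responses are treated like empty ones, are not delivered, and are marked as soft failures. The existing silent-delivery rules are not modelled here.

```python
from typing import Optional


def is_blank_response(final_response: Optional[str]) -> bool:
    """True for None, an empty string, or a whitespace-only string."""
    return not (final_response or "").strip()


def plan_delivery(final_response: Optional[str]) -> dict:
    """Decide whether to deliver a cron result (hypothetical result shape)."""
    if is_blank_response(final_response):
        # Skip the blank message, but keep the run visible as a soft failure.
        return {"deliver": False, "soft_failure": True}
    return {"deliver": True, "soft_failure": False}
```

For example, `plan_delivery(" \n\t ")` takes the same branch as `plan_delivery("")`, so the blank message is suppressed while the run is still flagged in cron status.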
* feat(kanban): archive --rm to hard-delete archived tasks Salvages #19964 by @Beandon13. Adds `hermes kanban archive --rm` to permanently remove already-archived tasks with cascading cleanup of links, comments, events, runs, and notify-subs. Safety guard: only archived tasks can be deleted; active/blocked/done must be archived first. Cherry-picked from #19964 onto current main (severe stale base, applied manually to preserve substance only). * feat(proxy): add xai upstream adapter for Grok via OAuth * chore(release): map @yannsunn for PR #28064 xai proxy adapter salvage * docs(skill): align kanban dispatcher failure_limit text with current default * fix(oauth): add manual-paste fallback for browser-only remote consoles xAI Grok OAuth (and Spotify) use a loopback redirect to ``http://127.0.0.1:<port>/callback`` to capture the authorization code. That works when the browser and Hermes run on the same machine, and the SSH tunnel recipe handles the regular remote case. It breaks completely on **browser-only remote consoles** (GCP Cloud Shell, GitHub Codespaces, AWS EC2 Instance Connect, Gitpod, Replit, …) where the user has a browser but no real SSH client to forward a port — the redirect to 127.0.0.1 on the remote VM simply isn't reachable from the laptop, and there's nothing the existing flow can do about it (#26923). This commit adds the foundation for a manual-paste fallback: * ``_is_remote_session`` now also recognises Cloud Shell, Codespaces, Gitpod, Replit, StackBlitz (in addition to SSH), so the existing tunnel hint at least fires in those environments. * ``_parse_pasted_callback`` accepts any of: a full ``http(s)://...?code=...&state=...`` URL, a bare ``?code=...`` query string, a bare ``code=...&state=...`` fragment, or a bare opaque code value. Returns the same dict shape the HTTP callback handler produces, so the caller's state / error validation works unchanged (no CSRF bypass). 
  * ``_prompt_manual_callback_paste`` reads stdin with a clear multi-line explanation of what's happening and what to paste.
  * ``_xai_oauth_loopback_login`` gains a ``manual_paste`` kwarg that skips the HTTP listener entirely. The redirect_uri, PKCE verifier, state, and nonce are byte-identical to the loopback path so xAI's token endpoint can't tell the difference at the protocol level.
  * ``_print_loopback_ssh_hint`` now also mentions ``--manual-paste`` so users without a real SSH client see a path forward instead of a dead-end tunnel recipe.
  * ``_login_xai_oauth`` threads ``args.manual_paste`` into the loopback helper.

* feat(cli): wire --manual-paste into ``hermes auth add`` and ``hermes model``

  Register the new ``--manual-paste`` flag on both entry points and thread it through to the xAI loopback login:
  * ``hermes auth add xai-oauth --manual-paste`` — pool-add path, forwarded inside ``auth_commands.handle_auth_add``.
  * ``hermes model --manual-paste`` — model-picker path, forwarded by ``_model_flow_xai_oauth`` into the synthetic ``argparse.Namespace`` it passes to ``_login_xai_oauth``. The picker also now forwards ``--no-browser`` and ``--timeout`` for consistency (previously hardcoded to defaults regardless of CLI flags).

  Help text on both flags points at #26923 and names the browser-only remote consoles (Cloud Shell, Codespaces, EC2 Instance Connect) so users searching ``hermes --help`` can find the workaround.

* test+docs(oauth): pin manual-paste semantics and document browser-only path (#26923)

  Tests (``tests/hermes_cli/test_auth_manual_paste.py``):
  * 9 parametrised + scalar cases for ``_is_remote_session`` covering the new Cloud Shell / Codespaces / Gitpod / Replit / StackBlitz env vars (plus the existing SSH ones).
  * 9 cases for ``_parse_pasted_callback`` covering every paste form (full URL, https URL with extra params, bare ``?code=...``, bare ``code=...`` fragment, bare opaque value, error+description, empty, whitespace-only, malformed URL).
  * 3 cases for ``_prompt_manual_callback_paste`` (happy path, EOF, Ctrl-C).
  * 3 end-to-end ``_xai_oauth_loopback_login(manual_paste=True)`` cases: the HTTP server MUST NOT be started (asserted via a callable that raises if invoked), wrong state still rejected with ``xai_state_mismatch`` (no CSRF bypass), and empty paste surfaces ``xai_code_missing``.
  * SSH-hint mention test ensures the ``--manual-paste`` instruction is printed in the remote-session hint.

  Docs:
  * ``oauth-over-ssh.md`` — new "Browser-only remote (Cloud Shell / Codespaces / EC2 Instance Connect)" section with the ``--manual-paste`` recipe, plus a TL;DR note for the new flag.
  * ``xai-grok-oauth.md`` — short subsection pointing at the same recipe and the OAuth-over-SSH guide anchor.

* docs(kanban): document max-retries task override

* docs(kanban): document inline create shortcuts

* test(kanban): cover default board dashboard pin

* docs: ignore box diagrams in ascii guard

  Wrap existing box-drawing diagrams with ascii-guard markers so docs-site checks pass when website docs are touched.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* feat: per-task model override for kanban workers

  - Add model_override field to Task class and tasks schema
  - Add migration for existing databases
  - Spawn worker with -m model when model_override is set

* test(kanban-dashboard): cover _task_dict task_age fallback

  The fix in 061a1830 added an outer try/except in plugin_api._task_dict so that a future failure mode in kanban_db.task_age (anything _safe_int doesn't already absorb) cannot 500 the GET /board response. The _safe_int / task_age corruption paths got regression coverage in tests/hermes_cli/test_kanban_db.py, but the OUTER fallback contract remained untested -- meaning a refactor that drops the try/except would not be caught by CI.
  Pin that contract from both consumers of _task_dict:
  - GET /board returns 200 with the literal fallback age dict for the affected card (other cards continue to render via the same path)
  - GET /tasks/:id (drawer view) returns 200 with the same fallback, so a single corrupt task can't block its own drawer

  Both tests force task_age to raise RuntimeError rather than ValueError on '%s', because ValueError is absorbed by _safe_int and never reaches the outer try/except -- testing that path would only re-cover what test_kanban_db.py already pins.

  Manually verified the regression discipline:

      git checkout 061a1830^ -- plugins/kanban/dashboard/plugin_api.py
      pytest -k task_age_exception # both FAIL with 500
      git checkout HEAD -- plugins/kanban/dashboard/plugin_api.py
      pytest -k task_age_exception # both PASS

* fix(kanban): clear _INITIALIZED_PATHS in remove_board so recycled DBs re-init schema

  Archiving or deleting a board via remove_board() leaves the path's "schema already initialized" entry in the module-level cache. A concurrent connect(board=<slug>) call (e.g. the dashboard event-stream poll loop) then:
  1. resolves the same kanban.db path,
  2. recreates the directory + an empty sqlite file because connect() does mkdir(parents=True, exist_ok=True),
  3. skips the CREATE TABLE pass because the cache entry says the schema is already in place,
  4. errors on the next read with `no such table: task_events`.

  Drop the cache entry before mutating the filesystem so the fresh file gets a proper schema init on next connect(). Applies to both archive=True (rename) and archive=False (rmtree) branches.

  Fixes #23833.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(web): add Cache-Control: no-store to plugin static file serving

  Prevents browser caching of stale dashboard plugin JS files that may contain bugs already fixed upstream (e.g. COLUMN_LABEL undefined).

* fix(kanban): seed bundled skills (e.g. kanban-worker) on kanban init

  Closes #23725

* fix(kanban): ignore stale HERMES_KANBAN_BOARD for removed boards

* fix(kanban): keep board-management commands independent from board override

* fix(kanban): preserve notifier_profile for dashboard home subscriptions

* fix(kanban): promote dependents when a parent is archived

* fix(cli): make kanban specify max_tokens configurable

* fix(kanban): sync slash subcommands with live parser

* fix(kanban): promote blocked tasks when parent dependencies complete

  recompute_ready only scanned 'todo' tasks for promotion, ignoring 'blocked' tasks entirely. When a task was blocked (e.g. by the circuit breaker) and its parent dependencies later completed, the task stayed stuck in 'blocked' forever unless manually unblocked.

  Now recompute_ready also scans 'blocked' tasks. When all parents are done/archived, the blocked task is promoted to 'ready' with failure counters reset — equivalent to an automatic unblock.

  Includes a regression test for the blocked-parent-done promotion path.

* fix(kanban): use 'is not None' check for max_runtime_seconds in create_task

  max_runtime_seconds=0 was being silently coerced to None due to a falsy check (if max_runtime_seconds). Zero is a valid value that causes the dispatcher to immediately time out a task. The adjacent max_retries parameter already used the correct 'is not None' pattern. Fixes the inconsistency by aligning max_runtime_seconds with max_retries.

* fix(kanban): reset failure counters on unblock_task

  When a task is manually unblocked (blocked → ready/todo), the consecutive_failures counter and last_failure_error were left intact. The next failure would immediately re-trip the circuit breaker because the counter was still at or above the failure limit.

  Reset both fields on unblock so the task gets a fresh retry budget. Includes a regression test that verifies counters are zeroed.
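The falsy-check pitfall behind the max_runtime_seconds fix above is easy to reproduce in isolation. The sketch below is illustrative only: `build_task_fields_buggy` and `build_task_fields` are hypothetical helpers, not functions from kanban_db, and they only mirror the two parameter names mentioned in the commit message.

```python
from typing import Dict, Optional


def build_task_fields_buggy(
    max_runtime_seconds: Optional[int] = None,
    max_retries: Optional[int] = None,
) -> Dict[str, int]:
    """Hypothetical helper showing the bug: a truthiness test drops 0."""
    fields: Dict[str, int] = {}
    if max_runtime_seconds:  # 0 is falsy, so an explicit zero is lost
        fields["max_runtime_seconds"] = int(max_runtime_seconds)
    if max_retries is not None:
        fields["max_retries"] = int(max_retries)
    return fields


def build_task_fields(
    max_runtime_seconds: Optional[int] = None,
    max_retries: Optional[int] = None,
) -> Dict[str, int]:
    """Same hypothetical helper with both parameters checked against None."""
    fields: Dict[str, int] = {}
    if max_runtime_seconds is not None:  # only "not provided" is skipped
        fields["max_runtime_seconds"] = int(max_runtime_seconds)
    if max_retries is not None:
        fields["max_retries"] = int(max_retries)
    return fields


buggy = build_task_fields_buggy(max_runtime_seconds=0, max_retries=0)
fixed = build_task_fields(max_runtime_seconds=0, max_retries=0)
```

With the truthiness test, the explicit zero never reaches the stored fields; with the `is not None` test it does, which matches the behaviour the commit describes for max_retries.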
* fix(kanban): fingerprint crash errors to prevent fleet-wide retry exhaustion

  When a systemic failure (provider outage, auth expiry, OOM) crashes multiple workers simultaneously, detect_crashed_workers increments each task failure counter independently. The circuit breaker only trips after N × failure_limit retries across the fleet.

  Fingerprint crash errors by normalizing host-specific details (PIDs, timestamps). When 3+ tasks crash with the same fingerprint in a single detection cycle, immediately trip the circuit breaker (failure_limit=1) instead of waiting for repeated failures.

  Isolated crashes (unique fingerprints) retain their normal retry budget. Protocol violations continue to trip immediately.

  Includes regression tests for systemic and isolated crash paths.

* fix(kanban): align board_exists with board discovery rules

* fix(kanban): demote ready children when a parent is reopened

* fix(kanban): serialize DB initialization

* fix(kanban): task_age() tolerates ISO-8601 timestamps

  Prevents ValueError crash in dashboard get_board() when a task has an ISO timestamp (e.g. "2026-05-10T15:00:00Z") instead of a unix epoch int. Adds _to_epoch() helper that normalises both formats.

* Fix Kanban dashboard initial board selection

* fix(kanban): persist worker session metadata on completion

  Salvages #25579 by @wesleysimplicio. Stamps task_runs.metadata.worker_session_id from HERMES_SESSION_ID on kanban_complete. Cherry-picked the substantive commit (not the AUTHOR_MAP fixup tip) onto current main.
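The crash-fingerprinting idea from the first commit in this group can be sketched as below. The normalization rule here (lowercasing and replacing every digit run with a placeholder) is an assumption made for illustration; the commit only says that PIDs and timestamps are normalized. The names `fingerprint` and `systemic_fingerprints` are hypothetical, and the threshold of 3 is taken from the commit text.

```python
import re
from collections import Counter
from typing import Dict, Set

_DIGITS = re.compile(r"\d+")


def fingerprint(error: str) -> str:
    """Collapse host-specific numbers (PIDs, timestamps) into a placeholder.

    Assumed normalization: lowercase, strip, replace digit runs with <n>.
    """
    return _DIGITS.sub("<n>", error.strip().lower())


def systemic_fingerprints(errors_by_task: Dict[str, str], threshold: int = 3) -> Set[str]:
    """Return fingerprints shared by at least `threshold` crashed tasks."""
    counts = Counter(fingerprint(err) for err in errors_by_task.values())
    return {fp for fp, n in counts.items() if n >= threshold}


# One detection cycle: three workers die the same way, one dies differently.
errors = {
    "t1": "worker pid 101 died at 1716000000: auth expired",
    "t2": "worker pid 202 died at 1716000005: auth expired",
    "t3": "worker pid 303 died at 1716000009: auth expired",
    "t4": "worker pid 404 OOM killed",
}
systemic = systemic_fingerprints(errors)
```

Tasks whose fingerprint lands in the returned set would be candidates for an immediate circuit-breaker trip, while `t4` keeps its normal retry budget.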
* fix(kanban): make claim ttl configurable

  Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(kanban): pass accept-hooks to worker chat subprocess

* feat(kanban): add board-level default workdir (#25430)

* docs(kanban-worker): document notification routing configuration

* fix(kanban): preserve worker tools with restricted toolsets

* fix(kanban): make legacy task migration idempotent

  (cherry picked from commit 293f1c3a7241b0117669e049d9aa746c9645ac90)

* fix: harden Kanban worker Hermes command resolution

* feat(kanban): allow trimmed task comments

  SS-1647 live SHIP validation: real code + tests for kanban comment --max-len.

* fix: show scheduled kanban tasks in dashboard

* fix: assign single-task kanban decompositions

* fix(kanban-dashboard): make Orchestration mode checkbox label static

  The checkbox label echoed its state ("Auto (default)" / "Manual") instead of describing the action, so a checked box reading "Auto" parsed as a status indicator rather than a control. The accompanying sub-description was also static and started with "When on, ...", which read awkwardly when the box was unchecked.

  Replace the dynamic label with a static action label ("Auto-decompose triage tasks") and flip the sub-description between the two modes so it stays accurate either way. The top-of-page Orchestration pill is unchanged — that one is intentionally a status badge / toggle.

  Fixes #28178

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(env): add HERMES_KANBAN_DISPATCH_IN_GATEWAY override (#21956)

  Salvages the env-vars docs portion of #21956 by @Bartok9. The ascii-guard-ignore tags from the original PR already landed on main.

* fix(kanban): close sqlite connection on init failure to prevent fd leak

  Salvages #28301 by @Ade5954. If WAL setup, PRAGMA application, or schema init raises after sqlite3.connect() succeeds, the new connection was leaking. Wrap the body in try/except so the connection is closed before the exception propagates.
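The close-on-failure pattern described in the sqlite fix above looks roughly like this. It is a sketch under stated assumptions rather than the project's connect() implementation: `connect_with_init` and the `init_schema` callback are hypothetical names, and the single PRAGMA stands in for whatever setup the real code performs.

```python
import sqlite3
from typing import Callable


def connect_with_init(
    path: str,
    init_schema: Callable[[sqlite3.Connection], None],
) -> sqlite3.Connection:
    """Open a connection and close it again if any setup step raises."""
    conn = sqlite3.connect(path)
    try:
        # Setup steps that may fail after the connection already exists.
        conn.execute("PRAGMA journal_mode=WAL")
        init_schema(conn)
    except Exception:
        # Release the file descriptor before the error propagates.
        conn.close()
        raise
    return conn


# Usage: a failing schema step must not leave the connection open.
seen = []


def failing_init(conn: sqlite3.Connection) -> None:
    seen.append(conn)
    raise RuntimeError("schema init failed")


try:
    connect_with_init(":memory:", failing_init)
    init_raised = False
except RuntimeError:
    init_raised = True

try:
    seen[0].execute("SELECT 1")
    was_closed = False
except sqlite3.ProgrammingError:
    # sqlite3 raises ProgrammingError when a closed connection is used.
    was_closed = True
```

Re-raising after `conn.close()` keeps the original exception visible to the caller, so only the leak is removed and error handling upstream is unchanged.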
* fix(kanban): don't crash dispatched workers when kanban-worker skill is absent

  Salvages #27372 by @oemtalks. The dispatcher unconditionally injected `--skills kanban-worker` into every worker spawn, but worker profiles sometimes don't have that bundled skill in their skills dir, which is fatal at CLI startup (`ValueError: Unknown skill(s): kanban-worker`). Adds `_kanban_worker_skill_available(hermes_home)` and only injects the flag when the skill resolves. The MANDATORY lifecycle still ships via KANBAN_GUIDANCE in the system prompt, so omitting the flag is safe.

* fix(packaging): ship dashboard plugin assets in wheel

  Salvages #23737 by @LeonSGP43. Adds plugins/* manifest.json and dist/ glob entries to setuptools package-data so wheel installs ship the bundled dashboard plugin assets (kanban, achievements, etc.). Without these, /api/dashboard/plugins can't discover plugin assets outside a source checkout.

* docs(kanban): document worker protocol auto-blocks

  Salvages #21585 by @helix4u. Documents the protocol_violation event (worker exits successfully while task is still running), adds --max-retries to the create flag list and --failure-limit to dispatch.

* fix(oneshot): pass fallback_providers from profile config to AIAgent

  Salvages #23368 by @uzunkuyruk. Oneshot workers (e.g. kanban workers spawned via 'hermes -p <profile> chat -q ...') were not honouring the profile's fallback_providers / fallback_model chain because oneshot.py never read the config and never passed fallback_model= to AIAgent. Reads cfg.get('fallback_providers') (new list format) or cfg.get('fallback_model') (legacy single-dict) with the same normalization cli.py applies, then forwards as fallback_model=_fb.

* fix(kanban): reject direct running transitions in dashboard bulk updates

  Salvages #24050 by @kronexoi. The single-task PATCH already rejects direct status='running' since it bypasses the dispatcher/claim invariant, but the bulk-update endpoint still accepted it.
  Aligns bulk with single by emitting an error result row for any 'running' entry.

* feat(kanban): add initial-status for human-ops cards

  Salvages #27526 by @shunsuke-hikiyama. Adds an --initial-status flag (running|blocked, default running) to 'kanban create', threaded through kanban_db.create_task() and the kanban_create tool schema. 'blocked' parks the task directly in the blocked column for R3 human-ops review, skipping the brief running-to-blocked transition.

  Dropped the unrelated 'add' alias, WIFEXITED Windows compat, and slash-handler error formatting changes that were bundled in the original PR — those should ship as their own focused changes if still wanted.

* fix(kanban): release scratch workspace and tmux session on task completion

  Salvages #27369 by @LeonJS. complete_task() now calls _cleanup_workspace() and _cleanup_worker_tmux() after marking a task complete. Scratch workspaces (used by swarm agents) accumulate on disk — hundreds of MB per task, never released. Stale tmux sessions from completed agents also persist indefinitely.

  Both gates are safe:
  - workspace_kind == 'scratch' gate preserves user worktree/dir workspaces
  - tmux #{pane_dead} == 1 gate only kills sessions where the worker has already exited
  - best-effort: cleanup failures never block task completion

* fix(kanban): honor severity thresholds in diagnostics

  Salvages #26431 by @LeonSGP43. Dashboard plugin_api list_diagnostics was using exact-match (severity == filter), so '--severity warning' hid 'error' and 'critical' diagnostics. Adds severity_at_or_above() helper to kanban_diagnostics and uses it in the dashboard endpoint (CLI already used SEVERITY_ORDER comparison correctly).

* test: isolate Kanban env pins in hermetic fixture

  Salvages the substantive part of #22295 by @steezkelly.
  Adds the missing HERMES_KANBAN_HOME, HERMES_KANBAN_RUN_ID, HERMES_KANBAN_CLAIM_LOCK, HERMES_KANBAN_DISPATCH_IN_GATEWAY entries to _HERMES_BEHAVIORAL_VARS so ambient developer-shell pins on those vars don't bleed into pytest runs. The frozenset extraction + standalone regression test from the original PR were dropped to keep the change minimal — main already maintains the list inline.

* feat(kanban): add max_in_progress config to cap concurrent running tasks

  Salvages #22981 by @SimbaKingjoe. Adds 'kanban.max_in_progress' config that caps simultaneously running tasks. When the board already has N running, dispatcher skips spawning so slow workers (local LLMs, resource-constrained hosts) don't pile up and time out. Threads through dispatch_once(max_in_progress=) and gateway dispatcher config parsing with validation (warns on invalid/below-1 values).

* fix(packaging): ship bundled skills in wheel

  Salvages #23738 by @LeonSGP43. Wheel installs were missing skills/ and optional-skills/ because pyproject's [tool.setuptools.packages.find] only includes Python packages — the skills directories don't have __init__.py so they were silently dropped from the wheel. Adds setup.py with data_files spec emitting skills/* and optional-skills/* under hermes_agent-<v>.data/data/, and a get_bundled_skills_dir() helper in hermes_constants that discovers the wheel-installed location via sysconfig before falling back to a source-checkout path. tools/skills_sync uses the helper so 'hermes update' works for pip-installed users.

* fix: 4 small surgical bugs

  Salvages #23302 by @Bartok9. Four independent one-area fixes:
  1. kanban boards delete alias now hard-deletes (not archives) — the alias didn't carry --delete, so getattr(args, 'delete', False) returned False. Detect boards_action=='delete' explicitly.
  2. Gateway auto-title failures no longer leak as user-visible warnings — debug-log only since they're not actionable.
  3. Background process completion notification snaps truncation to the next newline boundary, prepends a marker when content is dropped.
  4. _cprint() schedules the run_in_terminal coroutine via asyncio.ensure_future so output isn't silently dropped from background threads (fixes #23185 Bug A). Skips the double-print fallback that would fire for mock paths.

* perf(prompt): cache kanban worker guidance at session init

  Salvages #24402 by @RyanRana. The KANBAN_GUIDANCE block (~835 tokens) is session-static — the dispatcher decides at spawn time whether the process is a kanban worker via the kanban_show tool's check_fn (gated on HERMES_KANBAN_TASK env var). Re-checking 'kanban_show' in valid_tool_names and re-loading the reference on every system-prompt rebuild (init + each context compression) is wasted work. Caches the resolved string on agent._kanban_worker_guidance once in agent_init and consumes it in system_prompt.build_system_prompt(), with a getattr fallback for code paths that bypass agent_init.

* feat(kanban): add --sort option to 'hermes kanban list'

  Salvages #25745 by @LizerAIDev. Adds --sort {created,created-desc,priority,priority-desc,status,assignee,title,updated} to 'hermes kanban list'. Validated against VALID_SORT_ORDERS map; invalid values raise ValueError. Default behaviour (priority DESC, created ASC) is unchanged when --sort is omitted.

* docs: add kanban codex lane skill

* feat(kanban): worker visibility endpoints (workers/active, runs/{id}, inspect)

  Adds three read-only endpoints to the kanban dashboard plugin so the SwitchUI workspace (and any other dashboard consumer) can track workers across tasks without N+1 round-trips through /tasks/{task_id}.
  - GET /workers/active
    Single SQL JOIN of task_runs + tasks where ended_at IS NULL, worker_pid IS NOT NULL, status='running'. Returns {workers: [...], count, checked_at}.
  - GET /runs/{run_id}
    Direct lookup of any task_run row by id. Reuses existing kanban_db.get_run() helper and _run_dict() serialiser.
    404 when not found. Mirrors GET /tasks/{task_id} 404 shape.
  - GET /runs/{run_id}/inspect
    Live PID stats via psutil.Process.as_dict() — cpu_percent, memory_rss_bytes, memory_vms_bytes, num_threads, num_fds, status, create_time, cmdline. Short-circuits with alive:false when run has ended, has no worker_pid, the pid is gone, or psutil is unavailable. AccessDenied surfaces as alive:true with error rather than a 500.

  11 new tests in tests/plugins/test_kanban_worker_runs.py cover the empty-board case, running-task case, ended-run filtering, missing-pid filtering, 404 paths, already-ended inspect, no-pid inspect, dead-pid inspect, and live-pid inspect (psutil mocked). All pass.

  Companion termination endpoint (POST /runs/{run_id}/terminate) is intentionally out of scope here — opening a separate issue first since the RBAC and dispatcher-mediated soft-cancel design needs maintainer input before code.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): map contributor email for attribution check

* test(kanban-dashboard): pin enriched 409 detail and inline error wiring (#26744)

  - Existing ``test_patch_drag_drop_move_todo_to_ready`` now asserts the enriched 409 detail names the blocking parent (id, quoted title, and current status), so the dashboard always has something actionable to render.
  - New bundle-assertion test ``test_dashboard_surfaces_ready_blocked_error_inline`` pins the frontend wiring: the ``parseApiErrorMessage`` helper exists, the drag/drop banner runs through it, and the drawer maintains a visible ``patchErr`` state that's cleared between PATCHes and tasks.
* docs(codex_app_server): document multi-root Kanban writable_roots (#27941)

  Update the Codex app-server runtime guide's Kanban section to reflect the new behaviour:
  * The sandbox override now adds the board DB directory plus every Kanban path the dispatcher pinned (HERMES_KANBAN_WORKSPACES_ROOT, HERMES_KANBAN_WORKSPACE, legacy HERMES_KANBAN_ROOT) -- deduplicated, DB-dir first.
  * The motivation note now includes the cross-mount artifact-write scenario (e.g. ``/media/.../kanban-workspaces/...`` on a separate drive) and links to issue #27941 so readers can find the original bug report.

* fix(gateway): quiet corrupt kanban dispatcher boards

  Salvages substantive part of #26490 by @aqilaziz. Detects corrupt board DBs ("file is not a database" / "database disk image is malformed") and disables them by fingerprint until they're repaired, instead of flooding the gateway log with repeated logger.exception tracebacks every tick.

  Cherry-picked the substantive commit (ea5b4ec2a); the tip commit was an unrelated _is_dir OSError fix for service-path lookup. Dropped a small test reformat that was bundled in the same commit.

* docs: align kanban readiness docs and smoke tests

  Salvages #28199 by @bensargotest-sys. Aligns Kanban docs with current tool registration: dispatcher-spawned task workers get task tools, profiles that explicitly enable the kanban toolset get orchestrator routing tools (kanban_list, kanban_unblock). Corrects failure-limit text to current default of 2. Hardens the e2e subprocess script to resolve repo root and use the spawnable default assignee. Updates the diagnostics severity fixture to assert error below the critical threshold.

* feat(kanban): surface per-task model_override in show + tool output

  Salvages #26897 by @loicnico96. The per-task model_override DB column already exists on main, but it wasn't exposed in user-facing surfaces.
  This adds:
  - 'kanban show' prints 'model: <name>' when model_override is set
  - kanban_show / kanban_list tool responses include the model_override field

  Original branch was stale (PR was authored against an older field name 'model'); applied the substantive surface exposure manually using the current 'model_override' field name.

* feat(cli): add kanban swarm topology helper

  Salvages #26791 by @Niraven. Adds 'hermes kanban swarm' to create a durable Kanban Swarm v1 graph: a completed root/blackboard card, parallel worker cards, a verifier gated on all workers, and a synthesizer gated on the verifier. Stores shared swarm blackboard updates as structured JSON comments on the root card.

  Self-contained: new hermes_cli/kanban_swarm.py module + CLI wiring + unit tests.

* feat(kanban): add optional board parameter to all MCP tools

  Salvages #27598 by @nnnet. Adds optional 'board' parameter to all 9 kanban_* MCP tools via shared _connect helper. Backwards compatible — omitting board keeps current pinned-board behavior. Useful for orchestrator profiles that route across multiple boards.

  Two-file scope: tools/kanban_tools.py + tests.

* feat(kanban): stamp originating ACP session_id on tasks

  Salvages #23208 by @awizemann. Tracks which chat session created a kanban task so clients can render a per-session board without falling back to tenant + time-window heuristics.
  - Schema: tasks gains nullable session_id TEXT column with index (additive migration in _migrate_add_optional_columns).
  - ACP: server.py exposes the originating session id via HERMES_SESSION_ID with save/restore around the agent loop.
  - Tool: kanban_create reads HERMES_SESSION_ID (with explicit override).
  - CLI: 'hermes kanban list --session <id>' filter; JSON output exposes session_id.

* feat(kanban): wire dispatcher to dispatch review agents from review column

  Salvages #23772 by @thewillhuang.
  Adds 'review' as a valid kanban task status and extends dispatch_once to monitor the review column as a second dispatch source (in addition to the existing ready column).
  - Adds 'review' to VALID_STATUSES
  - Adds claim_review_task() — atomically transitions review → running
  - Adds has_spawnable_review() — health telemetry mirror
  - Extends dispatch_once with a review column dispatch loop
  - Review agents get 'sdlc-review' skill auto-loaded

  Resolved 2 conflicts (VALID_STATUSES merge with main's 'scheduled' state, test file additions). Adapted claim_review_task to main's ttl_seconds: Optional[int] = None convention (matches claim_task).

* feat(kanban): stale detection for running tasks in dispatcher

  Salvages #23790 by @thewillhuang. Adds detect_stale_running() to the dispatcher cycle. Running tasks that have been started for longer than dispatch_stale_timeout_seconds (default 14400 = 4h) without a heartbeat in the last hour are auto-reclaimed to ready.
  - New config kanban.dispatch_stale_timeout_seconds (default 14400, 0 disables)
  - New 'stale' field on DispatchResult
  - detect_stale_running() in kanban_db.py with heartbeat freshness check
  - Records outcome='stale' on run close + 'stale' event; ticks failure counter
  - Wires config through gateway embedded dispatcher
  - Updates _cmd_dispatch verbose/JSON output and daemon logging

  Resolved test-file end-of-file conflict by appending both halves.

* feat(kanban): filter tasks by workflow fields and runs by status/outcome

  Salvages #26745 by @nehaaprasaad. Exposes filtering for the existing workflow_template_id and current_step_key columns:
  - list_tasks() accepts workflow_template_id and current_step_key kwargs
  - 'hermes kanban list' adds matching CLI flags
  - dashboard plugin_api also exposes the filters

  Resolved a small conflict in list_tasks signature alongside main's session_id and order_by additions; combined all three into the single filter list.
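The stale-detection rule from the commit above (started longer ago than the timeout, and no heartbeat within the last hour) can be written as a small predicate. This is a sketch of the rule as stated in the commit message, not the detect_stale_running() implementation: `is_stale` and its parameter names are hypothetical, and the one-hour heartbeat window is taken from the text and assumed to be fixed.

```python
from typing import Optional


def is_stale(
    started_at: float,
    last_heartbeat_at: Optional[float],
    now: float,
    stale_timeout_seconds: int = 14400,   # default from the commit text (4h)
    heartbeat_window_seconds: int = 3600,  # "in the last hour", assumed constant
) -> bool:
    """Decide whether a running task should be reclaimed to ready."""
    if stale_timeout_seconds <= 0:
        # The commit text says a value of 0 disables stale detection.
        return False
    if now - started_at <= stale_timeout_seconds:
        # Not running long enough to be considered stale.
        return False
    if last_heartbeat_at is not None and now - last_heartbeat_at < heartbeat_window_seconds:
        # A fresh heartbeat means the worker is still alive.
        return False
    return True


long_running_silent = is_stale(started_at=0, last_heartbeat_at=None, now=20000)
long_running_alive = is_stale(started_at=0, last_heartbeat_at=19000, now=20000)
detection_disabled = is_stale(0, None, 20000, stale_timeout_seconds=0)
```

Keeping the heartbeat check separate from the runtime check means a legitimately long task is not reclaimed as long as it keeps reporting in.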
* feat(kanban): add respawn guard to block repeat worker storms

  Salvages #27484 by @fardoche6. Adds a respawn guard that skips worker spawn for tasks where:
  - a recent run already succeeded (recent_success — within guard window)
  - the previous run hit a quota/auth error (blocker_auth, also auto-blocks)
  - a recent task comment includes a GitHub PR URL (active_pr)

  The guard prevents repeat worker storms on the same bug/task. Includes the contributor's review-findings fixup (regex hardening, observability, auth coverage).

  Resolved a small DispatchResult conflict alongside main's 'stale' field; kept both. Authorship preserved via rebase merge.

* feat(kanban): show dashboard cron jobs across profiles

  Salvages #27568 by @SerenityTn. Dashboard cron page now lists cron jobs from all profiles, with profile-aware filter UI and storage routing. Includes test coverage for cross-profile listing, mutation, deletion, and validation.

  Also fixes orphan conflict markers in config.py left by an earlier salvage merge (kanban.dispatch_stale_timeout_seconds was double-nested in HEAD/PR markers from #28452 salvage of #23790).

* fix(kanban): remove orphan conflict markers from config.py (#28458)

  PR #28452 (salvage of #23790, stale detection) merged with leftover git conflict markers in hermes_cli/config.py around the `dispatch_stale_timeout_seconds` config block, breaking config import and any code path that loads it.

  Cleans up the markers and keeps both config blocks (worker log rotation/orchestrator + stale detection). Resolves a self-introduced regression.
* fix(kanban): remove orphan conflict markers from kanban.py (#28459)

  PR #28454 (salvage of #26745, workflow filter) merged with leftover git conflict markers in hermes_cli/kanban.py at three sites:
  - _task_to_dict() (session_id alongside workflow_template_id/current_step_key)
  - p_list parser (--sort alongside --workflow-template-id/--step-key)
  - _cmd_list (order_by alongside the new filter kwargs)

  Cleans up the markers and keeps both halves at each site. Resolves a self-introduced regression.

* feat(kanban): configure worktree paths and branches

  Salvages #26496 by @aqilaziz. Adds branch_name column + CLI flag so tasks with workspace_kind='worktree' can pin a target branch on create. Schema migration added to _migrate_add_optional_columns.
  - Task.branch_name field + DB column + migration
  - create_task accepts branch_name kwarg
  - hermes kanban create --branch <name> flag
  - kanban show output includes 'Branch: <name>' when set

  Cherry-picked the substantive commit (a7558cf27); the PR's tip was an unrelated service-path-dirs commit. Resolved 2 INSERT-column-list and show-output conflicts alongside main's session_id and max_runtime_seconds additions; kept all three.

* feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373)

  Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that group several skills under one slash command. Invoking /<bundle-name> from any surface (CLI, TUI, dashboard, any gateway platform) loads every referenced skill into a single combined user message.

  Use cases:
  - /backend-dev → loads github-code-review + test-driven-development + github-pr-workflow as one bundle.
  - /research → loads several research skills together.
  - Team task profiles shared via dotfiles.

  Behavior:
  - Bundles take precedence over individual skills when slugs collide.
  - Missing skills are skipped with a note, not fatal.
  - No system-prompt mutation — bundles generate a fresh user message at invocation time, the same way /<skill> does. Prompt cache stays intact.
  - Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI), /help display.

  Schema (~/.hermes/skill-bundles/<slug>.yaml):

      name: backend-dev
      description: Backend feature work.
      skills:
        - github-code-review
        - test-driven-developme…

…ousResearch#28328)

Pre-stages AUTHOR_MAP entries for 10 new contributors whose PRs are being salvaged in the May 2026 low-hanging-fruit batch (group 8). Lands ahead of the per-PR salvage PRs so they don't get blocked by AUTHOR_MAP CI.

Contributors:
- AceWattGit (NousResearch#28159 — _pool_may_recover_from_rate_limit NameError)
- YuanHanzhong (NousResearch#28032 — x.com/status fallbacks link-like)
- colin-chang (NousResearch#28245, NousResearch#28249, NousResearch#28251 — gateway + mattermost fixes)
- felix-windsor (NousResearch#28019 — preserve cron asterisks in strip mode)
- houenyang-momo (NousResearch#28205 — charizard completion menu contrast)
- iqdoctor (NousResearch#28095 — windows installer docs)
- joe102084 (NousResearch#28151 — whitespace-only cron responses)
- jvinals (NousResearch#27936 — Slack U-IDs → DM channel)
- maxmilian (NousResearch#28267 — ModelPickerDialog portal)
- samggggflynn (NousResearch#27952 — dingtalk pre_start)

Per references/batch-pr-salvage-may14-additions.md.

fix: handle whitespace-only cron responses

7781aa5
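The behaviour this commit targets, as described in the PR summary, can be sketched as a small classification step. This is a hedged illustration and not the code in cron/scheduler.py: `classify_final_response` and its return shape are hypothetical, and the sketch deliberately ignores any existing silent-delivery marker handling, which the PR says is preserved.

```python
from typing import Optional, Tuple


def classify_final_response(final_response: Optional[str]) -> Tuple[bool, bool]:
    """Return (should_deliver, soft_failure) for a cron run's final response.

    Hypothetical helper: whitespace-only text is treated like an empty
    response, so nothing is delivered and the run is flagged as a soft failure.
    """
    text = (final_response or "").strip()
    if not text:
        # Empty, None, or whitespace-only: suppress delivery, keep it visible.
        return (False, True)
    return (True, False)


blank = classify_final_response("  \n\t ")
missing = classify_final_response(None)
normal = classify_final_response("Daily report: all checks passed.")
```

Normalizing with `strip()` before the emptiness test is what makes `"  \n\t "` and `""` take the same path, so users never receive a blank message while the job still shows up as failing in cron status.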

BoardJames-Bot approved these changes May 18, 2026

alt-glitch added the type/bug (Something isn't working), P2 (Medium — degraded but workaround exists), and comp/cron (Cron scheduler and job management) labels May 18, 2026

teknium1 mentioned this pull request May 19, 2026

chore(release): AUTHOR_MAP entries for batch salvage group 8 #28328

Merged

teknium1 mentioned this pull request May 19, 2026

fix(cron): handle whitespace-only responses (#28151) #28352

Merged

teknium1 closed this May 19, 2026

fix: handle whitespace-only cron responses #28151
joe102084 wants to merge 1 commit into NousResearch:main from joe102084:fix/cron-whitespace-empty-response
4 participants