some cleanups by teknium1 · Pull Request #7 · NousResearch/hermes-agent

teknium1 · 2025-11-05T03:48:29Z

No description provided.

1. browser_tool.py: Replace **args spread on browser_click, browser_type, and browser_scroll handlers with explicit parameter extraction. The **args pattern passed all dict keys as keyword arguments, causing TypeError if the LLM sent unexpected parameters. Now extracts only the expected params (ref, text, direction) with safe defaults.
2. fuzzy_match.py: Update module docstring to match actual strategy order in code. Block anchor was listed as #3 but is actually #7. Multi-occurrence is not a separate strategy but a flag. Updated count from 9 to 8.
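The difference between the two dispatch styles in item 1 can be sketched as follows. This is a minimal illustration, not the code from browser_tool.py: `browser_click` here is a stand-in handler, and `dispatch_spread` / `dispatch_explicit` are hypothetical names.

```python
from typing import Any


def browser_click(ref: str) -> str:
    """Stand-in handler (hypothetical); accepts only the expected parameter."""
    return f"clicked {ref}"


def dispatch_spread(args: dict[str, Any]) -> str:
    # Old pattern: every key the model sent becomes a keyword argument,
    # so an unexpected key raises TypeError inside the call.
    return browser_click(**args)


def dispatch_explicit(args: dict[str, Any]) -> str:
    # New pattern: extract only the expected key, with a safe default.
    return browser_click(ref=args.get("ref", ""))
```

With `{"ref": "e1", "unexpected": True}` as input, the explicit variant ignores the extra key while the spread variant raises TypeError.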

Mobile responsive (Issue NousResearch#21):
- Hamburger sidebar: slide-in overlay on mobile (<640px) with backdrop. Tap hamburger in topbar to open, tap outside to close. Full session list, project chips, all panel content accessible.
- Bottom navigation bar: 5-tab fixed bar (Chat, Tasks, Skills, Memory, Spaces) replaces sidebar nav tabs on mobile. iOS-style layout. Tapping a tab opens the sidebar overlay with that panel active.
- Right panel slide-over: Files button in topbar chips opens workspace panel as a slide-over from the right on mobile/tablet.
- Touch targets: all interactive elements get min 44x44px touch areas. Session items, approval buttons, composer buttons all sized for fingers.
- Composer positioned above bottom nav bar with proper spacing.
- Sidebar nav tabs and bottom section hidden on mobile (replaced by bottom nav + topbar chips).
- Clicking a session auto-closes the sidebar overlay.
- Desktop layout completely unchanged: all mobile elements are display:none by default, only shown inside @media(max-width:640px).

Docker (Issue NousResearch#7):
- Dockerfile: python:3.12-slim, HERMES_WEBUI_HOST=0.0.0.0, port 8787.
- docker-compose.yml: named volume for state persistence, optional ~/.hermes mount for agent features, password env var documented.
- README: Docker quick start section with compose and manual commands.

Tests: 392 passed, 23 pre-existing failures, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Sprint 21: Mobile responsive layout + Docker support (Issues NousResearch#21, NousResearch#7)

Phase4.2 pinned models

- Add message input bar at bottom of AgentOutputPanel
- Send messages via /api/sessions/send to agent sessions
- User messages appear immediately in output feed
- Agent resumes from idle state on message send
- Feed event logged for sent messages
- Both compact and full-size panel instances wired

#3 Cost Analytics Dashboard
- New Analytics tab with summary cards (total tokens, cost, avg/mission, today, week)
- CSS bar charts: cost by agent, cost by model, daily timeline (7d)
- No external chart libraries; pure Tailwind

#4 Export Mission as Markdown
- Download .md file with full mission report (goal, team, transcript, artifacts)
- Copy to clipboard button with visual feedback
- Wired into Mission Detail Overlay

#5 Word-by-word Streaming in Agent Chat
- Replaced polling with SSE EventSource in AgentChatPanel
- Real-time chunk streaming with fallback to polling on error
- Streaming assistant message updates in-place

#6 Remote Agents Panel
- Fetches external sessions from gateway /api/sessions
- Filters out local agent sessions; shows only remote/external
- Auto-polls every 15s, card layout with status, model, tokens, cost
- Open Chat links to ClawSuite chat tab

#7 Real-time Collaboration (Presence)
- BroadcastChannel-based cross-tab presence detection
- Shows colored avatars of other users viewing Agent Hub
- Heartbeat every 5s, stale cleanup at 30s
- Shows which tab each peer is viewing

- CollaborationPresence component using BroadcastChannel API
- Per-tab userId + random color via sessionStorage
- 3s heartbeats, 10s stale timeout, localStorage persistence
- Colored avatar circles with overflow count
- Replaces old PresenceIndicator in tab nav header

…xec, not only at spawn' (NousResearch#7) from feat/docker-env-files-per-exec into main

…adiness PR NousResearch#7 shipped `docker_env_files` per-exec re-read as a tactical fix (rotating BW_SESSION on oikos). This refactors it into a shape that's review-ready for an upstream proposal, addressing every gap a careful reviewer would flag, while preserving full backwards compatibility for the existing oikos config. Changes: - Extract the file-reading loop in `_DockerEnvironment.execute()` into a new instance hook `_extra_env_for_exec(self) -> dict[str, str]`. Default implementation re-reads `self._env_files`. Subclasses and sibling subsystems (the credential registry being the canonical example) can override to inject any other dynamic env per exec without re-touching `execute()`. The exec path now does `exec_env.update(self._extra_env_for_exec())` instead of an inline loop. Hook exceptions are non-fatal — logged and the exec proceeds with whatever static env is available. - Move parsing into a `_parse_env_files` classmethod that: * canonicalizes each path via `Path.resolve()` once at __init__ time, so subsequent symlink swaps cannot redirect reads mid-task (TOCTOU defense — mirrors the existing pattern in `tools/credential_files.py:84-95`), * validates the resolved path is inside an allowlist of safe parent directories (default: /run/hermes-creds, /run/secrets, $XDG_RUNTIME_DIR, $HERMES_HOME), overridable via TERMINAL_DOCKER_ENV_FILES_ALLOWED_DIRS env var (colon-separated; empty disables the check entirely), * accepts paths that don't exist yet — sidecars may write the file after hermes-agent startup, * rejects empty var names and empty paths with a clear log line. 
- Add `_read_env_file_value(var_name, file_path)` classmethod that: * caps reads at 64 KiB (`_ENV_FILES_MAX_SIZE`) — files larger than that fail with a clear log line instead of confusing E2BIG when `execve` runs, * trims exactly one trailing `\n` or `\r\n` (the common `echo $value > file` shape) — NOT `.strip()`, which would corrupt PEM bodies, JSON blobs, and any value with significant leading whitespace, * decodes UTF-8 and rejects non-UTF-8 binary content with a clear log line. - 17 new tests in `tests/tools/test_docker_environment.py` covering: valid entry; invalid format; empty var name; symlink resolution; path doesn't have to exist; allowlist rejects outside paths; allowlist accepts inside paths; trailing newline trim; CRLF trim; leading-whitespace preservation (PEM body case); JSON-blob leading-whitespace preservation; size limit rejection; size limit boundary acceptance; missing file → None + warning; non-UTF-8 → None + warning; per-exec re-read (rotation propagates); failed entries skipped without breaking good ones. - Document `terminal.docker_env_files` in `cli-config.yaml.example` with a worked Bitwarden-sidecar example, the security model (canonicalization, allowlist, size cap, no inspect leakage), and the broader use cases (Vault, OIDC, AWS credential helpers). Behavior preserved for the existing oikos config: `BW_SESSION:/run/hermes-creds/bw-session` still resolves correctly (/run/hermes-creds is in the default allowlist), still re-reads on every exec, still injects per-exec via `-e BW_SESSION=...`. The subprocess that reads the file is unchanged. Net effect for oikos is zero behavior change; the upstream-readiness machinery sits underneath. Pre-existing test failures unrelated to this branch: test_execute_uses_hermes_dotenv_for_allowlisted_env, test_execute_prefers_shell_env_over_hermes_dotenv, test_auto_mount_host_cwd_adds_volume — all failing on `main` before this branch, all unrelated to env_files. Tracked separately.
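The read rules described for `_read_env_file_value` and the allowlist check (64 KiB cap, exactly one trailing newline trimmed, UTF-8 only, empty allowlist disables the check) can be sketched roughly as below. This is an illustration under assumptions, not the branch's code: the function names, signatures, and log messages are hypothetical.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)

MAX_SIZE = 64 * 1024  # 64 KiB cap, mirroring the _ENV_FILES_MAX_SIZE described above


def is_allowed(path: Path, allowed_dirs: list[Path]) -> bool:
    """True if the resolved path sits under an allowed parent directory.

    An empty allowlist disables the check, as the commit message describes.
    """
    if not allowed_dirs:
        return True
    resolved = path.resolve()
    return any(resolved.is_relative_to(d.resolve()) for d in allowed_dirs)


def read_env_file_value(var_name: str, file_path: Path) -> Optional[str]:
    """Read one env value from a file, or return None with a warning."""
    try:
        with open(file_path, "rb") as fh:
            raw = fh.read(MAX_SIZE + 1)  # one extra byte detects oversize files
    except OSError as exc:
        logger.warning("env file for %s unreadable: %s", var_name, exc)
        return None
    if len(raw) > MAX_SIZE:
        logger.warning("env file for %s exceeds %d bytes", var_name, MAX_SIZE)
        return None
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        logger.warning("env file for %s is not valid UTF-8", var_name)
        return None
    # Trim exactly one trailing newline; leading whitespace and any
    # further newlines are part of the value and must survive.
    if text.endswith("\r\n"):
        return text[:-2]
    if text.endswith("\n"):
        return text[:-1]
    return text
```

Because the trim is a suffix check and not `.strip()`, a value such as `"  secret\n"` comes back as `"  secret"` with its leading spaces intact.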

Discovered while verifying PR NousResearch#8 on the live oikos sidecar: BW_SESSION values were appearing UNMASKED in /opt/data/logs/agent.log because the existing masking heuristic only matches env names whose UPPERCASE form contains TOKEN / KEY / SECRET / PASSWORD / CREDENTIAL / PASSWD, and "BW_SESSION" matches none of those. The full rotating session token was sitting in plaintext in a long-lived log file, defeating the entire point of rotating it.

Two-pronged fix to make this hard to mis-name in the future:

1. **Origin-based masking (the load-bearing rule).** Anything that came from `_extra_env_for_exec()` is masked unconditionally, regardless of variable name. By construction the hook produces dynamic credential values (otherwise why inject them per exec rather than at container spawn?). This catches the gitea PR NousResearch#7 / NousResearch#8 `docker_env_files` path AND any future credentials++ registry subscribers, without relying on operators to pick "secret-sounding" variable names.

2. **Expanded name heuristic** for static env vars whose values come from `self._env` / forward_env / passthrough. Adds: SESSION, AUTH, COOKIE, JWT, BEARER, SIGNATURE, PIN, PASSPHRASE, PRIVATE. False-positive cost (over-masking a log line) is much lower than false-negative cost (a real credential in plaintext), so the policy is "add aggressively rather than reluctantly."

4 new tests in tests/tools/test_docker_environment.py:
- SESSION-named env masked (regression for the BW_SESSION case)
- AUTH/COOKIE/JWT/BEARER/PASSPHRASE values masked
- dynamic-origin values masked even with innocuous names like INNOCENT_VAR (origin rule overrides name heuristic absence)
- readability check: PORT/DEBUG/HOME etc are NOT over-masked

Verified live on the running hermes-angelos build: agent.log now shows `-e BW_SESSION=***` instead of the full session token.
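The two masking rules can be sketched together as below. This is a simplified illustration, assuming the log line is built from `-e NAME=value` fragments as the verification output suggests; `mask_env_for_log` and `SENSITIVE_MARKERS` are hypothetical names, not the identifiers used in the branch.

```python
# Substrings checked against the uppercased variable name: the original six
# plus the nine additions listed in the commit message.
SENSITIVE_MARKERS = (
    "TOKEN", "KEY", "SECRET", "PASSWORD", "CREDENTIAL", "PASSWD",
    "SESSION", "AUTH", "COOKIE", "JWT", "BEARER", "SIGNATURE", "PIN",
    "PASSPHRASE", "PRIVATE",
)


def mask_env_for_log(static_env: dict[str, str],
                     dynamic_env: dict[str, str]) -> list[str]:
    """Return loggable `-e NAME=value` fragments with secrets replaced."""
    parts = []
    for name, value in static_env.items():
        # Name heuristic: applies only to statically configured variables.
        upper = name.upper()
        shown = "***" if any(m in upper for m in SENSITIVE_MARKERS) else value
        parts.append(f"-e {name}={shown}")
    for name in dynamic_env:
        # Origin rule: values from the per-exec hook are always masked,
        # whatever the variable is called.
        parts.append(f"-e {name}=***")
    return parts
```

Since the heuristic is a substring match, a name such as SPINNER would also be masked via PIN, which is consistent with the stated preference for over-masking.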

…t, stop/undo honesty, json_error crash, codex validation, deep-link race

Bug #1: ChatPage loadSession reads res.items (not res.transcript) to match backend
Bug NousResearch#2: Add GET /api/gui/session-search backed by SessionDB.search_messages (FTS5)
Bug NousResearch#3: Stop button now checks res.supported before claiming run was stopped
Bug NousResearch#4: Undo button now checks res.supported before removing messages locally
Bug NousResearch#5: Fix _json_error positional calls in handle_chat_compress (was crashing 500)
Bug NousResearch#6: Codex provider validation now also guards switching TO openai-codex
Bug NousResearch#7: Deep-link hash check runs before health callback to prevent race condition

Security:
- NousResearch#1: secret is now required (ValueError if empty), server won't start without it
- NousResearch#2: error responses return generic 'Delivery failed', no internal state leak
- NousResearch#3: rate limiting: 1 msg per chat_id per 30s, returns 429 + retry_after
- NousResearch#8: logging uses %-style formatting, not f-strings

Functionality:
- NousResearch#4: uses gateway.adapters dict directly (verified interface exists)
- NousResearch#5: thread_id support via metadata dict (matches DeliveryRouter pattern)
- NousResearch#6: uses adapter.send(chat_id, content, metadata=), the real BasePlatformAdapter interface

Code quality:
- NousResearch#7: added tests/test_notify.py with 14 test cases covering all 8 issues
- Platform string → Platform enum conversion with validation
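The per-chat rate limit (1 message per chat_id per 30s, with a retry_after value for the 429 response) might look roughly like this. It is a sketch only: `ChatRateLimiter` and its injectable clock are assumptions made for illustration and testability, not the notify server's actual implementation.

```python
import time
from typing import Callable, Optional


class ChatRateLimiter:
    """Allow one message per chat_id per interval (hypothetical helper)."""

    def __init__(self, interval: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._interval = interval
        self._clock = clock
        self._last_sent: dict[str, float] = {}

    def check(self, chat_id: str) -> Optional[float]:
        """Return None if the message may go out, else seconds to wait."""
        now = self._clock()
        last = self._last_sent.get(chat_id)
        if last is not None and now - last < self._interval:
            # Caller would answer HTTP 429 and report this as retry_after.
            return self._interval - (now - last)
        self._last_sent[chat_id] = now
        return None
```

A rejected call does not update the stored timestamp, so repeated retries do not push the window further out.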

Completes the skill scaffolded in the previous commit (Day 3-5 of the staged plan). Added ----- * scripts/sandbox.py — subprocess sandbox for code candidates: - POSIX rlimit caps (CPU, AS, DATA, CORE) applied as best-effort in a preexec_fn; each limit failure is swallowed independently so macOS's RLIMIT_AS-incompatibility doesn't break the others. - Wall-clock timeout via subprocess.run(timeout=...) as a backstop. - run_candidate_code() and run_pytest_suite() helpers; the latter parses the pytest terse summary into a pass fraction. * scripts/adapters.py — Tier 2 + Tier 3 wrappers: - ExternalEvolverAdapter: lazy shutil.which detection, raises AdapterUnavailable with install hint instead of crashing. - openevolve_adapter (Apache 2.0, default Tier 2 recommendation). - darwinian_evolver_adapter (Imbue, AGPL v3) — subprocess only, never imported, so license-viral code never enters the Hermes process. - export_dspy_jsonl: DSPy-compatible offline records with full lineage; default keeps one winner per generation, --all emits every candidate. - export_gepa_trace: reflective-operator edges (critique_then_edit, meta_mutator) in the shape GEPA's trainer expects. * templates/*.py — five copy-paste-ready fitness templates for prompt, regex, SQL, code (uses sandbox), and multi-objective (NSGA-II) runs. * demos/summarize_10_words/ — end-to-end packaged demo: - fitness.py: deterministic scoring (word count proximity, brevity keyword, char budget), so the demo runs cheaply without an LLM judge and the improvement curve is visible on a small local model. - seed/initial.txt, README.md with exact commands and expected trajectory. Changed ------- * scripts/evolver.py: cmd_export now delegates to adapters.export_*; added --all flag for dspy-jsonl exports; imports adapters module. 
Tests (tests/skills/test_darwinian_evolver.py) ---------------------------------------------- Expanded from 26 to 39 cases — all green: * TestSandbox — simple candidate runs, syntax error fails cleanly, runaway while-True loop killed within wall-clock+overhead, pytest terse-summary parser. * TestAdapterGracefulAbsence — openevolve + darwinian-evolver adapters raise AdapterUnavailable with license-informed install hint when the binary is missing (monkeypatched shutil.which). * TestDspyBridge — default export keeps one record per generation; --all mode emits every candidate; GEPA export filters to reflective operators and preserves parent/child metadata. * TestLLMClient — seed propagation verified by intercepting the AsyncClient.post body; BudgetLedger records spend across calls and raises BudgetExceeded when cap is crossed. * TestMapElitesCoverage — random 200-sample run fills ≥60 % of a 4×4 descriptor grid (acceptance checklist NousResearch#4). * TestEndToEnd — single generation with a fully-mocked LLM produces an offspring strictly better than the seed and yields a stable lineage hash (acceptance checklist NousResearch#7, plus determinism smoke). Acceptance checklist status: 10/10 covered. Repo-level notes ---------------- The original examples/ subdirectory is renamed demos/ because the repository-level .gitignore lists ``examples/``; keeping the demo as ``demos/`` means it ships to users who install the skill without requiring an exception in .gitignore.
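The sandbox approach described above (best-effort rlimits in a preexec_fn, each failure swallowed independently, plus a wall-clock timeout as backstop) can be sketched as follows. This is a POSIX-only illustration, not scripts/sandbox.py: the limit values and the name `run_candidate` are assumptions.

```python
import resource
import subprocess
import sys


def _apply_limits() -> None:
    """Runs in the child before exec; every limit is best-effort."""
    limits = [
        (resource.RLIMIT_CPU, 5),         # CPU seconds (illustrative value)
        (resource.RLIMIT_AS, 1 << 30),    # address space, 1 GiB
        (resource.RLIMIT_DATA, 1 << 30),  # data segment, 1 GiB
        (resource.RLIMIT_CORE, 0),        # no core dumps
    ]
    for res, value in limits:
        try:
            resource.setrlimit(res, (value, value))
        except (ValueError, OSError):
            # One unsupported limit (e.g. RLIMIT_AS on macOS) must not
            # prevent the remaining limits from being applied.
            pass


def run_candidate(code: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Run candidate code in a subprocess; return (succeeded, stdout)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,          # wall-clock backstop
            preexec_fn=_apply_limits,
        )
    except subprocess.TimeoutExpired:
        return False, "timeout"
    return proc.returncode == 0, proc.stdout
```

A runaway `while True: pass` candidate is stopped by whichever fires first, the CPU limit or the wall-clock timeout.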

Follow-up on #13724: showing literally every source was too noisy.

The handler now fetches a wider window (larger limit) and then filters to a curated allowlist of human-facing sources (tui/cli plus chat adapters like telegram/discord/slack/whatsapp/etc). This keeps row #7 fixed (telegram sessions visible in /resume) without surfacing internal source kinds such as tool/acp.

* fix(tui): raise picker selection contrast with inverse + bold Selected rows in the model/session/skills pickers and approval/clarify prompts only changed from dim gray to cornsilk, which reads as low contrast on lighter themes and LCDs (reported during TUI v2 blitz). Switch the selected row to `inverse bold` with the brand accent color across modelPicker, sessionPicker, skillsHub, and prompts so the highlight is terminal-portable and unambiguous. Unselected rows stay dim. Also extends the sessionPicker middle meta column (which was always dim) to inherit the row's selection state. * fix(model-switch): drop stale provider from fallback chain and env after /model Reported during the TUI v2 blitz test: switching from openrouter to anthropic via `/model <name> --provider anthropic` appeared to succeed, but the next turn kept hitting openrouter — the provider the user was deliberately moving away from. Two gaps caused this: 1. `Agent.switch_model` reset `_fallback_activated` / `_fallback_index` but left `_fallback_chain` intact. The chain was seeded from `fallback_providers:` at agent init for the *original* primary, so when the new primary returned 401 (invalid/expired Anthropic key), `_try_activate_fallback()` picked the old provider back up without informing the user. Prune entries matching either the old primary (user is moving away) or the new primary (redundant) whenever the primary provider actually changes. 2. `_apply_model_switch` persisted `HERMES_MODEL` but never updated `HERMES_INFERENCE_PROVIDER`. Any ambient re-resolution of the runtime (credential pool refresh, compressor rebuild, aux clients) falls through to that env var in `resolve_requested_provider`, so it kept reporting the original provider even after an in-memory switch. Adds three regression tests: fallback-chain prune on primary change, no-op on same-provider model swap, and env-var sync on explicit switch. 
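The fallback-chain pruning rule from the model-switch fix above can be sketched as below. The chain entry shape (dicts with a "provider" key) and the function name are assumptions for illustration; the real `_fallback_chain` may be structured differently.

```python
def prune_fallback_chain(chain: list[dict], old_primary: str,
                         new_primary: str) -> list[dict]:
    """Drop stale entries when the primary provider actually changes."""
    if old_primary == new_primary:
        # Same-provider model swap: leave the chain untouched.
        return list(chain)
    # Remove the provider being left (user is moving away from it)
    # and the new primary (redundant as its own fallback).
    stale = {old_primary, new_primary}
    return [entry for entry in chain if entry.get("provider") not in stale]
```

After a switch from openrouter to anthropic, a 401 from the new primary can then only fall back to providers other than those two.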
* fix(tui): @folder: only yields directories, @file: only yields files Reported during TUI v2 blitz testing: typing `@folder:` in the composer pulled up .dockerignore, .env, .gitignore, and every other file in the cwd alongside the actual directories. The completion loop yielded every entry regardless of the explicit prefix and auto-rewrote each completion to @file: vs @folder: based on is_dir — defeating the user's choice. Also fixed a pre-existing adjacent bug: a bare `@file:` or `@folder:` (no path) used expanded=="." as both search_dir AND match_prefix, filtering the list to dotfiles only. When expanded is empty or ".", search in cwd with no prefix filter. - want_dir = prefix == "@folder:" drives an explicit is_dir filter - preserve the typed prefix in completion text instead of rewriting - three regression tests cover: folder-only, file-only, and the bare- prefix case where completions keep the `@folder:` prefix * fix(tui): truncate long picker rows so the height stays stable A6 added a fixed-height grid (Array.from({length: VISIBLE})), but the row <Text> itself had no wrap prop so Ink defaulted to wrap="wrap". A sufficiently long model or provider name would wrap to a second visual line and bounce the overall picker height right back — which is exactly what reappeared during the TUI v2 blitz retest on /model. Pin every picker row (and the empty-state / padding rows) to wrap="truncate-end" so each slot is guaranteed one line. Applies across modelPicker, sessionPicker, and skillsHub. * fix(tui): stabilize slash-completion dropdown height The completion popup (e.g. typing `/model`) grew from 8 rows at compIdx=0 up to 16 rows at compIdx≥8 — the slice end was `compIdx + 8` so every arrow-down added another rendered row until the window filled. Reported during TUI v2 retest: "as i scroll and more options appear, for some reason more options appear and it expands the height". 
Fixed viewport (`COMPLETION_WINDOW = 16`) centered on compIdx, clamped so it never slides past the array bounds. Renders exactly `min(WINDOW, completions.length)` rows every frame. * fix(tui): pager supports scrolling (up/down/page/top/bottom) The pager overlay backing /history, /toolsets, /help and any paged slash output only advanced with Enter/Space and closed at the end. Could not scroll back, scroll line-by-line, or jump to endpoints. Adds Up/Down (↑↓, j/k), PgUp (b), g/G for top/bottom, keeps existing Enter/Space/PgDn forward-and-auto-close, and clamps offset so over-scrolling past the last page is a no-op. * fix(tui): preserve prior segment output on Ctrl+C interrupt interruptTurn only flushed the in-flight streaming chunk (bufRef) to the transcript before calling idle(), which wiped segmentMessages and pendingSegmentTools. Every tool call and commentary line the agent had already emitted in the current turn disappeared the moment the user cancelled, even though that output is exactly what they want to keep when they hit Ctrl+C (quote from the blitz feedback: "everything was fine up until the point where you wanted to push to main"). Append each flushed segment message to the transcript first, then render the in-flight partial with the `*[interrupted]*` marker and its pendingSegmentTools. Sys-level "interrupted" note still fires when there is nothing to preserve. * fix(tui): route skills.manage through the long-handler thread pool `/skills browse` is documented to scan 6 sources and take ~15s, but the gateway dispatched `skills.manage` on the main RPC thread. While it ran, every other inbound RPC — completions, new slash commands, even `approval.respond` — blocked until the HTTP fetches finished, making the whole TUI feel frozen. Reported during TUI v2 retest: "/skills browse blocks everything else". `_LONG_HANDLERS` already exists precisely for this pattern (slash.exec, shell.exec, session.resume, etc. run on `_pool`). 
Add `skills.manage` to that set so browse/search/install run off the dispatcher; the fast `list` / `inspect` actions pay a negligible thread-pool hop. * improve llama.cpp skill * fix(skills/llama-cpp): concise description, restore python bindings, fix curl - Description truncated to 60 chars in system prompt (extract_skill_description), so the 500-char HF workflow description never reached the agent; shortened to 'llama.cpp local GGUF inference + HF Hub model discovery.' (56 chars). - Restore llama-cpp-python section (basic, chat+stream, embeddings, Llama.from_pretrained) and frontmatter dependencies entry. - Fix broken 'Authorization: Bearer ***' curl line (missing closing quote; llama-server doesn't require auth by default). * fix(gateway): always inject reply-to pointer, not just when quoted text is absent (#13676) The [Replying to: "..."] prefix is disambiguation, not deduplication. When a user explicitly replies to a prior message, the agent needs a pointer to which specific message they're referencing — even when the quoted text already exists somewhere in history. History can contain the same or similar text multiple times; without an explicit pointer the agent has to guess (or answer for both subjects), and the reply signal is silently dropped. Example: in a conversation comparing Japan and Italy, replying to the "Japan is great for culture..." message and asking "What's the best time to go?" — previously the found_in_history check suppressed the prefix because the quoted text was already in history, leaving the agent to guess which destination the user meant. Now the pointer is always present. Drops the found_in_history guard added in #1594. Token overhead is minimal (snippet capped at 500 chars on the new user turn; cached prefix unaffected). Behavior becomes deterministic: reply sent ⇒ pointer present. Thanks to smartyi for flagging this. 
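The always-inject behavior of the reply-to fix above ("reply sent ⇒ pointer present") can be sketched like this. Only the `[Replying to: "..."]` shape and the 500-char cap come from the commit message; the function name and the blank line between prefix and message are assumptions.

```python
from typing import Optional


def with_reply_pointer(user_text: str, quoted_text: Optional[str],
                       max_chars: int = 500) -> str:
    """Prefix the user turn with a pointer to the message being replied to."""
    if not quoted_text:
        return user_text  # not a reply: nothing to inject
    snippet = quoted_text[:max_chars]  # cap the quoted snippet
    # No found-in-history check: the pointer is added on every reply.
    return f'[Replying to: "{snippet}"]\n\n{user_text}'
```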
* feat(image-gen): add GPT Image 2 to FAL catalog (#13677) Adds OpenAI's new GPT Image 2 model via FAL.ai, selectable through `hermes tools` → Image Generation. SOTA text rendering (including CJK) and world-aware photorealism. - FAL_MODELS entry with image_size_preset style - 4:3 presets on all aspect ratios — 16:9 (1024x576) falls below GPT-Image-2's 655,360 min-pixel floor and would be rejected - quality pinned to medium (same rule as gpt-image-1.5) for predictable Nous Portal billing - BYOK (openai_api_key) deliberately omitted from supports so all users stay on shared FAL billing - 6 new tests covering preset mapping, quality pinning, and supports-whitelist integrity - Docs table + aspect-ratio map updated Live-tested end-to-end: 39.9s cold request, clean 1024x768 PNG * refactor(delegate): drop dead default_toolsets from CLI default config delegation.default_toolsets was declared in cli.py's CLI_CONFIG default dict and documented in cli-config.yaml.example, but never read: none of tools/delegate_tool.py, _load_config(), or any call site ever looked it up. The live fallback is the DEFAULT_TOOLSETS module constant at tools/delegate_tool.py:101, which stays as-is. hermes_cli/config.py's DEFAULT_CONFIG["delegation"] already omits the key — this commit aligns cli.py with that. Adds a regression test in tests/hermes_cli/test_config_drift.py so a future refactor that re-adds the key without wiring it up to _load_config() fails loudly. Part of Initiative 2 / M0.5. * docs(delegate): remove default_toolsets from example config and docs Matches the default-config removal in the preceding commit. default_toolsets was documented for users to set but was never actually read at runtime, so showing it in the example config and the delegation user guide was misleading. No deprecation note is added: the key was always a no-op, so users who copied it from the example continue to see no behavior change. 
Their config.yaml still parses; the key is just silently unused, same as before. Part of Initiative 2 / M0.5. * test(delegate): make default_toolsets regression test robust to user config The prior form of this test asserted on CLI_CONFIG["delegation"] after importing cli, which only passed by accident of pytest-xdist worker scheduling. cli._hermes_home is frozen at module import time (cli.py:76), before the tests/conftest.py autouse HERMES_HOME-isolation fixture can fire, so CLI_CONFIG ends up populated by deep-merging the contributor's actual ~/.hermes/config.yaml over the defaults (cli.py:359-366). Any contributor (like me) who still has the legacy key set in their own config causes a false failure the moment another test file in the same xdist worker imports cli at module level. Asserting on the source of load_cli_config() instead sidesteps all of that: the test now checks the defaults literal directly and is independent of user config, HERMES_HOME, import order, and worker scheduling. Demonstrated failure mode before this fix: pytest tests/hermes_cli/test_config_drift.py \ tests/hermes_cli/test_skills_hub.py -o addopts="" -> FAILED (CLI_CONFIG["delegation"] contained "default_toolsets" from the user's ~/.hermes/config.yaml) Part of Initiative 2 / M0.5. * feat(gateway): recognize .pdf in MEDIA: tag extraction (#13683) PDFs emitted by tools (report generators, document exporters, etc.) now deliver as native attachments when wrapped in MEDIA: — same as images, audio, and video. Bare .pdf paths are intentionally NOT added to extract_local_files(), so the agent can still reference PDFs in text without auto-sending them. * fix(tui): inject VS16 so text-default emoji render as color glyphs Models frequently emit bare codepoints like U+26A0 (⚠), U+2139 (ℹ), U+2764 (❤), U+2714 (✔), U+2600 (☀), U+263A (☺) which, per Unicode, have Emoji_Presentation=No and render as monochrome text-style glyphs in terminals unless followed by VS16 (U+FE0F). 
Agent output leaked through the TUI like `⚠ careful` instead of `⚠️ careful`. Added `ensureEmojiPresentation` (lib/emoji.ts): scans for the curated set of text-default codepoints and appends VS16 when the next char is not already VS16, ZWJ, or a keycap-enclosing mark. Idempotent and fast-pathed by a Unicode-range regex so ASCII-heavy text is untouched. Applied once at the top of `Md`'s line parse. Hermes-ink's stringWidth already accounts for VS16, so cursor/layout stays correct. * feat(delegate): orchestrator role and configurable spawn depth (default flat) Adds role='leaf'|'orchestrator' to delegate_task. With max_spawn_depth>=2, an orchestrator child retains the 'delegation' toolset and can spawn its own workers; leaf children cannot delegate further (identical to today). Default posture is flat — max_spawn_depth=1 means a depth-0 parent's children land at the depth-1 floor and orchestrator role silently degrades to leaf. Users opt into nested delegation by raising max_spawn_depth to 2 or 3 in config.yaml. Also threads acp_command/acp_args through the main agent loop's delegate dispatch (previously silently dropped in the schema) via a new _dispatch_delegate_task helper, and adds a DelegateEvent enum with legacy-string back-compat for gateway/ACP/CLI progress consumers. Config (hermes_cli/config.py defaults): delegation.max_concurrent_children: 3 # floor-only, no upper cap delegation.max_spawn_depth: 1 # 1=flat (default), 2-3 unlock nested delegation.orchestrator_enabled: true # global kill switch Salvaged from @pefontana's PR #11215. Overrides vs. the original PR: concurrency stays at 3 (PR bumped to 5 + cap 8 — we keep the floor only, no hard ceiling); max_spawn_depth defaults to 1 (PR defaulted to 2 which silently enabled one level of orchestration for every user). 
Co-authored-by: pefontana <fontana.pedro93@gmail.com> * fix(auxiliary): refresh Nous runtime credentials after aux 401s * docs(delegate): clarify that the parent agent, not the user, populates goal/context (#13698) The 'subagents know nothing' warning and the 'no conversation history' constraint both said the user provides the goal/context fields. In practice the LLM parent agent calls delegate_task; the user configures the feature but doesn't write delegation calls. Rewording to point at the parent agent matches how the tool actually works. * fix(vision): resolve Nous vision model correctly in auto-detect path Two changes: 1. _PROVIDER_VISION_MODELS: add 'nous' -> 'xiaomi/mimo-v2-omni' entry so the vision auto-detect chain picks the correct multimodal model. 2. resolve_provider_client: detect when the requested model is a vision model (from _PROVIDER_VISION_MODELS or known vision model names) and pass vision=True to _try_nous(). Previously, _try_nous() was always called without vision=True in resolve_provider_client(), causing it to return the default text model (gemini-3-flash-preview or mimo-v2-pro) instead of the vision-capable mimo-v2-omni. The _try_nous() function already handled free-tier vision correctly, but the resolve_provider_client() path (used by the auto-detect vision chain) never signaled that a vision task was in progress. Verified: xiaomi/mimo-v2-omni returns HTTP 200 with image inputs on Nous inference API. google/gemini-3-flash-preview returns 404 with images. * chore(release): add Ifkellx to AUTHOR_MAP for PR #12687 * fix(security): TUI approval overlay accepts blind keystrokes, CLI thread-local callback invisible to agent Two bugs that allow dangerous commands to execute without informed user consent. TUI (Ink): useInputHandlers consumes the isBlocked return path, but Ink's EventEmitter delivers keystrokes to ALL registered useInput listeners. 
The ApprovalPrompt component receives arrow keys, number keys, and Enter even though the overlay appears frozen. The user sees no visual feedback, but keystrokes are processed — allowing blind approval, session-wide auto-approve (choice "session"), or permanent allowlist writes (choice "always") without the user knowing. Discovered while replicating #13618 (TUI approval overlay freezes terminal). Fix: in useInputHandlers, when overlay.approval/clarify/confirm is active, only intercept Ctrl+C. All other keys pass through. This makes the overlay visually responsive so the user can see what they are selecting. CLI (prompt_toolkit): _callback_tls in terminal_tool.py is threading.local(). set_approval_callback() is called in the main thread during run(), but the agent executes in a background thread. _get_approval_callback() returns None in the agent thread, falling back to stdin input() which prompt_toolkit blocks. The user sees the approval text but cannot respond — the terminal is unusable until the 60s timeout expires with a default "deny". Fix: set callbacks inside run_agent() (the thread target), matching the pattern already used by acp_adapter/server.py. Clear on thread exit to avoid stale references. Closes #13618 * test(approval): regression guards for thread-local callback contract Two unit tests that pin down the threading.local semantics the CLI freeze fix (#13617 / #13618) relies on: - main-thread registration must be invisible to child threads (documents the underlying bug — if this ever starts passing visible, ACP's GHSA-qg5c-hvr5-hjgr race has returned) - child-thread registration must be visible from that same thread AND cleared by the finally block (documents the fix pattern used by cli.py's run_agent closure and acp_adapter/server.py) Pairs with the fix in the preceding commit by @Societus. 
* fix(vision): route Nous main-provider vision through tier-aware backend * fix(vision): restore tier-aware Nous vision model selection (#13703) Revert two overreaches from #13699 that forced paid Nous vision to xiaomi/mimo-v2-omni instead of the tier-appropriate gemini-3-flash-preview: 1. Remove "nous": "xiaomi/mimo-v2-omni" from _PROVIDER_VISION_MODELS — #13696 already routes nous main-provider vision through the strict backend, and this entry caused any direct resolve_provider_client( "nous", ...) aggregator-lookup path to pick the wrong model for paid. 2. Drop the 'elif vision' paid override in _try_nous() that forced mimo-v2-omni on every Nous vision call regardless of tier. Paid accounts now keep gemini-3-flash-preview for vision as well as text. Free-tier behavior unchanged: still uses mimo-v2-omni for vision, mimo-v2-pro for text (check_nous_free_tier() branch). E2E verified: paid vision → google/gemini-3-flash-preview free vision → xiaomi/mimo-v2-omni paid text → google/gemini-3-flash-preview free text → xiaomi/mimo-v2-pro * feat(llm-wiki): port provenance markers, source hashing, and quality signals from llm-wiki-compiler (#13700) Three additive conventions inspired by github.com/atomicmemory/llm-wiki-compiler: - Paragraph-level provenance: `^[raw/articles/source.md]` markers on pages synthesizing 3+ sources, so readers can trace individual claims without re-reading full source files. - Raw source content hashing: `sha256:` in raw/ frontmatter enables re-ingest drift detection — skip unchanged sources, flag changed ones. - Optional `confidence` and `contested` frontmatter fields let lint surface weak or disputed claims without re-reading every page's prose. Lint gains two new checks (quality signals, source drift) and one expanded check (contradictions now surfaces frontmatter-flagged pages). Also adds a Related Tools section pointing users who want batch/scheduled compilation at llm-wiki-compiler (Obsidian-compatible, works on the same vault). 
All additions are opt-in — existing wikis need no migration. Skill version 2.0.0 -> 2.1.0. * fix(tui): don't swallow Kimi/Qwen ~! ~? kaomoji as subscript spans The inline markdown regex had `~([^~\s][^~]*?)~` for Pandoc-style subscript (H~2~O, CO~2~). On models that decorate prose with kaomoji like `thing ~!` and `cool ~?` — Kimi especially — the opener `~!` paired with the next stray `~` on the line and dim-formatted everything between them with a leading `_` character, mangling markdown output. Tighten the pattern to short alphanumeric-only content (`~[A-Za-z0-9]{1,8}~`) since real subscript never contains punctuation, spaces, or long runs. Same tightening applied to stripInlineMarkup so width measurement stays consistent. Classic CLI was unaffected because it renders these literally. * refactor(tui): clean markdown.tsx per KISS/DRY - Drop the outer no-op capture group from INLINE_RE and restructure the source as an ordered list of patterns-with-index-comments so each alternative is individually greppable. Shift group indices in MdInline down by one accordingly. - Inline single-use helpers (parseFence, isFenceClose, isMarkdownFence, trimBareUrl) and intermediate variables (path, lang, raw, prefix, body, depth, task body, setext match, etc.). - Hoist block-level regexes used inside MdImpl (FENCE_CLOSE_RE, SETEXT_RE, BULLET_RE, TASK_RE, NUMBERED_RE, QUOTE_RE) to top-level consts so they're compiled once instead of per-line. - Collapse the duplicate compact-vs-normal blank-line branches into one if/!compact gap call. - Move Fence and MdProps types to the bottom per house style. - Shorten splitTableRow → splitRow and use optional chaining in a few match sites. No behavior change; 162/162 tests pass. Net -22 LoC. * fix(tui): /resume picker shows telegram/discord/etc sessions Reported during TUI v2 blitz retest: /resume modal only surfaced tui/cli rows, even though `hermes --tui --resume <id>` with a pasted telegram session id works fine. 
The handler double-fetched with explicit `source="tui"` and `source="cli"` filters and dropped everything else on the floor. Drop the filter — list_sessions_rich(source=None) already excludes child sessions (subagents, compression continuations) via its default, and users want to resume messenger sessions from inside the TUI. Adds gateway regression coverage. * fix(tui): up-arrow inside a multi-line buffer moves cursor, not history Reported during TUI v2 blitz retest: typing a multi-line message with shift-Enter and then pressing Up to edit an earlier line swapped the whole buffer for the previous history entry instead of moving the cursor up a line. Down then restored the draft → the buffer appeared to "flip" between the draft and a prior prompt. `useInputHandlers` cycles history on Up/Down, but textInput only checked `inputBuf.length` — that only counts lines committed with a trailing backslash, not shift-Enter newlines inside `input` itself. Fix: detect logical lines inside the input string and move the cursor one line up/down preserving column offset (clamp to line end when the destination is shorter, standard editor behavior). Only fall through to history cycling when the cursor is already on the first line (Up) or last line (Down). Adds unit coverage for the new `lineNav` helper. * fix(tui): /history shows the TUI's own transcript, scrollable Reported during TUI v2 blitz retest: `/history` in the TUI only shows prompts from non-TUI Hermes runs and can't scroll the window. Root cause is the slash-worker subprocess: it's a detached HermesCLI that never sees the TUI's turns, so its `conversation_history` starts empty and `show_history` surfaces whatever was persisted from earlier CLI sessions — not what the user just did inside the TUI. Intercept `/history` as a local slash command so it dumps `ctx.local.getHistoryItems()` — the TUI's own transcript — routed through the pager (which scrolls after #13591). 
Accepts an optional preview-length argument (default 400 chars per message). Adds createSlashHandler coverage. * fix(tui): tool inline_diff renders inline with the active turn Reported during TUI v2 blitz retest: code-review diffs from tool.complete appeared at the top of the current interaction thread, out of sequence with the agent's messages and tool rows below them. Root cause — `sys(inline_diff)` appends to `historyItems`, which sits above the `StreamingAssistant` pane that renders the active turn. Until the turn closed, the diff visually floated above everything else happening in the same turn. Route the diff through `turnController.appendSegmentMessage` instead so it flushes any pending streaming text first, then lands in the segment stream beside assistant output and tool calls. On `message.complete` the segment list is committed to history in emit order (diff → final text), matching what the gateway sent. Adds a regression test that exercises tool.complete → message.complete with an inline_diff payload and asserts both the streaming and final placement. * feat(delegate): cross-agent file state coordination for concurrent subagents (#13718) * feat(models): hide OpenRouter models that don't advertise tool support Port from Kilo-Org/kilocode#9068. hermes-agent is tool-calling-first — every provider path assumes the model can invoke tools. Models whose OpenRouter supported_parameters doesn't include 'tools' (e.g. image-only or completion-only models) cannot be driven by the agent loop and fail at the first tool call. Filter them out of fetch_openrouter_models() so they never appear in the model picker (`hermes model`, setup wizard, /model slash command). Permissive when the field is missing — OpenRouter-compatible gateways (Nous Portal, private mirrors, older snapshots) don't always populate supported_parameters. Treat missing as 'unknown → allow' rather than silently emptying the picker on those gateways. 
Only hide models whose supported_parameters is an explicit list that omits tools. Tests cover: tools present → kept, tools absent → dropped, field missing → kept, malformed non-list → kept, non-dict item → kept, empty list → dropped. * feat(delegate): cross-agent file state coordination for concurrent subagents Prevents mangled edits when concurrent subagents touch the same file (same process, same filesystem — the mangle scenario from #11215). Three layers, all opt-out via HERMES_DISABLE_FILE_STATE_GUARD=1: 1. FileStateRegistry (tools/file_state.py) — process-wide singleton tracking per-agent read stamps and the last writer globally. check_stale() names the sibling subagent in the warning when a non-owning agent wrote after this agent's last read. 2. Per-path threading.Lock wrapped around the read-modify-write region in write_file_tool and patch_tool. Concurrent siblings on the same path serialize; different paths stay fully parallel. V4A multi-file patches lock in sorted path order (deadlock-free). 3. Delegate-completion reminder in tools/delegate_tool.py: after a subagent returns, writes_since(parent, child_start, parent_reads) appends '[NOTE: subagent modified files the parent previously read — re-read before editing: ...]' to entry.summary when the child touched anything the parent had already seen. Complements (does not replace) the existing path-overlap check in run_agent._should_parallelize_tool_batch — batch check prevents same-file parallel dispatch within one agent's turn (cheap prevention, zero API cost), registry catches cross-subagent and cross-turn staleness at write time (detection). Behavior is warning-only, not hard-failing — matches existing project style. Errors surface naturally: sibling writes often invalidate the old_string in patch operations, which already errors cleanly. 
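Layer 2 above (per-path locks, with multi-file patches locking in sorted path order) might look roughly like this. It is a sketch under assumptions, not the code in tools/file_state.py: `lock_for`, `locked_paths`, and the `abspath` normalization are all hypothetical.

```python
import os
import threading
from contextlib import ExitStack, contextmanager

_registry_guard = threading.Lock()   # protects the lock table itself
_path_locks = {}                     # absolute path -> threading.Lock


def lock_for(path):
    """Return the single lock shared by everyone touching this path."""
    key = os.path.abspath(path)      # assumption: absolute path is the identity
    with _registry_guard:
        return _path_locks.setdefault(key, threading.Lock())


@contextmanager
def locked_paths(paths):
    """Hold the locks for several paths at once, acquired in sorted order.

    Every caller acquires in the same global order, so two multi-file
    patches with overlapping paths cannot wait on each other in a cycle.
    """
    ordered = sorted({os.path.abspath(p) for p in paths})
    with ExitStack() as stack:
        for path in ordered:
            stack.enter_context(lock_for(path))
        yield ordered
```

A single-file write would wrap its read-modify-write region in `with lock_for(path):`, while a multi-file patch would use `with locked_paths([...]):`. Writers on different paths never contend because each path has its own lock.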
Tests: tests/tools/test_file_state_registry.py: 16 tests covering registry state transitions, per-path locking, per-path-not-global locking, writes_since filtering, kill switch, and end-to-end integration through the real read_file/write_file/patch handlers.

* fix(tui): only cycle history at input boundaries on arrows

Follow-up on #13726 from blitz feedback: Up/Down history cycling should only trigger when the caret is at the start/end boundary (or the input is empty).

Previously useInputHandlers intercepted arrows whenever inputBuf was empty, which still stole Up/Down from normal multiline editing. textInput now publishes caret position through inputSelectionStore even with no active selection, and useInputHandlers gates history/queue cycling on those boundaries.

* fix(tui): keep inline diffs below tool rows and strip ANSI

Follow-up on #13729 from blitz screenshot feedback.

- When tool.complete carried inline_diff but no buffered assistant text existed, pending tool rows were still in streamPendingTools, so the diff rendered above the tool row section. appendSegmentMessage now emits pending tool rows as a trail segment before appending the diff artifact.
- Strip ANSI color escapes from inline_diff payloads so we don't render loud red/green terminal palettes in the transcript.

* fix(tui): narrow /resume sources to human adapters

Follow-up on #13724: showing literally every source was too noisy.

 now fetches a wider window (, larger limit) and then filters to a curated allowlist of human-facing sources (tui/cli plus chat adapters like telegram/discord/slack/whatsapp/etc). This keeps row #7 fixed (telegram sessions visible in /resume) without surfacing internal source kinds such as tool/acp.

* fix(tui): arrow history fallback when no line exists

Follow-up on multiline arrow behavior: Up/Down now fall back to queue/history whenever there is no logical line above/below the caret (not only at absolute start/end character positions).
This makes Up from the end of the top line cycle history, matching expected readline-ish behavior.

* fix(tui): render inline diffs inside assistant completion

Follow-up for #13729: segment-level system artifacts still looked detached in real flow.

Instead of appending inline_diff as a standalone segment/system row, queue sanitized diffs during tool.complete and append them as a fenced diff block to the assistant completion text on message.complete. This keeps the diff in the same message flow as the assistant response.

* fix(tui): dedupe inline_diff when assistant already echoes it

Avoid duplicate diff rendering in #13729 flow. We now skip queued inline diffs that are already present in final assistant text and dedupe repeated queued diffs by exact content.

* fix(tui): keep review-diff tool rows terse

When tool.complete already carries inline_diff, the assistant message owns the full diff block. Suppress the tool-row summary/detail in that case so the turn shows one detailed diff surface instead of a rich diff plus a duplicated tool-detail payload.

* fix(tui): dedupe inline diffs, strip CLI review-diff header

After the prior inline-diff fix, the gateway still prepends a literal " ┊ review diff" line to inline_diff (it's terminal chrome written by `_emit_inline_diff`). Wrapping that in a ```diff fence left that header inside the code block. The agent also often narrates its own edit in a second fenced diff, so the assistant message ended up stacking two diff blocks for the same change.

- Strip the leading "┊ review diff" header from queued inline diffs before fencing.
- Skip appending the fenced diff entirely when the assistant already wrote its own ```diff (or ```patch) fence.

Keeps the single-surface diff UX even when the agent is chatty.
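Taken together, the inline-diff rules from the commits above (strip ANSI, strip the review-diff header, dedupe, skip when the assistant already fenced a diff) can be sketched in Python. This illustrates the logic only; it is not the TUI's actual code, and every function and constant name here is hypothetical.

```python
import re

FENCE = "`" * 3  # built at runtime so this sketch never contains a literal fence
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")  # SGR color escapes only
HEADER_RE = re.compile(r"\A[ \t]*┊ review diff[ \t]*\n")
OWN_FENCE_RE = re.compile(r"^" + re.escape(FENCE) + r"(?:diff|patch)\b", re.MULTILINE)


def sanitize_inline_diff(raw):
    """Drop color escapes and the leading terminal-chrome header line."""
    text = ANSI_RE.sub("", raw)
    text = HEADER_RE.sub("", text, count=1)
    return text.strip("\n")


def append_inline_diffs(assistant_text, queued_diffs):
    """Append queued diffs as fenced blocks unless that would duplicate content."""
    # The assistant already narrated its edit in its own diff/patch fence.
    if OWN_FENCE_RE.search(assistant_text):
        return assistant_text
    seen = set()
    blocks = []
    for raw in queued_diffs:
        diff = sanitize_inline_diff(raw)
        # Skip empties, exact repeats, and diffs the assistant already echoed.
        if not diff or diff in seen or diff in assistant_text:
            continue
        seen.add(diff)
        blocks.append(f"{FENCE}diff\n{diff}\n{FENCE}")
    if not blocks:
        return assistant_text
    return assistant_text.rstrip() + "\n\n" + "\n\n".join(blocks)
```

Matching the fence only at the start of a line is a deliberate choice in this sketch: an inline mention of a diff fence in prose should not suppress the real diff.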
* fix(tts): use per-provider input-character caps instead of global 4000 (#13743) A single global MAX_TEXT_LENGTH = 4000 truncated every TTS provider at 4000 chars, causing long inputs to be silently chopped even though the underlying APIs allow much more: - OpenAI: 4096 - xAI: 15000 - MiniMax: 10000 - ElevenLabs: 5000 / 10000 / 30000 / 40000 (model-aware) - Gemini: ~5000 - Edge: ~5000 The schema description also told the model 'Keep under 4000 characters', which encouraged the agent to self-chunk long briefs into multiple TTS calls (producing 3 separate audio files instead of one). New behavior: - PROVIDER_MAX_TEXT_LENGTH table + ELEVENLABS_MODEL_MAX_TEXT_LENGTH encode the documented per-provider limits. - _resolve_max_text_length(provider, cfg) resolves: 1. tts.<provider>.max_text_length user override 2. ElevenLabs model_id lookup 3. provider default 4. 4000 fallback - text_to_speech_tool() and stream_tts_to_speaker() both call the resolver; old MAX_TEXT_LENGTH alias kept for back-compat. - Schema description no longer hardcodes 4000. Tests: 27 new unit + E2E tests; all 53 existing TTS tests and 253 voice-command/voice-cli tests still pass. 
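The four-step resolution order above can be sketched as follows. The provider caps come from the commit message; everything else is an assumption for illustration: the shape of `cfg` (taken here to be the `tts` config section), the ElevenLabs default of 5000, and the placeholder model ids (the commit lists the ElevenLabs caps but not which model has which).

```python
FALLBACK_MAX_TEXT_LENGTH = 4000

PROVIDER_MAX_TEXT_LENGTH = {
    "openai": 4096,
    "xai": 15000,
    "minimax": 10000,
    "elevenlabs": 5000,  # assumption: the smallest listed ElevenLabs cap as default
    "gemini": 5000,
    "edge": 5000,
}

# Hypothetical model ids, for illustration only.
ELEVENLABS_MODEL_MAX_TEXT_LENGTH = {
    "example-model-a": 10000,
    "example-model-b": 40000,
}


def resolve_max_text_length(provider, cfg):
    """Resolve the input-character cap for a TTS provider.

    Order: user override, ElevenLabs model lookup, provider default, 4000.
    """
    section = cfg.get(provider) or {}

    # 1. Explicit user override (ignore junk such as strings, bools, or <= 0).
    override = section.get("max_text_length")
    if isinstance(override, int) and not isinstance(override, bool) and override > 0:
        return override

    # 2. ElevenLabs caps depend on the configured model.
    if provider == "elevenlabs":
        cap = ELEVENLABS_MODEL_MAX_TEXT_LENGTH.get(section.get("model_id"))
        if cap:
            return cap

    # 3. Provider default, else 4. global fallback.
    return PROVIDER_MAX_TEXT_LENGTH.get(provider, FALLBACK_MAX_TEXT_LENGTH)
```

With this ordering, an unknown provider still gets the old 4000 limit, so the change only ever raises caps for providers with a documented higher limit or an explicit override.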
* feat(skills): add baoyu-comic skill * refactor(skills): adapt baoyu-comic for Hermes Port the upstream baoyu-comic skill to Hermes' tool ecosystem, matching the earlier baoyu-infographic adaptation: - metadata namespace openclaw -> hermes (+ tags, homepage) - drop EXTEND.md preferences system (references/config/ removed, workflow Step 1.1 removed) - user prompts via clarify (one question at a time) instead of AskUserQuestion batches - image generation via image_generate instead of baoyu-imagine, with aspect-ratio mapping to landscape/portrait/square - Windows/PowerShell/WSL shell snippets dropped - file I/O referenced via Hermes write_file/read_file tools - CLI-style --flags converted to natural-language options and user-intent cues (skill matching has no slash command trigger) Add PORT_NOTES.md documenting the adaptations and a sync procedure. Art-style/tone/layout reference files are preserved verbatim from upstream v1.56.1. * fix(skills): address baoyu-comic PR review - Remove PDF merge feature and scripts/ directory (no pdf-lib dep) - Correct image_generate docs: prompt-only, returns URL; add curl download step after every call - Downgrade reference images to text-based trait extraction (style/palette/scene); character sheet is agent-facing reference - Unify source file naming on source-{slug}.md across SKILL.md and workflow.md * fix(skills): clarify baoyu-comic character sheet role Page prompts are written in Step 5 from the text descriptions in characters/characters.md — the PNG sheet generated in Step 7.1 cannot be used to write them. Reposition the PNG as a human-facing review artifact (and reference for later regenerations / manual edits), and drop the confusing "Character sheet | Strategy" tables since the embedding rule is uniform. 
* docs: document delegation width + depth knobs (#13745) Fills the three gaps left by the orchestrator/width-depth salvage: - configuration.md §Delegation: max_concurrent_children, max_spawn_depth, orchestrator_enabled are now in the canonical config.yaml reference with a paragraph covering defaults, clamping, role-degradation, and the 3x3x3=27-leaf cost scaling. - environment-variables.md: adds DELEGATION_MAX_CONCURRENT_CHILDREN to the Agent Behavior table. - features/delegation.md: corrects stale 'default 5, cap 8' wording (that was from the original PR; the salvage landed on default 3 with no ceiling and a tool error on excess instead of truncation). * fix(website): run skill extraction automatically on npm run build/start (#13747) website/src/pages/skills/index.tsx imports ../../data/skills.json, but that file is git-ignored and generated at build time by website/scripts/extract-skills.py. CI workflows (deploy-site.yml, docs-site-checks.yml) run the script explicitly before 'npm run build', so production and PR checks always work — but 'npm run build' on a contributor's machine fails with: Module not found: Can't resolve '../../data/skills.json' because the extraction step was never wired into the npm scripts. Adds a prebuild/prestart hook that runs extract-skills.py automatically. If python3 or pyyaml aren't installed locally, writes an empty skills.json instead of hard-failing — the Skills Hub page renders with an empty state, the rest of the site builds normally, and CI (which always has the deps) still generates the full catalog for production. * fix(skills/baoyu-comic): absolute curl paths + clarify-timeout handling (#13775) * fix(skills/baoyu-comic): require absolute paths for curl -o downloads When downloading generated images across several batches of image_generate calls, relying on persistent-shell CWD is unsafe. 
The terminal tool's shell can rotate (TERMINAL_LIFETIME_SECONDS expiry, a failed cd that leaves the shell somewhere else), and 'curl -fsSL <url> -o relative.png' then silently writes to the wrong directory with no error. Update the skill's Step 7 Download step to require absolute -o paths (or workdir= on the terminal tool) and add a matching pitfall entry referencing the Apr 2026 incident where pages 06-09 of a 10-page comic landed at the repo root instead of comic/<slug>/. The agent then spent several turns claiming the files existed where they didn't. * fix(skills/baoyu-comic): handle clarify timeouts correctly in Step 2 A clarify timeout returning 'Use your best judgement to make the choice and proceed' is NOT user consent to default the entire Step 2 questionnaire. It is a per-question default only. Add guidance at both instruction sites (SKILL.md User Questions section, references/workflow.md Step 2 header) telling the agent to: 1. Continue asking the remaining questions in the sequence after a timeout — each question is an independent consent point. 2. Surface every defaulted choice in the next user-visible message so the user can correct it when they return. An unreported default is indistinguishable from never having asked. Reported live Apr 2026: agent asked style question via clarify, got a timeout response, and silently defaulted style + narrative focus + audience + review flags in one pass. User only learned style had defaulted to 'ohmsha' after the comic was fully generated. * fix(prompt): tell CLI agents not to emit MEDIA:/path tags (#13766) The CLI has no attachment channel — MEDIA:<path> tags are only intercepted on messaging gateway platforms (Telegram, Discord, Slack, WhatsApp, Signal, BlueBubbles, email, etc.). On the CLI they render as literal text, which is confusing for users. 
The CLI platform hint was the one PLATFORM_HINTS entry that said nothing about file delivery, so models trained on the messaging hints would default to MEDIA: tags on the CLI too. Tool schemas (browser_tool, tts_tool, etc.) also recommend MEDIA: generically. Extend the CLI hint to explicitly discourage MEDIA: tags and tell the agent to reference files by plain absolute path instead. Add a regression test asserting the CLI hint carries negative guidance about MEDIA: while messaging hints keep positive guidance. * fix: add User-Agent claude-code/0.1.0 for Kimi /coding endpoint - Add _is_kimi_coding_endpoint() to detect Kimi coding API - Place Kimi check BEFORE _requires_bearer_auth to ensure User-Agent header is set - Without this header, Kimi returns 403 on /coding/v1/messages - Fixes kimi-2.5, kimi-for-coding, kimi-k2.6-code-preview all returning 403 * fix: auto-detect anthropic_messages mode for Kimi /coding/v1 endpoints * fix(kimi-coding): add KIMI_CODING_API_KEY fallback + api_mode detection for /coding endpoint * fix(kimi-coding): set anthropic_messages api_mode for /coding endpoint * fix: Update Kimi Coding API endpoint and User-Agent * fix: Enhance Kimi Coding API mode detection and User-Agent * fix(kimi): reconcile sk-kimi- routing with Anthropic SDK URL semantics Follow-ups after salvaging xiaoqiang243's kimi-for-coding patches: - KIMI_CODE_BASE_URL: drop trailing /v1 (was /coding/v1). The /coding endpoint speaks Anthropic Messages, and the Anthropic SDK appends /v1/messages internally. /coding/v1 + SDK suffix produced /coding/v1/v1/messages (a 404). /coding + SDK suffix now yields /coding/v1/messages correctly. - kimi-coding ProviderConfig: keep legacy default api.moonshot.ai/v1 so non-sk-kimi- moonshot keys still authenticate. sk-kimi- keys are already redirected to api.kimi.com/coding via _resolve_kimi_base_url. 
- doctor.py: update Kimi UA to claude-code/0.1.0 (was KimiCLI/1.30.0) and rewrite /coding base URLs to /coding/v1 for the /models health check (Anthropic surface has no /models).
- test_kimi_env_vars: accept KIMI_CODING_API_KEY as a secondary env var.

E2E verified:

    sk-kimi-<key> → https://api.kimi.com/coding/v1/messages (Anthropic)
    sk-<legacy>   → https://api.moonshot.ai/v1/chat/completions (OpenAI)
    UA: claude-code/0.1.0, x-api-key: <sk-kimi-*>

* chore(release): map xiaoqiang243 personal email in AUTHOR_MAP

* feat: add ResponsesApiTransport + wire all Codex transport paths

Add ResponsesApiTransport wrapping codex_responses_adapter.py behind the ProviderTransport ABC. Auto-registered via _discover_transports().

Wire ALL Codex transport methods to production paths in run_agent.py:

- build_kwargs: main _build_api_kwargs codex branch (50 lines extracted)
- normalize_response: main loop + flush + summary + retry (4 sites)
- convert_tools: memory flush tool override
- convert_messages: called internally via build_kwargs
- validate_response: response validation gate
- preflight_kwargs: request sanitization (2 sites)

Remove 7 dead legacy wrappers from AIAgent (_responses_tools, _chat_messages_to_responses_input, _normalize_codex_response, _preflight_codex_api_kwargs, _preflight_codex_input_items, _extract_responses_message_text, _extract_responses_reasoning_text). Keep 3 ID manipulation methods still used by _build_assistant_message.

Update 18 test call sites across 3 test files to call adapter functions directly instead of through deleted AIAgent wrappers.

24 new tests. 343 codex/responses/transport tests pass (0 failures).

PR 4 of the provider transport refactor.

* fix(delegation): add hard timeout and stale detection for subagent execution (#13770)

- Wrap child.run_conversation() in a ThreadPoolExecutor with configurable timeout (delegation.child_timeout_seconds, default 300s) to prevent indefinite blocking when a subagent's API call or tool HTTP request hangs.
- Add heartbeat stale detection: if a child's api_call_count doesn't advance for 5 consecutive heartbeat cycles (~2.5 min), stop touching the parent's activity timestamp so the gateway inactivity timeout can fire as a last resort. - Add 'timeout' as a new exit_reason/status alongside the existing completed/max_iterations/interrupted states. - Use shutdown(wait=False) on the timeout executor to avoid the ThreadPoolExecutor.__exit__ deadlock when a child is stuck on blocking I/O. Closes #13768 * remove Nous Portal free-model allowlist Drop _NOUS_ALLOWED_FREE_MODELS + filter_nous_free_models and its two call sites. Whatever Nous Portal prices as free now shows up in the picker as-is — no local allowlist gatekeeping. Free-tier partitioning (paid vs free in the menu) still runs via partition_nous_models_by_tier. * feat(aux): use Portal /api/nous/recommended-models for auxiliary models Wire the auxiliary client (compaction, vision, session search, web extract) to the Nous Portal's curated recommended-models endpoint when running on Nous Portal, with a TTL-cached fetch that mirrors how we pull /models for pricing. hermes_cli/models.py - fetch_nous_recommended_models(portal_base_url, force_refresh=False) 10-minute TTL cache, keyed per portal URL (staging vs prod don't collide). Public endpoint, no auth required. Returns {} on any failure so callers always get a dict. - get_nous_recommended_aux_model(vision, free_tier=None, ...) Tier-aware pick from the payload: - Paid tier → paidRecommended{Vision,Compaction}Model, falling back to freeRecommended* when the paid field is null (common during staged rollouts of new paid models). - Free tier → freeRecommended* only, never leaks paid models. When free_tier is None, auto-detects via the existing check_nous_free_tier() helper (already cached 3 min against /api/oauth/account). Detection errors default to paid so we never silently downgrade a paying user. 
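The tier-aware pick described for get_nous_recommended_aux_model might reduce to something like the sketch below. The field names follow the paidRecommended{Vision,Compaction}Model / freeRecommended* pattern from the commit message; the payload shape (each field an object with a `modelName`, or null) is an assumption inferred from the test list, and the caching and tier auto-detection are left out. Function names and the example model names are hypothetical.

```python
def _model_name(entry):
    """Extract a usable model name, treating null/blank values as missing."""
    if not isinstance(entry, dict):
        return None
    name = entry.get("modelName")
    if isinstance(name, str) and name.strip():
        return name.strip()
    return None


def pick_recommended_aux_model(payload, vision, free_tier):
    """Pick the recommended auxiliary model for the caller's tier.

    Free tier only ever sees free recommendations. Paid tier prefers the
    paid recommendation and falls back to the free one when it is null.
    """
    kind = "Vision" if vision else "Compaction"
    free_choice = _model_name(payload.get(f"freeRecommended{kind}Model"))
    if free_tier:
        return free_choice
    paid_choice = _model_name(payload.get(f"paidRecommended{kind}Model"))
    return paid_choice or free_choice


# Example payload with made-up model names.
example_payload = {
    "paidRecommendedVisionModel": {"modelName": "example/paid-vision"},
    "freeRecommendedVisionModel": {"modelName": "example/free-vision"},
    "paidRecommendedCompactionModel": None,
    "freeRecommendedCompactionModel": {"modelName": "example/free-compaction"},
}
```

A `None` result would correspond to the case where the caller falls back to its hardcoded default model.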
agent/auxiliary_client.py — _try_nous() - Replaces the hardcoded xiaomi/mimo free-tier branch with a single call to get_nous_recommended_aux_model(vision=vision). - Falls back to _NOUS_MODEL (google/gemini-3-flash-preview) when the Portal is unreachable or returns a null recommendation. - The Portal is now the source of truth for aux model selection; the xiaomi allowlist we used to carry is effectively dead. Tests (15 new) - tests/hermes_cli/test_models.py::TestNousRecommendedModels Fetch caching, per-portal keying, network failure, force_refresh; paid-prefers-paid, paid-falls-to-free, free-never-leaks-paid, auto-detect, detection-error → paid default, null/blank modelName handling. - tests/agent/test_auxiliary_client.py::TestNousAuxiliaryRefresh _try_nous honors Portal recommendation for text + vision, falls back to google/gemini-3-flash-preview on None or exception. Behavior won't visibly change today — both tier recommendations currently point at google/gemini-3-flash-preview — but the moment the Portal ships a better paid recommendation, subscribers pick it up within 10 minutes without a Hermes release. * feat: add ChatCompletionsTransport + wire all default paths Third concrete transport — handles the default 'chat_completions' api_mode used by ~16 OpenAI-compatible providers (OpenRouter, Nous, NVIDIA, Qwen, Ollama, DeepSeek, xAI, Kimi, custom, etc.). Wires build_kwargs + validate_response to production paths. Based on PR #13447 by @kshitijk4poor, with fixes: - Preserve tool_call.extra_content (Gemini thought_signature) via ToolCall.provider_data — the original shim stripped it, causing 400 errors on multi-turn Gemini 3 thinking requests. - Preserve reasoning_content distinctly from reasoning (DeepSeek/Moonshot) so the thinking-prefill retry check (_has_structured) still triggers. - Port Kimi/Moonshot quirks (32000 max_tokens, top-level reasoning_effort, extra_body.thinking) that landed on main after the original PR was opened. 
- Keep _qwen_prepare_chat_messages_inplace alive and call it through the transport when sanitization already deepcopied (avoids a second deepcopy). - Skip the back-compat SimpleNamespace shim in the main normalize loop — for chat_completions, response.choices[0].message is already the right shape with .content/.tool_calls/.reasoning/.reasoning_content/.reasoning_details and per-tool-call .extra_content from the OpenAI SDK. run_agent.py: -239 lines in _build_api_kwargs default branch extracted to the transport. build_kwargs now owns: codex-field sanitization, Qwen portal prep, developer role swap, provider preferences, max_tokens resolution (ephemeral > user > NVIDIA 16384 > Qwen 65536 > Kimi 32000 > anthropic_max_output), Kimi reasoning_effort + extra_body.thinking, OpenRouter/Nous/GitHub reasoning, Nous product attribution tags, Ollama num_ctx, custom-provider think=false, Qwen vl_high_resolution_images, request_overrides. 39 new transport tests (8 build_kwargs, 5 Kimi, 4 validate, 4 normalize including extra_content regression, 3 cache stats, 3 basic). Tests/run_agent/ targeted suite passes (885/885 + 15 skipped; the 1 remaining failure is the test_concurrent_interrupt flake present on origin/main). * fix(tui): don't force-open Activity on every error Reverts the auto-expand-on-new-error effect added in 93b47d96. The effect overrode the user's chosen detailsMode and visually interrupted every turn. Red/yellow chevron tint remains as the passive signal — click to read, just like Thinking and Tool calls. * fix(tui): demote gateway log-noise from Activity to info tone Restore the old-CLI contract where only complete failures tint Activity red. Everything else is still visible for debugging but no longer commandeers attention. 
- gateway.stderr: always tone='info' (drops the ERRLIKE_RE regex)
- gateway.protocol_error: both pushes demoted to 'info'
- commands.catalog cold-start failure: demoted to 'info'
- approval.request: no longer duplicates the overlay into Activity

Kept as 'error': terminal `error` event, gateway.start_timeout, gateway-exited, explicit status.update kinds.

* feat: add BedrockTransport + wire all Bedrock transport paths

Fourth and final transport: completes the transport layer with all four api_modes covered. Wraps agent/bedrock_adapter.py behind the ProviderTransport ABC, handles both raw boto3 dicts and already-normalized SimpleNamespace.

Wires all transport methods to production paths in run_agent.py:

- build_kwargs: _build_api_kwargs bedrock branch
- validate_response: response validation, new bedrock_converse branch
- finish_reason: new bedrock_converse branch in finish_reason extraction

Based on PR #13467 by @kshitijk4poor, with one adjustment: the main normalize loop does NOT add a bedrock_converse branch to invoke normalize_response on the already-normalized response. Bedrock's normalize_converse_response runs at the dispatch site (run_agent.py:5189), so the response already has the OpenAI-compatible .choices[0].message shape by the time the main loop sees it. Falling through to the chat_completions else branch is correct and sidesteps a redundant NormalizedResponse rebuild.

Transport coverage (complete):

| api_mode           | Transport                | build_kwargs | normalize | validate |
|--------------------|--------------------------|:------------:|:---------:|:--------:|
| anthropic_messages | AnthropicTransport       | ✅           | ✅        | ✅       |
| codex_responses    | ResponsesApiTransport    | ✅           | ✅        | ✅       |
| chat_completions   | ChatCompletionsTransport | ✅           | ✅        | ✅       |
| bedrock_converse   | BedrockTransport         | ✅           | ✅        | ✅       |

17 new BedrockTransport tests pass. 117 transport tests total pass. 160 bedrock/converse tests across tests/agent/ pass.
Full tests/run_agent/ targeted suite passes (885/885 + 15 skipped; the 1 remaining failure is the pre-existing test_concurrent_interrupt flake on origin/main). * chore(models): drop 3 models from nous portal recommended list (#13822) Remove nvidia/nemotron-3-super-120b-a12b:free, arcee-ai/trinity-large-preview:free, and openrouter/elephant-alpha from _PROVIDER_MODELS['nous']. The paid nemotron and arcee-thinking variants remain. * fix(kimi): don't send Anthropic thinking to api.kimi.com/coding (#13826) Kimi's /coding endpoint speaks the Anthropic Messages protocol but has its own thinking semantics: when thinking.enabled is sent, Kimi validates the history and requires every prior assistant tool-call message to carry OpenAI-style reasoning_content. The Anthropic path never populates that field, and convert_messages_to_anthropic strips Anthropic thinking blocks on third-party endpoints — so after one tool-calling turn the next request fails with: HTTP 400: thinking is enabled but reasoning_content is missing in assistant tool call message at index N Kimi on chat_completions handles thinking via extra_body in ChatCompletionsTransport (#13503). On the Anthropic route, drop the parameter entirely and let Kimi drive reasoning server-side. build_anthropic_kwargs now gates the reasoning_config -> thinking block on not _is_kimi_coding_endpoint(base_url). Tests: 8 new parametric tests cover /coding, /coding/v1, /coding/anthropic, /coding/ (trailing slash), explicit disabled, other third-party endpoints still getting thinking (MiniMax), native Anthropic unaffected, and the non-/coding Kimi root route. * feat(models): add minimax/minimax-m2.5:free to OpenRouter catalog (#13836) Surfaces the free variant alongside the paid minimax-m2.5 entry in both the OPENROUTER_MODELS fallback snapshot and the nous/openrouter provider model list. 
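For the Kimi /coding thinking gate (#13826) described above, a rough sketch of the endpoint check and the gating could look like this. The matching rule is inferred from the listed test cases and is an assumption, not the real `_is_kimi_coding_endpoint`; the `reasoning_config` shape, the default budget, and `build_thinking_kwargs` are hypothetical.

```python
from urllib.parse import urlparse


def is_kimi_coding_endpoint(base_url):
    """True for api.kimi.com URLs whose path is /coding or below it."""
    if not base_url:
        return False
    parsed = urlparse(base_url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.rstrip("/")  # tolerate a trailing slash
    return host == "api.kimi.com" and (path == "/coding" or path.startswith("/coding/"))


def build_thinking_kwargs(base_url, reasoning_config):
    """Return the Anthropic-style thinking kwargs, or nothing for Kimi /coding.

    reasoning_config is a hypothetical dict such as
    {"enabled": True, "budget_tokens": 2048}.
    """
    kwargs = {}
    enabled = bool(reasoning_config) and reasoning_config.get("enabled", True)
    if enabled and not is_kimi_coding_endpoint(base_url):
        kwargs["thinking"] = {
            "type": "enabled",
            # assumption: 4096 is an illustrative default budget
            "budget_tokens": reasoning_config.get("budget_tokens", 4096),
        }
    return kwargs
```

Omitting the parameter entirely, rather than sending a disabled block, mirrors the commit's stated choice to let Kimi drive reasoning server-side.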
* feat(plugins): pluggable image_gen backends + OpenAI provider (#13799) * feat(plugins): pluggable image_gen backends + OpenAI provider Adds a ImageGenProvider ABC so image generation backends register as bundled plugins under `plugins/image_gen/<name>/`. The plugin scanner gains three primitives to make this work generically: - `kind:` manifest field (`standalone` | `backend` | `exclusive`). Bundled `kind: backend` plugins auto-load — no `plugins.enabled` incantation. User-installed backends stay opt-in. - Path-derived keys: `plugins/image_gen/openai/` gets key `image_gen/openai`, so a future `tts/openai` cannot collide. - Depth-2 recursion into category namespaces (parent dirs without a `plugin.yaml` of their own). Includes `OpenAIImageGenProvider` as the first consumer (gpt-image-1.5 default, plus gpt-image-1, gpt-image-1-mini, DALL-E 3/2). Base64 responses save to `$HERMES_HOME/cache/images/`; URL responses pass through. FAL stays in-tree for this PR — a follow-up ports it into `plugins/image_gen/fal/` so the in-tree `image_generation_tool.py` slims down. The dispatch shim in `_handle_image_generate` only fires when `image_gen.provider` is explicitly set to a non-FAL value, so existing FAL setups are untouched. - 41 unit tests (scanner recursion, kind parsing, gate logic, registry, OpenAI payload shapes) - E2E smoke verified: bundled plugin autoloads, registers, and `_handle_image_generate` routes to OpenAI when configured * fix(image_gen/openai): don't send response_format to gpt-image-* The live API rejects it: 'Unknown parameter: response_format' (verified 2026-04-21 with gpt-image-1.5). gpt-image-* models return b64_json unconditionally, so the parameter was both unnecessary and actively broken. 
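The scanner primitives from the plugin commit above (path-derived keys plus depth-2 recursion into category directories that have no `plugin.yaml` of their own) can be sketched with `pathlib`. This is an illustrative reconstruction, not the project's scanner: `discover_plugin_keys` and `build_demo_tree` are hypothetical, and the manifest contents are placeholders.

```python
import tempfile
from pathlib import Path

MANIFEST = "plugin.yaml"


def discover_plugin_keys(root):
    """Return plugin keys found at depth 1 and, inside categories, depth 2."""
    keys = []
    for child in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        if (child / MANIFEST).is_file():
            # A plugin directly under the root: key is the directory name.
            keys.append(child.name)
            continue
        # No manifest: treat the directory as a category namespace and look
        # one level deeper, deriving the key from the relative path.
        for grandchild in sorted(p for p in child.iterdir() if p.is_dir()):
            if (grandchild / MANIFEST).is_file():
                keys.append(f"{child.name}/{grandchild.name}")
    return keys


def build_demo_tree(root):
    """Create a throwaway plugin tree for the demonstration."""
    for rel in ("image_gen/openai", "tts/openai", "notes"):
        plugin_dir = Path(root) / rel
        plugin_dir.mkdir(parents=True)
        (plugin_dir / MANIFEST).write_text("name: demo\n")


with tempfile.TemporaryDirectory() as tmp:
    build_demo_tree(tmp)
    demo_keys = discover_plugin_keys(tmp)
```

Because the key includes the category, the two `openai` directories in the demo tree yield distinct keys, which is the collision the commit message says path-derived keys are meant to avoid.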
* feat(image_gen/openai): gpt-image-2 only, drop legacy catalog

gpt-image-2 is the latest/best OpenAI image model (released 2026-04-21) and there's no reason to expose the older gpt-image-1.5 / gpt-image-1 / dall-e-3 / dall-e-2 alongside it: slower, lower quality, or awkward (dall-e-2 squares only). Trim the catalog down to a single model.

Live-verified end-to-end: landscape 1536x1024 render of a Moog-style synth matches prompt exactly, 2.4MB PNG saved to cache.

* feat(image_gen/openai): expose gpt-image-2 as three quality tiers

Users pick speed/fidelity via the normal model picker instead of a hidden quality knob. All three tier IDs resolve to the single underlying gpt-image-2 API model with a different quality parameter:

  gpt-image-2-low     ~15s   fast iteration
  gpt-image-2-medium  ~40s   default
  gpt-image-2-high    ~2min  highest fidelity

Live-measured on OpenAI's API today: 15.4s / 40.8s / 116.9s for the same 1024x1024 prompt.

Config:

  image_gen.openai.model: gpt-image-2-high
  # or
  image_gen.model: gpt-image-2-low
  # or env var for scripts/tests
  OPENAI_IMAGE_MODEL=gpt-image-2-medium

Live-verified end-to-end with the low tier: 18.8s landscape render of a golden retriever in wildflowers, vision-confirmed exact match.

* feat(tools_config): plugin image_gen providers inject themselves into picker

'hermes tools' → Image Generation now shows plugin-registered backends alongside Nous Subscription and FAL.ai without tools_config.py needing to know about them. OpenAI appears as a third option today; future backends appear automatically as they're added.

Mechanism:

- ImageGenProvider gains an optional get_setup_schema() hook (name, badge, tag, env_vars). Default derived from display_name.
- tools_config._plugin_image_gen_providers() pulls the schemas from every registered non-FAL plugin provider.
- _visible_providers() appends those rows when rendering the Image Generation category.
- _configure_provider() handles the new image_gen_plugin_name marker: writes image_gen.provider and routes to the plugin's list_models() catalog for the model picker.
- _toolset_needs_configuration_prompt('image_gen') stops demanding a FAL key when any plugin provider reports is_available().

FAL is skipped in the plugin path because it already has hardcoded TOOL_CATEGORIES rows; when it gets ported to a plugin in a follow-up PR the hardcoded rows go away and it surfaces through the same path as OpenAI.

Verified live: picker shows Nous Subscription / FAL.ai / OpenAI. Picking OpenAI prompts for OPENAI_API_KEY, then shows the gpt-image-2-low/medium/high model picker sourced from the plugin. 397 tests pass across plugins/, tools_config, registry, and picker.

* fix(image_gen): close final gaps for plugin-backend parity with FAL

Two small places that still hardcoded FAL:

- hermes_cli/setup.py status line: an OpenAI-only setup showed 'Image Generation: missing FAL_KEY'. Now probes plugin providers and reports '(OpenAI)' when one is_available(), or falls back to 'missing FAL_KEY or OPENAI_API_KEY' if nothing is configured.
- image_generate tool schema description: said 'using FAL.ai, default FLUX 2 Klein 9B'. Rewrote provider-neutral ('backend and model are user-configured') and notes the 'image' field can be a URL or an absolute path, which the gateway delivers either way via extract_local_files().
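The three gpt-image-2 quality tiers described in the commits above amount to a small lookup from tier ID to an (API model, quality) pair. The sketch below is illustrative only: `resolve_tier` is a hypothetical name, and the lookup precedence (environment variable, then `image_gen.openai.model`, then `image_gen.model`) and the fallback for unknown IDs are assumptions, since the commit message lists the three settings without ordering them.

```python
import os
from typing import Mapping, Optional, Tuple

# Tier IDs from the commit message; all resolve to the one underlying API model.
_TIERS = {
    "gpt-image-2-low": "low",
    "gpt-image-2-medium": "medium",
    "gpt-image-2-high": "high",
}
_API_MODEL = "gpt-image-2"
_DEFAULT_TIER = "gpt-image-2-medium"


def resolve_tier(config: dict, env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (api_model, quality) for the configured tier ID."""
    env = os.environ if env is None else env
    image_gen = config.get("image_gen", {})
    tier = (
        env.get("OPENAI_IMAGE_MODEL")                # assumed highest precedence
        or image_gen.get("openai", {}).get("model")
        or image_gen.get("model")
        or _DEFAULT_TIER
    )
    # Assumption: unknown tier IDs fall back to the default quality.
    quality = _TIERS.get(tier, _TIERS[_DEFAULT_TIER])
    return _API_MODEL, quality
```

Keeping the tier in the model ID means the existing model picker can present speed and fidelity choices without a separate quality setting.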
* feat: add Step Plan provider support (salvage #6005)

Adds a first-class 'stepfun' API-key provider surfaced as Step Plan:

- Support Step Plan setup for both International and China regions
- Discover Step Plan models live from /step_plan/v1/models, with a small coding-focused fallback catalog when discovery is unavailable
- Thread StepFun through provider metadata, setup persistence, status and doctor output, auxiliary routing, and model normalization
- Add tests for provider resolution, model validation, metadata mapping, and StepFun region/model persistence

Based on #6005 by @hengm3467.

Co-authored-by: hengm3467 <100685635+hengm3467@users.noreply.github.com>

* fix(packaging): include agent.* sub-packages in pyproject.toml

The transport refactor (PRs #13862 ff.) added agent/transports/ as a sub-package but the setuptools packages.find include list only had "agent" (top-level files), not "agent.*" (sub-packages). pip install / Nix builds therefore ship run_agent.py (which now imports from agent.transports on every API call) but omit the transports directory entirely, causing:

ModuleNotFoundError: No module named 'agent.transports'

on every LLM call for packaged installs. Adds "agent.*" to match the existing pattern used by tools, gateway, tui_gateway, and plugins.

* fix: preserve reasoning_content on Kimi replay

* feat(optional-skills): add page-agent skill under new web-development category (#13976)

Adds an optional skill that walks users through installing and using alibaba/page-agent, a pure-JS in-page GUI agent that web developers embed into their own webapps so end users can drive the UI with natural language.

Three install paths: CDN demo (30s, no install), npm install into an existing app with provider config table (Qwen/OpenAI/Ollama/OpenRouter), and clone-from-source for dev/contributor workflow.
Clear use-case framing up front (embed AI copilot in SaaS/admin/B2B, modernize legacy UIs, accessibility via natural language) and an explicit NOT-for list that points users wanting server-side browser automation back to Hermes' built-in browser tool.

Live-verified: repo builds on Node 22.22 + npm 10.9, dev:demo serves at localhost:5174, API surface (new PageAgent{...}, panel.show(), execute(task)) matches what the skill documents. Also verified discovery end-to-end via OptionalSkillSource with isolated HERMES_HOME: search/inspect/fetch all resolve official/web-development/page-agent correctly.

New category directory: optional-skills/web-development/ with a DESCRIPTION.md explaining the distinction from Hermes' own browser automation (outside-in vs inside-out).

* feat(wecom): add QR scan flow and interactive setup wizard for bot credentials

* docs(wecom): document QR scan-to-create setup flow

* fix(wecom): visible poll progress + clearer no-bot-info failure + docstring note

Follow-ups on top of salvaged #13923 (@keifergu):

- Print QR poll dot every 3s instead of every 18s so "Fetching configuration results..." doesn't look hung.
- On "status=success but no bot_info" from the WeCom query endpoint, log the full payload at WARNING and tell the user we're falling back to manual entry (was previously a single opaque line).
- Document in the qr_scan_for_bot_info() docstring that the work.weixin.qq.com/ai/qc/* endpoints are the admin-console web-UI flow, not the public developer API, and may change without notice.

Also add keifergu@tencent.com to scripts/release.py AUTHOR_MAP so release notes attribute the feature correctly.

* feat(state): auto-prune old sessions + VACUUM state.db at startup (#13861)

* feat(state): auto-prune old sessions + VACUUM state.db at startup

state.db accumulates every session, message, and FTS5 index entry forever.
A heavy user (gateway + cron) reported 384MB with 982 sessions / 68K messages causing slowdown; manual 'hermes sessions prune --older-than 7' + VACUUM brought it to 43MB. The prune command and VACUUM are not wired to run automatically anywhere; sessions grew unbounded until users noticed.

Changes:

- hermes_state.py: new state_meta key/value table, vacuum() method, and maybe_auto_prune_and_vacuum(), idempotent via last-run timestamp in state_meta so it only actually executes once per min_interval_hours across all Hermes processes for a given HERMES_HOME. Never raises.
- hermes_cli/config.py: new 'sessions:' block in DEFAULT_CONFIG (auto_prune=True, retention_days=90, vacuum_after_prune=True, min_interval_hours=24). Added to _KNOWN_ROOT_KEYS.
- cli.py: call maintenance once at HermesCLI init (shared helper _run_state_db_auto_maintenance reads config and delegates to DB).
- gateway/run.py: call maintenance once at GatewayRunner init.
- Docs: user-guide/sessions.md rewrites 'Automatic Cleanup' section.

Why VACUUM matters: SQLite does NOT shrink the file on DELETE; freed pages get reused on next INSERT. Without VACUUM, a delete-heavy DB stays bloated forever. VACUUM only runs when the prune actually removed rows, so tight DBs don't pay the I/O cost.

Tests: 10 new tests in tests/test_hermes_state.py covering state_meta, vacuum, idempotency, interval skipping, VACUUM-only-when-needed, corrupt-marker recovery. All 246 existing state/config/gateway tests still pass.

Verified E2E with real imports + isolated HERMES_HOME: DEFAULT_CONFIG exposes the new block, load_config() returns it for fresh installs, first call prunes+vacuums, second call within min_interval_hours skips, and the state_meta marker persists across connection close/reopen.

* sessions.auto_prune defaults to false (opt-in)

Session history powers session_search recall across past conversations, so silently pruning on startup could surprise users.
Ship the machinery disabled and let users opt in when they notice state.db is hurting performance.

- DEFAULT_CONFIG.sessions.auto_prune: True → False
- Call-site fallbacks in cli.py and gateway/run.py match the new default (so unmigrated configs still see off)
- Docs: flip 'Enable in config.yaml' framing + tip explains the tradeoff

* feat(hindsight): richer session-scoped retain metadata

- Add configurable retain_tags / retain_source / retain_user_prefix / retain_assistant_prefix knobs for native Hindsight.
- Thread gateway session identity (user_name, chat_id, chat_name, chat_type, thread_id) through AIAgent and MemoryManager into MemoryProvider.initialize kwargs so providers can scope and tag retained memories.
- Hindsight attaches the new identity fields as retain metadata, merges per-call tool tags with configured default tags, and uses the configurable transcript labels for auto-retained turns.

Co-authored-by: Abner <abner.the.foreman@agentmail.to>

* chore(release): map Abner email to Abnertheforeman

* refactor(qqbot): migrate qr onboard flow to sync + consolidate into onboard.py

- Replace async create_bind_task/poll_bind_result with synchronous httpx.Client equivalents, eliminating manual event loop management
- Move _render_qr and full qr_register() entry-point into onboard.py, mirroring the Feishu onboarding pattern
- Remove _qqbot_render_qr and _qqbot_qr_flow from gateway.py (~90 lines); call site becomes a single qr_register() import
- Fix potential segfault: previous code called loop.close() in the EXPIRED branch and again in the finally block (double-close crashed under uvloop)

* fix(cli): ensure project .env is sanitized before loading

* chore(release): map hharry11 email to GitHub handle

* feat(dashboard): track real API call count per session

Adds schema v7 'api_call_count' column. run_agent.py increments it by 1 per LLM API call, web_server analytics SQL aggregates it, frontend uses the real counter instead of summing sessions.
The 'API Calls' card on the analytics dashboard previously displayed COUNT(*) from the sessions table, which is the number of conversations, not LLM requests. Each session makes 10-90 API calls through the tool loop, so the reported number was ~30x lower than real.

Salvaged from PR #10140 (@kshitijk4poor). The cache-token accuracy portions of the original PR were deferred; per-provider analytics is the better path there, since cache_write_tokens and actual_cost_usd are only reliably available from a subset of providers (Anthropic native, Codex Responses, OpenRouter with usage.include).

Tests:

- schema_version v7 assertion
- migration v2 -> v7 adds api_call_count column with default 0
- update_token_counts increments api_call_count by provided delta
- absolute=True sets api_call_count directly
- /api/analytics/usage exposes total_api_calls in totals

* fix(plugins+nous): auto-coerce memory plugins; actionable Nous 401 diagnostic (#14005)

* fix(plugins): auto-coerce user-installed memory plugins to kind=exclusive

User-installed memory provider plugins…
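For the api_call_count change in the dashboard commit above, the difference between counting sessions and summing a per-session counter can be shown with an in-memory SQLite table. This is a sketch under assumptions: the table is reduced to two columns, and `update_api_calls` is a hypothetical stand-in for the counter handling the commit attributes to update_token_counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions ("
    "id TEXT PRIMARY KEY, "
    "api_call_count INTEGER NOT NULL DEFAULT 0)"  # new column defaults to 0
)


def update_api_calls(conn, session_id, value, absolute=False):
    """Add `value` to the counter, or overwrite it when absolute=True."""
    if absolute:
        sql = "UPDATE sessions SET api_call_count = ? WHERE id = ?"
    else:
        sql = "UPDATE sessions SET api_call_count = api_call_count + ? WHERE id = ?"
    conn.execute(sql, (value, session_id))


conn.executemany("INSERT INTO sessions (id) VALUES (?)", [("a",), ("b",)])
for _ in range(12):                  # one increment per LLM API call
    update_api_calls(conn, "a", 1)
update_api_calls(conn, "b", 30, absolute=True)

# Old metric: number of conversations.
session_count = conn.execute("SELECT COUNT(*) FROM sessions").fetchone()[0]
# New metric: number of LLM requests across all sessions.
total_api_calls = conn.execute(
    "SELECT COALESCE(SUM(api_call_count), 0) FROM sessions"
).fetchone()[0]
print(session_count, total_api_calls)  # prints: 2 42
```

The COALESCE guard keeps the total at 0 rather than NULL when the sessions table is empty.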

Sequences: #1 (Xtheirs) → #8 (taxonomy) → #2 (retry context) → #6 (softer verify) → #5 (backend adapter) → #3 (bg companions) → #4 (candidate refill) → #7 (CSV audit). Each is one commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Captures Tier 3 tooling work that was considered during the 2026-04-26 overhaul but deferred. Each section explains the design, why it wasn't shipped, and the trigger condition that should reopen it.

Tier 3 items deferred:

- NousResearch#7 Lazy MCP discovery: high refactor surface, low immediate value (gateway RAM/startup not currently constrained).
- NousResearch#8 Health-aware tool router: overlaps with already-shipped manual fallback (Tavily→DDG/httpx); marginal gain doesn't justify scaffolding.
- NousResearch#10 Batch tool API: concurrent path + within-turn dedup cache already mitigate the common case; would be disruptive.

Each section includes a reference design so future work can pick up without re-deriving rationale, plus a "reopen when..." condition tied to observable signals (memory > 4GB, fallback rate > 30%, etc.) so the weekly health check (or operator) can flag when revisit is justified.

Tier 3 NousResearch#9 (LLM-summarize large results) was shipped in 9b3cef1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Split in-flight assistant text at the last stable block boundary so only the unclosed tail re-tokenizes per stream delta. Previously the full text was rendered as plain <Text> during streaming and only flipped to <Md> at message.complete — cheap per delta but loses live markdown formatting. New StreamingMd component holds a monotonically-growing stablePrefix in a ref (idempotent under StrictMode double-render), renders it as one <Md> that memoizes across deltas, and renders the unstable suffix as a second <Md> that re-parses on each delta. Cost per delta drops from O(total length) to O(unstable length). findStableBoundary walks back to the last "\n\n" outside an open fenced code block — splitting inside an open fence would orphan the opener and break highlighting in the prefix. Adapted from claude-code's src/components/Markdown.tsx:186 but built on our line-based tokenizer instead of marked.lexer. 9 new tests cover fence balance, boundary walk, and empty input. Part of the --tui perf audit (see audit #7).
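The boundary walk described above can be sketched in Python for illustration. The component itself is part of the TUI and this is not its code: `find_stable_boundary` is a hypothetical port of the idea behind findStableBoundary, and counting lines that begin with three backticks is a simplifying assumption about how fences are detected.

```python
FENCE = "`" * 3  # a fenced-code delimiter line starts with three backticks


def find_stable_boundary(text: str) -> int:
    """Return the end index of the stable prefix, or 0 if there is none.

    The boundary is the last blank-line break that is not inside an open
    fenced code block, so the prefix never contains an orphaned opener.
    """
    pos = text.rfind("\n\n")
    while pos != -1:
        prefix = text[:pos]
        fences = sum(1 for line in prefix.split("\n") if line.lstrip().startswith(FENCE))
        if fences % 2 == 0:          # every opened fence is closed
            return pos + 2           # keep the blank line in the prefix
        pos = text.rfind("\n\n", 0, pos)  # walk back to the previous break
    return 0


def split_stream(text: str):
    """Split streamed text into (stable_prefix, unstable_tail)."""
    cut = find_stable_boundary(text)
    return text[:cut], text[cut:]
```

This version recounts fences for each candidate break, which is fine for a sketch; an incremental implementation would track fence state as the text grows so that only the tail is re-scanned on each delta.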

some cleanups

…etection, workflow report, parallel batch import

Five items from the Texas Senate review.

1. Bug fix: /api-key now appears in the TUI catalog/autocomplete. The slash was wired locally in core.ts (and worked when typed) but was missing from the server-side hermes_cli/commands.COMMAND_REGISTRY that feeds commands.catalog. Added a CommandDef("api-key", …) entry (aliases apikey/api-keys/keys, subcommands list/show/set/unset) and listed "api-key" in FORECAST_DESK_SUBCOMMANDS so /forecast api-key also tab-completes.

2. Watched-source auto-attach (NousResearch#2). New auto_watch boolean on forecast_ledger.import_source_evidence: after a successful import, attach the (source_type, source) tuple as a watched source on the question, deduped against existing watches. The response now includes watched_source + auto_watch_note so the agent sees whether the watch was new or already there. Future reruns start from a known identifier instead of broad search.

3. Blocked-source detection (NousResearch#6). New detect_block_page() recognises Cloudflare challenges, DataDome / PerimeterX / Imperva walls, cookie / JS-required pages, and generic CAPTCHAs disguised as HTTP 200. _archive_url_evidence_ snapshot now runs the detector after fetching and writes blocked + reason + signal into both the snapshot metadata file AND the evidence row's top-level metadata. show_question / list_evidence and the workflow report (below) can surface "blocked" without opening the snapshot file. Patterns are conservative; the matched signal is exposed for audit.

4. workflow_report action (NousResearch#5). forecast_ledger action="workflow_report" aggregates one question's recent ledger activity into a compact report: evidence by source_type + by domain, blocked counts/reasons/items, snapshot probability deltas, watched-source / model-run / alert / postmortem breakdowns, a chronological timeline (capped via limit_timeline), and a heuristic `suggestions` list (e.g. "X blocked rows — prefer the structured adapter", "FRED in use but FRED_API_KEY not set", "unwatched repeated domain — pass auto_watch=true"). Designed for postmortems and to spot what slowed a session.

5. Parallel batch import (NousResearch#7). New action="import_source_evidence_batch" takes sources=[{source_type, source, …}, …] plus concurrency (default 4, capped 8) and fetches every entry concurrently in a ThreadPoolExecutor, then writes evidence rows sequentially in the main thread (keeps SQLite single-writer). Per-source failures are captured in results without aborting the batch. Verified: 4 FRED series, sum of individual fetches 4.27s, wall 1.23s ≈ 3.5× speedup. auto_watch propagates per-source with the same dedup.

Tests: 8 new tool tests (auto_watch attach + dedup + off, workflow_report aggregation + blocked suggestions, batch parallelism + per-source error isolation + empty rejection) + the ledger blocked-URL test; full forecasting + hermes_cli + ui-tui suites green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

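The parallel batch import in item 5 above follows a common pattern: fetch concurrently, then write from a single thread. The sketch below illustrates that pattern only; `import_batch`, the `fetch` callable, and the `evidence` table layout are hypothetical, while the default concurrency of 4, the cap of 8, the empty-batch rejection, and per-source error isolation come from the commit message.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def import_batch(conn, sources, fetch, concurrency=4):
    """Fetch all sources in parallel, then insert rows sequentially."""
    if not sources:
        raise ValueError("sources must not be empty")
    workers = max(1, min(concurrency, 8))  # cap at 8 workers

    def safe_fetch(src):
        # Capture per-source failures so one bad source cannot abort the batch.
        try:
            return {"source": src, "ok": True, "body": fetch(src)}
        except Exception as exc:
            return {"source": src, "ok": False, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(safe_fetch, sources))  # keeps input order

    # Writes happen here, in the calling thread only (single SQLite writer).
    for result in results:
        if result["ok"]:
            src = result["source"]
            conn.execute(
                "INSERT INTO evidence (source_type, source, body) VALUES (?, ?, ?)",
                (src["source_type"], src["source"], result["body"]),
            )
    conn.commit()
    return results
```

Network latency dominates the fetch phase, so threads overlap well there, while keeping every INSERT on one thread avoids sharing the SQLite connection across workers.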
TheAlanMS · 2026-06-02T00:18:47Z

@codex please review the latest commit ee8e6c9 which fixes the proposed_order preservation issue in run_generation()

teknium and others added 2 commits November 5, 2025 03:47

some cleanups

c82741c

Merge branch 'main' into test

4135cf4

teknium1 merged commit 69fd0ca into main Nov 5, 2025

heyalchang mentioned this pull request Mar 12, 2026

feat: payload visualization dashboard heyalchang/hermes-agent#1

Merged

6 tasks

teknium1 mentioned this pull request Mar 17, 2026

fix(tools): browser handlers TypeError on unexpected LLM params + fuzzy_match docstring #1735

Merged

sudo-yf pushed a commit to sudo-yf/hermes-agent that referenced this pull request Apr 5, 2026

Merge pull request NousResearch#40 from nesquena/sprint-21-mobile-docker

af73a5d

Sprint 21: Mobile responsive layout + Docker support (Issues NousResearch#21, NousResearch#7)

aaronlab mentioned this pull request Apr 9, 2026

fix: strip mcp_ prefix in auxiliary client, log JSON parse fallback and shutdown message loss #6635

Open

4 tasks

h4x3rotab referenced this pull request in Clawdi-AI/hermes-agent Apr 10, 2026

Merge pull request #7 from outsourc-e/phase4.2-pinned-models

25ff9dd

Phase4.2 pinned models

malaiwah pushed a commit to malaiwah/hermes-agent that referenced this pull request Apr 11, 2026

Merge pull request 'feat(docker): re-read docker_env_files on every e…

3ba1a2d

…xec, not only at spawn' (NousResearch#7) from feat/docker-env-files-per-exec into main

falses00 mentioned this pull request Apr 15, 2026

Code Review Refactoring Implementations (P0-P2) #10445

Open

kshitijk4poor mentioned this pull request Apr 18, 2026

[Bug]: Compression trigger includes reasoning tokens, causing premature session splits for thinking models (GLM-5.1, QwQ, etc.) #12026

Closed

briandevans mentioned this pull request Apr 21, 2026

fix(telegram): log document/video send failures instead of printing to stdout (#13356) #13387

Closed

OutThisLife mentioned this pull request Apr 21, 2026

fix(tui): v2 blitz-test UX pack #13589

Closed

8 tasks

angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026

Merge pull request NousResearch#7 from NousResearch/test

dafa055

some cleanups

fchaudhryspear mentioned this pull request May 24, 2026

Add Antfarm operator bridge command #31295

Open

This was referenced May 24, 2026

🔒 Security Audit: 9 confirmed issues in hermes-agent #31408

Open

🔒 Security Audit: 14 confirmed issues in hermes-agent #31715

Open

Interstellar-code mentioned this pull request May 28, 2026

[a2a_fleet] MEDIUM: relocate tests to tests/plugins/ and cover sync-register + auth-default paths #34167

Closed

midtskog mentioned this pull request May 30, 2026

docs: add native-Windows + AMD Strix Halo deployment field notes #35564

Open

This was referenced Jun 1, 2026

Docs: warn that the Hetzner (and similar) browser console mangles special chars; recommend SSH #36279

Closed

[Setup]: #22812

Closed

dsameer0-code mentioned this pull request Jun 1, 2026

ddgs web search provider hangs indefinitely — no overall timeout on search calls #36776

Open

ricardocamiloconsir mentioned this pull request Jun 2, 2026

feat(gateway): session model pool — concurrency-aware auto-assignment with auxiliary slot tracking #37519

Open

alt-glitch mentioned this pull request Jun 4, 2026

fix(installer): symlink bundled node/npm into command bin dir for FHS root installs #38889

Merged

3 tasks

jarvis-stark-ops mentioned this pull request Jun 7, 2026

feat(gateway): dispatcher heartbeat — detect silent stalls from outside #41588

Closed

3 tasks

cristianmgm7 mentioned this pull request Jun 10, 2026

feat(platforms): add Carbon Voice as a native messaging platform #43226

Open

AutomalyRo mentioned this pull request Jun 10, 2026

[Bug]: OpenAI Codex usage completely Broken/Being Treated as Custom API #43461

Closed

1 task

some cleanups #7

teknium1 merged 2 commits into main from test

teknium1 commented Nov 5, 2025

TheAlanMS commented Jun 2, 2026

3 participants