Skip to content

fix(kanban): enforce completion-evidence invariant — block phantom-done#1

Merged
Brecht-H merged 1821 commits into
mainfrom
orion-cc/completion-evidence-invariant
May 16, 2026
Merged

fix(kanban): enforce completion-evidence invariant — block phantom-done#1
Brecht-H merged 1821 commits into
mainfrom
orion-cc/completion-evidence-invariant

Conversation

@Brecht-H

Copy link
Copy Markdown
Owner

Root cause

complete_task() in hermes_cli/kanban_db.py writes status='done', result=?
with result defaulting to None and no evidence check. A caller doing
complete_task(conn, id) with no result, no summary, and no comment produces a
phantom-done task — status=done with zero audit trail of what was
delivered. 59 live tasks are currently in this state (mostly the daily
lane: 22; researcher: 16; mac: 9).

The guard

  • New CompletionEvidenceError(ValueError) + _has_completion_evidence() helper.
  • Before the main write txn (mirroring the existing HallucinatedCardsError
    created_cards gate), complete_task verifies the task has ANY of:
    a non-empty result, a non-empty summary, or >=1 non-empty comment.
  • On rejection it emits an auditable completion_blocked_no_evidence event in a
    tiny dedicated txn, then raises. The task never mutates to done — it
    stays in its current status (running/ready/blocked).
  • The guard only fires for tasks in a completable state; terminal/unknown tasks
    keep the existing "return False" contract.

Callers surfaced

  • hermes kanban complete CLI: catches CompletionEvidenceError, prints an
    actionable message (--result / comment hint), exits non-zero.
  • Dashboard PATCH /tasks/{id} to-done endpoint: returns 409, mirroring its
    other transition-rejection codes.
  • kanban_janitor.py is unchanged — its explicit_complete_comment
    auto-close supplies a comment, which is evidence, so it passes naturally.

⚠️ Behaviour change — flag for Mac

The daily lane's routine no-result completions WILL start being rejected by
this guard.
This is intended — those completions are exactly the
phantom-done pathology. Routines must now pass --result / --summary or leave
a comment before marking a task done.

Backfill — separate Mac-gated step

scripts/backfill_phantom_done.py triages the 59 existing phantom-done tasks:
it greps /home/orion/hermes-plan for a deliverable per task-id / close
title-keyword match and prints a triage table. With --apply (default OFF,
dry-run only) it writes the found deliverable into tasks.result; NO_TRACE
tasks are printed for manual review, never auto-reopened. Dry-run found a
deliverable for all 59 tasks.
Do not run --apply without Mac sign-off — the
live backfill is a Mac-gated operator step.

This feeds ADR-013.

Verification

  • py_compile clean on all changed files.
  • Full tests/hermes_cli/ + kanban plugin/gateway/tools suite: 4672 passed
    (3 unrelated pre-existing failures: 2x systemd PermissionError, 1x
    order-dependent stdout-stream test — all fail on baseline origin/main too).
  • Stress harness test_atypical_scenarios.py: only the 2 pre-existing baseline
    failures remain; the new empty_string_fields scenario was updated to assert
    the invariant.
  • New tests in test_kanban_db.py: result succeeds, no-evidence raises +
    audits, empty result raises, comment-as-evidence succeeds, summary-as-evidence
    succeeds, helper unit coverage, edit_completed_task_result still works.

🤖 Generated with Claude Code

kshitijk4poor and others added 30 commits May 13, 2026 22:31
Two ABC additions to cover the surface area of the remaining four
providers (exa, parallel, tavily, firecrawl) which were untouched by the
initial spike:

1. supports_crawl() + crawl() — Tavily natively crawls a seed URL via
   its /crawl endpoint. Exposing supports_crawl=True lets the crawl
   tool's dispatcher route to Tavily when configured, falling back to
   the auxiliary-model summarization path otherwise. Firecrawl could
   add this in a follow-up (the SDK supports it; we just don't surface
   it as a tool today).

2. Async-or-sync extract() — Parallel's SDK is natively async
   (AsyncParallel.beta.extract); Exa and Tavily are sync; Firecrawl is
   sync but called inside asyncio.to_thread() with a 60s timeout. The
   ABC docstring now permits either shape: implementations declare
   their own sync/async signature and the dispatcher uses
   inspect.iscoroutinefunction to detect and await.

Also adds get_active_crawl_provider() to web_search_registry mirroring
the search/extract resolvers, with web.crawl_backend as the explicit
override config key.

No behavior change on its own — these are scaffolds for the four
remaining provider migrations.
…tract)

Migrates Exa from the inline `_exa_search()` / `_exa_extract()` helpers in
tools/web_tools.py to a bundled plugin at plugins/web/exa/.

This is the first plugin in this PR to advertise supports_extract=True,
exercising the multi-capability ABC path that the initial three migrations
(brave_free, ddgs, searxng — all search-only) did not cover.

Both Exa methods are sync — the SDK is sync-only. The web_extract_tool
dispatcher in tools/web_tools.py will continue to call them inline until
Task "dispatch-extract-all" cuts it over to the registry.

Behaviour preserved bit-for-bit aside from the ABC method-name change:
  - is_configured()  -> is_available()
  - provider_name()  -> name (property)
  - "exa" stays as the registered name
  - Module-level `_exa_client` cache + lazy `from exa_py import Exa`
    preserved at the new location.
  - Errors (ValueError for missing API key, ImportError for missing SDK,
    generic Exception) caught and surfaced as {"success": False, "error": ...}
    instead of raising.

Adds "exa" to _WEB_PLUGIN_SKIPLIST in hermes_cli/tools_config.py so the
hardcoded TOOL_CATEGORIES["web"] row and the plugin-injected row don't
duplicate during the spike. The skip-list goes away in the cleanup phase
along with the hardcoded row.

The legacy inline `_exa_search` / `_exa_extract` / `_get_exa_client` /
`_exa_client` in tools/web_tools.py are NOT deleted yet — the dispatcher
still references them. They go away in the next dispatcher-cutover commit.

E2E verified:
  - Plugin discovers + registers
  - .supports_search/.supports_extract/.supports_crawl = (True, True, False)
  - .get_setup_schema() returns the picker row shape
  - resolve(): explicit exa + EXA_API_KEY -> exa; without key -> exa (registered
    but unavailable, dispatcher surfaces "EXA_API_KEY not set" error)
Migrates Parallel.ai from inline `_parallel_search()` / `_parallel_extract()`
in tools/web_tools.py to a bundled plugin at plugins/web/parallel/.

First plugin in the codebase to expose an async :meth:`extract`:

  - search() is sync — Parallel.beta.search
  - extract() is **async def** — AsyncParallel.beta.extract

The ABC's docstring on supports_extract() already permits sync-or-async;
this commit is the first to exercise the async path. The web_extract_tool
dispatcher (next commit) detects coroutines via
inspect.iscoroutinefunction and awaits accordingly.

Behavior preserved:
  - PARALLEL_API_KEY required (raises ValueError if missing → surfaced
    as {"success": False, "error": "..."} instead)
  - PARALLEL_SEARCH_MODE env var honored (agentic|fast|one-shot, default
    agentic), validated via _resolve_search_mode()
  - Limit capped at 20 server-side via min(limit, 20)
  - Per-URL failure mode preserved: response.errors[] each become a
    result dict with an "error" field rather than raising
  - Module-level _parallel_client / _async_parallel_client caches kept
    (mirrors legacy singleton pattern)

Adds "parallel" to _WEB_PLUGIN_SKIPLIST in hermes_cli/tools_config.py so
the picker doesn't double-list.

The legacy inline _parallel_search, _parallel_extract, _get_parallel_client,
_get_async_parallel_client in tools/web_tools.py are NOT deleted yet — the
dispatcher still calls them. They go away when the dispatcher cuts over.

E2E verified:
  - inspect.iscoroutinefunction(p.search) -> False
  - inspect.iscoroutinefunction(p.extract) -> True
  - extract() returns a coroutine (not a list)
  - 5 providers register correctly (brave-free, ddgs, exa, parallel, searxng)
…tract + crawl)

Migrates Tavily from inline _tavily_request() / _normalize_tavily_*
helpers in tools/web_tools.py to a bundled plugin at plugins/web/tavily/.

First plugin in the codebase to advertise supports_crawl=True. Tavily is
unique among built-in backends in offering a native /crawl endpoint that
walks linked pages from a seed URL with optional natural-language
instructions and depth ("basic" or "advanced").

Capabilities:
  - supports_search()  -> True (Tavily /search)
  - supports_extract() -> True (Tavily /extract)
  - supports_crawl()   -> True (Tavily /crawl)
  All sync (httpx.post under the hood).

The crawl method accepts forward-compat kwargs (instructions, depth,
limit) and is gated against unsafe URLs/policy by the dispatcher in
web_crawl_tool — exactly as before.

Behavior preserved:
  - TAVILY_API_KEY required (ValueError → typed error response)
  - TAVILY_BASE_URL env override honored
  - /crawl requires both body auth AND Bearer header — preserved
  - failed_results[] and failed_urls[] response keys mapped to per-URL
    items with error fields rather than raising
  - max_results capped at 20 server-side

Adds "tavily" to _WEB_PLUGIN_SKIPLIST.

The legacy inline _tavily_request / _normalize_tavily_search_results /
_normalize_tavily_documents / _TAVILY_BASE_URL in tools/web_tools.py are
NOT deleted yet — search/extract dispatch and the entire web_crawl_tool
function still reference them. They go away when those dispatchers are
cut over to the registry.

E2E verified:
  - Tavily registers with all 3 capabilities
  - Provider list now: brave-free, ddgs, exa, parallel, searxng, tavily
…ct + dual auth)

Migrates Firecrawl from inline code in tools/web_tools.py to a bundled
plugin at plugins/web/firecrawl/. By line count this is the largest of
the seven provider migrations: the firecrawl path captured most of the
file's vendor-specific complexity.

What moved into the plugin (all previously in tools/web_tools.py):

  Lazy Firecrawl SDK proxy
    - _load_firecrawl_cls() — caches the imported SDK class
    - _FirecrawlProxy + Firecrawl singleton — defers ~200ms of SDK
      imports until first construction or isinstance check.

  Client construction (dual auth)
    - _get_direct_firecrawl_config()  — direct FIRECRAWL_API_KEY/URL path
    - _get_firecrawl_gateway_url()    — managed Nous tool-gateway URL
    - _is_tool_gateway_ready()        — gateway URL + Nous token check
    - _has_direct_firecrawl_config()  — direct config present?
    - _get_firecrawl_client()         — combined client construction
                                        honoring web.use_gateway
    - check_firecrawl_api_key()       — top-level "is firecrawl usable"
    - _firecrawl_backend_help_suffix() — managed-gateway help string
    - _raise_web_backend_configuration_error() — typed misconfig error

  Response shape normalization (vendor-specific)
    - _to_plain_object(), _normalize_result_list() — SDK→dict helpers
    - _extract_web_search_results() — handles SDK/direct/gateway shapes
    - _extract_scrape_payload()     — nested-data unwrap for scrape

  Per-URL extract loop
    - 60s asyncio.wait_for timeout per URL
    - Pre-scrape website-policy gate
    - Post-scrape redirect-aware SSRF re-check
    - Format-aware content selection (markdown / html / auto)
    - Per-URL errors returned as {"error": str} entries, no raises

Extract is declared `async def` — each URL is scraped in
asyncio.to_thread(...). This is the second async-extract plugin after
parallel.

The plugin re-exports `Firecrawl` (the lazy proxy) and
`check_firecrawl_api_key()` so existing tests doing
`patch("tools.web_tools.Firecrawl")` or
`monkeypatch.setattr(web_tools, "check_firecrawl_api_key", ...)` keep
working — tools/web_tools.py re-exports both names in the next
dispatcher-cutover commit.

Note: web_crawl_tool still has its own Firecrawl crawl path inline
(separate from extract); the Firecrawl SDK supports /crawl but we don't
expose supports_crawl=True on this plugin yet. Tavily handles crawl
today. Adding Firecrawl crawl is a clean follow-up.

Adds "firecrawl" to _WEB_PLUGIN_SKIPLIST.

E2E verified:
  - All 7 providers register: brave-free, ddgs, exa, firecrawl,
    parallel, searxng, tavily
  - inspect.iscoroutinefunction(firecrawl.extract) -> True
  - Firecrawl proxy is a callable lazy proxy at module level
  - check_firecrawl_api_key reflects FIRECRAWL_API_KEY presence
Cuts over web_search_tool, web_extract_tool, and web_crawl_tool in
tools/web_tools.py to dispatch through agent.web_search_registry
instead of the legacy hardcoded if-elif backend chains.

Per-tool changes:

  web_search_tool (sync)
    Replace 5 backend branches (parallel, exa, registry-3-providers,
    tavily, firecrawl-fallthrough) with a single registry path:
      1. _get_search_backend() resolves the configured name
      2. _wsp_get_provider(name) for explicit-config-wins semantics
      3. get_active_search_provider() fallback for typo / unknown name
      4. provider.search(query, limit) — sync for all 7 providers

  web_extract_tool (async)
    Replace 4 backend branches (parallel-async, exa-sync, tavily-sync,
    search-only-error, firecrawl-perurl-loop) with:
      1. Same provider resolution as search.
      2. When configured backend IS registered but doesn't support
         extract (search-only providers like brave-free), surface a
         typed "search-only" error matching the legacy text — tests
         assert that wording.
      3. inspect.iscoroutinefunction(provider.extract) detects sync vs
         async: parallel + firecrawl are async; exa + tavily are sync.
         Sync extracts run in asyncio.to_thread() so we don't block.

  web_crawl_tool (async)
    Replace tavily-specific branch + search-only-error block with:
      1. _wsp_get_provider(backend) — explicit config first
      2. Search-only typed error when the configured name doesn't
         support crawl (matches legacy phrasing)
      3. get_active_crawl_provider() fallback otherwise
      4. provider.crawl(url, **kwargs) — async-or-sync dispatch as above
      5. Response post-processing (LLM summarization, trimming) stays
         unchanged — it's not provider-specific.
    When no plugin advertises supports_crawl, falls through to the
    existing Firecrawl-via-web-summarize path below (unchanged).

Test updates (2 tests in tests/tools/test_web_tools_config.py):
  - test_web_search_clamps_limit_before_backend_call:
      patch("tools.web_tools._parallel_search") -> patch the registry
      provider returned by agent.web_search_registry.get_provider
  - test_search_error_response_does_not_expose_diagnostics:
      patch("tools.web_tools._get_firecrawl_client") -> same pattern

Tests unchanged (still pass):
  - All TestXBackendWiring classes (test _get_backend / _is_backend_available
    config-resolution, independent of dispatch)
  - All TestXSearchOnlyErrors classes (test the search-only error path
    via web_extract_tool / web_crawl_tool — error text preserved)
  - 141 passing web tests total, 0 regressions.

Dead-code cleanup deferred to a follow-up commit so this diff stays
focused on the cutover. After this commit:
  - tools.web_tools._exa_search / _exa_extract / _parallel_search /
    _parallel_extract / _tavily_request / _normalize_tavily_* /
    _get_firecrawl_client / _extract_web_search_results /
    _extract_scrape_payload / _to_plain_object / _normalize_result_list
    are no longer called by the dispatchers, but still exist.
  - The config-resolution layer (_get_backend, _is_backend_available,
    _is_tool_gateway_ready, _has_direct_firecrawl_config) IS still in
    use and must stay.
  - The Firecrawl proxy and check_firecrawl_api_key are still imported
    by integration tests and patched by unit tests — must stay (or be
    re-exported from the plugin).
Two regressions discovered by running the full tests/tools/ suite after
the dispatcher cutover, both fixed in this commit:

1. web_crawl_tool incorrectly errored "search-only" for firecrawl
---------------------------------------------------------------------
The cutover treated any provider with supports_crawl()==False as a
search-only backend and returned the typed search-only error. But
firecrawl can crawl via the legacy multi-page-extract path inside
web_crawl_tool — it just doesn't expose supports_crawl on the plugin
(adding native firecrawl crawl is a clean follow-up).

Fix: only emit the search-only error when the provider supports
NEITHER crawl NOR extract (brave-free / ddgs / searxng). When the
provider supports extract but not crawl (firecrawl), fall through to
the legacy firecrawl-via-extract path below.

2. firecrawl plugin's check_website_access wasn't patchable
---------------------------------------------------------------------
The plugin imported `from tools.website_policy import check_website_access`
INSIDE the extract() function body, so monkeypatching the name on
plugins.web.firecrawl.provider had no effect — the inner import re-bound
the name on every call.

Fix: hoist the import to module level. Cheap (website_policy itself
has no heavy deps) and makes the standard
monkeypatch.setattr(firecrawl_provider, "check_website_access", ...)
pattern work.

Test updates (tests/tools/test_website_policy.py — 4 tests):
  - test_web_extract_short_circuits_blocked_url
  - test_web_extract_blocks_redirected_final_url
    Both: patch the gate at plugins.web.firecrawl.provider (where it
    runs after migration) and force the firecrawl plugin to be the
    active extract provider via FIRECRAWL_API_KEY.
  - test_web_crawl_short_circuits_blocked_url
  - test_web_crawl_blocks_redirected_final_url
    Both: unchanged — the dispatcher-level gate at tools.web_tools.py
    line 1651 still uses the imported `check_website_access` name and
    the firecrawl-fallthrough path is exercised as before.

Verified: 22/22 tests/tools/test_website_policy.py pass.
Removes ~580 lines of dead code from tools/web_tools.py that were
superseded by the plugin migration but kept around in the cutover commit
to keep the diff focused. Replaces them with thin re-export shims so
existing tests and external callers that reach for the legacy
``tools.web_tools.<name>`` paths continue to work transparently.

Deleted from tools/web_tools.py
--------------------------------
- Lazy Firecrawl SDK proxy (_load_firecrawl_cls, _FirecrawlProxy,
  _FIRECRAWL_CLS_CACHE, the Firecrawl singleton)
- Firecrawl client section (_get_direct_firecrawl_config,
  _get_firecrawl_gateway_url, _is_tool_gateway_ready,
  _has_direct_firecrawl_config, _raise_web_backend_configuration_error,
  _firecrawl_backend_help_suffix, _get_firecrawl_client)
- Parallel client section (_get_parallel_client,
  _get_async_parallel_client, _parallel_client, _async_parallel_client)
- Tavily client section (_TAVILY_BASE_URL, _tavily_request,
  _normalize_tavily_search_results, _normalize_tavily_documents)
- Generic SDK normalizers (_to_plain_object, _normalize_result_list,
  _extract_web_search_results, _extract_scrape_payload)
- Exa client section (_get_exa_client, _exa_client, _exa_search,
  _exa_extract)
- Parallel helpers (_parallel_search, _parallel_extract)
- Duplicate inline check_firecrawl_api_key

Net: tools/web_tools.py drops from 2227 → 1613 lines (-614 lines).

Re-exports added at top of tools/web_tools.py
---------------------------------------------
- From plugins.web.firecrawl.provider:
  Firecrawl, _FirecrawlProxy, _FIRECRAWL_CLS_CACHE, _load_firecrawl_cls,
  _get_direct_firecrawl_config, _get_firecrawl_gateway_url,
  _is_tool_gateway_ready, _has_direct_firecrawl_config,
  _firecrawl_backend_help_suffix, _raise_web_backend_configuration_error,
  _get_firecrawl_client, _to_plain_object, _normalize_result_list,
  _extract_web_search_results, _extract_scrape_payload,
  check_firecrawl_api_key
- From plugins.web.tavily.provider:
  _tavily_request, _normalize_tavily_search_results,
  _normalize_tavily_documents
- From plugins.web.parallel.provider:
  _get_parallel_client, _get_async_parallel_client
- From plugins.web.exa.provider:
  _get_exa_client

Plus retained module-level imports for backward-compat with tests:
- httpx (tests patch tools.web_tools.httpx for tavily request mocking)
- build_vendor_gateway_url, _read_nous_access_token,
  resolve_managed_tool_gateway, managed_nous_tools_enabled,
  prefers_gateway (tests patch tools.web_tools.<name>)

Plugin indirection pattern (key technique)
------------------------------------------
For functions inside the firecrawl/parallel/exa plugins to honor
unit-test patches that target ``tools.web_tools.<name>``, the plugin
implementations now do ``import tools.web_tools as _wt`` at call time
and read helper names through that module (``_wt._read_nous_access_token``,
``_wt.Firecrawl``, ``_wt.prefers_gateway``, etc.). This makes the
existing test patches transparently reach the plugin code without any
test changes.

The cached client globals (_firecrawl_client, _firecrawl_client_config,
_parallel_client, _async_parallel_client, _exa_client) also now live on
tools.web_tools so existing test setup_method handlers that reset
``tools.web_tools._<vendor>_client = None`` between cases keep working.
The plugins read/write the cache via getattr/setattr on the web_tools
module.

Verified
--------
- 173/173 targeted web tests pass:
  test_web_providers.py, test_web_providers_brave_free.py,
  test_web_providers_ddgs.py, test_web_providers_searxng.py,
  test_web_tools_config.py, test_web_tools_tavily.py,
  test_website_policy.py, test_config_null_guard.py
- Compile-clean (py_compile.compile passes)
- All inline implementations now exist in exactly one place
  (plugins.web.<vendor>.provider)

Follow-up clean-up
------------------
- Drop _WEB_PLUGIN_SKIPLIST + hardcoded TOOL_CATEGORIES["web"] rows
  (next commit)
- Delete tools/web_providers/ directory entirely
- Add tests/plugins/web/ coverage
- Full tests/tools/ + tests/gateway/ regression sweep before promoting PR
…re sole source

Removes the seven hardcoded TOOL_CATEGORIES["web"] provider rows that
duplicated the plugin-registered providers, and deletes the
_WEB_PLUGIN_SKIPLIST that existed to prevent duplicate picker rows
during the migration. The Web Search & Extract category now derives its
provider rows entirely from agent.web_search_registry via
_plugin_web_search_providers(), matching how Spotify, Google Meet, and
the image_gen plugins are surfaced.

Removed (deduplicated against plugin schemas):
  - Firecrawl Cloud         → plugins.web.firecrawl
  - Exa                     → plugins.web.exa
  - Parallel                → plugins.web.parallel
  - Tavily                  → plugins.web.tavily
  - SearXNG                 → plugins.web.searxng
  - Brave Search (Free Tier) → plugins.web.brave_free
  - DuckDuckGo (ddgs)       → plugins.web.ddgs (post_setup hook preserved)

Retained in TOOL_CATEGORIES["web"]:
  - Nous Subscription   — requires requires_nous_auth +
                          managed_nous_feature + override_env_vars
                          to drive the managed-gateway UX. Not a
                          provider — a different *setup flow* for the
                          firecrawl backend.
  - Firecrawl Self-Hosted — points firecrawl at a private Docker URL
                            via FIRECRAWL_API_URL only. Same reason:
                            UX setup-flow row, not a provider.

These two rows describe alternative auth/billing paths for the
firecrawl backend; they intentionally share web_backend="firecrawl"
with the plugin row but light up different env-var prompts.

Plugin schema extensions
------------------------
- ddgs plugin's get_setup_schema() now emits `post_setup: "ddgs"` so
  selection still triggers the pip-install hook in _run_post_setup().
- _plugin_web_search_providers() passes `post_setup` through verbatim
  when present in the schema (other future plugins like camofox / a
  hypothetical playwright-web plugin can opt in the same way).
- Picker rows now carry both `web_backend` (legacy field consumed by
  setup + selection helpers) and `web_search_plugin_name`
  (informational marker), so behavior is identical between hardcoded
  and plugin-registered rows.

Net diff
--------
- hermes_cli/tools_config.py: -141/+50 lines (~91 lines net)
- plugins/web/ddgs/provider.py: +7/-4 (post_setup field + badge polish)

Verified
--------
- Compile-clean for both files
- Picker shows: 2 hardcoded rows (Nous Subscription, Firecrawl
  Self-Hosted) + 7 plugin rows (alphabetically: Brave Search,
  DuckDuckGo, Exa, Firecrawl, Parallel, SearXNG, Tavily). DuckDuckGo
  row carries post_setup="ddgs" for first-time install.
- 173 web-specific tests still pass.
… ABC tests

Removes the legacy in-tree provider scaffolding that PR NousResearch#25182 fully
replaced with the plugin architecture:

  tools/web_providers/__init__.py        (6 lines)
  tools/web_providers/base.py            (89 lines — old ABCs)
  tools/web_providers/ARCHITECTURE.md    (73 lines — old design doc)

These were the staging-ground ABCs and provider modules that the
plugin migration absorbed. All seven web providers now implement the
single :class:`agent.web_search_provider.WebSearchProvider` ABC and
live under ``plugins/web/<vendor>/``. Nothing else in the tree imports
``tools.web_providers`` — verified via grep before deletion.

Test migration (tests/tools/test_web_providers.py)
--------------------------------------------------
Rewrote ``TestWebProviderABCs`` to test the new unified ABC at
:mod:`agent.web_search_provider`:

  - test_cannot_instantiate_abc_directly — abstract ``name`` + ``is_available``
  - test_concrete_search_only_provider_works — exercise default
    ``supports_extract=False`` / ``supports_crawl=False`` flags
  - test_concrete_multi_capability_provider_works — exercise all three
    capabilities, async extract supported (declared sync here for
    simplicity; real plugins like parallel + firecrawl use async)
  - test_search_only_provider_skips_extract_and_crawl — verify
    ``supports_*()`` flags default to False so search-only providers
    don't have to implement extract() or crawl()

The 9 other tests in the file (per-capability backend selection,
DEFAULT_CONFIG merge, dispatcher routing) test public helpers in
``tools.web_tools`` that still exist and pass unchanged.

agent/web_search_provider.py docstring updated to reflect that the
legacy ABCs no longer exist; the response-shape contract is preserved
bit-for-bit so external consumers see no behavioral change.

Net diff
--------
- tools/web_providers/ removed (-168 lines)
- tests/tools/test_web_providers.py rewritten ABC section (+78/-30 net,
  same coverage, new API)
- agent/web_search_provider.py docstring (-3/+5 lines)

Verified
--------
- 173/173 targeted web tests pass
- 12/12 ABC contract tests pass with the new interface
- No remaining grep hits for ``tools.web_providers`` outside of
  intentional historical references in plugin docstrings.
Adds 44 focused tests under tests/plugins/web/ covering the surface that
the PR NousResearch#25182 web-provider migration introduced. Complements the
existing tests/tools/ coverage which is dispatcher-centric; this file is
plugin-centric and tests each plugin + the registry directly.

Test classes (44 tests, ~1.1s on 4 workers)
-------------------------------------------

TestBundledPluginsRegister (16 tests)
  - All seven plugins present in the registry after
    _ensure_plugins_discovered()
  - Per-plugin parametrized capability-flag assertions
    (brave-free / ddgs / searxng: search-only;
     exa / parallel / firecrawl: search + extract;
     tavily: search + extract + crawl)
  - Every plugin exposes name + display_name properties
  - Every plugin returns a picker-compatible get_setup_schema() dict

TestIsAvailable (7 tests)
  - Each premium plugin reports is_available()==False when its env var is
    absent and True once set (brave-free / searxng / tavily / exa /
    parallel)
  - firecrawl recognizes either FIRECRAWL_API_KEY or FIRECRAWL_API_URL
    as a "configured" signal
  - ddgs is the always-on fallback and must not raise from is_available()

TestRegistryResolution (4 tests)
  - Option B semantics validated end-to-end:
    1. Explicit configured provider wins even when is_available()==False
       (dispatcher surfaces typed credential errors, no silent switch)
    2. Unknown/typo name falls back to first available legacy-preference
       provider
    3. Asking for extract via a search-only backend falls back to an
       extract-capable available provider (capability-incompatible
       branch in _resolve())
    4. No config + no credentials → None (or ddgs if installed)

TestAsyncExtractDispatch (4 tests)
  - parallel + firecrawl extract() are coroutine functions (async path
    in dispatcher uses await)
  - exa + tavily extract() are sync (dispatcher wraps in
    asyncio.to_thread)

TestErrorResponseShapes (7 tests)
  - Plugins return typed error dicts (success=False + "error" key) when
    credentials are missing, never raise
  - async extract() returns list of per-URL error dicts
  - tavily crawl() returns {"results": [{"error": ...}]} on missing
    credentials

Design notes
------------
- All tests use real imports of plugin modules — no mocking of provider
  classes themselves — so they catch drift in the ABC, registry, and
  glue layer simultaneously. Per the hermes-agent-dev skill's E2E
  testing guidance.
- The autouse _isolate_env fixture clears every web-provider env var
  before each test so is_available() reflects the test's setup.
- Resolution tests use the lower-level _resolve() directly rather than
  rebuilding the HERMES_HOME config dance — same observable behavior,
  no sys.modules.pop side-effects that would break the ABC isinstance
  check inside ctx.register_web_search_provider().
…line path

The web-provider migration originally left firecrawl crawl as the only
provider-specific code remaining inline in tools/web_tools.py (~250
lines of Firecrawl-specific crawl orchestration that didn't fit the
plugin's existing surface). This commit closes that gap.

What this adds
--------------
1. plugins/web/firecrawl/provider.py: implement async ``crawl(url, **kwargs)``
   - Accepts the same kwargs as the dispatcher passes to any crawl
     provider (``instructions``, ``depth``, ``limit``); Firecrawl's
     /crawl endpoint ignores ``instructions`` and ``depth`` so we log
     and drop with a clear info message.
   - Wraps the sync SDK ``crawl()`` call in asyncio.to_thread so the
     gateway event loop isn't blocked on a multi-page crawl.
   - Preserves the response-shape normalization across pydantic /
     typed-object / dict variants that the legacy inline code did.
   - Preserves per-page website-policy re-check (catches blocked
     redirects after the SDK returns).
   - Returns the same {"results": [...]} shape so the dispatcher's
     shared LLM-summarization post-processing path works unchanged.
   - Sets supports_crawl() to True so the dispatcher routes through
     the plugin instead of the legacy fallthrough.

2. tools/web_tools.py: delete the entire legacy firecrawl crawl block
   that used to run after "No registered provider supports crawl" —
   ~270 lines including:
   - check_firecrawl_api_key gate + typed error
   - inline SSRF + website-policy seed-URL gate (dispatcher already
     does this)
   - Firecrawl client setup with crawl_params
   - 100+ lines of pydantic/dict/typed-object normalization
   - Per-page LLM-processing loop (kept in the dispatcher's shared
     post-processing path; that's where it always belonged)
   - trimming + base64 image cleanup (still done in the dispatcher's
     shared path)

   Replaced with a single typed-error branch when no crawl-capable
   provider is available: "web_crawl has no available backend. Set
   FIRECRAWL_API_KEY (or FIRECRAWL_API_URL for self-hosted), or set
   TAVILY_API_KEY for Tavily."

Test updates
------------
- tests/tools/test_website_policy.py:
  - test_web_crawl_short_circuits_blocked_url: dispatcher seed-URL
    gate still runs on web_tools.check_website_access (no change to
    that patch), but the firecrawl client lockdown moved to the
    plugin module — patch firecrawl_provider._get_firecrawl_client
    instead of web_tools._get_firecrawl_client. The dispatcher
    short-circuits before the plugin runs, so the test still passes.
  - test_web_crawl_blocks_redirected_final_url: patch the per-page
    policy gate at plugins.web.firecrawl.provider.check_website_access
    (where it now runs) AND on web_tools (where the seed-URL gate
    still runs). Patch firecrawl_provider._get_firecrawl_client for
    the FakeCrawlClient injection. Both checks flow through the same
    fake_check function.
- tests/plugins/web/test_web_search_provider_plugins.py:
  - Update parametrized capability-flag spec: firecrawl supports_crawl
    is now True.
  - Add test_firecrawl_crawl_returns_error_dict_when_unconfigured —
    verifies inspect.iscoroutinefunction(p.crawl) is True and that
    the async crawl returns a per-page error dict (not a raise) when
    FIRECRAWL_API_KEY is missing.

Verified
--------
- 218/218 web tests pass (was 173, +44 plugin tests + 1 new firecrawl
  crawl test from this commit = 218 with the test deduplication).
- Compile-clean (py_compile passes on both files).
- Provider capabilities matrix confirmed end-to-end:
    name        search  extract  crawl   async-extract?  async-crawl?
    firecrawl   True    True     True    True            True
    tavily      True    True     True    False           False
  Both crawl-capable providers exercise the dispatcher's
  inspect.iscoroutinefunction async-or-sync detection.

Net diff
--------
- tools/web_tools.py: -254 lines (legacy inline crawl gone)
- plugins/web/firecrawl/provider.py: +185 lines (crawl method)
- test_website_policy.py: +14/-9 lines (patch locations)
- test_web_search_provider_plugins.py: +22/-1 lines (capability flag
  + new firecrawl crawl test)
- Total: -32 net LoC; tools/web_tools.py is now 1509 lines (was 1763
  before this commit, 2227 before the migration started).
… cleanup

Self-review of the plugin migration surfaced one warning and a handful of
doc/dead-code cleanups. None affect production behaviour through the main
dispatcher (which always calls `tools.web_tools._get_backend()` first and
preserves the full 7-provider walk), but direct callers of
`agent.web_search_registry.get_active_*_provider()` previously diverged
from the legacy order and could return `None` for users with credentials
but no explicit `web.backend` config key.

Changes
-------
1. `_LEGACY_PREFERENCE` was shipped as a 4-tuple
   `("brave-free", "firecrawl", "searxng", "ddgs")` while the PR
   description and the legacy `_get_backend()` candidate order both
   call for the 7-tuple
   `(firecrawl, parallel, tavily, exa, searxng, brave-free, ddgs)`.
   Replaced with the 7-tuple. Verified empirically: with TAVILY+EXA keys
   and no config, `get_active_search_provider()` now returns tavily
   (was None); with EXA+PARALLEL it returns parallel (was None); with
   BRAVE+FIRECRAWL it returns firecrawl (was brave-free).

2. `agent/web_search_registry.py` — module docstring, `_resolve` step-3
   docstring, and inline comment all listed the old 4-tuple and claimed
   "brave-free first because it was the shipped default". The legacy
   default is `"firecrawl"`. Rewritten to match the new ordering and
   reference `tools.web_tools._get_backend()` as the source of truth.

3. `agent/web_search_registry.py` — `get_active_crawl_provider`
   docstring said "only Tavily implements it among built-in providers".
   Firecrawl also advertises `supports_crawl=True` after the previous
   commit. Updated to "Tavily and Firecrawl".

4. `plugins/web/tavily/provider.py` — module docstring said "Tavily is
   the only built-in backend that natively crawls". Updated.

5. `agent/web_search_provider.py` — ABC docstring mentioned only
   `search` / `extract` capabilities. Added `crawl` for accuracy.

6. `plugins/web/{firecrawl,parallel,exa}/provider.py` — dead plugin-level
   cache globals (`_firecrawl_client`, `_parallel_client`,
   `_async_parallel_client`, `_exa_client`) were declared but never read
   (all reads/writes go through `_wt.*` per the `extracting-inline-
   helpers-to-plugins` recipe). Removed the dead declarations; the
   reset-for-tests helpers in firecrawl + parallel now clear the
   canonical `_wt._<name>` slots, matching the pattern exa already used.

Tests
-----
218/218 web-targeted tests still pass (no test changes needed). 4910/4910
in `tests/tools/` still green.
Surfaced by local E2E behavior-parity testing of PR vs origin/main: the
plugin-migrated dispatchers were quietly changing the error envelope
shape returned to function-calling models on unconfigured systems.

Two findings, both from per-result error wrapping bleeding into the
pre-flight configuration error path:

1. **search**: ``firecrawl.search()`` caught the
   ``ValueError("Web tools are not configured...")`` from
   ``_get_firecrawl_client()`` and returned it as
   ``{"success": False, "error": ...}``, losing the legacy
   ``{"error": "Error searching web: ..."}`` envelope that
   ``tool_error()`` emits on main. Models that special-case the
   ``error`` key still detect the failure, but the prefix is part of
   the legacy contract some users rely on.

2. **crawl**: ``firecrawl.crawl()`` caught the same pre-flight
   ``ValueError`` and wrapped it as a per-page error inside
   ``results[0]``. Main short-circuits on ``check_firecrawl_api_key()``
   BEFORE dispatching, so its unconfigured response is
   ``{"success": False, "error": "web_crawl requires Firecrawl..."}``
   at the top level. The PR's per-page burying hid the failure inside
   ``results[]`` where models that check ``result.get("error")`` would
   miss it.

Fix:
- ``plugins/web/firecrawl/provider.py``: pull
  ``_get_firecrawl_client()`` outside the broad ``try`` in
  ``search()``. Pre-flight ``ValueError`` / ``ImportError`` propagate
  to the dispatcher's top-level exception handler. In-flight SDK
  errors still get wrapped as ``{"success": False, ...}``.
- ``tools/web_tools.py``: mirror main's upstream availability gate in
  ``web_crawl_tool``. When the resolved crawl provider is
  ``is_available()==False``, short-circuit BEFORE dispatching with the
  same top-level error shape main emits.
- ``tests/tools/test_web_providers.py``: 2 regression tests
  (``TestUnconfiguredErrorEnvelopeParity``) lock in the behavior so
  future plugin work can't undo this.

Verified via local subprocess-based parity test (14/14 scenarios match
origin/main shape exactly) and full 210/210 web test suite green.
The cherry-picked tests from NousResearch#6173 set HERMES_HOME outside Path.home()/.hermes,
which forces get_default_hermes_root() down its Docker branch and returns
HERMES_HOME directly — so _get_default_hermes_home() never resolves to the
~/.hermes directory the tests were trying to assert about.

Rewire both tests to use the real profile layout (HERMES_HOME pointing at
~/.hermes/profiles/<name>) so _get_default_hermes_home() resolves back to
~/.hermes and the default-profile fallback is actually exercised.
Tighten _is_png_file() to read just the 8-byte PNG magic via path.open()
+ read(8), instead of slurping the entire image into memory only to check
the prefix.
…ousResearch#25449)

* feat(goals): /subgoal — user-added criteria appended to active /goal

Layers a /subgoal command on top of the existing freeform Ralph judge
loop. The user can append extra criteria mid-loop; the judge factors
them into its done/continue verdict and the continuation prompt
surfaces them to the agent. No new tool, no agent self-judging — the
existing judge model just sees a richer prompt.

Forms:
  /subgoal                  show current subgoals
  /subgoal <text>           append a criterion
  /subgoal remove <n>       drop subgoal n (1-based)
  /subgoal clear            wipe all subgoals

How it integrates:

- GoalState gains `subgoals: List[str]` (default []), backwards-compat
  for existing state_meta rows.
- judge_goal accepts an optional subgoals kwarg; non-empty switches to
  JUDGE_USER_PROMPT_WITH_SUBGOALS_TEMPLATE which lists them as
  numbered criteria and asks 'is the goal AND every additional
  criterion satisfied?'
- next_continuation_prompt picks CONTINUATION_PROMPT_WITH_SUBGOALS_TEMPLATE
  when non-empty so the agent sees what to target.
- /subgoal is allowed mid-run on the gateway since it only touches the
  state the judge reads at turn boundary — no race with the running
  turn.
- Status line shows '... , N subgoals' when present.

Surface:
- hermes_cli/goals.py — field, prompt blocks, manager methods, judge weave
- hermes_cli/commands.py — /subgoal CommandDef
- cli.py — _handle_subgoal_command
- gateway/run.py — _handle_subgoal_command + mid-run dispatch
- tests/hermes_cli/test_goals.py — 15 new tests (backcompat, mutation,
  persistence, prompt template selection, judge-prompt content via mock,
  status-line rendering)

77 goal-related tests passing across goals + cli + gateway + tui.

* fix(goals): slash commands don't preempt the goal-continuation hook

Two findings from live-testing /subgoal:

1. Slash commands queued while the agent is running landed in
   _pending_input (same queue as real user messages). The goal hook's
   'is a real user message pending?' check returned True and silently
   skipped — but the slash command consumes its queue slot via
   process_command() which never re-fires the goal hook, so the loop
   stalls indefinitely. Now the hook peeks the queue and only defers
   when a non-slash payload is present.

2. The with-subgoals judge prompt was too soft — opus 4.7 said 'done,
   implying all requirements met' without verifying. Tightened to
   demand specific per-criterion evidence (file contents, output line,
   command result) and explicitly reject phrases like 'implying it was
   done.'

Live verified: /subgoal injected mid-loop now correctly forces the
judge to refuse done until the new criterion is met. Agent gets the
continuation prompt with subgoals listed, updates the script, judge
confirms done with specific evidence cited.
The salvaged regression test called skin.get_spinner_list() which
doesn't exist on SkinConfig. Replace with direct dict access on
skin.spinner — same intent (verify default empty spinner is preserved
when user override is invalid).
Previously ACP dangerous-command approvals mixed an invalid ACP
payload shape with partial Hermes option mapping, and the callback
plumbing was shared across worker threads. This commit uses ACP
tool-call updates, preserves Hermes once/session/always semantics,
and scopes approval callbacks to the current worker thread.

- Build permission requests with `update_tool_call` and unique
  `perm-check-*` ids in `acp_adapter/permissions.py`
- Keep ACP option mapping explicit and fail closed on unknown outcomes
  or request failures
- Set approval callbacks inside the ACP executor worker and read them
  from thread-local state in `tools/terminal_tool.py`
- Replace duplicated ACP bridge coverage with focused tests in
  `tests/acp/test_permissions.py` and add a thread-local callback test
Maps mr@shu.io to the mrshu GitHub handle so the release script
attributes the salvaged ACP approval bridging commit correctly.
Pre-stages AUTHOR_MAP for 12 new contributors whose PRs are being salvaged
in the upcoming batch:

- 1RB        (NousResearch#25462)
- ayushere   (NousResearch#25342)
- domtriola  (NousResearch#25424)
- ephron-ren (NousResearch#25358)
- freqyfreqy (NousResearch#25423)
- fu576      (NousResearch#25369)
- kfa-ai     (NousResearch#25398)
- magic524   (NousResearch#25361)
- PaTTeeL    (NousResearch#25359)
- pearjelly  (NousResearch#25388)
- raymaylee  (NousResearch#25394)
- Tianyu199509 (NousResearch#25421)
The word "worktree" (a git subcommand feature for parallel checkouts)
was used interchangeably with "repository" in the LSP docs, causing
confusion. LSP only requires a git-initialized directory, not an actual
worktree.

Fixes two instances: section "When LSP runs" and the troubleshooting
"Editing a file outside any git repo" heading.
Xiaomi MiMo emits reasoning via OpenAI's reasoning_content field and
requires reasoning_content on every assistant tool-call message when
replaying history. Without echo-back, subsequent API calls fail with
HTTP 400 — same shape as DeepSeek and Kimi/Moonshot thinking modes.

Adds _needs_mimo_tool_reasoning() detection (provider == 'xiaomi',
'mimo' in model, or xiaomimimo.com base url) and wires it into the
_needs_thinking_reasoning_pad() check.

Salvage of NousResearch#25358 by @ephron-ren (manually re-applied — original branch
was severely stale against current main).
Discord introduced message_snapshots for forwarded messages — text and
attachments live inside snap.content / snap.attachments rather than on
the parent message. _handle_message wasn't reading them, so forwards
showed up empty.

Defensively extracts snapshot text (when raw_content is empty) and
appends snapshot attachments to the working all_attachments list used
for type detection and media routing. hasattr/getattr guards keep this
safe on older discord.py installs without the field.

Salvage of NousResearch#25462 by @1RB (manually re-applied — original branch was
stale against current main).
When the auxiliary client fallback chain reaches a provider that has no
credentials configured (no API key, no pool entry), the current code
just returns (None, None) which counts toward the per-call timeout
budget on the next attempt. Mark the provider unhealthy with a short
TTL so the chain advances quickly to the next viable option.

Closes NousResearch#25384.

Salvage of NousResearch#25395 by @AllynSheep.
- _read_process_cmdline: /proc and 'ps' are unavailable on Windows,
  so process cmdline was always empty. Add psutil fallback (already
  a hard dependency used by _pid_exists in the same module).

- _record_looks_like_gateway: argv paths use backslashes on Windows
  but patterns use forward slashes/dots, so the fallback record check
  always failed. Normalize backslashes to forward slashes before
  matching.

Together these caused get_running_pid() to return None on Windows
even when the gateway process is alive, making the dashboard report
gateway as 'stopped' despite it functioning normally.
teknium1 and others added 20 commits May 16, 2026 01:55
…arch#355) (NousResearch#26729)

* feat(skills): add osint-investigation optional skill (closes NousResearch#355)

Phase-1 public-records OSINT investigation framework adapted from
ShinMegamiBoson/OpenPlanter (MIT). Lives in optional-skills/research/.

Six data-source wiki entries (FEC, SEC EDGAR, USAspending, Senate LD,
OFAC SDN, ICIJ Offshore Leaks), each following the 9-section template:
summary, access, schema, coverage, cross-reference keys, data quality,
acquisition, legal, references.

Six stdlib-only acquisition scripts that emit normalized CSV, plus three
analysis scripts:

  - entity_resolution.py  — three-tier match (exact / fuzzy / token overlap)
                            with explicit confidence per row
  - timing_analysis.py    — permutation test for donation/contract timing
                            correlation, joins through cross-links
  - build_findings.py     — assembles structured findings.json with
                            evidence chains pointing back to source rows

Validation: full pipeline runs end-to-end on synthetic fixtures. Entity
resolution found 24 cross-matches with 0 false positives on a 5-row /
4-row test set. Timing analysis on 5 donations clustered near 3 awards
returned p=0.000, effect size 2.41 SD. Findings JSON correctly tags
HIGH-severity timing pattern. All 9 scripts pass --help and py_compile.

Docs site page auto-generated by website/scripts/generate-skill-docs.py;
sidebar + catalog entries updated by the same generator.

* fix(osint-investigation): live API fixes from end-to-end sweep

Live-tested the skill on a real public-citizen query and found three bugs
the synthetic E2E missed. All three are now fixed and re-verified.

1. FEC fetch hung on contributor name searches.
   The combination of two_year_transaction_period + sort=date +
   contributor_name puts the OpenFEC query plan on a slow path that the
   upstream gateway times out (25s+). Switched to min_date/max_date with no
   explicit sort. Renamed --candidate to --contributor (the original name
   was misleading: FEC searches by donor, not by candidate; --candidate is
   kept as a deprecated alias). Added --state filter for narrowing.

2. ICIJ Offshore Leaks reconcile endpoint returns 404.
   ICIJ removed the Open Refine reconciliation API. Rewrote
   fetch_icij_offshore.py to download the official bulk CSV ZIP (~70 MB,
   public, no auth) and search it locally. Cached under
   $HERMES_OSINT_CACHE/icij/ (default ~/.cache/hermes-osint/icij/) for
   30 days, --force-refresh to refetch. Verified live: 'PUTIN' query
   returns 5 Panama Papers officer matches in 0.5s after first download.

3. SEC EDGAR silently returned 0 when the company-name resolver matched
   an individual Form 3/4/5 filer (insider trading disclosures).
   Now surfaces 'Resolved company X → CIK Y (Z)' on stderr, prints a
   filing-type histogram when the type filter wipes results, and
   explicitly warns when the matched CIK appears to be an individual
   filer rather than a corporate registrant.

Bonus: _http.py was retrying 429 responses with exponential backoff plus
honoring (often-missing) Retry-After headers, which compounded into
multi-second hangs per page when the upstream key was over quota.
Changed to fail-fast on 429 with a clear, actionable error showing the
upstream's quota message. Verified: 0.3s fast-fail vs the previous 60s
hang on DEMO_KEY rate-limit exhaustion.

Updated SKILL.md, fec.md, and icij-offshore.md to match the new CLI
flags and ICIJ bulk-cache flow. Regenerated the docusaurus page via
website/scripts/generate-skill-docs.py.

Live sweep results across all 6 sources for 'Dillon Rolnick, New York':
- OFAC SDN: 0 matches ✓ (correctly not sanctioned)
- USAspending: 0 matches ✓ (correctly not a federal contractor)
- Senate LDA: 0 matches ✓ (correctly not a lobbying client)
- SEC EDGAR: warns it resolved to 'Rolnick Michael' (CIK 0001845264)
    who is an individual Form 3 filer, not a corporate registrant
- ICIJ: 0 matches ✓ (correctly not in any offshore leak)
- FEC: rate-limited (DEMO_KEY); fails fast with clear quota message

* feat(osint-investigation): expand to 12 sources covering identity, property, courts, archives, news

Phase-2 expansion per Teknium feedback that the original 6-source skill
(federal financial/regulatory only) wasn't a complete OSINT toolkit. Adds
6 more sources covering the major omissions a real investigation would
reach for first.

New sources (6 fetch scripts + 6 wiki entries):

1. NYC ACRIS — Real property records (deeds, mortgages, liens) via the
   city's Socrata API. Search by party name or property address. Joins
   Parties to Master to populate doc_type, dates, borough, and amount.
   Coverage: 5 NYC boroughs, ~70M party records, 1966-present.

2. OpenCorporates — Global corporate registry covering 130+ jurisdictions
   (~200M companies). Free API token at
   https://opencorporates.com/api_accounts/new raises the rate limit;
   HTML fallback works without one (limited fields).

3. CourtListener (Free Law Project) — federal + state court opinions
   (~10M back to colonial era) + PACER dockets via RECAP. Anonymous v4
   search works; COURTLISTENER_TOKEN raises rate limits.

4. Wayback Machine CDX — historical web captures (~900B+). Used both for
   surveillance-of-record (when did this site change?) and as a
   content-recovery layer when other sources point to dead URLs.

5. Wikipedia + Wikidata — narrative bio + structured facts. Wikipedia
   OpenSearch for article matching, REST summary for extracts, Wikidata
   Action API (wbgetentities) for claims. Avoids the SPARQL Query
   Service which is aggressively rate-limited.

6. GDELT 2.0 DOC API — global news monitoring in 100+ languages,
   ~2015-present. Auto-retries with 6s backoff on the standard
   1-req-per-5-sec throttle.

Other changes in this commit:

- SEC EDGAR no longer raises SystemExit when the company-name resolver
  finds no CIK; writes an empty CSV with header so the rest of a
  pipeline can keep moving and the warning is just on stderr.

- _http.py User-Agent updated per Wikimedia policy: includes app name,
  version, and a 'set HERMES_OSINT_UA to identify yourself' instruction.

- SKILL.md workflow now groups sources into two clusters (federal
  financial vs identity/property/courts/archives/news) with bash
  examples for each. 'When to use this skill' lists the broader set of
  investigation patterns the expanded sources unlock.

Live sweep results on 'Dillon Rolnick, New York' across all 12 sources:

  ofac           ✓ 0 (correctly clean)
  icij           ✓ 0 (correctly not in any leak)
  usaspending    ✓ 0 (correctly not a federal contractor)
  senate_lda     ✓ 0 (correctly not a lobbying client)
  sec_edgar      ✓ 0, warns: resolved to 'Rolnick Michael' (CIK 0001845264),
                   individual Form 3 filer, NOT a corporate registrant
  fec            — rate-limited (DEMO_KEY exhausted), fails fast with
                   clear quota message
  nyc_acris      ✓ 200 records named Rolnick across NYC; 48 records at
                   571 Hudson (the property the web identifies as his)
  opencorporates ✓ 0 (no API token configured; HTML fallback)
  courtlistener  ✓ 0 for 'Dillon Rolnick'; 20 for 'Rolnick' generally;
                   5 for 'Microsoft' sanity check
  wayback        ✓ 30 captures of nousresearch.com from 2011-present
  wikipedia      ✓ 0 (correctly not notable enough); Bill Gates sanity
                   returns full structured facts (occupation, employer,
                   DOB, place of birth, country)
  gdelt          ✓ 0 for 'Dillon Rolnick'; 5 for 'Nous Research'

All 17 scripts compile clean and pass --help. Synthetic analysis pipeline
regression still passes (entity_resolution 30 matches, timing p=0.000,
findings 2).

* feat(osint-investigation): remove FEC; DEMO_KEY rate-limits make it unreliable

The FEC fetcher consistently failed the live sweep because the OpenFEC
DEMO_KEY tier (40 calls/hour) exhausts on a single investigation, and
the upstream returns slow-path query plans for unindexed contributor-name
searches that the gateway times out. Without a real API key it's not
usable; with one the user has to sign up at api.data.gov first. That's
too much setup friction for a skill that should work out of the box.

Removed:
  - scripts/fetch_fec.py
  - references/sources/fec.md

Updated:
  - SKILL.md frontmatter description + tags
  - 'When NOT to use' now points users at https://www.fec.gov/data/ for
    federal donations
  - entity_resolution example switched from donor↔contractor to
    lobbying-client↔contractor (Senate LDA + USAspending pair)
  - timing_analysis example switched to lobbying-filings vs awards
  - 8 wiki entries had their 'FEC ↔ ...' cross-reference bullets removed

11 sources remain (5 federal financial + 6 identity/property/courts/
archives/news). All scripts compile, pass --help, and the synthetic
analysis pipeline still passes on the new lobbying-shaped regression
fixture (30 matches, p=0.000 on tight clustering, 2 findings).
Fixes NousResearch#26693

`hermes doctor` currently promotes invalid direct API keys into the final
summary even when the matching OAuth path is already healthy. That makes
the setup look more broken than it really is.

This change keeps the failed API Connectivity row visible but stops
treating it as a blocking summary issue when a healthy OAuth fallback
already exists for the same provider family.

Covered cases:
- Gemini OAuth + invalid direct Gemini key
- MiniMax OAuth + invalid direct MiniMax key

Based on NousResearch#26704 by @worlldz.
…rs (NousResearch#10648)

Address two blocking issues when using GitHub Copilot integrations:

1. ACP mode: detect the gh-copilot CLI deprecation error from stderr
   and surface an actionable message with alternatives instead of
   hanging or showing a cryptic error.

2. GitHub Models (Azure) 413: recognize models.inference.ai.azure.com
   as a known GitHub Models URL, and print a targeted hint explaining
   the hard 8K token limit that makes this endpoint incompatible with
   Hermes' system prompt size.
…apping

Cover the deprecation pattern matching against real gh-copilot stderr
output, verify the GitHub Models Azure URL is in _URL_TO_PROVIDER, and
confirm _is_github_models_base_url recognises the Azure endpoint.
…ls 413 hint

Follow-up improvements on top of @konsisumer's cherry-picked fix for NousResearch#10648:

1. Deprecation patterns required BOTH a product fingerprint ('gh-copilot') and
   a deprecation marker. The previous list included 'copilot-cli' and bare
   'deprecation', which would false-positive on stderr from the NEW
   @github/copilot CLI — whose repo is literally github.com/github/copilot-cli
   and which legitimately surfaces those substrings in its own messages.

2. Replace the deprecation hint. The user in NousResearch#10648 installed
   'gh extension install github/gh-copilot' (the deprecated extension)
   thinking that's what ACP mode uses, when ACP actually spawns the new
   'copilot' binary from '@github/copilot'. The hint now points users at the
   correct install command ('npm install -g @github/copilot') with the new
   CLI's repo URL, and demotes provider-switching to a fallback alternative.

3. Change _URL_TO_PROVIDER value for models.inference.ai.azure.com from the
   'github-models' alias to the canonical 'copilot' provider id, matching the
   convention used by every other entry in the table.

4. Sharpen the 413 hint message. The free tier's ~8K cap is below the
   system-prompt floor, so this endpoint is fundamentally incompatible with
   an agentic loop — not a 'use a different URL' problem.

Tests:
- New parametrized false-positive coverage for the new CLI's stderr shape.
- Updated assertion to require canonical 'copilot' provider mapping.
- All 14 deprecation/URL tests pass.
…sResearch#4469) (NousResearch#26822)

When the agent is running and the user sends multiple TEXT messages in
rapid succession, base.py's active-session branch stored the pending
event as a single-slot replacement:

    self._pending_messages[session_key] = event

Three rapid messages A, B, C landed as: A (interrupts), B (replaces A
before consumer reads), C (replaces B). Only C reached the next turn —
A and B were silently dropped. This is the symptom in NousResearch#4469.

Route the follow-up through merge_pending_message_event(..., merge_text=True)
so TEXT events accumulate into the existing pending event's text instead
of clobbering it. Photo and media bursts already merged through the same
helper; this just extends the merge_text path (already used by the
Telegram bursty-grace branch in gateway/run.py) to all platforms.

Test exercises BasePlatformAdapter.handle_message directly with the
session marked active and asserts three rapid TEXT events merge to
'part two\\npart three' rather than dropping the middle message.
Sanity-checked the test would fail without the fix.

Credits @devorun for the original investigation and analysis in NousResearch#4491
that surfaced the underlying queue handling, though their fix targeted
GatewayRunner._pending_messages which is now dead state on main.
The PKCE flow reused the code_verifier as the OAuth state parameter.
Per RFC 6749 §10.12 and RFC 7636, these serve different purposes:
state is an anti-CSRF token visible in the authorization URL; the
code_verifier must remain secret for the token exchange.

Generate an independent secrets.token_urlsafe(32) for state and
validate it on callback to provide actual CSRF protection.

Closes NousResearch#10693
Group the secrets import with time and webbrowser at the top of
run_hermes_oauth_login_pure(), matching the existing pattern.
Drop the _secrets alias — no name conflict in this scope.
…tion

Two unit tests for run_hermes_oauth_login_pure():

1. test_authorization_url_state_is_not_pkce_verifier — asserts state in the
   auth URL is independent from the PKCE code_verifier sent in the token
   exchange, and that the verifier never appears in the URL.

2. test_callback_state_mismatch_aborts — asserts the flow returns None
   (no token exchange) when the callback state does not match the value
   we generated.

Negative control verified: reintroducing the b17e5c1 vulnerable pattern
(state = verifier, no callback validation) makes both tests fail.

Also adds AUTHOR_MAP entry for shaun0927 (contributor of the fix).
The Foundation Release — Hermes installs and runs anywhere now.

Highlights:
- Native Windows support (early beta) — PowerShell installer, native subprocess/PTY paths, ~40 follow-up Windows-only fixes
- pip install hermes-agent — PyPI wheel
- Cold-start wave — ~19s off hermes launch, 180x faster browser_console (CDP WS)
- Supply-chain advisory checker + lazy-deps + tiered install fallback
- OpenAI-compatible local proxy for OAuth providers (Claude Pro, ChatGPT Pro, SuperGrok)
- Cross-session 1h Claude prompt cache (Anthropic / OpenRouter / Nous Portal)
- 2 new platforms: LINE + SimpleX Chat (22 total)
- Microsoft Graph foundation — Teams pipeline + webhook adapter
- /handoff actually transfers sessions live
- x_search first-class tool, vision_analyze pixel passthrough
- LSP semantic diagnostics on every write
- Unified video_generate with pluggable backends
- computer_use cua-driver backend
- 9 new optional skills, OpenRouter Pareto Code router, xAI Grok OAuth
- 12 P0 + 50 P1 closures

808 commits · 633 PRs · 1393 files · 165k insertions · 545 issues closed · 215 contributors
…n/load (NousResearch#12285) (NousResearch#26943)

Persisted assistant `reasoning_content` / `reasoning` fields are now emitted
as ACP `agent_thought_chunk` notifications during `_replay_session_history`,
so editor clients (Zed, etc.) rebuild collapsed Thinking panes when the user
re-opens a session that used a thinking model.

Ordering matches live streaming: thought precedes message text within the
same assistant turn, mirroring how `reasoning_callback` deltas arrive before
`stream_delta_callback` deltas in `events.py::make_thinking_cb` /
`make_message_cb`.

Behavior on non-reasoning histories is unchanged; the replay loop's existing
text / tool_call / tool_call_update / plan emission is preserved bit-for-bit.

Closes NousResearch#12285.

Credit:
- @Yukipukii1 (NousResearch#14691) — original thought-replay design via
  `acp.update_agent_thought_text`; the tool-call portion of that PR has
  since landed via NousResearch#19139, but the reasoning replay is theirs.
- @HenkDz (NousResearch#17652 / NousResearch#18578) — established the `_replay_session_history` and
  `_history_*` helper conventions this builds on.
- @D1zzyDwarf (NousResearch#16531) — also closed by this work.
…ousResearch#12285 follow-up) (NousResearch#26957)

Switches `_replay_session_history` from `loop.call_soon`-deferred (after the
`LoadSessionResponse` is written) to `await`-inline (before the response is
constructed) for both `session/load` and `session/resume`. Adds defensive
try/except around the awaited call so a replay helper crash still yields a
successful load response — partial transcripts are acceptable, total
load failure is not.

The deferral was added on May 2 in commit 19854c7 with the rationale "Zed
only attaches streamed transcript/tool updates once the load/resume response
has completed." That justification was incorrect:

- Zed's current ACP integration (zed-industries/zed
  crates/agent_servers/src/acp.rs) explicitly registers the session-update
  routing entry BEFORE awaiting the loadSession RPC, with the comment:
  "so that any session/update notifications that arrive during the call
  (e.g. history replay during session/load) can find the thread."
- Every other reference ACP server (Codex, Claude Code, OpenCode, Pi, agentao)
  replays history BEFORE responding to the load request.
- The ACP spec wording ("Stream the entire conversation history back to the
  client via notifications") and the natural JSON-RPC reading both mean
  "during the request's lifetime", not "after the response resolves".

Empirical reproduction (reported by Biraj on @agentclientprotocol/sdk
v0.21.1): the same custom ACP client works correctly against Codex /
Claude Code / OpenCode / Pi but receives 0 notifications from Hermes
because it measures the per-call notification count at the moment
`loadSession` resolves — which on Hermes was before the `call_soon`-
scheduled replay coroutine had a chance to run.

Changes:
- `acp_adapter/server.py`: remove `_schedule_history_replay`; both
  `load_session` and `resume_session` now `await self._replay_session_history`
  before returning, wrapped in try/except that logs and continues on
  helper exceptions.
- `tests/acp/test_server.py`: replace the single
  `test_load_session_schedules_history_replay_after_response`
  (which encoded the now-incorrect post-response ordering) with two tests
  asserting `events == ["replay", "returned"]` for load and resume.
  Add two regression tests confirming that a replay helper raising still
  yields a `LoadSessionResponse` / `ResumeSessionResponse` rather than
  propagating the exception out as a JSON-RPC error.

Result: 240 ACP tests pass (was 238), ruff clean. Verified end-to-end:
biraj's synchronous notification-counter pattern now sees 6 notifications
during `loadSession` for a 5-message session, matching all other reference
ACP servers.

The `_fenced_text` change in `acp_adapter/tools.py` from the same May 2
commit is orthogonal and intentionally left intact — it's a separate,
still-valid fix for Zed's pipe-as-table rendering.

Refs NousResearch#12285. Follows up NousResearch#26943 (which added thought-chunk replay but kept
the deferral).
…sResearch#27035)

* chore: release v0.14.0 (2026.5.16)

The Foundation Release — Hermes installs and runs anywhere now.

Highlights:
- Native Windows support (early beta) — PowerShell installer, native subprocess/PTY paths, ~40 follow-up Windows-only fixes
- pip install hermes-agent — PyPI wheel
- Cold-start wave — ~19s off hermes launch, 180x faster browser_console (CDP WS)
- Supply-chain advisory checker + lazy-deps + tiered install fallback
- OpenAI-compatible local proxy for OAuth providers (Claude Pro, ChatGPT Pro, SuperGrok)
- Cross-session 1h Claude prompt cache (Anthropic / OpenRouter / Nous Portal)
- 2 new platforms: LINE + SimpleX Chat (22 total)
- Microsoft Graph foundation — Teams pipeline + webhook adapter
- /handoff actually transfers sessions live
- x_search first-class tool, vision_analyze pixel passthrough
- LSP semantic diagnostics on every write
- Unified video_generate with pluggable backends
- computer_use cua-driver backend
- 9 new optional skills, OpenRouter Pareto Code router, xAI Grok OAuth
- 12 P0 + 50 P1 closures

808 commits · 633 PRs · 1393 files · 165k insertions · 545 issues closed · 215 contributors

* docs(release): rewrite v0.14.0 highlights for excitement framing

Demote Windows beta from headline; lead with SuperGrok / OAuth proxy /
x_search / Microsoft Teams. Frame lazy-deps as a debloating wave that
makes installs dramatically lighter. Add highlights for clickable URLs
in any terminal, dangerous-command detection bypasses, ChatGPT Pro
and SuperGrok via the local proxy. Tighten the summary paragraph.
…ol docs (NousResearch#27050)

The video_gen toolset and its video_generate tool shipped without
user-facing reference docs. toolsets-reference.md and the dev-guide
plugin page were already in, but reference/tools-reference.md had no
video_gen section at all and user-guide/features/tools.md's Media row
didn't list video_generate.

- reference/tools-reference.md: add a video_gen section after video,
  including backend list (xAI Grok-Imagine, FAL.ai Veo/Pixverse/Kling),
  unified text-to-video / image-to-video surface note, link to the
  dev-guide plugin page, and the video_generate tool row. Add
  video_generate to the standalone-tools quick-counts line.
- user-guide/features/tools.md: extend Media row with video_generate
  and video_analyze plus an opt-in caveat.
…fo (NousResearch#27051)

Port from nanocoai/nanoclaw#1962: modern Signal V2-only groups surface on
dataMessage.groupV2.id, not groupInfo.groupId. signal-cli versions differ
in which field they expose for V2 groups — some forward the underlying
libsignal envelope verbatim (groupV2), others normalize everything into
groupInfo. Without a groupV2 read, V2-only groups appear as DMs because
groupInfo is undefined and the adapter misroutes them to the sender's
DM session.

Reads groupV2.id first, falls back to groupInfo.groupId. Also hardens
chat_name extraction against non-dict groupInfo payloads (crashed with
AttributeError under malformed envelopes).

6 new tests cover V2 routing, V1 legacy compatibility, V2-preferred
precedence, no-group DM path, allowlist enforcement, and malformed
payloads.
…xt (NousResearch#27053)

Each highlight now gets 2-3 sentences explaining the user-facing value,
not just the technical change. Targeted at someone discovering Hermes
for the first time who isn't deep in the codebase.
…ch#27055)

The `@askjo/camofox-browser` npm package was a top-level entry in
the root `package.json` `dependencies` block, so `hermes update`
ran its postinstall on every user, every update. That postinstall
calls `npx camoufox-js fetch`, which silently downloads a ~300MB
Firefox-fork browser binary from GitHub Releases — multi-minute on
fast connections, and a hard block for users on slow / restricted
networks (notably users in China running through a VPN).

Camofox is an explicit opt-in browser backend. The runtime check
in `tools/browser_tool.py` only routes through Camofox when the
user has set `CAMOFOX_URL` (selected via `hermes tools` →
Browser Automation → Camofox). Users who never opted in never
touched the package at runtime, yet every `hermes update` paid
for the binary fetch anyway.

This change:

* Removes `@askjo/camofox-browser` from root `package.json`
  dependencies (and the regenerated `package-lock.json` drops
  Camofox's entire transitive tree, ~2.6k lines).
* Updates the Camofox `post_setup` handler in
  `hermes_cli/tools_config.py` to install
  `@askjo/camofox-browser@^1.5.2` explicitly when the user
  selects Camofox, and streams npm output (no `--silent`, no
  `capture_output`) so the ~300MB download is visible rather
  than appearing frozen.
* Adds `tests/test_package_json_lazy_deps.py` as a regression
  guard so future PRs can't silently re-add Camofox (or any
  binary-postinstall package) to eager root dependencies.

`agent-browser` stays eager — it is the default Chromium-driving
backend used by every session that does not have a cloud browser
provider configured, and its postinstall is small.

Validation:

| | Before | After |
|---|---|---|
| `hermes update` time on slow network | multi-minute hang at `→ Updating Node.js dependencies...` | seconds (no binary fetch) |
| Camofox opt-in install visibility | silent, looked frozen | streamed npm output |
| Regression guard against re-adding | none | `test_package_json_lazy_deps.py` |

Tests:
- `tests/test_package_json_lazy_deps.py`: 3/3 pass
- `tests/tools/test_browser_camofox*`: 92/92 pass
- `tests/hermes_cli/test_tools_config.py`: 66/66 pass
- `tests/hermes_cli/test_cmd_update.py` + adjacent: green

Reported by lulu (Discord, May 2026) — `hermes update` hangs at
`→ Updating Node.js dependencies...` in China.
Related: NousResearch#18840, NousResearch#18869.
…ible (NousResearch#18840)

`hermes update` ran the repo-root and ui-tui npm installs with both
`--silent` and `subprocess.run(..., capture_output=True)`, which hides
all output from optional postinstall scripts.  The largest of those —
`@askjo/camofox-browser`'s `npx camoufox-js fetch` — downloads a
Firefox-fork browser binary that can take many minutes on slow
connections.  Because nothing was printed during that wait, the updater
appeared to hang at "Updating Node.js dependencies..." and users
Ctrl-C'd, sometimes leaving `node_modules` partially installed.

Drop `--silent` and pass `capture_output=False` for the repo-root and
ui-tui paths so npm streams its `info run …` postinstall lines straight
to the terminal.  Output is still mirrored to `~/.hermes/logs/update.log`
by the existing `_UpdateOutputStream` wrapper, so SSH-disconnect safety
is preserved.

The `web/` install path is untouched — its build step is fast and does
not run binary-fetching postinstalls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause: complete_task() in hermes_cli/kanban_db.py writes
`status='done', result=?` with `result` defaulting to None and no
evidence check. A caller doing `complete_task(conn, id)` with no
result, no summary, and no comment produces a "phantom-done" task —
status=done with zero audit trail of what was delivered. 59 live tasks
are currently in this state.

Guard: a new CompletionEvidenceError + _has_completion_evidence()
helper. Before the main write txn (mirroring the HallucinatedCardsError
created_cards gate), complete_task verifies the task has ANY of: a
non-empty result, a non-empty summary, or >=1 non-empty comment. On
rejection it emits an auditable `completion_blocked_no_evidence` event
in a tiny txn, then raises — the task never mutates to 'done'. The
guard only fires for tasks in a completable state (running/ready/
blocked); terminal/unknown tasks keep the existing "return False"
contract.

Callers surfaced: the `hermes kanban complete` CLI prints an actionable
message and exits non-zero; the dashboard PATCH-to-done endpoint
returns 409 (mirroring its other transition-rejection codes).

Behaviour change — flag for Mac: the `daily` lane's routine no-result
completions WILL start being rejected by this guard. This is intended
— those completions are exactly the phantom-done pathology. Routines
must now pass --result / --summary or leave a comment.

Backfill is a separate Mac-gated step: scripts/backfill_phantom_done.py
triages the 59 existing phantom-done tasks (greps hermes-plan for a
deliverable per task id / title match) and, only under --apply
(default OFF, dry-run), writes the found deliverable into tasks.result.
Do not run --apply without Mac sign-off.

Feeds ADR-013.

Constraint: must not break janitor explicit_complete_comment auto-close
  (a comment is evidence — passes the guard naturally)
Rejected: unconditional guard ignoring task status | would change the
  long-standing "refuse terminal task -> return False" contract
Confidence: high
Scope-risk: broad
Not-tested: live backfill --apply path (Mac-gated, dry-run only here)
Machine: orion-terminal

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 16, 2026

Copy link
Copy Markdown

Important

Review skipped

Too many files!

This PR contains 297 files, which is 147 over the limit of 150.

To get a review, narrow the scope:
• coderabbit review --type committed # exclude uncommitted changes
• coderabbit review --dir # limit to a subdirectory
• coderabbit review --base # compare against a closer base

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 574081c2-0ad6-4fa4-b7ea-9caa1e45ef80

📥 Commits

Reviewing files that changed from the base of the PR and between e7beaaf and 3216fd2.

⛔ Files ignored due to path filters (3)
  • acp_registry/icon.svg is excluded by !**/*.svg
  • docs/hermes-kanban-v1-spec.pdf is excluded by !**/*.pdf
  • gateway/assets/telegram-botfather-threads-settings.jpg is excluded by !**/*.jpg
📒 Files selected for processing (297)
  • .dockerignore
  • .env.example
  • .github/actions/hermes-smoke-test/action.yml
  • .github/actions/nix-setup/action.yml
  • .github/dependabot.yml
  • .github/workflows/deploy-site.yml
  • .github/workflows/docker-publish.yml
  • .github/workflows/history-check.yml
  • .github/workflows/lint.yml
  • .github/workflows/nix-lockfile-check.yml
  • .github/workflows/nix-lockfile-fix.yml
  • .github/workflows/nix.yml
  • .github/workflows/osv-scanner.yml
  • .github/workflows/supply-chain-audit.yml
  • .github/workflows/tests.yml
  • .github/workflows/upload_to_pypi.yml
  • .github/workflows/uv-lockfile-check.yml
  • .gitignore
  • .gitmodules
  • AGENTS.md
  • CONTRIBUTING.md
  • Dockerfile
  • README.md
  • README.zh-CN.md
  • RELEASE_v0.12.0.md
  • RELEASE_v0.13.0.md
  • RELEASE_v0.14.0.md
  • SECURITY.md
  • acp_adapter/auth.py
  • acp_adapter/bootstrap/__init__.py
  • acp_adapter/bootstrap/bootstrap_browser_tools.ps1
  • acp_adapter/bootstrap/bootstrap_browser_tools.sh
  • acp_adapter/entry.py
  • acp_adapter/events.py
  • acp_adapter/permissions.py
  • acp_adapter/server.py
  • acp_adapter/session.py
  • acp_adapter/tools.py
  • acp_registry/agent.json
  • agent/account_usage.py
  • agent/anthropic_adapter.py
  • agent/async_utils.py
  • agent/auxiliary_client.py
  • agent/bedrock_adapter.py
  • agent/codex_responses_adapter.py
  • agent/context_compressor.py
  • agent/context_engine.py
  • agent/copilot_acp_client.py
  • agent/credential_pool.py
  • agent/credential_sources.py
  • agent/curator.py
  • agent/curator_backup.py
  • agent/display.py
  • agent/error_classifier.py
  • agent/gemini_cloudcode_adapter.py
  • agent/gemini_native_adapter.py
  • agent/google_oauth.py
  • agent/i18n.py
  • agent/image_gen_registry.py
  • agent/image_routing.py
  • agent/lsp/__init__.py
  • agent/lsp/cli.py
  • agent/lsp/client.py
  • agent/lsp/eventlog.py
  • agent/lsp/install.py
  • agent/lsp/manager.py
  • agent/lsp/protocol.py
  • agent/lsp/range_shift.py
  • agent/lsp/reporter.py
  • agent/lsp/servers.py
  • agent/lsp/workspace.py
  • agent/manual_compression_feedback.py
  • agent/markdown_tables.py
  • agent/memory_manager.py
  • agent/memory_provider.py
  • agent/model_metadata.py
  • agent/models_dev.py
  • agent/moonshot_schema.py
  • agent/nous_rate_guard.py
  • agent/plugin_llm.py
  • agent/portal_tags.py
  • agent/prompt_builder.py
  • agent/prompt_caching.py
  • agent/redact.py
  • agent/shell_hooks.py
  • agent/skill_commands.py
  • agent/skill_utils.py
  • agent/think_scrubber.py
  • agent/title_generator.py
  • agent/tool_guardrails.py
  • agent/tool_result_classification.py
  • agent/transports/__init__.py
  • agent/transports/chat_completions.py
  • agent/transports/codex.py
  • agent/transports/codex_app_server.py
  • agent/transports/codex_app_server_session.py
  • agent/transports/codex_event_projector.py
  • agent/transports/hermes_tools_mcp_server.py
  • agent/transports/types.py
  • agent/usage_pricing.py
  • agent/video_gen_provider.py
  • agent/video_gen_registry.py
  • agent/web_search_provider.py
  • agent/web_search_registry.py
  • batch_runner.py
  • cli-config.yaml.example
  • cli.py
  • cron/jobs.py
  • cron/scheduler.py
  • docker-compose.yml
  • docker/entrypoint.sh
  • docs/plans/2026-05-02-telegram-dm-user-managed-multisession-topics.md
  • environments/README.md
  • environments/__init__.py
  • environments/agent_loop.py
  • environments/agentic_opd_env.py
  • environments/benchmarks/tblite/README.md
  • environments/benchmarks/tblite/default.yaml
  • environments/benchmarks/tblite/local.yaml
  • environments/benchmarks/tblite/local_vllm.yaml
  • environments/benchmarks/tblite/run_eval.sh
  • environments/benchmarks/tblite/tblite_env.py
  • environments/benchmarks/terminalbench_2/default.yaml
  • environments/benchmarks/terminalbench_2/run_eval.sh
  • environments/benchmarks/terminalbench_2/terminalbench2_env.py
  • environments/benchmarks/yc_bench/README.md
  • environments/benchmarks/yc_bench/__init__.py
  • environments/benchmarks/yc_bench/default.yaml
  • environments/benchmarks/yc_bench/run_eval.sh
  • environments/benchmarks/yc_bench/yc_bench_env.py
  • environments/hermes_base_env.py
  • environments/hermes_swe_env/__init__.py
  • environments/hermes_swe_env/default.yaml
  • environments/hermes_swe_env/hermes_swe_env.py
  • environments/patches.py
  • environments/terminal_test_env/__init__.py
  • environments/terminal_test_env/default.yaml
  • environments/terminal_test_env/terminal_test_env.py
  • environments/tool_call_parsers/__init__.py
  • environments/tool_call_parsers/deepseek_v3_1_parser.py
  • environments/tool_call_parsers/deepseek_v3_parser.py
  • environments/tool_call_parsers/glm45_parser.py
  • environments/tool_call_parsers/glm47_parser.py
  • environments/tool_call_parsers/hermes_parser.py
  • environments/tool_call_parsers/kimi_k2_parser.py
  • environments/tool_call_parsers/llama_parser.py
  • environments/tool_call_parsers/longcat_parser.py
  • environments/tool_call_parsers/mistral_parser.py
  • environments/tool_call_parsers/qwen3_coder_parser.py
  • environments/tool_call_parsers/qwen_parser.py
  • environments/tool_context.py
  • environments/web_research_env.py
  • gateway/__init__.py
  • gateway/config.py
  • gateway/delivery.py
  • gateway/display_config.py
  • gateway/hooks.py
  • gateway/pairing.py
  • gateway/platform_registry.py
  • gateway/platforms/ADDING_A_PLATFORM.md
  • gateway/platforms/__init__.py
  • gateway/platforms/_http_client_limits.py
  • gateway/platforms/api_server.py
  • gateway/platforms/base.py
  • gateway/platforms/bluebubbles.py
  • gateway/platforms/dingtalk.py
  • gateway/platforms/discord.py
  • gateway/platforms/email.py
  • gateway/platforms/feishu.py
  • gateway/platforms/feishu_comment.py
  • gateway/platforms/feishu_comment_rules.py
  • gateway/platforms/helpers.py
  • gateway/platforms/homeassistant.py
  • gateway/platforms/matrix.py
  • gateway/platforms/mattermost.py
  • gateway/platforms/msgraph_webhook.py
  • gateway/platforms/qqbot/__init__.py
  • gateway/platforms/qqbot/adapter.py
  • gateway/platforms/qqbot/chunked_upload.py
  • gateway/platforms/qqbot/keyboards.py
  • gateway/platforms/signal.py
  • gateway/platforms/signal_rate_limit.py
  • gateway/platforms/slack.py
  • gateway/platforms/sms.py
  • gateway/platforms/telegram.py
  • gateway/platforms/telegram_network.py
  • gateway/platforms/webhook.py
  • gateway/platforms/wecom.py
  • gateway/platforms/wecom_callback.py
  • gateway/platforms/weixin.py
  • gateway/platforms/whatsapp.py
  • gateway/platforms/yuanbao.py
  • gateway/platforms/yuanbao_media.py
  • gateway/platforms/yuanbao_proto.py
  • gateway/run.py
  • gateway/session.py
  • gateway/session_context.py
  • gateway/shutdown_forensics.py
  • gateway/slash_access.py
  • gateway/status.py
  • gateway/stream_consumer.py
  • hermes_bootstrap.py
  • hermes_cli/__init__.py
  • hermes_cli/_parser.py
  • hermes_cli/_subprocess_compat.py
  • hermes_cli/auth.py
  • hermes_cli/auth_commands.py
  • hermes_cli/backup.py
  • hermes_cli/banner.py
  • hermes_cli/checkpoints.py
  • hermes_cli/claw.py
  • hermes_cli/clipboard.py
  • hermes_cli/codex_models.py
  • hermes_cli/codex_runtime_plugin_migration.py
  • hermes_cli/codex_runtime_switch.py
  • hermes_cli/commands.py
  • hermes_cli/completion.py
  • hermes_cli/config.py
  • hermes_cli/copilot_auth.py
  • hermes_cli/cron.py
  • hermes_cli/curator.py
  • hermes_cli/curses_ui.py
  • hermes_cli/debug.py
  • hermes_cli/dep_ensure.py
  • hermes_cli/dingtalk_auth.py
  • hermes_cli/doctor.py
  • hermes_cli/dump.py
  • hermes_cli/env_loader.py
  • hermes_cli/fallback_cmd.py
  • hermes_cli/gateway.py
  • hermes_cli/gateway_windows.py
  • hermes_cli/goals.py
  • hermes_cli/hooks.py
  • hermes_cli/inventory.py
  • hermes_cli/kanban.py
  • hermes_cli/kanban_db.py
  • hermes_cli/kanban_diagnostics.py
  • hermes_cli/kanban_specify.py
  • hermes_cli/main.py
  • hermes_cli/mcp_config.py
  • hermes_cli/memory_setup.py
  • hermes_cli/model_catalog.py
  • hermes_cli/model_normalize.py
  • hermes_cli/model_switch.py
  • hermes_cli/models.py
  • hermes_cli/nous_subscription.py
  • hermes_cli/oneshot.py
  • hermes_cli/pairing.py
  • hermes_cli/plugins.py
  • hermes_cli/plugins_cmd.py
  • hermes_cli/profile_distribution.py
  • hermes_cli/profiles.py
  • hermes_cli/providers.py
  • hermes_cli/proxy/__init__.py
  • hermes_cli/proxy/adapters/__init__.py
  • hermes_cli/proxy/adapters/base.py
  • hermes_cli/proxy/adapters/nous_portal.py
  • hermes_cli/proxy/cli.py
  • hermes_cli/proxy/server.py
  • hermes_cli/pt_input_extras.py
  • hermes_cli/pty_bridge.py
  • hermes_cli/relaunch.py
  • hermes_cli/runtime_provider.py
  • hermes_cli/security_advisories.py
  • hermes_cli/setup.py
  • hermes_cli/skills_hub.py
  • hermes_cli/skin_engine.py
  • hermes_cli/slack_cli.py
  • hermes_cli/status.py
  • hermes_cli/stdio.py
  • hermes_cli/tips.py
  • hermes_cli/tools_config.py
  • hermes_cli/uninstall.py
  • hermes_cli/voice.py
  • hermes_cli/web_server.py
  • hermes_cli/webhook.py
  • hermes_constants.py
  • hermes_state.py
  • hermes_time.py
  • locales/af.yaml
  • locales/de.yaml
  • locales/en.yaml
  • locales/es.yaml
  • locales/fr.yaml
  • locales/ga.yaml
  • locales/hu.yaml
  • locales/it.yaml
  • locales/ja.yaml
  • locales/ko.yaml
  • locales/pt.yaml
  • locales/ru.yaml
  • locales/tr.yaml
  • locales/uk.yaml
  • locales/zh-hant.yaml
  • locales/zh.yaml
  • mcp_serve.py
  • model_tools.py

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch orion-cc/completion-evidence-invariant

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Copy link
Copy Markdown

🚨 CRITICAL Supply Chain Risk Detected

This PR contains a pattern that has been used in real supply chain attacks. A maintainer must review the flagged code carefully before merging.

🚨 CRITICAL: Install-hook file added or modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/setup.py
skills/productivity/google-workspace/scripts/setup.py

Scanner only fires on high-signal indicators: .pth files, base64+exec/eval combos, subprocess with encoded commands, or install-hook files. Low-signal warnings were removed intentionally — if you're seeing this comment, the finding is worth inspecting.

@github-actions

github-actions Bot commented May 16, 2026

Copy link
Copy Markdown

🔎 Lint report: orion-cc/completion-evidence-invariant vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8319 on HEAD, 0 on base (🆕 +8319)

🆕 New issues (4349):

Rule Count
unresolved-import 1337
invalid-argument-type 1026
unresolved-attribute 943
invalid-assignment 452
unsupported-operator 140
invalid-parameter-default 117
invalid-method-override 86
not-subscriptable 86
invalid-return-type 38
no-matching-overload 31
call-non-callable 28
unresolved-reference 19
unused-type-ignore-comment 16
invalid-type-form 14
not-iterable 6
+6 more rules
First entries
tests/tools/test_kanban_tools.py:580: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> LiteralString, (key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> str]` cannot be called with key of type `Literal["properties"]` on object of type `str`
tests/gateway/test_pending_drain_race.py:54: [invalid-assignment] invalid-assignment: Object of type `AsyncMock` is not assignable to attribute `_send_with_retry` of type `def _send_with_retry(self, chat_id: str, content: str, reply_to: str | None = None, metadata: Any = None, max_retries: int = 2, base_delay: int | float = ...) -> CoroutineType[Any, Any, SendResult]`
tests/tools/test_delegate.py:1936: [call-non-callable] call-non-callable: Object of type `None` is not callable
tests/hermes_cli/test_kanban_core_functionality.py:4136: [unresolved-attribute] unresolved-attribute: Attribute `last_failure_error` is not defined on `None` in union `Task | None`
tests/hermes_cli/test_config.py:711: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> LiteralString, (key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> str]` cannot be called with key of type `Literal["channel_prompts"]` on object of type `str`
plugins/memory/holographic/__init__.py:390: [unresolved-attribute] unresolved-attribute: Attribute `add_fact` is not defined on `None` in union `None | MemoryStore`
tests/gateway/test_discord_channel_prompts.py:16: [unresolved-attribute] unresolved-attribute: Unresolved attribute `Intents` on type `ModuleType`
tests/tools/test_browser_cleanup.py:43: [invalid-assignment] invalid-assignment: Object of type `bool` is not assignable to attribute `_cleanup_done` of type `Literal[False]`
cli.py:10508: [unsupported-operator] unsupported-operator: Operator `+` is not supported between objects of type `str | list[str] | Unknown` and `Literal["..."]`
run_agent.py:13525: [invalid-argument-type] invalid-argument-type: Argument to function `estimate_usage_cost` is incorrect: Expected `str | None`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
tests/tools/test_website_policy.py:94: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> LiteralString, (key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> str]` cannot be called with key of type `Literal["shared_files"]` on object of type `str`
tests/acp/test_server.py:1535: [unresolved-import] unresolved-import: Cannot resolve imported module `acp.schema`
tests/gateway/test_telegram_thread_fallback.py:897: [invalid-method-override] invalid-method-override: Invalid override of method `send`: Definition is incompatible with `BasePlatformAdapter.send`
hermes_cli/proxy/server.py:160: [unresolved-attribute] unresolved-attribute: Attribute `ClientTimeout` is not defined on `None` in union `Unknown | None`
tests/cli/test_reasoning_command.py:613: [invalid-argument-type] invalid-argument-type: Argument to function `AIAgent._extract_reasoning` is incorrect: Expected `AIAgent`, found `None`
tests/gateway/test_matrix.py:120: [unresolved-attribute] unresolved-attribute: Unresolved attribute `client` on type `ModuleType`
mini_swe_runner.py:228: [unresolved-import] unresolved-import: Cannot resolve imported module `openai`
cli.py:6696: [not-iterable] not-iterable: Object of type `_T@enumerate` is not iterable
agent/auxiliary_client.py:3345: [invalid-type-form] invalid-type-form: Variable of type `_OpenAIProxy` is not allowed in a return type annotation
tests/cron/test_cron_script.py:79: [not-subscriptable] not-subscriptable: Cannot subscript object of type `None` with no `__getitem__` method
tests/hermes_cli/test_security_advisories.py:15: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/providers/test_provider_profiles.py:47: [unresolved-attribute] unresolved-attribute: Attribute `default_headers` is not defined on `None` in union `ProviderProfile | None`
tests/tools/test_browser_console.py:154: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(i: SupportsIndex, /) -> str, (s: slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> list[str]]` cannot be called with key of type `Literal["annotate"]` on object of type `list[str]`
tui_gateway/server.py:1894: [invalid-argument-type] invalid-argument-type: Argument to `AIAgent.__init__` is incorrect: Expected `str`, found `Any | None`
run_agent.py:13523: [invalid-argument-type] invalid-argument-type: Argument to function `estimate_usage_cost` is incorrect: Expected `str`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
... and 4324 more

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

…ine-readable key

The bulk-complete loop fell into the bare `except Exception` for evidence
rejections, returning a generic ok=False with HTTP 200 — indistinguishable
from a not-found or transition-refused failure. Added an explicit
`except kanban_db.CompletionEvidenceError` before the bare handler that
sets `blocked_reason="no_completion_evidence"` on the per-entry result,
matching the 409 behaviour of the single-task path (lines 615-619) and
allowing callers to distinguish evidence rejections programmatically.

Three new tests cover:
- bulk-done with no evidence → ok=False + blocked_reason key present
- bulk-done with evidence → ok=True, task transitions to done
- two no-evidence tasks in one call → both rejected, batch completes (HTTP 200)

Constraint: must not touch the single-task path (already correct)
Confidence: high
Machine: orion-terminal

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

🚨 CRITICAL Supply Chain Risk Detected

This PR contains a pattern that has been used in real supply chain attacks. A maintainer must review the flagged code carefully before merging.

🚨 CRITICAL: Install-hook file added or modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/setup.py
skills/productivity/google-workspace/scripts/setup.py

Scanner only fires on high-signal indicators: .pth files, base64+exec/eval combos, subprocess with encoded commands, or install-hook files. Low-signal warnings were removed intentionally — if you're seeing this comment, the finding is worth inspecting.

@Brecht-H Brecht-H marked this pull request as ready for review May 16, 2026 22:48
@Brecht-H Brecht-H merged commit 76f020e into main May 16, 2026
17 of 21 checks passed
Brecht-H pushed a commit that referenced this pull request May 16, 2026
…leted_at

The kanban janitor is a SECOND phantom-done path. Its
explicit_completion_evidence() completed_at_present branch returned a
CLOSE decision purely because completed_at was set — with NO requirement
of a real result/summary/comment. build_scan() collected it and
apply_closes() raw-UPDATEd status='done', stamping a tautological
placeholder ("task has completed_at but non-terminal status") as the
task's only "evidence". A brief worker claim that stamps completed_at on
a fresh ready/triage/in_progress task got rubber-stamped to done.

Live proof: task t_5dbfc384 carries a janitor_completed event closing it
~13 min after creation on completed_at_present with result_len=0; it had
to be manually reopened and is now correctly in_review with a real
result.

This is the same pathology hermes-agent PR #1 fixed for complete_task()
(CompletionEvidenceError) — but the janitor bypasses complete_task
entirely with a raw UPDATE.

Changes (scripts/kanban_janitor.py):
- Add has_completion_evidence(task, comments): True iff non-empty result
  OR any non-empty comment body. A bare completed_at is NOT evidence.
  Mirrors kanban_db._has_completion_evidence (janitor has its own
  dataclasses so it needs a local copy of the predicate).
- explicit_completion_evidence(): the completed_at_present close decision
  is now gated on has_completion_evidence; evidence-free completed_at
  tasks fall through (return None -> not closed).
- build_scan(): new phantom_completed_at anomaly category (capped 50)
  for non-terminal, non-running tasks with completed_at but no evidence.
- render_markdown(): new "Phantom completed_at" triage section so
  operators see the anomaly instead of it being silently rubber-stamped.
- apply_closes() unchanged — once evidence-free closes stop being
  emitted, it naturally never closes them.

The janitor was previously an unversioned loose script at
~/.hermes/scripts/kanban_janitor.py; this commit brings it under version
control. Operator deploy step (Mac-gated): after merge, sync
~/.hermes/scripts/kanban_janitor.py from the repo copy.

Confidence: high
Scope-risk: moderate
Machine: orion-terminal
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Brecht-H added a commit that referenced this pull request May 16, 2026
…leted_at (#2)

The kanban janitor is a SECOND phantom-done path. Its
explicit_completion_evidence() completed_at_present branch returned a
CLOSE decision purely because completed_at was set — with NO requirement
of a real result/summary/comment. build_scan() collected it and
apply_closes() raw-UPDATEd status='done', stamping a tautological
placeholder ("task has completed_at but non-terminal status") as the
task's only "evidence". A brief worker claim that stamps completed_at on
a fresh ready/triage/in_progress task got rubber-stamped to done.

Live proof: task t_5dbfc384 carries a janitor_completed event closing it
~13 min after creation on completed_at_present with result_len=0; it had
to be manually reopened and is now correctly in_review with a real
result.

This is the same pathology hermes-agent PR #1 fixed for complete_task()
(CompletionEvidenceError) — but the janitor bypasses complete_task
entirely with a raw UPDATE.

Changes (scripts/kanban_janitor.py):
- Add has_completion_evidence(task, comments): True iff non-empty result
  OR any non-empty comment body. A bare completed_at is NOT evidence.
  Mirrors kanban_db._has_completion_evidence (janitor has its own
  dataclasses so it needs a local copy of the predicate).
- explicit_completion_evidence(): the completed_at_present close decision
  is now gated on has_completion_evidence; evidence-free completed_at
  tasks fall through (return None -> not closed).
- build_scan(): new phantom_completed_at anomaly category (capped 50)
  for non-terminal, non-running tasks with completed_at but no evidence.
- render_markdown(): new "Phantom completed_at" triage section so
  operators see the anomaly instead of it being silently rubber-stamped.
- apply_closes() unchanged — once evidence-free closes stop being
  emitted, it naturally never closes them.

The janitor was previously an unversioned loose script at
~/.hermes/scripts/kanban_janitor.py; this commit brings it under version
control. Operator deploy step (Mac-gated): after merge, sync
~/.hermes/scripts/kanban_janitor.py from the repo copy.

Confidence: high
Scope-risk: moderate
Machine: orion-terminal

Co-authored-by: Hermes (daily) <hermes@orion.local>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.