test: scrub custom base url in shared env fixture#11

Closed
DIZ-admin wants to merge 189 commits into main from test/runtime-env-hermeticity-clean

@DIZ-admin

Owner

## Summary
- add to the shared credential/base-url scrub list in
- prevent live developer shell base-url overrides from leaking into provider/runtime tests

## Validation
- bringing up nodes...

........................................................................ [ 46%]
........................................................................ [ 92%]
........... [100%]
155 passed in 10.19s
- result:

## Risk
- test-only change
- no production runtime or provider logic changed
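
A minimal sketch of the kind of scrub the summary describes, written as a context manager rather than the actual shared fixture. The variable names `EXAMPLE_API_KEY` and `EXAMPLE_BASE_URL` are hypothetical placeholders, not the entries in the real scrub list.

```python
import os
from contextlib import contextmanager

# Hypothetical names; the real scrub list lives in the shared test fixture.
SCRUBBED_ENV_VARS = ("EXAMPLE_API_KEY", "EXAMPLE_BASE_URL")


@contextmanager
def scrubbed_env(names=SCRUBBED_ENV_VARS):
    """Temporarily remove the given variables from os.environ."""
    saved = {name: os.environ.pop(name) for name in names if name in os.environ}
    try:
        yield
    finally:
        # Put back whatever the developer's shell had exported.
        os.environ.update(saved)
```

Wrapping a test body in `with scrubbed_env():` keeps a base-url override exported in the developer's shell from reaching the code under test, and restores it afterwards.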

teknium1 and others added 30 commits May 16, 2026 16:00
…ousResearch#27110)

xAI announced on 2026-05-16 (https://x.ai/news/grok-hermes) that X Premium
subscriptions now work in Hermes Agent. The hint we shipped in PR NousResearch#26644
asserted the opposite ("X Premium+ does NOT include xAI API access — only
standalone SuperGrok subscribers can use this provider"), which would now
misdirect Premium+ users who hit any other 403 (no Grok sub at all, wrong
tier, exhausted quota) into thinking they need to switch subscriptions
when their sub is in fact valid.

Remove _decorate_xai_entitlement_error and its two call sites in
_summarize_api_error. xAI's own body text already says "Manage subscriptions
at https://grok.com/?_s=usage" — surface that verbatim and let xAI's wording
do the diagnosis.

The _is_entitlement_failure guard (which prevents credential-pool refresh
loops on entitlement 403s) and the reasoning-replay gating for xai-oauth
are unrelated and untouched.

Update tests to assert the body still surfaces verbatim and that no
Hermes-side editorializing is appended.
…e running (NousResearch#27175)

Surface live background-task count in the prompt_toolkit status bar so users
can see at a glance that a /background task exists and is running — no need
to ask the agent about it (the agent has no visibility into bg sessions by
design).

- _get_status_bar_snapshot now reports active_background_tasks from len()
  of the live _background_tasks dict (entries are removed in the task
  thread's finally block, so this reflects truly-running tasks)
- Indicator shown only on medium (<76) and wide (>=76) tiers; narrow (<52)
  stays minimal since it's already cramped
- No invalidate plumbing needed: status bar fragments are pulled via lambda
  on every redraw, and the bg thread already calls _app.invalidate() on exit

Refs NousResearch#8568
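
The tier rule above can be sketched roughly as follows. The width thresholds (52 and 76) come from the commit message; the function names and the `bg:N` label format are assumptions made for illustration only.

```python
def status_bar_tier(width: int) -> str:
    """Map a terminal width to the tier names used in the commit message."""
    if width < 52:
        return "narrow"
    if width < 76:
        return "medium"
    return "wide"


def background_fragment(width: int, background_tasks: dict) -> str:
    """Return the indicator text, or an empty string when nothing is shown."""
    count = len(background_tasks)  # live entries only; finished tasks are removed
    if count == 0 or status_bar_tier(width) == "narrow":
        return ""
    return f"bg:{count}"  # hypothetical label format
```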
)

Adds a pure-local recap of recent session activity — turn counts,
tools used, files touched, last user ask, last assistant reply —
appended to the existing /status output. Useful when juggling multiple
sessions and you want a one-glance reminder of where this one left off.

Inspired by Claude Code 2.1.114's /recap, but folded into /status so
we don't add a 6th info command. Pure local computation: no LLM call,
no auxiliary model, no prompt-cache invalidation, instant and free.

Salvage of NousResearch#18587 — kept the shared hermes_cli.session_recap.build_recap
helper and its 13 unit tests, dropped the /recap slash command +
ACTIVE_SESSION_BYPASS_COMMANDS entry + Level-2 bypass since /status
already covers both surfaces.

Tailored to hermes-agent's tool vocabulary: file-editing tools
(patch, write_file, read_file, skill_manage, skill_view) surface
touched paths; tool-call counts highlight which classes of work
drove the session.

Source: https://code.claude.com/docs/en/whats-new/2026-w17
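
A rough sketch of what a pure-local recap could compute. This is not the real `build_recap`: the OpenAI-style message shape, the `path` argument key, and the returned field names are all assumptions. Only the list of file-editing tools is taken from the commit message.

```python
import json
from collections import Counter

# Tool names from the commit message.
FILE_TOOLS = {"patch", "write_file", "read_file", "skill_manage", "skill_view"}


def build_recap_sketch(messages):
    """Summarize a message list without any model call."""
    tool_counts = Counter()
    files_touched = []
    user_turns = 0
    last_user = None
    last_assistant = None
    for msg in messages:
        role = msg.get("role")
        if role == "user":
            user_turns += 1
            last_user = msg.get("content")
        elif role == "assistant":
            if msg.get("content"):
                last_assistant = msg["content"]
            for call in msg.get("tool_calls") or []:
                fn = call.get("function", {})
                name = fn.get("name", "")
                tool_counts[name] += 1
                if name not in FILE_TOOLS:
                    continue
                try:
                    args = json.loads(fn.get("arguments") or "{}")
                except json.JSONDecodeError:
                    args = {}
                # "path" is an assumed argument key.
                path = args.get("path") if isinstance(args, dict) else None
                if path and path not in files_touched:
                    files_touched.append(path)
    return {
        "user_turns": user_turns,
        "tool_counts": dict(tool_counts),
        "files_touched": files_touched,
        "last_user": last_user,
        "last_assistant": last_assistant,
    }
```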
…NousResearch#27184)

xAI's Responses stream emits 'type=error' as the FIRST SSE frame when an
OAuth account is unsubscribed/exhausted or rejects the encrypted-reasoning
replay introduced in the May 2026 SuperGrok rollout. The SDK helper
raises RuntimeError('Expected to have received response.created before
error'), which the caller correctly routes to
_run_codex_create_stream_fallback. The fallback then opens a new stream
that emits the same 'error' frame — but the fallback loop only handled
{response.completed, response.incomplete, response.failed} and silently
continue'd past 'error' events. Result: the loop fell off the end of
the stream and raised the useless 'fallback did not emit a terminal
response' RuntimeError, which the classifier marked retryable=True and
looped 3x before failing with no clue what went wrong.

Now: 'error' frames raise a synthesized _StreamErrorEvent with an OpenAI
SDK-shaped .body so _summarize_api_error, _extract_api_error_context,
_is_entitlement_failure, and classify_api_error all see the real
provider message. Users on unsubscribed accounts now see 'do not have
an active Grok subscription' once, not three RuntimeErrors.

Verified end-to-end: classifier returns reason=auth retryable=False;
entitlement detector matches even with status_code=None; summarizer
returns the full xAI message.

Tests: 4 new in TestCodexFallbackErrorEvent covering xAI subscription
message, dict-shaped events, summarizer integration, and the empty-stream
case (must still raise the original RuntimeError so 'truncated mid-flight'
stays distinguishable from 'provider rejected the call').
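
The fallback-loop behavior described above might look roughly like this. It is a simplified sketch, not the real `_run_codex_create_stream_fallback`: events are assumed to be plain dicts, and `StreamErrorEvent` stands in for the synthesized `_StreamErrorEvent`.

```python
TERMINAL_TYPES = {"response.completed", "response.incomplete", "response.failed"}


class StreamErrorEvent(Exception):
    """Stand-in for the synthesized error; carries an SDK-shaped .body."""

    def __init__(self, message, body):
        super().__init__(message)
        self.body = body
        self.status_code = None  # an SSE error frame has no HTTP status


def consume_fallback_stream(events):
    """Return the terminal event, raising on provider 'error' frames."""
    for event in events:
        event_type = event.get("type")
        if event_type in TERMINAL_TYPES:
            return event
        if event_type == "error":
            message = event.get("message") or "provider returned an error event"
            body = {"error": {"message": message, "code": event.get("code")}}
            raise StreamErrorEvent(message, body)
    # An empty or truncated stream keeps the original, distinct failure.
    raise RuntimeError("fallback did not emit a terminal response")
```

Keeping the two exception types separate is what lets a caller tell "provider rejected the call" apart from "truncated mid-flight".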
… activated

In long-lived interactive sessions, _try_activate_fallback() advances
_fallback_index before attempting client resolution.  When resolution
fails (provider not configured, etc.) the function returns False without
ever setting _fallback_activated=True.  _restore_primary_runtime() then
skips its reset block entirely (guarded by `if not _fallback_activated`),
leaving _fallback_index >= len(_fallback_chain) for all subsequent turns.
The eager-fallback guard at the top of the retry loop checks
`_fallback_index < len(_fallback_chain)`, so the condition fails silently
and no fallback is ever attempted again for that session.

Cron jobs spawn a fresh AIAgent per run and never hit this path, which is
why the same fallback chain works reliably for cron but not interactive.

Fix: reset _fallback_index=0 in the `not _fallback_activated` early-return
branch so every new turn starts with the full chain available.

Fixes NousResearch#20465
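
A toy model of the state machine may make the bug easier to see. The class and method names below are illustrative stand-ins for `_try_activate_fallback` and `_restore_primary_runtime`, and the single-attempt-per-call behavior is an assumption.

```python
class FallbackState:
    def __init__(self, chain):
        self.chain = list(chain)
        self.index = 0
        self.activated = False

    def try_activate(self, is_configured) -> bool:
        """Attempt the next provider in the chain."""
        if self.index >= len(self.chain):
            return False
        provider = self.chain[self.index]
        self.index += 1  # advanced BEFORE resolution, as in the report
        if not is_configured(provider):
            return False  # activated stays False
        self.activated = True
        return True

    def restore_primary(self) -> None:
        """Called at the start of each turn."""
        if not self.activated:
            self.index = 0  # the fix: make the full chain available again
            return
        self.activated = False
        self.index = 0
```

Without the reset in the early-return branch, one failed resolution would leave `index` at the end of the chain for the rest of the session.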
…latform (NousResearch#27188)

Introduces a thin CLI wrapper around the existing send_message_tool so
shell scripts, cron scripts, CI hooks, and monitoring daemons can reuse
the gateway's already-configured platform credentials without
reimplementing each platform's REST client.

  hermes send --to telegram "deploy finished"
  echo "RAM 92%" | hermes send --to telegram:-1001234567890
  hermes send --to discord:#ops --file report.md
  hermes send --to slack:#eng --subject "[CI]" --file build.log
  hermes send --list                  # all targets
  hermes send --list telegram         # filter by platform

Supports all platforms the send_message tool already does (Telegram,
Discord, Slack, Signal, SMS, WhatsApp, Matrix, Feishu, DingTalk, WeCom,
Weixin, Email, etc.), including threaded targets and #channel-name
resolution via the channel directory.

hermes_cli/send_cmd.py delegates to tools.send_message_tool.send_message_tool,
which means there is zero new platform-specific code. The subcommand just:

1. Bridges ~/.hermes/.env and top-level ~/.hermes/config.yaml scalars into
   os.environ (same bootstrap the gateway does at startup) — required so
   TELEGRAM_HOME_CHANNEL and friends are visible to load_gateway_config().
2. Resolves the message body from positional arg, --file, or piped stdin.
3. Calls the shared tool and translates its JSON result to exit codes:
   0 success, 1 delivery failure, 2 usage error.
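
Step 3 could be sketched as below. The exit codes are the ones listed above; the JSON shape (an `error` key marking failure) and the `send` callable are assumptions, not the real tool contract.

```python
import json

EXIT_OK = 0
EXIT_DELIVERY_FAILURE = 1
EXIT_USAGE = 2


def exit_code_for(raw_result: str) -> int:
    """Translate the tool's JSON result string into a process exit code."""
    try:
        result = json.loads(raw_result)
    except json.JSONDecodeError:
        return EXIT_DELIVERY_FAILURE
    # Assumed shape: an "error" key marks a failed delivery.
    if isinstance(result, dict) and result.get("error"):
        return EXIT_DELIVERY_FAILURE
    return EXIT_OK


def send_exit_code(to: str, body: str, send) -> int:
    """`send` is a hypothetical callable wrapping the shared tool."""
    if not to or not body:
        return EXIT_USAGE  # missing --to or missing message body
    return exit_code_for(send(to, body))
```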

No running gateway is required for bot-token platforms (Telegram, Discord,
Slack, Signal, SMS, WhatsApp) — the tool hits each platform's REST API
directly. Plugin platforms that rely on a live adapter connection still
need the gateway running; the error message is forwarded verbatim.

- New guide: website/docs/guides/pipe-script-output.md covering real-world
  patterns (memory watchdogs, CI hooks, cron pipes, long-running task
  completion pings) and the security/gateway notes.
- Cross-links added from automate-with-cron.md ("no LLM? use hermes send")
  and developer-guide/gateway-internals.md (delivery-path section).

tests/hermes_cli/test_send_cmd.py (20 tests, all green):

- Happy paths: positional message, stdin, --file, --file -, --subject,
  --json, --quiet.
- Error paths: missing --to, missing body, file not found, tool returns
  error payload (exit 1), tool skipped-send result (exit 0).
- --list: human output, --json output, platform filter, unknown platform.
- Env loader: bridges config.yaml scalars into env, does not override
  existing env vars, gracefully handles missing files.
- Registrar contract: register_send_subparser() returns a working parser.

Smoke-tested end-to-end against a live Telegram bot before commit.
`_discover_all_plugins()` in plugins_cmd.py did a flat scan of the
bundled and user plugin directories — only direct children with a
plugin.yaml were surfaced. Category directories like `observability/`,
`image_gen/`, `platforms/`, `model-providers/`, `web/`, and `video_gen/`
have no plugin.yaml of their own, so their nested plugins
(`observability/langfuse`, `image_gen/openai`, etc.) never appeared in
`hermes plugins list` or the interactive `hermes plugins` UI — even
though the runtime loader (`PluginManager._scan_directory_level`)
discovers them correctly and they do load at runtime.

This broke the documented promise that bundled plugins appear in
`hermes plugins list` and the interactive UI before being enabled,
and made it look like `observability/langfuse` didn't exist.

Refactor `_discover_all_plugins()` to mirror the loader's recursion
(depth cap = 2, same skip set, user overrides bundled on key collision).
Return the path-derived registry key (e.g. `observability/langfuse`) as
the displayed name, matching what the user passes to
`hermes plugins enable …` / writes under `plugins.enabled` in
config.yaml.

Also clarify the plugins docs: spell out that sub-category plugins
surface by their `<category>/<plugin>` key in `hermes plugins list` /
interactive UI, add an `observability/langfuse` example to the command
reference, and include a nested entry in the interactive-UI mock.
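
For illustration, a depth-2 scan with path-derived keys might look like the sketch below. It simplifies the real `_discover_all_plugins()`: flat plugins are keyed by directory name here rather than by manifest `name:`, and the skip set for bundled plugins is taken from the later test description.

```python
from pathlib import Path

MANIFEST = "plugin.yaml"


def scan_plugins(root: Path, source: str, skip=frozenset()) -> dict:
    """Find plugins at <root>/<plugin> and <root>/<category>/<plugin>."""
    found = {}
    if not root.is_dir():
        return found
    for child in sorted(root.iterdir()):
        if not child.is_dir() or child.name in skip:
            continue
        if (child / MANIFEST).is_file():
            found[child.name] = source  # depth 1: flat plugin
            continue
        # No manifest: treat as a category and look one level down only.
        for nested in sorted(child.iterdir()):
            if nested.is_dir() and (nested / MANIFEST).is_file():
                found[f"{child.name}/{nested.name}"] = source
    return found


def discover_all(bundled_root: Path, user_root: Path) -> dict:
    seen = scan_plugins(bundled_root, "bundled",
                        skip=frozenset({"memory", "context_engine"}))
    # User pass runs second, so a plain dict update gives "user wins".
    seen.update(scan_plugins(user_root, "user"))
    return seen
```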

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The langfuse plugin is hooks-only (no toolsets), so it never appears in
`hermes tools` — that menu iterates `_get_effective_configurable_toolsets()`
(= `CONFIGURABLE_TOOLSETS` + plugin-registered toolsets), and "langfuse"
is in neither. The `TOOL_CATEGORIES["langfuse"]` setup wizard (with its
`post_setup: "langfuse"` hook that pip-installs the SDK and writes
`plugins.enabled`) was reachable only when a toolset key "langfuse" got
enabled, which can't happen — so it's been dead code, and the docs that
promised "Setup (interactive): hermes tools → Langfuse Observability"
were silently broken.

Right home for that wizard is `hermes plugins` (e.g. auto-running a
plugin's post-setup hook on enable), which is a generic plugin-setup
mechanism worth designing properly rather than shoehorning langfuse
back into `hermes tools`. Until that exists, point users at the
working manual flow.

Code:
- Delete `TOOL_CATEGORIES["langfuse"]` (24 lines) — unreachable.
- Delete the `post_setup_key == "langfuse"` branch in `_run_post_setup`
  (29 lines) — only caller was the deleted TOOL_CATEGORIES entry.

Docs / comments (point at the manual flow + interactive `hermes plugins`):
- `plugins/observability/langfuse/README.md`: collapse the two-option
  setup section to the single working flow.
- `plugins/observability/langfuse/plugin.yaml`: update `description`.
- `plugins/observability/langfuse/__init__.py`: update module docstring.
- `hermes_cli/config.py`: update inline comment above the LANGFUSE_*
  env-var allow-list.
- `website/docs/user-guide/features/built-in-plugins.md`: collapse
  "Setup (interactive)" + "Setup (manual)" into one accurate block.
- `website/docs/reference/environment-variables.md`: update the
  cross-reference in the Langfuse env-vars section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ugins

The `if key in seen and source == "bundled": continue` check was
unreachable: bundled is scanned before user, so `key in seen` can never
be true while `source == "bundled"`. The "user overrides bundled"
semantics are preserved automatically by the unconditional
`seen[key] = …` on the user pass.

Replaces the dead guard with a one-line comment explaining the
overwrite semantics, so a future contributor adding a third source
(e.g. project plugins) can see at a glance how ordering interacts with
the dict-overwrite. Matches `PluginManager.discover_and_load`'s
"user wins" rule.

Spotted by Copilot in code review on NousResearch#27161.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a TestDiscoverAllPlugins class covering the six cases the recursive
scan needs to handle:

- flat plugin uses its manifest ``name:`` as the key
- category-namespaced plugin keys off ``<category>/<dirname>`` even when
  the manifest ``name:`` is bare (regression test for the original bug —
  ``plugins/observability/langfuse/`` with ``name: langfuse`` must
  surface as ``observability/langfuse``, not ``langfuse``)
- user-installed plugin overrides bundled on key collision
- depth cap: anything below ``<root>/<category>/<plugin>/`` is ignored
- bundled ``memory/`` and ``context_engine/`` are skipped (they have
  their own loaders), but user plugins under those category names are
  still scanned

Also add an in-source comment next to the key derivation pointing at the
loader's matching line (``PluginManager._parse_manifest`` in
plugins.py:1027-1028), so future renames of one site flag the other.

Both items raised in Copilot review on NousResearch#27161.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ssion (NousResearch#27189)

After context compression, the protected tail messages retain their
original image parts. When those include multi-MB pasted screenshots,
every subsequent API request re-ships the same base-64 blobs forever —
which can push the request past provider body-size limits and wedge the
session even though compression 'succeeded'.

Add _strip_historical_media() to agent/context_compressor.py. After the
summary is built, find the newest user message that carries an image
part and replace image parts in every earlier message with a short
text placeholder ('[Attached image — stripped after compression]').
The newest image-bearing user turn keeps its media so the model can
still analyse what the user just sent.

Handles all three multimodal shapes:
  - OpenAI chat.completions image_url
  - OpenAI Responses API input_image
  - Anthropic native {type: image, source: ...}

Includes 27 unit tests covering the helpers and the end-to-end
compress() integration, plus a manual E2E check confirming a ~4MB
two-image conversation shrinks to ~2MB after compression.
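
A simplified sketch of the stripping rule described above. It is not the real `_strip_historical_media()`: the placeholder wording is abbreviated, and every stripped part is replaced with a `{"type": "text"}` part regardless of the API shape, which is an assumption.

```python
# Part types for the three multimodal shapes named in the commit message.
IMAGE_PART_TYPES = {"image_url", "input_image", "image"}
PLACEHOLDER = "[Attached image stripped after compression]"  # abbreviated wording


def _is_image_part(part) -> bool:
    return isinstance(part, dict) and part.get("type") in IMAGE_PART_TYPES


def _has_image(message) -> bool:
    content = message.get("content")
    return isinstance(content, list) and any(_is_image_part(p) for p in content)


def strip_historical_media(messages):
    """Keep images only from the newest image-bearing user message onward."""
    newest = None
    for i in range(len(messages) - 1, -1, -1):
        if messages[i].get("role") == "user" and _has_image(messages[i]):
            newest = i
            break
    if newest is None:
        return list(messages)
    result = []
    for i, msg in enumerate(messages):
        if i < newest and _has_image(msg):
            parts = [
                {"type": "text", "text": PLACEHOLDER} if _is_image_part(p) else p
                for p in msg["content"]
            ]
            msg = {**msg, "content": parts}  # copy; do not mutate the input
        result.append(msg)
    return result
```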
…nitization.py

Pull the 10 pure sanitization/repair helpers (_sanitize_surrogates,
_sanitize_structure_surrogates, _sanitize_messages_surrogates,
_escape_invalid_chars_in_json_strings, _repair_tool_call_arguments,
_strip_non_ascii, _sanitize_messages_non_ascii, _sanitize_tools_non_ascii,
_strip_images_from_messages, _sanitize_structure_non_ascii) and the
_SURROGATE_RE constant out of run_agent.py into a new module.

These are stateless byte-walking helpers with no AIAgent dependency.

Backward compatibility: run_agent re-exports every name via a single
import block, so existing 'from run_agent import _sanitize_surrogates'
imports in tests and cli.py keep working unchanged. Same pattern the
file already uses for _summarize_user_message_for_log (codex_responses_adapter).

run_agent.py: 16077 -> 15682 lines (-395).
…atch_helpers.py

Pull module-level helpers used by the tool-execution path out of
run_agent.py:

* parallelism gating — _NEVER_PARALLEL_TOOLS, _PARALLEL_SAFE_TOOLS,
  _PATH_SCOPED_TOOLS, _DESTRUCTIVE_PATTERNS, _REDIRECT_OVERWRITE,
  _is_destructive_command, _should_parallelize_tool_batch,
  _extract_parallel_scope_path, _paths_overlap
* multimodal envelopes — _is_multimodal_tool_result,
  _multimodal_text_summary, _append_subdir_hint_to_multimodal
* file-mutation verifier inputs — _extract_file_mutation_targets,
  _extract_error_preview
* trajectory normalization — _trajectory_normalize_msg

All pure functions. run_agent re-exports every name so existing
'from run_agent import _is_multimodal_tool_result' callers in
tests/tools/, tests/run_agent/, and tools/file_state.py keep working.

tests/run_agent/: 1341 passed, 3 skipped.
run_agent.py: 15682 -> 15427 lines (-255).
Three small extractions into focused modules:

* agent/process_bootstrap.py — _OpenAIProxy (lazy openai.OpenAI import),
  _SafeWriter (broken-pipe-resistant stdio wrapper), _install_safe_stdio,
  _get_proxy_from_env, _get_proxy_for_base_url. All process / IO bootstrap.
* agent/iteration_budget.py — IterationBudget class (thread-safe consume/
  refund counter shared by parent agent and subagents).
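
A thread-safe consume/refund counter of the kind described for IterationBudget could be sketched like this. The class name, method names, and `remaining` property are assumptions; the real class may expose a different interface.

```python
import threading


class IterationBudgetSketch:
    def __init__(self, max_total: int):
        self.max_total = max_total
        self._used = 0
        self._lock = threading.Lock()

    def consume(self) -> bool:
        """Take one iteration; return False once the budget is exhausted."""
        with self._lock:
            if self._used >= self.max_total:
                return False
            self._used += 1
            return True

    def refund(self) -> None:
        """Give one iteration back, never dropping below zero used."""
        with self._lock:
            if self._used > 0:
                self._used -= 1

    @property
    def remaining(self) -> int:
        with self._lock:
            return self.max_total - self._used
```

Because a parent agent and its subagents would share one instance, the lock is what keeps concurrent `consume()` calls from overspending the budget.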

run_agent re-exports every name so existing test patches like
patch('run_agent.OpenAI', ...) and 'from run_agent import IterationBudget'
keep working unchanged.  Verified the patch-rebinding contract for OpenAI
explicitly.

tests/run_agent/ + tests/agent/test_gemini_fast_fallback.py:
1347 passed, 3 skipped.
run_agent.py: 15427 -> 15261 lines (-166).
…background_review.py

Move the background-review subsystem (the self-improvement loop — see the
README) out of run_agent.py into a dedicated module.

* summarize_background_review_actions — was the @staticmethod that builds
  the user-facing action summary
* spawn_background_review_thread — builds the thread target + prompt;
  the actual review loop body (forked AIAgent, runtime inheritance,
  tool whitelist, suppression, teardown) lives in _run_review_in_thread
* build_memory_write_metadata — provenance for external memory mirrors

AIAgent keeps thin wrappers for backward compatibility AND because tests
patch run_agent.threading.Thread to assert lifecycle behavior — the
threading.Thread construction stays in AIAgent._spawn_background_review,
the inner work moves out.

tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing failure
(test_auxiliary_client.py::test_custom_endpoint... — confirmed failing
on main before this change). 3 skipped.

run_agent.py: 15272 -> 14972 lines (-300).
…n_compression.py

Move four compression-related methods to a dedicated module:

* check_compression_model_feasibility — startup probe + auto-lowered threshold + hard floor
* replay_compression_warning — re-emit stored warning through gateway status_callback
* compress_context — run compressor, split SQLite session, notify plugins+memory
* try_shrink_image_parts_in_messages — image-too-large recovery via re-encode

AIAgent keeps thin forwarder methods so existing call sites and tests
that patch run_agent.AIAgent methods keep working.

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure as before).

run_agent.py: 15013 -> 14535 lines (-478).
…ompt.py

Four AIAgent methods move into a dedicated module:

* build_system_prompt_parts — three-tier stable/context/volatile dict
* build_system_prompt        — joiner used at session start
* invalidate_system_prompt   — drop cache + reload memory
* format_tools_for_system_message — trajectory-format tool dump

The extracted helpers look up patch-target names (load_soul_md,
build_skills_system_prompt, get_toolset_for_tool, build_environment_hints,
build_context_files_prompt, build_nous_subscription_prompt) through the
run_agent module via _ra() instead of importing them directly.  That
preserves the patch surface tests rely on
(patch('run_agent.load_soul_md', ...) and friends).
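
The reason for the late lookup can be shown with a small self-contained demo. The module `facade_demo` below is a hypothetical stand-in for run_agent, and `_ra()` here is only a guess at the shape of the real helper.

```python
import sys
import types
from unittest.mock import patch

# Stand-in for the run_agent module that tests patch.
facade = types.ModuleType("facade_demo")
facade.load_soul_md = lambda: "real soul"
sys.modules["facade_demo"] = facade


def _ra():
    """Resolve the facade module at call time, not at import time."""
    import facade_demo
    return facade_demo


def build_prompt() -> str:
    # Attribute lookup happens on every call, so patches are honored.
    return "prompt: " + _ra().load_soul_md()


with patch("facade_demo.load_soul_md", return_value="patched soul"):
    patched = build_prompt()
unpatched = build_prompt()
```

Had the extracted code done `from facade_demo import load_soul_md` at import time, it would hold its own reference and the patch on the facade module would not reach it.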

AIAgent keeps thin forwarder methods.

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure as before).

run_agent.py: 14555 -> 14292 lines (-263).
Move the two big tool-dispatch methods out of run_agent.py:

* execute_tool_calls_concurrent — 408-line concurrent path (interrupt
  pre-flight, guardrail+plugin block, callback fan-out, ContextVar-
  preserving ThreadPoolExecutor, periodic heartbeats for the gateway
  inactivity monitor, per-tool result handling with subdir hints +
  guardrail observations + checkpoint, /steer drain)
* execute_tool_calls_sequential — 441-line sequential path (the
  original behavior used for single-tool batches and interactive
  tools)

Both take the parent AIAgent as their first argument; AIAgent keeps
thin forwarders so call sites unchanged. handle_function_call is
routed through _ra() so tests that patch run_agent.handle_function_call
keep working. _set_interrupt likewise.

The AST guard in test_tool_executor_contextvar_propagation.py is
updated to scan both run_agent.py AND agent/tool_executor.py so it
still catches the executor.submit(_run_tool, ...) regression
regardless of which file the body lives in.

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure as before).

run_agent.py: 14309 -> 13461 lines (-848).
Move the five stream-drop diagnostic helpers + the headers tuple:

* STREAM_DIAG_HEADERS — cf-ray, x-openrouter-provider, x-request-id, etc.
* stream_diag_init — fresh per-attempt diagnostic dict
* stream_diag_capture_response — snapshot upstream headers + HTTP status
* flatten_exception_chain — compact Outer(msg) <- Inner(msg) rendering
* log_stream_retry — structured WARNING with provider/bytes/elapsed/ttfb
* emit_stream_drop — user-facing status line + activity touch
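
As an illustration of the `Outer(msg) <- Inner(msg)` rendering, a chain flattener might be written as below. This is a sketch, not the extracted helper; following `__cause__` before `__context__` is an assumption.

```python
def flatten_exception_chain(exc: BaseException) -> str:
    """Render an exception and its causes as 'Outer(msg) <- Inner(msg)'."""
    parts = []
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))  # guard against cyclic chains
        parts.append(f"{type(exc).__name__}({exc})")
        exc = exc.__cause__ or exc.__context__
    return " <- ".join(parts)
```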

AIAgent keeps thin forwarder methods (and exposes the headers tuple as
_STREAM_DIAG_HEADERS for back-compat).  All test patches and call sites
unchanged.

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure).

run_agent.py: 13470 -> 13227 lines (-243).
…mpletion_helpers.py

Six methods move into a new module — bodies live there, AIAgent keeps
thin forwarder methods so call sites and tests are unchanged.

* interruptible_api_call — non-streaming API call with interrupt handling
* build_api_kwargs — assemble OpenAI / Anthropic / Codex / Bedrock request kwargs
* build_assistant_message — normalize assistant message dict (reasoning,
  tool_calls, codex passthrough fields, alibaba glm-4.7 quirk)
* try_activate_fallback — provider fallback chain activation
* handle_max_iterations — controlled stop when iteration budget exhausts
* cleanup_task_resources — per-turn VM + browser teardown (skipped for
  persistent environments)

Names tests patch on run_agent (cleanup_vm, cleanup_browser) are routed
through _ra() so the patch surface is preserved.

Two TestAnthropicInterruptHandler source-introspection tests were
updated to scan agent.chat_completion_helpers.interruptible_api_call
instead of AIAgent._interruptible_api_call — the body lives in the
extracted module now.

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure).

run_agent.py: 13282 -> 12253 lines (-1029).
…chat_completion_helpers.py

Move _interruptible_streaming_api_call out of run_agent.py — the biggest
single method in the file.  Body lives next to interruptible_api_call
in agent/chat_completion_helpers.py so streaming + non-streaming code
share one home.

Nested closures (_call_chat_completions, _call_anthropic, the codex
stream branch) all come along with the body and still capture the
parent function's locals as expected.

AIAgent keeps a thin forwarder method.  is_local_endpoint added to
the import block (used by the stream stale-timeout disable logic).

One source-introspection test in TestAnthropicInterruptHandler is
updated to scan agent.chat_completion_helpers.interruptible_streaming_api_call
instead of AIAgent._interruptible_streaming_api_call.

tests/run_agent/ + tests/agent/: 4312 passed (same pre-existing
test_auxiliary_client failure).

run_agent.py: 12277 -> 11385 lines (-892).
…cated modules

Two new modules:

* agent/codex_runtime.py — three Codex API-mode methods
  - run_codex_app_server_turn (148 LOC) — Codex CLI subprocess driver
  - run_codex_stream (125 LOC) — Codex Responses API stream
  - run_codex_create_stream_fallback (78 LOC) — fallback after Responses
    stream=true initial create failure

* agent/agent_runtime_helpers.py — twelve assorted AIAgent helpers
  totalling ~1,166 LOC: convert_to_trajectory_format, sanitize_tool_call_arguments
  (static), repair_message_sequence, strip_think_blocks,
  recover_with_credential_pool, try_recover_primary_transport,
  drop_thinking_only_and_merge_users (static), restore_primary_runtime,
  extract_reasoning, dump_api_request_debug,
  anthropic_prompt_cache_policy, create_openai_client

AIAgent keeps thin forwarder methods for all 15 (preserving @staticmethod
where needed). Symbols tests patch on run_agent (OpenAI, AIAgent class
attrs) are routed through _ra() to honor the patch contract. The
_TRANSIENT_TRANSPORT_ERRORS frozenset moves with try_recover_primary_transport
and is referenced as a module-level constant in the extracted code.

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure).

run_agent.py: 11391 -> 9887 lines (-1504).
The three big review-prompt strings (_MEMORY_REVIEW_PROMPT,
_SKILL_REVIEW_PROMPT, _COMBINED_REVIEW_PROMPT — 183 lines combined) move
out of the AIAgent class body and into agent/background_review.py where
they're consumed.

AIAgent re-exposes them as class attributes via 'from ... import' inside
the class body — Python binds those names into the class namespace so
existing AIAgent._MEMORY_REVIEW_PROMPT references keep working.
spawn_background_review_thread also falls back to the module-level
constants if an agent doesn't have the attribute (preserves the test
pattern of mocking these on the agent).
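
The class-body import trick is standard Python and can be demonstrated with the standard library. Here `math.pi` stands in for a prompt constant, and `resolve_pi` is a hypothetical analogue of the fallback lookup described above.

```python
import math


class Constants:
    # A from-import inside a class body binds the name in the class namespace.
    from math import pi as _PI


class Bare:
    """An object that does not carry the attribute."""


def resolve_pi(obj):
    # Prefer the attribute on the object; fall back to the module-level value.
    return getattr(obj, "_PI", math.pi)
```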

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure).

run_agent.py: 9986 -> 9800 lines (-186).
…oop.py

The 3,877-line run_conversation body — the agent loop itself — moves out
of run_agent.py into a dedicated module.  AIAgent.run_conversation is
now a thin forwarder that delegates to agent.conversation_loop.run_conversation
with the AIAgent instance as the first argument.
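
In miniature, the forwarder pattern looks like the sketch below. `AgentSketch` and the function body are illustrative only and not taken from the codebase.

```python
def run_conversation(agent, user_message: str) -> str:
    """Extracted body: every former self.X is written as agent.X."""
    agent.turns += 1
    return f"[{agent.name}] turn {agent.turns}: {user_message}"


class AgentSketch:
    def __init__(self, name: str):
        self.name = name
        self.turns = 0

    def run_conversation(self, user_message: str) -> str:
        # Thin forwarder: the public method and its signature stay put.
        return run_conversation(self, user_message)
```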

This is the largest single extraction in the run_agent.py refactor.
The body keeps all 163 self.X references intact (rewritten as agent.X),
all nested closures, all retry/backoff/compression machinery.  Symbols
that tests or callers patch on run_agent (_set_interrupt,
handle_function_call, AIAgent class attrs) are resolved through _ra()
inside the extracted module so the patch surface is preserved.

Five tests doing inspect.getsource(AIAgent.run_conversation) updated to
scan agent.conversation_loop.run_conversation. Two source-introspection
tests (TestMemoryNudgeCounterPersistence, TestMemoryProviderTurnStart)
updated to accept either self.X (legacy) or agent.X (extracted
form) in the matched assertions.

Live E2E verified on three model paths:
  * openai/gpt-5.4 (OpenAI chat completions via OpenRouter)
  * anthropic/claude-sonnet-4.6 (Anthropic Messages via OpenRouter)
  * moonshotai/kimi-k2-thinking (reasoning model, reasoning_content path)
Plus read_file tool execution, terminal tool, web_search.

tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing failure
(test_auxiliary_client::test_custom_endpoint... — same as on main).

run_agent.py: 9800 -> 5944 lines (-3856).
Total reduction since baseline: 16083 -> 5944 (-10139, 63%).
The largest method left on AIAgent (60+ parameters, the entire startup
sequence — credential resolution, provider auto-detection, context
engine bootstrap, memory store hydration, plugin lifecycle hooks)
moves into agent/agent_init.py.

AIAgent.__init__ is now a thin wrapper that calls
agent.agent_init.init_agent(self, ...) with the original full
parameter list preserved.

Module-level run_agent names referenced in the body (_openrouter_prewarm_done,
_qwen_portal_headers, _routermint_headers, _hermes_home, OpenAI,
get_tool_definitions, check_toolset_requirements) are resolved through
_ra() so test patches on those names keep working.  agent_init's logger
warnings are routed via _ra().logger so tests patching run_agent.logger
capture them (TestStringKSuffixContextLengthWarns,
TestCustomProvidersInvalidContextLengthWarns).

Live E2E reconfirmed on three model paths (openai/gpt-5.4,
anthropic/claude-sonnet-4.6, moonshotai/kimi-k2-thinking).

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure).

run_agent.py: 5944 -> 4564 lines (-1380).
Total reduction since baseline: 16083 -> 4564 (-11519, 72%).
…ypes

The Discord adapter silently dropped any attachment whose extension wasn't
in the SUPPORTED_DOCUMENT_TYPES allowlist (PDF, text family, zip, office).
Users uploading .wav / .bin / other unrecognized formats saw nothing in
their conversation — the file got logged as 'Unsupported document type'
and discarded before the agent ever saw it.

Add discord.allow_any_attachment (default false) to bypass the allowlist.
When on:
  - Any file is downloaded, cached under ~/.hermes/cache/documents/, and
    surfaced as a DOCUMENT-typed event with application/octet-stream MIME
  - gateway/run.py already emits a context note with the cached path,
    auto-translated via to_agent_visible_cache_path() for Docker/Modal
    sandboxed terminals
  - File body is NOT inlined — only the path — so binary uploads don't
    blow up the context window
  - Allowlisted text formats (.txt/.md/.log) keep their 100 KiB inline
    behavior unchanged

Also adds discord.max_attachment_bytes (default 32 MiB matches the
historical hardcoded cap; 0 = unlimited) since users opting into arbitrary
types may want to raise the cap. The whole attachment is held in memory
while being cached, so unlimited carries a real memory cost.

Env overrides: DISCORD_ALLOW_ANY_ATTACHMENT, DISCORD_MAX_ATTACHMENT_BYTES.
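
The two settings might interact roughly as in this sketch. The defaults (flag off, 32 MiB cap, 0 meaning unlimited) come from the commit message; the function name and the extension set are illustrative assumptions, not the real SUPPORTED_DOCUMENT_TYPES.

```python
from pathlib import PurePosixPath

DEFAULT_MAX_ATTACHMENT_BYTES = 32 * 1024 * 1024  # 32 MiB
# Illustrative subset only.
ALLOWLISTED_EXTENSIONS = {".pdf", ".txt", ".md", ".log", ".zip"}


def should_accept_attachment(
    filename: str,
    size_bytes: int,
    allow_any_attachment: bool = False,
    max_attachment_bytes: int = DEFAULT_MAX_ATTACHMENT_BYTES,
) -> bool:
    # A cap of 0 means unlimited, so only enforce a non-zero cap.
    if max_attachment_bytes and size_bytes > max_attachment_bytes:
        return False
    if allow_any_attachment:
        return True  # bypass the extension allowlist
    return PurePosixPath(filename).suffix.lower() in ALLOWLISTED_EXTENSIONS
```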

Discord-only by deliberate scope. Telegram has hard 20 MB API limits and
Slack has its own caps — extending the same flag there is a separate
follow-up if/when requested.
teknium1 and others added 23 commits May 17, 2026 11:50
…cceeds

xAI's OAuth implementation at ``auth.x.ai`` validates the PKCE
``code_challenge`` at the **token** endpoint, not just at the
authorize step.  When Hermes sends the standards-compliant token
POST with ``code_verifier`` alone — exactly what RFC 7636 §4.5
prescribes — xAI rejects the exchange with ``code_challenge is
required`` and the user is stuck with no working OAuth login.

The fix:

* Extract the token POST into ``_xai_oauth_exchange_code_for_tokens``
  so the wire format is unit-testable in isolation.
* Send the original ``code_challenge`` and ``code_challenge_method``
  in the form body alongside ``code_verifier``.  Strict RFC-compliant
  servers ignore the extras at the token endpoint, and xAI's
  implementation, which requires them, accepts the exchange.  This is a
  common "defensive echo" workaround for OAuth clients that target a
  server with this quirk.
* Refuse to fire the POST when ``code_verifier`` is empty — leaking
  the authorization code to a server that can't redeem it is worse
  than failing locally with an actionable error.  The new error
  code is ``xai_pkce_verifier_missing`` and the message points at
  this issue for context.
* Surface the HTTP status code prominently in the 4xx error message
  (``xAI token exchange failed (HTTP 400). Response: …``) so users
  and maintainers can tell a 400 (bad request / PKCE problem) from
  a 403 (tier denied, see NousResearch#26847) at a glance instead of parsing
  the JSON body by eye.
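
The wire format described in the bullets above can be sketched as a pure function that builds the form body. This is not the real ``_xai_oauth_exchange_code_for_tokens``: the function names are hypothetical and ``ValueError`` stands in for the project's ``AuthError``. The S256 derivation follows RFC 7636.

```python
import base64
import hashlib


def s256_challenge(code_verifier: str) -> str:
    """BASE64URL(SHA256(verifier)) without padding, per RFC 7636."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def build_token_form(code, code_verifier, redirect_uri, client_id,
                     code_challenge=""):
    """Build the form fields for the token POST."""
    if not code_verifier:
        # Fail locally instead of sending the auth code to the server.
        raise ValueError("xai_pkce_verifier_missing")
    form = {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "code_verifier": code_verifier,
    }
    if code_challenge:
        # Defensive echo for servers that re-check it at the token endpoint.
        form["code_challenge"] = code_challenge
        form["code_challenge_method"] = "S256"
    return form
```

Such a dict would be sent form-encoded (``application/x-www-form-urlencoded``), not as JSON.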

Closes NousResearch#26990
14 focused tests on the extracted helper
``_xai_oauth_exchange_code_for_tokens`` cover:

Core contract:
* ``code_verifier`` is on the wire (RFC 7636 §4.5).
* ``code_challenge`` + ``code_challenge_method=S256`` are echoed
  (the NousResearch#26990 defense-in-depth that makes xAI's token endpoint
  stop rejecting valid exchanges).
* ``grant_type=authorization_code``, ``code``, ``redirect_uri``,
  and ``client_id`` are all locked.
* Content-Type is ``application/x-www-form-urlencoded`` (xAI
  rejects ``application/json`` on this endpoint).
* The supplied ``token_endpoint`` URL is used verbatim — no
  hard-coded constant sneaks in via a future refactor.
* ``timeout_seconds`` is forwarded; floored at 20s.

Sanity guard:
* Empty ``code_verifier`` raises ``xai_pkce_verifier_missing``
  with a link to NousResearch#26990 — and NOTHING is sent.  Leaking the auth
  code to a server that can't redeem it is the wrong failure mode.
* Empty ``code_challenge`` omits only the defensive echo; the
  standards-compliant ``code_verifier`` request still goes out so
  RFC-compliant servers keep working.

Error surfacing:
* Non-200 responses include both ``HTTP <status>`` and the body
  verbatim — disambiguates 400 (PKCE / bad request) from 403
  (tier denied, see NousResearch#26847).
* Transport errors are wrapped as ``AuthError`` with the
  ``xai_token_exchange_failed`` code, so the surrounding
  ``format_auth_error`` UI mapping still fires.
* Non-dict JSON payloads raise ``xai_token_exchange_invalid``.
* 200 happy path returns the parsed payload dict verbatim.
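
The error-surfacing cases above could look roughly like this. It is a sketch under assumptions: ``AuthError`` here is a local stand-in, ``parse_token_response`` is a hypothetical name, and the error code used for non-200 responses is assumed (the text only names it for transport errors).

```python
import json


class AuthError(Exception):
    """Local stand-in for the project's AuthError; the real class may differ."""

    def __init__(self, message, code):
        super().__init__(message)
        self.code = code


def parse_token_response(status_code, body_text):
    """Return the token payload dict or raise AuthError with a readable message."""
    if status_code != 200:
        # Status first, then the body verbatim, so 400 vs 403 is obvious.
        raise AuthError(
            f"xAI token exchange failed (HTTP {status_code}). Response: {body_text}",
            code="xai_token_exchange_failed",  # assumed code for non-200 responses
        )
    try:
        payload = json.loads(body_text)
    except ValueError as exc:
        # Assumption: a non-JSON 200 body is treated like a non-dict payload.
        raise AuthError(
            "xAI token response was not valid JSON",
            code="xai_token_exchange_invalid",
        ) from exc
    if not isinstance(payload, dict):
        raise AuthError(
            "xAI token response was not a JSON object",
            code="xai_token_exchange_invalid",
        )
    return payload
```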

End-to-end wire-format guard:
* A real ``httpx.Client`` with a stub transport captures the bytes
  on the wire and asserts every PKCE field round-trips through
  ``urlencode``.  Catches a future refactor that swaps
  ``data=`` for ``json=`` (which xAI would silently reject).
…ls for xAI compatibility

xAI's /responses endpoint rejects pattern and format JSON Schema keywords
in tool schemas with HTTP 400 'Invalid arguments passed to the model'.
The existing strip_pattern_and_format() only walked OpenAI-format tools
({'function': {'parameters': ...}}), missing Responses-format shapes
({'name': ..., 'parameters': ...}) used by codex_responses API mode.
This shows up most often with MCP-derived tools that carry validation
keywords (e.g. domain pattern regex in firecrawl, format: date-time)
through to the wire.

Extends the walk to handle both shapes. Auto-strip wiring is applied
separately in chat_completion_helpers (post-refactor location).

Closes NousResearch#27197
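
A sketch of a walk that handles both tool shapes, assuming the sanitizer returns cleaned copies rather than mutating its input (the real ``strip_pattern_and_format()`` may behave differently). The helper names are hypothetical.

```python
import copy

_STRIPPED_KEYS = ("pattern", "format")


def _strip_schema(node):
    """Recursively drop `pattern` and `format` keywords from a JSON Schema node."""
    if isinstance(node, list):
        return [_strip_schema(item) for item in node]
    if not isinstance(node, dict):
        return node
    cleaned = {}
    for key, value in node.items():
        if key == "properties" and isinstance(value, dict):
            # Keys here are property names, not schema keywords: keep them all.
            cleaned[key] = {name: _strip_schema(sub) for name, sub in value.items()}
        elif key in _STRIPPED_KEYS:
            continue
        else:
            cleaned[key] = _strip_schema(value)
    return cleaned


def strip_pattern_and_format_sketch(tools):
    """Sanitize OpenAI-format and Responses-format tools alike."""
    out = []
    for tool in tools:
        tool = copy.deepcopy(tool)
        fn = tool.get("function")
        if isinstance(fn, dict) and isinstance(fn.get("parameters"), dict):
            # OpenAI chat-completions shape: {'function': {'parameters': ...}}
            fn["parameters"] = _strip_schema(fn["parameters"])
        elif isinstance(tool.get("parameters"), dict):
            # Responses shape: {'name': ..., 'parameters': ...}
            tool["parameters"] = _strip_schema(tool["parameters"])
        out.append(tool)
    return out
```

The special case for ``properties`` keeps a property that happens to be named ``format`` or ``pattern`` from being dropped along with the validation keywords.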
Port of the run_agent.py changes from NousResearch#27219 to current main: the
_build_api_kwargs body was extracted into
agent/chat_completion_helpers.build_api_kwargs, so wire the xAI
tool-schema sanitization there (provider in {'xai', 'xai-oauth'} or
base_url=api.x.ai). Logs a warning instead of silently swallowing
exceptions, matching the contributor's review-followup fix.

Co-authored-by: zccyman <zccyman@163.com>
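
The gating condition in the message above might be expressed like this. It is a sketch only: ``should_sanitize_for_xai`` is a hypothetical name, and matching the hostname exactly is an assumption (the real check may be a substring test on the base URL).

```python
from urllib.parse import urlparse

_XAI_PROVIDERS = {"xai", "xai-oauth"}


def should_sanitize_for_xai(provider, base_url):
    """True when tool schemas should be stripped before being sent to xAI."""
    if provider in _XAI_PROVIDERS:
        return True
    url = base_url or ""
    # Accept bare hosts such as "api.x.ai" by giving urlparse a netloc marker.
    parsed = urlparse(url if "://" in url else f"//{url}")
    return (parsed.hostname or "") == "api.x.ai"
```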
…esearch#27572)

* feat(kanban): orchestrator-driven auto-decomposition on triage

Closes the core gap in the kanban system: dropping a one-liner into Triage
now decomposes it into a graph of child tasks routed to specialist
profiles by description, matching teknium's original vision ("main
orchestrator splits/creates actual tasks, doles them out to each agent").

The build
---------
- hermes_cli/profiles.py: new `description` + `description_auto` fields
  on ProfileInfo, persisted in <profile_dir>/profile.yaml. Helpers
  read_profile_meta / write_profile_meta. `create_profile` accepts
  optional description.
- hermes_cli/profile_describer.py: new module — auto-generate a 1-2
  sentence description from a profile's skills + model + name via the
  auxiliary LLM (`auxiliary.profile_describer`).
- hermes_cli/main.py: new `hermes profile create --description ...`
  flag; new `hermes profile describe [name] [--text ... | --auto |
  --all --auto]` subcommand.
- hermes_cli/kanban_db.py: new `decompose_triage_task` atomic helper —
  creates N child tasks, links the root as a child of every leaf
  (root waits for the whole graph), flips root `triage -> todo` with
  orchestrator assignee, records an audit comment + `decomposed` event
  in a single write_txn.
- hermes_cli/kanban_decompose.py: new module — calls the auxiliary LLM
  (`auxiliary.kanban_decomposer`) with the profile roster + descriptions
  to produce a JSON task graph, then invokes the DB helper. Rewrites
  unknown assignees to the configured `kanban.default_assignee` (or
  the active default profile) so a task NEVER lands with assignee=None.
  Falls back to specify-style single-task promotion when the LLM
  returns `fanout: false`.
- hermes_cli/kanban.py: new `hermes kanban decompose [task_id | --all]`
  CLI verb.
- hermes_cli/config.py: new DEFAULT_CONFIG keys —
  kanban.orchestrator_profile, kanban.default_assignee,
  kanban.auto_decompose (default True), kanban.auto_decompose_per_tick
  (default 3), auxiliary.kanban_decomposer, auxiliary.profile_describer.
- gateway/run.py: kanban dispatcher watcher now runs auto-decompose
  before each `_tick_once`, capped by `auto_decompose_per_tick` so a
  bulk-load of triage tasks doesn't burst-spend the aux LLM.
- plugins/kanban/dashboard/plugin_api.py: new endpoints —
  GET /profiles (list roster + descriptions),
  PATCH /profiles/<name> (set description, user-authored),
  POST /profiles/<name>/describe-auto (LLM-generate),
  POST /tasks/<id>/decompose (run decomposer),
  GET/PUT /orchestration (orchestrator/default-assignee/auto-decompose
  pickers, with resolved fallbacks echoed back).
- plugins/kanban/dashboard/dist/index.js: new OrchestrationPanel
  collapsible — dropdowns for orchestrator profile and default
  assignee, auto-decompose toggle, per-profile description editor with
  Save and Auto-generate buttons. New ⚗ Decompose button next to
  ✨ Specify on triage-column task drawers.

Behavior
--------
- A task in Triage gets fanned out into a small DAG of child tasks.
  Children with no internal parents flip to `ready` immediately
  (parallel dispatch). Children with sibling parents wait. The root
  stays alive, gated on every child: when the whole graph
  finishes, it promotes to `ready` and the orchestrator profile wakes
  back up to judge completion (the "adds more tasks until done" part
  of the original vision).
- `kanban.orchestrator_profile` unset -> falls back to the default
  profile (whichever `hermes` launches with no -p flag).
- `kanban.default_assignee` unset -> same fallback. Tasks NEVER end
  up unassigned.
- `kanban.auto_decompose=true` (default) runs the decomposer
  automatically on dispatcher ticks; manual `hermes kanban decompose`
  is always available.
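
The fan-out rules above can be sketched in a few lines. This is an illustration, not the code in hermes_cli/kanban_decompose.py: the function name, the input shape, and the `todo` status used for waiting children are assumptions.

```python
def plan_decomposition(children, known_profiles, default_assignee):
    """Assign statuses and assignees to an LLM-produced child task graph.

    `children` is a list of dicts with an "id", an optional "assignee" and an
    optional list of sibling "parents" (hypothetical shape for this sketch).
    Returns (planned_children, leaf_ids); the root would be gated on the leaves.
    """
    ids = {child["id"] for child in children}
    depended_on = set()
    planned = []
    for child in children:
        # Ignore parent references that do not point at a sibling task.
        parents = [p for p in child.get("parents", []) if p in ids]
        depended_on.update(parents)
        assignee = child.get("assignee")
        if assignee not in known_profiles:
            # Unknown or missing assignee: never leave a task unassigned.
            assignee = default_assignee
        planned.append({
            "id": child["id"],
            "assignee": assignee,
            "parents": parents,
            # No sibling parents means the task can be dispatched right away.
            "status": "ready" if not parents else "todo",
        })
    leaves = [c["id"] for c in planned if c["id"] not in depended_on]
    return planned, leaves
```

Gating the root on the leaves is enough to make it wait for the whole graph, since every other child is an ancestor of some leaf.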

Tests
-----
- tests/hermes_cli/test_kanban_decompose_db.py — 7 tests for the
  atomic DB helper (status transitions, dep graph, audit trail,
  validation errors).
- tests/hermes_cli/test_kanban_decompose.py — 6 tests for the
  decomposer module (fanout, no-fanout fallback, unknown-assignee
  rewrite, malformed-JSON resilience, no-aux-client path).
- tests/hermes_cli/test_profile_describer.py — 10 tests for
  profile.yaml r/w + the LLM auto-describer (yaml corrupt tolerance,
  user-vs-auto description protection, --overwrite, fallback parsing).

E2E
---
- CLI end-to-end: created profiles with descriptions, dropped a triage
  task, mocked the aux LLM with a 3-task graph -> verified all three
  children were created with the right assignees, the dependency
  edges matched the LLM's graph, root flipped to todo gated by every
  child, audit comment + `decomposed` event recorded.
- Dashboard end-to-end: started the dashboard against an isolated
  HERMES_HOME, verified four of the new endpoints via curl (profile
  listing, PATCH for description, PUT for orchestration settings,
  POST for decompose). Opened the UI in the browser, confirmed the
  OrchestrationPanel renders with all three pickers + the per-profile
  description editor, typed a description, clicked Save, verified
  ~/.hermes/profile.yaml was written. Clicked Decompose on the triage
  card and confirmed the inline error message surfaced as designed
  ("no auxiliary client configured").

* feat(kanban): surface decompose mode (Auto/Manual) as a one-click pill

The auto/manual toggle already existed as kanban.auto_decompose (default
true), but it was buried inside the collapsed Orchestration settings
panel — users couldn't tell at a glance which mode they were in. This
hoists it to a pill at the top of the kanban page so the state is always
visible and one click flips it.

UX
- New "⚗ Decompose: AUTO|MANUAL" pill in the kanban header. Emerald
  styling when Auto is on (the default), muted/gray when Manual.
- Pill is visible both in the collapsed AND expanded Orchestration
  settings views so context is preserved when the user opens the panel.
- Tooltip explains both states + what clicking does.
- Renamed the in-panel "Auto-decompose on triage / Enabled" checkbox
  to "Decompose mode / Auto (default) | Manual" for language parity
  with the pill.

Behavior preserved
- Default remains Auto (kanban.auto_decompose=true).
- Manual mode restores pre-PR behavior: triage tasks stay in triage
  until the user clicks ⚗ Decompose on each card (or runs
  `hermes kanban decompose <id>`).

Implementation
- plugins/kanban/dashboard/dist/index.js: load /orchestration on mount
  (not just on expand) so the collapsed pill reflects real state.
  Render mode pill in both collapsed and expanded headers. Reuses the
  existing PUT /api/plugins/kanban/orchestration endpoint — no new
  backend, no new tests required.

E2E verified
- Pill renders as "⚗ Decompose: AUTO" on page load (default).
- One click flips to "⚗ Decompose: MANUAL" with muted styling.
- config.yaml on disk shows auto_decompose: false after the flip.
- Second click round-trips back to Auto; config.yaml flips to true.

* feat(kanban): rename mode pill to "Orchestration: Auto/Manual"

Per Teknium feedback — "Decompose" was too implementation-specific.
"Orchestration" is the user-facing concept (the whole pitch is the
orchestrator profile routing work), and the pill is the front door to it.

- Pill text: "Orchestration: Auto" / "Orchestration: Manual" (title case,
  no ⚗ prefix, no SHOUTY-CAPS for the mode value)
- In-panel checkbox label: "Orchestration mode" (was "Decompose mode")
- Tooltips updated to match
- No behavior change

* docs(kanban): document decompose, profile descriptions, orchestration mode

Brings the docs site up to parity with the PR. English build verified
locally (npx docusaurus build --locale en) — clean, no new broken links
or anchors. Pre-existing broken-link warnings (rl-training, llms.txt,
step-by-step-checklist, fallback-model) untouched.

- website/docs/reference/cli-commands.md
    + `hermes kanban decompose` action row in the action table, with
      pointer to the Auto vs Manual orchestration section.

- website/docs/reference/profile-commands.md
    + `--description "<text>"` flag on `hermes profile create`.
    + Full `hermes profile describe` section: read, --text, --auto,
      --overwrite, --all flags with examples.

- website/docs/user-guide/features/kanban.md (the big one)
    + Triage column intro rewritten around the Auto-decompose default
      behavior, with pointer to the new Auto vs Manual section.
    + Status action row updated to mention both ⚗ Decompose and
      ✨ Specify on triage cards.
    + New "Auto vs Manual orchestration" section explaining the two
      modes, how to flip them (pill, config), how routing-by-description
      works, the no-None-assignee guarantee, plus a config knob table
      (auto_decompose, auto_decompose_per_tick, orchestrator_profile,
      default_assignee) and the two new auxiliary slots
      (kanban_decomposer, profile_describer).
    + REST surface table gains 6 new endpoint rows: /tasks/:id/decompose,
      /profiles (GET), /profiles/:name (PATCH), /profiles/:name/describe-auto,
      /orchestration (GET + PUT).

- website/docs/user-guide/features/kanban-tutorial.md
    + Triage column blurb updated for Auto by default + Manual via the
      pill, with cross-link to the Auto vs Manual orchestration section.

- website/docs/user-guide/profiles.md
    + Blank-profile flow now mentions --description and points to the
      kanban routing model for context.

- website/docs/user-guide/configuration.md
    + `kanban_decomposer` and `profile_describer` added to the
      `hermes model -> Configure auxiliary models` menu listing.
@coderabbitai

coderabbitai Bot commented May 17, 2026

Important

Review skipped

Too many files!

This PR contains 295 files, which is 145 over the limit of 150.

To get a review, narrow the scope:
• coderabbit review --type committed # exclude uncommitted changes
• coderabbit review --dir # limit to a subdirectory
• coderabbit review --base # compare against a closer base

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 24d758dc-ca8b-4a05-a13e-1793d7e78634

📥 Commits

Reviewing files that changed from the base of the PR and between fb05f5d and e254c4f.

⛔ Files ignored due to path filters (1)
  • plugins/kanban/dashboard/dist/index.js is excluded by !**/dist/**
📒 Files selected for processing (295)
  • Dockerfile
  • README.md
  • acp_adapter/tools.py
  • acp_registry/agent.json
  • agent/agent_init.py
  • agent/agent_runtime_helpers.py
  • agent/auxiliary_client.py
  • agent/background_review.py
  • agent/bedrock_adapter.py
  • agent/browser_provider.py
  • agent/browser_registry.py
  • agent/chat_completion_helpers.py
  • agent/codex_runtime.py
  • agent/context_compressor.py
  • agent/conversation_compression.py
  • agent/conversation_loop.py
  • agent/credential_pool.py
  • agent/iteration_budget.py
  • agent/lsp/client.py
  • agent/lsp/install.py
  • agent/lsp/manager.py
  • agent/lsp/reporter.py
  • agent/lsp/servers.py
  • agent/message_sanitization.py
  • agent/model_metadata.py
  • agent/process_bootstrap.py
  • agent/shell_hooks.py
  • agent/skill_commands.py
  • agent/stream_diag.py
  • agent/system_prompt.py
  • agent/tool_dispatch_helpers.py
  • agent/tool_executor.py
  • agent/transports/codex_app_server.py
  • agent/transports/codex_app_server_session.py
  • cli.py
  • cron/scheduler.py
  • gateway/platforms/api_server.py
  • gateway/platforms/base.py
  • gateway/platforms/discord.py
  • gateway/platforms/helpers.py
  • gateway/platforms/matrix.py
  • gateway/platforms/slack.py
  • gateway/platforms/sms.py
  • gateway/platforms/telegram.py
  • gateway/platforms/webhook.py
  • gateway/run.py
  • hermes_cli/auth.py
  • hermes_cli/codex_runtime_switch.py
  • hermes_cli/commands.py
  • hermes_cli/config.py
  • hermes_cli/dep_ensure.py
  • hermes_cli/doctor.py
  • hermes_cli/gateway.py
  • hermes_cli/goals.py
  • hermes_cli/kanban.py
  • hermes_cli/kanban_db.py
  • hermes_cli/kanban_decompose.py
  • hermes_cli/main.py
  • hermes_cli/model_switch.py
  • hermes_cli/plugins.py
  • hermes_cli/plugins_cmd.py
  • hermes_cli/profile_describer.py
  • hermes_cli/profiles.py
  • hermes_cli/proxy/cli.py
  • hermes_cli/proxy/server.py
  • hermes_cli/runtime_provider.py
  • hermes_cli/send_cmd.py
  • hermes_cli/session_recap.py
  • hermes_cli/status.py
  • hermes_cli/tools_config.py
  • hermes_cli/web_server.py
  • optional-skills/creative/meme-generation/scripts/generate_meme.py
  • optional-skills/devops/watchers/scripts/watch_rss.py
  • optional-skills/finance/stocks/scripts/stocks_client.py
  • optional-skills/health/fitness-nutrition/scripts/body_calc.py
  • optional-skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py
  • optional-skills/productivity/telephony/scripts/telephony.py
  • optional-skills/research/darwinian-evolver/scripts/show_snapshot.py
  • optional-skills/research/domain-intel/scripts/domain_intel.py
  • optional-skills/research/osint-investigation/scripts/_http.py
  • optional-skills/research/osint-investigation/scripts/fetch_icij_offshore.py
  • plugins/browser/browser_use/__init__.py
  • plugins/browser/browser_use/plugin.yaml
  • plugins/browser/browser_use/provider.py
  • plugins/browser/browserbase/__init__.py
  • plugins/browser/browserbase/plugin.yaml
  • plugins/browser/browserbase/provider.py
  • plugins/browser/firecrawl/__init__.py
  • plugins/browser/firecrawl/plugin.yaml
  • plugins/browser/firecrawl/provider.py
  • plugins/disk-cleanup/__init__.py
  • plugins/google_meet/__init__.py
  • plugins/google_meet/cli.py
  • plugins/google_meet/meet_bot.py
  • plugins/google_meet/node/cli.py
  • plugins/google_meet/realtime/openai_client.py
  • plugins/google_meet/tools.py
  • plugins/kanban/dashboard/plugin_api.py
  • plugins/memory/byterover/__init__.py
  • plugins/memory/hindsight/__init__.py
  • plugins/memory/honcho/__init__.py
  • plugins/memory/honcho/cli.py
  • plugins/memory/honcho/client.py
  • plugins/memory/openviking/__init__.py
  • plugins/memory/supermemory/__init__.py
  • plugins/model-providers/deepseek/__init__.py
  • plugins/model-providers/kimi-coding/__init__.py
  • plugins/observability/langfuse/README.md
  • plugins/observability/langfuse/__init__.py
  • plugins/observability/langfuse/plugin.yaml
  • plugins/platforms/google_chat/adapter.py
  • plugins/platforms/irc/adapter.py
  • plugins/platforms/line/adapter.py
  • plugins/platforms/simplex/adapter.py
  • plugins/platforms/teams/adapter.py
  • plugins/teams_pipeline/cli.py
  • plugins/teams_pipeline/meetings.py
  • plugins/teams_pipeline/models.py
  • plugins/teams_pipeline/runtime.py
  • run_agent.py
  • scripts/check-windows-footguns.py
  • scripts/install.ps1
  • scripts/release.py
  • scripts/setup_open_webui.sh
  • scripts/tests/test-install-ps1-stage-protocol.ps1
  • skills/creative/comfyui/scripts/_common.py
  • skills/creative/comfyui/scripts/extract_schema.py
  • skills/creative/comfyui/scripts/fetch_logs.py
  • skills/creative/comfyui/scripts/hardware_check.py
  • skills/creative/comfyui/scripts/run_workflow.py
  • skills/creative/comfyui/scripts/ws_monitor.py
  • skills/creative/comfyui/tests/test_cloud_integration.py
  • skills/creative/comfyui/tests/test_extract_schema.py
  • skills/productivity/google-workspace/scripts/google_api.py
  • skills/productivity/maps/scripts/maps_client.py
  • skills/productivity/ocr-and-documents/scripts/extract_marker.py
  • skills/productivity/ocr-and-documents/scripts/extract_pymupdf.py
  • skills/research/arxiv/scripts/search_arxiv.py
  • skills/research/polymarket/scripts/polymarket.py
  • tests/acp/test_tools.py
  • tests/agent/lsp/_mock_lsp_server.py
  • tests/agent/lsp/test_install_and_lint_fixes.py
  • tests/agent/test_anthropic_adapter.py
  • tests/agent/test_auxiliary_main_first.py
  • tests/agent/test_compressor_historical_media.py
  • tests/agent/test_context_compressor.py
  • tests/agent/test_deepseek_anthropic_thinking.py
  • tests/agent/test_model_metadata.py
  • tests/agent/test_shell_hooks.py
  • tests/agent/test_skill_commands.py
  • tests/agent/transports/test_codex_app_server_runtime.py
  • tests/agent/transports/test_codex_app_server_session.py
  • tests/cli/test_cli_background_status_indicator.py
  • tests/cli/test_cli_init.py
  • tests/cli/test_reasoning_command.py
  • tests/conftest.py
  • tests/cron/test_cron_no_agent.py
  • tests/gateway/conftest.py
  • tests/gateway/test_allowlist_startup_check.py
  • tests/gateway/test_api_server.py
  • tests/gateway/test_api_server_runs.py
  • tests/gateway/test_background_command.py
  • tests/gateway/test_bluebubbles.py
  • tests/gateway/test_config_cwd_bridge.py
  • tests/gateway/test_discord_document_handling.py
  • tests/gateway/test_discord_system_messages.py
  • tests/gateway/test_google_chat.py
  • tests/gateway/test_matrix.py
  • tests/gateway/test_platform_connected_checkers.py
  • tests/gateway/test_qqbot.py
  • tests/gateway/test_restart_drain.py
  • tests/gateway/test_restart_resume_pending.py
  • tests/gateway/test_session_boundary_hooks.py
  • tests/gateway/test_session_model_override_routing.py
  • tests/gateway/test_teams.py
  • tests/gateway/test_telegram_thread_fallback.py
  • tests/gateway/test_transcript_offset.py
  • tests/gateway/test_update_streaming.py
  • tests/gateway/test_voice_command.py
  • tests/hermes_cli/test_auth_nous_provider.py
  • tests/hermes_cli/test_cmd_update.py
  • tests/hermes_cli/test_codex_runtime_switch.py
  • tests/hermes_cli/test_commands.py
  • tests/hermes_cli/test_doctor.py
  • tests/hermes_cli/test_gateway.py
  • tests/hermes_cli/test_install_cua_driver.py
  • tests/hermes_cli/test_kanban_core_functionality.py
  • tests/hermes_cli/test_kanban_decompose.py
  • tests/hermes_cli/test_kanban_decompose_db.py
  • tests/hermes_cli/test_memory_reset.py
  • tests/hermes_cli/test_models.py
  • tests/hermes_cli/test_opencode_go_in_model_list.py
  • tests/hermes_cli/test_plugins_cmd.py
  • tests/hermes_cli/test_profile_describer.py
  • tests/hermes_cli/test_send_cmd.py
  • tests/hermes_cli/test_session_recap.py
  • tests/hermes_cli/test_status.py
  • tests/hermes_cli/test_tools_config.py
  • tests/hermes_cli/test_tui_resume_flow.py
  • tests/hermes_cli/test_update_stale_dashboard.py
  • tests/hermes_cli/test_web_server.py
  • tests/hermes_cli/test_xai_oauth_pkce_token_exchange.py
  • tests/honcho_plugin/test_session.py
  • tests/plugins/browser/__init__.py
  • tests/plugins/browser/check_parity_vs_main.py
  • tests/plugins/browser/test_browser_provider_plugins.py
  • tests/plugins/model_providers/test_deepseek_profile.py
  • tests/plugins/test_achievements_plugin.py
  • tests/plugins/video_gen/test_xai_plugin.py
  • tests/run_agent/test_anthropic_truncation_continuation.py
  • tests/run_agent/test_background_review.py
  • tests/run_agent/test_codex_xai_oauth_recovery.py
  • tests/run_agent/test_jsondecodeerror_retryable.py
  • tests/run_agent/test_memory_nudge_counter_hydration.py
  • tests/run_agent/test_primary_runtime_restore.py
  • tests/run_agent/test_provider_parity.py
  • tests/run_agent/test_run_agent.py
  • tests/run_agent/test_streaming.py
  • tests/run_agent/test_tool_executor_contextvar_propagation.py
  • tests/skills/test_openclaw_migration.py
  • tests/stress/test_atypical_scenarios.py
  • tests/test_live_system_guard_self_test.py
  • tests/test_timezone.py
  • tests/test_tui_gateway_server.py
  • tests/tools/test_browser_homebrew_paths.py
  • tests/tools/test_code_execution_modes.py
  • tests/tools/test_delegate.py
  • tests/tools/test_discord_tool.py
  • tests/tools/test_dockerfile_pid1_reaping.py
  • tests/tools/test_hidden_dir_filter.py
  • tests/tools/test_managed_browserbase_and_modal.py
  • tests/tools/test_managed_modal_environment.py
  • tests/tools/test_mcp_cancelled_error_propagation.py
  • tests/tools/test_mcp_oauth.py
  • tests/tools/test_mcp_stability.py
  • tests/tools/test_mcp_tool.py
  • tests/tools/test_schema_sanitizer.py
  • tests/tools/test_send_message_tool.py
  • tests/tools/test_singularity_preflight.py
  • tests/tools/test_skill_manager_tool.py
  • tests/tools/test_skills_hub.py
  • tests/tools/test_transcription_dotenv_fallback.py
  • tests/tools/test_voice_cli_integration.py
  • tests/tui_gateway/test_entry_sys_path.py
  • tools/browser_providers/__init__.py
  • tools/browser_providers/base.py
  • tools/browser_tool.py
  • tools/code_execution_tool.py
  • tools/delegate_tool.py
  • tools/environments/local.py
  • tools/lazy_deps.py
  • tools/mcp_oauth.py
  • tools/mcp_tool.py
  • tools/process_registry.py
  • tools/schema_sanitizer.py
  • tools/send_message_tool.py
  • tools/tts_tool.py
  • tools/video_generation_tool.py
  • tools/x_search_tool.py
  • tools/xai_http.py
  • ui-tui/packages/hermes-ink/index.d.ts
  • ui-tui/packages/hermes-ink/src/entry-exports.ts
  • ui-tui/src/__tests__/cursorDriftRegression.test.ts
  • ui-tui/src/__tests__/forceTruecolor.test.ts
  • ui-tui/src/__tests__/text.test.ts
  • ui-tui/src/__tests__/textInputFastEcho.test.ts
  • ui-tui/src/__tests__/textInputWrap.test.ts
  • ui-tui/src/components/messageLine.tsx
  • ui-tui/src/components/textInput.tsx
  • ui-tui/src/lib/forceTruecolor.ts
  • ui-tui/src/lib/inputMetrics.ts
  • ui-tui/src/lib/text.ts
  • web/src/components/ChatSidebar.tsx
  • web/src/lib/gatewayClient.ts
  • web/src/pages/ChatPage.tsx
  • website/docs/developer-guide/gateway-internals.md
  • website/docs/guides/automate-with-cron.md
  • website/docs/guides/pipe-script-output.md
  • website/docs/reference/cli-commands.md
  • website/docs/reference/environment-variables.md
  • website/docs/reference/profile-commands.md
  • website/docs/user-guide/configuration.md
  • website/docs/user-guide/features/built-in-plugins.md
  • website/docs/user-guide/features/codex-app-server-runtime.md
  • website/docs/user-guide/features/cron.md
  • website/docs/user-guide/features/delegation.md
  • website/docs/user-guide/features/kanban-tutorial.md
  • website/docs/user-guide/features/kanban.md
  • website/docs/user-guide/features/plugins.md
  • website/docs/user-guide/features/spotify.md
  • website/docs/user-guide/features/web-dashboard.md
  • website/docs/user-guide/messaging/discord.md
  • website/docs/user-guide/messaging/matrix.md
  • website/docs/user-guide/profiles.md
  • website/docs/user-guide/security.md


@github-actions

🔎 Lint report: test/runtime-env-hermeticity-clean vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8732 on HEAD, 8331 on base (🆕 +401)

🆕 New issues (365):

| Rule | Count |
| --- | --- |
| unresolved-attribute | 258 |
| invalid-argument-type | 37 |
| invalid-assignment | 34 |
| unresolved-import | 14 |
| invalid-parameter-default | 11 |
| invalid-type-form | 3 |
| unused-type-ignore-comment | 3 |
| unresolved-reference | 2 |
| unsupported-operator | 2 |
| call-non-callable | 1 |

First entries
tests/hermes_cli/test_ollama_cloud_provider.py:370: [unresolved-attribute] unresolved-attribute: Object of type `AIAgent` has no attribute `provider`
agent/agent_init.py:1193: [unresolved-attribute] unresolved-attribute: Attribute `rstrip` is not defined on `dict[str, str] & ~AlwaysFalsy` in union `(str & ~AlwaysFalsy) | (dict[str, str] & ~AlwaysFalsy) | (Any & ~AlwaysFalsy)`
tests/run_agent/test_steer.py:192: [unresolved-attribute] unresolved-attribute: Unresolved attribute `_execution_thread_id` on type `AIAgent`
run_agent.py:2619: [unresolved-attribute] unresolved-attribute: Object of type `Self@_try_refresh_codex_client_credentials` has no attribute `_client_kwargs`
agent/agent_init.py:97: [invalid-parameter-default] invalid-parameter-default: Default value of type `None` is not assignable to annotated parameter type `list[str]`
agent/agent_init.py:612: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["timeout"]` and value of type `int | float` on object of type `dict[str, str | dict[str, str]]`
hermes_cli/oneshot.py:332: [unresolved-attribute] unresolved-attribute: Unresolved attribute `suppress_status_output` on type `AIAgent`
tests/run_agent/test_413_compression.py:99: [unresolved-attribute] unresolved-attribute: Unresolved attribute `compression_enabled` on type `AIAgent`
tests/run_agent/test_413_compression.py:98: [unresolved-attribute] unresolved-attribute: Unresolved attribute `tool_delay` on type `AIAgent`
run_agent.py:1345: [unresolved-attribute] unresolved-attribute: Object of type `Self@_save_trajectory` has no attribute `model`
run_agent.py:1550: [unresolved-attribute] unresolved-attribute: Object of type `Self@_save_session_log` has no attribute `session_id`
tests/run_agent/test_run_agent.py:832: [unresolved-attribute] unresolved-attribute: Object of type `AIAgent` has no attribute `_cache_ttl`
cli.py:6378: [invalid-assignment] invalid-assignment: Object of type `Unknown` is not assignable to attribute `session_log_file` on type `AIAgent & <Protocol with members 'session_log_file'> & <Protocol with members 'logs_dir'> & ~AlwaysFalsy`
agent/conversation_loop.py:91: [invalid-type-form] invalid-type-form: Function `callable` is not valid in a parameter annotation: Did you mean `collections.abc.Callable`?
tests/run_agent/test_run_agent_codex_responses.py:285: [unresolved-attribute] unresolved-attribute: Object of type `AIAgent` has no attribute `api_mode`
agent/agent_init.py:772: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `bound method dict[str, str].__getitem__(key: str, /) -> str` cannot be called with key of type `slice[None, Literal[20], None]` on object of type `dict[str, str]`
run_agent.py:1341: [unresolved-attribute] unresolved-attribute: Object of type `Self@_save_trajectory` has no attribute `save_trajectories`
agent/tool_executor.py:638: [invalid-argument-type] invalid-argument-type: Argument to function `memory_tool` is incorrect: Expected `str`, found `Any | None`
run_agent.py:1942: [unresolved-attribute] unresolved-attribute: Object of type `~AlwaysFalsy` has no attribute `on_session_end`
tests/run_agent/test_1630_context_overflow_loop.py:42: [unresolved-attribute] unresolved-attribute: Unresolved attribute `_use_prompt_caching` on type `AIAgent`
acp_adapter/session.py:627: [unresolved-attribute] unresolved-attribute: Unresolved attribute `_print_fn` on type `AIAgent`
tests/run_agent/test_tool_call_guardrail_runtime.py:57: [unresolved-attribute] unresolved-attribute: Unresolved attribute `tool_delay` on type `AIAgent`
tests/run_agent/test_compressor_fallback_update.py:66: [unresolved-attribute] unresolved-attribute: Object of type `AIAgent` has no attribute `context_compressor`
agent/process_bootstrap.py:43: [unresolved-import] unresolved-import: Cannot resolve imported module `openai`
tests/run_agent/test_tool_call_guardrail_runtime.py:163: [unresolved-attribute] unresolved-attribute: Unresolved attribute `tool_start_callback` on type `AIAgent`
... and 340 more

✅ Fixed issues (119):

| Rule | Count |
| --- | --- |
| invalid-argument-type | 66 |
| invalid-assignment | 21 |
| unresolved-attribute | 19 |
| unsupported-operator | 5 |
| unresolved-reference | 3 |
| not-subscriptable | 3 |
| unresolved-import | 1 |
| no-matching-overload | 1 |

First entries
run_agent.py:1677: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["args"]` and value of type `list[str | Unknown]` on object of type `dict[str, str]`
run_agent.py:14518: [invalid-argument-type] invalid-argument-type: Argument to bound method `ContextCompressor.update_model` is incorrect: Expected `str`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
run_agent.py:1832: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `bound method dict[str, str].__getitem__(key: str, /) -> str` cannot be called with key of type `slice[Literal[-4], None, None]` on object of type `dict[str, str]`
run_agent.py:13562: [invalid-argument-type] invalid-argument-type: Argument to bound method `SessionDB.update_token_counts` is incorrect: Expected `str | None`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
run_agent.py:1674: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["timeout"]` and value of type `int | float` on object of type `dict[str, str]`
run_agent.py:15959: [unresolved-attribute] unresolved-attribute: Object of type `None` has no attribute `to_metadata`
run_agent.py:1818: [invalid-assignment] invalid-assignment: Cannot assign to a subscript on an object of type `str`
run_agent.py:11169: [invalid-argument-type] invalid-argument-type: Argument to bound method `set.add` is incorrect: Expected `int`, found `int | None`
run_agent.py:7560: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["default_headers"]` and value of type `dict[str, str] & ~AlwaysFalsy` on object of type `dict[str, str]`
run_agent.py:2716: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `Any | (str & ~AlwaysFalsy) | None`
run_agent.py:15411: [invalid-argument-type] invalid-argument-type: Argument to bound method `AIAgent._strip_think_blocks` is incorrect: Expected `str`, found `Any | None`
run_agent.py:9967: [invalid-argument-type] invalid-argument-type: Argument to function `_get_anthropic_max_output` is incorrect: Expected `str`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
tests/run_agent/test_provider_attribution_headers.py:133: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> LiteralString, (key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> str]` cannot be called with key of type `Literal["X-BILLING-INVOKE-ORIGIN"]` on object of type `str`
tests/run_agent/test_steer.py:194: [invalid-assignment] invalid-assignment: Object of type `None` is not assignable to attribute `_tool_worker_threads_lock` of type `lock`
run_agent.py:14034: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `bound method dict[str, str].__getitem__(key: str, /) -> str` cannot be called with key of type `slice[None, Literal[12], None]` on object of type `dict[str, str]`
run_agent.py:13525: [invalid-argument-type] invalid-argument-type: Argument to function `estimate_usage_cost` is incorrect: Expected `str | None`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
run_agent.py:14031: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 3 union elements`
run_agent.py:12768: [invalid-argument-type] invalid-argument-type: Argument to function `apply_anthropic_cache_control` is incorrect: Expected `bool`, found `int | str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | dict[Unknown, Unknown]`
run_agent.py:15870: [unresolved-attribute] unresolved-attribute: Attribute `rstrip` is not defined on `None` in union `None | Unknown | str`
run_agent.py:1819: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["default_headers"]` and value of type `(str & ~AlwaysFalsy) | (dict[str, str] & ~AlwaysFalsy) | (Any & ~AlwaysFalsy) | dict[Unknown, Unknown]` on object of type `dict[str, str]`
run_agent.py:7543: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["default_headers"]` and value of type `dict[Unknown, Unknown]` on object of type `dict[str, str]`
run_agent.py:5009: [invalid-argument-type] invalid-argument-type: Argument to function `save_trajectory` is incorrect: Expected `str`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
run_agent.py:11819: [invalid-argument-type] invalid-argument-type: Argument to function `_append_subdir_hint_to_multimodal` is incorrect: Expected `dict[str, Any]`, found `str | Unknown`
tools/browser_providers/browser_use.py:41: [unsupported-operator] unsupported-operator: Operator `>=` is not supported between objects of type `None | Unknown` and `Literal[500]`
run_agent.py:8691: [unresolved-import] unresolved-import: Cannot resolve imported module `openai`
... and 94 more
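
Many of the `invalid-assignment` entries above share one shape: a dict literal that starts with only string values is inferred as `dict[str, str]`, and later assignments of a number or a nested dict under keys such as `"timeout"` or `"default_headers"` are then rejected. The following is a minimal sketch of that pattern and one common remedy, an explicit `dict[str, Any]` annotation. The function name `build_client_kwargs` and its parameters are hypothetical and are not taken from `run_agent.py`; only the key names come from the diagnostics.

```python
from __future__ import annotations

from typing import Any, Optional


def build_client_kwargs(
    api_key: str,
    base_url: str,
    timeout: Optional[float] = None,
    default_headers: Optional[dict[str, str]] = None,
) -> dict[str, Any]:
    """Hypothetical helper; illustrates the annotation, not the real code."""
    # Without the explicit annotation, a checker infers dict[str, str] from
    # the literal and flags the later non-string assignments.
    kwargs: dict[str, Any] = {"api_key": api_key, "base_url": base_url}
    if timeout is not None:
        kwargs["timeout"] = timeout  # int | float value
    if default_headers:
        kwargs["default_headers"] = dict(default_headers)  # nested dict value
    return kwargs
```

Whether the fixed entries in this report were resolved this way is not stated in the report; the sketch only shows why the checker raised them.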

Unchanged: 4238 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@DIZ-admin

Owner Author

Superseded by clean replacement PR #12.

@DIZ-admin DIZ-admin closed this May 17, 2026
@DIZ-admin DIZ-admin deleted the test/runtime-env-hermeticity-clean branch May 18, 2026 08:21
DIZ-admin pushed a commit that referenced this pull request May 26, 2026
Four findings from Copilot's review on PR NousResearch#22891, all in the AX
elements-array cap added by 22fa1ed:

1. The truncation note ("response truncated to N of M elements") was
   appended unconditionally — including in the som/vision multimodal
   path, whose response carries a screenshot rather than an `elements`
   array. The note described a payload field that wasn't present.
   Moved the note into the AX-text branch where the array actually
   appears.

2. `_format_elements(cap.elements)` ran on the full untrimmed list with
   its own `max_lines=40` cap, so a caller passing `max_elements=10`
   would see summary lines referencing `#11..#40` even though the JSON
   `elements` array only held #1..#10. Format on `visible_elements`
   instead so the summary indices always exist in the response.

3. `_coerce_max_elements` enforced a lower bound but no upper bound,
   so `max_elements=10_000_000` silently disabled the safeguard and
   reintroduced the original context-blow-up. Added a hard cap
   (`_MAX_ALLOWED_MAX_ELEMENTS = 1000`) that clamps oversized values.

4. The schema string said "Default 100" but the property carried no
   `default` field, and claimed `max_elements` had no effect on som/
   vision while the image-missing fallback path can still return an
   elements array. Added `"default": 100`, `"maximum": 1000`, and
   clarified the fallback-path wording.
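
Findings 1 to 3 can be sketched together as follows. This is an illustrative reconstruction under stated assumptions, not the project's code: the hard cap of 1000 and the default of 100 come from the commit message, while the lower bound of 1, the function names `coerce_max_elements` and `build_ax_response`, and the response field names `summary` and `note` are hypothetical. The separate `max_lines=40` cap on the summary is omitted for brevity.

```python
from __future__ import annotations

from typing import Any

_DEFAULT_MAX_ELEMENTS = 100        # from the commit message
_MAX_ALLOWED_MAX_ELEMENTS = 1000   # from the commit message
_MIN_ALLOWED_MAX_ELEMENTS = 1      # assumed lower bound


def coerce_max_elements(value: Any) -> int:
    """Clamp a caller-supplied value into the allowed range."""
    try:
        n = int(value)
    except (TypeError, ValueError, OverflowError):
        return _DEFAULT_MAX_ELEMENTS
    return max(_MIN_ALLOWED_MAX_ELEMENTS, min(n, _MAX_ALLOWED_MAX_ELEMENTS))


def build_ax_response(
    elements: list[dict[str, Any]], max_elements: Any = None
) -> dict[str, Any]:
    """Build the AX-text response from the trimmed list only."""
    limit = coerce_max_elements(max_elements)
    visible_elements = elements[:limit]
    # Format the summary from the trimmed list so every index it mentions
    # also exists in the returned `elements` array.
    summary = "\n".join(
        f"#{i} {el.get('role', '?')}"
        for i, el in enumerate(visible_elements, start=1)
    )
    response: dict[str, Any] = {"elements": visible_elements, "summary": summary}
    # The truncation note belongs only to this branch, where the array exists.
    if len(visible_elements) < len(elements):
        response["note"] = (
            f"response truncated to {len(visible_elements)} "
            f"of {len(elements)} elements"
        )
    return response
```

With 40 input elements and `max_elements=10`, the summary in this sketch stops at `#10`, matching the ten entries in `elements`.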

Each finding gets a regression test:

- test_capture_ax_clamps_oversized_max_elements_to_hard_cap
- test_capture_ax_summary_indices_match_returned_elements
- test_capture_multimodal_summary_omits_truncation_note
- test_schema_max_elements_documents_default_and_upper_bound
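
As an illustration of what the schema regression test might check, the sketch below compares the machine-readable fields of a schema property with what its description advertises. The property layout, the description text, and the helper name `schema_problems` are assumptions; only the values `"default": 100` and `"maximum": 1000` come from the commit message.

```python
from __future__ import annotations

from typing import Any

# Hypothetical post-fix schema fragment for the `max_elements` property.
MAX_ELEMENTS_PROPERTY: dict[str, Any] = {
    "type": "integer",
    "default": 100,
    "maximum": 1000,
    "description": (
        "Maximum number of entries in the `elements` array. "
        "Default 100; values above 1000 are clamped."
    ),
}


def schema_problems(prop: dict[str, Any]) -> list[str]:
    """List mismatches between the description and the schema fields."""
    problems: list[str] = []
    if prop.get("default") != 100:
        problems.append("missing or wrong `default`")
    if prop.get("maximum") != 1000:
        problems.append("missing or wrong `maximum`")
    if "Default 100" in prop.get("description", "") and "default" not in prop:
        problems.append("description advertises a default the schema lacks")
    return problems
```

A pre-fix property that only says "Default 100" in its description would be reported three times by this helper, while the post-fix fragment yields an empty list.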

Verified with `pytest tests/tools/test_computer_use.py` (53 passed,
including the 5 new cases). Confirmed each new test fails on the
pre-fix code path before applying the production change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>