Skip to content

fix(security): cherry-pick upstream security hardening#2

Merged
jecruz merged 227 commits into
mainfrom
fix/upstream-security-v5
Apr 5, 2026
Merged

fix(security): cherry-pick upstream security hardening#2
jecruz merged 227 commits into
mainfrom
fix/upstream-security-v5

Conversation

@jecruz

@jecruz jecruz commented Apr 1, 2026

Copy link
Copy Markdown
Owner

What does this PR do?

Cherry-picks upstream security hardening from NousResearch/hermes-agent. Adds credential pooling with multi-provider failover, comprehensive secret redaction for logs, memory provider plugins (7 backends), subscription feature management, and multi-platform gateway support (Discord, Matrix, Slack, Telegram, WhatsApp, custom API).

Major features:

  • Credential pooling: Multiple credentials per provider with round-robin/least-used/random strategies
  • Secret redaction: Snapshots env at import time; redacts API keys, tokens, passwords, private keys, DB connection strings, and phone numbers
  • Memory plugins: Honcho, Mem0, Holographic, Byterover, RetainDB, OpenViking, Hindsight (pluggable, one active at a time)
  • Gateway delivery: Event-driven message handling across 5+ platforms with schema injection
  • Nous subscription: Feature state tracking for web search, image generation, TTS, browser, Modal

Related Issue

Upstream security sync for v0.7.0 release

Type of Change

  • 🔒 Security fix
  • ✨ New feature (non-breaking change that adds functionality)
  • ♻️ Refactor (no behavior change)

Changes Made

Core Security Hardening

  • agent/redact.py: Regex-based secret redaction for logs; snapshots HERMES_REDACT_SECRETS at import time to prevent runtime bypass; patterns for 30+ API key types + base64-encoded tokens + DB connection strings
  • agent/credential_pool.py: Multi-credential failover with pluggable strategies (fill_first, round_robin, random, least_used); per-provider priority chaining; 429/402 cooldowns
  • hermes_cli/auth.py: Provider registry (Nous Portal, OpenRouter, GitHub Copilot, custom endpoints); OAuth device-code flow; JWT token refresh with skew; cross-process file locking on auth.json

Memory & Persistence

  • agent/memory_manager.py: Lifecycle orchestration for memory providers; prefetch before each turn; sync after each turn
  • agent/memory_provider.py: ABC for pluggable backends (honorable mention: already defined, plugins comply)
  • plugins/memory/: 7 memory backends (honcho, mem0, holographic, byterover, retaindb, openviking, hindsight) with standardized tool schemas

Gateway & Delivery

  • gateway/run.py: Event-driven message handling; platform abstraction; schema injection
  • gateway/platforms/: Discord, Matrix, Slack, Telegram, WhatsApp, generic API server

Nous Features & Setup

  • hermes_cli/nous_subscription.py: Subscription state tracking; feature availability flags
  • hermes_cli/memory_setup.py: Interactive memory provider selection (NEW)
  • hermes_cli/setup.py: Enhanced onboarding with memory/tool configuration

Test Coverage

  • tests/test_nous_subscription.py: NousFeatureState immutability, feature aggregation, model config parsing (NEW)

Infrastructure

  • Dockerfile: Updated Python base; env handling
  • .github/workflows/: Enhanced test/deploy/doc CI
  • requirements.txt: New dependencies (e.g., for memory backends)

How to Test

  1. Secret redaction: Log a message with an API key (sk-abc123...), Greptile token, phone number → verify redaction in logs

    python -c "from agent.redact import redact_sensitive_text; print(redact_sensitive_text('key=e93WIf0Vc78igjtgSINvawwBwkdPu5ctglEvo8uvA/dOaiP+'))"
  2. Credential pool: Add multiple credentials per provider; verify round-robin/least-used selection

    hermes config credentials add-pool --provider openrouter --label primary
    hermes config credentials add-pool --provider openrouter --label fallback
  3. Memory setup: Complete onboarding with memory provider selection

    python -m hermes_cli.main setup
  4. Tests: Run full test suite

    pytest tests/ -q
    pytest tests/test_nous_subscription.py -v

Checklist

Code

  • My commit messages follow Conventional Commits
  • My PR contains only changes related to this feature (no unrelated commits)
  • I've added tests for my changes
  • Tested on macOS 15.2

Documentation & Housekeeping

  • I've updated relevant documentation (RELEASE_v0.7.0.md, skill docs)
  • I've updated cli-config.yaml.example for new config keys
  • I've considered cross-platform impact (Windows, macOS, Linux)
  • I've updated tool descriptions/schemas

Screenshots / Logs

No UI changes (backend hardening only).

rewbs and others added 30 commits March 26, 2026 16:17
- add managed modal and gateway-backed tool integrations\n- improve CLI setup, auth, and configuration for subscriber flows\n- expand tests and docs for managed tool support
…l we're ready. Even if users enable it, it'll be blocked server-side for now, until we unlock for non-admin users on tool-gateway.
…gnore

- Combine apt-get update and install into single RUN with cache clearing
- Remove APT lists after installation
- Add --no-cache-dir to pip install
- Add --prefer-offline --no-audit to npm install
- Create .dockerignore to exclude unnecessary files from build context
- Update docker-publish.yml workflow to tag images with release names
- Ensure buildx caching is used (type=gha)
…edact flag

- Add gho_, ghu_, ghs_, ghr_ prefix patterns (OAuth, user-to-server,
  server-to-server, and refresh tokens) — all four types used by
  GitHub Apps and Copilot auth flows were absent from _PREFIX_PATTERNS
- Snapshot HERMES_REDACT_SECRETS at module import time instead of
  re-reading os.getenv() on every call, preventing runtime env mutations
  (e.g. LLM-generated export commands) from disabling redaction
The _REDACT_ENABLED constant is snapshotted at import time, so
monkeypatch.delenv() alone doesn't re-enable redaction during tests
when HERMES_REDACT_SECRETS=false is set in the host environment.
…hromium) (NousResearch#4292)

The SSRF protection added in NousResearch#3041 blocks all private/internal addresses
unconditionally in browser_navigate(). This prevents legitimate local use
cases (localhost apps, LAN devices) when using Camofox or the built-in
headless Chromium without a cloud provider.

The check is only meaningful for cloud backends (Browserbase, BrowserUse)
where the agent could reach internal resources on a remote machine. Local
backends give the user full terminal and network access already — the
SSRF check adds zero security value.

Add _is_local_backend() helper that returns True when Camofox is active
or no cloud provider is configured. Both the pre-navigation and
post-redirect SSRF checks now skip when running locally. The
browser.allow_private_urls config option remains available as an
explicit opt-out for cloud mode.
…docs

* docs: clarify WhatsApp allowlist behavior and document WHATSAPP_ALLOW_ALL_USERS

- Add WHATSAPP_ALLOW_ALL_USERS and WHATSAPP_DEBUG to env vars reference
- Warn that * is not a wildcard and silently blocks all messages
- Show WHATSAPP_ALLOWED_USERS as optional, not required
- Update troubleshooting with the * trap and debug mode tip
- Fix Security section to mention the allow-all alternative

Prompted by a user report in Discord where WHATSAPP_ALLOWED_USERS=*
caused all incoming messages to be silently dropped at the bridge level.

* feat: support * wildcard in platform allowlists

Follow the precedent set by SIGNAL_GROUP_ALLOWED_USERS which already
supports * as an allow-all wildcard.

Bridge (allowlist.js): matchesAllowedUser() now checks for * in the
allowedUsers set before iterating sender aliases.

Gateway (run.py): _is_authorized() checks for * in allowed_ids after
parsing the allowlist. This is generic — works for all platforms, not
just WhatsApp.

Updated docs to document * as a supported value instead of warning
against it. Added WHATSAPP_ALLOW_ALL_USERS and WHATSAPP_DEBUG to
the env vars reference.

Tests: JS allowlist test + 2 Python gateway tests (WhatsApp + Telegram
to verify cross-platform behavior).
The delivery target parser uses split(':', 1) which only splits on the
first colon. For the documented format platform:chat_id:thread_id
(e.g. 'telegram:-1001234567890:17585'), thread_id gets munged into
chat_id and is never extracted.

Fix: split(':', 2) to correctly extract all three parts. Also fix
to_string() to include thread_id for proper round-tripping.

The downstream plumbing in _deliver_to_platform() already handles
thread_id correctly (line 292-293) — it just never received a value.
`hermes config set KEY ""` and `hermes config set KEY 0` were rejected
because the guard used `not value` which is truthy for empty strings,
zero, and False. Changed to `value is None` so only truly missing
arguments are rejected.

Closes NousResearch#4277

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Some models (e.g. Kimi K2.5 on Alibaba OpenAI-compatible endpoint)
emit reasoning text followed by a closing </think> without a matching
opening <think> tag.  The existing paired-tag regexes in
_strip_think_blocks() cannot match these orphaned tags, so </think>
leaks into user-facing responses on all platforms.

Add a catch-all regex that strips any remaining opening or closing
think/thinking/reasoning/REASONING_SCRATCHPAD tags after the existing
paired-block removal pass.

Closes NousResearch#4285
… warning (NousResearch#4294)

* docs: update llama.cpp section with --jinja flag and tool calling guide

The llama.cpp docs were missing the --jinja flag which is required for
tool calling to work. Without it, models output tool calls as raw JSON
text instead of structured API responses, making Hermes unable to
execute them.

Changes:
- Add --jinja and -fa flags to the server startup example
- Replace deprecated env vars (OPENAI_BASE_URL, LLM_MODEL) with
  hermes model interactive setup
- Add caution block explaining the --jinja requirement and symptoms
- List models with native tool calling support
- Add /props endpoint verification tip

* docs+feat: comprehensive local LLM provider guides and context length warning

Docs (providers.md):
- Rewrote Ollama section with context length warning (defaults to 4k on
  <24GB VRAM), three methods to increase it, and verification steps
- Rewrote vLLM section with --max-model-len, tool calling flags
  (--enable-auto-tool-choice, --tool-call-parser), and context guidance
- Rewrote SGLang section with --context-length, --tool-call-parser,
  and warning about 128-token default max output
- Added LM Studio section (port 1234, context length defaults to 2048,
  tool calling since 0.3.6)
- Added llama.cpp context length flag (-c) and GPU offload (-ngl)
- Added Troubleshooting Local Models section covering:
  - Tool calls appearing as text (with per-server fix table)
  - Silent context truncation and diagnosis commands
  - Low detected context at startup
  - Truncated responses
- Replaced all deprecated env vars (OPENAI_BASE_URL, LLM_MODEL) with
  hermes model interactive setup and config.yaml examples
- Added deprecation warning for legacy env vars in General Setup

Code (cli.py):
- Added context length warning in show_banner() when detected context
  is <= 8192 tokens, with server-specific fix hints:
  - Ollama (port 11434): suggests OLLAMA_CONTEXT_LENGTH env var
  - LM Studio (port 1234): suggests model settings adjustment
  - Other servers: suggests config.yaml override

Tests:
- 9 new tests covering warning thresholds, server-specific hints,
  and no-warning cases
…g.yaml (NousResearch#4298)

The _has_any_provider_configured() guard only checked env vars, .env file,
and auth.json — missing config.yaml model.provider/base_url/api_key entirely.
Users who configured a provider through setup (saving to config.yaml) but had
empty API key placeholders in .env from the install template were permanently
blocked by the 'not configured' message.

Changes:
- _has_any_provider_configured() now checks config.yaml model section for
  explicit provider, base_url, or api_key — covers custom endpoints and
  providers that store credentials in config rather than env vars
- .env.example: comment out all empty API key placeholders so they don't
  pollute the environment when copied to .env by the installer
- .env.example: mark LLM_MODEL as deprecated (config.yaml is source of truth)
- 4 new tests for the config.yaml detection path

Reported by OkadoOP on Discord.
…iled refresh (NousResearch#4300)

When an OAuth token refresh fails on a 401 error, the pool recovery
would return 'not recovered' without trying the next credential in the
pool. This meant users who added a second valid credential via
'hermes auth add' would never see it used when the primary credential
was dead.

Now: try refresh first (handles expired tokens quickly), and if that
fails, rotate to the next available credential — same as 429/402
already did.

Adds three tests covering 401 refresh success, refresh-fail-then-rotate,
and refresh-fail-with-no-remaining-credentials.
Exposes the existing max_turns parameter (cli.py main()) as a CLI flag
so programmatic callers (Paperclip adapter, scripts) can control the
agent's tool-calling iteration limit without editing config.yaml.

Priority chain unchanged: CLI flag > config agent.max_turns > env
HERMES_MAX_ITERATIONS > default 90.
WSL detection was treated as a hard fail, blocking voice mode even when
audio worked via PulseAudio bridge. Now PULSE_SERVER env var presence
makes WSL a soft notice instead of a blocking warning. Device query
failures in WSL with PULSE_SERVER are also treated as non-blocking.
Fixes a zip-slip path traversal vulnerability in hermes profile import.
shutil.unpack_archive() on untrusted tar members allows entries like
../../escape.txt to write files outside ~/.hermes/profiles/.

- Add _normalize_profile_archive_parts() to reject absolute paths
  (POSIX and Windows), traversal (..), empty paths, backslash tricks
- Add _safe_extract_profile_archive() for manual per-member extraction
  that only allows regular files and directories (rejects symlinks)
- Replace shutil.unpack_archive() with the safe extraction path
- Add regression tests for traversal and absolute-path attacks

Co-authored-by: Gutslabs <gutslabsxyz@gmail.com>
acsezen and others added 21 commits April 3, 2026 22:40
…ile read guard

Two pre-existing issues causing test_file_read_guards timeouts on CI:

1. agent/redact.py: _ENV_ASSIGN_RE used unbounded [A-Z_]* with
   IGNORECASE, matching any letter/underscore to end-of-string at
   each position → O(n²) backtracking on 100K+ char inputs.
   Bounded to {0,50} since env var names are never that long.

2. tools/file_tools.py: redact_sensitive_text() ran BEFORE the
   character-count guard, so oversized content (that would be rejected
   anyway) went through the expensive regex first. Reordered to check
   size limit before redaction.
Add MiniMax as a fifth TTS provider alongside Edge TTS, ElevenLabs,
OpenAI, and NeuTTS. Supports speech-2.8-hd (recommended default) and
speech-2.8-turbo models via the MiniMax T2A HTTP API.

Changes:
- Add _generate_minimax_tts() with hex-encoded audio decoding
- Add MiniMax to provider dispatch, requirements check, and Telegram
  Opus compatibility handling
- Add MiniMax to interactive setup wizard with API key prompt
- Update TTS documentation and config example

Configuration:
  tts:
    provider: "minimax"
    minimax:
      model: "speech-2.8-hd"
      voice_id: "English_Graceful_Lady"

Requires MINIMAX_API_KEY environment variable.

API reference: https://platform.minimax.io/docs/api-reference/speech-t2a-http
* feat: add /branch (/fork) command for session branching

Inspired by Claude Code's /branch command. Creates a copy of the current
session's conversation history in a new session, allowing the user to
explore a different approach without losing the original.

Works like 'git checkout -b' for conversations:
- /branch            — auto-generates a title from the parent session
- /branch my-idea    — uses a custom title
- /fork              — alias for /branch

Implementation:
- CLI: _handle_branch_command() in cli.py
- Gateway: _handle_branch_command() in gateway/run.py
- CommandDef with 'fork' alias in commands.py
- Uses existing parent_session_id field in session DB
- Uses get_next_title_in_lineage() for auto-numbered branches
- 14 tests covering session creation, history copy, parent links,
  title generation, edge cases, and agent sync

* fix: clear ghost status-bar lines on terminal resize

When the terminal shrinks (e.g. un-maximize), the emulator reflows
previously full-width rows (status bar, input rules) into multiple
narrower rows. prompt_toolkit's _on_resize only cursor_up()s by the
stored layout height, missing the extra rows from reflow — leaving
ghost duplicates of the status bar visible.

Fix: monkey-patch Application._on_resize to detect width shrinks,
calculate the extra rows created by reflow, and inflate the renderer's
cursor_pos.y so the erase moves up far enough to clear ghosts.
OAuth refresh tokens are single-use. When multiple consumers share the
same Anthropic OAuth session (credential pool entries, Claude Code CLI,
multiple Hermes profiles), whichever refreshes first invalidates the
refresh token for all others. This causes a cascade:

1. Pool entry tries to refresh with a consumed refresh token → 400
2. Pool marks the credential as "exhausted" with a 24-hour cooldown
3. All subsequent heartbeats skip the credential entirely
4. The fallback to resolve_anthropic_token() only works while the
   access token in ~/.claude/.credentials.json hasn't expired
5. Once it expires, nothing can auto-recover without manual re-login

Fix:
- Add _sync_anthropic_entry_from_credentials_file() to detect when
  ~/.claude/.credentials.json has a newer refresh token and sync it
  into the pool entry, clearing exhaustion status
- After a successful pool refresh, write the new tokens back to
  ~/.claude/.credentials.json so other consumers stay in sync
- On refresh failure, check if the credentials file has a different
  (newer) refresh token and retry once before marking exhausted
- In _available_entries(), sync exhausted claude_code entries from
  the credentials file before applying the 24-hour cooldown, so a
  manual re-login or external refresh immediately unblocks agents

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address review feedback: replace bare `except: pass` with a debug
log when the post-retry write-back to ~/.claude/.credentials.json
fails. The write-back is best-effort (token is already resolved),
but logging helps troubleshooting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…les (NousResearch#4738)

Add docker_env option to terminal config — a dict of key-value pairs that
get set inside Docker containers via -e flags at both container creation
(docker run) and per-command execution (docker exec) time.

This complements docker_forward_env (which reads values dynamically from
the host process environment). docker_env is useful when Hermes runs as a
systemd service without access to the user's shell environment — e.g.
setting SSH_AUTH_SOCK or GNUPGHOME to known stable paths for SSH/GPG
agent socket forwarding.

Precedence: docker_env provides baseline values; docker_forward_env
overrides for the same key.

Config example:
  terminal:
    docker_env:
      SSH_AUTH_SOCK: /run/user/1000/ssh-agent.sock
      GNUPGHOME: /root/.gnupg
    docker_volumes:
      - /run/user/1000/ssh-agent.sock:/run/user/1000/ssh-agent.sock
      - /run/user/1000/gnupg/S.gpg-agent:/root/.gnupg/S.gpg-agent
Audit found 24+ discrepancies between docs and code. Fixed:

HIGH severity:
- Remove honcho toolset from tools-reference, toolsets-reference, and tools.md
  (converted to memory provider plugin, not a built-in toolset)
- Add note that Honcho is available via plugin

MEDIUM severity:
- Add hermes memory command family to cli-commands.md (setup/status/off)
- Add --clone-all, --clone-from to profile create in cli-commands.md
- Add --max-turns option to hermes chat in cli-commands.md
- Add /btw slash command to slash-commands.md
- Fix profile show example output (remove nonexistent disk usage,
  add .env and SOUL.md status lines)
- Add missing hermes-webhook toolset to toolsets-reference.md
- Add 5 missing providers to fallback-providers.md table
- Add 7 missing providers to providers.md fallback list
- Fix outdated model examples: glm-4-plus→glm-5, moonshot-v1-auto→kimi-for-coding
…, and test flakiness

Bug fixes:
- agent/redact.py: catastrophic regex backtracking in _ENV_ASSIGN_RE — removed
  re.IGNORECASE and changed [A-Z_]* to [A-Z0-9_]* to restrict matching to actual
  env var name chars. Without this, the pattern backtracks exponentially on large
  strings (e.g. 100K tool output), causing test_file_read_guards to time out.
- tools/file_operations.py: over-escaped newline in find -printf format string
  produced literal backslash-n instead of a real newline, breaking file search
  result parsing (total_count always 1, paths concatenated).

Test fixes:
- Remove stale pytestmark.skip from 4 test modules that were blanket-skipped as
  'Hangs in non-interactive environments' but actually run fine:
  - test_413_compression.py (12 tests, 25s)
  - test_file_tools_live.py (71 tests, 24s)
  - test_code_execution.py (61 tests, 99s)
  - test_agent_loop_tool_calling.py (has proper OPENROUTER_API_KEY skip already)
- test_413_compression.py: fix threshold values in 2 preflight compression tests
  where context_length was too small for the compressed output to fit in one pass.
- test_mcp_probe.py: add missing _MCP_AVAILABLE mock so tests work without MCP SDK.
- test_mcp_tool_issue_948.py: inject MCP symbols (StdioServerParameters etc.) when
  SDK is not installed so patch() targets exist.
- test_approve_deny_commands.py: replace time.sleep(0.3) with deterministic polling
  of _gateway_queues — fixes race condition where resolve fires before threads
  register their approval entries, causing the test to hang indefinitely.

Net effect: +256 tests recovered from skip, 8 real failures fixed.
… diagnostics (NousResearch#5077)

Provider coverage:
- Add 6 missing providers to _PROVIDER_ENV_HINTS (Nous, DeepSeek,
  DashScope, HF, OpenCode Zen/Go)
- Add 5 missing providers to API connectivity checks (DeepSeek,
  Hugging Face, Alibaba/DashScope, OpenCode Zen, OpenCode Go)

New diagnostics:
- Config version check — detects outdated config, --fix runs
  non-interactive migration automatically
- Stale root-level config keys — detects provider/base_url at root
  level (known bug source, PR NousResearch#4329), --fix migrates them into
  the model section
- WAL file size check — warns on >50MB WAL files (indicates missed
  checkpoints from the duplicate close() bug), --fix runs PASSIVE
  checkpoint
- Mem0 memory plugin status — checks API key resolution including
  the env+json merge we just fixed
- Browse: POST /api/v1/browse → GET /api/v1/fs/{ls,tree,stat}
- Read: POST /api/v1/read[/abstract] → GET /api/v1/content/{read,abstract,overview}
- System prompt: result.get('children') → len(result) (API returns list)
- Content: result.get('content') → result is a plain string
- Browse: result['entries'] → result is the list; is_dir → isDir (camelCase)
- Browse: add rel_path and abstract fields to entry output

Based on PR NousResearch#4742 by catbusconductor. Auth header changes dropped
(already on main via NousResearch#4825).
…h#5082)

Add an optional 'script' parameter to cron jobs that references a Python
script. The script runs before each agent turn, and its stdout is injected
into the prompt as context. This enables stateful monitoring — the script
handles data collection and change detection, the LLM analyzes and reports.

- cron/jobs.py: add script field to create_job(), stored in job dict
- cron/scheduler.py: add _run_job_script() executor with timeout handling,
  inject script output/errors into _build_job_prompt()
- tools/cronjob_tools.py: add script to tool schema, create/update handlers,
  _format_job display
- hermes_cli/cron.py: add --script to create/edit, display in list/edit output
- hermes_cli/main.py: add --script argparse for cron create/edit subcommands
- tests/cron/test_cron_script.py: 20 tests covering job CRUD, script
  execution, path resolution, error handling, prompt injection, tool API

Script paths can be absolute or relative (resolved against ~/.hermes/scripts/).
Scripts run with a 120s timeout. Failures are injected as error context so
the LLM can report the problem. Empty string clears an attached script.
…oviders (NousResearch#5091)

Users on direct API-key providers (Alibaba, DeepSeek, ZAI, etc.) without
an OpenRouter or Nous key would get broken auxiliary tasks (compression,
vision, etc.) because _resolve_auto() only tried aggregator providers
first, then fell back to iterating PROVIDER_REGISTRY with wrong default
model names.

Now _resolve_auto() checks the user's main provider first. If it's not
an aggregator (OpenRouter/Nous), it uses their main model directly for
all auxiliary tasks. Aggregator users still get the cheap gemini-flash
model as before.

Adds _read_main_provider() to read model.provider from config.yaml,
mirroring the existing _read_main_model().

Reported by SkyLinx — Alibaba Coding Plan user getting 400 errors from
google/gemini-3-flash-preview being sent to DashScope.
- Dedicated asyncio event loop for Hindsight async calls (fixes aiohttp session leaks)
- Client caching (reuse instead of creating per-call)
- Local mode daemon management with config change detection and auto-restart
- Memory mode support (hybrid/context/tools) and prefetch method (recall/reflect)
- Proper shutdown with event loop and client cleanup
- Disable HindsightEmbedded.__del__ to avoid GC loop errors
- Update API URLs (app -> ui.hindsight.vectorize.io, api_url -> base_url)
- Setup wizard: conditional fields (when clause), dynamic defaults (default_from)
- Switch dependency install from pip to uv (correct for uv-based venvs)
- Add hindsight-all to plugin.yaml and import mapping
- 12 new tests for dispatch routing and setup field filtering

Original PR NousResearch#5044 by cdbartholomew.
* feat: execute_code runs on remote terminal backends (Docker/SSH/Modal/Daytona/Singularity)

When TERMINAL_ENV is not 'local', execute_code now ships the script to
the remote environment and runs it there via the terminal backend --
the same container/sandbox/SSH session used by terminal() and file tools.

Architecture:
- Local backend: unchanged (UDS RPC, subprocess.Popen)
- Remote backends: file-based RPC via execute_oneshot() polling
  - Script writes request files, parent polls and dispatches tool calls
  - Responses written atomically (tmp + rename) via base64/stdin
  - execute_oneshot() bypasses persistent shell lock for concurrency

Changes:
- tools/environments/base.py: add execute_oneshot() (delegates to execute())
- tools/environments/persistent_shell.py: override execute_oneshot() to
  bypass _shell_lock via _execute_oneshot(), enabling concurrent polling
- tools/code_execution_tool.py: add file-based transport to
  generate_hermes_tools_module(), _execute_remote() with full env
  get-or-create, file shipping, RPC poll loop, output post-processing

* fix: use _get_env_config() instead of raw TERMINAL_ENV env var

Read terminal backend type through the canonical config resolution
path (terminal_tool._get_env_config) instead of os.getenv directly.

* fix: use echo piping instead of stdin_data for base64 writes

Modal doesn't reliably deliver stdin_data to chained commands
(base64 -d > file && mv), producing 0-byte files. Switch to
echo 'base64' | base64 -d which works on all backends.

Verified E2E on both Docker and Modal.
Bring Matrix feature parity with Discord by adding mention gating and
auto-threading. Both default to true, matching Discord behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move mention stripping outside the `if not is_dm` guard so mentions
are stripped in DMs too. Remove the bare-mention early return so a
message containing only a mention passes through as empty string,
matching Discord's behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…verage

- Add base64 token pattern for Greptile/custom endpoint API keys in redaction
- Simplify JSON field regex to rely on re.IGNORECASE
- Fix PooledCredential.extra typing with field(default_factory=dict)
- Add comprehensive test suite for NousFeatureState and model config
- Add CLAUDE.md with project documentation

Fixes: Generic base64 tokens (e.g. Greptile API keys) were not redacted in logs.
Tested: Redaction patterns, dataclass immutability, config parsing edge cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix PooledCredential.extra field typing with field(default_factory=dict)
- Add comprehensive test_nous_subscription.py with 14 test cases
- Add CLAUDE.md with tech stack, structure, conventions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Resolved 15 merge conflicts. Kept upstream security hardening:
- Gateway approval system with session tokens
- Memory provider shutdown during session expiry
- Discord message deduplication for RESUME replays
- Comprehensive credential pooling and redaction

Conflicts resolved by taking upstream version for gateway platforms,
CLI commands, tool implementations, and approval tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Apr 4, 2026

Copy link
Copy Markdown

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: base64 encoding/decoding detected

Base64 has legitimate uses (images, JWT, etc.) but is also commonly used to obfuscate malicious payloads. Verify the usage is appropriate.

Matches (first 20):

28550:+    encoded = base64.b64encode(content.encode("utf-8")).decode("ascii")
28676:+                encoded_result = base64.b64encode(

⚠️ WARNING: exec() or eval() usage

Dynamic code execution can hide malicious behavior, especially when combined with base64 or network fetches.

Matches (first 20):

7033:+Persistent memory via the `brv` CLI — hierarchical knowledge tree with tiered retrieval (fuzzy text → LLM-driven search).
7081:+a hierarchical context tree with tiered retrieval (fuzzy text → LLM-driven
19607:+shell injection, SQL injection, path traversal, eval()/exec() with user input,
32259:+Persistent memory via the `brv` CLI — hierarchical knowledge tree with tiered retrieval (fuzzy text → LLM-driven search). Local-first with optional cloud sync.

⚠️ WARNING: Outbound network calls (POST/PUT)

Outbound POST/PUT requests in new code could be data exfiltration. Verify the destination URLs are legitimate.

Matches (first 10):

12212:+        resp = self._httpx.post(
30728:+    response = requests.post(base_url, json=payload, headers=headers, timeout=60)

⚠️ WARNING: Install hook files modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/memory_setup.py
hermes_cli/setup.py
skills/productivity/google-workspace/scripts/setup.py
tests/skills/test_google_oauth_setup.py

⚠️ WARNING: marshal/pickle/compile usage

These can deserialize or construct executable code objects.

Matches:

19608:+pickle.loads(), obfuscated commands.

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

Set FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 to address deprecation of
Node.js 20 on GitHub Actions runners (deprecated June 2, 2026).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Apr 4, 2026

Copy link
Copy Markdown

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: base64 encoding/decoding detected

Base64 has legitimate uses (images, JWT, etc.) but is also commonly used to obfuscate malicious payloads. Verify the usage is appropriate.

Matches (first 20):

28572:+    encoded = base64.b64encode(content.encode("utf-8")).decode("ascii")
28698:+                encoded_result = base64.b64encode(

⚠️ WARNING: exec() or eval() usage

Dynamic code execution can hide malicious behavior, especially when combined with base64 or network fetches.

Matches (first 20):

7055:+Persistent memory via the `brv` CLI — hierarchical knowledge tree with tiered retrieval (fuzzy text → LLM-driven search).
7103:+a hierarchical context tree with tiered retrieval (fuzzy text → LLM-driven
19629:+shell injection, SQL injection, path traversal, eval()/exec() with user input,
32281:+Persistent memory via the `brv` CLI — hierarchical knowledge tree with tiered retrieval (fuzzy text → LLM-driven search). Local-first with optional cloud sync.

⚠️ WARNING: Outbound network calls (POST/PUT)

Outbound POST/PUT requests in new code could be data exfiltration. Verify the destination URLs are legitimate.

Matches (first 10):

12234:+        resp = self._httpx.post(
30750:+    response = requests.post(base_url, json=payload, headers=headers, timeout=60)

⚠️ WARNING: Install hook files modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/memory_setup.py
hermes_cli/setup.py
skills/productivity/google-workspace/scripts/setup.py
tests/skills/test_google_oauth_setup.py

⚠️ WARNING: marshal/pickle/compile usage

These can deserialize or construct executable code objects.

Matches:

19630:+pickle.loads(), obfuscated commands.

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

@github-actions

github-actions Bot commented Apr 4, 2026

Copy link
Copy Markdown

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: base64 encoding/decoding detected

Base64 has legitimate uses (images, JWT, etc.) but is also commonly used to obfuscate malicious payloads. Verify the usage is appropriate.

Matches (first 20):

28957:+    encoded = base64.b64encode(content.encode("utf-8")).decode("ascii")
29083:+                encoded_result = base64.b64encode(

⚠️ WARNING: exec() or eval() usage

Dynamic code execution can hide malicious behavior, especially when combined with base64 or network fetches.

Matches (first 20):

7143:+Persistent memory via the `brv` CLI — hierarchical knowledge tree with tiered retrieval (fuzzy text → LLM-driven search).
7191:+a hierarchical context tree with tiered retrieval (fuzzy text → LLM-driven
19717:+shell injection, SQL injection, path traversal, eval()/exec() with user input,
32666:+Persistent memory via the `brv` CLI — hierarchical knowledge tree with tiered retrieval (fuzzy text → LLM-driven search). Local-first with optional cloud sync.

⚠️ WARNING: Outbound network calls (POST/PUT)

Outbound POST/PUT requests in new code could be data exfiltration. Verify the destination URLs are legitimate.

Matches (first 10):

12322:+        resp = self._httpx.post(
31135:+    response = requests.post(base_url, json=payload, headers=headers, timeout=60)

⚠️ WARNING: Install hook files modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/memory_setup.py
hermes_cli/setup.py
skills/productivity/google-workspace/scripts/setup.py
tests/skills/test_google_oauth_setup.py

⚠️ WARNING: marshal/pickle/compile usage

These can deserialize or construct executable code objects.

Matches:

19718:+pickle.loads(), obfuscated commands.

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

@jecruz jecruz merged commit d78ca06 into main Apr 5, 2026
6 of 7 checks passed
@jecruz jecruz deleted the fix/upstream-security-v5 branch April 5, 2026 00:48
jecruz pushed a commit that referenced this pull request Apr 22, 2026
…#13354)

Classic-CLI /steer typed during an active agent run was queued through
self._pending_input alongside ordinary user input.  process_loop, which
drains that queue, is blocked inside self.chat() for the entire run,
so the queued command was not pulled until AFTER _agent_running had
flipped back to False — at which point process_command() took the idle
fallback ("No agent running; queued as next turn") and delivered the
steer as an ordinary next-turn user message.

From Utku's bug report on PR NousResearch#13205: mid-run /steer arrived minutes
later at the end of the turn as a /queue-style message, completely
defeating its purpose.

Fix: add _should_handle_steer_command_inline() gating — when
_agent_running is True and the user typed /steer, dispatch
process_command(text) directly from the prompt_toolkit Enter handler
on the UI thread instead of queueing.  This mirrors the existing
_should_handle_model_command_inline() pattern for /model and is
safe because agent.steer() is thread-safe (uses _pending_steer_lock,
no prompt_toolkit state mutation, instant return).

No changes to the idle-path behavior: /steer typed with no active
agent still takes the normal queue-and-drain route so the fallback
"No agent running; queued as next turn" message is preserved.

Validation:
- 7 new unit tests in tests/cli/test_cli_steer_busy_path.py covering
  the detector, dispatch path, and idle-path control behavior.
- All 21 existing tests in tests/run_agent/test_steer.py still pass.
- Live PTY end-to-end test with real agent + real openrouter model:
    22:36:22 API call #1 (model requested execute_code)
    22:36:26 ENTER FIRED: agent_running=True, text='/steer ...'
    22:36:26 INLINE STEER DISPATCH fired
    22:36:43 agent.log: 'Delivered /steer to agent after tool batch'
    22:36:44 API call #2 included the steer; response contained marker
  Same test on the tip of main without this fix shows the steer
  landing as a new user turn ~20s after the run ended.
jecruz pushed a commit that referenced this pull request Apr 22, 2026
The MCP circuit breaker previously had no path back to the closed
state: once _server_error_counts[srv] reached _CIRCUIT_BREAKER_THRESHOLD
the gate short-circuited every subsequent call, so the only reset
path (on successful call) was unreachable. A single transient
3-failure blip (bad network, server restart, expired token) permanently
disabled every tool on that MCP server for the rest of the agent
session.

Introduce a classic closed/open/half-open state machine:

- Track a per-server breaker-open timestamp in _server_breaker_opened_at
  alongside the existing failure count.
- Add _CIRCUIT_BREAKER_COOLDOWN_SEC (60s). Once the count reaches
  threshold, calls short-circuit for the cooldown window.
- After the cooldown elapses, the *next* call falls through as a
  half-open probe that actually hits the session. Success resets the
  breaker via _reset_server_error; failure re-bumps the count via
  _bump_server_error, which re-stamps the open timestamp and re-arms
  the cooldown.

The error message now includes the live failure count and an
"Auto-retry available in ~Ns" hint so the model knows the breaker
will self-heal rather than giving up on the tool for the whole
session.

Covers tests 1 (half-opens after cooldown) and 2 (reopens on probe
failure); test 3 (cleared on reconnect) still fails pending fix #2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.