Skip to content

Add simple terminal#11

Merged
teknium1 merged 3 commits into
mainfrom
simplify-terminal
Nov 22, 2025
Merged

Add simple terminal#11
teknium1 merged 3 commits into
mainfrom
simplify-terminal

Conversation

@hjc-puro

@hjc-puro hjc-puro commented Nov 17, 2025

Copy link
Copy Markdown
Contributor
  • simplifies terminal to remove sessioning and interaction.
  • adds more lenient exponential backoff up to 1 minute of waiting instead of 10 secs

Testing:

  • tested background with
python run_agent.py "run some process in the background like a timer for 5 minutes and then tell me how much disk i have and then check the timer

@hjc-puro hjc-puro changed the title [DRAFT] Add simple terminal Add simple terminal Nov 17, 2025
@hjc-puro hjc-puro requested a review from teknium1 November 18, 2025 12:13
@teknium1 teknium1 merged commit 30ca282 into main Nov 22, 2025
sudo-yf pushed a commit to sudo-yf/hermes-agent that referenced this pull request Apr 5, 2026
…ojects

Sprint 15: Session Projects + Code Copy + Tool Card Toggle
sudo-yf pushed a commit to sudo-yf/hermes-agent that referenced this pull request Apr 5, 2026
Covers PRs NousResearch#11, NousResearch#13, NousResearch#14, NousResearch#15: Sprint 15 features, security hardening,
OpenRouter routing fix, project picker UX fixes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@a-canary

a-canary commented Apr 8, 2026

Copy link
Copy Markdown

Additional Investigation (2026-04-08)

Findings

Config is correct — no admin-level fix possible:

Config Value
settings.json defaultProvider chutes
settings.json defaultModel zai-org/GLM-5-TEE
CHUTES_API_KEY Set in .env
ANTHROPIC_API_KEY Not set (intentional — project uses chutes only)

Yet cron fails with: Error: No API key found for anthropic

Root Cause Hypothesis

pi-model-router@1.1.0 is installed as a package in project-trading. Its session_start hook auto-registers model groups (strategic, tactical, etc.) and may auto-select the "strategic" group (best available model by GDPval) before the session prompt runs.

The "strategic" group's best method likely routes to anthropic/claude-sonnet-4-6 (highest GDPval), but no anthropic key is configured — hence the error.

This is a pi-model-router or pi-coding-agent bug, not a config issue. Per A-0037, admin infrastructure does not develop pi packages.

What would help diagnose

Running this inside project-trading would reveal the model selection:

docker exec project-trading pi --print "test" 2>&1 | head -20

User Actions Still Required

  1. Claim 00 Anthropic credit: https://claude.ai/settings/usage
  2. If project-trading needs autonomous cron, either:
    a. Add ANTHROPIC_API_KEY to instances/trading/.env (if you want anthropic), OR
    b. Override the watchdog cron to use --model chutes:zai-org/GLM-5-TEE explicitly

malaiwah pushed a commit to malaiwah/hermes-agent that referenced this pull request Apr 11, 2026
…sionDB (adopt NousResearch#5989)' (NousResearch#11) from fix/status-tokens-from-sessiondb into main
@a-canary

Copy link
Copy Markdown

Tonight's Analysis (2026-04-14)

Finding: Director CLI IS working with Max subscription auth. Verified:

claude auth status
{"loggedIn": true, "authMethod": "oauth_token", "apiProvider": "firstParty"}

cd ~/pi-director && claude --print --model opus "Say 'Director OK'"
→ Director OK

Conclusion: Issue #11 has TWO separate problems:

  1. pi update cron failures — Affects admin + project containers. pi-model-router's strategic group auto-selects anthropic even when chutes is configured as primary. This is a pi-model-router/pi-coding-agent issue, not a pi-admin infrastructure issue. User needs to either claim 00 credit OR configure pi to use chutes-only for strategic model selection.

  2. Director/Developer dual-agent — Uses Claude Code CLI with Max subscription (oauth_token auth), NOT direct ANTHROPIC_API_KEY. These are WORKING. PLAN.md items blocked by 'Issue Add simple terminal #11' may actually be unblocked.

Remaining test items (should be unblocked):

  • Director responds to Discord messages
  • Developer spawns and completes TDD cycle
  • Parallel Developers (max 6)

Still blocked (user action needed):

  • pi update cron (billing credit OR chutes-only config)
  • Discord project channels (manual in Discord UI)
  • Archive orphaned .pi dirs (M-0006 confirmation)

@a-canary

Copy link
Copy Markdown

Closing as not planned. The billing error only affected deprecated container infrastructure that has been decommissioned:

  • All project containers (conjecture, dashboard, expert-horde, starlight-slm, trading) are stopped and archived (docker-compose files at ~/archive/docker-compose-20260410/)
  • The pi update cron entry is already commented out in instances/admin/crontab
  • CLI-based Director/Developer testing works correctly with Max subscription auth (confirmed 2026-04-15)
  • If containers are ever revived: claim the credit at claude.ai/settings/usage OR ensure ANTHROPIC_API_KEY is removed from instance .env files to use Chutes-only fallback

@a-canary

Copy link
Copy Markdown

Marking as not planned. The billing error only affected deprecated container infrastructure that has been decommissioned:

  • All project containers (conjecture, dashboard, expert-horde, starlight-slm, trading) are stopped and archived
  • The pi update cron entry is already commented out in instances/admin/crontab
  • CLI-based Director/Developer testing works correctly with Max subscription auth (confirmed 2026-04-15)
  • If containers are ever revived: claim the credit at claude.ai/settings/usage OR remove ANTHROPIC_API_KEY from instance .env files to use Chutes-only fallback

No code changes needed — container infrastructure is deprecated, CLI-based dual-agent is unaffected.

swswordholy-tech added a commit to swswordholy-tech/hermes-agent that referenced this pull request Apr 16, 2026
…EADME, AGENTS, website

Completes the remaining ADDING_A_PLATFORM.md integration points after
the MVP vertical slice (ff70b47..3babac1). Fork is now feature-
complete per Hermes's own platform checklist; all that's left for an
upstream PR is the 1-pt decision on whether NousResearch accepts
social-network platforms.

Integration points added:
  - cron/scheduler.py  — `"agentchat": Platform.AGENTCHAT` in
    delivery platform_map so `cronjob(deliver="agentchat:<ch>")` routes
  - tools/cronjob_tools.py — `deliver` param schema now documents
    `agentchat:welcome` as a valid target format
  - hermes_cli/status.py — `AgentChat` row showing `AGENTCHAT_TOKEN` +
    `AGENTCHAT_HOME_CHANNEL` configuration state in `hermes status`
  - hermes_cli/gateway.py — setup wizard `_PLATFORMS` entry with
    interactive prompts for TOKEN / AGENT_ID / ALLOWED_USERS /
    HOME_CHANNEL / REQUIRE_MENTION / WS_URL (`hermes setup gateway`
    → pick AgentChat)

Docs added / updated:
  - website/docs/user-guide/messaging/agentchat.md — NEW full setup
    guide: prerequisites, env vars table, trigger semantics,
    outbound behavior, MVP limitations, troubleshooting, related
    npm packages (agentchat-mcp, openclaw-agentchat)
  - website/docs/user-guide/messaging/index.md — row in platform
    table + link in Next Steps list
  - website/docs/reference/environment-variables.md — 8 new
    AGENTCHAT_* variable rows
  - README.md — AgentChat added to Messaging Gateway platform list
  - AGENTS.md — `gateway/platforms/` listing updated

channel_directory.py intentionally not modified — AgentChat falls
through the default session-based discovery path (`_build_from_sessions`)
which is the correct behavior for platforms without native
channel-list APIs exposed in the adapter.

redact.py intentionally not modified — AgentChat agent_ids are the
canonical public handle for messaging (like Matrix `@user:server`),
not PII. No regex-level redaction needed.

Verified:
  - 20 unit tests still pass (0.84s)
  - All 4 modified .py files pass `ast.parse`
  - Real-world roundtrip previously verified (sha:3babac1 commit msg)

Fork state after this commit: 9 of 16 ADDING_A_PLATFORM.md points
covered directly, 3 (NousResearch#11 channel_directory, NousResearch#14 redact, NousResearch#16 integration
tests) intentionally skipped with justification, 20 unit tests cover
the 5 core boundary invariants.
@kskotetsu

Copy link
Copy Markdown

Closed as PR #11 has been merged.

angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
alt-glitch added a commit that referenced this pull request Apr 28, 2026
Address adversarial review findings:

1. Race condition (#1): Job-level concurrency with cancel-in-progress
   collapses back-to-back pushes; ref: main checkout always gets latest
   branch state; explicit push target (origin HEAD:main).

2. Loop prevention (#2): File-whitelist check before commit aborts if
   any file outside nix/{tui,web}.nix was modified, preventing
   accidental self-triggering.

3. Silent infra failures (#8): nix-lockfile-check now fails explicitly
   when fix-lockfiles exits without reporting stale status (catches nix
   setup failures, network errors, script bugs that bypass continue-on-error).

4. Commit traceability (#11): Auto-fix commits include source SHA and
   workflow run URL in the commit body.

5. Explicit push target (#12): git push origin HEAD:main instead of
   bare git push.
alt-glitch added a commit that referenced this pull request Apr 28, 2026
* ci(nix): auto-fix stale npm hashes on push to main

When a PR merges to main with updated package-lock.json or package.json
in ui-tui/ or web/, the new auto-fix-main job detects stale npmDepsHash
values and pushes a fix commit directly to main.

This eliminates the recurring manual hash-bump PRs (#15420, #15314,
#15272, #15244) by reusing the existing fix-lockfiles --apply pipeline.

The fix commit only touches nix/*.nix files, which are outside the push
path filter (package-lock.json / package.json), so it cannot re-trigger
itself.

Closes #15314

* fix(ci): use GitHub App token for auto-fix-main push

GITHUB_TOKEN commits are invisible to workflow triggers (GitHub's
infinite-loop prevention). The auto-fix-main job pushes directly to
main, so the fix commit never triggered downstream nix.yml verification.

Mint a short-lived token via the repo's GitHub App (daimon-nous, APP_ID
+ APP_PRIVATE_KEY secrets) so the push is treated as a real event and
nix.yml fires to verify the corrected hashes.

Tested via workflow_dispatch dry-run: app token minted successfully,
checkout with app token succeeded, fix job correctly gated.

Resolves review feedback from Bugbot (r3144569551).

* ci(nix): rename lockfile check job for required status check

Rename 'check' → 'nix-lockfile-check' so the status check name is
unambiguous when added as a required check on main.

* fix(ci): harden auto-fix-main against races, loops, and silent failures

Address adversarial review findings:

1. Race condition (#1): Job-level concurrency with cancel-in-progress
   collapses back-to-back pushes; ref: main checkout always gets latest
   branch state; explicit push target (origin HEAD:main).

2. Loop prevention (#2): File-whitelist check before commit aborts if
   any file outside nix/{tui,web}.nix was modified, preventing
   accidental self-triggering.

3. Silent infra failures (#8): nix-lockfile-check now fails explicitly
   when fix-lockfiles exits without reporting stale status (catches nix
   setup failures, network errors, script bugs that bypass continue-on-error).

4. Commit traceability (#11): Auto-fix commits include source SHA and
   workflow run URL in the commit body.

5. Explicit push target (#12): git push origin HEAD:main instead of
   bare git push.

---------

Co-authored-by: alt-glitch <alt-glitch@users.noreply.github.com>
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
* ci(nix): auto-fix stale npm hashes on push to main

When a PR merges to main with updated package-lock.json or package.json
in ui-tui/ or web/, the new auto-fix-main job detects stale npmDepsHash
values and pushes a fix commit directly to main.

This eliminates the recurring manual hash-bump PRs (NousResearch#15420, NousResearch#15314,
NousResearch#15272, NousResearch#15244) by reusing the existing fix-lockfiles --apply pipeline.

The fix commit only touches nix/*.nix files, which are outside the push
path filter (package-lock.json / package.json), so it cannot re-trigger
itself.

Closes NousResearch#15314

* fix(ci): use GitHub App token for auto-fix-main push

GITHUB_TOKEN commits are invisible to workflow triggers (GitHub's
infinite-loop prevention). The auto-fix-main job pushes directly to
main, so the fix commit never triggered downstream nix.yml verification.

Mint a short-lived token via the repo's GitHub App (daimon-nous, APP_ID
+ APP_PRIVATE_KEY secrets) so the push is treated as a real event and
nix.yml fires to verify the corrected hashes.

Tested via workflow_dispatch dry-run: app token minted successfully,
checkout with app token succeeded, fix job correctly gated.

Resolves review feedback from Bugbot (r3144569551).

* ci(nix): rename lockfile check job for required status check

Rename 'check' → 'nix-lockfile-check' so the status check name is
unambiguous when added as a required check on main.

* fix(ci): harden auto-fix-main against races, loops, and silent failures

Address adversarial review findings:

1. Race condition (NousResearch#1): Job-level concurrency with cancel-in-progress
   collapses back-to-back pushes; ref: main checkout always gets latest
   branch state; explicit push target (origin HEAD:main).

2. Loop prevention (NousResearch#2): File-whitelist check before commit aborts if
   any file outside nix/{tui,web}.nix was modified, preventing
   accidental self-triggering.

3. Silent infra failures (NousResearch#8): nix-lockfile-check now fails explicitly
   when fix-lockfiles exits without reporting stale status (catches nix
   setup failures, network errors, script bugs that bypass continue-on-error).

4. Commit traceability (NousResearch#11): Auto-fix commits include source SHA and
   workflow run URL in the commit body.

5. Explicit push target (NousResearch#12): git push origin HEAD:main instead of
   bare git push.

---------

Co-authored-by: alt-glitch <alt-glitch@users.noreply.github.com>
donald131 pushed a commit to donald131/hermes-agent that referenced this pull request May 2, 2026
* ci(nix): auto-fix stale npm hashes on push to main

When a PR merges to main with updated package-lock.json or package.json
in ui-tui/ or web/, the new auto-fix-main job detects stale npmDepsHash
values and pushes a fix commit directly to main.

This eliminates the recurring manual hash-bump PRs (NousResearch#15420, NousResearch#15314,
NousResearch#15272, NousResearch#15244) by reusing the existing fix-lockfiles --apply pipeline.

The fix commit only touches nix/*.nix files, which are outside the push
path filter (package-lock.json / package.json), so it cannot re-trigger
itself.

Closes NousResearch#15314

* fix(ci): use GitHub App token for auto-fix-main push

GITHUB_TOKEN commits are invisible to workflow triggers (GitHub's
infinite-loop prevention). The auto-fix-main job pushes directly to
main, so the fix commit never triggered downstream nix.yml verification.

Mint a short-lived token via the repo's GitHub App (daimon-nous, APP_ID
+ APP_PRIVATE_KEY secrets) so the push is treated as a real event and
nix.yml fires to verify the corrected hashes.

Tested via workflow_dispatch dry-run: app token minted successfully,
checkout with app token succeeded, fix job correctly gated.

Resolves review feedback from Bugbot (r3144569551).

* ci(nix): rename lockfile check job for required status check

Rename 'check' → 'nix-lockfile-check' so the status check name is
unambiguous when added as a required check on main.

* fix(ci): harden auto-fix-main against races, loops, and silent failures

Address adversarial review findings:

1. Race condition (NousResearch#1): Job-level concurrency with cancel-in-progress
   collapses back-to-back pushes; ref: main checkout always gets latest
   branch state; explicit push target (origin HEAD:main).

2. Loop prevention (NousResearch#2): File-whitelist check before commit aborts if
   any file outside nix/{tui,web}.nix was modified, preventing
   accidental self-triggering.

3. Silent infra failures (NousResearch#8): nix-lockfile-check now fails explicitly
   when fix-lockfiles exits without reporting stale status (catches nix
   setup failures, network errors, script bugs that bypass continue-on-error).

4. Commit traceability (NousResearch#11): Auto-fix commits include source SHA and
   workflow run URL in the commit body.

5. Explicit push target (NousResearch#12): git push origin HEAD:main instead of
   bare git push.

---------

Co-authored-by: alt-glitch <alt-glitch@users.noreply.github.com>
teknium1 added a commit that referenced this pull request May 5, 2026
#20226)

* docs(AGENTS.md): add curator/cron/delegation/toolsets, fix plugin tree, frontmatter, auto-discovery caveat

Closes #19101 and #19107 (@pty819).

Verified 16 claims from those two issues against current main. 12 were
real gaps; 2 were generated/hallucinated (#10 unverified --now flag is
actually real and already cited in AGENTS.md; #11 stale PR refs #5587
and #4950 do not appear in AGENTS.md at all); 2 were low-prio nits
(memory provider hierarchy, --now scope enumeration) deferred.

Changes:
- Project tree: add yuanbao to platforms comment; expand plugins/
  subtree with real directory names (kanban, hermes-achievements,
  observability, image_gen) instead of vague '<others>'.
- Test-count blurb: 15k/700 Apr → 17k/900 May (verified: 17,375 test
  defs, 915 files).
- Adding New Tools: clarify that auto-discovery wires up schemas but
  the tool only reaches an agent if its name is added to a toolset in
  toolsets.py. _HERMES_CORE_TOOLS is not dead code.
- Adding Configuration: enumerate top-level config.yaml sections
  including auxiliary and curator; note auxiliary is per-task
  overrides for side-LLM work.
- SKILL.md frontmatter: add author, license, related_skills. Note
  top-level tags/category are mirrored from metadata.hermes.*.
- New section 'Toolsets' — enumerates the 30 current TOOLSETS keys
  (including yuanbao, kanban, moa, spotify, safe, debugging).
- New section 'Delegation (delegate_task)' — sync semantics, batch
  mode, leaf vs orchestrator roles, config knobs, durability caveat.
- New section 'Curator (skill lifecycle)' — core files, 11 CLI verbs,
  telemetry sidecar, invariants (pin/delete split after PR #20220,
  bundled/hub off-limits), curator.* config section.
- New section 'Cron (scheduled jobs)' — 4 schedule formats, 7 CLI
  verbs, per-job fields, 3-min hard interrupt, catchup/grace windows,
  tick.lock, cron→session isolation.

Skipped (invalid claims):
- #19107 item 10: --now is real (hermes_cli/skills_hub.py:624/966/1013/1470)
- #19107 item 11: no '#5587' or '#4950' or 'async_delegation' in AGENTS.md

* docs(AGENTS.md): add Kanban section

Adds a Kanban entry alongside Curator / Cron / Delegation so the major
durable background systems are all represented. Covers the CLI verbs,
the HERMES_KANBAN_TASK-gated worker toolset, the in-gateway dispatcher,
plugin assets, and the board/tenant isolation model. Points at the full
742-line user docs for detail.
moltapollo-cvtx pushed a commit to moltapollo-cvtx/hermes-agent that referenced this pull request May 7, 2026
Companion to rename_entity. Same transactional shape; different job:
rename mutates one row in place, merge consolidates two rows into
one.

merge_entities(source_id, target_id) -> dict
  - INSERT OR IGNORE re-points every (fact_id, source) to (fact_id,
    target); existing dual links collapse to a single target link
  - DELETE remaining (fact_id, source) rows
  - union source.aliases + source.name into target.aliases (case-
    insensitive dedup, target.name itself excluded)
  - DELETE source entity row
  - re-encode every formerly-source-linked fact via
    _compute_hrr_vector(commit=False) so its bundle reflects the new
    entity set
  - rebuild affected category banks
  - all in one transaction; rollback on any failure

Tests (9 new, 175/175 in full memory suite):
- re-point + source row deletion + facts_re_encoded count
- alias union with case-insensitive dedup
- dual-link collapse (fact linked to both source and target ends up
  with one target link, no duplicate)
- empty-source merge (zero facts) still merges aliases and deletes
  the source row, no re-encode
- ValueError on self-merge, KeyError on missing ids
- encoding_version refresh on re-encoded facts
- post-merge probe by source.name resolves via merged alias
- rollback atomicity on simulated _compute_hrr_vector failure

Live DB merges (entity_id=11, 4, 10 → 114):
  Before: 7 Apollo* rows, recall split between NousResearch#11 (95 facts) and
          NousResearch#114 (96 facts) for the same conceptual company
  After:  4 Apollo* rows. NousResearch#114 now holds 100 facts (95 source links
          collapsed to 4 unique-to-source via INSERT OR IGNORE).
          Probe recall consolidated and stronger:
            fact_id=431  +0.5291 → +0.6405
            fact_id=34   +0.5051 → +0.6196
            fact_id=71   +0.4030 → +0.4481
          Probes by every alias (canonical, AER, AEG, bare "Apollo")
          return identical top-3.

Doctor post-merge: status=warn (only the pre-existing 4 dangling
fact_id refs in fact_entities; merge introduced zero new orphans —
re-point happens before delete). Smoke probe('Apollo Energy Group')
raw_sim=+0.6448 in 20ms over 507 facts.

Still open after this commit:
- 4 orphan fact_entities rows (pre-existing fact deletions that
  didn't clean up the join table)
- entity_id=125 "Apollo Energy Resources LLC Sales" (1 fact) and
  entity_id=96 "Apollo_Investigation_Vault" (1 fact) — judgment
  calls on whether to merge into NousResearch#114 or leave separate
- record_feedback dual-write to trust_score, retrieval_count product
  question, memory_banks dead-write table — all deferred from
  earlier commits

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Gpapas pushed a commit to Gpapas/hermes-agent that referenced this pull request May 23, 2026
Four findings from Copilot's review on PR NousResearch#22891, all in the AX
elements-array cap added by 22fa1ed:

1. The truncation note ("response truncated to N of M elements") was
   appended unconditionally — including in the som/vision multimodal
   path, whose response carries a screenshot rather than an `elements`
   array. The note described a payload field that wasn't present.
   Moved the note into the AX-text branch where the array actually
   appears.

2. `_format_elements(cap.elements)` ran on the full untrimmed list with
   its own `max_lines=40` cap, so a caller passing `max_elements=10`
   would see summary lines referencing `NousResearch#11..NousResearch#40` even though the JSON
   `elements` array only held NousResearch#1..NousResearch#10. Format on `visible_elements`
   instead so the summary indices always exist in the response.

3. `_coerce_max_elements` enforced a lower bound but no upper bound,
   so `max_elements=10_000_000` silently disabled the safeguard and
   reintroduced the original context-blow-up. Added a hard cap
   (`_MAX_ALLOWED_MAX_ELEMENTS = 1000`) that clamps oversized values.

4. The schema string said "Default 100" but the property carried no
   `default` field, and claimed `max_elements` had no effect on som/
   vision while the image-missing fallback path can still return an
   elements array. Added `"default": 100`, `"maximum": 1000`, and
   clarified the fallback-path wording.

Each finding gets a regression test:

- test_capture_ax_clamps_oversized_max_elements_to_hard_cap
- test_capture_ax_summary_indices_match_returned_elements
- test_capture_multimodal_summary_omits_truncation_note
- test_schema_max_elements_documents_default_and_upper_bound

Verified with `pytest tests/tools/test_computer_use.py` (53 passed,
including the 5 new cases). Confirmed each new test fails on the
pre-fix code path before applying the production change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mucky010 pushed a commit to Mucky010/hermes-agent that referenced this pull request May 24, 2026
Four findings from Copilot's review on PR NousResearch#22891, all in the AX
elements-array cap added by 22fa1ed:

1. The truncation note ("response truncated to N of M elements") was
   appended unconditionally — including in the som/vision multimodal
   path, whose response carries a screenshot rather than an `elements`
   array. The note described a payload field that wasn't present.
   Moved the note into the AX-text branch where the array actually
   appears.

2. `_format_elements(cap.elements)` ran on the full untrimmed list with
   its own `max_lines=40` cap, so a caller passing `max_elements=10`
   would see summary lines referencing `NousResearch#11..NousResearch#40` even though the JSON
   `elements` array only held NousResearch#1..NousResearch#10. Format on `visible_elements`
   instead so the summary indices always exist in the response.

3. `_coerce_max_elements` enforced a lower bound but no upper bound,
   so `max_elements=10_000_000` silently disabled the safeguard and
   reintroduced the original context-blow-up. Added a hard cap
   (`_MAX_ALLOWED_MAX_ELEMENTS = 1000`) that clamps oversized values.

4. The schema string said "Default 100" but the property carried no
   `default` field, and claimed `max_elements` had no effect on som/
   vision while the image-missing fallback path can still return an
   elements array. Added `"default": 100`, `"maximum": 1000`, and
   clarified the fallback-path wording.

Each finding gets a regression test:

- test_capture_ax_clamps_oversized_max_elements_to_hard_cap
- test_capture_ax_summary_indices_match_returned_elements
- test_capture_multimodal_summary_omits_truncation_note
- test_schema_max_elements_documents_default_and_upper_bound

Verified with `pytest tests/tools/test_computer_use.py` (53 passed,
including the 5 new cases). Confirmed each new test fails on the
pre-fix code path before applying the production change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
exosyphon pushed a commit to exosyphon/hermes-agent that referenced this pull request May 24, 2026
Four findings from Copilot's review on PR NousResearch#22891, all in the AX
elements-array cap added by 22fa1ed:

1. The truncation note ("response truncated to N of M elements") was
   appended unconditionally — including in the som/vision multimodal
   path, whose response carries a screenshot rather than an `elements`
   array. The note described a payload field that wasn't present.
   Moved the note into the AX-text branch where the array actually
   appears.

2. `_format_elements(cap.elements)` ran on the full untrimmed list with
   its own `max_lines=40` cap, so a caller passing `max_elements=10`
   would see summary lines referencing `NousResearch#11..NousResearch#40` even though the JSON
   `elements` array only held NousResearch#1..NousResearch#10. Format on `visible_elements`
   instead so the summary indices always exist in the response.

3. `_coerce_max_elements` enforced a lower bound but no upper bound,
   so `max_elements=10_000_000` silently disabled the safeguard and
   reintroduced the original context-blow-up. Added a hard cap
   (`_MAX_ALLOWED_MAX_ELEMENTS = 1000`) that clamps oversized values.

4. The schema string said "Default 100" but the property carried no
   `default` field, and claimed `max_elements` had no effect on som/
   vision while the image-missing fallback path can still return an
   elements array. Added `"default": 100`, `"maximum": 1000`, and
   clarified the fallback-path wording.

Each finding gets a regression test:

- test_capture_ax_clamps_oversized_max_elements_to_hard_cap
- test_capture_ax_summary_indices_match_returned_elements
- test_capture_multimodal_summary_omits_truncation_note
- test_schema_max_elements_documents_default_and_upper_bound

Verified with `pytest tests/tools/test_computer_use.py` (53 passed,
including the 5 new cases). Confirmed each new test fails on the
pre-fix code path before applying the production change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
slundell added a commit to slundell/hermes-agent that referenced this pull request May 26, 2026
The background-review fork is engineered to share the parent's prompt
prefix cache (issue NousResearch#25322 / PR NousResearch#17276), but it sets skip_memory=True,
which drops the holographic memory plugin's tools (fact_store,
fact_feedback). That makes the fork's tools[] diverge from the parent's
(37 vs 39), so the request prefix is no longer byte-identical and the
27B prefix cache thrashes — ~5.9M tokens reprocessed over a 20h window.

Until the upstream tools[]-parity bug is fixed, route the review fork
to a cheap aux model via a new HERMES_REVIEW_MODEL env var. Env-gated:
a no-op when the var is unset, so default behaviour is unchanged.

Interim, non-additive patch — revert when fixed upstream. Full analysis
in the deployment ISSUES.md (NousResearch#11).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
verkyyi added a commit to verkyyi/hermes-agent that referenced this pull request May 28, 2026
Upstream catch-up: 1,239 commits merged; only the 9 fork-patched files
conflicted. Resolutions (see docs/LOCAL_PATCHES.md for each patch):

- pyproject.toml: kept CVE pins (aiohttp 3.13.4, anthropic 0.87.0,
  cryptography 46.0.7); adopted upstream pytest-timeout, dropped xdist.
- run_agent.py: upstream split it into thin forwarders + new agent/* modules.
  Re-homed AgentFeeds (NousResearch#9) -- helpers stay in run_agent.py; manifest wiring
  into agent/system_prompt.build_system_prompt_parts (lazy import, monkeypatch
  -safe); config init in agent/agent_init.py. Re-homed origin request_id (NousResearch#1)
  into agent/agent_runtime_helpers + agent/tool_executor.
- hermes_cli/kanban_db.py: ported our crash diagnostics (provider_failure,
  failure_diagnostics) onto upstream's dict crash_details + layered upstream
  error-fingerprinting/systemic detection; unioned forced-skill machinery
  (NousResearch#5/NousResearch#11) with upstream has_spawnable_review/scratch-tip/accept-hooks/
  model_override.
- gateway/run.py: re-homed timestamp prefix (NousResearch#8) into _build_gateway_agent_
  history; combined notify-interval + heartbeat (kept public-progress cadence,
  added upstream edit-in-place heartbeat msg id).
- tests: took upstream conversation_loop refactor + new board= suite; unioned
  fixture env-isolation; added CRASH_GRACE_SECONDS=0 to worker_env fixture.

Merge-touched suites green (851 passed). Remaining full-suite failures are
pre-existing/environmental (macOS services, optional deps).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
lenardhuebner88-rgb pushed a commit to lenardhuebner88-rgb/hermes-agent that referenced this pull request May 28, 2026
…ring

Paket B wiring (from sync-20260526), Discord end-state only:
- runtime_health() reports status/latency/heartbeat-age/lag_class,
  reading op=11 freshness from discord.py 2.x KeepAliveHandler
  (_last_ack) — on_socket_response was removed in 2.0 and never fired.
- connect/disconnect go through _mark_connected/_mark_disconnected and
  a periodic _health_refresh_loop so heartbeat-age stays fresh.
- Zombie-WS detection via seconds-scale heartbeat-age thresholds
  (Review-Finding NousResearch#11/NousResearch#12). NOT the intermediate on_socket_response
  variant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bryce-huang pushed a commit to wbkunlun/hermes-agent that referenced this pull request May 29, 2026
Four findings from Copilot's review on PR NousResearch#22891, all in the AX
elements-array cap added by 22fa1ed:

1. The truncation note ("response truncated to N of M elements") was
   appended unconditionally — including in the som/vision multimodal
   path, whose response carries a screenshot rather than an `elements`
   array. The note described a payload field that wasn't present.
   Moved the note into the AX-text branch where the array actually
   appears.

2. `_format_elements(cap.elements)` ran on the full untrimmed list with
   its own `max_lines=40` cap, so a caller passing `max_elements=10`
   would see summary lines referencing `NousResearch#11..NousResearch#40` even though the JSON
   `elements` array only held NousResearch#1..NousResearch#10. Format on `visible_elements`
   instead so the summary indices always exist in the response.

3. `_coerce_max_elements` enforced a lower bound but no upper bound,
   so `max_elements=10_000_000` silently disabled the safeguard and
   reintroduced the original context-blow-up. Added a hard cap
   (`_MAX_ALLOWED_MAX_ELEMENTS = 1000`) that clamps oversized values.

4. The schema string said "Default 100" but the property carried no
   `default` field, and claimed `max_elements` had no effect on som/
   vision while the image-missing fallback path can still return an
   elements array. Added `"default": 100`, `"maximum": 1000`, and
   clarified the fallback-path wording.

Each finding gets a regression test:

- test_capture_ax_clamps_oversized_max_elements_to_hard_cap
- test_capture_ax_summary_indices_match_returned_elements
- test_capture_multimodal_summary_omits_truncation_note
- test_schema_max_elements_documents_default_and_upper_bound

Verified with `pytest tests/tools/test_computer_use.py` (53 passed,
including the 5 new cases). Confirmed each new test fails on the
pre-fix code path before applying the production change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

#AI commit#
mosaiq-systems pushed a commit to mosaiq-systems/hermes-agent that referenced this pull request May 29, 2026
Four findings from Copilot's review on PR NousResearch#22891, all in the AX
elements-array cap added by 22fa1ed:

1. The truncation note ("response truncated to N of M elements") was
   appended unconditionally — including in the som/vision multimodal
   path, whose response carries a screenshot rather than an `elements`
   array. The note described a payload field that wasn't present.
   Moved the note into the AX-text branch where the array actually
   appears.

2. `_format_elements(cap.elements)` ran on the full untrimmed list with
   its own `max_lines=40` cap, so a caller passing `max_elements=10`
   would see summary lines referencing `NousResearch#11..NousResearch#40` even though the JSON
   `elements` array only held NousResearch#1..NousResearch#10. Format on `visible_elements`
   instead so the summary indices always exist in the response.

3. `_coerce_max_elements` enforced a lower bound but no upper bound,
   so `max_elements=10_000_000` silently disabled the safeguard and
   reintroduced the original context-blow-up. Added a hard cap
   (`_MAX_ALLOWED_MAX_ELEMENTS = 1000`) that clamps oversized values.

4. The schema string said "Default 100" but the property carried no
   `default` field, and claimed `max_elements` had no effect on som/
   vision while the image-missing fallback path can still return an
   elements array. Added `"default": 100`, `"maximum": 1000`, and
   clarified the fallback-path wording.

Each finding gets a regression test:

- test_capture_ax_clamps_oversized_max_elements_to_hard_cap
- test_capture_ax_summary_indices_match_returned_elements
- test_capture_multimodal_summary_omits_truncation_note
- test_schema_max_elements_documents_default_and_upper_bound

Verified with `pytest tests/tools/test_computer_use.py` (53 passed,
including the 5 new cases). Confirmed each new test fails on the
pre-fix code path before applying the production change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MattKotsenas added a commit to MattKotsenas/hermes-agent that referenced this pull request May 30, 2026
Match the other sandbox backends' per-init filesystem isolation. Docker
stamps a fresh 'hermes-<uuid>' container name on every _init (docker.py:508),
so a destroyed-then-recreated env always sees a brand-new filesystem.

Gondolin's sandbox_dir is deterministic from task_id, and _setup_overlay_mounts
keeps the scratch dir (overlays/<safe>/{upper,work,merged}) on disk across env
lifecycles. The next env that mounts the same guest_path under the same
sandbox_dir inherits the prior session's writes via the persisted upper layer
— a real cross-session contamination bug, not just a disk leak.

Fix: _teardown_overlay_mounts now rmtrees the per-mount scratch dir
(merged.parent) after the lazy unmount returns. Lazy unmount + open-fd-keeps-
inode-alive means this is safe even if the daemon hasn't fully released
handles. Crash recovery still preserves upper/ because the import-time
sweep only unmounts and never rmtrees.

This also closes design-doc revisit item NousResearch#9 (failed-init cleanup).

Test:
  tests/integration/test_gondolin_terminal.py::test_overlay_writes_do_not_leak_
  between_env_lifecycles

A KVM-gated integration test that asserts the behavioural invariant via
the public GondolinEnvironment.execute() API: env1 writes a file into an
overlay extra_mount, env2 (same sandbox_dir, same mount config) must not
see it. Implementation-agnostic — no mention of upper/ or fuse-overlayfs —
so a future migration to a custom upstream VFSProvider (the @earendil-works/
gondolin package ships vfs/provider) satisfies the same contract trivially
and the test passes for free.

Doc updates (DO NOT MERGE revisit list):
  - NousResearch#9 marked resolved (this fix)
  - NousResearch#6 narrowed: lists the one test we now have and what's still missing
  - NousResearch#10 added: task_id='default' is shared across all top-level agents at
    the hermes/gateway layer; concurrent-tenancy isolation needs a
    per-session task_id and is out of scope for this branch
  - NousResearch#11 added: overlay=true + missing readonly is a silent UX trap
    (host-side scratch is created, daemon makes guest mount EROFS)

Regression: all 118 gondolin unit + integration tests pass.

DO NOT MERGE — see docs/design/gondolin-terminal-backend.md.
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
* ci(nix): auto-fix stale npm hashes on push to main

When a PR merges to main with updated package-lock.json or package.json
in ui-tui/ or web/, the new auto-fix-main job detects stale npmDepsHash
values and pushes a fix commit directly to main.

This eliminates the recurring manual hash-bump PRs (NousResearch#15420, NousResearch#15314,
NousResearch#15272, NousResearch#15244) by reusing the existing fix-lockfiles --apply pipeline.

The fix commit only touches nix/*.nix files, which are outside the push
path filter (package-lock.json / package.json), so it cannot re-trigger
itself.

Closes NousResearch#15314

* fix(ci): use GitHub App token for auto-fix-main push

GITHUB_TOKEN commits are invisible to workflow triggers (GitHub's
infinite-loop prevention). The auto-fix-main job pushes directly to
main, so the fix commit never triggered downstream nix.yml verification.

Mint a short-lived token via the repo's GitHub App (daimon-nous, APP_ID
+ APP_PRIVATE_KEY secrets) so the push is treated as a real event and
nix.yml fires to verify the corrected hashes.

Tested via workflow_dispatch dry-run: app token minted successfully,
checkout with app token succeeded, fix job correctly gated.

Resolves review feedback from Bugbot (r3144569551).

* ci(nix): rename lockfile check job for required status check

Rename 'check' → 'nix-lockfile-check' so the status check name is
unambiguous when added as a required check on main.

* fix(ci): harden auto-fix-main against races, loops, and silent failures

Address adversarial review findings:

1. Race condition (NousResearch#1): Job-level concurrency with cancel-in-progress
   collapses back-to-back pushes; ref: main checkout always gets latest
   branch state; explicit push target (origin HEAD:main).

2. Loop prevention (NousResearch#2): File-whitelist check before commit aborts if
   any file outside nix/{tui,web}.nix was modified, preventing
   accidental self-triggering.

3. Silent infra failures (NousResearch#8): nix-lockfile-check now fails explicitly
   when fix-lockfiles exits without reporting stale status (catches nix
   setup failures, network errors, script bugs that bypass continue-on-error).

4. Commit traceability (NousResearch#11): Auto-fix commits include source SHA and
   workflow run URL in the commit body.

5. Explicit push target (NousResearch#12): git push origin HEAD:main instead of
   bare git push.

---------

Co-authored-by: alt-glitch <alt-glitch@users.noreply.github.com>
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
NousResearch#20226)

* docs(AGENTS.md): add curator/cron/delegation/toolsets, fix plugin tree, frontmatter, auto-discovery caveat

Closes NousResearch#19101 and NousResearch#19107 (@pty819).

Verified 16 claims from those two issues against current main. 12 were
real gaps; 2 were generated/hallucinated (NousResearch#10 unverified --now flag is
actually real and already cited in AGENTS.md; NousResearch#11 stale PR refs NousResearch#5587
and NousResearch#4950 do not appear in AGENTS.md at all); 2 were low-prio nits
(memory provider hierarchy, --now scope enumeration) deferred.

Changes:
- Project tree: add yuanbao to platforms comment; expand plugins/
  subtree with real directory names (kanban, hermes-achievements,
  observability, image_gen) instead of vague '<others>'.
- Test-count blurb: 15k/700 Apr → 17k/900 May (verified: 17,375 test
  defs, 915 files).
- Adding New Tools: clarify that auto-discovery wires up schemas but
  the tool only reaches an agent if its name is added to a toolset in
  toolsets.py. _HERMES_CORE_TOOLS is not dead code.
- Adding Configuration: enumerate top-level config.yaml sections
  including auxiliary and curator; note auxiliary is per-task
  overrides for side-LLM work.
- SKILL.md frontmatter: add author, license, related_skills. Note
  top-level tags/category are mirrored from metadata.hermes.*.
- New section 'Toolsets' — enumerates the 30 current TOOLSETS keys
  (including yuanbao, kanban, moa, spotify, safe, debugging).
- New section 'Delegation (delegate_task)' — sync semantics, batch
  mode, leaf vs orchestrator roles, config knobs, durability caveat.
- New section 'Curator (skill lifecycle)' — core files, 11 CLI verbs,
  telemetry sidecar, invariants (pin/delete split after PR NousResearch#20220,
  bundled/hub off-limits), curator.* config section.
- New section 'Cron (scheduled jobs)' — 4 schedule formats, 7 CLI
  verbs, per-job fields, 3-min hard interrupt, catchup/grace windows,
  tick.lock, cron→session isolation.

Skipped (invalid claims):
- NousResearch#19107 item 10: --now is real (hermes_cli/skills_hub.py:624/966/1013/1470)
- NousResearch#19107 item 11: no 'NousResearch#5587' or 'NousResearch#4950' or 'async_delegation' in AGENTS.md

* docs(AGENTS.md): add Kanban section

Adds a Kanban entry alongside Curator / Cron / Delegation so the major
durable background systems are all represented. Covers the CLI verbs,
the HERMES_KANBAN_TASK-gated worker toolset, the in-gateway dispatcher,
plugin assets, and the board/tenant isolation model. Points at the full
742-line user docs for detail.
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…ex models (NousResearch#24182)

* feat(codex-runtime): scaffold optional codex app-server runtime

Foundational commit for an opt-in alternate runtime that hands OpenAI/Codex
turns to a 'codex app-server' subprocess instead of Hermes' tool dispatch.
Default behavior is unchanged.

Lands in three pieces:

1. agent/transports/codex_app_server.py — JSON-RPC 2.0 over stdio speaker
   for codex's app-server protocol (codex-rs/app-server). Spawn, init
   handshake, request/response, notification queue, server-initiated
   request queue (for approval round-trips), interrupt-friendly blocking
   reads. Tested against real codex 0.130.0 binary end-to-end during
   development.

2. hermes_cli/runtime_provider.py:
   - Adds 'codex_app_server' to _VALID_API_MODES.
   - Adds _maybe_apply_codex_app_server_runtime() helper, called at the
     end of _resolve_runtime_from_pool_entry(). Inert unless
     'model.openai_runtime: codex_app_server' is set in config.yaml AND
     provider in {openai, openai-codex}. Other providers cannot be
     rerouted (anthropic, openrouter, etc. preserved).

3. tests/agent/transports/test_codex_app_server_runtime.py — 24 tests
   covering api_mode registration, the rewriter helper (default-off,
   case-insensitive, opt-in, non-eligible providers preserved), version
   parser, missing-binary handling, error class. Does NOT require codex
   CLI installed.

This commit is wire-only: the api_mode is recognized but AIAgent does
not yet branch on it. Followup commits add the session adapter, event
projector, approval bridge, transcript projection (so memory/skill
review still works), plugin migration, and slash command.

Existing tests remain green:
- tests/cli/test_cli_provider_resolution.py (29 passed)
- tests/agent/test_credential_pool_routing.py (included above)

* feat(codex-runtime): add codex item projector for memory/skill review

The translator that lets Hermes' self-improvement loop keep working under the
Codex runtime: converts codex 'item/*' notifications into Hermes' standard
{role, content, tool_calls, tool_call_id} message shape that
agent/curator.py already knows how to read.

Item taxonomy (matches codex-rs/app-server-protocol/src/protocol/v2/item.rs):
  - userMessage          → {role: user, content}
  - agentMessage         → {role: assistant, content: text}
  - reasoning            → stashed in next assistant's 'reasoning' field
  - commandExecution     → assistant tool_call(name='exec_command') + tool result
  - fileChange           → assistant tool_call(name='apply_patch') + tool result
  - mcpToolCall          → assistant tool_call(name='mcp.<server>.<tool>') + tool result
  - dynamicToolCall      → assistant tool_call(name=<tool>) + tool result
  - plan/hookPrompt/etc  → opaque assistant note, no fabricated tool_calls

Invariants preserved:
  - Message role alternation never violated: each tool item produces at most
    one assistant + one tool message in that order, correlated by call_id.
  - Streaming deltas (item/<type>/outputDelta, item/agentMessage/delta)
    don't materialize messages — only item/completed does. Mirrors how
    Hermes already only writes the assistant message after streaming ends.
  - Tool call ids are deterministic (codex item id-based) so replays produce
    identical messages and prefix caches stay valid (AGENTS.md pitfall NousResearch#16).
  - JSON args use sorted_keys for the same reason.

Real wire formats verified against codex 0.130.0 by capturing live
notifications from thread/shellCommand and including one as a fixture
(COMMAND_EXEC_COMPLETED).

23 new tests, all green:
  - Streaming deltas don't materialize (3 paths)
  - Turn/thread frame events are silent
  - commandExecution: 5 tests including non-zero exit annotation +
    deterministic id stability across replays
  - agentMessage + reasoning attachment + reasoning consumption
  - fileChange: summary without inlined content
  - mcpToolCall: namespaced naming + error surfacing
  - userMessage: text fragments only (drops images/etc)
  - opaque items: no fabricated tool_calls
  - Helpers: deterministic id stability + sorted JSON args
  - Role alternation invariant across all four tool-shaped item types

This commit is a pure addition. AIAgent integration (the wire that uses the
projector) is the next commit.

* feat(codex-runtime): add session adapter + approval bridge

The third self-contained module: CodexAppServerSession owns one Codex
thread per Hermes session, drives turn/start, consumes streaming
notifications via CodexEventProjector, handles server-initiated approval
requests, and translates cancellation into turn/interrupt.

The adapter has a single public per-turn method:

    result = session.run_turn(user_input='...', turn_timeout=600)
    # result.final_text          → assistant text for the caller
    # result.projected_messages  → list ready to splice into AIAgent.messages
    # result.tool_iterations     → tick count for _iters_since_skill nudge
    # result.interrupted         → True on Ctrl+C / deadline / interrupt
    # result.error               → error string when the turn cannot complete
    # result.turn_id, thread_id  → for sessions DB / resume

Behavior:

  - ensure_started() spawns codex, does the initialize handshake, and
    issues thread/start with cwd + permissions profile. Idempotent.
  - run_turn() blocks until turn/completed, drains server-initiated
    requests (approvals) before reading notifications so codex never
    deadlocks waiting for us, projects every item/completed via the
    projector, and increments tool_iterations for the skill nudge gate.
  - request_interrupt() is thread-safe (threading.Event); the next loop
    iteration issues turn/interrupt and unwinds.
  - turn_timeout deadlock guard issues turn/interrupt and records an
    error if the turn never completes.
  - close() escalates terminate → kill via the underlying client.

Approval bridge:

  Codex emits server-initiated requests for execCommandApproval and
  applyPatchApproval. The adapter translates Hermes' approval choice
  vocabulary onto codex's decision vocabulary:

    Hermes 'once'                → codex 'approved'
    Hermes 'session' or 'always' → codex 'approvedForSession'
    Hermes 'deny' / anything else → codex 'denied'

  Routing precedence:
    1. _ServerRequestRouting.auto_approve_* flags (cron / non-interactive)
    2. approval_callback wired by the CLI (defers to
       tools.approval.prompt_dangerous_approval())
    3. Fail-closed denial when neither is wired

  Unknown server-request methods are answered with JSON-RPC error -32601
  so codex doesn't hang waiting for us.

Permission profile mapping mirrors AGENTS.md:
    Hermes 'auto'              → codex 'workspace-write'
    Hermes 'approval-required' → codex 'read-only-with-approval'
    Hermes 'unrestricted/yolo' → codex 'full-access'

20 new tests, all green. Combined with prior commits this PR now has
67 tests across three modules:
  - test_codex_app_server_runtime.py: 24 (api_mode + transport surface)
  - test_codex_event_projector.py: 23 (item taxonomy projections)
  - test_codex_app_server_session.py: 20 (turn loop + approvals + interrupts)

Full tests/agent/transports/ directory: 249/249 pass — no regressions
to existing transport tests.

Still no wire into AIAgent.run_conversation(); that integration commit
is small and goes next.

* feat(codex-runtime): wire codex_app_server runtime into AIAgent

The integration commit. AIAgent.run_conversation() now early-returns to a
new helper _run_codex_app_server_turn() when self.api_mode ==
'codex_app_server', bypassing the chat_completions tool loop entirely.

Three small surgical edits to run_agent.py (~105 LOC total):

1. Line ~1204 (constructor api_mode validation set):
   Add 'codex_app_server' so an explicit api_mode='codex_app_server'
   passed to AIAgent() isn't silently rewritten to 'chat_completions'.

2. Line ~12048 (run_conversation, just before the while loop):
   Early-return to _run_codex_app_server_turn() when self.api_mode is
   'codex_app_server'. Placed AFTER all standard pre-loop setup —
   logging context, session DB, surrogate sanitization, _user_turn_count
   and _turns_since_memory increments, _ext_prefetch_cache, memory
   manager on_turn_start — so behavior outside the model-call loop is
   identical between paths. Default Hermes flow is unchanged when the
   flag is off.

3. End-of-class (line ~15497):
   New method _run_codex_app_server_turn(). Lazy-instantiates one
   CodexAppServerSession per AIAgent (reused across turns), runs the
   turn, splices projected_messages into messages, increments
   _iters_since_skill by tool_iterations (since the chat_completions
   loop normally does that per iteration), fires
   _spawn_background_review on the same cadence as the default path.

Counter accounting:

  _turns_since_memory  ← already incremented at run_conversation:11817
                         (gated on memory store configured) — codex
                         helper does NOT touch it (would double-count).
  _user_turn_count     ← already incremented at run_conversation:11793
                         — codex helper does NOT touch it.
  _iters_since_skill   ← incremented in the chat_completions loop per
                         tool iteration. Codex helper increments by
                         turn.tool_iterations since the loop is bypassed.

User message:

  ALREADY appended to messages by run_conversation pre-loop (line 11823)
  before the early-return reaches us. Helper does NOT append again.
  Regression test test_user_message_not_duplicated guards this.

Approval callback wiring:

  Lazy-fetches tools.terminal_tool._get_approval_callback at session
  spawn time, passes to CodexAppServerSession. CLI threads with
  prompt_toolkit get interactive approvals; gateway/cron contexts get
  the codex-side fail-closed deny.

Error path:

  Codex session exceptions become a 'partial' result with completed=False
  and a final_response that explicitly tells the user how to switch back:
  'Codex app-server turn failed: ... Fall back to default runtime with
  /codex-runtime auto.' Same return-dict shape as the chat_completions
  path so all callers (gateway, CLI, batch_runner, ACP) work unchanged.

9 new integration tests in tests/run_agent/test_codex_app_server_integration.py:
  - api_mode='codex_app_server' is accepted on AIAgent construction
  - run_conversation returns the expected codex shape
    (final_response, codex_thread_id, codex_turn_id, completed, partial)
  - Projected messages are spliced into messages list
  - _iters_since_skill ticks per tool iteration
  - _user_turn_count delegated to standard flow (not double-counted)
  - User message appears exactly once (regression guard)
  - _spawn_background_review IS invoked (memory/skill review keeps working)
  - chat.completions.create is NEVER called (loop fully bypassed)
  - Session exception → partial result with /codex-runtime auto hint
  - Interrupted turn → partial result with error preserved

Adjacent test runs confirm no regressions:
  - tests/run_agent/test_memory_nudge_counter_hydration.py: green
  - tests/run_agent/test_background_review.py: green
  - tests/run_agent/test_fallback_model.py: green
  - tests/agent/transports/: 249/249 green

Still missing for full feature: /codex-runtime slash command, plugin
migration helper, docs page, live e2e test gated on codex binary. Those
are the remaining followup commits.

* feat(codex-runtime): add /codex-runtime slash command (CLI + gateway)

User-facing toggle for the optional codex app-server runtime. Follows the
'Adding a Slash Command (All Platforms)' pattern from AGENTS.md exactly:
single CommandDef in the central registry → CLI handler → gateway handler
→ running-agent guard → all surfaces (autocomplete, /help, Telegram menu,
Slack subcommands) update automatically.

Surface:
    /codex-runtime                    — show current state + codex CLI status
    /codex-runtime auto               — Hermes default runtime
    /codex-runtime codex_app_server   — codex subprocess runtime
    /codex-runtime on / off           — synonyms

Files changed:

  hermes_cli/codex_runtime_switch.py (new):
    Pure-Python state machine shared by CLI and gateway. Parse args,
    read/write model.openai_runtime in the config dict, gate enabling
    behind a codex --version check (don't let users opt in to a runtime
    they have no binary for; print npm install hint instead).
    Returns a CodexRuntimeStatus dataclass that callers render however
    suits their surface.

  hermes_cli/commands.py:
    Single CommandDef entry, no aliases (codex-runtime is its own thing).

  cli.py:
    Dispatch in process_command() + _handle_codex_runtime() handler that
    delegates to the shared module and renders results via _cprint.

  gateway/run.py:
    Dispatch in _handle_message() + _handle_codex_runtime_command() that
    returns a string (gateway sends as message). On a successful change
    that requires a new session, _evict_cached_agent() forces the next
    inbound message to construct a fresh AIAgent with the new api_mode —
    avoids prompt-cache invalidation mid-session.

  gateway/run.py running-agent guard:
    /codex-runtime joins /model in the early-intercept block so a runtime
    flip mid-turn can't split a turn across two transports.

Tests:
  tests/hermes_cli/test_codex_runtime_switch.py — 25 tests covering the
  state machine: arg parsing (10 cases incl. case-insensitive and
  synonyms), reading current runtime (5 cases incl. malformed configs),
  writing runtime (3 cases), apply() entry point covering read-only,
  no-op, codex-missing-blocked, codex-present-success, disable-no-binary-check,
  and persist-failure paths (8 cases). All green.

Adjacent test suites confirm no regressions:
  - tests/hermes_cli/test_commands.py + test_codex_runtime_switch.py:
    167/167 green
  - tests/agent/transports/: 283/283 green when combined with prior commits

Still missing: plugin migration helper, docs page, live e2e test gated on
codex binary. Followup commits.

* feat(codex-runtime): auto-migrate Hermes MCP servers to ~/.codex/config.toml

Translates the user's mcp_servers config from ~/.hermes/config.yaml into
the TOML format codex's MCP client expects. Wired into the
/codex-runtime codex_app_server enable path so users get their MCP tool
surface in the spawned subprocess automatically.

The migration runs on every enable. Failures are non-fatal — the runtime
change still proceeds and the user gets a warning so they can fix the
codex config manually.

What translates (mapping verified against codex-rs/core/src/config/edit.rs):
  Hermes mcp_servers.<n>.command/args/env  → codex stdio transport
  Hermes mcp_servers.<n>.url/headers       → codex streamable_http transport
  Hermes mcp_servers.<n>.timeout           → codex tool_timeout_sec
  Hermes mcp_servers.<n>.connect_timeout   → codex startup_timeout_sec
  Hermes mcp_servers.<n>.cwd               → codex stdio cwd
  Hermes mcp_servers.<n>.enabled: false    → codex enabled = false

What does NOT translate (warned + skipped per server):
  Hermes-specific keys (sampling, etc.) — codex's MCP client has no
  equivalent. Listed in the per-server skipped[] field of the report.

What's NOT migrated (intentional):
  AGENTS.md — codex respects this file natively in its cwd. Hermes' own
  AGENTS.md (project-level) is already in the worktree, so codex picks
  it up without translation. No code needed.

Idempotency design:
  All managed content lives between a 'managed by hermes-agent' marker
  and the next non-mcp_servers section header. _strip_existing_managed_block
  removes the prior managed region cleanly, preserving any user-added
  codex config (model, providers.openai, sandbox profiles, etc.) above
  or below.

Files added:
  hermes_cli/codex_runtime_plugin_migration.py — pure-Python migration
    helper. Public API: migrate(hermes_config, codex_home=None,
    dry_run=False) returns MigrationReport with .migrated/.errors/
    .skipped_keys_per_server. No external TOML dependency — minimal
    formatter handles strings/numbers/booleans/lists/inline-tables.

  tests/hermes_cli/test_codex_runtime_plugin_migration.py — 39 tests
  covering:
    - per-server translation (12): stdio/http/sse, cwd, timeouts,
      enabled flag, command+url precedence, sampling drop, unknown keys
    - TOML formatter (8): types, escaping, inline tables, error case
    - existing-block stripping (4): no marker, alone, with user content
      above, with user content below
    - end-to-end migrate() (8): empty, dry-run, round-trip, idempotent
      re-run, preserves user config, error reporting, invalid input,
      summary formatting

Files changed:
  hermes_cli/codex_runtime_switch.py — apply() now calls migrate() in
    the codex_app_server enable branch. Migration failure logs a warning
    in the result message but does NOT fail the runtime change. Disable
    path (auto) explicitly skips migration.

  tests/hermes_cli/test_codex_runtime_switch.py — 3 new tests:
    test_enable_triggers_mcp_migration, test_disable_does_not_trigger_migration,
    test_migration_failure_does_not_block_enable.

All 325 feature tests green:
  - tests/agent/transports/: 249 (incl. 67 new)
  - tests/run_agent/test_codex_app_server_integration.py: 9
  - tests/hermes_cli/test_codex_runtime_switch.py: 28 (3 new)
  - tests/hermes_cli/test_codex_runtime_plugin_migration.py: 39 (new)

* perf(codex-runtime): cache codex --version check within apply()

Single /codex-runtime invocation could spawn 'codex --version' up to 3
times (state report, enable gate, success message). Each spawn is ~50ms,
so the cumulative cost wasn't a crisis, but it was wasteful and turned a
trivial slash command into something noticeably laggy on slower systems.

Refactored to lazy-once via a closure over a nonlocal cache. First call
spawns; subsequent calls in the same apply() reuse the result.

Behavior unchanged — same return shape, same error handling, same install
hint when codex is missing. Just one subprocess per call instead of three.

Two regression-guard tests added:
  - test_binary_check_cached_within_apply: enable path → call_count == 1
  - test_binary_check_cached_on_read_only_call: state-report path → call_count == 1

Total tests for /codex-runtime now 30 (was 28); all 143 codex-runtime
tests still green.

* fix(codex-runtime): correct protocol field names found via live e2e test

Three real bugs caught only by running a turn end-to-end against codex
0.130.0 with a real ChatGPT subscription. Unit tests passed because they
asserted on our own (incorrect) wire shapes; the wire format from
codex-rs/app-server-protocol/src/protocol/v2/* is the source of truth and
my initial reading of the README was incomplete.

Bug 1: thread/start.permissions wire format

Was sending {"profileId": "workspace-write"}.
Real format per PermissionProfileSelectionParams enum (tagged union):
  {"type": "profile", "id": "workspace-write"}
AND requires the experimentalApi capability declared during initialize.
AND requires a matching [permissions] table in ~/.codex/config.toml or
codex fails the request with 'default_permissions requires a [permissions]
table'.

Fix: stop overriding permissions on thread/start. Codex picks its default
profile (read-only unless user configures otherwise), which matches what
codex CLI users expect — they configure their default permission profile
in ~/.codex/config.toml the standard way. Trying to be clever about
profile selection broke every turn we tested.

Live error before fix: 'Invalid request: missing field type' on every
turn/start, even though our turn/start payload was correct — the field
codex was complaining about was inside the permissions sub-object we
shouldn't have been sending.

Bug 2: server-request method names

Was matching 'execCommandApproval' and 'applyPatchApproval'.
Real names per common.rs ServerRequest enum:
  item/commandExecution/requestApproval
  item/fileChange/requestApproval
  item/permissions/requestApproval (new third method)

Fix: match the documented names. Added handler for
item/permissions/requestApproval that always declines — codex sometimes
asks to escalate permissions mid-turn and silent acceptance would surprise
users.

Live symptom before fix: agent.log showed
'Unknown codex server request: item/commandExecution/requestApproval'
and codex stalled because we replied with -32601 (unsupported method)
instead of an approval decision. The agent reported back 'The write
command was rejected' even though Hermes never showed the user an
approval prompt.

Bug 3: approval decision values

Was sending decision strings 'approved'/'approvedForSession'/'denied'.
Real values per CommandExecutionApprovalDecision enum (camelCase):
  accept, acceptForSession, decline, cancel
(also AcceptWithExecpolicyAmendment and ApplyNetworkPolicyAmendment
variants we don't currently use).

Fix: rename _approval_choice_to_codex_decision return values; update
auto_approve_* fallbacks; update fail-closed default from 'denied' to
'decline'. Test mapping table updated to match.

Live test verified after fixes:
  $ hermes (with model.openai_runtime: codex_app_server)
  > Run the shell command: echo hermes-codex-livetest > .../proof.txt
    then read it back

  Approval prompt fired with 'Codex requests exec in <cwd>'.
  User chose 'Allow once'. Codex executed the command, wrote the file,
  read it back. Final response: 'Read back from proof.txt:
  hermes-codex-livetest'. File contents on disk match.

agent.log confirms:
  codex app-server thread started: id=019e200e profile=workspace-write
                                    cwd=/tmp/hermes-codex-livetest/workspace

All 20 session tests still green after wire-format updates.

* fix(codex-runtime): correct apply_patch approval params + ship docs

Live e2e revealed FileChangeRequestApprovalParams doesn't carry the
changeset (just itemId, threadId, turnId, reason, grantRoot) — Codex's
'reason' field describes what the patch wants to do. Test config and
display logic updated to use it. The first 'apply_patch (0 change(s))'
display from the live test is now 'apply_patch: <reason>'.

Adds website/docs/user-guide/features/codex-app-server-runtime.md
covering enable/disable, prerequisites, approval UX, MCP migration
behavior, permission profile delegation to ~/.codex/config.toml, known
limitations, and the architecture diagram. Wired into the Automation
category in sidebars.ts.

Live e2e validation across the path matrix:
  ✓ thread/start handshake
  ✓ turn/start with text input
  ✓ commandExecution items + projection
  ✓ item/commandExecution/requestApproval → Hermes UI → response
  ✓ Approve once → command runs
  ✓ Deny → command rejected, codex falls back to read-only message
  ✓ Multi-turn (codex remembers prior turn's results)
  ✓ apply_patch via Codex's fileChange path
  ✓ item/fileChange/requestApproval → Hermes UI
  ✓ MCP server migration loads inside spawned codex (verified via
    'use the filesystem MCP tool' prompt)
  ✓ /codex-runtime auto → codex_app_server toggle cycle
  ✓ Disable doesn't trigger migration
  ✓ Enable with codex CLI present succeeds + migrates
  ✓ Hermes-side interrupt path (turn/interrupt request issued cleanly
    even if codex finishes before the interrupt lands)

Known live-validated limitations now documented in the docs page:
  - delegate_task subagents unavailable on this runtime
  - permission profile selection delegated to ~/.codex/config.toml
  - apply_patch approval prompt has no inline changeset (codex protocol
    doesn't expose it)

145/145 codex-runtime tests still green.

* feat(codex-runtime): native plugin migration + UX polish (quirks 2/4/5/10/11)

Major: migrate native Codex plugins (NousResearch#7 in OpenClaw's PR list)

Discovers installed curated plugins via codex's plugin/list RPC and
writes [plugins."<name>@<marketplace>"] entries to ~/.codex/config.toml
so they're enabled in the spawned Codex sessions. This is the
'YouTube-video-worthy' bit Pash highlighted: when a user has
google-calendar, github, etc. installed in their Codex CLI, those
plugins activate automatically when they enable Hermes' codex runtime.

Implementation:
  - hermes_cli/codex_runtime_plugin_migration.py: new _query_codex_plugins()
    helper spawns 'codex app-server' briefly and walks plugin/list. Returns
    (plugins, error) — failures are non-fatal so MCP migration still works.
  - render_codex_toml_section() now takes plugins + permissions args.
  - migrate() defaults: discover_plugins=True, default_permission_profile=
    'workspace-write'. Explicit None on either disables that side.
  - _strip_existing_managed_block() now also strips [plugins.*] and
    [permissions]/[permissions.*] sections inside the managed block, so
    re-runs replace plugins cleanly without touching codex's own config.

Quirk fixes:

NousResearch#2 Default permissions profile written on enable.
   Without this, Codex's read-only default kicks in and EVERY write
   triggers an approval prompt. Now writes [permissions] default =
   'workspace-write' so the runtime feels normal out of the box. Set
   default_permission_profile=None to opt out.

NousResearch#4 apply_patch approval prompt now shows what's changing.
   Codex's FileChangeRequestApprovalParams doesn't carry the changeset.
   Session adapter now caches the fileChange item from item/started
   notifications and looks it up by itemId when codex requests approval.
   Prompt shows '1 add, 1 update: /tmp/new.py, /tmp/old.py' instead of
   'apply_patch (0 change(s))'.

   Side benefit: also drains pending notifications BEFORE handling a
   server request, so the projector and per-turn caches are up to date
   when the approval decision fires. Bounded to 8 notifications per
   loop iter to avoid starving codex's response.

NousResearch#5/NousResearch#10 Exec approval prompt never shows empty cwd.
   When codex omits cwd in CommandExecutionRequestApprovalParams, fall
   back to the session's cwd. If somehow neither is available, show
   '<unknown>' explicitly instead of an empty string.

   Also surfaces 'reason' from the approval params when codex provides
   it — gives users more context on why codex wants to run something.

NousResearch#11 Banner indicates the codex_app_server runtime when active.
   New 'Runtime: codex app-server (terminal/file ops/MCP run inside
   codex)' line appears in the welcome banner only when the runtime is
   on. Default banner is unchanged.

Tests:
  - 7 new tests in test_codex_runtime_plugin_migration.py covering
    plugin discovery (mocked), failure handling, dry-run skip, opt-out
    flag, idempotent re-runs, and permissions writing.
  - 3 new tests in test_codex_app_server_session.py covering the
    enriched approval prompts: cwd fallback, change summary on
    apply_patch, fallback when no item/started cache exists.
  - All 26 session tests + 46 migration tests green; 153 total in PR.

* feat(codex-runtime): hermes-tools MCP callback + native plugin migration

The big architectural addition: when codex_app_server runtime is on,
Hermes registers its own tool surface as an MCP server in
~/.codex/config.toml so the codex subprocess can call back into Hermes
for tools codex doesn't ship with — web_search, browser_*, vision,
image_generate, skills, TTS.

Also: 'migrate native codex plugins' (Pash's YouTube-video-worthy bit) —
when the user has plugins like Linear, GitHub, Gmail, Calendar, Canva
installed via 'codex plugin', Hermes discovers them via plugin/list and
writes [plugins.<name>@openai-curated] entries so they activate
automatically.

New module: agent/transports/hermes_tools_mcp_server.py
  FastMCP stdio server exposing 17 Hermes tools. Each call dispatches
  through model_tools.handle_function_call() — same code path as the
  Hermes default runtime. Run with:
    python -m agent.transports.hermes_tools_mcp_server [--verbose]

  Exposed: web_search, web_extract, browser_navigate / _click / _type /
    _press / _snapshot / _scroll / _back / _get_images / _console /
    _vision, vision_analyze, image_generate, skill_view, skills_list,
    text_to_speech.

  NOT exposed (deliberately):
    - terminal/shell/read_file/write_file/patch — codex has built-ins
    - delegate_task/memory/session_search/todo — _AGENT_LOOP_TOOLS in
      model_tools.py:493, require running AIAgent context. Documented
      as a limitation and surfaced in the slash command output.

Migration changes (hermes_cli/codex_runtime_plugin_migration.py):
  - _query_codex_plugins() spawns 'codex app-server' briefly to walk
    plugin/list and pull installed openai-curated plugins. Failures are
    non-fatal — MCP migration still completes.
  - render_codex_toml_section() now takes plugins + permissions args
    AND wraps the managed block with a MIGRATION_END_MARKER comment so
    the stripper can reliably find both ends, even when the block
    contains top-level keys (default_permissions = ...).
  - migrate() defaults: discover_plugins=True, expose_hermes_tools=True,
    default_permission_profile=':workspace' (built-in codex profile name
    — must be prefixed with ':'). All three opt-out via explicit args.
  - _build_hermes_tools_mcp_entry() builds the codex stdio entry with
    HERMES_HOME and PYTHONPATH passthrough so a worktree-launched
    Hermes points the MCP subprocess at the same module layout.

Live-caught wire bugs fixed during this turn:
  1. Permission profile config key is top-level , NOT a [permissions] table. The [permissions] table is
     for *user-defined* profiles with structured fields. Built-in
     profile names start with ':' (':workspace', ':read-only',
     ':danger-no-sandbox'). Was emitting
     which codex rejected with 'invalid type: string "X", expected
     struct PermissionProfileToml'.
  2. Built-in profile is , NOT . Codex
     rejected  with 'unknown built-in profile'.
  3. Codex's MCP layer sends  for
     tool-call confirmation. We weren't handling it, so codex stalled
     and returned 'MCP tool call was rejected'. Now: auto-accept for
     our own hermes-tools server (user already opted in by enabling
     the runtime), decline for third-party servers.

Quirk fixes shipped (from the limitations list):
  NousResearch#2 default permissions: workspace profile written on enable. No more
     approval prompt on every write.
  NousResearch#4 apply_patch approval shows what's changing: cache fileChange
     items from item/started, look up by itemId when codex sends
     item/fileChange/requestApproval. Prompt: '1 add, 1 update:
     /tmp/new.py, /tmp/old.py' instead of '0 change(s)'.
  NousResearch#5/NousResearch#10 exec approval cwd never empty: fall back to session cwd, then
     '<unknown>'. Also surfaces 'reason' from codex when present.
  NousResearch#11 banner shows 'Runtime: codex app-server' line when active so
     users understand why tool counts may not match what's reachable.

Tests:
  - 5 new tests in test_codex_runtime_plugin_migration.py covering
    plugin discovery, expose_hermes_tools entry generation, idempotent
    re-runs, opt-out flag, permissions profile.
  - 3 new tests in test_codex_app_server_session.py covering enriched
    approval prompts (cwd fallback, fileChange summary).
  - 2 new tests for mcpServer/elicitation/request handling (accept
    hermes-tools, decline others).
  - New test file test_hermes_tools_mcp_server.py covering module
    surface, EXPOSED_TOOLS safety invariants (no shell/file_ops,
    no agent-loop tools), and main() error paths.
  - 166 codex-runtime tests total, all green.

Live e2e validated against codex 0.130.0 + ChatGPT subscription:
  ✓ /codex-runtime codex_app_server enables, migrates filesystem MCP,
    registers hermes-tools, writes default_permissions = ':workspace'
  ✓ Banner shows 'Runtime: codex app-server' line in subsequent sessions
  ✓ Shell command runs without approval prompt (workspace profile works)
  ✓ Multi-turn — codex remembers prior turn's results
  ✓ apply_patch path via fileChange request approval
  ✓ web_search via hermes-tools MCP callback returns real Firecrawl
    results: 'OpenAI Codex CLI – Getting Started' end-to-end in 13s
  ✓ Disable cycle clean

Docs updated: website/docs/user-guide/features/codex-app-server-runtime.md
  Full re-write covering native plugin migration, the hermes-tools
  callback architecture, the prerequisites change ('codex login is
  separate from hermes auth login codex'), the trade-off table now
  reflecting which Hermes tools work via callback, and the limitations
  list updated with what's actually unavailable on this runtime.

* feat(codex-runtime): pin user-config preservation invariant for quirk NousResearch#6

Quirk NousResearch#6 from the limitations list — user MCP servers / overrides /
codex-only sections in ~/.codex/config.toml that live OUTSIDE the
hermes-managed block must survive re-migration verbatim.

This already worked thanks to the MIGRATION_MARKER + MIGRATION_END_MARKER
pair I added when fixing the default_permissions wire format (so the
strip can find both ends of the managed region even with top-level
keys like default_permissions). But it was an emergent property
without a test pinning it.

Now explicitly tested:
  - User MCP server above the managed block survives migration
  - User MCP server below the managed block survives migration
  - Both above + below survive a second re-migration
  - User content (model, providers, sandbox, otel, etc.) outside our
    region is left untouched

Docs added a section "Editing ~/.codex/config.toml safely" explaining
the marker contract — so users know they can add their own MCP
servers, override permissions, configure codex-only options, etc.
without fear of Hermes overwriting their work.

167 codex-runtime tests, all green.

* docs(codex-runtime): clarify the actual tool surface — shell covers terminal/read/write/find

Previous docs and PR description undersold what codex's built-in
toolset actually provides. apply_patch alone made it sound like the
runtime could only edit files in patch format — implying you'd lose
terminal use, read_file, write_file, search/find. That was wrong.

Codex's 'shell' tool runs arbitrary shell commands inside the sandbox,
which covers everything you'd do in bash: cat/head/tail (read), echo>
or heredocs (write), find/rg/grep (search), ls/cd (navigate), build/
test/git/etc. apply_patch is for structured multi-file edits on top
of that. update_plan is its in-runtime todo. view_image loads images.
And codex has its own web_search built in (in addition to the
Firecrawl-backed one Hermes exposes via MCP callback).

Docs now have a 'What tools the model actually has' section right
after Why, breaking the surface into three clearly-labeled buckets:

  1. Codex's built-in toolset (always on) — shell, apply_patch,
     update_plan, view_image, web_search; covers everything terminal-
     adjacent.
  2. Native Codex plugins (auto-migrated from your codex plugin
     install) — Linear, GitHub, Gmail, Calendar, Outlook, Canva, etc.
  3. Hermes tool callback (MCP server in ~/.codex/config.toml) —
     web_search/web_extract via Firecrawl, browser_*, vision_analyze,
     image_generate, skill_view/skills_list, text_to_speech.

Plus a 'What's NOT available' callout listing the four agent-loop tools
(delegate_task, memory, session_search, todo) that need running
AIAgent context and can't reach the codex runtime.

Trade-offs table broken out: shell, apply_patch, update_plan,
view_image, sandbox each get their own row with a one-line description
so users can see at a glance what's available natively.

Architecture diagram updated to list the codex built-ins by name
instead of 'apply_patch + shell + sandbox'.

No code changes — purely docs clarification. 167 codex-runtime tests
still green.

* fix(codex-runtime): _spawn_background_review signature + review fork api_mode downgrade

Two real bugs in the self-improvement loop integration that the previous
test mocked away.

Bug 1: wrong call signature

The codex helper was calling self._spawn_background_review() with no
args after every turn. That function actually requires:
  messages_snapshot=list   (positional or keyword)
  review_memory=bool       (at least one trigger must be True)
  review_skills=bool

So the call would have raised TypeError at runtime — except the only
test that exercised this path mocked _spawn_background_review entirely
and just asserted spawn.called, so the wrong-arg shape never surfaced.

Bug 2: review fork inherits codex_app_server api_mode

The review fork is constructed with:
  api_mode = _parent_runtime.get('api_mode')

So when the parent is codex_app_server, the review fork ALSO runs as
codex_app_server. But the review fork's whole job is to call agent-loop
tools (memory, skill_manage) which require Hermes' own dispatch — they
short-circuit with 'must be handled by the agent loop' on the codex
runtime. So the review fork would have run, decided to save something,
called memory or skill_manage, and silently no-op'd.

Fixed in run_agent.py:_spawn_background_review() — when the parent
api_mode is 'codex_app_server', the review fork is downgraded to
'codex_responses' (same OAuth credentials, same openai-codex provider,
but talks to OpenAI's Responses API directly so Hermes owns the loop).

Also rewrote the codex helper's review wiring to match the
chat_completions path:
  - Computes _should_review_memory in the pre-loop block (was already
    being computed; now passed through to the helper as an arg).
  - Computes _should_review_skills AFTER the codex turn returns +
    counters tick (line ~15432 pattern in chat_completions).
  - Calls _spawn_background_review(messages_snapshot=, review_memory=,
    review_skills=) only when at least one trigger fires.
  - Adds the external memory provider sync (_sync_external_memory_for_turn)
    that the chat_completions path runs after every turn.

Tests:

  Replaced the broken test_background_review_invoked (which only
  asserted spawn.called) with three sharper tests:
    - test_background_review_NOT_invoked_below_threshold:
      single turn at default thresholds → no review fires (would have
      caught the original 'every turn calls spawn with no args' bug)
    - test_background_review_skill_trigger_fires_above_threshold:
      10 tool_iterations at threshold=10 → review fires with
      messages_snapshot=list, review_skills=True, counter resets
    - test_background_review_signature_never_breaks: regression guard
      asserting positional args are always empty and kwargs include
      messages_snapshot

  New TestReviewForkApiModeDowngrade class:
    - test_codex_app_server_parent_downgrades_review_fork: drives the
      real _spawn_background_review function (no mock at that level),
      asserts the review_agent gets api_mode='codex_responses' when
      the parent was codex_app_server.

Live-validated against real run_conversation:
  - Counter ticked from 0 to 5 after a 5-tool-iteration turn
  - _spawn_background_review fired exactly once with kwargs-only signature
  - review_skills=True, review_memory=False
  - messages_snapshot was 12 entries (5 assistant tool_calls + 5 tool
    results + 1 final assistant + initial system/user)
  - Counter reset to 0 after fire

170 codex-runtime tests, all green.

Docs: added a Self-improvement loop section to the codex runtime page
explaining both how the trigger logic stays equivalent and that the
review fork is auto-downgraded to codex_responses for the agent-loop
tools. Also clarified that apply_patch and update_plan ARE codex's
built-in tools (the previous version made it sound like they were
separate from 'codex's stuff' — they're not, all five tools listed
in 'What tools the model actually has' section 1 are codex built-ins).

* feat(codex-runtime): expose kanban tools through Hermes MCP callback

Kanban workers spawn as separate hermes chat -q subprocesses that read
the user's config.yaml. If model.openai_runtime: codex_app_server is set
globally (which is the whole point of opt-in), every dispatched worker
ALSO comes up on the codex runtime.

That mostly works — codex's built-in shell + apply_patch + update_plan
do the actual task work fine — but it had one critical break: the
worker handoff tools (kanban_complete, kanban_block, kanban_comment,
kanban_heartbeat) are Hermes-registered tools, not codex built-ins.
On the codex runtime, codex builds its own tool list and these never
reach the model, so the worker would do the work but not be able to
report back, hanging until the dispatcher's timeout escalates it as
zombie.

Fix: add all 9 kanban tools to the EXPOSED_TOOLS list in the Hermes
MCP callback. They dispatch statelessly through handle_function_call()
just like web_search and the others — they read HERMES_KANBAN_TASK
from env (set by the dispatcher), gate correctly (worker tools require
the env var, orchestrator tools require it unset), and write to
~/.hermes/kanban.db.

Why kanban tools work via stateless dispatch when delegate_task/memory/
session_search/todo don't: those four are listed in _AGENT_LOOP_TOOLS
(model_tools.py:493) and short-circuit in handle_function_call() with
'must be handled by the agent loop' — they need to mutate AIAgent's
mid-loop state. Kanban tools have no such requirement; they're pure
side-effect functions against the kanban.db plus state_meta.

Tools exposed:
  Worker handoff (require HERMES_KANBAN_TASK):
    kanban_complete, kanban_block, kanban_comment, kanban_heartbeat
  Read-only board queries:
    kanban_show, kanban_list
  Orchestrator (require HERMES_KANBAN_TASK unset):
    kanban_create, kanban_unblock, kanban_link

Tests:
  - test_kanban_worker_tools_exposed: complete/block/comment/heartbeat
    in EXPOSED_TOOLS (regression guard for the would-hang-worker bug)
  - test_kanban_orchestrator_tools_exposed: create/show/list/unblock/link

Docs:
  - New 'Workflow features' section in the docs page covering /goal,
    kanban, and cron behavior on this runtime
  - /goal: works fully via run_conversation feedback; only caveat is
    approval-prompt noise on long writes-heavy goals (mitigated by
    the default :workspace permission profile)
  - Kanban: enumerated which tools are reachable via the callback and
    why the env var propagates correctly through the codex subprocess
    to the MCP server subprocess
  - Cron: documented as 'not specifically tested' — same rules as the
    CLI apply since cron runs through AIAgent.run_conversation
  - Trade-offs table gained rows for /goal, kanban worker, kanban
    orchestrator

172/172 codex-runtime tests green (+2 from kanban tests).

* docs(codex-runtime): wire /codex-runtime into slash-commands ref + flag aux token cost

Three docs gaps caught during a final audit:

1. /codex-runtime was only in the feature docs page, not in the
   slash-commands reference. Added rows to both the CLI section and
   the Messaging section so users discover it where they'd look for
   slash command syntax.

2. CODEX_HOME and HERMES_KANBAN_TASK weren't in environment-variables.md.
   CODEX_HOME lets users redirect Codex CLI's config dir (the migration
   honors it). HERMES_KANBAN_TASK is set by the kanban dispatcher and
   propagates to the codex subprocess + the hermes-tools MCP subprocess
   so kanban worker tools gate correctly — documented as 'don't set
   manually' since it's an internal handoff.

3. Aux client behavior on this runtime. When openai_runtime=
   codex_app_server is on with the openai-codex provider, every aux
   task (title generation, context compression, vision auto-detect,
   session search summarization, the background self-improvement review
   fork) flows through the user's ChatGPT subscription by default.

   This is true for the existing codex_responses path too, but it's
   more visible / important here because users explicitly opted in for
   subscription billing. Added a 'Auxiliary tasks and ChatGPT
   subscription token cost' section to the docs page with a YAML
   example showing how to override specific aux tasks to a cheaper
   model (typically google/gemini-3-flash-preview via OpenRouter).

   Also documents how the self-improvement review fork gets
   auto-downgraded from codex_app_server to codex_responses by the
   fix earlier in this PR.

No code changes — pure docs. 172 codex-runtime tests still green.

* docs+test(codex-runtime): pin HOME passthrough, document multi-profile + CODEX_HOME

OpenClaw hit a real footgun in openclaw/openclaw#81562: when spawning
codex app-server they were synthesizing a per-agent HOME alongside
CODEX_HOME. That made every subprocess codex's shell tool launches
(gh, git, aws, npm, gcloud, ...) see a fake $HOME and miss the user's
real config files. They had to back it out in PR #81562 — keep
CODEX_HOME isolation, leave HOME alone.

Audit confirms Hermes' codex spawn doesn't have this problem. We do
os.environ.copy() and only overlay CODEX_HOME (when provided) and
RUST_LOG. HOME passes through unchanged. But it was an emergent
property without a test pinning it, so adding a regression guard:

  test_spawn_env_preserves_HOME — confirms parent HOME survives intact
                                  in the subprocess env
  test_spawn_env_sets_CODEX_HOME_when_provided — confirms codex_home
                                                  arg still isolates
                                                  codex state correctly

Docs additions:

  'HOME environment variable passthrough' section — calls out the
  contract explicitly: CODEX_HOME isolates codex's own state, HOME
  stays user-real so gh/git/aws/npm/etc. find their normal config.
  Cites openclaw#81562 as the cautionary tale.

  'Multi-profile / multi-tenant setups' section — addresses the
  related concern: profiles share ~/.codex/ by default. For users who
  want per-profile codex isolation (separate auth, separate plugins),
  documents the manual CODEX_HOME=<profile-scoped-dir> approach.

  Explains why we DON'T auto-scope CODEX_HOME per profile: doing so
  would silently invalidate existing codex login state for anyone
  upgrading to this PR with tokens already at ~/.codex/auth.json.
  Opt-in is safer than surprising users.

174 codex-runtime tests (+2 from HOME guards), all green.

* fix(codex-runtime): TOML control-char escapes + atomic config.toml write

Two footguns caught in a final audit pass before merge.

Bug 1: TOML control characters not escaped

The _format_toml_value() helper escaped backslashes and double quotes
but passed literal control characters (\n, \t, \r, \f, \b) through
unchanged. TOML basic strings don't allow literal control characters
— a path or env var containing a newline would produce invalid TOML
that codex refuses to load.

Realistic exposure: pathological cases like a HERMES_HOME with a
trailing newline (env var concatenation accident), or a PYTHONPATH
with a tab from a multi-line shell heredoc.

Fix: escape all five TOML basic-string control sequences (\b \t \n
\f \r) in addition to \\ and \" that we already did. Order
matters — backslash must come first or the other escapes get
re-escaped.

Bug 2: config.toml write wasn't atomic

If the python process crashed between target.mkdir() and the
write_text() finishing, a half-written config.toml could be left
behind. On NFS / Windows / some FUSE mounts this is a real concern;
on ext4/APFS small writes are usually atomic in practice but not
guaranteed.

Fix: write to a tempfile.mkstemp() temp file in the same directory,
then Path.replace() (atomic same-dir rename on POSIX, ReplaceFile on
Windows). On rename failure, clean up the temp file so repeated
failed migrations don't pile up .config.toml.* files.

Tests:
  - test_string_with_newline_escaped — \n in value → \n in output
  - test_string_with_tab_escaped — \t in value → \t in output
  - test_string_with_other_controls_escaped — \r, \f, \b
  - test_windows_path_escaped_correctly — backslash doubling
  - test_atomic_write_no_temp_leak_on_success — no .config.toml.*
    left over after a successful write
  - test_atomic_write_cleanup_on_rename_failure — temp file removed
    when Path.replace raises (simulated disk full)

180 codex-runtime tests, all green (+6 from this commit).

Footguns audited but NOT fixed (with rationale):

- Concurrent migrations race. Two Hermes processes hitting
  /codex-runtime codex_app_server within seconds of each other could
  cause one writer to lose entries. Low probability (you'd have to
  enable from two surfaces simultaneously) and low impact (just re-run
  migration). Adding fcntl/msvcrt locking is more code than it's
  worth here. The atomic rename above means each individual write is
  consistent — only the merge step is racy.

- Codex protocol version drift. We pin MIN_CODEX_VERSION=0.125 and
  check at runtime but don't reject too-new versions. Right call —
  the protocol has been stable through 0.125 → 0.130. If OpenAI
  breaks it later we'd see the error in test_codex_app_server_runtime
  on CI before users hit it.
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
Four findings from Copilot's review on PR NousResearch#22891, all in the AX
elements-array cap added by 22fa1ed:

1. The truncation note ("response truncated to N of M elements") was
   appended unconditionally — including in the som/vision multimodal
   path, whose response carries a screenshot rather than an `elements`
   array. The note described a payload field that wasn't present.
   Moved the note into the AX-text branch where the array actually
   appears.

2. `_format_elements(cap.elements)` ran on the full untrimmed list with
   its own `max_lines=40` cap, so a caller passing `max_elements=10`
   would see summary lines referencing `NousResearch#11..NousResearch#40` even though the JSON
   `elements` array only held NousResearch#1..NousResearch#10. Format on `visible_elements`
   instead so the summary indices always exist in the response.

3. `_coerce_max_elements` enforced a lower bound but no upper bound,
   so `max_elements=10_000_000` silently disabled the safeguard and
   reintroduced the original context-blow-up. Added a hard cap
   (`_MAX_ALLOWED_MAX_ELEMENTS = 1000`) that clamps oversized values.

4. The schema string said "Default 100" but the property carried no
   `default` field, and claimed `max_elements` had no effect on som/
   vision while the image-missing fallback path can still return an
   elements array. Added `"default": 100`, `"maximum": 1000`, and
   clarified the fallback-path wording.

Each finding gets a regression test:

- test_capture_ax_clamps_oversized_max_elements_to_hard_cap
- test_capture_ax_summary_indices_match_returned_elements
- test_capture_multimodal_summary_omits_truncation_note
- test_schema_max_elements_documents_default_and_upper_bound

Verified with `pytest tests/tools/test_computer_use.py` (53 passed,
including the 5 new cases). Confirmed each new test fails on the
pre-fix code path before applying the production change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
McClean-Sherlock pushed a commit to McClean-codes/hermes-agent that referenced this pull request Jun 7, 2026
…e removal, LOOP target warnings

NousResearch#8: Card ID parsing uses regex 'Created\s+card\s+(\S+)' with fallback
NousResearch#9: validate checks gate→revision pairs (referenced gates must have dependents)
NousResearch#10: _find_revision_node documented single-revision assumption
NousResearch#11: LOOP target mismatch warns before fallback
NousResearch#12: Removed unused _find_dependents
NousResearch#7: 32 unit tests — DAG, cycles, LOOP regex, failure propagation, state persistence, validation, card parsing
marozau pushed a commit to marozau/hermes-agent that referenced this pull request Jun 8, 2026
Story 8.1: Vendor YAKE keyword extractor as lib/_yake.py (stdlib-only)
  - extract_keywords() returns ≤8 candidates via 5-feature YAKE score
  - Intent classifier fallback with intent_source: yake-fallback telemetry
  - FTS query enrichment via enrich_query_with_yake()

Story 8.2: Recency power-law (1 + hours) ^ -0.3
  - Replaces exp(-Δdays/30) with configurable power-law
  - recency: { form: power_law, exponent: -0.3 } in config.yaml

Story 8.3: Per-type score multiplier + source boost
  - type_boosts: preference:1.2, procedure:1.1, superseded:0.2, etc.
  - user-correction source gets +0.3 boost
  - strength factor 1+0.1*log(1+access_count) ready for Epic 9.1

Story 8.4: Hybrid recall base-score (BM25 + embedding cosine)
  - llm_embed() in hermes_llm.py with _EMBEDDING_DISPATCH
  - embeddings() dispatch in hermes_providers_chat.py
  - base = 0.7*cosine_sim + 0.3*bm25_normalized
  - In-process LRU cache (1024 entries), fail-open to BM25
  - recall.use_embeddings config flag (default true)

Story 8.5: Hard cap heads-up at K=3
  - dedupe_and_cap enforces K=min(config,3)
  - validate_top_k() startup validation [1,3]
  - truncated_count telemetry field

Story 8.6: LLM reranker (preflight_rerank workload)
  - RERANK_PROMPT constant (grep-able)
  - RerankIndices Pydantic schema (Hard Invariant NousResearch#11)
  - Fail-open to score-based ranking
  - recall.use_reranker config flag (default false)

All 56 new tests pass. 116/122 total (6 pre-existing failures unchanged).
Hard invariants NousResearch#1,2,8,10,11 respected.
marozau pushed a commit to marozau/hermes-agent that referenced this pull request Jun 8, 2026
Story 9.1 — access_count + last_hit_at reinforcement:
- Add reinforce_entry() to hermes_memory.py as new canonical sibling
  (Hard Invariant NousResearch#1 extension). Atomically bumps access_count,
  sets last_hit_at=now, pairs with raw-layer reinforce event.
  Body bytes unchanged (content-hash stable).
- Idempotency via raw-layer scan: same (entry_id, source) pair
  reported twice doesn't double-bump. Scans today+yesterday JSONL.
- Wire verify hook: preflight_verify_helper.py --match hit triggers
  reinforce_entry() for each cited ID. Fail-open on failure.
- read_entries() now returns access_count + last_hit_at fields.
- Ranker strength factor (1.0 + 0.1*log(1+access_count)) already
  wired from Epic 8 Story 8.3 — cross-referenced here.

Story 9.2 — Manifest-based dedup in trajectory recorder:
- Add build_manifest() to hermes_memory.py: lists up to 50
  trajectory entries sorted by last_used_at desc.
- Add classify_trajectory_with_manifest(): sends manifest + failure
  pattern to LLM via hermes_llm.llm_call (Hard Invariant NousResearch#2).
  Pydantic-gated response (Hard Invariant NousResearch#11).
  Returns {action: reinforce, id} or {action: new, type, body}.
- Dedup prompt matches upstream PR NousResearch#4480 commit a443d1d shape.
- LLM calls reinforce_entry() for rematch (reuses 9.1's sibling).
- Telemetry: trajectory_outcome: reinforced-existing | new-entry.

Story 9.3 — Skill-dream consumes hit-rate signal:
- Add build_hit_rate_report(): joins preflight telemetry with
  verify_citation events, groups by category, computes hit_rate.
  Gated on ≥ min_fires (default 20) per category.
- Add propose_category_weight_nudges(): applies hard thresholds:
  - hit_rate < 0.15 → nudge_down
  - hit_rate > 0.5 → nudge_up (cites top-3 by access_count)
  - hit_rate < 0.05 AND unrelated > 0.6 → domain blind spot
  All are PROPOSALS only (ADR-2 / FR-14 / Hard Invariant NousResearch#4).

Tests: 36 new tests across 2 files (test_reinforce_entry.py: 11,
test_manifest_dedup.py: 25). All pass. Full suite: 152 pass,
6 pre-existing failures (unchanged), 0 regressions.
marozau pushed a commit to marozau/hermes-agent that referenced this pull request Jun 8, 2026
A4 (critical): N=1 path was dead code — effective_n > 1 skipped the
entire consolidation loop for the default config. Changed to >= 1 so
single-pass consolidation actually runs the LLM call.

A2 (Hard Invariant NousResearch#11): Define PatchProposalList(BaseModel) with
proposals: list[PatchProposal] (max_length=50). Set response_model=
PatchProposalList on LLMSpec in _run_consolidation_pass. Post-hoc
YAML parse is now fallback-only.

A1 (10.6 dead code): Wire _run_baseline_comparison() into
create_dream_artifact() — runs when effective_n > 1. BaselineComparison
written to manifest.baseline_comparison + REPORT.md Multi-Pass Verdict
section now includes top-1 hit-rate delta, cost delta, and VERDICT.

P1: sources.jsonl populated from multi-pass proposals.
P2: XML sentinels (<previous_pass index="N">) replace --- pass_N ---.
P3: Each pass sees only immediate prior pass (not accumulated).
P4: last_valid_proposals tracked separately; canonical patch uses last
validated, not last iteration.
P5: Outcome categorization: llm-failed / parse-failed / empty / ok.
P6: math.log guard: max(0, access_count or 0).
P7: Verdict thresholds: keep>=0.02, revert<0.02 AND cost>/bin/bash.50.
P8: pass_index 0-based per spec.
P9: cost nested inside each pass_audit entry.
P10: _parse_proposals_from_llm returns (proposals, parse_failures).
P11: Contradiction extraction uses stable index for new entries.
P12: Raw context populated from raw layer for contradiction verification.
P13: validate_consolidation_passes rejects bool before int check.
P14: Case-insensitive YAML fence extraction via regex.
P15: hit-rate failures logged with warning (not silent pass).
P16: _run_baseline_comparison cleanup in try/finally.

Tests: 35 Epic 10 tests. Full suite: 189 pass, 6 pre-existing.
marozau pushed a commit to marozau/hermes-agent that referenced this pull request Jun 8, 2026
Three CI scripts to prevent the dead-code and HI NousResearch#11 anti-patterns
that surfaced in Epic 9 and Epic 10 code reviews:

- check_dead_helpers.py (A3): Flags public functions in lib/*.py with
  zero production callers across lib/, scripts/, plugins/. Allowlists
  known public API (CLI entry points, provider dispatch, etc).
- check_hi11_violations.py (A4): Flags LLMSpec(...) constructions in
  lib/hermes_dream.py, lib/hermes_preflight.py, lib/hermes_memory.py
  that lack response_model= or have response_model=None.
- check_done_definition.py (A6): Combined gate — runs (b) dead helper
  check, (c) HI NousResearch#11 check, (d) new-function-has-test check. Story is
  NOT done until all three pass.

Also fixes the one real HI NousResearch#11 violation caught by the new check:
hermes_preflight.py reranker had response_model=None. Moved
RerankIndices to module level and set response_model=RerankIndices.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants