Fix VM instance sharing across tasks by hjc-puro · Pull Request #6 · NousResearch/hermes-agent

hjc-puro · 2025-11-03T22:43:07Z

Isolates each VM to a task ID
Guarantees VMs will live for at most 20 minutes

hjc-puro · 2025-11-04T08:37:20Z

```diff
+
+        # Clean up VM for this task after conversation completes
+        try:
+            cleanup_vm(effective_task_id)
```


@teknium1 this part is a bit hacky - I can take it out if you're ok with instances running for ~5 mins after convo ends
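The PR's two guarantees (one VM per task ID, and a hard 20-minute lifetime) can be illustrated with a small registry. This is a minimal sketch, not the merged code: `TaskVMRegistry`, `start_vm`, `stop_vm`, `reap_expired` and `run_task` are hypothetical names. Only `cleanup_vm` and `effective_task_id` come from the review excerpt. The clock is injectable so expiry can be exercised without waiting.

```python
import time
from typing import Any, Callable, Dict, Tuple

MAX_VM_LIFETIME_S = 20 * 60  # upper bound stated in the PR description


class TaskVMRegistry:
    """Hypothetical registry that keys every VM by task ID so tasks never share one."""

    def __init__(
        self,
        start_vm: Callable[[str], Any],
        stop_vm: Callable[[Any], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._start_vm = start_vm
        self._stop_vm = stop_vm
        self._clock = clock
        self._vms: Dict[str, Tuple[Any, float]] = {}  # task_id -> (vm, created_at)

    def get_vm(self, task_id: str) -> Any:
        self.reap_expired()
        if task_id not in self._vms:
            self._vms[task_id] = (self._start_vm(task_id), self._clock())
        return self._vms[task_id][0]

    def cleanup_vm(self, task_id: str) -> None:
        entry = self._vms.pop(task_id, None)
        if entry is not None:
            self._stop_vm(entry[0])

    def reap_expired(self) -> None:
        # Stop any VM older than the lifetime cap, even if its task never finished.
        now = self._clock()
        expired = [tid for tid, (_, born) in self._vms.items() if now - born >= MAX_VM_LIFETIME_S]
        for tid in expired:
            self.cleanup_vm(tid)


def run_task(registry: TaskVMRegistry, task_id: str, conversation: Callable[[Any], Any]) -> Any:
    vm = registry.get_vm(task_id)
    try:
        return conversation(vm)
    finally:
        # Eager teardown once the conversation completes, as in the review excerpt.
        registry.cleanup_vm(task_id)
```

In this sketch `reap_expired` only runs on lookup; enforcing the cap on idle VMs would additionally need a timer or background thread.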

- Update _PROVIDER_MODELS['minimax'] from stale ABAB 6.5 models to current MiniMax-M2.7/M2.5/M2.1 lineup (matching hermes-agent upstream)
- Update _PROVIDER_MODELS['zai'] from GLM-4 to current GLM-5/4.7/4.5 lineup (matching hermes-agent upstream)
- Extend resolve_model_provider() to also return base_url from config.yaml, so providers with custom endpoints (MiniMax, Z.AI) are routed correctly
- Pass base_url to AIAgent in both streaming and sync chat paths

Fixes NousResearch#6

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
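A minimal sketch of the `base_url` change, assuming the parsed `config.yaml` looks like `{"model": {"provider": ..., "default": ..., "base_url": ...}}`. That layout, the `"auto"` fallback and the tuple return shape are assumptions for illustration, not the project's confirmed schema.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_model_provider(config: Dict[str, Any]) -> Tuple[str, str, Optional[str]]:
    """Return (provider, model, base_url); base_url is None when no custom endpoint is set."""
    model_cfg = config.get("model") or {}            # assumed config section name
    provider = model_cfg.get("provider") or "auto"   # hypothetical fallback value
    model = model_cfg.get("default") or ""
    base_url = model_cfg.get("base_url") or None     # custom endpoint, e.g. for MiniMax or Z.AI
    return provider, model, base_url
```

Callers would then forward the third element to `AIAgent` in both the streaming and sync paths, which is what the last bullet of the commit describes.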

Phase4.1 smart suggestions

#3 Cost Analytics Dashboard
- New Analytics tab with summary cards (total tokens, cost, avg/mission, today, week)
- CSS bar charts: cost by agent, cost by model, daily timeline (7d)
- No external chart libraries — pure Tailwind

#4 Export Mission as Markdown
- Download .md file with full mission report (goal, team, transcript, artifacts)
- Copy to clipboard button with visual feedback
- Wired into Mission Detail Overlay

#5 Word-by-word Streaming in Agent Chat
- Replaced polling with SSE EventSource in AgentChatPanel
- Real-time chunk streaming with fallback to polling on error
- Streaming assistant message updates in-place

#6 Remote Agents Panel
- Fetches external sessions from gateway /api/sessions
- Filters out local agent sessions — shows only remote/external
- Auto-polls every 15s, card layout with status, model, tokens, cost
- Open Chat links to ClawSuite chat tab

#7 Real-time Collaboration (Presence)
- BroadcastChannel-based cross-tab presence detection
- Shows colored avatars of other users viewing Agent Hub
- Heartbeat every 5s, stale cleanup at 30s
- Shows which tab each peer is viewing
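The presence bookkeeping in item #7 (heartbeat every 5s, stale cleanup at 30s) can be sketched independently of the transport. This is a hypothetical Python model of the logic only: `PresenceTracker`, `on_heartbeat` and `active_peers` are invented names, and the BroadcastChannel messaging that would deliver heartbeats in the browser is not shown.

```python
import time
from typing import Callable, Dict, Tuple

STALE_AFTER_S = 30.0  # peers silent longer than this are dropped (from the commit)


class PresenceTracker:
    """Hypothetical bookkeeping behind a presence list: last heartbeat per peer, pruned when stale."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._peers: Dict[str, Tuple[str, float]] = {}  # peer_id -> (tab, last_seen)

    def on_heartbeat(self, peer_id: str, tab: str) -> None:
        # Called for each heartbeat a peer broadcasts (every 5s per the commit).
        self._peers[peer_id] = (tab, self._clock())

    def active_peers(self) -> Dict[str, str]:
        now = self._clock()
        stale = [pid for pid, (_, seen) in self._peers.items() if now - seen > STALE_AFTER_S]
        for pid in stale:
            del self._peers[pid]
        return {pid: tab for pid, (tab, _) in self._peers.items()}
```

A 30s stale window with 5s heartbeats tolerates several missed beats before a peer disappears, which avoids avatars flickering on a briefly throttled background tab.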



…t, stop/undo honesty, json_error crash, codex validation, deep-link race

- Bug #1: ChatPage loadSession reads res.items (not res.transcript) to match backend
- Bug NousResearch#2: Add GET /api/gui/session-search backed by SessionDB.search_messages (FTS5)
- Bug NousResearch#3: Stop button now checks res.supported before claiming run was stopped
- Bug NousResearch#4: Undo button now checks res.supported before removing messages locally
- Bug NousResearch#5: Fix _json_error positional calls in handle_chat_compress (was crashing 500)
- Bug NousResearch#6: Codex provider validation now also guards switching TO openai-codex
- Bug NousResearch#7: Deep-link hash check runs before health callback to prevent race condition

Aecroo · 2026-04-12T10:23:47Z

The fix has been merged into main and verified. host.get now correctly uses selectParentTemplates to retrieve templates for hosts.

- connection.py: cap header read at 8KB to prevent DoS from malicious handler
- handler.py: use .find() instead of `in` + .index() to eliminate race in patch
- handler.py: add truncated field to execute response when output exceeds 50KB
- server.py: include error data field in formatted error messages
- test: add timeout to test client recv, handle TimeoutExpired in close

Fixes issues NousResearch#1, NousResearch#4, NousResearch#5, NousResearch#6, NousResearch#8, NousResearch#10 from Qwen 3.5 peer review on PR NousResearch#19.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…te + cell_size_check + synchronous=FULL)

Production corruption #6 left b-tree pages with zeroed headers but intact old cell content — the Bug E pattern. This fix applies three pragma calls on every connect():

- synchronous=FULL (was NORMAL): closes the WAL-checkpoint reordering window where a crash between WAL commit and main-DB write leaves a partially-written b-tree page header. Cost is <1ms per commit on local SSD; negligible at kanban write volume.
- secure_delete=ON: forces SQLite to zero freed page bytes on disk. If a torn write or hardware fault later corrupts a page, the underlying cell content is zero, so corruption is detectable and no stale rows can resurface as live data.
- cell_size_check=ON: adds a read-side guard so corrupt cells surface as errors at read time rather than as silent wrong-data returns.

All three are connection-scoped and re-applied on every connect(). secure_delete also writes a persistent flag into the DB header on the first call against a fresh DB, making the protection durable across processes for new DBs.

Tests added for all four required cases: each pragma active on a fresh connection, and all three re-applied after close+reopen. Also adds the required negative test (migration path does not reset pragmas).
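A minimal sketch of the connect-time setup using Python's standard `sqlite3` module. The pragma names and values come from the commit; the wrapper names `connect` and `active_pragmas` are hypothetical. When read back, SQLite reports `synchronous=FULL` as the integer 2 and the two boolean pragmas as 1.

```python
import sqlite3

# Pragmas from the commit; re-applied on every connection because they are connection-scoped.
DURABILITY_PRAGMAS = (
    "PRAGMA synchronous=FULL",    # sync to disk at each commit
    "PRAGMA secure_delete=ON",    # overwrite deleted content with zeros
    "PRAGMA cell_size_check=ON",  # extra sanity checks on b-tree pages as they are read
)


def connect(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    for pragma in DURABILITY_PRAGMAS:
        conn.execute(pragma)
    return conn


def active_pragmas(conn: sqlite3.Connection) -> dict:
    """Read the current values back, e.g. to assert them in a test."""
    return {
        name: conn.execute(f"PRAGMA {name}").fetchone()[0]
        for name in ("synchronous", "secure_delete", "cell_size_check")
    }
```

Routing every open through one helper is what makes "re-applied on every connect()" hold: a code path that calls `sqlite3.connect` directly would silently fall back to the defaults.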

…ache_stats

AWS Bedrock Converse returns `usage.cacheReadInputTokens` / `cacheWriteInputTokens` (camelCase) when cachePoint markers fire on the request, but `normalize_converse_response` was dropping both fields on the floor — reading only `inputTokens` and `outputTokens`. This made prompt caching on non-Claude Bedrock models (Nova, Llama, DeepSeek) appear to give zero discount in Hermes telemetry, even when AWS was actually charging the cache-read rate.

Fix across three layers:

1. `agent/bedrock_adapter.py` (normalize_converse_response): surface `cacheReadInputTokens` and `cacheWriteInputTokens` on the returned SimpleNamespace. Expose both camelCase (Bedrock-native) and snake_case (Anthropic-convention) aliases so downstream normalizers can use whichever they already read.
2. `agent/transports/types.py` (Usage dataclass): add `cache_creation_tokens` alongside the existing `cached_tokens` field. Updates the docstring to make it clear both are populated when the upstream provider surfaces them.
3. `agent/transports/bedrock.py` (BedrockTransport.normalize_response and new extract_cache_stats): populate the new Usage fields when normalizing and add an extract_cache_stats method that mirrors AnthropicTransport's so telemetry consumers can be transport-agnostic.

Semantics match Bedrock docs: `inputTokens` represents NEW/uncached input tokens billed at full rate; cache-read/write tokens are reported separately and are NOT double-counted inside `inputTokens`. Pricing reconciliation consumers can sum all three for true prompt size.

26 new tests in tests/agent/transports/test_bedrock_cache_telemetry.py covering normalization, transport propagation, extract_cache_stats parity with the Anthropic transport, zero-value handling, and both SimpleNamespace and raw dict input shapes.

Closes gap NousResearch#6 identified in the Phase 2 re-verification (PraxVault/Hermes/Reference/Decisions/bedrock-phase2-audit/04-current-architecture).
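The normalization described in layer 1 can be sketched as follows. The camelCase field names are the ones the commit cites from Bedrock Converse; the snake_case alias names (`cache_read_input_tokens`, `cache_creation_input_tokens`) are an assumption modeled on Anthropic's usage fields, and `normalize_converse_usage` / `total_prompt_tokens` are hypothetical helpers, not the functions in `agent/bedrock_adapter.py`.

```python
from types import SimpleNamespace
from typing import Any, Mapping


def _field(usage: Any, key: str) -> int:
    # Accept both a raw dict and an attribute-style object; missing or None counts as 0.
    value = usage.get(key) if isinstance(usage, Mapping) else getattr(usage, key, None)
    return int(value or 0)


def normalize_converse_usage(usage: Any) -> SimpleNamespace:
    """Hypothetical helper: keep Bedrock's cache counters instead of dropping them."""
    cache_read = _field(usage, "cacheReadInputTokens")
    cache_write = _field(usage, "cacheWriteInputTokens")
    return SimpleNamespace(
        inputTokens=_field(usage, "inputTokens"),  # uncached input, billed at full rate
        outputTokens=_field(usage, "outputTokens"),
        cacheReadInputTokens=cache_read,           # Bedrock-native camelCase
        cacheWriteInputTokens=cache_write,
        cache_read_input_tokens=cache_read,        # assumed snake_case aliases
        cache_creation_input_tokens=cache_write,
    )


def total_prompt_tokens(u: SimpleNamespace) -> int:
    # The three input counters are disjoint, so their sum is the full prompt size.
    return u.inputTokens + u.cacheReadInputTokens + u.cacheWriteInputTokens
```

For example, a response with 100 uncached and 900 cache-read input tokens has a 1,000-token prompt, of which 90% was served from cache.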

…tor-routing plugin

Lift the reject-only `create` routing invariant out of an inline edit in the hot upstream file `tools/kanban_tools.py::_handle_create` into a new opt-in standalone plugin that registers a `pre_tool_call` hook (LOCAL_PATCHES.md NousResearch#6). The hook returns `{"action":"block","message":...}` for a front-desk `kanban_create` aimed at a non-orchestrator lane; the executor wraps it as `{"error": <message>}` — the same shape the old inline `tool_error` produced. Scope/exemptions unchanged (reads the same HERMES_PROFILE / HERMES_KANBAN_TASK env vars). `_handle_create` is now routing-agnostic; the only remaining core touch is upstream's own `pre_tool_call` dispatch in `agent/tool_executor.py`.

Tradeoff: standalone plugins are opt-in (`plugins.enabled`), so this safety invariant is now config-gated rather than always-on. Deploy config must enable `kanban-orchestrator-routing`; `KANBAN_ORCHESTRATOR_ROUTING_DISABLE=1` keeps it installed but inert. Pinned by an integration test.

Tests retargeted to the hook layer + new coverage (guardrails, routing-agnostic handler, real plugin-manager wiring AND opt-in gating): 10 passed. Broader kanban/orchestrator suite green (179 passed / 1 skipped / 20 xfailed). ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
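The block-then-wrap flow can be sketched as below. The result shapes (`{"action": "block", "message": ...}` and `{"error": <message>}`) and the environment variable names come from the commit; the hook signature, the `"front-desk"` profile value, the `lane` argument name and the exemption logic are all assumptions, since the commit does not spell them out.

```python
import os
from typing import Any, Callable, Dict, List, Mapping, Optional

HookResult = Optional[Dict[str, str]]
Hook = Callable[[str, Dict[str, Any]], HookResult]


def kanban_routing_hook(tool_name: str, args: Dict[str, Any],
                        env: Mapping[str, str] = os.environ) -> HookResult:
    """Hypothetical pre_tool_call hook: block front-desk kanban_create outside the orchestrator lane."""
    if env.get("KANBAN_ORCHESTRATOR_ROUTING_DISABLE") == "1":
        return None  # installed but inert
    if tool_name != "kanban_create":
        return None
    # Assumed scope: only the front-desk profile, and never inside a running kanban task.
    if env.get("HERMES_PROFILE") != "front-desk" or env.get("HERMES_KANBAN_TASK"):
        return None
    lane = args.get("lane")  # hypothetical argument name
    if lane != "orchestrator":
        return {"action": "block",
                "message": f"front-desk kanban_create must target the orchestrator lane, not {lane!r}"}
    return None


def execute_tool(tool_name: str, args: Dict[str, Any],
                 handler: Callable[..., Any], hooks: List[Hook]) -> Any:
    """Run pre_tool_call hooks first; a block result becomes {"error": <message>}."""
    for hook in hooks:
        result = hook(tool_name, args)
        if result and result.get("action") == "block":
            return {"error": result["message"]}
    return handler(**args)
```

Because the handler never sees a blocked call, `_handle_create` can stay routing-agnostic, which is the point of moving the invariant into a hook.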

…face

A merge-surface budget tool: ranks tracked source files by the numstat deletions/modifications column (edits to lines upstream owns — what actually conflicts on a sync), reporting new files / pure additions separately as low-risk. Defaults to `verky/deploy` vs the local upstream mirror `main` (fallback `upstream/main`). `--json` for machines; `--check N` exits 1 if any source file exceeds a per-file modified-line budget (CI gate).

Makes "minimize merge surface" a tracked number instead of a one-time cleanup, and shows the payoff of moving patches onto extension points (NousResearch#6, NousResearch#14). Current hot files: gateway/run.py (~622), hermes_cli/kanban_db.py (~108). Smoke-verified: default report, --json, --check both directions; ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
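The ranking and the `--check N` gate can be sketched over the text that `git diff --numstat` prints (one `added<TAB>deleted<TAB>path` line per file, with `-` in both count columns for binary files). The function names are hypothetical, and treating the deletions column as "modified upstream lines" follows the commit's description rather than its actual code.

```python
from typing import List, Tuple

Row = Tuple[str, int, int]  # (path, added, deleted)


def parse_numstat(text: str) -> List[Row]:
    """Parse `git diff --numstat` output: "<added> TAB <deleted> TAB <path>" per line."""
    rows: List[Row] = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3 or "-" in (parts[0], parts[1]):
            continue  # skip blank lines and binary files, which numstat reports as "-"
        rows.append((parts[2], int(parts[0]), int(parts[1])))
    return rows


def rank_merge_surface(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Split into files that touch upstream-owned lines (ranked) and pure additions (low risk)."""
    modified = sorted((r for r in rows if r[2] > 0), key=lambda r: r[2], reverse=True)
    pure_additions = [r for r in rows if r[2] == 0]
    return modified, pure_additions


def check_budget(rows: List[Row], budget: int) -> int:
    """Exit code for a CI gate: 1 if any file modifies more than `budget` lines, else 0."""
    return 1 if any(deleted > budget for _, _, deleted in rows) else 0
```

Pure additions are listed separately because new lines in new files rarely conflict on a sync; only edits to lines upstream also owns do.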

…etection, workflow report, parallel batch import

Five items from the Texas Senate review.

1. Bug fix — /api-key now appears in the TUI catalog/autocomplete. The slash was wired locally in core.ts (and worked when typed) but was missing from the server-side hermes_cli/commands.COMMAND_REGISTRY that feeds commands.catalog. Added a CommandDef("api-key", …) entry (aliases apikey/api-keys/keys, subcommands list/show/set/unset) and listed "api-key" in FORECAST_DESK_SUBCOMMANDS so /forecast api-key also tab-completes.

2. Watched-source auto-attach (NousResearch#2). New auto_watch boolean on forecast_ledger.import_source_evidence: after a successful import, attach the (source_type, source) tuple as a watched source on the question, deduped against existing watches. The response now includes watched_source + auto_watch_note so the agent sees whether the watch was new or already there. Future reruns start from a known identifier instead of broad search.

3. Blocked-source detection (NousResearch#6). New detect_block_page() recognises Cloudflare challenges, DataDome / PerimeterX / Imperva walls, cookie / JS-required pages, and generic CAPTCHAs disguised as HTTP 200. _archive_url_evidence_snapshot now runs the detector after fetching and writes blocked + reason + signal into both the snapshot metadata file AND the evidence row's top-level metadata. show_question / list_evidence and the workflow report (below) can surface "blocked" without opening the snapshot file. Patterns are conservative; the matched signal is exposed for audit.

4. workflow_report action (NousResearch#5). forecast_ledger action="workflow_report" aggregates one question's recent ledger activity into a compact report: evidence by source_type + by domain, blocked counts/reasons/items, snapshot probability deltas, watched-source / model-run / alert / postmortem breakdowns, a chronological timeline (capped via limit_timeline), and a heuristic `suggestions` list (e.g. "X blocked rows — prefer the structured adapter", "FRED in use but FRED_API_KEY not set", "unwatched repeated domain — pass auto_watch=true"). Designed for postmortems and to spot what slowed a session.

5. Parallel batch import (NousResearch#7). New action="import_source_evidence_batch" takes sources=[{source_type, source, …}, …] plus concurrency (default 4, capped 8) and fetches every entry concurrently in a ThreadPoolExecutor, then writes evidence rows sequentially in the main thread (keeps SQLite single-writer). Per-source failures are captured in results without aborting the batch. Verified: 4 FRED series — sum of individual fetches 4.27s, wall 1.23s ≈ 3.5× speedup. auto_watch propagates per-source with the same dedup.

Tests: 8 new tool tests (auto_watch attach + dedup + off, workflow_report aggregation + blocked suggestions, batch parallelism + per-source error isolation + empty rejection) + the ledger blocked-URL test; full forecasting + hermes_cli + ui-tui suites green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
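Item 5's pattern (concurrent fetch, sequential write, per-source error capture) can be sketched with the standard `concurrent.futures` module. The default of 4 workers, the cap of 8 and the empty-batch rejection come from the commit; `import_batch`, `fetch`, `write_row` and the result dictionaries are hypothetical stand-ins for the ledger code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

DEFAULT_CONCURRENCY = 4  # defaults from the commit
MAX_CONCURRENCY = 8

Source = Dict[str, Any]


def import_batch(sources: List[Source],
                 fetch: Callable[[Source], Any],
                 write_row: Callable[[Source, Any], None],
                 concurrency: int = DEFAULT_CONCURRENCY) -> List[Dict[str, Any]]:
    """Fetch concurrently, then write rows one at a time on the calling thread."""
    if not sources:
        raise ValueError("sources must be a non-empty list")
    workers = max(1, min(concurrency, MAX_CONCURRENCY))

    def safe_fetch(src: Source) -> Dict[str, Any]:
        # Capture per-source failures so one bad source cannot abort the batch.
        try:
            return {"ok": True, "data": fetch(src)}
        except Exception as exc:
            return {"ok": False, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(safe_fetch, sources))  # map preserves input order

    results = []
    for src, outcome in zip(sources, outcomes):
        if outcome["ok"]:
            write_row(src, outcome["data"])  # single writer: no concurrent SQLite writes
            results.append({"source": src["source"], "status": "imported"})
        else:
            results.append({"source": src["source"], "status": "error", "error": outcome["error"]})
    return results
```

Network fetches dominate the wall time, so overlapping them gives the speedup, while keeping writes on one thread sidesteps SQLite's single-writer locking entirely.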



AC NousResearch#6 acceptance criteria: verify that a capped task preserves state and resumes/continues without Marco re-prompting. 20 new tests covering:

- Artifact work handle extraction (BIF card + chat_id + user_id)
- Verification state (unfinished, not done/success claim)
- Last completed step and next action fields
- Auto-continue decision logic (depth gate, completed/interrupted guards, platform scoping, emergency mode still triggers, non-standard exit reasons)
- User-facing handoff text ("continuing automatically", no success claim, with/without auto-continue variants)
- Synthetic continuation text (system instruction, task card ref, fresh budget, empty summary handling)
- Gateway integration: artifact write + disk persistence + synthetic internal MessageEvent queuing
- Restart survivability (reload artifact from persisted JSON)
- Regression: turn_exit_reason alias miss, interrupted/failed caps
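The auto-continue guards listed above (depth gate, completed/interrupted/failed guards, platform scoping, emergency mode not suppressing continuation) can be sketched as one predicate. Every concrete value here is hypothetical: the depth limit of 3, the `"telegram"` platform set, the `"budget_exhausted"` exit reason and the `CappedTurn` shape are illustrations, and treating any other exit reason as continuable is an assumption about how "non-standard exit reasons" are handled.

```python
from dataclasses import dataclass

MAX_CONTINUATION_DEPTH = 3              # hypothetical depth gate
AUTO_CONTINUE_PLATFORMS = {"telegram"}  # hypothetical platform scope
TERMINAL_EXIT_REASONS = {"completed", "interrupted", "failed"}


@dataclass
class CappedTurn:
    exit_reason: str          # e.g. a hypothetical "budget_exhausted" when the task hit its cap
    continuation_depth: int   # how many automatic continuations already ran
    platform: str
    emergency_mode: bool = False


def should_auto_continue(turn: CappedTurn) -> bool:
    if turn.exit_reason in TERMINAL_EXIT_REASONS:
        return False  # finished or stopped tasks are never resumed automatically
    if turn.continuation_depth >= MAX_CONTINUATION_DEPTH:
        return False  # depth gate prevents endless self-continuation
    if turn.platform not in AUTO_CONTINUE_PLATFORMS:
        return False
    # emergency_mode is deliberately not checked: it must not suppress continuation.
    return True
```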


Match the other sandbox backends' per-init filesystem isolation.

Docker stamps a fresh 'hermes-<uuid>' container name on every _init (docker.py:508), so a destroyed-then-recreated env always sees a brand-new filesystem. Gondolin's sandbox_dir is deterministic from task_id, and _setup_overlay_mounts keeps the scratch dir (overlays/<safe>/{upper,work,merged}) on disk across env lifecycles. The next env that mounts the same guest_path under the same sandbox_dir inherits the prior session's writes via the persisted upper layer — a real cross-session contamination bug, not just a disk leak.

Fix: _teardown_overlay_mounts now rmtrees the per-mount scratch dir (merged.parent) after the lazy unmount returns. Lazy unmount + open-fd-keeps-inode-alive means this is safe even if the daemon hasn't fully released handles. Crash recovery still preserves upper/ because the import-time sweep only unmounts and never rmtrees. This also closes design-doc revisit item NousResearch#9 (failed-init cleanup).

Test: tests/integration/test_gondolin_terminal.py::test_overlay_writes_do_not_leak_between_env_lifecycles

A KVM-gated integration test that asserts the behavioural invariant via the public GondolinEnvironment.execute() API: env1 writes a file into an overlay extra_mount, env2 (same sandbox_dir, same mount config) must not see it. Implementation-agnostic — no mention of upper/ or fuse-overlayfs — so a future migration to a custom upstream VFSProvider (the @earendil-works/gondolin package ships vfs/provider) satisfies the same contract trivially and the test passes for free.

Doc updates (DO NOT MERGE revisit list):
- NousResearch#9 marked resolved (this fix)
- NousResearch#6 narrowed: lists the one test we now have and what's still missing
- NousResearch#10 added: task_id='default' is shared across all top-level agents at the hermes/gateway layer; concurrent-tenancy isolation needs a per-session task_id and is out of scope for this branch
- NousResearch#11 added: overlay=true + missing readonly is a silent UX trap (host-side scratch is created, daemon makes guest mount EROFS)

Regression: all 118 gondolin unit + integration tests pass. DO NOT MERGE — see docs/design/gondolin-terminal-backend.md.


hjc-puro added 2 commits November 3, 2025 17:42

fix leakage (a4db3fd)

prevent leakage of morph instances between tasks (fbd3a2f)


hjc-puro requested a review from teknium1 November 4, 2025 08:37

teknium1 merged commit 9573b2a into main Nov 4, 2025

This was referenced Mar 5, 2026:

Feature: Enhanced Extension System with Tool Interception & Lifecycle Events (inspired by Pi) #359 (Closed)

Feature: Multi-Agent Architecture — Orchestration, Cooperation, Specialized Roles & Resilient Workflows #344 (Closed)

SHL0MS mentioned this pull request Mar 30, 2026: [UX] Context-exceeded error lacks actionable guidance #4061 (Closed)

MorAlekss mentioned this pull request Apr 1, 2026: feat(skills): add verify-code-changes skill #4459 (Closed)

teknium1 mentioned this pull request Apr 2, 2026: feat(memory): pluggable memory provider interface with profile isolation, review fixes, and honcho CLI restoration #4623 (Merged)

Copilot AI mentioned this pull request Apr 6, 2026: docs: feature Burgess Principle integration across README and repo docs ljbudgie/hermes-agent#2 (Merged)

ahmedaltewaj mentioned this pull request Apr 7, 2026: Gateway interrupt loop: get_pending_message never clears, causing infinite recursion #21 (Closed)

Helmi mentioned this pull request Apr 8, 2026: Inactivity timeout fires repeatedly with MiniMax 2.7 highspeed #6260 (Closed)

aaronlab mentioned this pull request Apr 9, 2026: fix: token estimation accuracy, context length logging, batch integrity #6629 (Open)

h4x3rotab referenced this pull request in Clawdi-AI/hermes-agent Apr 10, 2026: Merge pull request #6 from outsourc-e/phase4.1-smart-suggestions (30fea9f)

Phase4.1 smart suggestions

Vex-Dravex added a commit to Vex-Dravex/hermes-agent that referenced this pull request Apr 10, 2026: ralph: story NousResearch#6 — Add /checkpoint and /restore slash commands (acc94f0)

Vex-Dravex added a commit to Vex-Dravex/hermes-agent that referenced this pull request Apr 11, 2026: ralph: story NousResearch#6 — Add /checkpoint and /restore slash commands (a104206)

malaiwah pushed a commit to malaiwah/hermes-agent that referenced this pull request Apr 11, 2026: Merge pull request 'fix(docker): gate --pids-limit on cgroup availability + make configurable' (NousResearch#6) from fix/pids-limit-cgroup-probe into main (66a5de5)

kshitijk4poor mentioned this pull request Apr 15, 2026: feat: add hermes-blender skill for 3D modeling and rendering #10191 (Closed)

falses00 mentioned this pull request Apr 15, 2026: Code Review Refactoring Implementations (P0-P2) #10445 (Open)

kshitijk4poor mentioned this pull request Apr 16, 2026: feat: add TouchDesigner integration skill (twozero MCP) #10081 (Closed)

teknium1 mentioned this pull request Apr 16, 2026: fix(approval): heartbeat activity during gateway approval wait #11245 (Merged)

Bartok9 mentioned this pull request May 29, 2026: fix(goals): stop /goal over-continuation on exploratory + scope-narrow goals (#34196, #34197) #34343 (Open)

This was referenced Jun 1, 2026:

Proactive model validation + per-use-case model guidance at gateway start #36278 (Open)

[Setup]: #22812 (Closed)

ricardocamiloconsir mentioned this pull request Jun 2, 2026: feat(gateway): session model pool — concurrency-aware auto-assignment with auxiliary slot tracking #37519 (Open)

liuhao1024 mentioned this pull request Jun 5, 2026: fix(gateway): strip _HERMES_GATEWAY from Windows detached restart helper env #40059 (Open)

friendshipisover mentioned this pull request Jun 7, 2026: fix(state): anchor lineage title numbering to the resolved base #41223 (Open)

jarvis-stark-ops mentioned this pull request Jun 7, 2026: feat(gateway): dispatcher heartbeat — detect silent stalls from outside #41588 (Closed)

cristianmgm7 mentioned this pull request Jun 10, 2026: feat(platforms): add Carbon Voice as a native messaging platform #43226 (Open)

AutomalyRo mentioned this pull request Jun 10, 2026: [Bug]: OpenAI Codex usage completely Broken/Being Treated as Custom API #43461 (Closed)

ether-btc mentioned this pull request Jun 10, 2026: feat(skills): add model-task-router — automatic task-to-model routing backed by DeepSWE data #43534 (Open)

annguyenNous mentioned this pull request Jun 11, 2026: fix: add bounds checking in _parse_status git status parsing #44052 (Open)


teknium1 merged 2 commits into main from fix-leakage
