Fix VM instance sharing across tasks #6

Merged
teknium1 merged 2 commits into main from fix-leakage on Nov 4, 2025

Conversation

@hjc-puro hjc-puro commented Nov 3, 2025

Contributor
  • Isolates each VM to a task ID
  • Guarantees VMs will live for at most 20 minutes
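
A minimal sketch of the two guarantees above, assuming an in-memory registry keyed by task ID. `VMRegistry`, `VMHandle`, `get_or_create`, and `reap_expired` are hypothetical names for illustration; only the call `cleanup_vm(effective_task_id)` appears in the diff, and the VM itself is stood in for by a plain record.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List

MAX_VM_LIFETIME_S = 20 * 60  # the 20-minute cap from the PR description


@dataclass
class VMHandle:
    task_id: str
    created_at: float


@dataclass
class VMRegistry:
    """Hypothetical registry: one VM per task ID, never shared across tasks."""
    clock: Callable[[], float] = time.monotonic
    _vms: Dict[str, VMHandle] = field(default_factory=dict)

    def get_or_create(self, task_id: str) -> VMHandle:
        vm = self._vms.get(task_id)
        if vm is None or self._expired(vm):
            vm = VMHandle(task_id=task_id, created_at=self.clock())
            self._vms[task_id] = vm
        return vm

    def cleanup_vm(self, task_id: str) -> bool:
        """Drop the VM for one task; returns False if there was none."""
        return self._vms.pop(task_id, None) is not None

    def reap_expired(self) -> List[str]:
        """Remove every VM past the lifetime cap and return their task IDs."""
        dead = [tid for tid, vm in self._vms.items() if self._expired(vm)]
        for tid in dead:
            del self._vms[tid]
        return dead

    def _expired(self, vm: VMHandle) -> bool:
        return self.clock() - vm.created_at >= MAX_VM_LIFETIME_S
```

In this sketch, calling `cleanup_vm` at the end of a conversation releases the VM immediately, while a periodic `reap_expired` call would enforce the upper bound for tasks that never reach that point.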

Comment thread run_agent.py

# Clean up VM for this task after conversation completes
try:
    cleanup_vm(effective_task_id)

Contributor Author

@teknium1 this part is a bit hacky - I can take it out if you're ok with instances running for ~5 mins after convo ends

@hjc-puro hjc-puro requested a review from teknium1 November 4, 2025 08:37
@teknium1 teknium1 merged commit 9573b2a into main Nov 4, 2025
sudo-yf pushed a commit to sudo-yf/hermes-agent that referenced this pull request Apr 5, 2026
- Update _PROVIDER_MODELS['minimax'] from stale ABAB 6.5 models to
  current MiniMax-M2.7/M2.5/M2.1 lineup (matching hermes-agent upstream)
- Update _PROVIDER_MODELS['zai'] from GLM-4 to current GLM-5/4.7/4.5
  lineup (matching hermes-agent upstream)
- Extend resolve_model_provider() to also return base_url from config.yaml,
  so providers with custom endpoints (MiniMax, Z.AI) are routed correctly
- Pass base_url to AIAgent in both streaming and sync chat paths

Fixes NousResearch#6

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
h4x3rotab referenced this pull request in Clawdi-AI/hermes-agent Apr 10, 2026
#3 Cost Analytics Dashboard
- New Analytics tab with summary cards (total tokens, cost, avg/mission, today, week)
- CSS bar charts: cost by agent, cost by model, daily timeline (7d)
- No external chart libraries — pure Tailwind

#4 Export Mission as Markdown
- Download .md file with full mission report (goal, team, transcript, artifacts)
- Copy to clipboard button with visual feedback
- Wired into Mission Detail Overlay

#5 Word-by-word Streaming in Agent Chat
- Replaced polling with SSE EventSource in AgentChatPanel
- Real-time chunk streaming with fallback to polling on error
- Streaming assistant message updates in-place

#6 Remote Agents Panel
- Fetches external sessions from gateway /api/sessions
- Filters out local agent sessions — shows only remote/external
- Auto-polls every 15s, card layout with status, model, tokens, cost
- Open Chat links to ClawSuite chat tab

#7 Real-time Collaboration (Presence)
- BroadcastChannel-based cross-tab presence detection
- Shows colored avatars of other users viewing Agent Hub
- Heartbeat every 5s, stale cleanup at 30s
- Shows which tab each peer is viewing
Vex-Dravex added a commit to Vex-Dravex/hermes-agent that referenced this pull request Apr 10, 2026
Vex-Dravex added a commit to Vex-Dravex/hermes-agent that referenced this pull request Apr 11, 2026
malaiwah pushed a commit to malaiwah/hermes-agent that referenced this pull request Apr 11, 2026
…lity + make configurable' (NousResearch#6) from fix/pids-limit-cgroup-probe into main
gary-the-ai pushed a commit to gary-the-ai/hermes-web-console-gui that referenced this pull request Apr 11, 2026
…t, stop/undo honesty, json_error crash, codex validation, deep-link race

Bug #1: ChatPage loadSession reads res.items (not res.transcript) to match backend
Bug NousResearch#2: Add GET /api/gui/session-search backed by SessionDB.search_messages (FTS5)
Bug NousResearch#3: Stop button now checks res.supported before claiming run was stopped
Bug NousResearch#4: Undo button now checks res.supported before removing messages locally
Bug NousResearch#5: Fix _json_error positional calls in handle_chat_compress (was crashing 500)
Bug NousResearch#6: Codex provider validation now also guards switching TO openai-codex
Bug NousResearch#7: Deep-link hash check runs before health callback to prevent race condition
Aecroo commented Apr 12, 2026


The fix has been merged into main and verified. host.get now correctly uses selectParentTemplates to retrieve templates for hosts.

malaiwah pushed a commit to malaiwah/hermes-agent that referenced this pull request Apr 13, 2026
- connection.py: cap header read at 8KB to prevent DoS from malicious handler
- handler.py: use .find() instead of `in` + .index() to eliminate race in patch
- handler.py: add truncated field to execute response when output exceeds 50KB
- server.py: include error data field in formatted error messages
- test: add timeout to test client recv, handle TimeoutExpired in close

Fixes issues NousResearch#1, NousResearch#4, NousResearch#5, NousResearch#6, NousResearch#8, NousResearch#10 from Qwen 3.5 peer review on PR NousResearch#19.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
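
The first two bullets in the commit above can be sketched roughly as follows. This is an illustration only: `read_headers` and `replace_once` are hypothetical names, and the `\r\n\r\n` header terminator is an assumption, since the commit does not show the wire format.

```python
import io

MAX_HEADER_BYTES = 8 * 1024  # 8 KB cap from the commit message


def read_headers(stream, limit: int = MAX_HEADER_BYTES) -> bytes:
    """Read up to the blank line ending a header block, refusing oversized input."""
    buf = bytearray()
    while not buf.endswith(b"\r\n\r\n"):  # assumed terminator
        if len(buf) >= limit:
            raise ValueError(f"header exceeds {limit} bytes")
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream closed before end of headers")
        buf += chunk
    return bytes(buf)


def replace_once(text: str, old: str, new: str) -> str:
    """One lookup with .find() instead of an `in` check followed by .index()."""
    pos = text.find(old)
    if pos == -1:
        raise ValueError("old text not found")
    return text[:pos] + new + text[pos + len(old):]
```

With a single `.find()` call the position that was tested is the position that gets used, so there is no gap between the membership check and the index lookup.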
kshitijk4poor pushed a commit that referenced this pull request May 27, 2026
…te + cell_size_check + synchronous=FULL)

Production corruption #6 left b-tree pages with zeroed headers but intact old cell content — the Bug E pattern. This fix applies three pragma calls on every connect():

- synchronous=FULL (was NORMAL): closes the WAL-checkpoint reordering window where a crash between WAL commit and main-DB write leaves a partially-written b-tree page header. Cost is <1ms per commit on local SSD; negligible at kanban write volume.

- secure_delete=ON: forces SQLite to zero freed page bytes on disk. If a torn write or hardware fault later corrupts a page, the underlying cell content is zero, so corruption is detectable and no stale rows can resurface as live data.

- cell_size_check=ON: adds a read-side guard so corrupt cells surface as errors at read time rather than as silent wrong-data returns.

All three are connection-scoped and re-applied on every connect(). secure_delete also writes a persistent flag into the DB header on the first call against a fresh DB, making the protection durable across processes for new DBs.

Tests added for all four required cases: each pragma active on a fresh connection, and all three re-applied after close+reopen. Also adds the required negative test (migration path does not reset pragmas).
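
A minimal sketch of the "apply on every connect()" pattern described above, using Python's standard `sqlite3` module. The function names and the `HARDENING_PRAGMAS` tuple are illustrative assumptions, not the project's actual code; the three pragmas themselves are standard SQLite.

```python
import sqlite3

HARDENING_PRAGMAS = (
    "PRAGMA synchronous=FULL",
    "PRAGMA secure_delete=ON",
    "PRAGMA cell_size_check=ON",
)


def connect(path: str) -> sqlite3.Connection:
    """Open a connection and re-apply the three pragmas every time."""
    conn = sqlite3.connect(path)
    for stmt in HARDENING_PRAGMAS:
        conn.execute(stmt)
    return conn


def pragma_state(conn: sqlite3.Connection) -> dict:
    """Read back the current value of each pragma for verification."""
    names = ("synchronous", "secure_delete", "cell_size_check")
    return {n: conn.execute(f"PRAGMA {n}").fetchone()[0] for n in names}
```

Reading the values back through `pragma_state` is one way to write the "active on a fresh connection" and "re-applied after close+reopen" checks the commit mentions.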
praxstack added a commit to praxstack/NousResearch-hermes-agent that referenced this pull request May 27, 2026
…ache_stats

AWS Bedrock Converse returns `usage.cacheReadInputTokens` /
`cacheWriteInputTokens` (camelCase) when cachePoint markers fire on the
request, but `normalize_converse_response` was dropping both fields on
the floor — reading only `inputTokens` and `outputTokens`. This made
prompt caching on non-Claude Bedrock models (Nova, Llama, DeepSeek)
appear to give zero discount in Hermes telemetry, even when AWS was
actually charging the cache-read rate.

Fix across three layers:

1. `agent/bedrock_adapter.py` (normalize_converse_response):
   surface `cacheReadInputTokens` and `cacheWriteInputTokens` on the
   returned SimpleNamespace. Expose both camelCase (Bedrock-native)
   and snake_case (Anthropic-convention) aliases so downstream
   normalizers can use whichever they already read.

2. `agent/transports/types.py` (Usage dataclass):
   add `cache_creation_tokens` alongside the existing `cached_tokens`
   field. Updates the docstring to make it clear both are populated
   when the upstream provider surfaces them.

3. `agent/transports/bedrock.py` (BedrockTransport.normalize_response
   and new extract_cache_stats):
   populate the new Usage fields when normalizing and add an
   extract_cache_stats method that mirrors AnthropicTransport's so
   telemetry consumers can be transport-agnostic.

Semantics match Bedrock docs: `inputTokens` represents NEW/uncached
input tokens billed at full rate; cache-read/write tokens are reported
separately and are NOT double-counted inside `inputTokens`. Pricing
reconciliation consumers can sum all three for true prompt size.

26 new tests in tests/agent/transports/test_bedrock_cache_telemetry.py
covering normalization, transport propagation, extract_cache_stats
parity with the Anthropic transport, zero-value handling, and both
SimpleNamespace and raw dict input shapes.

Closes gap NousResearch#6 identified in the Phase 2 re-verification
(PraxVault/Hermes/Reference/Decisions/bedrock-phase2-audit/04-current-architecture).
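
The normalization described in this commit might look roughly like the sketch below. It assumes the usage block arrives as a plain dict; `normalize_usage` and `total_prompt_tokens` are hypothetical helper names, and the exact snake_case alias names are an assumption, since the commit only says Anthropic-convention aliases are exposed.

```python
from types import SimpleNamespace


def normalize_usage(usage: dict) -> SimpleNamespace:
    """Surface Converse cache counters under camelCase and snake_case names."""
    read = int(usage.get("cacheReadInputTokens") or 0)
    write = int(usage.get("cacheWriteInputTokens") or 0)
    return SimpleNamespace(
        inputTokens=int(usage.get("inputTokens") or 0),
        outputTokens=int(usage.get("outputTokens") or 0),
        cacheReadInputTokens=read,
        cacheWriteInputTokens=write,
        # snake_case aliases; these exact names are an assumption
        cache_read_input_tokens=read,
        cache_creation_input_tokens=write,
    )


def total_prompt_tokens(u: SimpleNamespace) -> int:
    """Uncached input plus cache reads and writes, which are reported separately."""
    return u.inputTokens + u.cacheReadInputTokens + u.cacheWriteInputTokens
```

Missing or `None` counters fall back to zero here, which matches the zero-value handling the test list mentions.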
verkyyi added a commit to verkyyi/hermes-agent that referenced this pull request May 28, 2026
…tor-routing plugin

Lift the reject-only `create` routing invariant out of an inline edit in the
hot upstream file `tools/kanban_tools.py::_handle_create` into a new opt-in
standalone plugin that registers a `pre_tool_call` hook (LOCAL_PATCHES.md NousResearch#6).

The hook returns `{"action":"block","message":...}` for a front-desk
`kanban_create` aimed at a non-orchestrator lane; the executor wraps it as
`{"error": <message>}` — the same shape the old inline `tool_error` produced.
Scope/exemptions unchanged (reads the same HERMES_PROFILE / HERMES_KANBAN_TASK
env vars). `_handle_create` is now routing-agnostic; the only remaining core
touch is upstream's own `pre_tool_call` dispatch in `agent/tool_executor.py`.

Tradeoff: standalone plugins are opt-in (`plugins.enabled`), so this safety
invariant is now config-gated rather than always-on. Deploy config must enable
`kanban-orchestrator-routing`; `KANBAN_ORCHESTRATOR_ROUTING_DISABLE=1` keeps it
installed but inert. Pinned by an integration test.

Tests retargeted to the hook layer + new coverage (guardrails, routing-agnostic
handler, real plugin-manager wiring AND opt-in gating): 10 passed. Broader
kanban/orchestrator suite green (179 passed / 1 skipped / 20 xfailed). ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
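
A rough sketch of the hook shape described in the commit above. Only the `{"action":"block","message":...}` verdict, the `{"error": <message>}` wrapper, and the environment variable names come from the commit text. The hook signature, the `"front-desk"` profile value, the `"orchestrator"` lane name, and the `assignee` argument key are all assumptions made for illustration.

```python
import os
from typing import Optional

ORCHESTRATOR_LANE = "orchestrator"  # hypothetical lane name


def pre_tool_call(tool_name: str, args: dict, env=os.environ) -> Optional[dict]:
    """Block a front-desk kanban_create aimed at a non-orchestrator lane."""
    if env.get("KANBAN_ORCHESTRATOR_ROUTING_DISABLE") == "1":
        return None  # installed but inert
    if tool_name != "kanban_create":
        return None
    # Assumption: HERMES_PROFILE identifies the front desk, and a process
    # already running inside a kanban task (HERMES_KANBAN_TASK set) is exempt.
    if env.get("HERMES_PROFILE") != "front-desk" or env.get("HERMES_KANBAN_TASK"):
        return None
    lane = args.get("assignee")  # hypothetical argument key
    if lane == ORCHESTRATOR_LANE:
        return None
    return {
        "action": "block",
        "message": f"route new tasks to {ORCHESTRATOR_LANE!r}, not {lane!r}",
    }


def run_tool(tool_name: str, args: dict, env=os.environ) -> dict:
    """Executor side: wrap a block verdict as {"error": <message>}."""
    verdict = pre_tool_call(tool_name, args, env)
    if verdict and verdict.get("action") == "block":
        return {"error": verdict["message"]}
    return {"ok": True}  # placeholder for the real tool dispatch
```

Because the check lives in a hook rather than in the handler, disabling the plugin or leaving it out of the enabled list removes the guard entirely, which is the config-gating tradeoff the commit calls out.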
verkyyi added a commit to verkyyi/hermes-agent that referenced this pull request May 28, 2026
…face

A merge-surface budget tool: ranks tracked source files by the numstat
deletions/modifications column (edits to lines upstream owns — what actually
conflicts on a sync), reporting new files / pure additions separately as
low-risk. Defaults to `verky/deploy` vs the local upstream mirror `main`
(fallback `upstream/main`). `--json` for machines; `--check N` exits 1 if any
source file exceeds a per-file modified-line budget (CI gate).

Makes "minimize merge surface" a tracked number instead of a one-time cleanup,
and shows the payoff of moving patches onto extension points (NousResearch#6, NousResearch#14). Current
hot files: gateway/run.py (~622), hermes_cli/kanban_db.py (~108).

Smoke-verified: default report, --json, --check both directions; ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
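
The ranking idea in the commit above can be sketched as follows, assuming the input is the text produced by `git diff --numstat` (tab-separated added count, deleted count, and path, with `-` for binary files). The function names are hypothetical and this is not the tool's actual implementation.

```python
from typing import List, Tuple

Row = Tuple[str, int, int]  # (path, added, deleted)


def parse_numstat(text: str) -> List[Row]:
    """Parse `git diff --numstat` output into (path, added, deleted) rows."""
    rows = []
    for line in text.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3 or parts[0] == "-":  # "-" marks binary files
            continue
        rows.append((parts[2], int(parts[0]), int(parts[1])))
    return rows


def rank_by_deleted(rows: List[Row]) -> List[Row]:
    """Highest deleted-line count first; pure additions (deleted == 0) are dropped."""
    risky = [r for r in rows if r[2] > 0]
    return sorted(risky, key=lambda r: r[2], reverse=True)


def check_budget(rows: List[Row], budget: int) -> int:
    """Exit code for a CI gate: 1 if any file exceeds the per-file budget."""
    return 1 if any(deleted > budget for _, _, deleted in rows) else 0
```

Files with zero deletions are set aside because they only add lines, which the commit describes as low-risk on a sync.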
teddyjfpender added a commit to teddyjfpender/superforecasting-agent that referenced this pull request May 28, 2026
…etection, workflow report, parallel batch import

Five items from the Texas Senate review.

1. Bug fix — /api-key now appears in the TUI catalog/autocomplete. The slash was
   wired locally in core.ts (and worked when typed) but was missing from the
   server-side hermes_cli/commands.COMMAND_REGISTRY that feeds commands.catalog.
   Added a CommandDef("api-key", …) entry (aliases apikey/api-keys/keys,
   subcommands list/show/set/unset) and listed "api-key" in
   FORECAST_DESK_SUBCOMMANDS so /forecast api-key also tab-completes.

2. Watched-source auto-attach (NousResearch#2). New auto_watch boolean on
   forecast_ledger.import_source_evidence: after a successful import, attach the
   (source_type, source) tuple as a watched source on the question, deduped
   against existing watches. The response now includes watched_source +
   auto_watch_note so the agent sees whether the watch was new or already there.
   Future reruns start from a known identifier instead of broad search.

3. Blocked-source detection (NousResearch#6). New detect_block_page() recognises Cloudflare
   challenges, DataDome / PerimeterX / Imperva walls, cookie / JS-required
   pages, and generic CAPTCHAs disguised as HTTP 200. _archive_url_evidence_
   snapshot now runs the detector after fetching and writes blocked + reason +
   signal into both the snapshot metadata file AND the evidence row's top-level
   metadata. show_question / list_evidence and the workflow report (below) can
   surface "blocked" without opening the snapshot file. Patterns are
   conservative; the matched signal is exposed for audit.

4. workflow_report action (NousResearch#5). forecast_ledger action="workflow_report"
   aggregates one question's recent ledger activity into a compact report:
   evidence by source_type + by domain, blocked counts/reasons/items, snapshot
   probability deltas, watched-source / model-run / alert / postmortem
   breakdowns, a chronological timeline (capped via limit_timeline), and a
   heuristic `suggestions` list (e.g. "X blocked rows — prefer the structured
   adapter", "FRED in use but FRED_API_KEY not set", "unwatched repeated
   domain — pass auto_watch=true"). Designed for postmortems and to spot what
   slowed a session.

5. Parallel batch import (NousResearch#7). New action="import_source_evidence_batch" takes
   sources=[{source_type, source, …}, …] plus concurrency (default 4, capped 8)
   and fetches every entry concurrently in a ThreadPoolExecutor, then writes
   evidence rows sequentially in the main thread (keeps SQLite single-writer).
   Per-source failures are captured in results without aborting the batch.
   Verified: 4 FRED series — sum of individual fetches 4.27s, wall 1.23s ≈
   3.5× speedup. auto_watch propagates per-source with the same dedup.
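
Item 5 above (concurrent fetch, sequential write) can be sketched like this. The `evidence` table schema, the `fetch` callback, and the result dict shape are assumptions for illustration; only the default concurrency of 4, the cap of 8, and the fetch-then-write split come from the commit text.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

MAX_CONCURRENCY = 8


def import_batch(conn: sqlite3.Connection, sources: List[dict],
                 fetch: Callable[[dict], str], concurrency: int = 4) -> List[dict]:
    """Fetch every source in parallel, then write rows one at a time."""
    if not sources:
        raise ValueError("sources must not be empty")
    workers = max(1, min(concurrency, MAX_CONCURRENCY))

    def safe_fetch(src: dict) -> dict:
        try:
            return {"source": src["source"], "body": fetch(src), "error": None}
        except Exception as exc:  # one bad source must not abort the batch
            return {"source": src["source"], "body": None, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(safe_fetch, sources))  # preserves input order

    # Writes stay on the calling thread so SQLite sees a single writer.
    for r in results:
        if r["error"] is None:
            conn.execute("INSERT INTO evidence (source, body) VALUES (?, ?)",
                         (r["source"], r["body"]))
    conn.commit()
    return results
```

Catching exceptions inside the worker keeps each failure attached to its own result entry, so the caller can report per-source errors without losing the successful rows.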

Tests: 8 new tool tests (auto_watch attach + dedup + off, workflow_report
aggregation + blocked suggestions, batch parallelism + per-source error
isolation + empty rejection) + the ledger blocked-URL test; full forecasting +
hermes_cli + ui-tui suites green.
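
For item 3 above, a block-page detector could be shaped roughly like this. The patterns are placeholders chosen for illustration, since the commit does not list the real signals, and the return shape simply mirrors the `blocked` + `reason` + `signal` fields it mentions.

```python
import re
from typing import Optional

# Hypothetical patterns; the real detector's signal list is not shown in the commit.
BLOCK_SIGNALS = [
    ("cloudflare_challenge",
     re.compile(r"just a moment\.\.\.|checking your browser", re.I)),
    ("captcha", re.compile(r"captcha", re.I)),
    ("js_required", re.compile(r"enable javascript|javascript is required", re.I)),
]


def detect_block_page(html: str) -> Optional[dict]:
    """Return blocked/reason/signal for a block page served as HTTP 200, else None."""
    for reason, pattern in BLOCK_SIGNALS:
        match = pattern.search(html)
        if match:
            # Expose the matched text so a reviewer can audit the decision.
            return {"blocked": True, "reason": reason, "signal": match.group(0)}
    return None
```

Returning the matched text alongside the reason is what lets a reviewer audit a false positive without reopening the snapshot.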

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mathias3 pushed a commit to mathias3/hermes-agent that referenced this pull request May 28, 2026
…te + cell_size_check + synchronous=FULL)
Bryce-huang pushed a commit to wbkunlun/hermes-agent that referenced this pull request May 29, 2026
…te + cell_size_check + synchronous=FULL)

#AI commit#
marcolivierlavoie added a commit to marcolivierlavoie/hermes-agent that referenced this pull request May 29, 2026
AC NousResearch#6 acceptance criteria: verify that a capped task preserves state and
resumes/continues without Marco re-prompting.

20 new tests covering:
- Artifact work handle extraction (BIF card + chat_id + user_id)
- Verification state (unfinished, not done/success claim)
- Last completed step and next action fields
- Auto-continue decision logic (depth gate, completed/interrupted guards,
  platform scoping, emergency mode still triggers, non-standard exit reasons)
- User-facing handoff text ("continuing automatically", no success claim,
  with/without auto-continue variants)
- Synthetic continuation text (system instruction, task card ref, fresh
  budget, empty summary handling)
- Gateway integration: artifact write + disk persistence + synthetic
  internal MessageEvent queuing
- Restart survivability (reload artifact from persisted JSON)
- Regression: turn_exit_reason alias miss, interrupted/failed caps
mosaiq-systems pushed a commit to mosaiq-systems/hermes-agent that referenced this pull request May 29, 2026
…te + cell_size_check + synchronous=FULL)
MattKotsenas added a commit to MattKotsenas/hermes-agent that referenced this pull request May 30, 2026
Match the other sandbox backends' per-init filesystem isolation. Docker
stamps a fresh 'hermes-<uuid>' container name on every _init (docker.py:508),
so a destroyed-then-recreated env always sees a brand-new filesystem.

Gondolin's sandbox_dir is deterministic from task_id, and _setup_overlay_mounts
keeps the scratch dir (overlays/<safe>/{upper,work,merged}) on disk across env
lifecycles. The next env that mounts the same guest_path under the same
sandbox_dir inherits the prior session's writes via the persisted upper layer
— a real cross-session contamination bug, not just a disk leak.

Fix: _teardown_overlay_mounts now rmtrees the per-mount scratch dir
(merged.parent) after the lazy unmount returns. Lazy unmount + open-fd-keeps-
inode-alive means this is safe even if the daemon hasn't fully released
handles. Crash recovery still preserves upper/ because the import-time
sweep only unmounts and never rmtrees.
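
The teardown change can be illustrated with a small sketch that uses plain directories in place of a real overlay mount. `setup_overlay` and `teardown_overlay` are hypothetical stand-ins for `_setup_overlay_mounts` and `_teardown_overlay_mounts`, and the `unmount` callback is a placeholder for the lazy unmount, whose actual call is not shown in the commit.

```python
import shutil
from pathlib import Path
from typing import Callable


def setup_overlay(sandbox_dir: Path, safe_name: str) -> Path:
    """Create overlays/<safe>/{upper,work,merged} and return the merged path."""
    base = sandbox_dir / "overlays" / safe_name
    for sub in ("upper", "work", "merged"):
        (base / sub).mkdir(parents=True, exist_ok=True)
    return base / "merged"


def teardown_overlay(merged: Path, unmount: Callable[[Path], None]) -> None:
    """Unmount, then delete the whole per-mount scratch dir (merged.parent)."""
    unmount(merged)  # placeholder for the lazy unmount
    shutil.rmtree(merged.parent, ignore_errors=True)
```

Because the scratch path is derived deterministically from the sandbox directory and mount name, removing it at teardown is what stops the next environment from inheriting the previous upper layer.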

This also closes design-doc revisit item NousResearch#9 (failed-init cleanup).

Test:
  tests/integration/test_gondolin_terminal.py::test_overlay_writes_do_not_leak_
  between_env_lifecycles

A KVM-gated integration test that asserts the behavioural invariant via
the public GondolinEnvironment.execute() API: env1 writes a file into an
overlay extra_mount, env2 (same sandbox_dir, same mount config) must not
see it. Implementation-agnostic — no mention of upper/ or fuse-overlayfs —
so a future migration to a custom upstream VFSProvider (the @earendil-works/
gondolin package ships vfs/provider) satisfies the same contract trivially
and the test passes for free.

Doc updates (DO NOT MERGE revisit list):
  - NousResearch#9 marked resolved (this fix)
  - NousResearch#6 narrowed: lists the one test we now have and what's still missing
  - NousResearch#10 added: task_id='default' is shared across all top-level agents at
    the hermes/gateway layer; concurrent-tenancy isolation needs a
    per-session task_id and is out of scope for this branch
  - NousResearch#11 added: overlay=true + missing readonly is a silent UX trap
    (host-side scratch is created, daemon makes guest mount EROFS)

Regression: all 118 gondolin unit + integration tests pass.

DO NOT MERGE — see docs/design/gondolin-terminal-backend.md.
praxstack added a commit to praxstack/NousResearch-hermes-agent that referenced this pull request May 30, 2026
…ache_stats
3 participants