Fix VM instance sharing across tasks #6
Merged
hjc-puro
commented
Nov 4, 2025
    # Clean up VM for this task after conversation completes
    try:
        cleanup_vm(effective_task_id)
Contributor
Author
@teknium1 this part is a bit hacky - I can take it out if you're ok with instances running for ~5 mins after convo ends
sudo-yf
pushed a commit
to sudo-yf/hermes-agent
that referenced
this pull request
Apr 5, 2026
- Update _PROVIDER_MODELS['minimax'] from stale ABAB 6.5 models to current MiniMax-M2.7/M2.5/M2.1 lineup (matching hermes-agent upstream)
- Update _PROVIDER_MODELS['zai'] from GLM-4 to current GLM-5/4.7/4.5 lineup (matching hermes-agent upstream)
- Extend resolve_model_provider() to also return base_url from config.yaml, so providers with custom endpoints (MiniMax, Z.AI) are routed correctly
- Pass base_url to AIAgent in both streaming and sync chat paths

Fixes NousResearch#6

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6 tasks
h4x3rotab
referenced
this pull request
in Clawdi-AI/hermes-agent
Apr 10, 2026
Phase4.1 smart suggestions
h4x3rotab
referenced
this pull request
in Clawdi-AI/hermes-agent
Apr 10, 2026
#3 Cost Analytics Dashboard
- New Analytics tab with summary cards (total tokens, cost, avg/mission, today, week)
- CSS bar charts: cost by agent, cost by model, daily timeline (7d)
- No external chart libraries; pure Tailwind

#4 Export Mission as Markdown
- Download .md file with full mission report (goal, team, transcript, artifacts)
- Copy to clipboard button with visual feedback
- Wired into Mission Detail Overlay

#5 Word-by-word Streaming in Agent Chat
- Replaced polling with SSE EventSource in AgentChatPanel
- Real-time chunk streaming with fallback to polling on error
- Streaming assistant message updates in-place

#6 Remote Agents Panel
- Fetches external sessions from gateway /api/sessions
- Filters out local agent sessions, showing only remote/external
- Auto-polls every 15s, card layout with status, model, tokens, cost
- Open Chat links to ClawSuite chat tab

#7 Real-time Collaboration (Presence)
- BroadcastChannel-based cross-tab presence detection
- Shows colored avatars of other users viewing Agent Hub
- Heartbeat every 5s, stale cleanup at 30s
- Shows which tab each peer is viewing
Vex-Dravex
added a commit
to Vex-Dravex/hermes-agent
that referenced
this pull request
Apr 10, 2026
Vex-Dravex
added a commit
to Vex-Dravex/hermes-agent
that referenced
this pull request
Apr 11, 2026
malaiwah
pushed a commit
to malaiwah/hermes-agent
that referenced
this pull request
Apr 11, 2026
…lity + make configurable' (NousResearch#6) from fix/pids-limit-cgroup-probe into main
gary-the-ai
pushed a commit
to gary-the-ai/hermes-web-console-gui
that referenced
this pull request
Apr 11, 2026
…t, stop/undo honesty, json_error crash, codex validation, deep-link race

Bug #1: ChatPage loadSession reads res.items (not res.transcript) to match backend
Bug NousResearch#2: Add GET /api/gui/session-search backed by SessionDB.search_messages (FTS5)
Bug NousResearch#3: Stop button now checks res.supported before claiming run was stopped
Bug NousResearch#4: Undo button now checks res.supported before removing messages locally
Bug NousResearch#5: Fix _json_error positional calls in handle_chat_compress (was crashing 500)
Bug NousResearch#6: Codex provider validation now also guards switching TO openai-codex
Bug NousResearch#7: Deep-link hash check runs before health callback to prevent race condition

The fix has been merged into main and verified. host.get now correctly uses selectParentTemplates to retrieve templates for hosts.
malaiwah
pushed a commit
to malaiwah/hermes-agent
that referenced
this pull request
Apr 13, 2026
- connection.py: cap header read at 8KB to prevent DoS from malicious handler
- handler.py: use .find() instead of `in` + .index() to eliminate race in patch
- handler.py: add truncated field to execute response when output exceeds 50KB
- server.py: include error data field in formatted error messages
- test: add timeout to test client recv, handle TimeoutExpired in close

Fixes issues NousResearch#1, NousResearch#4, NousResearch#5, NousResearch#6, NousResearch#8, NousResearch#10 from Qwen 3.5 peer review on PR NousResearch#19.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This was referenced Apr 17, 2026
kshitijk4poor
pushed a commit
that referenced
this pull request
May 27, 2026
…te + cell_size_check + synchronous=FULL)

Production corruption #6 left b-tree pages with zeroed headers but intact old cell content, the Bug E pattern. This fix applies three pragma calls on every connect():

- synchronous=FULL (was NORMAL): closes the WAL-checkpoint reordering window where a crash between WAL commit and main-DB write leaves a partially-written b-tree page header. Cost is <1ms per commit on local SSD; negligible at kanban write volume.
- secure_delete=ON: forces SQLite to zero freed page bytes on disk. If a torn write or hardware fault later corrupts a page, the underlying cell content is zero, so corruption is detectable and no stale rows can resurface as live data.
- cell_size_check=ON: adds a read-side guard so corrupt cells surface as errors at read time rather than as silent wrong-data returns.

All three are connection-scoped and re-applied on every connect(). secure_delete also writes a persistent flag into the DB header on the first call against a fresh DB, making the protection durable across processes for new DBs.

Tests added for all four required cases: each pragma active on a fresh connection, and all three re-applied after close+reopen. Also adds the required negative test (migration path does not reset pragmas).
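The "re-apply on every connect()" pattern described in this commit message might look roughly like the sketch below. It uses Python's standard `sqlite3` module and the three pragma names from the message. The helper names `connect` and `pragma_value` and the `_PRAGMAS` table are assumptions for illustration, not the project's actual code.

```python
import sqlite3

# Pragmas named in the commit message; applied on every new connection
# because they are connection-scoped settings.
_PRAGMAS = (
    ("synchronous", "FULL"),
    ("secure_delete", "ON"),
    ("cell_size_check", "ON"),
)


def connect(path: str) -> sqlite3.Connection:
    """Open a connection and apply the hardening pragmas before returning it."""
    conn = sqlite3.connect(path)
    for name, value in _PRAGMAS:
        # Names and values come from the constant table above, never from
        # user input, so string formatting is acceptable here.
        conn.execute(f"PRAGMA {name}={value}")
    return conn


def pragma_value(conn: sqlite3.Connection, name: str):
    """Read back a pragma; returns None if this SQLite build does not know it."""
    row = conn.execute(f"PRAGMA {name}").fetchone()
    return None if row is None else row[0]
```

Reading the values back with `pragma_value` after a close and reopen is one way to write the kind of "each pragma active on a fresh connection" test the message mentions.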
praxstack
added a commit
to praxstack/NousResearch-hermes-agent
that referenced
this pull request
May 27, 2026
…ache_stats

AWS Bedrock Converse returns `usage.cacheReadInputTokens` / `cacheWriteInputTokens` (camelCase) when cachePoint markers fire on the request, but `normalize_converse_response` was dropping both fields on the floor, reading only `inputTokens` and `outputTokens`. This made prompt caching on non-Claude Bedrock models (Nova, Llama, DeepSeek) appear to give zero discount in Hermes telemetry, even when AWS was actually charging the cache-read rate.

Fix across three layers:

1. `agent/bedrock_adapter.py` (normalize_converse_response): surface `cacheReadInputTokens` and `cacheWriteInputTokens` on the returned SimpleNamespace. Expose both camelCase (Bedrock-native) and snake_case (Anthropic-convention) aliases so downstream normalizers can use whichever they already read.
2. `agent/transports/types.py` (Usage dataclass): add `cache_creation_tokens` alongside the existing `cached_tokens` field. Updates the docstring to make it clear both are populated when the upstream provider surfaces them.
3. `agent/transports/bedrock.py` (BedrockTransport.normalize_response and new extract_cache_stats): populate the new Usage fields when normalizing and add an extract_cache_stats method that mirrors AnthropicTransport's so telemetry consumers can be transport-agnostic.

Semantics match Bedrock docs: `inputTokens` represents NEW/uncached input tokens billed at full rate; cache-read/write tokens are reported separately and are NOT double-counted inside `inputTokens`. Pricing reconciliation consumers can sum all three for true prompt size.

26 new tests in tests/agent/transports/test_bedrock_cache_telemetry.py covering normalization, transport propagation, extract_cache_stats parity with the Anthropic transport, zero-value handling, and both SimpleNamespace and raw dict input shapes.

Closes gap NousResearch#6 identified in the Phase 2 re-verification (PraxVault/Hermes/Reference/Decisions/bedrock-phase2-audit/04-current-architecture).
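As a rough illustration of the normalization described above, the sketch below surfaces the two cache counters from a Converse-style `usage` dict and sums all three input counters for the true prompt size. The function names `normalize_usage` and `total_prompt_tokens` are hypothetical. Mapping the camelCase fields onto `cached_tokens` and `cache_creation_tokens` is an assumption based on the field names in the message, not the adapter's actual code.

```python
from types import SimpleNamespace


def normalize_usage(usage: dict) -> SimpleNamespace:
    """Surface cache counters instead of dropping them; missing fields become 0."""
    read = int(usage.get("cacheReadInputTokens") or 0)
    write = int(usage.get("cacheWriteInputTokens") or 0)
    return SimpleNamespace(
        inputTokens=int(usage.get("inputTokens") or 0),
        outputTokens=int(usage.get("outputTokens") or 0),
        # Bedrock-native camelCase names.
        cacheReadInputTokens=read,
        cacheWriteInputTokens=write,
        # snake_case aliases (assumed mapping, for illustration only).
        cached_tokens=read,
        cache_creation_tokens=write,
    )


def total_prompt_tokens(u: SimpleNamespace) -> int:
    """inputTokens excludes cached tokens, so the full prompt is the sum of all three."""
    return u.inputTokens + u.cacheReadInputTokens + u.cacheWriteInputTokens
```

For a response with 100 uncached, 900 cache-read, and 50 cache-write input tokens, `total_prompt_tokens` returns 1050, whereas a normalizer that reads only `inputTokens` would report 100.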
verkyyi
added a commit
to verkyyi/hermes-agent
that referenced
this pull request
May 28, 2026
…tor-routing plugin

Lift the reject-only `create` routing invariant out of an inline edit in the hot upstream file `tools/kanban_tools.py::_handle_create` into a new opt-in standalone plugin that registers a `pre_tool_call` hook (LOCAL_PATCHES.md NousResearch#6). The hook returns `{"action":"block","message":...}` for a front-desk `kanban_create` aimed at a non-orchestrator lane; the executor wraps it as `{"error": <message>}`, the same shape the old inline `tool_error` produced. Scope/exemptions unchanged (reads the same HERMES_PROFILE / HERMES_KANBAN_TASK env vars). `_handle_create` is now routing-agnostic; the only remaining core touch is upstream's own `pre_tool_call` dispatch in `agent/tool_executor.py`.

Tradeoff: standalone plugins are opt-in (`plugins.enabled`), so this safety invariant is now config-gated rather than always-on. Deploy config must enable `kanban-orchestrator-routing`; `KANBAN_ORCHESTRATOR_ROUTING_DISABLE=1` keeps it installed but inert. Pinned by an integration test.

Tests retargeted to the hook layer + new coverage (guardrails, routing-agnostic handler, real plugin-manager wiring AND opt-in gating): 10 passed. Broader kanban/orchestrator suite green (179 passed / 1 skipped / 20 xfailed). ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
verkyyi
added a commit
to verkyyi/hermes-agent
that referenced
this pull request
May 28, 2026
…face

A merge-surface budget tool: ranks tracked source files by the numstat deletions/modifications column (edits to lines upstream owns: what actually conflicts on a sync), reporting new files / pure additions separately as low-risk. Defaults to `verky/deploy` vs the local upstream mirror `main` (fallback `upstream/main`). `--json` for machines; `--check N` exits 1 if any source file exceeds a per-file modified-line budget (CI gate).

Makes "minimize merge surface" a tracked number instead of a one-time cleanup, and shows the payoff of moving patches onto extension points (NousResearch#6, NousResearch#14). Current hot files: gateway/run.py (~622), hermes_cli/kanban_db.py (~108).

Smoke-verified: default report, --json, --check both directions; ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
teddyjfpender
added a commit
to teddyjfpender/superforecasting-agent
that referenced
this pull request
May 28, 2026
…etection, workflow report, parallel batch import
Five items from the Texas Senate review.
1. Bug fix — /api-key now appears in the TUI catalog/autocomplete. The slash was
wired locally in core.ts (and worked when typed) but was missing from the
server-side hermes_cli/commands.COMMAND_REGISTRY that feeds commands.catalog.
Added a CommandDef("api-key", …) entry (aliases apikey/api-keys/keys,
subcommands list/show/set/unset) and listed "api-key" in
FORECAST_DESK_SUBCOMMANDS so /forecast api-key also tab-completes.
2. Watched-source auto-attach (NousResearch#2). New auto_watch boolean on
forecast_ledger.import_source_evidence: after a successful import, attach the
(source_type, source) tuple as a watched source on the question, deduped
against existing watches. The response now includes watched_source +
auto_watch_note so the agent sees whether the watch was new or already there.
Future reruns start from a known identifier instead of broad search.
3. Blocked-source detection (NousResearch#6). New detect_block_page() recognises Cloudflare
challenges, DataDome / PerimeterX / Imperva walls, cookie / JS-required
pages, and generic CAPTCHAs disguised as HTTP 200. _archive_url_evidence_
snapshot now runs the detector after fetching and writes blocked + reason +
signal into both the snapshot metadata file AND the evidence row's top-level
metadata. show_question / list_evidence and the workflow report (below) can
surface "blocked" without opening the snapshot file. Patterns are
conservative; the matched signal is exposed for audit.
4. workflow_report action (NousResearch#5). forecast_ledger action="workflow_report"
aggregates one question's recent ledger activity into a compact report:
evidence by source_type + by domain, blocked counts/reasons/items, snapshot
probability deltas, watched-source / model-run / alert / postmortem
breakdowns, a chronological timeline (capped via limit_timeline), and a
heuristic `suggestions` list (e.g. "X blocked rows — prefer the structured
adapter", "FRED in use but FRED_API_KEY not set", "unwatched repeated
domain — pass auto_watch=true"). Designed for postmortems and to spot what
slowed a session.
5. Parallel batch import (NousResearch#7). New action="import_source_evidence_batch" takes
sources=[{source_type, source, …}, …] plus concurrency (default 4, capped 8)
and fetches every entry concurrently in a ThreadPoolExecutor, then writes
evidence rows sequentially in the main thread (keeps SQLite single-writer).
Per-source failures are captured in results without aborting the batch.
Verified: 4 FRED series — sum of individual fetches 4.27s, wall 1.23s ≈
3.5× speedup. auto_watch propagates per-source with the same dedup.
Tests: 8 new tool tests (auto_watch attach + dedup + off, workflow_report
aggregation + blocked suggestions, batch parallelism + per-source error
isolation + empty rejection) + the ledger blocked-URL test; full forecasting +
hermes_cli + ui-tui suites green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
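The batch import in item 5 (concurrent fetch, sequential write, per-source error isolation) can be sketched as below. This is a simplified illustration under stated assumptions: `import_batch`, `fetch`, and `write` are hypothetical names, and only the default concurrency of 4, the cap of 8, and the use of `ThreadPoolExecutor` come from the commit message.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 8  # cap stated in the commit message


def import_batch(sources, fetch, write, concurrency=4):
    """Fetch all sources in parallel, then write results one at a time.

    `fetch(source)` and `write(source, payload)` are caller-supplied
    callables (hypothetical stand-ins for the real fetch and ledger write).
    """
    if not sources:
        raise ValueError("sources must not be empty")
    workers = max(1, min(concurrency, MAX_CONCURRENCY))

    def safe_fetch(source):
        # Capture the failure per source so one bad entry cannot abort the batch.
        try:
            return {"source": source, "ok": True, "payload": fetch(source)}
        except Exception as exc:
            return {"source": source, "ok": False, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order in its results.
        results = list(pool.map(safe_fetch, sources))

    # Writes happen sequentially on the calling thread, which keeps a
    # single-writer store such as SQLite away from the worker threads.
    for item in results:
        if item["ok"]:
            write(item["source"], item["payload"])
    return results
```

Keeping only the network-bound fetch inside the pool is what yields the wall-clock speedup, while the database still sees a single writer.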
mathias3
pushed a commit
to mathias3/hermes-agent
that referenced
this pull request
May 28, 2026
Bryce-huang
pushed a commit
to wbkunlun/hermes-agent
that referenced
this pull request
May 29, 2026
marcolivierlavoie
added a commit
to marcolivierlavoie/hermes-agent
that referenced
this pull request
May 29, 2026
AC NousResearch#6 acceptance criteria: verify that a capped task preserves state and resumes/continues without Marco re-prompting.

20 new tests covering:
- Artifact work handle extraction (BIF card + chat_id + user_id)
- Verification state (unfinished, not done/success claim)
- Last completed step and next action fields
- Auto-continue decision logic (depth gate, completed/interrupted guards, platform scoping, emergency mode still triggers, non-standard exit reasons)
- User-facing handoff text ("continuing automatically", no success claim, with/without auto-continue variants)
- Synthetic continuation text (system instruction, task card ref, fresh budget, empty summary handling)
- Gateway integration: artifact write + disk persistence + synthetic internal MessageEvent queuing
- Restart survivability (reload artifact from persisted JSON)
- Regression: turn_exit_reason alias miss, interrupted/failed caps
mosaiq-systems
pushed a commit
to mosaiq-systems/hermes-agent
that referenced
this pull request
May 29, 2026
MattKotsenas
added a commit
to MattKotsenas/hermes-agent
that referenced
this pull request
May 30, 2026
Match the other sandbox backends' per-init filesystem isolation. Docker
stamps a fresh 'hermes-<uuid>' container name on every _init (docker.py:508),
so a destroyed-then-recreated env always sees a brand-new filesystem.
Gondolin's sandbox_dir is deterministic from task_id, and _setup_overlay_mounts
keeps the scratch dir (overlays/<safe>/{upper,work,merged}) on disk across env
lifecycles. The next env that mounts the same guest_path under the same
sandbox_dir inherits the prior session's writes via the persisted upper layer
— a real cross-session contamination bug, not just a disk leak.
Fix: _teardown_overlay_mounts now rmtrees the per-mount scratch dir
(merged.parent) after the lazy unmount returns. Lazy unmount + open-fd-keeps-
inode-alive means this is safe even if the daemon hasn't fully released
handles. Crash recovery still preserves upper/ because the import-time
sweep only unmounts and never rmtrees.
This also closes design-doc revisit item NousResearch#9 (failed-init cleanup).
Test:
tests/integration/test_gondolin_terminal.py::test_overlay_writes_do_not_leak_
between_env_lifecycles
A KVM-gated integration test that asserts the behavioural invariant via
the public GondolinEnvironment.execute() API: env1 writes a file into an
overlay extra_mount, env2 (same sandbox_dir, same mount config) must not
see it. Implementation-agnostic — no mention of upper/ or fuse-overlayfs —
so a future migration to a custom upstream VFSProvider (the @earendil-works/
gondolin package ships vfs/provider) satisfies the same contract trivially
and the test passes for free.
Doc updates (DO NOT MERGE revisit list):
- NousResearch#9 marked resolved (this fix)
- NousResearch#6 narrowed: lists the one test we now have and what's still missing
- NousResearch#10 added: task_id='default' is shared across all top-level agents at
the hermes/gateway layer; concurrent-tenancy isolation needs a
per-session task_id and is out of scope for this branch
- NousResearch#11 added: overlay=true + missing readonly is a silent UX trap
(host-side scratch is created, daemon makes guest mount EROFS)
Regression: all 118 gondolin unit + integration tests pass.
DO NOT MERGE — see docs/design/gondolin-terminal-backend.md.
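The teardown ordering described in this commit (lazy unmount first, then remove the per-mount scratch directory, `merged.parent`) might be sketched as follows. The real `_teardown_overlay_mounts` is not shown in this thread, so `teardown_overlay_mount` and the injected `lazy_unmount` callable are hypothetical. The unmount itself is left to the caller so the sketch makes no claim about which command the backend runs.

```python
import shutil
from pathlib import Path
from typing import Callable


def teardown_overlay_mount(merged: Path, lazy_unmount: Callable[[Path], None]) -> None:
    """Unmount the overlay, then delete its scratch dir (upper/work/merged).

    Removing merged.parent means the next environment that mounts the same
    guest path starts from an empty upper layer rather than inheriting the
    previous session's writes.
    """
    lazy_unmount(merged)
    # ignore_errors keeps teardown best-effort, e.g. if the dir is already gone.
    shutil.rmtree(merged.parent, ignore_errors=True)
```

A test in the spirit of the one named above would write a file through the first environment, tear it down, recreate the scratch layout, and assert that the file is no longer visible.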
praxstack
added a commit
to praxstack/NousResearch-hermes-agent
that referenced
this pull request
May 30, 2026