feat: add execution receipts and direct terminal work-order lane #8402

Closed
MestreY0d4-Uninter wants to merge 1 commit into NousResearch:main from MestreY0d4-Uninter:h007-receipts-direct-lane

MestreY0d4-Uninter commented Apr 12, 2026

Contributor

Summary

  • add persisted execution receipts plus a SQLite receipt ledger
  • expose receipt reconcile/prune/maintenance surfaces in tools, CLI, and slash commands
  • formalize a narrow direct terminal work-order lane on leased warm workers and surface runtime reuse metadata in receipts

H007 line overview

This PR is part 1 of the H007 line: a narrow, evidence-backed durable execution fabric for direct structured terminal work orders.

The line is intentionally split for reviewability:

  • Part 1 (this PR): auditable execution substrate + receipt surfaces + direct lane/runtime metadata
  • Part 2 (#8403, "feat: add durable execution work orders"): durable execution work orders with queue/run_due, retry, stale-run reclaim, and runner controls

The key goal is to keep the claim narrow and testable rather than implying that general orchestration is solved.

Motivation

Hermes can already delegate work, but delegated execution is hard to audit and hard to reason about operationally.

This PR makes delegated execution inspectable:

  • receipts are persisted under HERMES_HOME
  • receipts are indexed in SQLite for query/reconcile/prune
  • operator surfaces exist in both CLI and slash commands
  • the narrow direct terminal work-order lane now surfaces worker/runtime reuse metadata explicitly
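The persistence model described above (a JSON artifact per receipt plus a SQLite index over those artifacts) can be sketched roughly as follows. This is a minimal illustration, not the code in `tools/execution_receipts.py`: the `~/.hermes` fallback, the `execution-receipts` directory, the `ledger.sqlite3` file name, the table schema, and every function name here are assumptions made for the example.

```python
import json
import os
import sqlite3
import time
import uuid
from pathlib import Path


def receipts_dir() -> Path:
    # Assumption: HERMES_HOME falls back to ~/.hermes; the subdirectory name is illustrative.
    home = Path(os.environ.get("HERMES_HOME", str(Path.home() / ".hermes")))
    path = home / "execution-receipts"
    path.mkdir(parents=True, exist_ok=True)
    return path


def open_ledger(root: Path) -> sqlite3.Connection:
    # Hypothetical schema: one row per receipt, pointing at its JSON artifact.
    conn = sqlite3.connect(str(root / "ledger.sqlite3"))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS receipts ("
        " receipt_id TEXT PRIMARY KEY,"
        " task_id TEXT NOT NULL,"
        " status TEXT NOT NULL,"
        " created_at REAL NOT NULL,"
        " path TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_receipts_task ON receipts(task_id)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_receipts_status ON receipts(status)")
    return conn


def write_receipt(conn: sqlite3.Connection, root: Path, task_id: str, status: str, **metadata) -> str:
    receipt_id = uuid.uuid4().hex
    created_at = time.time()
    record = {
        "receipt_id": receipt_id,
        "task_id": task_id,
        "status": status,
        "created_at": created_at,
        **metadata,
    }
    artifact = root / f"{receipt_id}.json"
    tmp = root / f"{receipt_id}.json.tmp"
    # Write to a temp file first, then rename, so readers never see a partial receipt.
    tmp.write_text(json.dumps(record, sort_keys=True, indent=2), encoding="utf-8")
    os.replace(tmp, artifact)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO receipts VALUES (?, ?, ?, ?, ?)",
            (receipt_id, task_id, status, created_at, str(artifact)),
        )
    return receipt_id


def query_by_status(conn: sqlite3.Connection, status: str) -> list:
    rows = conn.execute(
        "SELECT receipt_id FROM receipts WHERE status = ? ORDER BY created_at",
        (status,),
    )
    return [row[0] for row in rows]
```

Writing the JSON artifact before the ledger row means a crash between the two steps leaves an unindexed file rather than a dangling index entry. That is the kind of drift a reconcile pass could repair by rescanning the directory.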

What this PR changes

  • add tools/execution_receipts.py
    • durable JSON receipt artifacts
    • SQLite receipt ledger
    • reconcile/prune/query helpers
  • add tools/execution_receipts_tool.py
    • list, query, prune, reconcile, maintenance_status, install_maintenance, remove_maintenance
  • add operator-facing receipt surfaces
    • hermes receipts ...
    • /receipts ...
  • forward execution_envelope and context_package through the delegate path
  • persist richer execution metadata
    • execution path
    • envelope digest
    • worker mode / lease key / task ID
    • runtime kind / runtime ID / runtime reuse
  • harden the direct terminal work-order substrate and persistent Docker cwd tracking
  • add focused test coverage for receipts, delegate integration, receipt maintenance, CLI entrypoints, runtime metadata, and direct work-order execution
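The PR does not say how the envelope digest is computed. One common approach, shown here purely as an assumption, is to hash a canonical JSON serialization of the envelope so that key order does not change the digest. The function names and the metadata keys below are illustrative labels for the fields listed above, not the PR's actual identifiers.

```python
import hashlib
import json


def envelope_digest(envelope: dict) -> str:
    # Canonical form: sorted keys and compact separators, so equal content gives equal bytes.
    canonical = json.dumps(envelope, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def receipt_metadata(envelope: dict, *, execution_path: str, worker_mode: str,
                     lease_key: str, task_id: str, runtime_kind: str,
                     runtime_id: str, runtime_reused: bool) -> dict:
    # Hypothetical field names mirroring the metadata categories in the list above.
    return {
        "execution_path": execution_path,
        "envelope_digest": envelope_digest(envelope),
        "worker_mode": worker_mode,
        "lease_key": lease_key,
        "task_id": task_id,
        "runtime_kind": runtime_kind,
        "runtime_id": runtime_id,
        "runtime_reused": runtime_reused,
    }
```

Storing a digest instead of the full envelope keeps receipts small. It still lets an operator check whether two runs were given the same work order.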

Validation

Focused local validation on this branch:

```bash
source /home/ubuntu/hermes-agent-dev/extraordinary-prototypes-2026-04-11/.venv/bin/activate
python -m py_compile \
  run_agent.py \
  tools/tool_result_storage.py \
  tools/environments/docker.py \
  tools/delegate_tool.py \
  tools/execution_receipts.py \
  tools/execution_receipts_tool.py \
  hermes_cli/receipts.py \
  model_tools.py \
  toolsets.py \
  hermes_cli/tools_config.py \
  hermes_cli/commands.py \
  hermes_cli/main.py \
  tests/run_agent/test_run_agent.py \
  tests/tools/test_delegate.py \
  tests/tools/test_docker_environment.py \
  tests/tools/test_tool_result_storage.py \
  tests/tools/test_execution_receipts.py \
  tests/tools/test_execution_receipts_tool.py \
  tests/hermes_cli/test_receipts.py

pytest \
  tests/run_agent/test_run_agent.py \
  tests/tools/test_delegate.py \
  tests/tools/test_docker_environment.py \
  tests/tools/test_tool_result_storage.py \
  tests/tools/test_execution_receipts.py \
  tests/tools/test_execution_receipts_tool.py \
  tests/hermes_cli/test_receipts.py \
  -q -o addopts=''
```

Result:

  • 421 passed, 1 warning

External real-run evidence for the supported direct lane that this PR surfaces:

  • classic warm mean: 4.35s
  • direct warm mean: 0.20s
  • about 95.4% lower mean latency (4.35s to 0.20s, roughly 21.75x) on the validated supported slice
  • 40/40 valid runs
  • 40/40 exact-match runs
  • one reused concrete Docker runtime ID across valid warm runs
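For reference, the percentage above follows directly from the two reported means. The short calculation below only restates that arithmetic; the variable names are illustrative.

```python
classic_warm_mean_s = 4.35  # reported classic warm mean
direct_warm_mean_s = 0.20   # reported direct warm mean

# Relative reduction in mean latency, and the equivalent speedup factor.
reduction = (classic_warm_mean_s - direct_warm_mean_s) / classic_warm_mean_s
speedup = classic_warm_mean_s / direct_warm_mean_s

print(f"latency reduction: {reduction:.1%}")
print(f"speedup factor: {speedup:.2f}x")
```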

Notes for reviewers

This PR intentionally does not claim:

  • general orchestration is solved
  • all delegation is dramatically faster
  • host-bound git worktree or host-venv pytest workloads already belong in this lane

The claim is intentionally narrow:

  • direct structured terminal work orders on leased warm workers are now auditable and operator-visible
  • the supported slice has already been benchmarked and validated externally

If the overall H007 direction looks good, the follow-up queue/control-plane PR is already prepared as #8403.

MestreY0d4-Uninter commented Apr 12, 2026

Contributor Author

Quick CI note after opening this PR:

  • I checked the red checks before assuming regression.
  • build-and-push is failing in the Dockerfile/npm layer with spawn git ENOENT while building the image; this same workflow is already red on main in recent upstream runs.
  • The repo-level test workflow is also already red on current main (I reproduced the baseline locally from a clean origin/main worktree and hit a large unrelated failure set there as well).
  • The focused H007 validation for this branch is green locally: 421 passed, 1 warning on the receipt/direct-lane slice listed in the PR body.

So the current red CI does not appear to come from this H007 change set itself. If maintainers want, I can still help split out or separately investigate the existing Docker/test baseline issues.

@MestreY0d4-Uninter

Contributor Author

Superseded by #9209 — clean reimplementation on current main (v0.9.0).

The original branch was too far behind main to refresh cleanly. #9209 reimplements the same concepts (receipt ledger, SQLite index, operator surfaces) from scratch with:

  • 20 new tests (all passing)
  • Integration with current toolsets/CLI architecture
  • Full CRUD + query + prune + reconcile + maintenance_status
  • CLI: /receipts list|get|query|prune|reconcile|status

See #9209.

MestreY0d4-Uninter added a commit to MestreY0d4-Uninter/hermes-agent that referenced this pull request Apr 14, 2026
Durable execution receipt system for Hermes — makes delegated task
execution inspectable, auditable, and manageable.

## Receipt Ledger (tools/execution_receipts.py)
- ExecutionReceipt: task_id, status, duration, execution_path, tool_calls
- JSON persistence under HERMES_HOME/execution-receipts/
- SQLite ledger indexed by task_id, timestamp, status
- CRUD: create, finalize, get, list, query, prune, reconcile
- Lazy auto-prune (~2% chance per receipt, keeps 1 week + min 50)
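The lazy auto-prune policy in the last bullet could look roughly like the sketch below. It reads "keeps 1 week + min 50" as "never delete the 50 newest receipts, and beyond those delete only receipts older than one week". That reading is an assumption, and the constant and function names are hypothetical.

```python
import random
import time

PRUNE_PROBABILITY = 0.02          # ~2% chance per receipt write
MAX_AGE_SECONDS = 7 * 24 * 3600   # one week
MIN_KEEP = 50                     # always retain at least this many receipts


def select_prunable(receipts, now=None):
    """Return ids to delete from a list of (receipt_id, created_at) pairs."""
    now = time.time() if now is None else now
    newest_first = sorted(receipts, key=lambda item: item[1], reverse=True)
    # The MIN_KEEP newest receipts are exempt regardless of age.
    candidates = newest_first[MIN_KEEP:]
    return [rid for rid, created_at in candidates if now - created_at > MAX_AGE_SECONDS]


def maybe_prune(receipts, rng=random.random, now=None):
    """Run the prune selection only on a small random fraction of calls."""
    if rng() >= PRUNE_PROBABILITY:
        return []
    return select_prunable(receipts, now)
```

Pruning probabilistically on the write path spreads cleanup cost across writes without needing a scheduler. The trade-off is that the ledger can temporarily hold more than the retention policy allows.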

## Auto-Instrumentation (tools/delegate_tool.py)
- Receipt automatically created when _run_single_child starts
- Receipt automatically finalized on success/error
- Non-fatal: receipt failures never break delegation
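A minimal sketch of the non-fatal instrumentation pattern described above: the receipt is created before the child runs and finalized afterwards, and any bookkeeping error is logged and swallowed so it cannot break delegation. `InMemoryLedger`, `receipt_scope`, and `_safe` are hypothetical names for this example. The real hook lives in `_run_single_child` and is not reproduced here.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("receipts")


class InMemoryLedger:
    """Stand-in for the real ledger; it stores receipts in a dict."""

    def __init__(self):
        self.rows = {}

    def create(self, task_id):
        self.rows[task_id] = {"status": "running", "started_at": time.time()}

    def finalize(self, task_id, status, error=None):
        self.rows[task_id].update(status=status, error=error, finished_at=time.time())


def _safe(action, *args, **kwargs):
    # Receipt bookkeeping is best-effort: log the failure and keep going.
    try:
        action(*args, **kwargs)
    except Exception:
        logger.warning("receipt bookkeeping failed", exc_info=True)


@contextmanager
def receipt_scope(ledger, task_id):
    _safe(ledger.create, task_id)
    try:
        yield
    except Exception as exc:
        _safe(ledger.finalize, task_id, "error", error=str(exc))
        raise  # the task's own failure still propagates to the caller
    else:
        _safe(ledger.finalize, task_id, "success")
```

A delegate call would wrap the child run in `with receipt_scope(ledger, task_id): ...`. Only the task's own exceptions reach the caller, never those raised by the ledger.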

## Tool Surface (tools/execution_receipts_tool.py)
- 'execution_receipts' tool in 'execution' toolset
- Actions: list, query, get, prune, reconcile, maintenance_status

## CLI (hermes_cli/receipts.py)
- /receipts list|get|query|prune|reconcile|status

## Tests: 22 passed (13 new + 2 integration + 7 tool surface)

Supersedes NousResearch#8402.
MestreY0d4-Uninter added a commit to MestreY0d4-Uninter/hermes-agent that referenced this pull request Apr 19, 2026
Durable execution receipt system for Hermes — makes delegated task
execution inspectable, auditable, and manageable.

- ExecutionReceipt: task_id, status, duration, execution_path, tool_calls
- JSON persistence under HERMES_HOME/execution-receipts/
- SQLite ledger indexed by task_id, timestamp, status
- CRUD: create, finalize, get, list, query, prune, reconcile
- Lazy auto-prune (~2% chance per receipt, keeps 1 week + min 50)

- Receipt automatically created when _run_single_child starts
- Receipt automatically finalized on success/error
- Non-fatal: receipt failures never break delegation

- 'execution_receipts' tool in 'execution' toolset
- Actions: list, query, get, prune, reconcile, maintenance_status

- /receipts list|get|query|prune|reconcile|status

Supersedes NousResearch#8402.
@MestreY0d4-Uninter MestreY0d4-Uninter deleted the h007-receipts-direct-lane branch April 27, 2026 01:39