feat: add execution receipts — auditable records of delegated task execution#9209

Closed
MestreY0d4-Uninter wants to merge 1 commit into NousResearch:main from MestreY0d4-Uninter:refresh/execution-receipts

Conversation

@MestreY0d4-Uninter

Contributor

Summary

Adds a durable execution receipt system that makes delegated task execution inspectable, auditable, and manageable from both the agent and CLI.

Receipt Ledger (tools/execution_receipts.py)

  • ExecutionReceipt dataclass: task_id, status, duration, execution_path, worker_mode, runtime info, tool calls, files modified
  • JSON artifact persistence under HERMES_HOME/execution-receipts/
  • SQLite ledger indexed by task_id, timestamp, status
  • CRUD + maintenance: create, finalize, get, list, query, prune, reconcile
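
A minimal sketch of how the ledger described above might fit together, assuming one JSON file per receipt and a SQLite table that only indexes task_id, timestamp, and status. The field types, the `ledger.db` file name, the `path` column, and every method body are assumptions for illustration; the actual tools/execution_receipts.py may differ.

```python
import json
import sqlite3
import time
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class ExecutionReceipt:
    # Field names follow the PR summary; the types and defaults are guesses.
    task_id: str
    status: str = "running"
    duration: Optional[float] = None
    execution_path: str = ""
    worker_mode: str = ""
    runtime_info: dict = field(default_factory=dict)
    tool_calls: list = field(default_factory=list)
    files_modified: list = field(default_factory=list)
    timestamp: float = field(default_factory=time.time)


class ReceiptLedger:
    """Hypothetical ledger: JSON artifacts on disk plus a SQLite index."""

    def __init__(self, root: Path):
        # In the PR the root would be HERMES_HOME/execution-receipts/.
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(str(self.root / "ledger.db"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS receipts ("
            "task_id TEXT PRIMARY KEY, timestamp REAL, status TEXT, path TEXT)"
        )
        self.db.execute("CREATE INDEX IF NOT EXISTS idx_ts ON receipts(timestamp)")
        self.db.execute("CREATE INDEX IF NOT EXISTS idx_status ON receipts(status)")

    def _write(self, receipt: ExecutionReceipt) -> None:
        # Write the JSON artifact first, then upsert the index row.
        path = self.root / f"{receipt.task_id}.json"
        path.write_text(json.dumps(asdict(receipt), indent=2))
        self.db.execute(
            "INSERT OR REPLACE INTO receipts VALUES (?, ?, ?, ?)",
            (receipt.task_id, receipt.timestamp, receipt.status, str(path)),
        )
        self.db.commit()

    def create(self, task_id: str, **fields) -> ExecutionReceipt:
        receipt = ExecutionReceipt(task_id=task_id, **fields)
        self._write(receipt)
        return receipt

    def get(self, task_id: str) -> ExecutionReceipt:
        data = json.loads((self.root / f"{task_id}.json").read_text())
        return ExecutionReceipt(**data)

    def finalize(self, task_id: str, status: str) -> ExecutionReceipt:
        # Record the terminal status and the elapsed time since creation.
        receipt = self.get(task_id)
        receipt.status = status
        receipt.duration = time.time() - receipt.timestamp
        self._write(receipt)
        return receipt

    def query(self, status: Optional[str] = None) -> list:
        # Return task IDs, newest first, optionally filtered by status.
        sql, args = "SELECT task_id FROM receipts", ()
        if status is not None:
            sql, args = sql + " WHERE status = ?", (status,)
        rows = self.db.execute(sql + " ORDER BY timestamp DESC", args)
        return [row[0] for row in rows]
```

Keeping the full receipt in JSON and only the filterable columns in SQLite means the index can be rebuilt from the files if it is lost, which is presumably what the reconcile operation relies on.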

Tool Surface (tools/execution_receipts_tool.py)

  • Registered as execution_receipts in the execution toolset
  • Actions: list, query, get, prune, reconcile, maintenance_status
  • Filterable by task_id, status, execution_path, time range
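
To illustrate the action-plus-filters shape of the tool surface, here is a rough sketch that dispatches on the action name and applies only the filters that were supplied. The function names, the dict-based receipts, the JSON response layout, and the "delegate" value used in the usage note are hypothetical; only the action names and filter fields come from the summary above.

```python
import json

# The action names listed in the PR summary.
ACTIONS = ("list", "query", "get", "prune", "reconcile", "maintenance_status")


def filter_receipts(receipts, task_id=None, status=None,
                    execution_path=None, since=None, until=None):
    """Return receipts that satisfy every filter that was actually supplied."""
    def keep(r):
        return (
            (task_id is None or r["task_id"] == task_id)
            and (status is None or r["status"] == status)
            and (execution_path is None or r["execution_path"] == execution_path)
            and (since is None or r["timestamp"] >= since)
            and (until is None or r["timestamp"] <= until)
        )
    return [r for r in receipts if keep(r)]


def handle_action(receipts, action, **params):
    """Hypothetical entry point: returns a JSON string for every outcome."""
    if action not in ACTIONS:
        return json.dumps({"error": f"unknown action: {action}"})
    if action == "get":
        task_id = params.get("task_id")
        if task_id is None:
            return json.dumps({"error": "get requires task_id"})
        matches = filter_receipts(receipts, task_id=task_id)
        if not matches:
            return json.dumps({"error": "receipt not found"})
        return json.dumps({"receipt": matches[0]})
    if action in ("list", "query"):
        limit = params.pop("limit", None)  # None means no truncation
        matches = filter_receipts(receipts, **params)
        return json.dumps({"receipts": matches[:limit], "count": len(matches)})
    # prune, reconcile and maintenance_status need the real ledger.
    return json.dumps({"error": f"{action} is not implemented in this sketch"})
```

For example, `handle_action(data, "query", status="failed", execution_path="delegate")` would return only the failed receipts on that path, with `count` reporting the number of matches before any limit is applied.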

CLI Surface (hermes_cli/receipts.py)

  • /receipts list [--task-id ID] [--limit N] — recent receipts table
  • /receipts get <id> — full receipt JSON
  • /receipts query [--status STATUS] [--since HOURS] — filtered query
  • /receipts prune [--older-than HOURS] — clean old receipts
  • /receipts reconcile — fix JSON/DB inconsistencies
  • /receipts status — system health check
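
One way the subcommands above could be parsed is with argparse subparsers, sketched below. This is not the handler in hermes_cli/receipts.py: the function names and the default limit of 20 are assumptions, and only the subcommand names and flags are taken from the list above.

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="/receipts")
    sub = parser.add_subparsers(dest="command", required=True)

    p_list = sub.add_parser("list", help="recent receipts table")
    p_list.add_argument("--task-id")
    p_list.add_argument("--limit", type=int, default=20)  # default is a guess

    p_get = sub.add_parser("get", help="full receipt JSON")
    p_get.add_argument("id")

    p_query = sub.add_parser("query", help="filtered query")
    p_query.add_argument("--status")
    p_query.add_argument("--since", type=float, metavar="HOURS")

    p_prune = sub.add_parser("prune", help="clean old receipts")
    p_prune.add_argument("--older-than", type=float, metavar="HOURS")

    sub.add_parser("reconcile", help="fix JSON/DB inconsistencies")
    sub.add_parser("status", help="system health check")
    return parser


def parse_receipts_command(line: str) -> argparse.Namespace:
    """Split a slash-command line and parse everything after '/receipts'."""
    words = shlex.split(line)
    if not words or words[0] != "/receipts":
        raise ValueError("not a /receipts command")
    return build_parser().parse_args(words[1:])
```

Note that argparse raises SystemExit on invalid input, so an interactive CLI would likely want to catch that rather than let it terminate the session.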

Integration

  • Added to _HERMES_CORE_TOOLS in toolsets.py
  • Added CommandDef in hermes_cli/commands.py
  • Handler wired in cli.py

Test Plan

20 passed (new receipt tests)
115 passed (existing delegate/storage tests)
  • TestExecutionReceipt — dataclass, serialization, file persistence
  • TestReceiptLedger — CRUD, query filters, prune, reconcile, orphan recovery, maintenance
  • TestReceiptsToolSurface — all 7 tool actions
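
The reconcile and orphan recovery cases are the least obvious part of the test plan, so here is a hedged sketch of what such a pass might do: re-index JSON files that have no database row, and drop rows whose JSON file is gone. The schema, the policy of treating the files as the source of truth, and the return format are all assumptions rather than the PR's actual behavior.

```python
import json
import sqlite3
from pathlib import Path

# Hypothetical index schema, reduced to the columns this sketch needs.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS receipts "
    "(task_id TEXT PRIMARY KEY, timestamp REAL, status TEXT)"
)


def reconcile(root: Path, db: sqlite3.Connection) -> dict:
    """Make the index match the JSON files on disk (assumed policy)."""
    on_disk = {p.stem: p for p in Path(root).glob("*.json")}
    indexed = {row[0] for row in db.execute("SELECT task_id FROM receipts")}

    recovered = sorted(set(on_disk) - indexed)   # files without an index row
    dangling = sorted(indexed - set(on_disk))    # index rows without a file

    for task_id in recovered:
        data = json.loads(on_disk[task_id].read_text())
        db.execute(
            "INSERT INTO receipts (task_id, timestamp, status) VALUES (?, ?, ?)",
            (task_id, data.get("timestamp", 0.0), data.get("status", "unknown")),
        )
    for task_id in dangling:
        db.execute("DELETE FROM receipts WHERE task_id = ?", (task_id,))
    db.commit()
    return {"recovered": recovered, "dangling_removed": dangling}
```

A test for this would create one orphan file and one dangling row, run `reconcile`, and assert that the index afterwards contains exactly the receipts present on disk.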

Why reimplementation

The original PR #8402 was far behind main (pre-v0.9.0) and could not be cleanly refreshed. This is a clean reimplementation from scratch on current main, preserving the architectural ideas (receipt ledger, SQLite indexing, operator surfaces) while adapting to the current codebase.

Supersedes #8402.

@MestreY0d4-Uninter

Contributor Author

Refresh — April 14 2026

Branch rebased onto current origin/main. All CI-relevant tests pass locally.

Test results

Suite                                    Result
tests/test_execution_receipts_impl.py    22 passed
tests/tools/test_delegate.py             67 passed
Total                                    89 passed, 0 failed

CI notes

  • check-attribution — fixed: added MestreY0d4-Uninter to AUTHOR_MAP in scripts/release.py and .mailmap (housekeeping commit on top of feature commit).
  • test / e2e — results pending new CI run. The previously failing test job was caused by tests/acp/test_entry.py collection error (ModuleNotFoundError: No module named 'acp') — same issue present on origin/main itself (not introduced by this PR).

Commit log

72370002 chore: add MestreY0d4-Uninter to AUTHOR_MAP and .mailmap
7605bd1f feat: execution receipts with auto-instrumentation

Mergeability

No conflicts with current origin/main — GitHub reports MERGEABLE.

@MestreY0d4-Uninter

Contributor Author

Stacking note

PR #8403 (durable work orders) builds on top of this PR. Recommended merge order:

  1. Merge this PR (feat: add execution receipts — auditable records of delegated task execution #9209) first
  2. #8403 (feat: add durable execution work orders) will then be rebased to drop the inline receipts commits, leaving only the work-orders feature commit

@MestreY0d4-Uninter

Contributor Author

Refresh rebuilt from origin/main and pushed to refresh/9209. Validation: py_compile on changed .py files passed; focused pytest on changed test files ran and hit one existing failure in tests/run_agent/test_run_agent.py::TestStreamingApiCall::test_tool_call_accumulation (expected web_search, got search).

@MestreY0d4-Uninter

Contributor Author

Additional validation note:

The one extra focused failure I hit while checking this refreshed branch is also present on a clean origin/main checkout:

  • /home/ubuntu/hermes-all-venv/bin/python -m pytest -o addopts= -q tests/run_agent/test_run_agent.py::TestStreamingApiCall::test_tool_call_accumulation
  • current origin/main fails the same way ("search" vs expected "web_search")

So the remaining failure I observed during refresh is baseline main noise, not specific evidence against the execution-receipts delta.
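
The baseline argument above amounts to a set comparison between the tests failing on the branch and those failing on a clean origin/main. A small sketch, with a hypothetical helper name and assuming the failing test IDs have already been collected from each run:

```python
def classify_failures(branch_failures, baseline_failures):
    """Split branch failures into new regressions and pre-existing noise."""
    branch, baseline = set(branch_failures), set(baseline_failures)
    return {
        "regressions": sorted(branch - baseline),     # fail only on the branch
        "baseline_noise": sorted(branch & baseline),  # also fail on main
    }


FAILING = (
    "tests/run_agent/test_run_agent.py::"
    "TestStreamingApiCall::test_tool_call_accumulation"
)
report = classify_failures([FAILING], [FAILING])
```

Here `report["regressions"]` is empty because the only failure on the branch also fails on main, which matches the conclusion drawn in the comment.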

@MestreY0d4-Uninter

Contributor Author

Audit completed (2026-04-19)

  • Branch refreshed from current origin/main (SHA: 6af04474)
  • Validation: Focused tests passing (execution receipts)
  • Impact: Medium (adds auditable records of delegated tasks)
  • Risk: Medium (996 lines, 11 files, feature addition)
  • Age: Very old PR (5/5)

Baseline Noise Note: Test failures reproduce on clean origin/main — not a regression from this PR.

Recommendation: DECISION NEEDED — Feature adds execution receipt ledger for delegation auditing.

Question for maintainers: Is this execution receipt feature still desired?

  • If YES → merge after review
  • If NO → close as deprecated

Feature value: Provides durable, auditable records of all delegated task executions.


Part of batch audit: 23 PRs audited, 2 closed (absorbed), 21 refreshed
Batch 3 — Feature decision required

@MestreY0d4-Uninter

Contributor Author

🔔 Ready for maintainer review

This PR was validated as part of the full audit of 2026-04-19.

Status:

Action required: review and merge (or a feature decision for #9209, #8942).


Audit batch 3: 4 PRs with minimal validation completed

@MestreY0d4-Uninter

Contributor Author

Closing in favor of focused split PRs:

Each addresses a single logical change for easier review.

@alt-glitch added labels on Apr 24, 2026: type/feature (New feature or request), P3 (Low: cosmetic, nice to have), comp/tools (Tool registry, model_tools, toolsets), comp/cli (CLI entry point, hermes_cli/, setup wizard), tool/delegate (Subagent delegation)
@MestreY0d4-Uninter deleted the refresh/execution-receipts branch on April 27, 2026 01:39
