
feat(tools): add execution receipts core engine + tool wrapper #14836

Closed
MestreY0d4-Uninter wants to merge 5 commits into NousResearch:main from MestreY0d4-Uninter:feat/execution-receipts-core

Conversation

@MestreY0d4-Uninter

Contributor

Adds a durable, auditable record system for delegated task execution.

Includes:

  • tools/execution_receipts.py — SQLite-backed receipt ledger with JSON artifact persistence
  • tools/execution_receipts_tool.py — agent-facing tool surface (list/query/get/prune/reconcile/status)
  • toolsets.py — register execution_receipts in HERMES_CORE_TOOLS
  • tests/test_execution_receipts_impl.py — 20 unit tests covering dataclass, ledger, and tool surface
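
As an illustration of the "SQLite-backed receipt ledger with JSON artifact persistence" idea, here is a minimal sketch. It is not the code from this PR: the `Receipt` fields, the `ReceiptLedger` class, the `index.sqlite` file name, and the table layout are all assumptions made for the example.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Receipt:
    # Hypothetical fields; the dataclass in the PR may look different.
    receipt_id: str
    task: str
    status: str


class ReceiptLedger:
    """Stores each receipt as a JSON file and indexes it in SQLite."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        # Assumed index location: a SQLite file next to the JSON artifacts.
        self.db = sqlite3.connect(str(self.root / "index.sqlite"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS receipts ("
            "receipt_id TEXT PRIMARY KEY, status TEXT NOT NULL, path TEXT NOT NULL)"
        )

    def create(self, receipt: Receipt) -> Path:
        # The JSON file is the durable artifact; SQLite only indexes it.
        path = self.root / f"{receipt.receipt_id}.json"
        path.write_text(json.dumps(asdict(receipt)), encoding="utf-8")
        with self.db:  # commits on success, rolls back the transaction on error
            self.db.execute(
                "INSERT INTO receipts (receipt_id, status, path) VALUES (?, ?, ?)",
                (receipt.receipt_id, receipt.status, str(path)),
            )
        return path

    def query(self, status: str) -> list[str]:
        # Answer status queries from the index without opening any JSON file.
        rows = self.db.execute(
            "SELECT receipt_id FROM receipts WHERE status = ? ORDER BY receipt_id",
            (status,),
        )
        return [row[0] for row in rows]

    def get(self, receipt_id: str) -> dict:
        # Look up the artifact path in the index, then load the full receipt.
        row = self.db.execute(
            "SELECT path FROM receipts WHERE receipt_id = ?", (receipt_id,)
        ).fetchone()
        if row is None:
            raise KeyError(receipt_id)
        return json.loads(Path(row[0]).read_text(encoding="utf-8"))
```

The split keeps the JSON files human-readable and auditable on their own, while list/query operations stay cheap because they only touch the index.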

Stacked PRs:

Split from #9209.
Closes #9209 (partial)

- Add tools/execution_receipts.py: durable JSON receipt artifacts
  indexed in SQLite for query, reconcile, and prune operations.
- Add tools/execution_receipts_tool.py: operator-facing tool
  registration for list/query/get/prune/reconcile/maintenance.
- Register execution_receipts in HERMES_CORE_TOOLS.
- Add tests for receipt dataclass, ledger, and tool surface.

Split from NousResearch#9209.
Closes NousResearch#9209 (partial)
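
The "reconcile" operation named in the commit message could look roughly like the sketch below, which compares the SQLite index with the JSON files on disk. The `reconcile` function, the `receipts` table columns, and the report keys other than `parse_errors` (a name that appears later in this thread) are assumptions for illustration, not the PR's API.

```python
import json
import sqlite3
from pathlib import Path


def reconcile(db: sqlite3.Connection, root: Path) -> dict:
    """Compare index rows with receipt files and report every mismatch."""
    # Assumed schema: receipts(receipt_id, path, ...).
    indexed = {
        row[0]: Path(row[1])
        for row in db.execute("SELECT receipt_id, path FROM receipts")
    }
    # Assumed layout: one <receipt_id>.json file per receipt under root.
    on_disk = {p.stem: p for p in root.glob("*.json")}

    report = {"missing_files": [], "unindexed_files": [], "parse_errors": []}

    # Index rows whose artifact has disappeared from disk.
    for receipt_id, path in sorted(indexed.items()):
        if not path.exists():
            report["missing_files"].append(receipt_id)

    for receipt_id, path in sorted(on_disk.items()):
        # Files that exist on disk but were never indexed.
        if receipt_id not in indexed:
            report["unindexed_files"].append(receipt_id)
        # Invalid JSON is reported instead of being treated as consistent.
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            report["parse_errors"].append(receipt_id)

    report["consistent"] = not (
        report["missing_files"]
        or report["unindexed_files"]
        or report["parse_errors"]
    )
    return report
```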
@MestreY0d4-Uninter

Contributor Author

Audit/update 2026-04-25:

  • No branch mutation in this audit.
  • Focused local validation: 20 passed in 0.94s (tests/test_execution_receipts_impl.py).
  • Recommendation: this PR is the execution receipts core and the base of the stack, so it should be reviewed first.

This was part of the open-PR cleanup pass against current upstream/main.

@MestreY0d4-Uninter

Contributor Author

Refresh/validation update:

  • Refreshed feat/execution-receipts-core against current upstream/main and pushed normally (no force-push).
  • Kept the remediation surgical to the execution-receipts core/tool surface:
    • receipts persist under HERMES_HOME/artifacts/execution-receipts/;
    • JSON writes are atomic and create/finalize roll back the file if SQLite indexing fails;
    • prune now rejects non-positive ages, keeps failed receipts by default, and reports deleted IDs/missing files;
    • reconcile/maintenance status now surface invalid receipt JSON via parse_errors instead of silently treating it as consistent;
    • execution_receipts is included in the relevant Hermes CLI/cron toolsets and builtin tool discovery expectations.
  • Local validation passed:
    • python3 -m py_compile tools/execution_receipts.py tools/execution_receipts_tool.py tests/test_execution_receipts_impl.py tests/tools/test_registry.py toolsets.py
    • scripts/run_tests.sh tests/test_execution_receipts_impl.py tests/tools/test_registry.py tests/tools/test_clipboard.py tests/hermes_cli/test_tools_config.py tests/cron/test_scheduler.py -q → 301 passed
    • real persist/query/reconcile smoke under isolated HERMES_HOME passed
    • registry/toolset smoke passed
    • exact latest CI PR-only failures were also run locally and passed:
      • tests/hermes_cli/test_plugin_scanner_recursion.py::TestKindField::test_unknown_kind_falls_back_to_standalone
      • tests/tools/test_file_state_registry.py::FileToolsIntegrationTests::test_net_new_file_no_warning
      • tests/tools/test_file_state_registry.py::FileToolsIntegrationTests::test_sibling_agent_write_surfaces_warning_through_handler
  • Independent pre-push review found no blocking security/path-safety/SQLite/scope issues.
  • Remote status after push: PR is MERGEABLE; attribution, supply-chain scan, check, e2e, and Nix checks are green. The broad Tests / test job is still red, matching the repository's current main-branch instability pattern; I compared it with recent failing main runs and documented the comparison in the logs below.
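
The atomic-write, rollback, and prune behavior listed above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the PR's implementation: the function names, the `SCHEMA` layout (including `created_at` as a Unix timestamp), and the returned keys are hypothetical.

```python
import json
import os
import sqlite3
import tempfile
from pathlib import Path

# Assumed index layout; created_at is a Unix timestamp in seconds.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS receipts ("
    "receipt_id TEXT PRIMARY KEY, status TEXT, path TEXT, created_at REAL)"
)


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write to a temp file in the same directory, then rename into place."""
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def create_receipt(db: sqlite3.Connection, root: Path, receipt: dict) -> Path:
    """Persist the JSON artifact, then index it; remove the file if indexing fails."""
    path = root / f"{receipt['receipt_id']}.json"
    if path.exists():
        # Refuse to overwrite, so a rollback never deletes an older receipt.
        raise FileExistsError(path)
    write_json_atomic(path, receipt)
    try:
        with db:  # commits on success, rolls back the transaction on error
            db.execute(
                "INSERT INTO receipts VALUES (?, ?, ?, ?)",
                (receipt["receipt_id"], receipt["status"], str(path),
                 receipt["created_at"]),
            )
    except sqlite3.Error:
        path.unlink(missing_ok=True)  # roll back the artifact as well
        raise
    return path


def prune(db: sqlite3.Connection, max_age_days: float, now: float,
          include_failed: bool = False) -> dict:
    """Delete receipts older than max_age_days; failed ones are kept by default."""
    if max_age_days <= 0:
        raise ValueError("max_age_days must be positive")
    cutoff = now - max_age_days * 86400
    sql = "SELECT receipt_id, path FROM receipts WHERE created_at < ?"
    if not include_failed:
        sql += " AND status != 'failed'"
    sql += " ORDER BY receipt_id"
    deleted, missing = [], []
    for receipt_id, path in db.execute(sql, (cutoff,)).fetchall():
        try:
            os.unlink(path)
        except FileNotFoundError:
            missing.append(receipt_id)  # index row existed, file did not
        deleted.append(receipt_id)
    with db:
        db.executemany(
            "DELETE FROM receipts WHERE receipt_id = ?", [(r,) for r in deleted]
        )
    return {"deleted_ids": deleted, "missing_files": missing}
```

Writing to a temporary file in the target directory and renaming it means readers never see a half-written receipt, and removing the file when the insert fails keeps the directory and the index from drifting apart.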

Evidence/logs:

  • /home/ubuntu/github-sanitization/20260427T011658Z/hermes-14836-final-focused-tests.log
  • /home/ubuntu/github-sanitization/20260427T011658Z/hermes-14836-post-ci-fix-focused-tests.log
  • /home/ubuntu/github-sanitization/20260427T011658Z/hermes-14836-real-smoke.log
  • /home/ubuntu/github-sanitization/20260427T011658Z/hermes-14836-registry-smoke-rerun.log
  • /home/ubuntu/github-sanitization/20260427T011658Z/hermes-14836-tests-failure-summary-post-ci-fix.txt
  • /home/ubuntu/github-sanitization/20260427T011658Z/hermes-14836-baseline-compare-refresh.txt

@MestreY0d4-Uninter

Contributor Author

Refresh/validation update 2026-04-27:

  • Refreshed feat/execution-receipts-core onto current upstream/main and pushed normally (no force-push).
  • Cleanup during refresh: fixed trailing whitespace / EOF whitespace reported by git diff --check; no behavior changes beyond the existing execution-receipts core/tool surface.
  • Local validation:
    • git diff --check upstream/main — passed
    • python3 -m py_compile tools/execution_receipts.py tools/execution_receipts_tool.py toolsets.py tests/test_execution_receipts_impl.py tests/tools/test_registry.py — passed
    • python3 -m pytest tests/test_execution_receipts_impl.py tests/tools/test_registry.py -q — 55 passed
  • Independent review: no blockers; diff remains limited to the receipt core, tool wrapper, toolset registration, and focused tests.
  • Remote checks: all non-Tests checks are green. The Tests / test job is still red with the same known gateway Discord baseline failure (AttributeError: 'types.SimpleNamespace' object has no attribute 'guild'). Compared with current main run 25001653980: PR has 60 collected failure names vs 62 on main, both with 88 occurrences of the same SimpleNamespace.guild error, so this is unrelated to the execution receipts diff.

Evidence saved locally under /home/ubuntu/github-sanitization/20260427T011658Z/:

  • hermes-14836-local-validation-summary.txt
  • hermes-14836-focused-pytest.log
  • hermes-14836-ci-final-summary.txt
  • hermes-14836-vs-main-failure-summary.txt
  • hermes-14836-test-failed-log.txt

@MestreY0d4-Uninter

Contributor Author

Closing the base execution-receipts PR for now. The feature is sizable and stale enough that a fresh design/smaller PR on current main would be easier to review if the idea is still wanted.


Labels

  • comp/tools: Tool registry, model_tools, toolsets
  • P3: Low (cosmetic, nice to have)
  • type/feature: New feature or request
