[RFC][v2] Host-verified complete_step evidence receipts #2537

@GTC2080

Problem

complete_step currently verifies only that the model supplied evidence fields. The host runtime does not confirm that the cited command, file read, or edit actually happened in the current turn.

Proposal

Add a small runtime-only evidence receipt ledger. The agent records tool-call receipts during a turn, and complete_step checks verification, diff, and files evidence against those receipts before accepting a step completion.
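
To make the shape of the proposal concrete, here is a minimal sketch of what such a ledger could look like. It is illustrative only: `Receipt`, `ReceiptLedger`, `check_evidence`, the receipt kinds, and the assumed evidence layout (`verification` as a command string, `diff` and `files` as lists of paths) are all hypothetical names chosen for this example, not the existing complete_step schema or the project's implementation language.

```python
from dataclasses import dataclass, field
from typing import Literal

# Hypothetical receipt kinds; the real set of tool calls may differ.
ReceiptKind = Literal["command", "read", "edit"]


@dataclass(frozen=True)
class Receipt:
    """One tool call observed by the host during the current turn."""
    kind: ReceiptKind
    target: str  # command line for "command", file path for "read"/"edit"


@dataclass
class ReceiptLedger:
    """In-memory, per-turn ledger. Nothing is written to disk."""
    _receipts: set[Receipt] = field(default_factory=set)

    def record(self, kind: ReceiptKind, target: str) -> None:
        self._receipts.add(Receipt(kind, target))

    def has(self, kind: ReceiptKind, target: str) -> bool:
        return Receipt(kind, target) in self._receipts

    def reset(self) -> None:
        """Clear receipts at a turn boundary so they never carry over."""
        self._receipts.clear()


def check_evidence(ledger: ReceiptLedger, evidence: dict) -> list[str]:
    """Return problems found; an empty list means every claim has a receipt.

    Assumed evidence layout (hypothetical):
      verification: str        command the model says it ran
      diff:         list[str]  paths the model says it edited
      files:        list[str]  paths the model says it read or edited
    """
    problems: list[str] = []
    command = evidence.get("verification")
    if command and not ledger.has("command", command):
        problems.append(f"no receipt for command: {command}")
    for path in evidence.get("diff", []):
        if not ledger.has("edit", path):
            problems.append(f"no edit receipt for: {path}")
    for path in evidence.get("files", []):
        if not (ledger.has("read", path) or ledger.has("edit", path)):
            problems.append(f"no read or edit receipt for: {path}")
    return problems


# Usage: the agent loop records receipts, then complete_step checks them.
ledger = ReceiptLedger()
ledger.record("command", "pytest -q")
ledger.record("edit", "src/agent.py")

accepted = check_evidence(
    ledger,
    {"verification": "pytest -q", "diff": ["src/agent.py"], "files": ["src/agent.py"]},
)
rejected = check_evidence(ledger, {"verification": "npm test"})
```

In this sketch `accepted` is an empty list, so the step completion would go through, while `rejected` contains one problem because no `npm test` command was recorded in the turn. Exact string matching on commands and paths is the simplest possible policy; a real implementation would probably need path normalization and a decision on how to treat evidence fields that are absent.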

Non-goals

  • No UI changes.
  • No performance claims or optimization.
  • No auto-plan or multi-agent behavior.
  • No prompt, tool schema, or tool list changes.
  • No persistence of receipt data.

Conflict check

This is intentionally scoped away from the active auto-plan, worktree agents, goal state, cache diagnostics, and MCP startup/import PRs. The first implementation should touch only the core agent loop, complete_step behavior, and focused tests.

Review evidence plan

PRs will link this RFC. Because the first slice has no UI changes, screenshots are not applicable. Because it makes no performance claim and avoids prompt/tool-schema changes, cache/token metrics are not expected; if scope changes, the PR will include the required data.
