
bug: finalReadinessCheck enforces strict ordering (complete_step must follow writer), causing false positive on correctly ordered tool calls #3469

@CVEngineer66


Summary

finalReadinessCheck() in internal/agent/agent.go enforces that complete_step must appear after the last writer tool call in the receipt sequence (HasSuccessfulCompleteStepAfter(writer)). This ordering constraint conflicts with the fact that todo_write already has its own guard (verifyTodoCompletionTransitions) that rejects marking items completed without a prior matching complete_step.

The result: even when the model correctly calls complete_step → todo_write (in that order), if a writer call happens to appear between them in the receipt stream, the final answer is blocked with a confusing error.


Root Cause

internal/agent/agent.go:573:

if hasTodoReceipt && !a.evidence.HasSuccessfulCompleteStepAfter(writer) {
    missing = append(missing, "call complete_step after the latest write")
}

HasSuccessfulCompleteStepAfter(writer) requires the complete_step receipt to sit at an index greater than that of the last writer receipt. A simple multi-step turn satisfies this. With the receipt ordering below, complete_step (index 3) follows the last writer (index 2), so the check passes:

[0] write_file (write config)
[1] todo_write (mark step 1 completed)
[2] edit_file (write plan doc)
[3] complete_step (sign off step 2)   ← index 3 > last writer at index 2, so this passes

The scenario that actually failed was different:

The model called todo_write first (marking items completed), then complete_step. The todo_write succeeded because its own guard only checks for a matching complete_step anywhere in the turn, not the ordering. But finalReadinessCheck then rejected the final answer because complete_step wasn't after the writer.

So the core tension is:

  1. todo_write's internal guard allows either order (it searches all receipts)
  2. finalReadinessCheck requires strict order (complete_step after writer)
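
To make the mismatch concrete, here is a minimal sketch of the two guards applied to the same receipt stream. The Receipt type, the helper names, and the choice of which tools count as writers are all assumptions made for illustration; they are not the types in internal/evidence/evidence.go.

```go
package main

import "fmt"

// Receipt is a hypothetical stand-in for one entry in the turn's receipt stream.
type Receipt struct {
	Tool string
	OK   bool
}

// lastWriterIndex returns the index of the last successful writer receipt, or -1.
// Treating write_file and edit_file as the writers is an assumption of this sketch.
func lastWriterIndex(rs []Receipt) int {
	last := -1
	for i, r := range rs {
		if r.OK && (r.Tool == "write_file" || r.Tool == "edit_file") {
			last = i
		}
	}
	return last
}

// completeStepAfter models the strict check: a successful complete_step must
// appear at an index greater than idx.
func completeStepAfter(rs []Receipt, idx int) bool {
	for i := idx + 1; i < len(rs); i++ {
		if rs[i].OK && rs[i].Tool == "complete_step" {
			return true
		}
	}
	return false
}

// completeStepAnywhere models the order-insensitive search over all receipts.
func completeStepAnywhere(rs []Receipt) bool {
	return completeStepAfter(rs, -1)
}

func main() {
	// complete_step, then a writer, then todo_write (the order from the Summary).
	turn := []Receipt{
		{"complete_step", true},
		{"edit_file", true},
		{"todo_write", true},
	}
	fmt.Println("anywhere:", completeStepAnywhere(turn))
	fmt.Println("after last writer:", completeStepAfter(turn, lastWriterIndex(turn)))
}
```

On this stream the order-insensitive search succeeds while the strict check fails, which is the disagreement described above.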

Steps to Reproduce

  1. In a session where files have been written, call complete_step then todo_write (or vice versa), with writer calls interspersed
  2. Attempt to produce a final answer
  3. See: "Host final-answer readiness check failed. Before giving a final answer, address the missing host-observable receipts: call complete_step after the latest write"

Suggested Fix

Two options:

Option A (minimal): Remove the HasSuccessfulCompleteStepAfter(writer) check from finalReadinessCheck. The todo_write tool already enforces that items can't be marked completed without a matching complete_step receipt — this is a sufficient guard. The finalReadinessCheck should only check for incompleteTodos and projectChecks, both of which are unique concerns not covered by any tool-level guard.
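
As a rough sketch of what Option A leaves behind, the readiness check could reduce to the two remaining concerns. The turnState type, its fields, and the message strings are hypothetical and only illustrate the shape; the real finalReadinessCheck in internal/agent/agent.go may be structured differently.

```go
package main

import "fmt"

// turnState is a hypothetical summary of what the host observed in a turn.
type turnState struct {
	incompleteTodos     []string // todo items not yet marked completed
	failedProjectChecks []string // project checks that did not pass
}

// readinessMissing returns the outstanding items under Option A. It contains no
// complete_step ordering check; that guard is assumed to live in todo_write.
func readinessMissing(s turnState) []string {
	var missing []string
	for _, t := range s.incompleteTodos {
		missing = append(missing, "finish todo: "+t)
	}
	for _, c := range s.failedProjectChecks {
		missing = append(missing, "fix project check: "+c)
	}
	return missing
}

func main() {
	// A turn with one open todo is still blocked; an empty state is not.
	fmt.Println(readinessMissing(turnState{incompleteTodos: []string{"write plan doc"}}))
	fmt.Println(len(readinessMissing(turnState{})))
}
```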

Option B (relaxed): Change HasSuccessfulCompleteStepAfter(writer) to a new HasSuccessfulCompleteStep() that finds a matching complete_step receipt anywhere in the turn, not just after the writer.
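
A minimal sketch of what the relaxed helper in Option B might look like follows. The Evidence and Receipt types and their fields are assumptions for illustration, not the definitions in internal/evidence/evidence.go.

```go
package main

import "fmt"

// Receipt and Evidence are hypothetical stand-ins; the real types may differ.
type Receipt struct {
	Tool    string
	Success bool
}

type Evidence struct {
	receipts []Receipt
}

// HasSuccessfulCompleteStep reports whether any successful complete_step
// receipt exists in the turn, regardless of its position.
func (e *Evidence) HasSuccessfulCompleteStep() bool {
	for _, r := range e.receipts {
		if r.Tool == "complete_step" && r.Success {
			return true
		}
	}
	return false
}

func main() {
	e := &Evidence{receipts: []Receipt{
		{Tool: "complete_step", Success: true},
		{Tool: "edit_file", Success: true},
	}}
	// The writer comes after complete_step, yet the relaxed check still passes.
	fmt.Println(e.HasSuccessfulCompleteStep())
}
```

A failed complete_step receipt is still ignored in this sketch, so the relaxation only drops the ordering requirement and keeps the success requirement.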


Related Code

  • internal/agent/agent.go: finalReadinessCheck() (line ~537), finalReadinessRetryMessage() (line ~607)
  • internal/evidence/evidence.go: HasSuccessfulCompleteStepAfter() (line ~126), UnverifiedCompletedTodos() (line ~259)
  • internal/tool/builtin/todo.go: verifyTodoCompletionTransitions() (line ~95)
  • internal/tool/builtin/completestep.go: Execute() (line ~79)


Labels: agent (Core agent loop: internal/agent, internal/control)
