Skip to content

/goal judge produces false positive when agent claims file created but write silently fails #18421

@pdonizete

Description

@pdonizete

Bug Description

The /goal judge marks a goal as DONE based solely on the agent's textual response claiming success, without requiring verifiable evidence (e.g., read_file output or filesystem confirmation).

Reproduction Steps

  1. Set a goal: /goal Criar um resumo prático em português dos principais conceitos de machine learning para revisão, com exemplos de código Python, e salvar em /home/ubuntu/ml-resumo.md
  2. Agent responds claiming the file was created successfully
  3. Judge evaluates the response and returns {"done": true, "reason": "..."}
  4. Goal shows: ✓ Goal done (1/20 turns)
  5. File does not exist on disk — verified with find and read_file

Expected Behavior

The judge should require verifiable evidence of completion for filesystem-related goals — such as read_file output, ls confirmation, or terminal exit code — rather than trusting the agent's self-declaration.

Actual Behavior

The judge trusts the agent's textual claim of success and marks the goal as DONE, even though the file was never written to disk. This is a false positive — the documented failure mode where the judge says done when work remains.

Root Cause

The judge only sees ~4KB of the agent's last response text. It has no access to tool outputs, filesystem state, or terminal results. If the agent says "file created successfully" but write_file failed silently (or the tool call was never actually executed), the judge has no way to detect the discrepancy.

Suggested Fix

  • For goals involving file creation/modification, the judge system prompt should require the agent to include verification evidence (e.g., read_file output or ls -la result) in its final response before marking done.
  • Alternatively, the judge could be given access to tool output metadata (not just response text) to cross-reference claims against actual tool execution results.
  • The CONTINUATION_PROMPT_TEMPLATE could instruct the agent: "Before declaring a goal complete, always verify filesystem changes with a read/list operation and include the output in your response."

Environment

  • Hermes Agent v0.12.0 (2026.4.30)
  • Provider: Dialagram (unlimited)
  • OS: Ubuntu (VPS)

Workaround

Include explicit verification instructions in the goal text:

/goal Create file X, then read it back with read_file to confirm it exists and has content. Only consider the goal complete after verification.

Metadata

Metadata

Assignees

No one assigned

    Labels

    P2Medium — degraded but workaround existscomp/agentCore agent loop, run_agent.py, prompt buildertype/bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions