/goal judge produces false positive when agent claims file created but write silently fails

## Bug Description

The `/goal` judge marks a goal as DONE based solely on the agent's textual response claiming success, without requiring verifiable evidence (e.g., `read_file` output or filesystem confirmation).

## Reproduction Steps

1. Set a goal: `/goal Criar um resumo prático em português dos principais conceitos de machine learning para revisão, com exemplos de código Python, e salvar em /home/ubuntu/ml-resumo.md`
2. Agent responds claiming the file was created successfully
3. Judge evaluates the response and returns `{"done": true, "reason": "..."}`
4. Goal shows: `✓ Goal done (1/20 turns)`
5. File does not exist on disk — verified with `find` and `read_file`

## Expected Behavior

The judge should require **verifiable evidence** of completion for filesystem-related goals — such as `read_file` output, `ls` confirmation, or terminal exit code — rather than trusting the agent's self-declaration.

## Actual Behavior

The judge trusts the agent's textual claim of success and marks the goal as DONE, even though the file was never written to disk. This is a **false positive** — the documented failure mode where the judge says done when work remains.

## Root Cause

The judge only sees ~4KB of the agent's last response text. It has no access to tool outputs, filesystem state, or terminal results. If the agent says "file created successfully" but `write_file` failed silently (or the tool call was never actually executed), the judge has no way to detect the discrepancy.

## Suggested Fix

- For goals involving file creation/modification, the judge system prompt should require the agent to include **verification evidence** (e.g., `read_file` output or `ls -la` result) in its final response before marking done.
- Alternatively, the judge could be given access to tool output metadata (not just response text) to cross-reference claims against actual tool execution results.
- The `CONTINUATION_PROMPT_TEMPLATE` could instruct the agent: "Before declaring a goal complete, always verify filesystem changes with a read/list operation and include the output in your response."

## Environment

- Hermes Agent v0.12.0 (2026.4.30)
- Provider: Dialagram (unlimited)
- OS: Ubuntu (VPS)

## Workaround

Include explicit verification instructions in the goal text:
```
/goal Create file X, then read it back with read_file to confirm it exists and has content. Only consider the goal complete after verification.
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

/goal judge produces false positive when agent claims file created but write silently fails #18421

Bug Description

Reproduction Steps

Expected Behavior

Actual Behavior

Root Cause

Suggested Fix

Environment

Workaround

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

/goal judge produces false positive when agent claims file created but write silently fails #18421

Description

Bug Description

Reproduction Steps

Expected Behavior

Actual Behavior

Root Cause

Suggested Fix

Environment

Workaround

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions