feature: capability grounding — LLM must not claim actions it didn't verifiably execute #324

## Problem

The LLM repeatedly told the user it was scheduling reminders when it wasn't actually doing so. This is a severe trust violation — the bot confidently claimed to perform an action that never happened.

### Root Cause Chain

1. `netclaw-manual` skill didn't auto-load → no guidance about scheduling workflow
2. The compressed skill index shows skill **names** but not **how to use them**
3. The LLM sees `set_reminder` in the tool list but doesn't call it correctly (or at all)
4. The LLM claims success anyway because nothing in the system prompt tells it to verify tool execution

### The Deeper Problem

Even when skills load correctly, there's no **capability grounding** in the system prompt that says:

> "You MUST NOT claim to have performed an action unless you called the corresponding tool AND the tool returned a success result. If you are unsure whether an action completed, say so."

This is a system prompt alignment issue that should be enforced at the identity/AGENTS.md level.
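A minimal sketch of how the rule quoted above could be attached to a system prompt at build time. The names `GROUNDING_RULE` and `with_grounding_rule` are hypothetical and not part of the existing codebase; the real change may simply be a paragraph added to AGENTS.md.

```python
# Rule text copied from the quoted block above.
GROUNDING_RULE = (
    "You MUST NOT claim to have performed an action unless you called the "
    "corresponding tool AND the tool returned a success result. "
    "If you are unsure whether an action completed, say so."
)


def with_grounding_rule(system_prompt: str) -> str:
    """Append the grounding rule once, even if called repeatedly (hypothetical helper)."""
    if GROUNDING_RULE in system_prompt:
        return system_prompt
    return system_prompt.rstrip() + "\n\n" + GROUNDING_RULE + "\n"
```

Making the helper idempotent means the rule is not duplicated if the prompt is rebuilt several times, for example after compaction.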

### Proposed Fixes

1. **Add capability grounding rule to AGENTS.md or system prompt template**: Explicit instruction that the bot must verify tool execution before claiming success.

2. **Post-turn verification**: After a turn where the LLM claims to have scheduled/created/modified something, check whether the relevant tool was actually called successfully in that turn's tool call log.

3. **Eval cases**: Add eval scenarios where the model is prompted to schedule something, then verify that it actually calls `set_reminder` and doesn't hallucinate the result.

4. **Tool call summary in compaction**: When compacting, explicitly note which tools were called and their results, so post-compaction turns don't lose awareness of what was actually done.
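Fixes 2 and 4 could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the `ToolCall` record, the `CLAIM_PATTERNS` table, and both function names are hypothetical, and the regex is a deliberately narrow heuristic for reminder claims only.

```python
import re
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Hypothetical shape of one entry in a turn's tool call log."""
    name: str         # tool name, e.g. "set_reminder"
    ok: bool          # True only if the tool returned a success result
    detail: str = ""  # optional error or result text


# Assumed mapping from a tool to wording that claims the tool's action happened.
CLAIM_PATTERNS = {
    "set_reminder": re.compile(
        r"\bI(?:'ve| have)? (?:scheduled|set|created)\b[^.]*\breminder",
        re.IGNORECASE,
    ),
}


def ungrounded_claims(reply: str, calls: list[ToolCall]) -> list[str]:
    """Return tools whose action the reply claims without a successful call this turn."""
    succeeded = {c.name for c in calls if c.ok}
    return [
        tool
        for tool, pattern in CLAIM_PATTERNS.items()
        if pattern.search(reply) and tool not in succeeded
    ]


def summarize_tool_calls(calls: list[ToolCall]) -> str:
    """Render a plain-text record of calls and results to carry through compaction."""
    if not calls:
        return "Tool calls this turn: none"
    lines = [
        f"- {c.name}: {'ok' if c.ok else 'FAILED'}"
        + (f" ({c.detail})" if c.detail else "")
        for c in calls
    ]
    return "Tool calls this turn:\n" + "\n".join(lines)
```

For example, `ungrounded_claims("I've scheduled a reminder for 9am tomorrow.", [])` flags `set_reminder`, while the same reply paired with a successful `set_reminder` call flags nothing. The same check could back the eval cases in fix 3. Pattern matching on reply text will miss paraphrases, so a production check would likely need something more robust.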

### Impact

This is arguably the most severe user-facing issue: a bot that confidently lies about what it did is worse than one that says "I don't know how to do that."

### Incident

Session `D0AC6CKBK5K/1774021203.467619`: repeated hallucinated scheduling despite multiple user corrections.
