
feature: capability grounding — LLM must not claim actions it didn't verifiably execute #324

@Aaronontheweb

Description

Problem

The LLM repeatedly told the user it was scheduling reminders when it wasn't actually doing so. This is a severe trust violation — the bot confidently claimed to perform an action that never happened.

Root Cause Chain

  1. `netclaw-manual` skill didn't auto-load → no guidance about scheduling workflow
  2. The compressed skill index shows skill names but not how to use them
  3. The LLM sees `set_reminder` in the tool list but doesn't call it correctly (or at all)
  4. The LLM claims success anyway because nothing in the system prompt tells it to verify tool execution

The Deeper Problem

Even when skills load correctly, there's no capability grounding in the system prompt that says:

"You MUST NOT claim to have performed an action unless you called the corresponding tool AND the tool returned a success result. If you are unsure whether an action completed, say so."

This is a system prompt alignment issue that should be enforced at the identity/AGENTS.md level.
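One way to enforce the rule quoted above at the prompt level is sketched below in Python. It assumes the system prompt is assembled from an identity section plus the contents of AGENTS.md. `CAPABILITY_GROUNDING_RULE` and `build_system_prompt` are hypothetical names, not netclaw's actual API. The rule text is the wording quoted in this issue.

```python
# Hypothetical sketch: names and template shape are illustrative
# assumptions, not netclaw's actual prompt-building code.
CAPABILITY_GROUNDING_RULE = (
    "You MUST NOT claim to have performed an action unless you called the "
    "corresponding tool AND the tool returned a success result. If you are "
    "unsure whether an action completed, say so."
)


def build_system_prompt(identity: str, agents_md: str) -> str:
    """Assemble a system prompt that always ends with the grounding rule."""
    sections = [identity.strip(), agents_md.strip(), CAPABILITY_GROUNDING_RULE]
    # Skip empty sections so a missing AGENTS.md doesn't leave blank gaps.
    return "\n\n".join(s for s in sections if s)
```

This sketch appends the rule unconditionally instead of relying on AGENTS.md to contain it. The rule then survives even when a skill or manual fails to load, which is the failure mode in the root cause chain.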

Proposed Fixes

  1. Add a capability grounding rule to AGENTS.md or the system prompt template: an explicit instruction that the bot must verify tool execution before claiming success.

  2. Post-turn verification: After a turn where the LLM claims to have scheduled/created/modified something, check whether the relevant tool was actually called successfully in that turn's tool call log.

  3. Eval cases: Add eval scenarios where the model is prompted to schedule something — verify it actually calls `set_reminder` and doesn't hallucinate the result.

  4. Tool call summary in compaction: When compacting, explicitly note which tools were called and their results, so post-compaction turns don't lose awareness of what was actually done.
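Fix 2 (post-turn verification) could look roughly like the Python sketch below. It assumes each turn exposes a tool-call log with a tool name and a success flag. `ToolCall`, `CLAIM_PATTERNS`, and `ungrounded_claims` are hypothetical. The keyword regex is only a stand-in; a real claim detector would need to be considerably more robust.

```python
import re
from dataclasses import dataclass


@dataclass
class ToolCall:
    """One entry in a turn's tool-call log (hypothetical shape)."""
    name: str
    succeeded: bool


# Hypothetical mapping from claim patterns to the tool that must have
# succeeded in the same turn for the claim to count as grounded.
CLAIM_PATTERNS = {
    "set_reminder": re.compile(
        r"\b(I(?:'ve| have)? (?:scheduled|set)|reminder (?:is )?(?:set|scheduled))\b",
        re.IGNORECASE,
    ),
}


def ungrounded_claims(reply: str, tool_log: list[ToolCall]) -> list[str]:
    """Return tools the reply claims to have used without a successful call."""
    succeeded = {call.name for call in tool_log if call.succeeded}
    return [
        tool
        for tool, pattern in CLAIM_PATTERNS.items()
        if pattern.search(reply) and tool not in succeeded
    ]
```

A non-empty result could, for example, block the reply or trigger a corrective follow-up turn. Which response fits best is left open here.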
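For fix 3 (eval cases), the key point is that the eval checks the tool-call record, not the model's prose. The Python sketch below assumes an agent can be invoked as a function from prompt to a turn result. `TurnResult` and `SchedulingEval` are hypothetical harness types, not an existing eval framework.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TurnResult:
    """Hypothetical turn output: final reply plus (tool name, succeeded) pairs."""
    reply: str
    tool_calls: list[tuple[str, bool]] = field(default_factory=list)


@dataclass
class SchedulingEval:
    """Passes only if the required tool was actually called and succeeded."""
    prompt: str
    required_tool: str = "set_reminder"

    def run(self, agent: Callable[[str], TurnResult]) -> bool:
        result = agent(self.prompt)
        # The reply text is ignored on purpose: a confident "I've set it"
        # with no successful tool call must fail the eval.
        return any(name == self.required_tool and ok for name, ok in result.tool_calls)
```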
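Fix 4 (tool call summary in compaction) amounts to carrying a factual ledger of tool activity into the compacted context. The Python sketch below assumes the session keeps a list of tool-call records with arguments, a success flag, and a result string. `ToolCallRecord` and `summarize_tool_calls` are hypothetical names, and the output format is only one possible choice.

```python
from dataclasses import dataclass


@dataclass
class ToolCallRecord:
    """Hypothetical record of a single tool invocation."""
    name: str
    arguments: dict
    succeeded: bool
    result: str


def summarize_tool_calls(records: list[ToolCallRecord]) -> str:
    """Render a compact ledger of tool activity to keep across compaction."""
    if not records:
        # State the absence explicitly so later turns can't assume success.
        return "Tool calls this session: none. No actions were performed."
    lines = ["Tool calls this session:"]
    for r in records:
        status = "OK" if r.succeeded else "FAILED"
        # Sort argument keys so the summary is deterministic.
        args = ", ".join(f"{k}={v!r}" for k, v in sorted(r.arguments.items()))
        lines.append(f"- {r.name}({args}) -> {status}: {r.result}")
    return "\n".join(lines)
```

Recording failures and the "none" case explicitly matters as much as recording successes. Otherwise a post-compaction turn has no evidence against a hallucinated claim.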

Impact

This is arguably the most severe user-facing issue — a bot that confidently lies about what it did is worse than one that says "I don't know how to do that."

Incident

Session `D0AC6CKBK5K/1774021203.467619` — repeated hallucinated scheduling despite multiple user corrections.

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Relationships: None yet
Development: No branches or pull requests
