chore(agents): default bug-investigator to Opus + add grounding contract #3666

Merged: Yeraze merged 1 commit into main from chore/bug-investigator-grounding-opus on Jun 23, 2026

Conversation

@Yeraze (Owner) commented Jun 23, 2026

Summary

Hardens the meshmonitor-bug-investigator agent, which had been producing confidently wrong file paths, line numbers, and symbol names, as well as false "X doesn't exist" claims, all of which had to be caught and corrected downstream.

Two root causes, both addressed:

  1. It ran on Sonnet (meshtastic-expert is already Opus) → bumped to model: opus.
  2. No requirement to verify what it cited → added a mandatory Grounding rules section.

Grounding rules added

  • Cite only what you opened — every path/line/symbol must come from a tool result in that session.
  • Verify before reporting — re-run ls / grep -n on each citation and keep the output; drop unconfirmable claims.
  • Prove absence — "X doesn't exist anywhere" requires the actual grep -rn command + its empty output (the most common failure mode).
  • Grep-first, no bulk-reads — locate with grep, read only ±30 lines; an errored/empty Read means it doesn't exist (don't synthesize contents).
  • Label confidence — every finding tagged CONFIRMED (with output) or HYPOTHESIS; trust the code over the user's framing when they conflict.

The report's Evidence items must now be tool-confirmed.
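
As an illustration of the "verify before reporting" and "label confidence" rules, here is a minimal Python sketch of what a citation check could look like. It is not part of the agent config (the agent itself is told to use ls / grep -n); the names Citation and label are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Citation:
    """Hypothetical record of one cited location in a report."""
    path: str    # file path relative to the repository root
    line: int    # 1-indexed line number
    symbol: str  # identifier the report claims appears on that line


def label(citation: Citation, root: Path) -> str:
    """Return 'CONFIRMED' only when the file exists and the cited line
    actually contains the symbol; anything else stays 'HYPOTHESIS'."""
    target = root / citation.path
    if not target.is_file():
        return "HYPOTHESIS"
    lines = target.read_text(encoding="utf-8", errors="replace").splitlines()
    if not 1 <= citation.line <= len(lines):
        return "HYPOTHESIS"
    if citation.symbol in lines[citation.line - 1]:
        return "CONFIRMED"
    return "HYPOTHESIS"
```

Under this sketch, a claim that cannot be re-checked against the file on disk is never promoted to CONFIRMED, which mirrors the intent of dropping or downgrading unconfirmable claims.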

Config-only change (.claude/agents/); no application code touched.

🤖 Generated with Claude Code

The meshmonitor-bug-investigator agent ran on Sonnet and had no requirement to
verify the file paths / line numbers / symbols it cited, which led to confident
hallucinations (nonexistent files, wrong field names, "X doesn't exist" claims
that were false) that had to be caught and corrected downstream.

- Bump model sonnet -> opus (meshtastic-expert is already opus).
- Add a mandatory "Grounding rules" section: cite only what you opened; re-run
  ls/grep to confirm each citation before reporting; prove absence with a real
  grep + its empty output; grep-first instead of bulk-reading large files; and
  label every finding CONFIRMED (with tool output) or HYPOTHESIS.
- Require the report's Evidence items to be tool-confirmed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011JEaCGwY9Wz8jeV4e22GW4

github-actions Bot commented Jun 23, 2026

Claude finished @Yeraze's task

PR Review: chore(agents): default bug-investigator to Opus + add grounding contract

Tasks

  • Read changed file
  • Review changes and provide feedback

Review

This is a clean, focused improvement to the agent configuration. The changes are well-motivated and well-executed.

Model upgrade (line 5)

Bumping to model: opus is appropriate given the agent's job — it needs strong reasoning to avoid hallucinating file paths and line numbers. The PR description notes meshtastic-expert is already on Opus, so this brings meshmonitor-bug-investigator in line with that precedent.

Grounding rules (lines 14–45)

The 5 rules are specific, actionable, and directly address the failure modes described in the PR summary:

  1. Cite only what you opened — closes the "memory/guessing" loophole cleanly.
  2. Verify before you report — the ls + grep -n verification step is concrete and easy for the model to follow.
  3. Prove absence — requiring the actual empty grep -rn output to substantiate "X doesn't exist" is the right fix for the most common failure mode.
  4. Grep first, never bulk-read — good constraint; the ±30-line guidance is specific enough to be actionable.
  5. Label confidence — CONFIRMED/HYPOTHESIS tagging makes uncertainty visible to downstream engineers.
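
To make rule 3 concrete, the following is a rough Python sketch of "prove absence": the claim is only accepted together with the search that was run and its empty output. The functions search_tree and absence_evidence are hypothetical names for this example; the agent's actual rule refers to grep -rn, which this only approximates with a plain substring match.

```python
from pathlib import Path


def search_tree(root: Path, needle: str) -> list[str]:
    """Return 'path:line:text' hits for needle under root,
    similar in spirit to `grep -rn` (substring match only)."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                rel = path.relative_to(root).as_posix()
                hits.append(f"{rel}:{number}:{line}")
    return hits


def absence_evidence(root: Path, needle: str) -> dict:
    """Bundle the query with its raw output so an 'X doesn't exist'
    claim always carries the search that backs it."""
    hits = search_tree(root, needle)
    return {"query": needle, "root": str(root), "output": hits, "absent": not hits}
```

The point of returning the output list alongside the absent flag is that a reviewer can see the empty result rather than take the claim on trust.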

Evidence section in output format (lines 275–278)

The inline comment in the output template reinforces the grounding rules at the exact point where the agent would produce citations. This is a nice touch — it makes the rules hard to overlook even if the agent "skims" the system prompt.

Minor observations (no blocking issues)

  • The existing HYPOTHESIS / EVIDENCE / ROOT CAUSE block format at lines 128–135 predates the grounding rules and uses the HYPOTHESIS label. The new grounding rule 5 introduces CONFIRMED / HYPOTHESIS tagging on individual Evidence items rather than top-level sections. These aren't in conflict, but a future cleanup could unify the two patterns for clarity.
  • Rule 4 says "read only the targeted ±30 lines" but the tools frontmatter (line 4) gives the agent the read tool with no explicit constraint. The rule is advisory text; whether the model reliably respects the 30-line bound depends on model behavior. Low-risk for now, worth monitoring.
  • No test coverage for agent config files is expected or needed here; this is purely a prompt/config change.
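
If the ±30-line bound from rule 4 ever needed to be enforced rather than advised, one possible shape is a small windowed reader like the Python sketch below. This is an assumption about how such a guard might look, not something in the PR; read_window and its radius parameter are hypothetical.

```python
from pathlib import Path


def read_window(path: Path, hit_line: int, radius: int = 30) -> list[tuple[int, str]]:
    """Return (line_number, text) pairs for hit_line plus/minus radius,
    clamped to the file. Raises instead of inventing contents when the
    file is missing."""
    if not path.is_file():
        raise FileNotFoundError(path)
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    start = max(1, hit_line - radius)
    end = min(len(lines), hit_line + radius)
    return [(n, lines[n - 1]) for n in range(start, end + 1)]
```

Raising on a missing file, rather than returning an empty list, keeps "the read failed" distinct from "the file is empty", which matches the rule that an errored read must not be filled in with synthesized contents.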

Overall: Straightforward, well-scoped hardening. No blocking concerns. ✓


@Yeraze merged commit be88753 into main on Jun 23, 2026
19 checks passed
@Yeraze deleted the chore/bug-investigator-grounding-opus branch on Jun 23, 2026, 12:41