feat: add structured tool failure reasons #446

Merged
Astro-Han merged 3 commits into dev from codex/i439-tool-failure-reasons on May 5, 2026

@Astro-Han
Owner

Summary

Add structured tool failure diagnostics for ordinary errored tool calls.

This PR introduces a small failure classifier, persists metadata.diagnostics.failure on tool-error state, appends a concise model-facing recovery hint during message replay, and preserves only safe failure fields in raw/sanitized exports.

Why

Fixes #439.

Today ordinary tool failures are mostly flattened to an error string. That makes it hard for the agent to decide whether to fix arguments, stop after a user abort, ask for permission, check local setup, or report a provider problem. This keeps the original error text but adds a stable local reason and generic recovery hint.
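To make the idea concrete, here is a minimal sketch of what a conservative classifier could look like. It is not the code in tool-failure.ts: the `classifyFailure` name, the regular expressions, the hint wording, and the `user_aborted` and `provider` reason names are assumptions made for illustration. Only permission_denied, invalid_arguments, environment, and the unknown fallback are named in this PR's discussion.

```typescript
// Hypothetical sketch; names, patterns, and hint text are illustrative assumptions.
type FailureReason =
  | "invalid_arguments"
  | "user_aborted" // assumed name
  | "permission_denied"
  | "environment"
  | "provider" // assumed name
  | "unknown"

interface ToolFailure {
  reason: FailureReason
  hint: string
}

// Ordered rules: the first pattern that matches wins.
const RULES: Array<[RegExp, ToolFailure]> = [
  [/abort|cancell?ed/i, { reason: "user_aborted", hint: "The user stopped this call. Do not retry unless asked." }],
  [/permission denied|EACCES|not permitted/i, { reason: "permission_denied", hint: "Ask for permission before retrying." }],
  [/invalid|missing required|expected .* received/i, { reason: "invalid_arguments", hint: "Fix the tool arguments and retry." }],
  [/ENOENT|command not found|not installed/i, { reason: "environment", hint: "Check the local setup before retrying." }],
  [/rate limit|overloaded|service unavailable/i, { reason: "provider", hint: "Report the provider problem." }],
]

function classifyFailure(errorText: string): ToolFailure {
  for (const [pattern, failure] of RULES) {
    // Return a copy so callers cannot mutate the shared rule table.
    if (pattern.test(errorText)) return { ...failure }
  }
  // Anything unrecognized stays unknown rather than being guessed at.
  return { reason: "unknown", hint: "Report the error as-is." }
}
```

Keeping the rule list short and falling through to unknown is one way to stay conservative: a missed classification costs only the hint, while a wrong one could steer the model toward the wrong recovery path.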

Related Issue

Fixes #439.

Human Review Status

Pending. A human should make the final merge decision after reviewing the final diff and verification evidence.

Review Focus

  • Whether the initial failure classifier categories are conservative enough.
  • Whether metadata.diagnostics.failure is merged without disturbing existing loop diagnostics.
  • Whether export sanitization preserves only safe generic fields and continues redacting raw input/error/metadata.
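The last two review points can be illustrated with a small sketch. It assumes the failure record carries `reason` and `hint` string fields and that existing loop diagnostics live under a `loop` key; both are guesses, since the actual metadata and export types are not shown here. `withFailure` and `sanitizeFailure` are hypothetical helper names.

```typescript
// Hypothetical shapes; the real metadata and export types are not shown in this PR text.
type Json = Record<string, unknown>

interface FailureDiagnostics {
  reason: string
  hint: string
}

function isObject(value: unknown): value is Json {
  return typeof value === "object" && value !== null && !Array.isArray(value)
}

// Adds diagnostics.failure while leaving every other diagnostics key untouched.
function withFailure(metadata: unknown, failure: FailureDiagnostics): Json {
  const base: Json = isObject(metadata) ? metadata : {}
  const existing = base["diagnostics"]
  const diagnostics: Json = isObject(existing) ? existing : {}
  return { ...base, diagnostics: { ...diagnostics, failure } }
}

// Export allowlist: copy only the generic string fields and drop everything else.
const SAFE_FAILURE_FIELDS = ["reason", "hint"] as const

function sanitizeFailure(failure: unknown): Json | undefined {
  if (!isObject(failure)) return undefined
  const safe: Json = {}
  for (const key of SAFE_FAILURE_FIELDS) {
    const value = failure[key]
    if (typeof value === "string") safe[key] = value
  }
  return Object.keys(safe).length > 0 ? safe : undefined
}
```

An allowlist is used rather than a blocklist so that any field added to the failure record later is redacted by default until someone decides it is safe to export.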

Risk Notes

  • Model behavior may change for errored tool results because output-error.errorText now includes a short recovery hint.
  • Misclassification could nudge the model toward the wrong recovery path. The fallback remains unknown.
  • No migration is needed because the new metadata field is optional.
  • No UI copy, telemetry, provider retry policy, or Bash timeout behavior is changed.
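For the first risk note, the replay-time change can be pictured roughly as follows. This is only a sketch: the `errorTextWithHint` name and the "Recovery hint:" label are assumptions, and the actual format produced in message-v2.ts may differ.

```typescript
// Hypothetical helper; the actual replay format in message-v2.ts may differ.
function errorTextWithHint(errorText: string, hint?: string): string {
  const trimmedHint = hint?.trim()
  // No hint available: the model sees the original error text unchanged.
  if (!trimmedHint) return errorText
  // Skip the append if the text already ends with this hint (for example on a second replay).
  if (errorText.endsWith(trimmedHint)) return errorText
  return `${errorText}\n\nRecovery hint: ${trimmedHint}`
}
```

The original error text always comes first and is never rewritten, so a wrong hint adds noise but does not hide the underlying failure from the model.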

How To Verify

Targeted tests: 135 passed, 0 failed
Command: bun --cwd packages/opencode test test/session/tool-failure.test.ts test/session/message-v2.test.ts test/session/diagnostics.test.ts test/session/export.test.ts test/session/prompt-effect.test.ts

Typecheck: passed
Command: bun --cwd packages/opencode typecheck

Diff check: no whitespace errors
Command: git diff --check

Screenshots or Recordings

Not applicable. No visible UI change.

Checklist

  • Human review status is stated above as pending, approved, or not required
  • I linked the related issue, or stated why there is no issue
  • This PR has type, scope, and priority labels, or I requested maintainer labeling
  • I described the review focus and any meaningful risks
  • I listed the relevant verification steps and the key result for each
  • I did not introduce unrelated refactors, dependencies, generated files, or file changes beyond the stated scope
  • I manually checked visible UI or copy changes when needed, with screenshots or recordings
  • I considered macOS and Windows impact for desktop, packaging, updater, signing, paths, shell, or permissions changes
  • I called out docs, release notes, dependencies, permissions, credentials, deletion behavior, generated content, or local file changes when relevant
  • I reviewed the final diff for unrelated changes and suspicious dependency changes
  • I am targeting dev, and my PR title and commit messages use Conventional Commits in English

@Astro-Han added the enhancement, P1, and harness labels on May 5, 2026
@coderabbitai

coderabbitai Bot commented May 5, 2026
Contributor

Warning

Rate limit exceeded

@Astro-Han has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 30 minutes and 13 seconds before requesting another review.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 9be19f10-2f63-49ed-9819-3fc65cf437b6

📥 Commits

Reviewing files that changed from the base of the PR and between 12889f1 and d609e43.

📒 Files selected for processing (11)
  • packages/opencode/src/session/diagnostics.ts
  • packages/opencode/src/session/export.ts
  • packages/opencode/src/session/message-v2.ts
  • packages/opencode/src/session/processor.ts
  • packages/opencode/src/session/tool-failure.ts
  • packages/opencode/src/tool/sensitive.ts
  • packages/opencode/test/session/diagnostics.test.ts
  • packages/opencode/test/session/export.test.ts
  • packages/opencode/test/session/message-v2.test.ts
  • packages/opencode/test/session/prompt-effect.test.ts
  • packages/opencode/test/session/tool-failure.test.ts

@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces a structured tool failure classification system to provide better recovery hints to models when tool executions fail. It adds a new tool-failure.ts module for classifying errors into categories like permission_denied, invalid_arguments, and environment, and integrates this metadata into session diagnostics, exports, and model message formatting. Feedback was provided to improve type safety by replacing any with unknown in the object helper and adding defensive type checks when extracting errorKind from untrusted metadata.
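The feedback in this review can be sketched as below. This is a hypothetical reconstruction of the suggestion, not the merged code: the helper names `asObject` and `readErrorKind` are invented here, and it assumes errorKind sits at the top level of the metadata object.

```typescript
// Hypothetical reconstruction of the review suggestion; not the merged code.
// Taking `unknown` instead of `any` forces every caller to narrow before use.
function asObject(value: unknown): Record<string, unknown> | undefined {
  return typeof value === "object" && value !== null && !Array.isArray(value)
    ? (value as Record<string, unknown>)
    : undefined
}

// Reads errorKind only when it is actually a string; untrusted metadata may hold anything.
function readErrorKind(metadata: unknown): string | undefined {
  const kind = asObject(metadata)?.["errorKind"]
  return typeof kind === "string" ? kind : undefined
}
```

With `any`, a non-string errorKind would flow silently into the classifier. With `unknown` plus the `typeof` check, it is treated as absent instead.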

Comment thread packages/opencode/src/session/tool-failure.ts Outdated
Comment thread packages/opencode/src/session/tool-failure.ts
@Astro-Han force-pushed the codex/i439-tool-failure-reasons branch from 2664b6e to d609e43 on May 5, 2026 09:04
@Astro-Han merged commit 16cebab into dev on May 5, 2026 (20 checks passed)
@Astro-Han deleted the codex/i439-tool-failure-reasons branch on May 5, 2026 09:36

Labels

  • enhancement: New feature or request
  • harness: Model harness, prompts, tool descriptions, and session mechanics
  • P1: High priority

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Add structured tool failure reasons for agent recovery

1 participant