feat: add structured tool failure reasons by Astro-Han · Pull Request #446 · Astro-Han/pawwork

Astro-Han · 2026-05-05T08:49:48Z

Summary

Add structured tool failure diagnostics for ordinary errored tool calls.

This PR introduces a small failure classifier, persists metadata.diagnostics.failure on tool-error state, appends a concise model-facing recovery hint during message replay, and preserves only safe failure fields in raw/sanitized exports.

Why

Fixes #439.

Today ordinary tool failures are mostly flattened to an error string. That makes it hard for the agent to decide whether to fix arguments, stop after a user abort, ask for permission, check local setup, or report a provider problem. This PR keeps the original error text but adds a stable local reason and a generic recovery hint.
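
As an illustration of the kind of mapping described above, here is a minimal sketch of a conservative classifier. It is not the code from this PR: the reason names user_aborted and provider, the regular expressions, the hint wording, and the function name classifyToolFailure are all assumptions made for the example. Only permission_denied, invalid_arguments, environment, and the unknown fallback are named elsewhere on this page.

```typescript
// Hypothetical reason names. permission_denied, invalid_arguments, and
// environment appear in the review summary; user_aborted and provider are
// assumptions made for this sketch.
type FailureReason =
  | "invalid_arguments"
  | "user_aborted"
  | "permission_denied"
  | "environment"
  | "provider"
  | "unknown"

interface ToolFailure {
  reason: FailureReason
  hint: string
}

// Generic, model-facing hints. The wording here is illustrative only.
const HINTS: Record<FailureReason, string> = {
  invalid_arguments: "Check the tool arguments and retry with corrected input.",
  user_aborted: "The user stopped this call; do not retry unless asked.",
  permission_denied: "Ask the user for permission before retrying.",
  environment: "Check the local setup before retrying.",
  provider: "Report the provider problem instead of retrying blindly.",
  unknown: "Read the error text and decide whether a retry makes sense.",
}

// Ordered, deliberately narrow patterns. Anything unmatched stays "unknown".
const RULES: Array<[RegExp, FailureReason]> = [
  [/\babort(ed)?\b|\bcancel(l)?ed\b/i, "user_aborted"],
  [/permission denied|\bEACCES\b|not permitted/i, "permission_denied"],
  [/invalid (argument|input|parameter)|validation failed/i, "invalid_arguments"],
  [/\bENOENT\b|command not found|\bECONNREFUSED\b/i, "environment"],
  [/rate limit|provider (error|unavailable)/i, "provider"],
]

function classifyToolFailure(errorText: string): ToolFailure {
  for (const [pattern, reason] of RULES) {
    if (pattern.test(errorText)) return { reason, hint: HINTS[reason] }
  }
  return { reason: "unknown", hint: HINTS.unknown }
}
```

Keeping the rule list short and falling through to unknown is one way to stay conservative: a missed classification costs a hint, while a wrong one could steer the model toward the wrong recovery path.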

Related Issue

Fixes #439.

Human Review Status

Pending. A human should make the final merge decision after reviewing the final diff and verification evidence.

Review Focus

Whether the initial failure classifier categories are conservative enough.
Whether metadata.diagnostics.failure is merged without disturbing existing loop diagnostics.
Whether export sanitization preserves only safe generic fields and continues redacting raw input/error/metadata.
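
The second and third review points can be sketched as follows. This is an assumption-laden illustration, not the PR's implementation: the helper names asRecord, withFailure, and sanitizeForExport, the loop sibling key, and the reason/hint field names are all hypothetical. It shows a merge that leaves sibling diagnostics untouched and an allowlist-style export that drops everything except the generic failure fields.

```typescript
type Json = Record<string, unknown>

// Treat anything that is not a plain object as an empty record.
function asRecord(value: unknown): Json {
  return typeof value === "object" && value !== null && !Array.isArray(value)
    ? (value as Json)
    : {}
}

// Merge a failure entry under metadata.diagnostics without dropping sibling
// keys, such as a (hypothetical) existing `loop` diagnostic.
function withFailure(metadata: unknown, failure: { reason: string; hint: string }): Json {
  const base = asRecord(metadata)
  const diagnostics = asRecord(base["diagnostics"])
  return { ...base, diagnostics: { ...diagnostics, failure } }
}

// Export-side allowlist: copy only the generic fields and omit all other
// metadata, so raw input, error text, and unrelated keys never leak through.
function sanitizeForExport(metadata: unknown): Json {
  const diagnostics = asRecord(asRecord(metadata)["diagnostics"])
  const failure = asRecord(diagnostics["failure"])
  const reason = failure["reason"]
  if (typeof reason !== "string") return {}
  const safe: Json = { reason }
  const hint = failure["hint"]
  if (typeof hint === "string") safe["hint"] = hint
  return { diagnostics: { failure: safe } }
}
```

An allowlist is used here rather than a blocklist so that any field added later is redacted by default.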

Risk Notes

Model behavior may change for errored tool results because output-error.errorText now includes a short recovery hint.
Misclassification could nudge the model toward the wrong recovery path. When no category matches, the classifier still falls back to the unknown reason.
No migration is needed because the new metadata field is optional.
No UI copy, telemetry, provider retry policy, or Bash timeout behavior is changed.
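
To make the first risk concrete, the sketch below shows one way a recovery hint could be appended to the error text during replay. The function name formatErrorText, the FailureInfo shape, and the bracketed layout are assumptions for illustration; the actual wording and format in the PR may differ.

```typescript
interface FailureInfo {
  reason: string
  hint: string
}

// Hypothetical formatter: keeps the original error text intact and appends a
// single hint line. When no failure info is stored, the text is unchanged,
// which matches the claim that the new metadata field is optional.
function formatErrorText(errorText: string, failure?: FailureInfo): string {
  if (!failure || failure.hint.trim() === "") return errorText
  return `${errorText}\n\n[tool failure: ${failure.reason}] ${failure.hint}`
}
```

Because the original text is always a prefix of the result, older sessions without the metadata replay exactly as before.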

How To Verify

Targeted tests: 135 passed, 0 failed
Command: bun --cwd packages/opencode test test/session/tool-failure.test.ts test/session/message-v2.test.ts test/session/diagnostics.test.ts test/session/export.test.ts test/session/prompt-effect.test.ts

Typecheck: passed
Command: bun --cwd packages/opencode typecheck

Diff check: no whitespace errors
Command: git diff --check

Screenshots or Recordings

Not applicable. No visible UI change.

Checklist

coderabbitai · 2026-05-05T08:49:54Z

Warning

Rate limit exceeded

@Astro-Han has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 30 minutes and 13 seconds before requesting another review.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 9be19f10-2f63-49ed-9819-3fc65cf437b6

📥 Commits

Reviewing files that changed from the base of the PR and between 12889f1 and d609e43.

📒 Files selected for processing (11)

packages/opencode/src/session/diagnostics.ts
packages/opencode/src/session/export.ts
packages/opencode/src/session/message-v2.ts
packages/opencode/src/session/processor.ts
packages/opencode/src/session/tool-failure.ts
packages/opencode/src/tool/sensitive.ts
packages/opencode/test/session/diagnostics.test.ts
packages/opencode/test/session/export.test.ts
packages/opencode/test/session/message-v2.test.ts
packages/opencode/test/session/prompt-effect.test.ts
packages/opencode/test/session/tool-failure.test.ts


gemini-code-assist

Code Review

This pull request introduces a structured tool failure classification system to provide better recovery hints to models when tool executions fail. It adds a new tool-failure.ts module for classifying errors into categories like permission_denied, invalid_arguments, and environment, and integrates this metadata into session diagnostics, exports, and model message formatting. Feedback was provided to improve type safety by replacing any with unknown in the object helper and adding defensive type checks when extracting errorKind from untrusted metadata.
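
The two pieces of feedback can be illustrated with a short sketch. The helper names isObject and readErrorKind are hypothetical, and the assumption that errorKind sits at the top level of the metadata object is made only for this example; the review summary does not say where the field lives.

```typescript
// Accept `unknown` rather than `any`, so callers must narrow before use.
function isObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null
}

// Defensive read of errorKind from untrusted metadata: return it only when
// it is a non-empty string, and otherwise report that nothing usable exists.
function readErrorKind(metadata: unknown): string | undefined {
  if (!isObject(metadata)) return undefined
  const kind = metadata["errorKind"]
  return typeof kind === "string" && kind.length > 0 ? kind : undefined
}
```

With unknown in the signature, the compiler rejects a direct property access on the argument, so the runtime check cannot be skipped by accident.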

Astro-Han added the enhancement (New feature or request), P1 (High priority), and harness (Model harness, prompts, tool descriptions, and session mechanics) labels on May 5, 2026

gemini-code-assist Bot reviewed May 5, 2026

Astro-Han added 3 commits May 5, 2026 17:02

feat: classify tool failures

5491e59

feat: persist tool failure hints

5a4c4b8

feat: preserve tool failure export metadata

d609e43

Astro-Han force-pushed the codex/i439-tool-failure-reasons branch from 2664b6e to d609e43 on May 5, 2026 09:04

Astro-Han merged commit 16cebab into dev May 5, 2026
20 checks passed

Astro-Han deleted the codex/i439-tool-failure-reasons branch May 5, 2026 09:36
