Skip to content

feat(opencode): declare-and-stat external artifact protocol in bash tool#561

Merged
Astro-Han merged 4 commits into
devfrom
codex/bash-expected-outputs
May 12, 2026
Merged

feat(opencode): declare-and-stat external artifact protocol in bash tool#561
Astro-Han merged 4 commits into
devfrom
codex/bash-expected-outputs

Conversation

@Astro-Han

@Astro-Han Astro-Han commented May 12, 2026

Copy link
Copy Markdown
Owner

Summary

Add a declared external artifact protocol to the bash tool so assistant-declared output files are stat-verified after command execution and registered in turn-change.

Why

PawWork's "本轮生成文件" only showed files written through write/edit/apply_patch. Files created by third-party CLIs through bash, such as officecli-generated .docx files, bypassed turnChange.recordWrite() and never appeared in the UI.

Related Issue

No GitHub issue. This PR implements Slock task #33 from #PawWork:2b44653e.

Human Review Status

Pending. A human should make the final merge decision after reviewing the final diff and verification evidence.

Review Focus

  • packages/opencode/src/tool/bash.ts: declared expected_outputs flow, pre/post state capture, and recordWrite() only on real file changes
  • packages/opencode/test/tool/bash.test.ts: failed-but-wrote behavior, unchanged-path suppression, and no-declaration compatibility
  • skills/document-processing/SKILL.md: the one-line guidance for third-party CLI output declaration

Risk Notes

Low to medium. This changes bash tool schema and records declared file side effects, including binary files, into turn-change. Main things to verify are path normalization, external-directory permission behavior, and that undeclared bash commands remain unchanged.

How To Verify

Focused expected_outputs tests: 4 passed in packages/opencode/test/tool/bash.test.ts
Bash tool suite smoke: 31 passed in packages/opencode/test/tool/bash.test.ts
Typecheck: ok (`bun --cwd packages/opencode typecheck`)
Diff check: no whitespace errors (`git diff --check`)
Package lint/smoke scripts: none defined under packages/opencode; full bash.test used as tool-level smoke

Screenshots or Recordings

Not applicable. No visible UI change was introduced directly; this is tool/runtime plumbing verified through turn-change tests.

Checklist

  • Human review status is stated above as pending, approved, or not required
  • I linked the related issue, or stated why there is no issue
  • This PR has type, primary area, and priority labels, or I requested maintainer labeling
  • I described the review focus and any meaningful risks
  • I listed the relevant verification steps and the key result for each
  • I did not introduce unrelated refactors, dependencies, generated files, or file changes beyond the stated scope
  • I manually checked visible UI or copy changes when needed, with screenshots or recordings
  • I considered macOS and Windows impact for platform, packaging, updater, signing, paths, shell, or permissions changes
  • I called out docs, release notes, dependencies, permissions, credentials, deletion behavior, generated content, or local file changes when relevant
  • I reviewed the final diff for unrelated changes and suspicious dependency changes
  • I am targeting dev, and my PR title and commit messages use Conventional Commits in English

Summary by CodeRabbit

  • New Features

    • File-tracking for bash commands: declare expected output paths to detect and record file creation, binary/text changes, and produce per-run artifact metadata.
  • Tests

    • Added tests validating artifact recording for added/unchanged files, behavior on non-zero exits, binary outputs, and indeterminate pre-state handling.
  • Documentation

    • Clarified that absolute output paths should be listed in expected_outputs for third‑party CLI outputs.

Review Change Stack

@Astro-Han Astro-Han added enhancement New feature or request P2 Medium priority harness Model harness, prompts, tool descriptions, and session mechanics labels May 12, 2026
@coderabbitai

coderabbitai Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

The bash tool now supports declaring expected output files and tracking their changes. The Parameters schema gains an optional expected_outputs array, and file state is captured before and after execution using content hashing. When files differ, changes are recorded into the turn-change system. Tests verify artifact recording, change detection, and backward compatibility.

Changes

Bash Tool Expected Output Tracking

Layer / File(s) Summary
Parameters schema and hashing utilities
packages/opencode/src/tool/bash.ts
Parameters extended with optional expected_outputs array. New helpers (textHash, binaryHash, sameState) detect file changes across existence, content (with BOM awareness), size, and binary flags.
Imports and service setup
packages/opencode/src/tool/bash.ts
Node fs/promises and crypto imported. Service acquisition includes turnChange from TurnChange.Service. Directory permission checks updated to use afs service.
File state tracking and pre-execution setup
packages/opencode/src/tool/bash.ts
readTrackedState method reads and classifies files (directories, large, binary, UTF-8 with BOM). Pre-execution resolves, normalizes, constrains expected_outputs to external directory, and reads initial state before command runs.
Post-execution comparison and artifact recording
packages/opencode/src/tool/bash.ts
After execution, compares before/after file states, records write events to turnChange for changed files, and augments result with artifacts array describing existence, change status, and file flags.
Test setup and expected_outputs coverage
packages/opencode/test/tool/bash.test.ts
TurnChange.defaultLayer wired into runtime. createTurn() helper creates sessions and persists messages. Test block verifies artifact recording, change detection on non-zero exit, unchanged files, and legacy behavior without expected_outputs.
Usage documentation
skills/document-processing/SKILL.md
Documentation note instructs recording absolute bash tool output paths in expected_outputs for PawWork change tracking.

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Possibly related issues:
    • #562: Adds and tests expected_outputs handling; may relate to additional test cases for text/BOM hashing and workdir-relative path resolution.

"A rabbit hops through the bash tool door,
'Track thy outputs!' it did roar.
Hashes hum for files before and after,
Turn-change notes the new artifact laughter." 🐇🔍

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the main change: adding a declared external artifact protocol to the bash tool for stat-verifying and registering assistant-declared output files.
Description check ✅ Passed The description follows the template with all required sections completed: summary, why, related issue, human review status, review focus, risk notes, verification steps, and a comprehensive checklist.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/bash-expected-outputs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an expected_outputs parameter to the bash tool, allowing the runtime to track and register file changes made by external CLI commands. It includes logic for hashing file states, detecting binary content, and recording these changes as artifacts in the session's turn history. A review comment identified significant code duplication between the new readTrackedState function and existing logic in turn-change.ts, recommending consolidation into a shared utility. Additionally, the reviewer noted inconsistent indentation in the newly added block that violates the project's 2-space rule.

Comment thread packages/opencode/src/tool/bash.ts

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opencode/src/tool/bash.ts`:
- Line 4: Refactor readTrackedState to stop using Effect.promise with direct
nodefs calls and instead use Effect.gen to access the AppFileSystem service;
remove the import of node:fs/promises, call yield* AppFileSystem.Service inside
Effect.gen, replace nodefs.stat() with the service's stat() method and
nodefs.readFile() with the service.readBytes() (which returns a Buffer for
binary detection), and ensure the function yields/handles the Effect results
appropriately using Effect.gen and the AppFileSystem.Service API.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 09d52749-e454-4174-b178-b6baebc5c5ca

📥 Commits

Reviewing files that changed from the base of the PR and between 26416f1 and aa1b54d.

📒 Files selected for processing (3)
  • packages/opencode/src/tool/bash.ts
  • packages/opencode/test/tool/bash.test.ts
  • skills/document-processing/SKILL.md

Comment thread packages/opencode/src/tool/bash.ts
@Astro-Han

Copy link
Copy Markdown
Owner Author

Batching the internal review loop here in English for the public PR record.

GLM first-pass review

Verdict: P3 minor, non-blocking.

Finding:

  • skills/document-processing/SKILL.md said "absolute file path" while the implementation also supports paths relative to workdir through resolveExecutionPath().

Everything else in the first-pass review matched the brief:

  • happy path covered
  • failed-but-wrote covered
  • unchanged path suppression covered
  • no-declaration backward compatibility covered
  • no stdout parsing
  • no cwd auto-diff
  • declared paths only

Opus second-pass review

Verdict before fixup: conditional pass with one P2 + two P3.

Main points:

  • Architecture matched the agreed external-artifact protocol.
  • The important correctness hole was the readTrackedState() fallback for unknown errnos, which could create a false-positive corridor if one side read as error:EACCES and the other as ENOENT.
  • Directory artifacts were also being labeled as binary in the metadata block, which was misleading.
  • Two additional test ideas were recommended: text-file coverage and relative-path coverage.

Follow-up from this branch

Fixup e467089d4 addresses the blocking and immediate non-blocking review items:

  • fixes the indeterminate-state false-positive risk by marking unknown-error states as non-comparable and skipping recordWrite() when either side is indeterminate
  • adds a regression test proving an indeterminate before state does not create a false-positive turn-change entry
  • emits directory: true instead of mislabeling directory artifacts as binary
  • updates the skill wording to say expected_outputs may be absolute or relative to workdir

The smaller coverage additions for a normal text-file case and a relative-path case are tracked separately in follow-up issue #562 to keep this fixup tight.

@Astro-Han Astro-Han left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Engineering final review from @GPT-X.

No new implementation blockers found in the current PR head (e467089d4). The fixup addresses the previously identified correctness hole: indeterminate file states are now non-comparable, so before=EACCES / after=ENOENT no longer creates a false-positive recordWrite(). The regression test covers that path. Directory artifact metadata now reports directory: true instead of binary: true, and the SKILL wording matches the implementation's absolute-or-workdir-relative path support.

I re-verified locally:

  • bun --cwd packages/opencode test test/tool/bash.test.ts --timeout 30000 -> 32 pass
  • bun --cwd packages/opencode typecheck -> pass
  • git diff --check dev...HEAD -> pass

Remaining merge gates are procedural, not new code findings:

  • unit-opencode is still pending on GitHub Actions at the time of this review.
  • CodeRabbit's latest review is still pending.
  • Existing bot review threads should be replied to and resolved before merge. I do not consider the requested AppFileSystem refactor or shared-helper consolidation necessary for this v1; the current implementation follows the already-used turn-change.ts direct filesystem pattern, and broader deduplication can stay out of this PR.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/opencode/test/tool/bash.test.ts (1)

417-466: 💤 Low value

Spy mock with type bypassing is fragile but necessary.

Lines 432 and 438 use as any to bypass TypeScript's type checking when mocking nodefs.readFile. While this is a common pattern for mocking native modules in tests, it reduces type safety:

  • The mock implementation relies on runtime type inspection (typeof args[0] === "string") instead of compile-time checks
  • Changes to the real readFile signature won't be caught at compile time
  • mockImplementationOnce assumes a specific call order (first read fails, subsequent reads succeed)

The test correctly restores the spy in the finally block and validates the expected behavior. Consider documenting why only the first call is mocked (to simulate "before" state read failure while allowing "after" state checks to succeed).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/test/tool/bash.test.ts` around lines 417 - 466, The test
uses a fragile type-bypassing spy on nodefs.readFile (see readSpy created with
spyOn(nodefs as any, "readFile") and mockImplementationOnce) to simulate an
EACCES on the initial read only; add an inline comment immediately above that
spy explaining: why we cast to any, that mockImplementationOnce is intentionally
used to fail only the "before" read while allowing subsequent reads to call the
real originalReadFile, and that this preserves the post-action verification
flow; mention that the spy is restored in the finally block
(readSpy.mockRestore()) to avoid leaking the mock into other tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opencode/test/tool/bash.test.ts`:
- Around line 55-73: The object passed into SessionNs.updateMessage in
createTurn currently uses a double-cast "as unknown as MessageV2.Info"; replace
that cast with a proper typed value (e.g., declare const message:
MessageV2.Assistant = { ... } or build the literal and append "satisfies
MessageV2.Assistant") and then pass message to SessionNs.updateMessage; ensure
you remove the "as unknown as MessageV2.Info" and use the MessageV2.Assistant
type/satisfies check so the shape matches compile-time expectations in
createTurn and the updateMessage call.

---

Nitpick comments:
In `@packages/opencode/test/tool/bash.test.ts`:
- Around line 417-466: The test uses a fragile type-bypassing spy on
nodefs.readFile (see readSpy created with spyOn(nodefs as any, "readFile") and
mockImplementationOnce) to simulate an EACCES on the initial read only; add an
inline comment immediately above that spy explaining: why we cast to any, that
mockImplementationOnce is intentionally used to fail only the "before" read
while allowing subsequent reads to call the real originalReadFile, and that this
preserves the post-action verification flow; mention that the spy is restored in
the finally block (readSpy.mockRestore()) to avoid leaking the mock into other
tests.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: f0432a37-1836-4240-8da5-c1f7d34fe42a

📥 Commits

Reviewing files that changed from the base of the PR and between aa1b54d and e467089.

📒 Files selected for processing (3)
  • packages/opencode/src/tool/bash.ts
  • packages/opencode/test/tool/bash.test.ts
  • skills/document-processing/SKILL.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • skills/document-processing/SKILL.md
  • packages/opencode/src/tool/bash.ts

Comment thread packages/opencode/test/tool/bash.test.ts
@Astro-Han Astro-Han merged commit d369bf8 into dev May 12, 2026
22 checks passed
@Astro-Han Astro-Han deleted the codex/bash-expected-outputs branch May 12, 2026 03:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request harness Model harness, prompts, tool descriptions, and session mechanics P2 Medium priority

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant