What task are you trying to do?
We want PawWork's base tool surface to be smaller, more honest, and harder for models to misuse. This issue covers three concurrent strands:
Description cleanup (including over-push framing in todowrite.txt): remove fictional examples, delegation hype, lying behavioral claims, content that teaches wrong tool boundaries, and framing that pushes tool selection as performance rather than organization; fill teaching gaps that bias weaker models on common tool boundaries (e.g. line-deletion semantics in edit.txt).
Redundant tool removal: delete tools whose function fully overlaps another tool or whose default-path exposure is zero.
Deletion guardrail hardening: stop the bash description from teaching rm as a normal example, and strengthen the trash-vs-rm boundary.
The goal is to move PawWork toward Claude Code's tool count (~15) and away from Gemini CLI's (~23), without going as bare as Codex CLI's (~8), which is GPT-only and would break PawWork's open-to-all-models stance.
What do you do today?
task.txt embeds two fictional example agents (code-reviewer, greeting-responder) that do not exist in the PawWork subagent list. It explicitly tells models to "Launch multiple agents concurrently whenever possible" and that "The agent's outputs should generally be trusted" — both push delegation harder than PawWork's product direction warrants.
glob.txt tells models to "speculatively perform multiple searches as a batch".
skill.txt says skills are "listed in the system prompt" in two places (L1 and L5); the actual mechanism is registry.ts:283 (describeSkill) appending the list at the end of the skill tool description itself — both pointers are literally wrong.
multiedit.txt claims atomicity ("All edits must be valid for the operation to succeed - if any edit fails, none will be applied"), but multiedit.ts:38-50 is a plain for loop calling edit.execute() with each successful write landing immediately and no rollback. The description lies about runtime behavior; users get half-edited files when later edits fail.
A separate finding from the spec audit: multiedit is not currently registered in the registry.ts builtin tool list. Beyond its own two files (packages/opencode/src/tool/multiedit.ts and packages/opencode/src/tool/multiedit.txt), the tool is referenced as string literals or test/spec mentions in 7 paths, for 9 paths in total: packages/opencode/src/permission/index.ts:297 (EDIT_TOOLS = ["edit", "write", "apply_patch", "multiedit"]), packages/opencode/src/config/agent.ts:111 (legacy migration union check), packages/opencode/src/config/config.ts:733 (legacy migration union check), packages/opencode/specs/effect-migration.md:269 (one-line spec mention), packages/opencode/test/agent/agent.test.ts:493 (test name), packages/opencode/test/config/config.test.ts:2006-2030 (the entire migrates legacy multiedit tool to edit permission test block, a multiedit-specific migration test), and packages/opencode/test/permission/next.test.ts:390-401 (test name + array entry + assertion). So removing multiedit does NOT change the tool surface any model sees; it removes dead code, the lying description, three string-literal references in src, one spec mention, and three test references. Per user decision: PawWork has no legacy users / no legacy config in the wild, so the entire multiedit migration code path is removed (not preserved as a no-op).
codesearch.txt is gated to providerID === ProviderID.opencode || Flag.OPENCODE_ENABLE_EXA (registry.ts:319-321). The same gate also applies to WebSearchTool.id in the same OR condition — the gate covers both tools. PawWork users on the default Zen provider never see codesearch, but websearch goes through this same gate and must be preserved when codesearch is removed. The codesearch description still pushes "Use this tool for ANY question or task related to programming"; the .txt + .ts + dispatcher branch sit in the codebase for no PawWork value.
bash.txt L16 uses rm "path with spaces/file.txt" as the quoting example, then never warns against rm anywhere. trash.txt (4 lines) tries to redirect deletion through trash, but bash.txt's reverse-push is louder and models keep reaching for rm then failing.
Important context for the "then failing" half: the patterns rm */rmdir */unlink */find * -delete*/sudo */dd */mkfs*/chmod */kill * are all set to deny in PawWork's build-agent default permission ruleset at packages/opencode/src/agent/agent.ts:99-110, validated by packages/opencode/test/permission/pawwork-defaults.test.ts. So rm calls already hard-fail at the permission layer — this is an existing PawWork carveout over upstream, not something this issue introduces. Note: the deny ruleset covers POSIX deletion paths only; Windows deletion commands (del, erase, rd, Remove-Item) are NOT currently denied — tracked as a separate signal for follow-up after PR1 lands. The bash.txt + trash.txt edits in PR1 reduce the wasted first-attempt that ends in a deny error and stop bash.txt from actively suggesting rm as the canonical example; they are NOT the layer doing the actual blocking.
bash.txt does not have a single "DO NOT" list section — current text is prose plus a bullet list at L31 (Avoid using Bash with the find, grep, cat, head, tail, sed, awk, or echo commands...). The rm guardrail will be added by extending that L31 bullet list, not by inserting a new "DO NOT" section.
bash.txt L86 (NEVER use the TodoWrite or Task tools) and L113 (DO NOT use the TodoWrite or Task tools) instruct models against TodoWrite/Task tool use during git/PR workflows — this contradicts PawWork's project-level encouragement of TodoWrite for multi-step work. Both lines sit inside the deferred 66-line git/PR workflow block; the surgical deletion of these two lines is an explicit exception to the otherwise-deferred status of that block.
apply_patch.txt L12 and L26 contain *** Delete File: <path> syntax which lets the model delete files directly through the patch envelope, bypassing the trash guardrail. The first audit pass missed this; apply_patch.txt is not clean and needs a guardrail line added (prose-level only — runtime dispatcher enforcement is out of scope, same upstream-foundation reasoning as keeping apply_patch itself).
todowrite.txt L1 frames the tool as helping the user "track progress, organize complex tasks, and demonstrate thoroughness to the user". L166 reads When in doubt, use this tool. Being proactive with task management demonstrates attentiveness and ensures you complete all requirements successfully. Both push using the tool as performance, not organization.
edit.txt is currently 10 lines of exact-replacement semantics but does not teach the line-deletion boundary case. Weaker models routinely delete a single line by passing the line content as oldString and an empty newString but forget the trailing newline character — leaving a stray blank line at the deleted position. PawWork's default audience includes weaker models (GLM-5.1, Kimi K2.6, Qwen Coder); for them this teaching gap matters more than for stronger models that have absorbed the convention from training data. This is not a "teach wrong" failure but a "fail to teach" failure on a known stumbling block.
Deferred (still not in this issue): bash.txt ~64 lines of remaining git/PR workflow (after L86/L113 surgical removals); todowrite.txt's 8-example demonstration block. Both are "compress for context budget" calls that need validation infrastructure to land safely.
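To make the multiedit atomicity gap described above concrete, here is a minimal sketch. It is an in-memory model, not the real multiedit.ts: the Edit type and the applyEdit, applySequentially, applyAtomically, and tryApply helpers are all hypothetical names introduced for illustration. It contrasts a plain loop that commits each edit immediately with a variant that commits only when every edit succeeds.

```typescript
// Hypothetical in-memory model of sequential edits; NOT the real multiedit.ts.
type Edit = { oldString: string; newString: string };

// Applies one exact-match replacement and throws when oldString is absent.
function applyEdit(content: string, edit: Edit): string {
  if (!content.includes(edit.oldString)) {
    throw new Error(`oldString not found: ${edit.oldString}`);
  }
  // A replacer function keeps "$" sequences in newString literal.
  return content.replace(edit.oldString, () => edit.newString);
}

// Behavior as described in the issue: each successful edit lands immediately,
// and nothing is rolled back when a later edit throws.
function applySequentially(file: { content: string }, edits: Edit[]): void {
  for (const edit of edits) {
    file.content = applyEdit(file.content, edit);
  }
}

// Atomic variant for contrast: edit a draft and commit only if all edits succeed.
function applyAtomically(file: { content: string }, edits: Edit[]): void {
  let draft = file.content;
  for (const edit of edits) {
    draft = applyEdit(draft, edit);
  }
  file.content = draft;
}

// Runs a callback and reports whether it completed without throwing.
function tryApply(run: () => void): boolean {
  try {
    run();
    return true;
  } catch {
    return false;
  }
}

const edits: Edit[] = [
  { oldString: "alpha", newString: "ALPHA" },
  { oldString: "missing", newString: "x" }, // this second edit fails
];
const sequentialFile = { content: "alpha beta" };
const atomicFile = { content: "alpha beta" };
const sequentialOk = tryApply(() => applySequentially(sequentialFile, edits));
const atomicOk = tryApply(() => applyAtomically(atomicFile, edits));
```

Both runs fail on the second edit, but only the sequential one leaves the first edit in place. That half-edited state is what the description's "none will be applied" wording rules out.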
What would a good result look like?
multiedit and codesearch are removed (description + implementation + string-literal references + dispatcher gate). Existing descriptions stop teaching wrong behavior, lying about runtime semantics, or contradicting other PawWork project rules. The trash-vs-rm boundary becomes the obvious default rather than competing with the bash example. apply_patch's deletion syntax has a prose-level guardrail line. Effect on visible tool count for the default profile (Zen provider + non-GPT model + no LSP + no plan flag): zero net change at runtime — codesearch is already gated off for default Zen users, and multiedit was never registered. The cleanup matters by reducing dead code, lying descriptions, and incorrect pointers, not by altering what models see at runtime under default settings. Nothing requires new fixture infrastructure to land.
Which audience does this matter to most?
Both
Extra context — why this is a PawWork carveout, not an upstream feature request
opencode is shaped around dev-first defaults — its users routinely operate commit/PR/gh CLI as everyday vocabulary, and apply_patch + heavy delegation hype + Exa code search are net-positive there. PawWork's audience distribution is wider (covers technical and non-technical users; default tasks span documents, spreadsheets, code, research, life admin) and the default-task distribution is different. The same description text produces inconsistent quality on the two surfaces — not "right vs wrong", but a difference in product-positioning defaults.
Permanent carveout for packages/opencode/src/tool/*.txt: during upstream sync, always accept HEAD on these files. The underlying runtime (tool.ts dispatcher logic) keeps following upstream via graft + squash.
PR1 combines the description edits with the multiedit removal because multiedit is dead code (not registered) and its removal carries the same risk profile as the .txt edits. It touches 8 .txt files + 5 source-code locations.
Description edits (8 files):
task.txt: delete fictional example agents block (code-reviewer, greeting-responder, <example_agent_descriptions> through end of file) plus surrounding Example usage (NOTE: ...) framing. This explicitly includes the two <example> blocks at task.txt L30-49 and L51-57 which reference the fictional code-reviewer/greeting-responder agents — both must go. Delete Usage notes item 1 (Launch multiple agents concurrently whenever possible) and item 4 (outputs should generally be trusted); renumber remaining consecutively. Rewrite L12 Other tasks that are not related to the agent descriptions above → Other tasks that are not related to the available agents. Rewrite the Usage note that contains the phrase agent description (currently L21, If the agent description mentions ...) by content match (not by post-renumber index) to point at "the available agents". (Both rewrites use schema-independent phrasing — "available agents" — to avoid layout-dependent terms like "above" or "in this tool description".) No new schema anchor on L3.
glob.txt L6: delete only the second sentence (It is always better to speculatively perform multiple searches as a batch that are potentially useful.). Keep the first sentence.
skill.txt L1 AND L5: rewrite listed in the system prompt / listed in your system prompt → listed below in both lines. Literal-pointer fix; the list is appended at the end of the skill tool description by describeSkill, not in system prompt.
bash.txt:
L16: replace the rm quoting example with a non-destructive command. Pick from bash.txt's own existing examples (mkdir, python); do NOT use cat because L31's avoid-list explicitly forbids cat via Bash, which would be self-contradictory.
L31 already contains a structured nested sub-bullet list under "Avoid using Bash with..." (sub-bullets like File search: Use Glob (NOT find or ls), Content search: Use Grep (NOT grep or rg), Read files: Use Read (NOT cat/head/tail), Edit files: Use Edit (NOT sed/awk), Write files: Use Write (NOT echo >/cat <<EOF)). Add a NEW sub-bullet at the same nesting level paralleling those: File deletion: Use the trash tool, NOT `rm`/`del` (rm permanently deletes; trash is reversible).
Delete L86 (NEVER use the TodoWrite or Task tools) and L113 (DO NOT use the TodoWrite or Task tools). Both lines surgical — rest of L51-L117 git/PR workflow block stays untouched (deferred).
trash.txt: expand from 4 to ~7 lines. Diff must contain phrases mentioning (a) reversibility (Trash is recoverable), (b) rm/del contrast (those are permanent and unrecoverable), and (c) when to use trash (any file/dir deletion in user-facing flows). PR diff verification is mechanical: grep the diff for these three categories.
todowrite.txt L1 AND L166: delete the demonstrate thoroughness to the user framing on L1, and delete the entire L166 line (When in doubt, use this tool. Being proactive with task management demonstrates attentiveness and ensures you complete all requirements successfully.).
apply_patch.txt: NOT audit clean. Add an explicit guardrail line near the top: Use `*** Delete File:` only when the user has explicitly asked to delete a specific file by path. For any heuristic or cleanup deletion (e.g. removing temp files, cleaning up after a refactor), use the trash tool, not this syntax. Phrased as user-intent ("explicitly asked" vs "heuristic") rather than git-state ("tracked" vs "ad-hoc") because apply_patch operates on filesystem paths regardless of git tracking, so models cannot reliably distinguish tracked from untracked. This is a best-effort prose-level guardrail; runtime dispatcher-level enforcement (rejecting *** Delete File: headers) is out of scope, for the same upstream-foundation reasoning as keeping apply_patch itself.
edit.txt: append a line-deletion hint as a new bullet at the end of the existing usage-notes list: To delete a line cleanly, the `oldString` must include the trailing newline character. Otherwise only the line content is removed and a stray blank line remains at the deleted position. This is a one-line teaching addition (not a rewrite); it fills the boundary-case gap that biases weaker models toward stray-blank-line bugs.
lsp.txt: audit clean. Note in PR body.
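The edit.txt hint in the list above can be illustrated with a small sketch. The replaceExact helper is a hypothetical stand-in for exact-replacement semantics, not the real edit tool; it only shows why omitting the trailing newline from oldString leaves a blank line behind.

```typescript
// Hypothetical stand-in for exact-replacement semantics; NOT the real edit tool.
function replaceExact(content: string, oldString: string, newString: string): string {
  const index = content.indexOf(oldString);
  if (index === -1) {
    throw new Error("oldString not found");
  }
  return content.slice(0, index) + newString + content.slice(index + oldString.length);
}

const original = "first\nsecond\nthird\n";

// Without the trailing newline, the line's text is removed but its newline stays,
// so an empty line remains where "second" used to be.
const withStrayBlank = replaceExact(original, "second", "");

// Including the trailing newline removes the whole line.
const cleanDeletion = replaceExact(original, "second\n", "");

console.log(JSON.stringify(withStrayBlank)); // "first\n\nthird\n"
console.log(JSON.stringify(cleanDeletion)); // "first\nthird\n"
```

The same check applies to the PR1 smoke step: a clean single-line deletion should produce a diff with exactly one removed line and no added blank line.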
Multiedit removal (9 paths total — 5 src + 1 spec + 3 test files):
Delete packages/opencode/src/tool/multiedit.ts.
Delete packages/opencode/src/tool/multiedit.txt.
Edit packages/opencode/src/permission/index.ts:297: remove "multiedit" from EDIT_TOOLS array.
Edit packages/opencode/src/config/agent.ts:111: remove || tool === "multiedit" from the union check.
Edit packages/opencode/src/config/config.ts:733: remove || tool === "multiedit" from the union check.
Edit packages/opencode/specs/effect-migration.md:269: remove the multiedit list entry (one-line removal).
Edit packages/opencode/test/agent/agent.test.ts:493: rename test from "legacy tools config maps write/edit/patch/multiedit to edit permission" to "legacy tools config maps write/edit/patch to edit permission" (test logic unchanged).
Edit packages/opencode/test/config/config.test.ts:2006-2030: delete the entire test("migrates legacy multiedit tool to edit permission", ...) block — multiedit-specific migration test, no longer applies after removal.
Edit packages/opencode/test/permission/next.test.ts:390-401: remove multiedit from the input array, remove the expect(result.has("multiedit")).toBe(true) assertion, and rename the test from "disabled - disables edit/write/apply_patch/multiedit when edit denied" to "disabled - disables edit/write/apply_patch when edit denied".
PR2 — Codesearch removal
Delete packages/opencode/src/tool/codesearch.txt.
Delete packages/opencode/src/tool/codesearch.ts.
In registry.ts:319-321, edit the OR condition tool.id === CodeSearchTool.id || tool.id === WebSearchTool.id to remove only the CodeSearchTool.id arm. Post-edit shape: if (tool.id === WebSearchTool.id) { return input.providerID === ProviderID.opencode || Flag.OPENCODE_ENABLE_EXA }. Websearch's providerID === ProviderID.opencode || Flag.OPENCODE_ENABLE_EXA gating is preserved.
Remove the CodeSearchTool import.
Edit packages/opencode/src/agent/agent.ts:185: remove the codesearch: "allow" permission entry from the default permission ruleset (otherwise the entry references a deleted tool).
Remove Flag.OPENCODE_ENABLE_EXA only after grepping every reference: packages/**, tests/**, docs/**, env-handling code, README/CHANGELOG, and any .env* template. If any non-codesearch reference exists, keep the flag.
Remove tests that exercise codesearch.
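The post-edit gate shape from the registry.ts step above can be sketched as follows. This is an assumption-laden illustration, not the real registry code: the string values for ProviderID.opencode and WebSearchTool.id, the GateInput type, the enableExa field (standing in for Flag.OPENCODE_ENABLE_EXA), the isVisible function, and the "other-provider" ID are all hypothetical.

```typescript
// Stand-ins for names quoted in the issue; the string values are assumptions.
const ProviderID = { opencode: "opencode" } as const;
const WebSearchTool = { id: "websearch" } as const;

// enableExa stands in for Flag.OPENCODE_ENABLE_EXA.
type GateInput = { providerID: string; enableExa: boolean };

// Post-edit shape: a single condition for websearch, with its gating unchanged.
function isVisible(toolID: string, input: GateInput): boolean {
  if (toolID === WebSearchTool.id) {
    return input.providerID === ProviderID.opencode || input.enableExa;
  }
  // Every other tool is outside this gate.
  return true;
}

// "other-provider" is a hypothetical non-opencode provider ID.
const hiddenByDefault = isVisible("websearch", { providerID: "other-provider", enableExa: false });
const shownOnOpencode = isVisible("websearch", { providerID: "opencode", enableExa: false });
const shownWithFlag = isVisible("websearch", { providerID: "other-provider", enableExa: true });
const ungatedTool = isVisible("read", { providerID: "other-provider", enableExa: false });
```

The three websearch cases correspond to what the PR2 diff inspection and smoke runs need to confirm: hidden without the provider or flag, visible with either.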
Out of scope
apply_patch removal: not in scope. apply_patch is a real OpenAI product optimization against GPT-5+ string-replace hallucination; the patch format with anchor lines is significantly more accurate. Removal would require dispatcher rewiring (registry.ts:323-327 toggles apply_patch ↔ edit/write as mutually exclusive when Env.get("OPENCODE_E2E_LLM_URL") is set OR modelID.includes("gpt-") and not oss/gpt-4; see the source for the full condition), which conflicts with the "stay close to upstream until 10K users" foundation commitment. Description cleanup of apply_patch.txt (the *** Delete File prose guardrail) IS in scope as part of PR1.
bash.txt ~64 lines git/PR workflow (after L86/L113 removal): still deferred. Not teaching wrong behavior; pure context-budget compression that needs validation infrastructure.
todowrite.txt 8-example demonstration block: still deferred for the same reason.
Smoke transcript upgrade: reviewer suggestion declined as not a priority; revisit when adopting an autoresearch-style harness improvement framework.
Preconditions
PR1 multiedit-removal preconditions (run before opening PR):
Run three searches, joined with ; rather than && (rg exits non-zero on no matches, so && would skip the later runs when an earlier one finds nothing): rg -nw 'multiedit' packages/ tests/ docs/ --glob '!docs/superpowers/**' ; rg -n '"multiedit"' packages/ tests/ docs/ --glob '!docs/superpowers/**' ; rg -n 'MultiEditTool' packages/ tests/ docs/ --glob '!docs/superpowers/**'. The three runs cover the word-boundary form, the quoted form, and the CamelCase class name; --glob '!docs/superpowers/**' excludes the local plan file, which itself contains multiedit references. List every reference; the 9 known paths (tool/multiedit.ts, tool/multiedit.txt, permission/index.ts:297, config/agent.ts:111, config/config.ts:733, specs/effect-migration.md:269, test/agent/agent.test.ts:493, test/config/config.test.ts:2006-2030, test/permission/next.test.ts:390-401) must all be removed/updated in this PR.
PR2 codesearch-removal preconditions (run before opening PR):
rg -n 'codesearch|CodeSearchTool' packages/ tests/ docs/ — list every reference; all must be removable.
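rg -n --hidden 'OPENCODE_ENABLE_EXA' packages/ tests/ docs/ ; rg -n --hidden --glob '.env*' 'OPENCODE_ENABLE_EXA' . (two runs: rg skips hidden files by default and treats a quoted '**/.env*' argument as a literal path rather than a glob, so the .env* templates need --hidden plus --glob). List every reference; if any non-codesearch reference exists, keep the flag.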
Confirm WebSearchTool.id retains its gating behavior post-edit (manual diff inspection of the OR condition).
Owner-local checklist (NOT in PR diffs; AGENTS.md is git-excluded)
These items are owner-only attestation, recorded here for completeness, and are NOT verifiable in PR diffs because AGENTS.md is excluded from git via .git/info/exclude:
AGENTS.md Engineering conventions section opens with the new principle: Improvements over upstream are allowed when there's a clear product reason. Strategic carveouts (UI rewrite, i18n zh+en only, tool descriptions) are permanent. One-off improvements (trash replacing rm, removing tools that don't fit PawWork) sit between fork and vendor — log in memory and review on each upstream sync, not as long-term divergence.
AGENTS.md long-term divergence list: existing tool/*.txt permanent carveout kept; add Tool registry: codesearch removed (for PR2) and Tool registry: multiedit removed (for PR1) entries when those PRs land.
New memory follow-up signal: trash-vs-rm prose-level guardrail is best-effort; if real sessions show models still routinely reaching for rm despite PR1, escalate to bash-dispatcher-level rm interception (separate issue, separate scope).
Acceptance criteria
PR1 (verifiable in PR diff)
task.txt: fictional agents block, Example usage framing, Usage 1 (Launch multiple agents concurrently), Usage 4 (outputs should generally be trusted) all removed. Remaining Usage notes renumbered with no orphan numbering.
task.txt L12 dangling above reference rewritten as specified.
task.txt Usage note containing agent description (currently L21) rewritten by content match to point at "the available agents" (schema-independent phrasing matching the PR1 scope description).
task.txt L3 carries no new schema anchor.
glob.txt L6: only second sentence removed; first sentence retained.
skill.txt L1 AND L5: both wrong pointers rewritten to listed below.
bash.txt L16: rm removed from quoting example; replaced with mkdir or python (NOT cat).
bash.txt L31 bullet list: new File deletion: Use the trash tool, NOT `rm`/`del` bullet added.
bash.txt L86 (NEVER use the TodoWrite or Task tools) AND L113 (DO NOT use the TodoWrite or Task tools) deleted; rest of git/PR workflow block untouched.
trash.txt expanded; diff contains phrases mentioning (a) reversibility, (b) rm/del contrast, (c) when to use.
todowrite.txt L1 (demonstrate thoroughness) AND L166 (When in doubt, use this tool ... demonstrates attentiveness) edits applied.
apply_patch.txt adds the *** Delete File: prose guardrail line near the top.
edit.txt appends the line-deletion hint as the last bullet, mentioning the trailing newline requirement.
PR body notes audit result for lsp.txt (clean), apply_patch.txt (not clean → guardrail added), and edit.txt (teaching gap → hint added).
packages/opencode/src/tool/multiedit.ts deleted.
packages/opencode/src/tool/multiedit.txt deleted.
packages/opencode/src/permission/index.ts:297: "multiedit" removed from EDIT_TOOLS array.
packages/opencode/specs/effect-migration.md:269 multiedit list entry removed.
packages/opencode/test/agent/agent.test.ts:493 test renamed to "legacy tools config maps write/edit/patch to edit permission" (multiedit removed from name only; test body unchanged).
packages/opencode/test/permission/next.test.ts test renamed (multiedit removed from name), multiedit removed from input array, expect(result.has("multiedit")).toBe(true) assertion removed.
Post-deletion three-grep set (rg -nw 'multiedit', rg -n '"multiedit"', rg -n 'MultiEditTool') all return 0 hits across packages/ tests/ docs/ --glob '!docs/superpowers/**'.
bun --cwd packages/opencode test PASSES (the three modified test files all stay green).
PR2 (verifiable in PR diff)
registry.ts:319-321 edited to remove only the CodeSearchTool.id arm; post-edit shape is if (tool.id === WebSearchTool.id) { return ... } (single condition, not OR).
Flag.OPENCODE_ENABLE_EXA removed only if no non-codesearch reference exists; otherwise flag preserved.
All codesearch tests removed.
Validation (lightweight, no fixture)
For each PR, one manual smoke run per model in a real PawWork session.
Models: Opus, Sonnet, GLM-5.1, Kimi K2.6, Qwen Coder. Verify each model ID against the current PawWork model registry before running smoke (typo guard).
Provider routing: at minimum one run on Zen-routed default provider AND one run on ProviderID.opencode (PR2 codesearch gate change must not regress the opencode-provider websearch path).
PR1 smoke: a task that uses task dispatch; a deletion (verify trash route, not rm, and verify the model does NOT use apply_patch's *** Delete File: syntax for ad-hoc deletion either); a multi-edit-style change confirms repeated edit calls work; no model attempts to call multiedit; an edit-tool single-line deletion (verify the diff has no stray blank line at the deleted position — validates the new edit.txt hint changes weaker-model behavior).
PR2 smoke: a programming question on Zen (no codesearch dependence) AND on ProviderID.opencode (websearch still gated correctly).
Any model fabricating subagent_type = "code-reviewer" or attempting to call multiedit/codesearch is a regression to fix before merge.
CI keyword gate: removed
The original CI keyword gate proposal is dropped. Heavy regex-grep on text files is the wrong shape for this kind of guardrail. Anti-regression is enforced by PR diff review plus the manual smoke list.
Risks
R1 (model regression) — substantially reduced from the original plan. Description cleanup removes content that was either lying (multiedit atomicity), pushing wrong tool selection (code-reviewer fictional agent, speculatively batch, demonstrate thoroughness), or contradicting project rules (NEVER TodoWrite during git). Models losing these hints should not regress.
R8 (multiedit deletion breaks callers) — split path: (a) success path — calling edit multiple times is functionally equivalent (slightly more verbose); (b) failure path — multiedit was never atomic in code despite the description claim, so removing it does not lose any safety property that actually existed at runtime. Note: GPT-5+ models on usePatch=true (registry.ts:323-327) already do not see edit/write (gated off when patch mode is active), so multiedit removal does not change their effective edit surface — they continue to use apply_patch as the single edit tool. Non-GPT models retain edit/write and lose only the multiedit sugar.
R8b (codesearch deletion breaks callers) — PawWork users on default Zen never reached codesearch; opencode-provider users on the upstream surface are unaffected because the deletion does not propagate upstream (carveout).
R9 (one-off dispatcher divergence vs "10K users foundation" rule) — Mitigation: AGENTS.md "Improvements over upstream are allowed when there's a clear product reason" principle codifies that one-off product-driven removal is acceptable; long-term divergence list logs codesearch and multiedit; on upstream sync we accept HEAD then re-remove rather than treat them as permanent carveouts.
R10 (websearch gate accidentally broken in PR2) — codesearch and websearch share the same OR condition gate at registry.ts:319-321. Mitigation: PR2 acceptance criterion explicitly requires WebSearchTool.id arm remain intact, verifiable in diff. PR2 smoke includes a websearch run on ProviderID.opencode to catch any regression.
R11 (apply_patch *** Delete File: is the real bypass path; bash rm is already hard-blocked): bash rm/rmdir/unlink/etc. are denied at the permission layer (see the "What do you do today?" note on agent.ts:99-110), so they cannot bypass trash. The actual gap is apply_patch: *** Delete File: syntax goes through the apply_patch permission, not the bash permission, so the rm *: deny rule does not apply. Note: PawWork's permission grammar is per-tool (e.g. bash: { "rm *": "deny" }), not per-payload-pattern; adding something like apply_patch: { "*** Delete File:*": "deny" } is NOT a simple permission-rule change. It would require either new grammar support or dispatcher-level interception of patch envelope contents. The PR1 prose guardrail in apply_patch.txt is best-effort instruction to the model, not runtime enforcement. This is an accepted trade-off because dispatcher-level rejection of *** Delete File: carries the same upstream-foundation cost as removing apply_patch outright. Escalate only if real sessions show models routinely using *** Delete File: to delete user files.
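The per-tool grammar point in R11 can be sketched with a toy evaluator. This is a simplified assumption, not PawWork's permission code: the Ruleset type, the toRegExp and evaluate helpers, the three-rule bash table, and the "undefined means no rule matched" convention are all hypothetical. It shows that a deny rule keyed under bash is consulted only for bash commands, so a deletion inside an apply_patch envelope never reaches it, and that a Windows command such as del matches no POSIX-style pattern.

```typescript
// Hypothetical per-tool ruleset in the shape the issue quotes (bash: { "rm *": "deny" }).
type Action = "allow" | "deny";
type Ruleset = Record<string, Record<string, Action>>;

const rules: Ruleset = {
  bash: { "rm *": "deny", "rmdir *": "deny", "unlink *": "deny" },
};

// Converts a "*" wildcard pattern into an anchored regular expression.
function toRegExp(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except "*"
    .replace(/\*/g, "[\\s\\S]*"); // "*" matches any run of characters, newlines included
  return new RegExp(`^${escaped}$`);
}

// Consults only the rules keyed under the calling tool.
// Returns undefined when no rule for that tool matches the input.
function evaluate(tool: string, input: string): Action | undefined {
  const toolRules = rules[tool] ?? {};
  for (const [pattern, action] of Object.entries(toolRules)) {
    if (toRegExp(pattern).test(input)) {
      return action;
    }
  }
  return undefined;
}

const patch = "*** Begin Patch\n*** Delete File: notes.txt\n*** End Patch";

const bashRm = evaluate("bash", "rm notes.txt"); // matched by "rm *"
const bashDel = evaluate("bash", "del notes.txt"); // no POSIX pattern covers del
const patchDelete = evaluate("apply_patch", patch); // bash rules are never consulted
```

Closing the apply_patch gap in this model would mean inspecting the patch payload itself, which is the grammar or dispatcher change R11 describes as out of scope.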
What task are you trying to do?
We want PawWork's base tool surface to be smaller, more honest, and harder for models to misuse. This issue covers three concurrent strands:
todowrite.txt): remove fictional examples, delegation hype, lying behavioral claims, content that teaches wrong tool boundaries, framing that pushes tool selection as performance rather than organization, and filling teaching gaps that bias weaker models on common tool boundaries (e.g. line-deletion semantics inedit.txt).rmas a normal example, and strengthen the trash-vs-rm boundary.The goal is moving PawWork toward Claude Code's tool count (~15) and away from Gemini CLI's (~23), without going as bare as Codex CLI's (~8) which is GPT-only and would break PawWork's open-to-all-models stance.
What do you do today?
task.txtembeds two fictional example agents (code-reviewer,greeting-responder) that do not exist in the PawWork subagent list. It explicitly tells models to "Launch multiple agents concurrently whenever possible" and that "The agent's outputs should generally be trusted" — both push delegation harder than PawWork's product direction warrants.glob.txttells models to "speculatively perform multiple searches as a batch".skill.txtsays skills are "listed in the system prompt" in two places (L1 and L5); the actual mechanism isregistry.ts:283(describeSkill) appending the list at the end of the skill tool description itself — both pointers are literally wrong.multiedit.txtclaims atomicity ("All edits must be valid for the operation to succeed - if any edit fails, none will be applied"), butmultiedit.ts:38-50is a plainforloop callingedit.execute()with each successful write landing immediately and no rollback. The description lies about runtime behavior; users get half-edited files when later edits fail.A separate finding from the spec audit:
multiedit is not currently registered in the registry.ts builtin tool list. Removal touches 9 paths in total: the tool's own multiedit.ts and multiedit.txt, plus 7 string-literal or test/spec references: packages/opencode/src/permission/index.ts:297 (EDIT_TOOLS = ["edit", "write", "apply_patch", "multiedit"]), packages/opencode/src/config/agent.ts:111 (legacy migration union check), packages/opencode/src/config/config.ts:733 (legacy migration union check), packages/opencode/specs/effect-migration.md:269 (one-line spec mention), packages/opencode/test/agent/agent.test.ts:493 (test name), packages/opencode/test/config/config.test.ts:2006-2030 (the entire "migrates legacy multiedit tool to edit permission" test block, a multiedit-specific migration test), and packages/opencode/test/permission/next.test.ts:390-401 (test name + array entry + assertion). So removing multiedit does NOT change the tool surface any model sees; it removes dead code, the lying description, three string-literal references in src, one spec mention, and three test references. Per user decision: PawWork has no legacy users and no legacy config in the wild, so the entire multiedit migration code path is removed (not preserved as a no-op).
codesearch.txt is gated to providerID === ProviderID.opencode || Flag.OPENCODE_ENABLE_EXA (registry.ts:319-321). The same OR condition also gates WebSearchTool.id, so the gate covers both tools. PawWork users on the default Zen provider never see codesearch, but websearch goes through this same gate and must be preserved when codesearch is removed. The codesearch description still pushes "Use this tool for ANY question or task related to programming"; the .txt + .ts + dispatcher branch sit in the codebase for no PawWork value.
bash.txt L16 uses rm "path with spaces/file.txt" as the quoting example, then never warns against rm anywhere. trash.txt (4 lines) tries to redirect deletion through trash, but bash.txt's reverse-push is louder and models keep reaching for rm then failing.
Important context for the "then failing" half: the patterns "rm *", "rmdir *", "unlink *", "find * -delete*", "sudo *", "dd *", "mkfs*", "chmod *", and "kill *" are all set to deny in PawWork's build-agent default permission ruleset at packages/opencode/src/agent/agent.ts:99-110, validated by packages/opencode/test/permission/pawwork-defaults.test.ts. So rm calls already hard-fail at the permission layer; this is an existing PawWork carveout over upstream, not something this issue introduces. Note: the deny ruleset covers POSIX deletion paths only; Windows deletion commands (del, erase, rd, Remove-Item) are NOT currently denied, which is tracked as a separate signal for follow-up after PR1 lands. The bash.txt + trash.txt edits in PR1 reduce the wasted first attempt that ends in a deny error and stop bash.txt from actively suggesting rm as the canonical example; they are NOT the layer doing the actual blocking.
bash.txt does not have a single "DO NOT" list section; the current text is prose plus a bullet list at L31 (Avoid using Bash with the find, grep, cat, head, tail, sed, awk, or echo commands...). The rm guardrail will be added by extending that L31 bullet list, not by inserting a new "DO NOT" section.
bash.txt L86 (NEVER use the TodoWrite or Task tools) and L113 (DO NOT use the TodoWrite or Task tools) instruct models against TodoWrite/Task tool use during git/PR workflows, which contradicts PawWork's project-level encouragement of TodoWrite for multi-step work. Both lines sit inside the deferred 66-line git/PR workflow block; the surgical deletion of these two lines is an explicit exception to the otherwise-deferred status of that block.
apply_patch.txt L12 and L26 contain *** Delete File: <path> syntax, which lets the model delete files directly through the patch envelope, bypassing the trash guardrail. The first audit pass missed this; apply_patch.txt is not clean and needs a guardrail line added (prose-level only; runtime dispatcher enforcement is out of scope, for the same upstream-foundation reasoning as keeping apply_patch itself).
todowrite.txt L1 frames the tool as helping the user "track progress, organize complex tasks, and demonstrate thoroughness to the user". L166 reads "When in doubt, use this tool. Being proactive with task management demonstrates attentiveness and ensures you complete all requirements successfully." Both push using the tool as performance, not organization.
edit.txt is currently 10 lines of exact-replacement semantics but does not teach the line-deletion boundary case. Weaker models routinely delete a single line by passing the line content as oldString and an empty newString but forget the trailing newline character, leaving a stray blank line at the deleted position. PawWork's default audience includes weaker models (GLM-5.1, Kimi K2.6, Qwen Coder); for them this teaching gap matters more than for stronger models that have absorbed the convention from training data. This is not a "teach wrong" failure but a "fail to teach" failure on a known stumbling block.
Deferred (still not in this issue):
bash.txt ~64 lines of remaining git/PR workflow (after the L86/L113 surgical removals).
todowrite.txt's 8-example demonstration block.
Both are "compress for context budget" calls that need validation infrastructure to land safely.
What would a good result look like?
multiedit and codesearch are removed (description + implementation + string-literal references + dispatcher gate). Existing descriptions stop teaching wrong behavior, lying about runtime semantics, or contradicting other PawWork project rules. The trash-vs-rm boundary becomes the obvious default rather than competing with the bash example. apply_patch's deletion syntax has a prose-level guardrail line. Effect on visible tool count for the default profile (Zen provider + non-GPT model + no LSP + no plan flag): zero net change at runtime, because codesearch is already gated off for default Zen users and multiedit was never registered. The cleanup matters by reducing dead code, lying descriptions, and incorrect pointers, not by altering what models see at runtime under default settings. Nothing requires new fixture infrastructure to land.
Which audience does this matter to most?
Both
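The edit.txt line-deletion gap described under "What do you do today?" can be illustrated with a minimal sketch. It assumes a plain first-match exact-replacement helper; applyEdit is a hypothetical stand-in written for this example and is not PawWork's edit tool.

```typescript
// Hypothetical stand-in for an exact-replacement edit (first match only).
// It is NOT the real edit tool; it only mirrors "replace oldString with newString".
function applyEdit(content: string, oldString: string, newString: string): string {
  const index = content.indexOf(oldString);
  if (index === -1) {
    throw new Error("oldString not found");
  }
  return content.slice(0, index) + newString + content.slice(index + oldString.length);
}

const file = "alpha\nbeta\ngamma\n";

// Without the trailing newline, the deleted line's own "\n" survives as a blank line.
const strayBlank = applyEdit(file, "beta", "");

// With the trailing newline included in oldString, the whole line is removed.
const clean = applyEdit(file, "beta\n", "");

console.log(JSON.stringify(strayBlank)); // "alpha\n\ngamma\n"
console.log(JSON.stringify(clean)); // "alpha\ngamma\n"
```

The only difference between the two calls is the trailing "\n" in oldString, which is the boundary case the proposed edit.txt hint spells out.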
Extra context — why this is a PawWork carveout, not an upstream feature request
opencode is shaped around dev-first defaults: its users treat commit/PR/gh CLI as everyday vocabulary, and apply_patch + heavy delegation hype + Exa code search are net-positive there. PawWork's audience distribution is wider (covers technical and non-technical users; default tasks span documents, spreadsheets, code, research, life admin) and the default-task distribution is different. The same description text produces inconsistent quality on the two surfaces; this is not "right vs wrong" but a difference in product-positioning defaults.
Permanent carveout for packages/opencode/src/tool/*.txt: during upstream sync, always accept HEAD on these files. The underlying runtime (tool.ts dispatcher logic) keeps following upstream via graft + squash.
Scope
Two PRs, each will land independently.
PR1 — Description cleanup + multiedit removal (combined)
Combined because multiedit is dead code (not registered) and its removal carries the same risk profile as the .txt edits. Touches 8 .txt files + 5 source-code locations.
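For context on why the multiedit description is called lying: the issue reports that the implementation is a plain loop in which each successful write lands immediately with no rollback. The sketch below contrasts that loop shape with the all-or-nothing behavior the description promised. It is an in-memory illustration under stated assumptions, not PawWork's code; Edit, FileStore, editOnce, applySequential, and applyAtomic are hypothetical names. The issue removes the tool rather than making it atomic, so applyAtomic is shown only for contrast.

```typescript
type Edit = { oldString: string; newString: string };
type FileStore = { content: string };

// Hypothetical single exact-replacement edit on an in-memory "file".
function editOnce(store: FileStore, edit: Edit): void {
  const index = store.content.indexOf(edit.oldString);
  if (index === -1) {
    throw new Error(`oldString not found: ${edit.oldString}`);
  }
  store.content =
    store.content.slice(0, index) +
    edit.newString +
    store.content.slice(index + edit.oldString.length);
}

// Loop-style application: every successful edit is written before the next one runs.
function applySequential(store: FileStore, edits: Edit[]): void {
  for (const edit of edits) {
    editOnce(store, edit);
  }
}

// All-or-nothing application: edit a draft copy, commit only if nothing threw.
function applyAtomic(store: FileStore, edits: Edit[]): void {
  const draft: FileStore = { content: store.content };
  for (const edit of edits) {
    editOnce(draft, edit);
  }
  store.content = draft.content;
}

const edits: Edit[] = [
  { oldString: "one", newString: "ONE" },
  { oldString: "missing", newString: "x" }, // this edit fails
];

const sequentialStore: FileStore = { content: "one two" };
const atomicStore: FileStore = { content: "one two" };

let sequentialFailed = false;
try {
  applySequential(sequentialStore, edits);
} catch {
  sequentialFailed = true;
}

let atomicFailed = false;
try {
  applyAtomic(atomicStore, edits);
} catch {
  atomicFailed = true;
}

console.log(sequentialFailed, JSON.stringify(sequentialStore.content)); // true "ONE two"
console.log(atomicFailed, JSON.stringify(atomicStore.content)); // true "one two"
```

Both runs fail on the second edit, but only the loop version leaves the first edit applied, which is the half-edited state the description claimed could not happen.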
Description edits (8 files):
task.txt: delete the fictional example agents block (code-reviewer, greeting-responder, <example_agent_descriptions> through end of file) plus the surrounding Example usage (NOTE: ...) framing. This explicitly includes the two <example> blocks at task.txt L30-49 and L51-57, which reference the fictional code-reviewer / greeting-responder agents; both must go. Delete Usage notes item 1 (Launch multiple agents concurrently whenever possible) and item 4 (outputs should generally be trusted); renumber the remaining items consecutively. Rewrite L12 "Other tasks that are not related to the agent descriptions above" → "Other tasks that are not related to the available agents". Rewrite the Usage note that contains the phrase "agent description" (currently L21, If the agent description mentions ...) by content match (not by post-renumber index) to point at "the available agents". (Both rewrites use schema-independent phrasing, "available agents", to avoid layout-dependent terms like "above" or "in this tool description".) No new schema anchor on L3.
glob.txt L6: delete only the second sentence (It is always better to speculatively perform multiple searches as a batch that are potentially useful.). Keep the first sentence.
skill.txt L1 AND L5: rewrite "listed in the system prompt" / "listed in your system prompt" → "listed below" in both lines. Literal-pointer fix; the list is appended at the end of the skill tool description by describeSkill, not in the system prompt.
bash.txt L16: replace the rm quoting example with a non-destructive command. Pick from bash.txt's own existing examples (mkdir, python); do NOT use cat because L31's avoid-list explicitly forbids cat via Bash, which would be self-contradictory.
bash.txt L31 bullet list: the existing sub-bullets redirect to dedicated tools (File search: Use Glob (NOT find or ls), Content search: Use Grep (NOT grep or rg), Read files: Use Read (NOT cat/head/tail), Edit files: Use Edit (NOT sed/awk), Write files: Use Write (NOT echo >/cat <<EOF)). Add a NEW sub-bullet at the same nesting level paralleling those: File deletion: Use the trash tool, NOT `rm`/`del` (rm permanently deletes; trash is reversible).
bash.txt L86 and L113: delete L86 (NEVER use the TodoWrite or Task tools) and L113 (DO NOT use the TodoWrite or Task tools). Both deletions are surgical; the rest of the L51-L117 git/PR workflow block stays untouched (deferred).
trash.txt: expand from 4 to ~7 lines. Diff must contain phrases mentioning (a) reversibility (Trash is recoverable), (b) rm/del contrast (those are permanent and unrecoverable), and (c) when to use trash (any file/dir deletion in user-facing flows). PR diff verification is mechanical: grep the diff for these three categories.
todowrite.txt L1 AND L166: delete the "demonstrate thoroughness to the user" framing on L1, and delete the entire L166 line (When in doubt, use this tool. Being proactive with task management demonstrates attentiveness and ensures you complete all requirements successfully.).
apply_patch.txt: NOT audit clean. Add an explicit guardrail line near the top: "Use `*** Delete File:` only when the user has explicitly asked to delete a specific file by path. For any heuristic or cleanup deletion (e.g. removing temp files, cleaning up after a refactor), use the trash tool, not this syntax." Phrased as user-intent ("explicitly asked" vs "heuristic") rather than git-state ("tracked" vs "ad-hoc") because apply_patch operates on filesystem paths regardless of git tracking, and models cannot reliably distinguish tracked from untracked. This is a best-effort prose-level guardrail; runtime dispatcher-level enforcement (rejecting *** Delete File: headers) is out of scope, for the same upstream-foundation reasoning as keeping apply_patch itself.
edit.txt: append a line-deletion hint as a new bullet at the end of the existing usage-notes list: "To delete a line cleanly, the `oldString` must include the trailing newline character. Otherwise only the line content is removed and a stray blank line remains at the deleted position." This is a short teaching addition (not a rewrite); it fills the boundary-case gap that biases weaker models toward stray-blank-line bugs.
lsp.txt: audit clean. Note in PR body.
Multiedit removal (9 paths total: 5 src + 1 spec + 3 test files):
Delete packages/opencode/src/tool/multiedit.ts.
Delete packages/opencode/src/tool/multiedit.txt.
packages/opencode/src/permission/index.ts:297: remove "multiedit" from the EDIT_TOOLS array.
packages/opencode/src/config/agent.ts:111: remove || tool === "multiedit" from the union check.
packages/opencode/src/config/config.ts:733: remove || tool === "multiedit" from the union check.
packages/opencode/specs/effect-migration.md:269: remove the multiedit list entry (one-line removal).
packages/opencode/test/agent/agent.test.ts:493: rename the test from "legacy tools config maps write/edit/patch/multiedit to edit permission" to "legacy tools config maps write/edit/patch to edit permission" (test logic unchanged).
packages/opencode/test/config/config.test.ts:2006-2030: delete the entire test("migrates legacy multiedit tool to edit permission", ...) block; it is a multiedit-specific migration test that no longer applies after removal.
packages/opencode/test/permission/next.test.ts:390-401: remove multiedit from the input array, remove the expect(result.has("multiedit")).toBe(true) assertion, and rename the test from "disabled - disables edit/write/apply_patch/multiedit when edit denied" to "disabled - disables edit/write/apply_patch when edit denied".
PR2 — Codesearch removal
Delete packages/opencode/src/tool/codesearch.txt.
Delete packages/opencode/src/tool/codesearch.ts.
In registry.ts:319-321, edit the OR condition tool.id === CodeSearchTool.id || tool.id === WebSearchTool.id to remove only the CodeSearchTool.id arm. Post-edit shape: if (tool.id === WebSearchTool.id) { return input.providerID === ProviderID.opencode || Flag.OPENCODE_ENABLE_EXA }. Websearch's providerID === ProviderID.opencode || Flag.OPENCODE_ENABLE_EXA gating is preserved.
Remove the CodeSearchTool import.
packages/opencode/src/agent/agent.ts:185: remove the codesearch: "allow" permission entry from the default permission ruleset (otherwise the entry references a deleted tool).
Remove Flag.OPENCODE_ENABLE_EXA only after grepping every reference: packages/**, tests/**, docs/**, env-handling code, README/CHANGELOG, and any .env* template. If any non-codesearch reference exists, keep the flag.
Out of scope
apply_patch removal: not in scope. It is a real OpenAI product optimization against GPT-5+ string-replace hallucination: the patch format with anchor lines is significantly more accurate. Removal would require dispatcher rewiring (registry.ts:323-327 toggles apply_patch ↔ edit/write as mutually exclusive when Env.get("OPENCODE_E2E_LLM_URL") is set OR modelID.includes("gpt-") and not oss/gpt-4; see registry.ts:323-327 for the full condition), conflicting with the "stay close to upstream until 10K users" foundation commitment. Description cleanup of apply_patch.txt (the *** Delete File prose guardrail) IS in scope as part of PR1.
bash.txt ~64 lines of git/PR workflow (after L86/L113 removal): still deferred. Not teaching wrong behavior; pure context-budget compression that needs validation infrastructure.
todowrite.txt 8-example demonstration block: still deferred for the same reason.
Preconditions
PR1 multiedit-removal preconditions (run before opening PR):
rg -nw 'multiedit' packages/ tests/ docs/ --glob '!docs/superpowers/**' ; rg -n '"multiedit"' packages/ tests/ docs/ --glob '!docs/superpowers/**' ; rg -n 'MultiEditTool' packages/ tests/ docs/ --glob '!docs/superpowers/**'
Three runs: word-boundary, quoted-form, AND CamelCase class name. The separator is ; not && because rg exits non-zero on no matches and && would skip subsequent runs when one finds nothing. --glob '!docs/superpowers/**' excludes the local plan file, which itself contains multiedit references. List every reference; the 9 known paths (tool/multiedit.ts, tool/multiedit.txt, permission/index.ts:297, config/agent.ts:111, config/config.ts:733, specs/effect-migration.md:269, test/agent/agent.test.ts:493, test/config/config.test.ts:2006-2030, test/permission/next.test.ts:390-401) must all be removed/updated in this PR.
PR2 codesearch-removal preconditions (run before opening PR):
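rg -n 'codesearch|CodeSearchTool' packages/ tests/ docs/
List every reference; all must be removable.
rg -n 'OPENCODE_ENABLE_EXA' packages/ tests/ docs/ ; rg -n --hidden --glob '**/.env*' 'OPENCODE_ENABLE_EXA' .
List every reference; if any non-codesearch reference exists, keep the flag. The .env* check is a separate run because rg takes globs through --glob rather than as a quoted path argument, and it skips dotfiles unless --hidden is given.
WebSearchTool.id retains its gating behavior post-edit (manual diff inspection of the OR condition).
Owner-local checklist (NOT in PR diffs; AGENTS.md is git-excluded)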
These items are owner-only attestation, recorded here for completeness, and are NOT verifiable in PR diffs because AGENTS.md is excluded from git via .git/info/exclude:
Engineering conventions section opens with the new principle: "Improvements over upstream are allowed when there's a clear product reason. Strategic carveouts (UI rewrite, i18n zh+en only, tool descriptions) are permanent. One-off improvements (trash replacing rm, removing tools that don't fit PawWork) sit between fork and vendor: log in memory and review on each upstream sync, not as long-term divergence."
tool/*.txt permanent carveout kept; add "Tool registry: codesearch removed" (for PR2) and "Tool registry: multiedit removed" (for PR1) entries when those PRs land.
project_tool_set_v1.md records the decisions (multiedit deleted / codesearch deleted / apply_patch unchanged / LSP via #232, "[Bug] TypeScript LSP can consume excessive CPU and memory on large workspaces") plus rationale.
If real sessions show models still reaching for rm despite PR1, escalate to bash-dispatcher-level rm interception (separate issue, separate scope).
Acceptance criteria
PR1 (verifiable in PR diff)
task.txt: fictional agents block, Example usage framing, Usage 1 (Launch multiple agents concurrently), Usage 4 (outputs should generally be trusted) all removed. Remaining Usage notes renumbered with no orphan numbering.
task.txt L12 dangling "above" reference rewritten as specified.
task.txt Usage note containing "agent description" (currently L21) rewritten by content match to point at "the available agents" (schema-independent phrasing matching the PR1 scope description).
task.txt L3 carries no new schema anchor.
glob.txt L6: only second sentence removed; first sentence retained.
skill.txt L1 AND L5: both wrong pointers rewritten to "listed below".
bash.txt L16: rm removed from quoting example; replaced with mkdir or python (NOT cat).
bash.txt L31 bullet list: new "File deletion: Use the trash tool, NOT `rm`/`del`" bullet added.
bash.txt L86 (NEVER use the TodoWrite or Task tools) AND L113 (DO NOT use the TodoWrite or Task tools) deleted; rest of git/PR workflow block untouched.
trash.txt expanded; diff contains phrases mentioning (a) reversibility, (b) rm/del contrast, (c) when to use.
todowrite.txt L1 (demonstrate thoroughness) AND L166 (When in doubt, use this tool ... demonstrates attentiveness) edits applied.
apply_patch.txt adds the *** Delete File: prose guardrail line near the top.
edit.txt appends the line-deletion hint as the last bullet, mentioning the trailing newline requirement.
PR body notes the audit results for lsp.txt (clean), apply_patch.txt (not clean → guardrail added), and edit.txt (teaching gap → hint added).
packages/opencode/src/tool/multiedit.ts deleted.
packages/opencode/src/tool/multiedit.txt deleted.
packages/opencode/src/permission/index.ts:297: "multiedit" removed from the EDIT_TOOLS array.
packages/opencode/src/config/agent.ts:111: || tool === "multiedit" removed.
packages/opencode/src/config/config.ts:733: || tool === "multiedit" removed.
packages/opencode/specs/effect-migration.md:269 multiedit list entry removed.
packages/opencode/test/agent/agent.test.ts:493 test renamed to "legacy tools config maps write/edit/patch to edit permission" (multiedit removed from name only; test body unchanged).
packages/opencode/test/config/config.test.ts test("migrates legacy multiedit tool to edit permission", ...) block deleted entirely.
packages/opencode/test/permission/next.test.ts test renamed (multiedit removed from name), multiedit removed from input array, expect(result.has("multiedit")).toBe(true) assertion removed.
The three rg runs (rg -nw 'multiedit', rg -n '"multiedit"', rg -n 'MultiEditTool') all return 0 hits across packages/ tests/ docs/ --glob '!docs/superpowers/**'.
bun --cwd packages/opencode test PASSES (the three modified test files all stay green).
PR2 (verifiable in PR diff)
packages/opencode/src/tool/codesearch.ts deleted.
packages/opencode/src/tool/codesearch.txt deleted.
registry.ts:319-321 edited to remove only the CodeSearchTool.id arm; post-edit shape is if (tool.id === WebSearchTool.id) { return ... } (single condition, not OR).
CodeSearchTool import removed.
packages/opencode/src/agent/agent.ts:185 codesearch: "allow" permission entry removed.
Flag.OPENCODE_ENABLE_EXA removed only if no non-codesearch reference exists; otherwise flag preserved.
Validation (lightweight, no fixture)
For each PR, one manual smoke run per model in a real PawWork session.
ProviderID.opencode (PR2 codesearch gate change must not regress the opencode-provider websearch path).
PR1 smoke: a task dispatch; a deletion (verify trash route, not rm, and verify the model does NOT use apply_patch's *** Delete File: syntax for ad-hoc deletion either); a multi-edit-style change confirms repeated edit calls work; no model attempts to call multiedit; an edit-tool single-line deletion (verify the diff has no stray blank line at the deleted position, which validates that the new edit.txt hint changes weaker-model behavior).
PR2 smoke: a websearch run on ProviderID.opencode (websearch still gated correctly).
A model passing subagent_type = "code-reviewer" or attempting to call multiedit / codesearch is a regression to fix before merge.
CI keyword gate: removed
The original CI keyword gate proposal is dropped. Heavy regex-grep on text files is the wrong shape for this kind of guardrail. Anti-regression is enforced by PR diff review plus the manual smoke list.
Risks
code-reviewer fictional agent, speculatively batch, demonstrate thoroughness), or contradicting project rules (NEVER TodoWrite during git). Models losing these hints should not regress.
edit multiple times is functionally equivalent (slightly more verbose); (b) failure path: multiedit was never atomic in code despite the description claim, so removing it does not lose any safety property that actually existed at runtime. Note: GPT-5+ models on usePatch=true (registry.ts:323-327) already do not see edit/write (gated off when patch mode is active), so multiedit removal does not change their effective edit surface; they continue to use apply_patch as the single edit tool. Non-GPT models retain edit/write and lose only the multiedit sugar.
registry.ts:319-321. Mitigation: the PR2 acceptance criterion explicitly requires that the WebSearchTool.id arm remain intact, verifiable in diff. PR2 smoke includes a websearch run on ProviderID.opencode to catch any regression.
*** Delete File: is the real bypass path; bash rm is already hard-blocked): bash rm/rmdir/unlink/etc. are denied at the permission layer (see the "What do you do today?" note on agent.ts:99-110), so they cannot bypass trash. The actual gap is apply_patch: *** Delete File: syntax goes through the apply_patch permission, not the bash permission, so the rm *: deny rule does not apply. Note: PawWork's permission grammar is per-tool (e.g. bash: { "rm *": "deny" }), not per-payload-pattern; adding something like apply_patch: { "*** Delete File:*": "deny" } is NOT a simple permission-rule change, because it would require either new grammar support or dispatcher-level interception of patch envelope contents. The PR1 prose guardrail in apply_patch.txt is best-effort instruction to the model, not runtime enforcement. Accepted trade-off because dispatcher-level rejection of *** Delete File: carries the same upstream-foundation cost as removing apply_patch outright. Escalate only if real sessions show models routinely using *** Delete File: to delete user files.
Precedent
trash tool replacing rm is the prior one-off product improvement over upstream cited in the new "Improvements over upstream" principle.
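The apply_patch bypass named under Risks follows from the per-tool shape of the permission grammar. The sketch below is a simplified model written for this issue, not PawWork's permission code: Ruleset, matchesPattern, and evaluate are hypothetical, and the wildcard matching (where "*" matches any run of characters) is an assumption. It returns undefined when no rule matches rather than guessing at a default action.

```typescript
type Action = "allow" | "deny";
// Per-tool rules: tool name -> (pattern -> action). Shape is assumed for illustration.
type Ruleset = Record<string, Record<string, Action>>;

// Hypothetical wildcard matcher: "*" matches any run of characters, including none.
function matchesPattern(pattern: string, input: string): boolean {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except "*"
    .replace(/\*/g, "[\\s\\S]*");
  return new RegExp(`^${escaped}$`).test(input);
}

// Look up rules for ONE tool only; rules under other tools are never consulted.
function evaluate(rules: Ruleset, tool: string, input: string): Action | undefined {
  const toolRules = rules[tool] ?? {};
  for (const [pattern, action] of Object.entries(toolRules)) {
    if (matchesPattern(pattern, input)) {
      return action;
    }
  }
  return undefined; // no matching rule for this tool
}

const rules: Ruleset = { bash: { "rm *": "deny" } };

const bashRm = evaluate(rules, "bash", "rm -rf build");
const patchDelete = evaluate(rules, "apply_patch", "*** Delete File: notes.txt");
const bashDel = evaluate(rules, "bash", "del notes.txt");

console.log(bashRm); // "deny"
console.log(patchDelete); // undefined
console.log(bashDel); // undefined
```

Under these assumptions the bash rule never sees the patch envelope, and a POSIX-only pattern does not match a Windows command such as del, which mirrors the two gaps the issue records.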