[Feature] Unified file attach that routes to vision, workspace, or reject based on model capability

### What task are you trying to do?

A non-technical user wants to hand a `.docx` (or `.xlsx`, `.pptx`, PDF, image, plain text) to the PawWork agent so the agent can read it, summarize it, or edit it. The user's natural motion is "click the attach button and pick the file" — the same mental model they have from ChatGPT, Codex App, email, and nearly every chat interface they use daily. They do not distinguish between "uploading" a file and "referencing" a file from a project folder; to them there is one action: give this file to the assistant.

### What do you do today?

On 2026-04-21 a real non-technical user tried exactly this in PawWork. They clicked the attach button and picked a `.docx`. The file was rejected with a toast that says "Only images, PDFs, or text files can be attached here." The user read this as "PawWork does not support Word documents" and gave up. We had to step in and verbally explain an undocumented workaround: put the file inside the project directory, then type `@filename` to reference it, at which point the agent can handle it through the `document-processing` skill (backed by the bundled `officecli` at `packages/desktop-electron/resources/tools/officecli`). The product has the capability; the UI hides it behind a second mechanism the user never learned.

A second latent gap is visible in the same code path. The attach handler never checks the active model's capability (see `packages/app/src/components/prompt-input/attachments.ts:48-71`). With a non-multimodal model, meaning a text-only model in the catalog whose metadata lists `modalities.input` as `["text"]`, images and PDFs still sail through as base64 `ImageAttachmentPart`s and are sent to a model that cannot read them (`packages/app/src/components/prompt-input/build-request-parts.ts:185`); the model either errors or silently ignores the attachment. The user sees a non-sequitur reply and has no idea the image never reached the model. The model tooltip already surfaces `capabilities.input` per model (`packages/app/src/components/model-tooltip.tsx:17`), so the data needed to gate this at the UI already exists; it is simply not consulted at attach time.
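As a rough illustration of the missing gate, the check could look like the sketch below. The `ModelInfo` shape, the `supportsInput` helper, and the example models are hypothetical names made up for this issue; only the `capabilities.input` field and its value set come from the model tooltip code referenced above. The helper returns `undefined` when the metadata is absent so the caller can pick a fallback rather than guess.

```typescript
// Hypothetical shapes for illustration; the real PawWork model metadata may differ.
type InputModality = "text" | "image" | "audio" | "video" | "pdf";

interface ModelInfo {
  capabilities?: { input?: InputModality[] };
}

// Returns true/false when the metadata answers the question, and
// undefined when `capabilities.input` is missing, so the caller can
// choose a fallback instead of silently guessing.
function supportsInput(
  model: ModelInfo,
  modality: InputModality,
): boolean | undefined {
  const input = model.capabilities?.input;
  if (input === undefined) return undefined;
  return input.indexOf(modality) !== -1;
}

// Example models (made up for this sketch).
const textOnly: ModelInfo = { capabilities: { input: ["text"] } };
const multimodal: ModelInfo = { capabilities: { input: ["text", "image", "pdf"] } };
const unknownModel: ModelInfo = {};
```

Here `supportsInput(textOnly, "image")` is `false`, which is the case the attach handler currently lets through.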

### What would a good result look like?

One attach action that just works, the way Codex App works. The user drops a file or clicks attach; PawWork routes the file internally based on its type and the active model's input modalities. The user does not have to learn that "upload" and "@" are two different mechanisms. The `@` affordance stays for power users, for referencing files already in the workspace, and for path completion; the attach button becomes the friendly high-level entry point that reuses the same underlying mechanisms.

Proposed routing matrix:

| File type | Active model supports image input | Behavior |
| --- | --- | --- |
| Image (png/jpg/webp) | Yes | Embed as vision attachment (current behavior) |
| Image | No | **Hard reject with a clear, actionable message** naming a concrete alternative: "This model cannot read images. Switch to a multimodal model (for example Claude Sonnet 4.6 or GPT-4o) to attach images." Include a one-click shortcut to the model picker. |
| PDF | Yes | Embed as vision attachment (current behavior) |
| PDF | No | Save the file to the workspace, auto-insert an `@` mention into the prompt, let the agent extract text via the `document-processing` skill |
| `.docx` / `.xlsx` / `.pptx` | Any | Save to the workspace, auto-insert `@`, the skill plus `officecli` handles it |
| Text (md/txt/csv/json) | Any | Save to the workspace, auto-insert `@` |
| Anything else | Any | Save to the workspace, auto-insert `@`, let the agent decide whether it can read it |
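The matrix above can be sketched as a single pure function. This is a minimal sketch, not the proposed implementation: `classify`, `routeAttachment`, the `Route` type, and the extension lists are hypothetical names, classification by file extension is an assumption (the current code gates on MIME type), and `jpeg` is an assumed alias that the table does not list.

```typescript
type FileKind = "image" | "pdf" | "office" | "text" | "other";

type Route =
  | { action: "embed" }      // send inline as a vision attachment
  | { action: "workspace" }  // save to the workspace and auto-insert an @ mention
  | { action: "reject"; message: string };

// Extension lists follow the matrix; "jpeg" is an assumed alias for "jpg".
const IMAGE_EXTENSIONS = ["png", "jpg", "jpeg", "webp"];
const OFFICE_EXTENSIONS = ["docx", "xlsx", "pptx"];
const TEXT_EXTENSIONS = ["md", "txt", "csv", "json"];

const REJECT_MESSAGE =
  "This model cannot read images. Switch to a multimodal model " +
  "(for example Claude Sonnet 4.6 or GPT-4o) to attach images.";

function classify(filename: string): FileKind {
  const dot = filename.lastIndexOf(".");
  const ext = dot === -1 ? "" : filename.slice(dot + 1).toLowerCase();
  if (IMAGE_EXTENSIONS.indexOf(ext) !== -1) return "image";
  if (ext === "pdf") return "pdf";
  if (OFFICE_EXTENSIONS.indexOf(ext) !== -1) return "office";
  if (TEXT_EXTENSIONS.indexOf(ext) !== -1) return "text";
  return "other";
}

function routeAttachment(filename: string, modelReadsImages: boolean): Route {
  switch (classify(filename)) {
    case "image":
      return modelReadsImages
        ? { action: "embed" }
        : { action: "reject", message: REJECT_MESSAGE };
    case "pdf":
      return modelReadsImages ? { action: "embed" } : { action: "workspace" };
    default:
      // Office documents, text files, and anything else go to the workspace.
      return { action: "workspace" };
  }
}
```

Keeping the decision in one function with no I/O would make each row of the matrix a one-line unit test, independent of the file picker and the request builder.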

### Which audience does this matter to most?

Both.

### Extra context

**Why the current attach button is misleading**: PawWork's underlying agent is a CLI-native fork of opencode, and CLI agents (Claude Code, Codex CLI) intentionally have no "upload" button at all; they offer only `@` path references and image paste. PawWork's Electron shell inherited an "attach file" button from the ChatGPT-style chat UX but kept the CLI-native semantics underneath, so the button does something much narrower than its label suggests. That mismatch is what trips up non-technical users. The proposal keeps the button and makes it do what users already expect it to do.

**Reference products**:
- Claude Code and Codex CLI have no upload button; `@` plus image paste is the only input paradigm. Works well for engineers, invisible to non-technical users.
- Codex App and ChatGPT have one attach entry with a backend extraction pipeline: images go through vision; `.docx` / `.xlsx` / `.pptx` go through document extraction or Advanced Data Analysis. This is the model PawWork should match, because PawWork targets the same audience (non-technical knowledge workers), not CLI engineers.

**Current code pointers**:
- Attach button: `packages/app/src/components/prompt-input.tsx:1439`
- Attach handler and MIME gate: `packages/app/src/components/prompt-input/attachments.ts:48-85`
- MIME whitelist: `packages/app/src/components/prompt-input/files.ts:5-66`
- `@` mention picker: `packages/app/src/components/prompt-input.tsx:589-618`
- Embed-vs-reference fork: `packages/app/src/components/prompt-input/build-request-parts.ts:100` (file reference) vs `:185` (image embed)
- Model capability source already wired: `packages/app/src/components/model-tooltip.tsx:17` (`capabilities.input` with `text | image | audio | video | pdf`)
- Bundled document tool: `packages/desktop-electron/resources/tools/officecli` (downloaded by `packages/desktop-electron/scripts/download-tools.ts:33`)
- Document skill routing: `skills/document-processing/SKILL.md` (routes docx/xlsx/pptx to officecli)

**Out of scope for this issue**:
- Adding `.docx` / `.xlsx` / `.pptx` parsing capability. This already works via the bundled `officecli` and the `document-processing` skill; the problem here is UX discoverability, not a capability gap.
- Removing the `@` mechanism. It stays as the explicit low-level interface.
- Redesigning the file picker UI beyond the routing change described above.

**Implementation considerations left open for the implementing PR**:
1. Where to place files in the workspace — `cwd` root versus a dedicated folder such as `.pawwork-attachments/`. The latter is cleaner but introduces a path the agent has to know about.
2. Filename collision handling when `report.docx` is uploaded twice — overwrite, suffix with timestamp, or prompt the user.
3. Timing of the auto-`@` insertion — at drop time so the user sees it immediately, or only at send time.
4. Confirmation that `capabilities.input` is populated consistently across every provider and every model we ship, otherwise the non-multimodal gate will misfire.
5. UI language for the file chip that unifies "will be sent to the model as vision" and "added to your workspace and referenced" in a single visual pattern without forcing the user to understand the distinction.
6. Exact copy and UX of the non-multimodal-plus-image reject, including whether the model switcher shortcut is a button or a link.
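To make items 2 and 3 concrete, here is a sketch of one of the listed options: keep the original filename when it is free, otherwise suffix it with a timestamp, then append the `@` mention to the prompt text. All function names are hypothetical, and the mention format (`@` followed by a relative path) is an assumption about how the existing picker writes mentions. Two uploads of the same name within the same second would still collide, so a real implementation would need a further tiebreaker.

```typescript
// Insert a UTC timestamp before the extension:
// report.docx -> report-20260421T101500.docx
function withTimestamp(name: string, now: Date): string {
  // "2026-04-21T10:15:00.000Z" -> "20260421T101500" (second precision)
  const stamp = now.toISOString().replace(/[-:]/g, "").slice(0, 15);
  const dot = name.lastIndexOf(".");
  if (dot <= 0) return name + "-" + stamp; // no extension, or a dotfile
  return name.slice(0, dot) + "-" + stamp + name.slice(dot);
}

// Keep the original name when it is free; otherwise suffix it.
function resolveName(name: string, existing: string[], now: Date): string {
  return existing.indexOf(name) === -1 ? name : withTimestamp(name, now);
}

// Append an `@` mention to the prompt text, separated by a single space.
function appendMention(prompt: string, relativePath: string): string {
  const mention = "@" + relativePath;
  if (prompt.length === 0) return mention;
  return prompt.replace(/\s+$/, "") + " " + mention;
}
```

Passing `now` in as a parameter keeps the naming logic deterministic under test, whichever collision policy the implementing PR ends up choosing.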

**Follow-up ideas for separate issues**:
- Inline preview of attached `.docx` / `.xlsx` / `.pptx` content in the prompt chip.
- Proactive suggestion to switch model when the user drops a file the active model cannot consume, instead of hard-rejecting.
