What task are you trying to do?
A non-technical user wants to hand a .docx (or .xlsx, .pptx, PDF, image, plain text) to the PawWork agent so the agent can read it, summarize it, or edit it. The user's natural motion is "click the attach button and pick the file" — the same mental model they have from ChatGPT, Codex App, email, and nearly every chat interface they use daily. They do not distinguish between "uploading" a file and "referencing" a file from a project folder; to them there is one action: give this file to the assistant.
What do you do today?
On 2026-04-21 a real non-technical user tried exactly this in PawWork. They clicked the attach button and picked a .docx. The file was rejected with a toast that says "Only images, PDFs, or text files can be attached here." The user read this as "PawWork does not support Word documents" and gave up. We had to step in and verbally explain an undocumented workaround: put the file inside the project directory, then type @filename to reference it, at which point the agent can handle it through the document-processing skill (backed by the bundled officecli at packages/desktop-electron/resources/tools/officecli). The product has the capability; the UI hides it behind a second mechanism the user never learned.
A second latent gap is visible in the same code path. For non-multimodal models — text-only models in the catalog, identifiable via modalities.input === ["text"] in the model metadata — the attach handler does not check the active model's capability (see packages/app/src/components/prompt-input/attachments.ts:48-71). Images and PDFs sail through as base64 ImageAttachmentParts, get sent to a model that cannot read them (packages/app/src/components/prompt-input/build-request-parts.ts:185), and the model either errors or silently ignores the attachment. The user sees a non-sequitur reply and has no idea the image never reached the model. The model tooltip already surfaces capabilities.input per model (packages/app/src/components/model-tooltip.tsx:17), so the data needed to gate this at the UI already exists; it is simply not consulted at attach time.
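A minimal sketch of how the attach handler could consult the active model before accepting an image. The InputModality union mirrors the values listed for capabilities.input; the function names and the choice to treat a missing list as "unknown" are assumptions for illustration, not existing PawWork code. A literal `=== ["text"]` comparison is always false in TypeScript because arrays compare by reference, so the check below inspects the contents instead.

```typescript
// Hypothetical types and helpers; names are illustrative, not PawWork's actual API.
type InputModality = "text" | "image" | "audio" | "video" | "pdf";

// True when the model declares text as its only input modality.
// An undefined or empty list is treated as "unknown", not as text-only.
function isTextOnly(inputs: readonly InputModality[] | undefined): boolean {
  if (!inputs || inputs.length === 0) return false;
  return inputs.every((m) => m === "text");
}

// True only when the model explicitly declares image input.
function acceptsImages(inputs: readonly InputModality[] | undefined): boolean {
  return inputs?.includes("image") ?? false;
}
```

Keeping "unknown" distinct from "text-only" matters if some providers do not populate the capability list: the UI can then decide separately whether unknown models should be gated.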
What would a good result look like?
One attach action that just works, the way Codex App works. The user drops a file or clicks attach; PawWork routes the file internally based on its type and the active model's input modalities. The user does not have to learn that "upload" and "@" are two different mechanisms. The @ affordance stays for power users, for referencing files already in the workspace, and for path completion; the attach button becomes the friendly high-level entry point that reuses the same underlying mechanisms.
Proposed routing matrix:
| File type | Active model supports image input | Behavior |
| --- | --- | --- |
| Image (png/jpg/webp) | Yes | Embed as vision attachment (current behavior) |
| Image | No | Hard reject with a clear, actionable message naming a concrete alternative: "This model cannot read images. Switch to a multimodal model (for example Claude Sonnet 4.6 or GPT-4o) to attach images." Include a one-click shortcut to the model picker. |
| PDF | Yes | Embed as vision attachment (current behavior) |
| PDF | No | Save the file to the workspace, auto-insert an @ mention into the prompt, let the agent extract text via the document-processing skill |
| .docx / .xlsx / .pptx | Any | Save to the workspace, auto-insert @, the skill plus officecli handles it |
| Text (md/txt/csv/json) | Any | Save to the workspace, auto-insert @ |
| Anything else | Any | Save to the workspace, auto-insert @, let the agent decide whether it can read it |
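The matrix can be read as a single pure function. The sketch below is one possible shape, under stated assumptions: it keys on the file extension for brevity (the current gate works on MIME types), treats jpeg as an alias of jpg, and uses hypothetical names (Route, routeAttachment) that do not exist in the codebase. The reject copy is shortened from the proposed message.

```typescript
// Hypothetical sketch of the proposed routing matrix; not PawWork's actual code.
type Route =
  | { kind: "embed" } // send to the model as a vision attachment
  | { kind: "reject"; message: string } // hard reject with actionable copy
  | { kind: "workspace" }; // save to the workspace and auto-insert an @ mention

// png/jpg/webp come from the matrix; "jpeg" is an assumed alias.
const IMAGE_EXTENSIONS = new Set(["png", "jpg", "jpeg", "webp"]);

// Lowercased extension without the dot, or "" when there is none.
function extensionOf(filename: string): string {
  const dot = filename.lastIndexOf(".");
  return dot === -1 ? "" : filename.slice(dot + 1).toLowerCase();
}

function routeAttachment(filename: string, modelAcceptsImages: boolean): Route {
  const ext = extensionOf(filename);

  if (IMAGE_EXTENSIONS.has(ext)) {
    return modelAcceptsImages
      ? { kind: "embed" }
      : {
          kind: "reject",
          message:
            "This model cannot read images. Switch to a multimodal model to attach images.",
        };
  }

  if (ext === "pdf") {
    // Text-only models fall back to the workspace path so the agent
    // can extract text through the document-processing skill.
    return modelAcceptsImages ? { kind: "embed" } : { kind: "workspace" };
  }

  // Office documents, text files, and anything else share one path.
  return { kind: "workspace" };
}
```

Under these assumptions, `routeAttachment("report.docx", false)` yields the workspace route regardless of the model, which is the case that failed for the user on 2026-04-21.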
Which audience does this matter to most?
Both.
Extra context
Why the current "upload" button mislabels itself: PawWork's underlying agent is a CLI-native fork of opencode, and CLI agents (Claude Code, Codex CLI) intentionally do not have an "upload" button at all — they only have @ path references and image paste. PawWork's Electron shell inherited an "attach file" button from the ChatGPT-style chat UX, but kept the CLI-native semantics underneath, so the button does something much narrower than its label suggests. The mismatch is what trips non-technical users. We keep the button but make it do what users already expect it to do.
Reference products:
- Claude Code and Codex CLI have no upload button; @ plus image paste is the only input paradigm. Works well for engineers, invisible to non-technical users.
- Codex App and ChatGPT have one attach entry with a backend extraction pipeline: images go through vision; .docx / .xlsx / .pptx go through document extraction or Advanced Data Analysis. This is the model PawWork should match, because PawWork targets the same audience (non-technical knowledge workers), not CLI engineers.
Current code pointers:
- Attach button: packages/app/src/components/prompt-input.tsx:1439
- Attach handler and MIME gate: packages/app/src/components/prompt-input/attachments.ts:48-85
- MIME whitelist: packages/app/src/components/prompt-input/files.ts:5-66
- @ mention picker: packages/app/src/components/prompt-input.tsx:589-618
- Embed-vs-reference fork: packages/app/src/components/prompt-input/build-request-parts.ts:100 (file reference) vs :185 (image embed)
- Model capability source already wired: packages/app/src/components/model-tooltip.tsx:17 (capabilities.input with text | image | audio | video | pdf)
- Bundled document tool: packages/desktop-electron/resources/tools/officecli (downloaded by packages/desktop-electron/scripts/download-tools.ts:33)
- Document skill routing: skills/document-processing/SKILL.md (routes docx/xlsx/pptx to officecli)
Out of scope for this issue:
- Adding .docx / .xlsx / .pptx parsing capability. Already works via bundled officecli + the document-processing skill; this is a UX discoverability issue, not a capability gap.
- Removing the @ mechanism. It stays as the explicit low-level interface.
- Redesigning the file picker UI beyond the routing change described above.
Implementation considerations left open for the implementing PR:
- Where to place files in the workspace: cwd root versus a dedicated folder such as .pawwork-attachments/. The latter is cleaner but introduces a path the agent has to know about.
- Filename collision handling when report.docx is uploaded twice: overwrite, suffix with timestamp, or prompt the user.
- Timing of the auto-@ insertion: at drop time so the user sees it immediately, or only at send time.
- Confirmation that capabilities.input is populated consistently across every provider and every model we ship, otherwise the non-multimodal gate will misfire.
- UI language for the file chip that unifies "will be sent to the model as vision" and "added to your workspace and referenced" in a single visual pattern without forcing the user to understand the distinction.
- Exact copy and UX of the non-multimodal-plus-image reject, including whether the model switcher shortcut is a button or a link.
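To make the collision question concrete, here is a hedged sketch of the "suffix with timestamp" option only; overwrite and prompt-the-user are not shown. The function name resolveCollision, the UTC stamp format, and the idea of passing the set of existing names in as an argument are all assumptions for illustration.

```typescript
// Hypothetical helper; illustrates the timestamp-suffix option only.
function resolveCollision(
  filename: string,
  existing: ReadonlySet<string>,
  now: Date,
): string {
  // No collision: keep the name the user chose.
  if (!existing.has(filename)) return filename;

  // Split "report.docx" into "report" and ".docx"; dotfiles keep their full name as the stem.
  const dot = filename.lastIndexOf(".");
  const stem = dot > 0 ? filename.slice(0, dot) : filename;
  const ext = dot > 0 ? filename.slice(dot) : "";

  // UTC stamp in the form YYYYMMDDTHHMMSS, derived from the ISO string.
  const stamp = now.toISOString().replace(/[-:]/g, "").slice(0, 15);

  // Note: two uploads within the same second would still collide; not handled here.
  return `${stem}-${stamp}${ext}`;
}
```

Taking the clock and the directory listing as parameters keeps the function pure, so whichever policy the implementing PR picks can be unit-tested without touching the file system.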
Follow-up ideas for separate issues:
- Inline preview of attached .docx / .xlsx / .pptx content in the prompt chip.
- Proactive suggestion to switch model when the user drops a file the active model cannot consume, instead of hard-rejecting.