fix: detect legacy Office binary formats as non-text attachments#54234
fix: detect legacy Office binary formats as non-text attachments#54234kagura-agent wants to merge 2 commits into
Conversation
Greptile SummaryThis PR fixes a bug where
Confidence Score: 5/5
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/media-understanding/apply.ts
Line: 306-312
Comment:
**No test added for the new MIME types**
The existing suite has a dedicated test at line 1173 verifying that `application/vnd.*` Office attachments are skipped as binary. It would be consistent to add a parallel test for `application/msword` (and optionally `application/x-cfb`) so regressions on these cases are caught automatically, especially given that this is the exact scenario that caused the bug in #54176.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix: detect legacy Office binary formats..." | Re-trigger Greptile |
| // Legacy Microsoft Office binary formats (OLE/CFB container) | ||
| if ( | ||
| mime === "application/msword" || | ||
| mime === "application/x-cfb" | ||
| ) { | ||
| return true; | ||
| } |
There was a problem hiding this comment.
No test added for the new MIME types
The existing suite has a dedicated test at line 1173 verifying that application/vnd.* Office attachments are skipped as binary. It would be consistent to add a parallel test for application/msword (and optionally application/x-cfb) so regressions on these cases are caught automatically, especially given that this is the exact scenario that caused the bug in #54176.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/media-understanding/apply.ts
Line: 306-312
Comment:
**No test added for the new MIME types**
The existing suite has a dedicated test at line 1173 verifying that `application/vnd.*` Office attachments are skipped as binary. It would be consistent to add a parallel test for `application/msword` (and optionally `application/x-cfb`) so regressions on these cases are caught automatically, especially given that this is the exact scenario that caused the bug in #54176.
How can I resolve this? If you propose a fix, please make it concise.|
Friendly ping — any feedback on this PR? Happy to address concerns or adjust the approach. Will close in 7 days if no response. 🙏 |
|
Friendly ping — this has been open for a week. Let me know if any changes are needed! |
Add application/msword and application/x-cfb to isBinaryMediaMime() so that .doc files (Microsoft Word 97-2003 format) are not auto-embedded as text content in prompts. The application/x-cfb MIME type is what file-type libraries detect for OLE Compound Binary Format files (legacy Office), and application/msword is the standard MIME for .doc files. Neither was previously caught by the binary detection logic, causing 70KB+ of binary garbage to be embedded in conversation history and triggering LLM 'high risk' rejections. Fixes openclaw#54176
…ypes Address CI format check failure and Greptile review feedback: - Run oxfmt on apply.ts to fix formatting - Add test for application/msword and application/x-cfb detection
55cbf1d to
254444b
Compare
|
Rebased on latest main — no conflicts. Friendly ping: would love any feedback or review when you get a chance 🙏 |
|
Closing — this has been open for ~3 weeks without review. The fix is still valid; happy to reopen or resubmit if there's interest. Thanks! |
Summary
Fixes #54176 — .doc files (application/x-cfb, application/msword) are auto-embedded as text, causing 70KB of binary garbage in prompts and LLM 'high risk' rejections.
Root Cause
isBinaryMediaMime()insrc/media-understanding/apply.tscatchesapplication/vnd.*(which covers modern Office formats like .docx/.xlsx) and common archives, but misses two legacy Microsoft Office MIME types:application/msword— the standard MIME for .doc filesapplication/x-cfb— the MIME that file-type sniffers detect for OLE Compound Binary Format (legacy Office container)Fix
Add both MIME types to the binary detection check, grouped with a comment explaining they are legacy OLE/CFB formats.
Testing
application/vnd.*)