fix(media): treat legacy .doc containers as binary#54380
Conversation
Greptile SummaryThis PR correctly classifies
Confidence Score: 5/5
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/media-understanding/apply.test.ts
Line: 1207-1223
Comment:
**Test doesn't exercise the new `isBinaryMediaMime` code path**
The test uses `applyWithDisabledMedia`, which internally calls `createMediaDisabledConfig()`. That config leaves `allowedMimesConfigured` as `false`, so `extractFileBlocks` populates `allowedMimes` from `EXTRA_TEXT_MIMES` only — and neither `application/msword` nor `application/x-cfb` is in that set. The file would therefore be skipped at the later `!allowedMimes.has(mimeType)` check (line ~400) even **without** the new entries in `isBinaryMediaMime`.
To have the test actually target the fix, use a config that explicitly allows those MIME types:
```ts
it("skips legacy Office binary formats carried as msword or x-cfb", async () => {
const oleMagic = Buffer.from("d0cf11e0a1b11ae1", "hex");
for (const mime of ["application/msword", "application/x-cfb"]) {
const filePath = await createTempMediaFile({
fileName: mime === "application/msword" ? "legacy.doc" : "legacy.cfb",
content: oleMagic,
});
// Use a config that permits the MIME type so that the only thing
// preventing embedding is the isBinaryMediaMime check.
const cfg = createMediaDisabledConfigWithAllowedMimes([mime]);
const { ctx, result } = await applyWithDisabledMedia({
body: "<media:file>",
mediaPath: filePath,
mediaType: mime,
cfg,
});
expectFileNotApplied({ ctx, result, body: "<media:file>" });
}
});
```
As written, reverting the `isBinaryMediaMime` change would leave the test green, so it doesn't guard against regressions in the new binary-classification logic.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix(media): treat legacy .doc containers..." | Re-trigger Greptile |
| it("skips legacy Office binary formats carried as msword or x-cfb", async () => { | ||
| const oleMagic = Buffer.from("d0cf11e0a1b11ae1", "hex"); | ||
| for (const mime of ["application/msword", "application/x-cfb"]) { | ||
| const filePath = await createTempMediaFile({ | ||
| fileName: mime === "application/msword" ? "legacy.doc" : "legacy.cfb", | ||
| content: oleMagic, | ||
| }); | ||
|
|
||
| const { ctx, result } = await applyWithDisabledMedia({ | ||
| body: "<media:file>", | ||
| mediaPath: filePath, | ||
| mediaType: mime, | ||
| }); | ||
|
|
||
| expectFileNotApplied({ ctx, result, body: "<media:file>" }); | ||
| } | ||
| }); |
There was a problem hiding this comment.
Test doesn't exercise the new
isBinaryMediaMime code path
The test uses applyWithDisabledMedia, which internally calls createMediaDisabledConfig(). That config leaves allowedMimesConfigured as false, so extractFileBlocks populates allowedMimes from EXTRA_TEXT_MIMES only — and neither application/msword nor application/x-cfb is in that set. The file would therefore be skipped at the later !allowedMimes.has(mimeType) check (line ~400) even without the new entries in isBinaryMediaMime.
To have the test actually target the fix, use a config that explicitly allows those MIME types:
it("skips legacy Office binary formats carried as msword or x-cfb", async () => {
const oleMagic = Buffer.from("d0cf11e0a1b11ae1", "hex");
for (const mime of ["application/msword", "application/x-cfb"]) {
const filePath = await createTempMediaFile({
fileName: mime === "application/msword" ? "legacy.doc" : "legacy.cfb",
content: oleMagic,
});
// Use a config that permits the MIME type so that the only thing
// preventing embedding is the isBinaryMediaMime check.
const cfg = createMediaDisabledConfigWithAllowedMimes([mime]);
const { ctx, result } = await applyWithDisabledMedia({
body: "<media:file>",
mediaPath: filePath,
mediaType: mime,
cfg,
});
expectFileNotApplied({ ctx, result, body: "<media:file>" });
}
});As written, reverting the isBinaryMediaMime change would leave the test green, so it doesn't guard against regressions in the new binary-classification logic.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/media-understanding/apply.test.ts
Line: 1207-1223
Comment:
**Test doesn't exercise the new `isBinaryMediaMime` code path**
The test uses `applyWithDisabledMedia`, which internally calls `createMediaDisabledConfig()`. That config leaves `allowedMimesConfigured` as `false`, so `extractFileBlocks` populates `allowedMimes` from `EXTRA_TEXT_MIMES` only — and neither `application/msword` nor `application/x-cfb` is in that set. The file would therefore be skipped at the later `!allowedMimes.has(mimeType)` check (line ~400) even **without** the new entries in `isBinaryMediaMime`.
To have the test actually target the fix, use a config that explicitly allows those MIME types:
```ts
it("skips legacy Office binary formats carried as msword or x-cfb", async () => {
const oleMagic = Buffer.from("d0cf11e0a1b11ae1", "hex");
for (const mime of ["application/msword", "application/x-cfb"]) {
const filePath = await createTempMediaFile({
fileName: mime === "application/msword" ? "legacy.doc" : "legacy.cfb",
content: oleMagic,
});
// Use a config that permits the MIME type so that the only thing
// preventing embedding is the isBinaryMediaMime check.
const cfg = createMediaDisabledConfigWithAllowedMimes([mime]);
const { ctx, result } = await applyWithDisabledMedia({
body: "<media:file>",
mediaPath: filePath,
mediaType: mime,
cfg,
});
expectFileNotApplied({ ctx, result, body: "<media:file>" });
}
});
```
As written, reverting the `isBinaryMediaMime` change would leave the test green, so it doesn't guard against regressions in the new binary-classification logic.
How can I resolve this? If you propose a fix, please make it concise.|
Fresh branch from current main, narrow diff (2 files), and all checks are green. The remaining Greptile note is non-blocking test-strength feedback; production change is unchanged and safe to merge if this still looks good. |
Summary
Fixes #54176
Supersedes #54190 and #54234 with a fresh branch from current main.