fix(reply): parse markdown image replies as media#66471
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2dfcd77e71
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Greptile SummaryThis PR extends One gap: Confidence Score: 4/5
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/media/parse.ts
Line: 133-180
Comment:
**Inline caption text silently dropped from `text`**
When a markdown image appears inline with leading caption text on the same line (e.g. `"See this: "`), the text before the image is pushed to `lineSegments` but `pieces` is then cleared, so `cleanedLine` only captures text *after* the last image. The caller only adds `cleanedLine` to `keptLines`, so any preceding caption is lost from `cleanedText`/`text` — it exists only in `segments`, which the reply path (`parseReplyDirectives`) doesn't consume.
For `"Caption "` the result is `{ text: "", mediaUrls: ["https://…"] }` — the caption disappears silently. The new tests cover images on their own lines (the common LLM pattern) but leave the inline case untested.
A straightforward fix is to collect all non-empty text segments into `keptLines` as well:
```ts
for (const segment of markdownImageResult.lineSegments) {
if (segment.type === "text") {
keptLines.push(segment.text); // keep inline caption in cleanedText
pushTextSegment(segment.text);
continue;
}
segments.push(segment);
}
```
and skip the separate `if (markdownImageResult.cleanedLine)` push (it's already handled above).
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix(reply): parse markdown image replies..." | Re-trigger Greptile |
|
Good fix. Extending |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: acab679138
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 75a56f7915
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| segmentPieces.push(before); | ||
| visiblePieces.push(before); | ||
|
|
||
| const target = cleanCandidate(unwrapQuoted(match.destination) ?? match.destination); |
There was a problem hiding this comment.
Preserve trailing ')' in markdown image destinations
The markdown-image branch still normalizes parsed destinations with cleanCandidate(...), which strips trailing punctuation including ) and ]. For valid links that end in a balanced parenthesis (for example )), the extracted mediaUrl is truncated to https://en.wikipedia.org/wiki/Test_(film, so downstream media delivery uses a broken URL even though parseMarkdownImageDestination already parsed the destination correctly.
Useful? React with 👍 / 👎.
| if (!trimmedStart.startsWith("MEDIA:")) { | ||
| keptLines.push(line); | ||
| pushTextSegment(line); | ||
| const markdownImageResult = collectMarkdownImageSegments({ line, media }); |
There was a problem hiding this comment.
Ignore markdown image syntax inside inline code spans
Non-MEDIA: lines are now always passed through markdown-image extraction, but this path only excludes fenced code blocks and does not respect inline backtick spans. As a result, literal examples like Use `` syntax are converted into outbound media, which removes intended literal text and can send unintended images whenever replies discuss markdown syntax.
Useful? React with 👍 / 👎.
75a56f7 to
86bdf95
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 86bdf95641
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| matches.length < MAX_MARKDOWN_IMAGE_MATCHES_PER_LINE && | ||
| attempts < MAX_MARKDOWN_IMAGE_ATTEMPTS_PER_LINE | ||
| ) { | ||
| const index = line.indexOf(" are parsed as outbound media instead of literal text. In Markdown, a backslash-escaped image marker should render verbatim; with this logic we can unintentionally send media when a reply is trying to show syntax examples, and the visible text is also corrupted by leaving a stray backslash.
Useful? React with 👍 / 👎.
ccc1413 to
69c654a
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 69c654aa61
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return { text: "" }; | ||
| } | ||
| const mayContainMediaToken = /media:/i.test(trimmedRaw); | ||
| const mayContainMarkdownImage = /!\[[^\]]*]\(/.test(trimmedRaw); |
There was a problem hiding this comment.
Broaden markdown-image precheck for nested alt brackets
The fast-path guard mayContainMarkdownImage only matches !\[[^\]]*]\(, so valid Markdown like ![a [b]](https://example.com/x.png) is treated as having no image and exits early with plain text. findMarkdownImageMatches can parse nested [] correctly, but this guard prevents it from running whenever there is no MEDIA: token or audio tag on the message. That leaves standards-compliant image replies unlifted to mediaUrl/mediaUrls in the common markdown-only case.
Useful? React with 👍 / 👎.
🔒 Aisle Security AnalysisWe found 2 potential security issue(s) in this PR:
1. 🟠 Potential SSRF via Markdown image URL extraction in splitMediaFromOutput
Description
This introduces a new path for untrusted text to cause downstream components to treat arbitrary attacker-controlled URLs as media:
If downstream logic fetches, proxies, or otherwise dereferences Vulnerable code: const target = normalizeMediaSource(
cleanCandidate(unwrapQuoted(match.destination) ?? match.destination),
);
if (isRemoteMarkdownImageMedia(target)) {
params.media.push(target);
lineSegments.push({ type: "media", url: target });
}
function isRemoteMarkdownImageMedia(candidate: string): boolean {
return /^https?:\/\//i.test(candidate) && isValidMedia(candidate);
}
function isValidMedia(candidate: string) {
...
if (/^https?:\/\//i.test(candidate)) {
return true;
}
...
}RecommendationTreat remote media URLs as untrusted and apply SSRF mitigations before accepting them as media. Suggested hardening:
Example: import dns from "node:dns/promises";
import net from "node:net";
function isPublicIp(ip: string): boolean {
// block loopback, link-local, RFC1918, etc. (implement fully)
return !(
ip.startsWith("127.") ||
ip.startsWith("10.") ||
ip.startsWith("192.168.") ||
ip.startsWith("169.254.")
);
}
async function isSafeRemoteMedia(urlStr: string): Promise<boolean> {
let url: URL;
try { url = new URL(urlStr); } catch { return false; }
if (url.protocol !== "http:" && url.protocol !== "https:") return false;
if (url.username || url.password) return false;
// Resolve and verify IPs (beware DNS rebinding; do checks at connect-time too)
const addrs = await dns.lookup(url.hostname, { all: true });
if (addrs.some(a => net.isIP(a.address) && !isPublicIp(a.address))) return false;
return true;
}Apply 2. 🟡 Algorithmic complexity DoS in markdown image parsing (splitMediaFromOutput)
Description
The per-line parser can be forced into high CPU usage with adversarially crafted text:
This is a classic resource-exhaustion vector (algorithmic complexity) and may be reachable if Vulnerable code (hot loop + full-line scans): while (matches.length < MAX_MARKDOWN_IMAGE_MATCHES_PER_LINE && attempts < MAX_MARKDOWN_IMAGE_ATTEMPTS_PER_LINE) {
const index = line.indexOf("![", searchIndex);
attempts += 1;
const altEnd = findMatchingBracket(line, index + 2, "[", "]");
...
searchIndex = index + 2;
}RecommendationMitigate worst-case CPU and memory usage by adding global and more linear-time limits:
Example defensive cap: const MAX_OUTPUT_CHARS = 200_000;
const MAX_LINES = 2_000;
const safeRaw = trimmedRaw.slice(0, MAX_OUTPUT_CHARS);
const lines = safeRaw.split("\n").slice(0, MAX_LINES);Analyzed PR: #66471 at commit Last updated on: 2026-04-24T02:42:18Z |
69c654a to
8cbea62
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8cbea62427
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| @@ -185,8 +445,23 @@ export function splitMediaFromOutput(raw: string): { | |||
|
|
|||
| const trimmedStart = line.trimStart(); | |||
| if (!trimmedStart.toUpperCase().startsWith("MEDIA:")) { | |||
| keptLines.push(line); | |||
| pushTextSegment(line); | |||
| const markdownImageResult = collectMarkdownImageSegments({ line, media }); | |||
There was a problem hiding this comment.
Skip markdown image parsing inside indented code blocks
splitMediaFromOutput only suppresses extraction for fenced code spans, then unconditionally runs collectMarkdownImageSegments on every other non-MEDIA: line. That means valid Markdown indented code blocks (e.g. lines starting with four spaces containing  as a literal example) are treated as outbound media and the literal text is removed, which can send unintended images when replies include markdown syntax examples.
Useful? React with 👍 / 👎.
8cbea62 to
9674af7
Compare
9674af7 to
2b7bf89
Compare
Summary
MEDIA:tokenstargets intomediaUrl/mediaUrlsFixes #66191
Root cause
OpenClaw's final reply normalization path only extracted outbound media from explicit
MEDIA:tokens. Replies that used markdown image syntax likestayed text-only, so channels expecting a normalized media payload fell back to sending the raw URL as plain text instead of a media message.That showed up most obviously in Telegram group chats because those turns lean on the normalized final reply path more heavily once tool-summary text is suppressed.
Design
src/media/parse.tsto recognize markdown image targets and lift them into ordered media segmentssrc/auto-reply/reply/agent-runner-payloads.tsso no Telegram-specific special case is neededValidation
pnpm test:serial src/media/parse.test.ts src/auto-reply/reply/agent-runner-payloads.test.tspnpm build