Skip to content

fix(reply): parse markdown image replies as media#66471

Merged
vincentkoc merged 3 commits intomainfrom
kitsune/telegram-group-media
Apr 24, 2026
Merged

fix(reply): parse markdown image replies as media#66471
vincentkoc merged 3 commits intomainfrom
kitsune/telegram-group-media

Conversation

@vincentkoc
Copy link
Copy Markdown
Member

Summary

  • parse markdown image syntax in outbound reply text the same way we already parse MEDIA: tokens
  • keep surrounding caption text while lifting ![...](...) targets into mediaUrl / mediaUrls
  • add regression coverage for the shared parser and the final reply-payload builder path used before outbound delivery

Fixes #66191

Root cause

OpenClaw's final reply normalization path only extracted outbound media from explicit MEDIA: tokens. Replies that used markdown image syntax like ![chart](https://example.com/chart.png) stayed text-only, so channels expecting a normalized media payload fell back to sending the raw URL as plain text instead of a media message.

That showed up most obviously in Telegram group chats because those turns lean on the normalized final reply path more heavily once tool-summary text is suppressed.

Design

  • extend src/media/parse.ts to recognize markdown image targets and lift them into ordered media segments
  • reuse the existing final reply-payload build path in src/auto-reply/reply/agent-runner-payloads.ts so no Telegram-specific special case is needed
  • leave invalid markdown image targets as plain text

Validation

  • pnpm test:serial src/media/parse.test.ts src/auto-reply/reply/agent-runner-payloads.test.ts
  • pnpm build

@vincentkoc vincentkoc marked this pull request as ready for review April 14, 2026 09:50
@vincentkoc vincentkoc self-assigned this Apr 14, 2026
@openclaw-barnacle openclaw-barnacle Bot added size: S maintainer Maintainer-authored PR labels Apr 14, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2dfcd77e71

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/media/parse.ts
Comment thread src/media/parse.ts Outdated
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 14, 2026

Greptile Summary

This PR extends src/media/parse.ts to recognize markdown image syntax (![alt](url)) and lift matched URLs into the mediaUrls/mediaUrl fields, mirroring the existing MEDIA: token path. The fix targets the final reply normalization path (buildReplyPayloads with parseMode: "always"), which is the path Telegram group chats lean on most.

One gap: collectMarkdownImageSegments only adds cleanedLine (text after the last image) to keptLines. When an image appears inline with leading text on the same line — e.g. "Caption ![img](url)" — the preceding caption is added to lineSegments but not to keptLines, so it is silently dropped from the returned text field. The new tests cover the common LLM pattern (images on their own lines) and pass, but the inline-caption case is untested and would lose caption text.

Confidence Score: 4/5

  • Safe to merge for the documented Telegram group chat case; the inline-caption text loss is a pre-existing gap in the new code worth addressing before wider rollout.
  • All new tests pass for the common LLM output pattern. One P2 issue: caption text on the same line as an inline image is silently dropped from the text field (only lands in segments). LLMs rarely produce inline images, so practical impact is low, but it is incorrect behavior in the changed code path.
  • src/media/parse.ts — specifically the keptLines handling inside the collectMarkdownImageSegments caller block (lines ~247-259).
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/media/parse.ts
Line: 133-180

Comment:
**Inline caption text silently dropped from `text`**

When a markdown image appears inline with leading caption text on the same line (e.g. `"See this: ![chart](url)"`), the text before the image is pushed to `lineSegments` but `pieces` is then cleared, so `cleanedLine` only captures text *after* the last image. The caller only adds `cleanedLine` to `keptLines`, so any preceding caption is lost from `cleanedText`/`text` — it exists only in `segments`, which the reply path (`parseReplyDirectives`) doesn't consume.

For `"Caption ![img](https://…)"` the result is `{ text: "", mediaUrls: ["https://…"] }` — the caption disappears silently. The new tests cover images on their own lines (the common LLM pattern) but leave the inline case untested.

A straightforward fix is to collect all non-empty text segments into `keptLines` as well:

```ts
for (const segment of markdownImageResult.lineSegments) {
  if (segment.type === "text") {
    keptLines.push(segment.text);   // keep inline caption in cleanedText
    pushTextSegment(segment.text);
    continue;
  }
  segments.push(segment);
}
```

and skip the separate `if (markdownImageResult.cleanedLine)` push (it's already handled above).

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(reply): parse markdown image replies..." | Re-trigger Greptile

Comment thread src/media/parse.ts
@EronFan
Copy link
Copy Markdown
Contributor

EronFan commented Apr 14, 2026

Good fix. Extending src/media/parse.ts to handle ![...](...) markdown image syntax is the right approach — consistent with existing MEDIA: token handling, no channel-specific special cases needed. Regression coverage added. ✅

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: acab679138

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/media/parse.ts Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 75a56f7915

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/media/parse.ts Outdated
segmentPieces.push(before);
visiblePieces.push(before);

const target = cleanCandidate(unwrapQuoted(match.destination) ?? match.destination);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve trailing ')' in markdown image destinations

The markdown-image branch still normalizes parsed destinations with cleanCandidate(...), which strips trailing punctuation including ) and ]. For valid links that end in a balanced parenthesis (for example ![x](https://en.wikipedia.org/wiki/Test_(film))), the extracted mediaUrl is truncated to https://en.wikipedia.org/wiki/Test_(film, so downstream media delivery uses a broken URL even though parseMarkdownImageDestination already parsed the destination correctly.

Useful? React with 👍 / 👎.

Comment thread src/media/parse.ts
if (!trimmedStart.startsWith("MEDIA:")) {
keptLines.push(line);
pushTextSegment(line);
const markdownImageResult = collectMarkdownImageSegments({ line, media });
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Ignore markdown image syntax inside inline code spans

Non-MEDIA: lines are now always passed through markdown-image extraction, but this path only excludes fenced code blocks and does not respect inline backtick spans. As a result, literal examples like Use `![chart](https://example.com/chart.png)` syntax are converted into outbound media, which removes intended literal text and can send unintended images whenever replies discuss markdown syntax.

Useful? React with 👍 / 👎.

@vincentkoc vincentkoc force-pushed the kitsune/telegram-group-media branch from 75a56f7 to 86bdf95 Compare April 23, 2026 23:07
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 86bdf95641

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/media/parse.ts
matches.length < MAX_MARKDOWN_IMAGE_MATCHES_PER_LINE &&
attempts < MAX_MARKDOWN_IMAGE_ATTEMPTS_PER_LINE
) {
const index = line.indexOf("![", searchIndex);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Ignore escaped markdown image syntax

The scanner currently treats any ![ as an image opener without checking whether it was escaped, so inputs like \![chart](https://example.com/chart.png) are parsed as outbound media instead of literal text. In Markdown, a backslash-escaped image marker should render verbatim; with this logic we can unintentionally send media when a reply is trying to show syntax examples, and the visible text is also corrupted by leaving a stray backslash.

Useful? React with 👍 / 👎.

@vincentkoc vincentkoc force-pushed the kitsune/telegram-group-media branch 2 times, most recently from ccc1413 to 69c654a Compare April 24, 2026 01:32
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 69c654aa61

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/media/parse.ts
return { text: "" };
}
const mayContainMediaToken = /media:/i.test(trimmedRaw);
const mayContainMarkdownImage = /!\[[^\]]*]\(/.test(trimmedRaw);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Broaden markdown-image precheck for nested alt brackets

The fast-path guard mayContainMarkdownImage only matches !\[[^\]]*]\(, so valid Markdown like ![a [b]](https://example.com/x.png) is treated as having no image and exits early with plain text. findMarkdownImageMatches can parse nested [] correctly, but this guard prevents it from running whenever there is no MEDIA: token or audio tag on the message. That leaves standards-compliant image replies unlifted to mediaUrl/mediaUrls in the common markdown-only case.

Useful? React with 👍 / 👎.

@aisle-research-bot
Copy link
Copy Markdown

aisle-research-bot Bot commented Apr 24, 2026

🔒 Aisle Security Analysis

We found 2 potential security issue(s) in this PR:

# Severity Title
1 🟠 High Potential SSRF via Markdown image URL extraction in splitMediaFromOutput
2 🟡 Medium Algorithmic complexity DoS in markdown image parsing (splitMediaFromOutput)
1. 🟠 Potential SSRF via Markdown image URL extraction in splitMediaFromOutput
Property Value
Severity High
CWE CWE-918
Location src/media/parse.ts:170-172

Description

splitMediaFromOutput now extracts media URLs not only from explicit MEDIA: directives, but also from Markdown image syntax (![alt](destination)). The extracted destination is treated as media if it matches ^https?:// and passes isValidMedia(), which unconditionally returns true for any http(s) URL.

This introduces a new path for untrusted text to cause downstream components to treat arbitrary attacker-controlled URLs as media:

  • Input: untrusted agent/tool output text containing ![x](http://...) / ![x](https://...)
  • Normalization/cleaning: unwrapQuoted / cleanCandidate / normalizeMediaSource
  • Approval: isRemoteMarkdownImageMedia() which only checks scheme + isValidMedia()
  • Output: URL is appended to mediaUrls and emitted as a {type: "media"} segment

If downstream logic fetches, proxies, or otherwise dereferences mediaUrls (common for rendering, previewing, uploading, or safety scanning), an attacker can trigger SSRF to internal services (e.g., http://127.0.0.1, http://169.254.169.254, RFC1918 ranges) simply by embedding a Markdown image.

Vulnerable code:

const target = normalizeMediaSource(
  cleanCandidate(unwrapQuoted(match.destination) ?? match.destination),
);
if (isRemoteMarkdownImageMedia(target)) {
  params.media.push(target);
  lineSegments.push({ type: "media", url: target });
}

function isRemoteMarkdownImageMedia(candidate: string): boolean {
  return /^https?:\/\//i.test(candidate) && isValidMedia(candidate);
}

function isValidMedia(candidate: string) {
  ...
  if (/^https?:\/\//i.test(candidate)) {
    return true;
  }
  ...
}

Recommendation

Treat remote media URLs as untrusted and apply SSRF mitigations before accepting them as media.

Suggested hardening:

  1. Parse and validate URLs using new URL() (reject invalid URLs, reject credentials).
  2. Block private/loopback/link-local IP ranges and common internal hostnames.
  3. Optionally enforce an allowlist of permitted domains for remote media.
  4. Consider requiring an explicit MEDIA: directive for remote fetches, and treat Markdown images as display-only unless explicitly enabled.

Example:

import dns from "node:dns/promises";
import net from "node:net";

function isPublicIp(ip: string): boolean {// block loopback, link-local, RFC1918, etc. (implement fully)
  return !(
    ip.startsWith("127.") ||
    ip.startsWith("10.") ||
    ip.startsWith("192.168.") ||
    ip.startsWith("169.254.")
  );
}

async function isSafeRemoteMedia(urlStr: string): Promise<boolean> {
  let url: URL;
  try { url = new URL(urlStr); } catch { return false; }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  if (url.username || url.password) return false;// Resolve and verify IPs (beware DNS rebinding; do checks at connect-time too)
  const addrs = await dns.lookup(url.hostname, { all: true });
  if (addrs.some(a => net.isIP(a.address) && !isPublicIp(a.address))) return false;

  return true;
}

Apply isSafeRemoteMedia() (or equivalent) inside isRemoteMarkdownImageMedia before pushing the URL to mediaUrls.

2. 🟡 Algorithmic complexity DoS in markdown image parsing (splitMediaFromOutput)
Property Value
Severity Medium
CWE CWE-400
Location src/media/parse.ts:293-326

Description

splitMediaFromOutput() now attempts to extract remote media URLs from Markdown image syntax for every non-MEDIA: line when the overall output matches /!\[[^\]]*]\(/.

The per-line parser can be forced into high CPU usage with adversarially crafted text:

  • findMarkdownImageMatches() loops up to MAX_MARKDOWN_IMAGE_ATTEMPTS_PER_LINE (80) and for each candidate calls findMatchingBracket(), which may scan to the end of the line when brackets are unbalanced or deeply nested.
  • On failures, searchIndex only advances by index + 2, allowing many overlapping ![ candidates and repeated near-full-line scans.
  • There is no cap on total number of lines or total message size processed by splitMediaFromOutput(). An attacker can supply many lines near MAX_MARKDOWN_IMAGE_LINE_LENGTH (20k) containing repeated ![ sequences that trigger the expensive path, producing large cumulative CPU time.

This is a classic resource-exhaustion vector (algorithmic complexity) and may be reachable if raw contains untrusted user/model output in request/response handling paths.

Vulnerable code (hot loop + full-line scans):

while (matches.length < MAX_MARKDOWN_IMAGE_MATCHES_PER_LINE && attempts < MAX_MARKDOWN_IMAGE_ATTEMPTS_PER_LINE) {
  const index = line.indexOf("![", searchIndex);
  attempts += 1;
  const altEnd = findMatchingBracket(line, index + 2, "[", "]");
  ...
  searchIndex = index + 2;
}

Recommendation

Mitigate worst-case CPU and memory usage by adding global and more linear-time limits:

  1. Add a cap on total input size / number of lines processed by splitMediaFromOutput() (or short-circuit Markdown image extraction when the input is very large).
  2. Avoid repeated rescans by making the matcher single-pass (advance searchIndex more aggressively on failure, e.g. to altEnd + 1 when altEnd exists, or to the next ![ after the closing ] when possible).
  3. Consider lowering the per-line attempt limit or stopping after a maximum number of total markdown-image parse operations per message.

Example defensive cap:

const MAX_OUTPUT_CHARS = 200_000;
const MAX_LINES = 2_000;
const safeRaw = trimmedRaw.slice(0, MAX_OUTPUT_CHARS);
const lines = safeRaw.split("\n").slice(0, MAX_LINES);

Analyzed PR: #66471 at commit 2b7bf89

Last updated on: 2026-04-24T02:42:18Z

@vincentkoc vincentkoc force-pushed the kitsune/telegram-group-media branch from 69c654a to 8cbea62 Compare April 24, 2026 02:26
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8cbea62427

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/media/parse.ts
@@ -185,8 +445,23 @@ export function splitMediaFromOutput(raw: string): {

const trimmedStart = line.trimStart();
if (!trimmedStart.toUpperCase().startsWith("MEDIA:")) {
keptLines.push(line);
pushTextSegment(line);
const markdownImageResult = collectMarkdownImageSegments({ line, media });
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip markdown image parsing inside indented code blocks

splitMediaFromOutput only suppresses extraction for fenced code spans, then unconditionally runs collectMarkdownImageSegments on every other non-MEDIA: line. That means valid Markdown indented code blocks (e.g. lines starting with four spaces containing ![...](...) as a literal example) are treated as outbound media and the literal text is removed, which can send unintended images when replies include markdown syntax examples.

Useful? React with 👍 / 👎.

@vincentkoc vincentkoc force-pushed the kitsune/telegram-group-media branch from 8cbea62 to 9674af7 Compare April 24, 2026 02:32
@vincentkoc vincentkoc force-pushed the kitsune/telegram-group-media branch from 9674af7 to 2b7bf89 Compare April 24, 2026 02:34
@vincentkoc vincentkoc merged commit 60d892d into main Apr 24, 2026
12 checks passed
@vincentkoc vincentkoc deleted the kitsune/telegram-group-media branch April 24, 2026 02:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

maintainer Maintainer-authored PR size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Telegram group chats strip media — images only work in DMs

2 participants