feat(msteams): extract structured quote/reply context from HTML attachments by robinhanksliu · Pull Request #44739 · openclaw/openclaw

robinhanksliu · 2026-03-13T06:24:26Z

Problem

When a Teams user quotes/replies to a message, the quoted sender name and body are merged into activity.text as a flat string. The agent receives something like:

Jianmei YuRobin's Claw 是你偷偷改格式了吗？

...with no way to distinguish the quoted content from the actual message, or to identify who originally wrote the quoted part. (The quoted text 是你偷偷改格式了吗？ translates to "Did you secretly change the formatting?")

Solution

inbound.ts: Add extractMSTeamsQuoteInfo() to parse Teams <blockquote> HTML from text/html attachments. Supports both the schema.skype.com/Reply format and simpler blockquote variants.
message-handler.ts:
- Call quote extraction on inbound messages
- Populate ReplyToSender and ReplyToBody in the inbound context (consistent with Telegram and WhatsApp implementations)
- Format the agent body with a [Replying to ...] annotation block
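The PR text does not include the parser itself. Below is a minimal, hypothetical sketch of the approach, not the PR's `extractMSTeamsQuoteInfo()`. It assumes Teams puts the quoted sender in an element with `itemprop="mri"` and the quoted text in one with `itemprop="preview"` inside the `schema.skype.com/Reply` blockquote. This is an assumption about Teams' markup that this thread does not confirm. `QuoteSketch`, `extractQuoteSketch`, and the regex constants are illustrative names.

```typescript
// Hypothetical result shape; the PR's MSTeamsQuoteInfo type may differ.
interface QuoteSketch {
  quotedSender?: string;
  quotedBody?: string;
  cleanBody: string;
}

// Strip tags and collapse whitespace (deliberately simpler than htmlToPlainText).
function stripTags(html: string): string {
  return html.replace(/<[^>]*>/g, "").replace(/\s+/g, " ").trim();
}

// Only blockquotes carrying the Teams reply schema are treated as quotes.
const REPLY_RE =
  /<blockquote[^>]*itemtype=["']http:\/\/schema\.skype\.com\/Reply["'][^>]*>([\s\S]*?)<\/blockquote>/i;
// ASSUMED markup: sender in itemprop="mri", quoted text in itemprop="preview".
const SENDER_RE = /<[^>]*itemprop=["']mri["'][^>]*>([\s\S]*?)<\/[a-z]+>/i;
const PREVIEW_RE = /<[^>]*itemprop=["']preview["'][^>]*>([\s\S]*?)<\/[a-z]+>/i;

function extractQuoteSketch(html: string, fallbackText: string): QuoteSketch | undefined {
  const bq = REPLY_RE.exec(html);
  if (!bq) return undefined;
  const inner = bq[1] ?? "";
  const sender = SENDER_RE.exec(inner)?.[1];
  const preview = PREVIEW_RE.exec(inner)?.[1];
  // Whatever follows the closing </blockquote> is treated as the user's own message.
  const after = stripTags(html.slice(bq.index + bq[0].length));
  return {
    quotedSender: sender ? stripTags(sender) : undefined,
    quotedBody: preview ? stripTags(preview) : undefined,
    cleanBody: after || fallbackText,
  };
}
```

When nothing follows the blockquote, the sketch falls back to the caller-supplied flat text. That is the same fallback path the Greptile review flags for duplicated quote content.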

After this change, the agent sees:

actual message text

[Replying to Jianmei Yu]
是你偷偷改格式了吗？
[/Replying]

Changes

| File | Change |
| --- | --- |
| `extensions/msteams/src/inbound.ts` | Add `extractMSTeamsQuoteInfo()`, the `MSTeamsQuoteInfo` type, and HTML parsing helpers |
| `extensions/msteams/src/inbound.test.ts` | Add 6 test cases covering various blockquote formats |
| `extensions/msteams/src/monitor-handler/message-handler.ts` | Wire up quote extraction, populate `ReplyToSender`/`ReplyToBody`, format the `[Replying to ...]` block |

Testing

All 233 existing msteams tests pass ✅
6 new tests added for quote extraction ✅
1 pre-existing failure in probe.test.ts (unrelated to this change)

…hments

When a Teams user quotes/replies to a message, the quoted sender name and body are mixed into activity.text as a flat string, making it impossible for the agent to distinguish the quote from the actual message.

This change:
- Adds extractMSTeamsQuoteInfo() to parse Teams blockquote HTML from text/html attachments (supports both schema.skype.com/Reply format and simpler blockquote variants)
- Populates ReplyToSender and ReplyToBody in the inbound context (consistent with Telegram and WhatsApp implementations)
- Formats the agent body with a [Replying to ...] annotation block so the LLM receives structured context about quoted messages
- Adds 6 test cases covering various blockquote formats

The agent now sees:

actual message text

[Replying to Jianmei Yu]
quoted message content
[/Replying]

instead of the previous flat string where all content was merged.

greptile-apps · 2026-03-13T06:28:41Z

Greptile Summary

This PR adds structured quote/reply extraction for Microsoft Teams inbound messages. When a Teams user quotes a message, the quoted sender name and body are now parsed from the text/html attachment (using the schema.skype.com/Reply blockquote format or a simpler fallback), and the agent sees a clean [Replying to …] annotation block instead of the flat merged string. ReplyToSender and ReplyToBody are also populated on the inbound context payload, bringing Teams in line with the Telegram and WhatsApp implementations.

Key observations:

- The HTML parsing relies on regex rather than a DOM parser, which is sufficient for Teams' well-structured output but has known limitations around nested blockquotes.
- `htmlToPlainText` only decodes six hard-coded entities (`&nbsp;`, `&amp;`, `&lt;`, `&gt;`, `&quot;`, `&#39;`); other numeric entities (e.g. `&#160;`, `&#x2019;`) would be left as literal strings in the extracted sender/body text.
- When the `text/html` attachment contains only the blockquote and no trailing content, `cleanBody` falls back to the full merged `activity.text`. In that path, `agentBody` emits the quote context twice (once embedded in `cleanBody` and again in the `[Replying to …]` block) without any structured separation of the actual message.
- 6 new unit tests cover the main code paths well, including the fallback case.

Confidence Score: 4/5

Safe to merge; the feature is additive and well-tested, with no changes to existing message routing or auth logic.
The implementation is logically sound and consistent with the Telegram/WhatsApp pattern. The two concerns (incomplete entity decoding and the duplicate-content fallback path) are edge cases that don't affect the common case and don't cause data loss or incorrect routing. The test suite is comprehensive.
extensions/msteams/src/inbound.ts (entity decoding) and extensions/msteams/src/monitor-handler/message-handler.ts (agentBody fallback path)

Prompt To Fix All With AI

This is a comment left during a code review.
Path: extensions/msteams/src/inbound.ts
Line: 67-81

Comment:
**Incomplete HTML entity decoding**

`htmlToPlainText` only decodes six hard-coded entities (`&nbsp;`, `&amp;`, `&lt;`, `&gt;`, `&quot;`, `&#39;`). Any other numeric or named HTML entity, such as `&#160;` (non-breaking space), `&#x2019;` (right single quotation mark), `&mdash;`, or `&eacute;`, will appear as a raw literal string in the extracted `quotedSender` / `quotedBody` / `cleanBody`.

Teams can produce numeric entities in generated HTML (e.g. for curly quotes or special Unicode characters). Consider adding a generic numeric-entity fallback:

```ts
.replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
.replace(/&#([0-9]+);/g, (_, dec) => String.fromCodePoint(Number(dec)))
```

or use a lightweight library like `he` to handle the full entity spectrum.
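For the named-entity half of this, a single regex pass with a lookup table is one option. It also sidesteps an ordering issue in the chained version: because `htmlToPlainText` replaces `&amp;` before `&lt;`, an escaped literal like `&amp;lt;` ends up as `<` instead of `&lt;`. This is a minimal sketch with an illustrative subset of entities; `NAMED_ENTITIES` and `decodeNamedEntities` are hypothetical names, and a real fix would need the full HTML5 table (or a library).

```typescript
// Illustrative subset only; the full HTML5 named-entity table is far larger.
// Names are case-sensitive in HTML (&Eacute; and &eacute; differ), so lookups are exact.
const NAMED_ENTITIES = new Map<string, string>([
  ["nbsp", " "], // htmlToPlainText maps &nbsp; to a plain space, so this does too
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["apos", "'"],
  ["mdash", "\u2014"],
  ["eacute", "\u00e9"],
]);

// Decode every &name; in one pass. Each match is replaced once and the replacement
// is never re-scanned, so "&amp;lt;" becomes "&lt;" rather than "<".
function decodeNamedEntities(text: string): string {
  return text.replace(/&([a-zA-Z]+);/g, (match: string, name: string) => {
    // Unknown names are left untouched rather than guessed.
    return NAMED_ENTITIES.get(name) ?? match;
  });
}
```

Using a `Map` rather than a plain object literal avoids inherited keys: an input like `&constructor;` would otherwise resolve to `Object.prototype.constructor`.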

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: extensions/msteams/src/monitor-handler/message-handler.ts
Line: 514-517

Comment:
**Quote content duplicated in agent body when HTML lacks post-blockquote text**

When the `text/html` attachment contains only the `<blockquote>` and no trailing content (e.g. some Teams clients put the new message only in `activity.text`), `htmlToPlainText(afterBlockquote)` returns `""` and `cleanBody` falls back to the full `fallbackText` — the merged flat string that already contains the quoted sender name, the quoted body, and the actual message all concatenated.

In that case `agentBody` becomes:

```
Bobsome quotemy actual message

[Replying to Bob]
some quote
[/Replying]
```

The `quotedBody` (`some quote`) appears twice — once buried inside `cleanBody` and again inside the `[Replying to …]` block. The agent cannot distinguish the user's actual message from the quoted portion inside `cleanBody`, so the annotation provides little value in this fallback path.

One approach: when `cleanBody === fallbackText`, skip emitting the `[Replying to …]` block (since the structured separation can't be guaranteed anyway), or at least avoid duplicating the annotation:

```ts
const agentBody = quoteInfo
  ? quoteInfo.cleanBody !== fallbackText
    ? quoteInfo.cleanBody +
      `\n\n[Replying to ${quoteInfo.quotedSender ?? "unknown"}]\n${quoteInfo.quotedBody ?? "(no text)"}\n[/Replying]`
    : rawBody  // structured extraction failed; surface as-is
  : rawBody;
```
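The suggestion above depends on variables from the handler. Here is the same guard as a self-contained function, assuming a quote-info shape with the field names used in these comments (`cleanBody`, `quotedSender`, `quotedBody`). `QuoteInfo` and `buildAgentBody` are hypothetical names, not the PR's code.

```typescript
// Field names follow the review comments; the PR's actual type may differ.
interface QuoteInfo {
  cleanBody: string;
  quotedSender?: string;
  quotedBody?: string;
}

// Append a [Replying to ...] block only when the quote was actually separated
// from the user's own text; otherwise pass the raw body through unchanged.
function buildAgentBody(
  quoteInfo: QuoteInfo | undefined,
  fallbackText: string,
  rawBody: string,
): string {
  if (!quoteInfo || quoteInfo.cleanBody === fallbackText) {
    return rawBody;
  }
  const sender = quoteInfo.quotedSender ?? "unknown";
  const body = quoteInfo.quotedBody ?? "(no text)";
  return `${quoteInfo.cleanBody}\n\n[Replying to ${sender}]\n${body}\n[/Replying]`;
}
```

With the `Bob` example, a successful split yields `my actual message` followed by the annotation block, while the fallback case returns the merged string once, without a second copy of `some quote`.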

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: extensions/msteams/src/inbound.ts
Line: 147-158

Comment:
**Regex can mis-match on nested or malformed blockquotes**

The non-greedy `([\s\S]*?)<\/blockquote>` pattern stops at the **first** `</blockquote>` it encounters. If Teams ever nests a quoted reply inside another reply (e.g. a thread reply to a reply), or if the blockquote's inner content itself contains a literal `</blockquote>` string (unlikely but possible in forwarded HTML content), the captured group `bqMatch[1]` will be truncated and both `quotedSender` and `quotedBody` may be wrong or empty.

For the current Teams use-case this is probably fine, but it's worth a brief code comment acknowledging the limitation so future maintainers aren't surprised:

```ts
// Note: non-greedy match stops at the first </blockquote>, so nested
// blockquotes are not supported. Teams does not currently produce them
// for quote/reply scenarios.
const blockquoteRe = ...
```
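If nested quotes ever need support, one regex-only option is to scan opening and closing tags and track depth until the outermost blockquote closes. This is a minimal sketch; `outerBlockquoteInner` is a hypothetical helper. It is still not an HTML parser, and comments, CDATA, or tag-like text inside attribute values would confuse it.

```typescript
// Non-greedy capture, as in the PR: stops at the first closing tag.
const NON_GREEDY = /<blockquote[^>]*>([\s\S]*?)<\/blockquote>/i;

// Return the inner HTML of the first <blockquote>, honoring nesting by counting
// open/close tags. Returns undefined if the outer blockquote never closes.
function outerBlockquoteInner(html: string): string | undefined {
  const tagRe = /<(\/?)blockquote\b[^>]*>/gi;
  let depth = 0;
  let innerStart = -1;
  let m: RegExpExecArray | null;
  while ((m = tagRe.exec(html)) !== null) {
    if (m[1] === "") {
      // Opening tag: record where the outermost content starts.
      if (depth === 0) innerStart = m.index + m[0].length;
      depth++;
    } else if (depth > 0) {
      // Closing tag: when depth returns to zero, the outer blockquote is complete.
      depth--;
      if (depth === 0) return html.slice(innerStart, m.index);
    }
  }
  return undefined;
}
```

On `<blockquote>a<blockquote>b</blockquote>c</blockquote>`, the non-greedy pattern captures only `a<blockquote>b`, while the depth-tracking scan returns the full `a<blockquote>b</blockquote>c`.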

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: bf744ac

greptile-apps · 2026-03-13T06:28:44Z

+function htmlToPlainText(html: string): string {
+  return html
+    .replace(/<br\s*\/?>/gi, "\n")
+    .replace(/<\/p>/gi, "\n")
+    .replace(/<[^>]*>/g, "")
+    .replace(/&nbsp;/gi, " ")
+    .replace(/&amp;/gi, "&")
+    .replace(/&lt;/gi, "<")
+    .replace(/&gt;/gi, ">")
+    .replace(/&quot;/gi, '"')
+    .replace(/&#39;/gi, "'")
+    .replace(/[ \t]+/g, " ")
+    .replace(/\n{3,}/g, "\n\n")
+    .trim();
+}


Incomplete HTML entity decoding


greptile-apps · 2026-03-13T06:28:45Z

+    const agentBody = quoteInfo
+      ? quoteInfo.cleanBody +
+        `\n\n[Replying to ${quoteInfo.quotedSender ?? "unknown"}]\n${quoteInfo.quotedBody ?? "(no text)"}\n[/Replying]`
+      : rawBody;


Quote content duplicated in agent body when HTML lacks post-blockquote text


greptile-apps · 2026-03-13T06:28:46Z

+    const blockquoteRe =
+      /<blockquote[^>]*itemtype=["']http:\/\/schema\.skype\.com\/Reply["'][^>]*>([\s\S]*?)<\/blockquote>/i;
+    const bqMatch = blockquoteRe.exec(html);
+    if (!bqMatch) {
+      // Try a more lenient blockquote match (some Teams clients omit schema attrs).
+      const simpleBqRe = /<blockquote[^>]*>([\s\S]*?)<\/blockquote>/i;
+      const simpleBqMatch = simpleBqRe.exec(html);
+      if (!simpleBqMatch) {
+        continue;
+      }
+      return parseBlockquoteContent(simpleBqMatch, html, fallbackText);
+    }


Regex can mis-match on nested or malformed blockquotes


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bf744ac425

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:

- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-03-13T06:31:07Z

+      const simpleBqRe = /<blockquote[^>]*>([\s\S]*?)<\/blockquote>/i;
+      const simpleBqMatch = simpleBqRe.exec(html);


Avoid treating generic blockquotes as reply metadata

The fallback path matches any <blockquote> when the Skype reply schema is missing, so a normal user-authored quote block in a text/html attachment is reinterpreted as a reply and then rewritten downstream into [Replying to ...] context. This changes message meaning in conversations that use blockquote formatting without using Teams reply UI, because extractMSTeamsQuoteInfo will still return quote data from plain formatting HTML.
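A minimal sketch of the stricter check this comment argues for: treat a blockquote as reply metadata only when it carries the Teams reply `itemtype`, so ordinary user-formatted quotes stay part of the message. `TEAMS_REPLY_RE` and `hasTeamsReplyQuote` are hypothetical names; the attribute value is the one used in the PR's own regex.

```typescript
// Matches only blockquotes tagged as Teams replies via the schema.skype.com itemtype.
const TEAMS_REPLY_RE =
  /<blockquote\b[^>]*\bitemtype=["']http:\/\/schema\.skype\.com\/Reply["'][^>]*>/i;

// True when the HTML contains a Teams reply quote. A plain <blockquote> with no
// itemtype is ignored, so it is never rewritten into [Replying to ...] context.
function hasTeamsReplyQuote(html: string): boolean {
  return TEAMS_REPLY_RE.test(html);
}
```

The regex has no `g` flag, so `test()` keeps no `lastIndex` state between calls.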


chatgpt-codex-connector · 2026-03-13T06:31:07Z

+    const agentBody = quoteInfo
+      ? quoteInfo.cleanBody +
+        `\n\n[Replying to ${quoteInfo.quotedSender ?? "unknown"}]\n${quoteInfo.quotedBody ?? "(no text)"}\n[/Replying]`
+      : rawBody;


Preserve attachment placeholder in quoted attachment-only messages

When quote info is present, BodyForAgent is always rebuilt from quoteInfo.cleanBody plus the reply annotation and never falls back to rawBody. For quote messages where text is empty and rawBody came from buildMSTeamsAttachmentPlaceholder (for example, attachment-only replies), this drops the placeholder and hides that media/doc content was sent, reducing the agent's available context.
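One way to keep that context is to use `rawBody` as the message part whenever the extracted `cleanBody` is empty, while still appending the reply block. This is a minimal sketch under assumed names (`QuoteInfo`, `agentBodyKeepingPlaceholder`). The output format of `buildMSTeamsAttachmentPlaceholder` is not shown in this thread, so the test uses a made-up placeholder string.

```typescript
// Field names follow the review comments; the PR's actual type may differ.
interface QuoteInfo {
  cleanBody: string;
  quotedSender?: string;
  quotedBody?: string;
}

// When the reply has no text of its own (e.g. attachment-only), fall back to
// rawBody, which may hold an attachment placeholder, for the message part.
function agentBodyKeepingPlaceholder(quoteInfo: QuoteInfo | undefined, rawBody: string): string {
  if (!quoteInfo) return rawBody;
  const message = quoteInfo.cleanBody.trim() || rawBody;
  const sender = quoteInfo.quotedSender ?? "unknown";
  const body = quoteInfo.quotedBody ?? "(no text)";
  return `${message}\n\n[Replying to ${sender}]\n${body}\n[/Replying]`;
}
```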


- Add numeric entity decoding (&#x...; and &#...;) in htmlToPlainText
- Remove lenient blockquote fallback to avoid misinterpreting generic blockquotes as reply metadata (only match schema.skype.com/Reply)
- Add comment about nested blockquote limitation
- Guard against duplication when cleanBody falls back to raw text
- Preserve attachment placeholder for attachment-only replies
- Add test for numeric entity decoding
- Update simple blockquote test to verify it is now ignored

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 75ddf47325


chatgpt-codex-connector · 2026-03-13T06:45:07Z

+      .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
+      .replace(/&#([0-9]+);/g, (_, dec) => String.fromCodePoint(Number(dec)))


Guard numeric entity decoding against invalid code points

htmlToPlainText decodes &#x...; / &#...; with String.fromCodePoint(...) directly, which throws RangeError for values above 0x10FFFF (for example &#x110000;). Because extractMSTeamsQuoteInfo is called on every inbound message before context finalization, a malformed or untrusted HTML attachment can make message handling fail for that turn instead of gracefully treating the entity as plain text.
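A minimal sketch of the guard. It validates the parsed value before calling `String.fromCodePoint`, which throws a `RangeError` for anything that is not an integer in 0 to 0x10FFFF, and keeps invalid entities as literal text. It also merges the two chained replacements into one regex. With the diff's hex-then-decimal chain, `&#x26;#65;` first becomes `&#65;` and then `A`. `codePointOrLiteral` and `decodeNumericEntities` are hypothetical names.

```typescript
// Highest valid Unicode code point; String.fromCodePoint throws a RangeError above it.
const MAX_CODE_POINT = 0x10ffff;

// Return the character for `code`, or the original entity text if `code` is not a valid code point.
function codePointOrLiteral(code: number, literal: string): string {
  return Number.isInteger(code) && code >= 0 && code <= MAX_CODE_POINT
    ? String.fromCodePoint(code)
    : literal;
}

// Decode hex (&#x2019;) and decimal (&#8217;) entities in one pass, so a decoded
// character is never fed back into a second replace.
function decodeNumericEntities(text: string): string {
  return text.replace(
    /&#(?:x([0-9a-f]+)|([0-9]+));/gi,
    (match: string, hex: string | undefined, dec: string | undefined) => {
      const code = hex !== undefined ? parseInt(hex, 16) : Number(dec);
      return codePointOrLiteral(code, match);
    },
  );
}
```

A malformed attachment then degrades to visible entity text for that character instead of failing the whole turn.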


steipete · 2026-04-25T07:29:22Z

Closing this as implemented after Codex review.

Current main already extracts structured Microsoft Teams reply/quote data from HTML attachments, stores it as inbound reply context, and exposes that context to the agent through the shared inbound metadata prompt. The requested capability is present and was already shipped in v2026.4.22.

What I checked:

Quote extraction exists on main: extensions/msteams/src/inbound.ts:36 defines extractMSTeamsQuoteInfo(...) and parses Teams http://schema.skype.com/Reply HTML attachments into structured sender/body data. (extensions/msteams/src/inbound.ts:36, 38caa6832d4e)
Teams inbound handler wires extracted reply context into the payload: extensions/msteams/src/monitor-handler/message-handler.ts:198 calls extractMSTeamsQuoteInfo(attachments), and extensions/msteams/src/monitor-handler/message-handler.ts:786-:788 populate ReplyToBody, ReplyToSender, and ReplyToIsQuote on the finalized inbound context. (extensions/msteams/src/monitor-handler/message-handler.ts:198, 38caa6832d4e)
Agent prompt already consumes reply context generically: src/auto-reply/reply/inbound-meta.ts:269-:277 adds a structured Replied message (untrusted, for context) block using ReplyToSender, ReplyToBody, and ReplyToIsQuote, so the agent receives separable quote metadata even without this PR's exact BodyForAgent formatting. (src/auto-reply/reply/inbound-meta.ts:269, 38caa6832d4e)
Tests cover the Teams quote extraction path: extensions/msteams/src/inbound.test.ts:103-:219 exercises extractMSTeamsQuoteInfo across reply attachments, missing sender/body cases, entity decoding, object payloads, and attachment scanning. (extensions/msteams/src/inbound.test.ts:103, 38caa6832d4e)
Feature was already in the latest release: The release commit for tag v2026.4.22 (00bd2cf7a376f1fba26291c6c4766f1f15cbdfa5) already contains the same Teams quote extraction and message-handler wiring in extensions/msteams/src/inbound.ts and extensions/msteams/src/monitor-handler/message-handler.ts. (00bd2cf7a376)

So I’m closing this as already implemented rather than keeping a duplicate issue open.

Review notes: reviewed against 38caa6832d4e; fix evidence: release v2026.4.22, commit 00bd2cf7a376.

openclaw-barnacle Bot added the channel: msteams and size: M labels Mar 13, 2026

greptile-apps Bot reviewed Mar 13, 2026


fix: accept null in contentType to match MSTeamsAttachmentLike type

af875c9

chatgpt-codex-connector Bot reviewed Mar 13, 2026


BradGroux mentioned this pull request Mar 17, 2026

WORKING: All Microsoft Issues and PRs #49126

Closed

sudie-codes mentioned this pull request Mar 21, 2026

msteams: extract structured quote/reply context #51647

Merged

5 tasks

steipete closed this Apr 25, 2026

robinhanksliu wants to merge 3 commits into openclaw:main from robinhanksliu:feat/msteams-quote-context
