fix: improve email encoding and HTML rendering by riderx · Pull Request #1435 · Cap-go/capgo

riderx · 2026-01-14T20:57:43Z

Summary (AI generated)

Fixed email encoding issues in the Cloudflare Worker email-to-Discord system:

Base64 decoding now properly handles UTF-8 characters (was using Latin-1)
Quoted-printable decoding correctly handles multi-byte UTF-8 sequences
HTML-to-text conversion preserves structure with proper block element handling and entity decoding

Test plan (AI generated)

Send emails with non-ASCII characters (accents, emoji, special symbols) and verify they display correctly in Discord
Test HTML emails to ensure proper formatting with preserved paragraphs, lists, and tables
Verify HTML entities (non-breaking spaces, em dashes, smart quotes) are decoded properly

Checklist (AI generated)

Code follows project style
Documentation update needed
E2E test coverage added
Manually tested encoding with sample emails

Generated with AI

Summary by CodeRabbit

Improvements
- Better HTML-to-plain-text conversion that preserves structure (lists, headings, code blocks) and normalizes whitespace for clearer message text.
- Robust fallback for HTML conversion to ensure safe plain-text output when conversion errors occur.
- Charset-aware decoding for base64 and quoted-printable content to reliably display international characters and symbols.
- Safer normalization and handling of embedded HTML for more consistent plain-text and embed rendering.


coderabbitai · 2026-01-14T20:57:57Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough


Adds Turndown-based HTML→Markdown conversion with a safe fallback in discord.ts; enhances charset-aware decoding for base64 and quoted-printable content using TextDecoder and byte-level decoding in email-parser.ts; adds turndown dependency. No public API changes.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| HTML-to-Markdown / Safe Stripping (`cloudflare_workers/email/discord.ts`) | Replaced simple HTML stripping with TurndownService-based HTML→Markdown conversion; added module-scoped Turndown initialization, improved newline/whitespace normalization, and a character-by-character `stripTagsSafely(html)` fallback used if Turndown fails. Attachment and Discord API behavior unchanged. |
| Charset-aware Decoding (`cloudflare_workers/email/email-parser.ts`) | Added Content-Type charset extraction; introduced `decodeBase64Utf8(base64, charset)`, `decodeQuotedPrintable(content, charset)`, and `decodeBytes(bytes, charset)` using `TextDecoder` with fallbacks; updated `decodeContent` to use charset-aware decoders. |
| Dependencies (`package.json`) | Added runtime dependency `turndown` and dev dependency `@types/turndown`. |
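The convert-then-fall-back flow described above can be sketched as follows, assuming the converter and the fallback are passed in as plain functions. In the PR the converter would be the module-scoped Turndown instance and the fallback `stripTagsSafely`; `htmlToDiscordText` and `normalizeWhitespace` are hypothetical names, and the normalization rules are illustrative rather than the PR's exact ones.

```typescript
type Converter = (html: string) => string

// Hypothetical helper: `convert` stands in for the Turndown call and
// `fallback` for a safe tag stripper such as stripTagsSafely
function htmlToDiscordText(html: string, convert: Converter, fallback: Converter): string {
  let text: string
  try {
    text = convert(html)
  }
  catch {
    // A conversion error must not abort message handling
    text = fallback(html)
  }
  return normalizeWhitespace(text)
}

// Illustrative normalization rules, not necessarily the PR's exact ones
function normalizeWhitespace(text: string): string {
  return text
    .replace(/\r\n?/g, '\n') // unify line endings
    .replace(/\u00A0/g, ' ') // non-breaking spaces become plain spaces
    .replace(/[ \t]+\n/g, '\n') // drop trailing spaces on each line
    .replace(/\n{3,}/g, '\n\n') // keep at most one blank line in a row
    .trim()
}
```

Injecting the converter keeps the fallback path testable without loading the Turndown dependency.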

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰
I hop through tags and turn them kind,
Turndown trims what once was lined,
Bytes convert from base64 dens,
Newlines hum with softer pens,
A happy rabbit bounds—refined!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The description covers main changes and testing approach, but is incomplete: lacks detailed implementation details, does not check all required checklist items, and missing critical E2E test coverage documentation despite non-trivial changes. | Complete the checklist by verifying linting, add E2E test coverage details for encoding/rendering tests, and update documentation if the new Turndown dependency affects public APIs or deployment. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main changes: improving email encoding (Base64/Quoted-printable UTF-8 handling) and HTML rendering (Turndown integration), which are the core modifications across all files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |



chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c3c8b7550d


chatgpt-codex-connector · 2026-01-14T21:00:27Z

cloudflare_workers/email/discord.ts

+  // Replace numeric entities (&#123; or &#x1F600;)
+  result = result
+    .replace(/&#(\d+);/g, (_, code) => String.fromCodePoint(Number.parseInt(code, 10)))
+    .replace(/&#x([0-9a-fA-F]+);/g, (_, code) => String.fromCodePoint(Number.parseInt(code, 16)))


Guard numeric entity decoding against invalid code points

The new numeric-entity decoding calls String.fromCodePoint for every `&#...;`/`&#x...;` without validating the range. If an email contains an out-of-range value (e.g., `&#x110000;`), String.fromCodePoint throws a RangeError. (Lone surrogates such as `&#xD800;` do not throw, but they yield an ill-formed string.) Because stripHtml doesn’t catch this, the exception will bubble up and abort message handling, dropping the Discord post. Before this change the entity would have remained as text, so this is a regression for malformed or hostile HTML. Consider range-checking and leaving the original entity on failure.
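One way to apply the suggested guard, sketched under the assumption that unrepresentable entities should stay as literal text. `decodeNumericEntities` is a hypothetical name; a single regex handles both the decimal and hex forms so decoded output is never re-scanned by a second `replace`.

```typescript
const MAX_CODE_POINT = 0x10FFFF

function decodeNumericEntities(text: string): string {
  // One pass for both forms, so decoded output is never scanned a second time
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|(\d+));/g, (match: string, hex?: string, dec?: string) => {
    const code = hex !== undefined ? Number.parseInt(hex, 16) : Number.parseInt(dec ?? '', 10)
    // String.fromCodePoint throws a RangeError above U+10FFFF, so keep the entity instead
    if (Number.isNaN(code) || code > MAX_CODE_POINT)
      return match
    // Lone surrogates do not throw but would produce an ill-formed string
    if (code >= 0xD800 && code <= 0xDFFF)
      return match
    return String.fromCodePoint(code)
  })
}
```

Keeping the original entity on failure preserves the pre-change behavior for malformed input instead of substituting U+FFFD.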



coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@cloudflare_workers/email/discord.ts`:
- Around line 282-286: The numeric-entity replacement callbacks can throw from
String.fromCodePoint on invalid code points; update both replace callbacks (the
ones calling String.fromCodePoint in the result.replace chain) to parse the
code, validate it is an integer within 0..0x10FFFF (and non-negative), and only
call String.fromCodePoint for valid values, otherwise return a safe fallback
(e.g., the original match or the Unicode replacement character U+FFFD) to avoid
throwing; apply the same check for decimal and hex branches.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between abc6ffd and c3c8b75.

📒 Files selected for processing (2)

cloudflare_workers/email/discord.ts
cloudflare_workers/email/email-parser.ts

🧰 Additional context used

📓 Path-based instructions (3)

`**/*.{vue,ts,tsx,js,jsx}`

📄 CodeRabbit inference engine (AGENTS.md)

Run bun lint to lint Vue, TypeScript, and JavaScript files; use bun lint:fix to auto-fix issues

Files:

cloudflare_workers/email/email-parser.ts
cloudflare_workers/email/discord.ts

`**/*.{ts,tsx,js,jsx}`

📄 CodeRabbit inference engine (AGENTS.md)

Use single quotes and no semicolons per @antfu/eslint-config rules

Files:

cloudflare_workers/email/email-parser.ts
cloudflare_workers/email/discord.ts

`**/*.{ts,tsx,js,jsx,vue}`

📄 CodeRabbit inference engine (AGENTS.md)

Run bun lint or lint/format command before validating any backend or frontend task to ensure consistent formatting

Files:

cloudflare_workers/email/email-parser.ts
cloudflare_workers/email/discord.ts

🪛 GitHub Check: CodeQL

cloudflare_workers/email/discord.ts

[failure] 198-221: Incomplete multi-character sanitization
This string may still contain <script, which may cause an HTML element injection vulnerability.

[failure] 198-203: Incomplete multi-character sanitization
This string may still contain <!--, which may cause an HTML element injection vulnerability.

[failure] 198-201: Incomplete multi-character sanitization
This string may still contain <script, which may cause an HTML element injection vulnerability.

[failure] 198-200: Incomplete multi-character sanitization
This string may still contain <style, which may cause an HTML element injection vulnerability.

[failure] 201-201: Bad HTML filtering regexp
This regular expression does not match script end tags like </script >.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Run tests

🔇 Additional comments (6)

cloudflare_workers/email/email-parser.ts (4)

143-160: LGTM! Good charset-aware content decoding.

The charset extraction from Content-Type header with UTF-8 fallback is correct. The regex handles both quoted and unquoted charset values properly.
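For illustration, a charset extractor along the lines the review describes, assuming the raw Content-Type header value is available as a string. `extractCharset` is a hypothetical name and the PR's actual regex may differ.

```typescript
// Hypothetical helper; falls back to UTF-8 when no charset parameter is present
function extractCharset(contentType: string | undefined): string {
  if (!contentType)
    return 'utf-8'
  // Accepts both charset="ISO-8859-1" and charset=UTF-8; parameter names are case-insensitive
  const match = contentType.match(/charset\s*=\s*(?:"([^"]+)"|([^\s;]+))/i)
  const charset = match?.[1] ?? match?.[2]
  return charset ? charset.trim().toLowerCase() : 'utf-8'
}
```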

166-185: LGTM! Correct UTF-8 base64 decoding.

The approach of converting the Latin-1 string from atob() to a byte array and then using TextDecoder with the specified charset is the correct way to handle multi-byte UTF-8 sequences in base64-encoded content.
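A sketch of the byte-array approach, assuming the runtime provides the standard `atob` and `TextDecoder` globals (both exist in Cloudflare Workers and recent Node.js). `decodeBase64` is a hypothetical name; `decodeBytes` mirrors the helper the review mentions, but its fallback to UTF-8 for unknown labels is an assumption.

```typescript
// Decode raw bytes with the declared charset, falling back to UTF-8
function decodeBytes(bytes: Uint8Array, charset: string): string {
  try {
    // TextDecoder throws a RangeError for labels it does not recognise
    return new TextDecoder(charset).decode(bytes)
  }
  catch {
    return new TextDecoder('utf-8').decode(bytes)
  }
}

// Hypothetical name; atob() yields one char per byte (code units 0-255),
// which is why using its result directly mis-decodes multi-byte UTF-8
function decodeBase64(base64: string, charset = 'utf-8'): string {
  const binary = atob(base64.replace(/\s+/g, ''))
  const bytes = Uint8Array.from(binary, ch => ch.charCodeAt(0))
  return decodeBytes(bytes, charset)
}
```

For example, `atob('w6k=')` returns the two characters `Ã©`, while `decodeBase64('w6k=')` returns `é`.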

192-227: LGTM! Proper quoted-printable decoding with UTF-8 support.

The byte accumulation approach correctly handles multi-byte UTF-8 sequences that appear as consecutive =XX patterns. The flush-before-plain-text logic ensures bytes are decoded as a unit before mixing with ASCII text.
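A sketch of the byte-accumulation technique, assuming soft line breaks are removed first and that unencoded characters are ASCII, as quoted-printable requires. The function name matches the one listed in the walkthrough, but the body is illustrative, not the PR's code.

```typescript
function decodeQuotedPrintable(content: string, charset = 'utf-8'): string {
  // Soft line breaks ("=" at end of line) join lines and may split a multi-byte sequence
  const input = content.replace(/=\r?\n/g, '')
  let decoder: TextDecoder
  try {
    decoder = new TextDecoder(charset)
  }
  catch {
    decoder = new TextDecoder('utf-8') // unknown label: assume UTF-8
  }

  let result = ''
  let pending: number[] = []
  // Decode accumulated =XX bytes as one unit so multi-byte sequences stay intact
  const flush = (): void => {
    if (pending.length > 0) {
      result += decoder.decode(new Uint8Array(pending))
      pending = []
    }
  }

  for (let i = 0; i < input.length; i++) {
    const hex = input.slice(i + 1, i + 3)
    if (input[i] === '=' && /^[0-9A-F]{2}$/i.test(hex)) {
      pending.push(Number.parseInt(hex, 16))
      i += 2 // skip the two hex digits
    }
    else {
      flush() // plain text: emit pending bytes first
      result += input[i]
    }
  }
  flush()
  return result
}
```

A stray `=` that is not followed by two hex digits is kept as-is, which is a lenient choice for malformed input.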

232-241: LGTM!

Simple and effective byte-to-string conversion with appropriate fallback.

cloudflare_workers/email/discord.ts (2)

241-280: LGTM! Comprehensive entity decoding.

The entity map covers common named entities including typographic characters, currency symbols, and mathematical operators. Using Unicode escape sequences for special quotes avoids source file encoding issues.
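A sketch of map-based named-entity decoding, using a small hypothetical subset of entities (the PR's map is larger) and Unicode escapes for the typographic characters. A `Map` is used instead of a plain object so that a name like `&constructor;` cannot resolve to an inherited `Object.prototype` property.

```typescript
// Hypothetical subset of named entities; Unicode escapes avoid source-encoding issues
const NAMED_ENTITIES = new Map<string, string>([
  ['amp', '&'],
  ['lt', '<'],
  ['gt', '>'],
  ['quot', '"'],
  ['apos', '\''],
  ['nbsp', '\u00A0'],
  ['ndash', '\u2013'],
  ['mdash', '\u2014'],
  ['lsquo', '\u2018'],
  ['rsquo', '\u2019'],
  ['ldquo', '\u201C'],
  ['rdquo', '\u201D'],
  ['hellip', '\u2026'],
  ['euro', '\u20AC'],
  ['times', '\u00D7'],
])

function decodeNamedEntities(text: string): string {
  // Single pass, so "&amp;lt;" becomes "&lt;" rather than "<"; unknown names stay as-is
  return text.replace(/&([a-zA-Z][a-zA-Z0-9]*);/g, (match: string, name: string) => NAMED_ENTITIES.get(name) ?? match)
}
```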

197-221: The catch-all pattern on line 221 (/<[^>]+>/g) already removes any remaining HTML tags, including edge cases like </script > with trailing spaces. Since the stripHtml() output is sent to Discord's API as plain text (not rendered HTML), there is no XSS vulnerability here. The suggested regex improvements are unnecessary.

If desired, add a JSDoc comment clarifying that this function converts HTML to plain text for display in Discord messages (where HTML is not rendered), rather than for security-critical sanitization.


- Fix base64 decoding to properly handle UTF-8 characters instead of Latin-1
- Fix quoted-printable decoding to correctly handle multi-byte UTF-8 sequences
- Improve HTML-to-text conversion to preserve structure and decode entities
- Handle block elements (p, div, h1-h6) with newlines
- Convert list items to bullet points
- Format tables readably
- Decode HTML entities including smart quotes, dashes, and special symbols
- Support numeric entities (`&#123;` and `&#x1F600;`)

This ensures emails display correctly in Discord without encoding artifacts or broken HTML formatting.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

- Add multi-pass loop for script/style/comment removal to handle nested tags
- Allow whitespace in closing tags (e.g., `</script >`)
- Handle malformed comments with extra dashes
- Run sanitization again after entity decoding to catch decoded tags
- Escape remaining angle brackets in final output

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

- Use `[^>]*` for closing tags to match variants like `</script\t\n bar>`
- Remove orphaned closing tags and unclosed comments
- Remove `<` and `>` entirely instead of replacing them with identical Unicode escapes
- Validate code points in numeric entity decoding to prevent RangeError

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>


…oach Use a non-regex approach to strip HTML tags to avoid security scanner warnings about incomplete multi-character sanitization. The new stripAllTags function processes HTML character by character, which is inherently safe against nested/malformed tag attacks. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
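A minimal sketch of character-by-character tag stripping in the spirit of this commit; it is a hypothetical illustration, not the PR's `stripTagsSafely`. Because neither `<` nor `>` is ever copied to the output, no tag can be reassembled from fragments such as `<scr<script>ipt>`, which is the property regex-based stripping struggles to guarantee.

```typescript
// Hypothetical illustration (not the PR's stripTagsSafely): walk the input one
// code point at a time and copy only characters that lie outside <...>
function stripTagsByChar(html: string): string {
  let result = ''
  let insideTag = false
  for (const ch of html) {
    if (ch === '<') {
      insideTag = true // start of a tag; '<' itself is never copied
    }
    else if (ch === '>') {
      insideTag = false // end of a tag; '>' itself is never copied
    }
    else if (!insideTag) {
      result += ch
    }
  }
  return result
}
```

The trade-off is that a literal `<` in text (e.g. `a < b`) swallows everything up to the next `>`, and the contents of `<script>` or `<style>` elements are kept as plain text.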

Replace custom HTML sanitization with turndown library for converting HTML emails to Markdown. This provides:

- Better Markdown output that Discord can render
- Proper handling of complex HTML structures
- Well-tested library vs custom regex patterns
- Eliminates security scanner warnings about HTML sanitization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>


Replace regex-based fallback with safe character-by-character approach to eliminate security scanner warning about incomplete sanitization. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

socket-security · 2026-01-14T23:14:13Z

Review the following changes in direct dependencies.

- npm/@types/turndown@5.0.6

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@package.json`:
- Line 200: The project depends on turndown@^7.2.2 while `@types/turndown`@^5.0.6
targets Turndown v5 causing a type mismatch; to fix, verify whether turndown v7
ships its own TypeScript declarations (inspect node_modules/turndown for
index.d.ts) and if it does remove the outdated `@types/turndown` entry from
package.json, otherwise keep runtime dependency but remove or replace the `@types`
package and add either a small local declaration file (e.g., declare module
'turndown' with the needed APIs) or use targeted type assertions/@ts-ignore
where you call newer v7 APIs; optionally set "skipLibCheck": true in tsconfig to
suppress mismatched lib errors and document this decision in package.json or the
repo README.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between afd4114 and ef68dd3.

⛔ Files ignored due to path filters (1)

bun.lock is excluded by !**/*.lock

📒 Files selected for processing (3)

cloudflare_workers/email/discord.ts
cloudflare_workers/email/email-parser.ts
package.json

🚧 Files skipped from review as they are similar to previous changes (2)

cloudflare_workers/email/discord.ts
cloudflare_workers/email/email-parser.ts

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Run tests


coderabbitai · 2026-01-14T23:16:14Z

package.json

    "semver": "^7.7.3",
    "stripe": "^19.3.1",
    "supabase": "^2.58.5",
+    "turndown": "^7.2.2",


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

    #!/bin/bash
    # Check latest versions and current info from npm registry
    echo "=== Checking turndown and @types/turndown on npm ==="
    npm view turndown@latest version 2>/dev/null || echo "turndown info unavailable"
    npm view @types/turndown@latest version 2>/dev/null || echo "@types/turndown info unavailable"
    # Get the full package data to understand versions better
    echo -e "\n=== turndown package info ==="
    npm view turndown@7.2.2 time 2>/dev/null | head -20 || echo "Unable to fetch turndown details"
    echo -e "\n=== @types/turndown package info ==="
    npm view @types/turndown@5.0.6 time 2>/dev/null | head -20 || echo "Unable to fetch @types/turndown details"

Repository: Cap-go/capgo

Length of output: 1303

🌐 Web query:

turndown npm latest version 7.2.2 security

💡 Result:

Short answer: turndown v7.2.2 fixes a Regular‑Expression DoS (ReDoS) vulnerability present in versions <= 7.2.1 — upgrade to >= 7.2.2. (security.snyk.io)

Citations:

1: https://security.snyk.io/package/npm/turndown?utm_source=openai

🌐 Web query:

@types/turndown latest version compatibility with turndown 7

💡 Result:

Short answer: No — the published @types/turndown typings target turndown 5.x (index.d.ts header) and the latest @types release is 5.x, so they are not officially updated for turndown 7. See the @types package (declares "Type definitions for turndown 5.0"). [1][2]

Details / links:

@types/turndown (shows index.d.ts header and latest 5.x release). [1]

turndown current releases (v7.x). [2]

Address type definition mismatch for turndown.

The turndown library at ^7.2.2 correctly includes a ReDoS security fix. However, @types/turndown at ^5.0.6 is the latest available version and targets turndown 5.0—no updated types for 7.x exist in the @types registry. This creates a type safety gap where newer APIs may lack proper type coverage. Consider:

Verifying that turndown's own TypeScript declarations (if available) provide sufficient coverage for your use cases

Using type assertions (as any) or @ts-ignore comments for APIs not covered by the 5.0 types if necessary

Monitoring for a community-maintained or official types update if relying heavily on TypeScript type safety
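If the local-declaration route were taken, a minimal file might look like the following. This is a sketch: the file path is hypothetical, it declares only the constructor options and the `turndown()` method assumed to be used, it would conflict with `@types/turndown` if both were installed, and consuming it with `import TurndownService from 'turndown'` assumes default-import interop (`esModuleInterop`) is enabled.

```typescript
// src/types/turndown.d.ts (hypothetical location); declares only the assumed subset
declare module 'turndown' {
  class TurndownService {
    constructor(options?: TurndownService.Options)
    // Converts an HTML string to Markdown
    turndown(html: string): string
  }

  namespace TurndownService {
    export interface Options {
      headingStyle?: 'setext' | 'atx'
      codeBlockStyle?: 'indented' | 'fenced'
      bulletListMarker?: '-' | '+' | '*'
    }
  }

  // turndown is published as CommonJS (module.exports = TurndownService)
  export = TurndownService
}
```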


sonarqubecloud · 2026-01-14T23:19:19Z

Quality Gate passed

Issues
7 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code


github-advanced-security bot found potential problems Jan 14, 2026

chatgpt-codex-connector bot reviewed Jan 14, 2026

github-advanced-security bot found potential problems Jan 14, 2026

coderabbitai bot reviewed Jan 14, 2026

cloudflare_workers/email/discord.ts (outdated, resolved)

riderx and others added 2 commits January 14, 2026 21:50

riderx force-pushed the riderx/nashville branch from 5a4640f to afd4114 January 14, 2026 21:50

github-advanced-security bot found potential problems Jan 14, 2026

riderx and others added 3 commits January 14, 2026 23:22

Merge branch 'main' into riderx/nashville

97c7fa9

github-advanced-security bot found potential problems Jan 14, 2026

cloudflare_workers/email/discord.ts (fixed)

riderx and others added 2 commits January 14, 2026 23:29

Merge branch 'main' into riderx/nashville

0180595

fix: use character-by-character fallback for HTML stripping

ef68dd3


coderabbitai bot reviewed Jan 14, 2026


riderx merged commit b79881d into main Jan 14, 2026
11 checks passed

riderx deleted the riderx/nashville branch January 14, 2026 23:23

This was referenced Jan 15, 2026

fix: clean email formatting for Discord messages #1441

Merged

feat: add security email filter to email worker #1446

Merged
