Skip to content

fix: improve email encoding and HTML rendering#1435

Merged
riderx merged 8 commits intomainfrom
riderx/nashville
Jan 14, 2026
Merged

fix: improve email encoding and HTML rendering#1435
riderx merged 8 commits intomainfrom
riderx/nashville

Conversation

@riderx
Copy link
Copy Markdown
Member

@riderx riderx commented Jan 14, 2026

Summary (AI generated)

Fixed email encoding issues in the Cloudflare Worker email-to-Discord system:

  • Base64 decoding now properly handles UTF-8 characters (was using Latin-1)
  • Quoted-printable decoding correctly handles multi-byte UTF-8 sequences
  • HTML-to-text conversion preserves structure with proper block element handling and entity decoding

Test plan (AI generated)

  • Send emails with non-ASCII characters (accents, emoji, special symbols) and verify they display correctly in Discord
  • Test HTML emails to ensure proper formatting with preserved paragraphs, lists, and tables
  • Verify HTML entities ( , —, smart quotes) are decoded properly

Checklist (AI generated)

  • Code follows project style
  • Documentation update needed
  • E2E test coverage added
  • Manually tested encoding with sample emails

Generated with AI

Summary by CodeRabbit

  • Improvements
    • Better HTML-to-plain-text conversion that preserves structure (lists, headings, code blocks) and normalizes whitespace for clearer message text.
    • Robust fallback for HTML conversion to ensure safe plain-text output when conversion errors occur.
    • Charset-aware decoding for base64 and quoted-printable content to reliably display international characters and symbols.
    • Safer normalization and handling of embedded HTML for more consistent plain-text and embed rendering.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jan 14, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Adds Turndown-based HTML→Markdown conversion with a safe fallback in discord.ts; enhances charset-aware decoding for base64 and quoted-printable content using TextDecoder and byte-level decoding in email-parser.ts; adds turndown dependency. No public API changes.

Changes

Cohort / File(s) Summary
HTML-to-Markdown / Safe Stripping
cloudflare_workers/email/discord.ts
Replaced simple HTML stripping with TurndownService-based HTML→Markdown conversion; added module-scoped Turndown initialization, improved newline/whitespace normalization, and a character-by-character stripTagsSafely(html) fallback used if Turndown fails. Attachment and Discord API behavior unchanged.
Charset-aware Decoding
cloudflare_workers/email/email-parser.ts
Added Content-Type charset extraction; introduced decodeBase64Utf8(base64, charset), decodeQuotedPrintable(content, charset), and decodeBytes(bytes, charset) using TextDecoder with fallbacks; updated decodeContent to use charset-aware decoders.
Dependencies
package.json
Added runtime dependency turndown and dev dependency @types/turndown.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰
I hop through tags and turn them kind,
Turndown trims what once waslined,
Bytes convert from base64 dens,
Newlines hum with softer pens,
A happy rabbit bounds—refined!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description check ⚠️ Warning The description covers main changes and testing approach, but is incomplete: lacks detailed implementation details, does not check all required checklist items, and missing critical E2E test coverage documentation despite non-trivial changes. Complete the checklist by verifying linting, add E2E test coverage details for encoding/rendering tests, and update documentation if the new Turndown dependency affects public APIs or deployment.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main changes: improving email encoding (Base64/Quoted-printable UTF-8 handling) and HTML rendering (Turndown integration), which are the core modifications across all files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c3c8b7550d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +282 to +285
// Replace numeric entities ({ or 😀)
result = result
.replace(/&#(\d+);/g, (_, code) => String.fromCodePoint(Number.parseInt(code, 10)))
.replace(/&#x([0-9a-fA-F]+);/g, (_, code) => String.fromCodePoint(Number.parseInt(code, 16)))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard numeric entity decoding against invalid code points

The new numeric-entity decoding calls String.fromCodePoint for every &#...;/&#x...; without validating the range. If an email contains an out-of-range value (e.g., � or a surrogate), String.fromCodePoint throws a RangeError. Because stripHtml doesn’t catch this, the exception will bubble and abort message handling, dropping the Discord post. Before this change the entity would have remained as text, so this is a regression for malformed or hostile HTML. Consider range-checking and leaving the original entity on failure.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@cloudflare_workers/email/discord.ts`:
- Around line 282-286: The numeric-entity replacement callbacks can throw from
String.fromCodePoint on invalid code points; update both replace callbacks (the
ones calling String.fromCodePoint in the result.replace chain) to parse the
code, validate it is an integer within 0..0x10FFFF (and non-negative), and only
call String.fromCodePoint for valid values, otherwise return a safe fallback
(e.g., the original match or the Unicode replacement character U+FFFD) to avoid
throwing; apply the same check for decimal and hex branches.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between abc6ffd and c3c8b75.

📒 Files selected for processing (2)
  • cloudflare_workers/email/discord.ts
  • cloudflare_workers/email/email-parser.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{vue,ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Run bun lint to lint Vue, TypeScript, and JavaScript files; use bun lint:fix to auto-fix issues

Files:

  • cloudflare_workers/email/email-parser.ts
  • cloudflare_workers/email/discord.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use single quotes and no semicolons per @antfu/eslint-config rules

Files:

  • cloudflare_workers/email/email-parser.ts
  • cloudflare_workers/email/discord.ts
**/*.{ts,tsx,js,jsx,vue}

📄 CodeRabbit inference engine (AGENTS.md)

Run bun lint or lint/format command before validating any backend or frontend task to ensure consistent formatting

Files:

  • cloudflare_workers/email/email-parser.ts
  • cloudflare_workers/email/discord.ts
🪛 GitHub Check: CodeQL
cloudflare_workers/email/discord.ts

[failure] 198-221: Incomplete multi-character sanitization
This string may still contain <script, which may cause an HTML element injection vulnerability.


[failure] 198-203: Incomplete multi-character sanitization
This string may still contain <!--, which may cause an HTML element injection vulnerability.


[failure] 198-201: Incomplete multi-character sanitization
This string may still contain <script, which may cause an HTML element injection vulnerability.


[failure] 198-200: Incomplete multi-character sanitization
This string may still contain <style, which may cause an HTML element injection vulnerability.


[failure] 201-201: Bad HTML filtering regexp
This regular expression does not match script end tags like </script >.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run tests
🔇 Additional comments (6)
cloudflare_workers/email/email-parser.ts (4)

143-160: LGTM! Good charset-aware content decoding.

The charset extraction from Content-Type header with UTF-8 fallback is correct. The regex handles both quoted and unquoted charset values properly.


166-185: LGTM! Correct UTF-8 base64 decoding.

The approach of converting the Latin-1 string from atob() to a byte array and then using TextDecoder with the specified charset is the correct way to handle multi-byte UTF-8 sequences in base64-encoded content.


192-227: LGTM! Proper quoted-printable decoding with UTF-8 support.

The byte accumulation approach correctly handles multi-byte UTF-8 sequences that appear as consecutive =XX patterns. The flush-before-plain-text logic ensures bytes are decoded as a unit before mixing with ASCII text.


232-241: LGTM!

Simple and effective byte-to-string conversion with appropriate fallback.

cloudflare_workers/email/discord.ts (2)

241-280: LGTM! Comprehensive entity decoding.

The entity map covers common named entities including typographic characters, currency symbols, and mathematical operators. Using Unicode escape sequences for special quotes avoids source file encoding issues.


197-221: The catch-all pattern on line 221 (/<[^>]+>/g) already removes any remaining HTML tags, including edge cases like </script > with trailing spaces. Since the stripHtml() output is sent to Discord's API as plain text (not rendered HTML), there is no XSS vulnerability here. The suggested regex improvements are unnecessary.

If desired, add a JSDoc comment clarifying that this function converts HTML to plain text for display in Discord messages (where HTML is not rendered), rather than for security-critical sanitization.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

riderx and others added 2 commits January 14, 2026 21:50
- Fix base64 decoding to properly handle UTF-8 characters instead of Latin-1
- Fix quoted-printable decoding to correctly handle multi-byte UTF-8 sequences
- Improve HTML-to-text conversion to preserve structure and decode entities
  - Handle block elements (p, div, h1-h6) with newlines
  - Convert list items to bullet points
  - Format tables readably
  - Decode HTML entities including smart quotes, dashes, and special symbols
  - Support numeric entities (&#123; and &#x1F600;)

This ensures emails display correctly in Discord without encoding artifacts
or broken HTML formatting.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Add multi-pass loop for script/style/comment removal to handle nested tags
- Allow whitespace in closing tags (e.g., </script >)
- Handle malformed comments with extra dashes
- Run sanitization again after entity decoding to catch decoded tags
- Escape remaining angle brackets in final output

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Use [^>]* for closing tags to match variants like </script\t\n bar>
- Remove orphaned closing tags and unclosed comments
- Remove < and > entirely instead of replacing with identical unicode escapes
- Validate code points in numeric entity decoding to prevent RangeError

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
riderx and others added 3 commits January 14, 2026 23:22
…oach

Use a non-regex approach to strip HTML tags to avoid security scanner
warnings about incomplete multi-character sanitization. The new
stripAllTags function processes HTML character by character, which is
inherently safe against nested/malformed tag attacks.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace custom HTML sanitization with turndown library for converting
HTML emails to Markdown. This provides:
- Better Markdown output that Discord can render
- Proper handling of complex HTML structures
- Well-tested library vs custom regex patterns
- Eliminates security scanner warnings about HTML sanitization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
riderx and others added 2 commits January 14, 2026 23:29
Replace regex-based fallback with safe character-by-character approach
to eliminate security scanner warning about incomplete sanitization.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@socket-security
Copy link
Copy Markdown

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Addednpm/​@​types/​turndown@​5.0.61001009780100

View full report

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@package.json`:
- Line 200: The project depends on turndown@^7.2.2 while `@types/turndown`@^5.0.6
targets Turndown v5 causing a type mismatch; to fix, verify whether turndown v7
ships its own TypeScript declarations (inspect node_modules/turndown for
index.d.ts) and if it does remove the outdated `@types/turndown` entry from
package.json, otherwise keep runtime dependency but remove or replace the `@types`
package and add either a small local declaration file (e.g., declare module
'turndown' with the needed APIs) or use targeted type assertions/@ts-ignore
where you call newer v7 APIs; optionally set "skipLibCheck": true in tsconfig to
suppress mismatched lib errors and document this decision in package.json or the
repo README.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between afd4114 and ef68dd3.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • cloudflare_workers/email/discord.ts
  • cloudflare_workers/email/email-parser.ts
  • package.json
🚧 Files skipped from review as they are similar to previous changes (2)
  • cloudflare_workers/email/discord.ts
  • cloudflare_workers/email/email-parser.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run tests

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

"semver": "^7.7.3",
"stripe": "^19.3.1",
"supabase": "^2.58.5",
"turndown": "^7.2.2",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Check latest versions and current info from npm registry
echo "=== Checking turndown and `@types/turndown` on npm ==="
npm view turndown@latest version 2>/dev/null || echo "turndown info unavailable"
npm view `@types/turndown`@latest version 2>/dev/null || echo "@types/turndown info unavailable"

# Get the full package data to understand versions better
echo -e "\n=== turndown package info ==="
npm view turndown@7.2.2 time 2>/dev/null | head -20 || echo "Unable to fetch turndown details"

echo -e "\n=== `@types/turndown` package info ==="
npm view `@types/turndown`@5.0.6 time 2>/dev/null | head -20 || echo "Unable to fetch `@types/turndown` details"

Repository: Cap-go/capgo

Length of output: 1303


🌐 Web query:

turndown npm latest version 7.2.2 security

💡 Result:

Short answer: turndown v7.2.2 fixes a Regular‑Expression DoS (ReDoS) vulnerability present in versions <= 7.2.1 — upgrade to >= 7.2.2. (security.snyk.io)

Citations:


🌐 Web query:

@types/turndown latest version compatibility with turndown 7

💡 Result:

Short answer: No — the published @types/turndown typings target turndown 5.x (index.d.ts header) and the latest @types release is 5.x, so they are not officially updated for turndown 7. See the @types package (declares "Type definitions for turndown 5.0"). [1][2]

Details / links:

  • @types/turndown (shows index.d.ts header and latest 5.x release). [1]
  • turndown current releases (v7.x). [2]

Address type definition mismatch for turndown.

The turndown library at ^7.2.2 correctly includes a ReDoS security fix. However, @types/turndown at ^5.0.6 is the latest available version and targets turndown 5.0—no updated types for 7.x exist in the @types registry. This creates a type safety gap where newer APIs may lack proper type coverage. Consider:

  • Verifying that turndown's own TypeScript declarations (if available) provide sufficient coverage for your use cases
  • Using type assertions (as any) or @ts-ignore comments for APIs not covered by the 5.0 types if necessary
  • Monitoring for a community-maintained or official types update if relying heavily on TypeScript type safety
🤖 Prompt for AI Agents
In `@package.json` at line 200, The project depends on turndown@^7.2.2 while
`@types/turndown`@^5.0.6 targets Turndown v5 causing a type mismatch; to fix,
verify whether turndown v7 ships its own TypeScript declarations (inspect
node_modules/turndown for index.d.ts) and if it does remove the outdated
`@types/turndown` entry from package.json, otherwise keep runtime dependency but
remove or replace the `@types` package and add either a small local declaration
file (e.g., declare module 'turndown' with the needed APIs) or use targeted type
assertions/@ts-ignore where you call newer v7 APIs; optionally set
"skipLibCheck": true in tsconfig to suppress mismatched lib errors and document
this decision in package.json or the repo README.

@sonarqubecloud
Copy link
Copy Markdown

@riderx riderx merged commit b79881d into main Jan 14, 2026
11 checks passed
@riderx riderx deleted the riderx/nashville branch January 14, 2026 23:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants