fix: improve email encoding and HTML rendering#1435
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughAdds Turndown-based HTML→Markdown conversion with a safe fallback in Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c3c8b7550d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
cloudflare_workers/email/discord.ts
Outdated
| // Replace numeric entities ({ or 😀) | ||
| result = result | ||
| .replace(/&#(\d+);/g, (_, code) => String.fromCodePoint(Number.parseInt(code, 10))) | ||
| .replace(/&#x([0-9a-fA-F]+);/g, (_, code) => String.fromCodePoint(Number.parseInt(code, 16))) |
There was a problem hiding this comment.
Guard numeric entity decoding against invalid code points
The new numeric-entity decoding calls String.fromCodePoint for every &#...;/&#x...; without validating the range. If an email contains an out-of-range value (e.g., � or a surrogate), String.fromCodePoint throws a RangeError. Because stripHtml doesn’t catch this, the exception will bubble and abort message handling, dropping the Discord post. Before this change the entity would have remained as text, so this is a regression for malformed or hostile HTML. Consider range-checking and leaving the original entity on failure.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@cloudflare_workers/email/discord.ts`:
- Around line 282-286: The numeric-entity replacement callbacks can throw from
String.fromCodePoint on invalid code points; update both replace callbacks (the
ones calling String.fromCodePoint in the result.replace chain) to parse the
code, validate it is an integer within 0..0x10FFFF (and non-negative), and only
call String.fromCodePoint for valid values, otherwise return a safe fallback
(e.g., the original match or the Unicode replacement character U+FFFD) to avoid
throwing; apply the same check for decimal and hex branches.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
cloudflare_workers/email/discord.tscloudflare_workers/email/email-parser.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{vue,ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
Run
bun lintto lint Vue, TypeScript, and JavaScript files; usebun lint:fixto auto-fix issues
Files:
cloudflare_workers/email/email-parser.tscloudflare_workers/email/discord.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
Use single quotes and no semicolons per
@antfu/eslint-configrules
Files:
cloudflare_workers/email/email-parser.tscloudflare_workers/email/discord.ts
**/*.{ts,tsx,js,jsx,vue}
📄 CodeRabbit inference engine (AGENTS.md)
Run
bun lintor lint/format command before validating any backend or frontend task to ensure consistent formatting
Files:
cloudflare_workers/email/email-parser.tscloudflare_workers/email/discord.ts
🪛 GitHub Check: CodeQL
cloudflare_workers/email/discord.ts
[failure] 198-221: Incomplete multi-character sanitization
This string may still contain <script, which may cause an HTML element injection vulnerability.
[failure] 198-203: Incomplete multi-character sanitization
This string may still contain <!--, which may cause an HTML element injection vulnerability.
[failure] 198-201: Incomplete multi-character sanitization
This string may still contain <script, which may cause an HTML element injection vulnerability.
[failure] 198-200: Incomplete multi-character sanitization
This string may still contain <style, which may cause an HTML element injection vulnerability.
[failure] 201-201: Bad HTML filtering regexp
This regular expression does not match script end tags like </script >.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run tests
🔇 Additional comments (6)
cloudflare_workers/email/email-parser.ts (4)
143-160: LGTM! Good charset-aware content decoding.The charset extraction from Content-Type header with UTF-8 fallback is correct. The regex handles both quoted and unquoted charset values properly.
166-185: LGTM! Correct UTF-8 base64 decoding.The approach of converting the Latin-1 string from
atob()to a byte array and then usingTextDecoderwith the specified charset is the correct way to handle multi-byte UTF-8 sequences in base64-encoded content.
192-227: LGTM! Proper quoted-printable decoding with UTF-8 support.The byte accumulation approach correctly handles multi-byte UTF-8 sequences that appear as consecutive
=XXpatterns. The flush-before-plain-text logic ensures bytes are decoded as a unit before mixing with ASCII text.
232-241: LGTM!Simple and effective byte-to-string conversion with appropriate fallback.
cloudflare_workers/email/discord.ts (2)
241-280: LGTM! Comprehensive entity decoding.The entity map covers common named entities including typographic characters, currency symbols, and mathematical operators. Using Unicode escape sequences for special quotes avoids source file encoding issues.
197-221: The catch-all pattern on line 221 (/<[^>]+>/g) already removes any remaining HTML tags, including edge cases like</script >with trailing spaces. Since thestripHtml()output is sent to Discord's API as plain text (not rendered HTML), there is no XSS vulnerability here. The suggested regex improvements are unnecessary.If desired, add a JSDoc comment clarifying that this function converts HTML to plain text for display in Discord messages (where HTML is not rendered), rather than for security-critical sanitization.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
- Fix base64 decoding to properly handle UTF-8 characters instead of Latin-1 - Fix quoted-printable decoding to correctly handle multi-byte UTF-8 sequences - Improve HTML-to-text conversion to preserve structure and decode entities - Handle block elements (p, div, h1-h6) with newlines - Convert list items to bullet points - Format tables readably - Decode HTML entities including smart quotes, dashes, and special symbols - Support numeric entities ({ and 😀) This ensures emails display correctly in Discord without encoding artifacts or broken HTML formatting. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Add multi-pass loop for script/style/comment removal to handle nested tags - Allow whitespace in closing tags (e.g., </script >) - Handle malformed comments with extra dashes - Run sanitization again after entity decoding to catch decoded tags - Escape remaining angle brackets in final output Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
5a4640f to
afd4114
Compare
- Use [^>]* for closing tags to match variants like </script\t\n bar> - Remove orphaned closing tags and unclosed comments - Remove < and > entirely instead of replacing with identical unicode escapes - Validate code points in numeric entity decoding to prevent RangeError Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…oach Use a non-regex approach to strip HTML tags to avoid security scanner warnings about incomplete multi-character sanitization. The new stripAllTags function processes HTML character by character, which is inherently safe against nested/malformed tag attacks. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace custom HTML sanitization with turndown library for converting HTML emails to Markdown. This provides: - Better Markdown output that Discord can render - Proper handling of complex HTML structures - Well-tested library vs custom regex patterns - Eliminates security scanner warnings about HTML sanitization Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace regex-based fallback with safe character-by-character approach to eliminate security scanner warning about incomplete sanitization. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@package.json`:
- Line 200: The project depends on turndown@^7.2.2 while `@types/turndown`@^5.0.6
targets Turndown v5 causing a type mismatch; to fix, verify whether turndown v7
ships its own TypeScript declarations (inspect node_modules/turndown for
index.d.ts) and if it does remove the outdated `@types/turndown` entry from
package.json, otherwise keep runtime dependency but remove or replace the `@types`
package and add either a small local declaration file (e.g., declare module
'turndown' with the needed APIs) or use targeted type assertions/@ts-ignore
where you call newer v7 APIs; optionally set "skipLibCheck": true in tsconfig to
suppress mismatched lib errors and document this decision in package.json or the
repo README.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
bun.lockis excluded by!**/*.lock
📒 Files selected for processing (3)
cloudflare_workers/email/discord.tscloudflare_workers/email/email-parser.tspackage.json
🚧 Files skipped from review as they are similar to previous changes (2)
- cloudflare_workers/email/discord.ts
- cloudflare_workers/email/email-parser.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run tests
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| "semver": "^7.7.3", | ||
| "stripe": "^19.3.1", | ||
| "supabase": "^2.58.5", | ||
| "turndown": "^7.2.2", |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check latest versions and current info from npm registry
echo "=== Checking turndown and `@types/turndown` on npm ==="
npm view turndown@latest version 2>/dev/null || echo "turndown info unavailable"
npm view `@types/turndown`@latest version 2>/dev/null || echo "@types/turndown info unavailable"
# Get the full package data to understand versions better
echo -e "\n=== turndown package info ==="
npm view turndown@7.2.2 time 2>/dev/null | head -20 || echo "Unable to fetch turndown details"
echo -e "\n=== `@types/turndown` package info ==="
npm view `@types/turndown`@5.0.6 time 2>/dev/null | head -20 || echo "Unable to fetch `@types/turndown` details"Repository: Cap-go/capgo
Length of output: 1303
🌐 Web query:
turndown npm latest version 7.2.2 security
💡 Result:
Short answer: turndown v7.2.2 fixes a Regular‑Expression DoS (ReDoS) vulnerability present in versions <= 7.2.1 — upgrade to >= 7.2.2. (security.snyk.io)
Citations:
🌐 Web query:
@types/turndown latest version compatibility with turndown 7
💡 Result:
Short answer: No — the published @types/turndown typings target turndown 5.x (index.d.ts header) and the latest @types release is 5.x, so they are not officially updated for turndown 7. See the @types package (declares "Type definitions for turndown 5.0"). [1][2]
Details / links:
@types/turndown(shows index.d.ts header and latest 5.x release). [1]- turndown current releases (v7.x). [2]
Address type definition mismatch for turndown.
The turndown library at ^7.2.2 correctly includes a ReDoS security fix. However, @types/turndown at ^5.0.6 is the latest available version and targets turndown 5.0—no updated types for 7.x exist in the @types registry. This creates a type safety gap where newer APIs may lack proper type coverage. Consider:
- Verifying that turndown's own TypeScript declarations (if available) provide sufficient coverage for your use cases
- Using type assertions (
as any) or@ts-ignorecomments for APIs not covered by the 5.0 types if necessary - Monitoring for a community-maintained or official types update if relying heavily on TypeScript type safety
🤖 Prompt for AI Agents
In `@package.json` at line 200, The project depends on turndown@^7.2.2 while
`@types/turndown`@^5.0.6 targets Turndown v5 causing a type mismatch; to fix,
verify whether turndown v7 ships its own TypeScript declarations (inspect
node_modules/turndown for index.d.ts) and if it does remove the outdated
`@types/turndown` entry from package.json, otherwise keep runtime dependency but
remove or replace the `@types` package and add either a small local declaration
file (e.g., declare module 'turndown' with the needed APIs) or use targeted type
assertions/@ts-ignore where you call newer v7 APIs; optionally set
"skipLibCheck": true in tsconfig to suppress mismatched lib errors and document
this decision in package.json or the repo README.
|



Summary (AI generated)
Fixed email encoding issues in the Cloudflare Worker email-to-Discord system:
Test plan (AI generated)
,—, smart quotes) are decoded properlyChecklist (AI generated)
Generated with AI
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.