refactor(core): replace @rspress/mdx-rs and html-to-text with @mdx-js/mdx createProcessor for toc and searchIndex generation#3001

Merged
SoonIter merged 16 commits into main from syt-vibe-kanban/e1d3-rspress-mdx-rs-h
Jan 14, 2026

@SoonIter SoonIter commented Jan 13, 2026

Summary

refactor(core): replace @rspress/mdx-rs and html-to-text with @mdx-js/mdx createProcessor for toc and searchIndex generation

Related Issue

close #2709

Checklist

  • Tests updated (or not required).
  • Documentation updated (or not required).

AI Summary


What Changed

This PR simplifies the extractPageData function by replacing two dependencies with the existing @mdx-js/mdx infrastructure:

Removed Dependencies:

  • @rspress/mdx-rs - Rust-based MDX compiler
  • html-to-text - HTML to plain text converter
  • @types/html-to-text - TypeScript types

New Approach:

  • Use createProcessor from @mdx-js/mdx (already a dependency)
  • Reuse the existing remarkToc plugin to extract title and TOC
  • Use the processed markdown content directly for search indexing instead of converting HTML to text

Why

  1. Reduced dependencies: Removes the native Rust dependency (@rspress/mdx-rs) which simplifies the build process and reduces package size
  2. Code reuse: Leverages the existing remarkToc plugin already used elsewhere in the codebase
  3. Simplified logic: The previous flow was: MDX → HTML → Plain Text. Now it's simply: MDX → extract metadata + use markdown directly
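
The simplified flow can be illustrated with a small, dependency-free sketch. This is not the `remarkToc` plugin or the PR's `extractPageData` code: it is a line-based approximation, and the names `extractPageDataSketch`, `TocEntry`, and `PageDataSketch`, as well as the choice to collect only h2 to h4 headings into the TOC, are assumptions made for illustration.

```typescript
// Hypothetical shapes for this sketch only; not the types used in rspress.
interface TocEntry {
  text: string;
  depth: number;
}

interface PageDataSketch {
  title: string;
  toc: TocEntry[];
  content: string;
}

// Line-based approximation of "MDX -> extract metadata + use markdown directly".
function extractPageDataSketch(source: string): PageDataSketch {
  const toc: TocEntry[] = [];
  const kept: string[] = [];
  let title = '';
  let inFence = false;

  for (const line of source.split('\n')) {
    // Toggle fence state so that "# ..." lines inside code blocks are ignored.
    if (/^\s*(```|~~~)/.test(line)) {
      inFence = !inFence;
      kept.push(line);
      continue;
    }
    if (!inFence) {
      // Drop single-line ESM imports; they are not useful search content.
      if (/^import\s.+from\s+['"].+['"];?\s*$/.test(line)) continue;
      const match = /^(#{1,6})\s+(.+?)\s*$/.exec(line);
      if (match) {
        const [, hashes = '', text = ''] = match;
        const depth = hashes.length;
        if (depth === 1 && title === '') title = text;
        else if (depth >= 2 && depth <= 4) toc.push({ text, depth });
      }
    }
    kept.push(line);
  }

  // The markdown itself (minus imports) becomes the search content.
  return { title, toc, content: kept.join('\n').trim() };
}
```

A real remark-based pipeline works on the parsed syntax tree rather than on raw lines, so it handles cases this sketch does not, such as multi-line imports and JSX.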

Implementation Details

  • Created createMdxProcessor() factory function that initializes a processor with remarkGFM and remarkToc plugins
  • Each file gets a fresh processor instance to avoid "frozen processor" issues
  • Removed the _html field from PageIndexInfo type (was only used internally)
  • Updated plugin-rss to use page.content instead of page._html
  • Updated parseToc function to accept both MdastRoot and HastRoot types
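
The "fresh processor per file" point can be sketched with a toy stand-in. `ToyProcessor` below is a hypothetical class written for this example; it only mimics the freeze-on-first-run behavior of unified-style processors, where configuration calls are rejected once the processor has been used. It is not the `@mdx-js/mdx` API.

```typescript
// Hypothetical metadata shape for this sketch.
interface ToyPageMeta {
  title: string;
  toc: string[];
}

// Toy stand-in that freezes itself the first time it processes a file.
class ToyProcessor {
  private frozen = false;
  private store = new Map<string, unknown>();

  // Configuration is only allowed before the first run.
  data(key: string, value: unknown): this {
    if (this.frozen) {
      throw new Error('Cannot call `data` on a frozen processor');
    }
    this.store.set(key, value);
    return this;
  }

  process(source: string): ToyPageMeta {
    this.frozen = true;
    const meta = (this.store.get('pageMeta') as ToyPageMeta | undefined) ?? {
      title: '',
      toc: [],
    };
    for (const line of source.split('\n')) {
      if (line.startsWith('# ')) meta.title = line.slice(2);
      else if (line.startsWith('## ')) meta.toc.push(line.slice(3));
    }
    return meta;
  }
}

// Factory: every file gets its own processor and its own empty metadata.
function createToyProcessor(): ToyProcessor {
  return new ToyProcessor().data('pageMeta', { title: '', toc: [] });
}
```

With the factory, each call starts from an unfrozen processor and an empty `pageMeta`, so resetting metadata never has to happen on an instance that has already run.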

Files Modified

  • packages/core/src/node/route/extractPageData.ts - Main implementation
  • packages/core/package.json - Removed dependencies
  • packages/shared/src/types/index.ts - Removed _html field
  • packages/plugin-rss/src/createFeed.ts - Updated to use content field
  • packages/core/src/node/mdx/remarkPlugins/toc.ts - Updated types
  • packages/core/src/node/runtimeModule/pageData/createPageData.ts - Removed _html destructuring

This PR was written using Vibe Kanban


Copilot AI review requested due to automatic review settings January 13, 2026 06:41

netlify bot commented Jan 13, 2026

Deploy Preview for rspress-v2 ready!

Name Link
🔨 Latest commit cf59dab
🔍 Latest deploy log https://app.netlify.com/projects/rspress-v2/deploys/69675d07c165230008cbe9ed
😎 Deploy Preview https://deploy-preview-3001--rspress-v2.netlify.app


Copilot AI left a comment

Pull request overview

This PR removes the @rspress/mdx-rs and html-to-text dependencies and replaces them with the standard @mdx-js/mdx processor using createProcessor. The goal is to simplify the extractPageData logic by using a unified MDX processing approach.

Changes:

  • Replaced @rspress/mdx-rs compilation with @mdx-js/mdx createProcessor for MDX parsing
  • Removed html-to-text conversion; now using raw markdown content directly for search indexing
  • Removed _html field from PageIndexInfo type and all related code

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| pnpm-lock.yaml | Removed dependencies for @rspress/mdx-rs, html-to-text, and their transitive dependencies; added remark-parse |
| packages/shared/src/types/index.ts | Removed _html field from PageIndexInfo interface |
| packages/plugin-rss/src/createFeed.ts | Changed RSS feed content from page._html to page.content |
| packages/core/src/node/runtimeModule/pageData/createPageData.ts | Removed _html from omitted fields when creating runtime page data |
| packages/core/src/node/route/extractPageData.ts | Complete rewrite: replaced @rspress/mdx-rs compile with @mdx-js/mdx createProcessor; removed html-to-text conversion; simplified content extraction |
| packages/core/src/node/route/extractPageData.test.ts | Updated test expectations to match new content format (markdown instead of HTML) |
| packages/core/src/node/mdx/remarkPlugins/toc.ts | Added type imports for both MdastRoot and HastRoot; updated parseToc to accept both types |
| packages/core/package.json | Removed @rspress/mdx-rs, html-to-text, and @types/html-to-text dependencies |

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

@SoonIter SoonIter changed the title from "Remove the @rspress/mdx-rs and html-to-text packages (vibe-kanban)" to "refactor(core): replace @rspress/mdx-rs with @mdx-js/mdx createProcessor for page data extraction" on Jan 13, 2026
@SoonIter SoonIter force-pushed the syt-vibe-kanban/e1d3-rspress-mdx-rs-h branch from 861d505 to e741e3d on January 13, 2026 08:15
@SoonIter SoonIter changed the title from "refactor(core): replace @rspress/mdx-rs with @mdx-js/mdx createProcessor for page data extraction" to "refactor(core): replace @rspress/mdx-rs and html-to-text with @mdx-js/mdx createProcessor" on Jan 13, 2026

github-actions bot commented Jan 13, 2026

Rsdoctor Bundle Diff Analysis

Found 3 projects in monorepo, 2 projects with changes.

📊 Quick Summary

| Project | Total Size | Change |
| --- | --- | --- |
| node | 10.4 MB | 📈 +401.1 KB (+3.9%) |
| node_md | 1.3 MB | 0 |
| web | 15.8 MB | +87.9 KB (0.5%) |

📋 Detailed Reports

📁 node

Path: website/doc_build/diff-rsdoctor/node/rsdoctor-data.json

📌 Baseline Commit: 5310781281 | PR: #3003

| Metric | Current | Baseline | Change |
| --- | --- | --- | --- |
| 📊 Total Size | 10.4 MB | 10.0 MB | +401.1 KB (+3.9%) |
| 📄 JavaScript | 0 B | 0 B | 0 |
| 🎨 CSS | 0 B | 0 B | 0 |
| 🌐 HTML | 10.4 MB | 10.0 MB | +401.1 KB (+3.9%) |
| 📁 Other Assets | 0 B | 0 B | 0 |

📦 Download Diff Report: node Bundle Diff

📁 web

Path: website/doc_build/diff-rsdoctor/web/rsdoctor-data.json

📌 Baseline Commit: 5310781281 | PR: #3003

| Metric | Current | Baseline | Change |
| --- | --- | --- | --- |
| 📊 Total Size | 15.8 MB | 15.7 MB | +87.9 KB (0.5%) |
| 📄 JavaScript | 15.0 MB | 15.0 MB | +709.0 B (0.0%) |
| 🎨 CSS | 126.3 KB | 126.3 KB | 0 |
| 🌐 HTML | 0 B | 0 B | 0 |
| 📁 Other Assets | 671.4 KB | 584.1 KB | +87.2 KB (+14.9%) |

📦 Download Diff Report: web Bundle Diff

Generated by Rsdoctor GitHub Action

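The following changes have been completed successfully:

### Revised `extractPageData.ts`

**Import changes:**
- Removed the direct imports of `remark-parse`, `remark-mdx`, `unified`, and `remark-stringify`
- Added `createProcessor` from `@mdx-js/mdx`
- Added the `remarkToc` plugin (reusing the existing TOC extraction plugin)

**Processing logic changes:**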
```typescript
function createMdxProcessor() {
  const processor = createProcessor({
    format: 'mdx',
    remarkPlugins: [remarkGFM, remarkToc],
  });
  processor.data('pageMeta' as any, { toc: [], title: '' });
  return processor;
}
```

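- A new processor is created for every file processed (this avoids the frozen processor problem)
- `createProcessor` is used together with the `remarkToc` plugin to extract the title and TOC
- The plain-text `content` uses the processed markdown directly (after imports are removed)

### `package.json` changes

Removed the `remark-parse` dependency (because `@mdx-js/mdx` already has MDX parsing built in)

### Summary

`extractPageData` now uses `createProcessor` from `@mdx-js/mdx` instead of using `unified` + `remark-parse` directly:

1. ✅ Removed the `@rspress/mdx-rs` dependency
2. ✅ Removed the `html-to-text` dependency
3. ✅ Uses `createProcessor` from `@mdx-js/mdx`
4. ✅ Removed the `_html` field
5. ✅ Build and tests pass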
@SoonIter SoonIter force-pushed the syt-vibe-kanban/e1d3-rspress-mdx-rs-h branch from 482263a to d193276 on January 13, 2026 09:30
Add the following unit tests:

```
# link

## this is link [rsbuild](https://rsbuild.rs)

## this is bold link [**rsbuild**](https://rsbuild.rs)

## this is code link [`rsbuild`](https://rsbuild.rs)

## this is bold code link [**`rsbuild`**](https://rsbuild.rs)
```
@SoonIter SoonIter changed the title from "refactor(core): replace @rspress/mdx-rs and html-to-text with @mdx-js/mdx createProcessor" to "refactor(core): replace @rspress/mdx-rs and html-to-text with @mdx-js/mdx createProcessor for toc and searchIndex generation" on Jan 14, 2026
…kanban b14ad924)

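Previously this used html-to-text, which applied several strategies; it was later changed to @mdx-js, and is now changed to unified. The previous html-to-text configuration was: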

```typescript
const html = encodeHtml(String(rawHtml));

content = htmlToText(html, {
  // decodeEntities: true, // default value of decodeEntities is `true`, so that htmlToText can decode < >
  wordwrap: 80,
  selectors: [
    {
      selector: 'a',
      options: {
        ignoreHref: true,
      },
    },
    {
      selector: 'img',
      format: 'skip',
    },
    {
      // Skip code blocks
      selector: 'pre > code',
      format: searchCodeBlocks ? 'block' : 'skip',
    },
    ...['h1', 'h2', 'h3', 'h4', 'h5', 'h6'].map(tag => ({
      selector: tag,
      options: {
        uppercase: false,
      },
    })),
  ],
  tables: true,
  longWordSplit: {
    forceWrapOnLimit: true,
  },
});
```
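
As a rough illustration of how comparable strategies might look on a syntax tree instead of HTML, the sketch below walks a minimal mdast-like structure. `MdNode`, `TextOptions`, and `toPlainText` are hypothetical names for this example, and the node shape is a simplified assumption rather than the types from the `mdast` package or the PR's actual implementation.

```typescript
// Simplified mdast-like node (assumption; real code would use the `mdast` types).
interface MdNode {
  type: string;
  value?: string;
  url?: string;
  children?: MdNode[];
}

interface TextOptions {
  searchCodeBlocks: boolean;
}

function toPlainText(node: MdNode, options: TextOptions): string {
  switch (node.type) {
    case 'image':
      // Comparable to `selector: 'img', format: 'skip'`.
      return '';
    case 'code':
      // Comparable to `format: searchCodeBlocks ? 'block' : 'skip'`.
      return options.searchCodeBlocks ? (node.value ?? '') : '';
    case 'text':
    case 'inlineCode':
      return node.value ?? '';
    case 'root':
      // Separate top-level blocks with newlines and drop empty ones.
      return (node.children ?? [])
        .map(child => toPlainText(child, options))
        .filter(text => text !== '')
        .join('\n');
    default:
      // Links, strong, emphasis, headings, paragraphs: keep the child text.
      // A link's `url` is never read, comparable to `ignoreHref: true`.
      return (node.children ?? [])
        .map(child => toPlainText(child, options))
        .join('');
  }
}
```

Under these assumptions, a heading such as `## this is bold code link [**`rsbuild`**](https://rsbuild.rs)` reduces to the text "this is bold code link rsbuild", which is the kind of case the link unit tests in this PR target.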

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 19 changed files in this pull request and generated 2 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

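1. **Comment 1 (line 75)**: Changed `node: any` to `node: Node & { url?: string; children?: Node[] }`, using mdast's `Node` type plus the required extension properties

2. **Comment 2 (line 172)**: Moved the `headingPrefix` declaration to the outer level of the loop so that the `indexOf` call inside the loop also uses the dynamic `${headingPrefix} ${item.text}` instead of the hardcoded `## ${item.text}`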
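
The second fix can be pictured with a minimal sketch. The function below is hypothetical (`sliceSections`, `TocItemSketch`, and `SectionSketch` are names invented here), and deriving `headingPrefix` from a `depth` parameter is an assumption; the point is only that the prefix is computed once outside the loop and reused in every `indexOf` lookup.

```typescript
// Hypothetical shapes for this sketch only.
interface TocItemSketch {
  text: string;
}

interface SectionSketch {
  heading: string;
  body: string;
}

function sliceSections(
  content: string,
  toc: TocItemSketch[],
  depth = 2,
): SectionSketch[] {
  // Declared once, outside the loop, and shared by all lookups below.
  const headingPrefix = '#'.repeat(depth);
  const sections: SectionSketch[] = [];
  let cursor = 0;

  toc.forEach((item, i) => {
    // Dynamic marker instead of a hardcoded `## ${item.text}`.
    const marker = `${headingPrefix} ${item.text}`;
    const start = content.indexOf(marker, cursor);
    if (start === -1) return;

    const bodyStart = start + marker.length;
    const next = toc[i + 1];
    const nextStart = next
      ? content.indexOf(`${headingPrefix} ${next.text}`, bodyStart)
      : -1;
    const end = nextStart === -1 ? content.length : nextStart;

    sections.push({
      heading: item.text,
      body: content.slice(bodyStart, end).trim(),
    });
    cursor = bodyStart;
  });

  return sections;
}
```

Plain substring matching like this can also match a marker inside a deeper heading, so it is only suitable as an illustration of the prefix handling.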
…`: added `logger.debug` timing instrumentation:

1. Added `import { logger } from '@rspress/shared/logger';`
2. Record `performance.now()` before the `createPageData` call
3. After the call completes, output the elapsed time with `logger.debug`
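
The three steps above can be sketched as a small helper. `withTiming` and `DebugFn` are hypothetical names, the helper is synchronous for brevity, and `console.debug` stands in for `logger.debug` so that the example stays self-contained; the actual change may differ in message format and placement.

```typescript
// Stand-in for `logger.debug`; any function that accepts a message works.
type DebugFn = (message: string) => void;

function withTiming<T>(
  label: string,
  fn: () => T,
  debug: DebugFn = console.debug,
): T {
  // Step 2: record the start time before the call.
  const start = performance.now();
  try {
    return fn();
  } finally {
    // Step 3: report the elapsed time once the call has finished.
    const elapsed = performance.now() - start;
    debug(`${label} took ${elapsed.toFixed(2)}ms`);
  }
}
```

Reporting from a `finally` block means the duration is logged even when the wrapped call throws.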
@SoonIter SoonIter enabled auto-merge (squash) January 14, 2026 09:18
@SoonIter SoonIter requested a review from Timeless0911 January 14, 2026 09:20
@SoonIter SoonIter merged commit 4a5b6e3 into main Jan 14, 2026
9 checks passed
@SoonIter SoonIter deleted the syt-vibe-kanban/e1d3-rspress-mdx-rs-h branch January 14, 2026 09:22

Development

Successfully merging this pull request may close these issues.

[Bug]: V2 sidebar (doc outline) is not rendered immediately

3 participants