feat(core): auto-extract description from first contentful line in extractPageData by SoonIter · Pull Request #3006 · web-infra-dev/rspress

SoonIter · 2026-01-14T12:10:14Z

Summary

Add automatic description extraction to extractPageData when frontmatter.description is not provided. The description is extracted from the first paragraph before the first h2 heading in the markdown content.

https://docusaurus.io/docs/markdown-features/head-metadata#markdown-page-description

Related Issue

Checklist

Tests updated (or not required).
Documentation updated (or not required).

AI Summary

Changes Made

Added description field to PageIndexInfo type (packages/shared/src/types/index.ts)
- New optional description?: string field added to the interface
Implemented description extraction in extractPageData.ts (packages/core/src/node/route/extractPageData.ts)
- Added extractTextFromNode() function: recursively extracts plain text content from AST nodes
- Added extractDescription() function: extracts the first paragraph's text content before any h2 heading
- The description field is set at the top level of PageIndexInfo, while frontmatter.description preserves the original frontmatter value
Added test cases (packages/core/src/node/route/extractPageData.test.ts)
- Tests for automatic description extraction when frontmatter.description is not set
- Tests to verify frontmatter.description takes priority when provided
- Tests to ensure frontmatter.description remains unchanged (preserves original value)
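
The two helpers described above could look roughly like the following. This is a minimal sketch, not the PR's actual code: the `MdNode` shape is a simplified stand-in for the mdast node types, and `exampleTree` is a hypothetical, hand-built AST used only for illustration.

```typescript
// Simplified stand-in for mdast nodes (assumption: the real code uses proper mdast types).
interface MdNode {
  type: string;
  value?: string;
  depth?: number;
  children?: MdNode[];
}

// Recursively concatenate the plain text carried by a node and its descendants.
function extractTextFromNode(node: MdNode): string {
  if (typeof node.value === 'string') {
    return node.value;
  }
  return (node.children ?? []).map(extractTextFromNode).join('');
}

// Return the text of the first paragraph that appears before the first h2, if any.
function extractDescription(root: MdNode): string | undefined {
  for (const child of root.children ?? []) {
    if (child.type === 'heading' && child.depth === 2) {
      break; // stop searching at the first h2
    }
    if (child.type === 'paragraph') {
      return extractTextFromNode(child).trim();
    }
  }
  return undefined;
}

// Hypothetical AST for: "# Guide", one intro paragraph, "## Install", another paragraph.
const exampleTree: MdNode = {
  type: 'root',
  children: [
    { type: 'heading', depth: 1, children: [{ type: 'text', value: 'Guide' }] },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'A ' },
        { type: 'strong', children: [{ type: 'text', value: 'short' }] },
        { type: 'text', value: ' introduction.' },
      ],
    },
    { type: 'heading', depth: 2, children: [{ type: 'text', value: 'Install' }] },
    { type: 'paragraph', children: [{ type: 'text', value: 'Later text.' }] },
  ],
};

const exampleDescription = extractDescription(exampleTree);
```

Here `exampleDescription` is taken from the paragraph under the h1, and the paragraph after the h2 is never visited.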

How It Works

When processing a markdown file, if frontmatter.description exists, use it as the description field value
If frontmatter.description is not set, extract the first paragraph before the first h2 heading as the description
The frontmatter.description always preserves the original frontmatter value (not modified by extraction)
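
The priority rule above can be sketched as follows. The names `Frontmatter`, `PageInfoSketch`, and `buildPageInfo` are hypothetical and exist only for this illustration. The use of `??` (rather than `||`) is also an assumption, since the PR text does not say how an empty-string description is treated.

```typescript
// Hypothetical shapes; the real PageIndexInfo type has many more fields.
interface Frontmatter {
  description?: string;
  [key: string]: unknown;
}

interface PageInfoSketch {
  description?: string;
  frontmatter: Frontmatter;
}

// frontmatter.description wins; otherwise fall back to the extracted text.
// The frontmatter object itself is passed through untouched.
function buildPageInfo(
  frontmatter: Frontmatter,
  extracted: string | undefined,
): PageInfoSketch {
  return {
    description: frontmatter.description ?? extracted,
    frontmatter,
  };
}

const withFrontmatter = buildPageInfo({ description: 'From frontmatter' }, 'Extracted');
const withoutFrontmatter = buildPageInfo({}, 'Extracted');
```

In the second case the top-level `description` is filled from the content while `frontmatter.description` stays `undefined`, which matches the behaviour the tests in this PR are described as checking.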

This PR was written using Vibe Kanban

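## Change summary

Two new functions were added to `extractPageData.ts` to implement automatic description extraction:

### 1. `extractTextFromNode` (lines 92-100)

Recursively extracts the plain text content of a node, handling both text nodes and nodes that contain children.

### 2. `extractDescription` (lines 106-118)

Extracts the first paragraph before the first h2 in the AST as the description:

- Iterates over the children of the root node
- Stops searching when it reaches an h2 (`heading` with `depth === 2`)
- Returns the text content of the first `paragraph` node it finds

### 3. Integration into `getPageIndexInfoByRoute`

- Lines 171-174: when `frontmatter.description` is absent, call `extractDescription` to extract the description
- Line 239: set description in the returned frontmatter, preferring the frontmatter value and falling back to the extracted value

### New tests

- Automatic description extraction (when there is no frontmatter.description)
- frontmatter.description priority (no extraction when frontmatter.description is present)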

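## Change summary

### 1. `packages/shared/src/types/index.ts`

Added a new `description?: string` field to the `PageIndexInfo` interface.

### 2. `packages/core/src/node/route/extractPageData.ts`

- Added the `extractTextFromNode` function: recursively extracts the plain text content of a node
- Added the `extractDescription` function: extracts the first paragraph before the first h2 in the AST
- Set a separate `description` field on the returned result, while `frontmatter.description` is left as-is

### 3. Test updates

- Added 3 test cases to verify the description extraction logic
- Verified that when `frontmatter.description` exists, the `description` field uses its value and `frontmatter.description` remains unchanged
- Verified that when `frontmatter.description` does not exist, the `description` field uses the value extracted from the content and `frontmatter.description` remains `undefined`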

netlify · 2026-01-14T12:11:20Z

✅ Deploy Preview for rspress-v2 ready!

Name	Link
🔨 Latest commit	`160dd7f`
🔍 Latest deploy log	https://app.netlify.com/projects/rspress-v2/deploys/696887bb39875600082893ed
😎 Deploy Preview	https://deploy-preview-3006--rspress-v2.netlify.app

github-actions · 2026-01-14T12:12:56Z

Rsdoctor Bundle Diff Analysis

Found 3 projects in monorepo, 3 projects with changes.

📊 Quick Summary

Project	Total Size	Change
node	10.4 MB	+33.5 KB (0.3%)
node_md	1.3 MB	📈 +36.2 KB (+2.8%)
web	15.9 MB	+79.8 KB (0.5%)

📋 Detailed Reports

📁 node

Path: website/doc_build/diff-rsdoctor/node/rsdoctor-data.json

📌 Baseline Commit: 8b7c0550c4 | PR: #3005

Metric	Current	Baseline	Change
📊 Total Size	10.4 MB	10.4 MB	+33.5 KB (0.3%)
📄 JavaScript	0 B	0 B	0
🎨 CSS	0 B	0 B	0
🌐 HTML	10.4 MB	10.4 MB	+33.5 KB (0.3%)
📁 Other Assets	0 B	0 B	0

📦 Download Diff Report: node Bundle Diff

📁 node_md

Path: website/doc_build/diff-rsdoctor/node_md/rsdoctor-data.json

📌 Baseline Commit: 8b7c0550c4 | PR: #3005

Metric	Current	Baseline	Change
📊 Total Size	1.3 MB	1.3 MB	+36.2 KB (+2.8%)
📄 JavaScript	0 B	0 B	0
🎨 CSS	0 B	0 B	0
🌐 HTML	0 B	0 B	0
📁 Other Assets	1.3 MB	1.3 MB	+36.2 KB (+2.8%)

📦 Download Diff Report: node_md Bundle Diff

📁 web

Path: website/doc_build/diff-rsdoctor/web/rsdoctor-data.json

📌 Baseline Commit: 8b7c0550c4 | PR: #3005

Metric	Current	Baseline	Change
📊 Total Size	15.9 MB	15.8 MB	+79.8 KB (0.5%)
📄 JavaScript	15.1 MB	15.0 MB	+39.8 KB (0.3%)
🎨 CSS	126.3 KB	126.3 KB	0
🌐 HTML	0 B	0 B	0
📁 Other Assets	713.0 KB	672.9 KB	+40.0 KB (+5.9%)

📦 Download Diff Report: web Bundle Diff

Generated by Rsdoctor GitHub Action

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Copilot

Pull request overview

This pull request adds functionality to automatically extract page descriptions from markdown content when they are not explicitly provided in frontmatter. The description is extracted from the first paragraph that appears before any h2 heading.

Changes:

Added optional description field to PageIndexInfo type
Implemented description extraction logic that uses the first paragraph before h2 as fallback
Added comprehensive test coverage including new test fixture and test cases
Updated existing test snapshots to include description field

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File	Description
packages/shared/src/types/index.ts	Added optional `description` field to `PageIndexInfo` interface
packages/core/src/node/route/fixtures/content-processing/with-description.mdx	New test fixture for description extraction feature
packages/core/src/node/route/extractPageData.ts	Implemented `extractTextFromNode` and `extractDescription` helper functions, integrated description extraction into page data processing
packages/core/src/node/route/extractPageData.test.ts	Added three new test cases and updated existing test snapshots to include description field


packages/core/src/node/route/extractPageData.ts

- Update plugin-llms to use page.description for llms.txt generation
- Update Layout component to use page.description for meta tags

This ensures the auto-extracted description is used when frontmatter.description is not provided.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

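llmsTxt currently cannot access the page.description produced in createPageData. Try to fix this; one possible solution is to store the output of createPageData in a global variable and read it from llmsTxt.ts.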

1. Improve the tests by splitting them into two describe blocks, with getPageIndexInfoByRoute in its own describe
2. Fix the error:

FAIL packages/core/src/node/route/extractPageData.test.ts > extractPageData > basic
TypeError: routeService.getRoutePageByRoutePath is not a function

  251 | const pageIndexInfo = await getPageIndexInfoByRoute(routeMeta, options);
  252 | // Store pageIndexInfo in RoutePage for llmsTxt to access
> 253 | const routePage = routeService.getRoutePageByRoutePath(
      | ^
  254 | routeMeta.routePath,
  255 | );

at packages/core/src/node/route/extractPageData.ts:253:38
at extractPageData (packages/core/src/node/route/extractPageData.ts:249:20)
at packages/core/src/node/route/extractPageData.test.ts:28:22

Refactor extractDescription to collect all paragraph text between h1 and h2 instead of just the first paragraph. Skip code blocks, HTML, imports, tables, and images following Docusaurus createExcerpt strategy. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
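
A rough sketch of what this refactor describes, under stated assumptions: only `paragraph` nodes are collected (so block-level `code`, `html`, `table`, and `mdxjsEsm` import nodes are skipped implicitly), inline `image` and `html` nodes are dropped, and paragraphs are joined with a single space. The node shape, the helper names `inlineText` and `collectDescription`, and the join separator are all hypothetical; the merged implementation may differ.

```typescript
// Simplified stand-in for mdast nodes (assumption, not the real types).
interface MdNode {
  type: string;
  value?: string;
  depth?: number;
  children?: MdNode[];
}

// Inline node types that should not contribute text to the description.
const SKIPPED_INLINE = new Set(['image', 'imageReference', 'html']);

// Plain text of a node, ignoring skipped inline nodes.
function inlineText(node: MdNode): string {
  if (SKIPPED_INLINE.has(node.type)) return '';
  if (typeof node.value === 'string') return node.value;
  return (node.children ?? []).map(inlineText).join('');
}

// Collect the text of every paragraph that appears before the first h2.
function collectDescription(root: MdNode): string | undefined {
  const parts: string[] = [];
  for (const child of root.children ?? []) {
    if (child.type === 'heading' && child.depth === 2) break;
    // Headings, code blocks, raw HTML, tables and ESM imports are not paragraphs.
    if (child.type !== 'paragraph') continue;
    const text = inlineText(child).trim();
    if (text) parts.push(text); // image-only paragraphs end up empty and are skipped
  }
  return parts.length > 0 ? parts.join(' ') : undefined;
}

// Hypothetical AST mixing paragraphs with content that should be skipped.
const mixedTree: MdNode = {
  type: 'root',
  children: [
    { type: 'heading', depth: 1, children: [{ type: 'text', value: 'Guide' }] },
    { type: 'mdxjsEsm', value: "import X from './x'" },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'Docs are built from ' },
        { type: 'inlineCode', value: 'mdx' },
        { type: 'text', value: ' files.' },
      ],
    },
    { type: 'code', value: 'npm i' },
    { type: 'paragraph', children: [{ type: 'image' }] },
    { type: 'paragraph', children: [{ type: 'text', value: 'Descriptions are extracted too.' }] },
    { type: 'heading', depth: 2, children: [{ type: 'text', value: 'Install' }] },
    { type: 'paragraph', children: [{ type: 'text', value: 'Ignored.' }] },
  ],
};

const mixedDescription = collectDescription(mixedTree);
```

With this input, both paragraphs before the h2 contribute text, while the import, the code block, the image-only paragraph, and everything after the h2 are left out.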

- Reuse processor instances at module level instead of creating new ones per file
- Use structuredClone to reuse parsed AST tree, avoiding duplicate parsing
- Use unist-util-visit for more efficient tree traversal in remarkStripLinkUrls

These optimizations reduce the overhead when processing multiple markdown files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
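
The first optimization above, reusing one processor across files, follows a common pattern that can be sketched without any markdown library. Everything here is a hypothetical stand-in: `Processor`, `createProcessor`, and `processFile` are illustrative names, and the "parse" step just splits lines instead of building a real AST.

```typescript
// Stand-in for a parsed tree; a real pipeline would produce an mdast root.
interface ParsedTree {
  type: 'root';
  lines: string[];
}

interface Processor {
  parse(source: string): ParsedTree;
}

// Counts how many processors were built, to make the reuse visible.
let processorsCreated = 0;

// Stand-in for an expensive factory (for example, assembling a plugin pipeline).
function createProcessor(): Processor {
  processorsCreated += 1;
  return {
    parse: source => ({ type: 'root', lines: source.split('\n') }),
  };
}

// Built once when the module is loaded and shared by every file,
// instead of calling createProcessor() inside processFile().
const sharedProcessor = createProcessor();

function processFile(source: string): number {
  return sharedProcessor.parse(source).lines.length;
}

const lineCounts = ['# A\n\ntext', '# B'].map(processFile);
```

However many files go through `processFile`, the factory runs only once, which is where the per-file overhead is saved.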

The tree is no longer needed after extracting TOC and description, so we can pass it directly to stringifyProcessor.run() without cloning. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Remove separate parseProcessor, use the same processor for both parse() and run()+stringify() operations. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

SoonIter · 2026-01-15T06:59:54Z

This PR also includes some performance improvements and adds some cold start time.

SoonIter added 2 commits January 14, 2026 19:59

Copilot AI review requested due to automatic review settings January 14, 2026 12:10

Copilot started reviewing on behalf of SoonIter January 14, 2026 12:10

SoonIter changed the title ~~extractPageData: add a remark plugin to get the description (vibe-kanban)~~ feat(core): auto-extract description from first paragraph before h2 in extractPageData Jan 14, 2026

github-actions bot added the change: feat label Jan 14, 2026

style: apply heading-case formatting to test fixture

d410410

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Copilot AI reviewed Jan 14, 2026


packages/core/src/node/route/extractPageData.ts (resolved)

SoonIter and others added 6 commits January 14, 2026 20:22

llmsTxt in core re-extracts the frontmatter on its own (vibe-kanban a88c5340)

c1619de

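llmsTxt currently cannot access the page.description produced in createPageData. Try to fix this; one possible solution is to store the output of createPageData in a global variable and read it from llmsTxt.ts.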

chore: upgrade

b62981d

chore: upgrade

730f118

SoonIter changed the title ~~feat(core): auto-extract description from first paragraph before h2 in extractPageData~~ feat(core): auto-extract description from first contentful line in extractPageData Jan 14, 2026

SoonIter and others added 3 commits January 15, 2026 12:14

SoonIter requested a review from Timeless0911 January 15, 2026 06:59

Timeless0911 approved these changes Jan 15, 2026


SoonIter merged commit 4d0815f into main Jan 15, 2026
8 checks passed

SoonIter deleted the syt-vibe-kanban/99b2-extractpagedata branch January 15, 2026 08:16

github-actions bot mentioned this pull request Jan 15, 2026

docs: Update documentation and improve frontmatter references #3009

Merged

SoonIter mentioned this pull request Feb 12, 2026

feat(plugin-typedoc): Add description to frontmatter in generated docs #3134

Closed

2 tasks

This was referenced Mar 11, 2026

fix(core/extractDescription): skip container directives (:::tip, :::info, etc.) #3167

Merged

feat(core): add markdown.extractDescription config #3023

Merged

SoonIter merged 12 commits into main from syt-vibe-kanban/99b2-extractpagedata

3 participants