Commit c03d863

docs: upgrade llms.txt and ssg-md (#2927)
1 parent b93fade commit c03d863

5 files changed, with 275 additions and 10 deletions

packages/core/src/theme/layout/Layout/index.tsx

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@ import {
   Content,
   useFrontmatter,
   useLocaleSiteData,
-  usePageData,
+  usePage,
   useSite,
 } from '@rspress/core/runtime';
 import type { HomeLayoutProps } from '@theme';
@@ -139,7 +139,7 @@ export function Layout(props: LayoutProps) {
     beforeFeatures,
     afterFeatures,
   };
-  const { page } = usePageData();
+  const { page } = usePage();
   const { site } = useSite();
   const { frontmatter } = useFrontmatter();
   const {

website/docs/en/guide/basic/ssg-md.mdx

Lines changed: 163 additions & 1 deletion
@@ -4,4 +4,166 @@ tag: experimental

 # llms.txt (SSG-MD)

-{/* TODO */}
+Rspress provides an experimental SSG-MD capability, which is a brand-new feature. As the name suggests, SSG-MD differs from [Static Site Generation (SSG)](./ssg) only in that it renders your pages as Markdown files instead of HTML files. It also generates [`llms.txt`](https://llmstxt.org/) and `llms-full.txt`, making it easier for large language models to understand and use your technical documentation.
+
+## Why SSG-MD?
+
+In frontend frameworks built on React's dynamic rendering, static information is often hard to extract. MDX has the same problem: `.mdx` files contain Markdown content but can also embed React components, which makes documents more interactive. Rspress lets users use [MDX fragments](../use-mdx/components.mdx), custom components, React Hooks, and tsx files as routes to make document content more expressive. However, this dynamic content is difficult to convert to Markdown, and converting the HTML generated during the SSG phase to Markdown often gives unsatisfactory results.
+
+[Static Site Generation (SSG)](./ssg) generates static HTML files for crawlers, improving [SEO](https://en.wikipedia.org/wiki/Search_engine_optimization). SSG-MD solves a similar problem: it improves [GEO](https://en.wikipedia.org/wiki/Generative_engine_optimization) and the quality of the static information available to large language models. Compared with converting HTML to Markdown, React's virtual DOM during rendering is a better source of information.
+
+## How is SSG-MD implemented?
+
+1. Rspress internally implements a `renderToMarkdownString` method, similar to `renderToString` from `react-dom/server`.
+
+```tsx
+import { expect, describe, it } from '@rstest/core';
+import { renderToMarkdownString } from './react-render-to-markdown';
+import { useState } from 'react';
+
+describe('renderToMarkdownString', () => {
+  it('renders text', () => {
+    expect(
+      renderToMarkdownString(
+        <div>
+          <strong>foo</strong>
+          <span>bar</span>
+        </div>,
+      ),
+    ).toBe('**foo**bar');
+  });
+  it('renders header and paragraph', () => {
+    const Comp1 = () => {
+      const [count, setCount] = useState(1);
+      return <h1>Header {count}</h1>;
+    };
+    const Comp2 = () => {
+      return (
+        <>
+          <Comp1 />
+          <p>Paragraph</p>
+        </>
+      );
+    };
+    expect(renderToMarkdownString(<Comp2 />)).toBe('# Header 1\n\nParagraph\n');
+  });
+});
+```
+
+2. Rspress provides the `process.env.__SSR_MD__` environment variable, which lets users distinguish between SSG-MD rendering and browser rendering in MDX components and customize content accordingly. For example:
+
+```tsx
+export function Tab({ label }: { label: string }) {
+  if (process.env.__SSR_MD__) {
+    return <>{`**Here is a Tab named ${label}**`}</>;
+  }
+  return <div>{label}</div>;
+}
+```
+
+3. Rspress's built-in components have been adapted for SSG-MD so that they render reasonable Markdown content during the SSG-MD phase. For example:
+
+```tsx
+<PackageManagerTabs command="create rspress@latest" />
+```
+
+This is rendered as:
+
+````md
+```sh [npm]
+npm create rspress@latest
+```
+
+```sh [yarn]
+yarn create rspress
+```
+
+```sh [pnpm]
+pnpm create rspress@latest
+```
+
+```sh [bun]
+bun create rspress@latest
+```
+
+```sh [deno]
+deno init --npm rspress@latest
+```
+````
+
+We believe that, with this feature, websites built with React will be able to use SSG-MD to achieve better GEO.
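
To make step 1 more concrete, the sketch below walks a deliberately simplified node tree and emits Markdown. It is not Rspress's `renderToMarkdownString`: the `MdNode` type, the `toMarkdown` function, and the tag-to-Markdown mapping are all assumptions made for illustration, and a real implementation would have to render React components and hooks rather than a plain object tree.

```typescript
// Hypothetical, simplified node model. This is NOT React's element type.
type MdNode = string | { tag: string; children: MdNode[] };

// Recursively convert a node tree to a Markdown string.
function toMarkdown(node: MdNode): string {
  if (typeof node === 'string') {
    return node;
  }
  const inner = node.children.map(toMarkdown).join('');
  switch (node.tag) {
    case 'strong':
      return `**${inner}**`;
    case 'h1':
      // Headings are followed by a blank line.
      return `# ${inner}\n\n`;
    case 'p':
      return `${inner}\n`;
    default:
      // Wrappers such as div, span, and fragments add no Markdown syntax.
      return inner;
  }
}

const inlineExample = toMarkdown({
  tag: 'div',
  children: [
    { tag: 'strong', children: ['foo'] },
    { tag: 'span', children: ['bar'] },
  ],
});

const blockExample = toMarkdown({
  tag: 'fragment',
  children: [
    { tag: 'h1', children: ['Header 1'] },
    { tag: 'p', children: ['Paragraph'] },
  ],
});

console.log(inlineExample); // **foo**bar
console.log(JSON.stringify(blockExample)); // "# Header 1\n\nParagraph\n"
```

The two example trees mirror the assertions in the test shown in step 1, so the sketch produces the same strings for those inputs. Anything beyond these three tags is left as pass-through.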
+
+## Features
+
+- Renders each site page as a `.md` file, which is convenient for vectorization or for feeding to large language models. For example, the Markdown version of `/guide/start/introduction.html` is available by replacing the `.html` suffix with `.md`.
+- Generates [`llms.txt`](https://llmstxt.org/), which lists the title and description of each page in navigation and sidebar order.
+- Generates `llms-full.txt`, which contains the Markdown content of every page and is convenient for batch import.
+- Supports multilingual sites by outputting the corresponding `{lang}/llms.txt` and `{lang}/llms-full.txt` for non-default languages.
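
The first two features can be pictured with a small sketch: mapping a page's `.html` URL to its `.md` counterpart, and listing pages as links with descriptions. The `PageEntry` shape and both function names are hypothetical, and the exact `llms.txt` layout that Rspress emits may differ; the `- [title](url): description` line format follows the llmstxt.org proposal.

```typescript
// Hypothetical page record; Rspress's internal page data may look different.
interface PageEntry {
  title: string;
  description?: string;
  routePath: string; // e.g. '/guide/start/introduction.html'
}

// Replace a trailing `.html` suffix with `.md`.
function toMarkdownUrl(htmlPath: string): string {
  return htmlPath.replace(/\.html$/, '.md');
}

// Build an llms.txt-style index, keeping pages in the order they are given
// (assumed here to already be navigation and sidebar order).
function buildLlmsTxt(siteTitle: string, pages: PageEntry[]): string {
  const lines: string[] = [`# ${siteTitle}`, ''];
  for (const page of pages) {
    const link = `- [${page.title}](${toMarkdownUrl(page.routePath)})`;
    lines.push(page.description ? `${link}: ${page.description}` : link);
  }
  return lines.join('\n') + '\n';
}

console.log(
  buildLlmsTxt('Docs', [
    {
      title: 'Introduction',
      description: 'Overview',
      routePath: '/guide/start/introduction.html',
    },
  ]),
);
```

Keeping the caller responsible for ordering keeps the builder trivial; a real generator would derive that order from the navigation and sidebar configuration.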
+
+## Output example
+
+```txt
+doc_build
+├── llms.txt
+├── llms-full.txt
+├── guide
+│   └── start
+│       └── introduction.md
+└── ...
+```
+
+The files themselves are written to the build directory (for example, `guide/start/introduction.md`), while the `url` in `llms-full.txt` carries the site prefix (for example, `/guide/start/introduction.md`).
+
+`llms-full.txt` example snippet:
+
+```md
+---
+url: /guide/start/introduction.md
+---
+
+# Introduction
+
+...
+```
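
As a rough illustration of the snippet above, `llms-full.txt` can be thought of as the rendered pages concatenated together, each preceded by a small front matter block carrying its `url`. The `RenderedPage` type and `buildLlmsFullTxt` function are hypothetical names, and the separator between pages is an assumption rather than Rspress's documented behavior.

```typescript
// Hypothetical shape for a page that has already been rendered to Markdown.
interface RenderedPage {
  url: string; // site-prefixed URL, e.g. '/guide/start/introduction.md'
  markdown: string;
}

// Concatenate pages, prefixing each with a front matter block holding its URL.
function buildLlmsFullTxt(pages: RenderedPage[]): string {
  return pages
    .map((page) => `---\nurl: ${page.url}\n---\n\n${page.markdown.trim()}\n`)
    .join('\n');
}

console.log(
  buildLlmsFullTxt([
    { url: '/guide/start/introduction.md', markdown: '# Introduction\n' },
    { url: '/guide/basic/ssg-md.md', markdown: '# llms.txt (SSG-MD)\n' },
  ]),
);
```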
+
+## How to enable
+
+Enable `llms` in `rspress.config.ts` to generate the files above during the build phase:
+
+```ts title="rspress.config.ts"
+import { defineConfig } from '@rspress/core';
+
+export default defineConfig({
+  llms: true,
+});
+```
+
+After running `rspress build`, the output directory (`doc_build` by default) contains `llms.txt`, `llms-full.txt`, and the `.md` file for each route.
+
+:::warning
+
+`llms` is an experimental capability, mainly used to generate Markdown data that large language models or retrieval systems can use easily. It will continue to be optimized in future versions and may have stability or compatibility issues.
+
+If your project does not support SSG, for example because it uses `ssg: false`, please use [@rspress/plugin-llms](/plugin/official-plugins/llms) instead.
+
+:::
+
+## Custom MDX splitting (Optional)
+
+When documents contain custom components, `remarkSplitMdxOptions` lets you control which components are kept and which are converted to plain text during the conversion to Markdown:
+
+```ts title="rspress.config.ts"
+import { defineConfig } from '@rspress/core';
+
+export default defineConfig({
+  llms: {
+    remarkSplitMdxOptions: {
+      excludes: [[['Demo'], '@project/components']],
+    },
+  },
+});
+```
+
+- `excludes`: Matched components are converted to plain text. This option has the highest priority.
+- `includes`: If set, only matched components are retained; all others are converted to plain text.
+- When both are configured, `excludes` is applied first, and the result is then filtered by `includes`.
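
The priority rules in the list above can be sketched as a single predicate. This is an interpretation of the documented behavior, not the plugin's source: `ImportRule`, `SplitOptions`, and `shouldKeepComponent` are hypothetical names, and matching a rule by exact import source plus component name is an assumption.

```typescript
// One rule pairs component names with the module they are imported from,
// mirroring the shape of `[['Demo'], '@project/components']`.
type ImportRule = [string[], string];

interface SplitOptions {
  includes?: ImportRule[];
  excludes?: ImportRule[];
}

// Assumed matching strategy: exact import source and exact component name.
function matchesRule(rules: ImportRule[], name: string, source: string): boolean {
  return rules.some(([names, from]) => from === source && names.includes(name));
}

// Returns true when the component is kept, false when it becomes plain text.
function shouldKeepComponent(
  name: string,
  source: string,
  options: SplitOptions = {},
): boolean {
  // `excludes` has the highest priority.
  if (options.excludes && matchesRule(options.excludes, name, source)) {
    return false;
  }
  // If `includes` is set, only matched components are retained.
  if (options.includes) {
    return matchesRule(options.includes, name, source);
  }
  return true;
}

const options: SplitOptions = {
  excludes: [[['Demo'], '@project/components']],
};

console.log(shouldKeepComponent('Demo', '@project/components', options)); // false
console.log(shouldKeepComponent('Tabs', '@theme', options)); // true
```

Checking `excludes` before `includes` means a component listed in both is still converted to plain text, which is how the list above describes the combined case.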

website/docs/en/plugin/official-plugins/llms.mdx

Lines changed: 6 additions & 0 deletions
@@ -8,6 +8,12 @@ import { SourceCode } from '@rspress/core/theme';

 Generate [llms.txt](https://llmstxt.org/) related files for your Rspress site, allowing large language models to better understand your documentation site.

+:::warning
+
+`@rspress/plugin-llms` is intended only as an alternative for scenarios where `ssg: false` is used. Prefer the [SSG-MD](/guide/basic/ssg-md) feature where possible.
+
+:::
+
 ## Installation

 import { PackageManagerTabs } from '@theme';

website/docs/zh/guide/basic/ssg-md.mdx

Lines changed: 98 additions & 7 deletions
@@ -4,12 +4,99 @@ tag: experimental

 # llms.txt (SSG-MD)

-Rspress provides an experimental SSG-MD capability that, during [Static Site Generation (SSG)](./ssg), additionally outputs Markdown resources that are easy for large language models to understand.
+Rspress provides an experimental SSG-MD capability, which is a brand-new feature. As the name suggests, SSG-MD differs from [Static Site Generation (SSG)](./ssg) only in that it renders your pages as Markdown files instead of HTML files. It also generates [`llms.txt`](https://llmstxt.org/) and `llms-full.txt`, making it easier for large language models to understand and use your technical documentation.
+
+## Why SSG-MD?
+
+In frontend frameworks built on React's dynamic rendering, static information is often hard to extract. MDX has the same problem: `.mdx` files contain Markdown content but can also embed React components, which makes documents more interactive. Rspress lets users use [MDX fragments](../use-mdx/components.mdx), custom components, React Hooks, and tsx files as routes to make document content more expressive. However, this dynamic content is difficult to convert to Markdown, and converting the HTML generated during the SSG phase to Markdown often gives unsatisfactory results.
+
+[Static Site Generation (SSG)](./ssg) generates static HTML files for crawlers, improving [SEO](https://en.wikipedia.org/wiki/Search_engine_optimization). SSG-MD solves a similar problem: it improves [GEO](https://en.wikipedia.org/wiki/Generative_engine_optimization) and the quality of the static information available to large language models. Compared with converting HTML to Markdown, React's virtual DOM during rendering is a better source of information.
+
+## How is SSG-MD implemented?
+
+1. Rspress internally implements a `renderToMarkdownString` method, similar to `renderToString` from `react-dom/server`.
+
+```tsx
+import { expect, describe, it } from '@rstest/core';
+import { renderToMarkdownString } from './react-render-to-markdown';
+import { useState } from 'react';
+
+describe('renderToMarkdownString', () => {
+  it('renders text', () => {
+    expect(
+      renderToMarkdownString(
+        <div>
+          <strong>foo</strong>
+          <span>bar</span>
+        </div>,
+      ),
+    ).toBe('**foo**bar');
+  });
+  it('renders header and paragraph', () => {
+    const Comp1 = () => {
+      const [count, setCount] = useState(1);
+      return <h1>Header {count}</h1>;
+    };
+    const Comp2 = () => {
+      return (
+        <>
+          <Comp1 />
+          <p>Paragraph</p>
+        </>
+      );
+    };
+    expect(renderToMarkdownString(<Comp2 />)).toBe('# Header 1\n\nParagraph\n');
+  });
+});
+```
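+
+2. Rspress provides the `process.env.__SSR_MD__` environment variable, which lets users distinguish between SSG-MD rendering and browser rendering in React components and customize content accordingly. For example:
+
+```tsx
+export function Tab({ label }: { label: string }) {
+  if (process.env.__SSR_MD__) {
+    return <>{`**Here is a Tab named ${label}**`}</>;
+  }
+  return <div>{label}</div>;
+}
+```
+
+3. Rspress's built-in components have been adapted for SSG-MD so that they render reasonable Markdown content during the SSG-MD phase. For example:
+
+```tsx
+<PackageManagerTabs command="create rspress@latest" />
+```
+
+This is rendered as:
+
+````md
+```sh [npm]
+npm create rspress@latest
+```
+
+```sh [yarn]
+yarn create rspress
+```
+
+```sh [pnpm]
+pnpm create rspress@latest
+```
+
+```sh [bun]
+bun create rspress@latest
+```
+
+```sh [deno]
+deno init --npm rspress@latest
+```
+````
+
+We believe that, with this feature, websites built with React will be able to use SSG-MD to achieve better GEO.

 ## Features

-- Renders site pages as `.md` files, which is convenient for vectorization or for feeding to large language models.
-- Generates `llms.txt`, which lists the title and description of each page in navigation and sidebar order.
+- Renders each site page as a `.md` file, which is convenient for vectorization or for feeding to large language models. For example, the Markdown version of `/guide/start/introduction.html` is available by replacing the `.html` suffix with `.md`.
+- Generates [`llms.txt`](https://llmstxt.org/), which shows the title and description of each page in navigation and sidebar order.
 - Generates `llms-full.txt`, which contains the Markdown content of every page and is convenient for batch import.
 - Supports multilingual sites by outputting the corresponding `{lang}/llms.txt` and `{lang}/llms-full.txt` for non-default languages.
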
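@@ -53,6 +140,14 @@ export default defineConfig({

 After running `rspress build`, the output directory (`doc_build` by default) contains `llms.txt`, `llms-full.txt`, and the `.md` file for each route.

+:::warning
+
+`llms` is an experimental capability, mainly used to generate Markdown data that large language models or retrieval systems can use easily. It will continue to be optimized in future versions and may have stability or compatibility issues.
+
+If your project does not support SSG, for example because it uses `ssg: false`, please use [@rspress/plugin-llms](/plugin/official-plugins/llms) instead.
+
+:::
+
 ## Custom MDX splitting (Optional)

 When documents contain custom components, `remarkSplitMdxOptions` lets you control which components are kept and which are converted to plain text during the conversion to Markdown:
@@ -72,7 +167,3 @@ export default defineConfig({
 - `excludes`: Matched components are converted to plain text. This option has the highest priority.
 - `includes`: If set, only matched components are retained; all others are converted to plain text.
 - When both are configured, `excludes` is applied first, and the result is then filtered by `includes`.
-
-:::warning
-`llms` is an experimental capability, mainly used to generate Markdown data that large language models or retrieval systems can use easily.
-:::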

website/docs/zh/plugin/official-plugins/llms.mdx

Lines changed: 6 additions & 0 deletions
@@ -8,6 +8,12 @@ import { SourceCode } from '@rspress/core/theme';

 Generate [llms.txt](https://llmstxt.org/) related files for your Rspress site so that large language models can better understand your documentation site.

+:::warning
+
+`@rspress/plugin-llms` is intended only as an alternative for `ssg: false` scenarios. Prefer the [SSG-MD](/guide/basic/ssg-md) feature where possible.
+
+:::
+
 ## Installation

 import { PackageManagerTabs } from '@theme';
