Commit 450d2bb

add ast output
1 parent 7db53d6 commit 450d2bb

3 files changed (+128 additions, -4 deletions)


.changeset/ast-option.md

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
---
'markdown-to-jsx': minor
---

Added an `ast` option to `compiler` to expose the parsed AST directly. When `ast: true`, the compiler returns the AST structure (`ParserResult[]`) instead of rendered JSX.

**First time the AST is accessible to users!** This enables:

- AST manipulation and transformation before rendering
- Custom rendering logic without re-parsing
- Caching parsed AST for performance
- Linting or validation of markdown structure

**Usage:**

```typescript
import { compiler } from 'markdown-to-jsx'
import type { MarkdownToJSX } from 'markdown-to-jsx'

// Get the AST structure
const ast: MarkdownToJSX.AST[] = compiler('# Hello world', {
  ast: true,
})

// Inspect/modify AST
console.log(ast) // Array of parsed nodes

// Render AST to JSX using createRenderer (not implemented yet)
```

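Since createRenderer is not implemented yet, the practical use today is inspecting or rewriting the tree. A minimal sketch of an immutable transformation (demoting every heading by one level), assuming a simplified node shape: `TNode` and `demoteHeadings` are hypothetical, and the real `MarkdownToJSX.AST` nodes use the library's own `type` values rather than the string `'heading'`.

```typescript
// Simplified stand-in for MarkdownToJSX.AST; the real node shape differs.
interface TNode {
  type: string
  level?: number
  children?: TNode[]
}

// Return a new tree with every heading pushed down one level (h1 -> h2),
// capped at h6. The input tree is left untouched.
function demoteHeadings(nodes: TNode[]): TNode[] {
  return nodes.map((node) => {
    const copy: TNode = { ...node } // shallow copy so the original node is not mutated
    if (copy.type === 'heading' && copy.level !== undefined) {
      copy.level = Math.min(copy.level + 1, 6)
    }
    if (copy.children) copy.children = demoteHeadings(copy.children)
    return copy
  })
}

const original: TNode[] = [{ type: 'heading', level: 1, children: [{ type: 'text' }] }]
const demoted = demoteHeadings(original)
// demoted[0].level is 2, original[0].level is still 1
```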

The AST format is `MarkdownToJSX.AST[]`. When footnotes are present, the returned value will be an object with `ast` and `footnotes` properties instead of just the AST array.
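Because the return shape depends on whether the document contains footnotes, callers will probably want to normalize it once. A sketch under stated assumptions: `AstNode`, `Footnote` and `normalizeAstResult` are hypothetical names, and the footnote fields (`identifier`, `footnote`) are a guess; check the exported `MarkdownToJSX` types for the real shapes.

```typescript
// Hypothetical stand-ins for the real MarkdownToJSX types.
type AstNode = { type: string; children?: AstNode[] }
type Footnote = { identifier: string; footnote: string }
type AstResult = AstNode[] | { ast: AstNode[]; footnotes: Footnote[] }

// Always hand back the node array plus whatever footnotes came with it.
function normalizeAstResult(result: AstResult): { ast: AstNode[]; footnotes: Footnote[] } {
  if (Array.isArray(result)) {
    // No footnotes: the compiler returned the bare array.
    return { ast: result, footnotes: [] }
  }
  return { ast: result.ast, footnotes: result.footnotes }
}

const plain = normalizeAstResult([{ type: 'heading' }])
const withNotes = normalizeAstResult({
  ast: [{ type: 'paragraph' }],
  footnotes: [{ identifier: '1', footnote: 'A note' }],
})
```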

Performance Ideas.md

Lines changed: 70 additions & 4 deletions
@@ -327,10 +327,15 @@ state.prevCapture = captureParts.join('')
- ✅ One allocation instead of many
- ❌ Array overhead
- ❌ Extra joins
- **Doesn't work with recursive parsing**

**Complexity**: Medium - refactor concatenation points
**Expected Gain**: 10-15% reduction

**Attempted**: Failed because nested parsing relies on incremental updates to `state.prevCapture`. The buffer approach doesn't update that state during nested parsing, which breaks the lookback checks used by list items and other nested rules.

**Status**: ❌ Not viable due to recursive parsing requirements

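To make the failure concrete, here is a toy model where a nested rule decides something by looking back at `state.prevCapture`. All names (`ParseState`, `consumeIncremental`, `consumeBuffered`, `flush`, `lookbackEndsWithNewline`) are hypothetical, not markdown-to-jsx internals, and the `endsWith('\n')` check only stands in for whatever lookback the real list rule performs. With incremental updates the lookback sees the preceding text; with a buffer that is joined only at the end, it sees stale state.

```typescript
// Toy parser state: the real state object has many more fields.
interface ParseState {
  prevCapture: string
  buffer: string[]
}

// Current approach: every capture is appended to the state immediately.
function consumeIncremental(state: ParseState, capture: string): void {
  state.prevCapture += capture
}

// Idea F: collect captures and join them once at the end.
function consumeBuffered(state: ParseState, capture: string): void {
  state.buffer.push(capture)
}

function flush(state: ParseState): void {
  state.prevCapture += state.buffer.join('')
  state.buffer = []
}

// A nested rule (for example a list item) looks back at what came before it.
function lookbackEndsWithNewline(state: ParseState): boolean {
  return state.prevCapture.endsWith('\n')
}

const captures = ['Intro paragraph', '\n']

const incremental: ParseState = { prevCapture: '', buffer: [] }
captures.forEach((c) => consumeIncremental(incremental, c))
const seenIncremental = lookbackEndsWithNewline(incremental) // true

const buffered: ParseState = { prevCapture: '', buffer: [] }
captures.forEach((c) => consumeBuffered(buffered, c))
// The nested rule runs here, before the outer parse has flushed its buffer.
const seenBuffered = lookbackEndsWithNewline(buffered) // false: stale state
flush(buffered)
```

Flushing after every capture would restore correct lookback, but that reintroduces exactly the per-capture concatenation the idea was trying to remove.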

---

#### Idea G: Avoid String Operations Entirely Where Possible
@@ -1047,10 +1052,71 @@ The performance gap is from **fundamental architectural differences**:

### Potential Improvements (Without Full Rewrite)

Removed in this commit (old lines 1050-1053):

1. **Token Abstraction**: Parse to intermediate token format, render separately
2. **Lazy JSX Creation**: Generate JSX only at render time
3. **Memoization**: Cache token trees, regenerate JSX from tokens
4. **String Reduction**: Use token positions instead of substring calls

Replaced by (new lines 1055-1059):

1. **AST Exposure**: Parse to AST and expose it (implemented as the new `ast` option)
2. **Profile React Rendering**: Measure time spent in `React.createElement`
3. **Optimize Rendering**: Reduce JSX creation overhead
4. **Memoization**: Cache rendered JSX for repeated ASTs
5. ~~**String Reduction**: Use token positions instead of substring calls~~ (❌ Attempted: incompatible with recursive parsing)
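Item 4 could start as a plain identity cache. A sketch, assuming ASTs are treated as immutable once parsed; `memoizeRender` is a hypothetical helper and `renderToString` is a stand-in for the real renderer, not part of markdown-to-jsx.

```typescript
// Cache render output per AST object. WeakMap keys are held weakly, so a
// cached result is released once its AST is garbage collected.
function memoizeRender<A extends object, R>(render: (ast: A) => R): (ast: A) => R {
  const cache = new WeakMap<A, R>()
  return (ast: A): R => {
    if (cache.has(ast)) {
      return cache.get(ast) as R
    }
    const rendered = render(ast)
    cache.set(ast, rendered)
    return rendered
  }
}

// Stand-in renderer that counts how often it actually runs.
type AstNode = { type: string; text?: string }
let renderCalls = 0
const renderToString = (nodes: AstNode[]): string => {
  renderCalls++
  return nodes.map((n) => n.text ?? '').join('')
}

const cachedRender = memoizeRender(renderToString)
const ast: AstNode[] = [{ type: 'text', text: 'Hello' }]
const first = cachedRender(ast)
const second = cachedRender(ast) // served from the cache; renderCalls stays at 1
```

Keying on object identity keeps lookups cheap (no hashing of the tree), but an AST mutated in place would return stale output, so transformations should produce new arrays.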

### Next Steps for Performance:

**Phase 1: Measure rendering performance**

- Profile the `render()` function separately
- Measure `React.createElement` overhead
- Determine how total time splits between parsing and rendering

**Phase 2: Optimize based on findings**

- If rendering is the bottleneck: optimize JSX creation
- If parsing is the bottleneck: continue with parsing optimizations (the main ones are already done)
- If memory is the bottleneck: reduce allocations during rendering

## Next: Profile React Rendering Performance

We've focused heavily on parsing performance, but **we haven't profiled the React rendering phase yet**.

### Questions to Answer:

1. **How much time is spent in `React.createElement()`?**

   - Each AST node creates JSX elements
   - For a 27KB document, there may be hundreds of `React.createElement` calls
   - Is this a bottleneck?

2. **What's the overhead of JSX creation?**

   - Every AST node → JSX element conversion
   - Component overhead vs plain objects
   - React reconciliation preparation

3. **Can we optimize rendering?**

   - Memoization opportunities
   - Reducing object allocations during rendering
   - Optimizing the render function

### How to Profile Rendering:
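
```javascript
import { compiler } from 'markdown-to-jsx'

const markdown = '# Hello world' // substitute the benchmark document

// Profile parsing separately from rendering
const p0 = performance.now()
const ast = compiler(markdown, { ast: true })
const p1 = performance.now()

// Measure rendering time. `renderAST` is a placeholder: rendering from the
// AST will go through createRenderer, which is not implemented yet.
const t0 = performance.now()
const jsx = renderAST(ast)
const t1 = performance.now()

console.log('Parse time:', p1 - p0, 'ms')
console.log('Render time:', t1 - t0, 'ms')
```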
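A single `performance.now()` pair is noisy at millisecond timescales, so the parse/render split is probably better measured as a median over repeated runs after a warmup. A sketch with stand-in `toyParse`/`toyRender` functions; `medianMs` and `profileSplit` are hypothetical helpers, and in practice the two callbacks would wrap `compiler(markdown, { ast: true })` and whatever renderer createRenderer ends up providing.

```typescript
// Median wall-clock time of fn over several runs, after a few warmup runs
// so that JIT compilation is less likely to skew the samples.
function medianMs(fn: () => void, iterations = 51, warmup = 5): number {
  for (let i = 0; i < warmup; i++) fn()
  const samples: number[] = []
  for (let i = 0; i < iterations; i++) {
    const start = performance.now()
    fn()
    samples.push(performance.now() - start)
  }
  samples.sort((a, b) => a - b)
  return samples[Math.floor(samples.length / 2)]
}

function profileSplit<T>(parse: () => T, render: (ast: T) => unknown) {
  const ast = parse() // parse once so the render timing excludes parsing
  const parseMs = medianMs(parse)
  const renderMs = medianMs(() => render(ast))
  const total = parseMs + renderMs
  return { parseMs, renderMs, renderShare: total > 0 ? renderMs / total : 0 }
}

// Toy stand-ins so the sketch runs on its own.
const doc = '# Title\n\n' + 'Some *text* here.\n'.repeat(200)
const toyParse = () => doc.split('\n').filter((line) => line.length > 0)
const toyRender = (lines: string[]) => lines.map((line) => ({ tag: 'p', line }))

const split = profileSplit(toyParse, toyRender)
console.log(
  `parse ${split.parseMs.toFixed(3)} ms, render ${split.renderMs.toFixed(3)} ms, ` +
    `render share ${(split.renderShare * 100).toFixed(1)}%`,
)
```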

We need to:

1. Add profiling hooks to the `render()` function
2. Measure time spent in `React.createElement`
3. Measure memory allocations during rendering
4. Compare parsing time vs rendering time

**Hypothesis**: React rendering might be a significant portion of the 14ms total time.
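For steps 1 and 2, one low-touch approach is to wrap the element factory and count calls and time spent. markdown-to-jsx's `options.createElement` should allow a wrapper like this to be passed in; the simplified `CreateElement` signature, `instrumentCreateElement`, and the plain-object `toyCreateElement` stand-in are assumptions for this sketch.

```typescript
// Simplified createElement-style signature, assumed for this sketch.
type CreateElement = (type: unknown, props: unknown, ...children: unknown[]) => unknown

function instrumentCreateElement(base: CreateElement) {
  const stats = { calls: 0, totalMs: 0 }
  const wrapped: CreateElement = (type, props, ...children) => {
    const start = performance.now()
    try {
      return base(type, props, ...children)
    } finally {
      // Runs even if base throws, so every call is counted.
      stats.calls++
      stats.totalMs += performance.now() - start
    }
  }
  return { wrapped, stats }
}

// Stand-in factory that builds plain objects, so no React import is needed.
const toyCreateElement: CreateElement = (type, props, ...children) => ({ type, props, children })

const { wrapped, stats } = instrumentCreateElement(toyCreateElement)
wrapped('h1', null, 'Hello')
wrapped('p', { key: 1 }, wrapped('em', null, 'world'))
console.log(`${stats.calls} createElement calls, ${stats.totalMs.toFixed(3)} ms`)
```

Child elements are created before their parent's call starts, so time is not double-counted; it also only covers the factory itself, not React's later reconciliation and commit work.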

### Should We Rewrite?

README.md

Lines changed: 27 additions & 0 deletions
@@ -25,6 +25,7 @@ The most lightweight, customizable React markdown component.
- [options.namedCodesToUnicode](#optionsnamedcodestounicode)
- [options.disableAutoLink](#optionsdisableautolink)
- [options.disableParsingRawHTML](#optionsdisableparsingrawhtml)
- [options.ast](#optionsast)
- [Syntax highlighting](#syntax-highlighting)
- [Handling shortcodes](#handling-shortcodes)
- [Getting the smallest possible bundle size](#getting-the-smallest-possible-bundle-size)
@@ -573,6 +574,32 @@ compiler('This text has <span>html</span> in it but it won't be rendered', { dis
<span>This text has &lt;span&gt;html&lt;/span&gt; in it but it won't be rendered</span>
```

#### options.ast

When `ast: true`, the compiler returns the parsed AST structure instead of rendered JSX. **This is the first time the AST is accessible to users!**

```tsx
import { compiler } from 'markdown-to-jsx'
import type { MarkdownToJSX } from 'markdown-to-jsx'

// Get the AST directly
const ast: MarkdownToJSX.AST[] = compiler('# Hello world', { ast: true })

// Inspect the parsed nodes and their types
console.log(ast)

// You can manipulate, transform, or analyze the AST before rendering
```

The AST format is `MarkdownToJSX.AST[]` and enables:

- AST manipulation and transformation
- Custom rendering logic without re-parsing
- Caching parsed AST for performance
- Linting or validation of markdown structure

When footnotes are present, the returned value will be an object with `ast` and `footnotes` properties instead of just the AST array.
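As an example of the linting use case, the sketch below walks the tree depth-first and flags headings that skip a level. The `LintNode` shape, the `'heading'` type string and `findSkippedHeadings` are assumptions for illustration; the real `MarkdownToJSX.AST` nodes use the library's own type values, so adjust the check before relying on it.

```typescript
// Simplified node shape assumed for this sketch.
interface LintNode {
  type: string
  level?: number
  children?: LintNode[]
}

// Report headings that skip a level, e.g. an h1 followed directly by an h3.
function findSkippedHeadings(nodes: LintNode[]): string[] {
  const problems: string[] = []
  let previousLevel = 0

  const visit = (list: LintNode[]): void => {
    for (const node of list) {
      if (node.type === 'heading' && node.level !== undefined) {
        if (previousLevel > 0 && node.level > previousLevel + 1) {
          problems.push(`h${previousLevel} is followed by h${node.level}`)
        }
        previousLevel = node.level
      }
      if (node.children) visit(node.children) // depth-first, in document order
    }
  }

  visit(nodes)
  return problems
}

const report = findSkippedHeadings([
  { type: 'heading', level: 1, children: [{ type: 'text' }] },
  { type: 'paragraph', children: [{ type: 'text' }] },
  { type: 'heading', level: 3 },
])
console.log(report) // [ 'h1 is followed by h3' ]
```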

### Syntax highlighting

When using [fenced code blocks](https://www.markdownguide.org/extended-syntax/#syntax-highlighting) with language annotation, that language will be added to the `<code>` element as `class="lang-${language}"`. For best results, you can use `options.overrides` to provide an appropriate syntax highlighting integration like this one using `highlight.js`:
