Merged
186 changes: 186 additions & 0 deletions .claude/context/MDXish/Processor Overview.md
@@ -0,0 +1,186 @@
# MDXish Processor Overview

## `mdxishAstProcessor()`

### Preprocessing Step

> **See**: @lib/mdxish.ts#92-103

`preprocessContent` is a string-level preprocessor that runs before the markdown is handed to remarkParse. It exists because several syntactic patterns in ReadMe's flavor of markdown would confuse or break the standard CommonMark/MDX parser if fed to it directly. By patching the raw string first, these issues are sidestepped.

It applies four transforms in sequence:

1. **`normalizeTableSeparator()`**

   Fixes malformed GFM table separator rows, e.g. misplaced alignment colons like `|: ---` → `| :---`. Without this, remarkGfm would fail to recognize the table.

2. **`terminateHtmlFlowBlocks()`**

   Inserts blank lines after standalone HTML elements (like `<div>...</div>`) when the next line is regular markdown. CommonMark's HTML flow rules only terminate on blank lines, so without this, the parser would swallow subsequent markdown content into the HTML block token.

3. **`preprocessJSXExpressions()`** (skipped in safeMode)

   Handles JSX attribute expressions (`href={someVar}`) and unbalanced braces before the MDX expression tokenizer sees them. It evaluates attribute expressions against jsxContext, converts style objects to CSS strings, and escapes stray braces that would cause MDX parse errors.

4. **`processSnakeCaseComponent()`**

   Remark's parser rejects tag names containing underscores (e.g. `<my_component>`). This step replaces known snake_case component names with safe placeholder names (`<MDXishSnakeCase0>`) and returns a mapping so they can be restored later by the `restoreSnakeCaseComponentNames` transformer in the run phase.
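
The placeholder swap in the last step can be sketched roughly as follows. This is a minimal illustration and not the code in `lib/mdxish.ts`: the names `swapSnakeCaseTags` and `restoreSnakeCaseName`, the tag-matching regex, and the shape of the returned mapping are all assumptions made for the example.

```typescript
// Hypothetical sketch of a snake_case placeholder swap; not the real implementation.
type SnakeCaseMapping = Record<string, string>; // placeholder -> original name

function swapSnakeCaseTags(
  content: string,
  knownComponents: Set<string>,
): { content: string; mapping: SnakeCaseMapping } {
  const mapping: SnakeCaseMapping = {};
  const assigned = new Map<string, string>(); // original name -> placeholder

  // Matches the name of an opening or closing tag that contains an underscore.
  const tagName = /<(\/?)([A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)+)(?=[\s/>])/g;

  const swapped = content.replace(tagName, (match: string, slash: string, name: string) => {
    if (!knownComponents.has(name)) return match; // leave unknown tags untouched
    let placeholder = assigned.get(name);
    if (!placeholder) {
      placeholder = `MDXishSnakeCase${assigned.size}`;
      assigned.set(name, placeholder);
      mapping[placeholder] = name;
    }
    return `<${slash}${placeholder}`;
  });

  return { content: swapped, mapping };
}

// Later (in the real pipeline, on AST nodes during the run phase) the mapping is used to undo the swap.
function restoreSnakeCaseName(name: string, mapping: SnakeCaseMapping): string {
  return mapping[name] ?? name;
}
```

Under these assumptions, `<my_component>hi</my_component>` would reach the parser as `<MDXishSnakeCase0>hi</MDXishSnakeCase0>`, and the mapping lets a later step recover the original name.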

##### Where it sits in the flow

> **Review comment (Member):** Something I've been thinking about lately, and not something you need to address, is that Jest shipped a change recently to reduce their output when it's run by an LLM in jestjs/jest@3f17932. I wonder what impact the whitespace and ASCII tables within these Claude files have on our token usage, and if rewriting them as Mermaid diagrams would cut that down at all.

```
preprocessContent (string → string)
┌─────────────────────────────────────────────────────┐
│ normalizeTableSeparator — fix table syntax │
│ terminateHtmlFlowBlocks — fix HTML flow │
│ preprocessJSXExpressions — eval/escape JSX │ before
│ processSnakeCaseComponent — placeholder swap │ parsing
└─────────────────────────────┬───────────────────────┘
remarkParse (tokenize)
MDAST transformers...
...
restoreSnakeCaseComponentNames ◄── undo (4)
```
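
The ordering in the diagram amounts to folding the raw string through a list of string transforms, with one step skipped in safe mode. The sketch below illustrates that shape only; `Step`, `runPreprocessSteps`, and both step bodies are hypothetical, simplified stand-ins rather than the real functions.

```typescript
// Hypothetical sketch of the preprocessing chain; the step bodies are
// simplified stand-ins, not the real implementations in lib/mdxish.ts.
type StringTransform = (content: string) => string;

interface Step {
  name: string;
  run: StringTransform;
  skipInSafeMode?: boolean;
}

// Stand-in for normalizeTableSeparator: moves a misplaced alignment colon
// (`|: ---` becomes `| :---`) on rows made up only of pipes, colons, dashes, and spaces.
const normalizeSeparatorRows: StringTransform = content =>
  content
    .split('\n')
    .map(line =>
      /^[\s|:-]+$/.test(line) && line.includes('-')
        ? line.replace(/\|:\s+(-+)/g, '| :$1')
        : line,
    )
    .join('\n');

// Stand-in for the brace-escaping part of preprocessJSXExpressions: escapes
// an opening brace that has no closing brace later on the same line.
const escapeUnclosedBraces: StringTransform = content =>
  content.replace(/\{(?![^\n}]*\})/g, '\\{');

const steps: Step[] = [
  { name: 'normalizeTableSeparator', run: normalizeSeparatorRows },
  { name: 'preprocessJSXExpressions', run: escapeUnclosedBraces, skipInSafeMode: true },
];

// Applies each step in order, dropping safe-mode-sensitive steps when requested.
function runPreprocessSteps(content: string, safeMode = false): string {
  return steps
    .filter(step => !(safeMode && step.skipInSafeMode))
    .reduce((current, step) => step.run(current), content);
}
```

Keeping the steps in a plain ordered list makes the sequencing explicit, which matters here because each transform sees the output of the one before it.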

### Processor Pipeline

> **See**: @lib/mdxish.ts#105-178

This is the core Xish engine: it parses Markdown and converts it to an MDAST object. It is the base processor used for both the editor and rendering flows.

```
| ................ process (parse only) ...................... |
| .. parse ........... | .............. run .................. |

NO COMPILER
+--------+ +----------+ (MDAST is
Input ->- | Parser | ->- Syntax Tree ->- | N/A | returned
+--------+ | +----------+ directly)
| |
| X
| |
| +--------------+
| | Transformers |
| +--------------+
| |
┌────────────┘ ┌───┴──────────────────────────────┐
│ │ │
│ PARSER │ MDAST TRANSFORMERS │
│ (micromark) │ (remark plugins) │
│ │ │
│ remarkParse │ remarkFrontmatter │
│ + extensions: │ normalizeEmphasisAST │
│ · magicBlock │ magicBlockTransformer │
│ · legacyVariable │ imageTransformer │
│ · looseHtmlEntity │ defaultTransformers │
│ · mdxExprTextOnly │ (callouts, codeTabs, │
│ │ gemoji, embeds) │
│ + fromMarkdown: │ mdxishComponentBlocks │
│ · magicBlock │ restoreSnakeCaseComponentNames │
│ · legacyVariable │ mdxishTables │
│ · emptyTaskList… │ mdxishHtmlBlocks │
│ · looseHtmlEntity │ mdxishJsxToMdast? │
│ · mdxExpression… │ variablesTextTransformer │
│ │ tailwindTransformer? │
│ │ remarkGfm │
│ │ │
└───────────────────────┴──────────────────────────────────┘
```

## `mdxish()`

### Preprocessing Step

> **See**: @lib/mdxish.ts#209-212

These three calls implement a protect-strip-restore pattern that removes JSX comments (`{/* ... */}`) from the markdown before anything else processes it. Step by step:

1. **`protectCodeBlocks(mdContent)`**

   Replaces fenced code blocks and inline code with placeholder tokens (`___CODE_BLOCK_0___`, `___INLINE_CODE_0___`), stashing the originals in arrays. This prevents the next step from stripping things that look like JSX comments but are actually inside code.

2. **`removeJSXComments(protectedContent)`**

   Strips all JSX comment expressions from the (now code-protected) string via a single regex. With code blocks safely out of the way, this only hits actual JSX comments in prose/component markup.

3. **`restoreCodeBlocks(withoutComments, protectedCode)`**

   Swaps the placeholder tokens back to their original code content, yielding the final `contentWithoutComments` string.
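
The three steps above can be sketched as follows. The function names `protectCode`, `stripJsxComments`, and `restoreCode`, the `ProtectedCode` shape, and all of the regexes are assumptions for illustration; only the placeholder token formats come from the description above.

```typescript
// Hypothetical sketch of the protect-strip-restore pattern; not the real implementation.
interface ProtectedCode {
  codeBlocks: string[];
  inlineCode: string[];
}

// Step 1: stash fenced blocks first, then inline code, leaving placeholder tokens behind.
function protectCode(content: string): { protectedContent: string; protectedCode: ProtectedCode } {
  const codeBlocks: string[] = [];
  const inlineCode: string[] = [];

  const withoutFences = content.replace(/```[\s\S]*?```/g, match => {
    codeBlocks.push(match);
    return `___CODE_BLOCK_${codeBlocks.length - 1}___`;
  });

  const protectedContent = withoutFences.replace(/`[^`\n]+`/g, match => {
    inlineCode.push(match);
    return `___INLINE_CODE_${inlineCode.length - 1}___`;
  });

  return { protectedContent, protectedCode: { codeBlocks, inlineCode } };
}

// Step 2: remove `{/* ... */}` expressions from whatever is left.
function stripJsxComments(content: string): string {
  return content.replace(/\{\s*\/\*[\s\S]*?\*\/\s*\}/g, '');
}

// Step 3: swap the placeholders back for the stashed originals.
function restoreCode(content: string, { codeBlocks, inlineCode }: ProtectedCode): string {
  return content
    .replace(/___INLINE_CODE_(\d+)___/g, (match: string, i: string) => inlineCode[Number(i)] ?? match)
    .replace(/___CODE_BLOCK_(\d+)___/g, (match: string, i: string) => codeBlocks[Number(i)] ?? match);
}

function removeCommentsOutsideCode(mdContent: string): string {
  const { protectedContent, protectedCode } = protectCode(mdContent);
  return restoreCode(stripJsxComments(protectedContent), protectedCode);
}
```

In this sketch a comment in prose is removed, while the same text inside inline code or a fenced block survives because it was swapped out before the comment regex ran.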

##### Why it's necessary

JSX comments are valid in MDX but have no meaning in the rendered output. If left in, they'd be parsed by the MDX expression tokenizer (the `mdxExprTextOnly` micromark extension) as expression nodes and could appear as literal text or cause parse errors. Stripping them at the string level — before `mdxishAstProcessor` and `preprocessContent` even run — is the simplest way to ensure they're gone.

##### Where it sits in the flow

This runs in `mdxish()` before calling `mdxishAstProcessor`, making it the very first string-level transform — even before `preprocessContent`:

```
mdContent (raw input)
┌──────────────────────────────┐
│ protectCodeBlocks │ ◄── lines 209-212
│ removeJSXComments │ (in mdxish())
│ restoreCodeBlocks │
└──────────────┬───────────────┘
│ contentWithoutComments
┌──────────────────────────────┐
│ preprocessContent │ ◄── inside mdxishAstProcessor()
│ normalizeTableSeparator │
│ terminateHtmlFlowBlocks │
│ preprocessJSXExpressions │
│ processSnakeCaseComponent │
└──────────────┬───────────────┘
│ parserReadyContent
remarkParse → transformers → ...
```

### Processor Pipeline

> **See**: @lib/mdxish.ts#214-239

```
| ................ process (parse + run only) ................. |
| .. parse ........... | .............. run ................... |

NO COMPILER
+--------+ +----------+ (HAST obj
Input ->- | Parser | ->- Syntax Tree ->- | N/A | returned
+--------+ | +----------+ directly)
| X
| |
| +--------------+
| | Transformers |
| +--------------+
| |
| |
┌────────────┘ ┌───┴───────────────────────────────────────────┐
│ │ │
│ PARSER │ MDAST TRANSFORMERS HAST XFORMERS │
│ (micromark) │ (remark plugins) (rehype) │
│ │ │
│ remarkParse │ remarkFrontmatter preserveBool… │
│ + extensions: │ normalizeEmphasisAST rehypeRaw │
│ · magicBlock │ magicBlockTransformer restoreBool… │
│ · legacyVariable │ imageTransformer rehypeFlatten… │
│ · looseHtmlEntity │ defaultTransformers mdxishMermaid… │
│ · mdxExprTextOnly │ (callouts, codeTabs, generateSlug… │
│ │ gemoji, embeds) rehypeMdxish… │
│ + fromMarkdown: │ mdxishComponentBlocks │
│ · magicBlock │ restoreSnakeCase… ▲ │
│ · legacyVariable │ mdxishTables │ │
│ · emptyTaskList… │ mdxishHtmlBlocks │ │
│ · looseHtmlEntity │ mdxishJsxToMdast? │ bridge: │
│ · mdxExpression… │ variablesTextTransformer │ remarkRehype │
│ │ tailwindTransformer? │ (MDAST → HAST) │
│ │ remarkGfm │ │
│ │ evaluateExpressions? │ │
│ │ remarkBreaks │ │
│ │ variablesCodeResolver ───┘ │
│ │ │
└───────────────────────┴───────────────────────────────────────────────┘
```
134 changes: 134 additions & 0 deletions .claude/context/MDXish/Supported Syntax.md
@@ -0,0 +1,134 @@
# MDXish Supported Syntax

## Custom Blocks

### Code Tabs

Tabbed interface for multiple code blocks - written as immediately consecutive standard code blocks (i.e. **without** any line breaks between them).

```js Title One
console.log('Tab One');
```
```js Title Two
console.log('Tab Two');
```

### Callouts

Blockquotes that start with an emoji are rendered as callouts; the emoji determines the theme:

> 👍 Success
>
> Your success message here

#### Supported Themes

- **Info**: 📘 or ℹ️ (blue)
- **Success**: 👍 or ✅ (green)
- **Warning**: 🚧 or ⚠️ (orange)
- **Error**: ❗️ or 🛑 (red)
- **Default**: any other emoji (gray)
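
The theme list above is effectively a lookup table with a fallback. A minimal sketch, assuming lowercase theme identifiers and a hypothetical `calloutTheme` helper (neither is taken from the engine itself):

```typescript
// Hypothetical sketch of the emoji-to-theme lookup; theme identifiers are assumed.
type CalloutTheme = 'info' | 'success' | 'warning' | 'error' | 'default';

// Keys are stored without the U+FE0F variation selector.
const THEME_BY_EMOJI: Record<string, CalloutTheme> = {
  '\u{1F4D8}': 'info', // blue book
  '\u2139': 'info', // information sign
  '\u{1F44D}': 'success', // thumbs up
  '\u2705': 'success', // check mark
  '\u{1F6A7}': 'warning', // construction sign
  '\u26A0': 'warning', // warning sign
  '\u2757': 'error', // exclamation mark
  '\u{1F6D1}': 'error', // stop sign
};

function calloutTheme(emoji: string): CalloutTheme {
  // Strip the variation selector so both presentation forms resolve the same way.
  const key = emoji.replace(/\uFE0F/g, '');
  return THEME_BY_EMOJI[key] ?? 'default';
}
```

Normalizing away the variation selector is a design choice for the sketch: several of the listed emoji are commonly typed both with and without it.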

### Embeds

Simple markdown link with `@embed` title:

[Embed Title](https://youtu.be/example "@embed")

## Data Replacement Syntaxes

### User Variables

Double angle-bracket notation for JWT login variables:

Hi, my name is **<<name>>**!

### Glossary Terms

Double angle-brackets with `glossary:` prefix:

**<<glossary:exogenous>>** and **<<glossary:endogenous>>**
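
Both double angle-bracket forms can be recognized with one pattern. The sketch below is an illustration only: `findDataReplacements`, the `Replacement` type, and the regex are hypothetical, and it assumes names contain no whitespace or angle brackets.

```typescript
// Hypothetical sketch for spotting <<name>> and <<glossary:term>> in text.
interface Replacement {
  kind: 'variable' | 'glossary';
  name: string;
}

function findDataReplacements(text: string): Replacement[] {
  // Group 1 is the optional `glossary:` prefix, group 2 the variable or term name.
  const pattern = /<<(glossary:)?([^<>\s]+)>>/g;
  const found: Replacement[] = [];

  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    found.push({ kind: match[1] ? 'glossary' : 'variable', name: match[2] });
  }
  return found;
}
```

Run against the two examples above, this would report `name` as a user variable and `exogenous` and `endogenous` as glossary terms.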

## MDX Syntax

A subset of MDX syntax is supported.

### Custom Components

You can embed React components or reusable Markdown snippets in a document using JSX elements:

<MyComponent prop="value" />

### Logical Expressions

Simple logic is also supported using the JSX-style curly brace syntax:

{(4 * 3) / 2} of 1, a half dozen of another.

This expression syntax can also be used as an alternative for user variables:

Hi, my name is **{user.name}**!

## Standard Markdown Extensions

Full **CommonMark** and **GitHub-flavored Markdown** support, including:

### Emoji Shortcodes

GitHub-style emoji codes:

:sparkles:

### Tables

GFM-style tables with alignment support:

| Left | Center | Right |
|:-----|:--------:|------:|
| L0 | **bold** | $1600 |

### Lists

Standard bulleted (`-` or `*`) and numbered lists (`1.`, `2.`, etc.) are supported, as well as GFM-style checklists:

```md
- [x] finished item
- [ ] unfinished item
```

### Headings

Standard Markdown heading syntaxes (`#` prefixes) are supported, as well as compact and ATX-wrapped variations:

##Compact Heading without a space

## ATX-Style Wrapped Heading ##

Underline notation (using `=` or `-`) is also supported for first- and second-level headings, respectively.

## Legacy Magic Blocks

The engine also supports the legacy JSON-based "magic block" syntax for backwards compatibility.

[block:api-header]
{
"title": "Section Title"
}
[/block]

This is a legacy format that should be transpiled to newer ReadMe-flavored syntax. Supported magic blocks include:

| Feature | Magic Block Name |
|---------------|------------------------|
| Heading | `[block:api-header]` |
| Callout | `[block:callout]` |
| Embed | `[block:embed]` |
| Custom HTML | `[block:html]` |
| Image | `[block:image]` |
| Table | `[block:parameters]` |
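
Structurally, a magic block is a type name plus a JSON payload between `[block:...]` and `[/block]` markers. The sketch below shows one way to pull those apart; `parseMagicBlocks`, the `MagicBlock` shape, and the regex are assumptions, and the engine's real tokenizer (the `magicBlock` micromark extension) works differently.

```typescript
// Hypothetical sketch of extracting legacy magic blocks from a string.
interface MagicBlock {
  type: string;
  data: unknown;
}

function parseMagicBlocks(content: string): MagicBlock[] {
  // Group 1 is the block type, group 2 the raw JSON payload.
  const pattern = /\[block:([a-z-]+)\]\s*([\s\S]*?)\s*\[\/block\]/g;
  const blocks: MagicBlock[] = [];

  let match: RegExpExecArray | null;
  while ((match = pattern.exec(content)) !== null) {
    try {
      blocks.push({ type: match[1], data: JSON.parse(match[2]) });
    } catch {
      // Skip blocks whose payload is not valid JSON.
    }
  }
  return blocks;
}
```

For the `api-header` example above, this would yield one block of type `api-header` whose data has a `title` of `"Section Title"`.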

## Additional Features

- **Auto-generated heading anchors** with incremental IDs for duplicate headings
- **Table of Contents generation** from markup
- **Custom `doc:` and `ref:` protocols** for internal documentation links
- **Both JSX and HTML comments** for non-rendered notes and annotations
16 changes: 16 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,16 @@
# `@readme/markdown`

This repo contains two Markdown processing engines, both built on top of Unified.js + Remark.

## RMDX

A strict MDX processor written on top of Unified.js + Remark. RMDX handles standard Markdown + GFM, as well as ReadMe's flavored custom syntax. Because it is an MDX-first processor, all RMDX-processed docs must adhere to strict JSX syntax rules. (See @lib/mdx.ts)

## MDXish (aka Xish)

The Xish processor supports standard Markdown with GFM extensions, as well as ReadMe's flavored syntax. This engine also supports a subset of MDX functionality (specifically custom components and logical expressions) without requiring strict JSX compliance. (See @lib/mdxish.ts)

### Further Context

- @.claude/context/MDXish/Processor Overview.md
- @.claude/context/MDXish/Supported Syntax.md