docs(xish): add CLAUDE.md + processor flow overview docs #1370

Commits: cc9be60, 1f6b549, c747bb9, 9b9ed52

@@ -0,0 +1,186 @@

# MDXish Processor Overview

## `mdxishAstProcessor()`

### Preprocessing Step

> **See**: @lib/mdxish.ts#92-103

`preprocessContent` is a string-level preprocessor that runs before the Markdown is handed to `remarkParse`. It exists because several syntactic patterns in ReadMe's flavor of Markdown would confuse or break the standard CommonMark/MDX parser if fed to it directly. Patching the raw string first sidesteps these issues.

It applies four transforms in sequence:

1. **`normalizeTableSeparator()`**

   Fixes malformed GFM table separator rows, e.g. misplaced alignment colons like `|: ---` → `| :---`. Without this, `remarkGfm` would fail to recognize the table.

1. **`terminateHtmlFlowBlocks()`**

   Inserts blank lines after standalone HTML elements (like `<div>...</div>`) when the next line is regular Markdown. CommonMark's HTML flow rules only terminate on blank lines, so without this the parser would swallow subsequent Markdown content into the HTML block token.

1. **`preprocessJSXExpressions()`** (skipped in `safeMode`)

   Handles JSX attribute expressions (`href={someVar}`) and unbalanced braces before the MDX expression tokenizer sees them. It evaluates attribute expressions against `jsxContext`, converts style objects to CSS strings, and escapes stray braces that would cause MDX parse errors.

1. **`processSnakeCaseComponent()`**

   Remark's parser rejects tag names containing underscores (e.g. `<my_component>`). This step replaces known snake_case component names with safe placeholder names (`<MDXishSnakeCase0>`) and returns a mapping so they can be restored later by the `restoreSnakeCaseComponentNames` transformer in the run phase.

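As an illustration of step 1, here is a minimal TypeScript sketch of separator-row normalization. It is not the library's `normalizeTableSeparator()`: the names `normalizeSeparatorRow` and `normalizeTableSeparators` are hypothetical, and the sketch assumes the only defect is stray whitespace between the alignment colons and the dashes. It also ignores fenced code blocks, which a real implementation would need to skip.

```typescript
// Hypothetical sketch, not the library's normalizeTableSeparator().
// One separator cell, tolerating stray spaces between colons and dashes
// (e.g. ": ---" or "--- :").
const SEPARATOR_CELL = /^(:?)\s*(-+)\s*(:?)$/;

function normalizeSeparatorRow(line: string): string {
  // Separator rows contain a pipe; this also keeps "---" thematic breaks intact.
  if (!line.includes("|")) return line;

  let inner = line.trim();
  if (inner.startsWith("|")) inner = inner.slice(1);
  if (inner.endsWith("|")) inner = inner.slice(0, -1);

  const matches = inner.split("|").map((cell) => SEPARATOR_CELL.exec(cell.trim()));
  // Only rewrite rows in which every cell is a (possibly malformed) separator.
  if (matches.some((match) => match === null)) return line;

  const cells = matches.map((match) => {
    const [, left, dashes, right] = match as RegExpExecArray;
    return `${left}${dashes}${right}`;
  });
  return `| ${cells.join(" | ")} |`;
}

function normalizeTableSeparators(markdown: string): string {
  return markdown.split("\n").map(normalizeSeparatorRow).join("\n");
}
```

Content rows and lines without a pipe are returned unchanged, so the function can be mapped over every line of a document.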
##### Where it sits in the flow

> **Review comment (Member):** Something I've been thinking about lately, and not something you need to address: Jest recently shipped a change to reduce its output when it's run by an LLM (jestjs/jest@3f17932). I wonder what impact the whitespace and ASCII tables in these Claude files have on our token usage, and whether rewriting them as Mermaid diagrams would cut that down at all.

```
             preprocessContent (string → string)
                              │
                              ▼
┌─────────────────────────────────────────────────────┐
│ normalizeTableSeparator   - fix table syntax        │
│ terminateHtmlFlowBlocks   - fix HTML flow           │
│ preprocessJSXExpressions  - eval/escape JSX         │  before
│ processSnakeCaseComponent - placeholder swap        │  parsing
└─────────────────────────────┬───────────────────────┘
                              │
                              ▼
                   remarkParse (tokenize)
                              │
                              ▼
                    MDAST transformers...
                             ...
               restoreSnakeCaseComponentNames  ◄── undo (4)
```

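The step (4) / undo (4) pairing in the diagram can be sketched as a round trip: swap snake_case tag names for placeholders before parsing, then walk the parsed tree and restore them. This is a hypothetical TypeScript sketch; `swapSnakeCaseTags`, `restoreSnakeCaseNames`, and the minimal `JsxNode` shape are assumptions, and the real `processSnakeCaseComponent()` / `restoreSnakeCaseComponentNames` may detect components and walk the MDAST differently.

```typescript
// Hypothetical sketch of the placeholder round trip; names and the
// string-based swap are assumptions, not the actual implementation.
type SnakeCaseMapping = Map<string, string>; // placeholder -> original tag name

interface JsxNode {
  type: string;
  name?: string;
  children?: JsxNode[];
}

const escapeRegExp = (text: string): string => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Before parsing: swap known snake_case tag names for parser-safe placeholders.
function swapSnakeCaseTags(
  markdown: string,
  knownComponents: string[],
): { content: string; mapping: SnakeCaseMapping } {
  const mapping: SnakeCaseMapping = new Map();
  let content = markdown;
  knownComponents
    .filter((name) => name.includes("_"))
    .forEach((name, index) => {
      const placeholder = `MDXishSnakeCase${index}`;
      // Matches "<name" and "</name" when followed by whitespace, "/" or ">".
      const tag = new RegExp(`(</?)${escapeRegExp(name)}(?=[\\s/>])`, "g");
      const replaced = content.replace(tag, `$1${placeholder}`);
      if (replaced !== content) mapping.set(placeholder, name);
      content = replaced;
    });
  return { content, mapping };
}

// After parsing (run phase): walk the tree and put the original names back.
function restoreSnakeCaseNames(node: JsxNode, mapping: SnakeCaseMapping): void {
  const original = node.name === undefined ? undefined : mapping.get(node.name);
  if (original !== undefined) node.name = original;
  node.children?.forEach((child) => restoreSnakeCaseNames(child, mapping));
}
```

Keeping the mapping outside the Markdown string is what lets the run phase undo the swap without reparsing.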
### Processor Pipeline

> **See**: @lib/mdxish.ts#105-178

This is the core Xish engine: it parses Markdown and converts it into an MDAST tree. It is the base processor for both the editor and rendering flows.

```
| ................ process (parse only) ...................... |
| .. parse ........... | .............. run .................. |

                                         NO COMPILER
          +--------+                     +----------+   (MDAST is
Input ->- | Parser | ->- Syntax Tree ->- |   N/A    |    returned
          +--------+          |          +----------+    directly)
              |               |
              |               X
              |               |
              |        +--------------+
              |        | Transformers |
              |        +--------------+
              |               |
┌─────────────┴────────┬──────┴───────────────────────────┐
│                      │                                  │
│  PARSER              │  MDAST TRANSFORMERS              │
│  (micromark)         │  (remark plugins)                │
│                      │                                  │
│  remarkParse         │  remarkFrontmatter               │
│  + extensions:       │  normalizeEmphasisAST            │
│    · magicBlock      │  magicBlockTransformer           │
│    · legacyVariable  │  imageTransformer                │
│    · looseHtmlEntity │  defaultTransformers             │
│    · mdxExprTextOnly │    (callouts, codeTabs,          │
│                      │     gemoji, embeds)              │
│  + fromMarkdown:     │  mdxishComponentBlocks           │
│    · magicBlock      │  restoreSnakeCaseComponentNames  │
│    · legacyVariable  │  mdxishTables                    │
│    · emptyTaskList…  │  mdxishHtmlBlocks                │
│    · looseHtmlEntity │  mdxishJsxToMdast?               │
│    · mdxExpression…  │  variablesTextTransformer        │
│                      │  tailwindTransformer?            │
│                      │  remarkGfm                       │
│                      │                                  │
└──────────────────────┴──────────────────────────────────┘
```

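To make the "parse, transform, no compiler" shape concrete, here is a toy TypeScript model of it. It does not use unified: `toyParse`, `createAstProcessor`, and the two transformers are hypothetical stand-ins, included only to show that transformers mutate the tree in registration order and that the tree itself is the return value.

```typescript
// Toy model (hypothetical, not the unified API): parse, then run transformers,
// then return the tree itself because there is no compiler.
interface AstNode {
  type: string;
  value?: string;
  children?: AstNode[];
}

type Transformer = (tree: AstNode) => void;

// Stand-in for remarkParse: one paragraph per blank-line-separated block.
function toyParse(markdown: string): AstNode {
  const blocks = markdown.split(/\n{2,}/).filter((block) => block.trim() !== "");
  return {
    type: "root",
    children: blocks.map((block) => ({
      type: "paragraph",
      children: [{ type: "text", value: block }],
    })),
  };
}

// Depth-first walk that calls fn on every text node.
function visitText(node: AstNode, fn: (text: AstNode) => void): void {
  if (node.type === "text") fn(node);
  node.children?.forEach((child) => visitText(child, fn));
}

function createAstProcessor(transformers: Transformer[]): (markdown: string) => AstNode {
  return (markdown) => {
    const tree = toyParse(markdown); // parse phase
    for (const transform of transformers) transform(tree); // run phase, in order
    return tree; // no compile phase: the tree is the result
  };
}

// Two hypothetical transformers whose combined output depends on their order.
const bracketText: Transformer = (tree) =>
  visitText(tree, (text) => {
    text.value = `[${text.value ?? ""}]`;
  });
const exclaimText: Transformer = (tree) =>
  visitText(tree, (text) => {
    text.value = `${text.value ?? ""}!`;
  });
```

Swapping the two transformers changes the output (`[hi]!` versus `[hi!]`), which is one reason plugin order in a pipeline like the one above can matter.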
## `mdxish()`

### Preprocessing Step

> **See**: @lib/mdxish.ts#209-212

These three lines are a protect-strip-restore pattern that removes JSX comments (`{/* ... */}`) from the Markdown before anything else processes it. Here's the step-by-step:

1. **`protectCodeBlocks(mdContent)`**

   Replaces fenced code blocks and inline code with placeholder tokens (`___CODE_BLOCK_0___`, `___INLINE_CODE_0___`), stashing the originals in arrays. This prevents the next step from stripping things that look like JSX comments but are actually inside code.

2. **`removeJSXComments(protectedContent)`**

   Strips all JSX comment expressions from the (now code-protected) string via a single regex. With code blocks safely out of the way, this only hits actual JSX comments in prose or component markup.

3. **`restoreCodeBlocks(withoutComments, protectedCode)`**

   Swaps the placeholder tokens back to their original code content, yielding the final `contentWithoutComments` string.

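A minimal TypeScript sketch of the three steps follows. The placeholder formats come from the description above; the regular expressions (backtick fences only, single-line inline code, `{/* ... */}` comments) are assumptions and may differ from the actual helpers in `lib/mdxish.ts`.

```typescript
// Hypothetical re-creation of the protect -> strip -> restore pattern; the
// regexes are simplified assumptions (backtick fences, single-line inline code).
interface ProtectedCode {
  codeBlocks: string[];
  inlineCode: string[];
}

function protectCodeBlocks(content: string): { protectedContent: string; protectedCode: ProtectedCode } {
  const protectedCode: ProtectedCode = { codeBlocks: [], inlineCode: [] };
  // Fenced blocks first, so backticks inside them are never seen as inline code.
  const withoutBlocks = content.replace(/^```[\s\S]*?^```/gm, (block) => {
    protectedCode.codeBlocks.push(block);
    return `___CODE_BLOCK_${protectedCode.codeBlocks.length - 1}___`;
  });
  const protectedContent = withoutBlocks.replace(/`[^`\n]+`/g, (code) => {
    protectedCode.inlineCode.push(code);
    return `___INLINE_CODE_${protectedCode.inlineCode.length - 1}___`;
  });
  return { protectedContent, protectedCode };
}

// Removes {/* ... */} expressions, including multi-line ones.
function removeJSXComments(content: string): string {
  return content.replace(/\{\s*\/\*[\s\S]*?\*\/\s*\}/g, "");
}

// Undo in reverse order of protection: inline code first, then fenced blocks.
function restoreCodeBlocks(content: string, protectedCode: ProtectedCode): string {
  return content
    .replace(/___INLINE_CODE_(\d+)___/g, (match: string, i: string) => protectedCode.inlineCode[Number(i)] ?? match)
    .replace(/___CODE_BLOCK_(\d+)___/g, (match: string, i: string) => protectedCode.codeBlocks[Number(i)] ?? match);
}
```

In use, a comment in prose disappears while identical-looking text inside a fence or inline code survives, because the comment regex never sees the protected spans.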
##### Why it's necessary

JSX comments are valid in MDX but have no meaning in the rendered output. If left in, they would be parsed by the MDX expression tokenizer (the `mdxExprTextOnly` micromark extension) as expression nodes and could appear as literal text or cause parse errors. Stripping them at the string level, before `mdxishAstProcessor` and `preprocessContent` even run, is the simplest way to ensure they're gone.

##### Where it sits in the flow

This runs in `mdxish()` before it calls `mdxishAstProcessor`, which makes it the very first string-level transform, ahead of even `preprocessContent`:

```
     mdContent (raw input)
               │
               ▼
┌──────────────────────────────┐
│ protectCodeBlocks            │ ◄── lines 209-212
│ removeJSXComments            │     (in mdxish())
│ restoreCodeBlocks            │
└──────────────┬───────────────┘
               │ contentWithoutComments
               ▼
┌──────────────────────────────┐
│ preprocessContent            │ ◄── inside mdxishAstProcessor()
│   normalizeTableSeparator    │
│   terminateHtmlFlowBlocks    │
│   preprocessJSXExpressions   │
│   processSnakeCaseComponent  │
└──────────────┬───────────────┘
               │ parserReadyContent
               ▼
    remarkParse → transformers → ...
```

### Processor Pipeline

> **See**: @lib/mdxish.ts#214-239

```
| ................ process (parse + run only) ................. |
| .. parse ........... | .............. run ................... |

                                         NO COMPILER
          +--------+                     +----------+   (HAST obj
Input ->- | Parser | ->- Syntax Tree ->- |   N/A    |    returned
          +--------+          |          +----------+    directly)
              |               |
              |               X
              |               |
              |        +--------------+
              |        | Transformers |
              |        +--------------+
              |               |
┌─────────────┴────────┬──────┴────────────────────────────────────────┐
│                      │                                               │
│  PARSER              │  MDAST TRANSFORMERS        HAST XFORMERS      │
│  (micromark)         │  (remark plugins)          (rehype)           │
│                      │                                               │
│  remarkParse         │  remarkFrontmatter         preserveBool…      │
│  + extensions:       │  normalizeEmphasisAST      rehypeRaw          │
│    · magicBlock      │  magicBlockTransformer     restoreBool…       │
│    · legacyVariable  │  imageTransformer          rehypeFlatten…     │
│    · looseHtmlEntity │  defaultTransformers       mdxishMermaid…     │
│    · mdxExprTextOnly │    (callouts, codeTabs,    generateSlug…      │
│                      │     gemoji, embeds)        rehypeMdxish…      │
│  + fromMarkdown:     │  mdxishComponentBlocks                        │
│    · magicBlock      │  restoreSnakeCase…           ▲                │
│    · legacyVariable  │  mdxishTables                │                │
│    · emptyTaskList…  │  mdxishHtmlBlocks            │                │
│    · looseHtmlEntity │  mdxishJsxToMdast?           │ bridge:        │
│    · mdxExpression…  │  variablesTextTransformer    │ remarkRehype   │
│                      │  tailwindTransformer?        │ (MDAST → HAST) │
│                      │  remarkGfm                   │                │
│                      │  evaluateExpressions?        │                │
│                      │  remarkBreaks                │                │
│                      │  variablesCodeResolver ──────┘                │
│                      │                                               │
└──────────────────────┴───────────────────────────────────────────────┘
```

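The `remarkRehype` bridge turns MDAST nodes into HAST elements before the rehype plugins run. The toy TypeScript sketch below shows the idea for a handful of node types; the `TAG_NAMES` mapping and the `div` fallback are assumptions, and the real bridge (remark-rehype, built on mdast-util-to-hast) covers many more node types, properties, and positional data.

```typescript
// Toy illustration only; not remark-rehype / mdast-util-to-hast.
interface MdastNode {
  type: string;
  value?: string;
  depth?: number;
  children?: MdastNode[];
}

type HastNode =
  | { type: "root"; children: HastNode[] }
  | { type: "element"; tagName: string; properties: Record<string, unknown>; children: HastNode[] }
  | { type: "text"; value: string };

// Assumed MDAST type -> HTML tag mapping for a few node types.
const TAG_NAMES: Record<string, string> = {
  paragraph: "p",
  emphasis: "em",
  strong: "strong",
  blockquote: "blockquote",
};

function toHast(node: MdastNode): HastNode {
  if (node.type === "text") return { type: "text", value: node.value ?? "" };
  if (node.type === "inlineCode") {
    // inlineCode carries its content in `value`, not in children.
    return { type: "element", tagName: "code", properties: {}, children: [{ type: "text", value: node.value ?? "" }] };
  }
  const children = (node.children ?? []).map(toHast);
  if (node.type === "root") return { type: "root", children };
  // Headings map to h1..h6; unknown types fall back to a div (an assumption).
  const tagName = node.type === "heading" ? `h${node.depth ?? 1}` : TAG_NAMES[node.type] ?? "div";
  return { type: "element", tagName, properties: {}, children };
}
```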
@@ -0,0 +1,134 @@

# MDXish Supported Syntax

## Custom Blocks

### Code Tabs

Tabbed interface for multiple code blocks, written as immediately consecutive standard code blocks (i.e. **without** any blank lines between them).

````md
```js Title One
console.log('Tab One');
```
```js Title Two
console.log('Tab Two');
```
````

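One way to express the "immediately consecutive" rule is on parsed code nodes: a fence that starts on the line right after the previous fence ends joins the same tab group. The TypeScript sketch below assumes MDAST-style `code` nodes with 1-based line positions (the fence title, e.g. `Title One`, lands in `meta`); `groupCodeTabs` and the `codeTabs` node shape are hypothetical, not the actual codeTabs transformer.

```typescript
// Hypothetical sketch: groups code nodes whose fences touch (no blank line in
// between) into a "codeTabs" node. Node shapes are simplified MDAST.
interface Position {
  start: { line: number };
  end: { line: number };
}

interface BlockNode {
  type: string;
  lang?: string;
  meta?: string; // tab title, e.g. "Title One"
  value?: string;
  position: Position;
  children?: BlockNode[];
}

function groupCodeTabs(nodes: BlockNode[]): BlockNode[] {
  const result: BlockNode[] = [];
  let group: BlockNode[] = [];

  // Emit the pending group: a lone code block stays as-is, 2+ become tabs.
  const flush = (): void => {
    if (group.length === 1) {
      result.push(group[0]);
    } else if (group.length > 1) {
      result.push({
        type: "codeTabs",
        position: { start: group[0].position.start, end: group[group.length - 1].position.end },
        children: group,
      });
    }
    group = [];
  };

  for (const node of nodes) {
    const last = group[group.length - 1];
    const touchesPrevious =
      node.type === "code" && last !== undefined && node.position.start.line === last.position.end.line + 1;
    if (touchesPrevious) {
      group.push(node);
      continue;
    }
    flush();
    if (node.type === "code") group.push(node);
    else result.push(node);
  }
  flush();
  return result;
}
```

A single blank line between two fences breaks the adjacency check, so they stay separate code blocks.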
### Callouts

Blockquotes that start with an emoji are rendered as callouts, and the emoji determines the theme:

```md
> 👍 Success
>
> Your success message here
```

#### Supported Themes

- **Info**: 📘 or ℹ️ (blue)
- **Success**: 👍 or ✅ (green)
- **Warning**: 🚧 or ⚠️ (orange)
- **Error**: ❗️ or 🛑 (red)
- **Default**: any other emoji (gray)

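The theme lookup can be sketched as a small table keyed by emoji. In this hypothetical TypeScript version, `calloutTheme` and the theme names are assumptions; it strips the U+FE0F variation selector so that `ℹ️` and a bare `ℹ` resolve the same way, and it returns `null` when the blockquote does not start with an emoji (so it stays a plain blockquote).

```typescript
// Hypothetical sketch: maps a callout's leading emoji to a theme name.
// Theme names are assumptions; keys are stored without the U+FE0F selector.
type CalloutTheme = "info" | "success" | "warning" | "error" | "default";

const EMOJI_THEMES: Record<string, CalloutTheme> = {
  [String.fromCodePoint(0x1f4d8)]: "info", // 📘
  [String.fromCodePoint(0x2139)]: "info", // ℹ
  [String.fromCodePoint(0x1f44d)]: "success", // 👍
  [String.fromCodePoint(0x2705)]: "success", // ✅
  [String.fromCodePoint(0x1f6a7)]: "warning", // 🚧
  [String.fromCodePoint(0x26a0)]: "warning", // ⚠
  [String.fromCodePoint(0x2757)]: "error", // ❗
  [String.fromCodePoint(0x1f6d1)]: "error", // 🛑
};

// Returns null when the first line does not start with an emoji (not a callout).
function calloutTheme(firstLine: string): CalloutTheme | null {
  const match = /^\p{Extended_Pictographic}\uFE0F?/u.exec(firstLine.trimStart());
  if (!match) return null;
  const emoji = match[0].replace(/\uFE0F/g, "");
  return EMOJI_THEMES[emoji] ?? "default";
}
```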
### Embeds

A standard Markdown link whose title is `@embed`:

```md
[Embed Title](https://youtu.be/example "@embed")
```

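At the MDAST level this is a check on the link node's `title`. The TypeScript sketch below uses a trimmed-down link node shape; `toEmbed` and the `embed` node it returns are hypothetical, not the library's actual embed node.

```typescript
// Hypothetical sketch: turns an MDAST-style link whose title is "@embed" into
// an embed node. The EmbedNode shape is an assumption, not the library's.
interface TextNode {
  type: string;
  value?: string;
}

interface LinkNode {
  type: "link";
  url: string;
  title: string | null;
  children: TextNode[];
}

interface EmbedNode {
  type: "embed";
  url: string;
  label: string;
}

function toEmbed(link: LinkNode): EmbedNode | null {
  if (link.title !== "@embed") return null; // ordinary link, leave it alone
  // Use the link text ("Embed Title") as the embed's label.
  const label = link.children.map((child) => child.value ?? "").join("");
  return { type: "embed", url: link.url, label };
}
```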
## Data Replacement Syntaxes

### User Variables

Double angle-bracket notation for JWT login variables:

```md
Hi, my name is **<<name>>**!
```

### Glossary Terms

Double angle-brackets with a `glossary:` prefix:

```md
**<<glossary:exogenous>>** and **<<glossary:endogenous>>**
```

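Both syntaxes share the `<<...>>` delimiters, so one tokenizer can split text into plain, variable, and glossary segments. The TypeScript sketch below is hypothetical: the segment shape, the character rules for names, and the fallback of leaving unknown variables as their literal `<<name>>` token are assumptions, not ReadMe's documented behavior.

```typescript
// Hypothetical sketch: splits text on <<variable>> and <<glossary:term>> tokens.
type Segment =
  | { kind: "text"; value: string }
  | { kind: "variable"; name: string }
  | { kind: "glossary"; term: string };

function splitReplacements(text: string): Segment[] {
  const pattern = /<<(glossary:)?([^<>]+)>>/g; // fresh regex per call (lastIndex is stateful)
  const segments: Segment[] = [];
  let last = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    if (match.index > last) segments.push({ kind: "text", value: text.slice(last, match.index) });
    segments.push(match[1] ? { kind: "glossary", term: match[2] } : { kind: "variable", name: match[2] });
    last = match.index + match[0].length;
  }
  if (last < text.length) segments.push({ kind: "text", value: text.slice(last) });
  return segments;
}

// Renders segments as plain text; unknown variables keep their literal token (an assumption).
function renderSegments(segments: Segment[], user: Record<string, string>): string {
  return segments
    .map((segment) => {
      if (segment.kind === "text") return segment.value;
      if (segment.kind === "glossary") return segment.term;
      return user[segment.name] ?? `<<${segment.name}>>`;
    })
    .join("");
}
```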
## MDX Syntax

A subset of MDX syntax is supported.

### Custom Components

You can embed React components or reusable Markdown snippets in a document using JSX elements:

```md
<MyComponent prop="value" />
```

### Logical Expressions

Simple logic is also supported using the JSX-style curly brace syntax:

```md
{(4 * 3) / 2} of 1, a half dozen of another.
```

This expression syntax can also be used as an alternative to user variables:

```md
Hi, my name is **{user.name}**!
```

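A minimal sketch of evaluating these expressions, assuming a plain context object such as `{ user: { name: "Ada" } }`: each `{...}` span is compiled with the `Function` constructor, with the context keys passed in as parameters. `evaluateInlineExpressions` is a hypothetical name, and this approach runs arbitrary JavaScript, so it is only appropriate for trusted content; the actual evaluation in `lib/mdxish.ts` may be sandboxed or otherwise differ.

```typescript
// Hypothetical sketch: evaluates {expression} spans against a context object.
// WARNING: the Function constructor runs arbitrary code; trusted input only.
// Context keys must be valid JavaScript identifiers.
function evaluateInlineExpressions(text: string, context: Record<string, unknown>): string {
  const names = Object.keys(context); // exposed to expressions as parameters
  const values = names.map((name) => context[name]);
  return text.replace(/\{([^{}]+)\}/g, (match: string, expression: string) => {
    try {
      const evaluate = new Function(...names, `return (${expression});`);
      return String(evaluate(...values));
    } catch {
      return match; // unparseable or failing expressions are left as written
    }
  });
}
```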
## Standard Markdown Extensions

Full **CommonMark** and **GitHub-flavored Markdown** support, including:

### Emoji Shortcodes

GitHub-style emoji codes:

```md
:sparkles:
```

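Shortcode replacement boils down to a lookup table plus a regex. The TypeScript sketch below uses a hypothetical three-entry `SHORTCODES` table; a real implementation (e.g. one backed by the gemoji dataset) would carry the full list.

```typescript
// Hypothetical three-entry table; a real implementation would use a full
// shortcode dataset such as gemoji.
const SHORTCODES: Record<string, string> = {
  sparkles: String.fromCodePoint(0x2728), // ✨
  rocket: String.fromCodePoint(0x1f680), // 🚀
  "+1": String.fromCodePoint(0x1f44d), // 👍
};

// Replaces :name: with its emoji; unknown names are left untouched.
function replaceShortcodes(text: string): string {
  return text.replace(/:([a-z0-9_+-]+):/g, (match: string, name: string) => SHORTCODES[name] ?? match);
}
```

Falling back to the original match keeps text like `10:30:00` intact, since `30` is not a known shortcode.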
### Tables

GFM-style tables with alignment support:

```md
| Left | Center   | Right |
|:-----|:--------:|------:|
| L0   | **bold** | $1600 |
```

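The alignment row maps to one value per column. This hypothetical TypeScript helper, `parseAlignmentRow`, returns the same `'left' | 'center' | 'right' | null` values that MDAST `table` nodes store in their `align` array.

```typescript
// Hypothetical helper: one alignment value per column, mirroring MDAST's
// table `align` values.
type Align = "left" | "center" | "right" | null;

function parseAlignmentRow(row: string): Align[] {
  const inner = row.trim().replace(/^\|/, "").replace(/\|$/, "");
  return inner.split("|").map((cell): Align => {
    const trimmed = cell.trim();
    const left = trimmed.startsWith(":");
    const right = trimmed.endsWith(":");
    if (left && right) return "center";
    if (left) return "left";
    if (right) return "right";
    return null; // no colons: default alignment
  });
}
```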
### Lists

Standard bulleted (`-` or `*`) and numbered lists (`1.`, `2.`, etc.) are supported, as well as GFM-style checklists:

```md
- [x] finished item
- [ ] unfinished item
```

### Headings

Standard Markdown headings (`#` prefixes) are supported, as well as compact (no space after the `#`s) and ATX-wrapped (closing `#`s) variations:

```md
##Compact Heading without a space

## ATX-Style Wrapped Heading ##
```

Underline (setext) notation, using `=` or `-`, is also supported for first- and second-level headings, respectively.

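The lenient heading rules above can be sketched with two small TypeScript functions. `parseAtxHeading` and `parseSetextHeading` are hypothetical; the ATX regex deliberately accepts a missing space after the `#`s (unlike strict CommonMark) and strips an optional closing `#` run. A `---` underline is also ambiguous with a thematic break, so a real parser has to decide based on the preceding line.

```typescript
// Hypothetical sketch of the lenient heading rules described above.
interface Heading {
  depth: number;
  text: string;
}

// "#" run (1-6, not followed by another "#"), optional space (so "##Compact"
// works), then the text with an optional closing "#" run stripped.
const ATX_HEADING = /^(#{1,6})(?!#)[ \t]*(.*?)(?:[ \t]+#+)?[ \t]*$/;

function parseAtxHeading(line: string): Heading | null {
  const match = ATX_HEADING.exec(line);
  return match ? { depth: match[1].length, text: match[2] } : null;
}

// Setext: a text line underlined with "=" (level 1) or "-" (level 2).
function parseSetextHeading(textLine: string, underline: string): Heading | null {
  if (textLine.trim() === "") return null;
  if (/^=+[ \t]*$/.test(underline)) return { depth: 1, text: textLine.trim() };
  if (/^-+[ \t]*$/.test(underline)) return { depth: 2, text: textLine.trim() };
  return null;
}
```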
## Legacy Magic Blocks

The engine also supports the legacy JSON-based "magic block" syntax for backwards compatibility.

```md
[block:api-header]
{
  "title": "Section Title"
}
[/block]
```

This is a legacy format that should be transpiled to newer ReadMe-flavored syntax. Supported magic blocks include:

| Feature       | Magic Block Name       |
|---------------|------------------------|
| Heading       | `[block:api-header]`   |
| Callout       | `[block:callout]`      |
| Embed         | `[block:embed]`        |
| Custom HTML   | `[block:html]`         |
| Image         | `[block:image]`        |
| Table         | `[block:parameters]`   |

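Each magic block is a type tag wrapped around a JSON payload, so a sketch of extracting them needs only a regex and `JSON.parse`. `parseMagicBlocks` is hypothetical; skipping blocks with malformed JSON is an assumption, and the actual parser may handle errors differently.

```typescript
// Hypothetical sketch: extracts [block:TYPE] ... [/block] magic blocks.
interface MagicBlock {
  type: string;
  data: unknown;
}

function parseMagicBlocks(markdown: string): MagicBlock[] {
  const pattern = /\[block:([\w-]+)\]\s*([\s\S]*?)\s*\[\/block\]/g;
  const blocks: MagicBlock[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    try {
      blocks.push({ type: match[1], data: JSON.parse(match[2]) });
    } catch {
      // Malformed JSON: skipped here (an assumption; real handling may differ).
    }
  }
  return blocks;
}
```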
## Additional Features

- **Auto-generated heading anchors** with incremental IDs for duplicate headings
- **Table of Contents generation** from markup
- **Custom `doc:` and `ref:` protocols** for internal documentation links
- **Both JSX and HTML comments** for non-rendered notes and annotations

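Incremental IDs for duplicate headings can be sketched as a slugger that remembers every slug it has issued. The slug rules (lowercase, drop punctuation, spaces to hyphens) and the `-1`, `-2` suffix format in this hypothetical TypeScript `createSlugger` are assumptions modeled on GitHub-style anchors, not necessarily what the engine produces.

```typescript
// Hypothetical slugger: GitHub-style slugs with -1, -2... suffixes for repeats.
function slugify(heading: string): string {
  return heading
    .toLowerCase()
    .trim()
    .replace(/[^\p{L}\p{N}\s-]/gu, "") // drop punctuation and symbols
    .replace(/\s+/g, "-");
}

function createSlugger(): (heading: string) => string {
  const used = new Set<string>();
  return (heading) => {
    const base = slugify(heading);
    let slug = base;
    // Bump the suffix until the slug is unique, so a literal "Intro-1" can't collide.
    for (let n = 1; used.has(slug); n++) slug = `${base}-${n}`;
    used.add(slug);
    return slug;
  };
}
```

A fresh slugger per document keeps the counters from leaking between pages.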
@@ -0,0 +1,16 @@

# `@readme/markdown`

This repo contains two Markdown processing engines, both built on top of Unified.js + Remark.

## RMDX

RMDX is a strict MDX processor. It handles standard Markdown + GFM, as well as ReadMe-flavored custom syntax. Because it is an MDX-first processor, all RMDX-processed docs must adhere to strict JSX syntax rules. (See @lib/mdx.ts)

## MDXish (aka Xish)

The Xish processor supports standard Markdown with GFM extensions, as well as ReadMe-flavored syntax. It also supports a subset of MDX functionality (specifically custom components and logical expressions) without requiring strict JSX compliance. (See @lib/mdxish.ts)

### Further Context

- @.claude/context/MDXish/Processor Overview.md
- @.claude/context/MDXish/Supported Syntax.md