fix(mdxish): parse <Table> as single token to prevent blank-line fragmentations#1371
Conversation
…rm-15613-properly-parse-and-render-fragmented-table-elements
|
okay im trying something new in this commit 753a8e4 i realised now that the tokenizer can ensure the entire |
|
with the new tokenizer, the entire
EDIT (some follow ups):
|
There was a problem hiding this comment.
This is working well, save for a couple issues I noticed:
- an unclosed
<Table>component will swallow all following content1 - writing “</Table>” in an inline code snippet inside a cell is not auto-escaped1
The second one is really more of an edge case, but the first one I think we actually need to fix before this gets merged?
Footnotes
| * | ||
| * @see https://github.com/syntax-tree/mdast#flowcontent | ||
| */ | ||
| export const FLOW_TYPES = new Set([ |
There was a problem hiding this comment.
exposing a list of FLOW_TYPES here for consumers to use, I was deliberating to make a new Set over in the monorepo but opted to just put this as an import so we can a single SOT
There was a problem hiding this comment.
This is to help with differentiating when to use GFM vs JSX table syntax in the editor serialization right?
There was a problem hiding this comment.
yea, tldr, if the table contains any FLOW types (except html) then we serialize to JSX syntax to keep it rich (ie to allow code blocks)
| expect(cells).toHaveLength(2); | ||
| }); | ||
|
|
||
| it('does not swallow content after an unclosed Table', () => { |
There was a problem hiding this comment.
Do we also need to check there's no table nodes in the tree?
This PR was released!🚀 Changes included in v13.6.1 |
…ip corruption (#1430) | 🎫 Resolve RM-16064 | | :-----------------: | ## 🎯 What does this PR do? The `jsxTable` micromark tokenizer captures `<Table>...</Table>` as a single opaque `html` node. `mdxishTables` then re-parses that node's `value` through a separate `unified()` pipeline (`tableNodeProcessor`) to extract structured cells. The problem is that every node produced by that re-parse carries positions **relative to the `html` node's `value` string** and not the outer document. So when a downstream consumer slices `originalSource` by `node.position.offset` (as the editor's [`nodeToSource`](https://github.com/readmeio/readme/blob/a2931f875c422bce8cdd9581505f6ac369da02ea/packages/react/src/ui/MdxishEditor/extensions/document/utils.ts#L20-L49) does for [`SyntaxFlow` and `SyntaxText`](https://github.com/readmeio/readme/blob/a2931f875c422bce8cdd9581505f6ac369da02ea/packages/react/src/ui/MdxishEditor/extensions/document/Syntax/createSyntaxExtension.ts#L404-L408)), it's reading bytes from the wrong location in the outer source. This was a gap that #1371 failed to account for mainly because: - The rendering path (`mdxish()` → hast) doesn't use positions - Markdown-only roundtrips (`mdast()` → `mdxishMdastToMd()`) don't slice by offset - Only the editor's PM pipeline and only syntaxNodes uses `nodeToSource` and the slicing The fix had 2 places: 1. preserve blank lines while reassembling the `html` node's `value`. The original code joined per-line chunks with `'\n'`, collapsing multi-line gaps. This is a prerequisite for (2): without it, inner offsets drift relative to the outer source past the first blank line. 2. after `tableNodeProcessor` returns its subtree, walk every node and shift `position.start.offset` / `position.end.offset` / `position.start.line` / `position.end.line` by the outer `html` node's base offset and line. This translates inner coordinates into the outer source coordinate space so downstream consumers can slice correctly. The blank-line fix isn't the actual bug (it happens at the reassembly stage), but rather simply a hard prerequisite for the position translation to produce byte-identical slices against the outer source. ## 🧪 QA tips - Paste a `<Table>` with `<ul>/<li>` inside a `<td>` in the mdxish editor - Save and re-open multiple times — closing tags should remain intact - The companion test PR in readme is [readmeio/readme#18209](readmeio/readme#18209) ## 📸 Screenshot or Loom https://github.com/user-attachments/assets/4fca5ae1-0c46-4e2b-a58d-25a5a1cbf45c
## Version 13.8.5 ### 🛠 Fixes & Updates * actually fix toc links with query params ([#1436](#1436)) ([b59ce5f](b59ce5f)) * **mdxish:** fix FA Emojis in Callouts ([#1421](#1421)) ([f7caa3e](f7caa3e)) * **mdxish:** preserve blank lines in JSX table parsing to fix roundtrip corruption ([#1430](#1430)) ([9fa040c](9fa040c)), closes [#1371](#1371) * proper spacing in empty callouts ([#1433](#1433)) ([97b8a41](97b8a41)) * scope link color inheritance to headings ([#1432](#1432)) ([cd78a62](cd78a62)) <!--SKIP CI-->


🧰 Changes
CommonMark's HTML block type 6 (spec) matches
<Table>case-insensitively against the table tag, which means it terminates at blank lines. Since<Table>JSX blocks always have blank lines between rows and sections, the mdxish pipeline was fragmenting them into multiple broken nodes.Adds a custom micromark tokenizer (
jsxTable) that grabs<Table>before the built-inhtmlFlowparser and consumes everything through</Table>as a single tokenmdast-util-from-markdownextension turns that token into anhtmlnodemdxishTablestransformer handles the rest by re-parsing it withremarkMdxinto proper table nodesDemo
Screen.Recording.2026-03-16.at.19.30.56.mov
Warning
There still needs some work to be done on the monorepo side to fully support editing fragmented tables but for the moment, this PR ensures that we are rendering fragmented tables correctly. ie if you go back and forth in raw and rich mode in the new editor it breaks
This PR is essentially part 1 of solving RM-15601 and RM-15600
🧬 QA & Testing
Paste this in the demo app, it should render as ONE big table
and some edge case tests if you are interested