Skip to content

fix(mdxish): parse <Table> as single token to prevent blank-line fragmentations#1371

Merged
kevinports merged 17 commits intonextfrom
falco/rm-15613-properly-parse-and-render-fragmented-table-elements
Mar 17, 2026
Merged

fix(mdxish): parse <Table> as single token to prevent blank-line fragmentations#1371
kevinports merged 17 commits intonextfrom
falco/rm-15613-properly-parse-and-render-fragmented-table-elements

Conversation

@maximilianfalco
Copy link
Copy Markdown
Contributor

@maximilianfalco maximilianfalco commented Mar 12, 2026

PR App Fix RM-15613

🧰 Changes

CommonMark's HTML block type 6 (spec) matches <Table> case-insensitively against the table tag, which means it terminates at blank lines. Since <Table> JSX blocks always have blank lines between rows and sections, the mdxish pipeline was fragmenting them into multiple broken nodes.

Adds a custom micromark tokenizer (jsxTable) that grabs <Table> before the built-in htmlFlow parser and consumes everything through </Table> as a single token

  • An mdast-util-from-markdown extension turns that token into an html node
  • The existing mdxishTables transformer handles the rest by re-parsing it with remarkMdx into proper table nodes
Before After
Screenshot 2026-03-16 at 18 14 36 Screenshot 2026-03-16 at 18 13 57

Demo

Screen.Recording.2026-03-16.at.19.30.56.mov

Warning

There still needs some work to be done on the monorepo side to fully support editing fragmented tables but for the moment, this PR ensures that we are rendering fragmented tables correctly. ie if you go back and forth in raw and rich mode in the new editor it breaks

This PR is essentially part 1 of solving RM-15601 and RM-15600

🧬 QA & Testing

Paste this in the demo app, it should render as ONE big table

<Table align={["left","left"]}>
  <thead>
    <tr>
      <th style={{ textAlign: "left" }}>
        Field
      </th>

      <th style={{ textAlign: "left" }}>
        Description
      </th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td style={{ textAlign: "left" }}>
        Fenced code block
      </td>

      <td style={{ textAlign: "left" }}>
        ```
        {
          "field": "ID",
          "type": "ASC"
        }
        ```
      </td>
    </tr>

    <tr>
      <td style={{ textAlign: "left" }}>
        Lists
      </td>

      <td style={{ textAlign: "left" }}>
        Oh no

        * no no no
        * no no no
        * Im so sorry
      </td>
    </tr>

    <tr>
      <td style={{ textAlign: "left" }}>
        Inline code with a pipe
      </td>

      <td style={{ textAlign: "left" }}>
        `foo | bar`
      </td>
    </tr>
  </tbody>
</Table>

and some edge case tests if you are interested

<Table>
  <tbody>
    <tr>
      <td>Literal closing tag as text</td>
      <td>\</Table> should not be separated</td>
    </tr>
  </tbody>
</Table>

---

<Table>
  <tbody>
    <tr>
      <td>Literal closing tag as text</td>
    </tr>
  </tbody>
</Table>

---

> <Table>
>   <tbody>
>     <tr><td>quoted</td></tr>
>   </tbody>
> </Table>
>
> text after quote

---

<Table>
  <thead>
  <tr><th>A</th></tr>
  </thead>
</Table>

---

<Table>
  <tbody>
    <tr>
      <td>Closing tag in code</td>
      <td>This `</Table>` should be auto-escaped.</td>
    </tr>
  </tbody>
</Table>

---

<Table>
  <tbody>
    <tr>
      <td>Closing tag in code</td>
    </tr>
  </tbody>

You should still see me!

---

<Table>
  <tbody>
    <tr>
      <td>
        <Table>
          <tbody>
            <tr>
              <td>Nested Table</td>
            </tr>
          </tbody>
        </Table>
      </td>
      <td>Hi</td>
    </tr>
  </tbody>
</Table>

Comment thread __tests__/parsers/tables.test.ts
Comment thread lib/micromark/jsx-table/syntax.ts
Comment thread lib/micromark/jsx-table/syntax.ts
Comment thread lib/micromark/jsx-table/syntax.ts
Comment thread lib/stripComments.ts
Comment thread processor/transform/mdxish/mdxish-tables.ts
Comment thread package.json Outdated
Comment thread __tests__/parsers/tables.test.ts
@maximilianfalco
Copy link
Copy Markdown
Contributor Author

maximilianfalco commented Mar 13, 2026

okay im trying something new in this commit 753a8e4

i realised now that the tokenizer can ensure the entire<Table> comes in one singular node, we can make use of remarkMdx and all its benefits to parse cell contents instead relying on the current lightweight (remarkParse + remarkGfm only) parser. hopefully this'll allow use to have parity with both RDMD and RMDX.

@maximilianfalco maximilianfalco marked this pull request as ready for review March 16, 2026 03:07
Comment thread processor/transform/mdxish/mdxish-tables.ts Outdated
Comment thread processor/transform/mdxish/mdxish-tables.ts
@maximilianfalco
Copy link
Copy Markdown
Contributor Author

maximilianfalco commented Mar 16, 2026

with the new tokenizer, the entire <Table> ... </Table> should always come in a single block which means that the transformer no longer has to handle multiple different possible shapes hence why we are able to remove a lot of stuff in mdxish-tables

  • the isMDXElement visit pass and mdxishComponentBlocks import are dead code since the tokenizer captures the table before mdxishComponentBlocks ever sees it
  • handleTableInNode + separate html/raw visits are no longer needed since the tokenizer guarantees a single html node, so they collapse into one visit(tree, 'html', ...)
  • replaced mdCellProcessor / parseMarkdown with tableNodeProcessor which uses remarkMdx instead of lightweight remarkParse + remarkGfm, so cells now support full mdxish syntax (callouts, gemoji, components, etc)

EDIT (some follow ups):

  • tokenizer now handles backslash-escaped \</Table> in cell content without prematurely closing the block
  • tables without <thead> are kept as JSX instead of converting to mdast (mdast always treats the first row as a header, so body rows would get promoted)
  • single-line <tr><th>...</th></tr> now renders correctly
  • paragraph wrappers and whitespace-only text nodes produced by remarkMdx are cleaned up in the JSX path to prevent empty <p>/<br> gaps

Copy link
Copy Markdown
Contributor

@rafegoldberg rafegoldberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is working well, save for a couple issues I noticed:

  1. an unclosed <Table> component will swallow all following content1
  2. writing “</Table>” in an inline code snippet inside a cell is not auto-escaped1

The second one is really more of an edge case, but the first one I think we actually need to fix before this gets merged?

Footnotes

  1. run the start script to view these links 2

Comment thread lib/constants.ts
*
* @see https://github.com/syntax-tree/mdast#flowcontent
*/
export const FLOW_TYPES = new Set([
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exposing a list of FLOW_TYPES here for consumers to use, I was deliberating to make a new Set over in the monorepo but opted to just put this as an import so we can a single SOT

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to help with differentiating when to use GFM vs JSX table syntax in the editor serialization right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, tldr, if the table contains any FLOW types (except html) then we serialize to JSX syntax to keep it rich (ie to allow code blocks)

expect(cells).toHaveLength(2);
});

it('does not swallow content after an unclosed Table', () => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to check there's no table nodes in the tree?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should still be able to see tables, this is on par with RDMD
Screenshot 2026-03-17 at 15 59 31

<Table>
  <tbody>
    <tr>
      <td>Closing tag in code</td>
    </tr>
  </tbody>

You should still see me!

@kevinports kevinports merged commit ce73b7f into next Mar 17, 2026
14 of 15 checks passed
@kevinports kevinports deleted the falco/rm-15613-properly-parse-and-render-fragmented-table-elements branch March 17, 2026 18:28
rafegoldberg pushed a commit that referenced this pull request Mar 17, 2026
## Version 13.6.1
### 🛠 Fixes & Updates

* **mdxish:** parse <Table> as single token to prevent blank-line fragmentations ([#1371](#1371)) ([ce73b7f](ce73b7f))
* **mdxish:** variables in callout titles causing it to crash ([#1378](#1378)) ([4006247](4006247)), closes [#1362](#1362)

<!--SKIP CI-->
@rafegoldberg
Copy link
Copy Markdown
Contributor

This PR was released!

🚀 Changes included in v13.6.1

maximilianfalco added a commit that referenced this pull request Apr 14, 2026
…ip corruption (#1430)

| 🎫 Resolve RM-16064 |
| :-----------------: |

## 🎯 What does this PR do?

The `jsxTable` micromark tokenizer captures `<Table>...</Table>` as a
single opaque `html` node. `mdxishTables` then re-parses that node's
`value` through a separate `unified()` pipeline (`tableNodeProcessor`)
to extract structured cells.

The problem is that every node produced by that re-parse carries
positions **relative to the `html` node's `value` string** and not the
outer document. So when a downstream consumer slices `originalSource` by
`node.position.offset` (as the editor's
[`nodeToSource`](https://github.com/readmeio/readme/blob/a2931f875c422bce8cdd9581505f6ac369da02ea/packages/react/src/ui/MdxishEditor/extensions/document/utils.ts#L20-L49)
does for [`SyntaxFlow` and
`SyntaxText`](https://github.com/readmeio/readme/blob/a2931f875c422bce8cdd9581505f6ac369da02ea/packages/react/src/ui/MdxishEditor/extensions/document/Syntax/createSyntaxExtension.ts#L404-L408)),
it's reading bytes from the wrong location in the outer source.

This was a gap that #1371 failed to account for mainly because:

- The rendering path (`mdxish()` → hast) doesn't use positions
- Markdown-only roundtrips (`mdast()` → `mdxishMdastToMd()`) don't slice
by offset
- Only the editor's PM pipeline and only syntaxNodes uses `nodeToSource`
and the slicing

The fix had 2 places:

1. preserve blank lines while reassembling the `html` node's `value`.
The original code joined per-line chunks with `'\n'`, collapsing
multi-line gaps. This is a prerequisite for (2): without it, inner
offsets drift relative to the outer source past the first blank line.

2. after `tableNodeProcessor` returns its subtree, walk every node and
shift `position.start.offset` / `position.end.offset` /
`position.start.line` / `position.end.line` by the outer `html` node's
base offset and line. This translates inner coordinates into the outer
source coordinate space so downstream consumers can slice correctly.

The blank-line fix isn't the actual bug (it happens at the reassembly
stage), but rather simply a hard prerequisite for the position
translation to produce byte-identical slices against the outer source.

## 🧪 QA tips

- Paste a `<Table>` with `<ul>/<li>` inside a `<td>` in the mdxish
editor
- Save and re-open multiple times — closing tags should remain intact
- The companion test PR in readme is
[readmeio/readme#18209](readmeio/readme#18209)

## 📸 Screenshot or Loom


https://github.com/user-attachments/assets/4fca5ae1-0c46-4e2b-a58d-25a5a1cbf45c
rafegoldberg pushed a commit that referenced this pull request Apr 14, 2026
## Version 13.8.5
### 🛠 Fixes & Updates

* actually fix toc links with query params ([#1436](#1436)) ([b59ce5f](b59ce5f))
* **mdxish:** fix FA Emojis in Callouts ([#1421](#1421)) ([f7caa3e](f7caa3e))
* **mdxish:** preserve blank lines in JSX table parsing to fix roundtrip corruption ([#1430](#1430)) ([9fa040c](9fa040c)), closes [#1371](#1371)
* proper spacing in empty callouts ([#1433](#1433)) ([97b8a41](97b8a41))
* scope link color inheritance to headings ([#1432](#1432)) ([cd78a62](cd78a62))

<!--SKIP CI-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants