Skip to content

fix(mdxish): parse <Table> as single token to prevent blank-line fragmentations#1371

Merged
kevinports merged 17 commits intonextfrom
falco/rm-15613-properly-parse-and-render-fragmented-table-elements
Mar 17, 2026
Merged

fix(mdxish): parse <Table> as single token to prevent blank-line fragmentations#1371
kevinports merged 17 commits intonextfrom
falco/rm-15613-properly-parse-and-render-fragmented-table-elements

Conversation

@maximilianfalco
Copy link
Copy Markdown
Contributor

@maximilianfalco maximilianfalco commented Mar 12, 2026

PR App Fix RM-15613

🧰 Changes

CommonMark's HTML block type 6 (spec) matches <Table> case-insensitively against the table tag, which means it terminates at blank lines. Since <Table> JSX blocks always have blank lines between rows and sections, the mdxish pipeline was fragmenting them into multiple broken nodes.

Adds a custom micromark tokenizer (jsxTable) that grabs <Table> before the built-in htmlFlow parser and consumes everything through </Table> as a single token

  • An mdast-util-from-markdown extension turns that token into an html node
  • The existing mdxishTables transformer handles the rest by re-parsing it with remarkMdx into proper table nodes
Before After
Screenshot 2026-03-16 at 18 14 36 Screenshot 2026-03-16 at 18 13 57

Demo

Screen.Recording.2026-03-16.at.19.30.56.mov

Warning

There still needs some work to be done on the monorepo side to fully support editing fragmented tables but for the moment, this PR ensures that we are rendering fragmented tables correctly. ie if you go back and forth in raw and rich mode in the new editor it breaks

This PR is essentially part 1 of solving RM-15601 and RM-15600

🧬 QA & Testing

Paste this in the demo app, it should render as ONE big table

<Table align={["left","left"]}>
  <thead>
    <tr>
      <th style={{ textAlign: "left" }}>
        Field
      </th>

      <th style={{ textAlign: "left" }}>
        Description
      </th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td style={{ textAlign: "left" }}>
        Fenced code block
      </td>

      <td style={{ textAlign: "left" }}>
        ```
        {
          "field": "ID",
          "type": "ASC"
        }
        ```
      </td>
    </tr>

    <tr>
      <td style={{ textAlign: "left" }}>
        Lists
      </td>

      <td style={{ textAlign: "left" }}>
        Oh no

        * no no no
        * no no no
        * Im so sorry
      </td>
    </tr>

    <tr>
      <td style={{ textAlign: "left" }}>
        Inline code with a pipe
      </td>

      <td style={{ textAlign: "left" }}>
        `foo | bar`
      </td>
    </tr>
  </tbody>
</Table>

and some edge case tests if you are interested

<Table>
  <tbody>
    <tr>
      <td>Literal closing tag as text</td>
      <td>\</Table> should not be separated</td>
    </tr>
  </tbody>
</Table>

---

<Table>
  <tbody>
    <tr>
      <td>Literal closing tag as text</td>
    </tr>
  </tbody>
</Table>

---

> <Table>
>   <tbody>
>     <tr><td>quoted</td></tr>
>   </tbody>
> </Table>
>
> text after quote

---

<Table>
  <thead>
  <tr><th>A</th></tr>
  </thead>
</Table>

---

<Table>
  <tbody>
    <tr>
      <td>Closing tag in code</td>
      <td>This `</Table>` should be auto-escaped.</td>
    </tr>
  </tbody>
</Table>

---

<Table>
  <tbody>
    <tr>
      <td>Closing tag in code</td>
    </tr>
  </tbody>

You should still see me!

---

<Table>
  <tbody>
    <tr>
      <td>
        <Table>
          <tbody>
            <tr>
              <td>Nested Table</td>
            </tr>
          </tbody>
        </Table>
      </td>
      <td>Hi</td>
    </tr>
  </tbody>
</Table>

@maximilianfalco
Copy link
Copy Markdown
Contributor Author

maximilianfalco commented Mar 13, 2026

okay im trying something new in this commit 753a8e4

i realised now that the tokenizer can ensure the entire<Table> comes in one singular node, we can make use of remarkMdx and all its benefits to parse cell contents instead relying on the current lightweight (remarkParse + remarkGfm only) parser. hopefully this'll allow use to have parity with both RDMD and RMDX.

@maximilianfalco maximilianfalco marked this pull request as ready for review March 16, 2026 03:07
@maximilianfalco
Copy link
Copy Markdown
Contributor Author

maximilianfalco commented Mar 16, 2026

with the new tokenizer, the entire <Table> ... </Table> should always come in a single block which means that the transformer no longer has to handle multiple different possible shapes hence why we are able to remove a lot of stuff in mdxish-tables

  • the isMDXElement visit pass and mdxishComponentBlocks import are dead code since the tokenizer captures the table before mdxishComponentBlocks ever sees it
  • handleTableInNode + separate html/raw visits are no longer needed since the tokenizer guarantees a single html node, so they collapse into one visit(tree, 'html', ...)
  • replaced mdCellProcessor / parseMarkdown with tableNodeProcessor which uses remarkMdx instead of lightweight remarkParse + remarkGfm, so cells now support full mdxish syntax (callouts, gemoji, components, etc)

EDIT (some follow ups):

  • tokenizer now handles backslash-escaped \</Table> in cell content without prematurely closing the block
  • tables without <thead> are kept as JSX instead of converting to mdast (mdast always treats the first row as a header, so body rows would get promoted)
  • single-line <tr><th>...</th></tr> now renders correctly
  • paragraph wrappers and whitespace-only text nodes produced by remarkMdx are cleaned up in the JSX path to prevent empty <p>/<br> gaps

Copy link
Copy Markdown
Contributor

@rafegoldberg rafegoldberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is working well, save for a couple issues I noticed:

  1. an unclosed <Table> component will swallow all following content1
  2. writing “</Table>” in an inline code snippet inside a cell is not auto-escaped1

The second one is really more of an edge case, but the first one I think we actually need to fix before this gets merged?

Footnotes

  1. run the start script to view these links 2

*
* @see https://github.com/syntax-tree/mdast#flowcontent
*/
export const FLOW_TYPES = new Set([
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exposing a list of FLOW_TYPES here for consumers to use, I was deliberating to make a new Set over in the monorepo but opted to just put this as an import so we can a single SOT

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to help with differentiating when to use GFM vs JSX table syntax in the editor serialization right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, tldr, if the table contains any FLOW types (except html) then we serialize to JSX syntax to keep it rich (ie to allow code blocks)

expect(cells).toHaveLength(2);
});

it('does not swallow content after an unclosed Table', () => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to check there's no table nodes in the tree?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should still be able to see tables, this is on par with RDMD
Screenshot 2026-03-17 at 15 59 31

<Table>
  <tbody>
    <tr>
      <td>Closing tag in code</td>
    </tr>
  </tbody>

You should still see me!

@kevinports kevinports merged commit ce73b7f into next Mar 17, 2026
14 of 15 checks passed
@kevinports kevinports deleted the falco/rm-15613-properly-parse-and-render-fragmented-table-elements branch March 17, 2026 18:28
rafegoldberg pushed a commit that referenced this pull request Mar 17, 2026
## Version 13.6.1
### 🛠 Fixes & Updates

* **mdxish:** parse <Table> as single token to prevent blank-line fragmentations ([#1371](#1371)) ([ce73b7f](ce73b7f))
* **mdxish:** variables in callout titles causing it to crash ([#1378](#1378)) ([4006247](4006247)), closes [#1362](#1362)

<!--SKIP CI-->
@rafegoldberg
Copy link
Copy Markdown
Contributor

This PR was released!

🚀 Changes included in v13.6.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants