
[lexical-markdown] Bug Fix: Replace regex-based format matching with …#8093

Merged
etrepum merged 3 commits into facebook:main from kimseongyu:fix-incorrect-tag-conversion-from-markdown-to-lexical
Jan 27, 2026

@kimseongyu
Contributor

Description

Current Behavior

Currently, Markdown text is converted into Lexical nodes using regular expressions. As reported in issue #8073, this approach fails to correctly identify nested or complex matching delimiter pairs.

For example, in the pattern *text**text***, the opening * should be paired with its corresponding closing delimiter. Because the regex searches for a standalone closing *, it cannot match the pair correctly.

Changes in This PR

To resolve these inconsistencies, I have implemented the CommonMark Delimiter Algorithm for processing emphasis and strong emphasis.

The algorithm proceeds as follows:

  1. Scan text to build a delimiter stack with canOpen/canClose properties
  2. Process delimiters to find matching pairs using flanking rules and the rule of 3
  3. Return the outermost matched emphasis
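
The three steps above can be sketched roughly as follows. This is a minimal illustration of the CommonMark delimiter approach, not the code from this PR: the names `scanDelimiters`, `matchEmphasis`, and `findOutermostEmphasis` are hypothetical, only ASCII punctuation is considered, backslash escapes and code spans are ignored, and "outermost" is assumed to mean the earliest-starting, widest match.

```typescript
// Hypothetical sketch; names and types are illustrative assumptions.
type Delimiter = {
  char: string; // '*' or '_'
  index: number; // offset of the delimiter run in the text
  length: number; // original run length (used by the rule of 3)
  remaining: number; // delimiter characters not yet consumed
  canOpen: boolean;
  canClose: boolean;
};
type EmphasisMatch = { start: number; end: number; strong: boolean };

// Start and end of text count as whitespace, as in CommonMark.
const isWhitespace = (ch?: string): boolean => ch === undefined || /\s/.test(ch);
// Simplification: ASCII punctuation only.
const isPunctuation = (ch?: string): boolean =>
  ch !== undefined && /[!-\/:-@\[-`{-~]/.test(ch);

// Step 1: build the delimiter stack using the flanking rules.
function scanDelimiters(text: string): Delimiter[] {
  const delimiters: Delimiter[] = [];
  let i = 0;
  while (i < text.length) {
    const char = text.charAt(i);
    if (char !== '*' && char !== '_') {
      i++;
      continue;
    }
    let end = i;
    while (end < text.length && text.charAt(end) === char) end++;
    const before = i > 0 ? text.charAt(i - 1) : undefined;
    const after = end < text.length ? text.charAt(end) : undefined;
    const left =
      !isWhitespace(after) &&
      (!isPunctuation(after) || isWhitespace(before) || isPunctuation(before));
    const right =
      !isWhitespace(before) &&
      (!isPunctuation(before) || isWhitespace(after) || isPunctuation(after));
    const isStar = char === '*';
    delimiters.push({
      char,
      index: i,
      length: end - i,
      remaining: end - i,
      canOpen: left && (isStar || !right || isPunctuation(before)),
      canClose: right && (isStar || !left || isPunctuation(after)),
    });
    i = end;
  }
  return delimiters;
}

// Step 2: pair each closer with the nearest compatible opener.
function matchEmphasis(delims: Delimiter[]): EmphasisMatch[] {
  const matches: EmphasisMatch[] = [];
  let c = 0;
  while (c < delims.length) {
    const closer = delims[c];
    if (!closer.canClose) {
      c++;
      continue;
    }
    let o = c - 1;
    for (; o >= 0; o--) {
      const opener = delims[o];
      if (opener.char !== closer.char || !opener.canOpen) continue;
      // Rule of 3: if either run can both open and close, the summed run
      // lengths must not be a multiple of 3 unless both lengths are.
      const bothWays = opener.canClose || closer.canOpen;
      const sum = opener.length + closer.length;
      const bothMultiples = opener.length % 3 === 0 && closer.length % 3 === 0;
      if (bothWays && sum % 3 === 0 && !bothMultiples) continue;
      break;
    }
    if (o < 0) {
      // No opener found: drop the closer unless it may still act as an opener.
      if (closer.canOpen) c++;
      else delims.splice(c, 1);
      continue;
    }
    const opener = delims[o];
    const n = opener.remaining >= 2 && closer.remaining >= 2 ? 2 : 1;
    matches.push({
      start: opener.index + opener.remaining - n, // opener consumes inner chars
      end: closer.index + (closer.length - closer.remaining) + n,
      strong: n === 2,
    });
    opener.remaining -= n;
    closer.remaining -= n;
    delims.splice(o + 1, c - o - 1); // discard delimiters between the pair
    c = o + 1;
    if (opener.remaining === 0) {
      delims.splice(o, 1);
      c--;
    }
    if (closer.remaining === 0) delims.splice(c, 1);
  }
  return matches;
}

// Step 3: pick the earliest-starting, widest match (assumed meaning of "outermost").
function findOutermostEmphasis(text: string): EmphasisMatch | null {
  let best: EmphasisMatch | null = null;
  for (const m of matchEmphasis(scanDelimiters(text))) {
    if (!best || m.start < best.start || (m.start === best.start && m.end > best.end)) {
      best = m;
    }
  }
  return best;
}
```

For the example from the description, `findOutermostEmphasis('*text**text***')` yields an emphasis match spanning the whole string, with the inner `**text**` matched separately as strong emphasis. The rule of 3 is what prevents the leading `*` from pairing with the first `*` of the middle `**` run.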

Additionally, this PR fixes the issue where formats inside code spans were incorrectly processed. However, some issues remain: inline elements other than code spans (e.g., links, raw HTML) are handled by text match transformers, which makes them difficult to address with the current implementation. A new conversion approach may be needed to fully resolve these cases.

As a result, all tests pass. One previously incorrect test was fixed, and 5 new test cases were added.

Closes #8073

Test plan

Before

before.mov

After

after.mov

…CommonMark delimiter algorithm

Previously, the outer format detection relied on regex patterns to find matched formats. However, regex cannot cover all Markdown specification edge cases.

This change implements the CommonMark delimiter algorithm to properly handle emphasis parsing:
1. Scan text to build a delimiter stack with canOpen/canClose properties
2. Process delimiters to find matching pairs using flanking rules and the rule of 3
3. Return the outermost matched emphasis

@meta-cla meta-cla bot added the CLA Signed label Jan 25, 2026
Collaborator

@etrepum etrepum left a comment

I didn't carefully review the algorithm, but this does seem like a positive change! Do you have any thoughts on this before going forward @AlessioGr?

@etrepum etrepum added the extended-tests label Jan 25, 2026
@AlessioGr
Contributor

I think this is a good change!

@kimseongyu
Contributor Author

Thank you! I've addressed the feedback in the latest commit. I also fixed the case with *a `*` b `x`*.

Previously, the delimiter stack included delimiters inside code spans, causing incorrect matching. Now, I exclude code span ranges from delimiter scanning to ensure correct behavior.
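
A rough sketch of how code span ranges could be collected and then skipped during delimiter scanning is shown below. The helper names `findCodeSpanRanges` and `isInsideCodeSpan` are hypothetical and may not match what the PR does; the sketch only assumes the CommonMark rule that a backtick run of length n is closed by the next backtick run of the same length.

```typescript
// Hypothetical helpers; illustrative only.
type Range = { start: number; end: number }; // end is exclusive

function findCodeSpanRanges(text: string): Range[] {
  const ranges: Range[] = [];
  let i = 0;
  while (i < text.length) {
    if (text.charAt(i) !== '`') {
      i++;
      continue;
    }
    // Measure the opening backtick run.
    let openEnd = i;
    while (openEnd < text.length && text.charAt(openEnd) === '`') openEnd++;
    const runLength = openEnd - i;
    // Look for a later backtick run of exactly the same length.
    let j = openEnd;
    let closeEnd = -1;
    while (j < text.length) {
      if (text.charAt(j) !== '`') {
        j++;
        continue;
      }
      let k = j;
      while (k < text.length && text.charAt(k) === '`') k++;
      if (k - j === runLength) {
        closeEnd = k;
        break;
      }
      j = k;
    }
    if (closeEnd === -1) {
      // No closing run: the backticks are literal text.
      i = openEnd;
      continue;
    }
    ranges.push({ start: i, end: closeEnd });
    i = closeEnd;
  }
  return ranges;
}

// A delimiter scanner can call this to skip '*' or '_' inside code spans.
function isInsideCodeSpan(ranges: Range[], index: number): boolean {
  return ranges.some((r) => index >= r.start && index < r.end);
}

const sample = '*a `*` b `x`*';
const sampleRanges = findCodeSpanRanges(sample);
// The '*' at offset 4 lies inside the first code span and would be skipped,
// while the '*' characters at offsets 0 and 12 remain eligible delimiters.
console.log(sampleRanges, isInsideCodeSpan(sampleRanges, 4));
```

With the inner `*` excluded, only the first and last `*` reach the delimiter stack, so the whole string can be matched as a single emphasis.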

playground

@etrepum etrepum added this pull request to the merge queue Jan 27, 2026
Merged via the queue into facebook:main with commit f1e4f66 Jan 27, 2026
42 checks passed
@kimseongyu kimseongyu deleted the fix-incorrect-tag-conversion-from-markdown-to-lexical branch January 27, 2026 20:38
@etrepum etrepum mentioned this pull request Jan 31, 2026

Development

Successfully merging this pull request may close these issues.

Bug: Markdown emphasis parser fails on overlapping delimiters (*text**more***) - CommonMark compliance issue

4 participants