[lexical-markdown] Bug Fix: Replace regex-based format matching with …#8093
Merged
etrepum merged 3 commits into facebook:main on Jan 27, 2026
…CommonMark delimiter algorithm

Previously, the outer format detection relied on regex patterns to find matched formats. However, regex cannot cover all Markdown specification edge cases. This change implements the CommonMark delimiter algorithm to properly handle emphasis parsing:

1. Scan text to build a delimiter stack with canOpen/canClose properties
2. Process delimiters to find matching pairs using flanking rules and the rule of 3
3. Return the outermost matched emphasis
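The canOpen/canClose properties follow from CommonMark's left-flanking and right-flanking rules. The sketch below illustrates that scan under stated assumptions: `DelimiterRun`, `scanDelimiterRuns`, and the helper names are hypothetical and are not taken from this PR, punctuation detection is limited to ASCII, and backslash escapes are ignored.

```typescript
// Hypothetical shape for illustration; the PR's real delimiter type may differ.
interface DelimiterRun {
  char: string; // '*' or '_'
  index: number; // offset of the first delimiter character
  length: number; // number of characters in the run
  canOpen: boolean;
  canClose: boolean;
}

// Start and end of text count as whitespace (charAt returns '' out of range).
const isWhitespace = (ch: string): boolean => ch === '' || /\s/.test(ch);
// Simplification: ASCII punctuation only; the spec also covers Unicode punctuation.
const isPunctuation = (ch: string): boolean => /[!-\/:-@\[-`{-~]/.test(ch);

function scanDelimiterRuns(text: string): DelimiterRun[] {
  const runs: DelimiterRun[] = [];
  let i = 0;
  while (i < text.length) {
    const char = text.charAt(i);
    if (char !== '*' && char !== '_') {
      i++;
      continue;
    }
    // Extend to the end of the maximal run of the same character.
    let end = i;
    while (text.charAt(end) === char) {
      end++;
    }
    const before = text.charAt(i - 1);
    const after = text.charAt(end);
    const leftFlanking =
      !isWhitespace(after) &&
      (!isPunctuation(after) || isWhitespace(before) || isPunctuation(before));
    const rightFlanking =
      !isWhitespace(before) &&
      (!isPunctuation(before) || isWhitespace(after) || isPunctuation(after));
    // '_' is stricter than '*' so that intraword underscores stay literal.
    const canOpen =
      char === '*'
        ? leftFlanking
        : leftFlanking && (!rightFlanking || isPunctuation(before));
    const canClose =
      char === '*'
        ? rightFlanking
        : rightFlanking && (!leftFlanking || isPunctuation(after));
    runs.push({char, index: i, length: end - i, canOpen, canClose});
    i = end;
  }
  return runs;
}
```

For `*text**text***` this yields three runs: an opener of length 1, a run of length 2 that can both open and close, and a closer of length 3.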
etrepum
approved these changes
Jan 25, 2026
Collaborator
etrepum
left a comment
I didn't carefully review the algorithm, but this does seem like a positive change! Do you have any thoughts on this before going forward @AlessioGr?
AlessioGr
reviewed
Jan 25, 2026
packages/lexical-markdown/src/__tests__/unit/LexicalMarkdown.test.ts
Contributor
I think this is a good change!
0093256 to 04ae8ba
Contributor
Author
Thank you! I've addressed the feedback in the latest commit. I also fixed a case involving code spans: previously, the delimiter stack included delimiters inside code spans, causing incorrect matching. Delimiter scanning now excludes code span ranges to ensure correct behavior.
etrepum
approved these changes
Jan 27, 2026
ivailop7
approved these changes
Jan 27, 2026
Description
Current Behavior
Currently, Markdown text is converted into Lexical nodes using regex. As reported in issue #8073, this approach fails to correctly identify nested or complex matching pairs.
For example, in the pattern `*text**text***`, the opening `*` should pair with its corresponding closing `*`. But because the regex searches for a closing `*` that stands on its own, it cannot match correctly.
Changes in This PR
To resolve these inconsistencies, I have implemented the CommonMark Delimiter Algorithm for processing emphasis and strong emphasis.
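The algorithm proceeds as follows:
1. Scan the text to build a delimiter stack with canOpen/canClose properties.
2. Process delimiters to find matching pairs using flanking rules and the rule of 3.
3. Return the outermost matched emphasis.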
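The pairing step and the rule of 3 can be sketched roughly as below. This is a simplified reading of the CommonMark specification, not the code merged in this PR: `DelimiterRun`, `EmphasisMatch`, `canPair`, and `matchEmphasis` are hypothetical names, and the delimiter runs are assumed to have been scanned already.

```typescript
// Hypothetical types for illustration; not the PR's actual definitions.
interface DelimiterRun {
  char: string; // '*' or '_'
  index: number; // offset of the first delimiter character
  length: number; // original length of the run
  canOpen: boolean;
  canClose: boolean;
}

interface EmphasisMatch {
  start: number; // offset of the first opening delimiter character
  end: number; // offset just past the last closing delimiter character
  strong: boolean; // true when two characters were consumed on each side
}

// Rule of 3: if either run can both open and close, the sum of the original
// run lengths must not be a multiple of 3 unless both lengths are.
function canPair(opener: DelimiterRun, closer: DelimiterRun): boolean {
  if (opener.char !== closer.char || !opener.canOpen || !closer.canClose) {
    return false;
  }
  const eitherIsBoth = opener.canClose || closer.canOpen;
  if (!eitherIsBoth) {
    return true;
  }
  const sum = opener.length + closer.length;
  return sum % 3 !== 0 || (opener.length % 3 === 0 && closer.length % 3 === 0);
}

function matchEmphasis(runs: DelimiterRun[]): EmphasisMatch[] {
  // Characters still unconsumed in each run.
  const remaining = runs.map((run) => run.length);
  const matches: EmphasisMatch[] = [];
  runs.forEach((closer, c) => {
    while (closer.canClose && remaining[c] > 0) {
      // Look back for the nearest earlier run that can open for this closer.
      let o = c - 1;
      while (o >= 0 && !(remaining[o] > 0 && canPair(runs[o], closer))) {
        o--;
      }
      if (o < 0) {
        break;
      }
      const opener = runs[o];
      // Prefer strong emphasis when both sides still have two characters.
      const width = remaining[o] >= 2 && remaining[c] >= 2 ? 2 : 1;
      matches.push({
        // The opener is consumed from its inner (right) end,
        // the closer from its inner (left) end.
        start: opener.index + remaining[o] - width,
        end: closer.index + (closer.length - remaining[c]) + width,
        strong: width === 2,
      });
      remaining[o] -= width;
      remaining[c] -= width;
      // Runs between the pair can no longer match anything outside it.
      for (let k = o + 1; k < c; k++) {
        remaining[k] = 0;
      }
    }
  });
  return matches;
}
```

For the three runs of `*text**text***`, the `**` run cannot close against the leading `*` (1 + 2 is a multiple of 3), so the `***` run first pairs with `**` as strong emphasis and its last character then pairs with the leading `*`, giving emphasis that wraps the whole string.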
Additionally, this PR fixes an issue where formats inside code spans were incorrectly processed. Some issues remain, however: inline elements other than code spans (e.g., links, raw HTML) are handled by text match transformers, which makes them difficult to address with the current implementation. A new conversion approach may be needed to fully resolve these cases.
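One way to exclude code spans from delimiter scanning is to collect their ranges first and skip any delimiter that falls inside one. The sketch below is an assumption about how that could look, not the PR's implementation: `findCodeSpanRanges` and `isInsideCodeSpan` are hypothetical names, and backslash-escaped backticks are not handled.

```typescript
// Hypothetical helper type; end is exclusive.
interface TextRange {
  start: number;
  end: number;
}

// A code span opens with a backtick run of length n and closes at the next
// backtick run of exactly the same length. Unmatched runs stay literal.
function findCodeSpanRanges(text: string): TextRange[] {
  const ranges: TextRange[] = [];
  const openers = /`+/g;
  let open: RegExpExecArray | null;
  while ((open = openers.exec(text)) !== null) {
    const closers = /`+/g;
    closers.lastIndex = openers.lastIndex;
    let close: RegExpExecArray | null;
    while ((close = closers.exec(text)) !== null) {
      if (close[0].length === open[0].length) {
        ranges.push({start: open.index, end: closers.lastIndex});
        // Resume scanning for openers after the closing run.
        openers.lastIndex = closers.lastIndex;
        break;
      }
    }
  }
  return ranges;
}

// True when the character at `index` lies within any code span range.
function isInsideCodeSpan(ranges: TextRange[], index: number): boolean {
  return ranges.some((range) => index >= range.start && index < range.end);
}
```

A delimiter scanner could call `isInsideCodeSpan` before pushing a `*` or `_` run onto the stack, so the `*` inside a code span never takes part in matching.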
As a result, all tests pass. One previously incorrect test was fixed, and 5 new test cases were added.
Closes #8073
Test plan
Before
before.mov
After
after.mov