support matcher function for markdown transformers by xinyuan0801 · Pull Request #4500 · facebook/lexical

xinyuan0801 · 2023-05-15T23:57:07Z

Background

As outlined in issue #4481, the current implementation requires the use of complex regular expressions to match a transformer in Markdown. This solution can be quite convoluted and difficult to manage for more complex scenarios.

Purpose of this PR

This Pull Request proposes the introduction of a matcher property to both ElementTransformer and TextMatchTransformer, aiming to offer a more flexible and intuitive method for matching patterns. This is an optional property.

ElementTransformer

In the context of ElementTransformer, the matcher property is a function of type (textContent: string) => Array<string> | null. It takes the textContent as input and returns the matched substrings, in the same order a RegExp match would list them (the full match first, then the captured parts), or null if no match is found. It is an optional property of ElementTransformer.

When the matcher function is provided, the transformer will use it to find the matching patterns. If the matcher property is absent, the transformer will fall back to the existing regExp property to find matches.
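The precedence rule can be sketched as follows. This is a minimal model, not the PR's code: `ElementTransformerWithMatcher` and `findElementMatch` are hypothetical names, and the type keeps only the two fields involved. Note that a provided matcher wins even when it returns null; regExp is consulted only when matcher is absent.

```typescript
// Hypothetical, trimmed-down transformer type: only the fields that matter here.
type ElementTransformerWithMatcher = {
  regExp: RegExp;
  matcher?: (textContent: string) => Array<string> | null;
};

// Use the matcher when one is provided; otherwise fall back to regExp.
// A RegExpMatchArray extends Array<string>, so both branches share one type.
function findElementMatch(
  transformer: ElementTransformerWithMatcher,
  textContent: string,
): Array<string> | null {
  if (transformer.matcher !== undefined) {
    return transformer.matcher(textContent);
  }
  return textContent.match(transformer.regExp);
}

// A quote-like transformer in three variants.
const quoteWithRegExp: ElementTransformerWithMatcher = {regExp: /^>\s/};
const quoteWithMatcher: ElementTransformerWithMatcher = {
  regExp: /^>\s/,
  matcher: (text) => (text.startsWith('> ') ? ['> '] : null),
};
// The matcher takes precedence, so regExp is never consulted here.
const quoteIgnoringRegExp: ElementTransformerWithMatcher = {
  regExp: /^>\s/,
  matcher: () => null,
};

console.log(JSON.stringify(findElementMatch(quoteWithRegExp, '> hi'))); // ["> "]
console.log(JSON.stringify(findElementMatch(quoteWithMatcher, '> hi'))); // ["> "]
console.log(JSON.stringify(findElementMatch(quoteIgnoringRegExp, '> hi'))); // null
```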

TextMatchTransformer

As for TextMatchTransformer, the matcher property is a function of type (textContent: string) => {index: number; matchText: Array<string>} | null. It works like the ElementTransformer's matcher but also returns an index, mirroring the index property of RegExpMatchArray; the runTextMatchTransformers function needs this index.
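To illustrate why the index matters, here is a sketch that converts a regex result into the same {index, matchText} shape and then uses the index to split the text around the match, roughly the job the description says runTextMatchTransformers needs it for. `TextMatch`, `regExpToTextMatch`, and `splitAroundMatch` are hypothetical helpers, not Lexical APIs.

```typescript
// Shape returned by the proposed TextMatchTransformer matcher.
type TextMatch = {index: number; matchText: Array<string>};

// Hypothetical adapter: express a RegExp result in the matcher's shape,
// so callers can treat both sources identically.
function regExpToTextMatch(text: string, regExp: RegExp): TextMatch | null {
  const match = text.match(regExp);
  if (match === null || match.index === undefined) {
    return null;
  }
  return {index: match.index, matchText: Array.from(match)};
}

// The start offset plus the length of the full match (matchText[0])
// identifies the span to replace; without the index this is impossible.
function splitAroundMatch(
  text: string,
  match: TextMatch,
): [string, string, string] {
  const start = match.index;
  const end = start + match.matchText[0].length;
  return [text.slice(0, start), text.slice(start, end), text.slice(end)];
}

const sample = 'see ![cat](cat.png) here';
const found = regExpToTextMatch(sample, /!\[([^[]*)\]\(([^(]+)\)/);
if (found !== null) {
  // ["see ","![cat](cat.png)"," here"]
  console.log(JSON.stringify(splitAroundMatch(sample, found)));
}
```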

By implementing these changes, we anticipate a more versatile and user-friendly approach to finding matches within our Markdown transformers, reducing the reliance on complex regular expressions.

Usage

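An example usage of the matcher property:

```typescript
// Sketch of the HEADING transformer with the proposed `matcher`. It assumes it
// lives in @lexical/markdown's MarkdownTransformers.ts, where ElementTransformer
// and the createBlockNode helper are defined.
import type {HeadingTagType} from '@lexical/rich-text';
import {$createHeadingNode, $isHeadingNode, HeadingNode} from '@lexical/rich-text';

export const HEADING: ElementTransformer = {
  dependencies: [HeadingNode],
  export: (node, exportChildren) => {
    if (!$isHeadingNode(node)) {
      return null;
    }
    const level = Number(node.getTag().slice(1));
    return '#'.repeat(level) + ' ' + exportChildren(node);
  },
  matcher: (textContent): string[] | null => {
    const headingPrefixes: string[] = ['# ', '## ', '### ', '#### ', '##### ', '###### '];
    const matches: string[] = [];

    for (const prefix of headingPrefixes) {
      if (textContent.startsWith(prefix)) {
        // Same shape as the regExp result: the full prefix, then the "#" run,
        // so replace can keep reading match[1].length as the heading level.
        matches.push(prefix, prefix.trim());
      }
    }
    return matches.length > 0 ? matches : null;
  },
  regExp: /^(#{1,6})\s/,
  replace: createBlockNode((match) => {
    const tag = ('h' + match[1].length) as HeadingTagType;
    return $createHeadingNode(tag);
  }),
  type: 'element',
};
```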

Here, the matcher handles the pattern finding instead of regExp.
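To check that the HEADING matcher produces what the replace function expects, it can be pulled out as a standalone function and run next to the regExp it stands in for. `headingMatcher` is a hypothetical name for this copy; the logic is the same as in the example.

```typescript
// Standalone copy of the HEADING matcher logic.
function headingMatcher(textContent: string): string[] | null {
  const headingPrefixes = ['# ', '## ', '### ', '#### ', '##### ', '###### '];
  const matches: string[] = [];
  for (const prefix of headingPrefixes) {
    if (textContent.startsWith(prefix)) {
      // Full prefix first, then the "#" run, like match[0] and match[1].
      matches.push(prefix, prefix.trim());
    }
  }
  return matches.length > 0 ? matches : null;
}

// The regExp the matcher replaces.
const headingRegExp = /^(#{1,6})\s/;

for (const line of ['## Title', '###### Deep', '####### Too deep', 'No heading', '#\tTabbed']) {
  const fromMatcher = headingMatcher(line);
  const fromRegExp = line.match(headingRegExp);
  console.log(JSON.stringify(line), JSON.stringify(fromMatcher), JSON.stringify(fromRegExp));
}
```

The two agree on space-separated headings, but they are not identical: `\s` in the regExp also accepts a tab, so `#\tTabbed` matches the regExp and not the matcher.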

Something To Note

In this PR, the type of the replace parameter in TextMatchTransformer has been updated from (node: TextNode, match: RegExpMatchArray) => void to (node: TextNode, match: Array<string>) => void. This change was made because it was determined that the additional properties provided by RegExpMatchArray are not necessary for the current usage of the function.

By aligning the type of the match parameter in TextMatchTransformer with the replace function in ElementTransformer, which is of type Array<string>, we ensure consistency and simplicity in the codebase.
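The reason existing regExp-based callers keep working is that RegExpMatchArray extends Array<string> in TypeScript's standard library. The sketch below shows one callback accepting both a matcher's plain array and a regex result. `ReplaceFn` and `describeLink` are hypothetical simplifications: the real replace also receives the TextNode, omitted here to keep the example self-contained.

```typescript
// Simplified stand-in for the replace callback under the new parameter type.
type ReplaceFn = (match: Array<string>) => string;

// Reads only positional groups, which is all Array<string> guarantees.
const describeLink: ReplaceFn = (match) => `${match[1]} -> ${match[2]}`;

// A custom matcher produces a plain array...
const fromMatcher: Array<string> = [
  '[docs](https://lexical.dev)',
  'docs',
  'https://lexical.dev',
];
// ...and a RegExpMatchArray is also an Array<string>, so it is accepted too.
const fromRegExp = 'see [docs](https://lexical.dev)'.match(/\[([^[]+)\]\(([^()\s]+)\)/);

console.log(describeLink(fromMatcher)); // docs -> https://lexical.dev
if (fromRegExp !== null) {
  console.log(describeLink(fromRegExp)); // docs -> https://lexical.dev
}
```

The trade-off is that a replace callback can no longer rely on match.index or match.input: on Array<string>, accessing match.index is a compile-time error.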

Fixes #4481

vercel · 2023-05-15T23:57:12Z

The latest updates on your projects.

Name	Status	Updated (UTC)
lexical	✅ Ready	May 16, 2023 0:30am
lexical-playground	✅ Ready	May 16, 2023 0:30am

fantactuka · 2023-05-16T13:08:30Z

It would really help to see an example where string matching does something different from a regexp.

xinyuan0801 · 2023-05-17T02:29:56Z

> It would really help to see an example where string matching does something different from a regexp.

I think the key point here is not to make regexp the only way to do string matching. RegExp is a powerful tool, and I have no doubt it can handle most of the string matching the markdown plugin needs, but it does come with some limitations.

Readability and Maintainability
Examples include the IMAGE markdown transformer in the playground, whose regexp is /!(?:\[([^[]*)\])(?:\(([^(]+)\))$/, and the LINK transformer, whose regexp is /(?:\[([^[]+)\])(?:\((?:([^()\s]+)(?:\s"((?:[^"]*\\")*[^"]*)"\s*)?)\))$/. Both are long and hard to understand without close inspection.

Here is a comparison using the IMAGE example:

```typescript
matcher: (
  textContent: string,
): {index: number; matchText: string[]} | null => {
  // Scan every "![" from left to right, as a regex search would.
  let start = textContent.indexOf('![');
  while (start >= 0) {
    // The alt text ends at the first "]", which must be followed directly by "(".
    const endBracket = textContent.indexOf(']', start + 2);
    if (endBracket >= 0 && textContent[endBracket + 1] === '(') {
      const altText = textContent.substring(start + 2, endBracket);

      // The URL runs from just after "(" to the next ")".
      const endParen = textContent.indexOf(')', endBracket + 2);
      const url =
        endParen >= 0 ? textContent.substring(endBracket + 2, endParen) : '';

      // Roughly mirror [^[]* and [^(]+: no "[" in the alt text, and a
      // non-empty URL without "(".
      if (!altText.includes('[') && url !== '' && !url.includes('(')) {
        // Return the start index and an array with the complete tag, alt text, and URL.
        return {index: start, matchText: [`![${altText}](${url})`, altText, url]};
      }
    }
    // Otherwise try the next "![" candidate.
    start = textContent.indexOf('![', start + 1);
  }
  return null;
},
```

The matcher function is meant to do the same job as the regular expression /!(?:\[([^[]*)\])(?:\(([^(]+)\))/. I have only tested it with basic examples, such as single and multiple images, so there may still be bugs. The key idea, however, is that it is more readable than the regular expression and easier to debug.
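One way to act on the "easier to debug" point is a small harness that runs a matcher and the regex on the same inputs and reports any disagreement. This is a sketch: `imageMatcher` is a hypothetical standalone variant of the IMAGE matcher (it requires "(" directly after "]" and skips candidates whose alt text contains "[" or whose URL is empty or contains "("), and `findDisagreements` is likewise an illustrative name.

```typescript
type TextMatch = {index: number; matchText: string[]};

// Hypothetical standalone variant of the IMAGE matcher.
function imageMatcher(text: string): TextMatch | null {
  for (let start = text.indexOf('!['); start >= 0; start = text.indexOf('![', start + 1)) {
    const endBracket = text.indexOf(']', start + 2);
    if (endBracket < 0 || text[endBracket + 1] !== '(') continue;
    const endParen = text.indexOf(')', endBracket + 2);
    if (endParen < 0) continue;
    const altText = text.substring(start + 2, endBracket);
    const url = text.substring(endBracket + 2, endParen);
    if (altText.includes('[') || url === '' || url.includes('(')) continue;
    return {index: start, matchText: [`![${altText}](${url})`, altText, url]};
  }
  return null;
}

// The regex the matcher is meant to replace.
const imageRegExp = /!(?:\[([^[]*)\])(?:\(([^(]+)\))/;

// Runs both on each input and describes every case where the results differ.
function findDisagreements(inputs: string[]): string[] {
  const problems: string[] = [];
  for (const input of inputs) {
    const m = imageMatcher(input);
    const r = input.match(imageRegExp);
    const fromMatcher = m === null ? 'null' : JSON.stringify([m.index, ...m.matchText]);
    const fromRegExp = r === null ? 'null' : JSON.stringify([r.index, ...r]);
    if (fromMatcher !== fromRegExp) {
      problems.push(`${input} -> matcher ${fromMatcher}, regExp ${fromRegExp}`);
    }
  }
  return problems;
}

console.log(
  findDisagreements([
    '![cat](cat.png)',
    'before ![a](x.png) and ![b](y.png)',
    '![ foo ![a](b)',
    'no image here',
    '![a]b](c)',
  ]),
);
```

On these inputs only `![a]b](c)` is reported: the regex's `[^[]*` lets the alt text contain "]" (it backtracks to the `](` pair), while this matcher stops at the first "]". Edge cases like this are exactly what such a harness makes visible.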

support matcher function for markdown transformers

6cd622f

xinyuan0801 requested review from acywatson, fantactuka, thegreatercurve, tylerjbainbridge and zurfyx as code owners May 15, 2023 23:57

facebook-github-bot added the CLA Signed label May 15, 2023

vercel bot deployed to Preview – lexical May 15, 2023 23:58

vercel bot deployed to Preview – lexical-playground May 15, 2023 23:59

fix matcher output type

09c6678

vercel bot deployed to Preview – lexical May 16, 2023 00:29

vercel bot deployed to Preview – lexical-playground May 16, 2023 00:30

fantactuka mentioned this pull request May 25, 2023

Feature: Split markdown shortcuts, import and export #4550

Open

xinyuan0801 closed this Jun 30, 2023

xinyuan0801 wants to merge 2 commits into facebook:main from xinyuan0801:markdown-matcher-support

3 participants
