support matcher function for markdown transformers #4500

Closed
xinyuan0801 wants to merge 2 commits into facebook:main from xinyuan0801:markdown-matcher-support

Conversation

@xinyuan0801
Contributor

@xinyuan0801 xinyuan0801 commented May 15, 2023

Background

As outlined in issue #4481, the current implementation requires the use of complex regular expressions to match a transformer in Markdown. This solution can be quite convoluted and difficult to manage for more complex scenarios.

Purpose of this PR

This Pull Request proposes the introduction of a matcher property to both ElementTransformer and TextMatchTransformer, aiming to offer a more flexible and intuitive method for matching patterns. This is an optional property.

ElementTransformer

In the context of ElementTransformer, the matcher property is a function of type (textContent: string) => Array<string> | null. This function takes the textContent as input and returns all matching substrings, or null if none is found. It is an optional property of ElementTransformer.

When the matcher function is provided, the transformer will use it to find the matching patterns. If the matcher property is absent, the transformer will fall back to the existing regExp property to find matches.
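
The fallback described above might look roughly like the following. This is only a sketch, not code from this PR: `ElementMatchSource` and `findElementMatch` are hypothetical names, and the type is trimmed down to the two fields that matter for matching.

```typescript
// Hypothetical, trimmed-down shape: only the fields relevant to matching.
type ElementMatchSource = {
  matcher?: (textContent: string) => Array<string> | null;
  regExp: RegExp;
};

// Prefer the optional matcher; fall back to regExp only when it is absent.
function findElementMatch(
  transformer: ElementMatchSource,
  textContent: string,
): Array<string> | null {
  if (transformer.matcher !== undefined) {
    return transformer.matcher(textContent);
  }
  // String.prototype.match returns a RegExpMatchArray, which is an Array<string>.
  return textContent.match(transformer.regExp);
}

// No matcher: the regExp decides.
const viaRegExp = findElementMatch({regExp: /^(#{1,6})\s/}, '## Title');

// Matcher present: it decides, and regExp is not consulted.
const viaMatcher = findElementMatch(
  {
    matcher: (text) => (text.startsWith('> ') ? ['> '] : null),
    regExp: /^(#{1,6})\s/,
  },
  '> quote',
);
```

Note that in this sketch a matcher returning null does not fall through to regExp; the fallback applies only when no matcher is defined.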

TextMatchTransformer

As for TextMatchTransformer, the matcher property takes the form of a function (textContent: string) => {index: number; matchText: Array<string>} | null. This function operates similarly to the ElementTransformer's matcher, with the addition of an index property, mirroring the index property found in RegExpMatchArray. This is required by the runTextMatchTransformers function.
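
To illustrate how this result shape mirrors RegExpMatchArray, here is a minimal sketch that wraps an existing regular expression in a function of the matcher type. `TextMatch` and `matcherFromRegExp` are hypothetical names used only for illustration, and the sketch assumes a non-global expression (a global one does not report an index).

```typescript
// Result shape described above for the TextMatchTransformer matcher.
type TextMatch = {index: number; matchText: Array<string>};

// Hypothetical helper: wraps a non-global RegExp so it returns the matcher result shape.
function matcherFromRegExp(
  regExp: RegExp,
): (textContent: string) => TextMatch | null {
  return (textContent) => {
    const match = textContent.match(regExp);
    // index is undefined for global expressions, which this sketch does not support.
    if (match === null || match.index === undefined) {
      return null;
    }
    // Array.from keeps the matched strings and drops the extra
    // RegExpMatchArray properties (index, input, groups).
    return {index: match.index, matchText: Array.from(match)};
  };
}

const boldMatcher = matcherFromRegExp(/\*\*(.+?)\*\*/);
const boldMatch = boldMatcher('say **hi** now');
const noMatch = boldMatcher('plain text');
```

Here `boldMatch.index` is the offset of the match within the text, and `boldMatch.matchText` holds the full match followed by its capture groups, just as a regexp match would.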

By implementing these changes, we anticipate a more versatile and user-friendly approach to finding matches within our Markdown transformers, reducing the reliance on complex regular expressions.

Usage

An example usage of the matcher property:

export const HEADING: ElementTransformer = {
  dependencies: [HeadingNode],
  export: (node, exportChildren) => {
    if (!$isHeadingNode(node)) {
      return null;
    }
    const level = Number(node.getTag().slice(1));
    return '#'.repeat(level) + ' ' + exportChildren(node);
  },
  matcher: (textContent): string[] | null => {
    const headingPrefixes: string[] = ['# ', '## ', '### ', '#### ', '##### ', '###### '];
    const matches: string[] = [];

    for (const prefix of headingPrefixes) {
      if (textContent.startsWith(prefix)) {
        matches.push(prefix, prefix.trim());
      }
    }
    return matches.length > 0 ? matches : null;
  },
  regExp: /^(#{1,6})\s/,
  replace: createBlockNode((match) => {
    const tag = ('h' + match[1].length) as HeadingTagType;
    return $createHeadingNode(tag);
  }),
  type: 'element',
};

Here, the matcher handles the pattern finding.

Something To Note

In this PR, the type of the replace parameter in TextMatchTransformer has been updated from (node: TextNode, match: RegExpMatchArray) => void to (node: TextNode, match: Array<string>) => void. This change was made because it was determined that the additional properties provided by RegExpMatchArray are not necessary for the current usage of the function.

By aligning the type of the match parameter in TextMatchTransformer with the replace function in ElementTransformer, which is of type Array<string>, we ensure consistency and simplicity in the codebase.
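
One consequence worth spelling out: a regexp result can still be passed to a replace function typed with Array<string>, because RegExpMatchArray extends Array<string>, but a replace function that reads match.index would no longer type-check. A small sketch, using a simplified and hypothetical `Replace` type (the real signature also takes a TextNode and returns void):

```typescript
// Simplified stand-in for the replace signature; the TextNode parameter is omitted
// and a string is returned so the result can be inspected.
type Replace = (match: Array<string>) => string;

// Uses only array access, so it works for regexp and matcher results alike.
const summarize: Replace = (match) =>
  `${match[0]} has ${match.length - 1} group(s)`;

// A RegExpMatchArray is accepted where Array<string> is expected.
const fromRegExp = '![alt](url)'.match(/!\[(.*?)\]\((.*?)\)/);
// A matcher result is a plain string array.
const fromMatcher: Array<string> = ['![alt](url)', 'alt', 'url'];

const regExpSummary = fromRegExp === null ? null : summarize(fromRegExp);
const matcherSummary = summarize(fromMatcher);
```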

Fixes #4481

@vercel

vercel bot commented May 15, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
lexical ✅ Ready (Inspect) Visit Preview 💬 Add feedback May 16, 2023 0:30am
lexical-playground ✅ Ready (Inspect) Visit Preview 💬 Add feedback May 16, 2023 0:30am

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 15, 2023
@fantactuka
Collaborator

It would really help to see an example where a string match does anything different from a regexp.

@xinyuan0801
Contributor Author

xinyuan0801 commented May 17, 2023

It would really help to see an example where a string match does anything different from a regexp.

I think the key insight here is not to make regexp the only way to do string matching. Regexp is a powerful tool, and I have no doubt it can handle most of the string matching needed in the markdown plugin, but it does come with some limitations.

  • Readability and Maintainability
    Examples are the IMAGE markdown transformer in the playground, whose regexp is /!(?:\[([^[]*)\])(?:\(([^(]+)\))$/, and the regexp for LINK, which is /(?:\[([^[]+)\])(?:\((?:([^()\s]+)(?:\s"((?:[^"]*\\")*[^"]*)"\s*)?)\))$/. Both are long and hard to understand without a close reading.

I will give an example comparison with the IMAGE example.

matcher: (
  textContent: string,
): {index: number; matchText: string[]} | null => {
  // Find the index of the start of the markdown image tag.
  const start = textContent.indexOf('![');
  if (start < 0) return null;

  // Identify the end of the alt text and extract it.
  const endBracket = textContent.indexOf(']', start);
  if (endBracket < 0) return null;
  const altText = textContent.substring(start + 2, endBracket);

  // Identify the start and end of the URL and extract it.
  const startParen = textContent.indexOf('(', endBracket);
  if (startParen < 0) return null;
  const endParen = textContent.indexOf(')', startParen);
  if (endParen < 0) return null;
  const url = textContent.substring(startParen + 1, endParen);

  // Return the start index and an array with the complete markdown tag, alt text, and URL.
  return {index: start, matchText: [`![${altText}](${url})`, altText, url]};
}

The matcher function accomplishes the same task as the regular expression /!(?:\[([^[]*)\])(?:\(([^(]+)\))/. It should be noted that I have only tested it with basic examples, such as single and multiple images, so there might still be some bugs. However, the key idea behind this function is that it offers improved readability compared to the regular expression and makes the debugging process easier.
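
Since the matcher has only been tried on basic examples, one way to gain confidence is to run it side by side with the regexp on sample inputs. The following is a sketch under stated assumptions: `findDisagreements` is a hypothetical helper, `imageMatcher` is a condensed copy of the function above, and the regexp is the IMAGE pattern without the trailing `$`.

```typescript
type TextMatch = {index: number; matchText: string[]};
type Matcher = (textContent: string) => TextMatch | null;

// Condensed copy of the IMAGE matcher from this comment.
const imageMatcher: Matcher = (textContent) => {
  const start = textContent.indexOf('![');
  if (start < 0) return null;
  const endBracket = textContent.indexOf(']', start);
  if (endBracket < 0) return null;
  const startParen = textContent.indexOf('(', endBracket);
  if (startParen < 0) return null;
  const endParen = textContent.indexOf(')', startParen);
  if (endParen < 0) return null;
  const altText = textContent.substring(start + 2, endBracket);
  const url = textContent.substring(startParen + 1, endParen);
  return {index: start, matchText: [`![${altText}](${url})`, altText, url]};
};

// Hypothetical helper: returns the samples on which matcher and regExp disagree.
function findDisagreements(
  matcher: Matcher,
  regExp: RegExp,
  samples: string[],
): string[] {
  return samples.filter((sample) => {
    const fromRegExp = sample.match(regExp);
    // Normalize the regexp result into the matcher's result shape.
    const expected =
      fromRegExp === null
        ? null
        : {index: fromRegExp.index, matchText: Array.from(fromRegExp)};
    return JSON.stringify(matcher(sample)) !== JSON.stringify(expected);
  });
}

const samples = [
  '![alt](url)',
  'text ![alt](url) more',
  'no image here',
  '![alt] stray (text)',
];
const disagreements = findDisagreements(
  imageMatcher,
  /!(?:\[([^[]*)\])(?:\(([^(]+)\))/,
  samples,
);
```

On this sample set the two agree on the first three inputs and differ on the last one: the matcher accepts '![alt] stray (text)' because it does not require the '(' to follow the ']' directly, while the regexp rejects it. A check like this makes such edge cases visible before a matcher replaces an existing regexp.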

