Skip to content

RFC: Add hook to transform LexicalNodes converted from HTML during pasting#1153

Merged
acywatson merged 6 commits intomainfrom
post-transform
Jan 22, 2022
Merged

RFC: Add hook to transform LexicalNodes converted from HTML during pasting#1153
acywatson merged 6 commits intomainfrom
post-transform

Conversation

@acywatson
Copy link
Copy Markdown
Contributor

@acywatson acywatson commented Jan 21, 2022

This implements a post-transform hook to allow manipulation of the Lexical sub-tree formed from the DOM Node's children. The immediate problem this would solve is allowing us to properly support code pasted from VS Code (and other code editors), which uses a div element per line of code when code is copied to the clipboard.

I need to add some tests for this if we keep it.

Partially addresses #1042

@vercel
Copy link
Copy Markdown

vercel bot commented Jan 21, 2022

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/fbopensource/lexical/GjrfvHt6GqNsDDEb7yezVg2wHDnN
✅ Preview: https://lexical-git-post-transform-fbopensource.vercel.app

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 21, 2022
Comment on lines +98 to +149
div: (domNode: Node) => {
return {
node: null,
after: (lexicalNodes, dom) => {
const domParent = dom.parentNode;
if (domParent != null && dom !== domParent.lastChild) {
lexicalNodes.push($createLineBreakNode());
}
return lexicalNodes;
},
};
},
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the API you're suggesting is fine. I just wonder if it's possible to merge everything into node.

div: (domNode, dom) => {
  return [nodes]
}

If we have to go more complex maybe it will make sense to evolve this into something that you pass a selection in and returns the desired output. Kinda like a transform?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

domNode and dom are actually the same thing - a reference to the DOM node we're currently converting to Lexical. Maksim pointed out below that we actually don't need to pass that into after, since we can access it from the scope of the root conversion.

In the solution that you suggested, I'm not sure what a user would do with the Selection. We can discuss offline, if you want. It'd probably be beneficial for me to brief you (and others) on the intent behind this whole system, anyway.

@fantactuka
Copy link
Copy Markdown
Collaborator

fantactuka commented Jan 21, 2022

Had similar idea with post-processing hook, even name was the same!

Few thoughts around it:

  • We can give more control to after hook if pass currentLexicalNode and childNodes and let it modify both:
  const children = node.childNodes;
  let childLexicalNodes = []
  for (let i = 0; i < children.length; i++) {
    childLexicalNodes.push(...$createNodesFromDOM(
      children[i],
      conversionMap,
      editor,
      currentTextFormat,
    ));
  }

  [currentLexicalNode, childLexicalNodes] = transformOutput.after(currentLexicalNode, childLexicalNodes);

  // rest of the logic where we append child nodes to current node if it's not null, or concat with other nodes otherwise
  • after does not need node param as it's available in convertor itself:
div: (domNode: Node)
      ^^^^^^^

Alternative solution I was looking into is to modify dom node within convertor, e.g.:

+  const isInlineNode = (domNode: Node): boolean => {
+    return inlineNodeNames.has(domNode.nodeName.toLowerCase());
+  };
+
+  const isCodeElement = (div: HTMLDivElement): boolean => {
+    return div.style.fontFamily.match('monospace') !== null;
+  };

...

+  pre: () => ({node: $createCodeNode()}),
+  div: (domNode: Node) => {
+    const {childNodes} = domNode;
+    const lastChild = childNodes[childNodes.length - 1];
+    // $FlowFixMe[incompatible-cast] it is a <div> since it's matched by nodeName
+    const isCode = isCodeElement((domNode: HTMLDivElement));
+
+    // Mainly for code copy/pasting, adding line break in the end of the div
+    // if it's last child is inline (text) node.
+    if (
+      !isCode &&
+      lastChild &&
+      isInlineNode(lastChild) &&
+      domNode.nextSibling !== null
+    ) {
+      domNode.appendChild(document.createElement('br'));
+    }
+
+    return {
+      node: isCode ? $createCodeNode() : null,
+    };
+  },

That way we "prepare" html in advance by injecting missed <br> for code blocks. The reason I was trying this is (surprisingly) Github. All editors (even FB CodeHub) I've tried generate reasonable HTML for copied code, but GH's markup is based on tables 🙈

<table ...>
  <tr>
    <td data-line-number /><td>....code </td>
  </tr>
  <tr>
    <td data-line-number /><td>....code </td>
  </tr>
</table>

Which is quite hard to transform into code, so my idea was when we encounter this specific case in table converter, we can do same html preparation and replace table with <pre> and getting content by traversing its rows. Which could be harder with hook as we only have access to first level child nodes.

Thinking more about it, I don't know if we want to have this GH case as a built-it, but it's an interesting scenario anyways

@acywatson
Copy link
Copy Markdown
Contributor Author

Had similar idea with post-processing hook, even name was the same!

Few thoughts around it:

  • We can give more control to after hook if pass currentLexicalNode and childNodes and let it modify both:
  const children = node.childNodes;
  let childLexicalNodes = []
  for (let i = 0; i < children.length; i++) {
    childLexicalNodes.push(...$createNodesFromDOM(
      children[i],
      conversionMap,
      editor,
      currentTextFormat,
    ));
  }

  [currentLexicalNode, childLexicalNodes] = transformOutput.after(currentLexicalNode, childLexicalNodes);

  // rest of the logic where we append child nodes to current node if it's not null, or concat with other nodes otherwise
  • after does not need node param as it's available in convertor itself:
div: (domNode: Node)
      ^^^^^^^

Alternative solution I was looking into is to modify dom node within convertor, e.g.:

+  const isInlineNode = (domNode: Node): boolean => {
+    return inlineNodeNames.has(domNode.nodeName.toLowerCase());
+  };
+
+  const isCodeElement = (div: HTMLDivElement): boolean => {
+    return div.style.fontFamily.match('monospace') !== null;
+  };

...

+  pre: () => ({node: $createCodeNode()}),
+  div: (domNode: Node) => {
+    const {childNodes} = domNode;
+    const lastChild = childNodes[childNodes.length - 1];
+    // $FlowFixMe[incompatible-cast] it is a <div> since it's matched by nodeName
+    const isCode = isCodeElement((domNode: HTMLDivElement));
+
+    // Mainly for code copy/pasting, adding line break in the end of the div
+    // if it's last child is inline (text) node.
+    if (
+      !isCode &&
+      lastChild &&
+      isInlineNode(lastChild) &&
+      domNode.nextSibling !== null
+    ) {
+      domNode.appendChild(document.createElement('br'));
+    }
+
+    return {
+      node: isCode ? $createCodeNode() : null,
+    };
+  },

That way we "prepare" html in advance by injecting missed <br> for code blocks. The reason I was trying this is (surprisingly) Github. All editors (even FB CodeHub) I've tried generate reasonable HTML for copied code, but GH's markup is based on tables 🙈

<table ...>
  <tr>
    <td data-line-number /><td>....code </td>
  </tr>
  <tr>
    <td data-line-number /><td>....code </td>
  </tr>
</table>

Which is quite hard to transform into code, so my idea was when we encounter this specific case in table converter, we can do same html preparation and replace table with \<pre\> and getting content by traversing its rows. Which could be harder with hook as we only have access to first level child nodes.

Thinking more about it, I don't know if we want to have this GH case as a built-it, but it's an interesting scenario anyways

As we discussed offline, I think the DOM pre-processing approach can be useful in other scenarios - may even be worth documenting as a sort of escape hatch in certain use cases or just to demonstrate the flexibility of this interface.

I'm going to go ahead and make the other changes you suggested and merge our PRs together. Thanks for working through this with me!

@acywatson acywatson merged commit 6e35193 into main Jan 22, 2022
@trueadm trueadm deleted the post-transform branch April 6, 2022 15:22
acywatson added a commit that referenced this pull request Apr 9, 2022
…sting (#1153)

* wip

* wip

* add postTransform to HTML paste logic

* Add code pasting support (#1159)

* Add code detecting and e2e tests

* Add quip paste

* Add github table-based code pasting

* remove dom node from post-processor signature

* fix rebase issue

Co-authored-by: Maksim Horbachevsky <fantactuka@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants