Feature Request: Preserve Original HTML Attributes in $generateNodesFromDOM
Problem
When importing backend-generated HTML into Lexical, I need to retain all original attributes (e.g., id, style, class, data-*) so the DOM structure remains faithful to the source.
Currently, using the official API:
const parser = new DOMParser();
const dom = parser.parseFromString(activeDocument.html_content!, 'text/html');
const nodes = $generateNodesFromDOM(editor, dom);
…the resulting Lexical nodes lose most attributes, keeping only the textual content and basic structure.
Attempted Workarounds
I tried hooking into the built-in importers to intercept conversion and preserve IDs:
const builtInImporters: DOMConversionMap[] = [
  TextNode.importDOM?.() || {},
  HeadingNode.importDOM?.() || {},
  ParagraphNode.importDOM?.() || {},
  listImporters, // custom list importers with ID preservation
  QuoteNode.importDOM?.() || {},
  CodeNode.importDOM?.() || {},
];

// Merge all importers into one map keyed by tag name (later entries override earlier ones).
const allImporters: Record<string, any> = {};
for (const importer of builtInImporters) {
  Object.assign(allImporters, importer);
}

// Assumed declaration: the original snippet does not show where importMap is defined.
const importMap: DOMConversionMap = {};

// Wrap each importer so the element's id is copied onto the node it produces.
for (const [tag, importerFn] of Object.entries(allImporters)) {
  importMap[tag] = (domNode: Node) => {
    const importer = importerFn(domNode);
    if (!importer) return null;
    return {
      ...importer,
      conversion: (element: Element) => {
        const output = importer.conversion(element);
        const nodeId = element.getAttribute('id');
        if (output?.node && nodeId) {
          $setNodeId(output.node, nodeId); // $setNodeId is my own helper
        }
        return output;
      },
    };
  };
}
However, this approach has several limitations:
- Incomplete coverage: some importers (e.g., listImporters) can't be hooked reliably.
- Nested elements fail: complex HTML like <li id="li-1"><strong id="strong-1">hello</strong></li> results in missing or invalid nodes (e.g., the <strong> node disappears).
- Manual maintenance: every new node type requires custom logic, which makes the approach brittle.
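For reference, the wrapping idea can be written generically so it does not depend on a hand-maintained list of importers. The sketch below is only an illustration: the `...Like` types are simplified stand-ins I made up for Lexical's conversion types, and `wrapConversionMap` is a hypothetical helper, not a Lexical API. It also shows one likely reason the <strong> case fails: as far as I can tell, format-only tags are converted with `node: null` (the formatting is applied to child text nodes later), so there is no node to attach an ID to at that point.

```typescript
// Simplified stand-ins for Lexical's conversion types (assumptions, not the real definitions).
interface ElementLike {
  getAttribute(name: string): string | null;
}
interface ConversionOutputLike<N> {
  node: N | N[] | null;
}
interface ConversionLike<N> {
  conversion: (element: ElementLike) => ConversionOutputLike<N> | null;
  priority?: number;
}
type ConversionMapLike<N> = Record<
  string,
  (element: ElementLike) => ConversionLike<N> | null
>;

// Hypothetical helper: wraps every importer in `map` so that `onNode` runs
// for each node the original conversion produces.
function wrapConversionMap<N>(
  map: ConversionMapLike<N>,
  onNode: (node: N, element: ElementLike) => void,
): ConversionMapLike<N> {
  const wrapped: ConversionMapLike<N> = {};
  for (const [tag, importerFn] of Object.entries(map)) {
    wrapped[tag] = (element) => {
      const importer = importerFn(element);
      if (!importer) return null;
      return {
        ...importer,
        conversion: (el) => {
          const output = importer.conversion(el);
          // Format-only conversions yield no node, so there is nothing to tag.
          if (!output || output.node === null) return output;
          const nodes: N[] = Array.isArray(output.node)
            ? output.node
            : [output.node];
          for (const node of nodes) onNode(node, el);
          return output;
        },
      };
    };
  }
  return wrapped;
}
```

Even with a wrapper like this, attributes on elements whose conversion returns no node are still lost, which is part of why an official mechanism would help.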
Desired Solution
Provide an official mechanism to:
- Preserve all standard attributes (id, class, style, data-*) during $generateNodesFromDOM.
- Optionally allow a custom attribute mapper (e.g., a callback) so developers can whitelist/transform attributes as needed.
- Work seamlessly with nested/inline nodes and custom node types.
Why This Matters
Preserving IDs and classes is critical for:
- Keeping backend-generated anchors and CSS selectors intact.
- Enabling round-trip editing where HTML comes from a CMS or WYSIWYG backend.
- Maintaining compatibility with analytics/data attributes or test automation.
Suggested API Ideas
- Add attribute-preservation options to $generateNodesFromDOM, e.g.:
  $generateNodesFromDOM(editor, dom, {
    preserveAttributes: true,
    mapAttributes: (name, value, element) => {
      // e.g., strip inline styles or remap classes
      return value;
    },
  });
- Provide a higher-level import pipeline hook that runs before nodes are finalized.
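To make the intended semantics concrete, here is a rough sketch of how the two proposed options might be applied to a single element. Everything in it is hypothetical: `ImportAttributeOptions`, `collectAttributes`, and `isStandardAttribute` are names invented for illustration, and letting the mapper return `null` to drop an attribute is my own assumption. `AttributeSource` is a structural stand-in that a DOM Element should satisfy.

```typescript
// Hypothetical option shape mirroring the proposal; not an existing Lexical API.
interface AttrLike {
  name: string;
  value: string;
}
interface AttributeSource {
  attributes: ArrayLike<AttrLike>;
}
type AttributeMapper<E> = (
  name: string,
  value: string,
  element: E,
) => string | null; // assumption: returning null drops the attribute

interface ImportAttributeOptions<E> {
  preserveAttributes?: boolean;
  mapAttributes?: AttributeMapper<E>;
}

// Default allowlist from this request: id, class, style, and any data-* attribute.
function isStandardAttribute(name: string): boolean {
  return (
    name === 'id' ||
    name === 'class' ||
    name === 'style' ||
    name.startsWith('data-')
  );
}

// Collects the attributes that would be carried over to the imported node.
function collectAttributes<E extends AttributeSource>(
  element: E,
  options: ImportAttributeOptions<E> = {},
): Record<string, string> {
  const result: Record<string, string> = {};
  if (!options.preserveAttributes) return result; // opt-in, so current behavior is unchanged
  for (const { name, value } of Array.from(element.attributes)) {
    if (!isStandardAttribute(name)) continue;
    const mapped = options.mapAttributes
      ? options.mapAttributes(name, value, element)
      : value;
    if (mapped !== null) result[name] = mapped;
  }
  return result;
}
```

Keeping preservation opt-in would avoid changing existing imports, and the mapper gives one place to strip inline styles or remap classes. How the collected attributes would be stored on Lexical nodes and written back on export is left open here.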
Would the Lexical team consider adding this feature or offering guidance on a more reliable way to preserve attributes?
This would greatly simplify integrating Lexical with backend-driven HTML content.