Feature: How can we maintain the html attributes like id, style from backend? #7840

@KefeiQian

Feature Request: Preserve Original HTML Attributes in $generateNodesFromDOM

Problem

When importing backend-generated HTML into Lexical, I need to retain all original attributes (e.g., id, style, class, data-*) so the DOM structure remains faithful to the source.

Currently, using the official API:

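import {$generateNodesFromDOM} from '@lexical/html';

const parser = new DOMParser();
const dom = parser.parseFromString(activeDocument.html_content!, 'text/html');
// Called inside editor.update(), since it creates Lexical nodes.
const nodes = $generateNodesFromDOM(editor, dom);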

…the resulting Lexical nodes have lost most of the original attributes; only the text content and basic structure survive.

Attempted Workarounds

I tried hooking into the built-in importers to intercept conversion and preserve IDs:

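import {DOMConversionMap, ParagraphNode, TextNode} from 'lexical';
import {HeadingNode, QuoteNode} from '@lexical/rich-text';
import {CodeNode} from '@lexical/code';

// $setNodeId: custom helper (not shown), assumed to store the id on the node.
const builtInImporters: DOMConversionMap[] = [
  TextNode.importDOM?.() || {},
  HeadingNode.importDOM?.() || {},
  ParagraphNode.importDOM?.() || {},
  listImporters, // custom list importers with ID preservation
  QuoteNode.importDOM?.() || {},
  CodeNode.importDOM?.() || {},
];

// Merge everything into a single tag -> importer lookup (later entries win).
const allImporters: DOMConversionMap = {};
for (const importer of builtInImporters) {
  Object.assign(allImporters, importer);
}

// Wrap each importer so the converted node also receives the element's id.
const importMap: DOMConversionMap = {};
for (const [tag, importerFn] of Object.entries(allImporters)) {
  importMap[tag] = (domNode: HTMLElement) => {
    const importer = importerFn(domNode);
    if (!importer) return null;
    return {
      ...importer,
      conversion: (element: HTMLElement) => {
        const output = importer.conversion(element);
        const nodeId = element.getAttribute('id');
        // output.node may be null or an array, so only tag a single node.
        if (output?.node && !Array.isArray(output.node) && nodeId) {
          $setNodeId(output.node, nodeId);
        }
        return output;
      },
    };
  };
}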

However, this approach has several limitations:

  • Incomplete coverage – some built-in importers (e.g., those for lists) can’t be hooked reliably.

  • Nested elements fail – complex HTML like

    <li id="li-1"><strong id="strong-1">hello</strong></li>

    results in missing or invalid nodes (e.g. the <strong> node disappears).

  • Manual maintenance – every new node type requires custom logic, making it brittle.
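
One possible explanation for the nested case, offered as a guess rather than a confirmed diagnosis: as far as I can tell, Lexical's importers for inline format tags such as <strong> return node: null together with a forChild callback, so a check on output.node never fires for them. The sketch below shows a wrapper that handles both shapes. It deliberately uses local stand-in types (SketchNode, SketchElement, ConversionOutput) instead of Lexical's real ones, and withId is a hypothetical name.

```typescript
// Stand-in types for illustration only; these are NOT Lexical's real types.
interface SketchNode {
  type: string;
  attrs: Map<string, string>;
}
interface SketchElement {
  tag: string;
  attributes: Map<string, string>;
}
type ForChild = (child: SketchNode) => SketchNode | null;
interface ConversionOutput {
  node: SketchNode | SketchNode[] | null;
  forChild?: ForChild;
}
type Conversion = (element: SketchElement) => ConversionOutput | null;

// Hypothetical wrapper: copies the element's id onto whatever the wrapped
// conversion produces, including the node-less forChild case.
function withId(conversion: Conversion): Conversion {
  return (element) => {
    const output = conversion(element);
    const id = element.attributes.get('id');
    if (output === null || id === undefined) {
      return output;
    }
    if (output.node !== null) {
      // The conversion produced one node or an array of nodes.
      const nodes = Array.isArray(output.node) ? output.node : [output.node];
      for (const node of nodes) {
        node.attrs.set('id', id);
      }
      return output;
    }
    // No node of its own (e.g. an inline format tag): defer to forChild.
    const inner = output.forChild;
    return {
      ...output,
      forChild: (child) => {
        const result = inner ? inner(child) : child;
        // Do not overwrite an id the child already carries.
        if (result !== null && !result.attrs.has('id')) {
          result.attrs.set('id', id);
        }
        return result;
      },
    };
  };
}
```

In this sketch the id of a node-less element ends up on each child the callback sees, which may or may not be the desired behavior when an inline element wraps several children.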

Desired Solution

Provide an official mechanism to:

  • Preserve all standard attributes (id, class, style, data-*) during $generateNodesFromDOM.
  • Optionally allow a custom attribute mapper (e.g., a callback) so developers can whitelist/transform attributes as needed.
  • Work seamlessly with nested/inline nodes and custom node types.
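
To make the first two points concrete, here is a minimal sketch of an allowlist covering id, class, style, and data-*. The names PRESERVED_NAMES, isPreserved, and pickPreserved are hypothetical and not part of Lexical; attributes are passed as plain [name, value] pairs so the example runs without a DOM.

```typescript
// Hypothetical allowlist: exact names plus the data-* prefix.
const PRESERVED_NAMES = new Set(['id', 'class', 'style']);

function isPreserved(name: string): boolean {
  const lower = name.toLowerCase();
  return PRESERVED_NAMES.has(lower) || lower.startsWith('data-');
}

// Keeps only allowlisted attributes from a list of [name, value] pairs.
// Names are normalized to lowercase in the result.
function pickPreserved(attributes: Array<[string, string]>): Record<string, string> {
  const kept: Record<string, string> = {};
  for (const [name, value] of attributes) {
    if (isPreserved(name)) {
      kept[name.toLowerCase()] = value;
    }
  }
  return kept;
}
```

In a browser, the pairs could be built from element.attributes, for example with Array.from and a mapper that returns [attr.name, attr.value].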

Why This Matters

Preserving IDs and classes is critical for:

  • Keeping backend-generated anchors and CSS selectors intact.
  • Enabling round-trip editing where HTML comes from a CMS or WYSIWYG backend.
  • Maintaining compatibility with analytics/data attributes or test automation.

Suggested API Ideas

  • Add attribute-preservation options to $generateNodesFromDOM, e.g.:

    // Proposed API; these options do not exist today.
    $generateNodesFromDOM(editor, dom, {
      preserveAttributes: true,
      mapAttributes: (name, value, element) => {
        // e.g., strip inline styles or remap classes
        return value;
      },
    });

  • Provide a higher-level import pipeline hook that runs before nodes are finalized.
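
As a rough illustration of how the proposed options might behave, the sketch below resolves attributes for a single element. Everything here is hypothetical: AttributeImportOptions and resolveAttributes are made-up names, the callback receives a tag name instead of the element so the example runs without a DOM, and returning null to drop an attribute is an assumption that goes beyond the proposal above.

```typescript
// Hypothetical option shape mirroring the proposal; not an existing Lexical API.
interface AttributeImportOptions {
  preserveAttributes: boolean;
  // Return a string to keep the attribute (possibly changed), or null to drop it.
  mapAttributes?: (name: string, value: string, tagName: string) => string | null;
}

function resolveAttributes(
  tagName: string,
  attributes: Array<[string, string]>,
  options: AttributeImportOptions,
): Map<string, string> {
  const resolved = new Map<string, string>();
  if (!options.preserveAttributes) {
    return resolved; // Preservation disabled: keep nothing.
  }
  for (const [name, value] of attributes) {
    const mapped = options.mapAttributes
      ? options.mapAttributes(name, value, tagName)
      : value;
    if (mapped !== null) {
      resolved.set(name, mapped);
    }
  }
  return resolved;
}
```

A caller that wants ids but no inline styles could pass a mapAttributes callback that returns null when name is 'style' and the unchanged value otherwise.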

Would the Lexical team consider adding this feature or offering guidance on a more reliable way to preserve attributes?
This would greatly simplify integrating Lexical with backend-driven HTML content.



Metadata


Labels: enhancement (Improvement over existing feature)
