Whitespace in pasted HTML is not handled according to the HTML spec #2713

@12joan

Note to anyone attempting the bounty: Please read the Expected Behaviour section carefully and write tests to check that all cases are implemented correctly. Discuss it with us via this issue or Discord if anything is unclear.

Deadline: Thursday 2 November 2023

Clarifications:

Description

When pasting HTML into Plate via the deserialize HTML plugin, the pasted HTML is parsed inside getFragment using parseHtmlDocument before being passed as an HTML element to deserializeHtml. Because of the logic inside deserializeHtml, the stripWhitespace option is ignored when the given argument is an HTML element rather than a string, so whitespace is not stripped from pasted HTML.
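
To make the control flow described above concrete, here is a minimal, DOM-free sketch. It is not Plate's source: `ParsedElement`, `parseStringSketch` and `deserializeSketch` are hypothetical names, and the stripping rule used in the string branch is an assumption chosen only to make the difference visible.

```typescript
// Hypothetical model of the behaviour described in this issue.
// None of these names are Plate's real API.
type ParsedElement = { textContent: string };

interface SketchOptions {
  element: string | ParsedElement;
  stripWhitespace?: boolean;
}

// Stand-in for parsing an HTML string. Whitespace stripping happens only here.
// Assumption: "stripping" means removing line breaks and tabs.
function parseStringSketch(html: string, stripWhitespace: boolean): ParsedElement {
  const text = stripWhitespace ? html.replace(/\r\n|\n|\r|\t/g, "") : html;
  return { textContent: text };
}

function deserializeSketch({ element, stripWhitespace = true }: SketchOptions): string {
  // The option is consulted only on the string branch; an already-parsed
  // element passes through untouched, which is the reported problem.
  const parsed =
    typeof element === "string" ? parseStringSketch(element, stripWhitespace) : element;
  return parsed.textContent;
}

const pasted = "Lorem ipsum\ndolor";
console.log(JSON.stringify(deserializeSketch({ element: pasted })));
console.log(JSON.stringify(deserializeSketch({ element: { textContent: pasted } })));
```

In this model the paste path always takes the second branch, because the HTML has already been parsed by the time it reaches the deserializer.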

Note that this is the intended behaviour in some circumstances, such as <pre> tags, but not all. See Expected Behaviour.

Context

When copying HTML from Firefox, Firefox inserts additional line feed characters (\n) at regular intervals. For example, the same paragraph with no newlines becomes the following when opened in and copied from Firefox:

<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do 
eiusmod tempor incididunt ut labore et dolore magna aliqua. Risus 
pretium quam vulputate dignissim suspendisse in est. Commodo elit at 
imperdiet dui accumsan sit amet nulla facilisi. Bibendum at varius vel 
pharetra vel turpis. Urna nec tincidunt praesent semper. Libero volutpat
 sed cras ornare arcu. Phasellus vestibulum lorem sed risus ultricies 
tristique. Tempus iaculis urna id volutpat lacus laoreet non. Lacus 
vestibulum sed arcu non odio euismod lacinia at quis. Quis lectus nulla 
at volutpat. Auctor urna nunc id cursus metus aliquam. Diam volutpat 
commodo sed egestas egestas fringilla phasellus faucibus scelerisque. 
Odio morbi quis commodo odio aenean sed adipiscing diam. Enim tortor at 
auctor urna. Pulvinar sapien et ligula ullamcorper malesuada proin 
libero nunc consequat. Duis convallis convallis tellus id interdum 
velit. Amet dictum sit amet justo donec enim diam vulputate. Etiam erat 
velit scelerisque in dictum.</p>

This bug results in those same line feed characters appearing inside text nodes when pasting into Plate. While this browser quirk is specific to Firefox, the HTML pasting bug in Plate applies to all browsers.

Steps to Reproduce

  1. Place the above HTML in test.html
  2. Open the HTML file in Firefox
  3. Select the paragraph and copy
  4. Paste the HTML into Plate

(Screenshot of HTML pasted into Plate)

Expected Behavior

HTML should be parsed as per the HTML spec:

  1. By default, newlines should be ignored
  2. By default, sequences of two or more spaces should be collapsed into a single space
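
One possible reading of these two rules, sketched for a single text run. It assumes that "ignored" means a line feed is treated like any other collapsible white space, so a newline between two words leaves one space rather than joining the words. `collapseDefaultWhiteSpace` is a hypothetical helper, not an existing Plate function, and it does not trim white space at the start or end of a block.

```typescript
// Sketch of default (white-space: normal) handling for one text run.
// Assumption: line feeds, carriage returns and tabs count as collapsible
// white space, so any run of them (mixed with spaces) becomes one space.
function collapseDefaultWhiteSpace(text: string): string {
  return text.replace(/[ \t\r\n]+/g, " ");
}

// Excerpts in the shape Firefox produces (see Context).
const firefoxSample = "elit, sed do \neiusmod tempor. Libero volutpat\n sed cras";
console.log(collapseDefaultWhiteSpace(firefoxSample));
```

Applied to the Firefox sample, both "space then newline" and "newline then space" reduce to a single space, which matches how the paragraph renders in the browser.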

Additionally, this behaviour should be modifiable using the white-space CSS property, regardless of whether this property is included explicitly using a style prop or implicitly through default browser styles. (The <pre> element applies an implicit white-space: pre style.)

See MDN's docs for a complete description of each possible value.
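
As a rough illustration, the per-value behaviour could be captured in a lookup table like the one below. The two flags follow the "new lines" and "spaces and tabs" columns in MDN's summary of white-space. The names `WHITE_SPACE_RULES` and `applyWhiteSpace` are hypothetical, and wrapping behaviour is deliberately left out because it does not affect the text content.

```typescript
type WhiteSpaceValue =
  | "normal" | "nowrap" | "pre" | "pre-wrap" | "pre-line" | "break-spaces";

interface WhiteSpaceRule {
  collapseSpaces: boolean;   // collapse runs of spaces and tabs
  preserveNewlines: boolean; // keep line feeds as line breaks
}

// Hypothetical table; values mirror MDN's white-space summary.
const WHITE_SPACE_RULES: Record<WhiteSpaceValue, WhiteSpaceRule> = {
  "normal":       { collapseSpaces: true,  preserveNewlines: false },
  "nowrap":       { collapseSpaces: true,  preserveNewlines: false },
  "pre":          { collapseSpaces: false, preserveNewlines: true },
  "pre-wrap":     { collapseSpaces: false, preserveNewlines: true },
  "pre-line":     { collapseSpaces: true,  preserveNewlines: true },
  "break-spaces": { collapseSpaces: false, preserveNewlines: true },
};

function applyWhiteSpace(text: string, value: WhiteSpaceValue): string {
  const { collapseSpaces, preserveNewlines } = WHITE_SPACE_RULES[value];
  if (!collapseSpaces) return text; // pre, pre-wrap, break-spaces: keep as-is
  if (preserveNewlines) {
    // pre-line: drop spaces and tabs around each line feed, then collapse the rest
    return text.replace(/[ \t]*\n[ \t]*/g, "\n").replace(/[ \t]+/g, " ");
  }
  // normal, nowrap: line feeds collapse together with spaces and tabs
  return text.replace(/[ \t\r\n]+/g, " ");
}

console.log(JSON.stringify(applyWhiteSpace("a  b \n c", "pre-line")));
```

A real implementation would also need to resolve the effective white-space value for each element (inline style, inherited value, or browser default such as <pre>), which this sketch takes as an input.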

Finally, as per the HTML spec, 4.4.3 The pre element: "In the HTML syntax, a leading newline character immediately following the pre element start tag is stripped."
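
A minimal sketch of that rule, for cases where the content of a <pre> is handled as a raw string. `stripLeadingPreNewline` is a hypothetical helper. Spec-compliant HTML parsers already drop this newline while parsing, so a step like this would only matter if the raw markup is processed by hand; whether that applies to Plate's paste path is not established here.

```typescript
// Remove at most one leading line break (LF or CRLF) from <pre> content.
// Any further newlines are real content and must be kept.
function stripLeadingPreNewline(content: string): string {
  return content.replace(/^\r?\n/, "");
}

console.log(JSON.stringify(stripLeadingPreNewline("\nline 1\nline 2")));
console.log(JSON.stringify(stripLeadingPreNewline("\n\nstarts with a blank line")));
```

Only the first newline is removed, so markup that intentionally starts a <pre> with a blank line (two newlines after the start tag) keeps one of them.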

Environment

  • slate: N/A
  • slate-react: N/A
  • browser: all browsers
