Skip to content

feat(html): tweak printer#5134

Closed
ikatyang wants to merge 10 commits intoprettier:masterfrom
ikatyang:fix/html/strict-indent
Closed

feat(html): tweak printer#5134
ikatyang wants to merge 10 commits intoprettier:masterfrom
ikatyang:fix/html/strict-indent

Conversation

@ikatyang
Copy link
Copy Markdown
Member

@ikatyang ikatyang commented Sep 23, 2018

  • tweak the ast in the printer.preprocess
    • whitespace-only text nodes are removed
      (this does not affect how we print whitespaces, it's an internal change.)
    • add startLocation and endLocation for nextEmptyLine
  • tweak the boundary of printed nodes
    • newlines between siblings are always handled in printChildren
  • indent the script/style content in multiparser
  • no inconsistent output for 2+ attributes (repro, comment 1, comment 2)
  • force break tag if there're multiline attributes (repro, comment)

Prettier pr-5134
Playground link

--parser html

Input:

<X>
</X>
<X a="1">
</X>
<X a="1" b="2">
</X>
<X a="1" b="2" c="3">
</X>

<p 
  class="
    foo
    bar
    baz
  "
>
  Hello world!<BR/> This is HTML5 Boilerplate.
</p>

Output:

<X></X>
<X a="1"></X>
<X a="1" b="2"></X>
<X a="1" b="2" c="3"></X>

<p
  class="
    foo
    bar
    baz
  "
>
  Hello world!
  <br />
  This is HTML5 Boilerplate.
</p>

  • I’ve added tests to confirm my change works.
  • (If changing the API or CLI) I’ve documented the changes I’ve made (in the docs/ directory)
  • I’ve read the contributing guidelines.

@ikatyang ikatyang mentioned this pull request Sep 23, 2018
26 tasks
@ikatyang ikatyang force-pushed the fix/html/strict-indent branch from a45ae9f to 84e3f8d Compare September 23, 2018 06:58
@thorn0
Copy link
Copy Markdown
Member

thorn0 commented Sep 23, 2018

Whitespaces in HTML are such a PITA. E.g.

<div>200 m<sup>2</sup></div>

should not become

<div>
  200 m
  <sup>2</sup>
</div>

repro

@ikatyang
Copy link
Copy Markdown
Member Author

@thorn0 That's a known issue and so far we don't know how to fix it without being ugly, see #4753 (comment) and its following comments for more info.

@thorn0
Copy link
Copy Markdown
Member

thorn0 commented Sep 23, 2018

A rule of thumb might look like following:

  1. Never add a whitespace (new line) between nodes if there was no whitespace between them in the input.
  2. Never remove whitespaces completely, keep at least one whitespace character like the conservativeCollapse option of html-minifier does.

Overall, it might be useful to have a look at html-minifier. Of course, it does the opposite thing of what Prettier does, but still, they have faced similar issues.

@thorn0
Copy link
Copy Markdown
Member

thorn0 commented Sep 23, 2018

A more sophisticated (but also fragile) approach would be to take into account the default value of display of the standard elements and assume that it's safe to add/remove whitespaces between elements that by default are block (div, p, etc.). That's what TinyMCE's built-in formatter does as this behavior is quite expected. On the other hand, it will require heavy usage of prettier-ignore in some situations.

@ikatyang
Copy link
Copy Markdown
Member Author

There's not much possible solution for whitespace formatting in HTML, the spec is quite limited. I think I'll try to implement #4753 (comment) (won't be in this PR), which is the most safe one but a little ugly. Added to the TODO list.

Something like the following cases, though I'm not sure if we should apply the sophisticated rule, CSS makes everything possible.

<!-- before -->
<div> <p> 123 <a href="#something"> First First First First First First First First First First First First First </a> 456 <a> Second Second Second Second Second Second Second Second Second Second Second Second Second </a> 789 </p> </div>

<!-- after -->
<div>
  <p>
    123
    <a href="#something">
      First First First First First First First First First First First First First
    </a>
    456
    <a>
      Second Second Second Second Second Second Second Second Second Second Second Second Second
    </a>
    789
  </p>
</div>
<!-- before -->
<div><p>123<a href="#something">First First First First First First First First First First First First First</a>456<a>Second Second Second Second Second Second Second Second Second Second Second Second Second</a>789</p></div>

<!-- after -->
<div
  ><p
    >123<a
      href="#something"
      >First First First First First First First First First First First First First</a
    >456<a
      >Second Second Second Second Second Second Second Second Second Second Second Second Second</a
    >789</p
  ></div
>
<!-- before -->
<div> <p> 123 <a href="#something"> First First First First First First First First First First First First First </a>456<a>Second Second Second Second Second Second Second Second Second Second Second Second Second</a>789</p></div>

<!-- after -->
<div>
  <p>
    123
    <a href="#something">
      First First First First First First First First First First First First First
    </a
    >456<a
      >Second Second Second Second Second Second Second Second Second Second Second Second Second</a
    >789</p
  ></div
>
<ul
  ><li>First</li
  ><li>Second</li
></ul>

<ul
  >123<li
    >First</li
  ><li>Second</li
></ul>

<ul
  ><li>First</li
  >456<li
    >Second</li
></ul>

<ul
  ><li>First</li
  ><li>Second</li
  >789</ul
>


<ul
  >123<li
    >First</li
  >456<li
    >Second</li
></ul>


<ul
  >123<li
    >First</li
  ><li>Second</li
  >789</ul
>

<ul
  ><li>First</li
  >456<li
    >Second</li
  >789</ul
>

<ul
  >123<li
    >First</li
  >456<li
    >Second</li
  >789</ul
>

@lydell
Copy link
Copy Markdown
Member

lydell commented Sep 23, 2018

I think we should format as good as we can and document the gotchas and their workarounds.

@ikatyang
Copy link
Copy Markdown
Member Author

IMO, we shouldn't break users' input, at least not by default. Maybe we should introduce an option for this whitespace thing:

  • break tags (default, the safest one) ref: [WIP] feat: html support #4753 (comment)
    <ul
      ><li>First</li
      ><li>Second</li
    ></ul>
    
    <ul
      >123<li
        >First</li
      >456<li
        >Second</li
      >789</ul
    >
  • add comments (safe if no DOM manipulation) ref: [WIP] feat: html support #4753 (comment)
    <ul><!--
      --><li>First</li><!--
      --><li>Second</li><!--
    --></ul>
    
    <ul><!--
      -->123<!--
      --><li>First</li><!--
      -->456<!--
      --><li>Second</li><!--
      -->789<!--
    --></ul>
  • ignore whitespaces, the current behavior. (not safe but pretty)
    <ul>
      <li>First</li>
      <li>Second</li>
    </ul>
    
    <ul>
      123
      <li>First</li>
      456
      <li>Second</li>
      789
    </ul>

cc @vjeux

@vjeux
Copy link
Copy Markdown
Contributor

vjeux commented Sep 24, 2018

While i’m all for safety, it so drastically reduces our options for the common case that I believe it’s okay to make the default not safe. We should provide a reasonable way to find when whitespace would be important (can be hard since it is based on css) and provide a reasonable way to tell prettier not to remove them.

@ikatyang
Copy link
Copy Markdown
Member Author

How about:

  • apply the sophisticated rule (i.e., take the default value of display into account),
  • allow magic comment <!-- prettier-whitespace-sensitive -->/<!-- prettier-whitespace-insensitive --> to override the sophisticated rule, and
  • add an option for how to handle whitespaces, e.g., --html-break-tags (default: true) ignore whitespaces if false.

I've modified the rule for breaking tags, it looks better now:

<ul
  ><li>First</li
  ><li>Second</li
></ul>

<ul
  >123<li
  >First</li
  ><li>Second</li
></ul>

<ul
  ><li>First</li
  >456<li
  >Second</li
></ul>

<ul
  ><li>First</li
  ><li>Second</li
  >789</ul
>

<ul
  >123<li
  >First</li
  >456<li
  >Second</li
  >789</ul
>

<ul
  >123<li
    class="foo"
    id="bar"
  >First</li
  >456<li
    class="baz"
  >Second</li
  >789</ul
>

@claytonrcarter
Copy link
Copy Markdown

I just found Prettier and I'm very excited about HTML support. I think that the the so called "sophisticated rule" is a reasonable default. I would think it safe to assume that most people have not altered the display of the standard elements. If they have, then they've already "broken out" of default behavior elsewhere, so it seems reasonable to expect them to use special config/comments to do so here as well.

Furthermore, I would encourage you to use some real world markup as you test. Things like @thorn0 posted (<div>200 m<sup>2</sup></div>) or something like this example from MDN

<p>Want to write us a letter? Use our <a href="contacts.html#Mailing_address">mailing address</a>.</p>

Some of the examples above (with First First First etc) seem contrived and may not represent the bulk of existing HTML content.

My use cases may be unique, but I did a quick check of our HTML templates (several hundred) and I found numerous examples of things like:

<a>Lorem</a>, ispum dolor sit <strong>amet</strong>.

In these cases, moving the punctuation onto their own lines would be undesirable.

Thank you for all of your work on this! I'm looking forward to when it lands!

@michaeljota
Copy link
Copy Markdown

Whitespaces in HTML are such a PITA. E.g.

<div>200 m<sup>2</sup></div>

should not become

<div>
 200 m
 <sup>2</sup>
</div>

I don't know about that. I mean, that's literally what a formatter should do. I think @ikatyang is doing a great job so far, somethings still behave weir, but the format itself is working as expected.

@ikatyang
Copy link
Copy Markdown
Member Author

@michaeljota The formatting there is not safe, for example the before/after:

image

@ikatyang
Copy link
Copy Markdown
Member Author

ikatyang commented Sep 26, 2018

I'll move this discussion to a new thread later as it's not suitable to discuss in an unrelated thread. (This PR does not change how we print whitespaces but the internal implementation.)

@ikatyang
Copy link
Copy Markdown
Member Author

Closing in favor of #5168

@ikatyang ikatyang closed this Sep 30, 2018
@ikatyang ikatyang deleted the fix/html/strict-indent branch September 30, 2018 09:15
@lipis lipis added this to the 1.15 milestone Oct 26, 2018
@lock lock bot added the locked-due-to-inactivity Please open a new issue and fill out the template instead of commenting. label Jan 24, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jan 24, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

locked-due-to-inactivity Please open a new issue and fill out the template instead of commenting.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants