Skip to content

Basic HTML pretty-printing#5533

Merged
laurmaedje merged 13 commits intotypst:mainfrom
01mf02:main
Dec 10, 2024
Merged

Basic HTML pretty-printing#5533
laurmaedje merged 13 commits intotypst:mainfrom
01mf02:main

Conversation

@01mf02
Copy link
Contributor

@01mf02 01mf02 commented Dec 5, 2024

With this PR, the following Typst document:

= Main Section

Hello T#emph[yps]t from *HTML*!

== Subsection

#html.elem("div")[#html.elem("button")[one]#html.elem("button")[two]]

turns into:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
  </head>
  <body>
    <h2>
      Main Section
    </h2>
    <p>
      Hello T<em>yps</em>t from <strong>HTML</strong>!
    </p>
    <h3>
      Subsection
    </h3>
    <div>
      <button>one</button><button>two</button>
    </div>
  </body>
</html>

Before, it was:

<!DOCTYPE html><html><head><meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1"></head><body><h2>1. Main Section</h2><p>Hello T<em>yps</em>t from <strong>HTML</strong>!</p><h3>1.a. Subsection</h3><div><button>one</button><button>two</button></div></body></html>

This should make writing tests much easier.

Pretty-printing can still be disabled in encode.rs by setting pretty: false.

Note that the goal of this PR is not to write a full-fledged HTML prettifier; in particular, content is not line-wrapped. Such more complex behaviour would make the implementation harder to understand and make its behaviour less predictable for users. This PR is written with the 80/20 rule in mind: Make it 80% pretty with 20% of the work. That's also why relatively few elements currently trigger an indentation. Still, this list of elements can be gradually expanded.

Thanks to @reknih for pointing out that CSS can actually make whitespace matter where it does not by default, which breaks pretty printing assumptions that we can insert whitespace while preserving semantics. An example that makes <p> whitespace-sensitive, for example: https://www.w3.org/TR/css-text-3/#example-af2745cd
Still, for most users, I suspect that having some pretty printing is still beneficial (at least for us, it will be when we write tests!).
So I think that to cover all (potentially pathological) use cases, it would be enough to make pretty printing configurable (on/off), with defaulting to on. Or we even make the list of tags that should be indented configurable. We could do this in a follow-up PR.

@laurmaedje
Copy link
Member

I think this approach can work. Using the default value of display: block is also what prettier does.

A few remarks:

  • Perhaps we should prevent any kind of wrapping in non-wrapping elements in a nested way (e.g. for #span[#div[]], even though this is not technically valid), because the output looks bad.

  • A new function tag::is_block_by_default would be good to extract and use in whitespace_inside. I think we can fill in the whole list from the get go rather than just adding a few elements. Probably tag:is_inline should be renamed to tag::is_inline_by_default.

  • <meta> should not allow whitespace_inside since it's void.

@01mf02
Copy link
Contributor Author

01mf02 commented Dec 6, 2024

Perhaps we should prevent any kind of wrapping in non-wrapping elements in a nested way (e.g. for #span[#div[]], even though this is not technically valid), because the output looks bad.

I would not try to correct technically invalid input. Garbage in, garbage out.

A new function tag::is_block_by_default would be good to extract and use in whitespace_inside. I think we can fill in the whole list from the get go rather than just adding a few elements. Probably tag:is_inline should be renamed to tag::is_inline_by_default.

I agree about block_by_default.

I would be all for having such a list, but your saying "the whole list" implies that there is such a list, whereas the HTML standard does not mention one, and other sources deny that there is such a list. The problem is intensified by the fact that many tags can appear in multiple contexts. Which elements should be part of the list in your opinion?

<meta> should not allow whitespace_inside since it's void.

I agree.

@reknih
Copy link
Member

reknih commented Dec 6, 2024

I would be all for having such a list, but your saying "the whole list" implies that there is such a list, whereas the HTML standard does not mention one, and other sources deny that there is such a list. The problem is intensified by the fact that many tags can appear in multiple contexts. Which elements should be part of the list in your opinion?

You can obtain the list that prettier uses in these two files:

@laurmaedje laurmaedje mentioned this pull request Dec 6, 2024
48 tasks
@laurmaedje
Copy link
Member

I would not try to correct technically invalid input. Garbage in, garbage out.

It's not about correcting anything, just about not making the printing unnecessarily ugly just because the element nesting is semantically invalid. It should be trivial to disable pretty printing within the scope of a non-block element. I see no reason for not doing it.

@01mf02
Copy link
Contributor Author

01mf02 commented Dec 6, 2024

You can obtain the list that prettier uses in these two files:

Thanks a lot for the links! I added the elements in the second file that are marked with:

    styles: [
      {
        property: 'display',
        value: 'block',
      },
    ],

@01mf02
Copy link
Contributor Author

01mf02 commented Dec 6, 2024

It's not about correcting anything, just about not making the printing unnecessarily ugly just because the element nesting is semantically invalid. It should be trivial to disable pretty printing within the scope of a non-block element. I see no reason for not doing it.

A reason for not doing it could be code size. But I do not have a strong opinion about this, so I did implement your suggestion in c067909.

@01mf02
Copy link
Contributor Author

01mf02 commented Dec 6, 2024

The input:

= Main Section

Hello T#emph[yps]t from *HTML*!

#html.elem("button")[#html.elem("div", [one])]

== Subsection

#html.elem("div")[#html.elem("div")[#html.elem("button")[one]#html.elem("button")[two]]]

now produces:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1">
  </head>
  <body>
    <h2>
      Main Section
    </h2>
    <p>
      Hello T<em>yps</em>t from <strong>HTML</strong>!
    </p>
    <button><div>one</div></button>
    <h3>
      Subsection
    </h3>
    <div>
      <div>
        <button>one</button><button>two</button>
      </div>
    </div>
  </body>
</html>

@laurmaedje
Copy link
Member

Looks good mostly, though I think <meta> should probably have whitespace around it.

I see that you deleted the whitespace_around / whitespace_inside distinction though. I was unsure about it as well, but for <meta> it might make sense.

@laurmaedje laurmaedje changed the title HTML pretty-printing. HTML pretty-printing Dec 8, 2024
@01mf02
Copy link
Contributor Author

01mf02 commented Dec 9, 2024

Looks good mostly, though I think <meta> should probably have whitespace around it.

I implemented and documented a solution for this (584dbf0). Let me know what you think about it.

@laurmaedje
Copy link
Member

I think I'd prefer to add <meta> to HtmlElement::is_pretty rather than is_block_by_default. The comments look good!

@01mf02 01mf02 marked this pull request as ready for review December 9, 2024 13:52
@laurmaedje laurmaedje changed the title HTML pretty-printing Basic HTML pretty-printing Dec 10, 2024
@laurmaedje laurmaedje enabled auto-merge December 10, 2024 09:54
@laurmaedje
Copy link
Member

Thanks!

@laurmaedje laurmaedje added this pull request to the merge queue Dec 10, 2024
Merged via the queue into typst:main with commit 17f20c6 Dec 10, 2024
@laurmaedje laurmaedje added the html Related to HTML export label Feb 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

html Related to HTML export

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants