Basic HTML pretty-printing #5533
|
I think this approach can work. Using the default value of A few remarks:
|
I would not try to correct technically invalid input. Garbage in, garbage out.
I agree about I would be all for having such a list, but your saying "the whole list" implies that there is such a list, whereas the HTML standard does not mention one, and other sources deny that there is such a list. The problem is intensified by the fact that many tags can appear in multiple contexts. Which elements should be part of the list in your opinion?
I agree. |
You can obtain the list that |
It's not about correcting anything, just about not making the printing unnecessarily ugly just because the element nesting is semantically invalid. It should be trivial to disable pretty printing within the scope of a non-block element. I see no reason for not doing it. |
Thanks a lot for the links! I added the elements in the second file that are marked with: |
A reason for not doing it could be code size. But I do not have a strong opinion about this, so I did implement your suggestion in c067909. |
|
The input:

= Main Section
Hello T#emph[yps]t from *HTML*!
#html.elem("button")[#html.elem("div", [one])]
== Subsection
#html.elem("div")[#html.elem("div")[#html.elem("button")[one]#html.elem("button")[two]]]

now produces:

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body>
<h2>
Main Section
</h2>
<p>
Hello T<em>yps</em>t from <strong>HTML</strong>!
</p>
<button><div>one</div></button>
<h3>
Subsection
</h3>
<div>
<div>
<button>one</button><button>two</button>
</div>
</div>
</body>
</html> |
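The rule discussed above (indent only inside block elements, and switch pretty printing off within the scope of a non-block element) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Node` type, the `BLOCK_TAGS` list, the helper names, and the two-space indent are all assumptions made for the example, and it skips attributes, void elements such as `<meta>`, and text escaping.

```rust
/// A minimal HTML tree: text, or an element with a tag name and children.
/// (Hypothetical type for this sketch, not Typst's internal representation.)
pub enum Node {
    Text(String),
    Elem(&'static str, Vec<Node>),
}

/// Assumed, illustrative list of tags that trigger indentation.
const BLOCK_TAGS: &[&str] = &["html", "head", "body", "div", "p", "h2", "h3"];

pub fn elem(tag: &'static str, children: Vec<Node>) -> Node {
    Node::Elem(tag, children)
}

pub fn text(s: &str) -> Node {
    Node::Text(s.to_string())
}

fn is_block(node: &Node) -> bool {
    matches!(node, Node::Elem(tag, _) if BLOCK_TAGS.contains(tag))
}

/// Starts a new line and indents it by `level` steps of two spaces.
fn newline(out: &mut String, level: usize) {
    out.push('\n');
    for _ in 0..level {
        out.push_str("  ");
    }
}

fn write_node(out: &mut String, node: &Node, level: usize, pretty: bool) {
    match node {
        Node::Text(t) => out.push_str(t),
        Node::Elem(tag, children) => {
            // Indent only inside non-empty block elements, and only if no
            // enclosing non-block element has switched pretty printing off.
            let indent = pretty && is_block(node) && !children.is_empty();
            out.push('<');
            out.push_str(tag);
            out.push('>');
            for (i, child) in children.iter().enumerate() {
                let after_block = i > 0 && is_block(&children[i - 1]);
                // Break the line at the start of the content and around
                // block children; inline siblings stay on one line.
                if indent && (i == 0 || after_block || is_block(child)) {
                    newline(out, level + 1);
                }
                // Children of a non-block element are never pretty-printed.
                write_node(out, child, level + 1, indent);
            }
            if indent {
                newline(out, level);
            }
            out.push_str("</");
            out.push_str(tag);
            out.push('>');
        }
    }
}

/// Serializes `root`; `pretty: false` yields compact output.
pub fn to_html(root: &Node, pretty: bool) -> String {
    let mut out = String::new();
    write_node(&mut out, root, 0, pretty);
    out
}
```

With this sketch, a `button` containing a `div` is written on a single line because the `button` disables pretty printing for everything inside it, while two nested `div` elements containing two `button` elements are indented with both buttons kept on the same line.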
|
Looks good mostly, though I think I see that you deleted the whitespace_around / whitespace_inside distinction though. I was unsure about it as well, but for |
I implemented and documented a solution for this (584dbf0). Let me know what you think about it. |
|
I think I'd prefer to add |
|
Thanks! |
With this PR, the following Typst document:
turns into:
Before, it was:
This should make writing tests much easier.
Pretty-printing can still be disabled in encode.rs by setting pretty: false.
Note that the goal of this PR is not to write a full-fledged HTML prettifier; in particular, content is not line-wrapped. More complex behaviour of that kind would make the implementation harder to understand and its output less predictable for users. This PR is written with the 80/20 rule in mind: make it 80% pretty with 20% of the work. That is also why relatively few elements currently trigger indentation. Still, this list of elements can be gradually expanded.
Thanks to @reknih for pointing out that CSS can make whitespace matter where it does not by default, which breaks the pretty-printing assumption that we can insert whitespace while preserving semantics. An example that makes <p> whitespace-sensitive: https://www.w3.org/TR/css-text-3/#example-af2745cd
Still, for most users, I suspect that having some pretty printing is beneficial (at least for us, it will be when we write tests!).
So I think that to cover all (potentially pathological) use cases, it would be enough to make pretty printing configurable (on/off), defaulting to on. We could even make the list of tags that should be indented configurable. We could do this in a follow-up PR.
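One possible shape for such a configuration is sketched below. Nothing here exists in Typst: `PrettyOptions`, its fields, and the default tag list are hypothetical names chosen for illustration. The sketch only shows an on/off switch that defaults to on, plus a user-adjustable list of indented tags.

```rust
/// Hypothetical pretty-printing options (assumed names, not a Typst API).
#[derive(Debug, Clone, PartialEq)]
pub struct PrettyOptions {
    /// Master switch; pretty printing is on by default.
    pub enabled: bool,
    /// Tags whose content is placed on its own indented lines.
    pub indent_tags: Vec<String>,
}

impl Default for PrettyOptions {
    fn default() -> Self {
        // Illustrative default list; the real list is up to the implementation.
        let tags = ["html", "head", "body", "div", "p"];
        Self {
            enabled: true,
            indent_tags: tags.iter().map(|t| t.to_string()).collect(),
        }
    }
}

impl PrettyOptions {
    /// Options with pretty printing switched off entirely.
    pub fn off() -> Self {
        Self { enabled: false, ..Self::default() }
    }

    /// Whether content inside `tag` should be indented.
    pub fn should_indent(&self, tag: &str) -> bool {
        self.enabled && self.indent_tags.iter().any(|t| t.as_str() == tag)
    }
}
```

A user whose CSS makes a particular element whitespace-sensitive could then either call `PrettyOptions::off()` or drop that tag from `indent_tags` while leaving the rest of the output pretty-printed.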