Switch PDF backend to krilla #5420

Merged
laurmaedje merged 125 commits into typst:main from LaurenzV:krilla-port
Apr 1, 2025

@LaurenzV
Collaborator

@LaurenzV LaurenzV commented Nov 14, 2024

Note that this PR is not ready for review, and probably won't be for a long while. It's just for experimenting, for now.

Introduction

krilla is a Rust crate I have been working on for the past few months. It builds on top of the pdf-writer crate and allows for the creation of PDF files using high-level primitives. It therefore encompasses the features svg2pdf and typst-pdf currently have (and a lot more), and abstracts most of the complexities of PDF away behind a high-level interface.

Why?

Using this crate would have multiple advantages:

The main disadvantage is that it will probably lead to some performance penalty due to the extra layer of abstraction, though it remains to be determined how much. PDF 2.0 and its substandards are currently not supported either, unfortunately.

Notes

Note that it's not been decided whether we actually want to make that switch, and if it does happen it will still be multiple weeks in the future, as it would require extensive testing (both in terms of correctness as well as performance-wise), and there are still some things left to do in krilla itself.

TODO List

Next steps:

  • You (probably?) will take a closer look at the krilla code logic.
  • I will clean up a bit more.
  • We still need to look into changing the cluster assignment logic so that ActualText works as intended. I did some more testing and I don't think we will get it to a state where we can close Fix luma to CMYK conversion #4425, but hopefully it should still be an improvement.
  • I will re-validate that all features of the existing typst-pdf have been ported.
  • I will do some rough testing using 2-3 documents and squash some more bugs.
  • You review the code on the Typst side.

After the above is done and you think overall the shape of the code is good, extensive testing would be the next step and roughly like this:

  • Check that all of the linked bugs are fixed. (I can do that)
  • Run the whole test suite and diff the PDFs visually against current main. (I can do that)
  • Same as above, but check using the Arlington PDF Model instead.
  • Compile a few different small documents, manually inspect the PDF output and ensure it looks sensible, also look at things like outlines, etc. that can only be inspected manually.
  • Test with maybe 5-6 bigger documents (templates, thesis documents) and ensure there are no issues in the output.
  • While doing the above, also try different export modes and use verapdf and Adobe Acrobat to see whether validation passes.
  • Run performance tests with a handful of bigger documents (also check single/multi threading) to ensure there are no major performance regressions.
  • Also check what the file sizes are like.
  • We probably should also do a "stress test" with a very big document (1000+ pages), if such a document exists, as a sanity check that the new backend can deal with bigger documents, too.
  • Create releases of all dependent crates.
  • 🚢

@facundoq

facundoq commented Dec 5, 2024

While HTML output would be superior in terms of accessibility, I cannot emphasize enough how much adoption Typst would gain by providing tagged PDF support. I would take a 3x performance hit for this feature.

@LaurenzV
Collaborator Author

LaurenzV commented Mar 28, 2025

I ran benchmarks with the following five Typst documents:

masterproef-main (https://github.com/Dherse/masterproef)
phd thesis (https://www.github.com/jrihon/thesis)
OI-Wiki (https://github.com/OI-wiki/OI-Wiki-export)
lorem (Simple Typst document with lots of lorem paragraphs)

Results:

masterproef:
Clean compiles, all jobs:
- krilla: 1.6s
- main: 1.5s

Clean compiles, 1 job:
- krilla: 3.5s
- main: 3.9s

Watch, all jobs:
- main: 300ms
- krilla: 330ms

phd thesis:
Clean compiles, all jobs:
- krilla: 2.6s
- main: 4.7s

Clean compiles, 1 job:
- krilla: 7.8s
- main: 12.2s

Watch, all jobs:
- main: 0.22s
- krilla: 1.5s

lorem:
Clean compiles, 1 job:
- krilla: 9.9s
- main: 11.6s

Clean compiles, all jobs:
- krilla: 4.0s
- main: 4.1s

Watch, all jobs:
- main: 1.4s
- krilla: 1.2s

oi-wiki (it should be noted that the results are not very meaningful for this one, because according to my benchmarking only 2-3 seconds are actually spent writing the PDF):
Clean compile, 1 job:
- krilla: 117s
- main: 118.56s

Clean compile, all jobs:
- krilla: 56.258s
- main: 55.796s

I think there are two conclusions to draw from this:

  • For documents with many SVGs, we can expect much better clean compile times, on the one hand since SVGs are much more tightly integrated in the document, on the other hand since we also use a new crate for compression, which is faster than the previous one.
  • However, for the same reason, these documents might see much worse performance for incremental compilation (only when compiling to PDF), because SVGs can no longer be cached. That is a bit unfortunate, but it is a trade-off we have to accept, I think.
  • Other than that, there don't appear to be any major changes.
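To put the numbers above in relative terms, here is a small sketch that computes the percentage change of krilla against main for a few of the timings listed. The function name and the selection of rows are only illustrative; the timings themselves are copied from the results above.

```rust
/// Relative change of the krilla timing versus the main timing, in percent.
/// Negative values mean krilla was faster, positive values mean it was slower.
fn relative_change_percent(krilla_secs: f64, main_secs: f64) -> f64 {
    (krilla_secs - main_secs) / main_secs * 100.0
}

fn main() {
    // (label, krilla seconds, main seconds), copied from the results above.
    let timings = [
        ("masterproef, clean, all jobs", 1.6, 1.5),
        ("phd thesis, clean, all jobs", 2.6, 4.7),
        ("phd thesis, clean, 1 job", 7.8, 12.2),
        ("phd thesis, watch, all jobs", 1.5, 0.22),
    ];

    for (label, krilla_secs, main_secs) in timings {
        let change = relative_change_percent(krilla_secs, main_secs);
        println!("{label}: {change:+.1}%");
    }
}
```

For the PhD thesis this works out to a clean compile that is roughly 45% faster with all jobs, while the watch-mode recompile takes close to seven times as long (1.5s versus 0.22s), which matches the trade-off described in the first two points.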

@LaurenzV
Collaborator Author

Here are the file sizes for the documents I tested:

[Image: file sizes of the tested documents]

As you can see, the file size does increase in certain cases, but the reason for this is that JPEG images are now embedded raw instead of being re-encoded. It seems like the image crate used to re-encode them at a smaller file size (and therefore probably also at lower quality), so that's not a regression.

Other than that, there are some nice file size gains on the PhD thesis document, most likely due to the fact that resources can now be re-used between different SVGs.

@LaurenzV
Collaborator Author

@laurmaedje I think from my side I am mostly done now. There is one bug remaining that I found, when compiling the following document:

#set par(justify: true)
#h(15cm) explanation

Trying to compile with the standard a-2u doesn't work, because it complains about the hyphen not being assigned a codepoint. I checked in the debugger, and for some reason it gets assigned the range 5..5. From what I can tell, though, it does get assigned the right range in the code I changed, so I assume that some code afterwards breaks because it relied on the previous behavior. Since you are probably more familiar with that code, it would be great if you could take a look.

If you want to do some more testing yourself, that would be appreciated, too. But given the tests I've already run I think we can be pretty sure that no major regressions should be introduced, though there obviously is only one way to find out. 😬

@laurmaedje
Member

There is one bug remaining that I found, when compiling the following document: [...]

Hyphens are created here, always with zero-sized ranges. They don't appear in the source text, so their codepoint can't really be accessed via range, unfortunately. On main, the assigned text in the glyph_set is also "".
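As a rough illustration of why a zero-sized range gives no codepoint: slicing the source text with an empty range such as 5..5 always yields the empty string. The `Glyph` struct and `source_text` function below are hypothetical stand-ins for this sketch, not the actual types or functions in Typst or krilla.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a shaped glyph; not Typst's actual type.
struct Glyph {
    /// Byte range in the source text that this glyph was shaped from.
    range: Range<usize>,
}

/// Returns the source text covered by the glyph, or `None` if the range is
/// empty or does not fall on valid boundaries of `text`.
fn source_text<'a>(text: &'a str, glyph: &Glyph) -> Option<&'a str> {
    let slice = text.get(glyph.range.clone())?;
    if slice.is_empty() {
        None
    } else {
        Some(slice)
    }
}

fn main() {
    let text = "explanation";

    // An ordinary glyph covers a non-empty part of the source text.
    let ordinary = Glyph { range: 0..2 };
    // An inserted hyphen does not appear in the source text, so its range is empty.
    let hyphen = Glyph { range: 5..5 };

    println!("ordinary: {:?}", source_text(text, &ordinary));
    println!("hyphen:   {:?}", source_text(text, &hyphen));
}
```

In this sketch the hyphen glyph comes back as `None`, so anything that derives the glyph's text purely from its range has nothing to map it to.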

@LaurenzV
Collaborator Author

Ah, that's unfortunate, because it means that a document with hyphenation cannot be exported to PDF/A-2u or PDF/A-3u... But I guess this can be figured out at a later point.

@laurmaedje laurmaedje marked this pull request as ready for review March 31, 2025 11:56
@laurmaedje
Member

Something about krilla's output is not reproducible. Just running the Typst test suite twice will generate differing PDFs. I'm not sure why.
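One way to start narrowing this down, as a sketch only: compare the bytes of two outputs of the same document and report the first offset at which they differ. The helper name and the byte strings below are made up for illustration and say nothing about where the actual difference lies.

```rust
/// Returns the offset of the first differing byte, or `None` if both buffers
/// are identical. If one buffer is a prefix of the other, the offset is the
/// length of the shorter buffer.
fn first_difference(a: &[u8], b: &[u8]) -> Option<usize> {
    match a.iter().zip(b).position(|(x, y)| x != y) {
        Some(offset) => Some(offset),
        None if a.len() != b.len() => Some(a.len().min(b.len())),
        None => None,
    }
}

fn main() {
    // Placeholder bytes standing in for the PDF output of two separate runs.
    let run_a: &[u8] = b"header body-1 trailer";
    let run_b: &[u8] = b"header body-2 trailer";

    match first_difference(run_a, run_b) {
        Some(offset) => println!("outputs differ at byte offset {offset}"),
        None => println!("outputs are byte-identical"),
    }
}
```

In practice the two buffers would be read from the PDFs produced by two runs of the test suite, and the offset gives a place to start looking in the file.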

@laurmaedje laurmaedje changed the title from "Attempt to port PDF backend to use krilla" to "Switch PDF backend to krilla" Apr 1, 2025
@laurmaedje laurmaedje added this pull request to the merge queue Apr 1, 2025
@laurmaedje
Member

Let's merge this! 🎉

It's been quite the journey and I'm super grateful for all the work you've poured into this. This is easily the largest and most impressive community contribution to Typst so far. I believe it builds a really strong foundation for the future of Typst's PDF export. So, huge props and thanks!

The road to merge was probably much longer than anticipated, so thanks for staying on it. :)

Merged via the queue into typst:main with commit 96dd67e Apr 1, 2025
7 checks passed
@LaurenzV LaurenzV deleted the krilla-port branch April 1, 2025 15:33
hongjr03 pushed a commit to hongjr03/typst that referenced this pull request Apr 16, 2025
Co-authored-by: Laurenz <laurmaedje@gmail.com>
@Dherse Dherse mentioned this pull request May 22, 2025
13 tasks
laurmaedje added a commit that referenced this pull request Sep 30, 2025
This is how it used to work but one place was missed in #5420. This PR factors the behaviour out into a function, such that it is not missed again.

This is necessary because e.g. the position of backlinks to footnotes is always at the baseline and if you link directly to it, the text will not be visible since it is right above.