Refactor PDF export by elegaanz · Pull Request #4154 · typst/typst

elegaanz · 2024-05-16T16:11:35Z

The PDF exporter now uses a type state to ensure that it is impossible to access context that is not yet available. It also makes extensive use of the `pdf_writer::Chunk::rewrite_into` API to make each step more independent, and thus potentially easier to memoize.

The actual behavior of the crate didn't change much; everything is just less entangled and hopefully easier to reason about.
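As a rough illustration of the type-state idea mentioned in the description (this is not the actual exporter code), the sketch below models a pipeline in which each stage is a distinct type, so data produced by a later stage simply does not exist on an earlier one. All names here (`Exporter`, `Collected`, `WithRefs`, `allocate_refs`, `describe`) are hypothetical.

```rust
/// Stage 1: resources have been collected, but no object IDs exist yet.
struct Collected {
    fonts: Vec<String>,
}

/// Stage 2: every font has been assigned an object ID.
struct WithRefs {
    fonts: Vec<String>,
    font_refs: Vec<u32>,
}

/// Hypothetical exporter, generic over the stage it is currently in.
struct Exporter<S> {
    state: S,
}

impl Exporter<Collected> {
    fn new(fonts: Vec<String>) -> Self {
        Exporter { state: Collected { fonts } }
    }

    /// Consumes the stage-1 exporter and returns a stage-2 exporter.
    fn allocate_refs(self) -> Exporter<WithRefs> {
        let fonts = self.state.fonts;
        // Numbering starts at 1 because object number 0 is reserved in PDF.
        let font_refs = (1..=fonts.len() as u32).collect();
        Exporter { state: WithRefs { fonts, font_refs } }
    }
}

impl Exporter<WithRefs> {
    /// Only exists on the stage that has references, so calling it too
    /// early is a compile error rather than a runtime bug.
    fn describe(&self) -> Vec<String> {
        self.state
            .fonts
            .iter()
            .zip(&self.state.font_refs)
            .map(|(name, id)| format!("{id} 0 obj: {name}"))
            .collect()
    }
}
```

Calling `describe` on an `Exporter<Collected>` does not compile, which is the property the type state is meant to provide.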

TODO:

- Adapt "Add parameter to select pages to be exported by CLI" (#4039) for this branch. I will wait for a first round of reviews before doing that, so that I can be sure of the design that will be adopted in the end and work from that.
- Lots of testing, because quite a lot of things have been moved around, and some parts may no longer work even if what gets written is not supposed to have changed. The test suite passes, but it mostly checks that there are no duplicate object references; beyond that, I can't really tell whether the PDF is valid. I've also run `qpdf --check` on all files generated by the test runner, and no errors were reported.

LaurenzV · 2024-05-27T16:45:03Z

I ran the test suite through my script that renders a PDF using different viewers, and I noticed that the following file doesn't seem to render correctly anymore:

// Test the different radial gradient features.
---

#square(
  size: 100pt,
  fill: gradient.radial(..color.map.rainbow, space: color.hsl),
)
---

#grid(
  columns: 2,
  square(
    size: 50pt,
    fill: gradient.radial(..color.map.rainbow, space: color.hsl, center: (0%, 0%)),
  ),
  square(
    size: 50pt,
    fill: gradient.radial(..color.map.rainbow, space: color.hsl, center: (0%, 100%)),
  ),
  square(
    size: 50pt,
    fill: gradient.radial(..color.map.rainbow, space: color.hsl, center: (100%, 0%)),
  ),
  square(
    size: 50pt,
    fill: gradient.radial(..color.map.rainbow, space: color.hsl, center: (100%, 100%)),
  ),
)

---

#square(
  size: 50pt,
  fill: gradient.radial(..color.map.rainbow, space: color.hsl, radius: 10%),
)
#square(
  size: 50pt,
  fill: gradient.radial(..color.map.rainbow, space: color.hsl, radius: 72%),
)

---
#circle(
  radius: 25pt,
  fill: gradient.radial(white, rgb("#8fbc8f"), focal-center: (35%, 35%), focal-radius: 5%),
)
#circle(
  radius: 25pt,
  fill: gradient.radial(white, rgb("#8fbc8f"), focal-center: (75%, 35%), focal-radius: 5%),
)

Before: [screenshot not preserved]

After: [screenshot not preserved]

Other than that, everything seemed fine, although I admittedly only skimmed it. :)

elegaanz · 2024-05-27T17:23:04Z

Thanks, I'll see if I can reproduce and fix it. I will definitely do some more in-depth testing before merging.

- Refactor frame metadata into tags (typst#4212)
- Require `Send` and `Sync` for worlds (typst#4219)
- Optimize counters and state (typst#4223)
- Add `windows` method to array (typst#4136)
- Improve `CITATION.cff` file (typst#4201)
- Fix equation resizing when adding the equation number (typst#4179)
- `layout` documentation improvements (typst#4196)
- Allow somewhat arbitrary characters as `mat`, `vec` and `cases` `delim` (typst#4211)
- Do layout short-circuit in flow instead of realization (typst#4231)
- Split `BitSet` into two types and make it a bit nicer (typst#4249)
- Set default value of `raw.theme` to `auto`, and allow setting `raw.theme` to `auto` (typst#4186)
- Extended cargo installation instructions (typst#4168)
- Hint for language-region pair on `text.lang` (typst#4183)
- Improve macro docs (+ Native*Data docs) (typst#4240)
- Rephrase the sentence on variable scope in Scripting documentation (typst#4250)
- Refactor `Capable::vtable` to return `Option<NonNull<()>>` (typst#4252)
- Nicer test helper CSS (typst#4269)
- Trim weak spacing at line start/end in paragraph layout (typst#4087)
- Add ability to choose between minified and pretty-printed JSON (typst#4161)
- Refactor PDF export (typst#4154)
- Reorder syntax kinds (typst#4287)
- Fix figure centering (typst#4276)
- Fix `Default` impls for AST nodes (typst#4288)
- Bump libc to v0.2.155 (typst#4268)
- Bump time dependency (typst#4294)

Resolves typst#4582 …just by deleting `improve_glyph_sets`!

There are two sources of information for `/ToUnicode`: the `glyph_set` recorded while `write_text`, and `cmap` tables of the font. `improve_glyph_sets` leverages the font. It was refactored into a function in typst#4154, but the real code even predates ed6550f (2 years ago).

`improve_glyph_sets` was necessary before ad34763, when a `glyph_set` was a list of glyphs, and we had to search the font (again) for their texts. (Each glyph represents a text, which is a Unicode code point or a sequence of code points (e.g. ligature).)

In ad34763, the `glyph_set` is refactored to a map from glyphs to texts. Now we have enough information for `/ToUnicode` CMap; no need to search the font.

If the glyph…

- …represents a single character…
  - …and is mapped from only one code point: No change.
  - …and is shared by multiple code points (e.g. CJK unified/compatibility): `/ToUnicode` changes from the largest code points to the first occurrence, and fixes typst#4582.
- …represents a sequence of characters (e.g. ligature)…
  - …and they are also encoded as a single code point for compatibility (e.g “fi”/ﬁ): `/ToUnicode` changes from a single compatibility code point (ﬁ) to the sequence (fi). The behaviour in PDF viewers usually does not change.
  - …and is not encoded in Unicode (e.g. “Th” in Linux Libertine): No change.
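The "first occurrence wins" rule described above can be sketched as a map from glyph IDs to texts that is only written when a glyph is seen for the first time. This is a minimal illustration, not the code from typst#4585; the function name `build_glyph_set` and the `(glyph, text)` input shape are assumptions made for the example.

```rust
use std::collections::BTreeMap;

/// Records the text for each glyph. When the same glyph is used for several
/// code points, the text of the first occurrence is kept and later ones are
/// ignored. (Hypothetical helper, for illustration only.)
fn build_glyph_set(shaped: &[(u16, &str)]) -> BTreeMap<u16, String> {
    let mut glyph_set = BTreeMap::new();
    for &(glyph, text) in shaped {
        // `or_insert_with` leaves an existing entry untouched.
        glyph_set.entry(glyph).or_insert_with(|| text.to_string());
    }
    glyph_set
}
```

With input such as `[(9, "\u{8C48}"), (9, "\u{F900}")]`, where a unified ideograph and its compatibility form share one glyph, the entry for glyph 9 stays at the first text that was recorded.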

elegaanz added 30 commits May 6, 2024 16:55

PDF: Don't depend on a global alloc and writer

6a1d3e9

This won't work, because most IDs inserted in the global context are in fact local and should be remapped.

Remove a few unnecessary PDF reference allocators

8e31d5d

Refactor PageContext into something more generic

83b6972

Remove dependency on a big part of PdfContext when writing patterns

05aef39

Split PDF state in two

f9228fc

Alternative approach with less cloning

fd6929c

Less mutability

e871668

Each "write" pass now returns the sub-state it produced instead of mutating a global state.
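A small sketch of what "return the sub-state instead of mutating a global state" can look like in practice. The pass names and types (`write_fonts`, `write_pages`, `FontRefs`, `PageRefs`, `Exported`) are hypothetical and much simpler than the real exporter; the point is only that each pass takes its inputs by shared reference and hands back what it produced.

```rust
use std::collections::BTreeMap;

/// Output of a hypothetical "fonts" pass: font name to object ID.
#[derive(Debug, PartialEq)]
struct FontRefs(BTreeMap<String, u32>);

/// Output of a hypothetical "pages" pass: one object ID per page.
#[derive(Debug, PartialEq)]
struct PageRefs(Vec<u32>);

/// Everything the passes produced, assembled at the end.
struct Exported {
    fonts: FontRefs,
    pages: PageRefs,
}

/// Reads its input immutably and returns the IDs it assigned.
fn write_fonts(fonts: &[&str]) -> FontRefs {
    let map = fonts
        .iter()
        .enumerate()
        .map(|(i, name)| (name.to_string(), i as u32 + 1))
        .collect();
    FontRefs(map)
}

/// Assigns consecutive IDs to pages, starting at `first_id`.
fn write_pages(page_count: usize, first_id: u32) -> PageRefs {
    PageRefs((0..page_count as u32).map(|i| first_id + i).collect())
}

/// Runs the passes in order; no pass mutates shared state.
fn export(fonts: &[&str], page_count: usize) -> Exported {
    let fonts = write_fonts(fonts);
    let first_page_id = fonts.0.len() as u32 + 1;
    let pages = write_pages(page_count, first_page_id);
    Exported { fonts, pages }
}
```

Because every pass is a plain function from inputs to outputs, each one can be tested, or potentially cached, in isolation.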

Less mutability by pre-fixing glyph sets

73b8098

A (mostly) working implementation

bb80f69

Use renumbering to avoid huge xref table

f666693
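To illustrate why renumbering keeps the cross-reference table small: if passes hand out sparse temporary IDs, the table would have to cover the whole range up to the largest one, so the IDs are mapped to a dense range before writing. The sketch below is an assumption-laden illustration, not the exporter's implementation; the `Object` type and `renumber` function are hypothetical, and it uses a `HashMap` purely for brevity.

```rust
use std::collections::HashMap;

/// A hypothetical PDF object: its own ID plus the IDs it points to.
#[derive(Debug, PartialEq)]
struct Object {
    id: u32,
    refs: Vec<u32>,
}

/// Maps sparse temporary IDs to dense ones (1, 2, 3, ...) in order of first
/// appearance and rewrites all references accordingly.
/// Panics if an object refers to an ID that is never defined.
fn renumber(objects: &[Object]) -> Vec<Object> {
    let mut mapping: HashMap<u32, u32> = HashMap::new();
    let mut next = 1;

    // First pass: assign a dense ID to every defined object.
    for obj in objects {
        mapping.entry(obj.id).or_insert_with(|| {
            let id = next;
            next += 1;
            id
        });
    }

    // Second pass: rewrite IDs and references through the mapping.
    objects
        .iter()
        .map(|obj| Object {
            id: mapping[&obj.id],
            refs: obj.refs.iter().map(|r| mapping[r]).collect(),
        })
        .collect()
}
```

Two objects with temporary IDs around one billion end up as objects 1 and 2, with the reference between them preserved.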

Fix transform

2de9dd8

This may break emojis. TODO: check that it doesn't.

clippy

5884aab

Introduce some abstractions

135d4e8

Correctly handle pattern resources

6ef84ff

Move color fonts to their own module

a950c6e

Remove unused function

c2465cc

Remove another useless function

ed991d2

Move Catalog writing to its own module

91d6f91

Reorganize imports

564728d

Move resources to their own module

f491aa9

Also move some more stuff to the catalog module

Reorganize imports

c6438f2

Clippy

c979cbf

Write basic docs

aa80f94

Start to rework color font handling

b2f4bb1

Lazily initialize color font map

c75e6e7

Potentially fully recursive Type3 font resources

72faa52

Not sure that this is a good idea, but it sort of makes the design simpler.

Make sub-PDF-contexts work

ab43625

Fix stack overflow + duplicate references

69fec5a

Track resources used by patterns with subcontexts

8f98aff

Remove arbitrary allocation offset

4fbb256

elegaanz force-pushed the refactor-pdf-export branch from 0b9ff11 to a31ce39 Compare May 27, 2024 11:03

CI: Also build fuzzers on recent Rust

fce86ca

elegaanz force-pushed the refactor-pdf-export branch from a31ce39 to fce86ca Compare May 27, 2024 11:07

elegaanz requested a review from laurmaedje May 27, 2024 11:25

elegaanz added 3 commits May 27, 2024 18:10

Don't use Option::inspect

301390f
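For context, `Option::inspect` was stabilized in Rust 1.76, which is why using it would have required the MSRV bump that is reverted below. One way to get the same behavior on an older toolchain is an explicit `if let`; the helper name `inspect_compat` is hypothetical and only illustrates the pattern, not the change actually made in this commit.

```rust
/// Behaves like `opt.inspect(f)`: runs `f` on a reference to the contained
/// value, if any, and returns the option unchanged.
fn inspect_compat<T>(opt: Option<T>, f: impl FnOnce(&T)) -> Option<T> {
    if let Some(value) = &opt {
        f(value);
    }
    opt
}
```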

Revert "CI: Also build fuzzers on recent Rust"

9251273

This reverts commit fce86ca.

Revert "Bump MSRV to 1.76, for Option::inspect"

35f8a91

This reverts commit 286953b.

elegaanz added 4 commits May 28, 2024 12:36

Write color spaces for radial gradients

50bab7e

Merge branch 'main' into refactor-pdf-export

5200c7e

Correctly write color glyph content streams

42b5146

Update documentation

dfcf3ef

elegaanz force-pushed the refactor-pdf-export branch from af90503 to dfcf3ef Compare May 28, 2024 13:46

elegaanz added 3 commits May 29, 2024 14:18

Don't allocate HashMaps for renumbering

7c73a4e

Dictionnary → dictionary

96543d2

once again

Make renumbering safer

8b2f16e

laurmaedje added this pull request to the merge queue May 29, 2024

Merged via the queue into main with commit 2946cde May 29, 2024

elegaanz deleted the refactor-pdf-export branch May 29, 2024 13:16

YDX-2147483647 mentioned this pull request Jul 19, 2024

Use texts of the first occurrences for /ToUnicode CMap #4585

Merged

1 task

This was referenced Oct 5, 2024

Weird PDF pages output in Okular after using --pages flag (that should remove unused pages) #5129

Closed

Fix excluded PDF pages being written #5133

Merged

Refactor PDF export #4154
laurmaedje merged 83 commits into main from refactor-pdf-export

3 participants
