Refactor PDF export #4154

Merged
laurmaedje merged 83 commits into main from refactor-pdf-export
May 29, 2024

@elegaanz (Member) commented May 16, 2024

The PDF exporter now uses a type state to make it impossible to access context that is not yet available. It also makes extensive use of the pdf_writer::Chunk::rewrite_into API to make each step more independent, and thus potentially easier to memoize.

The actual behavior of the crate didn't change much; everything is just less entangled and hopefully easier to reason about.
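
As a rough illustration of the type-state idea described above, here is a minimal, self-contained sketch. It is not the exporter's actual code: `Export`, `NoPages`, `WithPages`, `write_pages` and `catalog` are hypothetical names chosen for this example. The point is only that a method needing page data exists solely on the type that carries page data, so calling it too early is a compile error rather than a runtime bug.

```rust
/// Hypothetical state: nothing has been written yet.
struct NoPages;

/// Hypothetical state: pages have been written, so their count is known.
struct WithPages {
    page_count: usize,
}

/// An export context whose available data is encoded in its type parameter.
struct Export<S> {
    title: String,
    state: S,
}

impl Export<NoPages> {
    fn new(title: &str) -> Self {
        Export { title: title.to_string(), state: NoPages }
    }

    /// Consumes the early context and returns one that also carries page data.
    fn write_pages(self, pages: &[&str]) -> Export<WithPages> {
        Export {
            title: self.title,
            state: WithPages { page_count: pages.len() },
        }
    }
}

impl Export<WithPages> {
    /// Only defined for `Export<WithPages>`; `Export<NoPages>` has no such method.
    fn catalog(&self) -> String {
        format!("{}: {} page(s)", self.title, self.state.page_count)
    }
}

fn main() {
    let export = Export::new("report");
    // export.catalog(); // would not compile: pages are not available yet
    let export = export.write_pages(&["first", "second"]);
    println!("{}", export.catalog());
}
```

Because `write_pages` takes `self` by value, the earlier state cannot be used by accident once the next step has run.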

TODO:

  • Adapt "Add parameter to select pages to be exported by CLI" (#4039) for this branch. I will wait for a first round of reviews before doing that, so that I can be sure of the design that will be adopted in the end and work from that.
  • Lots of testing, because quite a lot of things have been moved around, and some parts may no longer work even if what gets written is not supposed to have changed. The test suite passes, but it mostly checks that there are no duplicate object references; beyond that, I can't really tell whether the PDF is valid. I've also run qpdf --check on all files generated by the test runner, and no errors were reported.

elegaanz added 30 commits May 6, 2024 16:55
This won't work, because most IDs inserted in the global context are in fact local and should be remapped.
Each "write" pass now returns the sub-state it produced instead of mutating a global state.
This may break emojis. TODO: check that it doesn't.
Also move some more stuff to the catalog module.
Not sure that this is a good idea, but it sort of makes the design simpler.
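
The remapping concern in the commit notes above can be pictured with a small standalone sketch. It deliberately does not use pdf_writer; `Ref`, `LocalChunk`, `GlobalAlloc` and `merge_into` are hypothetical stand-ins, and the sketch assumes every reference inside a chunk points to an object defined in that same chunk. Each pass numbers its objects locally, and the IDs are rewritten only when the chunk is merged into the final output.

```rust
use std::collections::HashMap;

/// A reference to an object (hypothetical stand-in for a real PDF reference type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Ref(u32);

/// Objects produced by one pass, numbered with pass-local IDs.
struct LocalChunk {
    /// Pairs of (local ID, local IDs of the objects this one points to).
    objects: Vec<(Ref, Vec<Ref>)>,
}

/// Hands out globally unique IDs.
struct GlobalAlloc {
    next: u32,
}

impl GlobalAlloc {
    fn bump(&mut self) -> Ref {
        let r = Ref(self.next);
        self.next += 1;
        r
    }
}

/// Copies a local chunk into the global list, remapping every ID on the way.
/// Assumes all links point to objects defined in the same chunk.
fn merge_into(
    global: &mut Vec<(Ref, Vec<Ref>)>,
    alloc: &mut GlobalAlloc,
    chunk: &LocalChunk,
) {
    let mut mapping: HashMap<Ref, Ref> = HashMap::new();
    // First pass: assign a fresh global ID to each object defined in the chunk.
    for (id, _) in &chunk.objects {
        mapping.insert(*id, alloc.bump());
    }
    // Second pass: rewrite the object IDs and the references they contain.
    for (id, links) in &chunk.objects {
        let new_links: Vec<Ref> = links.iter().map(|l| mapping[l]).collect();
        global.push((mapping[id], new_links));
    }
}

fn main() {
    let mut alloc = GlobalAlloc { next: 1 };
    let mut global = Vec::new();
    // Two independent passes that both start numbering at 1.
    let fonts = LocalChunk { objects: vec![(Ref(1), vec![])] };
    let pages = LocalChunk {
        objects: vec![(Ref(1), vec![Ref(2)]), (Ref(2), vec![])],
    };
    merge_into(&mut global, &mut alloc, &fonts);
    merge_into(&mut global, &mut alloc, &pages);
    println!("{:?}", global);
}
```

Since a pass only returns its own chunk and never touches shared state, its result depends only on its inputs, which is what makes such a step a candidate for memoization.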
@elegaanz elegaanz force-pushed the refactor-pdf-export branch from 0b9ff11 to a31ce39 Compare May 27, 2024 11:03
@elegaanz elegaanz force-pushed the refactor-pdf-export branch from a31ce39 to fce86ca Compare May 27, 2024 11:07
@elegaanz elegaanz requested a review from laurmaedje May 27, 2024 11:25
@LaurenzV (Collaborator) commented

I ran the test suite through my script that renders a PDF using different viewers, and I noticed that the following file doesn't seem to render correctly anymore:

// Test the different radial gradient features.
---

#square(
  size: 100pt,
  fill: gradient.radial(..color.map.rainbow, space: color.hsl),
)
---

#grid(
  columns: 2,
  square(
    size: 50pt,
    fill: gradient.radial(..color.map.rainbow, space: color.hsl, center: (0%, 0%)),
  ),
  square(
    size: 50pt,
    fill: gradient.radial(..color.map.rainbow, space: color.hsl, center: (0%, 100%)),
  ),
  square(
    size: 50pt,
    fill: gradient.radial(..color.map.rainbow, space: color.hsl, center: (100%, 0%)),
  ),
  square(
    size: 50pt,
    fill: gradient.radial(..color.map.rainbow, space: color.hsl, center: (100%, 100%)),
  ),
)

---

#square(
  size: 50pt,
  fill: gradient.radial(..color.map.rainbow, space: color.hsl, radius: 10%),
)
#square(
  size: 50pt,
  fill: gradient.radial(..color.map.rainbow, space: color.hsl, radius: 72%),
)

---
#circle(
  radius: 25pt,
  fill: gradient.radial(white, rgb("#8fbc8f"), focal-center: (35%, 35%), focal-radius: 5%),
)
#circle(
  radius: 25pt,
  fill: gradient.radial(white, rgb("#8fbc8f"), focal-center: (75%, 35%), focal-radius: 5%),
)

Before:
image

After:
image

Other than that everything else seemed fine, although I admittedly only skimmed it. :)

@elegaanz (Member, Author) commented

Thanks, I'll see if I can reproduce and fix it. I will definitely do some more in-depth testing before merging.

@elegaanz elegaanz force-pushed the refactor-pdf-export branch from af90503 to dfcf3ef Compare May 28, 2024 13:46
@laurmaedje laurmaedje added this pull request to the merge queue May 29, 2024
Merged via the queue into main with commit 2946cde May 29, 2024
@elegaanz elegaanz deleted the refactor-pdf-export branch May 29, 2024 13:16
PgBiel added a commit to tulio240/typst that referenced this pull request May 30, 2024
Refactor frame metadata into tags (typst#4212)
Require `Send` and `Sync` for worlds (typst#4219)
Optimize counters and state (typst#4223)
Add `windows` method to array (typst#4136)
Improve `CITATION.cff` file (typst#4201)
Fix equation resizing when adding the equation number (typst#4179)
`layout` documentation improvements (typst#4196)
Allow somewhat arbitrary characters as `mat`, `vec` and `cases` `delim` (typst#4211)
Do layout short-circuit in flow instead of realization (typst#4231)
Split `BitSet` into two types and make it a bit nicer (typst#4249)
Set default value of `raw.theme` to `auto`, and allow setting `raw.theme` to `auto` (typst#4186)
Extended cargo installation instructions (typst#4168)
Hint for language-region pair on `text.lang` (typst#4183)
Improve macro docs (+ Native*Data docs) (typst#4240)
Rephrase the sentence on variable scope in Scripting documentation (typst#4250)
Refactor `Capable::vtable` to return `Option<NonNull<()>>` (typst#4252)
Nicer test helper CSS (typst#4269)
Trim weak spacing at line start/end in paragraph layout (typst#4087)
Add ability to choose between minified and pretty-printed JSON (typst#4161)
Refactor PDF export (typst#4154)
Reorder syntax kinds (typst#4287)
Fix figure centering (typst#4276)
Fix `Default` impls for AST nodes (typst#4288)
Bump libc to v0.2.155 (typst#4268)
Bump time dependency (typst#4294)
YDX-2147483647 added a commit to YDX-2147483647/typst that referenced this pull request Jul 19, 2024
Resolves typst#4582

There are two sources of information for `/ToUnicode`: the `glyph_set` recorded while `write_text`, and `cmap` tables of the font.
`improve_glyph_sets` leverages the font. It was refactored as a function in typst#4154, but the real code even predates ed6550f (2 years ago).

`improve_glyph_sets` was necessary before ad34763, when a `glyph_set` was a list of glyphs, and we had to search the font (again) for their texts. (Each glyph represents a text, which is a Unicode code point or a sequence of code points (e.g. ligature).)

In ad34763, the `glyph_set` is refactored to a map from glyphs to texts.
Now we have enough information for `/ToUnicode` CMap—no need to search the font.

If the glyph…
- …represents a single character…
    - …and is mapped from only one code point:
        No change.
    - …and is shared by multiple code points (e.g. CJK unified/compatibility):
        `/ToUnicode` changes from the largest code point to the first occurrence.
- …represents a sequence of characters (e.g. ligature)…
    - …and they are also encoded as a single code point for compatibility (e.g. “fi”/ﬁ):
        `/ToUnicode` changes from a single compatibility code point (ﬁ) to the sequence (fi).
        The behaviour in PDF viewers usually does not change.
    - …and is not encoded in Unicode (e.g. “Th” in Linux Libertine):
        No change.
YDX-2147483647 added a commit to YDX-2147483647/typst that referenced this pull request Jul 19, 2024
…just by deleting `improve_glyph_sets`!

There are two sources of information for `/ToUnicode`: the `glyph_set` recorded while `write_text`, and `cmap` tables of the font.
`improve_glyph_sets` leverages the font. It was refactored into a function in typst#4154, but the real code even predates ed6550f (2 years ago).

`improve_glyph_sets` was necessary before ad34763, when a `glyph_set` was a list of glyphs, and we had to search the font (again) for their texts. (Each glyph represents a text, which is a Unicode code point or a sequence of code points (e.g. ligature).)

In ad34763, the `glyph_set` is refactored to a map from glyphs to texts.
Now we have enough information for `/ToUnicode` CMap—no need to search the font.

If the glyph…
- …represents a single character…
    - …and is mapped from only one code point:
        No change.
    - …and is shared by multiple code points (e.g. CJK unified/compatibility):
        `/ToUnicode` changes from the largest code point to the first occurrence, and fixes typst#4582.
- …represents a sequence of characters (e.g. ligature)…
    - …and they are also encoded as a single code point for compatibility (e.g. “fi”/ﬁ):
        `/ToUnicode` changes from a single compatibility code point (ﬁ) to the sequence (fi).
        The behaviour in PDF viewers usually does not change.
    - …and is not encoded in Unicode (e.g. “Th” in Linux Libertine):
        No change.
YDX-2147483647 added a commit to YDX-2147483647/typst that referenced this pull request Jul 19, 2024
Resolves typst#4582
…just by deleting `improve_glyph_sets`!

There are two sources of information for `/ToUnicode`: the `glyph_set` recorded while `write_text`, and `cmap` tables of the font.
`improve_glyph_sets` leverages the font. It was refactored into a function in typst#4154, but the real code even predates ed6550f (2 years ago).

`improve_glyph_sets` was necessary before ad34763, when a `glyph_set` was a list of glyphs, and we had to search the font (again) for their texts. (Each glyph represents a text, which is a Unicode code point or a sequence of code points (e.g. ligature).)

In ad34763, the `glyph_set` is refactored to a map from glyphs to texts.
Now we have enough information for `/ToUnicode` CMap—no need to search the font.

If the glyph…
- …represents a single character…
    - …and is mapped from only one code point:
        No change.
    - …and is shared by multiple code points (e.g. CJK unified/compatibility):
        `/ToUnicode` changes from the largest code point to the first occurrence, and fixes typst#4582.
- …represents a sequence of characters (e.g. ligature)…
    - …and they are also encoded as a single code point for compatibility (e.g. “fi”/ﬁ):
        `/ToUnicode` changes from a single compatibility code point (ﬁ) to the sequence (fi).
        The behaviour in PDF viewers usually does not change.
    - …and is not encoded in Unicode (e.g. “Th” in Linux Libertine):
        No change.
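
The "first occurrence" behaviour described in the commit message above can be sketched with a plain map from glyph IDs to texts. This is an illustrative assumption about the shape of such a `glyph_set`, not the code from the commit: `record_glyphs` and the glyph IDs are made up, and the example simply assumes a font in which U+8C48 and its compatibility ideograph U+F900 share one glyph.

```rust
use std::collections::BTreeMap;

/// Records the text behind each glyph as text is "written".
/// The first text seen for a glyph wins; later ones are ignored.
fn record_glyphs(shaped: &[(u16, &str)]) -> BTreeMap<u16, String> {
    let mut glyph_set = BTreeMap::new();
    for &(glyph, text) in shaped {
        glyph_set.entry(glyph).or_insert_with(|| text.to_string());
    }
    glyph_set
}

fn main() {
    // Hypothetical glyph IDs. Glyph 7 is assumed to be shared by U+8C48 and
    // the compatibility ideograph U+F900; glyph 42 is an "fi" ligature.
    let shaped: [(u16, &str); 3] = [
        (7, "\u{8C48}"),
        (7, "\u{F900}"),
        (42, "fi"),
    ];
    let glyph_set = record_glyphs(&shaped);
    for (glyph, text) in &glyph_set {
        println!("glyph {glyph} -> {text:?}");
    }
}
```

With a map like this, the text for each glyph is already known when the `/ToUnicode` CMap is written, so no second lookup in the font is needed; the ligature glyph maps to the two-character sequence rather than to a single compatibility code point.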