
PDF Embedding: Get the number of pages of a PDF image #6644

@YDX-2147483647

Description

#6623 implemented image("image.pdf", page: p), but there is no way to know how many pages a PDF has.

Use Case

It would be useful to have an API that replaces the awkwardly hard-coded 7 in the following example.

#grid(
  columns: 3,
  ..range(7).map(
    p => image("image.pdf", page: p + 1),
  ),
)

Previous discussions

Not difficult to add in principle, but the question is what the API would look like.

image.pages, maybe

https://discord.com/channels/1054443721975922748/1054443722592497796/1397229132860887081

This feature is also requested in a forum post.

See also the general issue on SVG and PDF: #3160.

Reference designs

muchpdf

If you do not want to insert every page into your document, you can provide the pages argument. Note that it starts at zero, not one.

#let data = read("document.pdf", encoding: none)
#muchpdf(data, pages: 3)
#muchpdf(data, pages: (0, 2, 10))
#muchpdf(data, pages: (start: 5, end: 9))
#muchpdf(data, pages: (start: 5, end: 9, step: 2)) // every second page
#muchpdf(data, pages: (0, 2, (start: 4, end: 7))) // combine lists and ranges

The function returns:

#sequence(
  image(source: bytes(356406), format: "svg"),
  image(source: bytes(361106), format: "svg"),
  image(source: bytes(331419), format: "svg"),
  …
)
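To compare muchpdf's flexible pages argument with the single number this issue asks for, here is a minimal sketch of how such a spec could be flattened into a list of page indices in plain Typst. normalize-pages is a hypothetical name, not muchpdf's implementation. It keeps muchpdf's zero-based indices but assumes that end is exclusive and that step defaults to 1, as in Typst's range(); muchpdf's actual semantics may differ.

```typst
// Hypothetical helper, not muchpdf's code: flatten a muchpdf-style
// `pages` spec into an array of zero-based page indices.
// Assumption: `end` is exclusive and `step` defaults to 1, as in range().
#let normalize-pages(spec) = {
  if type(spec) == int {
    // A single index.
    (spec,)
  } else if type(spec) == dictionary {
    // A range given as (start: .., end: .., step: ..).
    range(spec.start, spec.end, step: spec.at("step", default: 1))
  } else if type(spec) == array {
    // A list that may mix indices and ranges: normalize each item.
    spec.map(normalize-pages).flatten()
  } else {
    panic("unsupported pages spec: " + repr(spec))
  }
}
```

Any variant meaning "up to the last page" would additionally need the page count, which is the missing piece.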

LaTeX pdfpages

  • pages

    • pages={3,{},8-11,15} will insert page 3, an empty page, and pages 8, 9, 10, 11, and 15.
    • ⟨m⟩-⟨n⟩ selects all pages from ⟨m⟩ to ⟨n⟩. Omitting ⟨m⟩ defaults to the first page; omitting ⟨n⟩ defaults to the last page of the document.
    • Another way to select the last page of the document is to use the keyword last.
    • pages=- will insert all pages of the document, and pages=last-1 will insert all pages in reverse order.
    • Default: pages=1
  • nup

    • Puts multiple logical pages onto each sheet of paper. The syntax of this option is nup=⟨xnup⟩x⟨ynup⟩, where ⟨xnup⟩ and ⟨ynup⟩ specify the number of logical pages in the horizontal and vertical direction that are arranged on each sheet of paper.
    • Default: nup=1x1
  • And many more options…

layout column strict

My ideas

I think we only need one simple feature in the core Typst compiler: getting the total number of pages of a PDF.

Fancy ways to specify page ranges, lay out a grid, offset and scale pages, or anything else should go into a package.
We already have numbly for setting numbering: numbly("Appendix {1:A}.", "{1:A}.{2}.").
And we could have another package for pages("2, 3..7, 9..-1", total: n), where n comes from a Typst API that does not exist yet.
This pages(…) call would return an array<int>, an array<image>, or content (a sequence<image> or grid<image>).
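A rough sketch of the page-number half of such a package, in plain Typst. The name parse-pages, its spec syntax, and its semantics are assumptions: ranges are treated as inclusive, page numbers as 1-based (matching image's page argument), and negative numbers count from the end, so -1 is the last page. The total still has to be supplied by hand, which is exactly the gap this issue asks the compiler to fill.

```typst
// Hypothetical package helper (not a Typst API): expand a spec such as
// "2, 3..7, 9..-1" into 1-based page numbers.
// Assumptions: ranges are inclusive, and a negative number counts from
// the end (-1 is the last page), which requires `total`.
#let resolve-page(p, total) = if p < 0 { total + 1 + p } else { p }

#let parse-pages(spec, total: none) = {
  let out = ()
  for part in spec.split(",") {
    let s = part.trim()
    if s.contains("..") {
      // "a..b": both ends included.
      let (a, b) = s.split("..").map(x => resolve-page(int(x.trim()), total))
      out += range(a, b + 1)
    } else {
      // A single page number.
      out.push(resolve-page(int(s), total))
    }
  }
  out
}
```

Once n is available, a call such as grid(columns: 3, ..parse-pages("1..-1", total: n).map(p => image("image.pdf", page: p))) would replace the hard-coded 7.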

A key question might be whether the Typst API provides the number of pages directly, or provides all pages (and lets us use array.len()).
The latter is much more useful (Discord), but may have performance issues.

= Example of the former

#let n = image.number-of-pages("image.pdf")

#image("image.pdf", page: 7, width: 60%)
#image("image.pdf", page: n, width: 80%)

// Alternative:
#image("image.pdf", index: 7, width: 60%)
#image("image.pdf", index: n, width: 80%)

= Example of the latter

#let pages = image.batch("image.pdf") // may also work for GIF in the future

#image(pages.at(6), width: 60%)
#image(pages.last(), width: 80%)

// Alternative:
#pages.at(6).display(width: 60%)
#pages.last().display(width: 80%)

When including only pages 30, 32, 34, 36, and 38 from a 100-page PDF, is it possible to load only those 5 pages rather than all 100?
You can probably optimize this by representing the pages array as an array of magic "internal representations" that only point to a page number, so that their contents are read only when necessary.
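The "internal representation" idea can be sketched in today's Typst: a handle is just a dictionary recording a path and a page number, and image() is only called for the handles that are actually displayed. page-handles and show-page are hypothetical names, and the page count is again passed by hand. Whether the compiler then skips parsing the other pages is up to its implementation; the sketch only ensures that they are never passed to image().

```typst
// Hypothetical sketch: a page handle only records where a page lives.
// Creating handles does not call image(); only show-page does.
#let page-handles(path, total) = range(1, total + 1).map(p => (path: path, page: p))

// Display one handle; extra arguments (width, height, ...) apply to this page only.
#let show-page(handle, ..args) = image(handle.path, page: handle.page, ..args)

// Assumed: a 100-page PDF whose page count is known by hand.
#let hs = page-handles("image.pdf", 100)
```

For example, range(30, 39, step: 2).map(p => show-page(hs.at(p - 1), width: 60%)) would display pages 30, 32, 34, 36, and 38, with the width applied only to those pages. Note the p - 1: array indices start at 0, while page numbers start at 1.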

I think a downside of a function that returns an array of images is that it makes the configuration of parameters weird, e.g. width and height. Do I write image("file.pdf", width: 4cm).at(2) and then the width was first applied to all pages even though it might only make sense for page 2? That's also odd.
https://discord.com/channels/1054443721975922748/1054443722592497796/1397469452551061615

Another problem: array(…).at(i) counts from 0, but image("*.pdf", page: p) counts from 1. At least that's the case for the current dev version.
https://discord.com/channels/1054443721975922748/1054443722592497796/1397575617645645896

My previous proposal
  • images("image.pdf", all: true) → (image("image.pdf", page: 1), image("image.pdf", page: 2), …).

  • images("image.pdf", start: 3, end: -2) → (image("image.pdf", page: 3), image("image.pdf", page: 4), …, image("image.pdf", page: the-third-page-from-the-end)).

    Or just images("image.pdf").slice(3, -2) by leveraging array.slice, if zero performance cost is possible.
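The .slice alternative runs into the 0-based versus 1-based mismatch mentioned earlier. A small sketch, assuming a 10-page PDF and standing in for the pages with their page numbers, shows that .slice(3, -2) starts at page 4, not page 3, even though its end matches "the third page from the end".

```typst
// Assumed: a 10-page PDF. Index i of this array holds page i + 1,
// because arrays count from 0 while image(..., page: p) counts from 1.
#let total = 10
#let page-numbers = range(1, total + 1)

// .at(6) is page 7, not page 6.
#assert.eq(page-numbers.at(6), 7)

// .slice(3, -2) yields pages 4 to 8: it starts one page later than
// start: 3 would, and stops before the last two pages.
#assert.eq(page-numbers.slice(3, -2), (4, 5, 6, 7, 8))
```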

Metadata

Labels: feature request (New feature or request), pdf (Related to PDF export or PDF embedding)
