Thank you for this great project! I've been using pdfcpu and really appreciate the work you've put into it.
Description
When splitting a PDF whose page Annots array contains an indirect reference to a missing object, migrateAnnots panics on a type assertion against a nil interface value.
How to reproduce
Generate a minimal PDF with an annotation referencing a non-existent object, then split it:
go run gen_test_pdf.go test.pdf
pdfcpu split test.pdf output_dir/
gen_test_pdf.go
// This program generates a minimal PDF with an annotation that references
// a non-existent object, which triggers a panic in pdfcpu split.
package main

import (
	"fmt"
	"os"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintf(os.Stderr, "usage: %s output.pdf\n", os.Args[0])
		os.Exit(1)
	}
	f, err := os.Create(os.Args[1])
	if err != nil {
		fmt.Fprintf(os.Stderr, "error: %v\n", err)
		os.Exit(1)
	}
	defer f.Close()

	offsets := make([]int64, 5) // objects 1-4, index 0 unused
	pos := int64(0)
	// write appends s to the file and tracks the byte offset for the xref table.
	write := func(s string) {
		n, _ := f.WriteString(s)
		pos += int64(n)
	}

	// Header
	write("%PDF-1.4\n")

	// Object 1: Catalog
	offsets[1] = pos
	write("1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n")

	// Object 2: Pages
	offsets[2] = pos
	write("2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n")

	// Object 3: Page with annotation referencing non-existent object 99
	offsets[3] = pos
	write("3 0 obj\n<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Annots [99 0 R] >>\nendobj\n")

	// Object 4: a dummy so xref size is reasonable
	offsets[4] = pos
	write("4 0 obj\n<< /Length 0 >>\nstream\n\nendstream\nendobj\n")

	// Cross-reference table
	xrefPos := pos
	write("xref\n")
	write(fmt.Sprintf("0 %d\n", 5))
	write("0000000000 65535 f \n")
	for i := 1; i <= 4; i++ {
		write(fmt.Sprintf("%010d 00000 n \n", offsets[i]))
	}

	// Trailer
	write("trailer\n")
	write(fmt.Sprintf("<< /Size %d /Root 1 0 R >>\n", 5))
	write("startxref\n")
	write(fmt.Sprintf("%d\n", xrefPos))
	write("%%EOF\n")
}
Note that pdfcpu validate passes for this PDF in both relaxed and strict modes:
$ pdfcpu validate test.pdf
validating(mode=relaxed) test.pdf ...
validation ok
$ pdfcpu validate -m strict test.pdf
validating(mode=strict) test.pdf ...
validation ok
The annotation array correctly contains an IndirectRef (per the PDF spec), but the referenced object simply does not exist.
I also encountered this panic with a real-world PDF where strict validation fails with corrupt name object on the referenced object. In that case the object exists but cannot be parsed, and Dereference returns nil in the same way.
Output
splitting test.pdf to output_dir/...
optimizing...
unexpected panic attack: interface conversion: types.Object is nil, not types.Dict
Root cause
In migrateAnnots (pkg/pdfcpu/migrate.go), when an annotation entry is an IndirectRef, migrateIndRef is called to dereference and migrate the object. If the referenced object is missing, migrateIndRef returns nil without error (via ctxSource.Dereference). The subsequent type assertion o1.(types.Dict) then panics on the nil value.
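To illustrate the mechanism in isolation, here is a minimal standalone sketch of how an unchecked type assertion behaves when the interface value is nil. Object, Dict and assertDict are hypothetical stand-ins I made up for this example; they are not pdfcpu's types.Object, types.Dict or any function in the library.

```go
package main

import "fmt"

// Object and Dict are hypothetical stand-ins for pdfcpu's types.Object and
// types.Dict. They are assumptions for illustration, not the library's types.
type Object interface{}
type Dict map[string]Object

// assertDict performs the same kind of unchecked assertion as o1.(types.Dict)
// and converts the resulting runtime panic into an error so it can be observed.
func assertDict(o Object) (d Dict, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return o.(Dict), nil
}

func main() {
	// A nil Object models what a dereference of a missing object yields.
	var missing Object
	_, err := assertDict(missing)
	fmt.Println(err)

	// A real dictionary passes the same assertion without error.
	_, err = assertDict(Dict{"Type": "Annot"})
	fmt.Println(err)
}
```

The first call reports an interface conversion error of the same kind as the one in the output above; the second returns no error.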
Possible fix
In case it helps, I was able to work around this by adding a nil check after migrateIndRef returns to skip the annotation entry:
o1, err := migrateIndRef(&o, ctxSrc, ctxDest, migrated)
if err != nil {
	return nil, err
}
arr[i] = o
if o1 == nil {
	continue
}
d = o1.(types.Dict)
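An alternative to the explicit nil check would be the comma-ok form of the type assertion, which also covers the case where the dereferenced object exists but is not a dictionary. The sketch below only shows that pattern on stand-in types; Object, Dict and keepDicts are hypothetical names, and whether silently skipping such entries is the right behaviour for pdfcpu is a decision for the maintainers.

```go
package main

import "fmt"

// Object and Dict are hypothetical stand-ins for pdfcpu's types.Object and
// types.Dict. They are assumptions for illustration, not the library's types.
type Object interface{}
type Dict map[string]Object

// keepDicts returns only the entries that are dictionaries. The comma-ok
// assertion yields ok == false for nil and for any non-Dict value, so those
// entries are skipped instead of causing a panic.
func keepDicts(entries []Object) []Dict {
	var out []Dict
	for _, o := range entries {
		d, ok := o.(Dict)
		if !ok {
			continue
		}
		out = append(out, d)
	}
	return out
}

func main() {
	entries := []Object{Dict{"Subtype": "Link"}, nil, "not a dict"}
	fmt.Println(len(keepDicts(entries))) // prints 1
}
```

Compared with the nil check above, this variant trades a little strictness for robustness: a wrongly typed annotation entry is dropped rather than reported.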
- Confirmed on the latest commit (2117365)
- OS: Linux (WSL2) amd64
- Go: 1.24.4