Panic in migrateAnnots when annotation IndirectRef resolves to nil #1341

@utahta

Thank you for this great project! I've been using pdfcpu and really appreciate the work you've put into it.

Description

When splitting a PDF in which a page's annotation array holds an indirect reference to a missing object, migrateAnnots panics on a type assertion against a nil interface value.

How to reproduce

Generate a minimal PDF with an annotation referencing a non-existent object, then split it:

go run gen_test_pdf.go test.pdf
pdfcpu split test.pdf output_dir/

gen_test_pdf.go:

// This program generates a minimal PDF with an annotation that references
// a non-existent object, which triggers a panic in pdfcpu split.
package main

import (
	"fmt"
	"os"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintf(os.Stderr, "usage: %s output.pdf\n", os.Args[0])
		os.Exit(1)
	}

	f, err := os.Create(os.Args[1])
	if err != nil {
		fmt.Fprintf(os.Stderr, "error: %v\n", err)
		os.Exit(1)
	}
	defer f.Close()

	offsets := make([]int64, 5) // objects 1-4, index 0 unused
	pos := int64(0)

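	// write appends s to the file and tracks the byte offset for the xref table.
	write := func(s string) {
		n, err := f.WriteString(s)
		if err != nil {
			fmt.Fprintf(os.Stderr, "error: %v\n", err)
			os.Exit(1)
		}
		pos += int64(n)
	}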

	// Header
	write("%PDF-1.4\n")

	// Object 1: Catalog
	offsets[1] = pos
	write("1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n")

	// Object 2: Pages
	offsets[2] = pos
	write("2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n")

	// Object 3: Page with annotation referencing non-existent object 99
	offsets[3] = pos
	write("3 0 obj\n<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Annots [99 0 R] >>\nendobj\n")

	// Object 4: a dummy so xref size is reasonable
	offsets[4] = pos
	write("4 0 obj\n<< /Length 0 >>\nstream\n\nendstream\nendobj\n")

	// Cross-reference table
	xrefPos := pos
	write("xref\n")
	write(fmt.Sprintf("0 %d\n", 5))
	write("0000000000 65535 f \n")
	for i := 1; i <= 4; i++ {
		write(fmt.Sprintf("%010d 00000 n \n", offsets[i]))
	}

	// Trailer
	write("trailer\n")
	write(fmt.Sprintf("<< /Size %d /Root 1 0 R >>\n", 5))
	write("startxref\n")
	write(fmt.Sprintf("%d\n", xrefPos))
	write("%%EOF\n")
}

Note that pdfcpu validate passes for this PDF in both relaxed and strict modes:

$ pdfcpu validate test.pdf
validating(mode=relaxed) test.pdf ...
validation ok

$ pdfcpu validate -m strict test.pdf
validating(mode=strict) test.pdf ...
validation ok

The annotation array correctly contains an IndirectRef (per the PDF spec), but the referenced object simply does not exist.

I also encountered this panic with a real-world PDF where strict validation fails with a "corrupt name object" error on the referenced object. In that case the object exists but cannot be parsed, and Dereference returns nil just as it does for a missing object.

Output

splitting test.pdf to output_dir/...
optimizing...
unexpected panic attack: interface conversion: types.Object is nil, not types.Dict

Root cause

In migrateAnnots (pkg/pdfcpu/migrate.go), when an annotation entry is an IndirectRef, migrateIndRef is called to dereference and migrate the object. If the referenced object is missing, migrateIndRef returns nil without error (via ctxSource.Dereference). The subsequent type assertion o1.(types.Dict) then panics on the nil value.
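
The mechanism can be reproduced outside pdfcpu. The sketch below is only an illustration: Object and Dict are simplified stand-ins for types.Object and types.Dict, and assertDict and asDict are hypothetical helpers, not pdfcpu functions. It shows that the single-value type assertion panics on a nil interface value, while the two-value form reports failure through its boolean result.

```go
package main

import "fmt"

// Object and Dict are simplified stand-ins for pdfcpu's types.Object and
// types.Dict. They are assumptions made for this illustration only.
type Object interface{}
type Dict map[string]Object

// assertDict performs the single-value assertion used in the failing code
// path and returns the recovered panic message, or "" if nothing panicked.
func assertDict(o Object) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	_ = o.(Dict) // panics when o is nil or holds a non-Dict value
	return ""
}

// asDict uses the two-value form, which never panics.
func asDict(o Object) (Dict, bool) {
	d, ok := o.(Dict)
	return d, ok
}

func main() {
	var missing Object // models the nil result for a missing object

	fmt.Println("single-value form panicked:", assertDict(missing) != "")

	_, ok := asDict(missing)
	fmt.Println("two-value form ok:", ok)
}
```

The exact panic text depends on the interface and target types involved, so the sketch only checks whether a panic occurred.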

Possible fix

In case it helps, I was able to work around this by adding a nil check after migrateIndRef returns to skip the annotation entry:

o1, err := migrateIndRef(&o, ctxSrc, ctxDest, migrated)
if err != nil {
    return nil, err
}
arr[i] = o
if o1 == nil {
    continue
}
d = o1.(types.Dict)
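
For anyone who wants to try the skip-on-nil idea in isolation, here is a rough, self-contained sketch of the same loop shape. Everything in it is hypothetical: Object, Dict, resolve, and collectAnnotDicts are stand-ins invented for the example and do not match pdfcpu's actual signatures. Unlike the snippet above, it also uses the two-value assertion so that a non-dict object yields an error instead of a panic.

```go
package main

import "fmt"

// Object and Dict are simplified stand-ins for pdfcpu's types.Object and
// types.Dict (assumptions for this sketch).
type Object interface{}
type Dict map[string]Object

// resolve is a hypothetical stand-in for migrateIndRef: for an object number
// that is not in the table it returns (nil, nil), i.e. nil without an error.
func resolve(objNr int, table map[int]Object) (Object, error) {
	return table[objNr], nil
}

// collectAnnotDicts walks the referenced object numbers, skips references
// that resolve to nil, and returns an error for objects that are not dicts.
func collectAnnotDicts(refs []int, table map[int]Object) ([]Dict, error) {
	var dicts []Dict
	for _, nr := range refs {
		o, err := resolve(nr, table)
		if err != nil {
			return nil, err
		}
		if o == nil {
			continue // dangling reference: nothing to migrate
		}
		d, ok := o.(Dict)
		if !ok {
			return nil, fmt.Errorf("object %d: expected dict, got %T", nr, o)
		}
		dicts = append(dicts, d)
	}
	return dicts, nil
}

func main() {
	table := map[int]Object{5: Dict{"Subtype": "Link"}}
	// Object 99 does not exist, mirroring /Annots [99 0 R] in the test PDF.
	dicts, err := collectAnnotDicts([]int{99, 5}, table)
	fmt.Println(len(dicts), err)
}
```

Whether a dangling annotation reference should be skipped silently or logged is a design choice for the maintainers; the sketch only shows that the loop can continue without panicking.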

  • Confirmed on the latest commit (2117365)
  • OS: Linux (WSL2) amd64
  • Go: 1.24.4
