pdfalto error: Syntax Warning: Invalid entry in bfchar block in ToUnicode CMap #923

@artturimatias

Hi,
I'm getting the following error with a certain PDF:

ERROR [2022-06-07 08:02:33,838] org.grobid.core.process.ProcessPdfToXml: pdfalto process finished with error code: 143. [/opt/grobid/grobid-home/pdfalto/lin-64/pdfalto_server, -fullFontName, -noLineNumbers, -noImage, -annotation, -filesLimit, 2000, /opt/grobid/grobid-home/tmp/origin3690432459378499723.pdf, /opt/grobid/grobid-home/tmp/czDhswmAVc.lxml]
ERROR [2022-06-07 08:02:33,838] org.grobid.core.process.ProcessPdfToXml: pdfalto return message: 
Syntax Warning: Invalid entry in bfchar block in ToUnicode CMap
Syntax Warning: Invalid entry in bfchar block in ToUnicode CMap
... (many more of these lines)
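
A note on reading the log above: on POSIX systems an exit status above 128 conventionally means 128 plus a signal number, so 143 would correspond to signal 15 (SIGTERM). That suggests the pdfalto process was terminated from outside rather than failing on the warnings themselves, but that is an assumption and the log does not confirm it. The following is a minimal Python sketch for decoding the exit code and collapsing the repeated warnings; the function names `describe_exit_code` and `summarize_messages` are hypothetical helpers, not part of GROBID or pdfalto.

```python
import signal
from collections import Counter


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status.

    Values above 128 conventionally mean 128 + signal number.
    """
    if code > 128:
        signum = code - 128
        try:
            return f"terminated by {signal.Signals(signum).name}"
        except ValueError:
            # The number does not map to a signal known on this platform.
            return f"terminated by unknown signal {signum}"
    return f"exited with status {code}"


def summarize_messages(output: str) -> Counter:
    """Count identical non-empty lines so repeated warnings collapse to one entry."""
    return Counter(line.strip() for line in output.splitlines() if line.strip())


if __name__ == "__main__":
    # Exit code taken from the log; the two warning lines are the ones pasted above.
    print(describe_exit_code(143))
    sample = (
        "Syntax Warning: Invalid entry in bfchar block in ToUnicode CMap\n"
        "Syntax Warning: Invalid entry in bfchar block in ToUnicode CMap\n"
    )
    for message, count in summarize_messages(sample).items():
        print(f"{count} x {message}")
```

If the process really was stopped by SIGTERM, the warnings may be a side effect of slow parsing rather than the direct cause of the failure; again, this is a guess based on the exit code alone.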

This is the problematic PDF:
https://jyx.jyu.fi/bitstream/handle/123456789/81469/978-951-39-9321-4_vaitos10062022.pdf?sequence=1&isAllowed=y
It's a dissertation with multiple articles in it.

I'm calling grobid with httpie like this:
http -f POST :8070/api/processReferences input@'./978-951-39-9321-4_vaitos10062022.pdf;type=application/pdf'

The same problem also happens via the web UI.

OS: Debian 11
Grobid version: 0.7.1 (Docker image)

Any clues as to what might be causing this?

Labels: bug, implemented, pdfalto