Closed
Labels
bug, implemented, pdfalto
Description
Hi,
I'm getting the following error with a certain PDF:
ERROR [2022-06-07 08:02:33,838] org.grobid.core.process.ProcessPdfToXml: pdfalto process finished with error code: 143. [/opt/grobid/grobid-home/pdfalto/lin-64/pdfalto_server, -fullFontName, -noLineNumbers, -noImage, -annotation, -filesLimit, 2000, /opt/grobid/grobid-home/tmp/origin3690432459378499723.pdf, /opt/grobid/grobid-home/tmp/czDhswmAVc.lxml]
ERROR [2022-06-07 08:02:33,838] org.grobid.core.process.ProcessPdfToXml: pdfalto return message:
Syntax Warning: Invalid entry in bfchar block in ToUnicode CMap
Syntax Warning: Invalid entry in bfchar block in ToUnicode CMap
... (a LOT of these lines)
This is the problematic PDF:
https://jyx.jyu.fi/bitstream/handle/123456789/81469/978-951-39-9321-4_vaitos10062022.pdf?sequence=1&isAllowed=y
It's a dissertation with multiple articles in it.
I'm calling GROBID with HTTPie like this:
http -f POST :8070/api/processReferences input@'./978-951-39-9321-4_vaitos10062022.pdf;type=application/pdf'
The same problem also happens via the web UI.
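For anyone who wants to reproduce this without HTTPie, the same request can be sketched in Python with only the standard library. This is a rough equivalent, not an official client: the helper names `build_multipart` and `process_references` are made up for illustration, and the base URL assumes the service is reachable on localhost (which is what the `:8070` shorthand means in HTTPie).

```python
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes, content_type: str):
    """Encode one file as a multipart/form-data body (hypothetical helper).

    Returns the raw body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def process_references(pdf_path: str, base_url: str = "http://localhost:8070") -> str:
    """POST a PDF to /api/processReferences, as in the HTTPie call above."""
    with open(pdf_path, "rb") as fh:
        body, ctype = build_multipart("input", "input.pdf", fh.read(), "application/pdf")
    req = urllib.request.Request(
        f"{base_url}/api/processReferences",
        data=body,
        headers={"Content-Type": ctype},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


# Usage (needs a running GROBID instance and the PDF on disk):
# print(process_references("./978-951-39-9321-4_vaitos10062022.pdf"))
```

With the problematic PDF, `urlopen` would be expected to raise `urllib.error.HTTPError` when the server answers with an error status.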
OS: Debian 11
Grobid version: 0.7.1 (Docker image)
Any clues what might be causing this?
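One observation on the error code: by the usual shell convention, an exit status above 128 means the process was ended by a signal, with the signal number being the status minus 128. The small sketch below decodes a status that way (`describe_exit_code` is a hypothetical helper name, not part of GROBID or pdfalto).

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status (128 + N means killed by signal N)."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform
            name = f"unknown signal {signum}"
        return f"terminated by {name} (signal {signum})"
    if code == 0:
        return "exited successfully"
    return f"exited with error status {code}"


print(describe_exit_code(143))  # terminated by SIGTERM (signal 15)
```

So 143 would mean pdfalto received SIGTERM, which suggests it was stopped from outside (for example by a timeout or a memory limit) rather than crashing on its own. That part is an assumption; the log does not say what sent the signal.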