Closed
Labels
bug: From Hemiptera and especially its suborder Heteroptera
implemented: The issue has been implemented
Description
Hi, I found a small regression in the latest version of grobid, concerning the results of the segmentation model with the new pdfalto version (0.5).
First of all, I just realised that the pdfalto binary for Mac has not been updated on master (it is still at version 0.3). Should I update it to the latest one?
The segmentation model seems to behave differently: it includes a large part of the body in the header. In particular, the publication date is not correctly extracted.
Here is an example paper: https://aip.scitation.org/doi/pdf/10.1063/1.2925452
Here is the output from grobid:
- Fulltext extraction (pdfalto 0.5) 1.2925452_processPDF_pdfalto0.5.pdf.tei.xml.zip
- Header extraction (pdfalto 0.5) 1.2925452_processHeader_pdfalto0.5.pdf.tei.xml.zip
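To check the publication date in the two outputs without reading the TEI by hand, something like the following could be used. It is only a minimal sketch: it assumes the TEI files have been unzipped, and that the date is encoded as a `<date type="published" when="...">` element under `publicationStmt` in the TEI header. The function name `published_date` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace, used to qualify element names in the query.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def published_date(tei_xml):
    """Return the publication date found in a TEI header, or None.

    Assumption: the date is stored as
    <publicationStmt><date type="published" when="YYYY-MM-DD">...</date>
    inside <teiHeader>. Prefers the normalised 'when' attribute and
    falls back to the element text if the attribute is missing.
    """
    root = ET.fromstring(tei_xml)
    date = root.find(
        ".//tei:teiHeader//tei:publicationStmt/tei:date[@type='published']",
        TEI_NS,
    )
    if date is None:
        return None
    return date.get("when") or (date.text or "").strip() or None
```

Running it on the header and fulltext TEI files produced with pdfalto 0.3 and 0.5 should show quickly whether the date is missing only with the newer version.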
Here is the result from the segmentation model: