
Regression in the segmentation model results due to changes in pdfalto  #759

@lfoppiano

Description

Hi, I found a small regression in the latest version of GROBID concerning the results of the segmentation model with the new pdfalto version (0.5).

First of all, I just realised that the pdfalto binary for Mac has not been updated on master (it is still at version 0.3). Should I update it to the latest version?

The segmentation model seems to behave differently: it now includes a large part of the body in the header. In particular, the publication date is not correctly extracted.

Here is an example paper: https://aip.scitation.org/doi/pdf/10.1063/1.2925452

Here is the output from GROBID:

Here is the result from the segmentation model:


Labels: bug, implemented
