Incorrect sentences coordinates #908

@NicolasKieffer

Sentences sometimes have wrong coordinates.

Sample files used (PDF, TEI and training files): 60806_R1.zip

Notes:

  • borders are rendered by our application from the TEI s[coords] values (these values are usually correct)
  • the GROBID segmentation model has been trained on these PDFs (and the fulltext model "recognises the refs correctly")
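
For anyone reproducing the rendering, here is a minimal sketch of how the s[coords] values can be read from a TEI file. It assumes the coords format described in the GROBID coordinates documentation: bounding boxes separated by ";", each box written as "page,x,y,width,height". The names Box, parse_coords and sentence_boxes, and the SAMPLE snippet, are hypothetical illustrations and are not taken from our application or from the attached files.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Tuple

# TEI namespace used by GROBID output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


class Box(NamedTuple):
    """One bounding box (hypothetical helper type)."""
    page: int
    x: float
    y: float
    width: float
    height: float


def parse_coords(coords: str) -> List[Box]:
    """Split a coords attribute into boxes.

    Assumption: boxes are separated by ';' and each box is
    'page,x,y,width,height'.
    """
    boxes = []
    for chunk in coords.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing ';'
        page, x, y, w, h = chunk.split(",")
        boxes.append(Box(int(page), float(x), float(y), float(w), float(h)))
    return boxes


def sentence_boxes(tei_xml: str) -> List[Tuple[str, List[Box]]]:
    """Return (sentence text, boxes) for every <s> element that has coords."""
    root = ET.fromstring(tei_xml)
    result = []
    for s in root.iterfind(".//tei:s[@coords]", TEI_NS):
        text = "".join(s.itertext())  # includes the text of nested <ref> elements
        result.append((text, parse_coords(s.get("coords"))))
    return result


# Hypothetical sample: a sentence whose <ref> contains the ';' character.
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>'
    '<s coords="1,10.0,20.0,100.0,9.5;1,10.0,30.0,50.0,9.5">Some text '
    '<ref type="bibr">(A, 2000; B, 2001)</ref>.</s>'
    '</p></body></text></TEI>'
)

if __name__ == "__main__":
    for text, boxes in sentence_boxes(SAMPLE):
        print(text, boxes)
```

Listing the boxes per sentence this way makes it easier to compare what the TEI declares with what is drawn on the PDF page.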

Case: sentence with a <ref> element containing the character ;

Incorrect coordinates

Example 1

PDF (coordinates rendering)

First bugged sentence
image

Group of bugged sentences
image

All sentences of this page
image

TEI (processing)

image

Note: the right-hand part of the ref (everything after the ; character) is no longer in this file

TEI (training)

image
Note: the entire ref is in this file

Correct coordinates

Example 1

PDF (coordinates rendering)

image
image

TEI (processing)

image

TEI (training)

image

Example 2

PDF (coordinates rendering)

image
image

TEI (processing)

image

TEI (training)

image

Labels: bug, implemented
