[feature request] to skip FullTextParse on certain page region

The provided model cannot correctly categorize some "vaguely" plotted Figures and Tables. In these cases, the words inside the Table region are treated as normal Text, which disrupts the reading order of the surrounding body text.
IMHO, one solution is to parse the PDF file ourselves, use rules to detect Figures and Tables, and then pass the coordinates of those regions to Grobid so that it skips `FullTextParse` on the "hard" parts. Another solution would be to expose an API for the sequence labeling task, so that we can pass in an ALTO (XML) file from which those regions have already been cleared manually and let Grobid finish the remaining steps.
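To make the second idea more concrete, here is a minimal sketch of the "region-clearing" step on an ALTO file. It assumes the ALTO layout uses `Page` and `TextLine` elements with `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes, and that the Figure/Table boxes have already been detected by some external rule. The function names and the `regions` format are hypothetical, and nothing here is an existing Grobid API; it only shows the preprocessing I have in mind.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace so the code works with any ALTO version."""
    return tag.rsplit("}", 1)[-1]


def inside(el, box):
    """True if the element's bounding box lies fully inside box (x0, y0, x1, y1)."""
    try:
        x = float(el.get("HPOS"))
        y = float(el.get("VPOS"))
        w = float(el.get("WIDTH"))
        h = float(el.get("HEIGHT"))
    except (TypeError, ValueError):
        # Missing or malformed coordinates: keep the element.
        return False
    x0, y0, x1, y1 = box
    return x >= x0 and y >= y0 and x + w <= x1 and y + h <= y1


def clear_regions(alto_xml, regions):
    """Remove every TextLine that falls inside an excluded region.

    regions (hypothetical format): {0-based page index: [(x0, y0, x1, y1), ...]}
    Returns the cleaned XML string and the number of removed lines.
    """
    root = ET.fromstring(alto_xml)
    pages = [e for e in root.iter() if local(e.tag) == "Page"]
    removed = 0
    for idx, page in enumerate(pages):
        boxes = regions.get(idx, [])
        if not boxes:
            continue
        # Snapshot the tree first, since we mutate it while walking.
        for parent in list(page.iter()):
            for child in list(parent):
                if local(child.tag) == "TextLine" and any(inside(child, b) for b in boxes):
                    parent.remove(child)
                    removed += 1
    return ET.tostring(root, encoding="unicode"), removed
```

Calling `clear_regions(alto_xml, {0: [(40, 280, 400, 400)]})` would drop the lines on the first page that lie entirely inside that box. The sketch works at `TextLine` granularity and may leave empty `TextBlock` elements behind; filtering at the token level would be a straightforward variation.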

[feature request] to skip FullTextParse on certain page region #950
