[feature request] Skip FullTextParse on certain page regions #950

@frankang

The provided model cannot correctly classify some "vaguely" drawn figures and tables. In such cases, the words inside the table region are treated as normal body text, which disrupts the reading order.
IMHO, one solution is to parse the PDF file and use rule-based heuristics to detect figures and tables, then pass this region information to Grobid so that FullTextParse is skipped on those "hard" parts. Another solution would be to expose an API for the sequence labeling step, so that we can directly pass a manually region-cleared ALTO (XML) file and let Grobid complete the remaining steps.
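For illustration, here is a minimal sketch of how a "region-cleared" ALTO file could be produced before handing it back, assuming pdfalto-style ALTO where each word is a `String` element carrying `HPOS`/`VPOS`/`WIDTH`/`HEIGHT` attributes inside `TextLine` elements. `Region` and `clear_regions` are hypothetical names; this is not part of Grobid and does not call any Grobid API. Region boxes would come from whatever rule-based figure/table detector is used.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical region descriptor: a box on one page, in ALTO coordinates."""
    page_index: int  # 0-based position of the <Page> element in the file
    x: float
    y: float
    width: float
    height: float

    def contains(self, cx: float, cy: float) -> bool:
        return self.x <= cx <= self.x + self.width and self.y <= cy <= self.y + self.height


def _local(tag: str) -> str:
    # Strip a "{namespace}" prefix so this works with or without the ALTO namespace.
    return tag.rsplit("}", 1)[-1]


def clear_regions(alto_xml: str, regions: list[Region]) -> str:
    """Drop every <String> whose centre point lies inside one of the given regions."""
    root = ET.fromstring(alto_xml)
    pages = [el for el in root.iter() if _local(el.tag) == "Page"]
    for page_index, page in enumerate(pages):
        page_regions = [r for r in regions if r.page_index == page_index]
        if not page_regions:
            continue
        # Removal needs the parent element, so iterate over TextLine children.
        for line in [el for el in page.iter() if _local(el.tag) == "TextLine"]:
            for child in list(line):
                if _local(child.tag) != "String":
                    continue
                cx = float(child.get("HPOS", 0)) + float(child.get("WIDTH", 0)) / 2
                cy = float(child.get("VPOS", 0)) + float(child.get("HEIGHT", 0)) / 2
                if any(r.contains(cx, cy) for r in page_regions):
                    line.remove(child)
    return ET.tostring(root, encoding="unicode")


sample = """<alto><Layout><Page><PrintSpace><TextBlock><TextLine>
<String CONTENT="Intro" HPOS="10" VPOS="10" WIDTH="40" HEIGHT="10"/>
<String CONTENT="0.93" HPOS="300" VPOS="400" WIDTH="30" HEIGHT="10"/>
</TextLine></TextBlock></PrintSpace></Page></Layout></alto>"""

# A table detected (by some external heuristic) on the first page.
table = Region(page_index=0, x=250, y=350, width=200, height=150)
print(clear_regions(sample, [table]))
```

Matching on the word's centre point rather than full containment is a design choice that tolerates words slightly overlapping the detected box. Spacing elements (`SP`) are left untouched in this sketch.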