
Paragraph content missing for 'Funding' (annex) section #895

@mdparkin


I am using GROBID (version 0.7.0, lightweight Docker image) to convert bioRxiv preprint PDFs to XML, and I am finding that the paragraph content under the heading 'Funding' is not captured in the TEI XML output.

Example PDF:
2021.09.27.461862v1.full.pdf

The 'Funding' section in this document appears immediately after the 'Acknowledgements' section on page 19 of the PDF. It is captured in the TEI XML with @type="annex", but the <p> element and its text content are missing:

<div type="annex">
  <div xmlns="http://www.tei-c.org/ns/1.0">
    <head>Funding</head>
  </div>
</div>

Although recognition of this section with some @type="funding" attribute would be the ideal scenario, for my use case I simply need the associated <p> content (i.e. "This work was funded by...") to be output in the TEI XML.
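For anyone who wants to check their own output for the same symptom, here is a minimal sketch using only the Python standard library. It assumes the TEI structure matches the snippet above (a `<div type="annex">` wrapping one or more `<div>` children); the function name `empty_annex_heads` and the `SAMPLE` string are hypothetical and not part of GROBID.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def empty_annex_heads(tei_xml):
    """Return the <head> text of each <div> nested in a type="annex" div
    that has a <head> child but no <p> child (hypothetical helper)."""
    root = ET.fromstring(tei_xml)
    missing = []
    for annex in root.iter():
        if local_name(annex.tag) != "div" or annex.get("type") != "annex":
            continue
        for div in annex.iter():
            # Skip the annex wrapper itself and anything that is not a <div>.
            if div is annex or local_name(div.tag) != "div":
                continue
            heads = [c for c in div if local_name(c.tag) == "head"]
            paras = [c for c in div if local_name(c.tag) == "p"]
            if heads and not paras:
                missing.append((heads[0].text or "").strip())
    return missing


# The fragment reported in this issue, used as sample input.
SAMPLE = """<div type="annex">
  <div xmlns="http://www.tei-c.org/ns/1.0">
    <head>Funding</head>
  </div>
</div>"""

print(empty_annex_heads(SAMPLE))
```

Tag names are compared by local name so the check works whether or not the elements carry the TEI namespace, which is mixed in the fragment above.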

Many thanks in advance for looking into this issue.


Labels

bug, enhancement
