I am using GROBID (version 0.7.0, lightweight Docker image) to convert bioRxiv preprint PDFs to TEI XML, and the paragraph content under the heading 'Funding' is not being captured in the output.
Example PDF:
2021.09.27.461862v1.full.pdf
The 'Funding' section in this document appears immediately after the 'Acknowledgements' section on page 19 of the PDF. Its heading is captured in the TEI XML inside a <div> with @type="annex", but the <p> element and its text content are missing:
<div type="annex">
  <div xmlns="http://www.tei-c.org/ns/1.0">
    <head>Funding</head>
  </div>
</div>
Ideally this section would be recognised with a dedicated attribute such as @type="funding", but for my use case I simply need the associated <p> content (e.g. "This work was funded by...") to be output in the TEI XML.
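In case it helps with reproducing the problem on other documents, here is a minimal sketch of how I check for it. It uses only the Python standard library and lists annex sub-sections that have a <head> but no non-empty <p>. The function name `empty_annex_sections` and the `SAMPLE` string are my own illustrations; `SAMPLE` is the fragment above wrapped in a TEI root element, not real GROBID output.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI documents (taken from the fragment in this report).
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Illustrative input: the fragment above wrapped in a TEI root element.
SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <div type="annex">
    <div>
      <head>Funding</head>
    </div>
    <div>
      <head>Acknowledgements</head>
      <p>We thank our colleagues.</p>
    </div>
  </div>
</TEI>
"""


def empty_annex_sections(tei_xml):
    """Return the headings of annex sub-sections that contain no paragraph text."""
    root = ET.fromstring(tei_xml)
    empty = []
    for annex in root.iterfind(".//tei:div[@type='annex']", TEI_NS):
        for section in annex.iterfind("tei:div", TEI_NS):
            head = section.find("tei:head", TEI_NS)
            # A section counts as populated if any <p> has non-whitespace text.
            has_text = any(
                "".join(p.itertext()).strip()
                for p in section.iterfind("tei:p", TEI_NS)
            )
            if head is not None and not has_text:
                empty.append("".join(head.itertext()).strip())
    return empty


if __name__ == "__main__":
    print(empty_annex_sections(SAMPLE))
```

Running the same function over the TEI produced for the example PDF should show whether 'Funding' is the only affected heading or whether other annex sections lose their paragraphs too.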
Many thanks in advance for looking into this issue.