The goal of this issue is to extract "author contributions" and "conflict of interest" as specific sections in the grobid output.
Right now, author contribution and conflict of interest statements are either lost (when they appear in the header) or placed in the annex of the article as “normal” sections that need to be matched by section title.
There is no dedicated section defined for grobid to extract them to (as there is, for example, for funding/acknowledgement or for the data availability statement).
The goal of this task is to extract “author contribution” statements (and “conflict of interest” statements, since the additional effort is minimal) as structured output.
Here is an idea for the output:
```xml
<back>
<div type="conflict">Conflict of interests: no conflict of interest.</div>
<div type="contribution">Author contributions: AA did this, BB did that, CC supervised.</div>
</back>
```
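With output like this, a downstream consumer could select the new statements by their `type` attribute instead of matching section titles. The following is a minimal Python sketch, not part of grobid: the helper name `extract_statements` is hypothetical, and the `conflict`/`contribution` types are only the ones proposed in this issue. The `{*}` wildcard (Python 3.8+) matches both the un-namespaced snippet and the TEI namespace (`http://www.tei-c.org/ns/1.0`) that grobid output normally uses.

```python
import xml.etree.ElementTree as ET

# Div types proposed in this issue (hypothetical: not yet produced by grobid).
PROPOSED_TYPES = ("conflict", "contribution")


def extract_statements(tei_xml: str) -> dict[str, list[str]]:
    """Return the text of each proposed <div type="..."> under <back>, keyed by type."""
    root = ET.fromstring(tei_xml)
    # Accept either a bare <back> element (as in the snippet) or a full TEI document.
    local_name = root.tag.rsplit("}", 1)[-1]
    backs = [root] if local_name == "back" else root.findall(".//{*}back")
    found: dict[str, list[str]] = {t: [] for t in PROPOSED_TYPES}
    for back in backs:
        for div in back.findall(".//{*}div"):
            div_type = div.get("type")
            if div_type in found:
                # itertext() also collects text nested in child elements such as <p>.
                found[div_type].append("".join(div.itertext()).strip())
    return found
```

Other div types (for example an existing acknowledgement div) are simply ignored, so the same helper works on a full document.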
The same approach as for the acknowledgement and the data availability statement applies:
- extraction in both the header and the segmentation models
- as a second step, we could add an additional segmentation to identify the author names
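For the second step, a naive rule-based stand-in could link the initials in the statement back to the author list extracted from the header. This Python sketch is only an illustration under stated assumptions: the helper names `initials` and `link_contributions` are hypothetical, initials are assumed to be the first letter of each name part (so dotted forms like `A.A.` are not handled), and authors sharing the same initials would collide. A real implementation in grobid would more likely be a trained sequence-labelling model, as for its other models.

```python
import re


def initials(full_name: str) -> str:
    """Assumed convention: first letter of each name part ('Jean-Luc Picard' -> 'JLP')."""
    return "".join(part[0].upper() for part in re.split(r"[\s\-]+", full_name) if part)


def link_contributions(statement: str, authors: list[str]) -> dict[str, list[str]]:
    """Map each header author to the clauses of the statement that cite their initials."""
    by_initials = {initials(name): name for name in authors}  # later duplicates win
    # Drop a leading label such as "Author contributions:" if present.
    body = statement.split(":", 1)[1] if ":" in statement else statement
    # One clause per contribution: split on , ; . and the word "and".
    clauses = [c.strip() for c in re.split(r"[;,.]|\band\b", body) if c.strip()]
    links: dict[str, list[str]] = {name: [] for name in authors}
    for clause in clauses:
        # Candidate initials: 2 to 4 consecutive capitals forming a whole word.
        for token in re.findall(r"\b[A-Z]{2,4}\b", clause):
            if token in by_initials:
                links[by_initials[token]].append(clause)
    return links
```

With the contribution statement from the example output and three authors whose initials are AA, BB and CC, each author ends up linked to exactly one clause.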
Task list:
- Update the existing training data to identify the section. In particular, we would need to work on two models: segmentation, for when the statement is at the end of the article, and header, for when the statement is in the header
- Add new training data if needed
- Update the grobid code to accommodate this new section and output it in the TEI
- Update the end2end evaluation data and metrics to add the new section
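For the end-to-end evaluation, the new sections could be scored like other extracted fields, by comparing each extracted statement with the gold one under increasingly tolerant criteria. This is a minimal Python sketch under assumptions: the function names, the "soft" normalisation and the 0.95 threshold are illustrative choices, not the existing grobid evaluation code. `difflib.SequenceMatcher` implements a close variant of Ratcliff/Obershelp matching.

```python
import difflib
import re


def soft_normalize(text: str) -> str:
    """Assumed 'soft' normalisation: lower-case, drop whitespace and punctuation."""
    return re.sub(r"[\W_]+", "", text.lower())


def match_statement(expected: str, extracted: str, ratio_threshold: float = 0.95) -> dict[str, bool]:
    """Score one extracted statement against the gold one under three criteria."""
    # Similarity in [0, 1]; 1.0 means identical strings.
    ratio = difflib.SequenceMatcher(None, expected, extracted).ratio()
    return {
        "strict": expected.strip() == extracted.strip(),
        "soft": soft_normalize(expected) == soft_normalize(extracted),
        "ratcliff_obershelp": ratio >= ratio_threshold,
    }
```

Aggregating these booleans per criterion over the evaluation corpus would give one score per matching mode for each of the two new sections.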
Related: #698, #1142
PR: #1319
@kermitt2 any comment is welcome :-)