Extracts author contribution and conflict of interest statements #1318

@lfoppiano

The goal of this issue is to extract "author contributions" and "conflict of interest" as specific sections in the grobid output.

Right now, author contribution and conflict of interest statements are either lost (when they appear in the header) or placed in the annex of the article as “normal” sections that need to be matched by section title.
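To make the current workaround concrete, here is a minimal sketch of matching annex sections by title on the consumer side. The sample fragment, the title patterns, and the function name `match_sections_by_title` are assumptions for illustration only; real Grobid TEI is namespaced (`http://www.tei-c.org/ns/1.0`) and section titles vary far more than these two patterns cover.

```python
import re
import xml.etree.ElementTree as ET

# Assumed title patterns; real articles use many more variants.
TITLE_PATTERNS = {
    "contribution": re.compile(r"authors?['’]?s?\s+contributions?", re.IGNORECASE),
    "conflict": re.compile(r"conflicts?\s+of\s+interests?|competing\s+interests?", re.IGNORECASE),
}

# Hypothetical, simplified annex fragment (TEI namespace omitted for brevity).
SAMPLE = """<back>
  <div type="annex">
    <div><head>Author contributions</head><p>AA did this, BB did that.</p></div>
    <div><head>Conflict of interest</head><p>No conflict of interest.</p></div>
    <div><head>Supplementary material</head><p>See online.</p></div>
  </div>
</back>"""


def match_sections_by_title(xml_text):
    """Return {label: text} for annex sections whose <head> matches a known title."""
    root = ET.fromstring(xml_text)
    found = {}
    for div in root.iter("div"):
        head = div.find("head")  # direct child only, so wrapper divs are skipped
        if head is None or not head.text:
            continue
        for label, pattern in TITLE_PATTERNS.items():
            if pattern.search(head.text):
                paragraphs = ["".join(p.itertext()).strip() for p in div.findall("p")]
                found[label] = " ".join(paragraphs)
    return found
```

This kind of matching is brittle (any unexpected title is missed, and statements located in the header never reach the annex at all), which is the motivation for a dedicated section.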

Grobid defines no dedicated section for them, unlike, for example, funding/acknowledgement or the data availability statement.
The goal of this task is to extract “author contribution” statements (and “conflict of interest” statements, since the additional effort is minimal) as structured output.

Here is an idea for the output:

<back>
<div type="conflict">Conflict of interests: no conflict of interest.</div>
<div type="contribution"> author contributions: AA did this, BB did that, CC supervised</div>
</back>
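As a sketch of how a consumer could read this proposed output, the snippet below parses the fragment above with Python's standard `xml.etree.ElementTree`. The `type` values `conflict` and `contribution` come from the proposal; the function name `read_statements` is hypothetical, and the fragment is parsed without a namespace, whereas real Grobid TEI would require namespace-qualified tag names.

```python
import xml.etree.ElementTree as ET

# The proposed output from this issue, copied as-is (no TEI namespace).
PROPOSED = """<back>
<div type="conflict">Conflict of interests: no conflict of interest.</div>
<div type="contribution"> author contributions: AA did this, BB did that, CC supervised</div>
</back>"""


def read_statements(xml_text):
    """Collect the text of the proposed 'conflict' and 'contribution' divs."""
    root = ET.fromstring(xml_text)
    statements = {}
    for div in root.findall("div"):
        div_type = div.get("type")
        if div_type in ("conflict", "contribution"):
            # itertext() also gathers text from nested elements, if any.
            statements[div_type] = "".join(div.itertext()).strip()
    return statements
```

With dedicated `type` attributes, the lookup no longer depends on how the section title happens to be worded.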

The same approach as for the acknowledgement and the data availability statement applies:

  • extraction in both header and segmentation
  • as a second step, we could add an additional segmentation to identify the author names
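To illustrate what the second step would produce, here is a deliberately naive sketch that pulls author initials out of a contribution statement with a regular expression. This is a stand-in heuristic, not the segmentation model the issue has in mind: the pattern and the function name `find_author_initials` are assumptions, and any uppercase acronym in the statement (e.g. "DNA") would be a false positive.

```python
import re

# Assumption: authors are referred to by 2-4 uppercase initials, as in "AA did this".
INITIALS = re.compile(r"\b[A-Z]{2,4}\b")


def find_author_initials(statement):
    """Return the distinct initials found in the statement, in order of appearance."""
    seen = []
    for match in INITIALS.finditer(statement):
        if match.group() not in seen:
            seen.append(match.group())
    return seen
```

For example, `find_author_initials("author contributions: AA did this, BB did that, CC supervised")` yields `["AA", "BB", "CC"]`. A trained model would be needed to handle full names and to avoid acronym false positives.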

Task list:

  • Update existing training data to identify the section. In particular we would need to work on two models: segmentation, for when the statement is at the end of the article, and header, for when the statement is in the header
  • Add new training data if needed
  • Update the grobid code to accommodate this new section and output it in the TEI
  • Update the end2end evaluation data and metrics to add the new section

Related: #698, #1142

PR: #1319

@kermitt2 any comment is welcome :-)
