Add missing sentence segmentation in funding and acknowledgement #1106

Merged
lfoppiano merged 35 commits into master from bugfix/sent-seg-ack-fund
Jun 9, 2024

lfoppiano (Member) commented Apr 28, 2024

This PR fixes the way the funding-acknowledgment parser handles an already formatted statement, so that its existing elements (references, sentences) are preserved. See #1090.

This PR fixes the following problems:

  • sentence segmentation is lost for funding and acknowledgment statements
  • reference markers are lost after the funding-acknowledgment parser is applied

The initial solution proposed in the #1090 discussion, re-applying sentence segmentation to the transformed TEI structure, was not applicable: the sentence coordinates could not be regenerated because the TEI-XML elements no longer carry layout-token information.
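
To illustrate why the layout tokens are needed, here is a minimal sketch, not GROBID's actual code. It assumes a hypothetical `LayoutToken` record carrying a page number and a bounding box, and sentence spans given as character offsets into the concatenated token text. A sentence's coordinates are obtained by merging the boxes of the tokens it covers, one box per line, so once the tokens are gone there is nothing left to merge.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LayoutToken:
    # Hypothetical record for illustration only; field names are assumptions.
    text: str
    page: int
    x: float
    y: float
    width: float
    height: float


# (page, x, y, width, height)
Box = Tuple[int, float, float, float, float]


def sentence_boxes(tokens: List[LayoutToken],
                   spans: List[Tuple[int, int]]) -> List[List[Box]]:
    """Return, for each (start, end) sentence span, one merged box per line."""
    # Character offsets of each token in the concatenated text.
    offsets = []
    pos = 0
    for tok in tokens:
        offsets.append((pos, pos + len(tok.text)))
        pos += len(tok.text)

    result = []
    for start, end in spans:
        # Group the non-whitespace tokens overlapping the span by (page, y).
        # Assumption: tokens on the same line share the same y value.
        lines: Dict[Tuple[int, float], List[LayoutToken]] = {}
        for tok, (s, e) in zip(tokens, offsets):
            if s < end and e > start and tok.text.strip():
                lines.setdefault((tok.page, tok.y), []).append(tok)

        boxes = []
        for (page, _), line in lines.items():
            x0 = min(t.x for t in line)
            y0 = min(t.y for t in line)
            x1 = max(t.x + t.width for t in line)
            y1 = max(t.y + t.height for t in line)
            boxes.append((page, x0, y0, x1 - x0, y1 - y0))
        result.append(boxes)
    return result
```

The same computation cannot start from the TEI-XML text alone, since the text carries no page or position for its characters.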

NOTE: this should be merged after #1096

coveralls commented Apr 28, 2024

Coverage Status

coverage: 40.787% (+0.6%) from 40.236%
when pulling bbca7dd on bugfix/sent-seg-ack-fund
into cb7118d on master.

grobidOrg deleted a comment from github-actions bot on May 1, 2024
lfoppiano marked this pull request as ready for review on May 5, 2024 06:10
lfoppiano (Member, Author) commented

I ran all documents from PLOS, PMC, and bioRxiv through this and manually checked the differences for a sample from each corpus. I also manually checked the merging of the coordinates for certain documents that had caused problems in the past. Everything seems to look good.

lfoppiano added this to the 0.8.1 milestone on May 21, 2024
lfoppiano merged commit 694f0ed into master on Jun 9, 2024
lfoppiano deleted the bugfix/sent-seg-ack-fund branch on June 9, 2024 21:25

2 participants