Fix URL extraction when the regex falls short #1190

Merged
lfoppiano merged 9 commits into master from fix-url-extraction-regex-shorter
Nov 25, 2024

@lfoppiano
Member

This PR fixes URL extraction when the regular expression matches less text than the actual target (the annotated URL).
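
To make the failure mode concrete, here is a minimal sketch in Python. It is not the code from this PR: the pattern, the function name `consolidate_url`, and the prefix check are all assumptions chosen for illustration. It shows a regex match that stops early being replaced by the longer URL taken from the PDF link annotation.

```python
import re
from typing import Optional

# Deliberately naive pattern (an assumption for this sketch): it stops at the
# first whitespace, so a URL split by a space or line break is cut short.
URL_PATTERN = re.compile(r"https?://\S+")


def consolidate_url(text: str, annotation_url: str) -> Optional[str]:
    """Hypothetical helper: prefer the annotated URL when the regex match
    is only a prefix of it; otherwise keep the regex match."""
    match = URL_PATTERN.search(text)
    if match is None:
        return None
    candidate = match.group(0)
    # The regex "fell short": the annotation starts with what we matched
    # but continues further, so the annotation is the better source.
    if annotation_url.startswith(candidate):
        return annotation_url
    return candidate
```

For example, with the text `see https://example.org/data/ release-2 here` and the annotation `https://example.org/data/release-2`, the regex alone yields only `https://example.org/data/`, while the sketch returns the full annotated URL.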

@coveralls

coveralls commented Oct 24, 2024

Coverage Status

coverage: 40.768% (+0.01%) from 40.755%
when pulling 35ec905 on fix-url-extraction-regex-shorter
into be44579 on master.

@lfoppiano lfoppiano marked this pull request as ready for review October 26, 2024 04:54
@lfoppiano lfoppiano linked an issue Oct 31, 2024 that may be closed by this pull request
@lfoppiano lfoppiano requested a review from kermitt2 November 12, 2024 17:36
@lfoppiano
Member Author

lfoppiano commented Nov 13, 2024

Added a fix for the edge case:

image

where genius editors add a - to break up a URL over two lines.

Here is the document: https://doi.org/10.1038/s41588-024-01785-9
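
A rough sketch of how such a line-break hyphen could be tolerated when comparing the extracted text with the annotated URL. This is an illustration only, not the change merged here; `same_url_ignoring_break_hyphen` and the pattern are hypothetical names, and it assumes the extracted text still contains the line break.

```python
import re

# A hyphen at the end of a line, followed by the line break and any indentation.
LINE_BREAK_HYPHEN = re.compile(r"-\s*\n\s*")


def same_url_ignoring_break_hyphen(extracted: str, annotation_url: str) -> bool:
    """Hypothetical check: does the extracted text match the annotated URL
    once the line break (and possibly the hyphen added for it) is removed?"""
    # Case 1: the hyphen was inserted only to break the line, so drop it.
    joined_without_hyphen = LINE_BREAK_HYPHEN.sub("", extracted)
    # Case 2: the hyphen is a genuine part of the URL, so keep it.
    joined_with_hyphen = LINE_BREAK_HYPHEN.sub("-", extracted)
    return annotation_url in (joined_without_hyphen, joined_with_hyphen)
```

Trying both joins avoids guessing whether the hyphen belongs to the URL; the annotation decides which reading is correct.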

@lfoppiano lfoppiano self-assigned this Nov 21, 2024
@lfoppiano lfoppiano added this to the 0.8.2 milestone Nov 21, 2024
@lfoppiano lfoppiano merged commit 61162e7 into master Nov 25, 2024
@lfoppiano lfoppiano deleted the fix-url-extraction-regex-shorter branch November 25, 2024 12:40

Development

Successfully merging this pull request may close these issues.

URLs where the regex capture less than the annotations are not consolidated with the clickable links from the PDF document

3 participants