Update jruby to 9.4.12.1 #1293

Merged
lfoppiano merged 1 commit into master from bugfix/fix-jruby-update on May 26, 2025

Conversation

@lfoppiano

Member

This PR fixes the regression introduced by the update to JRuby 9.4.12.0 (in #1261):

  • update JRuby to 9.4.12.1
  • update pragmatic segmenter to 0.3.24

More testing is required, ideally on a few thousand PDFs. Before testing, switch the sentence detector to the pragmatic segmenter in the configuration:

  sentenceDetectorFactory: "org.grobid.core.lang.impl.PragmaticSentenceDetectorFactory"
#  sentenceDetectorFactory: "org.grobid.core.lang.impl.OpenNLPSentenceDetectorFactory"

@coveralls


Coverage Status

Coverage remained the same at 40.576% when pulling a0a82bb on bugfix/fix-jruby-update into 23eef0f on master.

@lfoppiano lfoppiano requested a review from Copilot May 26, 2025 19:46

Copilot AI left a comment

Contributor

Pull Request Overview

This PR updates JRuby to 9.4.12.1 and bumps the pragmatic segmenter to 0.3.24 to fix a regression introduced by the previous update. Key changes: version numbers are updated, instance-level rule calls are refactored to the new Rule.apply class method across multiple files, and text initialization changes from Text.new(text) to text.dup.
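To make the refactoring concrete, here is a minimal sketch of class-level rule application combined with duplicating the input first. It assumes Rule.apply takes a string plus a list of pattern/replacement rules and runs gsub! on each in order; SegmenterSketch, ABBREVIATION_RULE, and treating ∯ as a placeholder for a non-terminal period are illustrative assumptions, not the pragmatic_segmenter 0.3.24 source.

```ruby
# Hypothetical sketch: SegmenterSketch and its Rule are illustrative names,
# not the actual pragmatic_segmenter 0.3.24 code.
module SegmenterSketch
  # A rule pairs a regex pattern with its replacement string.
  Rule = Struct.new(:pattern, :replacement) do
    # Class-level apply: runs each rule's gsub! on +str+ in order and returns it.
    # No String subclass is needed, so any plain (unfrozen) String works.
    def self.apply(str, *rules)
      rules.flatten.each { |rule| str.gsub!(rule.pattern, rule.replacement) }
      str
    end
  end
end

# Assumption: "∯" stands in for a period that should not end a sentence.
ABBREVIATION_RULE = SegmenterSketch::Rule.new(/\bDr\./, "Dr∯")

input = "Dr. Smith arrived."
working = input.dup # copy first, so the caller's string is never mutated
SegmenterSketch::Rule.apply(working, [ABBREVIATION_RULE])

# working is now "Dr∯ Smith arrived."; input is still "Dr. Smith arrived."
```

Keeping apply on the rule class rather than on a String subclass means the segmenter can operate on whatever String it receives, as long as it mutates only its own copy.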

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.

File | Description
--- | ---
grobid-home/sentence-segmentation/pragmatic_segmenter/version.rb | Bumped version from "0.3.22" to "0.3.24".
grobid-home/sentence-segmentation/pragmatic_segmenter/types.rb | Refactored rule application to use the new Rule.apply class method.
grobid-home/sentence-segmentation/pragmatic_segmenter/punctuation_replacer.rb | Updated rule calls to use Rule.apply for both @text and local variables.
grobid-home/sentence-segmentation/pragmatic_segmenter/processor.rb | Replaced instance .apply calls with Rule.apply for consistency.
grobid-home/sentence-segmentation/pragmatic_segmenter/list.rb | Replaced Text instantiation with text duplication (text.dup) and updated rule calls accordingly.
grobid-home/sentence-segmentation/pragmatic_segmenter/languages/* | Standardized rule application across the language-specific processors.
grobid-home/sentence-segmentation/pragmatic_segmenter/cleaner.rb | Updated text initialization and switched to Rule.apply throughout.
grobid-home/sentence-segmentation/pragmatic_segmenter/abbreviation_replacer.rb | Applied the same changes to text handling and rule application.
Comments suppressed due to low confidence (2)

grobid-home/sentence-segmentation/pragmatic_segmenter/languages/common/numbers.rb:50

  • Ensure the updated regex pattern still correctly matches all intended numbered references; consider adding specific test cases to cover potential edge cases.
NUMBERED_REFERENCE_REGEX = /(?<=[^\d\s])(\.|∯)((\[(\d{1,3},?\s?-?\s?)?\b\d{1,3}\])+|((\d{1,3}\s?){0,3}\d{1,3}))(\s)(?=[A-Z])/
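As a starting point for the edge-case tests suggested above, the following sketch runs the regex, copied verbatim from numbers.rb:50, against a few sentences. The sentences and their expected outcomes are hypothetical test inputs chosen to exercise the lookbehind (no digit or space before the period), the bracketed and bare reference branches, and the uppercase lookahead; they are not taken from the GROBID test suite.

```ruby
# Regex copied verbatim from the review comment (numbers.rb:50).
NUMBERED_REFERENCE_REGEX = /(?<=[^\d\s])(\.|∯)((\[(\d{1,3},?\s?-?\s?)?\b\d{1,3}\])+|((\d{1,3}\s?){0,3}\d{1,3}))(\s)(?=[A-Z])/

# Hypothetical edge cases: [sentence, should the regex match?]
CASES = [
  ["as shown in prior work.12 The results", true], # bare numeric reference
  ["in earlier studies.[3] These", true],          # bracketed reference
  ["as reported.[1-3] Later", true],               # bracketed range
  ["It costs 3.5 Million", false],                 # digit before the period (decimal)
  ["see Fig.12 a", false],                         # next word is not capitalized
]

CASES.each do |sentence, expected|
  matched = NUMBERED_REFERENCE_REGEX.match?(sentence)
  status = matched == expected ? "ok" : "UNEXPECTED"
  puts "#{status}: #{sentence.inspect} matched=#{matched}"
end
```

Running such a table under both the old and the new JRuby is one cheap way to catch regex-engine differences before the larger PDF-based evaluation.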

grobid-home/sentence-segmentation/pragmatic_segmenter/list.rb:51

  • Verify that replacing 'Text.new(text)' with 'text.dup' preserves any specialized behavior provided by the Text class and does not introduce unwanted side effects.
@text = text.dup
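The concern above can be illustrated with a small sketch, assuming the old Text class was a String subclass carrying an instance-level apply (this PR does not show its definition, so LegacyText below is a hypothetical stand-in). String#dup returns a plain, unfrozen String: it can be safely mutated without touching the caller's string, but it no longer carries any subclass methods, which is consistent with moving rule application to a class method.

```ruby
# Hypothetical stand-in for an old String subclass; not the library's actual Text class.
class LegacyText < String
  # Instance-level apply: the style this PR moves away from.
  def apply(*rules)
    rules.each { |pattern, replacement| gsub!(pattern, replacement) }
    self
  end
end

original = "Fig. 2 shows it.".freeze

legacy = LegacyText.new(original) # subclass instance: has #apply
copy   = original.dup             # plain String: no #apply, and not frozen

legacy.apply([/Fig\./, "Fig∯"])
copy.gsub!(/Fig\./, "Fig∯")       # allowed because dup does not copy the frozen state

puts legacy.respond_to?(:apply)   # prints true
puts copy.respond_to?(:apply)     # prints false
puts copy.frozen?                 # prints false
```

Both paths produce the same text; the difference is only where the rule-applying behavior lives.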

@lfoppiano lfoppiano merged commit 11d3763 into master May 26, 2025
11 checks passed
@lfoppiano lfoppiano deleted the bugfix/fix-jruby-update branch May 26, 2025 19:56

3 participants