Skip to content

OPENNLP-1731: Add Junits for NGramLanguageModelTool#778

Merged
rzo1 merged 11 commits intoapache:mainfrom
NishantShri4:main
May 15, 2025
Merged

OPENNLP-1731: Add Junits for NGramLanguageModelTool#778
rzo1 merged 11 commits intoapache:mainfrom
NishantShri4:main

Conversation

@NishantShri4
Copy link
Contributor

@NishantShri4 NishantShri4 commented May 5, 2025

Thank you for contributing to Apache OpenNLP.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically main)?

  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
  • Have you written or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check GitHub Actions for build issues and submit an update to your PR as soon as possible.

@NishantShri4 NishantShri4 marked this pull request as ready for review May 5, 2025 13:02
@NishantShri4
Copy link
Contributor Author

NishantShri4 commented May 5, 2025

Will wait for the pipeline checks to complete before requesting review. Thanks.

@rzo1 rzo1 requested review from Copilot, mawiesne and rzo1 and removed request for Copilot May 5, 2025 18:50
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds JUnit test cases for the NGramLanguageModelTool and includes supporting test resources for language modeling and command line tests. The key changes include:

  • Correction of a spelling error in the sentences test resource.
  • Addition of two new test resource files for language model sentences.
  • Creation of new JUnit tests for the NGramLanguageModelTool and refactoring of logger configuration in the monitoring tests.

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
opennlp-tools/src/test/resources/opennlp/tools/languagemodel/sentences.txt Fixed spelling error ("language-mofdeling" → "language-modeling").
opennlp-tools/src/test/resources/opennlp/tools/cmdline/languagemodel/sentences_set_2.txt Added new test sentences resource for command line language model tests.
opennlp-tools/src/test/resources/opennlp/tools/cmdline/languagemodel/sentences_set_1.txt Added new test sentences resource for command line language model tests.
opennlp-tools/src/test/java/opennlp/tools/monitoring/DefaultTrainingProgressMonitorTest.java Refactored logger setup by extending AbstractLoggerTest and removed redundant static logger config.
opennlp-tools/src/test/java/opennlp/tools/cmdline/languagemodel/NGramLanguageModelToolTest.java Added comprehensive tests for NGramLanguageModelTool including parameterized testing and log assertions.
opennlp-tools/src/test/java/opennlp/tools/AbstractLoggerTest.java Introduced a shared logger test base for configuring and restoring logger settings.

* An abstract class to configure a {@link Logger} instance with a test {@link ListAppender}
* to help with unit-testing.
*/
public abstract class AbstractLoggerTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have https://github.com/Hakky54/log-captor in the (test) classpath. Why not use it for this use-case?
With log-captor it should be possible to avoid the logback stuff, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree

Copy link
Contributor Author

@NishantShri4 NishantShri4 May 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestions.
I have updated it to use logcaptor.
However, using logcaptor still needs switching-on the logging level to INFO to capture logs.
sfl4j doesn't seem to provide an API method to toggle the logging level. Hence needed to use logback (which offers such an api).
The logback use ( to switch-on/restore logging levels) remains limited to this abstract class only.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep - as an abstraction, we can not use slf4j to change it dynamically, so ok from my point of view.

@NishantShri4 NishantShri4 marked this pull request as draft May 7, 2025 13:37
* An abstract class to configure a {@link Logger} instance with a test {@link ListAppender}
* to help with unit-testing.
*/
public abstract class AbstractLoggerTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree

@NishantShri4 NishantShri4 marked this pull request as ready for review May 10, 2025 10:51
Copy link
Contributor

@rzo1 rzo1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but needs a rebase.

@NishantShri4
Copy link
Contributor Author

lgtm, but needs a rebase.

Thank you very much. It is rebased with the upstream now. Waiting for pipeline checks to run.

@jzonthemtn
Copy link
Contributor

Thanks @NishantShri4 and reviewers!

@rzo1 rzo1 merged commit 5eec98c into apache:main May 15, 2025
10 checks passed
NishantShri4 added a commit to NishantShri4/opennlp that referenced this pull request Jun 14, 2025
commit 52955e9
Author: Nishant Shrivastava <shrivastava.nishant@gmail.com>
Date:   Sat Jun 14 18:50:09 2025 +0100

    OPENNLP-1745: SentenceDetector - Add Junit test for useTokenEnd = false

commit fe59eb9
Merge: 67ac7b2 05f69a4
Author: Nishant Shrivastava <shrivastava.nishant@gmail.com>
Date:   Sat Jun 14 07:29:36 2025 +0100

    Merge remote-tracking branch 'origin/main'

commit 05f69a4
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Jun 9 11:49:38 2025 +0200

    OPENNLP-1724: Update JUnit to 5.13.1 (apache#790)

    Bumps `junit.version` from 5.13.0 to 5.13.1.

    Updates `org.junit.jupiter:junit-jupiter-api` from 5.13.0 to 5.13.1
    - [Release notes](https://github.com/junit-team/junit5/releases)
    - [Commits](junit-team/junit-framework@r5.13.0...r5.13.1)

    Updates `org.junit.jupiter:junit-jupiter-engine` from 5.13.0 to 5.13.1
    - [Release notes](https://github.com/junit-team/junit5/releases)
    - [Commits](junit-team/junit-framework@r5.13.0...r5.13.1)

    Updates `org.junit.jupiter:junit-jupiter-params` from 5.13.0 to 5.13.1
    - [Release notes](https://github.com/junit-team/junit5/releases)
    - [Commits](junit-team/junit-framework@r5.13.0...r5.13.1)

    ---
    updated-dependencies:
    - dependency-name: org.junit.jupiter:junit-jupiter-api
      dependency-version: 5.13.1
      dependency-type: direct:production
      update-type: version-update:semver-patch
    - dependency-name: org.junit.jupiter:junit-jupiter-engine
      dependency-version: 5.13.1
      dependency-type: direct:production
      update-type: version-update:semver-patch
    - dependency-name: org.junit.jupiter:junit-jupiter-params
      dependency-version: 5.13.1
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...

    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 32f4ef7
Author: Richard Zowalla <13417392+rzo1@users.noreply.github.com>
Date:   Sat Jun 7 21:21:36 2025 +0200

    Disable merge request requirement for opennlp-2.x (apache#789)

commit 8abfe0d
Author: Richard Zowalla <13417392+rzo1@users.noreply.github.com>
Date:   Sat Jun 7 20:45:08 2025 +0200

    Remove code review requirement for 2.x branch to allow cherry picking already reviewed commits. (apache#788)

commit 89e4260
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Jun 2 09:17:43 2025 +0200

    OPENNLP-1724: Update JUnit to 5.13.0 (apache#787)

    Bumps `junit.version` from 5.12.2 to 5.13.0.

    Updates `org.junit.jupiter:junit-jupiter-api` from 5.12.2 to 5.13.0
    - [Release notes](https://github.com/junit-team/junit5/releases)
    - [Commits](junit-team/junit-framework@r5.12.2...r5.13.0)

    Updates `org.junit.jupiter:junit-jupiter-engine` from 5.12.2 to 5.13.0
    - [Release notes](https://github.com/junit-team/junit5/releases)
    - [Commits](junit-team/junit-framework@r5.12.2...r5.13.0)

    Updates `org.junit.jupiter:junit-jupiter-params` from 5.12.2 to 5.13.0
    - [Release notes](https://github.com/junit-team/junit5/releases)
    - [Commits](junit-team/junit-framework@r5.12.2...r5.13.0)

    ---
    updated-dependencies:
    - dependency-name: org.junit.jupiter:junit-jupiter-api
      dependency-version: 5.13.0
      dependency-type: direct:production
      update-type: version-update:semver-minor
    - dependency-name: org.junit.jupiter:junit-jupiter-engine
      dependency-version: 5.13.0
      dependency-type: direct:production
      update-type: version-update:semver-minor
    - dependency-name: org.junit.jupiter:junit-jupiter-params
      dependency-version: 5.13.0
      dependency-type: direct:production
      update-type: version-update:semver-minor
    ...

    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 2c8e58b
Author: Martin Wiesner <mawiesne@users.noreply.github.com>
Date:   Sat May 24 20:59:20 2025 +0200

    OPENNLP-1708: Raise OpenNLP version to 3.x on main branch (apache#785)

    * OPENNLP-1708: Raise OpenNLP version to 3.x on main branch
    - adjusts all pom.xml files towards 3.0.0-SNAPSHOT
    - adjusts upper major model version to 3.x
    - adds static method Version#between for simpler version range checks in BaseModel
    - adds 'opennlp-2.x' branch to protected branches in .asf.yml
    - updates README.md with infos on 'Branches and Merging Strategy'
    - cures a typo
    - adds external link to the ONNX website

commit 0db3c10
Author: Richard Zowalla <13417392+rzo1@users.noreply.github.com>
Date:   Tue May 20 21:35:27 2025 +0200

    OPENNLP-1545 - Close ZipInputStream in BaseModel (apache#784)

commit 2ed9949
Author: Martin Wiesner <mawiesne@users.noreply.github.com>
Date:   Tue May 20 16:26:59 2025 +0200

    OPENNLP-1734: Adjust GH CI config to build with Java 25-ea (apache#781)

commit 5eec98c
Author: NishantShri4 <45672680+NishantShri4@users.noreply.github.com>
Date:   Thu May 15 09:25:06 2025 +0100

    OPENNLP-1731: Add Junits for NGramLanguageModelTool (apache#778)

    * OPENNLP-1731: Add Junits for NGramLanguageModelTool

    * OPENNLP-1731: AbstractLoggerTest : Corrected a javadoc comment.

    * OPENNLP-1731: Add Junits for NGramLanguageModelTool

    * OPENNLP-1731: AbstractLoggerTest : Corrected a javadoc comment.

    * OPENNLP-1731: Fixed a Generic RawType warning.

    * OPENNLP-1731: Rebased against upstream.

    * OPENNLP-1731: Rebased against upstream.

    * OPENNLP-1731: Rebased against upstream (removed extra new line).

    * OPENNLP-1731: Removed an extra newline.

commit 67ac7b2
Author: Nishant Shrivastava <shrivastava.nishant@gmail.com>
Date:   Mon May 12 20:23:00 2025 +0100

    OPENNLP-1731: Removed an extra newline.

commit 0d95dd9
Merge: 35de220 2580a20
Author: Nishant Shrivastava <shrivastava.nishant@gmail.com>
Date:   Mon May 12 20:20:59 2025 +0100

    Merge remote-tracking branch 'origin/main'

    # Conflicts:
    #	opennlp-tools/src/test/java/opennlp/tools/AbstractLoggerTest.java

commit 35de220
Author: Nishant Shrivastava <shrivastava.nishant@gmail.com>
Date:   Mon May 12 20:19:54 2025 +0100

    OPENNLP-1731: Rebased against upstream (removed extra new line).

commit e09f2ad
Author: Nishant Shrivastava <shrivastava.nishant@gmail.com>
Date:   Mon May 12 20:18:29 2025 +0100

    OPENNLP-1731: Rebased against upstream.

commit 6d84e2f
Author: Nishant Shrivastava <shrivastava.nishant@gmail.com>
Date:   Mon May 12 20:16:21 2025 +0100

    OPENNLP-1731: Rebased against upstream.

commit 2580a20
Merge: 0a20ef5 46d2d78
Author: Nishant Shrivastava <shrivastava.nishant@gmail.com>
Date:   Mon May 12 20:09:17 2025 +0100

    Merge remote-tracking branch 'origin/main'

    # Conflicts:
    #	opennlp-tools/src/test/java/opennlp/tools/monitoring/DefaultTrainingProgressMonitorTest.java

commit 0a20ef5
Author: Nishant Shrivastava <shrivastava.nishant@gmail.com>
Date:   Mon May 12 20:06:51 2025 +0100

    OPENNLP-1731: Fixed a Generic RawType warning.

commit cfa425f
Author: Nishant Shrivastava <shrivastava.nishant@gmail.com>
Date:   Sun May 11 17:32:59 2025 +0100

    OPENNLP-1731: AbstractLoggerTest : Corrected a javadoc comment.

commit a7eb44a
Author: Nishant Shrivastava <shrivastava.nishant@gmail.com>
Date:   Sat May 10 23:24:16 2025 +0100

    OPENNLP-1731: Add Junits for NGramLanguageModelTool

commit f7be29d
Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Date:   Mon May 12 20:35:53 2025 +0200

    Minor: Regenerated NOTICE File for 21a2a2a (apache#783)

    Signed-off-by: GitHub <noreply@github.com>
    Co-authored-by: mawiesne <mawiesne@users.noreply.github.com>

commit 21a2a2a
Author: Martin Wiesner <mawiesne@users.noreply.github.com>
Date:   Mon May 12 20:34:19 2025 +0200

    OPENNLP-1733: Remove implements Serializable from LanguageDetector (apache#780)

commit 7c72cb0
Author: Martin Wiesner <mawiesne@users.noreply.github.com>
Date:   Mon May 12 20:32:46 2025 +0200

    OPENNLP-1732: Eliminate use of raw types for StopCriteria (apache#779)

commit e4f5ce2
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon May 12 20:32:10 2025 +0200

    OPENNLP-1730: Update ONNX runtime to 1.22.0 (apache#782)

    Bumps `onnxruntime.version` from 1.21.1 to 1.22.0.

    Updates `com.microsoft.onnxruntime:onnxruntime` from 1.21.1 to 1.22.0
    - [Release notes](https://github.com/microsoft/onnxruntime/releases)
    - [Changelog](https://github.com/microsoft/onnxruntime/blob/main/docs/ReleaseManagement.md)
    - [Commits](microsoft/onnxruntime@v1.21.1...v1.22.0)

    Updates `com.microsoft.onnxruntime:onnxruntime_gpu` from 1.21.1 to 1.22.0
    - [Release notes](https://github.com/microsoft/onnxruntime/releases)
    - [Changelog](https://github.com/microsoft/onnxruntime/blob/main/docs/ReleaseManagement.md)
    - [Commits](microsoft/onnxruntime@v1.21.1...v1.22.0)

    ---
    updated-dependencies:
    - dependency-name: com.microsoft.onnxruntime:onnxruntime
      dependency-version: 1.22.0
      dependency-type: direct:production
      update-type: version-update:semver-minor
    - dependency-name: com.microsoft.onnxruntime:onnxruntime_gpu
      dependency-version: 1.22.0
      dependency-type: direct:production
      update-type: version-update:semver-minor
    ...

    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 46d2d78
Author: Nishant Shrivastava <shrivastava.nishant@gmail.com>
Date:   Sun May 11 17:32:59 2025 +0100

    OPENNLP-1731: AbstractLoggerTest : Corrected a javadoc comment.

commit 01a4695
Author: Nishant Shrivastava <shrivastava.nishant@gmail.com>
Date:   Sat May 10 23:24:16 2025 +0100

    OPENNLP-1731: Add Junits for NGramLanguageModelTool

commit 1675317
Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Date:   Wed Apr 30 18:16:50 2025 +0200

    Minor: Regenerated NOTICE File for 95cd7c8 (apache#776)

    Signed-off-by: GitHub <noreply@github.com>
    Co-authored-by: mawiesne <mawiesne@users.noreply.github.com>

commit 95cd7c8
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Wed Apr 30 18:15:51 2025 +0200

    OPENNLP-1730: Update ONNX runtime to 1.21.1 (apache#774)

    Bumps `onnxruntime.version` from 1.21.0 to 1.21.1.

    Updates `com.microsoft.onnxruntime:onnxruntime` from 1.21.0 to 1.21.1
    - [Release notes](https://github.com/microsoft/onnxruntime/releases)
    - [Changelog](https://github.com/microsoft/onnxruntime/blob/main/docs/ReleaseManagement.md)
    - [Commits](microsoft/onnxruntime@v1.21.0...v1.21.1)

    Updates `com.microsoft.onnxruntime:onnxruntime_gpu` from 1.21.0 to 1.21.1
    - [Release notes](https://github.com/microsoft/onnxruntime/releases)
    - [Changelog](https://github.com/microsoft/onnxruntime/blob/main/docs/ReleaseManagement.md)
    - [Commits](microsoft/onnxruntime@v1.21.0...v1.21.1)

    ---
    updated-dependencies:
    - dependency-name: com.microsoft.onnxruntime:onnxruntime
      dependency-version: 1.21.1
      dependency-type: direct:production
      update-type: version-update:semver-patch
    - dependency-name: com.microsoft.onnxruntime:onnxruntime_gpu
      dependency-version: 1.21.1
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...

    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 7c85b94
Author: Martin Wiesner <mawiesne@users.noreply.github.com>
Date:   Wed Apr 30 18:12:19 2025 +0200

    OPENNLP-1729: Provide easier loading of Models for given model lang and type (apache#775)

    - extracts ModelType from DownloadUtil
    - adds new methods to ClassPathModelLoader to obtain actual model instances easily
    - adds ClassPathModelProvider interface
    - adds DefaultClassPathModelProvider which combines existing classes to achieve easier access to model objects via classpath loading
    - adds JUnit tests for the new classes
    - adds and improves JavaDoc

commit 28e2de6
Author: NishantShri4 <45672680+NishantShri4@users.noreply.github.com>
Date:   Fri Apr 25 18:59:14 2025 +0100

    OPENNLP-124: Maxent/Perceptron training should report progress back via an API (apache#758)

    * OPENNLP-124 : Maxent/Perceptron training should report progress back via an API
    * OPENNLP-124 : Fixed Review Comments
    * OPENNLP-124 : Updated javadoc for the new Trainer.init method

commit 2720a1b
Author: Martin Wiesner <mawiesne@users.noreply.github.com>
Date:   Fri Apr 25 17:32:20 2025 +0200

    OPENNLP-1728: Improve JavaDoc of opennlp.tools.models package (apache#772)

commit e1843dc
Author: Martin Wiesner <mawiesne@users.noreply.github.com>
Date:   Wed Apr 23 21:42:29 2025 +0200

    OPENNLP-1727: Correct example snippet for loading a model from the classpath (apache#771)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants