Skip to content

feat: add benchmarks for Lucene fulltext search and linked file indexing, including setup and teardown of the index.#15385

Merged
calixtus merged 1 commit into
JabRef:mainfrom
YashUkhare:benchmark-lucene
Mar 25, 2026
Merged

feat: add benchmarks for Lucene fulltext search and linked file indexing, including setup and teardown of the index.#15385
calixtus merged 1 commit into
JabRef:mainfrom
YashUkhare:benchmark-lucene

Conversation

@YashUkhare

Copy link
Copy Markdown
Contributor

Related issues and pull requests

Closes _____

PR Description

Steps to test

Checklist

  • I own the copyright of the code submitted and I license it under the MIT license
  • [.] I manually tested my changes in running JabRef (always required)
  • [.] I added JUnit tests for changes (if applicable)
  • [.] I added screenshots in the PR description (if change is visible to the user)
  • [.] I added a screenshot in the PR description showing a library with a single entry with me as author and as title the issue number
  • [.] I described the change in CHANGELOG.md in a way that can be understood by the average user (if change is visible to the user)
  • [.] I checked the user documentation for up to dateness and submitted a pull request to our user documentation repository

…ing, including setup and teardown of the index.
@github-actions

Copy link
Copy Markdown
Contributor

Hey @YashUkhare! 👋

Thank you for contributing to JabRef!

We have automated checks in place, based on which you will soon get feedback if any of them are failing. We also use Qodo for review assistance. It will update your pull request description with a review help and offer suggestions to improve the pull request.

After all automated checks pass, a maintainer will also review your contribution. Once that happens, you can go through their comments in the "Files changed" tab and act on them, or reply to the conversation if you have further inputs. You can read about the whole pull request process in our contribution guide.

Please ensure that your pull request is in line with our AI Usage Policy and make necessary disclosures.

@qodo-free-for-open-source-projects

Copy link
Copy Markdown
Contributor

Review Summary by Qodo

Add Lucene fulltext search and linked file indexing benchmarks

✨ Enhancement 🧪 Tests

Grey Divider

Walkthroughs

Description
• Implement Lucene fulltext search and linked file indexing benchmarks
• Add setup initialization for PDF entries and search infrastructure
• Replace placeholder TODO implementations with functional benchmark methods
• Add teardown cleanup for index directory and resources
Diagram
flowchart LR
  A["Benchmark Setup"] -->|Initialize| B["Lucene Index"]
  B -->|Index PDF Entries| C["DefaultLinkedFilesIndexer"]
  C -->|Enable Search| D["LinkedFilesSearcher"]
  D -->|Execute Query| E["Search Benchmark"]
  C -->|Rebuild Index| F["Index Benchmark"]
  E -->|Cleanup| G["TearDown"]
  F -->|Cleanup| G
Loading

Grey Divider

File Changes

1. jablib/src/jmh/java/org/jabref/benchmarks/Benchmarks.java ✨ Enhancement +59/-6

Implement Lucene search and indexing benchmarks with setup/teardown

• Added imports for Lucene indexing, file searching, and file utilities
• Introduced instance variables for linkedFilesIndexer, linkedFilesSearcher,
 fulltextSearchQuery, luceneIndexDir, and pdfEntries
• Enhanced init() method to create temporary Lucene index directory, mock file preferences and
 database context, and initialize PDF entries with linked files
• Implemented search() benchmark to execute fulltext search queries using LinkedFilesSearcher
• Implemented index() benchmark to measure index rebuild performance by removing and re-adding
 entries
• Added @TearDown method to close indexer and clean up temporary index directory

jablib/src/jmh/java/org/jabref/benchmarks/Benchmarks.java


Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Mar 22, 2026

Copy link
Copy Markdown
Contributor

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (1) 📎 Requirement gaps (0) 📐 Spec deviations (0)

Grey Divider


Remediation recommended

1. init() mixes responsibilities 📘 Rule violation ⚙ Maintainability
Description
The updated init() method now performs unrelated setup steps (temp dir creation, mocking
preferences/context, test data creation, indexing, and query creation) in one place. This reduces
readability and makes future benchmark setup changes harder to maintain.
Code

jablib/src/jmh/java/org/jabref/benchmarks/Benchmarks.java[R100-123]

+        luceneIndexDir = Files.createTempDirectory("jabref-benchmark-lucene");
+
+        FilePreferences filePreferences = mock(FilePreferences.class);
+        when(filePreferences.shouldFulltextIndexLinkedFiles()).thenReturn(true);
+
+        Path pdfResourceDir = Path.of("src/test/resources/pdfs");
+        BibDatabaseContext linkedFilesContext = mock(BibDatabaseContext.class);
+        when(linkedFilesContext.getDatabasePath()).thenReturn(Optional.of(pdfResourceDir.resolve("dummy.bib")));
+        when(linkedFilesContext.getFileDirectories(filePreferences)).thenReturn(List.of(pdfResourceDir));
+        when(linkedFilesContext.getFulltextIndexPath()).thenReturn(luceneIndexDir);
+
+        pdfEntries = List.of(
+                new BibEntry(StandardEntryType.PhdThesis)
+                        .withCitationKey("ExampleThesis2017")
+                        .withFiles(List.of(new LinkedFile("Example Thesis", "thesis-example.pdf", StandardFileType.PDF.getName())))
+        );
+        when(linkedFilesContext.getEntries()).thenReturn(pdfEntries);
+
+        linkedFilesIndexer = new DefaultLinkedFilesIndexer(linkedFilesContext, filePreferences);
+        linkedFilesIndexer.addToIndex(pdfEntries, mock(BackgroundTask.class));
+
+        linkedFilesSearcher = new LinkedFilesSearcher(linkedFilesContext, linkedFilesIndexer, filePreferences);
+
+        fulltextSearchQuery = new SearchQuery("title", EnumSet.of(SearchFlags.FULLTEXT));
Evidence
PR Compliance ID 10 requires new/changed methods to remain small and single-responsibility. The
added block in init() combines multiple concerns (filesystem temp dir, Mockito wiring, entry
creation, indexing, and search query setup) into a single method.

AGENTS.md
jablib/src/jmh/java/org/jabref/benchmarks/Benchmarks.java[100-123]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`Benchmarks.init()` has grown to include multiple unrelated responsibilities (temp filesystem setup, Mockito configuration, test data creation, indexing setup, and query construction), making it harder to read and maintain.

## Issue Context
This code is benchmark setup code, but it still benefits from being structured into focused helpers to keep the benchmark class understandable.

## Fix Focus Areas
- jablib/src/jmh/java/org/jabref/benchmarks/Benchmarks.java[100-123]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. CWD-dependent PDF resolution 🐞 Bug ⛯ Reliability
Description
Benchmarks.init() builds the PDF fixture directory using a relative path, so PDF resolution (and
thus indexing) depends on the JVM working directory. If the benchmark is launched from a different
directory, linked file resolution can fail and the Lucene index will be empty or incomplete, making
the benchmark results meaningless.
Code

jablib/src/jmh/java/org/jabref/benchmarks/Benchmarks.java[R105-110]

+        Path pdfResourceDir = Path.of("src/test/resources/pdfs");
+        BibDatabaseContext linkedFilesContext = mock(BibDatabaseContext.class);
+        when(linkedFilesContext.getDatabasePath()).thenReturn(Optional.of(pdfResourceDir.resolve("dummy.bib")));
+        when(linkedFilesContext.getFileDirectories(filePreferences)).thenReturn(List.of(pdfResourceDir));
+        when(linkedFilesContext.getFulltextIndexPath()).thenReturn(luceneIndexDir);
+
Evidence
The benchmark passes a relative directory ("src/test/resources/pdfs") as the base directory for
linked-file resolution. LinkedFile.findIn delegates to FileUtil.find, which checks existence via
directory.resolve(fileName) + Files.exists; if the directory is relative, the filesystem lookup is
implicitly relative to the current working directory.

jablib/src/jmh/java/org/jabref/benchmarks/Benchmarks.java[105-116]
jablib/src/main/java/org/jabref/model/entry/LinkedFile.java[244-270]
jablib/src/main/java/org/jabref/logic/util/io/FileUtil.java[446-495]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The benchmark uses `Path.of("src/test/resources/pdfs")` as a relative path. Because linked-file resolution ultimately uses `Files.exists(directory.resolve(fileName))`, a relative directory makes benchmark execution depend on the JVM working directory.

### Issue Context
The benchmark is meant to be runnable via JMH/Gradle forks and also potentially from IDEs or from the generated JMH runner jar. These environments can differ in `user.dir`.

### Fix Focus Areas
- jablib/src/jmh/java/org/jabref/benchmarks/Benchmarks.java[100-115]

### Suggested fix approach
- Resolve the PDF fixture location in a working-directory-independent way, e.g.:
 - Add the needed PDF(s) under `jablib/src/jmh/resources/pdfs/` and load via `Benchmarks.class.getResource("/pdfs/thesis-example.pdf")`, converting the URL to a `Path`, OR
 - Compute an absolute path robustly (e.g., probe both `Path.of("src/test/resources/pdfs")` and `Path.of("jablib/src/test/resources/pdfs")` and fail fast with a clear error if neither exists).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Mock creation in benchmark 🐞 Bug ➹ Performance
Description
Benchmarks.index() creates a new Mockito BackgroundTask mock on every benchmark invocation, adding
allocation/reflection overhead into the measured time. This distorts the indexing benchmark and can
dominate results for small indexes.
Code

jablib/src/jmh/java/org/jabref/benchmarks/Benchmarks.java[R161-163]

+    public void index() throws IOException {
+        linkedFilesIndexer.removeAllFromIndex();
+        linkedFilesIndexer.addToIndex(pdfEntries, mock(BackgroundTask.class));
Evidence
The index() @Benchmark method calls mock(BackgroundTask.class) each invocation. The indexer uses
the task object repeatedly per file (isCancelled/setTitle/updateProgress/updateMessage/showToUser),
so a reusable no-op BackgroundTask instance is sufficient and avoids per-iteration Mockito overhead.

jablib/src/jmh/java/org/jabref/benchmarks/Benchmarks.java[160-164]
jablib/src/main/java/org/jabref/logic/search/indexing/DefaultLinkedFilesIndexer.java[138-151]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The JMH `index()` benchmark currently includes Mockito mock creation (`mock(BackgroundTask.class)`) in the timed section. This adds non-indexing overhead and reduces benchmark fidelity.

### Issue Context
`DefaultLinkedFilesIndexer.addToIndex` only calls methods on the `BackgroundTask` for progress/cancel checks; it doesn’t require Mockito behavior.

### Fix Focus Areas
- jablib/src/jmh/java/org/jabref/benchmarks/Benchmarks.java[68-76]
- jablib/src/jmh/java/org/jabref/benchmarks/Benchmarks.java[118-123]
- jablib/src/jmh/java/org/jabref/benchmarks/Benchmarks.java[160-164]

### Suggested fix approach
- Add a field like `private BackgroundTask<Void> noopIndexTask;`
- Initialize it once in `@Setup` using `BackgroundTask.wrap(() -> {})` (or a small custom BackgroundTask subclass).
- Use the same instance in both the setup indexing and in `index()`:
 - `linkedFilesIndexer.addToIndex(pdfEntries, noopIndexTask);`
This keeps the benchmark focused on Lucene/PDF indexing work.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

@jabref-machine

Copy link
Copy Markdown
Collaborator

Note that your PR will not be reviewed/accepted until you have gone through the mandatory checks in the description and marked each of them them exactly in the format of - [x] (done), - [ ] (yet to be done) or - [/] (not applicable). Please adhere to our pull request template.

@github-actions github-actions Bot added the status: changes-required Pull requests that are not yet complete label Mar 22, 2026
@testlens-app

testlens-app Bot commented Mar 22, 2026

Copy link
Copy Markdown

✅ All tests passed ✅

🏷️ Commit: b4729a9
▶️ Tests: 10203 executed
⚪️ Checks: 60/60 completed


Learn more about TestLens at testlens.app.

@faneeshh

Copy link
Copy Markdown
Collaborator

What issue is this supposed to close? Did you mean to create a draft PR?

@YashUkhare

Copy link
Copy Markdown
Contributor Author

What issue is this supposed to close? Did you mean to create a draft PR?

This PR is not linked to an existing issue. I noticed some TODO comments in the codebase and implemented them.

@calixtus calixtus left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks legit to me.

@calixtus calixtus added this pull request to the merge queue Mar 25, 2026
@github-actions github-actions Bot added the status: to-be-merged PRs which are accepted and should go into the merge-queue. label Mar 25, 2026
Merged via the queue into JabRef:main with commit 07637c7 Mar 25, 2026
57 of 60 checks passed
jadegold55 pushed a commit to jadegold55/jabref that referenced this pull request Mar 26, 2026
Siedlerchr added a commit to geovani-rocha/jabref that referenced this pull request Mar 28, 2026
…o fix-group-icons

* 'fix-group-icons' of github.com:geovani-rocha/jabref: (26 commits)
  chore(deps): update dependency org.apache.logging.log4j:log4j-to-slf4j to v2.25.4 (JabRef#15436)
  chore(deps): update jackson monorepo to v3.1.1 (JabRef#15435)
  Fix PushToPreferences reset and import (JabRef#15395)
  Add fulltext fetcher for Wiley via their TDM API (JabRef#15388)
  Embed in-text nature in reference marks for CSL citations (JabRef#15381)
  Chore(deps): Bump com.gradleup.shadow:shadow-gradle-plugin (JabRef#15430)
  Fix not on fx thread exceptions for cleanup and cite key generator (JabRef#15424)
  Revert "Update gradle to nightly of 2026-03-23 (JabRef#15372)"
  feat: add benchmarks for Lucene fulltext search and linked file indexing, including setup and teardown of the index. (JabRef#15385)
  Chore(deps): Bump org.openrewrite.recipe:rewrite-recipe-bom (JabRef#15418)
  Add claude gitignore (JabRef#15413)
  Fix group filter icon in side pane (JabRef#15408)
  Add new prs_link feature
  Chore(deps): Bump org.glassfish.hk2:hk2-api in /versions (JabRef#15422)
  Chore(deps): Bump org.openrewrite.rewrite from 7.28.2 to 7.29.0 (JabRef#15419)
  Chore(deps): Bump jablib/src/main/resources/csl-styles (JabRef#15417)
  Fix for inconsistent "hide tab bar" behavior (JabRef#15409)
  Update dependency org.glassfish.hk2:hk2-utils to v4 (JabRef#15407)
  Persist file notifications (JabRef#15403)
  Update dependency org.glassfish.hk2:hk2-locator to v4 (JabRef#15405)
  ...
Ranjeet2702 pushed a commit to Ranjeet2702/jabref that referenced this pull request Apr 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

first contrib status: changes-required Pull requests that are not yet complete status: to-be-merged PRs which are accepted and should go into the merge-queue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants