Skip to content

ScienceDirect fulltext fetcher missing scidir-pdf link#15382

Merged
calixtus merged 8 commits into
JabRef:mainfrom
faneeshh:fix-12161
Mar 24, 2026
Merged

ScienceDirect fulltext fetcher missing scidir-pdf link#15382
calixtus merged 8 commits into
JabRef:mainfrom
faneeshh:fix-12161

Conversation

@faneeshh

Copy link
Copy Markdown
Collaborator

Related issues and pull requests

Closes #12161

PR Description

The Elsevier API response for a ScienceDirect article can include a direct PDF link alongside the article landing page link (scidir) in its coredata.link array. The fetcher was only reading scidir and discarding scidir-pdf, so it always fell through to Jsoup scraping the article page which frequently fails. This fix reads scidir-pdf when present and returns it directly which skips the scrape entirely.

Steps to test

The scidir-pdf link only appears for institutional Elsevier API keys or certain open access articles so manual end to end testing requires an institutional key but the fix could easily be tested by the new unit test though which mocks the API response with a fixture containing all three link types and asserts the PDF URL is returned directly.

./gradlew :jablib:fetcherTest --tests "org.jabref.logic.importer.fetcher.ScienceDirectTest.findsPdfDirectlyWhenSciDirPdfPresent"

Checklist

  • I own the copyright of the code submitted and I license it under the MIT license
  • I manually tested my changes in running JabRef (always required)
  • I added JUnit tests for changes (if applicable)
  • [/] I added screenshots in the PR description (if change is visible to the user)
  • [/] I added a screenshot in the PR description showing a library with a single entry with me as author and as title the issue number
  • I described the change in CHANGELOG.md in a way that can be understood by the average user (if change is visible to the user)
  • I checked the user documentation for up to dateness and submitted a pull request to our user documentation repository

@github-actions github-actions Bot added the status: changes-required Pull requests that are not yet complete label Mar 21, 2026
@qodo-free-for-open-source-projects

Copy link
Copy Markdown
Contributor

Review Summary by Qodo

Support direct PDF links in ScienceDirect fetcher

🐞 Bug fix 🧪 Tests

Grey Divider

Walkthroughs

Description
• Added support for direct PDF links from Elsevier API responses
• Introduced DoiResolution record to track PDF availability
• Returns PDF directly when scidir-pdf link present, skipping scraping
• Added comprehensive unit test with mocked API response
Diagram
flowchart LR
  A["Elsevier API Response"] -->|Contains scidir-pdf link| B["DoiResolution with isPdfDirect=true"]
  A -->|Only scidir link| C["DoiResolution with isPdfDirect=false"]
  B -->|Return PDF directly| D["Skip Jsoup scraping"]
  C -->|Scrape article page| E["Extract PDF link via Jsoup"]
Loading

Grey Divider

File Changes

1. jablib/src/main/java/org/jabref/logic/importer/fetcher/ScienceDirect.java 🐞 Bug fix +19/-7

Extract and prioritize scidir-pdf links from API

• Created DoiResolution record to encapsulate URL and PDF availability flag
• Modified getUrlByDoi() to extract and return both scidir and scidir-pdf links
• Added logic to return PDF directly when scidir-pdf link is present
• Updated findFullText() to check for direct PDF and skip Jsoup scraping when available
• Changed variable references from String urlFromDoi to resolved.url()

jablib/src/main/java/org/jabref/logic/importer/fetcher/ScienceDirect.java


2. jablib/src/test/java/org/jabref/logic/importer/fetcher/ScienceDirectTest.java 🧪 Tests +37/-1

Add test for direct PDF link handling

• Added imports for mocking HTTP requests and responses
• Added new test findsPdfDirectlyWhenSciDirPdfPresent() with mocked API response
• Test verifies PDF URL is returned directly when scidir-pdf link is present
• Fixed API key retrieval to use "Scopus" instead of ScienceDirect.FETCHER_NAME

jablib/src/test/java/org/jabref/logic/importer/fetcher/ScienceDirectTest.java


3. CHANGELOG.md 📝 Documentation +1/-0

Document ScienceDirect PDF fetcher fix

• Added entry documenting the fix for ScienceDirect PDF fetching issue #12161

CHANGELOG.md


Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Mar 21, 2026

Copy link
Copy Markdown
Contributor

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (0) 📐 Spec deviations (0)

Grey Divider


Action required

1. Weak assertion on PDF URL📘 Rule violation ✓ Correctness
Description
The new unit test asserts only that the result URL contains("pdfft"), which can pass for incorrect
URLs and weakens regression protection. JabRef JUnit conventions prefer direct content assertions
over indirect assertTrue predicate checks.
Code

jablib/src/test/java/org/jabref/logic/importer/fetcher/ScienceDirectTest.java[R110-114]

+            Optional<URL> result = finder.findFullText(entry);
+
+            assertTrue(result.isPresent());
+            assertTrue(result.get().toString().contains("pdfft"));
+        }
Evidence
The checklist requires strong, content-based assertions and discourages indirect boolean checks; the
added test uses assertTrue with isPresent() and contains(...) instead of asserting the exact
expected Optional value.

AGENTS.md
AGENTS.md
jablib/src/test/java/org/jabref/logic/importer/fetcher/ScienceDirectTest.java[110-114]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The test `findsPdfDirectlyWhenSciDirPdfPresent` uses weak predicate assertions (`isPresent()` and `contains(&amp;amp;quot;pdfft&amp;amp;quot;)`) instead of verifying the exact URL returned.
## Issue Context
This behavior change is intended to return the `scidir-pdf` link directly. The test should assert the full expected URL (or at minimum equality to the mocked `@href` value) to reliably prevent regressions.
## Fix Focus Areas
- jablib/src/test/java/org/jabref/logic/importer/fetcher/ScienceDirectTest.java[91-115]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

2. Magic string "Scopus" in test📘 Rule violation ⚙ Maintainability
Description
The modified test stubs getApiKey using the literal "Scopus" instead of reusing the existing
fetcher constant, increasing risk of typos and drift if the name changes. This reduces
maintainability and consistency across the codebase.
Code

jablib/src/test/java/org/jabref/logic/importer/fetcher/ScienceDirectTest.java[44]

+        when(importerPreferences.getApiKey("Scopus")).thenReturn(apiKey);
Evidence
The checklist requires maintainable code without avoidable duplication/magic values; the changed
line hardcodes the fetcher name where a shared constant exists.

AGENTS.md
jablib/src/test/java/org/jabref/logic/importer/fetcher/ScienceDirectTest.java[44-44]
jablib/src/main/java/org/jabref/logic/importer/fetcher/Scopus.java[32-33]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`ScienceDirectTest` hardcodes the fetcher key name as `&amp;amp;quot;Scopus&amp;amp;quot;`, duplicating the value instead of using the existing shared constant.
## Issue Context
A constant already exists (`Scopus.FETCHER_NAME`). Using it reduces the risk of typos and keeps tests aligned if the fetcher name ever changes.
## Fix Focus Areas
- jablib/src/test/java/org/jabref/logic/importer/fetcher/ScienceDirectTest.java[42-46]
- jablib/src/main/java/org/jabref/logic/importer/fetcher/Scopus.java[32-33]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. DoiResolution name collision 🐞 Bug ⚙ Maintainability
Description
ScienceDirect introduces a nested record named DoiResolution, which shares the same simple name as
the existing public org.jabref.logic.importer.fetcher.DoiResolution class in the same package.
This name collision makes code navigation and future refactors more error-prone and confusing.
Code

jablib/src/main/java/org/jabref/logic/importer/fetcher/ScienceDirect.java[R137-140]

+    private record DoiResolution(String url, boolean isPdfDirect) { }
+
+    private DoiResolution getUrlByDoi(String doi) throws UnirestException {
   String sciLink = "";
Evidence
ScienceDirect now defines a private nested DoiResolution record, while the package already
contains a public DoiResolution fetcher class; having two different types with the same simple
name in the same package increases the chance of confusion during maintenance and review.

jablib/src/main/java/org/jabref/logic/importer/fetcher/ScienceDirect.java[137-140]
jablib/src/main/java/org/jabref/logic/importer/fetcher/DoiResolution.java[31-36]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`ScienceDirect` introduces a nested record named `DoiResolution`, but the same package already has a public class named `DoiResolution`. This simple-name collision is confusing and makes future refactors/imports/navigation riskier than necessary.
### Issue Context
This is purely a naming/maintainability problem (no runtime failure), but it violates the “least surprise” principle because developers already associate `DoiResolution` with the existing DOI-redirect fulltext fetcher.
### Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/importer/fetcher/ScienceDirect.java[55-66]
- jablib/src/main/java/org/jabref/logic/importer/fetcher/ScienceDirect.java[137-170] շրջ
### Suggested change
Rename the nested record to something unique and specific (e.g., `ScienceDirectLinkResolution`, `ElsevierDoiResolution`, or `SciDirResolution`) and update references (`resolved` type + return type of `getUrlByDoi`).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

@github-actions github-actions Bot added good second issue Issues that involve a tour of two or three interweaved components in JabRef component: fetcher labels Mar 21, 2026
@testlens-app

This comment has been minimized.

}

private String getUrlByDoi(String doi) throws UnirestException {
private record DoiResolution(String url, boolean isPdfDirect) { }

@calixtus calixtus Mar 21, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really dont like this tuple-results. Needs special knowledge to be used on the callers side. A blank String could also be misinterpreted.
How about a sealed interface with encoded types? Modern Java allows data-oriented patterns.

sealed interface DoiResolution permits Pdf, ScienceDirect, NotFound {}
record Pdf(String url) implements DoiResolution {}
record ScienceDirect(String url) implements DoiResolution {}
record NotFound() implements DoiResolution {}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm fair enough. Should be fine now.

@testlens-app

This comment has been minimized.

@testlens-app

This comment has been minimized.

@github-actions github-actions Bot added status: no-bot-comments and removed status: changes-required Pull requests that are not yet complete labels Mar 21, 2026
@testlens-app

This comment has been minimized.

@testlens-app

testlens-app Bot commented Mar 24, 2026

Copy link
Copy Markdown

✅ All tests passed ✅

🏷️ Commit: 097e9bf
▶️ Tests: 10203 executed
⚪️ Checks: 52/52 completed


Learn more about TestLens at testlens.app.

@calixtus calixtus added this pull request to the merge queue Mar 24, 2026
@github-actions github-actions Bot added the status: to-be-merged PRs which are accepted and should go into the merge-queue. label Mar 24, 2026
Merged via the queue into JabRef:main with commit 2752d33 Mar 24, 2026
52 checks passed
Ranjeet2702 pushed a commit to Ranjeet2702/jabref that referenced this pull request Apr 14, 2026
* Use scidir-pdf link from Elsevier API response when available

* Changelog

* use sealed interface for DoiResolution

* Refactor for readability

* Fix formatting

---------

Co-authored-by: Carl Christian Snethlage <calixtus@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component: fetcher good second issue Issues that involve a tour of two or three interweaved components in JabRef status: no-bot-comments status: to-be-merged PRs which are accepted and should go into the merge-queue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Obtain fulltext from sciencedirect

2 participants