ScienceDirect fulltext fetcher missing scidir-pdf link#15382
Conversation
Review Summary by QodoSupport direct PDF links in ScienceDirect fetcher
WalkthroughsDescription• Added support for direct PDF links from Elsevier API responses • Introduced DoiResolution record to track PDF availability • Returns PDF directly when scidir-pdf link present, skipping scraping • Added comprehensive unit test with mocked API response Diagramflowchart LR
A["Elsevier API Response"] -->|Contains scidir-pdf link| B["DoiResolution with isPdfDirect=true"]
A -->|Only scidir link| C["DoiResolution with isPdfDirect=false"]
B -->|Return PDF directly| D["Skip Jsoup scraping"]
C -->|Scrape article page| E["Extract PDF link via Jsoup"]
File Changes1. jablib/src/main/java/org/jabref/logic/importer/fetcher/ScienceDirect.java
|
Code Review by Qodo
1.
|
This comment has been minimized.
This comment has been minimized.
| } | ||
|
|
||
| private String getUrlByDoi(String doi) throws UnirestException { | ||
| private record DoiResolution(String url, boolean isPdfDirect) { } |
There was a problem hiding this comment.
I really dont like this tuple-results. Needs special knowledge to be used on the callers side. A blank String could also be misinterpreted.
How about a sealed interface with encoded types? Modern Java allows data-oriented patterns.
sealed interface DoiResolution permits Pdf, ScienceDirect, NotFound {}
record Pdf(String url) implements DoiResolution {}
record ScienceDirect(String url) implements DoiResolution {}
record NotFound() implements DoiResolution {}There was a problem hiding this comment.
Hmm fair enough. Should be fine now.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
✅ All tests passed ✅🏷️ Commit: 097e9bf Learn more about TestLens at testlens.app. |
* Use scidir-pdf link from Elsevier API response when available * Changelog * use sealed interface for DoiResolution * Refactor for readability * Fix formatting --------- Co-authored-by: Carl Christian Snethlage <calixtus@users.noreply.github.com>
Related issues and pull requests
Closes #12161
PR Description
The Elsevier API response for a ScienceDirect article can include a direct PDF link alongside the article landing page link (scidir) in its coredata.link array. The fetcher was only reading scidir and discarding scidir-pdf, so it always fell through to Jsoup scraping the article page which frequently fails. This fix reads scidir-pdf when present and returns it directly which skips the scrape entirely.
Steps to test
The scidir-pdf link only appears for institutional Elsevier API keys or certain open access articles so manual end to end testing requires an institutional key but the fix could easily be tested by the new unit test though which mocks the API response with a fixture containing all three link types and asserts the PDF URL is returned directly.
./gradlew :jablib:fetcherTest --tests "org.jabref.logic.importer.fetcher.ScienceDirectTest.findsPdfDirectlyWhenSciDirPdfPresent"Checklist
CHANGELOG.mdin a way that can be understood by the average user (if change is visible to the user)