Fulltext fetcher for IACR eprints #9651
Merged
Contributor
Author
The failing fetcher tests seem unrelated to my changes. Will take a closer look later.
Siedlerchr
previously approved these changes
Mar 6, 2023
Siedlerchr
left a comment
Member
Code-wise LGTM! Thanks for the addition.
Siedlerchr
reviewed
Mar 6, 2023
if (urlField.isPresent()) {
    String descriptiveHtml = getHtml(urlField.get());
    String startOfFulltextLink = "<a class=\"btn btn-sm btn-outline-dark\"";
    String fulltextLinkAsInHtml = getRequiredValueBetween(startOfFulltextLink, ".pdf", descriptiveHtml);
Member
Alternatively, one could use jsoup (https://jsoup.org/) here to operate directly on the HTML DOM elements (see e.g. SemanticScholar), but since getValueBetween is already used here, it's fine.
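To make the trade-off in this comment concrete, here is a minimal sketch of the substring-between approach using only the JDK. The class name, the `valueBetween` helper, and the HTML fragment are made up for illustration; this is not JabRef's actual `getRequiredValueBetween`, and the real IACR page markup may differ.

```java
import java.util.Optional;

class ValueBetweenSketch {

    // Hypothetical helper: returns the text strictly between the first
    // occurrence of `start` and the next occurrence of `end` after it,
    // or an empty Optional if either marker is missing.
    static Optional<String> valueBetween(String start, String end, String haystack) {
        int startIndex = haystack.indexOf(start);
        if (startIndex < 0) {
            return Optional.empty();
        }
        int from = startIndex + start.length();
        int endIndex = haystack.indexOf(end, from);
        if (endIndex < 0) {
            return Optional.empty();
        }
        return Optional.of(haystack.substring(from, endIndex));
    }

    public static void main(String[] args) {
        // Made-up HTML fragment (assumption); only the class attribute is taken from the diff.
        String html = "<p><a class=\"btn btn-sm btn-outline-dark\" href=\"/2023/0001.pdf\">PDF</a></p>";
        String start = "<a class=\"btn btn-sm btn-outline-dark\" href=\"";
        System.out.println(valueBetween(start, ".pdf", html));
    }
}
```

A DOM-based approach such as jsoup would instead select the anchor element and read its `href` attribute, which tends to be less sensitive to attribute order and whitespace than matching on a literal prefix.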
Member
@SECtim you can ignore the unrelated fetcher tests. Sometimes they fail only on GitHub, if they are executed too often or GitHub IPs are blocked.
Member
Just did a small simplification of the tests; it's easier to compare the Optionals directly.
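For readers unfamiliar with the idea, here is a small JDK-only sketch of comparing Optionals directly instead of unwrapping them first. `findFullText` here is a hypothetical stand-in with a made-up URL scheme, not the fetcher from this PR, and `check` is a tiny helper used in place of a test framework.

```java
import java.net.URI;
import java.util.Optional;

class OptionalComparisonSketch {

    // Hypothetical stand-in for a fulltext lookup (assumption, not JabRef's API).
    static Optional<URI> findFullText(String eprintId) {
        if (eprintId.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(URI.create("https://example.org/" + eprintId + ".pdf"));
    }

    // Minimal assertion helper so the sketch runs without a test framework.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        Optional<URI> expected = Optional.of(URI.create("https://example.org/2023/0001.pdf"));

        // Unwrapping first takes two checks: presence, then the value.
        Optional<URI> actual = findFullText("2023/0001");
        check(actual.isPresent(), "result should be present");
        check(actual.get().equals(expected.get()), "values should match");

        // Direct comparison takes one: Optional.equals compares the contained values,
        // and two empty Optionals are equal.
        check(expected.equals(findFullText("2023/0001")), "Optionals should be equal");
        check(Optional.empty().equals(findFullText("")), "empty id should yield empty");
    }
}
```

In a JUnit test the same idea becomes a single `assertEquals` on the expected and actual Optionals, which also gives a readable failure message when the result is unexpectedly empty.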
calixtus
approved these changes
Mar 6, 2023
Siedlerchr
approved these changes
Mar 6, 2023
Siedlerchr
added a commit
that referenced
this pull request
Mar 14, 2023
…rg.mariadb.jdbc-mariadb-java-client-3.1.0
* upstream/main: (357 commits)
  Fix syntax
  Add experimental Fetcher for Bibliotheksverbund Bayern with MarcXML parser (#9641)
  Update guidelines-for-setting-up-a-local-workspace.md
  Update guidelines-for-setting-up-a-local-workspace.md
  Bump org.tinylog:slf4j-tinylog from 2.6.0 to 2.6.1 (#9665)
  Bump apple-actions/import-codesign-certs from 1 to 2 (#9662)
  Bump com.puppycrawl.tools:checkstyle from 10.8.0 to 10.8.1 (#9661)
  Bump gittools/actions from 0.9.15 to 0.10.2 (#9663)
  Bump hmarr/auto-approve-action from 3.1.0 to 3.2.0 (#9664)
  Bump io.github.classgraph:classgraph from 4.8.156 to 4.8.157 (#9666)
  Bump org.tinylog:tinylog-api from 2.6.0 to 2.6.1 (#9667)
  Add option to open arks in the browser from an ark identifier (#9601)
  remove "jdk 19 does not work" (#9658)
  Fulltext fetcher for IACR eprints (#9651)
  Observable Preferences S (#9619)
  Issue 9646: Right-click context menu "Attach file from URL" (#9648)
  Improve the INSPIREFetcher in "Update with bibliographic information from the web" (#9645)
  Bump appleboy/ssh-action from 0.1.7 to 0.1.8 (#9653)
  Bump com.fasterxml.jackson.datatype:jackson-datatype-jsr310 (#9656)
  Bump com.puppycrawl.tools:checkstyle from 10.7.0 to 10.8.0 (#9655)
  ...
Adds fulltext fetching for IACR eprint PDFs.
My Java skills are a bit rusty, and while the code works, I'm happy to make additional changes if there are more elegant ways to do things.
I am unsure about whether to add anything to the developer docs on fetchers, as they seem to only cover a subset of all fetchers anyway (so far, the IACR fetcher isn't mentioned at all).
Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)