Skip to content

Fulltext fetcher for IACR eprints#9651

Merged
Siedlerchr merged 8 commits into
JabRef:mainfrom
SECtim:iacr_fulltext
Mar 6, 2023
Merged

Fulltext fetcher for IACR eprints#9651
Siedlerchr merged 8 commits into
JabRef:mainfrom
SECtim:iacr_fulltext

Conversation

@SECtim

@SECtim SECtim commented Mar 6, 2023

Copy link
Copy Markdown
Contributor

Adds fulltext fetching for IACR eprint PDFs.

My Java skills are a bit rusty, and while the code works, I'm happy to make additional changes if there are more elegant ways to do things.

I am unsure about whether to add anything to the developer docs on fetchers, as they seem to only cover a subset of all fetchers anyway (so far, the IACR fetcher isn't mentioned at all).

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
    • (not applicable)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.
    • There does not seem to be fetcher-specific documentation for users (regarding fulltext fetching)

@SECtim

SECtim commented Mar 6, 2023

Copy link
Copy Markdown
Contributor Author

The failing fetcher tests seem unrelated to my changes. Will take a closer look later.

Siedlerchr
Siedlerchr previously approved these changes Mar 6, 2023

@Siedlerchr Siedlerchr left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codewise lgtm! Thanks for the addition.

if (urlField.isPresent()) {
String descriptiveHtml = getHtml(urlField.get());
String startOfFulltextLink = "<a class=\"btn btn-sm btn-outline-dark\"";
String fulltextLinkAsInHtml = getRequiredValueBetween(startOfFulltextLink, ".pdf", descriptiveHtml);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, one could use JSOUP here https://jsoup.org/ (see e.g. SemanticScholar) to operate directly on the HTML dom elements, but as this getValueBeteween is already used here it's fine

@Siedlerchr

Copy link
Copy Markdown
Member

@SECtim you can ignore the unrelated fetcher tests. Sometimes they fail just on GitHub if they are executed to often or Github IPs are blocked

@Siedlerchr Siedlerchr added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Mar 6, 2023
@Siedlerchr

Copy link
Copy Markdown
Member

Just did a small simplification of the tests, it's easier to directly compare the Optionals

@Siedlerchr Siedlerchr merged commit 85ab410 into JabRef:main Mar 6, 2023
Siedlerchr added a commit that referenced this pull request Mar 14, 2023
…rg.mariadb.jdbc-mariadb-java-client-3.1.0

* upstream/main: (357 commits)
  Fix syntax
  Add experimental Fetcher for Bibliotheksverbund Bayern with MarcXML parser (#9641)
  Update guidelines-for-setting-up-a-local-workspace.md
  Update guidelines-for-setting-up-a-local-workspace.md
  Bump org.tinylog:slf4j-tinylog from 2.6.0 to 2.6.1 (#9665)
  Bump apple-actions/import-codesign-certs from 1 to 2 (#9662)
  Bump com.puppycrawl.tools:checkstyle from 10.8.0 to 10.8.1 (#9661)
  Bump gittools/actions from 0.9.15 to 0.10.2 (#9663)
  Bump hmarr/auto-approve-action from 3.1.0 to 3.2.0 (#9664)
  Bump io.github.classgraph:classgraph from 4.8.156 to 4.8.157 (#9666)
  Bump org.tinylog:tinylog-api from 2.6.0 to 2.6.1 (#9667)
  Add option to open arks in the browser from an ark identifier (#9601)
  remove "jdk 19 does not work" (#9658)
  Fulltext fetcher for IACR eprints (#9651)
  Observable Preferences S (#9619)
  Issue 9646: Right-click context menu "Attach file from URL" (#9648)
  Improve the INSPIREFetcher in "Update with bibliographic information from the web" (#9645)
  Bump appleboy/ssh-action from 0.1.7 to 0.1.8 (#9653)
  Bump com.fasterxml.jackson.datatype:jackson-datatype-jsr310 (#9656)
  Bump com.puppycrawl.tools:checkstyle from 10.7.0 to 10.8.0 (#9655)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants