Make the DOI Resolution Fetcher return nothing when the DOI leads to a host for which a tailored fetcher exists #6937

Closed
Toromtomtom wants to merge 4 commits into JabRef:master from Toromtomtom:fix-6922

@Toromtomtom

Contributor

Fixes #6922

  • Change in CHANGELOG.md described (not applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked documentation: Is the information available and up to date? If not created an issue at https://github.com/JabRef/user-documentation/issues or, even better, submitted a pull request to the documentation repository.

@tobiasdiez

Member

As you noticed, the underlying problem is actually that the DOI fetcher has a higher trust value than the publisher fetchers. I think it would be a good idea to change the ranking to "publishers > identifier-based resolution (doi, arXiv) > general search (google)". @JabRef/developers @Toromtomtom do you see any problem with this solution?
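
A minimal sketch of what such a ranking could look like. The `Trust` enum, the `Fetcher` class, and the fetcher names are hypothetical and chosen for illustration only; they are not JabRef's actual `TrustLevel` values or fetcher API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical trust ranking, declared from least to most trusted.
enum Trust {
    GENERAL_SEARCH,   // e.g. a general web search
    IDENTIFIER_BASED, // e.g. DOI or arXiv resolution
    PUBLISHER         // publisher-specific fetchers
}

// Hypothetical fetcher description: a name plus its trust rank.
final class Fetcher {
    final String name;
    final Trust trust;

    Fetcher(String name, Trust trust) {
        this.name = name;
        this.trust = trust;
    }

    Trust trust() {
        return trust;
    }
}

class TrustOrderSketch {
    // Returns the fetchers sorted from most trusted to least trusted.
    static List<Fetcher> byTrust(List<Fetcher> fetchers) {
        return fetchers.stream()
                .sorted(Comparator.comparing(Fetcher::trust).reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Fetcher> ordered = byTrust(List.of(
                new Fetcher("GoogleScholar", Trust.GENERAL_SEARCH),
                new Fetcher("DoiResolution", Trust.IDENTIFIER_BASED),
                new Fetcher("SpringerLink", Trust.PUBLISHER)));
        for (Fetcher f : ordered) {
            System.out.println(f.name + " (" + f.trust + ")");
        }
    }
}
```

With this ordering, a result from a publisher fetcher would be preferred over one from the DOI resolution whenever both return something.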

@Toromtomtom

Contributor Author

I think it would be a good idea to change it to "publishers > identifier-based resolution (doi, arXiv) > general search (google)".

I also think that this would be a better solution.

@koppor

koppor commented Sep 24, 2020

Member

+1 from my side, too

@Toromtomtom

Contributor Author

I reverted my previous commits and decreased the trust level of the DOI resolution fetcher. This works for me, but maybe someone more involved in the project wants to weigh in on the ranking of the full text fetchers.

@Siedlerchr added the "status: ready-for-review" label (Pull Requests that are ready to be reviewed by the maintainers) Sep 25, 2020
@koppor

koppor commented Sep 26, 2020

Member

Food for thought:

  • A DOI uniquely identifies a paper. By definition, a DOI leads to the right paper. Everything else is good guessing.
  • One title of a paper may lead to different publications of it: one is the conference version, the other the journal version. --> the PDF could be chosen randomly
  • What about the consequences for other fetchers? Do we overlook something?
  • Can't we contact Springer to fix their DOI 2 PDF mapping?

Proposal: Can we add a special handling for Springer? If a DOI directs to Springer, we use the Springer Fetcher. In all other cases, the functionality is untouched. In this way, we accept that this is a hack.
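
The idea from the PR title and this proposal could be sketched roughly as follows. This is only an illustration under assumptions: the class name, the method name, and the host list (`link.springer.com`) are hypothetical and not taken from the JabRef code base.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

class TailoredHostCheck {
    // Hypothetical set of hosts for which a tailored fetcher exists (assumed entry).
    static final Set<String> TAILORED_HOSTS = Set.of("link.springer.com");

    // Returns the resolved URL unless its host is covered by a tailored fetcher.
    // In that case the generic DOI-based fetcher yields nothing and leaves the
    // work to the tailored one.
    static Optional<URI> genericResult(URI resolved) {
        String host = resolved.getHost();
        if (host != null && TAILORED_HOSTS.contains(host.toLowerCase(Locale.ROOT))) {
            return Optional.empty();
        }
        return Optional.of(resolved);
    }

    public static void main(String[] args) {
        System.out.println(genericResult(URI.create("https://link.springer.com/chapter/example")));
        System.out.println(genericResult(URI.create("https://example.org/paper/123")));
    }
}
```

All hosts outside the set are passed through unchanged, which matches the intent that the functionality stays untouched in all other cases.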

To really judge, we would need a test that retrieves 1000 papers and checks whether the retrieval rate is higher or lower with this check. Alternatively, can we add telemetry for that?

  @Override
  public TrustLevel getTrustLevel() {
-     return TrustLevel.SOURCE;
+     return TrustLevel.META_SEARCH;

Member

This decreases the result quality of the DOI fetcher (always leading to the "right" paper) to the quality of Google Scholar. (From the highest to the lowest)

Can we use the solution from the title instead? 😇

Make the DOI Resolution Fetcher return nothing when the DOI leads to a host for which a tailored fetcher exists

@Siedlerchr

Member

The problem is that a DOI often does not lead to the full-text version directly, but to the site where the full text can be found. Our DoiResolution fetcher then does some magic guessing by looking at the first PDF link in the source code of the website.
Every publisher's or journal's page looks different.
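
To illustrate why taking the first PDF link can go wrong, here is a simplified sketch. It uses a plain regular expression over the HTML instead of a real HTML parser, and the class and method names are hypothetical; it is not the actual DoiResolution implementation.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstPdfLinkSketch {
    // Matches href="..." values ending in .pdf, optionally followed by a query string.
    private static final Pattern PDF_HREF = Pattern.compile(
            "href\\s*=\\s*\"([^\"]+\\.pdf(?:\\?[^\"]*)?)\"",
            Pattern.CASE_INSENSITIVE);

    // Returns the first PDF-looking link in the page source, if there is one.
    static Optional<String> firstPdfLink(String html) {
        Matcher matcher = PDF_HREF.matcher(html);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up page: the link to the whole book appears before the chapter link.
        String html = "<a href=\"/content/book.pdf\">Download book</a>"
                + "<a href=\"/content/chapter.pdf\">Download chapter</a>";
        System.out.println(firstPdfLink(html));
    }
}
```

On a page laid out like the made-up one above, the heuristic picks the book link rather than the chapter link, because it has no notion of which PDF belongs to the requested paper.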

@tobiasdiez

Member

The Springer fetcher also looks only at the DOI, but uses the Springer API to find the correct URL for the download.

    Optional<DOI> doi = entry.getField(StandardField.DOI).flatMap(DOI::parse);
    if (!doi.isPresent()) {
        return Optional.empty();
    }
    // Available in catalog?
    try {
        HttpResponse<JsonNode> jsonResponse = Unirest.get(API_URL)
                                                     .queryString("api_key", API_KEY)
                                                     .queryString("q", String.format("doi:%s", doi.get().getDOI()))
                                                     .asJson();

@koppor

koppor commented Sep 29, 2020

Member

Devcall decision: Use first solution. -- @koppor will do git magic

@Toromtomtom

Contributor Author

All right, thanks for taking care of this!

@tobiasdiez

Member

@koppor In addition, the SpringerLink fetcher should have a higher trust score than the DoiResolution fetcher, since it's also DOI-based but custom-tailored to Springer. I would also merge this class with the other Springer fetcher.

@Toromtomtom

Contributor Author

Is there anything I can do to move this forward? Reset the branch or something?

@koppor

koppor commented Oct 7, 2020

Member

Steps:

  1. I do as promised at #6937 (comment)
  2. I search all issues and PRs to document the idea of the concept (priorities, information sources, ...)
  3. I think about how to transform the comment #6937 (comment) into a follow-up issue, and which implications should be considered.

In parallel, I discuss with @stefan-kolb, because he invented the whole thing. My mistake was not to enforce that design decisions are documented (either as ADR or as other text files)

@stefan-kolb

stefan-kolb commented Oct 7, 2020

Member

Initial PR #3882
More info and discussion: #3881

@koppor

koppor commented Oct 7, 2020

Member

I think I have collected all the documentation and put it in the appropriate place at #6990.

So, nothing to do for @Toromtomtom in this PR.

@koppor closed this Oct 7, 2020
@koppor

koppor commented Oct 7, 2020

Member

The first two commits are in master now. See ce9f714.

@Toromtomtom deleted the fix-6922 branch October 8, 2020 06:07
Development

Successfully merging this pull request may close these issues.

DoiResolution Fetcher fetches whole book for some Springer conference papers

5 participants