Improve parsing of short DOIs#6920
Conversation
fix merge
tobiasdiez
left a comment
There was a problem hiding this comment.
Thanks a lot. It's a nice improvement, and I really like that you added many tests for it. The code looks good to me, I've only a few nitpick comments.
For the code styling, have a look at: https://devdocs.jabref.org/getting-into-the-code/guidelines-for-setting-up-a-local-workspace#using-jabrefs-code-style
| for (Field field : FIELDS) { | ||
| entry.getField(field).flatMap(DOI::parse) | ||
| .ifPresent(unused -> removeFieldValue(entry, field, changes)); | ||
| .ifPresent(unused -> removeFieldValue(entry, field, changes)); |
There was a problem hiding this comment.
please remove the added indent.
| this.doi = "10/"+shortcutDoiMatcher.group(1); | ||
| isShortDoi = true; | ||
| } else { | ||
|
|
| * checks whether the DOI might as well be something else, like: | ||
| * "10/2012" or " 10:as23 ". | ||
| * | ||
| * sets the property "isAmbiguous" accordingly. ( NOT USED YET! ) |
There was a problem hiding this comment.
We have the convention that in general dead/unused code should be removed. Do you have any concrete plans/ideas where the isAmbiguous variable is needed?
If not, I would ask you to remove it.
There was a problem hiding this comment.
Hi Tobias,
Yes I wasn't sure whether to open a new issue or to discuss my idea here in the comments somewhere:
When dois are cleaned up, the fields Notes and url are scanned for dois and short dois. The way it is implemented now, an entry like eg 10/2021 is interpreted as a short doi, when this might very well be something else, like a date...
checkForAmbiguousShortDOI() detects these ambiguous cases and sets the property isAmbiguous accordingly.
I stopped there, because I am still trying to find out whether a short doi like 10/1234 is even possible i.e. whether it can be fully numeric. I wrote an email to contact@doi.org, but haven't heard back yet...
Also, I wanted to open a discussion on what to do with such ambiguous cases. We could implement some DOIexistanceChecker that checks whether the supposed doi can be found. But a) I am not sure what would be the most efficient way of doing this and b) I am worried about scaling issues when there are thousands of ambiguous dois... Maybe the checking should be optional via a tick-box in respective gui?
There was a problem hiding this comment.
That's a good point to be aware of. For the conversation from the Notes and Url fields, I would say we can be optimistic and put it in the doi field if it's look like an DOI. I guess it's very rare to have only a date in the notes field.
I like the idea of checking doi's for existence (as an integrity check). I'm not sure how to implement it effectively, without querying crossref's api for each doi separately.
There was a problem hiding this comment.
ok. the way I understand you, the integrity checking would be applied to all DOIs right? should i remove the unused method and property regarding ambiguous short DOIs then?
There was a problem hiding this comment.
yes, integrity checks are always applied to all entries; and some users have 10k...
For now I would say remove the code for ambiguous dois. The code is not lost, as we can go back to this PR to copy it once it's really needed.
| assertEquals("10/gf4gqc", new DOI("doi:10/gf4gqc").getDOI()); | ||
| } | ||
|
|
||
|
|
|
Please fix the checkstyle issues. You can install the extension in Intellij directly: |
Hi Siedlerchr, What am I doing wrong? |
|
No idea what's wrong with the plugin, but you can always execute ./gradlew checkstyle on the comandline |
Siedlerchr
left a comment
There was a problem hiding this comment.
Looks good to me now! Thanks for tackling this issue
| @Test | ||
| public void acceptURNPrefix() { | ||
| assertEquals("10.123/456", new DOI("urn:10.123/456").getDOI()); | ||
| assertEquals("10.123/456", new DOI("urn:doi:10.123/456").getDOI()); |
There was a problem hiding this comment.
Note to self: urn:doi is not a valid DOI: See https://en.wikipedia.org/wiki/Digital_object_identifier#Standardization
…-issue-6707 * upstream/master: (35 commits) Fix a fetcher test for the ShortDOIService (#6945) Fixes Shared Database: Changes filtering in CoarseChangeFilter to attribute property (#6868) Changed default value of "search and store files relative to bibtex file" to true (#6928) Replace comment by just a failure (#6943) Fix: in entry types editor selected field is not removed after first click (#6941) Fix remove actions for entry types in the editor (#6933) Always use Java 15 (#6929) Update DevDocs: workaround for issues with local openjfx maven libraries (#6931) Fixes bugs in the `regex` cite key pattern modifier (#6893) Add missing author Readability for citation key patterns (#6706) Add new author Reset to master and add default case to switch (#6847) Bump mockito-core from 3.5.10 to 3.5.11 (#6924) Bump byte-buddy-parent from 1.10.14 to 1.10.15 (#6923) Bump org.beryx.jlink from 2.21.4 to 2.22.0 (#6925) Bump xmpbox from 2.0.20 to 2.0.21 (#6926) Bump pascalgn/automerge-action from v0.9.0 to v0.10.0 (#6927) Improve parsing of short DOIs (#6920) Bump junit-vintage-engine from 5.6.2 to 5.7.0 (#6910) ...
Fixes #6880
no more erroneous interpretation as short DOI of url's/notes like:
still correct detection of short DOIs hidden in url's as long as doi is part of subdomain, eg:
correct detection of short DOIs like:
added respective tests.
added entry to CHANGELOG