Add cleanup activity for URL field#9970
Conversation
|
|
||
| String NoteFieldValue = entry.getField(NOTE_FIELD).orElse(null); | ||
| String urlRegex = ( | ||
| "(?i)\\b((?:https?://|www\\d{0,3}[.]|[a-z0-9.\\-]+[.]" |
There was a problem hiding this comment.
Can you please add a comment/link on the regex, e.g where you got it from
There was a problem hiding this comment.
This was originally fetched from stackoverflow, but there were some modifications in order for it to be functional. Shall I comment that link in my code?
There was a problem hiding this comment.
Yes, and the modifications you did
| Automatically\ assign\ new\ entry\ to\ selected\ groups=Αυτόματη ανάθεση νέας καταχώρησης στις επιλεγμένες ομάδες | ||
| %0\ mode=λειτουργία %0 | ||
| Move\ DOIs\ from\ note\ and\ URL\ field\ to\ DOI\ field\ and\ remove\ http\ prefix=Μετακίνηση του DOI (Ψηφιακό Αναγνωριστικό Αντικειμένου) από το πεδίο σημείωσης και διεύθυνσης URL, στο πεδίο DOI και αφαίρεση του προθέματος http | ||
| Move\ URLs\ in\ note\ field\ to\ url\ field= |
There was a problem hiding this comment.
Please only edit the en file. All other languages are handled by crowdin
|
Thanks for your contribution and really great that you added tests as well! |
|
There is one code style check failing (open rewrite)
Run 'gradle rewriteRun' to apply the recipes. |
| // Expected entry in case note field holds more than one URLs | ||
| BibEntry expectedWithMultipleURLs = new BibEntry() | ||
| .withField(StandardField.URL, "https://hdl.handle.net/10442/hedi/6089") | ||
| .withField(StandardField.NOTE, "/url{http://142.42.1.1:8080}"); |
There was a problem hiding this comment.
Can you please test with \url{} ? See also the link from koppor in the description. That would make more sense because it's a LaTeX command that is very liked to end up in the field.
| import org.jabref.model.entry.field.StandardField; | ||
|
|
||
| /** | ||
| * Checks whether URL exists in note field, and stores it under url field . |
There was a problem hiding this comment.
No space before final dot of a sentence.
| * Checks whether URL exists in note field, and stores it under url field . | |
| * Checks whether URL exists in note field, and stores it under url field. |
| /** | ||
| * Moves URL from NOTE to URL field. | ||
| */ |
There was a problem hiding this comment.
No comment necessary - the comment is in URLCleanup class - similar to the other cleanup jobs.
| /** | |
| * Moves URL from NOTE to URL field. | |
| */ |
| The\ output\ option\ depends\ on\ a\ valid\ input\ option.=L'option de sortie dépend d'une option d'entrée valide. | ||
| Linked\ file\ name\ conventions=Conventions pour les noms de fichiers liés | ||
| Filename\ format\ pattern=Modèle de format de nom de fichier | ||
| Filename\ format\ pattern=Modèle de format de nom de fichier |
There was a problem hiding this comment.
These changes will be overwritten. You need to go to Crowdin service to update issues at the translations.
|
|
||
| private static Stream<Arguments> provideURL() { | ||
| /* | ||
| * Expected entries with various types of URLs(e.g, not secure protocol, password included for |
There was a problem hiding this comment.
Space before opening brace in English text
| * Expected entries with various types of URLs(e.g, not secure protocol, password included for | |
| * Expected entries with various types of URLs (e.g, not secure protocol, password included for |
| BibEntry[] expectedwithURL = { | ||
| new BibEntry().withField(StandardField.URL, "https://hdl.handle.net/10442/hedi/6089"), | ||
| new BibEntry().withField(StandardField.URL, "http://hdl.handle.net/10442/hedi/6089"), | ||
| new BibEntry().withField(StandardField.URL, "http://userid:password@example.com:8080"), | ||
| new BibEntry().withField(StandardField.URL, "http://142.42.1.1:8080"), | ||
| new BibEntry().withField(StandardField.URL, "http://☺.damowmow.com"), | ||
| new BibEntry().withField(StandardField.URL, "http://-.~_!$&'()*+,;=:%40:80%2f::::::@example.com"), | ||
| new BibEntry().withField(StandardField.URL, "https://www.example.com/foo/?bar=baz&inga=42&quux"), | ||
| }; |
There was a problem hiding this comment.
These values are used only ones. Please inline them below.
Background: We aim to keep expected and actual togeher so that one can read them easily. Using this array approach, reading will get difficult. See one inline-proposal below
There was a problem hiding this comment.
@koppor So I follow this approach expected/actual together, as well for the other expected entries (e.g., expectedWithTwoNoteValues and expectedWithMultipleURLs
| Arguments.of(expectedwithURL[0], new BibEntry().withField( | ||
| StandardField.NOTE, "\\url{https://hdl.handle.net/10442/hedi/6089}")), |
There was a problem hiding this comment.
| Arguments.of(expectedwithURL[0], new BibEntry().withField( | |
| StandardField.NOTE, "\\url{https://hdl.handle.net/10442/hedi/6089}")), | |
| Arguments.of( | |
| new BibEntry().withField(StandardField.URL, "https://hdl.handle.net/10442/hedi/6089"), | |
| new BibEntry().withField(StandardField.NOTE, "\\url{https://hdl.handle.net/10442/hedi/6089}")), |
| String newNoteFieldValue = NoteFieldValue | ||
| .replace(url, "") | ||
| .replace("\\url{}", "") | ||
| .replace(",", "").trim(); |
There was a problem hiding this comment.
This won't work. in call cases. Maybe remove it - or fix it with maybe checking for regex "^, " - so that , appears at the beginning of the line AND is followed by a space.
Examples for note content:
\url{http://example.org}, cited by Kramer, 2002.
And some other strings one cannot forsee. Thus, simply removing all commas will produce wrong content.
I agree, this happens in edge cases. If you don't find a quick solution, keep as is.
There was a problem hiding this comment.
@koppor, three proposed solutions here:
- Either
urlis preceded or followed by a comma
String newNoteFieldValue = NoteFieldValue
.replace(url, "")
.replaceAll("(, )?\\\\?url\\{\\}(, )?", "").trim();
- Not preferred but works
String newNoteFieldValue = NoteFieldValue
.replace(url, "")
.replace(", \\url{}", "")
.replace("\\url{}, ", "")
.replace("\\url{}", "").trim();
- Another solution, working with the current cases
// works with: \url{http://example.org}, cited by Kramer, 2002.
// doesn't work with: Cited by Kramer, 2002, \url{http://example.org}
String newNoteFieldValue = NoteFieldValue
.replace(url, "")
.replace("\\url{}", "")
.replaceFirst(", ", "").trim();
Overall, I could go with the first proposed solution as is more effective and covers plenty of cases
Also added a test case for: \url{http://example.org}, cited by Kramer, 2002.
There was a problem hiding this comment.
The fist one looks good, you can also add a comment before the RegEx to explain it
Siedlerchr
left a comment
There was a problem hiding this comment.
Failing test is not related
koppor
left a comment
There was a problem hiding this comment.
Small nitpicks regarding casing of variable names (they should all be lower case in Java). I'll fix them for myself and then merge.
Thank you for working on this!
|
@Alexandra-Stath Thank you for the follow-up. I did not use any IDE and thus missed one thing. Merged now! |
|
@koppor @Siedlerchr |
| * remove it from the note field, and no other action is performed. | ||
| */ | ||
| if (entry.hasField(URL_FIELD)) { | ||
| String urlFieldValue = entry.getField(URL_FIELD).orElse(null); |
There was a problem hiding this comment.
I thought, this could lead to issues, but I did not see it. There were no tests covering the orElse branch.
In future: In case Optionals are available, make use of them!
Fix and link to issue at #10435
* upstream/main: (68 commits) Fix issue 9863 - Change Tab selection order (#9907) New Crowdin updates (#9994) Enable editing typo with double clicking on field name in custom entry type (#9977) Remove "Field name:" from localization - we already have "Field name" (#9991) Changed database to catalog in org.jabref.gui.slr and org.jabref.logic.crawler (#9989) Use environment variables for hostname detection (#9910) Add cleanup activity for URL field (#9970) Fix freezing when fetching IBSN and no results are found (#9987) Make Group(Node)TreeViewModel more OO (#9978) Fix container for group item count still visible if display count is off (#9980) Fix paste of BibTeX data (#9985) Bring back SimplifyBoolean* and UnnecessaryParantheses (and refine guide) (#9981) Add new menu entry to remove braces from selection, aka unprotect it. (#9968) User specific comment (#9727) Bump org.openrewrite.recipe:rewrite-recipe-bom from 1.19.3 to 2.0.0 (#9975) Fix typos Remove non-needed link Fix typos Enable RemoveJavaDocAuthorTag (#9973) Bump com.fasterxml.jackson.datatype:jackson-datatype-jsr310 (#9974) ...
Fixes koppor#216
Add cleanup activity for URL field. Specifically when a url is found in the note field of a BibTex entry, we place it under the URL field.
Mandatory checks
CHANGELOG.mddescribed in a way that is understandable for the average user (if applicable)