Improved abbreviation lookup with fuzzy matching#12652
Conversation
… errors and variations.
…onstructor for testing
There was a problem hiding this comment.
Your code currently does not meet JabRef's code guidelines. We use OpenRewrite to ensure "modern" Java coding practices. The issues found can be automatically fixed. Please execute the gradle task rewriteRun, check the results, commit, and push.
You can check the detailed error output by navigating to your pull request, selecting the tab "Checks", section "Tests" (on the left), subsection "OpenRewrite".
calixtus
left a comment
There was a problem hiding this comment.
Thank you for your PR. One note on preliminary checking.
Thanks for the tests.
| String journal = journalName.trim().replaceAll(Matcher.quoteReplacement("\\&"), "&"); | ||
| return customAbbreviations.stream().anyMatch(abbreviation -> isMatched(journal, abbreviation)) | ||
| boolean exactMatch = customAbbreviations.stream().anyMatch(abbreviation -> isMatched(journal, abbreviation)) | ||
| || fullToAbbreviationObject.containsKey(journal) | ||
| || abbreviationToAbbreviationObject.containsKey(journal) | ||
| || dotlessToAbbreviationObject.containsKey(journal) | ||
| || shortestUniqueToAbbreviationObject.containsKey(journal); | ||
|
|
||
| if (exactMatch) { | ||
| return true; | ||
| } | ||
|
|
||
| // If no exact match is found, attempt fuzzy matching | ||
| return findAbbreviationFuzzyMatched(journal).isPresent(); |
There was a problem hiding this comment.
Looks like code duplication to the get method.
Can you make an argument if the difference between containsKey and get this is time-relevant in the case of the calling methods?
Otherwise just return get().isPresent()?
There was a problem hiding this comment.
Good Point! I see how using get().isPresent() simplifies the code and avoids redundant lookups. I'll update it accordingly. Thanks for the feedback.
There was a problem hiding this comment.
@JihwanByun Please refrain from using AI in conversations. https://github.com/JabRef/jabref/blob/main/CONTRIBUTING.md#notes-on-ai-usage
If you are not fluent/comfortable in English, no problem, direct translations are okay, even improper English is fine too!
There was a problem hiding this comment.
Ah..,Thank you for advice! I do myself
| - We changed the "Copy Preview" button to "Copy citation (html) in the context menu of the preview. [#12551](https://github.com/JabRef/jabref/issues/12551) | ||
| - Pressing Tab in empty text fields of the entry editor now moves the focus to the next field instead of inserting a tab character. [#11938](https://github.com/JabRef/jabref/issues/11938) | ||
| - The embedded PostgresSQL server for the search now supports Linux and macOS ARM based distributions natively [#12607](https://github.com/JabRef/jabref/pull/12607) | ||
| - We Improved abbreviation lookup with fuzzy matching to handle minor input errors and variations. [#12467](https://github.com/JabRef/jabref/issues/12467) |
There was a problem hiding this comment.
| - We Improved abbreviation lookup with fuzzy matching to handle minor input errors and variations. [#12467](https://github.com/JabRef/jabref/issues/12467) | |
| - We improved journal abbreviation lookup with fuzzy matching to handle minor input errors and variations. [#12467](https://github.com/JabRef/jabref/issues/12467) |
There was a problem hiding this comment.
is it jounenal? or journal right? thanks for feekback
There was a problem hiding this comment.
Yes right sorry, that is a mistake. It should be journal*. I updated it.
|
|
||
| /** | ||
| * The class provides functionality for managing journal abbreviations. | ||
| * <a href="https://docs.jabref.org/setup/journal-abbreviations">JabRef Journal Abbreviations Documentation</a> |
There was a problem hiding this comment.
Wrong link. Please fix it (https://docs.jabref.org/advanced/journalabbreviations).
Which AI tool did you use? It did a relatively nice job in coding, though.
There was a problem hiding this comment.
I used gpt-4o partly, But I didn't just let LLM code it. It's tool for supporting. Thanks I'll fix it.
There was a problem hiding this comment.
That is the right way to use it :)
| @VisibleForTesting | ||
| public JournalAbbreviationRepository(Map<String, Abbreviation> abbreviations) { |
There was a problem hiding this comment.
The annotation is not needed if it is public already.
In fact, you don't need this constructor at all (since it's only for testing). You can use addCustomAbbreviations instead.
There was a problem hiding this comment.
Ah, It's not responsible for testing? That's correct. But, abbreviation fuzzy matching is working only for fullToAbbreviationObject, customAbbreviations does not work.(#12467) Or, Should I update customAbbreviations can work fuzzy matching?
There was a problem hiding this comment.
The old code used to work with custom abbreviations. See it's usages in get, isAbbreviatedName, etc.
So I think fuzzy matching should take care of that as well?
@calixtus it should work with both full and custom abbreviations, right?
There was a problem hiding this comment.
Thank you very much, I'll check usages and update it :)
| @Test | ||
| void fuzzyMatchEnglish() { | ||
| Map<String, Abbreviation> testAbbreviations = new HashMap<>(); |
There was a problem hiding this comment.
Convert these to parameterized tests and adapt as per addCustomAbbreviations (context in above comment). Hint: use a list of abbreviations.
There was a problem hiding this comment.
JUnit tests are failing. In the area "Some checks were not successful", locate "Tests / Unit tests (pull_request)" and click on "Details". This brings you to the test output.
You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide.
There was a problem hiding this comment.
JUnit tests are failing. In the area "Some checks were not successful", locate "Tests / Unit tests (pull_request)" and click on "Details". This brings you to the test output.
You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide.
There was a problem hiding this comment.
JUnit tests are failing. In the area "Some checks were not successful", locate "Tests / Unit tests (pull_request)" and click on "Details". This brings you to the test output.
You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide.
…abref into fix-for-issue-12467
koppor
left a comment
There was a problem hiding this comment.
Nice that you added JavaDoc!
Some minor comments remain.
| - Pressing Tab in empty text fields of the entry editor now moves the focus to the next field instead of inserting a tab character. [#11938](https://github.com/JabRef/jabref/issues/11938) | ||
| - The embedded PostgresSQL server for the search now supports Linux and macOS ARM based distributions natively [#12607](https://github.com/JabRef/jabref/pull/12607) | ||
| - We removed the obsolete Twitter link and added Mastodon and LinkedIn links in Help -> JabRef resources. [#12660](https://github.com/JabRef/jabref/issues/12660) | ||
| - We Improved journal abbreviation lookup with fuzzy matching to handle minor input errors and variations. [#12467](https://github.com/JabRef/jabref/issues/12467) |
There was a problem hiding this comment.
| - We Improved journal abbreviation lookup with fuzzy matching to handle minor input errors and variations. [#12467](https://github.com/JabRef/jabref/issues/12467) | |
| - We improved journal abbreviation lookup with fuzzy matching to handle minor input errors and variations. [#12467](https://github.com/JabRef/jabref/issues/12467) |
| double bestDistance = similarity.editDistanceIgnoreCase(input, candidates.getFirst().getName()); | ||
| double secondDistance = similarity.editDistanceIgnoreCase(input, candidates.get(1).getName()); | ||
|
|
||
| if (Math.abs(bestDistance - secondDistance) < 1.0) { |
There was a problem hiding this comment.
Please add a Java comment on why < 1.0. 1.0 seems to be a "magic number". Currently, I read it as: If there is a very close match of two abbreviations, do not use any of them, because they are too close. (Maybe, this is the Java comment already?)
There was a problem hiding this comment.
Thank you for feedback, my mistake for using magic number. I'll add comment too.
|
|
||
| static Stream<Object[]> provideAbbreviationTestCases() { | ||
| return Stream.of( | ||
| new Object[]{ |
|
|
||
| /** | ||
| * @param current the current representation of the journal abbreviation. | ||
| * @return The next representation in the abbreviation cycle. |
There was a problem hiding this comment.
This comment is not containing new information. What is the cycle? Where is it explained?
Maybe remove the whole JavaDoc here.
There was a problem hiding this comment.
I agree with you. I'll remove it. Thank you very much.
|
@trag-bot didn't find any issues in the code! ✅✨ |
* add documentation for journal abbreviation management * feat: Improve journal abbreviation lookup with fuzzy matching * feat: Improve journal abbreviation lookup with fuzzy matching * feat: Improve journal abbreviation lookup with fuzzy matching * Improve abbreviation lookup with fuzzy matching to handle minor input errors and variations. * tests: fuzzy matching for English, Chinese, German * feats: Improve journal abbreviation lookup with fuzzy matching, add Constructor for testing * Apply OpenRewrite fixes * refactor: isKnownName to use get().isPresent() instead of redundant containsKey checks * refactor: eliminate redundant lookup in abbreviate method * fix: abbreviation user link * fix: add 'journal' in log * fix: remove constructor for testing * feats: add customAbbreviations working fuzzy matching * fix: abbreviation addition to use a list * tests: abbreviation addition to use ParamterizedTest * tests: abbreviation addition to use ParamterizedTest * tests: fix german case * fix: remove JavaDoc on getNext * fix: typo * refactor: add Java comment on similarity comparison * tests: use Arguments for testing


Description:
The journal abbreviation feature previously failed when minor differences existed in the full journal name, such as capitalization, punctuation, or subtitles. To resolve this, I applied exact matching first, followed by fuzzy matching using Levenshtein similarity (StringSimilarity). When multiple similar names were found, their similarity scores were compared to reduce errors.
Fixes: #12467.
Mandatory checks
CHANGELOG.mddescribed in a way that is understandable for the average user (if change is visible to the user)