Skip to content

Improved abbreviation lookup with fuzzy matching#12652

Merged
koppor merged 26 commits into
JabRef:mainfrom
JihwanByun:fix-for-issue-12467
Mar 11, 2025
Merged

Improved abbreviation lookup with fuzzy matching#12652
koppor merged 26 commits into
JabRef:mainfrom
JihwanByun:fix-for-issue-12467

Conversation

@JihwanByun

@JihwanByun JihwanByun commented Mar 8, 2025

Copy link
Copy Markdown
Contributor

Description:
The journal abbreviation feature previously failed when minor differences existed in the full journal name, such as capitalization, punctuation, or subtitles. To resolve this, I applied exact matching first, followed by fuzzy matching using Levenshtein similarity (StringSimilarity). When multiple similar names were found, their similarity scores were compared to reduce errors.

Fixes: #12467.

Mandatory checks

  • I own the copyright of the code submitted and I licence it under the MIT license
  • Change in CHANGELOG.md described in a way that is understandable for the average user (if change is visible to the user)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your code currently does not meet JabRef's code guidelines. We use OpenRewrite to ensure "modern" Java coding practices. The issues found can be automatically fixed. Please execute the gradle task rewriteRun, check the results, commit, and push.
You can check the detailed error output by navigating to your pull request, selecting the tab "Checks", section "Tests" (on the left), subsection "OpenRewrite".

@calixtus calixtus left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your PR. One note on preliminary checking.
Thanks for the tests.

Comment on lines +109 to +121
String journal = journalName.trim().replaceAll(Matcher.quoteReplacement("\\&"), "&");
return customAbbreviations.stream().anyMatch(abbreviation -> isMatched(journal, abbreviation))
boolean exactMatch = customAbbreviations.stream().anyMatch(abbreviation -> isMatched(journal, abbreviation))
|| fullToAbbreviationObject.containsKey(journal)
|| abbreviationToAbbreviationObject.containsKey(journal)
|| dotlessToAbbreviationObject.containsKey(journal)
|| shortestUniqueToAbbreviationObject.containsKey(journal);

if (exactMatch) {
return true;
}

// If no exact match is found, attempt fuzzy matching
return findAbbreviationFuzzyMatched(journal).isPresent();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like code duplication to the get method.
Can you make an argument if the difference between containsKey and get this is time-relevant in the case of the calling methods?
Otherwise just return get().isPresent()?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good Point! I see how using get().isPresent() simplifies the code and avoids redundant lookups. I'll update it accordingly. Thanks for the feedback.

@subhramit subhramit Mar 10, 2025

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JihwanByun Please refrain from using AI in conversations. https://github.com/JabRef/jabref/blob/main/CONTRIBUTING.md#notes-on-ai-usage
If you are not fluent/comfortable in English, no problem, direct translations are okay, even improper English is fine too!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah..,Thank you for advice! I do myself

Comment thread CHANGELOG.md Outdated
- We changed the "Copy Preview" button to "Copy citation (html) in the context menu of the preview. [#12551](https://github.com/JabRef/jabref/issues/12551)
- Pressing Tab in empty text fields of the entry editor now moves the focus to the next field instead of inserting a tab character. [#11938](https://github.com/JabRef/jabref/issues/11938)
- The embedded PostgresSQL server for the search now supports Linux and macOS ARM based distributions natively [#12607](https://github.com/JabRef/jabref/pull/12607)
- We Improved abbreviation lookup with fuzzy matching to handle minor input errors and variations. [#12467](https://github.com/JabRef/jabref/issues/12467)

@subhramit subhramit Mar 9, 2025

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- We Improved abbreviation lookup with fuzzy matching to handle minor input errors and variations. [#12467](https://github.com/JabRef/jabref/issues/12467)
- We improved journal abbreviation lookup with fuzzy matching to handle minor input errors and variations. [#12467](https://github.com/JabRef/jabref/issues/12467)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it jounenal? or journal right? thanks for feekback

@subhramit subhramit Mar 10, 2025

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes right sorry, that is a mistake. It should be journal*. I updated it.


/**
* The class provides functionality for managing journal abbreviations.
* <a href="https://docs.jabref.org/setup/journal-abbreviations">JabRef Journal Abbreviations Documentation</a>

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong link. Please fix it (https://docs.jabref.org/advanced/journalabbreviations).
Which AI tool did you use? It did a relatively nice job in coding, though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used gpt-4o partly, But I didn't just let LLM code it. It's tool for supporting. Thanks I'll fix it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the right way to use it :)

Comment on lines +78 to +79
@VisibleForTesting
public JournalAbbreviationRepository(Map<String, Abbreviation> abbreviations) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The annotation is not needed if it is public already.
In fact, you don't need this constructor at all (since it's only for testing). You can use addCustomAbbreviations instead.

@JihwanByun JihwanByun Mar 10, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, It's not responsible for testing? That's correct. But, abbreviation fuzzy matching is working only for fullToAbbreviationObject, customAbbreviations does not work.(#12467) Or, Should I update customAbbreviations can work fuzzy matching?

@subhramit subhramit Mar 10, 2025

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old code used to work with custom abbreviations. See it's usages in get, isAbbreviatedName, etc.
So I think fuzzy matching should take care of that as well?
@calixtus it should work with both full and custom abbreviations, right?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, yes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much, I'll check usages and update it :)

Comment on lines +320 to +322
@Test
void fuzzyMatchEnglish() {
Map<String, Abbreviation> testAbbreviations = new HashMap<>();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convert these to parameterized tests and adapt as per addCustomAbbreviations (context in above comment). Hint: use a list of abbreviations.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JUnit tests are failing. In the area "Some checks were not successful", locate "Tests / Unit tests (pull_request)" and click on "Details". This brings you to the test output.
You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JUnit tests are failing. In the area "Some checks were not successful", locate "Tests / Unit tests (pull_request)" and click on "Details". This brings you to the test output.
You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JUnit tests are failing. In the area "Some checks were not successful", locate "Tests / Unit tests (pull_request)" and click on "Details". This brings you to the test output.
You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide.

@JihwanByun

JihwanByun commented Mar 10, 2025

Copy link
Copy Markdown
Contributor Author

image
I checked the fuzzy match for german on local.
image
but It was failed but, passed before. I am finding why it happens

@koppor koppor left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice that you added JavaDoc!

Some minor comments remain.

Comment thread CHANGELOG.md Outdated
- Pressing Tab in empty text fields of the entry editor now moves the focus to the next field instead of inserting a tab character. [#11938](https://github.com/JabRef/jabref/issues/11938)
- The embedded PostgresSQL server for the search now supports Linux and macOS ARM based distributions natively [#12607](https://github.com/JabRef/jabref/pull/12607)
- We removed the obsolete Twitter link and added Mastodon and LinkedIn links in Help -> JabRef resources. [#12660](https://github.com/JabRef/jabref/issues/12660)
- We Improved journal abbreviation lookup with fuzzy matching to handle minor input errors and variations. [#12467](https://github.com/JabRef/jabref/issues/12467)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- We Improved journal abbreviation lookup with fuzzy matching to handle minor input errors and variations. [#12467](https://github.com/JabRef/jabref/issues/12467)
- We improved journal abbreviation lookup with fuzzy matching to handle minor input errors and variations. [#12467](https://github.com/JabRef/jabref/issues/12467)

double bestDistance = similarity.editDistanceIgnoreCase(input, candidates.getFirst().getName());
double secondDistance = similarity.editDistanceIgnoreCase(input, candidates.get(1).getName());

if (Math.abs(bestDistance - secondDistance) < 1.0) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a Java comment on why < 1.0. 1.0 seems to be a "magic number". Currently, I read it as: If there is a very close match of two abbreviations, do not use any of them, because they are too close. (Maybe, this is the Java comment already?)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for feedback, my mistake for using magic number. I'll add comment too.


static Stream<Object[]> provideAbbreviationTestCases() {
return Stream.of(
new Object[]{

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally, it is Arguments.of(


/**
* @param current the current representation of the journal abbreviation.
* @return The next representation in the abbreviation cycle.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not containing new information. What is the cycle? Where is it explained?

Maybe remove the whole JavaDoc here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. I'll remove it. Thank you very much.

@JihwanByun JihwanByun requested a review from subhramit March 11, 2025 04:37
@JabRef JabRef deleted a comment from trag-bot Bot Mar 11, 2025
@subhramit subhramit dismissed their stale review March 11, 2025 09:16

addressed

@trag-bot

trag-bot Bot commented Mar 11, 2025

Copy link
Copy Markdown

@trag-bot didn't find any issues in the code! ✅✨

@koppor koppor added this pull request to the merge queue Mar 11, 2025
Merged via the queue into JabRef:main with commit f7fbcd6 Mar 11, 2025
@JihwanByun JihwanByun deleted the fix-for-issue-12467 branch March 11, 2025 10:12
GuilhermeRibeiroPereira pushed a commit to GuilhermeRibeiroPereira/jabref that referenced this pull request Apr 1, 2025
* add documentation for journal abbreviation management

* feat: Improve journal abbreviation lookup with fuzzy matching

* feat: Improve journal abbreviation lookup with fuzzy matching

* feat: Improve journal abbreviation lookup with fuzzy matching

* Improve abbreviation lookup with fuzzy matching to handle minor input errors and variations.

* tests: fuzzy matching for English, Chinese, German

* feats: Improve journal abbreviation lookup with fuzzy matching, add Constructor for testing

* Apply OpenRewrite fixes

* refactor: isKnownName to use get().isPresent() instead of redundant containsKey checks

* refactor: eliminate redundant lookup in abbreviate method

* fix: abbreviation user link

* fix: add 'journal' in log

* fix: remove constructor for testing

* feats: add customAbbreviations working fuzzy matching

* fix: abbreviation addition to use a list

* tests: abbreviation addition to use ParamterizedTest

* tests: abbreviation addition to use ParamterizedTest

* tests: fix german case

* fix: remove JavaDoc on getNext

* fix: typo

* refactor: add Java comment on similarity comparison

* tests: use Arguments for testing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fuzzy matching for journal abbreviations

4 participants