Fix default value for unwanted characters#15743
Conversation
Review Summary by Qodo(Agentic_describe updated until commit 4419075)Fix citation key generation and importer preference handling
WalkthroughsDescription• Moved DEFAULT_UNWANTED_CHARACTERS constant to CitationKeyPatternPreferences class for proper encapsulation • Added superscript and subscript character normalization (e.g., ¹ → 1, ₂ → 2) in citation key generation • Fixed EndnoteImporter and ReferImporter to respect user citation key preferences instead of hardcoded defaults • Replaced regex-based whitespace removal with Guava's CharMatcher for improved performance in cleanKey method Diagramflowchart LR
A["CitationKeyPatternPreferences"] -->|"contains DEFAULT_UNWANTED_CHARACTERS"| B["CitationKeyGenerator"]
B -->|"normalizes superscripts/subscripts"| C["cleanKey method"]
D["EndnoteImporter"] -->|"uses preferences"| A
E["ReferImporter"] -->|"uses preferences"| A
B -->|"replaces regex with CharMatcher"| F["whitespace removal"]
File Changes1. jablib/src/main/java/org/jabref/logic/citationkeypattern/CitationKeyPatternPreferences.java
|
Code Review by Qodo
1. Inline regex in authNofMth
|
|
Haha, we are back in the discussion of |
|
I think not including minus is okay, that doesn't break anything |
|
Your pull request conflicts with the target branch. Please merge with your code. For a step-by-step guide to resolve merge conflicts, see https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/resolving-a-merge-conflict-using-the-command-line. |
| /// Maps Unicode superscript and subscript digits to their ASCII counterparts (e.g., ¹ → 1, ₂ → 2). | ||
| /// Other characters in the Superscripts and Subscripts block (e.g., ⁿ ⁱ ⁺ ⁻ ⁽ ⁾) are stripped by returning -1. | ||
| /// All other code points are returned unchanged. | ||
| private static int normalizeSuperscriptOrSubscript(int codePoint) { |
There was a problem hiding this comment.
I would suggest moving this to an util class somewhere
| "", // keyPatternRegex | ||
| "", // keyPatternReplacement | ||
| "-`ʹ:!;?^$", // unwantedCharacters | ||
| CitationKeyGenerator.DEFAULT_UNWANTED_CHARACTERS, // unwantedCharacters |
There was a problem hiding this comment.
This should be defined inside the prefs as constant and then used in the places
Co-authored-by: Christoph <siedlerkiller@gmail.com>
Co-authored-by: Christoph <siedlerkiller@gmail.com>
CitationKeyGenerator.cleanKey stripped whitespace via String.replaceAll("\s", ""), recompiling the regex on every call. Replace it with Guava's CharMatcher.whitespace().removeFrom(...), which needs no regex engine and matches the existing string-handling idiom used in StringUtil.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Persistent review updated to latest commit 4419075 |
| } | ||
| String lastName = lastAuthor.getFamilyName() | ||
| .map(CitationKeyGenerator::removeDefaultUnwantedCharacters).orElse(""); | ||
| .map(name -> name.replaceAll("[-`ʹ:!;?^$]*", "")) |
There was a problem hiding this comment.
1. Inline regex in authnofmth 📘 Rule violation ➹ Performance
authNofMth uses String.replaceAll(...), which recompiles the regex on every call and violates the requirement to use compiled Pattern instances for regex checks/replacements. This can add avoidable overhead in citation key generation paths.
Agent Prompt
## Issue description
`BracketedPattern.authNofMth` calls `name.replaceAll(...)` with a regex literal. This recompiles the regex per invocation.
## Issue Context
Compliance requires regex usage to prefer reusable compiled `Pattern` instances instead of repeated compilation.
## Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/citationkeypattern/BracketedPattern.java[974-976]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
The constant lives on CitationKeyPatternPreferences, not CitationKeyGenerator. Updates the reference and removes the now-unused import. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* upstream/main: Update PULL_REQUEST_TEMPLATE.md (#15788) New Crowdin updates (#15787) Update heylogs to 0.18.0 and use github-actions format (#15786) Grand refactoring of the AI features (#15688) Chore(deps): Bump com.fasterxml:aalto-xml in /versions (#15782) Chore(deps): Bump org.junit:junit-bom from 6.0.3 to 6.1.0 in /versions (#15783) Fix default value for unwanted characters (#15743) Fix runner tag Fix runner for JBang (PR) Fix duplicate finder progress counter incrementing on empty queue polls (#15781) Refine JabKit CLI: positional input argument and check command group (#15759) Ignore exception in unregisterListener to prevent exception (#15761) Fix wrong usage of "key" (#15779) Fix Hayagriva export to nest identifiers under serial-number (#15750)
* Fix default value for unwanted characters
* Fix visibility
* Remove strange comment
* Add support for streamlining super- and subscripts
* Use citation key preference at EndnoteImporter
* Fix ReferImporter
* Add link to PR to CHANGELOG.md entries
* Fix wrong constant use
Co-authored-by: Christoph <siedlerkiller@gmail.com>
* Move constant
Co-authored-by: Christoph <siedlerkiller@gmail.com>
* Use CharMatcher to strip whitespace in cleanKey
CitationKeyGenerator.cleanKey stripped whitespace via String.replaceAll("\s", ""), recompiling the regex on every call. Replace it with Guava's CharMatcher.whitespace().removeFrom(...), which needs no regex engine and matches the existing string-handling idiom used in StringUtil.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* Fix formatting
* Adapt tests
* Fix normalization
* Fix -
* Annotate that its only for testing
* Fix import
* Fix import of DEFAULT_UNWANTED_CHARACTERS in AbstractJabKitTest
The constant lives on CitationKeyPatternPreferences, not
CitationKeyGenerator. Updates the reference and removes the now-unused
import.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Christoph <siedlerkiller@gmail.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Something was wrong with the unwanted characters
Steps to test
Difficult
Checklist
CHANGELOG.mdin a way that can be understood by the average user (if change is visible to the user)