Fix/non latin author parsed as name prefix #15823
Review Summary by Qodo
Fix non-Latin caseless author names parsed as namePrefix
Walkthroughs
Description
• Fixed non-Latin caseless script author names incorrectly parsed as namePrefix
• Added Unicode character type check to distinguish caseless scripts from lowercase
• Enhanced AuthorListParser to properly handle Hindi, Arabic, Thai, Hebrew scripts
• Added test cases for Hindi and Arabic author name parsing
Diagram
flowchart LR
A["AuthorListParser<br/>tokenCase detection"] -->|"Added Character.getType<br/>check for caseless scripts"| B["Proper handling of<br/>non-Latin scripts"]
B -->|"Distinguishes caseless<br/>from lowercase"| C["Correct familyName<br/>assignment"]
C -->|"Prevents wrong<br/>namePrefix population"| D["Fixed author parsing<br/>for Hindi, Arabic, Thai"]
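The diagram summarizes the effect in terms of familyName and namePrefix. To make that concrete, here is a minimal Java sketch under stated assumptions: it is a deliberately naive, hypothetical model, not JabRef's actual AuthorListParser algorithm. It splits the part of a "von Last, First" name that precedes the comma, sending leading words whose first letter is not upper-case-like to the prefix and the remaining words to the family name. The class VonLastSplitSketch, the method splitVonLast, and the sample name are illustrative; only the two token-case conditions mirror the diff in this PR.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical, simplified model; NOT JabRef's actual AuthorListParser algorithm.
class VonLastSplitSketch {

    // Token-case rule before the fix: upper case, or a Han character.
    static boolean oldRule(int c) {
        return Character.isUpperCase(c)
                || Character.UnicodeScript.of(c) == Character.UnicodeScript.HAN;
    }

    // Token-case rule after the fix: additionally, any character that is not a LOWERCASE_LETTER.
    static boolean newRule(int c) {
        return oldRule(c) || Character.getType(c) != Character.LOWERCASE_LETTER;
    }

    // Splits the "von Last" part (text before the comma) into {prefix, family}:
    // leading words whose first letter is not upper-case-like become the prefix,
    // all remaining words become the family name.
    static String[] splitVonLast(String vonLast, IntPredicate upperCaseLike) {
        List<String> words = Arrays.asList(vonLast.trim().split("\\s+"));
        int i = 0;
        while (i < words.size() && !upperCaseLike.test(words.get(i).codePointAt(0))) {
            i++;
        }
        return new String[] {
            String.join(" ", words.subList(0, i)),
            String.join(" ", words.subList(i, words.size()))
        };
    }

    public static void main(String[] args) {
        // "Sharma" written in Devanagari (Unicode escapes avoid source-encoding issues).
        String sharma = "\u0936\u0930\u094D\u092E\u093E";
        System.out.println(Arrays.toString(splitVonLast(sharma, VonLastSplitSketch::oldRule)));
        System.out.println(Arrays.toString(splitVonLast(sharma, VonLastSplitSketch::newRule)));
        System.out.println(Arrays.toString(splitVonLast("van Beethoven", VonLastSplitSketch::newRule)));
    }
}
```

Running main shows that, with the old rule, the single-word Devanagari family name lands entirely in the prefix and the family name comes out empty, mirroring the reported symptom. With the new rule it becomes the family name, while "van Beethoven" still yields the prefix "van".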
File Changes
1. jablib/src/main/java/org/jabref/logic/importer/AuthorListParser.java
Code Review by Qodo
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
Expands author parsing support for non‑Latin/caseless scripts (e.g., Hindi/Arabic) and adds regression coverage, along with a changelog entry.
Changes:
- Add test cases for Hindi and Arabic “Family, Given” parsing.
- Treat non‑Latin/caseless scripts as “upper-case-like” during tokenization to avoid misclassifying name parts.
- Document the fix in CHANGELOG.md.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| jablib/src/test/java/org/jabref/logic/importer/AuthorListParserTest.java | Adds regression tests for Hindi/Arabic author parsing. |
| jablib/src/main/java/org/jabref/logic/importer/AuthorListParser.java | Adjusts token case detection to better handle caseless scripts; reformats some constants/comments. |
| CHANGELOG.md | Adds an entry describing the parsing fix. |
```diff
 if (!firstLetterIsFound && (currentBackslash < 0) && Character.isLetter(c)) {
     if (bracesLevel == 0) {
-        tokenCase = Character.isUpperCase(c) || (Character.UnicodeScript.of(c) == Character.UnicodeScript.HAN);
+        tokenCase = Character.isUpperCase(c) || (Character.UnicodeScript.of(c) == Character.UnicodeScript.HAN || (Character.getType(c) != Character.LOWERCASE_LETTER));
```
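To see which first letters the removed and added conditions treat as upper-case-like, the following standalone Java sketch evaluates both on one sample letter per script. The class TokenCaseRules and the sample characters are assumptions for illustration; the two boolean expressions are copied from the diff, except that the sketch passes code points to the int overloads of the same Character methods.

```java
// Standalone sketch: compares the token-case condition before and after this PR.
class TokenCaseRules {

    // Removed line of the diff: upper case, or a Han character.
    static boolean oldRule(int c) {
        return Character.isUpperCase(c)
                || (Character.UnicodeScript.of(c) == Character.UnicodeScript.HAN);
    }

    // Added line of the diff: additionally, any character that is not a LOWERCASE_LETTER.
    static boolean newRule(int c) {
        return Character.isUpperCase(c)
                || (Character.UnicodeScript.of(c) == Character.UnicodeScript.HAN
                        || (Character.getType(c) != Character.LOWERCASE_LETTER));
    }

    public static void main(String[] args) {
        // One illustrative first letter per script (Unicode escapes avoid source-encoding issues).
        String[][] samples = {
            {"Latin lowercase", "v"},
            {"Latin uppercase", "B"},
            {"Han", "\u738B"},        // CJK ideograph "wang"
            {"Devanagari", "\u0936"}, // DEVANAGARI LETTER SHA
            {"Arabic", "\u0645"},     // ARABIC LETTER MEEM
            {"Thai", "\u0E2A"},       // THAI CHARACTER SO SUA
            {"Hebrew", "\u05DB"},     // HEBREW LETTER KAF
        };
        for (String[] sample : samples) {
            int c = sample[1].codePointAt(0);
            // Prints the script label, code point, general category number, and both results.
            System.out.printf("%-16s U+%04X type=%d old=%-5b new=%b%n",
                    sample[0], c, Character.getType(c), oldRule(c), newRule(c));
        }
    }
}
```

Since the enclosing if already requires Character.isLetter(c), the added clause appears to subsume the other two (upper-case and Han letters are never LOWERCASE_LETTER), so the new condition seems equivalent to Character.getType(c) != Character.LOWERCASE_LETTER on its own. It also classifies title-case and modifier letters as upper-case-like.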
Note that your PR will not be reviewed/accepted until you have gone through the mandatory checks in the description and marked each of them exactly in the format of
Please finish one PR at a time. As we can see, you have not followed up on #15738. P.S. Since you already opened this and people have started reviewing, you can continue working on it.
@subhramit I understand, and I apologize. This PR wasn't AI-generated. I am new to open source, and while researching how to submit a clean PR, I came across CodeRabbit, which reviews PRs, and used it to fix small issues before submitting here. I won't use it again if that's not acceptable.
Your pull request conflicts with the target branch. Please merge the target branch into your code. For a step-by-step guide to resolve merge conflicts, see https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/resolving-a-merge-conflict-using-the-command-line.
…r.java
Co-authored-by: Dark Blue <hancong0205@gmail.com>
* upstream/main: (29 commits)
  Chore(deps): Bump dev.langchain4j:langchain4j-bom in /versions (JabRef#15853)
  Chore(deps): Bump org.glassfish.jaxb:jaxb-runtime in /versions (JabRef#15854)
  Chore(deps): Bump com.gradleup.shadow:shadow-gradle-plugin (JabRef#15852)
  Chore(deps): Bump com.gradleup.shadow:shadow-gradle-plugin (JabRef#15849)
  Chore(deps): Bump com.autonomousapps:dependency-analysis-gradle-plugin (JabRef#15850)
  Update dependency org.apache.maven.plugins:maven-surefire-plugin to v3.5.6 (JabRef#15844)
  Fix reset and import of AiPreferences (JabRef#15843)
  Fix Comparable Contract Violation in SharedBibEntryData (JabRef#15806) (JabRef#15842)
  Chore(deps): Bump com.dlsc.gemsfx:gemsfx from 4.0.5 to 4.1.0 in /versions (JabRef#15841)
  Chore(deps): Bump jablib/src/main/resources/csl-styles (JabRef#15840)
  New Crowdin updates (JabRef#15839)
  Add group pseudonymization support (fixes JabRef#14117) (JabRef#15258)
  Feature parse MeSH terms in PubMed MEDLINE records (JabRef#15529)
  Fix/non latin author parsed as name prefix (JabRef#15823)
  Fix not on fx thread cleanup (JabRef#15835)
  Fix garbled BibEntry Javadoc example (JabRef#15834)
  Revert "Fix cleanup operationn setFiles not on fx thread causes exceptiosn"
  Revert "changelog"
  changelog
  Fix cleanup operationn setFiles not on fx thread causes exceptiosn
  ...
Related issues and pull requests
Closes #15813
PR Description
Fixed incorrect parsing of author names written in caseless scripts (Hindi, Arabic, Thai, Hebrew, etc.) in AuthorListParser.
The parser was classifying these tokens as von particles because Character.isUpperCase() returns false for letters of scripts with no case distinction; as a result, familyName was null and namePrefix was wrongly populated. Added the check Character.getType(c) != Character.LOWERCASE_LETTER, which treats letters of caseless scripts (typically Unicode general category OTHER_LETTER) as upper-case-like, while genuinely lowercase letters are still treated as von particles.
Steps to test
Run:

```shell
./gradlew :jablib:test --tests "org.jabref.logic.importer.AuthorListParserTest"
```

Checklist
CHANGELOG.md in a way that can be understood by the average user (if change is visible to the user)