Feature parse MeSH terms in PubMed MEDLINE records #15529

Merged: calixtus merged 16 commits into JabRef:main from LoayTarek5:feature-Parse-MeSH-terms-in-PubMed-MEDLINE-records on May 27, 2026
@LoayTarek5

Collaborator

Related issues and pull requests

Closes #12532

PR Description

Parse MeSH terms in Medline/PubMed importers (XML and Plain Text) into individual heading/qualifier pairs with major topic markers, matching PubMed's display format.

Steps to test

1- Download a record from PubMed in PubMed format and save it as a .txt file
2- In JabRef, go to File -> Import and select the saved file
3- Click the imported entry, go to the General tab, and check the Keywords field
You should see individual keyword chips matching PubMed's display format.
4- For XML import, open for example https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=23633646&retmode=xml -> right-click -> Save as .xml
5- In JabRef, File -> Import -> select the file (choose "Medline/PubMed" format)
6- Go to the General tab -> check the Keywords field

[Screenshot: Keywords field after .txt import]

[Screenshot: Keywords field after .xml import]
The full keywords: Female; Graves Disease/radiotherapy*; Humans; Hypothyroidism/etiology*; Iodine Radioisotopes/adverse effects; Iodine Radioisotopes/therapeutic use*; Retrospective Studies; Thyrotoxicosis/radiotherapy*; Thyroxine/blood; Treatment Failure; Weight Gain

Checklist

  • I own the copyright of the code submitted and I license it under the MIT license
  • I manually tested my changes in running JabRef (always required)
  • I added JUnit tests for changes (if applicable)
  • I added screenshots in the PR description (if change is visible to the user)
  • I added a screenshot in the PR description showing a library with a single entry with me as author and as title the issue number
  • I described the change in CHANGELOG.md in a way that can be understood by the average user (if change is visible to the user)
  • [/] I checked that the user documentation is up to date and submitted a pull request to our user documentation repository

@qodo-free-for-open-source-projects

Contributor

Review Summary by Qodo

Parse MeSH terms into individual heading/qualifier pairs

✨ Enhancement


Walkthroughs

Description
• Parse MeSH terms into individual heading/qualifier pairs with major topic markers
• Update MeshHeading record to track descriptor major flag and qualifier details
• Refactor keyword formatting to use slash-separated heading/qualifier syntax
• Add test coverage for MeSH term parsing in plain text importer
Diagram
flowchart LR
  A["MeSH Term Input<br/>e.g. *Kidney/diagnosis"] --> B["Parse Descriptor<br/>and Qualifiers"]
  B --> C["Extract Major<br/>Topic Flags"]
  C --> D["Format Keywords<br/>heading/qualifier*"]
  D --> E["Individual Keyword<br/>Chips in UI"]

File Changes

1. jablib/src/main/java/org/jabref/logic/importer/fileformat/MedlineImporter.java ✨ Enhancement +17/-8

Enhanced MeSH term parsing with major topic flags

• Extract descriptorMajor flag from DescriptorName MajorTopicYN attribute
• Create MeshHeading.QualifierName records to store qualifier name and major flag
• Refactor addMeshHeading() to format keywords as descriptor/qualifier* pairs
• Add asterisk suffix for major topic descriptors and qualifiers
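As an illustration of the first bullet, here is a minimal sketch of reading the `MajorTopicYN` attribute with the JDK's StAX API. It assumes a stream-based reader and the PubMed element names `DescriptorName` and `QualifierName`; the class and method names are hypothetical and this is not the code from `MedlineImporter`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch class, not part of JabRef
class MajorTopicAttributeSketch {

    // Returns the text of each DescriptorName/QualifierName element,
    // with "*" appended when the element carries MajorTopicYN="Y".
    static List<String> readNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String element = reader.getLocalName();
                if ("DescriptorName".equals(element) || "QualifierName".equals(element)) {
                    // Read the attribute first: getElementText() moves the reader to the end tag
                    boolean major = "Y".equals(reader.getAttributeValue(null, "MajorTopicYN"));
                    names.add(reader.getElementText() + (major ? "*" : ""));
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read MeSH heading", e);
        }
        return names;
    }
}
```

For a heading with descriptor "Graves Disease" (`MajorTopicYN="N"`) and qualifier "radiotherapy" (`MajorTopicYN="Y"`), the sketch yields the two names with the asterisk only on the qualifier.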

jablib/src/main/java/org/jabref/logic/importer/fileformat/MedlineImporter.java


2. jablib/src/main/java/org/jabref/logic/importer/fileformat/MedlinePlainImporter.java ✨ Enhancement +45/-7

Add MeSH term parsing for plain text format

• Implement parseMeshTerm() method to split compound MeSH terms into individual keywords
• Handle major topic markers (asterisk prefix) for descriptors and qualifiers
• Separate handling of MH (MeSH) and OT (other terms) fields
• Format keywords as descriptor/qualifier* pairs matching PubMed display format
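The bullets above can be sketched roughly as follows. This is a hedged approximation of what a `parseMeshTerm()` method might do, not the merged implementation: the class name is hypothetical, and the rule that a pair is starred when either the descriptor or the qualifier carried the asterisk is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch class, not part of JabRef
class MeshTermSketch {

    // Splits one raw MH value, e.g. "Iodine Radioisotopes/adverse effects/*therapeutic use",
    // into one keyword per descriptor/qualifier pair.
    static List<String> parseMeshTerm(String value) {
        String[] parts = value.trim().split("/");
        String descriptor = parts[0].trim();
        boolean descriptorMajor = descriptor.startsWith("*");
        if (descriptorMajor) {
            descriptor = descriptor.substring(1);
        }

        List<String> keywords = new ArrayList<>();
        if (parts.length == 1) {
            // No qualifiers: the descriptor alone is the keyword
            keywords.add(descriptor + (descriptorMajor ? "*" : ""));
            return keywords;
        }
        for (int i = 1; i < parts.length; i++) {
            String qualifier = parts[i].trim();
            boolean qualifierMajor = qualifier.startsWith("*");
            if (qualifierMajor) {
                qualifier = qualifier.substring(1);
            }
            // Assumption: the asterisk prefix becomes a suffix on the pair
            // when either side was marked as a major topic
            boolean major = descriptorMajor || qualifierMajor;
            keywords.add(descriptor + "/" + qualifier + (major ? "*" : ""));
        }
        return keywords;
    }
}
```

With this sketch, `parseMeshTerm("Graves Disease/*radiotherapy")` produces the single keyword `Graves Disease/radiotherapy*`, matching the format shown in the PR description's keyword list.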

jablib/src/main/java/org/jabref/logic/importer/fileformat/MedlinePlainImporter.java


3. jablib/src/main/java/org/jabref/logic/importer/fileformat/medline/MeshHeading.java ✨ Enhancement +4/-1

Extend MeshHeading record with major topic tracking

• Add descriptorMajor boolean field to track major topic flag
• Change qualifierNames from List<String> to List<QualifierName>
• Introduce nested QualifierName record with name and major flag fields
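A possible shape for the extended record is sketched below. `descriptorMajor`, `qualifierNames`, and the `name` and major-flag components of `QualifierName` follow the summary above; the component name `descriptorName`, the wrapper class, and the `toKeywords` helper are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch class, not part of JabRef
class MeshHeadingSketch {

    record QualifierName(String name, boolean major) { }

    // descriptorName is an assumed component name
    record MeshHeading(String descriptorName, boolean descriptorMajor, List<QualifierName> qualifierNames) { }

    // Turns one heading into "descriptor/qualifier" keywords with a trailing "*" for major topics
    static List<String> toKeywords(MeshHeading heading) {
        List<String> keywords = new ArrayList<>();
        if (heading.qualifierNames().isEmpty()) {
            keywords.add(heading.descriptorName() + (heading.descriptorMajor() ? "*" : ""));
            return keywords;
        }
        for (QualifierName qualifier : heading.qualifierNames()) {
            // Assumption: a pair is starred when either the descriptor or the qualifier is major
            boolean major = heading.descriptorMajor() || qualifier.major();
            keywords.add(heading.descriptorName() + "/" + qualifier.name() + (major ? "*" : ""));
        }
        return keywords;
    }
}
```

Keeping the major flag next to each qualifier, rather than a single flag per heading, is what allows one heading to yield both starred and unstarred keywords.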

jablib/src/main/java/org/jabref/logic/importer/fileformat/medline/MeshHeading.java


4. jablib/src/test/java/org/jabref/logic/importer/fileformat/MedlinePlainImporterTest.java 🧪 Tests +18/-0

Add test for MeSH term parsing functionality

• Add test meshTermsAreParsedIntoIndividualKeywords() to verify MeSH term parsing
• Test compound term splitting with major topic markers
• Verify correct keyword formatting with slash separators and asterisks

jablib/src/test/java/org/jabref/logic/importer/fileformat/MedlinePlainImporterTest.java


5. jablib/src/test/resources/org/jabref/logic/importer/fileformat/MedlineImporterTestNbib.bib 🧪 Tests +1/-1

Update test fixture for new keyword format

• Update expected keywords to use new slash-separated format with major topic markers
• Change from comma-separated qualifiers to heading/qualifier* syntax

jablib/src/test/resources/org/jabref/logic/importer/fileformat/MedlineImporterTestNbib.bib


6. jablib/src/test/resources/org/jabref/logic/importer/fileformat/MedlinePlainImporterTestCompleteEntry.bib 🧪 Tests +1/-1

Update test fixture for new keyword format

• Update expected keywords to match new slash-separated format
• Adjust major topic asterisk placement to qualifier position

jablib/src/test/resources/org/jabref/logic/importer/fileformat/MedlinePlainImporterTestCompleteEntry.bib


7. CHANGELOG.md 📝 Documentation +1/-0

Document MeSH term parsing improvement

• Add entry documenting improved MeSH term parsing in Medline/PubMed importers
• Reference issue #12532

CHANGELOG.md



@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Apr 11, 2026

Contributor

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (1)



Action required

1. No MH separator conflict check 📎 Requirement gap ≡ Correctness
Description
The updated MH import path serializes parsed MeSH keywords using the configured keyword separator
without detecting whether that separator occurs inside the raw MH value. If the separator appears
in an MH line, the resulting keywords field can become ambiguous or corrupted without any
warning/substitution.
Code

jablib/src/main/java/org/jabref/logic/importer/fileformat/MedlinePlainImporter.java[R193-199]

Evidence
PR Compliance ID 5 requires detecting conflicts when the user-defined keyword separator appears in
MH input lines. The new MH handling joins parsed tokens using the configured separator but
neither checks value for the separator nor applies any warning/substitution strategy in
parseMeshTerm.

Detect and handle conflicts when the user-defined keyword separator appears in PubMed MH input lines
jablib/src/main/java/org/jabref/logic/importer/fileformat/MedlinePlainImporter.java[193-199]
jablib/src/main/java/org/jabref/logic/importer/fileformat/MedlinePlainImporter.java[407-435]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The MEDLINE plain-text importer serializes `MH`-derived keywords using the configured keyword separator but does not detect when that same separator character appears inside the raw `MH` value, which can lead to ambiguous/corrupted keyword boundaries.
## Issue Context
Compliance requires checking `MH  - ...` lines for the user-defined keyword separator and either warning the user or applying a safe substitution/escaping strategy before serialization.
## Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/importer/fileformat/MedlinePlainImporter.java[193-199]
- jablib/src/main/java/org/jabref/logic/importer/fileformat/MedlinePlainImporter.java[407-435]
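To make the requested check concrete, here is a minimal sketch of detecting the configured separator inside a raw `MH` value. The class, the method names, and the space substitution are all hypothetical; the review only asks for some form of warning or safe substitution and does not prescribe this one.

```java
import java.util.Optional;

// Hypothetical sketch class, not part of JabRef
class SeparatorConflictSketch {

    // Returns a warning message when the keyword separator occurs inside the raw MH value
    static Optional<String> detectConflict(String rawMhValue, char keywordSeparator) {
        if (rawMhValue.indexOf(keywordSeparator) >= 0) {
            return Optional.of("MH value '" + rawMhValue
                    + "' contains the keyword separator '" + keywordSeparator + "'");
        }
        return Optional.empty();
    }

    // One possible substitution strategy (an assumption): replace the separator with a space
    static String substitute(String rawMhValue, char keywordSeparator) {
        return rawMhValue.replace(keywordSeparator, ' ');
    }
}
```

For example, with `,` as the separator, the MeSH heading "Diabetes Mellitus, Type 2" would be flagged, whereas with `;` it would pass unchanged.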



2. Test uses setField ✓ Resolved 📘 Rule violation ⚙ Maintainability
Description
The newly added test constructs the expected BibEntry using mutable setField calls rather than
the preferred withField withers. This violates the project’s test construction conventions and
reduces consistency with the rest of the test suite.
Code

jablib/src/test/java/org/jabref/logic/importer/fileformat/MedlinePlainImporterTest.java[R127-131]

Evidence
PR Compliance ID 40 requires using BibEntry withers (withField) instead of setField in
tests/construction. The added test uses expectedEntry.setField(...) for both PMID and
KEYWORDS.

AGENTS.md: AGENTS.md
jablib/src/test/java/org/jabref/logic/importer/fileformat/MedlinePlainImporterTest.java[127-131]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
A newly added unit test constructs `BibEntry` using `setField`, but project conventions require using `withField` for test construction/modification.
## Issue Context
Using `withField` keeps tests consistent with JabRef’s preferred immutable-style `BibEntry` usage.
## Fix Focus Areas
- jablib/src/test/java/org/jabref/logic/importer/fileformat/MedlinePlainImporterTest.java[127-131]
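The difference in call shape between the two styles can be illustrated with a stand-in type. `Entry` below is a hypothetical class written for this sketch; it is not JabRef's `BibEntry` and only mirrors the `setField` / `withField` naming mentioned in the review.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch class, not part of JabRef
class WitherStyleSketch {

    // Stand-in for an entry type; NOT JabRef's BibEntry
    static final class Entry {
        private final Map<String, String> fields = new LinkedHashMap<>();

        // Setter style: each field needs its own statement
        void setField(String name, String value) {
            fields.put(name, value);
        }

        // Wither style: returns the entry so calls chain into a single expression
        Entry withField(String name, String value) {
            fields.put(name, value);
            return this;
        }

        Map<String, String> fields() {
            return fields;
        }
    }

    // Builds an expected entry in one expression, as the convention prefers
    static Entry expectedEntry() {
        return new Entry()
                .withField("pmid", "23633646")
                .withField("keywords", "Humans");
    }
}
```

In a test, the chained form keeps the expected entry's construction in one place instead of spreading it over several mutating statements.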




Comment on lines +193 to +199
    case "MH" -> {
        List<String> meshKeywords = parseMeshTerm(value);
        Character separator = importFormatPreferences.bibEntryPreferences().getKeywordSeparator();
        String meshString = String.join(separator + " ", meshKeywords);
        fieldConversionMap.merge(StandardField.KEYWORDS, meshString,
                (existing, newVal) -> existing + separator + " " + newVal);
    }
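The snippet above relies on `Map.merge` to accumulate keywords across successive `MH` lines. A self-contained sketch of that accumulation follows; it uses a plain `Map<String, String>` and a hypothetical `collect` helper instead of JabRef's `fieldConversionMap` and preference objects.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch class, not part of JabRef
class KeywordMergeSketch {

    // Joins the keywords of every parsed MH line into one separator-delimited string.
    // Returns null when there are no MH lines, because nothing was merged into the map.
    static String collect(List<List<String>> parsedMhLines, char separator) {
        Map<String, String> fields = new HashMap<>();
        for (List<String> meshKeywords : parsedMhLines) {
            String meshString = String.join(separator + " ", meshKeywords);
            // First MH line stores the value; later lines append to the existing value
            fields.merge("keywords", meshString,
                    (existing, newVal) -> existing + separator + " " + newVal);
        }
        return fields.get("keywords");
    }
}
```

Because the remapping function only runs when a value already exists, the first `MH` line does not get a leading separator.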

Contributor


Action required

1. No MH separator conflict check 📎 Requirement gap ≡ Correctness

The updated MH import path serializes parsed MeSH keywords using the configured keyword separator
without detecting whether that separator occurs inside the raw MH value. If the separator appears
in an MH line, the resulting keywords field can become ambiguous or corrupted without any
warning/substitution.

Collaborator Author


This was intentionally scoped out in the discussion of issue #12532.
Keyword separator escaping is handled in #12810 (resolved PR #14637)

github-actions Bot added the status: changes-required label Apr 11, 2026
github-actions Bot added status: no-bot-comments and removed status: changes-required labels Apr 11, 2026
@calixtus

Member

Can you take a look @ryan-carpenter?

@faneeshh

Collaborator

You could add a test for the XML importer's MeSH parsing. Everything else looks good to me.

github-actions Bot added status: changes-required and removed status: no-bot-comments labels Apr 23, 2026
@github-actions

Contributor

Your pull request conflicts with the target branch.

Please merge the target branch into your code. For a step-by-step guide to resolve merge conflicts, see https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/resolving-a-merge-conflict-using-the-command-line.

github-actions Bot added status: no-bot-comments and removed status: changes-required labels Apr 23, 2026
calixtus added this pull request to the merge queue May 27, 2026
github-actions Bot added the status: to-be-merged label May 27, 2026
Merged via the queue into JabRef:main with commit bb928fb May 27, 2026
65 checks passed
Siedlerchr added a commit to InAnYan/jabref that referenced this pull request May 28, 2026
* upstream/main: (29 commits)
  Chore(deps): Bump dev.langchain4j:langchain4j-bom in /versions (JabRef#15853)
  Chore(deps): Bump org.glassfish.jaxb:jaxb-runtime in /versions (JabRef#15854)
  Chore(deps): Bump com.gradleup.shadow:shadow-gradle-plugin (JabRef#15852)
  Chore(deps): Bump com.gradleup.shadow:shadow-gradle-plugin (JabRef#15849)
  Chore(deps): Bump com.autonomousapps:dependency-analysis-gradle-plugin (JabRef#15850)
  Update dependency org.apache.maven.plugins:maven-surefire-plugin to v3.5.6 (JabRef#15844)
  Fix reset and import of AiPreferences (JabRef#15843)
  Fix Comparable Contract Violation in SharedBibEntryData (JabRef#15806) (JabRef#15842)
  Chore(deps): Bump com.dlsc.gemsfx:gemsfx from 4.0.5 to 4.1.0 in /versions (JabRef#15841)
  Chore(deps): Bump jablib/src/main/resources/csl-styles (JabRef#15840)
  New Crowdin updates (JabRef#15839)
  Add group pseudonymization support (fixes JabRef#14117) (JabRef#15258)
  Feature parse MeSH terms in PubMed MEDLINE records (JabRef#15529)
  Fix/non latin author parsed as name prefix (JabRef#15823)
  Fix not on fx thread cleanup (JabRef#15835)
  Fix garbled BibEntry Javadoc example (JabRef#15834)
  Revert "Fix cleanup operationn setFiles not on fx thread causes exceptiosn"
  Revert "changelog"
  changelog
  Fix cleanup operationn setFiles not on fx thread causes exceptiosn
  ...