Fix XMP import by koppor · Pull Request #12833 · JabRef/jabref

koppor · 2025-03-26T15:37:27Z

Fixes #12829 (Fix import of PDF)
Fixes #12620 (Fix PdfMergeMetadataImporterTest; commit d9930cd)
Closes #12621 (Fix PDF relativization test)

WIP, because this adds double parsing etc.

Mandatory checks

I own the copyright of the code submitted and I license it under the MIT license
Change in CHANGELOG.md described in a way that is understandable for the average user (if change is visible to the user)
Tests created for changes (if applicable)
Manually tested changed features in running JabRef (always required)
[/] Screenshots added in PR description (if change is visible to the user)
[/] Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
[/] Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

trag-bot · 2025-03-26T15:47:01Z

        }
+
+        // Support for UnknownField{name='rights'} and similar constructs
+        Matcher matcher = UNKNOWNFIELD_PATTERN.matcher(fieldName);


The code introduces a new pattern matching logic without updating the JavaDoc for the parseField method to reflect this change, violating the requirement to update JavaDoc when method code changes.

False alarm?

I... can't give a reasonable argument here

Oh, the bot may mean documenting the formats the method accepts. I added that documentation.
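A minimal sketch of what the documented field-name handling could look like. The regex behind `UNKNOWNFIELD_PATTERN` is not shown in the hunk, so the pattern, the class name `UnknownFieldNameParser`, and the method `normalizeFieldName` are assumptions for illustration only:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnknownFieldNameParser {

    // Assumed pattern: matches the debug form "UnknownField{name='rights'}"
    // and captures the bare field name between the single quotes.
    private static final Pattern UNKNOWNFIELD_PATTERN =
            Pattern.compile("^UnknownField\\{name='([^']+)'\\}$");

    /**
     * Accepts two formats:
     * <ul>
     *   <li>a plain field name, e.g. {@code rights}</li>
     *   <li>the toString() form of an unknown field, e.g. {@code UnknownField{name='rights'}}</li>
     * </ul>
     *
     * @return the plain field name in both cases
     */
    static String normalizeFieldName(String fieldName) {
        Matcher matcher = UNKNOWNFIELD_PATTERN.matcher(fieldName);
        if (matcher.matches()) {
            return matcher.group(1);
        }
        return fieldName;
    }
}
```

Listing the accepted formats in the JavaDoc is what addresses the bot's remark; the behavior for plain names stays unchanged.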

InAnYan · 2025-03-26T15:50:04Z

-                key = key.substring("bibtex/".length());
-                Field field = FieldFactory.parseField(key);
+
+                String fieldName = key.substring("bibtex/".length());


Introduce constant? 😄

Not sure what you are referring to. The string appeared while reading the raw XMP metadata.
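The suggestion most likely targets the `"bibtex/"` literal, which is used both in the prefix check and in `substring`; a later commit in this PR is titled "Introduce constant". A minimal sketch of that refactoring, where the constant `BIBTEX_KEY_PREFIX` and the helper `fieldNameOf` are hypothetical names (the PR may name them differently):

```java
import java.util.Optional;

final class XmpKeyParser {

    // Hypothetical constant name; keeps the prefix and its length in one place.
    static final String BIBTEX_KEY_PREFIX = "bibtex/";

    /** Returns the field name for raw XMP keys such as "bibtex/journal", or empty for any other key. */
    static Optional<String> fieldNameOf(String key) {
        if (!key.startsWith(BIBTEX_KEY_PREFIX)) {
            return Optional.empty();
        }
        return Optional.of(key.substring(BIBTEX_KEY_PREFIX.length()));
    }
}
```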

InAnYan · 2025-03-26T15:52:55Z

+    }
+
+    public List<BibEntry> readXmp(Path path, PDDocument document, XmpPreferences xmpPreferences) {
+        SequencedCollection<BibEntry> result = new LinkedHashSet<>();


Why LinkedHashSet is used here?

Just curious; I wouldn't doubt any of your code.

Maybe because we want the entries to keep their order in the document without duplicates? I can't think of a more performant implementation for that use case.

Yeah, that was the idea!

I was too lazy to implement subset-based elimination.

I implemented subset comparison at https://github.com/JabRef/jabref/blob/fix-12829/src/main/java/org/jabref/logic/bibtex/comparator/BibEntryCompare.java
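A sketch of the idea discussed here: a `LinkedHashSet` drops exact duplicates while keeping document order, and a second pass drops entries whose fields are fully contained in another entry. The API of `BibEntryCompare` is not shown in this thread, so entries are modeled as plain `Map<String, String>` and the names `EntryDeduplication`, `isSubset`, and `deduplicate` are hypothetical. The PR's code uses `SequencedCollection`; the sketch uses `Set` so it also compiles on JDKs older than 21.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class EntryDeduplication {

    /** True if every field of {@code smaller} appears in {@code larger} with the same value. */
    static boolean isSubset(Map<String, String> smaller, Map<String, String> larger) {
        return larger.entrySet().containsAll(smaller.entrySet());
    }

    /**
     * Keeps document order, removes exact duplicates via LinkedHashSet, and then
     * removes entries that are strict subsets of another remaining entry.
     */
    static List<Map<String, String>> deduplicate(List<Map<String, String>> entries) {
        Set<Map<String, String>> unique = new LinkedHashSet<>(entries);
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> candidate : unique) {
            boolean contained = false;
            for (Map<String, String> other : unique) {
                // Elements of the set are pairwise unequal, so this is a strict-subset check.
                if (!other.equals(candidate) && isSubset(candidate, other)) {
                    contained = true;
                    break;
                }
            }
            if (!contained) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

The second pass is quadratic in the number of entries, which is unproblematic for the handful of entries a single PDF's XMP data yields.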

InAnYan · 2025-03-26T15:54:08Z

        }
+
+        // Support for UnknownField{name='rights'} and similar constructs
+        Matcher matcher = UNKNOWNFIELD_PATTERN.matcher(fieldName);


False alarm?

I... can't give a reasonable argument here

trag-bot · 2025-03-27T15:05:41Z

-
-        List<BibEntry> result = new LinkedList<>();
-
+    public List<BibEntry> readXmp(Path path, XmpPreferences xmpPreferences) throws IOException {


The JavaDoc for this method is not updated to reflect the changes in the method's logic, especially regarding the merging of XMP data into a single BibEntry.

trag-bot · 2025-03-27T15:06:42Z

    }

-    protected static XMPMetadata parseXmpMetadata(InputStream is) throws IOException {
+    public XMPMetadata parseXmpMetadata(InputStream is) throws IOException {


The method parseXmpMetadata has changed but its JavaDoc has not been updated to reflect the changes, violating the requirement to update JavaDoc when method code changes.

trag-bot · 2025-03-27T15:06:44Z

    }

-    protected static XMPMetadata parseXmpMetadata(InputStream is) throws IOException {
+    public XMPMetadata parseXmpMetadata(InputStream is) throws IOException {


The method parseXmpMetadata should return an Optional instead of potentially returning null, adhering to modern Java practices.
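A sketch of the signature change the bot suggests: return `Optional.empty()` when no XMP packet is found, and keep `IOException` for real I/O failures. `Metadata` is a hypothetical stand-in for XMPBox's `XMPMetadata`, and the `<x:xmpmeta` substring check is an assumed placeholder for handing the bytes to an actual XMP parser:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

final class OptionalXmpParsing {

    // Hypothetical stand-in for XMPBox's XMPMetadata, so the sketch compiles on its own.
    record Metadata(String rawPacket) {
    }

    /**
     * Parses an XMP packet from the stream.
     *
     * @return the parsed metadata, or {@code Optional.empty()} if the stream contains no XMP packet
     * @throws IOException if reading the stream fails
     */
    static Optional<Metadata> parseXmpMetadata(InputStream is) throws IOException {
        String content = new String(is.readAllBytes(), StandardCharsets.UTF_8);
        // Placeholder detection; a real implementation would delegate to an XMP library.
        if (!content.contains("<x:xmpmeta")) {
            return Optional.empty();
        }
        return Optional.of(new Metadata(content));
    }
}
```

Callers then handle absence explicitly (e.g. with `ifPresent`) instead of relying on a null check.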

trag-bot · 2025-03-27T15:07:53Z

+        if (this.getType().equals(DEFAULT_TYPE)) {
+            this.setType(other.getType());
+        }


The code does not follow the fail-fast principle. It should return early if the condition is not met, instead of nesting logic inside an else branch.
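The hunk shown has no `else` branch; it only adopts the other entry's type while this entry still has the default type (cf. the commit "Merge also merges type"). A minimal sketch of those merge semantics, assuming that existing field values are kept and only missing fields are copied over. `SimpleEntry` and the `"misc"` default are hypothetical stand-ins for `BibEntry` and its `DEFAULT_TYPE`:

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SimpleEntry {

    // Hypothetical stand-in for the PR's DEFAULT_TYPE.
    static final String DEFAULT_TYPE = "misc";

    private String type = DEFAULT_TYPE;
    private final Map<String, String> fields = new LinkedHashMap<>();

    String getType() {
        return type;
    }

    void setType(String type) {
        this.type = type;
    }

    Map<String, String> getFields() {
        return fields;
    }

    /**
     * Merges {@code other} into this entry: adopts the other type only if this
     * entry still has the default type, and adds fields missing here.
     * Existing field values are never overwritten (an assumption of this sketch).
     */
    void mergeWith(SimpleEntry other) {
        if (type.equals(DEFAULT_TYPE)) {
            type = other.getType();
        }
        other.getFields().forEach(fields::putIfAbsent);
    }
}
```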

koppor · 2025-03-28T05:06:36Z

+        Path file = Path.of(PdfXmpImporterTest.class.getResource("2024_SPLC_Becker.pdf").toURI());
+        List<BibEntry> bibEntries = importer.importDatabase(file).getDatabase().getEntries();
+
+        // TODO: Adapt this


I don't see this in my IDE. Strange.

trag-bot · 2025-03-31T17:38:23Z

@trag-bot didn't find any issues in the code! ✅✨

github-actions · 2025-03-31T17:48:46Z

The build for this PR is no longer available. Please visit https://builds.jabref.org/main/ for the latest build.

* Fix level for PicaXmlParser
* Use withers
* Use "withField"
* Add debub statement
* Modernize test
* Add 2024_SPLC_Becker.pdf
* Reuse object
* Use well-known array list
* Refactor xmpUtilReader to avoid double-loading of the PDF
* Fail faster
* Simplify test code
* Reorder for better readability
* Fix casing
* Fix comments in XmpUtilReaderTest
* Revert "Reorder for better readability"
  This reverts commit 7f398ee.
* Make code more readable
* Add support for UnknownField{name='rights'}
* WIP: Fix of XmpUtilReader
* Add skeletton for test
* Introduce constant
* Map more fields
* Fix links
* Add links to related classes
* Add initial Markdown doc
* Remove unnecessary method
* Revert "WIP: Fix of XmpUtilReader"
  This reverts commit 85cf9f1.
* Try to cache DOM parser
* Merge also merges type
* Remove "doi:" prefix at identifier
* Return one entry only
* Try to have org.jabref.logic.importer.fileformat.pdf.PdfMergeMetadataImporterTest#importRelativizesFilePath working
* Fix tabs
* Update CHANGELOG.md
  Co-authored-by: Subhramit Basu <subhramit.bb@live.in>
* Fix path relativation
* Use withers
* Use "List.of"
* Add comment on testing
* Re-use merge logig of BibEntry
* Add support of merging file field
* Streamline code in PdfMergeMetadataImporter
* Adapt test to real data
* Refine JavaDoc
* Fix issue in field writing
* Introduce BibEntryCompare
* Ease test
* Add month handling during loading of XMP data
* Add refined JavaDoc
* Fix checkstyle
* Remvoe "// TODO"

---------

Co-authored-by: Subhramit Basu <subhramit.bb@live.in>
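The commit "Remove "doi:" prefix at identifier" suggests that Dublin Core identifiers such as `doi:10.1234/example` are reduced to the bare DOI before being stored. The actual change is not shown here; the following is a sketch assuming a case-insensitive prefix match, with the hypothetical names `DublinCoreIdentifier` and `stripDoiPrefix`:

```java
final class DublinCoreIdentifier {

    private static final String DOI_PREFIX = "doi:";

    /** Strips a leading "doi:" (case-insensitive) and surrounding whitespace; other identifiers are only trimmed. */
    static String stripDoiPrefix(String identifier) {
        String trimmed = identifier.trim();
        if (trimmed.regionMatches(true, 0, DOI_PREFIX, 0, DOI_PREFIX.length())) {
            return trimmed.substring(DOI_PREFIX.length()).trim();
        }
        return trimmed;
    }
}
```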

koppor added 19 commits March 26, 2025 14:44

Fix level for PicaXmlParser

f4eea1f

Use withers

8177587

Use "withField"

14a6dce

Add debub statement

803fa79

Modernize test

6cb5349

Add 2024_SPLC_Becker.pdf

79c5bb8

Reuse object

a4b746c

Use well-known array list

c12af26

Refactor xmpUtilReader to avoid double-loading of the PDF

c5534ec

Fail faster

56fb4a3

Simplify test code

42a19ad

Reorder for better readability

7f398ee

Fix casing

34bddcc

Fix comments in XmpUtilReaderTest

2b4758d

Revert "Reorder for better readability"

b5faf8b

This reverts commit 7f398ee.

Make code more readable

24200ec

Add support for UnknownField{name='rights'}

4de8825

WIP: Fix of XmpUtilReader

85cf9f1

Add skeletton for test

0daf630

trag-bot Bot reviewed Mar 26, 2025

Comment thread src/main/java/org/jabref/logic/importer/fileformat/pdf/PdfXmpImporter.java

trag-bot Bot reviewed Mar 26, 2025

Comment thread src/main/java/org/jabref/logic/xmp/DublinCoreExtractor.java

trag-bot Bot reviewed Mar 26, 2025

InAnYan reviewed Mar 26, 2025

koppor added 7 commits March 27, 2025 08:27

Merge remote-tracking branch 'origin/main' into fix-12829

0381fe1

Introduce constant

d434b75

Map more fields

70b331c

Fix links

49d414a

Add links to related classes

07aa65e

Add initial Markdown doc

4235ecd

Remove unnecessary method

66609ea

koppor mentioned this pull request on Mar 27, 2025 in EPUB import #12457 (closed)