Skip to content

Fix XMP import#12833

Merged
Siedlerchr merged 54 commits into
mainfrom
fix-12829
Mar 31, 2025
Merged

Fix XMP import#12833
Siedlerchr merged 54 commits into
mainfrom
fix-12829

Conversation

@koppor

@koppor koppor commented Mar 26, 2025

Copy link
Copy Markdown
Member

WIP, because this adds double parsing etc.

Mandatory checks

  • I own the copyright of the code submitted and I license it under the MIT license
  • Change in CHANGELOG.md described in a way that is understandable for the average user (if change is visible to the user)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • [/] Screenshots added in PR description (if change is visible to the user)
  • [/] Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • [/] Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

Comment thread src/main/java/org/jabref/logic/xmp/DublinCoreExtractor.java
}

// Support for UnknownField{name='rights'} and similar constructs
Matcher matcher = UNKNOWNFIELD_PATTERN.matcher(fieldName);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code introduces a new pattern matching logic without updating the JavaDoc for the parseField method to reflect this change, violating the requirement to update JavaDoc when method code changes.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

False alarm?

I... can't give a reasonable argument here

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it could be to add some documentation about the accepted formats of the method. I added it.

key = key.substring("bibtex/".length());
Field field = FieldFactory.parseField(key);

String fieldName = key.substring("bibtex/".length());

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduce constant? 😄

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you are referring to. The string appeared while reading the raw XMP meta data

Comment thread src/main/java/org/jabref/logic/xmp/DublinCoreExtractor.java
}

public List<BibEntry> readXmp(Path path, PDDocument document, XmpPreferences xmpPreferences) {
SequencedCollection<BibEntry> result = new LinkedHashSet<>();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why LinkedHashSet is used here?

Just curious. I wouldn't doubt any of your code

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe because we want the entries preserving their order in the doc, and don't want duplicates? Can't think of a more performant implementation for that use case.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that was the idea!

I was too lazy to.implement subset-based elimination.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

// Support for UnknownField{name='rights'} and similar constructs
Matcher matcher = UNKNOWNFIELD_PATTERN.matcher(fieldName);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

False alarm?

I... can't give a reasonable argument here

@koppor koppor mentioned this pull request Mar 27, 2025
5 tasks

List<BibEntry> result = new LinkedList<>();

public List<BibEntry> readXmp(Path path, XmpPreferences xmpPreferences) throws IOException {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JavaDoc for this method is not updated to reflect the changes in the method's logic, especially regarding the merging of XMP data into a single BibEntry.

}

protected static XMPMetadata parseXmpMetadata(InputStream is) throws IOException {
public XMPMetadata parseXmpMetadata(InputStream is) throws IOException {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method parseXmpMetadata has changed but its JavaDoc has not been updated to reflect the changes, violating the requirement to update JavaDoc when method code changes.

}

protected static XMPMetadata parseXmpMetadata(InputStream is) throws IOException {
public XMPMetadata parseXmpMetadata(InputStream is) throws IOException {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method parseXmpMetadata should return an Optional instead of potentially returning null, adhering to modern Java practices.

Comment on lines +1271 to +1273
if (this.getType().equals(DEFAULT_TYPE)) {
this.setType(other.getType());
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code does not follow the fail-fast principle. It should return early if the condition is not met, instead of nesting logic inside an else branch.

Comment thread src/test/java/org/jabref/logic/exporter/XmpPdfExporterTest.java
Comment thread src/test/java/org/jabref/logic/xmp/XmpUtilReaderTest.java
Comment thread src/test/java/org/jabref/logic/xmp/XmpUtilReaderTest.java
Comment thread src/test/java/org/jabref/logic/xmp/XmpUtilReaderTest.java
@koppor koppor added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Mar 27, 2025
Path file = Path.of(PdfXmpImporterTest.class.getResource("2024_SPLC_Becker.pdf").toURI());
List<BibEntry> bibEntries = importer.importDatabase(file).getDatabase().getEntries();

// TODO: Adapt this

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this in my IDE. strange.

@trag-bot

trag-bot Bot commented Mar 31, 2025

Copy link
Copy Markdown

@trag-bot didn't find any issues in the code! ✅✨

@github-actions

github-actions Bot commented Mar 31, 2025

Copy link
Copy Markdown
Contributor

The build for this PR is no longer available. Please visit https://builds.jabref.org/main/ for the latest build.

@Siedlerchr Siedlerchr added this pull request to the merge queue Mar 31, 2025
Merged via the queue into main with commit c39222d Mar 31, 2025
@Siedlerchr Siedlerchr deleted the fix-12829 branch March 31, 2025 18:33
krishnagjsForGit pushed a commit to krishnagjsForGit/jabref that referenced this pull request May 2, 2025
* Fix level for PicaXmlParser

* Use withers

* Use "withField"

* Add debub statement

* Modernize test

* Add 2024_SPLC_Becker.pdf

* Reuse object

* Use well-known array list

* Refactor xmpUtilReader to avoid double-loading of the PDF

* Fail faster

* Simplify test code

* Reorder for better readability

* Fix casing

* Fix comments in XmpUtilReaderTest

* Revert "Reorder for better readability"

This reverts commit 7f398ee.

* Make code more readable

* Add support for UnknownField{name='rights'}

* WIP: Fix of XmpUtilReader

* Add skeletton for test

* Introduce constant

* Map more fields

* Fix links

* Add links to related classes

* Add initial Markdown doc

* Remove unnecessary method

* Revert "WIP: Fix of XmpUtilReader"

This reverts commit 85cf9f1.

* Try to cache DOM parser

* Merge also merges type

* Remove "doi:" prefix at identifier

* Return one entry only

* Try to have org.jabref.logic.importer.fileformat.pdf.PdfMergeMetadataImporterTest#importRelativizesFilePath working

* Fix tabs

* Update CHANGELOG.md

Co-authored-by: Subhramit Basu <subhramit.bb@live.in>

* Fix path relativation

* Use withers

* Use "List.of"

* Add comment on testing

* Re-use merge logig of BibEntry

* Add support of merging file field

* Streamline code in PdfMergeMetadataImporter

* Adapt test to real data

* Refine JavaDoc

* Fix issue in field writing

* Introduce BibEntryCompare

* Ease test

* Add month handling during loading of XMP data

* Add refined JavaDoc

* Fix checkstyle

* Remvoe "// TODO"

---------

Co-authored-by: Subhramit Basu <subhramit.bb@live.in>
krishnagjsForGit pushed a commit to krishnagjsForGit/jabref that referenced this pull request May 2, 2025
* Fix level for PicaXmlParser

* Use withers

* Use "withField"

* Add debub statement

* Modernize test

* Add 2024_SPLC_Becker.pdf

* Reuse object

* Use well-known array list

* Refactor xmpUtilReader to avoid double-loading of the PDF

* Fail faster

* Simplify test code

* Reorder for better readability

* Fix casing

* Fix comments in XmpUtilReaderTest

* Revert "Reorder for better readability"

This reverts commit 7f398ee.

* Make code more readable

* Add support for UnknownField{name='rights'}

* WIP: Fix of XmpUtilReader

* Add skeletton for test

* Introduce constant

* Map more fields

* Fix links

* Add links to related classes

* Add initial Markdown doc

* Remove unnecessary method

* Revert "WIP: Fix of XmpUtilReader"

This reverts commit 85cf9f1.

* Try to cache DOM parser

* Merge also merges type

* Remove "doi:" prefix at identifier

* Return one entry only

* Try to have org.jabref.logic.importer.fileformat.pdf.PdfMergeMetadataImporterTest#importRelativizesFilePath working

* Fix tabs

* Update CHANGELOG.md

Co-authored-by: Subhramit Basu <subhramit.bb@live.in>

* Fix path relativation

* Use withers

* Use "List.of"

* Add comment on testing

* Re-use merge logig of BibEntry

* Add support of merging file field

* Streamline code in PdfMergeMetadataImporter

* Adapt test to real data

* Refine JavaDoc

* Fix issue in field writing

* Introduce BibEntryCompare

* Ease test

* Add month handling during loading of XMP data

* Add refined JavaDoc

* Fix checkstyle

* Remvoe "// TODO"

---------

Co-authored-by: Subhramit Basu <subhramit.bb@live.in>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix import of PDF Fix PdfMergeMetadataImporterTest

5 participants