Feature: Add ability to remove XMP metadata from Linked Files#14853
Conversation
PR Compliance Guide 🔍Below is a summary of compliance checks for this PR:
Compliance status legend🟢 - Fully Compliant🟡 - Partial Compliant 🔴 - Not Compliant ⚪ - Requires Further Human Verification 🏷️ - Compliance label |
||||||||||||||||||||||||||||||
PR Code Suggestions ✨Explore these optional code suggestions:
|
|||||||||
|
Will fix the failing tests in a bit |
| filePath.ifPresent(path -> { | ||
| try { | ||
| XmpUtilWriter.removeXmpMetadata(path); | ||
| changed.set(true); |
There was a problem hiding this comment.
Isnt it a bit bold to assume that if a file exists it will be changed? Maybe there was no metadata to remove in that file?
There was a problem hiding this comment.
Since the current approach does not remove selective fields from the metadata and erases it all entirely, checking if metadata is present adds unnecessary overhead. Erasing all metadata without a check would result in the same outcome.
There was a problem hiding this comment.
There is a con to this approach which I did not look at. If a PDF has no metadata and the cleanup is run, the UI reports a change (The entries containing any PDF regardless of if they have metadata or not). The benefit is if metadata exists -> cleanup without adding checking overhead but I think reporting that an entry was cleaned up when it had no metadata in the first place shouldn't be intended. What do you think? cc @Siedlerchr
There was a problem hiding this comment.
t I think reporting that an entry was cleaned up when it had no metadata in the first place shouldn't be intended
Yes. Always think of a huge library with 1000 or more entries.
- I do cleanup: 1500 files changed (because i have mulitple files attached)
- I do cleanup: 1500 files changed
- I think: Why? Why does JabRef need to change again?!
|
|
||
| if (changed.get()) { | ||
| // Since only metadata is removed, no field "changes" but we still return a list | ||
| return Collections.singletonList(new FieldChange(entry, StandardField.FILE, entry.getField(StandardField.FILE).orElse(null), entry.getField(StandardField.FILE).orElse(null))); |
There was a problem hiding this comment.
OrElse(null) looks very suspicious to me.
Please provide some rational here.
There was a problem hiding this comment.
I first returned an empty list since no field was being changed but the UI reports no changes (since it counts the number of lists returned) I asked an LLM for ideas on what to return and this approach made sense. Yes the orElse(null) is unnecessary and I will remove it. Will be more wary of the same in the future
| return List.of(new FieldChange( | ||
| entry, | ||
| StandardField.FILE, | ||
| entry.getField(StandardField.FILE).orElse(StandardFileType.PDF.toString()), |
There was a problem hiding this comment.
As I currently lack the resources for a detailed review, can you @Siedlerchr check on this?
There was a problem hiding this comment.
Yeah, the orElse is wrong. It should be get() -- but then there is an IntelliJ warning on this, which cannot be configured to be ignored properly.
Proposal: Java comment + .get().
…moveMetadata implementation
|
Your pull request conflicts with the target branch. Please merge with your code. For a step-by-step guide to resolve merge conflicts, see https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/resolving-a-merge-conflict-using-the-command-line. |
Double check your project setup. The project.xml in the repo was wrong until some days ago, but the guideline for setting up still comtains the hint. |
Auto Formatting somehow working again :), settings were fine so I did not have to change anything. |
|
I've resolved all conflicts, feel free to let me know if any other changes are necessary |
koppor
left a comment
There was a problem hiding this comment.
Tests need to be rectored. They seem to be AI-generated not following JabRef's code style.
| return List.of(new FieldChange( | ||
| entry, | ||
| StandardField.FILE, | ||
| entry.getField(StandardField.FILE).orElse(StandardFileType.PDF.toString()), |
There was a problem hiding this comment.
Yeah, the orElse is wrong. It should be get() -- but then there is an IntelliJ warning on this, which cannot be configured to be ignored properly.
Proposal: Java comment + .get().
| BibEntry metadataEntry = new BibEntry(); | ||
| metadataEntry.setCitationKey("Toot"); | ||
| metadataEntry.setField(StandardField.PDF, "aPdfFile"); | ||
| metadataEntry.setField(new UnknownField("some"), "1st"); | ||
| metadataEntry.setField(StandardField.DOI, "http://dx.doi.org/10.1016/0001-8708(80)90035-3"); | ||
| metadataEntry.setField(StandardField.MONTH, "01"); | ||
| metadataEntry.setField(StandardField.PAGES, "1-2"); | ||
| metadataEntry.setField(StandardField.DATE, "01/1999"); | ||
| metadataEntry.setField(StandardField.PDF, "aPdfFile"); | ||
| metadataEntry.setField(StandardField.ISSN, "aPsFile"); | ||
| metadataEntry.setField(StandardField.FILE, "link::"); | ||
| metadataEntry.setField(StandardField.JOURNAL, "test"); | ||
| metadataEntry.setField(StandardField.TITLE, "<b>hallo</b> units 1 A case AlGaAs and latex $\\alpha$$\\beta$"); | ||
| metadataEntry.setField(StandardField.ABSTRACT, "Réflexions"); |
There was a problem hiding this comment.
If possible, use withField chain.
There was a problem hiding this comment.
Test here was copied/inspired by the pre existing test below cleanupDoesNothingByDefault. Maybe that needs refactoring too?
| metadataEntry.setField(StandardField.DATE, "01/1999"); | ||
| metadataEntry.setField(StandardField.PDF, "aPdfFile"); | ||
| metadataEntry.setField(StandardField.ISSN, "aPsFile"); | ||
| metadataEntry.setField(StandardField.FILE, "link::"); |
There was a problem hiding this comment.
Just put test.pdf as content - and not a wrong format.
There was a problem hiding this comment.
Switched to the format from XmpMetadataCleanupTest since you said that the format is correct there
| metadataEntry.setField(StandardField.ISSN, "aPsFile"); | ||
| metadataEntry.setField(StandardField.FILE, "link::"); | ||
| metadataEntry.setField(StandardField.JOURNAL, "test"); | ||
| metadataEntry.setField(StandardField.TITLE, "<b>hallo</b> units 1 A case AlGaAs and latex $\\alpha$$\\beta$"); |
There was a problem hiding this comment.
Not sure why a bogus title is used - why not just "title"?
There was a problem hiding this comment.
Same as earlier, copied from cleanupDoesNothingByDefault
|
|
||
| CleanupPreferences preset = new CleanupPreferences(CleanupPreferences.CleanupStep.REMOVE_XMP_METADATA); | ||
| List<FieldChange> changes = worker.cleanup(preset, entry); | ||
| assertFalse(changes.isEmpty()); |
There was a problem hiding this comment.
| assertFalse(changes.isEmpty()); | |
| assertEquals(List.of(), changes); |
There was a problem hiding this comment.
Done, should be assertNotEquals
| olly2018.setField(StandardField.ABSTRACT, "ABSTRACT"); | ||
| olly2018.setField(StandardField.COMMENT, "COMMENT"); | ||
| olly2018.setField(StandardField.DOI, "10/3212.3123"); | ||
| olly2018.setField(StandardField.FILE, ":article_dublinCore.pdf:PDF"); |
There was a problem hiding this comment.
Here it was inspired by the earlier Pull Request for the issue #8681
| olly2018.setField(StandardField.AUTHOR, "Olly and Johannes"); | ||
| olly2018.setField(StandardField.TITLE, "Stefan's palace"); | ||
| olly2018.setField(StandardField.JOURNAL, "Test Journal"); | ||
| olly2018.setField(StandardField.VOLUME, "1"); | ||
| olly2018.setField(StandardField.NUMBER, "1"); |
There was a problem hiding this comment.
Use .withField
Regarding the citation key - see other test.
There was a problem hiding this comment.
Switched to useField, could you please elaborate on the citation key?
| assertFalse(changes.isEmpty(), "Cleanup should report a change"); | ||
| assertEquals(StandardField.FILE, changes.getFirst().getField()); |
There was a problem hiding this comment.
Why not equalling the field changes?
| BibEntry bibEntry = new BibEntry().withFiles(Collections.singletonList(linkedFile)); | ||
|
|
||
| List<FieldChange> changes = cleanupJob.cleanup(bibEntry); | ||
| assertTrue(changes.isEmpty(), "Cleanup should report no changes for file without metadata"); |
There was a problem hiding this comment.
No reasoning at the assertions - test title should do it
Use List.of(). See other comment
There was a problem hiding this comment.
Any reason for this other than style? I personally feel assertion comments offer more clarity.
|
Javadoc CI check unrelated to changes made by me looks like, seems to have failed on the last merged commit as well: https://github.com/JabRef/jabref/actions/runs/21373986928/job/61525481411 |
Note that runs will be deleted after some weeks. Normally, one copies the smallest possible excerpt of the run. For the concrete case, fixed in main. I merged JavaDoc would be working at JDK27 - see https://bugs.openjdk.org/browse/JDK-8371504 |
User description
Closes #8277
This PR adds a new
CleanupJobthat removes all XMP metadata from linked PDF files.Steps to test
BibEntryRemove XMP Metadatais selectedThe implementation in #8681 tried to use XmpPreferences and only delete specific fields giving the user more flexibility. However I think the approach of removing all XMP metadata works better since it does what was intended, let me know if this shouldn't be the expected behaviour.
Mandatory checks
CHANGELOG.mdin a way that is understandable for the average user (if change is visible to the user)PR Type
Enhancement
Description
Add new cleanup job to remove XMP metadata from linked PDF files
Implement XmpMetadataCleanup class with failure tracking and error handling
Integrate XMP removal into cleanup dialog UI with checkbox control
Add comprehensive unit tests for metadata removal functionality
Diagram Walkthrough
File Walkthrough
8 files
New cleanup job implementation for XMP metadata removalAdd REMOVE_XMP_METADATA cleanup step enumIntegrate XmpMetadataCleanup job and failure trackingAdd removeXmpMetadata static method for metadata removalSimplify hasMetadata to check raw XMP without preferencesAdd removeXmpMetadataSelected property and job selectionAdd checkbox control for XMP metadata removal optionAdd UI checkbox element for remove XMP metadata3 files
Fix javadoc comment from iff to ifAdd localization string for remove XMP metadataDocument new XMP metadata removal feature2 files
New comprehensive unit tests for XMP metadata cleanupAdd integration test for XMP metadata removal workflow