Skip to content

update findAssociatedFiles to find all instead of the first one#14700

Merged
koppor merged 13 commits into
JabRef:mainfrom
LangInteger:main
Dec 23, 2025
Merged

update findAssociatedFiles to find all instead of the first one#14700
koppor merged 13 commits into
JabRef:mainfrom
LangInteger:main

Conversation

@LangInteger

@LangInteger LangInteger commented Dec 23, 2025

Copy link
Copy Markdown
Contributor

User description

Closes #14697

  • reformated code with better naming
  • eliminated one performance issue by avoiding a loop in another loop
  • updated the find associated files by broken linked file name logic to return all files instead of only the first one
  • added a unit test

Steps to test

  1. Preparation
  • Ensure [Options/Preferences/Linked files/Automatically search and show unlinked files] is checked.
  • Open any database with a file directory set (e.g., set to ".").
  • Attach a PDF minimal.pdf under the same directory to an entry via the “General” tab.
  • Close the entry editor.
  1. Simulate a file move
  • Outside JabRef, move the file from ./minimal.pdf to ./sub1/minimal.pdf.
  1. Create a second copy of the file
  • Outside JabRef, copy the file from ./sub1/minimal.pdf to ./sub2/minimal.pdf.
  1. Check the auto-searched associated but unlinked files result
  • Reopen the entry editor.
  • The original linked file turns orange and shows “Could not find file minimal.pdf” when hovering on it
  • Two associated files are shown below the orange one for the user to check and link

Screenshot of my manual test

Before this PR, only one of the two associated files are displayed

Image

After fixing in this PR, the two associated files are displayed

a0587572-c6bc-477a-a373-6b70a2b80d7c

Mandatory checks

  • I own the copyright of the code submitted and I license it under the MIT license
  • I manually tested my changes in running JabRef (always required)
  • I added JUnit tests for changes (if applicable)
  • I added screenshots in the PR description (if change is visible to the user)
  • I described the change in CHANGELOG.md in a way that is understandable for the average user (if change is visible to the user)
  • [/] I checked the user documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request updating file(s) in https://github.com/JabRef/user-documentation/tree/main/en.

PR Type

Enhancement, Tests


Description

  • Modified findAssociatedFilesByBrokenLinkedFile to return all matching files instead of just the first one

  • Refactored variable naming for clarity and improved code readability

  • Eliminated nested loop performance issue by pre-computing linked files list

  • Added comprehensive unit test for finding multiple associated files


Diagram Walkthrough

flowchart LR
  A["Broken Linked File"] -->|Search by name| B["Find Associated Files"]
  B -->|Old: findFirst| C["Return First Match Only"]
  B -->|New: toList| D["Return All Matches"]
  D --> E["Display Multiple Options to User"]
Loading

File Walkthrough

Relevant files
Enhancement
AutoSetFileLinksUtil.java
Find all associated files instead of first one                     

jabgui/src/main/java/org/jabref/gui/externalfiles/AutoSetFileLinksUtil.java

  • Renamed variables for better clarity: linkedFiles
    associatedNotLinkedFiles, linkedFileassociateNotLinkedFile
  • Refactored findByBrokenLinkName method to
    findAssociatedFilesByBrokenLinkedFile with improved logic
  • Changed .findFirst() to .toList() to collect all matching files
    instead of only the first one
  • Optimized nested loop by pre-computing linkedFiles list outside the
    loop to avoid redundant stream operations
  • Improved variable naming in the broken link search logic for better
    readability
+35/-32 
Tests
AutoSetFileLinksUtilTest.java
Add test for multiple associated files discovery                 

jabgui/src/test/java/org/jabref/gui/externalfiles/AutoSetFileLinksUtilTest.java

  • Added new test method
    findAllAssociatedNotLinkedFilesInsteadOfTheFirstOne to verify multiple
    file discovery
  • Test simulates file move and creates duplicate copies to validate all
    matches are returned
  • Added assertion import for assertTrue to support new test validations
  • Test verifies both files are found and correctly identified by their
    relative paths
+42/-0   
Documentation
CHANGELOG.md
Document unlinked files feature improvement                           

CHANGELOG.md

+1/-0     

@github-actions github-actions Bot added the status: changes-required Pull requests that are not yet complete label Dec 23, 2025

for (BibEntry entry : entries) {
List<LinkedFile> linkedFiles = new ArrayList<>();
List<LinkedFile> associatedNotLinkedFiles = new ArrayList<>();

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reformatted code with better naming

Comment on lines +111 to +125
List<Path> linkedFiles = entry.getFiles().stream()
.map(file -> file.findIn(directories))
.filter(Optional::isPresent)
.map(Optional::get)
.toList();
for (Path foundFile : result) {
boolean fileAlreadyLinked = entry.getFiles().stream()
.map(file -> file.findIn(directories))
.anyMatch(linked -> linked.filter(path -> {
try {
return Files.isSameFile(path, foundFile);
} catch (IOException e) {
LOGGER.debug("Unable to check file identity, assuming no identity", e);
return false;
}
}).isPresent());
boolean fileAlreadyLinked = linkedFiles.stream()
.anyMatch(linked -> {
try {
return Files.isSameFile(linked, foundFile);
} catch (IOException e) {
LOGGER.debug("Unable to check file identity, assuming no identity", e);
return false;
}
});

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eliminated one performance issue here by avoiding a loop in another loop

Comment on lines +143 to +157
private List<Path> findAssociatedFilesByBrokenLinkedFile(BibEntry entry) throws IOException {
List<Path> matches = new ArrayList<>();

for (LinkedFile brokenLink : entry.getFiles()) {
if (brokenLink.findIn(directories).isPresent()) {
continue;
}

String wantedBase = FileUtil.getBaseName(brokenLink.getLink());

for (Path directory : directories) {
try (Stream<Path> walk = Files.walk(directory)) {
walk.filter(path -> !Files.isDirectory(path))
.filter(path -> FileUtil.getBaseName(path).equalsIgnoreCase(wantedBase))
.findFirst()
.ifPresent(matches::add);
for (LinkedFile linkedFile : entry.getFiles()) {
boolean isLinkedFileBroken = linkedFile.findIn(directories).isEmpty();
if (isLinkedFileBroken) {
String linkedFileBaseName = FileUtil.getBaseName(linkedFile.getLink());

for (Path directory : directories) {
try (Stream<Path> walk = Files.walk(directory)) {
List<Path> found = walk.filter(path -> !Files.isDirectory(path))
.filter(path -> FileUtil.getBaseName(path).equalsIgnoreCase(linkedFileBaseName))
.toList();
matches.addAll(found);
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the find associated files by broken linked file name logic to return all files instead of only the first one

@LangInteger LangInteger marked this pull request as ready for review December 23, 2025 13:28
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Dec 23, 2025

Copy link
Copy Markdown
Contributor

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Unbounded directory traversal

Description: Unbounded directory traversal using Files.walk(directory) without depth limit could cause
performance issues or resource exhaustion if directories contain deeply nested structures
or symbolic link loops. AutoSetFileLinksUtil.java [152-157]

Referred Code
try (Stream<Path> walk = Files.walk(directory)) {
    List<Path> found = walk.filter(path -> !Files.isDirectory(path))
                           .filter(path -> FileUtil.getBaseName(path).equalsIgnoreCase(linkedFileBaseName))
                           .toList();
    matches.addAll(found);
}
Ticket Compliance
🟡
🎫 #13109
🔴 Add pseudonymization functionality to the CLI
Make `org.jabref.logic.pseudonymization.Pseudonymization` available on the CLI
Provide a similar CLI experience to the consistency check
Follow the implementation pattern of `org.jabref.cli.CheckConsistency` class
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Dec 23, 2025

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Optimize file search by reducing traversals
Suggestion Impact:The commit directly implements the suggested optimization. It collects broken link base names upfront in a Set, adds an early return for empty sets, and restructures the nested loops to traverse each directory only once while checking against all broken link base names using a Set.contains() operation.

code diff:

     private List<Path> findAssociatedFilesByBrokenLinkedFile(BibEntry entry) throws IOException {
+
+        Set<String> brokenLinkBaseNames = entry.getFiles().stream()
+                                               .filter(linkedFile -> linkedFile.findIn(directories).isEmpty())
+                                               .map(linkedFile -> FileUtil.getBaseName(linkedFile.getLink()).toLowerCase())
+                                               .collect(Collectors.toSet());
+
+        if (brokenLinkBaseNames.isEmpty()) {
+            return List.of();
+        }
+
         List<Path> matches = new ArrayList<>();
-
-        for (LinkedFile linkedFile : entry.getFiles()) {
-            boolean isLinkedFileBroken = linkedFile.findIn(directories).isEmpty();
-            if (isLinkedFileBroken) {
-                String linkedFileBaseName = FileUtil.getBaseName(linkedFile.getLink());
-
-                for (Path directory : directories) {
-                    try (Stream<Path> walk = Files.walk(directory)) {
-                        List<Path> found = walk.filter(path -> !Files.isDirectory(path))
-                                               .filter(path -> FileUtil.getBaseName(path).equalsIgnoreCase(linkedFileBaseName))
-                                               .toList();
-                        matches.addAll(found);
-                    }
-                }
+        for (Path directory : directories) {
+            try (Stream<Path> walk = Files.walk(directory)) {
+                List<Path> found = walk.filter(path -> !Files.isDirectory(path))
+                                       .filter(path -> brokenLinkBaseNames.contains(FileUtil.getBaseName(path).toLowerCase()))
+                                       .toList();
+                matches.addAll(found);
             }
         }
-
         return matches;
     }

Optimize the findAssociatedFilesByBrokenLinkedFile method to improve performance
by reducing redundant file system traversals. First, collect all unique base
names of broken links, then iterate through each directory only once to find all
matching files.

jabgui/src/main/java/org/jabref/gui/externalfiles/AutoSetFileLinksUtil.java [143-163]

 private List<Path> findAssociatedFilesByBrokenLinkedFile(BibEntry entry) throws IOException {
+    Set<String> brokenLinkBaseNames = entry.getFiles().stream()
+                                           .filter(linkedFile -> linkedFile.findIn(directories).isEmpty())
+                                           .map(linkedFile -> FileUtil.getBaseName(linkedFile.getLink()).toLowerCase())
+                                           .collect(Collectors.toSet());
+
+    if (brokenLinkBaseNames.isEmpty()) {
+        return List.of();
+    }
+
     List<Path> matches = new ArrayList<>();
-
-    for (LinkedFile linkedFile : entry.getFiles()) {
-        boolean isLinkedFileBroken = linkedFile.findIn(directories).isEmpty();
-        if (isLinkedFileBroken) {
-            String linkedFileBaseName = FileUtil.getBaseName(linkedFile.getLink());
-
-            for (Path directory : directories) {
-                try (Stream<Path> walk = Files.walk(directory)) {
-                    List<Path> found = walk.filter(path -> !Files.isDirectory(path))
-                                           .filter(path -> FileUtil.getBaseName(path).equalsIgnoreCase(linkedFileBaseName))
-                                           .toList();
-                    matches.addAll(found);
-                }
-            }
+    for (Path directory : directories) {
+        try (Stream<Path> walk = Files.walk(directory)) {
+            List<Path> found = walk.filter(path -> !Files.isDirectory(path))
+                                   .filter(path -> brokenLinkBaseNames.contains(FileUtil.getBaseName(path).toLowerCase()))
+                                   .toList();
+            matches.addAll(found);
         }
     }
 
     return matches;
 }

[Suggestion processed]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a performance bottleneck where the file system is traversed multiple times and proposes an effective optimization to walk each directory only once, which can significantly improve performance.

Medium
  • Update

@LangInteger

Copy link
Copy Markdown
Contributor Author

@koppor Hi, I finished this PR. Feel free to review the changes here. I will avoid main branch next time.

Comment on lines +135 to +141
assertEquals(2, matches.size());
assertTrue(matches.stream().anyMatch(file -> {
return FileUtil.relativize(newPath1, List.of(directory)).equals(Path.of(file.getLink()));
}));
assertTrue(matches.stream().anyMatch(file -> {
return FileUtil.relativize(newPath2, List.of(directory)).equals(Path.of(file.getLink()));
}));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, FileUtil is used, but not scope of the test

Just test matches with a List.of(...) w/o calling FileUtil. I think, it should be easy to hard code the expected result?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. Now the FileUtil is avoided.

@koppor koppor left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the comments in the PR. This makes reviewing easier.

I am not able to fully test and review; therefore a small comment on the test setup. (I think, the PR is fine otherwise; need to check later)

@LangInteger LangInteger requested a review from koppor December 23, 2025 17:56
Comment on lines +137 to +141
assertEquals(2, matches.size());
assertTrue(matches.stream().anyMatch(file ->
Path.of(newPath1String).equals(Path.of(file.getLink()))));
assertTrue(matches.stream().anyMatch(file ->
Path.of(newPath2String).equals(Path.of(file.getLink()))));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have

List<LinkedFile> expected = List.of(new LinkedFile(...), ...)

assertTrue(matches.stream().anyMatch(file ->
Path.of(newPath2String).equals(Path.of(file.getLink()))));
List<String> expected = List.of(newPath1String, newPath2String);
// findAssociatedNotLinkedFiles does not guarantee how the returned files are ordered

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we found a flaw, don't we?

  1. Stabilize findAssociatedNotLinkedFiles to return the files in the order they appear in the BibEntry
  2. Use Collection as return value

I prefer the former.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was about to comment

The function that accumulates associated files says nothing about how the returned file is ordered, so I used the HashSet trick to check.

Since multiple files (with the same name) might be found here for one broken linked file in a BibEntry, it seems there is no way to arrange the found files according to their order in BibEntry.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I should go with option 2. How do you think?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK for me - one can sort "later"

(I haven't checked all the code, just my gut feeling)

List<LinkedFile> matches = util.findAssociatedNotLinkedFiles(entry);
List<String> matchedFiles = util.findAssociatedNotLinkedFiles(entry)
.stream().map(LinkedFile::getLink).toList();
assertEquals(2, matchedFiles.size());

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need, will be done with assertEquals below

);
assertEquals(newFile,
matches.stream().findFirst().map(LinkedFile::getLink).orElse(""));
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix previous test to avoid FIleUtil here


for (BibEntry entry : entries) {
List<LinkedFile> linkedFiles = new ArrayList<>();
Collection<LinkedFile> associatedNotLinkedFiles = new ArrayList<>();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Collection<LinkedFile> associatedNotLinkedFiles = new ArrayList<>();
Collection<LinkedFile> associatedNotLinkedFiles = Set.of();

Because you overwrite?

Maybe even move the assignment to the catch block?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both fixed.

/// 1. This method does not check if the file is already linked to another entry.
/// 2. This method does not guarantee how the returned files are ordered.
/// Order by how they appear in BibEntry does not work since findAssociatedFilesByBrokenLinkedFile may return
/// multiple files (with the same name) for one broken linked file in the entry.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent...

(Otherwise nice)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. Let me know if the fix is still not standard


Collection<String> matchedFiles = util.findAssociatedNotLinkedFiles(entry)
.stream().map(LinkedFile::getLink).toList();
List<String> expected = List.of(newPath1String, newPath2String);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why List? Shouldn't Set be OK?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Set<String> expected = Set.of(newPath1String, newPath2String);
// findAssociatedNotLinkedFiles does not guarantee how the returned files are ordered
// so here we compare equality without considering order
assertEquals(new HashSet<>(matchedFiles), expected);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not OK

I cannot push my updates to your main

To github.com:LangInteger/jabref.git
 ! [remote rejected]       LangInteger/main -> LangInteger/main (permission denied)
error: failed to push some refs to 'github.com:LangInteger/jabref.git'

Please, in future, allow updates from maintainers!

Please apply following diff:


diff --git a/jabgui/src/test/java/org/jabref/gui/externalfiles/AutoSetFileLinksUtilTest.java b/jabgui/src/test/java/org/jabref/gui/externalfiles/AutoSetFileLinksUtilTest.java
index 8db83b9be1..4ac2825da6 100644
--- a/jabgui/src/test/java/org/jabref/gui/externalfiles/AutoSetFileLinksUtilTest.java
+++ b/jabgui/src/test/java/org/jabref/gui/externalfiles/AutoSetFileLinksUtilTest.java
@@ -4,7 +4,6 @@ import java.io.IOException;
 import java.nio.file.Files;
 import java.nio.file.Path;
 import java.util.Collection;
-import java.util.HashSet;
 import java.util.List;
 import java.util.Set;
 import java.util.TreeSet;
@@ -104,18 +103,15 @@ class AutoSetFileLinksUtilTest {
     void findAllAssociatedNotLinkedFilesInsteadOfTheFirstOne(@TempDir Path tempDir) throws IOException {
         Path directory = tempDir.resolve("files");
         Path oldPath = directory.resolve("old/minimal.pdf");
-        Files.createDirectories(oldPath.getParent());
-        Files.createFile(oldPath);

-        LinkedFile stale = new LinkedFile("", oldPath.toString(), "PDF");
-        BibEntry entry = new BibEntry(StandardEntryType.Misc);
-        entry.addFile(stale);
+        BibEntry entry = new BibEntry(StandardEntryType.Misc)
+                .withFiles(List.of(new LinkedFile("", oldPath.toString(), "PDF")));

         // Simulate a file move
         String newPath1String = "new1/minimal.pdf";
         Path newPath1 = directory.resolve(newPath1String);
         Files.createDirectories(newPath1.getParent());
-        Files.move(oldPath, newPath1);
+        Files.createFile(newPath1);

         // Create a second copy of the file
         String newPath2String = "new2/minimal.pdf";
@@ -132,11 +128,10 @@ class AutoSetFileLinksUtilTest {
                 filePreferences,
                 autoLinkPrefs);

-        Collection<String> matchedFiles = util.findAssociatedNotLinkedFiles(entry)
-                                              .stream().map(LinkedFile::getLink).toList();
-        Set<String> expected = Set.of(newPath1String, newPath2String);
-        // findAssociatedNotLinkedFiles does not guarantee how the returned files are ordered
-        // so here we compare equality without considering order
-        assertEquals(new HashSet<>(matchedFiles), expected);
+        Collection<LinkedFile> matchedFiles = util.findAssociatedNotLinkedFiles(entry);
+        Set<LinkedFile> expected = Set.of(
+                new LinkedFile("", newPath1String, "PDF"),
+                new LinkedFile("", newPath2String, "PDF"));
+        assertEquals(expected, Set.copyOf(matchedFiles));
     }
 }
``´

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upate: I filed LangInteger#1

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, thx for the PR. I also applied the diff.

@koppor koppor mentioned this pull request Dec 23, 2025
@jabref-machine

Copy link
Copy Markdown
Collaborator

You committed your code on the main brach of your fork. This is a bad practice. The right way is to branch out from main, work on your patch/feature in that new branch, and then get that branch merged via the pull request (see GitHub flow).

For this pull request, this is OK. For subsequent pull requests, please start with a different branch with a proper branch name. See CONTRIBUTING.md for more details.

@koppor koppor enabled auto-merge December 23, 2025 20:45
@koppor koppor added this pull request to the merge queue Dec 23, 2025
Merged via the queue into JabRef:main with commit dd70582 Dec 23, 2025
50 of 51 checks passed
@LangInteger LangInteger mentioned this pull request Dec 24, 2025
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Review effort 2/5 status: changes-required Pull requests that are not yet complete

Projects

None yet

Development

Successfully merging this pull request may close these issues.

All the associated files should be displayed for broken linked files instead of only the first one

3 participants