Skip to content

implement auto-link#14710

Merged
calixtus merged 38 commits into
JabRef:mainfrom
LangInteger:fix-for-issue-9798
Dec 31, 2025
Merged

implement auto-link#14710
calixtus merged 38 commits into
JabRef:mainfrom
LangInteger:fix-for-issue-9798

Conversation

@LangInteger

@LangInteger LangInteger commented Dec 24, 2025

Copy link
Copy Markdown
Contributor

User description

Closes #9798

I implemented the feature - auto-link a file to a broken linked file in a BibEntry. The feature stands at the intersection of Quality - Automatically set file links and Entry Editor - Automatically search and show unlinked files in the entity editor as the two features share the code to find associated files using the function findAssociatedNotLinkedFiles in AutoSetFileLinksUtil.java.

Before #13326, the associated files are only searched by three variants of citation-index-based strategy configured by user (see doc auto-linking-files). #13326 expands the search scope to also consider broken linked file names. It has a flaw that it only returns the first found file based on broken linked file names, which is fixed by #14700. #13326 does not fix the problem that the associated files are never used to auto-link to the broken linked file, but arbitrarily add them as new linked files, which will be addressed by this PR. For details, check the function doLinkAssociateFiles in AutoSetFileLinksUtil.java.

Steps to test

Many unit tests for the function AutoSetFileLinksUtil.linkAssociatedFiles are created in AutoSetFileLinksUtilTest. They are categorized into the following groups. Most categories also consider cases where auto-link should not work.

  • Part 1. test auto link only by citation key
    • Part 1.1 test auto link only by citation key - CitationKeyDependency.START
    • Part 1.2 test auto link only by citation key - CitationKeyDependency.EXACT
    • Part 1.3 test auto link only by citation key - CitationKeyDependency.REGEX (test cases omitted for this)
  • Part 2. test auto link only by file name
  • Part 3. test auto link by file name and citation key

Required files for the three manual test cases are put in the following directory

  • test-support/src/manual-tests/issue-9798/example1: the minimal case introduced in If user moved file, it should simply be relinked - [Find Unlinked Files part] #9798, where the broken linked file is auto-linked with files searched via the citation key
  • test-support/src/manual-tests/issue-9798/example2: the case shows that multiple entries can be auto-linked at the same time. It also shows that auto-link may create extra linked files if they are found by citation index and not used to fix broken linked files.
  • test-support/src/manual-tests/issue-9798/example3: this case shows a broken linked file is auto-linked with files searched via the broken linked file names

The recording of the three manual tests is here:

Screen.Recording.2025-12-24.at.23.39.06.mov

Mandatory checks

  • I own the copyright of the code submitted and I license it under the MIT license
  • I manually tested my changes in running JabRef (always required)
  • I added JUnit tests for changes (if applicable)
  • I added screenshots in the PR description (if change is visible to the user)
  • I described the change in CHANGELOG.md in a way that is understandable for the average user (if change is visible to the user)
  • I checked the user documentation: The information is availabe at doc - auto-linking-files but not up to date. A related issue user-documentation/issues/369 to improve auto-link documentation has already been created earlier. The detail about finding associated files by broken linked file names should be added there.

PR Type

Enhancement, Bug fix


Description

  • Auto-link broken linked files to their actual locations when files are moved

  • Implement two-step auto-linking: first by citation key, then by broken file names

  • Only auto-link files with matching extensions to prevent incorrect associations

  • Add comprehensive unit tests covering citation key and file name matching scenarios


Diagram Walkthrough

flowchart LR
  A["BibEntry with<br/>broken linked files"] -->|Step 1: Match by<br/>citation key| B["Find associated<br/>files"]
  B -->|Auto-link if<br/>extension matches| C["Update broken<br/>linked files"]
  C -->|Step 2: Match by<br/>broken file names| D["Find remaining<br/>associated files"]
  D -->|Auto-link if<br/>extension matches| E["Final linked<br/>files list"]
  E -->|Add unlinked<br/>files| F["Complete file<br/>list"]
Loading

File Walkthrough

Relevant files
Dependencies
1 files
module-info.java
Add Apache Commons Lang3 dependency                                           
+1/-0     
Enhancement
4 files
AutoLinkFilesAction.java
Update callback to handle multiple linked files                   
+20/-7   
AutoSetFileLinksUtil.java
Implement two-step auto-linking with file extension validation
+188/-74
BrokenLinkedFileNameBasedFileFinder.java
New file finder based on broken linked file names               
+60/-0   
FileFinders.java
Add factory method for broken linked file finder                 
+7/-0     
Tests
3 files
AutoSetFileLinksUtilTest.java
Add comprehensive unit tests for auto-linking functionality
+746/-6 
issue-9798_2.bib
Add manual test case for multiple entries                               
+13/-0   
issue-9798_3.bib
Add manual test case for broken file name matching             
+8/-0     
Documentation
2 files
CHANGELOG.md
Document auto-link fix for moved files                                     
+1/-0     
JabRef_en.properties
Add localization string for affected entries                         
+1/-0     
Additional files
1 files
issue-9798_1.bib [link]   

@github-actions github-actions Bot added good second issue Issues that involve a tour of two or three interweaved components in JabRef status: changes-required Pull requests that are not yet complete labels Dec 24, 2025
import org.jabref.model.entry.BibEntry;
import org.jabref.model.entry.LinkedFile;

import org.jetbrains.annotations.NotNull;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use jspecify nonnull

@LangInteger LangInteger changed the title implement auto-relink implement auto-link Dec 24, 2025
// Wait because there are several rounds in one auto-link operation
// The later round depends on the updated bibEntry of the previous round
UiTaskExecutor.runAndWaitInJavaFXThread(() -> entry.setFiles(newLinkedFiles));
};

@LangInteger LangInteger Dec 25, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original onLinkedFile is not reasonable, considering that we not only add new linked files, but also update old ones, and multiple linked files may be updated at the same time. So it is changed to onLinkedFilesUpdated.

Also, onLinkedFile creates the undoable object by assigning the newVal using the only passed-in linked file, which does not make sense.

Comment thread jabgui/src/main/java/org/jabref/gui/externalfiles/AutoLinkFilesAction.java Outdated
Comment thread jabgui/src/main/java/org/jabref/gui/externalfiles/AutoSetFileLinksUtil.java Outdated
@github-actions github-actions Bot removed the status: changes-required Pull requests that are not yet complete label Dec 25, 2025
* one trick: we accumulate Part B after Step 1, so Part A does not affect those broken linked files we use
* to query Part B
*/
private void doLinkAssociateFiles(BibEntry entry, BiConsumer<List<LinkedFile>, BibEntry> onAddLinkedFile, LinkFilesResult result) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this function, I have to call onAddLinkedFile.accept multiple times (with UiTaskExecutor.runAndWaitInJavaFXThread) because the updated BibEntry after the first auto-link step is needed for the second auto-link step.

This also affects the Undo Manager behavior that multiple undo actions may be added in the auto-link process for one entry.

Probably a better way is to isolate the BibEntry from the code here with another layer of POJO to do non-UI operations.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not take the high-level AI-suggestion to update the code here.

Comment on lines 106 to +227
public Collection<LinkedFile> findAssociatedNotLinkedFiles(BibEntry entry) throws IOException {
List<LinkedFile> associatedNotLinkedFiles = new ArrayList<>();

List<String> extensions = externalApplicationsPreferences.getExternalFileTypes().stream().map(ExternalFileType::getExtension).toList();

LOGGER.debug("Searching for extensions {} in directories {}", extensions, directories);

// Run the search operation
FileFinder fileFinder = FileFinders.constructFromConfiguration(autoLinkPreferences);
List<Path> result = new ArrayList<>(fileFinder.findAssociatedFiles(entry, directories, extensions));
result.addAll(findAssociatedFilesByBrokenLinkedFile(entry));

// Collect the found files that are not yet linked
List<Path> linkedFiles = entry.getFiles().stream()
.map(file -> file.findIn(directories))
.filter(Optional::isPresent)
.map(Optional::get)
.toList();
for (Path foundFile : result) {
boolean fileAlreadyLinked = linkedFiles.stream()
.anyMatch(linked -> {
try {
return Files.isSameFile(linked, foundFile);
} catch (IOException e) {
LOGGER.debug("Unable to check file identity, assuming no identity", e);
return false;
}
});

if (!fileAlreadyLinked) {
Optional<ExternalFileType> type = FileUtil.getFileExtension(foundFile)
.map(extension -> ExternalFileTypes.getExternalFileTypeByExt(extension, externalApplicationsPreferences))
.orElse(Optional.of(new UnknownExternalFileType("")));

String strType = type.map(ExternalFileType::getName).orElse("");
Path relativeFilePath = FileUtil.relativize(foundFile, directories);
LinkedFile linkedFile = new LinkedFile("", relativeFilePath, strType);
associatedNotLinkedFiles.add(linkedFile);
LOGGER.debug("Found file {} for entry {}", linkedFile, entry.getCitationKey());
}
}
// Find the associated files
List<String> extensions = getConfiguredExtensions();
LOGGER.debug("Searching for associated not linked files with extensions {} in directories {}", extensions, directories);
return Stream.concat(
findAssociatedNotLinkedFilesWithFinder(entry, preConfiguredFileFinder, extensions).stream(),
findAssociatedNotLinkedFilesWithFinder(entry, brokenLinkedFileNameBasedFileFinder, extensions).stream())
.distinct()
.toList();
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish it were much clearer by writing code like this to show that we have two sources of associated files.
And the new BrokenLinkedFileNameBasedFileFinder.java also helped this purpose.

Comment on lines +189 to +191
//******************************************************************************************************
//*********************************** Test linkAssociatedFiles *****************************************
//******************************************************************************************************

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following test cases are grouped by commets like this

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use nested tests to structure your tests more easily.
https://docs.junit.org/6.0.1/writing-tests/nested-tests.html

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I’ve already applied @Nested; with code folding, the test structure is clean and clear.

Comment on lines +45 to +46
// 1. it has the same name (case-insensitive) as one of the broken linked file names
// 2. its file extension matches one specified in extensions

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we do not consider a strict file extension match as:

  • Entry Editor - Automatically search and show unlinked files in the entity editor may want to show those file with same name but different extension as possible candidate
  • Quality - Automatically set file links will make sure only file with the same name and extention is auto linked. The logic is in autoLinkBrokenLinkedFiles

@LangInteger LangInteger marked this pull request as ready for review December 25, 2025 06:58
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Dec 25, 2025

Copy link
Copy Markdown
Contributor

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Unbounded directory traversal

Description: The code uses Files.walk(directory) without depth limit, which could lead to denial of
service through directory traversal attacks if an attacker can control the directory
structure with deeply nested folders or symbolic links creating cycles.
AutoSetFileLinksUtil.java [243-248]

Referred Code
    return associatedFiles
            .stream()
            .filter(associatedFile -> !isFileAlreadyLinked(associatedFile, linkedFiles))
            .map(this::buildLinkedFileFromPath)
            .toList();
}
Ticket Compliance
🟢
🎫 #9798
🟢 When a user moves a file in the file directory without JabRef, the file should be
automatically relinked instead of being re-imported
The functionality should be in "Automatically set file links" (Quality -> Automatically
set file links)
If a file was moved and it is absolutely sure (i.e., no two files with the same name), the
link should be updated
If a file exists multiple times, no rewrite should be done
The system should recognize that a linked file is missing, find the moved file, and
rewrite the link
High testing effort is required with multiple test cases for different scenarios
🟡
🎫 #13109
🔴 Add Pseudonymization functionality to CLI
Implementation should be similar to the consistency check CLI experience
Reference the class `org.jabref.cli.CheckConsistency` for implementation guidance
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status:
Generic variable names: Variables like files, result, updated lack descriptive context about what they represent
in the auto-linking process

Referred Code
    Map<String, LinkedFile> files;
    try {
        files = findAssociatedNotLinkedFilesWithUniqueName(entry, finder);
    } catch (IOException e) {
        result.addFileException(e);
        LOGGER.error("Problem finding files", e);
        files = Map.of();
    }
    return files;
}

private Map<String, LinkedFile> findAssociatedNotLinkedFilesWithUniqueName(BibEntry entry, FileFinder finder) throws IOException {
    Collection<LinkedFile> files = findAssociatedNotLinkedFilesWithFinder(entry, finder, getConfiguredExtensions());

    Set<String> toBeRemoved = new HashSet<>();
    Map<String, LinkedFile> result = new HashMap<>();

    files.forEach(file -> {
        String fileName = FileUtil.getBaseName(file.getLink());
        if (!result.containsKey(fileName)) {
            result.put(fileName, file);


 ... (clipped 8 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Silent error handling: IOException is caught and logged but processing continues without informing the caller
about partial failures in file association

Referred Code
try {
    files = findAssociatedNotLinkedFilesWithUniqueName(entry, finder);
} catch (IOException e) {
    result.addFileException(e);
    LOGGER.error("Problem finding files", e);
    files = Map.of();
}
return files;

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Path traversal risk: File paths from BibEntry are used in file system operations without apparent validation
against directory traversal attacks

Referred Code
private static @NonNull Set<String> getBrokenLinkedFileNames(@NonNull BibEntry entry, @NonNull List<Path> directories) {
    return entry.getFiles().stream()
                .filter(linkedFile -> linkedFile.findIn(directories).isEmpty())
                .map(linkedFile -> FileUtil.getBaseName(linkedFile.getLink()).toLowerCase())
                .collect(Collectors.toSet());
}

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@subhramit subhramit added the status: awaiting-second-review For non-trivial changes label Dec 28, 2025
@subhramit subhramit requested a review from koppor December 28, 2025 21:24
koppor
koppor previously requested changes Dec 28, 2025

@koppor koppor left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many file movals at test-support/src/manual-tests/ - I would suggest to add a README.md to the directory of how to test.

The "best" software engineering update would be to create a section in docs/requirements/files.md for this automatic re-link - and link the implementation and test case(s)

Comment on lines +90 to +108
/**
* Source of associated not linked files:
* Part A. match file name with CitationKey, (START, EXACT, REGEX) configured by user
* Part B. match file name with broken linked file names, currently silently happen
*
* The auto-link process:
* Prolog: we only consider the file with a unique name
* if a file's name is found multiple times in Part A, we do not consider it in Step 1
* if a file's name is found multiple times in Part B, we do not consider it in Step 2
* Step 1. try to auto link each broken linked file with Part A at first. For unlinked files left in Part A, if no
* linked file with same name exists, add them
* why `add`: Part A are found mainly by CitationKey, which has strong connection to the entry, so we are
* confident that we should add them automatically
* Step 2. try to auto link each broken linked file with Part B.
* how about `unlinked files left in Part B`: there should be no files left as they are found based on broken
* linked files and are used to fix them. `files left` hints a bug
* one trick: we accumulate Part B after Step 1, so Part A does not affect those broken linked files we use
* to query Part B
*/

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use markdown JavaDoc to keep formatting

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. Used markdown JavaDoc instead

dialogService.notify(String.format("%s %s\n%s",
Localization.lang("Finished automatically setting external links."),
Localization.lang("Changed %0 entries.", result.getChangedEntries().size()),
Localization.lang("Affected entries: %0", result.getChangedEntries().stream()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please recheck our localiaztion howto - use entry(s) because there could be one, two, more, much more.

Also for the string the line above.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

*/
private void doLinkAssociatedFiles(BibEntry entry, BiConsumer<List<LinkedFile>, BibEntry> onAddLinkedFile, LinkFilesResult result) {
boolean entryUpdated = false;
// Step 1: try matched files based on CitationKey configured by user

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty line before to separate?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty line added

// It is a bug if there are files left
if (!files.isEmpty()) {
LOGGER.error("Cannot auto-link all the files found based on broken linked file names. Files left: {} ",
files.keySet().stream().map(files::get).map(LinkedFile::getLink).collect(Collectors.joining("||")));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange format, use

["{filename}", "{filename}", "{filename}"]

Easily can be achived with joining.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line has been removed. I confirmed that there are normal cases where some associated files may not be used to do auto-link.

@github-actions github-actions Bot added status: changes-required Pull requests that are not yet complete and removed status: awaiting-second-review For non-trivial changes labels Dec 28, 2025
@LangInteger

LangInteger commented Dec 29, 2025

Copy link
Copy Markdown
Contributor Author

Many file movals at test-support/src/manual-tests/ - I would suggest to add a README.md to the directory of how to test.

The "best" software engineering update would be to create a section in docs/requirements/files.md for this automatic re-link - and link the implementation and test case(s)

@koppor
I have added:

  • part 1: a readme file inside test-support/src/manual-tests/ to explain how to run the manual test cases
  • part 2: a section in docs/requirements/files.md

I have two questions for part 2:

Q1: I saw some requirements that end with the line Needs: impl (even the feature has been implemented). This is not necessary, right?
Q2: I added my req-id to both the code implements it with /// [impl->req~logic.externalfiles.file-transfer.auto-link~1] and the code test it with /// Tests req~logic.externalfiles.file-transfer.auto-link~1. I can see the req-id can be used to find the corresponding code. Is that the only purpose?

I have read through https://devdocs.jabref.org/requirements/

@github-actions github-actions Bot removed the status: changes-required Pull requests that are not yet complete label Dec 29, 2025

@subhramit subhramit left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@calixtus calixtus enabled auto-merge December 31, 2025 22:29
@calixtus calixtus added this pull request to the merge queue Dec 31, 2025
Merged via the queue into JabRef:main with commit b2aea81 Dec 31, 2025
62 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component: external-files good second issue Issues that involve a tour of two or three interweaved components in JabRef 📌 Pinned Review effort 4/5

Projects

None yet

Development

Successfully merging this pull request may close these issues.

If user moved file, it should simply be relinked - [Find Unlinked Files part]

4 participants