OCR integration#13313
Conversation
Tip for future - always take a fresh pull from upstream/main before beginning to work on a branch (if there has been a decent time gap). |
|
Your code currently does not meet JabRef's code guidelines. We use Checkstyle to identify issues. You can see which checks are failing by locating the box "Some checks were not successful" on the pull request page. To see the test output, locate "Tests / Checkstyle (pull_request)" and click on it. In case of issues with the import order, double check that you activated Auto Import. You can trigger fixing imports by pressing Ctrl+Alt+O to trigger Optimize Imports. Please carefully follow the setup guide for the codestyle. Afterwards, please run checkstyle locally and fix the issues, commit, and push. |
9f91505 to
db1f577
Compare
|
|
||
| this.ocrService = new OcrService(filePreferences); | ||
|
|
||
| // Log if OCR is not available but don't fail |
There was a problem hiding this comment.
The comment simply restates what the code does in the next lines. It doesn't provide additional information about why this approach was chosen or any other valuable context.
| } | ||
| return false; // nothing usable found | ||
| } | ||
|
|
There was a problem hiding this comment.
Catching generic Exception is too broad and masks specific issues. Should catch specific exceptions like IOException or SecurityException for better error handling.
There was a problem hiding this comment.
Yes, trag bot probably selected wrong lines, but overall he's right, the Exception is too generic
| Localization.lang("Failed to initialize AI service. OCR functionality will be unavailable.") + "\n\n" + | ||
| Localization.lang("Error details: ") + e.getMessage()); | ||
| // Set aiService to null - the application can continue without AI features | ||
| JabRefGUI.aiService = null; |
There was a problem hiding this comment.
No, it can't.
Trust me. It must be present and not null.
There was a problem hiding this comment.
I think this is outdated, right?
| dialogService, | ||
| taskExecutor); | ||
| Injector.setModelOrService(AiService.class, aiService); | ||
| } catch (OcrException e) { |
There was a problem hiding this comment.
Can you remove the throw of exception and not use them?
There was a problem hiding this comment.
I think this is outdated, right?
| ); | ||
|
|
||
| // Add OCR menu item for PDF files | ||
| if (linkedFile.getFile().getFileType().equalsIgnoreCase("pdf")) { |
There was a problem hiding this comment.
Try to search a method in class FilesUtil (or FileUtil) like isPDF
| } | ||
|
|
||
| themeWindowManager.setDarkModeForWindowFrame(stage, darkMode); | ||
| // themeWindowManager.setDarkModeForWindowFrame(stage, darkMode); |
There was a problem hiding this comment.
But this doesn't solve the original problem? Shouldn't you made a try/catch, so that other people who use JabRef could still use dark theme, if for some other machines it doesn't work
| NotificationService notificationService, | ||
| TaskExecutor taskExecutor | ||
| ) { | ||
| ) throws OcrException { |
There was a problem hiding this comment.
I think this is outdated, right?
| /** | ||
| * Factory method for creating a success result. | ||
| */ | ||
| static OcrResult success(@NonNull String text) { |
There was a problem hiding this comment.
Remove @NonNull and always assume that the input is not null
| } | ||
| return false; // nothing usable found | ||
| } | ||
|
|
There was a problem hiding this comment.
Yes, trag bot probably selected wrong lines, but overall he's right, the Exception is too generic
| getBoolean(OPEN_FILE_EXPLORER_IN_LAST_USED_DIRECTORY)); | ||
| getBoolean(OPEN_FILE_EXPLORER_IN_LAST_USED_DIRECTORY), | ||
| // TODO: replace placeholder once real source tag / persistence is implemented | ||
| ""); |
There was a problem hiding this comment.
I think you are able to handle this. This is not hard
| // Set the action to execute when clicked | ||
| ocrItem.setOnAction(event -> ocrAction.execute()); | ||
|
|
||
| // Disable if the action is not executable (file doesn't exist) |
There was a problem hiding this comment.
Both comments are redundant and simply describe what the code does without providing additional context or reasoning for the implementation choices.
| /** | ||
| * Factory method for creating a success result with text and output file. | ||
| */ | ||
| static OcrResult success(@NonNull String text, Path outputFile) { |
There was a problem hiding this comment.
The outputFile parameter lacks @nonnull annotation, and the method doesn't validate if the path is null before wrapping it in Optional.of(), which could cause NullPointerException.
| */ | ||
| record Failure(String errorMessage) implements OcrResult { | ||
| public Failure { | ||
| // Provide default message instead of throwing exception |
There was a problem hiding this comment.
The comment merely restates what the code does without providing additional insight or reasoning. Comments should add new information not plainly derived from the code.
| public boolean skipTextPages = true; | ||
| public boolean clean = false; | ||
| public boolean forceOcr = false; | ||
| public String language = "eng"; | ||
| public int optimizeLevel = 1; |
There was a problem hiding this comment.
Fields in OcrMyPdfOptions are declared public, violating encapsulation principles. They should be private with getter/setter methods to maintain proper encapsulation.
| * | ||
| * @return Configuration error message, or empty string if properly configured | ||
| */ | ||
| String getConfigurationError(); |
There was a problem hiding this comment.
Method returns String which could be null. According to modern Java practices and clean architecture, Optional should be used to handle potential empty states.
| * @param method The method to use for creating the searchable PDF | ||
| * @return The result of the operation | ||
| */ | ||
| public OcrResult createSearchablePdf(Path inputPdfPath, Path outputPdfPath, OcrMethod method) { |
There was a problem hiding this comment.
Method parameters lack null validation. All parameters should be validated against null at the start of the method to prevent NullPointerException.
| * Gets the list of default tessdata paths based on the operating system. | ||
| */ | ||
| private List<String> getDefaultTessdataPaths() { | ||
| List<String> paths = new ArrayList<>(); |
There was a problem hiding this comment.
Usage of ArrayList constructor instead of modern Java collections factory methods. Should use List.of() and new methods for building collections.
|
Closing this for housekeeping. Maybe if you or sbdy else is interested in picking this up again, this can be reopened. |
|
hey..I would like to work on this issue . Can I know why it is close without integration ? As far as I could understand the approach had a lot native dependency issues , are you planning on continuing the same path for the feature or can we have a seperate API for OCR integration , like either an integrated CLI or a 3rd party service ? |
|
This is not an issue, but was an unfinished gsoc project. |

Closes #13267.
Mandatory checks
CHANGELOG.mddescribed in a way that is understandable for the average user (if change is visible to the user)