fix(ocr): handle empty language probabilities for numeric input #1004
Merged
When performing OCR on text containing only numbers, the AppleLanguageDetector returns an empty `rawProbabilities` dictionary. This unhandled case caused the AppleOCREngine to fail, resulting in an empty recognition result.

This patch addresses the issue by updating the `smartMerging` condition in the OCR engine to be true if `rawProbabilities` is empty. This ensures that purely numeric text is processed correctly.

Additionally, a comment has been added to `AppleLanguageDetector` to clarify that `rawProbabilities` can be empty in edge cases like this.

Closes: #1001
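
For readers without the diff at hand, here is a minimal sketch of the kind of guard described above. It is not the code from `AppleOCREngine.swift`: the function name `shouldUseSmartMerging`, the `confidenceThreshold` parameter, and the "top probability" rule for the non-empty case are all assumptions made for illustration. Only the empty-dictionary branch reflects what this PR states.

```swift
// Hypothetical stand-in for the detector output: language code -> probability.
typealias LanguageProbabilities = [String: Double]

/// Decides whether smart merging should run (illustrative only).
/// - `confidenceThreshold` and the non-empty rule below are assumptions,
///   not taken from the Easydict source.
func shouldUseSmartMerging(
    rawProbabilities: LanguageProbabilities,
    confidenceThreshold: Double = 0.5
) -> Bool {
    // Behavior described in this PR: an empty dictionary (for example,
    // digits-only text) must enable smart merging instead of failing.
    if rawProbabilities.isEmpty {
        return true
    }
    // Assumed rule for the non-empty case: require one confident language.
    let topProbability = rawProbabilities.values.max() ?? 0
    return topProbability >= confidenceThreshold
}
```

With this shape, a digits-only input whose probabilities are `[:]` takes the early `return true` path, while ordinary text still goes through whatever confidence rule the engine applies.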
Pull Request Overview
This PR fixes an issue where the AppleOCREngine would fail when processing text containing only numbers due to empty language probabilities returned by AppleLanguageDetector.
- Updated the smartMerging condition to handle empty rawProbabilities
- Added documentation to clarify that rawProbabilities can be empty in edge cases
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| AppleOCREngine.swift | Added condition to handle empty rawProbabilities in smartMerging logic |
| AppleLanguageDetector.swift | Added documentation comments explaining when rawProbabilities can be empty |
Easydict/Swift/Service/Apple/AppleLanguageDetector/AppleLanguageDetector.swift
…geDetector.swift Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Adds two new OCR test cases (`enNumber729`, `enNumberPi`) that use images containing only numbers. These tests verify the fix for handling purely numeric input, ensuring that the OCR engine correctly processes numbers without failing due to empty language probabilities. Additionally, this commit refactors `SystemUtilitiesTests` to use the `SystemUtility.shared` singleton instead of a global function, improving code consistency.