fix(ocr): handle empty language probabilities for numeric input #1004

Merged
AkaShark merged 3 commits into dev from fix-ocr-numbers on Oct 3, 2025

@tisfeng (Owner) commented Oct 3, 2025

When performing OCR on text containing only numbers, the `AppleLanguageDetector` returns an empty `rawProbabilities` dictionary. This unhandled case caused the `AppleOCREngine` to fail, resulting in an empty recognition result.

This patch addresses the issue by updating the `smartMerging` condition in the OCR engine to be true if `rawProbabilities` is empty. This ensures that purely numeric text is processed correctly.

Additionally, a comment has been added to `AppleLanguageDetector` to clarify that `rawProbabilities` can be empty in edge cases like this.

Closes: #1001
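
For readers without the diff at hand, the change described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: only `rawProbabilities` and `smartMerging` come from the description, while `shouldUseSmartMerging`, `confidenceThreshold`, and the confidence check standing in for the pre-existing condition are assumptions made for the example.

```swift
// Hypothetical sketch: every name except `rawProbabilities` and `smartMerging`
// is illustrative and not taken from the PR diff.

/// Stand-in for the detector output: language code mapped to confidence.
typealias LanguageProbabilities = [String: Double]

/// Decides whether the engine should take the smart-merging path.
/// The threshold check is an assumed placeholder for whatever condition
/// the engine already used before this patch.
func shouldUseSmartMerging(
    rawProbabilities: LanguageProbabilities,
    confidenceThreshold: Double = 0.5
) -> Bool {
    // Case added by the patch: digits-only text yields no language candidates.
    if rawProbabilities.isEmpty {
        return true
    }
    // Assumed pre-existing case: the top candidate is confident enough.
    let topConfidence = rawProbabilities.values.max() ?? 0
    return topConfidence >= confidenceThreshold
}

let numericOnly: LanguageProbabilities = [:]
let english: LanguageProbabilities = ["en": 0.93, "de": 0.04]
let uncertain: LanguageProbabilities = ["en": 0.2, "fr": 0.18]

print(shouldUseSmartMerging(rawProbabilities: numericOnly)) // true
print(shouldUseSmartMerging(rawProbabilities: english))     // true
print(shouldUseSmartMerging(rawProbabilities: uncertain))   // false
```

The point of the sketch is that an empty dictionary is treated as a valid input with its own branch, rather than falling through a check that implicitly expects at least one language candidate.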

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR fixes an issue where the AppleOCREngine would fail when processing text containing only numbers due to empty language probabilities returned by AppleLanguageDetector.

  • Updated the smartMerging condition to handle empty rawProbabilities
  • Added documentation to clarify that rawProbabilities can be empty in edge cases

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| AppleOCREngine.swift | Added condition to handle empty rawProbabilities in smartMerging logic |
| AppleLanguageDetector.swift | Added documentation comments explaining when rawProbabilities can be empty |


tisfeng and others added 2 commits October 3, 2025 16:16
…geDetector.swift

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Adds two new OCR test cases (`enNumber729`, `enNumberPi`) that use images containing only numbers. These tests verify the fix for handling purely numeric input, ensuring that the OCR engine correctly processes numbers without failing due to empty language probabilities.

Additionally, this commit refactors `SystemUtilitiesTests` to use the `SystemUtility.shared` singleton instead of a global function, improving code consistency.
@AkaShark (Collaborator) left a comment

LGTM

@AkaShark merged commit d98de88 into dev on Oct 3, 2025
5 checks passed
@AkaShark deleted the fix-ocr-numbers branch on October 3, 2025 14:50
Development

Successfully merging this pull request may close these issues.

🐞 Bug report: OCR cannot recognize pure numbers

3 participants