Improve visual context OCR quality#482
Merged
Merged
Conversation
SwiftLint (--strict) flagged generateContext at cyclomatic complexity 11 (limit 10). Extract the summarizer-vs-OCR fallback branch into resolvedContextText(...), a behavior-preserving helper that drops the function to ~8 and reads more clearly. XcodeGen drift: the project.pbxproj had hand-written synthetic object IDs (AA1000...) for the three new files instead of being regenerated. Ran 'xcodegen generate' so the committed project matches project.yml (hash-based IDs); diff vs main is now only the three new file references. Sparkle stays pinned exactVersion 2.9.1.
sanitizeOCR's noise filter only treated a token as real when it had an ASCII vowel or matched the English word lists, so CJK, Cyrillic, Greek, Arabic, Hebrew, Thai, and accented-Latin tokens were stripped to nothing. Non-English users would get empty visual context and fully generic suggestions. assessOCRToken now keeps any token carrying a non-ASCII letter as strong signal, after numbers and repeated-glyph junk are rejected so this is not a backdoor for ASCII garbage. The Latin-only tail moved into assessLatinToken to keep cyclomatic complexity under the limit; ASCII behavior is unchanged (a count<=2 token can never be repeated-glyph junk, so the reordering is a no-op for it). Adds regression tests for CJK/Cyrillic/accented-Latin preservation and for ASCII noise still being dropped on a mixed line.
…ality # Conflicts: # Cotabby.xcodeproj/project.pbxproj
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Improve the screenshot/OCR visual-context pipeline so Cotabby sends richer, sanitized prompt context instead of dropping context when summarization fails. The change keeps Screen Recording required, expands OCR capture quality and budgets, adds deterministic OCR-noise filtering, and replaces the generic summary prompt with an autocomplete-focused context extraction prompt.
Validation
plutil -lint Cotabby.xcodeproj/project.pbxprojCotabby.xcodeproj/project.pbxproj: OKxcodebuild -project Cotabby.xcodeproj -scheme Cotabby -destination platform=macOS build-for-testing -derivedDataPath build/DerivedData** TEST BUILD SUCCEEDED **xcodebuild -project Cotabby.xcodeproj -scheme Cotabby -destination platform=macOS test -only-testing:CotabbyTests/PromptContextSanitizerTests -only-testing:CotabbyTests/ScreenshotContextGeneratorTests -only-testing:CotabbyTests/VisualContextSummaryPromptRendererTests -derivedDataPath build/DerivedDataCotabbyTests.xctestbundle because the test bundle and host app have different Team IDs. This matches the known local signing failure for app-hosted tests.rm -rf build/DerivedDataLinked issues
None.
Risk / rollout notes
Greptile Summary
This PR overhauls the visual-context OCR pipeline to deliver richer, sanitized prompt context for autocomplete instead of silently dropping context when summarization fails. Capture budgets are expanded (larger screenshot dimensions, 5 000-char OCR cap, 6 s timeout, Vision accurate mode), and a new deterministic token-scoring pass in
PromptContextSanitizerreplaces the previous heuristic with layered rules that keep prose, technical tokens, emails, file paths, and non-Latin scripts while dropping OCR garbage.ScreenshotContextGenerator.resolvedContextTextnow falls back to bounded, sanitized OCR text whenever summarization fails or produces an empty result, rather than propagating the error and losing all visual context.WindowScreenshotCapturingandScreenTextExtractingprotocol seams letScreenshotContextGeneratorbe exercised with stub implementations, enabling the newScreenshotContextGeneratorTeststo cover every key path without requiring live capture or a GGUF model.VisualContextSummaryPromptRendererextracts the summarization prompt into a dedicated, testable type with explicit priority ordering and prompt-injection rejection instructions.Confidence Score: 5/5
Safe to merge; the pipeline changes are well-contained, all fallback paths return bounded sanitized text rather than empty context, and new protocol seams enable meaningful unit coverage.
The OCR fallback logic and token-scoring rules are the most complex additions, and both are exercised by new unit tests. The summarizer now propagates errors instead of swallowing them, but resolvedContextText catches every error and falls back gracefully, so no regression path was opened. Expanded capture budgets increase latency but are isolated to the visual-context path and explicitly documented.
PromptContextSanitizer.swift — the isLikelyShortMixedCaseNoise heuristic (already flagged in a previous review round) still false-positives on lowercase-initial brand names common on macOS screenshots (iPhone, macOS, iCloud, UIKit). No other files require special attention.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A[FocusedInputSnapshot] --> B[captureScreenshot WindowScreenshotCapturing] B --> C{OCR extractText} C -- noRecognizedText --> D{windowTitle available?} D -- no --> ERR1[throw unavailable] D -- yes --> E[normalizeRecognizedText windowTitle] E --> F{hasMeaningfulSignal?} F -- no --> ERR1 F -- yes --> RETURN1[return VisualContextExcerpt source ocr_fallback] C -- success --> G[normalizeRecognizedText sanitizeOCR cap 5000 chars] G --> H{hasMeaningfulSignal normalizedText?} H -- no --> ERR2[throw unavailable] H -- yes --> I[resolvedContextText] I --> J{summarizer configured?} J -- no --> K[ocrFallback boundedSummaryText] J -- yes --> L[summarizer.summarize VisualContextSummaryPromptRenderer 6s timeout 160 tokens] L -- throws --> K L -- empty result --> K L -- success --> M[boundedSummaryText cap 1500 chars] M --> N{hasMeaningfulSignal summary?} N -- no --> K N -- yes --> RETURN2[return VisualContextExcerpt source summary] K --> O{hasMeaningfulSignal fallback?} O -- no --> ERR2 O -- yes --> RETURN3[return VisualContextExcerpt source ocr_fallback]Comments Outside Diff (2)
Cotabby/Support/PromptContextSanitizer.swift, line 614-633 (link)isLikelyShortMixedCaseNoisefalse-positives for lowercase-initial camelCase brand namesThe function is designed to flag OCR garbage like
"gLVWrt"but its final return condition —uppercaseCount >= 2 || !firstCharacterIsUppercase— also catches legitimate tokens where the first letter is lowercase but the token contains any uppercase. Concretely:"iPhone"(letters.first=i, uppercaseCount=1 →!firstCharacterIsUppercase=true→ classified as noise),"macOS"(letters.first=m, uppercaseCount=2 → both clauses true → noise), and"iCloud","iPad","eBay"all follow the same path. None of these appear inknownWordSignalsorpreservedTechnicalTokens, so they reachisLikelyShortMixedCaseNoiseand are silently dropped. On a macOS screenshot tool, references to Apple product names are common OCR output and their removal degrades autocomplete context.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Cotabby/Support/VisualContextSummaryPromptRenderer.swift, line 660-661 (link)sanitizeOCRcall on already-filtered textVisualContextSummaryPromptRenderer.promptis called fromLlamaVisualContextSummarizer.summarize, which receives text that was already processed bynormalizeRecognizedText(i.e.PromptContextSanitizer.sanitizeOCRwith a 5 000-character cap). The secondsanitizeOCRcall here without a character limit runs the full line-scoring logic on text that is already clean, making it a no-op in the production path. The extra pass is harmless and idempotent, but it creates a cost asymmetry: in production the sanitizer runs twice while tests that callpromptdirectly sanitize exactly once. Consider removing the internal call and documenting that callers are expected to pre-sanitize, or keep it but note the redundancy so future callers know the contract.Reviews (2): Last reviewed commit: "Merge remote-tracking branch 'origin/mai..." | Re-trigger Greptile