fix(apr-qa): strengthen Golden Output gate with statistical gibberish detection (#1463)
Merged
… detection
Root cause (Five Whys):
1. Defective Qwen2-0.5B-Instruct inference (CJK/Polish/diacritic byte fragments
like "udaÅĤo", "ëĸ»", "zwiÄħzku") shipped because `apr qa` PASSED.
2. Why did Golden Output PASS gibberish? `verify_output` garbage-pattern check
was hardcoded to 4 strings: ["\u{FFFD}", "[UNK]", "akunji", "olumbia"].
3. Why hardcoded? Patterns came from specific past incidents (LAYOUT-002 "olumbia"
garbage) and were never generalized.
4. Why never generalized? No statistical sanity signal — non-ASCII ratio,
repeated-fragment detection, U+FFFD density.
5. Why dangerous? Any new gibberish class sails through. The QA gate became a
liar — passing models that produce unintelligible non-ASCII output.
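To illustrate steps 2 and 5, here is a minimal sketch of what a 4-string substring check looks like. Only the four literals come from this message; the function name `contains_known_garbage` and its shape are assumptions for illustration, not the actual `verify_output` code.

```rust
/// The four literal patterns quoted in the commit message.
const GARBAGE_PATTERNS: [&str; 4] = ["\u{FFFD}", "[UNK]", "akunji", "olumbia"];

/// Hypothetical reconstruction: flags output only when it contains one of
/// the known literals. Anything else passes, however unreadable.
fn contains_known_garbage(output: &str) -> bool {
    GARBAGE_PATTERNS.iter().any(|p| output.contains(*p))
}
```

Under this sketch, fragments such as "udaÅĤo" or "zwiÄħzku" match none of the literals and pass, while ordinary text containing "Columbia" would be flagged because it contains "olumbia".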
Fix: add `detect_gibberish` with three statistical signals, ANY trip rejects:
- Non-ASCII ratio > 60% over 16+ chars (English/code outputs are ASCII-heavy)
- 4+ byte fragment repeated 3+ times consecutively (BPE/loop pathology)
- U+FFFD density > 1 per 32 chars (UTF-8 decode failures)
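The three signals above can be sketched roughly as follows. This is not the merged implementation: the `Gibberish` enum, the `has_repeated_fragment` helper, the return type, and the order of checks are all assumptions; only the function name `detect_gibberish` and the thresholds come from this message.

```rust
/// Which signal tripped (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum Gibberish {
    NonAsciiRatio,
    RepeatedFragment,
    ReplacementDensity,
}

/// Returns the first signal that trips, or None if the text looks sane.
fn detect_gibberish(text: &str) -> Option<Gibberish> {
    let total = text.chars().count();

    // Signal 1: non-ASCII ratio > 60%, only judged over 16+ chars.
    let non_ascii = text.chars().filter(|c| !c.is_ascii()).count();
    if total >= 16 && non_ascii * 100 > total * 60 {
        return Some(Gibberish::NonAsciiRatio);
    }

    // Signal 2: a fragment of 4+ bytes repeated 3+ times back to back.
    if has_repeated_fragment(text.as_bytes(), 4, 3) {
        return Some(Gibberish::RepeatedFragment);
    }

    // Signal 3: more than one U+FFFD per 32 chars.
    let fffd = text.chars().filter(|&c| c == '\u{FFFD}').count();
    if fffd * 32 > total {
        return Some(Gibberish::ReplacementDensity);
    }

    None
}

/// Hypothetical helper: brute-force search for `min_reps` consecutive copies
/// of any byte fragment of length >= `min_len`.
fn has_repeated_fragment(bytes: &[u8], min_len: usize, min_reps: usize) -> bool {
    let max_len = bytes.len() / min_reps;
    for len in min_len..=max_len {
        for start in 0..=bytes.len() - len * min_reps {
            let frag = &bytes[start..start + len];
            let repeats = (1..min_reps)
                .all(|k| &bytes[start + k * len..start + (k + 1) * len] == frag);
            if repeats {
                return true;
            }
        }
    }
    false
}
```

In this sketch the repeated-fragment test string trips signal 2 rather than signal 1, since its non-ASCII ratio stays well under 60%. Note that a naive byte-level repeat check like this one would also flag twelve consecutive spaces of indentation; a real gate would likely need to exempt whitespace runs.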
Tests added (all passing):
- verify_output_rejects_qwen2_05b_observed_gibberish — captured real string
- verify_output_rejects_repeated_fragment — "udaÅĤo udaÅĤo udaÅĤo udaÅĤo end"
- verify_output_accepts_normal_english — regression guard
Caveat: this PR strengthens the GATE, not the underlying inference defect.
Qwen2-0.5B short-prompt inference still produces gibberish; downstream PR
will bisect that defect now that the gate can no longer hide it.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ddb6a1a to
b8f5eb7
Summary
- `apr qa` Golden Output gate had a hardcoded 4-string garbage list (`["\u{FFFD}", "[UNK]", "akunji", "olumbia"]`) that missed every new gibberish class
- `detect_gibberish` with 3 statistical signals (non-ASCII ratio > 60%, 4+ byte fragment repeated 3+ times, U+FFFD density)
Five Whys (in commit msg ddb6a1a)
- `apr qa` Golden Output PASSED.
- `verify_output` garbage check used 4 hardcoded strings.
Honest scope
- … `apr qa` run because the gate's internal 512-token generation produces different (apparently sufficiently-ASCII) output than the manual 16-token test that captured the gibberish
Test plan
🤖 Generated with Claude Code