fix(apr-qa): strengthen Golden Output gate with statistical gibberish detection by noahgift · Pull Request #1463 · paiml/aprender

noahgift · 2026-05-04T09:18:57Z

Summary

The `apr qa` Golden Output gate had a hardcoded 4-string garbage list (["\u{FFFD}", "[UNK]", "akunji", "olumbia"]) that missed every new gibberish class
Added `detect_gibberish` with 3 statistical signals (non-ASCII ratio > 60% over 16+ chars, 4+ byte fragment repeated 3+ times consecutively, U+FFFD density > 1 per 32 chars)
Captured observed Qwen2-0.5B gibberish ("udaÅĤo", "ëĸ»", "zwiÄħzku") as drift-prevention test cases

Five Whys (in commit msg `ddb6a1a`)

Why did defective Qwen2-0.5B-Instruct inference ship undetected? apr qa Golden Output PASSED.
Why did Golden Output PASS gibberish? verify_output garbage check used 4 hardcoded strings.
Why hardcoded? Patterns came from one-off past incidents (LAYOUT-002 "olumbia") and were never generalized.
Why never generalized? There was no statistical sanity signal (non-ASCII ratio, repeated-fragment detection, U+FFFD density).
Why dangerous? New gibberish classes sail through: the gate became a liar, passing models that produce unintelligible non-ASCII output.
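To make the second and fifth whys concrete, here is a minimal sketch of a substring-only check. It is not the actual `verify_output` code: the constant name `GARBAGE_PATTERNS` and the helper `has_known_garbage` are hypothetical, and only the four pattern strings come from this PR.

```rust
/// The four hardcoded patterns quoted in this PR.
/// The constant name is hypothetical.
const GARBAGE_PATTERNS: [&str; 4] = ["\u{FFFD}", "[UNK]", "akunji", "olumbia"];

/// Hypothetical stand-in for the old check: flag the output only if it
/// contains one of the known-bad substrings.
fn has_known_garbage(output: &str) -> bool {
    GARBAGE_PATTERNS.iter().any(|p| output.contains(p))
}
```

Under this sketch, `has_known_garbage("udaÅĤo ëĸ» zwiÄħzku")` returns `false`, because none of the four substrings occur in it. A denylist built from past incidents can only catch gibberish it has already seen.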

Honest scope

✅ Strengthens the GATE so future regressions of this class FAIL
✅ 14/14 unit tests pass, including 3 new ones using real captured gibberish
⚠️ Does NOT yet flip the existing 0.5B `apr qa` run from PASS to FAIL, because the gate's internal 512-token generation produces different (apparently sufficiently ASCII) output than the manual 16-token test that captured the gibberish
⚠️ Underlying Qwen2-0.5B short-prompt inference defect still exists; downstream bisection PR follows now that the gate can no longer hide it
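As a rough illustration of why output length matters for a whole-output ratio signal, the sketch below computes a non-ASCII ratio over hypothetical strings. The helper `non_ascii_ratio` and the inputs are assumptions for illustration only; this is not a claim about what the 512-token run actually produced.

```rust
/// Fraction of characters in `output` that are not ASCII.
/// Hypothetical helper; returns 0.0 for empty input.
fn non_ascii_ratio(output: &str) -> f64 {
    let total = output.chars().count();
    if total == 0 {
        return 0.0;
    }
    let non_ascii = output.chars().filter(|c| !c.is_ascii()).count();
    non_ascii as f64 / total as f64
}

/// Hypothetical short output: 16 characters, all non-ASCII.
fn short_sample() -> String {
    "Äħ".repeat(8)
}

/// The same 16 characters followed by 48 ASCII characters.
fn long_sample() -> String {
    format!("{}{}", short_sample(), "a".repeat(48))
}
```

Here `non_ascii_ratio(&short_sample())` is 16/16 and exceeds the 60% threshold, while `non_ascii_ratio(&long_sample())` is 16/64 and does not. A longer, mostly-ASCII generation can therefore stay under the threshold even when a short generation from the same model would not.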

Test plan

`cargo test -p apr-cli --lib --features inference verify_output` — 14/14 pass
`cargo build --release -p apr-cli --features cuda --bin apr` — clean
Pre-commit quality gates green (per local hook)
CI `ci / gate` and `workspace-test` (pending)

🤖 Generated with Claude Code

… detection

Root cause (Five Whys):
1. Defective Qwen2-0.5B-Instruct inference (CJK/Polish/diacritic byte fragments like "udaÅĤo", "ëĸ»", "zwiÄħzku") shipped because `apr qa` PASSED.
2. Why did Golden Output PASS gibberish? `verify_output` garbage-pattern check was hardcoded to 4 strings: ["\u{FFFD}", "[UNK]", "akunji", "olumbia"].
3. Why hardcoded? Patterns came from specific past incidents (LAYOUT-002 "olumbia" garbage) and were never generalized.
4. Why never generalized? No statistical sanity signal: non-ASCII ratio, repeated-fragment detection, U+FFFD density.
5. Why dangerous? Any new gibberish class sails through. The QA gate became a liar, passing models that produce unintelligible non-ASCII output.

Fix: add `detect_gibberish` with three statistical signals, ANY trip rejects:
- Non-ASCII ratio > 60% over 16+ chars (English/code outputs are ASCII-heavy)
- 4+ byte fragment repeated 3+ times consecutively (BPE/loop pathology)
- U+FFFD density > 1 per 32 chars (UTF-8 decode failures)

Tests added (all passing):
- verify_output_rejects_qwen2_05b_observed_gibberish: captured real string
- verify_output_rejects_repeated_fragment: "udaÅĤo udaÅĤo udaÅĤo udaÅĤo end"
- verify_output_accepts_normal_english: regression guard

Caveat: this PR strengthens the GATE, not the underlying inference defect. Qwen2-0.5B short-prompt inference still produces gibberish; downstream PR will bisect that defect now that the gate can no longer hide it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
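The three signals in the commit message above could be sketched roughly as follows. This is not the merged implementation: only the name `detect_gibberish` and the thresholds come from the commit message. The signature, the `Option<&'static str>` return type, the order of the checks, the helper `has_repeated_fragment`, and the 64-byte cap on fragment length are all assumptions.

```rust
/// Sketch of the three statistical signals; any one tripping rejects.
/// Returns a short reason string, or None if the output looks sane.
fn detect_gibberish(output: &str) -> Option<&'static str> {
    let total = output.chars().count();
    if total == 0 {
        return None;
    }

    // Signal 1: non-ASCII ratio > 60%, only judged over 16+ chars.
    let non_ascii = output.chars().filter(|c| !c.is_ascii()).count();
    if total >= 16 && non_ascii * 10 > total * 6 {
        return Some("non-ASCII ratio > 60%");
    }

    // Signal 2: more than one U+FFFD per 32 chars.
    let fffd = output.chars().filter(|&c| c == '\u{FFFD}').count();
    if fffd * 32 > total {
        return Some("U+FFFD density > 1 per 32 chars");
    }

    // Signal 3: a 4+ byte fragment repeated 3+ times back to back.
    if has_repeated_fragment(output.as_bytes(), 4, 3) {
        return Some("fragment repeated consecutively");
    }
    None
}

/// True if some window of `min_len`..=64 bytes occurs `min_repeats`
/// times in a row. The 64-byte cap is an assumption to bound the scan.
fn has_repeated_fragment(bytes: &[u8], min_len: usize, min_repeats: usize) -> bool {
    const MAX_LEN: usize = 64;
    for len in min_len..=MAX_LEN {
        if len * min_repeats > bytes.len() {
            break;
        }
        for start in 0..=(bytes.len() - len * min_repeats) {
            let first = &bytes[start..start + len];
            let all_same = (1..min_repeats).all(|k| {
                let off = start + k * len;
                &bytes[off..off + len] == first
            });
            if all_same {
                return true;
            }
        }
    }
    false
}
```

In this sketch the repeated-fragment test string "udaÅĤo udaÅĤo udaÅĤo udaÅĤo end" trips the third signal, since the 9-byte unit "udaÅĤo " repeats four times in a row, while an ordinary sentence such as "The capital of France is Paris." trips none. Working on bytes rather than chars matches the "4+ byte fragment" wording, and integer comparisons avoid floating-point thresholds.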

noahgift enabled auto-merge (squash) May 4, 2026 09:19

noahgift force-pushed the fix/qa-golden-output-gibberish-detection branch from ddb6a1a to b8f5eb7 May 4, 2026 09:51

noahgift merged commit 1e2b116 into main May 4, 2026
10 checks passed

noahgift deleted the fix/qa-golden-output-gibberish-detection branch May 4, 2026 10:09

noahgift mentioned this pull request May 4, 2026

spec(ship-two-models): v2.95.0 — §50 LAYOUT-001/002 in safetensors→APR FFN import #1467

Closed

3 tasks

noahgift merged 1 commit into main from fix/qa-golden-output-gibberish-detection
