Summary
apr validate --quality (and the documented example apr validate model.apr --quality --strict in README.md:114) reports 3/100 Grade F → ValidationFailed exit 5 on a model that apr qa says ✓ ALL GATES PASSED. The root cause is that 22 of 25 quality checks are placeholders marked Pending — Not implemented, and the threshold logic still gates on total_score < 50 (validate.rs:96).
Reproducer
$ apr validate /home/noah/models/qwen2.5-coder-1.5b-instruct-q4k.apr --quality --strict
... (3 checks PASS, 22 marked "Pending — Not implemented")
Warning: --strict is not yet implemented for APR validation summary. Flag ignored.
✓ VALID 3/100 points
╭──────────────────────────────────┬───────┬────────────────────────╮
│ A. Format & Structural Integrity │ 3/25 │ ██░░░░░░░░░░░░░░░░░░ │
│ B. Tensor Physics & Statistics │ 0/25 │ ░░░░░░░░░░░░░░░░░░░░ │
│ C. Tooling & Operations │ 0/25 │ ░░░░░░░░░░░░░░░░░░░░ │
│ D. Conversion & Interoperability │ 0/25 │ ░░░░░░░░░░░░░░░░░░░░ │
╰──────────────────────────────────┴───────┴────────────────────────╯
TOTAL: 3/100 Grade: F
error: Validation failed: Score 3/100 (below 50% threshold)
exit=5
Inconsistency with apr qa
The same model in the same session:
$ apr qa /home/noah/models/qwen2.5-coder-1.5b-instruct-q4k.apr
✓ PASS Capability Match
✓ PASS Tensor Contract (339 tensors)
✓ PASS Metadata Plausibility
✓ PASS Golden Output (2 cases)
✓ PASS Throughput 17.14 tok/s
✓ PASS Perf Regression
✓ ALL GATES PASSED
And apr run on the same model returns "2 + 2 equals 4." via apr serve run + curl /v1/chat/completions. The model is unambiguously good. The validator is broken.
Root causes
- 22 stubbed-out checks (
validate.rs). Categories B (Tensor Physics), C (Tooling), and D (Conversion) score 0/25 each because none of their checks have been implemented; they all return SKIP — Not implemented.
- Threshold gate fires on the stub (
validate.rs:96):
if !skip_contract && report.total_score < 50 {
return Err(CliError::ValidationFailed(format!(
"Score {}/100 (below 50% threshold)", report.total_score
)));
}
No APR model can ever score >= 50 until checks B/C/D are filled in.
--strict flag is documented but ignored (validate.rs:294,454). README.md:114 advertises it; the runtime emits a warning and proceeds.
README impact
README.md:114 (and any user copy-pasting the example) gets a confusing failure on first contact:
# Inspect
apr inspect model.gguf
apr validate model.apr --quality --strict # ← always fails on valid models
apr tensors model.gguf | head -20
Suggested fix
Two options, both ship-acceptable:
Option A (one-line, immediate): gate the 50% threshold on implemented_checks >= N. While any category is fully Pending, treat --quality as informational, not a fail. Document that apr qa is the canonical pass/fail gate (this is already true per CLAUDE.md "Use apr Tools First (MANDATORY) — Step 1: ALWAYS start here (catches 80% of issues): apr qa model.apr").
Option B (proper): implement category B/C/D checks. There's enough infrastructure (apr tensors, apr trace, apr inspect, apr export) to wire them up:
- B (Tensor Physics): mean/std/NaN/Inf/dead-channel checks per tensor → 5 checks
- C (Tooling): does
apr inspect, apr trace, apr tensors, apr explain succeed on the file? → 5 checks
- D (Conversion): does the model round-trip APR → GGUF → APR? → 5 checks (currently blocked by #1865 (apr export panic), so this also pulls in that fix).
Severity
P1 — release blocker for v0.35.0 because the README documents the failing usage. Filed alongside #1864 and #1865 during v0.35.0 release dogfood.
Artifacts
- Host: noah-Lambda-Vector
- Model: /home/noah/models/qwen2.5-coder-1.5b-instruct-q4k.apr (339 tensors, 28 layers, qwen2)
- Build: HEAD = 0d8d52b (release/v0.35.0 worktree)
- Surfaced by: v0.35.0 release dogfood, 2026-05-22
Summary
apr validate --quality(and the documented exampleapr validate model.apr --quality --strictin README.md:114) reports 3/100 Grade F → ValidationFailed exit 5 on a model thatapr qasays ✓ ALL GATES PASSED. The root cause is that 22 of 25 quality checks are placeholders markedPending — Not implemented, and the threshold logic still gates ontotal_score < 50(validate.rs:96).Reproducer
Inconsistency with
apr qaThe same model in the same session:
And
apr runon the same model returns "2 + 2 equals 4." viaapr serve run + curl /v1/chat/completions. The model is unambiguously good. The validator is broken.Root causes
validate.rs). Categories B (Tensor Physics), C (Tooling), and D (Conversion) score 0/25 each because none of their checks have been implemented; they all returnSKIP — Not implemented.validate.rs:96):--strictflag is documented but ignored (validate.rs:294,454). README.md:114 advertises it; the runtime emits a warning and proceeds.README impact
README.md:114(and any user copy-pasting the example) gets a confusing failure on first contact:Suggested fix
Two options, both ship-acceptable:
Option A (one-line, immediate): gate the 50% threshold on
implemented_checks >= N. While any category is fullyPending, treat--qualityas informational, not a fail. Document thatapr qais the canonical pass/fail gate (this is already true per CLAUDE.md "Use apr Tools First (MANDATORY) — Step 1: ALWAYS start here (catches 80% of issues):apr qa model.apr").Option B (proper): implement category B/C/D checks. There's enough infrastructure (
apr tensors,apr trace,apr inspect,apr export) to wire them up:apr inspect,apr trace,apr tensors,apr explainsucceed on the file? → 5 checksSeverity
P1 — release blocker for v0.35.0 because the README documents the failing usage. Filed alongside #1864 and #1865 during v0.35.0 release dogfood.
Artifacts