apr validate --quality: 22/25 checks 'Pending — Not implemented' → working models score 3/100, exit 5 #1866

@noahgift

Summary

apr validate --quality (and the documented example apr validate model.apr --quality --strict in README.md:114) reports 3/100, Grade F, and exits 5 with ValidationFailed on a model for which apr qa reports ✓ ALL GATES PASSED. The root cause is that 22 of 25 quality checks are placeholders marked Pending — Not implemented, while the threshold logic still fails any run with total_score < 50 (validate.rs:96).

Reproducer

$ apr validate /home/noah/models/qwen2.5-coder-1.5b-instruct-q4k.apr --quality --strict
... (3 checks PASS, 22 marked "Pending — Not implemented")
Warning: --strict is not yet implemented for APR validation summary. Flag ignored.

  ✓ VALID 3/100 points

╭──────────────────────────────────┬───────┬────────────────────────╮
│ A. Format & Structural Integrity │  3/25 │ ██░░░░░░░░░░░░░░░░░░ │
│ B. Tensor Physics & Statistics   │  0/25 │ ░░░░░░░░░░░░░░░░░░░░ │
│ C. Tooling & Operations          │  0/25 │ ░░░░░░░░░░░░░░░░░░░░ │
│ D. Conversion & Interoperability │  0/25 │ ░░░░░░░░░░░░░░░░░░░░ │
╰──────────────────────────────────┴───────┴────────────────────────╯
  TOTAL: 3/100  Grade: F
error: Validation failed: Score 3/100 (below 50% threshold)
exit=5

Inconsistency with apr qa

The same model in the same session:

$ apr qa /home/noah/models/qwen2.5-coder-1.5b-instruct-q4k.apr
✓ PASS Capability Match
✓ PASS Tensor Contract  (339 tensors)
✓ PASS Metadata Plausibility
✓ PASS Golden Output (2 cases)
✓ PASS Throughput  17.14 tok/s
✓ PASS Perf Regression
✓ ALL GATES PASSED

apr run on the same model (exercised via apr serve run + curl /v1/chat/completions) also returns "2 + 2 equals 4." The model is unambiguously good; the validator is broken.

Root causes

  1. 22 stubbed-out checks (validate.rs). Categories B (Tensor Physics), C (Tooling), and D (Conversion) score 0/25 each because none of their checks have been implemented; they all report Pending — Not implemented.
  2. Threshold gate fires on the stub (validate.rs:96):
    if !skip_contract && report.total_score < 50 {
        return Err(CliError::ValidationFailed(format!(
            "Score {}/100 (below 50% threshold)", report.total_score
        )));
    }
    Category A is worth at most 25 points, so no APR model can ever score >= 50 until the checks in B/C/D are filled in.
  3. --strict flag is documented but ignored (validate.rs:294,454). README.md:114 advertises it; the runtime emits a warning and proceeds.
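
The arithmetic behind root cause 2 can be sketched as below. This is a minimal model, not the code in validate.rs: the Category struct, max_attainable, and report_categories are hypothetical names, and the sketch generously assumes category A could award its full 25 points.

```rust
/// Hypothetical model of one scoring category (not the real type in validate.rs).
struct Category {
    name: &'static str,
    max_points: u32,
    implemented: bool,
}

/// Best score a perfect model could reach: only implemented categories award points.
fn max_attainable(categories: &[Category]) -> u32 {
    categories
        .iter()
        .filter(|c| c.implemented)
        .map(|c| c.max_points)
        .sum()
}

/// The four categories from the report above, with B/C/D marked as stubs.
fn report_categories() -> [Category; 4] {
    [
        Category { name: "A. Format & Structural Integrity", max_points: 25, implemented: true },
        Category { name: "B. Tensor Physics & Statistics", max_points: 25, implemented: false },
        Category { name: "C. Tooling & Operations", max_points: 25, implemented: false },
        Category { name: "D. Conversion & Interoperability", max_points: 25, implemented: false },
    ]
}

fn main() {
    let categories = report_categories();
    for c in &categories {
        let status = if c.implemented { "implemented" } else { "pending" };
        println!("{:<34} {:>2} pts  {}", c.name, c.max_points, status);
    }
    // The ceiling stays below the 50-point gate regardless of model quality.
    let ceiling = max_attainable(&categories);
    println!("ceiling: {ceiling}/100, gate fires: {}", ceiling < 50);
}
```

Under these assumptions the ceiling is 25/100, so the total_score < 50 branch is taken for every model.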

README impact

Anyone copy-pasting the example at README.md:114 gets a confusing failure on first contact:

# Inspect
apr inspect model.gguf
apr validate model.apr --quality --strict      # ← always fails on valid models
apr tensors model.gguf | head -20

Suggested fix

Two options, both ship-acceptable:

Option A (one-line, immediate): gate the 50% threshold on implemented_checks >= N. While any category is fully Pending, treat --quality as informational rather than as a failure. Document that apr qa is the canonical pass/fail gate; this is already true per CLAUDE.md: "Use apr Tools First (MANDATORY) — Step 1: ALWAYS start here (catches 80% of issues): apr qa model.apr".
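
A rough sketch of what Option A might look like follows. The QualityReport fields, the Outcome enum, and the value chosen for MIN_IMPLEMENTED_CHECKS are all assumptions for illustration; the issue leaves N open, and the real code returns CliError::ValidationFailed rather than an enum.

```rust
/// Hypothetical summary of a --quality run; field names are illustrative only.
struct QualityReport {
    total_score: u32,
    implemented_checks: usize,
}

/// Assumed value for N: the gate only applies once this many checks are real.
const MIN_IMPLEMENTED_CHECKS: usize = 20;

#[derive(Debug, PartialEq)]
enum Outcome {
    Pass,
    /// Score is printed, but it does not affect the exit code.
    Informational,
    Fail(String),
}

fn gate(report: &QualityReport, skip_contract: bool) -> Outcome {
    if skip_contract {
        return Outcome::Pass;
    }
    // Too many stubs: the score says nothing about the model, so do not fail on it.
    if report.implemented_checks < MIN_IMPLEMENTED_CHECKS {
        return Outcome::Informational;
    }
    if report.total_score < 50 {
        return Outcome::Fail(format!(
            "Score {}/100 (below 50% threshold)",
            report.total_score
        ));
    }
    Outcome::Pass
}

fn main() {
    // The situation from the reproducer: 3 real checks, 3 points.
    let today = QualityReport { total_score: 3, implemented_checks: 3 };
    println!("{:?}", gate(&today, false));
}
```

With this shape, the existing threshold comes back into force on its own once enough checks are implemented, without a second code change.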

Option B (proper): implement category B/C/D checks. There's enough infrastructure (apr tensors, apr trace, apr inspect, apr export) to wire them up:

  • B (Tensor Physics): mean/std/NaN/Inf/dead-channel checks per tensor → 5 checks
  • C (Tooling): does apr inspect, apr trace, apr tensors, apr explain succeed on the file? → 5 checks
  • D (Conversion): does the model round-trip APR → GGUF → APR? → 5 checks. This is currently blocked by #1865, the apr export panic, so it also pulls in that fix.
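
As a starting point for category B, the per-tensor statistics could be computed roughly as follows. This is a sketch over a plain f32 slice: TensorStats, tensor_stats, and is_healthy are hypothetical names, reading the data out of an APR file is out of scope, and the dead-channel check is simplified here to a whole-tensor all-zero test.

```rust
/// Hypothetical per-tensor summary; mean and std cover finite values only.
struct TensorStats {
    mean: f64,
    std: f64,
    nan_count: usize,
    inf_count: usize,
    all_zero: bool,
}

fn tensor_stats(values: &[f32]) -> TensorStats {
    let nan_count = values.iter().filter(|v| v.is_nan()).count();
    let inf_count = values.iter().filter(|v| v.is_infinite()).count();
    let finite: Vec<f64> = values
        .iter()
        .filter(|v| v.is_finite())
        .map(|&v| f64::from(v))
        .collect();

    // Population mean and standard deviation; NaN when nothing finite remains.
    let (mean, std) = if finite.is_empty() {
        (f64::NAN, f64::NAN)
    } else {
        let n = finite.len() as f64;
        let mean = finite.iter().sum::<f64>() / n;
        let var = finite.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
        (mean, var.sqrt())
    };

    // Simplified stand-in for a dead-channel check: every element is exactly zero.
    let all_zero = !values.is_empty() && values.iter().all(|&v| v == 0.0);

    TensorStats { mean, std, nan_count, inf_count, all_zero }
}

/// One possible pass criterion; the thresholds a real check needs are not specified here.
fn is_healthy(s: &TensorStats) -> bool {
    s.nan_count == 0 && s.inf_count == 0 && !s.all_zero && s.mean.is_finite()
}

fn main() {
    let s = tensor_stats(&[0.5, -0.25, 1.0, f32::NAN]);
    println!(
        "mean={:.4} std={:.4} nan={} inf={} all_zero={} healthy={}",
        s.mean, s.std, s.nan_count, s.inf_count, s.all_zero, is_healthy(&s)
    );
}
```

A real implementation would likely stream over quantized blocks rather than materialize a Vec, and would apply the dead-channel test per output channel.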

Severity

P1 — release blocker for v0.35.0 because the README documents the failing usage. Filed alongside #1864 and #1865 during v0.35.0 release dogfood.

Artifacts

  • Host: noah-Lambda-Vector
  • Model: /home/noah/models/qwen2.5-coder-1.5b-instruct-q4k.apr (339 tensors, 28 layers, qwen2)
  • Build: HEAD = 0d8d52b (release/v0.35.0 worktree)
  • Surfaced by: v0.35.0 release dogfood, 2026-05-22
