Skip to content

fix(aprender-train): bind quantize_to_gguf_bytes match result so --features hub builds#1432

Merged
noahgift merged 1 commit into
mainfrom
fix/hf-pipeline-gguf-writer-result-binding
May 3, 2026
Merged

fix(aprender-train): bind quantize_to_gguf_bytes match result so --features hub builds#1432
noahgift merged 1 commit into
mainfrom
fix/hf-pipeline-gguf-writer-result-binding

Conversation

@noahgift

@noahgift noahgift commented May 3, 2026

Copy link
Copy Markdown
Contributor

Summary

  • One-character fix: let result = match quant { ... }; so the function body actually binds the tuple it computes instead of discarding it via a trailing ;.
  • Without this, --features hub fails to compile with two error[E0425]: cannot find value 'result' in this scope errors at gguf_writer.rs:42-43.
  • --features hub is not exercised by any workflow in .github/workflows/, which is why main CI is green despite this latent defect.

Why this PR

Discovered while attempting to add FALSIFY-APR-DISTILL-TRAIN-003/004 falsifier coverage to hf_pipeline::distillation::DistillationLoss in the previous /loop iteration. The hub-feature build is the only path that exercises the parallel hf_pipeline distillation implementation, so this defect was blocking falsifier-parity work between the canonical distill::loss and the hf_pipeline copy.

Surfaces 11 pre-existing test failures (jidoka, not regressions)

With the build fixed, cargo test --features hub now reports 145 passed / 11 failed. None of the 11 failures are caused by this fix — they're pre-existing bugs that were silently masked by the build error. Two distinct classes:

  1. test_falsify_quantize_empty_data_* (3 tests)contract_pre_quantize! macro asserts input.len() > 0 while the tests assert empty input → empty output. Contract / test drift; needs the contract precondition relaxed or the tests updated to #[should_panic].

  2. test_falsify_*_roundtrip (8 tests) — F32 byte-encoding bug in the GGUF roundtrip path: expected [5.0, 6.0, 7.0, 8.0] vs got [0.0, 5.93e-39, ...]. Looks like a header-alignment / padding miscalculation. Real bug, not a test issue.

Both classes are explicitly out of scope for this PR (separate root causes get separate fixes per Toyota Way).

Test plan

  • cargo build -p aprender-train --lib --features hub → clean
  • cargo test -p aprender-train --lib --features hub → 145/156 passes (pre-existing 11 failures documented above)
  • cargo fmt --all -- --check → no diff in touched file

Out of scope (follow-ups)

  • Fix the empty-data contract drift (3 tests)
  • Fix the F32 GGUF roundtrip byte-encoding bug (8 tests)

🤖 Generated with Claude Code

…atures hub builds

`quantize_to_gguf_bytes` had a trailing semicolon after its `match` expression
that discarded the computed `(Vec<u8>, GgmlType)` tuple, then referenced an
unbound `result` variable in the postcondition macro and tail expression. The
function compiled to two `error[E0425]: cannot find value 'result' in this
scope` errors, breaking the entire `--features hub` build.

The fix is a single-character change: `let result = match quant { ... };`.

Five Whys:
1. Why was the build broken? `quantize_to_gguf_bytes` references undefined
   `result` because the original `let result = ...` binding was lost during
   a refactor and a stray `;` turned the match into a unit-valued statement.
2. Why didn't main CI catch it? `--features hub` is not exercised by any
   workflow in `.github/workflows/` (verified via grep).
3. Why does it matter then? It's a latent defect that blocks any work
   touching the HuggingFace pipeline path — including the previous /loop
   iteration's attempt to add FALSIFY-APR-DISTILL-TRAIN-003/004 falsifier
   coverage to `hf_pipeline::distillation::DistillationLoss`.
4. Why fix it as a standalone PR? Per `feedback_toyota_way_all_defects`
   defects must be visible. Fixing the syntactic bug surfaces 11 OTHER
   pre-existing test failures in the hf_pipeline path that were silently
   masked by the build error. Two of those classes are visible at a glance:
     - `test_falsify_quantize_empty_data_*` (3 tests) panic because the
       `contract_pre_quantize!` macro asserts `input.len() > 0` while the
       tests assert empty-input → empty-output. Contract / test drift, not
       a regression.
     - `test_falsify_*_roundtrip` (8 tests) appear to expose a real F32
       byte-encoding bug in the GGUF roundtrip path — `expected [5.0, 6.0,
       7.0, 8.0]` vs `got [0.0, 5.93e-39, ...]`. Pre-existing; surfacing for
       follow-up.
   These two classes are documented here for follow-up and explicitly NOT
   in this PR's scope (jidoka — separate root causes get separate fixes).
5. Why now? Per `feedback_main_ci_andon` main CI is green and stays green
   — `--features hub` is a strict superset that wasn't being exercised.
   Landing this fix doesn't risk regression on any required check.

Verified locally:
- `cargo build -p aprender-train --lib --features hub` → clean (was: 2 errors)
- `cargo test -p aprender-train --lib --features hub` → 145 passed / 11 failed
  (the 11 failures are all pre-existing — same files, same panics — they
  just happen to be runnable now)
- `cargo fmt --all -- --check` → no diff in the touched file

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit f01def7 into main May 3, 2026
11 checks passed
@noahgift noahgift deleted the fix/hf-pipeline-gguf-writer-result-binding branch May 3, 2026 17:38
noahgift added a commit that referenced this pull request May 3, 2026
…bytes (#1433)

Closes the empty-data contract drift surfaced by PR #1432: the 3
`test_falsify_quantize_empty_data_*` tests assert that empty input
must produce empty output (dtype preserved), but `contract_pre_quantize!`
asserts `input.len() > 0` and panicked on the empty case.

Resolution: handle empty input in a fast-path BEFORE the precondition
fires. The contract's domain is non-empty inputs (where the precision
bound is meaningful); empty is a documented edge case the test suite
already encodes.

Five Whys:
1. Why was the contract precondition tripping? `contract_pre_quantize!`
   is generated from a YAML contract that declares `input.len() > 0` as
   a domain restriction.
2. Why are tests then asserting empty input → empty output? Because real
   callers pass zero-length slices for skipped tensors / no-op cases;
   panic-on-empty would force every caller to defensively check len().
3. Why not loosen the contract? Domain restrictions are intentional —
   the precision bound (the postcondition) is undefined on empty inputs.
   Loosening the precondition without a thoughtful adjustment to the
   bound's meaning would weaken the math.
4. Why early-return rather than `if data.is_empty() { ... } else { ... }`
   inside the match? Less indentation; the empty path doesn't compute
   anything quantization-mode-specific other than the dtype tag, so a
   tiny early-return is cleaner than a branch around the whole body.
5. Why ship now? PR #1432 surfaced this as one of 11 pre-existing test
   failures with --features hub. Per Toyota Way each defect gets its own
   focused fix; this PR closes the 3 empty-data tests cleanly. The
   remaining 8 (F32 GGUF roundtrip byte-encoding bugs) are tracked for
   follow-up — they're a real serialization defect, separate root cause.

Verified locally:
- `cargo test -p aprender-train --lib --features hub gguf_writer` →
    23/23 pass (was 20/23: 3 empty-data tests now green)
- `cargo test -p aprender-train --lib --features hub` →
    7977 passed / 9 failed (was 7975 / 11). Net 2 fewer failures.
- `cargo fmt --all -- --check` → no diff in touched file

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 3, 2026
…n test helpers (#1434)

* fix(aprender-train): early-return on empty input in quantize_to_gguf_bytes

Closes the empty-data contract drift surfaced by PR #1432: the 3
`test_falsify_quantize_empty_data_*` tests assert that empty input
must produce empty output (dtype preserved), but `contract_pre_quantize!`
asserts `input.len() > 0` and panicked on the empty case.

Resolution: handle empty input in a fast-path BEFORE the precondition
fires. The contract's domain is non-empty inputs (where the precision
bound is meaningful); empty is a documented edge case the test suite
already encodes.

Five Whys:
1. Why was the contract precondition tripping? `contract_pre_quantize!`
   is generated from a YAML contract that declares `input.len() > 0` as
   a domain restriction.
2. Why are tests then asserting empty input → empty output? Because real
   callers pass zero-length slices for skipped tensors / no-op cases;
   panic-on-empty would force every caller to defensively check len().
3. Why not loosen the contract? Domain restrictions are intentional —
   the precision bound (the postcondition) is undefined on empty inputs.
   Loosening the precondition without a thoughtful adjustment to the
   bound's meaning would weaken the math.
4. Why early-return rather than `if data.is_empty() { ... } else { ... }`
   inside the match? Less indentation; the empty path doesn't compute
   anything quantization-mode-specific other than the dtype tag, so a
   tiny early-return is cleaner than a branch around the whole body.
5. Why ship now? PR #1432 surfaced this as one of 11 pre-existing test
   failures with --features hub. Per Toyota Way each defect gets its own
   focused fix; this PR closes the 3 empty-data tests cleanly. The
   remaining 8 (F32 GGUF roundtrip byte-encoding bugs) are tracked for
   follow-up — they're a real serialization defect, separate root cause.

Verified locally:
- `cargo test -p aprender-train --lib --features hub gguf_writer` →
    23/23 pass (was 20/23: 3 empty-data tests now green)
- `cargo test -p aprender-train --lib --features hub` →
    7977 passed / 9 failed (was 7975 / 11). Net 2 fewer failures.
- `cargo fmt --all -- --check` → no diff in touched file

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(aprender-train): account for GGUF tensor-data alignment padding in test helpers

Closes the F32 GGUF roundtrip drift surfaced by PR #1432: 8 tests failing
with `expected [5.0, 6.0, 7.0, 8.0]` vs `got [0.0, 5.93e-39, ...]` plus 1
inline equivalent in `pipeline.rs`.

Root cause: `aprender::format::gguf::export_tensors_to_gguf` (types.rs:445)
writes `padding_for_alignment(header_size, GGUF_DEFAULT_ALIGNMENT)` zero
bytes between the tensor-info section and the tensor data so the data
section starts at a 32-byte-aligned offset. Two test helpers — the shared
`find_data_section_start` in `gguf_verify/tests/mod.rs` and an inline
clone in `pipeline.rs::test_falsify_pipeline_f32_data_integrity_through_pipeline`
— skipped header + metadata + tensor info but NOT the trailing alignment
padding, then read f32 bytes from the wrong offset. The "wrong offset"
landed in the zero-padding region, which produced the characteristic
`[0.0, ~5.93e-39, ~5.95e-39, ...]` pattern (the trailing values are the
first 12 bytes of actual tensor data, just shifted by `padding` bytes
and reinterpreted as f32 from the offset boundary).

Fix: add `pos += padding_for_alignment(pos, GGUF_DEFAULT_ALIGNMENT)` in
both helpers to mirror the writer's behavior.

Five Whys:
1. Why was the data section misaligned? Helpers had a comment claiming
   "NO alignment padding" — incorrect; the writer DOES pad (types.rs:445).
2. Why didn't anyone notice before? `--features hub` was unbuildable on
   main due to PR #1432's syntactic bug; tests couldn't run.
3. Why are there two near-identical helpers (test/mod.rs + pipeline.rs)? A
   refactor extracted `find_data_section_start` for reuse but missed the
   inline copy in `pipeline.rs`. Drift between the two means a fix to one
   is incomplete. This PR fixes both. Follow-up: collapse the inline copy
   to a call into the shared helper.
4. Why use a hardcoded `pos += padding_for_alignment(...)` rather than a
   higher-level "find data section start" abstraction? Bounded scope —
   each fix is a single line + a contextual import. Refactoring the test
   harness is separate work.
5. Why ship now? Per Toyota Way each defect gets a focused fix; this
   single root cause closes all 9 remaining surfaced failures (8 in
   gguf_verify + 1 in pipeline). `--features hub` test pass rate goes
   from 7977/7986 → 7986/7986 (full green).

Verified locally:
- `cargo test -p aprender-train --lib --features hub gguf_verify` → 72/72 pass
- `cargo test -p aprender-train --lib --features hub` → 7986 pass / 0 fail / 16 ignored
- `cargo fmt --all -- --check` → no diff in touched files

Net `--features hub` health across today's session:
  - Pre-#1432: build error (could not run any test)
  - Post-#1432: 7975/7986 pass (build works, 11 pre-existing failures surfaced)
  - Post-#1433: 7977/7986 pass (3 empty-data tests fixed, 9 left)
  - Post-this: 7986/7986 pass — full green ✅

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 3, 2026
…TRAIN-003/004 falsifier-parity coverage (#1436)

Closes the parallel-implementation falsifier-coverage gap for the contract
`apr-cli-distill-train-v1.yaml` between:
  - canonical `crates/aprender-train/src/distill/loss.rs` (task #186, has tests)
  - parallel  `crates/aprender-train/src/hf_pipeline/distillation/loss.rs` (this PR)

Both implement Hinton-style KD loss with the same math invariants per
contract. Without same-coverage gates, one impl could regress without the
other surfacing — exactly the `feedback_coverage_contracts_coevolution`
class.

Adds 4 new tests to `hf_pipeline::distillation::tests`:
1. `falsify_apr_distill_train_003_t_scaling_preserves_argmax` — mirror of
   the canonical TRAIN-003 test (T-scaling preserves softmax argmax)
2. `falsify_apr_distill_train_004_alpha_one_equals_pure_kd` — mirror of
   the canonical TRAIN-004 test (alpha=1 → pure KD; the 1-alpha CE term
   is zeroed)
3. `falsify_apr_distill_train_004_alpha_zero_equals_pure_ce` — symmetric
   bookkeeping (alpha=0 → pure CE; catches off-by-one swap of alpha and
   1-alpha coefficients)
4. `falsify_apr_distill_train_003_log_softmax_consistency` — softmax/
   log_softmax inverse identity within fp32 noise + l2_normalize smoke
   to cover all three currently-imported helpers

Five Whys:
1. Why this gap? `hf_pipeline::distillation::DistillationLoss` is a
   parallel implementation of the canonical `distill::loss::DistillationLoss`,
   added later. The falsifier tests for the canonical impl (#186) weren't
   replicated.
2. Why catch it now? PRs #1432-#1434 fixed the `--features hub` build
   chain (broken syntactic + alignment-padding bugs that prevented the
   tests from being run at all). Now that the test surface is reliable,
   adding the parity coverage is finally executable.
3. Why two impls in the first place? `hf_pipeline` was a Hugging-Face-
   transformers-style scaffolding for distillation export pipelines;
   `distill::loss` is the canonical training-side loss used by `apr
   distill --stage train`. Both should compute the same math.
4. Why a TRAIN-004-dual? Same operational invariant from both directions:
   alpha=1 → soft only, alpha=0 → hard only. If a refactor swaps the
   coefficients, only one of the two tests would catch it; both together
   cover the bookkeeping completely.
5. Why no contract version bump? `apr-cli-distill-train-v1.yaml` v1.0.0
   already enumerates TRAIN-003 and TRAIN-004; this PR closes a coverage
   gap, not a contract change.

Verified locally:
- `cargo test -p aprender-train --lib --features hub falsify_apr_distill`
    → 7 pass / 0 fail (canonical 3 + new hf_pipeline 4)
- `cargo test -p aprender-train --lib --features hub` → 7990/7990 pass,
    16 ignored (was 7986 pre-PR; +4 for the new tests)
- `cargo fmt --all -- --check` → no diff in touched file

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 3, 2026
… drift-prevention tests (#1435)

Closes the contract drift between v1.2.0's prediction and the actual code:
contract `apr-cpu-vs-gpu-output-parity-v1` v1.2.0 (PR #1430) said the wgpu
rejection log should emit `[apr-cpu-vs-gpu-output-parity-v1] wgpu path
rejected, attempting fallback: ...` symmetric to the CUDA tag, but #1430
only made the existing `[GH-559]`/`Backend:` logs unconditional — the
contract-tagged wgpu rejection log itself was missing.

Mirror of the FALSIFY-CPU-GPU-003 chain (#1428 visibility + #1429 drift
test) for FALSIFY-CPU-GPU-005:
- Adds `pub(crate) const WGPU_FALLBACK_LOG_PREFIX = "[apr-cpu-vs-gpu-output-parity-v1] wgpu path rejected"`
- Updates `try_apr_wgpu_inference` to emit it on `GpuDevice::new()` failure
  (alongside the existing `[GH-559]` runbook tag — both, not either)
- 3 new unit tests:
  * `wgpu_fallback_log_prefix_is_contract_tagged` (symmetric to the CUDA test)
  * `cuda_and_wgpu_fallback_log_prefixes_share_contract_tag` (symmetry guard:
    both prefixes must start with the same contract ID and end with
    "path rejected" so grep recipes work uniformly across backends)

Five Whys:
1. Why was this gap in v1.2.0? PR #1430 conflated "make existing wgpu log
   visible" (done) with "add contract-tagged rejection log" (deferred). The
   contract's prediction text wrote the second; the code shipped only the
   first.
2. Why catch it now? `--features hub` build healthy across PRs #1432-#1434
   means the test surface is reliable; this is the natural follow-up.
3. Why the symmetry test? `cuda_and_wgpu_fallback_log_prefixes_share_contract_tag`
   locks in that BOTH backends use the same `[CONTRACT_ID] <backend> path
   rejected` shape. Without it a future PR could drift one but not the
   other and grep recipes would silently skip backends.
4. Why keep `[GH-559]` alongside the new contract tag? Runbook continuity
   — humans tracking that issue tag in logs over time shouldn't lose it.
5. Why no contract version bump? v1.2.0 already specifies this tag in
   FALSIFY-CPU-GPU-005's prediction; this PR closes the implementation
   gap. Bumping again would imply a contract semantic change, which
   isn't happening — only the code catches up to the contract.

Verified locally:
- `cargo test -p aprender-serve --features cuda --lib --release fallback_log_prefix`
    → 3/3 pass (cuda + wgpu + symmetry)
- `cargo fmt --all -- --check` → no diff in touched file

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 3, 2026
…line distill falsifier-parity (#1437)

Canonical record of today's MODEL-2-side hygiene cycle (PRs #1432-#1436).
Pre-cycle: --features hub unbuildable (E0425 in quantize_to_gguf_bytes),
masking 11 pre-existing test failures.

Chain landed:
- #1432: bind match result; build works → 7975/7986 (11 pre-existing surfaced)
- #1433: empty-input early-return → 7977/7986 (3 contract-drift fixed)
- #1434: GGUF alignment-padding skip in 2 test helpers → 7986/7986 ✅
- #1435: WGPU_FALLBACK_LOG_PREFIX + 3 drift-prevention tests (matches v1.2.0)
- #1436: 4 hf_pipeline distill falsifier-parity tests → 7990/7990

§42 documents: what landed (table), net hub health progression, why for
MODEL-2 (parallel-impl drift-protection), Five Whys (build masked
pre-existing failures), coverage update (15+33 unchanged; parallel-impl
uplift), ship % effects (MODEL-1 87→88, MODEL-2 50→54), and
next-session pickup options (CPU-GPU-005 part b OR distill-train
precompute determinism).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant