fix(aprender-train): account for GGUF tensor-data alignment padding in test helpers by noahgift · Pull Request #1434 · paiml/aprender

noahgift · 2026-05-03T18:14:50Z

Summary

Closes the last 9 of 11 pre-existing test failures surfaced by PR fix(aprender-train): bind quantize_to_gguf_bytes match result so --features hub builds #1432: 8 in gguf_verify/tests/falsify.rs, 1 inline in pipeline.rs. Single root cause: test helpers skipped header + metadata + tensor-info but NOT the trailing 32-byte alignment padding the writer emits at aprender::format::gguf::types.rs:445.
After this PR, --features hub is fully green: 7986 pass / 0 fail / 16 ignored.

Five Whys

Why was the data section misaligned? Test helpers had a comment claiming "NO alignment padding" — incorrect; the writer DOES pad.
Why didn't anyone notice before? --features hub was unbuildable on main until PR fix(aprender-train): bind quantize_to_gguf_bytes match result so --features hub builds #1432 fixed quantize_to_gguf_bytes.
Why two near-identical helpers? Refactor extracted find_data_section_start but missed an inline clone in pipeline.rs. This PR fixes both; follow-up: collapse the inline copy to call the shared helper.
Why use a hardcoded pos += padding_for_alignment(pos, ALIGN) instead of a higher-level abstraction? Bounded scope — single-line fix in each location. Refactor is separate work.
Why ship now? One root cause closes all 9 surfaced failures cleanly. --features hub jumps from 7977/7986 → 7986/7986 (full green).

Net `--features hub` health across today's session

Pre-fix(aprender-train): bind quantize_to_gguf_bytes match result so --features hub builds #1432: build error (no tests could run)
Post-fix(aprender-train): bind quantize_to_gguf_bytes match result so --features hub builds #1432: 7975/7986 (11 pre-existing failures surfaced, jidoka)
Post-fix(aprender-train): early-return on empty input in quantize_to_gguf_bytes #1433: 7977/7986 (3 empty-data fixed)
Post-this: 7986/7986 ✅ (alignment-padding fix closes the rest)

Test plan

cargo test -p aprender-train --lib --features hub → 7986 pass / 0 fail
cargo test -p aprender-train --lib --features hub gguf_verify → 72/72 pass
cargo fmt --all -- --check → no diff in touched files

🤖 Generated with Claude Code

…bytes Closes the empty-data contract drift surfaced by PR #1432: the 3 `test_falsify_quantize_empty_data_*` tests assert that empty input must produce empty output (dtype preserved), but `contract_pre_quantize!` asserts `input.len() > 0` and panicked on the empty case. Resolution: handle empty input in a fast-path BEFORE the precondition fires. The contract's domain is non-empty inputs (where the precision bound is meaningful); empty is a documented edge case the test suite already encodes. Five Whys: 1. Why was the contract precondition tripping? `contract_pre_quantize!` is generated from a YAML contract that declares `input.len() > 0` as a domain restriction. 2. Why are tests then asserting empty input → empty output? Because real callers pass zero-length slices for skipped tensors / no-op cases; panic-on-empty would force every caller to defensively check len(). 3. Why not loosen the contract? Domain restrictions are intentional — the precision bound (the postcondition) is undefined on empty inputs. Loosening the precondition without a thoughtful adjustment to the bound's meaning would weaken the math. 4. Why early-return rather than `if data.is_empty() { ... } else { ... }` inside the match? Less indentation; the empty path doesn't compute anything quantization-mode-specific other than the dtype tag, so a tiny early-return is cleaner than a branch around the whole body. 5. Why ship now? PR #1432 surfaced this as one of 11 pre-existing test failures with --features hub. Per Toyota Way each defect gets its own focused fix; this PR closes the 3 empty-data tests cleanly. The remaining 8 (F32 GGUF roundtrip byte-encoding bugs) are tracked for follow-up — they're a real serialization defect, separate root cause. Verified locally: - `cargo test -p aprender-train --lib --features hub gguf_writer` → 23/23 pass (was 20/23: 3 empty-data tests now green) - `cargo test -p aprender-train --lib --features hub` → 7977 passed / 9 failed (was 7975 / 11). Net 2 fewer failures. - `cargo fmt --all -- --check` → no diff in touched file Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…pty-data-early-return

…n test helpers Closes the F32 GGUF roundtrip drift surfaced by PR #1432: 8 tests failing with `expected [5.0, 6.0, 7.0, 8.0]` vs `got [0.0, 5.93e-39, ...]` plus 1 inline equivalent in `pipeline.rs`. Root cause: `aprender::format::gguf::export_tensors_to_gguf` (types.rs:445) writes `padding_for_alignment(header_size, GGUF_DEFAULT_ALIGNMENT)` zero bytes between the tensor-info section and the tensor data so the data section starts at a 32-byte-aligned offset. Two test helpers — the shared `find_data_section_start` in `gguf_verify/tests/mod.rs` and an inline clone in `pipeline.rs::test_falsify_pipeline_f32_data_integrity_through_pipeline` — skipped header + metadata + tensor info but NOT the trailing alignment padding, then read f32 bytes from the wrong offset. The "wrong offset" landed in the zero-padding region, which produced the characteristic `[0.0, ~5.93e-39, ~5.95e-39, ...]` pattern (the trailing values are the first 12 bytes of actual tensor data, just shifted by `padding` bytes and reinterpreted as f32 from the offset boundary). Fix: add `pos += padding_for_alignment(pos, GGUF_DEFAULT_ALIGNMENT)` in both helpers to mirror the writer's behavior. Five Whys: 1. Why was the data section misaligned? Helpers had a comment claiming "NO alignment padding" — incorrect; the writer DOES pad (types.rs:445). 2. Why didn't anyone notice before? `--features hub` was unbuildable on main due to PR #1432's syntactic bug; tests couldn't run. 3. Why are there two near-identical helpers (test/mod.rs + pipeline.rs)? A refactor extracted `find_data_section_start` for reuse but missed the inline copy in `pipeline.rs`. Drift between the two means a fix to one is incomplete. This PR fixes both. Follow-up: collapse the inline copy to a call into the shared helper. 4. Why use a hardcoded `pos += padding_for_alignment(...)` rather than a higher-level "find data section start" abstraction? Bounded scope — each fix is a single line + a contextual import. Refactoring the test harness is separate work. 5. Why ship now? Per Toyota Way each defect gets a focused fix; this single root cause closes all 9 remaining surfaced failures (8 in gguf_verify + 1 in pipeline). `--features hub` test pass rate goes from 7977/7986 → 7986/7986 (full green). Verified locally: - `cargo test -p aprender-train --lib --features hub gguf_verify` → 72/72 pass - `cargo test -p aprender-train --lib --features hub` → 7986 pass / 0 fail / 16 ignored - `cargo fmt --all -- --check` → no diff in touched files Net `--features hub` health across today's session: - Pre-#1432: build error (could not run any test) - Post-#1432: 7975/7986 pass (build works, 11 pre-existing failures surfaced) - Post-#1433: 7977/7986 pass (3 empty-data tests fixed, 9 left) - Post-this: 7986/7986 pass — full green ✅ Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…TRAIN-003/004 falsifier-parity coverage (#1436) Closes the parallel-implementation falsifier-coverage gap for the contract `apr-cli-distill-train-v1.yaml` between: - canonical `crates/aprender-train/src/distill/loss.rs` (task #186, has tests) - parallel `crates/aprender-train/src/hf_pipeline/distillation/loss.rs` (this PR) Both implement Hinton-style KD loss with the same math invariants per contract. Without same-coverage gates, one impl could regress without the other surfacing — exactly the `feedback_coverage_contracts_coevolution` class. Adds 4 new tests to `hf_pipeline::distillation::tests`: 1. `falsify_apr_distill_train_003_t_scaling_preserves_argmax` — mirror of the canonical TRAIN-003 test (T-scaling preserves softmax argmax) 2. `falsify_apr_distill_train_004_alpha_one_equals_pure_kd` — mirror of the canonical TRAIN-004 test (alpha=1 → pure KD; the 1-alpha CE term is zeroed) 3. `falsify_apr_distill_train_004_alpha_zero_equals_pure_ce` — symmetric bookkeeping (alpha=0 → pure CE; catches off-by-one swap of alpha and 1-alpha coefficients) 4. `falsify_apr_distill_train_003_log_softmax_consistency` — softmax/ log_softmax inverse identity within fp32 noise + l2_normalize smoke to cover all three currently-imported helpers Five Whys: 1. Why this gap? `hf_pipeline::distillation::DistillationLoss` is a parallel implementation of the canonical `distill::loss::DistillationLoss`, added later. The falsifier tests for the canonical impl (#186) weren't replicated. 2. Why catch it now? PRs #1432-#1434 fixed the `--features hub` build chain (broken syntactic + alignment-padding bugs that prevented the tests from being run at all). Now that the test surface is reliable, adding the parity coverage is finally executable. 3. Why two impls in the first place? `hf_pipeline` was a Hugging-Face- transformers-style scaffolding for distillation export pipelines; `distill::loss` is the canonical training-side loss used by `apr distill --stage train`. Both should compute the same math. 4. Why a TRAIN-004-dual? Same operational invariant from both directions: alpha=1 → soft only, alpha=0 → hard only. If a refactor swaps the coefficients, only one of the two tests would catch it; both together cover the bookkeeping completely. 5. Why no contract version bump? `apr-cli-distill-train-v1.yaml` v1.0.0 already enumerates TRAIN-003 and TRAIN-004; this PR closes a coverage gap, not a contract change. Verified locally: - `cargo test -p aprender-train --lib --features hub falsify_apr_distill` → 7 pass / 0 fail (canonical 3 + new hf_pipeline 4) - `cargo test -p aprender-train --lib --features hub` → 7990/7990 pass, 16 ignored (was 7986 pre-PR; +4 for the new tests) - `cargo fmt --all -- --check` → no diff in touched file Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

… drift-prevention tests (#1435) Closes the contract drift between v1.2.0's prediction and the actual code: contract `apr-cpu-vs-gpu-output-parity-v1` v1.2.0 (PR #1430) said the wgpu rejection log should emit `[apr-cpu-vs-gpu-output-parity-v1] wgpu path rejected, attempting fallback: ...` symmetric to the CUDA tag, but #1430 only made the existing `[GH-559]`/`Backend:` logs unconditional — the contract-tagged wgpu rejection log itself was missing. Mirror of the FALSIFY-CPU-GPU-003 chain (#1428 visibility + #1429 drift test) for FALSIFY-CPU-GPU-005: - Adds `pub(crate) const WGPU_FALLBACK_LOG_PREFIX = "[apr-cpu-vs-gpu-output-parity-v1] wgpu path rejected"` - Updates `try_apr_wgpu_inference` to emit it on `GpuDevice::new()` failure (alongside the existing `[GH-559]` runbook tag — both, not either) - 3 new unit tests: * `wgpu_fallback_log_prefix_is_contract_tagged` (symmetric to the CUDA test) * `cuda_and_wgpu_fallback_log_prefixes_share_contract_tag` (symmetry guard: both prefixes must start with the same contract ID and end with "path rejected" so grep recipes work uniformly across backends) Five Whys: 1. Why was this gap in v1.2.0? PR #1430 conflated "make existing wgpu log visible" (done) with "add contract-tagged rejection log" (deferred). The contract's prediction text wrote the second; the code shipped only the first. 2. Why catch it now? `--features hub` build healthy across PRs #1432-#1434 means the test surface is reliable; this is the natural follow-up. 3. Why the symmetry test? `cuda_and_wgpu_fallback_log_prefixes_share_contract_tag` locks in that BOTH backends use the same `[CONTRACT_ID] <backend> path rejected` shape. Without it a future PR could drift one but not the other and grep recipes would silently skip backends. 4. Why keep `[GH-559]` alongside the new contract tag? Runbook continuity — humans tracking that issue tag in logs over time shouldn't lose it. 5. Why no contract version bump? v1.2.0 already specifies this tag in FALSIFY-CPU-GPU-005's prediction; this PR closes the implementation gap. Bumping again would imply a contract semantic change, which isn't happening — only the code catches up to the contract. Verified locally: - `cargo test -p aprender-serve --features cuda --lib --release fallback_log_prefix` → 3/3 pass (cuda + wgpu + symmetry) - `cargo fmt --all -- --check` → no diff in touched file Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…line distill falsifier-parity (#1437) Canonical record of today's MODEL-2-side hygiene cycle (PRs #1432-#1436). Pre-cycle: --features hub unbuildable (E0425 in quantize_to_gguf_bytes), masking 11 pre-existing test failures. Chain landed: - #1432: bind match result; build works → 7975/7986 (11 pre-existing surfaced) - #1433: empty-input early-return → 7977/7986 (3 contract-drift fixed) - #1434: GGUF alignment-padding skip in 2 test helpers → 7986/7986 ✅ - #1435: WGPU_FALLBACK_LOG_PREFIX + 3 drift-prevention tests (matches v1.2.0) - #1436: 4 hf_pipeline distill falsifier-parity tests → 7990/7990 §42 documents: what landed (table), net hub health progression, why for MODEL-2 (parallel-impl drift-protection), Five Whys (build masked pre-existing failures), coverage update (15+33 unchanged; parallel-impl uplift), ship % effects (MODEL-1 87→88, MODEL-2 50→54), and next-session pickup options (CPU-GPU-005 part b OR distill-train precompute determinism). Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

noahgift and others added 3 commits May 3, 2026 19:41

Merge branch 'main' of github.com:paiml/aprender into fix/quantize-em…

96fce4c

…pty-data-early-return

noahgift enabled auto-merge (squash) May 3, 2026 18:29

noahgift mentioned this pull request May 3, 2026

test(apr-cpu-vs-gpu-output-parity-v1): add WGPU_FALLBACK_LOG_PREFIX + drift-prevention tests #1435

Merged

3 tasks

noahgift merged commit 5b81a33 into main May 3, 2026
11 checks passed

noahgift deleted the investigate/f32-gguf-roundtrip-byte-encoding branch May 3, 2026 18:40

noahgift mentioned this pull request May 3, 2026

test(apr-cli-distill-train-v1): hf_pipeline DistillationLoss FALSIFY-TRAIN-003/004 falsifier-parity #1436

Merged

3 tasks

noahgift mentioned this pull request May 3, 2026

spec(ship-two-models): v2.87.0 — §42 hub build chain repair + hf_pipeline distill falsifier-parity #1437

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(aprender-train): account for GGUF tensor-data alignment padding in test helpers#1434

fix(aprender-train): account for GGUF tensor-data alignment padding in test helpers#1434
noahgift merged 3 commits into
mainfrom
investigate/f32-gguf-roundtrip-byte-encoding

noahgift commented May 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

noahgift commented May 3, 2026

Summary

Five Whys

Net --features hub health across today's session

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Net `--features hub` health across today's session