test(apr-cli-distill-train-v1): hf_pipeline DistillationLoss FALSIFY-TRAIN-003/004 falsifier-parity#1436
Merged
Conversation
…TRAIN-003/004 falsifier-parity coverage Closes the parallel-implementation falsifier-coverage gap for the contract `apr-cli-distill-train-v1.yaml` between: - canonical `crates/aprender-train/src/distill/loss.rs` (task #186, has tests) - parallel `crates/aprender-train/src/hf_pipeline/distillation/loss.rs` (this PR) Both implement Hinton-style KD loss with the same math invariants per contract. Without same-coverage gates, one impl could regress without the other surfacing — exactly the `feedback_coverage_contracts_coevolution` class. Adds 4 new tests to `hf_pipeline::distillation::tests`: 1. `falsify_apr_distill_train_003_t_scaling_preserves_argmax` — mirror of the canonical TRAIN-003 test (T-scaling preserves softmax argmax) 2. `falsify_apr_distill_train_004_alpha_one_equals_pure_kd` — mirror of the canonical TRAIN-004 test (alpha=1 → pure KD; the 1-alpha CE term is zeroed) 3. `falsify_apr_distill_train_004_alpha_zero_equals_pure_ce` — symmetric bookkeeping (alpha=0 → pure CE; catches off-by-one swap of alpha and 1-alpha coefficients) 4. `falsify_apr_distill_train_003_log_softmax_consistency` — softmax/ log_softmax inverse identity within fp32 noise + l2_normalize smoke to cover all three currently-imported helpers Five Whys: 1. Why this gap? `hf_pipeline::distillation::DistillationLoss` is a parallel implementation of the canonical `distill::loss::DistillationLoss`, added later. The falsifier tests for the canonical impl (#186) weren't replicated. 2. Why catch it now? PRs #1432-#1434 fixed the `--features hub` build chain (broken syntactic + alignment-padding bugs that prevented the tests from being run at all). Now that the test surface is reliable, adding the parity coverage is finally executable. 3. Why two impls in the first place? `hf_pipeline` was a Hugging-Face- transformers-style scaffolding for distillation export pipelines; `distill::loss` is the canonical training-side loss used by `apr distill --stage train`. Both should compute the same math. 4. Why a TRAIN-004-dual? Same operational invariant from both directions: alpha=1 → soft only, alpha=0 → hard only. If a refactor swaps the coefficients, only one of the two tests would catch it; both together cover the bookkeeping completely. 5. Why no contract version bump? `apr-cli-distill-train-v1.yaml` v1.0.0 already enumerates TRAIN-003 and TRAIN-004; this PR closes a coverage gap, not a contract change. Verified locally: - `cargo test -p aprender-train --lib --features hub falsify_apr_distill` → 7 pass / 0 fail (canonical 3 + new hf_pipeline 4) - `cargo test -p aprender-train --lib --features hub` → 7990/7990 pass, 16 ignored (was 7986 pre-PR; +4 for the new tests) - `cargo fmt --all -- --check` → no diff in touched file Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
4 tasks
noahgift
added a commit
that referenced
this pull request
May 3, 2026
…line distill falsifier-parity (#1437) Canonical record of today's MODEL-2-side hygiene cycle (PRs #1432-#1436). Pre-cycle: --features hub unbuildable (E0425 in quantize_to_gguf_bytes), masking 11 pre-existing test failures. Chain landed: - #1432: bind match result; build works → 7975/7986 (11 pre-existing surfaced) - #1433: empty-input early-return → 7977/7986 (3 contract-drift fixed) - #1434: GGUF alignment-padding skip in 2 test helpers → 7986/7986 ✅ - #1435: WGPU_FALLBACK_LOG_PREFIX + 3 drift-prevention tests (matches v1.2.0) - #1436: 4 hf_pipeline distill falsifier-parity tests → 7990/7990 §42 documents: what landed (table), net hub health progression, why for MODEL-2 (parallel-impl drift-protection), Five Whys (build masked pre-existing failures), coverage update (15+33 unchanged; parallel-impl uplift), ship % effects (MODEL-1 87→88, MODEL-2 50→54), and next-session pickup options (CPU-GPU-005 part b OR distill-train precompute determinism). Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
apr-cli-distill-train-v1.yamlbetween canonicaldistill::loss(had tests since BUG: NaN values in GGUF→APR→GGUF roundtrip (Jidoka PMAT-187) #186) and parallelhf_pipeline::distillation::loss(this PR).hf_pipeline::distillation::tests:falsify_apr_distill_train_003_t_scaling_preserves_argmaxfalsify_apr_distill_train_004_alpha_one_equals_pure_kdfalsify_apr_distill_train_004_alpha_zero_equals_pure_ce(symmetric bookkeeping)falsify_apr_distill_train_003_log_softmax_consistency(helper inverse identity + l2_normalize smoke)Why this PR
This was the original goal that surfaced PR #1432's syntactic build break. With #1432-#1434 fixing the
--features hubbuild chain, the parity coverage is now executable. Perfeedback_coverage_contracts_coevolution, every parallel implementation that participates in a contract must have the same falsifier coverage — silent drift would let one impl regress without the other surfacing.Five Whys
hf_pipeline::distillationis a parallel implementation added later; canonicaldistill::losshad tests, hf_pipeline did not.hf_pipelineis HF-transformers-style export scaffolding;distill::lossis canonical training. Both compute the same math and need the same gates.apr-cli-distill-train-v1already enumerates TRAIN-003+004; this closes a coverage gap, not a contract change.Test plan
cargo test -p aprender-train --lib --features hub falsify_apr_distill→ 7 pass / 0 fail (canonical 3 + new hf_pipeline 4)cargo test -p aprender-train --lib --features hub→ 7990/7990 pass (was 7986; +4 new)cargo fmt --all -- --check→ no diff in touched file🤖 Generated with Claude Code