
test(apr-cli-distill-train-v1): hf_pipeline DistillationLoss FALSIFY-TRAIN-003/004 falsifier-parity#1436

Merged
noahgift merged 1 commit into main from feat/hf-pipeline-distill-train-falsifier-parity
May 3, 2026


@noahgift noahgift commented May 3, 2026

Contributor

Summary

  • Closes the parallel-implementation falsifier-coverage gap for contract apr-cli-distill-train-v1.yaml between the canonical distill::loss (tested since #186) and the parallel hf_pipeline::distillation::loss (this PR).
  • Adds 4 unit tests to hf_pipeline::distillation::tests:
    1. falsify_apr_distill_train_003_t_scaling_preserves_argmax
    2. falsify_apr_distill_train_004_alpha_one_equals_pure_kd
    3. falsify_apr_distill_train_004_alpha_zero_equals_pure_ce (symmetric bookkeeping)
    4. falsify_apr_distill_train_003_log_softmax_consistency (helper inverse identity + l2_normalize smoke)
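
To make the TRAIN-003 invariant concrete, here is a minimal, self-contained sketch of a T-scaling falsifier. It is not the code added by this PR: `softmax_with_temperature` and `argmax` are hypothetical helpers written for illustration, and the real `hf_pipeline::distillation` helpers may use different names, types, and signatures.

```rust
/// Numerically stable softmax over `logits / temperature`.
/// Hypothetical helper for illustration, not the crate's actual function.
fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    assert!(temperature > 0.0, "temperature must be positive");
    let scaled: Vec<f32> = logits.iter().map(|&x| x / temperature).collect();
    // Subtract the maximum before exponentiating to avoid overflow.
    let max = scaled.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Index of the largest element (the first one wins on ties).
fn argmax(values: &[f32]) -> usize {
    let mut best = 0;
    for (i, &v) in values.iter().enumerate() {
        if v > values[best] {
            best = i;
        }
    }
    best
}

#[test]
fn t_scaling_preserves_argmax() {
    let logits = [1.5_f32, -0.3, 4.2, 0.7];
    let expected = argmax(&logits);
    // Dividing by a positive temperature is monotonic, so the winning
    // class must be the same at every temperature.
    for &t in &[0.5_f32, 1.0, 2.0, 4.0, 10.0] {
        let probs = softmax_with_temperature(&logits, t);
        assert_eq!(argmax(&probs), expected);
    }
}
```

The logits are chosen with a clear winner so that fp32 rounding at high temperatures cannot produce a tie.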

Why this PR

This was the original goal that surfaced PR #1432's syntactic build break. With #1432-#1434 fixing the --features hub build chain, the parity coverage is now executable. Per feedback_coverage_contracts_coevolution, every parallel implementation that participates in a contract must have the same falsifier coverage — silent drift would let one impl regress without the other surfacing.

Five Whys

  1. Why this gap? hf_pipeline::distillation is a parallel implementation added later; canonical distill::loss had tests, hf_pipeline did not.
  2. Why catch it now? PRs #1432-#1434 fixed the build chain that prevented these tests from running.
  3. Why two impls? hf_pipeline is HF-transformers-style export scaffolding; distill::loss is canonical training. Both compute the same math and need the same gates.
  4. Why a TRAIN-004-dual? Catches off-by-one swap of alpha and 1-alpha coefficients — both directions cover the bookkeeping completely.
  5. Why no contract version bump? apr-cli-distill-train-v1 already enumerates TRAIN-003+004; this closes a coverage gap, not a contract change.
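
The value of testing both ends of the alpha range (Why #4) can be sketched as follows. This assumes the convention described in this PR, where alpha weights the soft (KD) term and 1-alpha weights the hard (CE) term. `blend_losses` and `blend_losses_unscaled_ce` are hypothetical functions for illustration only; the second one models one possible bookkeeping bug, not a bug that existed in the repository.

```rust
/// Blend a soft (KD) term and a hard (CE) term.
/// Assumed convention: alpha weights KD, (1 - alpha) weights CE.
fn blend_losses(alpha: f32, kd: f32, ce: f32) -> f32 {
    alpha * kd + (1.0 - alpha) * ce
}

/// Hypothetical buggy variant: the CE term lost its (1 - alpha) weight.
fn blend_losses_unscaled_ce(alpha: f32, kd: f32, ce: f32) -> f32 {
    alpha * kd + ce
}

#[test]
fn alpha_one_equals_pure_kd() {
    let (kd, ce) = (2.5_f32, 0.75_f32);
    assert!((blend_losses(1.0, kd, ce) - kd).abs() < 1e-6);
}

#[test]
fn alpha_zero_equals_pure_ce() {
    let (kd, ce) = (2.5_f32, 0.75_f32);
    assert!((blend_losses(0.0, kd, ce) - ce).abs() < 1e-6);
}

#[test]
fn one_direction_alone_can_miss_a_bug() {
    let (kd, ce) = (2.5_f32, 0.75_f32);
    // The alpha = 0 check passes against the buggy variant...
    assert!((blend_losses_unscaled_ce(0.0, kd, ce) - ce).abs() < 1e-6);
    // ...and only the alpha = 1 check exposes it.
    assert!((blend_losses_unscaled_ce(1.0, kd, ce) - kd).abs() > 1e-3);
}
```

Using distinct values for the KD and CE terms matters: if the two were equal, any convex weighting would return the same number and neither check could fail.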

Test plan

  • cargo test -p aprender-train --lib --features hub falsify_apr_distill → 7 pass / 0 fail (canonical 3 + new hf_pipeline 4)
  • cargo test -p aprender-train --lib --features hub → 7990/7990 pass (was 7986; +4 new)
  • cargo fmt --all -- --check → no diff in touched file

🤖 Generated with Claude Code

…TRAIN-003/004 falsifier-parity coverage

Closes the parallel-implementation falsifier-coverage gap for the contract
`apr-cli-distill-train-v1.yaml` between:
  - canonical `crates/aprender-train/src/distill/loss.rs` (task #186, has tests)
  - parallel  `crates/aprender-train/src/hf_pipeline/distillation/loss.rs` (this PR)

Both implement Hinton-style KD loss with the same math invariants per
contract. Without same-coverage gates, one impl could regress without the
other surfacing — exactly the `feedback_coverage_contracts_coevolution`
class.

Adds 4 new tests to `hf_pipeline::distillation::tests`:
1. `falsify_apr_distill_train_003_t_scaling_preserves_argmax` — mirror of
   the canonical TRAIN-003 test (T-scaling preserves softmax argmax)
2. `falsify_apr_distill_train_004_alpha_one_equals_pure_kd` — mirror of
   the canonical TRAIN-004 test (alpha=1 → pure KD; the 1-alpha CE term
   is zeroed)
3. `falsify_apr_distill_train_004_alpha_zero_equals_pure_ce` — symmetric
   bookkeeping (alpha=0 → pure CE; catches off-by-one swap of alpha and
   1-alpha coefficients)
4. `falsify_apr_distill_train_003_log_softmax_consistency` — softmax/
   log_softmax inverse identity within fp32 noise + l2_normalize smoke
   to cover all three currently-imported helpers
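
A rough sketch of what the helper-consistency check in test 4 might look like is given below. The three functions are stand-ins written for this example and operate on plain `f32` slices; the actual `softmax`, `log_softmax`, and `l2_normalize` helpers imported by `hf_pipeline::distillation` may have different signatures and tensor types.

```rust
/// Numerically stable softmax (illustrative stand-in).
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Numerically stable log-softmax via the log-sum-exp trick
/// (illustrative stand-in).
fn log_softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let log_sum_exp = logits.iter().map(|&x| (x - max).exp()).sum::<f32>().ln();
    logits.iter().map(|&x| x - max - log_sum_exp).collect()
}

/// Scale a vector to unit L2 norm; a zero vector is returned unchanged
/// (illustrative stand-in).
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}

#[test]
fn log_softmax_matches_softmax() {
    let logits = [2.0_f32, -1.0, 0.5, 3.0];
    let probs = softmax(&logits);
    let log_probs = log_softmax(&logits);
    // Inverse identity: exp(log_softmax(x)) == softmax(x) within fp32 noise.
    for (p, lp) in probs.iter().zip(log_probs.iter()) {
        assert!((lp.exp() - p).abs() < 1e-5);
    }
    // Smoke check: the normalized vector has unit length.
    let unit = l2_normalize(&[3.0, 4.0]);
    let norm = unit.iter().map(|x| x * x).sum::<f32>().sqrt();
    assert!((norm - 1.0).abs() < 1e-6);
}
```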

Five Whys:
1. Why this gap? `hf_pipeline::distillation::DistillationLoss` is a
   parallel implementation of the canonical `distill::loss::DistillationLoss`,
   added later. The falsifier tests for the canonical impl (#186) weren't
   replicated.
2. Why catch it now? PRs #1432-#1434 fixed the `--features hub` build
   chain (broken syntactic + alignment-padding bugs that prevented the
   tests from being run at all). Now that the test surface is reliable,
   adding the parity coverage is finally executable.
3. Why two impls in the first place? `hf_pipeline` was a Hugging-Face-
   transformers-style scaffolding for distillation export pipelines;
   `distill::loss` is the canonical training-side loss used by `apr
   distill --stage train`. Both should compute the same math.
4. Why a TRAIN-004-dual? Same operational invariant from both directions:
   alpha=1 → soft only, alpha=0 → hard only. If a refactor swaps the
   coefficients, only one of the two tests would catch it; both together
   cover the bookkeeping completely.
5. Why no contract version bump? `apr-cli-distill-train-v1.yaml` v1.0.0
   already enumerates TRAIN-003 and TRAIN-004; this PR closes a coverage
   gap, not a contract change.

Verified locally:
- `cargo test -p aprender-train --lib --features hub falsify_apr_distill`
    → 7 pass / 0 fail (canonical 3 + new hf_pipeline 4)
- `cargo test -p aprender-train --lib --features hub` → 7990/7990 pass,
    16 ignored (was 7986 pre-PR; +4 for the new tests)
- `cargo fmt --all -- --check` → no diff in touched file

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit 0f7cfd6 into main May 3, 2026
11 checks passed
@noahgift noahgift deleted the feat/hf-pipeline-distill-train-falsifier-parity branch May 3, 2026 19:31
noahgift added a commit that referenced this pull request May 3, 2026
…line distill falsifier-parity (#1437)

Canonical record of today's MODEL-2-side hygiene cycle (PRs #1432-#1436).
Pre-cycle: --features hub unbuildable (E0425 in quantize_to_gguf_bytes),
masking 11 pre-existing test failures.

Chain landed:
- #1432: bind match result; build works → 7975/7986 (11 pre-existing surfaced)
- #1433: empty-input early-return → 7977/7986 (3 contract-drift fixed)
- #1434: GGUF alignment-padding skip in 2 test helpers → 7986/7986 ✅
- #1435: WGPU_FALLBACK_LOG_PREFIX + 3 drift-prevention tests (matches v1.2.0)
- #1436: 4 hf_pipeline distill falsifier-parity tests → 7990/7990

§42 documents: what landed (table), net hub health progression, why for
MODEL-2 (parallel-impl drift-protection), Five Whys (build masked
pre-existing failures), coverage update (15+33 unchanged; parallel-impl
uplift), ship % effects (MODEL-1 87→88, MODEL-2 50→54), and
next-session pickup options (CPU-GPU-005 part b OR distill-train
precompute determinism).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>