contract(tensor-layout-v1): v2.0.0 → v2.1.0 — FALSIFY-013 safetensors FFN round-trip drift gate by noahgift · Pull Request #1468 · paiml/aprender

noahgift · 2026-05-04T12:11:28Z

Summary

Codifies the spec §50 finding as a falsifiable contract gate. Post-import, `apr diff <apr> <gguf>` MUST report IDENTICAL for FFN tensors (`mlp.{down,gate,up}_proj.weight`), never [TRANSPOSED]. Status: PARTIAL_ALGORITHM_LEVEL (algorithm-bound to live evidence); flips to DISCHARGED when the safetensors→APR FFN transpose fix lands.

Five Whys (recap from §50)

Why does 0.5B `apr run` produce gibberish? FFN matmul reads weights in wrong orientation.
Why? APR FFN tensors stored as `[out, in]` (HF SafeTensors convention), not `[in, out]` (kernel expectation).
Why? Safetensors→APR import preserved HF shape labels without transposing.
Why? `needs_transpose` scaffolding at `f16_convert.rs:100-127` is `#[allow(dead_code)]`.
Why undetected? No round-trip falsification gate. This PR adds it.

Validation

```
$ pv validate contracts/tensor-layout-v1.yaml
0 error(s), 0 warning(s)
Contract is valid.
```

Algorithm evidence captured

diagnosis_line: 'Values identical, shapes transposed (format layout diff)'
3 affected FFN tensors (down/gate/up _proj)
6 unaffected tensors as control (q/k/v/o _proj + 2 norms)
7B status: works (GGUF-imported)
0.5B status: gibberish (safetensors-imported)
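To make the diagnosis line concrete, here is a minimal sketch of how a per-tensor comparison could separate IDENTICAL from [TRANSPOSED]. It is not the actual `apr diff` implementation: the `TensorDiff` enum and the `classify` function are hypothetical names, and the rule "same flat values, reversed 2-D shape labels" is an assumption drawn only from the wording "Values identical, shapes transposed".

```rust
/// Outcome of comparing one tensor across two files (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum TensorDiff {
    /// Same shape labels and same flat values.
    Identical,
    /// Same flat values, but the two 2-D shape labels are swapped.
    Transposed,
    /// Values differ, or shapes differ in some other way.
    Different,
}

/// Hypothetical classifier; an illustration, not the `apr diff` algorithm.
pub fn classify(a_shape: &[usize], a: &[f32], b_shape: &[usize], b: &[f32]) -> TensorDiff {
    // Compare bit patterns so the check is exact and NaN-safe.
    let same_values =
        a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.to_bits() == y.to_bits());
    if !same_values {
        return TensorDiff::Different;
    }
    if a_shape == b_shape {
        return TensorDiff::Identical;
    }
    // Only a 2-D tensor with swapped labels counts as transposed here.
    let reversed = a_shape.len() == 2
        && b_shape.len() == 2
        && a_shape[0] == b_shape[1]
        && a_shape[1] == b_shape[0];
    if reversed {
        TensorDiff::Transposed
    } else {
        TensorDiff::Different
    }
}
```

Under this sketch, the three FFN projections would land in `Transposed` and the six control tensors in `Identical`, which matches the shape of the evidence listed above.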

Discharge criterion

When the fix lands and `apr diff` reports IDENTICAL on round-trip, flip status PARTIAL → DISCHARGED. Coverage tally will increment +1.

Cross-refs

Spec §50 (PR spec(ship-two-models): v2.95.0 — §50 LAYOUT-001/002 in safetensors→APR FFN import #1467)
Evidence (PR docs(evidence): qwen2-0.5b bisection — root cause via apr diff (LAYOUT-001/002 violation) #1466)
CLAUDE.md "## LAYOUT-001/002 Tensor Layout Safety"

Test plan

`pv validate` clean
Pre-commit quality gates pass
CI `ci / gate` and `workspace-test`

🤖 Generated with Claude Code

… FFN round-trip drift gate

Codifies the §50 finding (spec v2.95.0) as a falsifiable contract gate: post-import, `apr diff <apr> <gguf>` MUST report IDENTICAL for FFN tensors (`mlp.{down,gate,up}_proj.weight`), never [TRANSPOSED].

Status: PARTIAL_ALGORITHM_LEVEL. Algorithm-bound to live evidence in `evidence/qwen2-0.5b-bisection-2026-05-04/findings.md` but not yet DISCHARGED (requires the safetensors→APR FFN transpose fix to land, ~50 LOC bounded scope per spec §50.7).

algorithm_evidence block captures:
- diagnosis_line: 'Values identical, shapes transposed (format layout diff)'
- affected_tensors: 3 FFN proj weights with shape labels swapped
- unaffected_tensors: 4 attn projections + 2 norms (IDENTICAL)
- seven_b_status: works (GGUF-imported, inherits transposed FFN layout)
- half_b_status: gibberish (safetensors-imported, preserves HF [out, in])

Discharge criterion: when the fix lands and `apr diff` reports IDENTICAL on the round-trip, flip status PARTIAL → DISCHARGED. This is the gate that prevents §49 MODEL-2 strategy A from regressing in the future.

Validation: `pv validate contracts/tensor-layout-v1.yaml` → 0 errors.

Cross-refs:
- spec §50 (docs/specifications/aprender-train/ship-two-models-spec.md)
- PR #1466 evidence (open auto-merge)
- PR #1467 spec amendment v2.95.0 (open auto-merge)
- CLAUDE.md "## LAYOUT-001/002 Tensor Layout Safety" (this contract is SOURCE OF TRUTH)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

noahgift · 2026-05-04T12:14:09Z

Closing — premise was wrong. Re-reading the existing tensor-layout-v1 contract metadata reveals: 'safetensors: layout=row-major (HuggingFace native format - same layout as APR)'. The [TRANSPOSED] tag from apr diff reports contract-compliant shape-label differences between formats, NOT a defect. The 0.5B APR file's [intermediate, hidden] layout for ffn_gate IS what the contract specifies. Six hypotheses now falsified for the Qwen2-0.5B gibberish; root cause is genuinely elsewhere. Spec §50 in PR #1467 also needs correction. Authoring a fix from this premise would break MODEL-1.
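One way a shape label can differ between two formats while the stored data is the same is a difference in dimension-ordering convention. The sketch below assumes one format lists the slowest-varying dimension first (row-major `[out, in]`) and the other lists the fastest-varying dimension first (`[in, out]`). The thread does not say which convention each format uses, so all function names and the conventions themselves are illustrative assumptions.

```rust
/// Flat offset when the shape lists the slowest-varying dimension first
/// (row-major, e.g. a weight labelled `[out, in]`).
pub fn offset_slowest_first(shape: [usize; 2], idx: [usize; 2]) -> usize {
    idx[0] * shape[1] + idx[1]
}

/// Flat offset when the shape lists the fastest-varying dimension first
/// (the same buffer labelled `[in, out]`).
pub fn offset_fastest_first(shape: [usize; 2], idx: [usize; 2]) -> usize {
    idx[0] + idx[1] * shape[0]
}

/// Returns true when every logical element (o, i) resolves to the same flat
/// offset under both labelling conventions.
pub fn labels_address_same_element(out_dim: usize, in_dim: usize) -> bool {
    (0..out_dim).all(|o| {
        (0..in_dim).all(|i| {
            offset_slowest_first([out_dim, in_dim], [o, i])
                == offset_fastest_first([in_dim, out_dim], [i, o])
        })
    })
}
```

With the Qwen2.5-Coder-0.5B FFN dimensions quoted in this thread (intermediate 4864, hidden 896), `labels_address_same_element(4864, 896)` holds: the two labels describe one buffer, so transposing the data to make the labels agree would change which element each index addresses.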

…oupling finding (#1472)

Adds §50 documenting the architecture-mismatch finding caught after §49.6 steps 3+4 landed (PR #1470 contract + PR #1471 wire-up). The remaining §49.6 step 5 was scoped at "0 LOC, just run apr pretrain --init"; that assumption is empirically wrong.

Empirical finding (§50.1): pretrain_real.rs:38-46 HARDCODES Llama370MConfig::* for every architectural constant. Qwen2.5-Coder-0.5B-Instruct has different shape across the board:

Param               | Llama370M   | Qwen2.5-Coder-0.5B
--------------------|-------------|-------------------
hidden_size         | 1024        | 896
num_attention_heads | 16          | 14
num_kv_heads        | 4 (GQA-4:1) | 2 (GQA-7:1)
intermediate_size   | 2816        | 4864
vocab_size          | 50_257      | 151_936
rope_theta          | 10_000      | 1_000_000

Every tensor mismatches. Loading Qwen2.5 weights into a Llama370M-shaped optimizer is a category error.

Three options surfaced (§50.3):
A: Find/build a Llama-shaped 0.5B pretrained checkpoint (~5K LOC + multi-week training; recreates §24/§25 corpus problem)
B: Make trainer architecture-polymorphic (~200-400 LOC; preserves §24/§25 falsification; recommended)
C: Replace Llama370MConfig with Qwen2_5_Coder_0_5B_Config outright (~300 LOC; deletes a working falsification path)

Recommendation (§50.5): Option B. It preserves §24/§25 falsification evidence, exercises TransformerConfig's designed polymorphism, and binds each new component (qwen2_0_5b constructor, GQA-7:1 attention, Qwen tokenizer surface) to its own falsifier.

Re-scoped roadmap (§50.4), 8 sub-steps replacing original step 5:
5a. Author apr-pretrain-arch-polymorphic-v1.yaml contract (~80 LOC)
5b. TransformerConfig::qwen2_0_5b() constructor (~40 LOC)
5c. Extract arch from init APR file metadata (~80 LOC)
5d. Qwen tokenizer-vocab compatibility check (~30 LOC)
5e. GQA-7:1 attention forward-pass verification (~50 LOC)
5f. Wire actual weight load (~120 LOC)
5g. LIVE 500-step smoke fine-tune (operator dispatch) 0 LOC
5h. Stamp + publish as MODEL-2 v2 (~10 LOC)
Total: ~410 LOC + 1 LIVE training run.

Five Whys (§50.6):
1. Why didn't §49 catch this? §49 was authored from strategy/data-budget reasoning; the 0-LOC step-5 cost implicitly assumed polymorphism. Live source inspection (this section's empirical move) revealed pretrain_real.rs:38-46 predates the assumption.
2. Why catch this NOW and not in step 5 implementation? Per feedback_no_guessing.md: read live source before forming implementation plan. Surfacing the mismatch BEFORE writing 200 LOC of weight-load code that fails at runtime is the cheapest place to pay cost-of-defect. The §50-prior wrong-premise PRs (#1466/#1467/#1468 closed) on the SHIP-007 / 0.5B gibberish track were the same defect class.
3. Why option B over A or C? Preserves §24/§25 falsification evidence (we KEEP knowing from-scratch fails at 9.75; we just don't ship it as MODEL-2). Exercises the polymorphism TransformerConfig was designed for. Each new component becomes its own falsifier rather than a hidden coupling.
4. Why is FALSIFY-005 the right place to fail-fast? PR #1470 already pinned "Architecture mismatch is FAIL-FAST, not silent-truncate". Step 4 (PR #1471) doesn't enforce arch matching yet; it returns "not yet wired" before getting there. So FALSIFY-005 is currently UNBOUND but its discharge gate is well-defined: read APR header, compare against pretrain target, error with names of mismatched fields.
5. Why isn't this a "punt"? A punt would say "blocked, await operator". This amendment names three options with LOC estimates, recommends one with reasoning, gives a concrete 8-step roadmap with falsifier discharge mapped to each sub-step. The work IS shippable; it's just bigger than 0 LOC.

Plain ship-% update:
- MODEL-1: unchanged at 91% (SHIP-007 cascade infrastructure track)
- MODEL-2: unchanged at 57%. First ship-% movement is gated on §50.4 step 5g (LIVE 500-step fine-tune producing val_loss < 9.38). Sub-steps 5a-5f can each individually move 1% with falsifier discharge (architecture-polymorphic infrastructure shipped == evidence that the §49 path is REACHABLE, not just theoretical).

Refs:
- §49: MODEL-2 strategy pivot (PR #1461)
- PR #1470: apr-pretrain-from-init-v1 v1.0.0 PROPOSED contract
- PR #1471: apr pretrain --init clap field + magic-byte validate
- feedback_no_guessing.md: read source before forming hypothesis
- feedback_fix_root_cause_never_route_around.md

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
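The FALSIFY-005 discharge gate described in the commit message above (read the APR header, compare against the pretrain target, error with the names of mismatched fields) can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: `ArchConfig`, `mismatched_fields`, `check_arch`, and `gqa_ratio` are hypothetical names, and reading the APR header is out of scope. Only the constants come from the comparison table in that message.

```rust
/// Architectural constants from the comparison table (hypothetical struct).
#[derive(Debug, Clone, PartialEq)]
pub struct ArchConfig {
    pub hidden_size: usize,
    pub num_attention_heads: usize,
    pub num_kv_heads: usize,
    pub intermediate_size: usize,
    pub vocab_size: usize,
    pub rope_theta: f64,
}

pub const LLAMA_370M: ArchConfig = ArchConfig {
    hidden_size: 1024,
    num_attention_heads: 16,
    num_kv_heads: 4,
    intermediate_size: 2816,
    vocab_size: 50_257,
    rope_theta: 10_000.0,
};

pub const QWEN2_5_CODER_0_5B: ArchConfig = ArchConfig {
    hidden_size: 896,
    num_attention_heads: 14,
    num_kv_heads: 2,
    intermediate_size: 4864,
    vocab_size: 151_936,
    rope_theta: 1_000_000.0,
};

/// Names of every field where the init checkpoint disagrees with the target.
pub fn mismatched_fields(init: &ArchConfig, target: &ArchConfig) -> Vec<&'static str> {
    let mut bad = Vec::new();
    if init.hidden_size != target.hidden_size { bad.push("hidden_size"); }
    if init.num_attention_heads != target.num_attention_heads { bad.push("num_attention_heads"); }
    if init.num_kv_heads != target.num_kv_heads { bad.push("num_kv_heads"); }
    if init.intermediate_size != target.intermediate_size { bad.push("intermediate_size"); }
    if init.vocab_size != target.vocab_size { bad.push("vocab_size"); }
    if init.rope_theta != target.rope_theta { bad.push("rope_theta"); }
    bad
}

/// Fail fast with the mismatched field names instead of silently truncating.
pub fn check_arch(init: &ArchConfig, target: &ArchConfig) -> Result<(), String> {
    let bad = mismatched_fields(init, target);
    if bad.is_empty() {
        Ok(())
    } else {
        Err(format!("architecture mismatch: {}", bad.join(", ")))
    }
}

/// Query heads per KV head, e.g. 16 / 4 = 4 for GQA-4:1.
pub fn gqa_ratio(cfg: &ArchConfig) -> usize {
    cfg.num_attention_heads / cfg.num_kv_heads
}
```

Calling `check_arch(&QWEN2_5_CODER_0_5B, &LLAMA_370M)` returns an error naming all six fields, which is the "every tensor mismatches" situation the commit message describes. `gqa_ratio` reproduces the 4:1 and 7:1 ratios from the table.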

noahgift enabled auto-merge (squash) May 4, 2026 12:11

noahgift closed this May 4, 2026

auto-merge was automatically disabled May 4, 2026 12:14
Pull request was closed

noahgift mentioned this pull request May 4, 2026

spec(ship-two-models): v2.94.0 → v2.95.0 — §50 MODEL-2 architecture-coupling finding #1472

Merged

4 tasks

noahgift wants to merge 1 commit into main from contract/tensor-layout-v1-ffn-falsifier
