
docs(ship-two-001): §36 — plain-language status of what's left to ship the two models #1098

Merged: noahgift merged 1 commit into main from docs/spec-36-plain-status on Apr 28, 2026


@noahgift

Contributor

Summary

Landmark section in plain prose for readers who don't want to chase the §15→§35 hypothesis chain. Each of the two models is blocked by a single concrete problem.

Plain-language status

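MODEL-1 (Qwen2.5-Coder-7B Apache-Q4K, already published): has a numerical bug in the layer-3 FFN. The output standard deviation is 18× that of the GGUF reference on the same prompt. Three theories were tested and refuted today:

  • matmul kernel (§30)
  • qkv_bias (§32)
  • layer-3 weight bytes (#1082 byte-compare)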

Actual bug: cumulative F32 precision drift through residual connections. Fix path: with PR #1082 (sub-FFN telemetry) merged and PR #1083 (CLI wiring) in flight, run apr trace --payload on canonical 7B teacher in both formats and bisect layer-by-layer.
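The layer-by-layer bisect could be sketched roughly as below. This is a minimal illustration, not the project's tooling: `first_divergent_layer`, the `diverged` probe, the 2.0 threshold, and the sample numbers are all hypothetical, and the sketch assumes that once the two formats diverge at some layer they stay diverged at every later layer (which is what cumulative drift through residuals would suggest). In practice the probe would wrap whatever per-layer statistics the trace run produces.

```python
from typing import Callable, Optional


def first_divergent_layer(
    num_layers: int, diverged: Callable[[int], bool]
) -> Optional[int]:
    """Binary-search for the earliest layer where `diverged(layer)` is True.

    Assumes divergence is monotone: False for every layer before the
    culprit, True from the culprit onward. Returns None if no layer diverges.
    """
    lo, hi = 0, num_layers
    # Invariant: every layer below `lo` is known good;
    # every layer at or above `hi` is known (or assumed) diverged.
    while lo < hi:
        mid = (lo + hi) // 2
        if diverged(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo if lo < num_layers else None


# Illustrative per-layer output std values only; these are NOT measurements.
apr_std = [1.0, 1.1, 1.2, 18.0, 20.0, 25.0]
gguf_std = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]


def std_ratio_exceeds(layer: int, threshold: float = 2.0) -> bool:
    """Hypothetical probe: flag a layer whose std ratio exceeds `threshold`."""
    return apr_std[layer] / gguf_std[layer] > threshold


culprit = first_divergent_layer(len(apr_std), std_ratio_exceeds)
print(culprit)
```

Taking the probe as a callable keeps the search independent of how the statistics are gathered, and it needs only about log2(N) probes, which matters if each probe means a full traced forward pass.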

MODEL-2 (paiml/albor-llama-370m-python-v1, trained today): val_loss=9.38, spec target 3.0. The 370M-from-scratch architecture has converged — 4× more steps yielded same outcome (§34). Capacity is the binding constraint. Path: distillation from shipped MODEL-1 7B teacher. apr distill is currently a stub (§35); contract authored as #1097, impl is multi-day Rust task.
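For readers unfamiliar with distillation, the core training signal is commonly a temperature-softened KL divergence between teacher and student output distributions. The sketch below shows that generic formulation only; it is not taken from the #1097 contract or from apr distill, and the function names and the default temperature of 2.0 are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Numerically stable log-softmax of logits divided by `temperature`."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,  # assumed value, not from the spec
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor is the usual convention that keeps gradient magnitudes
    comparable across temperatures.
    """
    if len(student_logits) != len(teacher_logits):
        raise ValueError("student and teacher must share a vocabulary size")
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


# Toy logits over a 3-token vocabulary, for illustration only.
matched = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
print(matched, mismatched)
```

The loss is zero when the student reproduces the teacher's distribution and grows as they disagree, so a small student can learn from the 7B teacher's full output distribution rather than from one-hot targets alone.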

What's left

| Model | What's blocked | What's needed |
|---|---|---|
| MODEL-1 | Layer-3 FFN bug | Bisect with sub-FFN telemetry → fix |
| MODEL-2 | val_loss=9.38 capacity ceiling | Implement apr distill --stage train |

Both blockers are fixable with code, not training time or compute.

Today's session output

11 PRs landed in 24 hours:

  • 6 spec amendments (§30, §31/§32, §33, §34, §35, §36)
  • 4 contracts (P1.0 dataset, parallel-bpe, distill-train, parity)
  • 1 implementation (P1.1 apr pull dataset)
  • 2 SHIP-007 sub-FFN telemetry PRs (PR B + PR C)

Plus full P1.0 → P2 corpus pipeline (565.6M tokens, val_loss=9.38) executed end-to-end with zero muda.

Test plan

🤖 Generated with Claude Code

@noahgift enabled auto-merge (squash) April 28, 2026 07:26
… — v2.80 → v2.81

Landmark section in plain prose for readers who don't want to chase the
§15→§35 hypothesis chain. Each model is blocked by a single concrete
problem.

MODEL-1: numerical bug at layer 3 of FFN. 18× std anomaly vs GGUF reference.
Three theories tested+refuted today (matmul kernel via §30, qkv_bias via §32,
layer-3 weight bytes via #1082 byte-compare). Actual bug is cumulative F32
precision drift through residuals. Fix path: with PR #1082 merged + PR #1083
in flight, run apr trace --payload on canonical 7B teacher in both formats
and bisect layer-by-layer.

MODEL-2: trained end-to-end today. val_loss=9.38 (spec target 3.0). 370M
from-scratch has converged: 4× more steps yielded same outcome (§34).
Capacity is the binding constraint, not corpus or compute. Path forward: distillation
from shipped MODEL-1 7B teacher. apr distill is currently a stub (§35);
contract authored as #1097, impl is multi-day Rust task.

Both blockers are fixable with code, not training time:
- MODEL-1: bisect with new sub-FFN telemetry, then fix at root
- MODEL-2: implement apr distill --stage train, then run 2-4h distillation

Today's session: 11 PRs landed (6 spec amendments + 4 contracts + 1 impl
+ 2 SHIP-007 sub-FFN telemetry PRs) plus full P1.0→P2 pipeline executed
end-to-end with zero muda.

Header v2.80.0 → v2.81.0. No coverage flip — landmark only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift force-pushed the docs/spec-36-plain-status branch from b364e07 to ca2f352 April 28, 2026 07:53
@noahgift merged commit aae9c04 into main Apr 28, 2026
10 checks passed
@noahgift deleted the docs/spec-36-plain-status branch April 28, 2026 08:11