docs(ship-two-001): §33 — MODEL-2 codeparrot retrain val_loss=9.3837 confirms §25 corpus-diversity hypothesis by noahgift · Pull Request #1094 · paiml/aprender

noahgift · 2026-04-28T02:28:46Z

Summary

The P1 corpus pipeline is complete end-to-end. The P2 MODEL-2 retrain on the 565.6M-token codeparrot Python+permissive corpus (7.6× the 4× CSN-Python baseline) pushes val_loss from the 9.7507 plateau to 9.3837, a 0.367-nat (3.8%) improvement.

§25 had concluded:

"There is no LR/step configuration that beats val_loss=9.75 on CSN-Python — only Stack v2 will move the needle."

§33 confirms this empirically.
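The improvement quoted in the summary can be re-derived from the two reported losses. The sketch below is only arithmetic on those numbers. It assumes val_loss is a mean cross-entropy in nats (the summary's "nat" wording suggests this, but the PR does not state it), so the perplexity ratio in particular is an illustration under that assumption.

```python
import math

baseline_val_loss = 9.7507  # plateau on 4x CSN-Python, from the summary
retrain_val_loss = 9.3837   # best val_loss on the codeparrot corpus

abs_gain = baseline_val_loss - retrain_val_loss
rel_gain = abs_gain / baseline_val_loss

# Assumption: loss is in nats, so exp(loss) is perplexity and
# exp(-abs_gain) is the ratio new_perplexity / old_perplexity.
ppl_ratio = math.exp(-abs_gain)

print(f"absolute gain: {abs_gain:.3f} nats")
print(f"relative gain: {rel_gain:.1%}")
print(f"perplexity ratio (new/old): {ppl_ratio:.3f}")
```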

Pipeline (all stack-canonical, zero muda)

| Phase | Outcome |
|-------|---------|
| P1.0 contract authored | #1080 → #1089 |
| P1.1 `apr pull dataset` extension | #1089 MERGED |
| P1.4 codeparrot pull | 80 shards / 27 GB |
| P1.5a parquet → JSONL filter | 405,904 rows / 3.17 GB |
| P1.5b BPE encode-corpus | 57 shards / 565.6M tokens / 10h |
| P2 MODEL-2 retrain on RTX 4090 | EARLY_STOP at 51 ep / 47 min |

Total wall time from contract authoring to val_loss=9.3837: ~14 hours.
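As an illustration of what a step like P1.5a does conceptually, here is a minimal sketch of a license filter that writes JSONL. It is not the pipeline's actual code: the record fields `content` and `license`, the `PERMISSIVE` allow-list, and the function name are all hypothetical, and the real step reads parquet shards rather than in-memory dicts.

```python
import io
import json

# Hypothetical allow-list; the PR does not say which licenses count as permissive.
PERMISSIVE = {"mit", "apache-2.0", "bsd-3-clause"}


def filter_to_jsonl(rows, out):
    """Write rows with a permissive license as one JSON object per line.

    Returns the number of rows kept.
    """
    kept = 0
    for row in rows:
        if row.get("license", "").lower() in PERMISSIVE:
            out.write(json.dumps({"content": row["content"]}) + "\n")
            kept += 1
    return kept


# Toy records for demonstration only, not data from the corpus.
rows = [
    {"content": "print('a')", "license": "MIT"},
    {"content": "print('b')", "license": "GPL-3.0"},
]
buf = io.StringIO()
kept = filter_to_jsonl(rows, buf)
print(kept, buf.getvalue().strip())
```

Writing to any file-like object keeps the sketch testable in memory; a real run would pass an open file handle per output shard.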

Training curve highlights

| Epoch | val_loss |
|-------|----------|
| 0 | 10.0698 |
| 10 | 9.5657 |
| 20 | 9.4771 |
| 30 | 9.42x |
| 44 | 9.3837 (BEST) |
| 50 | 9.3889 (EARLY_STOP) |

Full per-epoch metadata in evidence/model-2-codeparrot-retrain-2026-04-28/all-epochs.json.
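The best epoch can be picked out mechanically from the highlighted points. The sketch below uses only the values listed in the table; it does not read all-epochs.json, because that file's schema is not shown in this PR.

```python
# (epoch, val_loss) points copied from the highlights table. Epoch 30 is
# omitted because its last digit is elided in the source ("9.42x").
curve = [(0, 10.0698), (10, 9.5657), (20, 9.4771), (44, 9.3837), (50, 9.3889)]

best_epoch, best_loss = min(curve, key=lambda point: point[1])
last_epoch, last_loss = curve[-1]

print(f"best: epoch {best_epoch}, val_loss {best_loss}")
# Epochs elapsed between the best checkpoint and the final listed epoch.
print(f"epochs since best: {last_epoch - best_epoch}")
```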

Methodology proven (§26.8 pays off)

The §26.8 stack-tool-extension rule paid off concretely:

- 6h authoring cost (P1.0 contract + P1.1 impl) → permanent apr capability
- Every future dataset pull benefits
- §33's val_loss=9.3837 is downstream proof of the methodology

Coverage impact

§33 is binding evidence for SHIP-021 (corpus diversity binding) — promotion to DISCHARGED is deferred to a separate PR that updates the SHIP-021 contract atomically. Spec scoreboard unchanged (15+33) in this PR per "one coverage flip per PR" methodology.

Files

Spec: §33 added (~80 lines, 8 subsections)
Evidence:
- evidence/model-2-codeparrot-retrain-2026-04-28/launch.log
- evidence/model-2-codeparrot-retrain-2026-04-28/all-epochs.json
Best checkpoint: /mnt/nvme-raid0/runs/model-2-from-scratch-010-codeparrot/ckpt/epoch-044.apr (live on RTX 4090 host)

Test plan

- Spec self-consistent: header v2.78.0 references new §33
- Evidence files commit cleanly (launch.log force-added past .gitignore)
- Training data live, reproducible via the launch script in /mnt/nvme-raid0/data/codeparrot-python-permissive-shards/

Next session

Per §33.4: re-train with --num-steps 200000 and looser early-stop patience to push val_loss further. With 565.6M tokens available and only 83.5M (15%) seen at EARLY_STOP, there's significant headroom.
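To make the headroom and patience argument concrete, here is a minimal sketch of patience-based early stopping plus the token-fraction arithmetic. The function name, the `min_delta` parameter, and the toy loss sequence are hypothetical; the trainer's actual early-stop rule and patience setting are not given in this PR.

```python
def early_stop_epoch(val_losses, patience, min_delta=0.0):
    """Return the epoch index where training would stop, or None if it never does.

    Training stops once `patience` consecutive epochs fail to improve on the
    best loss seen so far by more than `min_delta`.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


# Token figures quoted in the paragraph above.
tokens_seen = 83.5e6
tokens_total = 565.6e6
fraction_seen = tokens_seen / tokens_total

# Toy sequence for illustration only, not the real training curve.
toy_curve = [1.00, 0.90, 0.91, 0.92, 0.89, 0.90, 0.91, 0.92, 0.93]
tight = early_stop_epoch(toy_curve, patience=2)
loose = early_stop_epoch(toy_curve, patience=4)

print(f"fraction of corpus seen: {fraction_seen:.1%}")
print(f"patience=2 stops at epoch {tight}; patience=4 stops at epoch {loose}")
```

On the toy sequence, the tighter patience stops before the later improvement at index 4 is ever reached, which is the failure mode that a looser setting is meant to avoid.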

🤖 Generated with Claude Code

…confirms §25 corpus-diversity hypothesis — v2.77 → v2.78

P1 corpus pipeline complete end-to-end. P2 MODEL-2 retrain on 565.6M-token codeparrot Python+permissive corpus (7.6× the 4× CSN-Python baseline) pushes val_loss from the 9.7507 plateau to 9.3837, a 0.367-nat (3.8%) improvement with the SAME training configuration.

§25 had concluded (after 80K-step LR-budget falsification on 4× CSN-Python): "There is no LR/step configuration that beats val_loss=9.75 on CSN-Python — only Stack v2 will move the needle."

§33 confirms this empirically. The corpus-diversity binding criterion of §26.9 is satisfied.

## Pipeline (all stack-canonical, no muda)

| Phase | Outcome |
|-------|---------|
| P1.0 contract authored (PROPOSED → ACTIVE) | #1080 → #1089 |
| P1.1 `apr pull dataset` extension | #1089 MERGED |
| P1.4 codeparrot pull | 80 shards / 27 GB |
| P1.5a parquet → JSONL filter | 405,904 rows / 3.17 GB |
| P1.5b BPE encode-corpus | 57 shards / 565.6M tokens / 10h |
| P2 MODEL-2 retrain on RTX 4090 | EARLY_STOP at 51 ep / 47 min |

Total wall time from contract authoring to val_loss=9.3837: ~14 hours.

## Training curve highlights

- epoch 0: train=9.7567, val=10.0698 (init)
- epoch 10: train=9.4610, val=9.5657 (post-warmup)
- epoch 30: train=9.2x, val=9.42x
- epoch 44: val=9.3837 (BEST)
- epoch 50: train=9.2093, val=9.3889 (EARLY_STOP next)

Full per-epoch metadata in evidence/model-2-codeparrot-retrain-2026-04-28/all-epochs.json.

## Coverage impact

§33 is binding evidence for SHIP-021 (corpus diversity binding); promotion to DISCHARGED is deferred to a separate PR that updates the SHIP-021 contract atomically. Spec scoreboard unchanged (15+33) in this PR.

## Files

- evidence/model-2-codeparrot-retrain-2026-04-28/launch.log
- evidence/model-2-codeparrot-retrain-2026-04-28/all-epochs.json
- §33 spec section (8 subsections, ~80 lines)
- Header: v2.77.0 → v2.78.0

## Methodology landed

The §26.8 stack-tool-extension rule paid off concretely:

- 6h authoring cost (P1.0 contract + P1.1 impl) → permanent apr capability
- Every future dataset pull benefits
- §33's val_loss=9.3837 is downstream proof of the methodology

This commit represents the first cycle in §22→§33 where the spec amendment has the same priority as the empirical result.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

noahgift enabled auto-merge (squash) April 28, 2026 02:28

noahgift merged commit 52da8e5 into main Apr 28, 2026
11 checks passed

noahgift deleted the docs/spec-33-codeparrot-retrain-success branch April 28, 2026 02:52
