Skip to content

docs(ship-two-001): §25 — §24.8 LR-budget hypothesis FALSIFIED — spec v2.68.0 → v2.69.0#1077

Merged
noahgift merged 1 commit into
mainfrom
feat/spec-25-lr-budget-falsified
Apr 27, 2026
Merged

docs(ship-two-001): §25 — §24.8 LR-budget hypothesis FALSIFIED — spec v2.68.0 → v2.69.0#1077
noahgift merged 1 commit into
mainfrom
feat/spec-25-lr-budget-falsified

Conversation

@noahgift

Copy link
Copy Markdown
Contributor

Summary

§24.8 prescribed a falsifiable next step: apr pretrain --num-steps 80000 on the 4× corpus to test whether LR budget or corpus diversity is the binding constraint on val_loss. §25 records the clean falsification.

Result: 80K run early-stopped at epoch 10 / 22K steps with best val_loss=9.7507 at epoch 4 — functionally identical to the 20K run's 9.7513.

§24.8 outcome matrix (now decided)

Outcome Hypothesis Observed
val_loss < 8.911 LR-budget bound
val_loss plateau 9.5–9.7 only Stack v2 helps CONFIRMED 9.7507

Why this is a clean falsification

  1. Best-epoch invariance: both 20K and 80K runs hit best at epoch 4. The 20K cosine LR is 0.94×peak there; the 80K is 0.99×peak. Different LR, identical val_loss.
  2. Train-val gap = -0.010 at epoch 9: healthy generalization, no memorization onset.
  3. Patience consistency: 20K/50K/80K all show same plateau pattern at epoch 4.

Chinchilla scaling alignment

Corpus Tokens % of optimal for 370M val_loss floor
1× CSN 18.1M 0.24% 8.91 (mem-driven, see §24)
4× CSN 74.3M 1.00% 9.75 (true generalization)
Stack v2 Python ~5–10B 70–135% unknown — only this should hit 3.0

The 4× corpus is still 100× under-sized. There is no LR/step configuration that beats 9.75 on CSN-Python.

Method

  • 80K dispatch: PID 2277850, RTX 4090, 6636 MiB GPU memory
  • Early-stop fired at epoch 10 (5 non-improvement epochs from epoch 4)
  • ~1h32min wall (saved 4.5hr that wouldn't have changed conclusion)
  • Lambda-labs lane pre-authorized per feedback_compute_pre_authorized.md
  • Zero eprintln!, zero route-arounds

Closes the LR-budget question

§24.8's explicit falsifier executed and answered. The single remaining lever is corpus diversity → Stack v2 Python (multi-hour data-engineering task, deferred to user authorization).

Test plan

  • CI workspace-test passes
  • CI gate passes
  • Spec banner v2.69.0 reflects new §25
  • Evidence JSON validates (11 epoch metadatas + termination summary)

🤖 Generated with Claude Code

…oss=9.75 floor is corpus-diversity-bound — spec v2.68.0 → v2.69.0

§24.8 prescribed `apr pretrain --num-steps 80000` on the 4× corpus
to falsify whether LR budget or corpus diversity is the binding
constraint on val_loss. §25 records the clean falsification.

80K dispatch (PID 2277850, RTX 4090) early-stopped at epoch 10 /
22,000 steps (~1h32min wall) with best val_loss=9.7507 at epoch 4.
The 20K run's best was 9.7513 — delta = 6×10⁻⁴, within FP noise.

§24.8 specified two outcomes:
- val_loss < 8.911: LR-budget hypothesis confirmed
- val_loss plateau 9.5–9.7: only Stack v2 will help

The data show **plateau at 9.7507 = LR-budget hypothesis FALSIFIED**.
4× more cosine-decay LR budget did not move the needle.

Three independent signals confirm corpus saturation:
1. Best-epoch invariance (both runs hit best at epoch 4)
2. Train-val gap = -0.010 at epoch 9 (healthy generalization)
3. Patience-trigger consistency across 20K/50K/80K runs

Chinchilla scaling math:
| Corpus | Tokens | % of optimal | val_loss floor |
|--------|-------:|-------------:|---------------:|
| 1× CSN | 18.1M | 0.24% | 8.91 (mem-driven) |
| 4× CSN | 74.3M | 1.00% | 9.75 (true) |
| Stack v2 Python | ~5–10B | 70–135% | only this hits 3.0 |

§24.8's explicit falsifier executed and answered. There is no
LR/step configuration that beats 9.75 on CSN-Python; only Stack
v2 Python (multi-billion tokens) is the on-spec corpus path.

Methodology: zero eprintln!, zero route-arounds, early-stop
saved 4.5 hours of compute. Lambda-labs lane pre-authorized.

Spec v2.68.0 → v2.69.0. No coverage tally change.

Evidence: evidence/model-2-corpus-4x-2026-04-27/training-summary-80k.json

Run dir: /mnt/nvme-raid0/runs/model-2-from-scratch-010-4x-80k

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) April 27, 2026 05:34
@noahgift noahgift merged commit eea2475 into main Apr 27, 2026
11 checks passed
@noahgift noahgift deleted the feat/spec-25-lr-budget-falsified branch April 27, 2026 05:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant