
Record: 11L Int6+Zstd MLP3x SmearGate BigramHash OrthoInit MuonWD EMA (mean val_bpb=1.1497) #362

Closed
mkenney2 wants to merge 1 commit into openai:main from mkenney2:main



@mkenney2

Summary

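  • 11 layers, model width 512, 3x MLP expansion; int6 weight quantization with zstd (level 22) compression; SmearGate, BigramHash, orthogonal init (OrthoInit), Muon with weight decay 0.02, EMA of weights with decay 0.997
  • Tied embeddings kept in FP16, sequence length 2048, sliding-window evaluation with stride 256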
  • Mean val_bpb: 1.1497 ± 0.0004 (3 seeds: 1337, 42, 7)
  • Artifact: ~14.8MB
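The EMA(0.997) entry can be illustrated with a minimal sketch. This is not the PR's code: parameters are modeled as plain Python lists so it runs with the standard library, the helper name `ema_update` is hypothetical, and it assumes (as in a typical setup) that the EMA copy is updated after every optimizer step and is the copy that gets evaluated and exported.

```python
# Hypothetical sketch of an exponential moving average (EMA) of weights.
# Parameters are a dict of name -> list of floats (stand-in for tensors).

def ema_update(ema_params, params, decay=0.997):
    """In-place update: ema <- decay * ema + (1 - decay) * current."""
    for name, values in params.items():
        shadow = ema_params[name]
        for i, v in enumerate(values):
            shadow[i] = decay * shadow[i] + (1.0 - decay) * v

# Usage: initialize the EMA copy from the starting weights, then update it
# after each (simulated) optimizer step.
params = {"w": [1.0, -2.0]}
ema = {k: list(v) for k, v in params.items()}
for step in range(1000):
    params["w"] = [0.0, 0.0]  # stand-in for an optimizer step moving weights to 0
    ema_update(ema, params)
```

With decay 0.997 the contribution of a weight value from n steps ago shrinks by a factor of 0.997^n, so the averaging horizon is on the order of 1 / (1 - 0.997) ≈ 333 steps.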
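The sliding-window evaluation (sequence length 2048, stride 256) can be sketched as a window schedule in which every position is scored exactly once while later positions keep up to 1792 tokens of left context. This is an illustrative sketch, not the PR's evaluator: the function name `sliding_windows` and the (start, end, score_start) tuple layout are hypothetical.

```python
def sliding_windows(total_len, seq_len=2048, stride=256):
    """Return (start, end, score_start) tuples. Each window feeds positions
    [start, end) to the model; only positions [score_start, end) are scored,
    so every position in [0, total_len) is scored exactly once."""
    windows = []
    scored_until = 0  # positions [0, scored_until) are already scored
    start = 0
    while scored_until < total_len:
        end = min(start + seq_len, total_len)
        windows.append((start, end, scored_until))
        scored_until = end
        start += stride
    return windows

# Usage: a 5000-token evaluation stream.
windows = sliding_windows(5000)
```

The first window scores all 2048 positions; each later full window scores only its last 256 positions with 1792 tokens of context. The cost is roughly seq_len / stride = 8 times the forward passes of non-overlapping evaluation.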

Test plan

  • 3-seed validation on 8xH100 SXM (all under 16MB, all under 600s)
  • train.log included (seed 7, val_bpb=1.1495)
  • Extensive ablation documented in README (AttnRes, depth recurrence, seq-len curriculum, TTT)

