The Stinky Frost Recipe — 1.1725 BPB#190

Closed
newjordan wants to merge 1 commit into openai:main from newjordan:submission/stinky-frost-recipe

@newjordan

Summary

  • val_bpb: 1.1725 | 15.58MB artifact | 8xH100 600s
  • Int6 QAT (25%) + FP16 tied embeddings + MLP hidden=1344
  • SmearGate + BigramHash (4096×128) + OrthoInit
  • Muon WD=0.01 + sliding window eval stride=64
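
A minimal sketch of what symmetric int6 quantization of one weight row could look like, assuming per-row absmax scaling and the signed range [-31, 31]. The PR does not state which scheme train_gpt.py uses, so the function names and the scaling rule here are illustrative assumptions. In quantization-aware training the forward pass would see the dequantized values while gradients update the full-precision weights; this stdlib-only sketch covers only the round trip, not the gradient path.

```python
def quantize_int6(row):
    """Symmetric per-row quantization to integers in [-31, 31] (assumed scheme)."""
    qmax = 31  # 6-bit signed range, kept symmetric; -32 is left unused (assumption)
    absmax = max(abs(w) for w in row)
    scale = absmax / qmax if absmax > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in row]
    return q, scale


def dequantize_int6(q, scale):
    """Map integer codes back to floats."""
    return [v * scale for v in q]


def fake_quant(row):
    """Quantize then dequantize: the values a QAT forward pass would see."""
    q, scale = quantize_int6(row)
    return dequantize_int6(q, scale)
```

With absmax scaling, the round-trip error of each weight is at most half a quantization step (`scale / 2`), which is the noise the model would have to tolerate during the QAT phase.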

Techniques

| Technique | Impact |
| --- | --- |
| FP16 tied embeddings | ~0.1 BPB; preserves token distinguishability through int6 quantization |
| MLP hidden=1344 | Custom size to fit FP16 embed under 16MB |
| SmearGate | Learned bigram blending on embeddings (512 params) |
| BigramHash | Hash token pairs into learned embeddings (~590K params) |
| OrthoInit | Orthogonal init for all large linear layers |
| Early QAT 25% | ~6000 steps of quantization-aware training |
| Stride=64 eval | Overlapping context windows, ~0.09 BPB improvement |
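
The two embedding-side techniques listed above can be sketched in a few lines. This is an illustration under stated assumptions, not the PR's code: the hash mixing constant is arbitrary, the 128 -> 512 projection is a guess (it happens to make the count 4096·128 + 128·512 = 589,824, in line with the "~590K" figure), and the gate is assumed to be one sigmoid-activated value per channel, which at model dim 512 would give the 512 parameters quoted.

```python
import math

NUM_BUCKETS = 4096  # from the PR summary: BigramHash (4096x128)
BIGRAM_DIM = 128    # from the PR summary
MODEL_DIM = 512     # from the commit message: 9L/512d


def bigram_bucket(prev_token, token, num_buckets=NUM_BUCKETS):
    """Map a (previous, current) token pair to a row of the hashed table.

    The multiplier is an arbitrary illustrative choice, not the PR's hash.
    """
    return (prev_token * 1000003 + token) % num_buckets


def bigram_param_count():
    """Hashed table plus an assumed bias-free 128 -> 512 projection."""
    return NUM_BUCKETS * BIGRAM_DIM + BIGRAM_DIM * MODEL_DIM


def smear(embeddings, gate_logits):
    """Blend each position's embedding with the previous position's.

    embeddings: list of per-token vectors.
    gate_logits: one learned value per channel (hypothetical parameterization).
    Position 0 has no predecessor and is returned unchanged.
    """
    if not embeddings:
        return []
    gates = [1.0 / (1.0 + math.exp(-g)) for g in gate_logits]
    out = [list(embeddings[0])]
    for prev, cur in zip(embeddings, embeddings[1:]):
        out.append([(1.0 - g) * c + g * p for g, c, p in zip(gates, cur, prev)])
    return out
```

Different token pairs can collide in the same bucket; with 4096 rows the table trades exact bigram identity for a small, fixed parameter budget.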

Reproduction

QUANT_BITS=6 QAT_START_FRAC=0.25 EVAL_STRIDE=64 MUON_WD=0.01 FP16_EMBED=1 SMEAR_GATE=1 BIGRAM_HASH=1 ORTHO_INIT=1 MLP_HIDDEN=1344 RUN_ID=stinky_frost NCCL_IB_DISABLE=1 torchrun --standalone --nproc_per_node=8 train_gpt.py
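
As an illustration of how a training script might pick up these settings, the sketch below reads the same variable names from the environment. The `RunConfig` dataclass, its field names, and the `load_config` helper are hypothetical; train_gpt.py may parse these variables differently. Numeric settings are required here rather than given invented defaults, and feature flags are treated as on only when set to `1`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:  # hypothetical container, not taken from train_gpt.py
    quant_bits: int
    qat_start_frac: float
    eval_stride: int
    muon_wd: float
    mlp_hidden: int
    run_id: str
    fp16_embed: bool
    smear_gate: bool
    bigram_hash: bool
    ortho_init: bool


def load_config(env=None):
    """Build a RunConfig from a mapping of environment variables."""
    env = os.environ if env is None else env

    def flag(name):
        # A flag counts as enabled only when explicitly set to "1".
        return env.get(name, "0") == "1"

    return RunConfig(
        quant_bits=int(env["QUANT_BITS"]),
        qat_start_frac=float(env["QAT_START_FRAC"]),
        eval_stride=int(env["EVAL_STRIDE"]),
        muon_wd=float(env["MUON_WD"]),
        mlp_hidden=int(env["MLP_HIDDEN"]),
        run_id=env["RUN_ID"],
        fp16_embed=flag("FP16_EMBED"),
        smear_gate=flag("SMEAR_GATE"),
        bigram_hash=flag("BIGRAM_HASH"),
        ortho_init=flag("ORTHO_INIT"),
    )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check without launching a run.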

Test plan

  • Verify reproduction on 8xH100 within 600s wallclock
  • Confirm artifact size < 16,000,000 bytes
  • Validate BPB on FineWeb validation set
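
The checks in this plan can be sketched as small helpers. The windowing scheme below is one common way to evaluate with a stride (each window scores only tokens not yet scored, so every token is counted once while later windows keep earlier tokens as context); whether train_gpt.py does exactly this is an assumption, as are the function names and the `context_len` parameter. The bits-per-byte conversion divides the summed cross-entropy in nats by ln 2 times the number of bytes in the evaluated text.

```python
import math

ARTIFACT_LIMIT_BYTES = 16_000_000  # limit stated in the test plan


def sliding_windows(num_tokens, context_len, stride):
    """Yield (start, end, score_from) spans over token positions.

    Each window covers tokens [start, end); only [score_from, end) counts
    toward the loss. After the first window, each step scores `stride` new
    tokens with up to `context_len - stride` tokens of prior context.
    """
    scored = 0
    while scored < num_tokens:
        end = min(num_tokens, context_len if scored == 0 else scored + stride)
        start = max(0, end - context_len)
        yield start, end, scored
        scored = end


def bits_per_byte(total_nll_nats, total_bytes):
    """Convert a summed negative log-likelihood (nats) to bits per byte."""
    return total_nll_nats / (math.log(2) * total_bytes)


def artifact_fits(size_bytes, limit=ARTIFACT_LIMIT_BYTES):
    """True when the artifact is strictly under the size limit."""
    return size_bytes < limit
```

A smaller stride means more forward passes per evaluated token, so stride=64 costs more evaluation compute than non-overlapping windows in exchange for longer average context.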

Int6 QAT + FP16 embed + SmearGate + BigramHash + OrthoInit + MuonWD
+ sliding window stride=64. MLP hidden=1344, 9L/512d.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
newjordan referenced this pull request in newjordan/parameter-golf Mar 20, 2026
Runs exact PR #190 config (1.1725 BPB) across 3 seeds (42, 137, 2026)
for self-validation. Also includes edge-finder script for future testing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
newjordan closed this Mar 23, 2026