submission: Int6 MLP3x + QAT + SlidingWindow (val_bpb: 1.1702) #117

Open
trovatochris wants to merge 2 commits into openai:main from trovatochris:main

@trovatochris
Summary

This submission stacks int6 per-row quantization with zstd22 compression, a 3x MLP expansion, QAT weight snapping that starts at 70% of training, a tuned Muon optimizer (momentum 0.99 with warmup), an extended warmdown (3000 iterations), and stride-64 sliding-window evaluation.
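For readers unfamiliar with strided evaluation, here is a minimal sketch of how stride-64 sliding windows can be laid out so that every token is scored exactly once while still seeing as much left context as fits. It is an illustration, not the code in train_gpt.py: the function name `sliding_windows` and the context lengths used below are hypothetical, and the one-token shift between inputs and targets is ignored.

```python
def sliding_windows(n_tokens, context_len, stride):
    """Yield (start, end, score_from) index triples covering tokens [0, n_tokens).

    Tokens in [start, end) are fed to the model as one window; only the
    positions in [score_from, end) count toward the loss. The first window
    scores everything it sees, later windows score only their newest tokens.
    """
    if not 0 < stride <= context_len:
        raise ValueError("stride must be in 1..context_len")
    scored_up_to = 0
    while scored_up_to < n_tokens:
        if scored_up_to == 0:
            end = min(context_len, n_tokens)
        else:
            end = min(scored_up_to + stride, n_tokens)
        start = max(0, end - context_len)
        yield start, end, scored_up_to
        scored_up_to = end


# Hypothetical sizes chosen for readability; only the stride of 64 is from the PR.
for window in sliding_windows(n_tokens=300, context_len=128, stride=64):
    print(window)
```

A smaller stride gives each scored token more context on average, at the cost of more forward passes over the validation set.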

Key Metrics

  • Seed: 1337
  • final_sliding_window_eval_exact val_loss:1.97577705 val_bpb:1.17016796
  • Total submission size int6+zstd22: 15,306,777 bytes
  • Under 16MB: YES (693,223 bytes headroom)
  • Training steps: 9,540/20,000 (600s wallclock cap)
  • Average step time: 62.86ms
  • Model params: 21,778,504
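The figures above can be cross-checked with a few lines of arithmetic. This sketch only uses the numbers reported in the list. Two readings are assumptions on my part: that "16MB" means 16,000,000 bytes (which is what makes the stated headroom come out exactly), and that val_loss is in nats per token so that val_bpb equals bits per token divided by bytes per token.

```python
import math

SIZE_LIMIT_BYTES = 16_000_000  # assumption: decimal megabytes
submission_bytes = 15_306_777
headroom = SIZE_LIMIT_BYTES - submission_bytes

steps = 9_540
avg_step_ms = 62.86
train_seconds = steps * avg_step_ms / 1000  # should sit just under the 600 s cap

n_params = 21_778_504
# Upper bound on bits per parameter: the artifact may contain more than weights.
bits_per_param = submission_bytes * 8 / n_params

val_loss_nats = 1.97577705
val_bpb = 1.17016796
bits_per_token = val_loss_nats / math.log(2)
# Assumption: val_bpb = bits_per_token / bytes_per_token.
implied_bytes_per_token = bits_per_token / val_bpb

print(f"headroom: {headroom:,} bytes")
print(f"training time: {train_seconds:.2f} s")
print(f"bits per parameter (upper bound): {bits_per_param:.2f}")
print(f"implied bytes per token: {implied_bytes_per_token:.3f}")
```

The step count times the average step time lands within a second of the 600 s wallclock cap, which is consistent with training stopping at 9,540 of the 20,000 scheduled steps.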

Configuration

MLP_MULT=3 VAL_LOSS_EVERY=0 MATRIX_LR=0.02 SCALAR_LR=0.02 TIED_EMBED_LR=0.03 MUON_MOMENTUM=0.99 MUON_MOMENTUM_WARMUP_STEPS=1500 MUON_MOMENTUM_WARMUP_START=0.92 WARMDOWN_ITERS=3000 ENABLE_QAT=1 QAT_START_FRAC=0.7 USE_SLIDING_EVAL=1 EVAL_STRIDE=64
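To make the schedule-related settings concrete, here is a minimal sketch of how the three MUON_MOMENTUM values and QAT_START_FRAC could be interpreted. Both functions are hypothetical: the linear interpolation is an assumption about the warmup shape, and `progress` is left as an abstract fraction in [0, 1] because the PR does not say whether the 70% threshold is measured in steps or in wallclock time.

```python
def muon_momentum(step, start=0.92, end=0.99, warmup_steps=1500):
    """Momentum at a given step, assuming a linear ramp from start to end.

    Defaults mirror MUON_MOMENTUM_WARMUP_START, MUON_MOMENTUM and
    MUON_MOMENTUM_WARMUP_STEPS from the configuration line.
    """
    frac = min(step / warmup_steps, 1.0)
    return (1.0 - frac) * start + frac * end


def qat_active(progress, start_frac=0.7):
    """True once training progress (a fraction in [0, 1]) reaches QAT_START_FRAC."""
    return progress >= start_frac


for step in (0, 750, 1500, 5000):
    print(step, round(muon_momentum(step), 4))
```

Ramping momentum up gradually, rather than starting at 0.99, is a common way to keep early updates from being dominated by a long-memory average of noisy gradients.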

Command

torchrun --standalone --nproc_per_node=8 train_gpt.py

Requires the zstandard package (pip install zstandard) for zstd compression.
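The int6 per-row quantization that precedes the zstd step can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the submission's implementation: the helper names are hypothetical, the symmetric code range [-31, 31] is one of several possible int6 conventions, and bit-packing plus the actual zstd call are left out. Quantizing and immediately dequantizing, as `snap_row` does, is one plausible reading of the "QAT weight snapping" mentioned in the summary.

```python
def quantize_row_int6(row):
    """Symmetric per-row quantization: one float scale, integer codes in [-31, 31]."""
    max_abs = max((abs(w) for w in row), default=0.0)
    scale = max_abs / 31 if max_abs > 0 else 1.0  # avoid dividing by zero on all-zero rows
    codes = [max(-31, min(31, round(w / scale))) for w in row]
    return codes, scale


def dequantize_row(codes, scale):
    """Map integer codes back to floats on the row's quantization grid."""
    return [c * scale for c in codes]


def snap_row(row):
    """Quantize then dequantize, so every weight lands on the int6 grid."""
    codes, scale = quantize_row_int6(row)
    return dequantize_row(codes, scale)


row = [3.1, -1.0, 0.0, 2.0]
codes, scale = quantize_row_int6(row)
print(codes, scale)
print(snap_row(row))
```

Storing one scale per row lets rows with small weights keep fine resolution instead of sharing a single grid sized for the largest weight in the whole matrix.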
