Skip to content

Record: Batch-Optimized 524K + Warmdown 4000 (val_bpb 1.1497)#364

Open
shikhar1729 wants to merge 2 commits intoopenai:mainfrom
shikhar1729:submission/batch-opt-524k-wd4000
Open

Record: Batch-Optimized 524K + Warmdown 4000 (val_bpb 1.1497)#364
shikhar1729 wants to merge 2 commits intoopenai:mainfrom
shikhar1729:submission/batch-opt-524k-wd4000

Conversation

@shikhar1729
Copy link
Copy Markdown

Summary

  • Non-record submission building on Update README.md little things #1 entry (thwu1)
  • Two hyperparameter changes, no code changes: TRAIN_BATCH_TOKENS=524288 and WARMDOWN_ITERS=4000
  • Smaller batch yields more optimizer steps per wall-clock minute; longer warmdown retuned to match

Results

Seed val_bpb Steps Artifact
1337 1.14971 7,361 15.93MB
42 1.14924 7,248 15.77MB
7 1.15016 7,269 15.79MB
Mean 1.14970

Test plan

  • 3 seeds with p < 0.01
  • All artifacts under 16MB
  • Runs in under 10 min on 8xH100 SXM
  • train_gpt.py compiles and runs from records folder

Non-record submission building on openai#1 entry (thwu1). Two hyperparameter
changes: TRAIN_BATCH_TOKENS=524288 and WARMDOWN_ITERS=4000.

Mean val_bpb: 1.1497 (3 seeds: 1.1497, 1.1492, 1.1502, std=0.0005)
All artifacts under 16MB. 8xH100 SXM, PyTorch 2.9.1.
Full-weight SGD test-time training on validation data (15 epochs,
lr=0.005) + batch=524K + warmdown=4000. Mean val_bpb 1.1433 across
3 seeds (42: 1.1428, 7: 1.1444, 2024: 1.1427). Ties current SOTA.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant