Record: val_bpb: 1.14020 [tested 3x on 8xh100] #267

Open
andrewgcodes wants to merge 17 commits into openai:main from andrewgcodes:devin/1774040790-causal-ttt-submission
andrewgcodes commented Mar 20, 2026

Flagging that this does test-time training (TTT) during validation, but compliantly. @0hq

I believe these three properties make it allowed:

  1. No training before evaluation: Each chunk is evaluated first, loss is recorded, then training occurs
  2. No re-evaluation: Tokens are scored exactly once; training on chunk N cannot affect scores for chunks 0..N
  3. No multiple passes: The validation set is processed in a single sequential pass (32 chunks)
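
The score-then-train ordering described in the three points above can be sketched roughly as follows. This is only a minimal illustration and not the code in this PR: `UnigramModel` is a hypothetical toy stand-in for the real network, and `causal_ttt_eval` and its parameters are names assumed for the example. The point it shows is that each chunk's loss is recorded before the model is updated on that chunk, and every token is scored exactly once in a single pass.

```python
import math
from collections import Counter


class UnigramModel:
    """Hypothetical toy stand-in for the real model: add-one-smoothed unigram counts."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self.counts = Counter()
        self.total = 0

    def nll_bits(self, tokens):
        """Total negative log-likelihood of `tokens` in bits; does not update the model."""
        denom = self.total + self.vocab_size
        return sum(-math.log2((self.counts[t] + 1) / denom) for t in tokens)

    def train(self, tokens):
        """Adapt the model on tokens that have already been scored."""
        self.counts.update(tokens)
        self.total += len(tokens)


def causal_ttt_eval(model, tokens, num_chunks):
    """Single sequential pass: score each chunk first, then train on it."""
    chunk_len = math.ceil(len(tokens) / num_chunks)
    total_bits = 0.0
    scored = 0
    for start in range(0, len(tokens), chunk_len):
        chunk = tokens[start:start + chunk_len]
        # 1. Evaluate with weights that have only seen earlier chunks.
        total_bits += model.nll_bits(chunk)
        scored += len(chunk)
        # 2. Only now train on this chunk; this can affect later chunks only.
        model.train(chunk)
    # Every token was scored exactly once (no re-evaluation, no second pass).
    assert scored == len(tokens)
    return total_bits / len(tokens)
```

With `num_chunks=1` no adaptation can help, because the only chunk is scored before any training happens; with more chunks, later chunks benefit from training on earlier ones while already-recorded scores stay fixed.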

andrewgcodes changed the title from "Record: val_bpb: 1.14020" to "Record: val_bpb: 1.14020 [tested 3x on 8xh100]" Mar 20, 2026
romainsantoli-web pushed a commit to romainsantoli-web/parameter-golf that referenced this pull request Mar 21, 2026
…its)

Combines techniques from PR openai#162, openai#180, openai#267, openai#281:
- 11-layer GPT with U-Net skip connections, GQA
- SmearGate + BigramHash(10240)
- Mixed int5/int6 quantization + 3% magnitude pruning
- Causal TTT at eval time
- SWA(frac=0.4), WD=0.042, Z-loss
- Target: sub-1.135 val_bpb

Awaiting RunPod 8xH100 credits for 3-seed validation.