Record: PR #1105 + window attn + mixed seq_len — 1.1084 bpb (3-seed mean) 1.1084 bpb by Gusanidas · Pull Request #1219 · openai/parameter-golf

Gusanidas · 2026-04-01T13:21:37Z

Based on PR #1105 (abaybektursun) with these changes:

Causal n-gram fix (within_hint/word_hint prefix-only)
Window attention (size=512) on layers 2,4,6,8,10 via FA3
Mixed seq_len training: 5 GPUs at 2048x36 + 3 GPUs at 6144x10
Train-data GPTQ calibration (14s vs 220s AR self-gen)
Auto eval_seq_len detection from max train seq_len
Sliding window eval at seq_len=6144, stride=128
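
Two of the items above, window attention and strided sliding-window evaluation, can be illustrated with a minimal pure-Python sketch. This is not the PR's code: the function names `window_causal_mask` and `sliding_eval_windows` are hypothetical, the mask convention (each query sees at most `window` keys, itself included) is an assumption that may differ by one from the FA3 kernel's, and the eval sketch ignores the one-token shift between inputs and targets.

```python
def window_causal_mask(seq_len, window):
    """Boolean mask: mask[i][j] is True if query i may attend to key j.

    Assumed convention: causal (j <= i) and at most `window` keys per
    query, counting the query position itself.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def sliding_eval_windows(n_tokens, seq_len, stride):
    """Return (start, end, score_from) triples covering n_tokens.

    The model sees tokens [start, end) but only tokens in
    [score_from, end) contribute to the loss, so every token is scored
    exactly once and later windows score only their last `stride` tokens.
    """
    assert 0 < stride <= seq_len, "stride must not exceed the window length"
    windows = []
    scored_upto = 0
    start = 0
    while scored_upto < n_tokens:
        end = min(start + seq_len, n_tokens)
        windows.append((start, end, scored_upto))
        scored_upto = end
        start += stride
    return windows


# Small illustration: 10 tokens, windows of 4, stride 2.
print(sliding_eval_windows(10, 4, 2))
# [(0, 4, 0), (2, 6, 4), (4, 8, 6), (6, 10, 8)]

# Row 3 of a 5-token mask with window 2: only keys 2 and 3 are visible.
print(window_causal_mask(5, 2)[3])
# [False, False, True, True, False]
```

With `seq_len=6144` and `stride=128`, every window after the first scores 128 new tokens, each with at least 6016 tokens of preceding context. A smaller stride gives more context per scored token at the cost of more forward passes.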

3-seed results (sliding window bpb):
seed 1337: 1.1077
seed 42: 1.1083
seed 7: 1.1091
mean: 1.1084 (vs leader 1.1147)
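
The reported mean and the gap to the leader can be checked from the three seed values above. The variable names are illustrative; the numbers are the ones listed in this PR.

```python
from statistics import mean

# Sliding-window bpb per seed, as reported above.
seed_bpb = {1337: 1.1077, 42: 1.1083, 7: 1.1091}
leader_bpb = 1.1147

mean_bpb = mean(list(seed_bpb.values()))
spread = max(seed_bpb.values()) - min(seed_bpb.values())
improvement = leader_bpb - mean_bpb

print(f"mean bpb:    {mean_bpb:.4f}")     # 1.1084
print(f"seed spread: {spread:.4f}")       # 0.0014
print(f"vs leader:   {improvement:.4f}")  # 0.0063
```

The improvement over the leader (about 0.0063 bpb) is several times larger than the spread across seeds (0.0014 bpb).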

There is still plenty of room for further optimization.

Based on PR openai#1105 (abaybektursun) with improvements:

- Window attention (size=512) on layers 2,4,6,8,10 via FA3
- Mixed seq_len training: 5 GPUs at 2048x36 + 3 GPUs at 6144x10
- Train-data GPTQ calibration (14s vs 220s AR self-gen)
- Auto eval_seq_len detection from max train seq_len
- Causal n-gram fix (within_hint/word_hint prefix-only)
- Sliding window eval at seq_len=6144, stride=128

3-seed results (sliding window bpb):
seed 1337: 1.1077
seed 42: 1.1083
seed 7: 1.1091
mean: 1.1084 (vs leader 1.1147)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Gusanidas wants to merge 1 commit into openai:main from Gusanidas:apr_1
