Skip to content

XSA-11 + GPTQ b64/pd002 — 3-seed mean val_bpb 1.1208#587

Open
newjordan wants to merge 1 commit intoopenai:mainfrom
newjordan:submission/xsa11-clean
Open

XSA-11 + GPTQ b64/pd002 — 3-seed mean val_bpb 1.1208#587
newjordan wants to merge 1 commit intoopenai:mainfrom
newjordan:submission/xsa11-clean

Conversation

@newjordan
Copy link
Copy Markdown

Results

Seed Pre-TTT BPB TTT BPB Artifact
1337 1.1203 1.1204 15.56 MB
42 1.1211 1.1213 15.56 MB
7 1.1205 1.1206 15.64 MB
Mean 1.1206 1.1208

Architecture

11L/512d/8H/4KV, relu², XSA on all 11 layers, BigramHash 2048, VE128, EMA 0.997, tied embeddings.

Key changes vs prior submissions

  • XSA on all 11 layers (was 4): -0.0006 BPB from expanded cross-self attention
  • GPTQ block_size=64, percdamp=0.002: better compression from smaller quantization blocks and less Hessian damping, freeing space for XSA-11

Quantization

Int6 GPTQ with block_size=64, percdamp=0.002, 256-sample Hessian calibration, zstd-22 compression.

Reproduction

pip install sentencepiece numpy zstandard
SEED=1337 torchrun --standalone --nproc_per_node=8 train_gpt.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request Mar 26, 2026
New best: leaky_relu_sq + XSA last 4 + TTT freeze_blocks=0
Beats PR openai#587 (1.1204) by 0.0009 BPB on seed 1337.
2/3 seeds confirmed, seed 7 pending.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request Mar 26, 2026
3-seed sweep complete:
  1337: 1.1195  |  42: 1.1200  |  2045: 1.1190
  Mean: 1.1195 (beats PR openai#587 mean of 1.1215 by 0.002)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant