Record: 11L Adaptive Markov + Int6 Mixed Quant (1.2174 bpb) by Jayteare · Pull Request #1046 · openai/parameter-golf

Jayteare · 2026-03-29T00:21:39Z

Summary

Score: 1.2174 val_bpb (int6+zstd roundtrip)
Pre-quant: 1.2078 val_bpb
Artifact size: 15,107,918 bytes (under 16MB)
Hardware: 8xH100, 600s wallclock, 7427 steps

Approach

11-layer GPT with adaptive Markov mixing: a unigram transition table (1024x1024) is combined with transformer logits through a learned per-position gate with confidence-based thresholding. The gate uses the top-2 Markov logit gap to suppress the Markov contribution when the transformer is confident.

Mixed int6/int8 quantization with zstd-22 compression:

MLP and attention weights: int6 per-row ([-32, 31] stored as int8)
Embeddings and Markov table: int8 per-row
Control tensors: fp16 passthrough

786K token batch for higher data throughput per step (5.84B tokens total).

Key Details

11 layers, dim=512, 8 heads, 4 KV heads (GQA), tied embeddings, relu² MLP
MARKOV_LR=0.008, MIX_INIT=0.05, GATE_THRESHOLD=0.20, GATE_TEMP=0.03
No QAT, no EMA (both found harmful at this step count)
zstandard package required for int6+zstd export

Files

README.md — detailed writeup
submission.json — metadata
train.log — full 8xH100 training log
train_gpt.py — training script

11-layer GPT with adaptive Markov mixing and mixed int6/int8 quantization with zstd-22 compression. 786K token batch on 8xH100. 15.1MB artifact, 7427 steps in 600s.

Jayteare force-pushed the pr/11L-adaptive-markov branch 3 times, most recently from d561d88 to b2265d5 Compare March 29, 2026 01:23

Record: 11L Adaptive Markov + Int6 Mixed Quant (1.2174 bpb)

0d5fa51

11-layer GPT with adaptive Markov mixing and mixed int6/int8 quantization with zstd-22 compression. 786K token batch on 8xH100. 15.1MB artifact, 7427 steps in 600s.

Jayteare force-pushed the pr/11L-adaptive-markov branch from b2265d5 to 0d5fa51 Compare March 29, 2026 01:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Record: 11L Adaptive Markov + Int6 Mixed Quant (1.2174 bpb)#1046

Record: 11L Adaptive Markov + Int6 Mixed Quant (1.2174 bpb)#1046
Jayteare wants to merge 1 commit intoopenai:mainfrom
Jayteare:pr/11L-adaptive-markov

Jayteare commented Mar 29, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Jayteare commented Mar 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Approach

Key Details

Files

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Jayteare commented Mar 29, 2026 •

edited

Loading