
Record: 11L Adaptive Markov + Int6 Mixed Quant (1.2174 bpb) #1046

Open

Jayteare wants to merge 1 commit into openai:main from Jayteare:pr/11L-adaptive-markov

Conversation


@Jayteare Jayteare commented Mar 29, 2026

Summary

  • Score: 1.2174 val_bpb (int6+zstd roundtrip)
  • Pre-quant: 1.2078 val_bpb
  • Artifact size: 15,107,918 bytes (under 16MB)
  • Hardware: 8xH100, 600s wallclock, 7427 steps

Approach

11-layer GPT with adaptive Markov mixing: a unigram transition table (1024x1024) is combined with transformer logits through a learned per-position gate with confidence-based thresholding. The gate uses the top-2 Markov logit gap to suppress the Markov contribution when the transformer is confident.
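The gating logic can be sketched as follows. This is a hypothetical reconstruction, not the code from train_gpt.py: the function names are illustrative, and the exact sign and direction of the gate are assumptions; only the hyperparameter values (MIX_INIT, GATE_THRESHOLD, GATE_TEMP) come from the PR.

```python
import math

# Hyperparameters from the PR; everything else below is an illustrative guess.
MIX_INIT, GATE_THRESHOLD, GATE_TEMP = 0.05, 0.20, 0.03

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_mix(transformer_logits, markov_logits):
    """Mix Markov-table logits into transformer logits for one position,
    with the Markov weight soft-gated by the top-2 Markov logit gap."""
    top2 = sorted(markov_logits, reverse=True)[:2]
    gap = top2[0] - top2[1]
    # When the gap is small (low confidence signal), the Markov prior
    # contributes roughly MIX_INIT; when the gap exceeds the threshold,
    # the sigmoid pushes its weight toward zero.
    gate = MIX_INIT * sigmoid((GATE_THRESHOLD - gap) / GATE_TEMP)
    return [(1.0 - gate) * t + gate * m
            for t, m in zip(transformer_logits, markov_logits)]
```

A per-position learned gate would replace the fixed MIX_INIT scalar with a predicted value, but the thresholded-sigmoid shape is the part the description pins down.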

Mixed int6/int8 quantization with zstd-22 compression:

  • MLP and attention weights: int6 per-row ([-32, 31] stored as int8)
  • Embeddings and Markov table: int8 per-row
  • Control tensors: fp16 passthrough
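A minimal sketch of the int6 per-row scheme, assuming symmetric per-row scaling (the scale maps each row's max magnitude to 31, and values are clamped to the stated [-32, 31] range before being stored in int8 containers). Function names are illustrative, not from the training script.

```python
def quantize_int6_per_row(row):
    """Symmetric per-row int6 quantization: values land in [-32, 31],
    stored in int8 containers as described above."""
    amax = max(abs(v) for v in row)
    scale = amax / 31.0 if amax > 0 else 1.0  # guard all-zero rows
    q = [max(-32, min(31, round(v / scale))) for v in row]
    return q, scale

def dequantize(q, scale):
    """Recover approximate fp values; error is bounded by ~scale."""
    return [v * scale for v in q]
```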

786K token batch for higher data throughput per step (5.84B tokens total).
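Assuming "786K" is the usual power-of-two batch of 768 × 1024 = 786,432 tokens, the total-token figure checks out against the step count:

```python
tokens_per_step = 786_432   # assumed: 786K = 768 * 1024 tokens per step
steps = 7_427               # from the summary above
total = tokens_per_step * steps
assert abs(total / 1e9 - 5.84) < 0.01  # matches the quoted 5.84B tokens
```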

Key Details

  • 11 layers, dim=512, 8 heads, 4 KV heads (GQA), tied embeddings, relu² MLP
  • MARKOV_LR=0.008, MIX_INIT=0.05, GATE_THRESHOLD=0.20, GATE_TEMP=0.03
  • No QAT, no EMA (both found harmful at this step count)
  • zstandard package required for int6+zstd export
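The export path presumably serializes the quantized int8 containers into one byte stream and compresses it. The sketch below is an assumed layout (length-prefixed name + payload per tensor, all names and framing hypothetical) and uses stdlib zlib as a stand-in so it runs anywhere; the PR's actual pipeline compresses with `zstandard.ZstdCompressor(level=22)` instead.

```python
import struct
import zlib

def export_artifact(tensors):
    """Pack named quantized tensors (lists of int6/int8 values stored in
    int8 containers) into a single blob and compress it. zlib stands in
    for zstd-22 here; swap in zstandard.ZstdCompressor(level=22) for the
    pipeline described above."""
    parts = []
    for name, q in sorted(tensors.items()):
        payload = bytes(v & 0xFF for v in q)  # two's-complement int8 bytes
        encoded = name.encode("utf-8")
        # Assumed framing: 2-byte name length, 4-byte payload length.
        parts.append(struct.pack("<HI", len(encoded), len(payload)))
        parts.append(encoded + payload)
    blob = b"".join(parts)
    return zlib.compress(blob, 9)
```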

Files

  • README.md — detailed writeup
  • submission.json — metadata
  • train.log — full 8xH100 training log
  • train_gpt.py — training script

@Jayteare force-pushed the pr/11L-adaptive-markov branch 3 times, most recently from d561d88 to b2265d5 on March 29, 2026 at 01:23
11-layer GPT with adaptive Markov mixing and mixed int6/int8
quantization with zstd-22 compression. 786K token batch on 8xH100.
15.1MB artifact, 7427 steps in 600s.
@Jayteare force-pushed the pr/11L-adaptive-markov branch from b2265d5 to 0d5fa51 on March 29, 2026 at 01:26