
11L 512d Int8+Zlib Baseline (val_bpb 1.2135, 3-seed) #858

Open
nickferrantelive wants to merge 99 commits into openai:main from nickferrantelive:submission/2026-03-26_20M_Int8Zlib_Baseline


@nickferrantelive

Record: 11L 512d Int8+Zlib Baseline

val_bpb: 1.2135 (3-seed mean) | 15.54 MB (mean) | 8xH100 SXM, 599s

Summary

Baseline train_gpt.py with NUM_LAYERS=11 (up from the default 9). All other hyperparameters are stock defaults. This submission demonstrates the baseline architecture properly scaled with additional depth on 8xH100 SXM hardware.

Changes from Naive Baseline

| Change | Baseline | This PR | Impact |
| --- | --- | --- | --- |
| Layers | 9 | 11 | +2 layers (20.7M vs ~17M params) |
| Everything else | Default | Default | No other changes |
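The ~3.7M-parameter delta from the two extra layers can be sanity-checked from the dimensions listed in the Architecture section. This is a rough sketch that assumes no biases and ignores norm parameters; it is not code from train_gpt.py:

```python
# Hedged estimate of the per-layer parameter count, assuming the
# dimensions stated in this PR (512d, 8 heads / 4 KV heads, 2x MLP)
# and ignoring biases and normalization parameters.
D_MODEL = 512
N_HEADS, N_KV_HEADS = 8, 4
HEAD_DIM = D_MODEL // N_HEADS        # 64
KV_DIM = N_KV_HEADS * HEAD_DIM       # 256 (GQA: K/V narrower than Q)
MLP_HIDDEN = 2 * D_MODEL             # 1024 (2x expansion)

attn = (D_MODEL * D_MODEL            # Q projection
        + 2 * D_MODEL * KV_DIM       # K and V projections
        + D_MODEL * D_MODEL)         # output projection
mlp = 2 * D_MODEL * MLP_HIDDEN       # up + down projections
per_layer = attn + mlp

print(f"per layer: {per_layer:,}; +2 layers: {2 * per_layer:,}")
# ~1.84M params/layer, so +2 layers is ~3.67M -- consistent with
# the 20.7M vs ~17M figure in the table above.
```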

Results (3 seeds, 8xH100 SXM)

| Seed | Steps | val_loss | val_bpb | Artifact |
| --- | --- | --- | --- | --- |
| 1337 | 11,181 | 2.0484 | 1.2132 | 15.54 MB |
| 42 | 11,185 | 2.0490 | 1.2135 | 15.54 MB |
| 2025 | 11,182 | 2.0493 | 1.2137 | 15.54 MB |

Mean: 1.2135 | Std: 0.0003
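For readers comparing val_loss against val_bpb: the standard conversion divides the nats-per-token cross-entropy by ln(2) and rescales by the validation set's tokens-per-byte ratio. A minimal sketch, with placeholder counts since the actual token and byte totals of the val shard are not stated in this PR:

```python
import math

def bits_per_byte(val_loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Convert mean cross-entropy (nats/token) to bits per byte:
    loss / ln(2) gives bits/token; scale by tokens-per-byte."""
    return (val_loss_nats / math.log(2)) * (n_tokens / n_bytes)

# Hypothetical counts for illustration only -- substitute the real
# token/byte totals of the validation shard:
print(bits_per_byte(2.0484, n_tokens=41_000, n_bytes=100_000))
```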

Architecture

  • 11 transformer layers, 512-dim, 8 heads (4 KV heads, GQA)
  • 2x MLP expansion (1024 hidden)
  • U-Net skip connections (5 encoder, 6 decoder)
  • Tied embeddings, logit softcap=30.0
  • Vocab size 1024 (SentencePiece BPE)
  • Muon optimizer, int8+zlib quantization
  • Total artifact: 15,541,950 bytes (well under 16MB cap)
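The "int8+zlib" step above can be sketched as symmetric per-row quantization (one float32 scale per weight row) followed by zlib on the packed bytes. Function names here are illustrative, not taken from train_gpt.py:

```python
import zlib
import numpy as np

def quantize_int8_per_row(w: np.ndarray):
    """Symmetric per-row int8 quantization: one float32 scale per row."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0              # avoid div-by-zero on all-zero rows
    q = np.round(w / scale).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

def pack(q: np.ndarray, scale: np.ndarray) -> bytes:
    """Serialize int8 weights + per-row scales, then zlib-compress."""
    return zlib.compress(q.tobytes() + scale.tobytes(), level=9)

# Round-trip on a dummy weight matrix: error is bounded by half a
# quantization step per row, and the artifact is far smaller than raw fp32.
w = np.random.default_rng(0).normal(size=(1024, 512)).astype(np.float32)
q, s = quantize_int8_per_row(w)
blob = pack(q, s)
print(len(blob), "compressed vs", w.nbytes, "raw float32 bytes")
```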

Run Command

```shell
NUM_LAYERS=11 SEED=1337 \
DATA_PATH=./data/datasets/fineweb10B_sp1024/ \
TOKENIZER_PATH=./data/tokenizers/fineweb_1024_bpe.model \
VOCAB_SIZE=1024 \
torchrun --standalone --nproc_per_node=8 train_gpt.py
```

Notes

This is a non-SOTA submission demonstrating baseline scaling. We have several novel techniques in development (TTT, GPTQ-lite, SmearGate, BigramHash, PolarQuant, hybrid RNN-attention architectures) that we plan to submit as improved records.

Checklist

  • README.md with detailed explanation
  • submission.json with metadata
  • Training logs for 3 seeds
  • train_gpt.py (stock baseline, compiles and runs)
  • All seeds under 16MB and under 10 minutes on 8xH100 SXM

Nick Ferrante added 30 commits March 24, 2026 22:58
…1; add step12 GPTQ-lite mixed int6/int8 + zstd quantization
Nick Ferrante added 29 commits March 26, 2026 00:36
…ted to 8xH100. M3 projects best (1.24), M1 Codec is natural A+B hybrid
…s): full component inventory, compatibility matrix, budget analysis, and 7 strategic questions for multi-LLM analysis
- 11 transformer layers (up from baseline 9), 512d, 8 heads, 4 KV heads
- U-Net skip connections, Muon optimizer, tied embeddings
- Int8 per-row quantization + zlib compression
- 3-seed verification: 1.2132, 1.2135, 1.2137 (std=0.0003)
- All seeds under 16MB (15.54MB), under 10min (599s) on 8xH100 SXM
nickferrantelive pushed a commit to nickferrantelive/parameter-golf that referenced this pull request Mar 26, 2026
