11L 512d Int8+Zlib Baseline (val_bpb 1.2135, 3-seed)#858
Open
nickferrantelive wants to merge 99 commits into openai:main from
- 11 transformer layers (up from baseline 9), 512d, 8 heads, 4 KV heads
- U-Net skip connections, Muon optimizer, tied embeddings
- Int8 per-row quantization + zlib compression
- 3-seed verification: 1.2132, 1.2135, 1.2137 (std=0.0003)
- All seeds under 16MB (15.54MB), under 10min (599s) on 8xH100 SXM
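The int8-per-row-plus-zlib step above can be sketched as follows. This is a minimal illustration under assumed conventions (symmetric quantization with the per-row max mapped to 127), not the PR's actual implementation; the function names are hypothetical.

```python
import zlib
import numpy as np

def quantize_int8_per_row(w: np.ndarray):
    """Quantize a float32 matrix to int8 with one scale per row."""
    # Per-row scale maps the largest-magnitude entry of each row to 127.
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid division by zero on all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def pack(q: np.ndarray) -> bytes:
    """Zlib-compress the int8 payload for the on-disk checkpoint."""
    return zlib.compress(q.tobytes(), level=9)

rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512)).astype(np.float32)
q, scale = quantize_int8_per_row(w)
blob = pack(q)

# Dequantize and check the round-trip error stays within half a quantization step.
w_hat = q.astype(np.float32) * scale
max_err = np.abs(w - w_hat).max()
```

The round-trip error is bounded by half the largest per-row scale, and the zlib layer is lossless, so only the quantization step costs accuracy.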
nickferrantelive pushed a commit to nickferrantelive/parameter-golf that referenced this pull request (Mar 26, 2026).
Record: 11L 512d Int8+Zlib Baseline
val_bpb: 1.2135 (3-seed mean) | 15.54 MB (mean) | 8xH100 SXM, 599s
Summary
Baseline
`train_gpt.py` with `NUM_LAYERS=11` (up from the default 9). All other hyperparameters are stock defaults. This submission demonstrates the baseline architecture properly scaled with additional depth on 8xH100 SXM hardware.

Changes from Naive Baseline
Results (3 seeds, 8xH100 SXM)
Mean: 1.2135 | Std: 0.0003
Architecture
Run Command
Notes
This is a non-SOTA submission demonstrating baseline scaling. We have several novel techniques in development (TTT, GPTQ-lite, SmearGate, BigramHash, PolarQuant, hybrid RNN-attention architectures) that we plan to submit as improved records.
Checklist