
Record: int5 GPTQ + 33.6M model (3-seed mean val_bpb=1.1179)#545

Closed
EthanYangTW wants to merge 4 commits into openai:main from EthanYangTW:submission/int5-gptq-33m

Conversation

@EthanYangTW

Summary

33.6M parameter model quantized to int5 with GPTQ error compensation, fitting under 16MB. First submission to achieve int5 quantization on a 33.6M model within the artifact size limit.

Architecture: 11L, 512d, MHA 8/8, MLP 3.5x (1792), BigramHash 8192, XSA all layers
Quantization: int5 per-row GPTQ (clip_range=15) + Early QAT (threshold 0.5) + EMA 0.997
TTT: Legal score-first AdamW, chunk=131072, last 2 blocks unfrozen
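A minimal sketch of the per-row int5 quantization step above (symmetric per-row scales, clip_range=15). Real GPTQ additionally compensates rounding error across remaining weight columns using second-order (Hessian) information, which is omitted here; function names are illustrative, not the submission's actual code.

```python
import torch

def quantize_int5_per_row(w: torch.Tensor, clip_range: int = 15):
    """Per-row symmetric int5 quantization: levels in [-15, 15].

    Simplified sketch; full GPTQ also propagates quantization error
    into not-yet-quantized columns, which is omitted here.
    """
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / clip_range
    q = torch.clamp(torch.round(w / scale), -clip_range, clip_range)
    return q.to(torch.int8), scale  # int5 codes held in int8 containers

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale
```

With symmetric per-row scales, the worst-case reconstruction error per element is half a quantization step (0.5 × that row's scale).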

Results

| Seed | Sliding BPB | TTT BPB | Artifact |
|------|-------------|---------|----------|
| 1337 | 1.1244      | 1.1170  | 15.53 MB |
| 42   | 1.1249      | 1.1182  | 15.36 MB |
| 7    | 1.1250      | 1.1184  | 15.28 MB |
| Mean | 1.1248      | 1.1179  |          |
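The reported means are consistent with the per-seed numbers; a quick arithmetic check:

```python
sliding = [1.1244, 1.1249, 1.1250]
ttt = [1.1170, 1.1182, 1.1184]

# 3-seed means, rounded to 4 decimal places as in the table
assert round(sum(sliding) / 3, 4) == 1.1248
assert round(sum(ttt) / 3, 4) == 1.1179
```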

Logs

See log files in logs/ directory:

  • Seed 1337: de843ef6-d0df-4872-bc96-cd4600614348.txt
  • Seed 42: b6560b60-85f0-4623-be20-ec366dd9e6fb.txt
  • Seed 7: c1c18644-a5a1-4db2-8641-31900ed8057f.txt

Reproduction

pip install --break-system-packages zstandard
pip install --break-system-packages flash-attn --no-build-isolation
python3 data/cached_challenge_fineweb.py --variant sp1024 --train-shards 80

SEED=1337 PRUNE_PCT=0.02 TTT_EPOCHS=3 TTT_LR=0.0001 \
TTT_OPTIMIZER=adamw TTT_FREEZE_BLOCKS=2 TTT_CHUNK_TOKENS=131072 \
EVAL_STRIDE=32 \
torchrun --standalone --nproc_per_node=8 train_gpt.py
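The TTT settings above (TTT_OPTIMIZER=adamw, TTT_FREEZE_BLOCKS=2, TTT_LR=0.0001) can be sketched as follows. This is an illustration, not the submission's code: `model.blocks`, the function name, and the (0.9, 0.95) betas are assumptions.

```python
import torch

def build_ttt_optimizer(model, lr=1e-4, unfreeze_last=2):
    """Freeze everything except the last `unfreeze_last` transformer blocks,
    then optimize the remaining trainable parameters with AdamW.
    `model.blocks` and the betas are illustrative assumptions."""
    for p in model.parameters():
        p.requires_grad_(False)
    for block in model.blocks[-unfreeze_last:]:
        for p in block.parameters():
            p.requires_grad_(True)
    params = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(params, lr=lr, weight_decay=0.0, betas=(0.9, 0.95))
```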

33.6M params (MHA 8/8, BigramHash 8192, MLP 3.5x) quantized to int5
with GPTQ error compensation. Artifact fits under 16MB (15.3-15.5MB).
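Fitting int5 codes under the 16MB limit implies compact storage; one plausible scheme (not necessarily this submission's on-disk format) packs eight 5-bit codes into five bytes:

```python
import numpy as np

def pack_int5(codes: np.ndarray) -> bytes:
    """Pack unsigned 5-bit codes (0..31) tightly: 8 codes per 5 bytes.
    Signed int5 values would first be shifted, e.g. code = q + 15."""
    bits = np.unpackbits(codes.astype(np.uint8)[:, None], axis=1)[:, 3:]  # low 5 bits
    return np.packbits(bits.ravel()).tobytes()

def unpack_int5(buf: bytes, n: int) -> np.ndarray:
    bits = np.unpackbits(np.frombuffer(buf, dtype=np.uint8))[: n * 5].reshape(n, 5)
    full = np.concatenate([np.zeros((n, 3), dtype=np.uint8), bits], axis=1)
    return np.packbits(full, axis=1)[:, 0]
```

A zstandard pass over the packed stream (the reproduction installs `zstandard`) would shrink the artifact further.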

Seeds: 1337 (1.1170), 42 (1.1182), 7 (1.1184)
Contributor

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@cocohearts
Collaborator

cocohearts commented Mar 23, 2026

Don't open a PR until your submission is ready; please mark it as a draft until then.
The train logs are great, but don't change train_gpt.py; please put everything in a self-contained folder, following the other submissions.
Will merge once this is done.

Add proper /records submission with submission.json, README, train_gpt.py, and 3-seed logs.
@EthanYangTW
Author

Closing in favor of properly formatted /records submission.

@EthanYangTW
Author

@cocohearts

Don't open a PR until your submission is ready; please mark it as a draft until then. The train logs are great, but don't change train_gpt.py; please put everything in a self-contained folder, following the other submissions. Will merge once this is done.

Sorry for the inconvenience; I have updated the new submission and closed this one.

RoyiRa added a commit to RoyiRa/parameter-golf that referenced this pull request Mar 25, 2026
- Replace relu().square() with leaky_relu(0.5).square() in MLP
  Expected: -0.0015 BPB (5 independent teams confirm)
- Switch TTT optimizer from SGD to AdamW(lr=0.0005, wd=0, betas=0.9/0.95)
  Expected: stronger TTT adaptation per openai#442/openai#503/openai#545
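The activation swap described in that commit can be sketched as below; note that squaring a leaky_relu with slope 0.5 maps negative pre-activations to 0.25·x², so gradient signal survives for negative inputs but the sign is lost:

```python
import torch
import torch.nn.functional as F

def relu_squared(x):
    # baseline MLP activation: zero output and zero gradient for x < 0
    return F.relu(x).square()

def leaky_relu_squared(x):
    # leaky_relu with negative_slope=0.5, then square: negative inputs
    # map to 0.25 * x**2, keeping gradient signal on the negative branch
    return F.leaky_relu(x, negative_slope=0.5).square()
```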
