
Add depth recurrence + SwiGLU submission (Apple M3 8GB) #8

Closed
iranzithierry wants to merge 2 commits into openai:main from iranzithierry:main

Conversation

@iranzithierry

Summary

  • Non-record submission exploring depth recurrence (weight sharing) and SwiGLU MLPs
  • 4 unique transformer blocks reused 3× = 12 effective layers at 640 dim
  • SwiGLU MLP replaces the relu² MLP for better parameter efficiency
  • Per-recurrence learnable gate scalars let the shared blocks specialize (see the sketch after this list)
  • Trained on an Apple M3 with 8GB RAM (hardware-limited, so results are directional)
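
For context, here is a minimal PyTorch sketch of the two ideas above. The submission's actual script (train_gpt_mlx.py) is written in MLX, and class names here (`SwiGLUMLP`, `RecurrentStack`) are illustrative, not taken from it:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUMLP(nn.Module):
    """SwiGLU feed-forward: down(silu(gate(x)) * up(x))."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

class RecurrentStack(nn.Module):
    """Depth recurrence: a few unique blocks applied repeatedly
    (4 blocks x 3 passes = 12 effective layers), with one learnable
    gate scalar per recurrence so the reused weights can specialize."""
    def __init__(self, blocks: nn.ModuleList, n_recur: int = 3):
        super().__init__()
        self.blocks = blocks                       # weights shared across passes
        self.n_recur = n_recur
        self.gates = nn.Parameter(torch.ones(n_recur))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for r in range(self.n_recur):
            g = self.gates[r]
            for block in self.blocks:
                # block(x) is the residual-branch output; the gate
                # scales each recurrence's contribution to the stream
                x = x + g * block(x)
        return x
```

The appeal for a parameter-budgeted track: only the 4 unique blocks count toward the parameter budget, while the model computes at 12-layer depth.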

Changes

  • records/track_non_record_16mb/2026-03-18_M3_DepthRecurrence_SwiGLU/train_gpt_mlx.py
  • records/track_non_record_16mb/2026-03-18_M3_DepthRecurrence_SwiGLU/submission.json
  • records/track_non_record_16mb/2026-03-18_M3_DepthRecurrence_SwiGLU/README.md

Limitations

Trained on consumer hardware (M3, 8GB RAM); the score reflects hardware constraints, not the approach's ceiling. The same script can be run on 8×H100 for competitive results.

Add a non-record leaderboard submission exploring depth recurrence and SwiGLU MLPs, trained on an Apple M3 (8GB). Includes a README with architecture/hyperparameter notes, submission.json metadata, and a full training script (train_gpt_mlx.py) implementing:

  • 4 unique transformer blocks reused 3× (12 effective layers)
  • SwiGLU MLP in a wider 640-dim model
  • per-recurrence gates and U-Net skips
  • gradient clipping and split optimizers (Muon + Adam)
  • token streaming
  • int8+zlib quantization/roundtrip (sketched below)

Notes the limitations imposed by the hardware and the guidance to run on larger hardware for competitive results.
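
The int8+zlib roundtrip in that list is the part that ties training to the size budget: quantize the weights to int8, zlib-compress the bytes to measure the submission size, then dequantize to check the quantized model's quality. A minimal NumPy sketch, assuming per-tensor symmetric scaling (the script's exact scheme may differ):

```python
import zlib
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Per-tensor symmetric int8 quantization (assumed scheme)."""
    scale = max(float(np.abs(w).max()), 1e-8) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def compressed_size(tensors: dict) -> int:
    """Total zlib-compressed size of all int8 tensors, in bytes."""
    return sum(
        len(zlib.compress(quantize_int8(w)[0].tobytes(), level=9))
        for w in tensors.values()
    )

def roundtrip(w: np.ndarray) -> np.ndarray:
    """Dequantize back to float32 to evaluate the quantized model."""
    q, scale = quantize_int8(w)
    return q.astype(np.float32) * scale
```

The compressed byte count, not the raw float size, is presumably what the 16MB track budget is measured against, which is why the roundtrip belongs in the training script.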
@0hq
Collaborator

0hq commented Mar 18, 2026

You need a train.log and a val_bpb for a non-record submission!

0hq closed this Mar 19, 2026
gb250e referenced this pull request in gb250e/parameter-golf Mar 21, 2026
dhruvjatkar pushed a commit to dhruvjatkar/parameter-golf that referenced this pull request Mar 25, 2026
PR openai#672 maxes TTT at 30 epochs (590s/600s eval budget), so all future
improvements must be orthogonal to TTT. This update:
- Sets 1.0781 BPB (PR openai#672) as the new target to beat
- Reorders Top 8 directions: XSA-all confirmed at #1, Full GPTQ #2,
  SwiGLU #3, Muon-VS #4, aggressive quant #5, MASA #6,
  depth recurrence #7 with int6 risk warning, AdEMAMix #8
- Deprioritizes TTT-related directions already exploited by PR openai#672
- Collapses ~1000 lines of stale Round 0-3.9 session logs into a
  concise historical summary
- Removes resolved blockers (flash_attn, SSH hangs, local runtime)
- Adds fresh Round 1 section with 5 submitted experiments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>