The Frugendorff: Recursive Weight Sharing + MLP 4x (1.1478 BPB, 15.19MB) by newjordan · Pull Request #498 · openai/parameter-golf

newjordan · 2026-03-23T04:11:34Z

Summary

Non-record submission exploring recursive weight sharing, a novel approach in which 6 unique transformer blocks are each looped twice, giving 12 effective layers of depth while storing parameters for only 6 blocks. The freed parameter budget pays for a 4x MLP expansion, which is the primary quality driver.
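A minimal sketch of the looping idea, using hypothetical toy "blocks" (plain functions on a list of floats) in place of real transformer blocks. It assumes the whole 6-block stack is run twice in order. The PR does not say whether blocks are instead repeated in place, and it omits the "orthogonal loop positions" listed under Architecture. The names `recursive_forward` and `make_toy_block` are illustrative only.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for a transformer block: maps a hidden state
# (here a plain list of floats) to a new hidden state.
Block = Callable[[List[float]], List[float]]


def recursive_forward(x: List[float], blocks: List[Block], n_loops: int,
                      trace: Optional[List[int]] = None) -> List[float]:
    """Run the stack of unique blocks n_loops times, reusing the same weights.

    Assumption: the whole stack repeats (0..5, then 0..5 again). The PR does
    not say whether blocks are instead repeated in place (0, 0, 1, 1, ...).
    """
    for _ in range(n_loops):
        for idx, block in enumerate(blocks):
            if trace is not None:
                trace.append(idx)  # which stored block ran at this depth
            x = block(x)
    return x


def make_toy_block(shift: float) -> Block:
    # Toy update so the example runs; a real block is attention + MLP.
    return lambda h: [v + shift for v in h]


blocks = [make_toy_block(0.1 * (i + 1)) for i in range(6)]  # 6 unique blocks
trace: List[int] = []
out = recursive_forward([0.0, 1.0], blocks, n_loops=2, trace=trace)
print(len(trace))  # 12 effective layers
print(trace)       # [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]
```

Only `blocks` holds parameters, so stored size scales with 6 blocks while compute and depth scale with 12 layers.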

val_bpb: 1.1478 (sliding window stride=64) | 15.19 MB | 8xH100 SXM, 600s
28.2M params, 4,396 steps at 136.5ms/step
Full pipeline: Muon + SWA + Late QAT + Training Replay + Self-Distillation + EMA
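The headline val_bpb uses sliding-window evaluation with stride=64. A minimal sketch of how such windows are commonly planned: each window sees up to `window` tokens, but only the tokens not already scored count toward the loss. The window length is not given in the PR, so the values below are toy assumptions, and `sliding_windows` is a hypothetical helper.

```python
from typing import List, Tuple


def sliding_windows(n_tokens: int, window: int, stride: int) -> List[Tuple[int, int, int]]:
    """Plan (start, end, score_from) spans for strided sliding-window eval.

    The model sees tokens [start, end) but only tokens [score_from, end)
    contribute to the loss, so every token is scored exactly once and,
    after the first window, with window - stride tokens of left context.
    """
    if not 0 < stride <= window:
        raise ValueError("need 0 < stride <= window")
    spans: List[Tuple[int, int, int]] = []
    start, scored_upto = 0, 0
    while scored_upto < n_tokens:
        end = min(start + window, n_tokens)
        spans.append((start, end, scored_upto))
        scored_upto = end
        start += stride
    return spans


print(sliding_windows(12, window=8, stride=2))
# [(0, 8, 0), (2, 10, 8), (4, 12, 10)]
```

A smaller stride gives each scored token more left context at the cost of more forward passes, which is why the sliding-window score is lower than a plain non-overlapping evaluation.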

Key Insight

MLP 4x gives a ~2% relative BPB improvement over MLP 3x, but with 12 unique layers it does not fit in the 16 MB limit. Recursive weight sharing (6 unique blocks x 2 loops) brings the artifact to 15.19 MB. Weight sharing is the compression technique; MLP 4x is the quality lever.
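A back-of-the-envelope weight count for one block, using the dimensions from the Architecture section. It assumes bias-free linear layers and ignores norms, gates and per-block extras, so it slightly undercounts; `block_params` is a hypothetical helper. Six shared MLP-4x blocks come to about 27.0M weights, close to the reported 28,224,320 params; the remaining ~1.19M presumably covers embeddings, BigramHash, VE128 and similar components (an assumption). Twelve unique MLP-4x blocks would double the block weights. Since the PR does not state the export format, this sketch stops at counts rather than bytes.

```python
def block_params(dim: int, n_heads: int, n_kv_heads: int, head_dim: int,
                 mlp_mult: int) -> int:
    """Weights in one attention + MLP block.

    Assumes bias-free linear layers and ignores norms, gates and any
    per-block extras, so this undercounts slightly.
    """
    q = dim * n_heads * head_dim
    k = v = dim * n_kv_heads * head_dim   # GQA: K/V use fewer heads
    o = n_heads * head_dim * dim
    mlp = 2 * dim * (mlp_mult * dim)      # up- and down-projection
    return q + k + v + o + mlp


cfg = dict(dim=640, n_heads=10, n_kv_heads=5, head_dim=64)
shared_4x = 6 * block_params(**cfg, mlp_mult=4)    # 27,033,600 stored weights
unique_4x = 12 * block_params(**cfg, mlp_mult=4)   # 54,067,200
unique_3x = 12 * block_params(**cfg, mlp_mult=3)   # 44,236,800
print(f"{shared_4x:,} {unique_4x:,} {unique_3x:,}")
```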

Architecture

6 unique blocks x 2 loops = 12 effective depth
dim=640, 10 heads, 5 KV heads (GQA), head_dim=64
MLP 4x (hidden=2560), relu-squared
Orthogonal loop positions, U-Net skips, SmearGate, BigramHash, VE128, XSA last 2
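A small sketch of two details from the list above: how 10 query heads share 5 KV heads under GQA, and the relu-squared activation. `FrugConfig` is a hypothetical config holder; the consecutive head grouping (query heads 0 and 1 share KV head 0, and so on) is the common convention but is an assumption here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrugConfig:
    """Hypothetical config holder; the values come from the list above."""
    dim: int = 640
    n_heads: int = 10
    n_kv_heads: int = 5
    head_dim: int = 64
    mlp_mult: int = 4

    @property
    def mlp_hidden(self) -> int:
        return self.mlp_mult * self.dim  # 4 * 640 = 2560

    def kv_head_for(self, q_head: int) -> int:
        # GQA with consecutive grouping: each K/V head serves
        # n_heads // n_kv_heads query heads (2 here).
        return q_head // (self.n_heads // self.n_kv_heads)


def relu_squared(x: float) -> float:
    # relu(x) ** 2: zero for negative inputs, quadratic for positive ones.
    return max(x, 0.0) ** 2


arch = FrugConfig()
assert arch.n_heads * arch.head_dim == arch.dim  # 10 * 64 = 640
print(arch.mlp_hidden)                                     # 2560
print([arch.kv_head_for(h) for h in range(arch.n_heads)])  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
print(relu_squared(-1.5), relu_squared(3.0))               # 0.0 9.0
```

Halving the KV heads halves the K and V projection weights, which is part of what keeps each shared block small.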

Results

Metric	Value
Sliding window val_bpb (stride=64)	1.1478
Pre-quant val_bpb (post-EMA)	1.1572
Post-quant roundtrip val_bpb	1.1716
Artifact	15,192,793 bytes (15.19 MB)
Steps	4,396 in 600s
Params	28,224,320
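The "post-quant roundtrip" row presumably means the weights were quantized for export and dequantized again before evaluation, which costs 1.1716 - 1.1572 = 0.0144 BPB here. A minimal sketch of a symmetric per-row roundtrip follows; the 6-bit width and the scheme itself are assumptions, since the PR does not state its export format, and `quant_roundtrip` is a hypothetical helper. The last line simply divides the artifact size by the parameter count (about 4.31 bits per parameter); the artifact presumably also contains code and any compression overhead, which this division does not separate out.

```python
from typing import List


def quant_roundtrip(row: List[float], bits: int = 6) -> List[float]:
    """Symmetric per-row quantize -> dequantize.

    The 6-bit default is an assumption for illustration; the PR does not
    state the export format.
    """
    qmax = 2 ** (bits - 1) - 1                   # 31 for 6 bits
    amax = max(abs(v) for v in row)
    scale = amax / qmax if amax > 0 else 1.0     # one float scale per row
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in row]
    return [c * scale for c in codes]            # what the eval sees


row = [0.5, -1.0, 0.25, 0.0]
restored = quant_roundtrip(row)
max_err = max(abs(a - b) for a, b in zip(row, restored))  # at most scale / 2
bits_per_param = 15_192_793 * 8 / 28_224_320   # about 4.31, from the table
```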

No TTT on Validation Data

All training uses training data only. Late training replay buffers and replays training batches, and self-distillation uses an EMA teacher on training data.
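A minimal sketch of an EMA teacher and a distillation loss, written with plain lists so it runs without a framework. The decay value and the use of KL(teacher || student) on next-token distributions are assumptions; the PR only says the EMA model serves as the teacher on training data. All function names are hypothetical.

```python
import math
from typing import List


def ema_update(teacher: List[float], student: List[float], decay: float = 0.999) -> List[float]:
    # teacher <- decay * teacher + (1 - decay) * student, elementwise
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distill_loss(student_logits: List[float], teacher_logits: List[float]) -> float:
    """KL(teacher || student) for one position's next-token distribution."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


teacher_w = ema_update([1.0, 0.0], [0.0, 1.0], decay=0.9)     # approx [0.9, 0.1]
loss_same = distill_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])  # 0.0
loss_diff = distill_loss([0.0, 0.0, 0.0], [2.0, 0.5, -1.0])   # positive
```

Because both the teacher's inputs and the student's inputs are training batches, this kind of distillation never touches validation data.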

Test plan


newjordan · 2026-03-23T04:17:08Z

Closing to clean up — resubmitting with only submission files.

Octavian and others added 3 commits March 18, 2026 18:06

docs: fractal transformer research plan — weight sharing + gravity + …AttnRes (6e503d9)
results: first local ladder — fractal 3x3 beats baseline by 7.1% BPB, … gravity needs more steps (73271f3)
The Frugendorff: Recursive Weight Sharing + MLP 4x (1.1478 BPB) (497e08b)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

newjordan closed this Mar 23, 2026

newjordan deleted the submission/frugendorff branch March 23, 2026 04:17

This was referenced Mar 25, 2026:

Podracing: 1.0461 BPB (3-seed mean) #674 (Closed)
Podracing: 1.0461 BPB (3-seed mean) — 5-gram eval + LeakyReLU² #706 (Open)
Podracing II: Electric Bugaloo — 0.9625 BPB (3-seed mean, all sub-0.964) #753 (Open)

newjordan wants to merge 3 commits into openai:main from newjordan:submission/frugendorff