
The Frugendorff: Recursive Weight Sharing + MLP 4x (1.1478 BPB, 15.19MB) #498

Closed
newjordan wants to merge 3 commits into openai:main from newjordan:submission/frugendorff


@newjordan commented Mar 23, 2026


Summary

Non-record submission exploring recursive weight sharing: 6 unique transformer blocks are each looped 2x, giving 12 effective layers of depth while storing parameters for only 6 blocks. The freed parameter budget enables MLP 4x expansion, which is the primary quality driver.

  • val_bpb: 1.1478 (sliding window stride=64) | 15.19 MB | 8xH100 SXM, 600s
  • 28.2M params, 4,396 steps at 136.5ms/step
  • Full pipeline: Muon + SWA + Late QAT + Training Replay + Self-Distillation + EMA
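The sliding-window number in the first bullet can be read as follows: the model is evaluated on overlapping windows, and after the first window only the newest `stride` tokens of each window are scored, so every token is counted once and most are scored with a long context. A minimal sketch of that window schedule and of the bits-per-byte conversion, assuming a hypothetical context length of 128 tokens (the PR states only stride=64):

```python
import math

def sliding_windows(n_tokens, context, stride):
    """Return (start, end, score_from) triples.

    The model sees tokens[start:end]; only positions score_from..end-1
    contribute to the loss, so each token is scored exactly once.
    """
    if n_tokens <= 0:
        return []
    end = min(context, n_tokens)
    windows = [(0, end, 0)]              # first window scores everything it sees
    while end < n_tokens:
        new_end = min(end + stride, n_tokens)
        windows.append((max(0, new_end - context), new_end, end))
        end = new_end
    return windows

def bits_per_byte(total_nll_nats, n_bytes):
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    return total_nll_nats / (math.log(2) * n_bytes)

# context=128 is an illustrative assumption, not a value from the PR
for window in sliding_windows(n_tokens=300, context=128, stride=64):
    print(window)
```

A smaller stride gives each scored token more left context at the cost of more forward passes, which is why the sliding-window figure can differ from a single-pass evaluation of the same model.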

Key Insight

MLP 4x gives a ~2% relative BPB improvement over MLP 3x, but a model with 12 unique layers at MLP 4x does not fit in 16 MB. Recursive weight sharing (6 unique blocks x 2 loops) brings the artifact down to 15.19 MB. Weight sharing is the compression technique; MLP 4x is the quality lever.
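The trade-off can be checked with back-of-the-envelope parameter counts. The sketch below uses the dimensions listed in this PR (dim=640, 10 heads, 5 KV heads, head_dim=64) and assumes bias-free projections, an ungated two-matrix MLP, and the same width for the 12-unique-layer alternatives; none of those assumptions is confirmed by the PR, and norms, embeddings, and the auxiliary modules are ignored.

```python
def block_params(dim, n_heads, n_kv_heads, head_dim, mlp_mult):
    """Weight-matrix parameters in one block (assumes no biases, no norms)."""
    q = dim * n_heads * head_dim             # query projection
    kv = 2 * dim * n_kv_heads * head_dim     # key + value projections (GQA)
    o = n_heads * head_dim * dim             # output projection
    mlp = 2 * dim * (mlp_mult * dim)         # up + down projection, no gate (assumed)
    return q + kv + o + mlp

cfg = dict(dim=640, n_heads=10, n_kv_heads=5, head_dim=64)

shared_4x = 6 * block_params(**cfg, mlp_mult=4)    # 6 stored blocks, looped 2x
unique_4x = 12 * block_params(**cfg, mlp_mult=4)   # 12 stored blocks, MLP 4x
unique_3x = 12 * block_params(**cfg, mlp_mult=3)   # 12 stored blocks, MLP 3x

print(f"6 shared blocks, MLP 4x : {shared_4x:,}")
print(f"12 unique blocks, MLP 4x: {unique_4x:,}")
print(f"12 unique blocks, MLP 3x: {unique_3x:,}")
```

Under these assumptions the 6 shared blocks account for about 27.0M of the reported 28.2M parameters, and storing 12 unique MLP 4x blocks at the same width would double that.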

Architecture

  • 6 unique blocks x 2 loops = 12 effective depth
  • dim=640, 10 heads, 5 KV (GQA), head_dim=64
  • MLP 4x (hidden=2560), relu-squared
  • Orthogonal loop positions, U-Net skips, SmearGate, BigramHash, VE128, XSA last 2
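The looping scheme itself is simple to sketch. The toy below uses plain Python and a residual tanh layer as a stand-in for a transformer block; it assumes the whole 6-block stack is traversed twice (the PR does not say whether loops are per block or per stack) and uses a scalar per-loop bias as a hypothetical placeholder for the loop-position signal. U-Net skips, SmearGate, BigramHash, VE128, and XSA are omitted.

```python
import math
import random

class ToyBlock:
    """Stand-in for a transformer block: x + tanh(W x + loop_bias)."""
    def __init__(self, dim, rng):
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]

    def __call__(self, x, loop_bias):
        h = [math.tanh(sum(wij * xj for wij, xj in zip(row, x)) + loop_bias)
             for row in self.w]
        return [xi + hi for xi, hi in zip(x, h)]

class RecursiveStack:
    """n_unique stored blocks applied n_loops times each."""
    def __init__(self, dim, n_unique=6, n_loops=2, seed=0):
        rng = random.Random(seed)
        self.blocks = [ToyBlock(dim, rng) for _ in range(n_unique)]
        # Hypothetical per-loop signal so a block can tell the passes apart.
        self.loop_bias = [0.5 * k for k in range(n_loops)]

    def schedule(self):
        """(loop, block_index) pairs in execution order (assumed stack-wise)."""
        return [(loop, i) for loop in range(len(self.loop_bias))
                for i in range(len(self.blocks))]

    def forward(self, x):
        for loop, i in self.schedule():
            x = self.blocks[i](x, self.loop_bias[loop])  # same weights on every pass
        return x

    def stored_params(self):
        return sum(len(b.w) * len(b.w[0]) for b in self.blocks)

stack = RecursiveStack(dim=4)
print(len(stack.schedule()), "effective layers,", stack.stored_params(), "stored weights")
```

The stored parameter count depends only on `n_unique`, while compute and effective depth scale with `n_unique * n_loops`, which is the property the submission relies on.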

Results

| Metric | Value |
| --- | --- |
| Sliding window val_bpb (stride=64) | 1.1478 |
| Pre-quant (post-EMA) | 1.1572 |
| Post-quant roundtrip | 1.1716 |
| Artifact | 15,192,793 bytes (15.19 MB) |
| Steps | 4,396 in 600s |
| Params | 28,224,320 |
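A few derived quantities make these figures easier to compare. The snippet below only does arithmetic on the reported numbers; the 16,000,000-byte limit is an assumption (it matches the decimal MB used for the artifact size), and the bits-per-parameter figure is an average over the whole artifact, not the quantization bit width.

```python
artifact_bytes = 15_192_793
params = 28_224_320
steps = 4_396
ms_per_step = 136.5        # from the summary

bits_per_param = artifact_bytes * 8 / params      # average over the whole artifact
train_seconds = steps * ms_per_step / 1000        # should land near the 600s budget
quant_gap = 1.1716 - 1.1572                       # post-quant minus pre-quant BPB
headroom = 16_000_000 - artifact_bytes            # assumes 16 MB = 16,000,000 bytes

print(f"bits per parameter : {bits_per_param:.3f}")
print(f"training time      : {train_seconds:.2f} s")
print(f"quantization gap   : {quant_gap:.4f} BPB")
print(f"size headroom      : {headroom:,} bytes")
```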

No TTT on Validation Data

All training uses training data only. Late replay buffers training batches. Self-distillation uses EMA teacher on training data.
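For readers unfamiliar with the EMA-teacher setup, here is a minimal sketch of the two pieces involved: the exponential moving average update of the teacher weights and a distillation loss computed on a training batch. The decay value, the KL(teacher || student) loss form, and the dict-of-lists weight layout are illustrative assumptions; the PR does not specify them.

```python
import math

def ema_update(teacher, student, decay):
    """In place: teacher <- decay * teacher + (1 - decay) * student."""
    for name, weights in student.items():
        teacher[name] = [decay * t + (1.0 - decay) * s
                         for t, s in zip(teacher[name], weights)]

def softmax(logits):
    m = max(logits)                               # subtract max for stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(student_logits, teacher_logits):
    """KL(teacher || student) for one token position (assumed loss form)."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))

# Toy usage on made-up numbers; decay=0.9 is chosen only for illustration.
teacher = {"w": [0.0, 0.0]}
student = {"w": [1.0, -1.0]}
ema_update(teacher, student, decay=0.9)
print(teacher)
print(distill_loss([0.2, 0.1, -0.3], [0.5, 0.0, -0.5]))
```

Because both the teacher's weights and the batches it is queried on come from training data, this kind of self-distillation does not touch the validation set.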

Test plan

  • 8xH100 SXM, 600s
  • Artifact under 16MB (15.19 MB)
  • No TTT on validation data
  • Post-quant int6 roundtrip verified
  • Sliding window eval (stride=64)
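The int6 roundtrip check in the list above amounts to quantizing the weights, dequantizing them, and evaluating the result. A minimal sketch of such a roundtrip, assuming symmetric per-row scaling into the signed 6-bit range [-32, 31]; the actual scheme used in the submission is not described in this PR.

```python
def quantize_int6(row):
    """Symmetric per-row quantization to signed 6-bit integers (assumed scheme)."""
    scale = max(abs(v) for v in row) / 31 or 1.0   # fall back to 1.0 for all-zero rows
    q = [max(-32, min(31, round(v / scale))) for v in row]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

row = [0.5, -1.0, 0.25, 0.0]                        # made-up weights
q, scale = quantize_int6(row)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(q, scale, max_err)
```

With this scheme the per-weight error is bounded by half the row's scale, so rows with a large outlier lose more precision on their small entries.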

@newjordan (Author)

Closing to clean up; resubmitting with only the submission files.
