
feat: recursive weight sharing for 16MB limit #15

Closed
ArthurKaroyan wants to merge 1 commit into openai:main from ArthurKaroyan:feat/recursive-transformer

@ArthurKaroyan

No description provided.

Ueaj-Kerman added a commit to Ueaj-Kerman/parameter-golf that referenced this pull request Mar 19, 2026
Add entries openai#15-18 to experiment log covering three worktree experiments:
- GatedCausalConv (ssl): conv replacing first transformer block, best 1.2247 bpb
- NorMuon (normuon): per-row second moment normalization in Muon (code-only)
- SPlus (svdopt): SVD eigenbasis optimizer replacing Muon (code-only)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@0hq
Collaborator

0hq commented Mar 19, 2026

Not a valid submission; resubmit with a training log to prove efficacy.

0hq closed this Mar 19, 2026
mrdavtan added a commit to mrdavtan/parameter-golf that referenced this pull request Mar 21, 2026
- Restored from qat-sliding-window branch (was never merged forward)
- Updated SWA: v2 result was +0.0004 (no effect), now superseded by EMA
- Updated Moonshot: added v2 flat-loops result (5.58), scale argument
- Added Finding openai#15: Int5 catastrophic (gap 15x worse than int6)
- Added Finding openai#16: optimizer bug (SmearGate + BigramHash frozen in all prior runs)
- Added Finding openai#17: 11L step-count trap (83ms/step = 40% fewer steps)
- Added Finding openai#18: FA2 positive for step time, no quality effect
- Added Findings openai#19-22: XSA, EMA, TTT, NTK-RoPE (implemented, results pending)
- Updated 'tested by others' section with our implementation status
- Added meta-lessons: optimizer coverage, layer cost, merge window strategy
gb250e referenced this pull request in gb250e/parameter-golf Mar 21, 2026