
[WIP] Optimized Muon/Architecture research by @NOPIMPOSSSIBLEWHY#4

Closed
NOPIMPOSSSIBLEWHY wants to merge 1 commit into openai:main from NOPIMPOSSSIBLEWHY:main


Conversation

@NOPIMPOSSSIBLEWHY

Research is starting on local MLX (Mac M3): benchmarking architectures against the 16 MB limit using Muon and muP.

@0hq 0hq marked this pull request as draft March 19, 2026 16:57
@0hq 0hq closed this Mar 19, 2026
keshav55 added a commit to keshav55/parameter-golf that referenced this pull request Mar 20, 2026


Novel techniques from the top 2 leaderboard entries:

1. BigramHash (BIGRAM_BUCKETS=4096, BIGRAM_DIM=128):
   - Hash consecutive token pairs → embedding lookup → project to model_dim
   - XOR with coprime multipliers for hash function
   - Captures local bigram context (~524K params for 4096 buckets)
   - Used by #1 (thwu1, 1.1428 BPB) and #2 (Raahil Shah, 1.1458 BPB)

2. SmearGate (SMEAR_GATE=1):
   - Learned per-dim gate blending current token with previous token
   - Applied after embedding normalization
   - Only ~512 params
   - Used by #2 and #4

Both are env-var controlled (0=disabled by default).
run_v7_full.sh enables everything for the full stack.
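A minimal sketch of default-off env-var gating of this kind. The variable names come from the commit message; the reading scheme and defaults are assumptions, not the PR's code:

```python
import os

# Feature flags read from the environment; 0 (the default) disables each path.
# Variable names are from the commit message; the parsing is an assumption.
BIGRAM_BUCKETS = int(os.environ.get("BIGRAM_BUCKETS", "0"))
BIGRAM_DIM = int(os.environ.get("BIGRAM_DIM", "128"))
SMEAR_GATE = int(os.environ.get("SMEAR_GATE", "0"))

use_bigram_hash = BIGRAM_BUCKETS > 0  # enabled only when a bucket count is set
use_smear_gate = SMEAR_GATE == 1
```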

Also fixed: BigramHash/SmearGate params added to optimizer groups.
1438 lines (62 under the 1500-line limit).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
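The two techniques above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the PR's implementation: `MODEL_DIM`, the specific hash multipliers, and the function names are invented here; only `BIGRAM_BUCKETS`, `BIGRAM_DIM`, and the overall dataflow (hash pair → embedding lookup → projection; per-dim gate blending with the previous token) come from the commit message.

```python
import numpy as np

# Dimensions from the commit message; MODEL_DIM and the multipliers are
# assumptions (any pair of odd constants works for the XOR hash).
BIGRAM_BUCKETS = 4096
BIGRAM_DIM = 128
MODEL_DIM = 512
MULT_A, MULT_B = 2654435761, 40503  # hypothetical coprime multipliers

def bigram_bucket(prev_tok: int, cur_tok: int) -> int:
    """Hash a consecutive token pair into one of BIGRAM_BUCKETS buckets."""
    return ((prev_tok * MULT_A) ^ (cur_tok * MULT_B)) % BIGRAM_BUCKETS

rng = np.random.default_rng(0)
bigram_emb = rng.normal(0.0, 0.02, (BIGRAM_BUCKETS, BIGRAM_DIM))  # ~524K params
bigram_proj = rng.normal(0.0, 0.02, (BIGRAM_DIM, MODEL_DIM))
smear_gate_logit = np.zeros(MODEL_DIM)  # learned per-dim gate, ~512 params

def augment(token_ids, tok_emb):
    """tok_emb: (seq, MODEL_DIM) token embeddings after normalization."""
    x = tok_emb.copy()
    # SmearGate: per-dim sigmoid gate blending each token with its predecessor.
    g = 1.0 / (1.0 + np.exp(-smear_gate_logit))
    x[1:] = (1.0 - g) * x[1:] + g * x[:-1]
    # BigramHash: look up the hashed (prev, cur) pair and project to MODEL_DIM.
    buckets = [bigram_bucket(token_ids[i - 1], token_ids[i])
               for i in range(1, len(token_ids))]
    x[1:] += bigram_emb[buckets] @ bigram_proj
    return x
```

Note the parameter math lines up with the commit message: the bigram table is 4096 × 128 ≈ 524K parameters, and the gate is one scalar per model dimension.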
gb250e referenced this pull request in gb250e/parameter-golf Mar 21, 2026
dhruvjatkar referenced this pull request in dhruvjatkar/parameter-golf Mar 25, 2026
PR openai#672 maxes out TTT at 30 epochs (590 s of the 600 s eval budget), so all future
improvements must be orthogonal to TTT. This update:
- Sets 1.0781 BPB (PR openai#672) as the new target to beat
- Reorders Top 8 directions: XSA-all confirmed at #1, Full GPTQ #2,
  SwiGLU #3, Muon-VS #4, aggressive quant #5, MASA #6,
  depth recurrence #7 with int6 risk warning, AdEMAMix #8
- Deprioritizes TTT-related directions already exploited by PR openai#672
- Collapses ~1000 lines of stale Round 0-3.9 session logs into a
  concise historical summary
- Removes resolved blockers (flash_attn, SSH hangs, local runtime)
- Adds fresh Round 1 section with 5 submitted experiments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 participants