
SOTA attempt (val_bpb=1.2064) #49

Merged
0hq merged 2 commits into openai:main from spokane-way:main on Mar 19, 2026

Conversation

@spokane-way
Contributor

@spokane-way spokane-way commented Mar 19, 2026

  • SEED=1337: 1.20576485
  • SEED=1338: 1.2061746
  • SEED=1339: 1.20715923
  • Sample mean across the three runs: 1.20636623 (reproduced in the snippet below)
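
For reference, the headline number is just the arithmetic mean of the three per-seed results above; a minimal, illustrative Python snippet (seed/value pairs copied from this comment, everything else hypothetical) reproduces it:

```python
from statistics import mean

# Per-seed val_bpb results reported in this PR.
runs = {1337: 1.20576485, 1338: 1.2061746, 1339: 1.20715923}

print(f"val_bpb mean across {len(runs)} seeds: {mean(runs.values()):.8f}")
# -> val_bpb mean across 3 seeds: 1.20636623
```

Rounded to four decimals, this gives the 1.2064 in the PR title.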

@spokane-way spokane-way changed the title from SOTA attempt (val_bpb=1.2166) to SOTA attempt (val_bpb=1.20637) on Mar 19, 2026
@spokane-way spokane-way changed the title from SOTA attempt (val_bpb=1.20637) to SOTA attempt (val_bpb=1.2064) on Mar 19, 2026
@0hq 0hq closed this Mar 19, 2026
@0hq 0hq reopened this Mar 19, 2026
@0hq
Contributor

0hq commented Mar 19, 2026

Great, thanks!

@0hq 0hq merged commit e89fcf8 into openai:main Mar 19, 2026
maxivione pushed a commit to maxivione/parameter-golf that referenced this pull request Mar 20, 2026
* SOTA attempt

* Improve score on SXM

---------

Co-authored-by: spokane-way <spokane@way>
scottspace pushed a commit to scottspace/parameter-golf that referenced this pull request Mar 21, 2026
* SOTA attempt

* Improve score on SXM

---------

Co-authored-by: spokane-way <spokane@way>
nedcut pushed a commit to nedcut/parameter-golf that referenced this pull request Mar 26, 2026
* SOTA attempt

* Improve score on SXM

---------

Co-authored-by: spokane-way <spokane@way>
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 7, 2026
…ity plateau confirmed

Patches 15/16/21 still uncontested in 150+ open + 10 closed PRs (6 consecutive
audits). PR openai#1430 stable OPEN, 0 comments, no comp owner activity for 16h.

After 13 research fires and 6 audits, the picture is clear: training-time
tweaks are exhausted at our 22M/1500-step scale. All 4 post-fire-9 ports
(Mousse/MuonEq-R/Depth Recurrence/QK_GAIN=5.0) are neutral within the
champion noise band. The "neutrality plateau" at 3.27-3.30 is the empirical
ceiling for training-time changes at our compute budget.

Best remaining moves (in expected value order):
1. H100 escalation of CHAMP_L4_seed42+EL stack with EMA+Tilt+INT6 GPTQ bundle
2. Coprime stride implementation (task openai#58) — only data-side direction (sketched after this commit message)
3. BPE-8192 ngram tables build (task openai#49) — enables tokenizer A/B

Spend ~$3.55/$36 (10% utilization). Pod healthy at 7h uptime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
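
The "coprime stride" direction in item 2 of the list above is not specified further in this thread; a common form of the idea is to walk a dataset of length N with a fixed stride s where gcd(s, N) = 1, which visits every offset exactly once in a scrambled order without materializing a permutation. A minimal sketch under that assumption (all names hypothetical):

```python
from math import gcd

def coprime_stride_order(n: int, stride: int, start: int = 0):
    """Yield each index in [0, n) exactly once by stepping a fixed stride.

    Full coverage holds iff gcd(stride, n) == 1, i.e. the stride is
    coprime with the dataset length.
    """
    if gcd(stride, n) != 1:
        raise ValueError("stride must be coprime with n for full coverage")
    idx = start % n
    for _ in range(n):
        yield idx
        idx = (idx + stride) % n

# Example: 10 sample offsets walked with stride 7 (gcd(7, 10) == 1).
print(list(coprime_stride_order(10, 7)))
# -> [0, 7, 4, 1, 8, 5, 2, 9, 6, 3]
```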
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 7, 2026
…ker identified

First tokenizer-side fire (0/24 patches in this category). Subagent found 3
candidates (BPE-Dropout, Complementary Weighting, Three-Tier Classification)
but ALL are blocked by our pre-tokenized .bin file pipeline.

BPE-Dropout requires live re-tokenization at training time → infeasible.
The Complementary Weighting subagent incorrectly cited our MLX prototype,
not the H100 train_gpt.py. Three-Tier Classification is PR openai#1402,
pending validation.

Architectural insight: SP1024 may actually be optimal for our 22M architecture
(smaller embedding = more params for model body). Top PRs use SP8192 because
their depth-recurrence stack benefits from finer tokens. We may not need
BPE-8192. Task openai#49 deferred indefinitely.

Cross-domain coverage update (16 fires):
  training: 5, optimizer: 2, eval: 3, compression: 1, data: 2, tokenizer: 1,
  hardware: 0. Hardware still uncovered.

Per user instruction: queued, not shipped. No code patches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
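
The "pre-tokenized .bin file pipeline" blocker in the commit above is easy to make concrete. In loaders of this shape (a sketch in the nanoGPT style, not the repository's actual train_gpt.py), token IDs are frozen into a flat binary file at build time and training only memory-maps those fixed IDs, so a technique like BPE-Dropout, which must re-segment raw text stochastically every time a sample is drawn, has no place to run:

```python
import numpy as np
import torch

def get_batch(bin_path: str, batch_size: int, block_size: int):
    """Sample input/target batches from a pre-tokenized .bin file.

    The token IDs were fixed once when the .bin was built, so any
    tokenizer-side randomness (e.g. BPE-Dropout) would have to happen
    upstream, before this file exists -- not here at training time.
    """
    data = np.memmap(bin_path, dtype=np.uint16, mode="r")
    starts = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([torch.from_numpy(data[i:i + block_size].astype(np.int64)) for i in starts])
    y = torch.stack([torch.from_numpy(data[i + 1:i + 1 + block_size].astype(np.int64)) for i in starts])
    return x, y
```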
