PROTEUS EMA — val_bpb: 1.1836 (3-seed mean, Notable Non-Record) #95

Open
MatoTeziTanka wants to merge 8 commits into openai:main from MatoTeziTanka:proteus-ema-submission
Conversation

MatoTeziTanka commented Mar 19, 2026

Summary

  • Baseline architecture + EMA weight averaging (26 lines added, zero architectural changes)
  • EMA smooths weight distributions, reducing INT8 quantization loss from 0.0072 to 0.0048 BPB
  • val_bpb: 1.1836 (3-seed mean, std 0.0005) on 8×H100 SXM
  • Total artifact: 15.88 MB (99.3% of 16MB budget)
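The EMA weight-averaging step behind these numbers can be sketched in a few lines. This is a minimal illustration of the technique, not the submission's actual `train_gpt.py`; the decay value shown is a common default and an assumption here:

```python
def ema_update(ema, weights, decay=0.999):
    """One EMA step: ema <- decay * ema + (1 - decay) * weights, elementwise.

    Averaging the weight trajectory this way smooths the weight
    distributions, which is what reduces INT8 quantization loss.
    """
    return [decay * e + (1 - decay) * w for e, w in zip(ema, weights)]
```

In a real training loop this would run every `ema_every` steps on the flattened parameter tensor, and the averaged copy is what gets quantized and exported.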

3-Seed Results

| Seed | val_bpb | Steps | Artifact |
|------|---------|-------|----------|
| 42   | 1.1836  | 11,876 | 15.88 MB |
| 1337 | 1.1841  | 11,871 | 15.87 MB |
| 2024 | 1.1831  | 11,875 | 15.88 MB |
| **Mean** | **1.1836** (std 0.0005) | | |

Notable Non-Record — Documented Negative Results

This submission does not beat SOTA. Its value is the documented negative results:

  1. INT4 post-training quantization fails catastrophically — roundtrip BPB goes from 1.44 to 3.73. Per-row, per-group (gs=64), and QAT with STE all fail. Root cause: quantization error compounds through layers (cosine similarity drops to 0.90 at 18 layers).

  2. Shared-weight depth recurrence (LoopFormer) loses to more tokens at this training budget — 1-pass (9 effective layers, 6.5B tokens) beats 2-pass (18 effective layers, 3.6B tokens) by 0.019 BPB.

  3. EMA reduces quantization gap from 0.0072 to 0.0048 BPB by smoothing weight distributions, but training loss improvement is marginal.
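The INT4 failure mode in point 1 can be reproduced in miniature. The sketch below is a hypothetical per-group symmetric quantizer (the group size and the [-8, 7] range follow standard INT4 practice, not the submission's exact code) plus the cosine-similarity probe used to track how error compounds:

```python
def quantize_int4_group(values, group_size=64):
    """Per-group symmetric INT4: scale each group so values map to [-8, 7],
    round, clamp, and dequantize back. Returns the roundtripped values."""
    out = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        scale = max(abs(v) for v in group) / 7 or 1.0  # avoid div-by-zero
        out.extend(max(-8, min(7, round(v / scale))) * scale for v in group)
    return out

def cosine_similarity(a, b):
    """Probe for error compounding: applied per layer, this is the metric
    that drops to ~0.90 by layer 18 in the reported failure."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)
```

A single layer roundtrips with high similarity; the reported catastrophe comes from stacking 18 such lossy steps, which no per-row, per-group, or QAT variant recovered.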

Details

| Metric | Baseline | PROTEUS EMA |
|--------|----------|-------------|
| val_bpb (roundtrip) | 1.2244 | 1.1836 |
| INT8 quant loss | 0.0072 | 0.0048 |

Full logs, submission.json, and train_gpt.py are in /records/track_10min_16mb/2026-03-25_PROTEUS_EMA_Notable/

Built with PROTEUS by LightSpeedUp


chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1ade96bec4


```python
opt.step()
zero_grad_all()

if args.ema_enabled and ema_flat is not None and step > 0 and step % args.ema_every == 0:
```

P1: Prevent exporting an EMA buffer that was never updated

The EMA update guard in main() only runs when step > 0 and step % ema_every == 0, but step is incremented after that block. In runs where training ends before the first eligible step (for example ITERATIONS=10 with EMA_EVERY=10, or an early wall-clock stop), ema_flat still holds the initial weights yet is applied anyway during export (the "Applying EMA weights for export" block), silently overwriting trained parameters with near-initial ones and corrupting the final artifact.
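The off-by-one the reviewer describes can be demonstrated in isolation. This assumes the loop shape implied by the quoted guard; the function and variable names are illustrative, not taken from the submission:

```python
def ema_would_update(iterations, ema_every, ema_enabled=True):
    """Return True iff at least one EMA update fires before training ends.

    Mirrors the quoted guard: the update runs only when step > 0 and
    step % ema_every == 0, and `step` increments afterwards, so a run of
    exactly `ema_every` iterations never touches the EMA buffer.
    """
    updated = False
    for step in range(iterations):
        if ema_enabled and step > 0 and step % ema_every == 0:
            updated = True  # a real loop would also blend the weights here
    return updated

# Export should gate on this flag: apply EMA weights only if updated is True.
```

With ITERATIONS=10 and EMA_EVERY=10 this returns False, which is exactly the case where exporting `ema_flat` would replace trained weights with the initial snapshot.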


"date": "2026-03-19T15:30:00Z",
"val_loss": 2.06894551,
"val_bpb": 1.22534607,
"bytes_total": 15893533,

P2: Fix submission size metadata to match recorded artifact bytes

submission.json reports bytes_total as 15,893,533, but the bundled run log records final_model.int8.ptz at 15,813,783 bytes and the code at 49,825 bytes, implying a true total of 15,863,608. This 29,925-byte overstatement can mis-rank the run in size-constrained leaderboard tooling that trusts submission.json.
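The arithmetic behind this finding, using only the figures quoted from the run log and submission.json:

```python
MODEL_BYTES = 15_813_783     # final_model.int8.ptz, per the run log
CODE_BYTES = 49_825          # code size, per the run log
REPORTED_TOTAL = 15_893_533  # bytes_total claimed in submission.json

actual_total = MODEL_BYTES + CODE_BYTES        # 15,863,608
overstatement = REPORTED_TOTAL - actual_total  # 29,925 bytes
```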


Mato and others added 7 commits March 20, 2026 09:33:

1. Baseline + EMA weight averaging (26 lines added). EMA smooths weight distributions for reduced INT8 quantization loss. Built with PROTEUS by LightSpeedUp (lightspeedup.com).
2. P1: Guard EMA export with an `_ema_updated` flag, preventing trained weights from being overwritten with initial weights if training ends before the first EMA update step. P2: Fix `bytes_total` in submission.json to match the actual artifact (15,813,783 model + 49,825 code = 15,863,608 bytes).
3. Stack of four published techniques: EMA + seq2048 + FP16 embedding passthrough + sliding-window eval (stride=64). Beats the current leader (1.1925) by 0.0036 BPB. Built with PROTEUS by LightSpeedUp (lightspeedup.com).
4. Post-run audit caught the artifact at 16,150,005 bytes, over the 16 MB cap by 150 KB. FP16 embedding passthrough pushed us over. Fix: shrink MLP_HIDDEN to 992 to make room. Updated score to follow.
5. Previous run exceeded the 16 MB cap (FP16 embedding + full MLP = 16.15 MB). Fixed by shrinking the MLP hidden size from 1024 to 992. Artifact now 15,878,735 bytes (99.2% of cap). Score: 1.18956858 BPB, still #1.
6. Document the INT4 failure (cosine similarity drops to 0.90 at 18 layers), the LoopFormer depth-recurrence loss, and the EMA overhead tradeoff. Reference Issue openai#140 for next techniques to implement.

All commits: Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
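The size-cap arithmetic from the over-budget and fixed runs can be checked the same way. This assumes the decimal 16 MB convention (16,000,000 bytes), which is the only reading consistent with the quoted "150 KB over" and "99.2% of cap" figures:

```python
CAP_BYTES = 16_000_000  # 16 MB cap, decimal convention (assumption)

def budget(artifact_bytes):
    """Return (bytes over cap, percent of cap used)."""
    return artifact_bytes - CAP_BYTES, 100.0 * artifact_bytes / CAP_BYTES

over, pct = budget(16_150_005)              # rejected run: ~150 KB over
fixed_over, fixed_pct = budget(15_878_735)  # after MLP_HIDDEN 1024 -> 992
```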

MatoTeziTanka commented Mar 21, 2026

Acknowledged — this doesn't beat SOTA. Reframing as a non-record submission.

The value here is the documented negative results:

  • INT4 post-training quantization fails catastrophically — roundtrip BPB goes from 1.44 to 3.73. Per-row, per-group (gs=64), and QAT with STE all fail. Root cause: quantization error compounds through layers (cosine similarity drops to 0.90 at 18 layers).
  • Shared-weight depth recurrence (LoopFormer) loses to more tokens at this training budget — 1-pass beats 2-pass by 0.0185 BPB on 8×H100.
  • EMA reduces INT8 quantization loss from 0.0072 to 0.0048 BPB, but the per-step overhead costs ~600 training steps, partially offsetting the gain.

We have a v4 in progress that addresses the size/score issues. Will submit separately when ready.


Disclosure: I use Claude Code CLI, Codex CLI, and Gemini Pro as tools in my workflow. Human first, AI-assisted.

Mean val_bpb: 1.1836 (std 0.0005) on 8×H100 SXM
Seeds: 42 (1.1836), 1337 (1.1841), 2024 (1.1831)
Includes documented negative results: INT4 failure, depth recurrence boundary

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

MatoTeziTanka commented Mar 25, 2026

3-Seed Results Now Available

Updated with proper /records folder and 3-seed validation on 8×H100 SXM:

| Seed | val_bpb | Steps | Artifact |
|------|---------|-------|----------|
| 42   | 1.1836  | 11,876 | 15.88 MB |
| 1337 | 1.1841  | 11,871 | 15.87 MB |
| 2024 | 1.1831  | 11,875 | 15.88 MB |
| **Mean** | **1.1836** (std 0.0005) | | |

This is a notable non-record submission. The primary value is the documented negative results:

  1. INT4 quantization fails catastrophically — BPB goes from 1.44 to 3.73 due to compounding layer error
  2. Depth recurrence loses to more tokens at this budget — more data beats more compute depth
  3. EMA reduces quantization gap (0.0072 → 0.0048) but doesn't move the needle on training loss

Records folder with full logs, submission.json, and train_gpt.py added in latest commit.


Disclosure: I use Claude Code CLI, Codex CLI, and Gemini Pro as tools in my workflow. Human first, AI-assisted.

@MatoTeziTanka MatoTeziTanka changed the title PROTEUS EMA — val_bpb: 1.2253 PROTEUS EMA — val_bpb: 1.1836 (3-seed mean, Notable Non-Record) Mar 25, 2026