PROTEUS EMA — val_bpb: 1.1836 (3-seed mean, Notable Non-Record) #95

Open
MatoTeziTanka wants to merge 8 commits into openai:main from MatoTeziTanka:proteus-ema-submission
Conversation

MatoTeziTanka commented Mar 19, 2026

Summary

  • Baseline architecture + EMA weight averaging (26 lines added, zero architectural changes)
  • EMA smooths weight distributions, reducing INT8 quantization loss from 0.0072 to 0.0048 BPB
  • val_bpb: 1.1836 (3-seed mean, std 0.0005) on 8×H100 SXM
  • Total artifact: 15.88 MB (99.3% of 16MB budget)
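The EMA weight-averaging step behind these numbers can be sketched in a few lines. This is a minimal illustration of the technique, not the submission's actual `train_gpt.py`; the decay value shown is a common default and an assumption here:

```python
def ema_update(ema, weights, decay=0.999):
    """One EMA step: ema <- decay * ema + (1 - decay) * weights, elementwise.

    Averaging the weight trajectory this way smooths the weight
    distributions, which is what reduces INT8 quantization loss.
    """
    return [decay * e + (1 - decay) * w for e, w in zip(ema, weights)]
```

In a real training loop this would run every `ema_every` steps on the flattened parameter tensor, and the averaged copy is what gets quantized and exported.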

3-Seed Results

| Seed | val_bpb | Steps | Artifact |
|------|---------|-------|----------|
| 42   | 1.1836  | 11,876 | 15.88 MB |
| 1337 | 1.1841  | 11,871 | 15.87 MB |
| 2024 | 1.1831  | 11,875 | 15.88 MB |
| **Mean** | **1.1836** (std 0.0005) | | |

Notable Non-Record — Documented Negative Results

This submission does not beat SOTA. Its value is the documented negative results:

  1. INT4 post-training quantization fails catastrophically — roundtrip BPB goes from 1.44 to 3.73. Per-row, per-group (gs=64), and QAT with STE all fail. Root cause: quantization error compounds through layers (cosine similarity drops to 0.90 at 18 layers).

  2. Shared-weight depth recurrence (LoopFormer) loses to more tokens at this training budget — 1-pass (9 effective layers, 6.5B tokens) beats 2-pass (18 effective layers, 3.6B tokens) by 0.019 BPB.

  3. EMA reduces quantization gap from 0.0072 to 0.0048 BPB by smoothing weight distributions, but training loss improvement is marginal.
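The INT4 failure mode in point 1 can be reproduced in miniature. The sketch below is a hypothetical per-group symmetric quantizer (the group size and the [-8, 7] range follow standard INT4 practice, not the submission's exact code) plus the cosine-similarity probe used to track how error compounds:

```python
def quantize_int4_group(values, group_size=64):
    """Per-group symmetric INT4: scale each group so values map to [-8, 7],
    round, clamp, and dequantize back. Returns the roundtripped values."""
    out = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        scale = max(abs(v) for v in group) / 7 or 1.0  # avoid div-by-zero
        out.extend(max(-8, min(7, round(v / scale))) * scale for v in group)
    return out

def cosine_similarity(a, b):
    """Probe for error compounding: applied per layer, this is the metric
    that drops to ~0.90 by layer 18 in the reported failure."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)
```

A single layer roundtrips with high similarity; the reported catastrophe comes from stacking 18 such lossy steps, which no per-row, per-group, or QAT variant recovered.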

Details

| Metric | Baseline | PROTEUS EMA |
|--------|----------|-------------|
| val_bpb (roundtrip) | 1.2244 | 1.1836 |
| INT8 quant loss | 0.0072 | 0.0048 |

Full logs, submission.json, and train_gpt.py are in /records/track_10min_16mb/2026-03-25_PROTEUS_EMA_Notable/

Built with PROTEUS by LightSpeedUp


chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1ade96bec4


```python
opt.step()
zero_grad_all()

if args.ema_enabled and ema_flat is not None and step > 0 and step % args.ema_every == 0:
```

P1: Prevent exporting an EMA buffer that was never updated

The EMA update guard in main() only runs when step > 0 and step % ema_every == 0, but step is incremented after that block. In runs where training ends before the first eligible step (for example ITERATIONS=10 with EMA_EVERY=10, or an early wall-clock stop), ema_flat still holds the initial weights yet is applied anyway during export (the "Applying EMA weights for export" block), silently overwriting trained parameters with near-initial ones and corrupting the final artifact.
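The off-by-one the reviewer describes can be demonstrated in isolation. This assumes the loop shape implied by the quoted guard; the function and variable names are illustrative, not taken from the submission:

```python
def ema_would_update(iterations, ema_every, ema_enabled=True):
    """Return True iff at least one EMA update fires before training ends.

    Mirrors the quoted guard: the update runs only when step > 0 and
    step % ema_every == 0, and `step` increments afterwards, so a run of
    exactly `ema_every` iterations never touches the EMA buffer.
    """
    updated = False
    for step in range(iterations):
        if ema_enabled and step > 0 and step % ema_every == 0:
            updated = True  # a real loop would also blend the weights here
    return updated

# Export should gate on this flag: apply EMA weights only if updated is True.
```

With ITERATIONS=10 and EMA_EVERY=10 this returns False, which is exactly the case where exporting `ema_flat` would replace trained weights with the initial snapshot.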


"date": "2026-03-19T15:30:00Z",
"val_loss": 2.06894551,
"val_bpb": 1.22534607,
"bytes_total": 15893533,

P2: Fix submission size metadata to match recorded artifact bytes

submission.json reports bytes_total as 15,893,533, but the bundled run log records final_model.int8.ptz at 15,813,783 bytes and the code at 49,825 bytes, implying a true total of 15,863,608. This 29,925-byte overstatement can mis-rank the run in size-constrained leaderboard tooling that trusts submission.json.
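The arithmetic behind this finding, using only the figures quoted from the run log and submission.json:

```python
MODEL_BYTES = 15_813_783     # final_model.int8.ptz, per the run log
CODE_BYTES = 49_825          # code size, per the run log
REPORTED_TOTAL = 15_893_533  # bytes_total claimed in submission.json

actual_total = MODEL_BYTES + CODE_BYTES        # 15,863,608
overstatement = REPORTED_TOTAL - actual_total  # 29,925 bytes
```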


Mato and others added 7 commits March 20, 2026 09:33:

1. Baseline + EMA weight averaging (26 lines added). EMA smooths weight distributions for reduced INT8 quantization loss. Built with PROTEUS by LightSpeedUp (lightspeedup.com).
2. P1: Guard EMA export with an `_ema_updated` flag, preventing trained weights from being overwritten with initial weights if training ends before the first EMA update step. P2: Fix `bytes_total` in submission.json to match the actual artifact (15,813,783 model + 49,825 code = 15,863,608 bytes).
3. Stack of four published techniques: EMA + seq2048 + FP16 embedding passthrough + sliding-window eval (stride=64). Beats the current leader (1.1925) by 0.0036 BPB. Built with PROTEUS by LightSpeedUp (lightspeedup.com).
4. Post-run audit caught the artifact at 16,150,005 bytes, over the 16 MB cap by 150 KB. FP16 embedding passthrough pushed us over. Fix: shrink MLP_HIDDEN to 992 to make room. Updated score to follow.
5. Previous run exceeded the 16 MB cap (FP16 embedding + full MLP = 16.15 MB). Fixed by shrinking the MLP hidden size from 1024 to 992. Artifact now 15,878,735 bytes (99.2% of cap). Score: 1.18956858 BPB, still #1.
6. Document the INT4 failure (cosine similarity drops to 0.90 at 18 layers), the LoopFormer depth-recurrence loss, and the EMA overhead tradeoff. Reference Issue openai#140 for next techniques to implement.

All commits: Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
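The size-cap arithmetic from the over-budget and fixed runs can be checked the same way. This assumes the decimal 16 MB convention (16,000,000 bytes), which is the only reading consistent with the quoted "150 KB over" and "99.2% of cap" figures:

```python
CAP_BYTES = 16_000_000  # 16 MB cap, decimal convention (assumption)

def budget(artifact_bytes):
    """Return (bytes over cap, percent of cap used)."""
    return artifact_bytes - CAP_BYTES, 100.0 * artifact_bytes / CAP_BYTES

over, pct = budget(16_150_005)              # rejected run: ~150 KB over
fixed_over, fixed_pct = budget(15_878_735)  # after MLP_HIDDEN 1024 -> 992
```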

MatoTeziTanka commented Mar 21, 2026

Acknowledged — this doesn't beat SOTA. Reframing as a non-record submission.

The value here is the documented negative results:

  • INT4 post-training quantization fails catastrophically — roundtrip BPB goes from 1.44 to 3.73. Per-row, per-group (gs=64), and QAT with STE all fail. Root cause: quantization error compounds through layers (cosine similarity drops to 0.90 at 18 layers).
  • Shared-weight depth recurrence (LoopFormer) loses to more tokens at this training budget — 1-pass beats 2-pass by 0.0185 BPB on 8×H100.
  • EMA reduces INT8 quantization loss from 0.0072 to 0.0048 BPB, but the per-step overhead costs ~600 training steps, partially offsetting the gain.

We have a v4 in progress that addresses the size/score issues. Will submit separately when ready.


Disclosure: I use Claude Code CLI, Codex CLI, and Gemini Pro as tools in my workflow. Human first, AI-assisted.

Mean val_bpb: 1.1836 (std 0.0005) on 8×H100 SXM
Seeds: 42 (1.1836), 1337 (1.1841), 2024 (1.1831)
Includes documented negative results: INT4 failure, depth recurrence boundary

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

MatoTeziTanka commented Mar 25, 2026

3-Seed Results Now Available

Updated with proper /records folder and 3-seed validation on 8×H100 SXM:

| Seed | val_bpb | Steps | Artifact |
|------|---------|-------|----------|
| 42   | 1.1836  | 11,876 | 15.88 MB |
| 1337 | 1.1841  | 11,871 | 15.87 MB |
| 2024 | 1.1831  | 11,875 | 15.88 MB |
| **Mean** | **1.1836** (std 0.0005) | | |

This is a notable non-record submission. The primary value is the documented negative results:

  1. INT4 quantization fails catastrophically — BPB goes from 1.44 to 3.73 due to compounding layer error
  2. Depth recurrence loses to more tokens at this budget — more data beats more compute depth
  3. EMA reduces quantization gap (0.0072 → 0.0048) but doesn't move the needle on training loss

Records folder with full logs, submission.json, and train_gpt.py added in latest commit.


Disclosure: I use Claude Code CLI, Codex CLI, and Gemini Pro as tools in my workflow. Human first, AI-assisted.

@MatoTeziTanka MatoTeziTanka changed the title PROTEUS EMA — val_bpb: 1.2253 PROTEUS EMA — val_bpb: 1.1836 (3-seed mean, Notable Non-Record) Mar 25, 2026