PROTEUS EMA — val_bpb: 1.1836 (3-seed mean, Notable Non-Record) #95
MatoTeziTanka wants to merge 8 commits into openai:main from
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1ade96bec4
```python
opt.step()
zero_grad_all()

if args.ema_enabled and ema_flat is not None and step > 0 and step % args.ema_every == 0:
```
Prevent exporting an EMA buffer that was never updated
The EMA update guard in main() only runs when step > 0 and step % ema_every == 0, and step is incremented after that block. In runs that end before the first eligible step (for example, ITERATIONS=10 with EMA_EVERY=10, or an early wall-clock stop), ema_flat still holds the initial weights, yet the export path (the "Applying EMA weights for export" block) applies it anyway, silently overwriting the trained parameters with near-initial ones and corrupting the final artifact.
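The failure mode can be reproduced with a few lines. The sketch below mirrors the PR's guard condition; the helper names (`ema_update_schedule`, `should_apply_ema_at_export`) are illustrative, not the PR's API:

```python
def ema_update_schedule(iterations, ema_every):
    """Steps on which the EMA buffer would be updated, mirroring the PR's
    guard: step > 0 and step % ema_every == 0, checked before step is
    incremented."""
    return [s for s in range(iterations) if s > 0 and s % ema_every == 0]


def should_apply_ema_at_export(iterations, ema_every):
    """The P1 fix in sketch form: apply EMA weights at export only if at
    least one update actually happened (the _ema_updated flag)."""
    return len(ema_update_schedule(iterations, ema_every)) > 0
```

With `iterations=10, ema_every=10` the schedule is empty, which is exactly the case where an unguarded export would replace trained weights with the near-initial `ema_flat`.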
```json
"date": "2026-03-19T15:30:00Z",
"val_loss": 2.06894551,
"val_bpb": 1.22534607,
"bytes_total": 15893533,
```
Fix submission size metadata to match recorded artifact bytes
submission.json reports bytes_total as 15,893,533, but the bundled run log records final_model.int8.ptz at 15,813,783 bytes and the code at 49,825 bytes, implying a total of 15,863,608. The 29,925-byte overstatement can mis-rank the run in size-constrained leaderboard tooling that trusts submission.json.
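One way to prevent this class of drift is to derive the metadata from the artifacts rather than hand-enter it. A minimal sketch (the helper names and the disk-measurement approach are assumptions, not the PR's code):

```python
import os


def bytes_total_from_sizes(model_bytes, code_bytes):
    """bytes_total as the run log implies it: model artifact plus code."""
    return model_bytes + code_bytes


def bytes_total_from_disk(model_path, code_paths):
    """Same, but measured from disk so submission.json cannot drift
    from the actual artifacts."""
    return bytes_total_from_sizes(
        os.path.getsize(model_path),
        sum(os.path.getsize(p) for p in code_paths),
    )
```

Plugging in the logged sizes reproduces both the correct total and the overstatement the review flags.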
- Baseline + EMA weight averaging (26 lines added). EMA smooths weight distributions for reduced INT8 quantization loss. Built with PROTEUS by LightSpeedUp — lightspeedup.com. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- P1: Guard EMA export with _ema_updated flag, preventing trained weights from being overwritten with initial weights if training ends before the first EMA update step. P2: Fix bytes_total in submission.json to match the actual artifact (15813783 model + 49825 code = 15863608).
- Stack of four published techniques: EMA + seq2048 + FP16 embedding passthrough + sliding-window eval (stride=64). Beats current leader (1.1925) by 0.0036 BPB.
- Post-run audit caught the artifact at 16,150,005 bytes (over the 16MB cap by ~150KB); FP16 embedding passthrough pushed it over. Fix: shrink MLP_HIDDEN to 992 to make room. Updated score to follow.
- Previous run exceeded the 16MB cap (FP16 embedding + full MLP = 16.15MB). Fixed by shrinking MLP hidden from 1024 to 992. Artifact now 15,878,735 bytes (99.2% of cap). Score: 1.18956858 BPB — still #1.
- Document INT4 failure (cosine similarity drops to 0.90 at 18 layers), LoopFormer depth recurrence loss, and the EMA overhead tradeoff. References issue openai#140 for next techniques to implement.
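The sliding-window eval in the technique stack can be sketched as a window schedule. This is a sketch of the published technique under the usual convention (each window re-scores only the tokens no earlier window covered), not the PR's exact code:

```python
def sliding_windows(n_tokens, seq_len=2048, stride=64):
    """Return (begin, end, n_scored) windows for sliding-window evaluation.

    Each window sees up to seq_len tokens of context but only contributes
    the loss of its last n_scored tokens (those no earlier window scored),
    so every token is scored exactly once with near-full left context.
    """
    windows = []
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + seq_len, n_tokens)
        windows.append((begin, end, end - prev_end))
        prev_end = end
        if end == n_tokens:
            break
    return windows
```

With stride=64 every token after the first window carries nearly 2048 tokens of context, at the cost of roughly seq_len/stride = 32x the forward passes of a non-overlapping eval.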
Force-pushed from 80ea2bd to ef1863d
Acknowledged — this doesn't beat SOTA; reframing as a non-record submission. The value here is the documented negative results.
We have a v4 in progress that addresses the size/score issues; it will be submitted separately when ready. Disclosure: I use Claude Code CLI, Codex CLI, and Gemini Pro as tools in my workflow. Human first, AI-assisted.
Mean val_bpb: 1.1836 (std 0.0005) on 8×H100 SXM. Seeds: 42 (1.1836), 1337 (1.1841), 2024 (1.1831). Includes documented negative results: INT4 failure, depth recurrence boundary.
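The reported 3-seed statistics check out with the standard library (`stdev` here is the sample standard deviation over the three seeds):

```python
from statistics import mean, stdev

# per-seed val_bpb as reported in the commit message
seed_bpb = {42: 1.1836, 1337: 1.1841, 2024: 1.1831}

vals = list(seed_bpb.values())
assert round(mean(vals), 4) == 1.1836   # 3-seed mean
assert round(stdev(vals), 4) == 0.0005  # sample std
```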
3-Seed Results Now Available
Updated with proper
This is a notable non-record submission. The primary value is the documented negative results.
A records folder with full logs, submission.json, and train_gpt.py was added in the latest commit. Disclosure: I use Claude Code CLI, Codex CLI, and Gemini Pro as tools in my workflow. Human first, AI-assisted.
Summary
3-Seed Results
Notable Non-Record — Documented Negative Results
This submission does not beat SOTA. Its value is the documented negative results:
- INT4 post-training quantization fails catastrophically: roundtrip BPB goes from 1.44 to 3.73. Per-row, per-group (gs=64), and QAT with STE all fail. Root cause: quantization error compounds through layers (cosine similarity drops to 0.90 at 18 layers).
- Shared-weight depth recurrence (LoopFormer) loses to more tokens at this training budget: 1-pass (9 effective layers, 6.5B tokens) beats 2-pass (18 effective layers, 3.6B tokens) by 0.019 BPB.
- EMA reduces the quantization gap from 0.0072 to 0.0048 BPB by smoothing weight distributions, but the training-loss improvement is marginal.
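The per-layer mechanism behind the INT4 result can be sketched with a symmetric per-group roundtrip (gs=64, as in the failed variant). This is an illustrative reconstruction, not the PR's quantizer, and it shows only the single-pass error that the PR reports compounding across 18 layers:

```python
import math
import random


def quantize_roundtrip(xs, bits=4, group_size=64):
    """Symmetric per-group absmax quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1
    out = []
    for i in range(0, len(xs), group_size):
        group = xs[i:i + group_size]
        # per-group scale from the absolute maximum; fall back to 1.0 for all-zero groups
        scale = max(abs(v) for v in group) / qmax or 1.0
        out.extend(max(-qmax, min(qmax, round(v / scale))) * scale for v in group)
    return out


def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]
sim8 = cosine_sim(weights, quantize_roundtrip(weights, bits=8))
sim4 = cosine_sim(weights, quantize_roundtrip(weights, bits=4))
```

A single 4-bit pass already degrades cosine similarity noticeably relative to 8-bit; the PR's finding is that stacking many such layers drives end-to-end similarity down to 0.90 and BPB from 1.44 to 3.73.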
Details
Full logs, submission.json, and train_gpt.py in
/records/track_10min_16mb/2026-03-25_PROTEUS_EMA_Notable/

Built with PROTEUS by LightSpeedUp