
Non-record: Turbo-Muon + EngramLite(10240) + VE(8,9,10) — val_bpb 1.1431 #1205

Open

SergheiBrinza wants to merge 2 commits into openai:main from SergheiBrinza:submission/2026-04-01_TurboMuon_EngramLite_Improved

Conversation


@SergheiBrinza commented Apr 1, 2026

Summary

Non-record submission based on the PR #1089 Turbo-Muon + EngramLite stack with hyperparameter tuning.

val_bpb: 1.1431 (3-seed mean, std 0.0007)

| Seed | val_bpb (sliding) |
|------|-------------------|
| 1337 | 1.1425 |
| 42   | 1.1438 |
| 2024 | 1.1431 |
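As a sanity check, the reported 3-seed mean and standard deviation can be recomputed from the per-seed values (assuming the std quoted is the sample standard deviation, which matches the reported 0.0007):

```python
from statistics import mean, stdev

# Per-seed sliding-window val_bpb from the table above
val_bpb = {1337: 1.1425, 42: 1.1438, 2024: 1.1431}

scores = list(val_bpb.values())
print(f"mean = {mean(scores):.4f}")  # 1.1431
print(f"std  = {stdev(scores):.4f}")  # 0.0007
```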

Changes from PR #1089

  • Higher LR (0.030 vs 0.025) for faster convergence
  • Wider EngramLite (10240x48 vs 8192x32) for more n-gram coverage
  • VE on layers 8,9,10 (vs 9,10) for additional token identity injection
  • Warmdown 4500 (vs 3500) for smoother weight averaging
  • Muon momentum warmup 1000 steps (vs 1500)
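For a compact view of the deltas, the changes above can be sketched as a config diff. The key names below are hypothetical placeholders for illustration, not the actual schema of the training script:

```python
# Hypothetical config keys -- illustrative only, not the repo's real schema.
baseline = {  # PR #1089
    "lr": 0.025,
    "engram_lite": (8192, 32),   # read here as (width, dim) from "8192x32"
    "ve_layers": (9, 10),
    "warmdown_steps": 3500,
    "muon_momentum_warmup": 1500,
}
this_pr = {
    **baseline,
    "lr": 0.030,
    "engram_lite": (10240, 48),
    "ve_layers": (8, 9, 10),
    "warmdown_steps": 4500,
    "muon_momentum_warmup": 1000,
}

# Every key listed in the bullet points above should differ.
changed = {k: (baseline[k], this_pr[k]) for k in baseline if baseline[k] != this_pr[k]}
print(changed)
```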

Key Finding

The increased model size (~31.6M vs 30.7M params) pushed the artifact to 16.36MB pre-compression, forcing all 66 weight groups into int5 with 0 promotions to int6/int7 and 20.5% selective pruning. This aggressive quantization likely offset the architectural gains. The 16MB budget is extremely tight — even small parameter increases can cascade into significant quality loss through the quantization pipeline.
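A back-of-envelope estimate (payload bits only, ignoring per-group scales, headers, and any entropy coding, so it will not match the 16.36MB figure exactly) illustrates why int5 alone was not enough and pruning had to kick in:

```python
PARAMS = 31.6e6   # approximate parameter count of this submission
BUDGET_MB = 16.0  # artifact size budget

def artifact_mb(params: float, bits_per_weight: int, prune_frac: float = 0.0) -> float:
    """Rough payload size in MB: surviving weights times bits per weight."""
    return params * (1.0 - prune_frac) * bits_per_weight / 8 / 1e6

print(f"{artifact_mb(PARAMS, 5):.2f} MB")         # ~19.75 MB: int5 alone overshoots
print(f"{artifact_mb(PARAMS, 5, 0.205):.2f} MB")  # ~15.70 MB: int5 + 20.5% pruning fits
```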

Hardware

8xH100 80GB SXM, 600s training, ~5550 steps at 106ms/step.

… 1.1431

Based on PR openai#1089 stack with hyperparameter tuning:
- Higher LR (0.030 vs 0.025) for faster convergence
- Wider EngramLite (10240x48 vs 8192x32)
- VE on layers 8,9,10 (vs 9,10)
- Warmdown 4500 (vs 3500)
- Muon momentum warmup 1000 steps (vs 1500)

3-seed mean: 1.1431 (std 0.0007)
Seeds: 1337=1.1425, 42=1.1438, 2024=1.1431
@SergheiBrinza force-pushed the submission/2026-04-01_TurboMuon_EngramLite_Improved branch from 2d2f0d7 to 974948e on April 1, 2026 01:21