Record: SP4096 + Int6 QAT + NorMuon (val_bpb=1.2012) #200

Open
khasinski wants to merge 1 commit into openai:main from khasinski:sp4096-record

@khasinski

Summary

  • SP4096 tokenizer (26% better text compression than sp1024)
  • Int6 STE QAT + zstd-22 compression (14.3MB artifact)
  • NorMuon optimizer + tuned learning rates
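
For readers unfamiliar with int6 STE QAT, here is a minimal pure-Python sketch of the idea. It is not the code from this PR. The per-tensor symmetric scale, the [-32, 31] code range, and the names `fake_quant_int6` and `ste_backward` are assumptions made for illustration.

```python
def fake_quant_int6(weights):
    """Forward pass: quantize to signed 6-bit codes, then dequantize."""
    qmin, qmax = -32, 31  # signed 6-bit integer range
    max_abs = max(abs(w) for w in weights)
    # Assumption: one symmetric scale per tensor; the PR may use a finer granularity.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    dequant = [c * scale for c in codes]
    return codes, dequant, scale


def ste_backward(upstream_grad):
    """Backward pass: the straight-through estimator treats rounding as identity."""
    return list(upstream_grad)


weights = [0.31, -0.12, 0.05, -0.31]
codes, dequant, scale = fake_quant_int6(weights)
```

During training the model sees `dequant` in the forward pass while gradients flow to the full-precision weights unchanged, so the weights learn to tolerate the rounding applied when the artifact is exported.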

Result

val_bpb = 1.2012 on 8xH100 SXM, 600s training, 14.3MB artifact.

Beats baseline (1.2244) by 0.023 BPB.
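
The margin can be checked from the two reported numbers. The sketch below also shows the usual definition of bits per byte (summed cross-entropy in nats, converted to bits and divided by the byte count). This assumes val_bpb follows that definition, since the evaluation code is not shown here, and `bits_per_byte` is a hypothetical helper name.

```python
import math


def bits_per_byte(total_loss_nats, total_bytes):
    """Convert a summed cross-entropy (in nats) over a text into bits per byte."""
    return total_loss_nats / (math.log(2) * total_bytes)


baseline_bpb = 1.2244  # baseline reported above
record_bpb = 1.2012    # this submission, post-quantization

improvement = baseline_bpb - record_bpb
relative = improvement / baseline_bpb
print(f"{improvement:.4f} BPB absolute, {relative:.2%} relative")
```

Because the metric is normalized by bytes rather than tokens, scores from the sp1024 and sp4096 tokenizers can be compared directly.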

Key Metrics

| Metric | Value |
| --- | --- |
| val_bpb (post-quant) | 1.2012 |
| Artifact size | 14,342,773 bytes |
| Steps | 11,497 |
| Hardware | 8xH100 SXM 80GB |
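
As a rough derived figure, the step count and the 600s budget give the average step time. This sketch assumes the full 600 seconds were spent on the 11,497 training steps, with no allowance for warmup or evaluation.

```python
steps = 11_497          # steps reported above
budget_seconds = 600    # training budget reported above

steps_per_second = steps / budget_seconds
ms_per_step = 1000 * budget_seconds / steps
print(f"{steps_per_second:.2f} steps/s, {ms_per_step:.2f} ms/step")
```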

Test plan

  • Train sp4096 tokenizer and verify compression ratio
  • Local MLX smoke test
  • RTX 5090 10-min comparison (16 configs tested)
  • 8xH100 RunPod validation (600s)
  • Artifact under 16,000,000 bytes
  • train.log attached
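
The size item in the test plan can be scripted. This is a minimal sketch, not the check used for this PR: `artifact_headroom` is a hypothetical helper and the caller supplies the artifact path.

```python
import os

SIZE_LIMIT_BYTES = 16_000_000  # limit stated in the test plan


def artifact_headroom(path, limit=SIZE_LIMIT_BYTES):
    """Return the bytes remaining under the limit; raise if the file is not under it."""
    size = os.path.getsize(path)
    if size >= limit:
        raise ValueError(f"artifact is {size} bytes, limit is {limit}")
    return limit - size


# Headroom for the size reported in this PR, computed without touching disk.
reported_size = 14_342_773
reported_headroom = SIZE_LIMIT_BYTES - reported_size
```

For the reported artifact this leaves 1,657,227 bytes of headroom.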

SP4096 tokenizer (26% better compression) + int6 STE QAT + zstd-22 +
NorMuon + tuned LRs. 14.3MB artifact, 11,497 steps in 600s on 8xH100.
Beats baseline (1.2244) by 0.023 BPB.