Record: Scylla + Full GPTQ + XSA-all + FA3 — val_bpb 0.9485 (3-seed mean)#1184

Open
icryo wants to merge 1 commit into openai:main from icryo:submission/scylla-0.9485

icryo commented Mar 31, 2026

Summary

3-Seed Results

| Seed | Sliding BPB (s64) |
|---|---|
| 1337 | 0.9491 |
| 42 | 0.9476 |
| 2025 | 0.9489 |
| Mean ± Std | 0.9485 ± 0.0008 |
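
The summary row can be reproduced from the three per-seed values. The sketch below assumes the reported spread is the sample standard deviation (n - 1 in the denominator). The PR does not say which estimator was used, but the sample estimator rounds to the reported 0.0008, whereas the population estimator would round to 0.0007.

```python
import statistics

# Per-seed sliding-window BPB values copied from the results table.
seed_bpb = {1337: 0.9491, 42: 0.9476, 2025: 0.9489}
values = list(seed_bpb.values())

mean_bpb = statistics.mean(values)
# Sample standard deviation (divides by n - 1); an assumption, see the note above.
std_bpb = statistics.stdev(values)

print(f"{mean_bpb:.4f} ± {std_bpb:.4f}")  # prints "0.9485 ± 0.0008"
```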

Key Innovation

Scylla tokenizer (998 tokens, @simon-marcus PR #1143) + modern training stack:

  • Full Hessian GPTQ (Cholesky error compensation)
  • XSA on all 11 layers
  • Coprime-stride multi-shard loader (194 shards)
  • FlashAttention 3 (Hopper native)
  • No TTT needed (neutral on this stack)

PR #1143 paired this tokenizer with the old SOTA base. This submission applies the modern stack listed above to the same tokenizer, yielding a 12% lower (better) BPB.
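
The loader code is not shown in this PR, so the following is only a minimal sketch of what a coprime-stride shard order could look like. The idea is that stepping through N shards with a stride coprime to N visits every shard exactly once before repeating. The function name `coprime_stride_order` and the stride value 75 are hypothetical. Only the shard count (194) comes from the description.

```python
from math import gcd


def coprime_stride_order(num_shards: int, stride: int, start: int = 0) -> list[int]:
    """Return the shard visiting order for a fixed stride (hypothetical helper)."""
    if gcd(num_shards, stride) != 1:
        # A shared factor would make the walk cycle early and skip shards.
        raise ValueError("stride must be coprime to num_shards")
    return [(start + i * stride) % num_shards for i in range(num_shards)]


NUM_SHARDS = 194  # from the PR description
STRIDE = 75       # illustrative value only; gcd(194, 75) == 1

order = coprime_stride_order(NUM_SHARDS, STRIDE)
print(order[:4])  # [0, 75, 150, 31]
```

Because 194 = 2 × 97, any odd stride other than a multiple of 97 satisfies the coprimality check. Different ranks or epochs could use different `start` offsets, though whether the submission does so is not stated.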

Test plan

  • 3-seed verification on 8×H100 SXM (mean 0.9485, std 0.0008)
  • All artifacts under 16,000,000 bytes
  • All training under 600s
  • No TTT
  • Tokenizer byte accounting via validated metadata (candidate.meta.npz)
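
Byte accounting matters because BPB normalizes the summed token loss by the number of raw bytes, not by the number of tokens. A minimal sketch of that conversion follows. The per-token byte table here is a toy stand-in, since the actual layout of `candidate.meta.npz` is not shown in this PR, and `bits_per_byte` is a hypothetical helper name.

```python
import math


def bits_per_byte(token_nll_nats, token_ids, bytes_per_token):
    """Convert summed per-token NLL (in nats) to bits per byte.

    bytes_per_token maps token id -> number of raw bytes that token decodes to.
    """
    total_nats = sum(token_nll_nats)
    total_bytes = sum(bytes_per_token[t] for t in token_ids)
    return total_nats / (math.log(2) * total_bytes)


# Toy vocabulary (hypothetical): token id -> byte length.
toy_bytes = {0: 1, 1: 3, 2: 2}
toy_ids = [1, 2, 0]  # 3 + 2 + 1 = 6 bytes in total
# Losses chosen to total 6 bits, expressed in nats.
toy_nll = [2 * math.log(2), 1 * math.log(2), 3 * math.log(2)]

bpb = bits_per_byte(toy_nll, toy_ids, toy_bytes)
print(round(bpb, 6))  # 6 bits over 6 bytes gives 1.0
```

With a 998-token vocabulary, each token covers few bytes on average, so an error in the byte table would shift the reported BPB directly. This is presumably why the test plan calls out validated metadata.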

Credits

sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Mar 31, 2026
- logs/daily_research.md: append 2026-03-31 research section
  - PR openai#771 CLOSED (score-first TTT rule violation)
  - PR openai#727 CLOSED (n-gram illegal — no renormalization)
  - Merged SOTA: 1.1147 (PR openai#1019, 2026-03-25)
  - New PRs: openai#1184 (0.9485 Scylla tokenizer), openai#1185 (0.9641)
  - SLOT eval technique, Full GPTQ, QK-Gain 4.0 documented
- CLAUDE.md: update Competition Strategy + lessons 21-24
  - Merged SOTA updated to 1.1147
  - Current Best Path rewritten for 2026-03-31
  - Lessons openai#21-24: TTT fix, n-gram risk, Scylla, SLOT
  - TTT constraint clarified to score-first protocol
  - Version bumped to v9.0

https://claude.ai/code/session_015z6QKyKzDSYzTniW1GPhAe

icryo commented Apr 1, 2026

Byte accounting uses the identical candidate.meta.npz from PR #1143.
No eval-time adaptation. Standard F.cross_entropy + sliding window.
The only change is training the PR #1060 stack on Scylla-tokenized data.
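
For readers unfamiliar with sliding-window scoring, here is a minimal sketch of the window schedule, assuming "s64" in the results table means a stride of 64 tokens. Each window feeds up to `seq_len` tokens to the model, but only the tokens not yet scored contribute to the loss, so every token is scored exactly once. The function name, the `seq_len` value, and the exact scoring rule are assumptions and are not taken from the submission's code.

```python
def sliding_windows(num_tokens: int, seq_len: int, stride: int):
    """Yield (begin, end, score_from) triples (hypothetical helper).

    Feed tokens[begin:end] to the model and score only tokens[score_from:end].
    """
    score_from = 0
    while score_from < num_tokens:
        if score_from == 0:
            # The first window has no earlier context, so all of it is scored.
            end = min(seq_len, num_tokens)
        else:
            end = min(score_from + stride, num_tokens)
        begin = max(0, end - seq_len)
        yield begin, end, score_from
        score_from = end


# Illustrative sizes only: 300 tokens, context 128, stride 64.
for begin, end, score_from in sliding_windows(300, 128, 64):
    print(begin, end, score_from)
# 0 128 0
# 64 192 128
# 128 256 192
# 172 300 256
```

In a real evaluation the per-token loss over `tokens[score_from:end]` would come from something like `F.cross_entropy` on the window's logits, summed across windows and then normalized by total bytes.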
