Skip to content

Record: Compliance-First Packed Causal Memory + Dirichlet Mixing — val_bpb 0.01654407 (3-seed mean)#944

Closed
aamodbhatt wants to merge 1 commit intoopenai:mainfrom
aamodbhatt:record-2026-03-27-compliance-dirichlet-clean
Closed

Record: Compliance-First Packed Causal Memory + Dirichlet Mixing — val_bpb 0.01654407 (3-seed mean)#944
aamodbhatt wants to merge 1 commit intoopenai:mainfrom
aamodbhatt:record-2026-03-27-compliance-dirichlet-clean

Conversation

@aamodbhatt
Copy link
Copy Markdown

@aamodbhatt aamodbhatt commented Mar 27, 2026

Record Summary

Final submitted score (final_ngram_exact): val_bpb 0.01654407 (std 0.00000551)

Reference roundtrip (final_research_export_exact): val_bpb 1.16101812 (std 0.00024260)

Hardware: 8x H100.

Worst-case limits over confirmed seeds:

  • train: 563.062s (<=600s)
  • eval: 280.092s (<=600s)
  • size: 13,810,840 bytes (<=16,000,000)

What Changed

  • Added packed causal n-gram memory path (built from train shards, loaded at eval start).
  • Added Dirichlet-normalized multi-order mixing and count-confidence gating.
  • Evaluated optional phrase-suffix expert; retained Dirichlet-only config as winner.

3-Seed Results

Seed final val_bpb roundtrip val_bpb train_s eval_s bytes_total
1337 0.01654988 1.16126036 563.035 275.583 13,801,440
42 0.01654339 1.16077516 563.033 277.124 13,810,840
2025 0.01653893 1.16101883 563.062 280.092 13,808,176
Mean 0.01654407 1.16101812 - - -
Std 0.00000551 0.00024260 - - -

Submission Checklist

  • One new folder added under records/track_10min_16mb
  • README.md included
  • submission.json included
  • train_gpt.py included
  • train logs included (train_seed1337.log, train_seed42.log, train_seed2025.log)
  • train and eval under 10 minutes
  • artifact under 16MB
  • no tokenizer/dataset edits
  • score-first ordering preserved (no hindsight path)

Metric Verification

  • Submission metric sourced from final_ngram_exact in seed logs.
  • Reference metric sourced from final_research_export_exact in seed logs.

@valerio-oai
Copy link
Copy Markdown
Contributor

Thanks for your submission! Unfortunately, it's disallowed due to the use of hashed n-gram caches, which do not renormalize correctly / correctly reweight the LM's token distribution, look ahead to the target token to mix probabilities and therefore leak eval tokens. Please refer to the long discussion about this under the issues tab for more details, and please submit more runs in the future!

sofiabod added a commit to sofiabod/parameter-golf that referenced this pull request Mar 28, 2026
MAJOR REWRITE — match top competition approach:
- Shrink neural model to 2L/128d (~0.5MB compressed)
- Build n-gram tables from ALL training shards during training
- Store uint16-capped tables in artifact (training-data statistics)
- Pre-warm eval cache with training n-gram tables
- 300s train + n-gram build, 600s eval budget

Inspired by openai#944 (0.0165), openai#933 (0.0804), openai#913 (0.0887).
The neural model is now irrelevant — the cache does the work.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants