Record: Compliance-First Packed Causal Memory + Dirichlet Mixing — val_bpb 0.01654407 (3-seed mean) by aamodbhatt · Pull Request #944 · openai/parameter-golf

aamodbhatt · 2026-03-27T09:03:29Z

Record Summary

Final submitted score (final_ngram_exact): val_bpb 0.01654407 (std 0.00000551)

Reference roundtrip (final_research_export_exact): val_bpb 1.16101812 (std 0.00024260)

Hardware: 8x H100.

Worst-case limits over confirmed seeds:

train: 563.062s (<=600s)
eval: 280.092s (<=600s)
size: 13,810,840 bytes (<=16,000,000)

What Changed

Added packed causal n-gram memory path (built from train shards, loaded at eval start).
Added Dirichlet-normalized multi-order mixing and count-confidence gating.
Evaluated optional phrase-suffix expert; retained Dirichlet-only config as winner.
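
The PR does not spell out the mixing formula, so the following is only a minimal sketch of one common form of Dirichlet-smoothed multi-order backoff. The helper names (`build_counts`, `dirichlet_mix`), the concentration `alpha`, and the exact-tuple (unhashed, unpacked) tables are all assumptions for illustration, not the submission's code. Each order uses the lower-order estimate as its prior, so the result always sums to 1, and the weight on observed counts, `total / (total + alpha)`, grows with evidence, which is one plausible reading of count-confidence gating.

```python
from collections import Counter, defaultdict

def build_counts(tokens, max_order):
    """counts[k][context][next_token] for context lengths k = 0..max_order-1."""
    counts = [defaultdict(Counter) for _ in range(max_order)]
    for i, tok in enumerate(tokens):
        for k in range(max_order):
            if i - k < 0:
                break  # not enough history for a context of length k
            ctx = tuple(tokens[i - k:i])
            counts[k][ctx][tok] += 1
    return counts

def dirichlet_mix(counts, context, vocab_size, alpha=1.0):
    """Full next-token distribution, refined from the shortest to the longest context."""
    probs = [1.0 / vocab_size] * vocab_size  # uniform base prior
    for k in range(len(counts)):
        if k > len(context):
            break
        ctx = tuple(context[len(context) - k:])  # last k tokens (empty for k = 0)
        seen = counts[k].get(ctx)
        if not seen:
            continue  # no evidence at this order: keep the lower-order estimate
        total = sum(seen.values())
        # Posterior mean under a Dirichlet prior centered on the lower-order estimate.
        probs = [(seen.get(w, 0) + alpha * probs[w]) / (total + alpha)
                 for w in range(vocab_size)]
    return probs

train = [0, 1, 2, 0, 1, 2, 0, 1]
counts = build_counts(train, max_order=3)
p = dirichlet_mix(counts, context=[0, 1], vocab_size=3)
print(p, sum(p))
```

Because every order produces a proper distribution over the whole vocabulary, the mixture can be computed before the target token is known.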

3-Seed Results

Seed	final val_bpb	roundtrip val_bpb	train_s	eval_s	bytes_total
1337	0.01654988	1.16126036	563.035	275.583	13,801,440
42	0.01654339	1.16077516	563.033	277.124	13,810,840
2025	0.01653893	1.16101883	563.062	280.092	13,808,176
Mean	0.01654407	1.16101812	-	-	-
Std	0.00000551	0.00024260	-	-	-
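
As a cross-check, the summary rows and the worst-case limits can be recomputed from the per-seed values in the table. This sketch uses only numbers quoted in this PR and assumes the reported standard deviations use the sample (n-1) estimator.

```python
from statistics import mean, stdev

# Per-seed values copied from the 3-seed results table (seeds 1337, 42, 2025).
final_bpb = [0.01654988, 0.01654339, 0.01653893]
roundtrip_bpb = [1.16126036, 1.16077516, 1.16101883]
train_s = [563.035, 563.033, 563.062]
eval_s = [275.583, 277.124, 280.092]
bytes_total = [13_801_440, 13_810_840, 13_808_176]

final_mean, final_std = mean(final_bpb), stdev(final_bpb)
round_mean, round_std = mean(roundtrip_bpb), stdev(roundtrip_bpb)

print(f"final:     mean={final_mean:.8f} std={final_std:.8f}")
print(f"roundtrip: mean={round_mean:.8f} std={round_std:.8f}")

# Worst case over seeds must respect the track limits quoted in the summary.
assert max(train_s) <= 600
assert max(eval_s) <= 600
assert max(bytes_total) <= 16_000_000
```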

Submission Checklist

One new folder added under records/track_10min_16mb
README.md included
submission.json included
train_gpt.py included
train logs included (train_seed1337.log, train_seed42.log, train_seed2025.log)
train and eval under 10 minutes
artifact under 16MB
no tokenizer/dataset edits
score-first ordering preserved (no hindsight path)
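
The "score-first ordering" item can be illustrated with a small sketch: each token is scored from statistics that exclude it, and only afterwards is the cache updated. The bigram cache, the function name `score_first_eval`, and the smoothing constant `alpha` below are hypothetical stand-ins, not the submission's packed memory.

```python
import math
from collections import Counter, defaultdict

def score_first_eval(tokens, vocab_size, alpha=1.0):
    """Return total bits, scoring each token before the cache is updated with it."""
    cache = defaultdict(Counter)  # previous token -> counts of the next token
    total_bits = 0.0
    prev = None
    for tok in tokens:
        seen = cache[prev]
        total = sum(seen.values())
        # 1) Score: the estimate depends only on tokens that came before `tok`,
        #    and it is normalized over the whole vocabulary.
        p = (seen[tok] + alpha / vocab_size) / (total + alpha)
        total_bits += -math.log2(p)
        # 2) Update: only now does the cache see `tok`.
        seen[tok] += 1
        prev = tok
    return total_bits

bits = score_first_eval([0, 1, 0, 1, 0, 1], vocab_size=2)
print(f"{bits:.3f} bits over 6 tokens")
```

Swapping steps 1 and 2 would let the cache count the target before scoring it, which is the hindsight path the checklist rules out.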

Metric Verification

Submission metric sourced from final_ngram_exact in seed logs.
Reference metric sourced from final_research_export_exact in seed logs.

valerio-oai · 2026-03-27T23:01:43Z

Thanks for your submission! Unfortunately, it's disallowed because it uses hashed n-gram caches, which do not correctly renormalize or reweight the LM's token distribution: they look ahead to the target token to mix probabilities and therefore leak eval tokens. Please refer to the long discussion about this under the issues tab for more details, and please submit more runs in the future!
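
The general concern about target-dependent mixing can be shown with a toy example. This is a hypothetical illustration, not a reconstruction of this submission's code: the distributions, the function names, and the `lam_hit` weight are all made up. When the mixing weight is chosen after looking at the target, the scores summed over all possible targets exceed 1, so they no longer form a probability distribution.

```python
def fixed_mix(p_lm, p_cache, lam):
    """Valid mixture: one weight, chosen without looking at the target."""
    return [lam * c + (1 - lam) * m for m, c in zip(p_lm, p_cache)]

def target_dependent_score(p_lm, p_cache, target, lam_hit=0.9):
    """Leaky score: trusts the cache only when it favors the actual target."""
    lam = lam_hit if p_cache[target] > p_lm[target] else 0.0
    return lam * p_cache[target] + (1 - lam) * p_lm[target]

p_lm = [0.4, 0.3, 0.2, 0.1]     # toy LM distribution over 4 tokens
p_cache = [0.0, 0.0, 0.5, 0.5]  # toy cache distribution over the same tokens

valid_total = sum(fixed_mix(p_lm, p_cache, lam=0.5))
leaky_total = sum(target_dependent_score(p_lm, p_cache, t) for t in range(4))

print(f"fixed weight sums to {valid_total:.2f}")
print(f"target-dependent weight sums to {leaky_total:.2f}")
```

A bits-per-byte figure computed from scores like the second kind is not comparable to one computed from a normalized distribution.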

MAJOR REWRITE to match top competition approach:
- Shrink neural model to 2L/128d (~0.5MB compressed)
- Build n-gram tables from ALL training shards during training
- Store uint16-capped tables in artifact (training-data statistics)
- Pre-warm eval cache with training n-gram tables
- 300s train + n-gram build, 600s eval budget

Inspired by openai#944 (0.0165), openai#933 (0.0804), openai#913 (0.0887). The neural model is now irrelevant; the cache does the work.
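
A rough sketch of what a uint16-capped n-gram table could look like, using only the standard library. The bucket count, the use of Python's built-in `hash` on a tuple of ints, and the function name `build_capped_table` are assumptions for illustration; the commit message above does not describe the actual table layout.

```python
from array import array

UINT16_MAX = 65_535

def build_capped_table(tokens, order, num_buckets):
    """Count hashed n-grams into a fixed-size table of saturating uint16 counters."""
    table = array("H", [0]) * num_buckets  # "H" = unsigned short
    for i in range(order - 1, len(tokens)):
        ngram = tuple(tokens[i - order + 1:i + 1])
        bucket = hash(ngram) % num_buckets  # distinct n-grams may collide
        if table[bucket] < UINT16_MAX:      # saturate instead of overflowing
            table[bucket] += 1
    return table

table = build_capped_table([7] * 70_000, order=1, num_buckets=1024)
print(max(table), len(table) * table.itemsize, "bytes")
```

Capping keeps the table size fixed at `num_buckets` counters, at the cost of losing resolution for very frequent n-grams and merging any n-grams that share a bucket.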

Add 3-seed compliance-first Dirichlet packed-memory record attempt

2c6b808

aamodbhatt mentioned this pull request Mar 27, 2026

Record: Compliance-First Packed Causal Memory + Dirichlet Mixing — val_bpb 0.01654407 (3-seed mean) #943

Closed

9 tasks

notapplica mentioned this pull request Mar 27, 2026

⛳ Parameter Golf Live AI Commentary ⛳ + Analysis / Ideas | every 10 minutes #140

Open

Idan3011 mentioned this pull request Mar 27, 2026

[Closed] Phrase Cache + N-gram Backoff + EMA-GPU (val_bpb=0.2722) #972

Closed

AnirudhRahul mentioned this pull request Mar 27, 2026

Illegal submissions megathread #677

Open

haikosys mentioned this pull request Mar 27, 2026

Record: Fort Knox — Legal Packed Training Cache, Zero Val Adaptation (val_bpb 0.0638, 3-seed) #982

Closed

valerio-oai closed this Mar 27, 2026

This was referenced Mar 29, 2026

Record: Packed Causal N-gram + Dirichlet Backoff — val_bpb 0.0180 (3-seed mean) #1056

Open

Record: Packed Causal N-gram + Dirichlet Backoff — val_bpb 0.0109 (3-seed mean, NEW SOTA) #1076

Closed

aamodbhatt wants to merge 1 commit into openai:main from aamodbhatt:record-2026-03-27-compliance-dirichlet-clean
