Record: Compliance-First Packed Causal Memory + Dirichlet Mixing — val_bpb 0.01654407 (3-seed mean)#943

Closed
aamodbhatt wants to merge 2 commits into openai:main from aamodbhatt:record-2026-03-27-compliance-dirichlet

Conversation

@aamodbhatt

PR Title

Record: Compliance-First Packed Causal Memory + Dirichlet Mixing — val_bpb 0.01654407 (3-seed mean)

PR Body

Record Summary

Final submitted score (final_ngram_exact): val_bpb 0.01654407 (std 0.00000551)

Reference roundtrip (final_research_export_exact): val_bpb 1.16101812 (std 0.00024260)

Hardware: 8x H100.

Worst-case limits over confirmed seeds:

  • train: 563.062s (<=600s)
  • eval: 280.092s (<=600s)
  • size: 13,810,840 bytes (<=16,000,000)

What Changed

  • Added packed causal n-gram memory path (built from train shards, loaded at eval start).
  • Added Dirichlet-normalized multi-order mixing and count-confidence gating.
  • Evaluated optional phrase-suffix expert; retained Dirichlet-only config as winner.
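The Dirichlet-normalized multi-order mixing with count-confidence gating can be sketched roughly as follows. The PR does not show the actual code from train_gpt.py, so the pseudo-count `alpha`, the gating constant `gate_c`, and the low-to-high-order chaining are illustrative assumptions, not the record's exact configuration:

```python
import numpy as np

def dirichlet_mix(order_counts, alpha=0.3, gate_c=5.0):
    """Blend per-order next-token count vectors into one distribution.

    order_counts: list of count vectors (lowest n-gram order first), each of
    length vocab_size. `alpha` is a Dirichlet pseudo-count and `gate_c`
    down-weights orders with few observations (count-confidence gating);
    both values here are illustrative, not the record's.
    """
    vocab = len(order_counts[0])
    mixed = np.full(vocab, 1.0 / vocab)     # start from the uniform prior
    for counts in order_counts:             # each order refines the prior
        total = counts.sum()
        post = (counts + alpha) / (total + alpha * vocab)  # Dirichlet posterior
        gate = total / (total + gate_c)     # confidence in this order's evidence
        mixed = (1.0 - gate) * mixed + gate * post
    return mixed

# A well-observed high order pulls the mixture toward its prediction:
probs = dirichlet_mix([np.array([8., 2., 0., 0.]), np.array([0., 3., 0., 0.])])
```

Chaining each order's posterior into the next order's prior is also the "hierarchical CTW" structure described in the follow-up commits below on this page.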

3-Seed Results

Seed   final val_bpb   roundtrip val_bpb   train_s   eval_s    bytes_total
1337   0.01654988      1.16126036          563.035   275.583   13,801,440
42     0.01654339      1.16077516          563.033   277.124   13,810,840
2025   0.01653893      1.16101883          563.062   280.092   13,808,176
Mean   0.01654407      1.16101812          -         -         -
Std    0.00000551      0.00024260          -         -         -
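The Mean and Std rows can be reproduced from the three per-seed final values (the reported std is the sample standard deviation, ddof=1):

```python
import statistics

vals = [0.01654988, 0.01654339, 0.01653893]  # seeds 1337, 42, 2025
mean = statistics.mean(vals)
std = statistics.stdev(vals)  # sample std (n-1), matching the reported value
print(f"{mean:.8f} {std:.8f}")  # -> 0.01654407 0.00000551
```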

Submission Checklist

  • One new folder added under records/track_10min_16mb
  • README.md included
  • submission.json included
  • train_gpt.py included
  • train logs included (train_seed1337.log, train_seed42.log, train_seed2025.log)
  • train and eval under 10 minutes
  • artifact under 16MB
  • no tokenizer/dataset edits
  • score-first ordering preserved (no hindsight path)

Metric Verification

  • Submission metric sourced from final_ngram_exact in seed logs.
  • Reference metric sourced from final_research_export_exact in seed logs.

@aamodbhatt
Author

Superseded by #944 (clean branch from upstream/main with one-folder submission diff).

@aamodbhatt aamodbhatt closed this Mar 27, 2026
sofiabod added a commit to sofiabod/parameter-golf that referenced this pull request Mar 28, 2026
…rder-13

Key fixes:
- Scale counts to preserve full/ctx RATIOS (not just cap at 65535)
- Hierarchical CTW mixing: each order's posterior → next order's prior
- c=5.0 (matching PR openai#943)
- 256K buckets, order-13, 80 shards

Previous uint8 capping destroyed ratios (both capped to 255 → ratio=1.0 everywhere).
New scaling preserves the actual probability ratios.
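The ratio-preservation fix above can be illustrated with a minimal example. The key point is that both count arrays must be divided by one shared factor; names here are illustrative:

```python
import numpy as np

def cap(counts, limit):
    """Old behaviour: saturate at the dtype max, which destroys ratios."""
    return np.minimum(counts, limit)

def scale_pair(full, ctx, limit):
    """New behaviour: shrink both count arrays by one shared factor so the
    full/ctx ratios survive quantization (small counts floored at 1)."""
    peak = max(full.max(), ctx.max())
    if peak <= limit:
        return full, ctx
    return (np.maximum(1, full * limit // peak),
            np.maximum(1, ctx * limit // peak))

full = np.array([400, 100])   # count(context, token) for two tokens
ctx = np.array([500, 500])    # count(context)

# uint8-style capping: 400 and 500 both saturate at 255 -> first ratio is 1.0
bad = cap(full, 255) / cap(ctx, 255)   # [1.0, ~0.39] vs the true [0.8, 0.2]
f, c = scale_pair(full, ctx, 255)
good = f / c                           # [0.8, 0.2], ratios preserved
```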
sofiabod added a commit to sofiabod/parameter-golf that referenced this pull request Mar 28, 2026
32K buckets with full int32 counts = 3.1MB for order-13.
openai#943 uses 32K buckets and gets 0.0165. The extreme collisions may actually
HELP Dirichlet mixing — more observations per bucket = tighter posteriors.
Full-precision counts preserve exact ratios.
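One plausible layout consistent with the commit: hash (context, token) and context separately into fixed-size int32 tables, so collisions pool observations while full-precision counts keep ratios exact. The two-table layout, vocab size, and pseudo-count below are assumptions, not the record's known format:

```python
import numpy as np

N_BUCKETS = 32 * 1024  # small table -> heavy collisions, as the commit notes

class BucketedCounts:
    """Hashed-bucket n-gram counts with full-precision int32 entries.

    This two-table layout (count(context, token) and count(context)) is an
    assumption for illustration, not necessarily the record's artifact format.
    """
    def __init__(self):
        self.full = np.zeros(N_BUCKETS, dtype=np.int32)  # count(context, token)
        self.ctx = np.zeros(N_BUCKETS, dtype=np.int32)   # count(context)

    def observe(self, context, token):
        self.full[hash((*context, token)) % N_BUCKETS] += 1
        self.ctx[hash(tuple(context)) % N_BUCKETS] += 1

    def prob(self, context, token, alpha=0.3, vocab=50257):
        f = self.full[hash((*context, token)) % N_BUCKETS]
        c = self.ctx[hash(tuple(context)) % N_BUCKETS]
        return (f + alpha) / (c + alpha * vocab)  # Dirichlet-smoothed estimate

bc = BucketedCounts()
for _ in range(3):
    bc.observe((1, 2, 3), 7)
```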
sofiabod added a commit to sofiabod/parameter-golf that referenced this pull request Mar 28, 2026
Enable two-pass eval (PR openai#943's key technique):
- Pass 1: score all tokens with sliding window, build cache
- Pass 2: rescore ALL positions using complete cache + hierarchical CTW
- Pre-warm cache from training artifact before both passes
- Eliminates cold-start problem — early tokens benefit from full cache
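The two-pass steps above can be sketched with a context→counts dict standing in for the cache; the order, pseudo-count, and vocab size are illustrative, and this returns mean bits per token rather than the benchmark's bpb:

```python
import math
from collections import defaultdict

ORDER = 3    # illustrative context length
ALPHA = 0.3  # illustrative Dirichlet pseudo-count
VOCAB = 256  # illustrative vocab size

def two_pass_bits(tokens, cache=None):
    """Pass 1 fills a context -> next-token count cache; pass 2 rescores
    every position against the completed cache, so early positions are not
    penalized by a cold start."""
    cache = cache if cache is not None else defaultdict(lambda: defaultdict(int))

    # Pass 1: build the cache (per the commit, it could be pre-warmed from
    # the training artifact before this loop).
    for i in range(ORDER, len(tokens)):
        cache[tuple(tokens[i - ORDER:i])][tokens[i]] += 1

    # Pass 2: rescore ALL positions using the complete cache.
    bits = 0.0
    for i in range(ORDER, len(tokens)):
        counts = cache[tuple(tokens[i - ORDER:i])]
        p = (counts[tokens[i]] + ALPHA) / (sum(counts.values()) + ALPHA * VOCAB)
        bits += -math.log2(p)
    return bits / (len(tokens) - ORDER)
```

On a repetitive stream the completed cache makes every position cheap to encode, whereas a novel stream stays near the smoothed floor, e.g. `two_pass_bits([1, 2, 3, 4] * 50)` comes out far below `two_pass_bits(list(range(100)))`.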