SOTA Attempt: Paid prefix (val_bpb=1.0238) #168

Open
spokane-way wants to merge 4 commits into openai:main from spokane-way:paid-prefix

@spokane-way
Contributor

8.75MB paid-prefix blob

| Seed | Steps | val_bpb (int8+zlib) |
|------|--------|---------------------|
| 1337 | 16,493 | 1.02174288 |
| 1338 | 16,426 | 1.02468190 |
| 1339 | 16,353 | 1.02508439 |
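
The headline val_bpb in the title appears to be the mean of the three per-seed results. The short check below only reproduces that arithmetic from the numbers listed above; the variable names are illustrative and not taken from the PR.

```python
from statistics import mean

# Per-seed val_bpb (int8+zlib), copied from the results listed above.
val_bpb_by_seed = {
    1337: 1.02174288,
    1338: 1.02468190,
    1339: 1.02508439,
}

values = list(val_bpb_by_seed.values())
avg = mean(values)                    # arithmetic mean across seeds
spread = max(values) - min(values)    # gap between best and worst seed

print(f"mean val_bpb over {len(values)} seeds: {avg:.4f}")
print(f"max - min across seeds: {spread:.4f}")
```

The mean rounds to 1.0238, which matches the figure in the PR title.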

@spokane-way spokane-way changed the title SOTA Attempt: Paid prefix (val_brb=1.0238) SOTA Attempt: Paid prefix (val_bpb=1.0238) Mar 20, 2026
@cocohearts cocohearts closed this Mar 20, 2026
@0hq 0hq reopened this Mar 20, 2026
@0hq
Collaborator

0hq commented Mar 20, 2026

After some more discussion, mind moving this to non-record submissions? Would accept there as an example.

Some context: This isn't really a valid submission. In a text-compression benchmark, this would make some sense, since your model should always be a better compressor than just storing the text itself or compressing it with a standard compression algorithm. But this benchmark doesn't measure compression of the training data; it evaluates your compressor on an unseen IID validation dataset, so this submission is equivalent to 'training on the validation set' directly with your prefix.
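
A toy sketch of the point above, under a loud assumption: that a stored prefix lets some fraction of the validation bytes be predicted at zero cost. The function `blended_bpb` and all numbers are hypothetical and are not taken from this PR's code; the sketch only shows how a memorized prefix lowers the reported score without the model itself getting better.

```python
def blended_bpb(model_bpb: float, covered_bytes: int, total_bytes: int) -> float:
    """Bits per byte over an evaluation stream of `total_bytes` bytes.

    Assumption (hypothetical): `covered_bytes` of the stream are stored
    verbatim in the artifact and cost 0 bits to predict; the remaining
    bytes are scored at the model's own rate `model_bpb`.
    """
    if not 0 <= covered_bytes <= total_bytes or total_bytes == 0:
        raise ValueError("need 0 <= covered_bytes <= total_bytes and total_bytes > 0")
    uncovered = total_bytes - covered_bytes
    return model_bpb * uncovered / total_bytes


# Hypothetical numbers, chosen only for illustration.
MODEL_BPB = 1.20   # what the model achieves on text it has never seen
TOTAL = 100        # size of the validation stream, arbitrary units
COVERED = 15       # portion of that stream memorized in the prefix

on_validation = blended_bpb(MODEL_BPB, COVERED, TOTAL)  # prefix matches the data
on_fresh_data = blended_bpb(MODEL_BPB, 0, TOTAL)        # prefix matches nothing

print(f"reported on the memorized validation set: {on_validation:.3f}")
print(f"on a fresh IID sample:                    {on_fresh_data:.3f}")
```

Under these assumptions the improvement exists only on the specific validation bytes that were stored, which is why the score says nothing about performance on unseen data.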

@ibarrajo

> After some more discussion, mind moving this to non-record submissions? Would be happy to accept there.

is that applicable to my PR as well? #275

@0hq
Collaborator

0hq commented Mar 21, 2026

@ibarrajo No, I'd only accept this PR, which implemented this idea first, as an example to explain why this type of approach isn't valid.

@ibarrajo

I see, thanks for the clarification


4 participants