Draft: score harness and pilot hooks#24

Closed
elopez3 wants to merge 1 commit into openai:main from elopez3:main

elopez3 commented Mar 18, 2026

This is a draft work-in-progress PR for Parameter Golf. It is not a leaderboard submission yet.

What is in this PR:

  • score logging for pre-compression and post-compression evaluation
  • an eval profiler for sequence-length and batch-size sweeps
  • export candidate selection based on final scored output
  • pilot hooks for late-stage quantization-aware fine-tuning
  • pilot hooks for one-family control-tensor test-time adaptation
  • a shared-depth recurrence pilot
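
As a rough illustration of the score-logging and export-selection items above (not the code in this PR), the sketch below logs both scores for each candidate and picks the one with the best post-compression score. The `Candidate` record, its field names, the function names, and the numeric values are all hypothetical, and it assumes a lower-is-better metric such as bits per byte.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical record; field names are illustrative, not from the PR.
    name: str
    pre_bpb: float   # score before compression (lower is better)
    post_bpb: float  # score after compression (lower is better)


def select_export(candidates):
    """Return the candidate with the lowest post-compression score."""
    if not candidates:
        raise ValueError("no candidates to select from")
    return min(candidates, key=lambda c: c.post_bpb)


def format_score_line(c):
    """Build one log line showing both scores and the compression gap."""
    gap = c.post_bpb - c.pre_bpb
    return (
        f"{c.name} pre_bpb={c.pre_bpb:.4f} "
        f"post_bpb={c.post_bpb:.4f} gap={gap:+.4f}"
    )


# Made-up values, chosen so the two rankings disagree.
candidates = [
    Candidate("step_150", pre_bpb=1.30, post_bpb=1.36),
    Candidate("step_200", pre_bpb=1.28, post_bpb=1.37),
]

for c in candidates:
    print(format_score_line(c))

best = select_export(candidates)
print("export:", best.name)
```

In this made-up data the candidate with the better pre-compression score is not the one selected, which is the situation that selecting on the final scored output is meant to handle.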

What I have already done:

  • set up the local workflow
  • downloaded the tokenizer and a smoke-test dataset shard
  • ran a 200-step MLX smoke run on Apple Silicon
  • verified the updated scripts compile locally

What this PR does not claim:

  • no remote CUDA score yet
  • no claim that the approach beats baseline
  • no claim that this is ready for leaderboard submission

Next step: run short remote experiments on the scored CUDA path once compute support is available.

Add score logging, export selection, and evaluation tooling so
short experiments can be compared on the scored path.

Wire in pilot hooks for late QAT, control-tensor adaptation,
and shared-depth recurrence to support draft challenge work.

0hq closed this Mar 19, 2026

mrdavtan added a commit to mrdavtan/parameter-golf that referenced this pull request Mar 23, 2026
Local ablation showed h1792 gives -0.056 BPB over h1536 at similar step cost.
Fixed unset blocks that were killing MLP_HIDDEN, BIGRAM_HASH_BUCKETS, and
TRAIN_BATCH_TOKENS immediately after setting them (same bug class as Finding openai#24).
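
The bug class described in this commit message can be illustrated with a minimal sketch. It assumes the three names are environment-style overrides read with a fallback default. The `read_int` helper, the plain dict standing in for the environment, and the 1536 default are hypothetical and not taken from the referenced repository.

```python
def read_int(env, name, default):
    """Read an integer override from env, falling back to default."""
    return int(env.get(name, default))


# Buggy order: the override is set, then removed before anything reads it.
env = {}
env["MLP_HIDDEN"] = "1792"
env.pop("MLP_HIDDEN", None)  # stray "unset" right after setting
buggy = read_int(env, "MLP_HIDDEN", 1536)  # silently uses the default

# Fixed order: the stray removal is gone, so the override is seen.
env = {}
env["MLP_HIDDEN"] = "1792"
fixed = read_int(env, "MLP_HIDDEN", 1536)

print(buggy, fixed)
```

Because the fallback hides the missing override, a run like the buggy one still completes; it just trains with the default value instead of the intended one.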