Add experiment workflow and sweep helpers by GLDRoger · Pull Request #13 · openai/parameter-golf

GLDRoger · 2026-03-18T20:00:44Z

Summary

- add a reproducible local-to-CUDA experiment workflow runbook
- add a wave-1 sweep launcher that isolates each run into its own working directory
- add a log parser that extracts stable metrics into a JSONL-friendly ledger format
- document the trainer log source-dump quirk in AGENTS.md so future tooling does not mis-parse template strings
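
The run-isolation idea can be sketched as follows. This is a minimal illustration, not the contents of scripts/run_wave1_screen.sh, which is a bash script and is not shown here. The function name `prepare_run`, its parameters, the `LR` variable, and the choice to skip all filesystem writes in dry-run mode are assumptions made for the example.

```python
import shlex
from pathlib import Path


def prepare_run(root, run_id, command, env_overrides, dry_run=False):
    """Create an isolated run directory and record how the run would be launched.

    Hypothetical helper: only the file names command.sh and env.txt come
    from the PR description; everything else is illustrative.
    """
    run_dir = Path(root) / run_id
    script = "#!/usr/bin/env bash\nset -euo pipefail\n" + shlex.join(command) + "\n"
    env_text = "".join(f"{key}={value}\n" for key, value in sorted(env_overrides.items()))
    if dry_run:
        # Inspection only: report the plan without touching the filesystem.
        print(f"[dry-run] {run_dir}: {shlex.join(command)}")
        return run_dir
    # exist_ok=False refuses to reuse a directory, so runs cannot overwrite each other.
    run_dir.mkdir(parents=True, exist_ok=False)
    (run_dir / "command.sh").write_text(script)
    (run_dir / "env.txt").write_text(env_text)
    return run_dir


if __name__ == "__main__":
    # "LR" is a made-up sweep variable used only to show the call shape.
    for lr in ("0.01", "0.02"):
        prepare_run("runs", f"lr_{lr}", ["python3", "train_gpt.py"], {"LR": lr}, dry_run=True)
```

Failing on an existing directory rather than reusing it keeps each ledger entry tied to exactly one launch.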

Why

This PR is meant to support a disciplined experiment pipeline for new participants who want to go from local Apple Silicon smoke tests to leaderboard-oriented CUDA sweeps without changing the baseline trainer.

What changed

docs/experiment_workflow.md
- local Apple Silicon setup notes
- verified MLX smoke result
- wave-1 CUDA sweep plan
- run isolation and ledger fields
scripts/run_wave1_screen.sh
- generates per-run directories under runs/
- records command.sh and env.txt
- supports --dry-run for sweep inspection
scripts/extract_run_metrics.py
- parses stable metric anchors from trainer logs
- emits one JSON object per run
- ignores the trainer's source-dump prefix by anchoring on runtime metric lines
README.md
- links the new experiment workflow from the Apple Silicon getting-started section
.gitignore
- ignores runs/
AGENTS.md
- records the log parsing surprise for future agents
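
A rough sketch of the anchoring approach described for scripts/extract_run_metrics.py, under stated assumptions. The real script is not shown here. The line shapes matched below are guessed from the metric names quoted in this PR's follow-up comment, and the output key names are hypothetical. The key point is that a value must be a literal number, so format templates in the dumped trainer source (for example `val_bpb: {val_bpb:.8f}`) do not match.

```python
import json
import re
import sys

# Assumed runtime line shape: "final_<tag> ... val_bpb: <number>".
# The lookbehind rejects "{val_bpb:8.4f}"-style format fields, and the
# digit requirement rejects "val_bpb: {val_bpb:.8f}" templates.
VAL_BPB = re.compile(
    r"^(?P<tag>final_\w+)\b.*?(?<![\w{])val_bpb:\s*(?P<value>\d+(?:\.\d+)?)"
)
SIZE = re.compile(r"^Total submission size quant\+zlib:\s*(?P<bytes>\d+)")


def extract_metrics(lines):
    """Return one flat dict of metrics; later occurrences overwrite earlier ones."""
    record = {}
    for raw in lines:
        line = raw.strip()
        match = VAL_BPB.match(line)
        if match:
            record[f"{match['tag']}_val_bpb"] = float(match["value"])
            continue
        match = SIZE.match(line)
        if match:
            record["submission_bytes_quant_zlib"] = int(match["bytes"])
    return record


if __name__ == "__main__":
    # One JSON object per log file, suitable for appending to a JSONL ledger.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            print(json.dumps({"log": path, **extract_metrics(handle)}, sort_keys=True))
```

Matching only at the start of a stripped line also skips source lines where the metric label sits inside a logging call.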

Verification

bash -n scripts/run_wave1_screen.sh
python3 -m py_compile scripts/extract_run_metrics.py
bash scripts/run_wave1_screen.sh --dry-run
python3 scripts/extract_run_metrics.py logs/mlx_smoke.txt
python3 scripts/extract_run_metrics.py records/track_10min_16mb/2026-03-17_NaiveBaseline/train.log

Notes

This PR intentionally does not change train_gpt.py or train_gpt_mlx.py.
The sweep defaults target 1xH100 breadth screening first; 8xH100 confirmation is still the next step for real leaderboard attempts.

GLDRoger · 2026-03-20T05:41:14Z

A fresh records-only submission PR is being opened for the latest valid result, because the repo rules require submission PRs to add only a new records/... folder. Latest validated run: final_sliding_window_exact val_bpb: 1.17334285, final_quant_zlib_roundtrip_exact val_bpb: 1.20752367, Total submission size quant+zlib: 15859700.

GLDRoger added 2 commits March 19, 2026 01:30

docs: add experiment workflow and sweep helpers

a68d6b9

fix: harden sweep run isolation and ledger parsing

a070987

0hq added the not ready for review label Mar 19, 2026

0hq closed this Mar 19, 2026

mrdavtan mentioned this pull request Mar 22, 2026

Non-record: Negative findings on codebook quantization, magnitude pruning, multi-token prediction, embedding factorization #212

Closed

GLDRoger wants to merge 2 commits into openai:main from GLDRoger:docs/experiment-workflow-and-sweep-tools
