Add experiment workflow and sweep helpers #13

Closed
GLDRoger wants to merge 2 commits into openai:main from GLDRoger:docs/experiment-workflow-and-sweep-tools

@GLDRoger

Summary

  • add a reproducible local-to-CUDA experiment workflow runbook
  • add a wave-1 sweep launcher that isolates each run into its own working directory
  • add a log parser that extracts stable metrics into a JSONL-friendly ledger format
  • document the trainer log source-dump quirk in AGENTS.md so future tooling does not mis-parse template strings
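
The source-dump quirk in the last bullet can be illustrated with a minimal sketch. It is not the code in scripts/extract_run_metrics.py. It assumes runtime metric lines look like the result strings quoted elsewhere in this PR (a tag, then `val_bpb:`, then a number). The names `METRIC_LINE`, `extract_metrics`, and `to_ledger_line` are hypothetical. Requiring a numeric value after the anchor means a dumped template string such as `{val_bpb:.8f}` never matches.

```python
import json
import re

# Assumed line shape, modeled on the result strings quoted in this PR,
# e.g. "final_quant_zlib_roundtrip_exact val_bpb: 1.20752367".
# The value must be a literal number, so f-string templates from the
# trainer's source dump (which contain "{...}" instead) are skipped.
METRIC_LINE = re.compile(
    r"^(?P<tag>[A-Za-z0-9_]+) val_bpb:\s*(?P<value>[0-9]+\.[0-9]+)\b"
)


def extract_metrics(log_text):
    """Return {"<tag>_val_bpb": float} for every runtime metric line."""
    metrics = {}
    for line in log_text.splitlines():
        match = METRIC_LINE.match(line.strip())
        if match:
            # If a tag repeats, the last occurrence in the log wins.
            metrics[match.group("tag") + "_val_bpb"] = float(match.group("value"))
    return metrics


def to_ledger_line(run_id, log_text):
    """Serialize one run as a single JSON object, suitable for a JSONL ledger."""
    record = {"run_id": run_id}
    record.update(extract_metrics(log_text))
    return json.dumps(record, sort_keys=True)
```

Appending the return value of `to_ledger_line` to a file, one line per run, gives a ledger that can be read back with `json.loads` line by line.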

Why

This PR supports a disciplined experiment pipeline for new participants who want to move from local Apple Silicon smoke tests to leaderboard-oriented CUDA sweeps without changing the baseline trainer.

What changed

  • docs/experiment_workflow.md
    • local Apple Silicon setup notes
    • verified MLX smoke result
    • wave-1 CUDA sweep plan
    • run isolation and ledger fields
  • scripts/run_wave1_screen.sh
    • generates per-run directories under runs/
    • records command.sh and env.txt
    • supports --dry-run for sweep inspection
  • scripts/extract_run_metrics.py
    • parses stable metric anchors from trainer logs
    • emits one JSON object per run
    • ignores the trainer's source-dump prefix by anchoring on runtime metric lines
  • README.md
    • links the new experiment workflow from the Apple Silicon getting-started section
  • .gitignore
    • ignores runs/
  • AGENTS.md
    • records the log parsing surprise for future agents
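
The run isolation described for scripts/run_wave1_screen.sh can be sketched in Python. The real launcher is a Bash script and its contents are not shown here, so this is only an illustration of the idea. The hyperparameter names (`LR`, `SEED`), the `run_NNN` directory naming, and the assumption that overrides are passed as environment variables are all hypothetical. The sketch records `command.sh` and `env.txt` per run and does not execute anything.

```python
import itertools
import shlex
import stat
from pathlib import Path


def plan_runs(grid):
    """Expand a dict of lists into one override dict per run (cartesian product)."""
    keys = sorted(grid)
    combos = itertools.product(*(grid[key] for key in keys))
    return [dict(zip(keys, values)) for values in combos]


def launch_plan(root, grid, base_cmd, dry_run=True):
    """Plan one isolated directory per run; write files only when dry_run is False."""
    planned = []
    for index, overrides in enumerate(plan_runs(grid)):
        run_dir = Path(root) / f"run_{index:03d}"  # hypothetical naming scheme
        env_lines = [f"{k}={shlex.quote(str(v))}" for k, v in sorted(overrides.items())]
        # Assumption: the trainer reads its overrides from environment variables.
        command = " ".join(env_lines + [shlex.quote(part) for part in base_cmd])
        planned.append((run_dir, env_lines, command))
        if dry_run:
            continue  # inspection only: nothing is written to disk
        run_dir.mkdir(parents=True, exist_ok=False)  # refuse to reuse a run directory
        (run_dir / "env.txt").write_text("\n".join(env_lines) + "\n")
        script = run_dir / "command.sh"
        script.write_text("#!/usr/bin/env bash\nset -euo pipefail\n" + command + "\n")
        script.chmod(script.stat().st_mode | stat.S_IXUSR)
    return planned
```

Calling `launch_plan("runs", {"SEED": [1, 2]}, ["python3", "train_gpt.py"])` returns the planned directories and commands without touching the filesystem, which mirrors the purpose of `--dry-run`. Using `exist_ok=False` is a design choice so that a repeated sweep cannot silently overwrite an earlier run's records.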

Verification

bash -n scripts/run_wave1_screen.sh
python3 -m py_compile scripts/extract_run_metrics.py
bash scripts/run_wave1_screen.sh --dry-run
python3 scripts/extract_run_metrics.py logs/mlx_smoke.txt
python3 scripts/extract_run_metrics.py records/track_10min_16mb/2026-03-17_NaiveBaseline/train.log

Notes

  • This PR intentionally does not change train_gpt.py or train_gpt_mlx.py.
  • The sweep defaults target 1xH100 breadth screening first; 8xH100 confirmation is still the next step for real leaderboard attempts.

Author

A fresh records-only submission PR is being opened for the latest valid result, because the repo rules require submission PRs to add only a new records/... folder. Latest validated run: final_sliding_window_exact val_bpb: 1.17334285, final_quant_zlib_roundtrip_exact val_bpb: 1.20752367, Total submission size quant+zlib: 15859700.
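
As a rough sketch of how the figures in this comment could be checked mechanically, the snippet below parses the quoted result string and computes the remaining size headroom. The 16,000,000-byte limit is an assumption based on a decimal reading of "16mb" in the track name and is not confirmed by this PR. The names `SIZE_LIMIT_BYTES` and `summarize` are hypothetical.

```python
import re

# Assumption: decimal reading of the "16mb" in the track name; not confirmed here.
SIZE_LIMIT_BYTES = 16_000_000

RESULT_FIELD = re.compile(r"(?P<tag>\w+) val_bpb: (?P<bpb>\d+\.\d+)")
SIZE_FIELD = re.compile(r"Total submission size quant\+zlib: (?P<size>\d+)")


def summarize(result_text, limit=SIZE_LIMIT_BYTES):
    """Collect val_bpb values by tag and report bytes left under the limit."""
    bpb = {m.group("tag"): float(m.group("bpb")) for m in RESULT_FIELD.finditer(result_text)}
    size_match = SIZE_FIELD.search(result_text)
    size = int(size_match.group("size")) if size_match else None
    headroom = None if size is None else limit - size
    return {"val_bpb": bpb, "size_bytes": size, "headroom_bytes": headroom}
```

Passing a different `limit` covers the case where the track's cap is defined in binary rather than decimal units.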
