
docs: increase MLX smoke validation batch size #36

Closed
brendanboyle87 wants to merge 1 commit into openai:main from brendanboyle87:mlx-val-batch-size

Conversation

@brendanboyle87

Summary

  • update the README MLX smoke command to use VAL_BATCH_SIZE=524288
  • keep the rest of the local trial-run example unchanged

Why

The default validation batch size in the README trial run takes a very long time locally on an M4 Max Mac Studio with 128GB, so this raises the documented MLX smoke-test value to a more practical local setting.

South-33 added a commit to South-33/parameter-golf that referenced this pull request Mar 19, 2026
- add a PR-audit research log entry covering the clean takeaways from pull requests openai#36 through openai#70
- promote long-context training plus matching long-context eval as a first-class clean branch based on PR openai#61 and PR openai#63
- refine mixed-precision export notes to emphasize using int6/int8 byte savings to fund wider MLP capacity, based on PR openai#65
- update the current snapshot and research thesis so future agents do not over-focus on exporter-only ideas after the broader PR sweep
@0hq 0hq added the enhancement New feature or request label Mar 19, 2026
@cocohearts
Collaborator

??? this is increasing val batch size??

@cocohearts cocohearts closed this Mar 20, 2026
@brendanboyle87 brendanboyle87 deleted the mlx-val-batch-size branch March 20, 2026 19:09
@brendanboyle87
Author

brendanboyle87 commented Mar 20, 2026

??? this is increasing val batch size??

Sorry if I was off base here

This was based on the fact that this script is for local MLX dev. There was no intermediate output, so I was trying to figure out how long validation would take; Codex gave an estimate in hours vs. minutes:

“On this machine, a full validation with the old VAL_BATCH_SIZE=8192 is roughly a 5 to 6+ hour job. With VAL_BATCH_SIZE=524288, it is about 5 minutes.

The reason is in train_gpt_mlx.py:766: validation uses VAL_BATCH_SIZE // GRAD_ACCUM_STEPS. With GRAD_ACCUM_STEPS=8 and TRAIN_SEQ_LEN=1024, 8192 means only 1024 eval tokens per batch, which is exactly 1 sequence. 524288 means 65536 eval tokens, or 64 sequences per batch. On the local validation split here, that works out to 60,568 eval batches vs 947 eval batches.”
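The arithmetic in that quote can be sketched as a small standalone script. This is only an illustration of the sizing logic described above, using the quoted setting names; it is not the actual code from train_gpt_mlx.py.

```python
# Hypothetical sketch of the validation batch sizing described in the quote;
# constants mirror the quoted settings, not the real train_gpt_mlx.py values.
GRAD_ACCUM_STEPS = 8
TRAIN_SEQ_LEN = 1024

def eval_sequences_per_batch(val_batch_size: int) -> int:
    """Sequences evaluated per validation batch.

    Validation reportedly uses VAL_BATCH_SIZE // GRAD_ACCUM_STEPS tokens
    per batch, which at TRAIN_SEQ_LEN tokens per sequence gives this count.
    """
    eval_tokens = val_batch_size // GRAD_ACCUM_STEPS
    return eval_tokens // TRAIN_SEQ_LEN

print(eval_sequences_per_batch(8192))    # old README value: 1 sequence per batch
print(eval_sequences_per_batch(524288))  # proposed value: 64 sequences per batch
```

With only one sequence per batch, the fixed per-batch overhead dominates, which is consistent with the hours-vs-minutes estimate quoted above.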


Labels

enhancement New feature or request

Projects

None yet

Development


3 participants