
[non-submission] CPU fallback for the poors (non-CUDA non-MLX) #14

Closed

jordankzf wants to merge 1 commit into openai:main from jordankzf:jordankzf/cpu-fallback


Conversation

@jordankzf

Changes:

train_gpt.py

  • Device auto-detection: fall back to CPU instead of `raise RuntimeError("CUDA is required")`
  • Guards on torch.compile, fused Adam, flash SDP, DDP, torch.cuda.synchronize(), and nvidia-smi
  • autocast(device_type=device.type, ...) instead of a hardcoded "cuda"
  • (new) MAX_VAL_TOKENS env var to truncate the validation set for faster local iteration
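The pattern behind these changes can be sketched roughly like this (illustrative only, not the exact train_gpt.py code; names like `pick_device` are my own):

```python
# Sketch of device auto-detection with CUDA-only features guarded,
# instead of hard-failing when CUDA is absent.
import os
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()

# Guard CUDA-only features rather than assuming them: fused Adam
# requires CUDA tensors, so only enable it there.
use_fused_adam = device.type == "cuda"
opt = torch.optim.AdamW(
    [torch.nn.Parameter(torch.zeros(1, device=device))],
    lr=1e-3,
    fused=use_fused_adam,
)

# autocast keyed off device.type instead of a hardcoded "cuda";
# bfloat16 is a reasonable autocast dtype on CPU.
with torch.autocast(device_type=device.type, dtype=torch.bfloat16):
    x = torch.randn(4, 4, device=device)
    y = x @ x

# Synchronize only where it exists.
if device.type == "cuda":
    torch.cuda.synchronize()

# Optional cap on validation tokens for faster local iteration.
max_val_tokens = int(os.environ.get("MAX_VAL_TOKENS", "0")) or None
```

On a CUDA-less machine this runs end to end on CPU; the same script picks up CUDA automatically when it is available.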

@jordankzf
Author

Training run on my AMD 7840U. Results from the smoke test:

  • 5 training steps: 27 seconds (~5.4s/step)
  • Validation (1M token subset): ~2.5 minutes
  • Final int8 quantized model: 4.96MB
  • Loss 6.94 → 6.89 (I know, I know)

Mainly a POC; go ahead and ask OpenAI for those $25 in Runpod credits.

Ueaj-Kerman added a commit to Ueaj-Kerman/parameter-golf that referenced this pull request Mar 19, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@0hq 0hq closed this Mar 19, 2026
