Record: Sliding Window Eval (stride=64), val_bpb=1.1925 #50

Merged
0hq merged 1 commit into openai:main from mattqlf:submission/sliding-window-eval
Mar 19, 2026
Conversation

Contributor

mattqlf commented Mar 19, 2026

Summary

  • Sliding window evaluation with stride=64 on the baseline 9x512 SP-1024 architecture
  • val_bpb: 1.1925 (post-quant int8+zlib), improving on the Naive Baseline's 1.2244 by 0.032
  • Training is identical to the baseline; the improvement comes entirely from the evaluation strategy
  • Each token is scored with at least 960 tokens of preceding context, rather than the 0-1023 tokens it gets under non-overlapping chunked evaluation
  • Eval takes 70s on 8xH100 (well within the 10-minute eval budget)
  • Total artifact size: 15,874,829 bytes (under 16MB cap)
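The window layout described in the summary can be sketched as follows. This is a minimal illustration, not the code in train_gpt.py: the function name `sliding_window_plan` is hypothetical, and the window length of 1024 is an assumption inferred from the "0-1023" context range above. Each window advances by `stride`, and only the positions not yet scored by an earlier window are counted, so every token is scored exactly once.

```python
def sliding_window_plan(n_tokens, seq_len=1024, stride=64):
    """Return (start, end, score_from) triples (hypothetical helper).

    The model would see tokens [start, end); only positions in
    [score_from, end) contribute to the loss for that window.
    """
    plan = []
    scored_upto = 0  # first position that no window has scored yet
    start = 0
    while scored_upto < n_tokens:
        end = min(start + seq_len, n_tokens)
        plan.append((start, end, scored_upto))
        scored_upto = end
        start += stride
    return plan


if __name__ == "__main__":
    plan = sliding_window_plan(2000)
    # Context available to the first scored position of each later window.
    print(min(score_from - start for start, _, score_from in plan[1:]))
```

With these assumed defaults, every window after the first gives its scored positions at least `seq_len - stride` = 960 tokens of context, at the cost of roughly `seq_len / stride` times more forward-pass work than chunked evaluation.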

Test plan

  • Trained and evaluated on 8xH100 SXM (RunPod)
  • final_int8_zlib_roundtrip_exact val_bpb:1.19250007
  • Artifact size verified under 16,000,000 bytes
  • train_gpt.py compiles and runs within the records folder
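For readers unfamiliar with the metric, bits per byte is conventionally the summed negative log-likelihood, converted from nats to bits, divided by the byte count of the evaluated text. The sketch below assumes that convention; the function name `bits_per_byte` and its arguments are hypothetical and are not taken from train_gpt.py.

```python
import math


def bits_per_byte(total_nll_nats, total_bytes):
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits.
    return total_nll_nats / math.log(2) / total_bytes
```

Because the denominator counts bytes rather than tokens, the figure stays comparable across tokenizers with different vocabularies.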

mattqlf force-pushed the submission/sliding-window-eval branch 2 times, most recently from 928069f to de24caf, on March 19, 2026 05:39
Collaborator

0hq left a comment

Looks great!

0hq merged commit d84a3e8 into openai:main on Mar 19, 2026
Collaborator

0hq commented Mar 19, 2026

One nit: can you double-check that the tail can't be silently dropped in the eval? window_starts only keeps windows where ... >= stride, so the last partial segment of the validation data is not scored if it is shorter than stride. Depending on what was skipped, this can bias the metric in either direction.

Contributor Author

mattqlf commented Mar 19, 2026

Good catch — the final partial window was indeed silently dropped. Fixed in #124: changed the filter to >= 1 and clamped wlen - stride to avoid negative indexing on short windows.
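The effect of the filter threshold can be illustrated with a small sketch. It uses a simplified layout and hypothetical names (`scored_segments`, `min_scored`), so it is analogous to the change described here rather than a copy of the code in #124: a threshold of `stride` drops a short tail, a threshold of 1 keeps it, and the context start is clamped at 0 so that inputs shorter than one window do not produce a negative index.

```python
def scored_segments(n_tokens, seq_len=1024, stride=64, min_scored=1):
    """Return (ctx_start, seg_start, seg_end) triples (hypothetical helper).

    Positions [seg_start, seg_end) are scored using context from ctx_start.
    Segments shorter than min_scored are skipped, which is how a tail can
    be silently dropped when min_scored == stride.
    """
    segments = []
    seg_start = 0
    while seg_start < n_tokens:
        # The first segment fills a whole window; later ones cover one stride.
        step = seq_len if seg_start == 0 else stride
        seg_end = min(seg_start + step, n_tokens)
        if seg_end - seg_start >= min_scored:
            ctx_start = max(seg_end - seq_len, 0)  # clamp: never negative
            segments.append((ctx_start, seg_start, seg_end))
        seg_start = seg_end
    return segments


if __name__ == "__main__":
    for threshold in (64, 1):
        segs = scored_segments(2000, min_scored=threshold)
        print(threshold, sum(end - start for _, start, end in segs))
```

In this sketch, a 2000-token input loses its final 16 tokens under the stricter threshold, which is the kind of omission the review comment flags.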

maxivione pushed a commit to maxivione/parameter-golf that referenced this pull request Mar 20, 2026
leonardcser added a commit to leonardcser/parameter-golf that referenced this pull request Mar 21, 2026
Sliding window eval (credit @mattqlf PR openai#50) with configurable stride.
Muon weight decay 0.04 (credit @notapplica PR openai#60).
Orthogonal init with muP scaling (credit @raahilshah PR openai#162).
Gradient clipping at 0.3.

int8 roundtrip val_bpb: 1.2653. Sliding window would add ~0.03 on 8xH100.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
scottspace pushed a commit to scottspace/parameter-golf that referenced this pull request Mar 21, 2026
leonardcser added a commit to leonardcser/parameter-golf that referenced this pull request Mar 22, 2026