[Non Record] Online Curriculum Learning #737
Open
SPThole wants to merge 7 commits into openai:main from
Conversation
abaybektursun added a commit to abaybektursun/parameter-golf that referenced this pull request on Mar 25, 2026
Two-stage investigation into training data selection for Parameter Golf.

Stage 1 (shard-level): 8 scoring methods; validated M5 (val-CE) as the most reliable (rho = 0.984). But all 80 shards have nearly identical bigram statistics (CE spread: 0.018 bits). Shard reordering: -0.001 BPB (noise).

Stage 2 (chunk-level): scored 244K chunks at 32K granularity. Within-shard variance is 535x larger than between-shard. Selected the top 12% by bigram CE and by a 17M-param neural proxy; both made val_bpb worse (+0.007, +0.006).

Curriculum learning (8xH100, 3 seeds): hardest-first ordering by model perplexity. Mean delta: -0.0006; one seed regressed; 95% CI spans zero.

Conclusion: on FineWeb (already filtered), hard data selection trades diversity for match quality, and diversity wins. Corroborated by PRs openai#737, openai#623, openai#333 and Sachdeva et al. (ICLR 2025).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Implements online, sequence-level curriculum learning: sequences within each batch are scored by unigram entropy and filtered to follow a V-shaped difficulty schedule aligned with the LR warmdown and SWA phases. Zero extra parameters. Builds on PR #623.
Motivation
Standard training feeds random batches regardless of training phase. In a 600-second window (~1100 steps), the model benefits from different data at different stages of training.
Method
Per-sequence difficulty score — unigram entropy:
H(s) = -Σ_t p_s(t) · log₂ p_s(t), where p_s(t) is the empirical frequency of token t within sequence s (length 2048)
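The entropy score above can be sketched in a few lines. This is an illustrative implementation, not the PR's actual code; the function name is hypothetical.

```python
from collections import Counter
import math

def unigram_entropy(tokens):
    """H(s) = -sum_t p_s(t) * log2(p_s(t)), where p_s(t) is the
    empirical frequency of token t within the sequence itself."""
    counts = Counter(tokens)
    n = len(tokens)
    # Sum only over tokens that occur, so p_s(t) > 0 and log2 is defined.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A sequence with a uniform token distribution scores highest; a sequence repeating one token scores 0, so higher entropy serves as the "harder" end of the ranking.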
V-shaped target — maps training progress to difficulty percentile d ∈ [0,1]:
d(step) = step / (0.45 · T) if step ≤ 0.45·T
d(step) = 1 - (step/T - 0.45) / (1 - 0.45) otherwise
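The two-piece schedule above translates directly to code. A minimal sketch, with the 0.45 breakpoint exposed as a parameter (the function name is illustrative):

```python
def difficulty_target(step, total_steps, peak=0.45):
    """Target difficulty percentile d(step) in [0, 1]: ramps linearly
    from 0 to 1 over the first `peak` fraction of training, then
    linearly back down to 0 by the final step."""
    frac = step / total_steps
    if frac <= peak:
        return frac / peak
    return 1.0 - (frac - peak) / (1.0 - peak)
```

The peak at 0.45·T places the hardest data just before the LR warmdown begins, and the descent returns to easy data during the SWA phase.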
Selection: Load 2× the batch's sequences, sort them by entropy, and select the half centered at percentile d(step). The V-shape plays out within each batch, so there is no dependence on shard ordering.
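The selection step can be sketched as a rank-window over the oversampled pool. This is a plausible reading of the description above, not the PR's code; the clamping at the ends of the ranking is an assumption.

```python
import numpy as np

def select_half(scores, d):
    """Given entropy scores for a 2x-oversampled pool, return the
    indices of the half whose difficulty ranks are centered at
    percentile d in [0, 1] (clamped at the ends of the ranking)."""
    order = np.argsort(scores)            # easiest -> hardest
    keep = len(scores) // 2               # keep half the pool
    center = int(round(d * len(scores)))  # target rank for the window center
    start = min(max(center - keep // 2, 0), len(scores) - keep)
    return order[start:start + keep]
```

At d = 0 this keeps the easiest half, at d = 1 the hardest half, and in between a sliding window of intermediate difficulty.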
Results
Observation
Worse than the baseline (1.3345). The 2× oversampling adds ~50 ms/step of overhead (588 ms vs. 540 ms), which costs ~80 training steps within the fixed 600-second budget. The curriculum signal does not compensate for the lost steps. Implication: curriculum at this scale must be zero-overhead (a precomputed ordering, not runtime filtering).