[Non-record] Eval-time Adaptation: Stride-OGD + Two-Pass + NTK-RoPE#241

Open
kellyvv wants to merge 1 commit into openai:main from kellyvv:submission/eval-time-adaptation


kellyvv commented Mar 20, 2026

Three eval-time adaptation techniques

Stride-OGD: Online gradient descent on a 1024-dimensional vocabulary bias, updated once per stride (64 tokens). The gradient is exact and needs no backprop, the bias adds zero artifact cost, and feedback arrives 16× faster than with TTT LoRA.
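A minimal sketch of the stride update, not the code from `eval_stride_ogd.py`. It assumes the bias is added to the output logits and the loss is cross-entropy, in which case the gradient with respect to the bias is simply `softmax(logits + bias) - one_hot(target)`. The function names, the learning rate, and averaging the gradient over the stride are all illustrative assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def stride_ogd_eval(logits_seq, targets, vocab_size, stride=64, lr=0.5):
    """Score tokens under a running vocab bias, updating it once per stride.

    Hypothetical helper: returns (mean NLL in nats, final bias).
    """
    bias = [0.0] * vocab_size
    grad = [0.0] * vocab_size
    total_nll = 0.0
    for t, (logits, y) in enumerate(zip(logits_seq, targets), start=1):
        # Each token is scored with the bias learned from earlier strides only.
        probs = softmax([l + b for l, b in zip(logits, bias)])
        total_nll -= math.log(probs[y])
        # Closed-form gradient of cross-entropy w.r.t. the bias: probs - one_hot(y).
        for v in range(vocab_size):
            grad[v] += probs[v]
        grad[y] -= 1.0
        if t % stride == 0:
            # One gradient step per stride, using the stride-averaged gradient (assumed).
            bias = [b - lr * g / stride for b, g in zip(bias, grad)]
            grad = [0.0] * vocab_size
    return total_nll / len(targets), bias
```

On a synthetic stream where a uniform model keeps seeing the same token, the bias for that token grows after the first stride and the mean loss falls below the unadapted value.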

Two-Pass Eval: Pass 1 collects per-token gradients; pass 2 re-scores with the accumulated bias correction. Fits within the 600 s eval budget.
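The two passes might look roughly like the sketch below. How the accumulated gradients become a bias correction is not stated here, so this version assumes a single scaled step along the mean gradient; `two_pass_eval` and `lr` are hypothetical names.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def two_pass_eval(logits_seq, targets, vocab_size, lr=1.0):
    """Hypothetical helper: returns (pass-1 mean NLL, pass-2 mean NLL, bias)."""
    n = len(targets)

    # Pass 1: score with zero bias and accumulate per-token gradients
    # (softmax probabilities minus the one-hot target).
    grad = [0.0] * vocab_size
    nll1 = 0.0
    for logits, y in zip(logits_seq, targets):
        logp = log_softmax(logits)
        nll1 -= logp[y]
        for v in range(vocab_size):
            grad[v] += math.exp(logp[v])
        grad[y] -= 1.0

    # Turn the accumulated gradient into a bias correction (assumed: one scaled step).
    bias = [-lr * g / n for g in grad]

    # Pass 2: re-score the same logits with the bias applied.
    nll2 = 0.0
    for logits, y in zip(logits_seq, targets):
        logp = log_softmax([l + b for l, b in zip(logits, bias)])
        nll2 -= logp[y]

    return nll1 / n, nll2 / n, bias
```

Because the sketch reuses the pass-1 logits, pass 2 costs only a bias add and a softmax per token. Whether the actual implementation caches logits or re-runs the model is not specified.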

NTK-RoPE 4096: Evaluates at 4× the context length without retraining, by rescaling the RoPE base.
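For illustration, the commonly used NTK-aware rule rescales the base as `base * scale ** (d / (d - 2))`, where `d` is the per-head rotary dimension. Whether this PR uses exactly that rule is an assumption, and the base of 10000 and head dimension of 64 in the usage note are placeholders rather than values from the submission.

```python
def ntk_scaled_base(base, scale, head_dim):
    """NTK-aware RoPE base for a `scale`-times longer context (commonly used rule)."""
    return base * scale ** (head_dim / (head_dim - 2))

def rope_inv_freq(base, head_dim):
    """Inverse rotary frequencies, one per pair of dimensions."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]
```

With `scale=4`, the highest frequency is left unchanged and the lowest frequency is divided by exactly 4, which can be checked by comparing `rope_inv_freq(10000.0, 64)` with `rope_inv_freq(ntk_scaled_base(10000.0, 4.0, 64), 64)`.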

All three techniques include working code and a synthetic demo. Run `python eval_stride_ogd.py` to verify.
