Non-record: Fused Triton Megakernels — RMSNorm + LeakyReLU² (val_bpb 1.3560) by dentity007 · Pull Request #1192 · openai/parameter-golf

dentity007 · 2026-03-31T20:40:00Z

Summary

Custom Triton kernels for RMSNorm and LeakyReLU(0.75)². The run beats the baseline by 0.0017 BPB, a gain attributed to faster evaluation.

val_bpb: 1.3560 | 1×RTX 5090, 600s
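For readers unfamiliar with the two operations named in the summary, here is a minimal pure-Python reference sketch of the math. It is not the PR's Triton code. It assumes that `0.75` is the negative slope of the LeakyReLU, that the activation output is squared elementwise, and that RMSNorm uses a small `eps` inside the square root. The function names, the `eps` default, and the optional `weight` argument are illustrative and do not come from the PR.

```python
import math


def rms_norm(x, weight=None, eps=1e-6):
    """Reference RMSNorm over one vector: x / sqrt(mean(x^2) + eps) * weight.

    `eps` and `weight` are assumptions for illustration, not values from the PR.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    if weight is None:
        weight = [1.0] * len(x)  # no learned scale: identity gain
    return [v * inv_rms * w for v, w in zip(x, weight)]


def leaky_relu_sq(x, negative_slope=0.75):
    """Reference LeakyReLU followed by an elementwise square.

    Reading 0.75 as the negative slope is an assumption.
    """
    activated = [v if v >= 0.0 else negative_slope * v for v in x]
    return [v * v for v in activated]


if __name__ == "__main__":
    hidden = [3.0, 4.0]
    print(rms_norm(hidden, eps=0.0))
    print(leaky_relu_sq([-2.0, 2.0]))  # [2.25, 4.0]
```

A reference like this is mainly useful as a correctness oracle: a fused GPU kernel's output can be compared against it on small inputs, within a floating-point tolerance.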

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

dentity007 and others added 3 commits March 30, 2026 19:12

Add approach notes for parameter golf challenge

ad23b7f

Update approach with depth recurrence, factorized embeddings, tokenizer optimization, and SSM exploration

300eb5c

Non-record: Fused Triton Megakernels (val_bpb 1.3560)

c22ffe9

dentity007 closed this Apr 1, 2026

dentity007 reopened this Apr 1, 2026