Happy to release Accelerated Scan, a kernel library for first-order parallel associative scans in vanilla @PyTorch, Triton 2.2.0 and CUDA C++.
pip install accelerated-scan 🧵
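For anyone wondering what a first-order associative scan computes, here is a minimal pure-Python sketch. It does not use the accelerated-scan API. The function names (`combine`, `sequential_scan`, `doubling_scan`) and the zero initial state are my own assumptions for illustration. The recurrence is h_t = a_t * h_{t-1} + x_t, and the point is that steps compose associatively, so they can be combined in log2(n) rounds instead of n sequential steps.

```python
def combine(left, right):
    """Compose two steps of the map h -> a * h + x; `left` is applied first."""
    a1, x1 = left
    a2, x2 = right
    return (a1 * a2, a2 * x1 + x2)


def sequential_scan(gates, tokens):
    """Reference loop: h_t = a_t * h_{t-1} + x_t, starting from h = 0 (assumed)."""
    h, out = 0.0, []
    for a, x in zip(gates, tokens):
        h = a * h + x
        out.append(h)
    return out


def doubling_scan(gates, tokens):
    """Hillis-Steele inclusive scan: within a round, every combine is independent,
    which is what makes the computation parallelizable."""
    elems = list(zip(gates, tokens))
    step = 1
    while step < len(elems):
        elems = [
            combine(elems[i - step], elems[i]) if i >= step else elems[i]
            for i in range(len(elems))
        ]
        step *= 2
    # The second component of each prefix is the hidden state h_t.
    return [x for _, x in elems]


gates = [0.5, 0.5, 2.0, 1.0, 0.25]
tokens = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sequential_scan(gates, tokens))
print(doubling_scan(gates, tokens))
```

Both functions return the same hidden states; a GPU kernel would run each round of the doubling version across threads rather than in a Python list comprehension.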
Super excited to have contributed to gpt-oss. We put a lot of love into both training the model and making the developer examples. Check them out:
Backpropagation, viewed as dynamic programming over the (Σ, Π) semiring, generalizes to other semirings that compute other statistics:
- the (max, ×) semiring computes the top gradient path
- expectation semiring gives entropy (entropy is high when all paths are "active")
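A toy sketch of the idea, under my own assumptions: a tiny hand-written computation graph, a hypothetical helper `path_aggregate`, and non-negative edge weights standing in for magnitudes of local derivatives (so that 0 is a valid identity for max). The same dynamic program runs with two different semirings; the expectation semiring is not shown.

```python
import operator
from collections import defaultdict


def path_aggregate(edges, source, sink, plus, times, zero, one):
    """Aggregate products of edge weights over all source -> sink paths.

    `edges` is a list of (tail, head, weight) and must be ordered so that
    every edge into a node appears before any edge leaving it.
    """
    value = defaultdict(lambda: zero)
    value[source] = one
    for tail, head, weight in edges:
        value[head] = plus(value[head], times(value[tail], weight))
    return value[sink]


# Hypothetical graph x -> {a, b} -> y with an extra a -> b edge.
# Paths: x-a-y (2*4 = 8), x-b-y (3*1 = 3), x-a-b-y (2*0.5*1 = 1).
edges = [
    ("x", "a", 2.0),
    ("x", "b", 3.0),
    ("a", "b", 0.5),
    ("a", "y", 4.0),
    ("b", "y", 1.0),
]

# (sum, product): the ordinary derivative dy/dx, summed over all paths.
total = path_aggregate(edges, "x", "y", operator.add, operator.mul, 0.0, 1.0)

# (max, product): the weight of the single strongest path.
top = path_aggregate(edges, "x", "y", max, operator.mul, 0.0, 1.0)

print(total, top)
```

Only `plus`, `zero` and `one` change between the two calls; the traversal is identical, which is the sense in which backpropagation generalizes.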
gpt-oss is our new open-weight model family!
The bigger one runs on a single GPU, and you can run the small one on your laptop. Go install it right now, seriously! Telling your laptop to do something and watching it happen made me feel the AGI like nothing since ChatGPT.
I am incredibly proud to finally put this paper out! It shows that hybrid linear RNNs (Griffin), combined with local (sliding-window) attention, can be incredibly efficient at language modeling.
Happy to be going to @NeurIPSConf for the first time. Looking forward to meeting you there!
I will be presenting a poster on Tuesday at 10:45 for our ReScience C paper [Re] VAE Approximation Error: ELBO and Exponential Families