We just open-sourced differentiable SDE solvers in PyTorch:
github.com/google-researc…
Now you can put stochastic differential equations in your deep learning models, and neural nets in your SDEs! Credit to @lxuechen.
LLMs have complex joint beliefs about all sorts of quantities. And my postdoc @jamesrequeima visualized them! In this thread we show LLM predictive distributions conditioned on data and free-form text.
LLMs pick up on all kinds of subtle and unusual structure: 🧵
Classifiers are secretly energy-based models! Every softmax giving p(c|x) has an unused degree of freedom, which we use to compute the input density p(x). This makes classifiers into generative models without changing the architecture.
arxiv.org/abs/1912.03263
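A minimal sketch of that unused degree of freedom, in plain Python. It assumes the unnormalized log-density is the log-sum-exp of the logits; the function names and the example logits are hypothetical, not taken from the paper's code.

```python
import math

def logsumexp(logits):
    """Numerically stable log(sum(exp(logits)))."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))

def class_probs(logits):
    """Softmax p(c|x): unchanged when a constant is added to every logit."""
    lse = logsumexp(logits)
    return [math.exp(z - lse) for z in logits]

def unnormalized_log_density(logits):
    """Assumed form of log p(x) up to an additive constant."""
    return logsumexp(logits)

logits = [2.0, -1.0, 0.5]             # hypothetical classifier outputs for one input x
shifted = [z + 3.0 for z in logits]   # same class probabilities, different density

print(class_probs(logits))
print(class_probs(shifted))
print(unnormalized_log_density(logits), unnormalized_log_density(shifted))
```

Shifting all logits by a constant leaves p(c|x) alone but moves the log-density by that constant, which is the freedom a plain classifier never uses.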
We just open-sourced a suite of ODE solvers in PyTorch:
github.com/rtqichen/torch…
Everything happens on the GPU and is differentiable. Now you can use ODEs in your deep learning models! Credit to @rtqichen.
Gradient descent in differentiable games, such as GANs, rotates around solutions instead of converging. We solve this with a simple trick: complex momentum damps the oscillations.
arxiv.org/abs/2102.08431
With @jonLorraine9 @davidjesusacu @PaulVicol
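A toy sketch of the rotation and the damping, on the bilinear game min_x max_y x*y with simultaneous updates. The update rule here (a complex momentum buffer, with only the real part of the step applied to the parameters) is my reading of the idea. The step size 0.03 and momentum 0.9*exp(i*pi/8) are illustrative choices, not values from the tweet.

```python
import cmath
import math

def grad_field(x, y):
    """Descent directions for min_x max_y x*y: d/dx (x*y) = y and d/dy (-x*y) = -x."""
    return y, -x

def run(alpha, beta, steps, x=1.0, y=1.0):
    """Simultaneous updates with a (possibly complex) momentum coefficient beta.

    Returns the distance from the solution (0, 0) after `steps` updates.
    """
    mx, my = 0j, 0j                     # complex momentum buffers
    for _ in range(steps):
        gx, gy = grad_field(x, y)       # both gradients use the old (x, y)
        mx = beta * mx - gx
        my = beta * my - gy
        x += (alpha * mx).real          # parameters stay real
        y += (alpha * my).real
    return math.hypot(x, y)

start = math.hypot(1.0, 1.0)
plain = run(alpha=0.03, beta=0.0, steps=3000)                       # no momentum
damped = run(alpha=0.03, beta=0.9 * cmath.exp(1j * math.pi / 8), steps=3000)
print(start, plain, damped)
```

With beta = 0 the iterates spiral outward; with the complex beta above they spiral inward on this toy problem.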
I propose we rename "epistemic uncertainty" to "model uncertainty", and "aleatoric uncertainty" to "per-measurement uncertainty". More generally, one can refer to uncertainty at any level, such as per-pixel, per-image, or per-patient uncertainty. No need for obscure jargon.
I should have announced this before, but a year ago I switched my research focus to AI existential risk reduction and governance. I think the risk of bad outcomes for humanity due to AGI is substantial, and that coordinating a slowdown in AGI development is probably a good idea.
Training Neural SDEs: We worked out how to do scalable reverse-mode autodiff for stochastic differential equations. This lets us fit SDEs defined by neural nets with black-box adaptive higher-order solvers.
arxiv.org/pdf/2001.01328…
With @lxuechen, @rtqichen and @wongtkleonard.
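For readers new to SDEs, here is a sketch of the forward simulation only, using fixed-step Euler-Maruyama in plain Python. It does not show the reverse-mode gradient computation or the adaptive higher-order solvers. The drift, diffusion, and all names are hypothetical stand-ins for what would be neural nets.

```python
import math
import random

def euler_maruyama(drift, diffusion, x0, t0, t1, n_steps, rng):
    """Simulate one path of dX = drift(X, t) dt + diffusion(X, t) dW."""
    dt = (t1 - t0) / n_steps
    x, t = x0, t0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment, variance dt
        x = x + drift(x, t) * dt + diffusion(x, t) * dw
        t += dt
    return x

def drift(x, t):
    return -x          # pulls the state toward zero (Ornstein-Uhlenbeck drift)

def diffusion(x, t):
    return 0.5         # constant noise scale

rng = random.Random(0)
samples = [euler_maruyama(drift, diffusion, 1.0, 0.0, 1.0, 100, rng) for _ in range(2000)]
mean_at_t1 = sum(samples) / len(samples)
print(mean_at_t1, math.exp(-1.0))   # the sample mean should sit near exp(-1)
```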
Neural ODEs: Instead of updating hidden units layer by layer, we specify their derivative wrt depth with a neural network. An ODE solver adaptively computes the output.
By amazing students @rtqichen @YuliaRubanova @jessebett.
arxiv.org/abs/1806.07366
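A rough sketch of the layers-as-ODE view in plain Python. It assumes a one-unit tanh "network" for the dynamics and uses a fixed-step Euler solver rather than an adaptive one; all names and weights are hypothetical.

```python
import math

def dynamics(h, t, w=-1.0, b=0.0):
    """Hypothetical one-unit network giving dh/dt at depth t."""
    return math.tanh(w * h + b)

def euler_solve(f, h0, t0, t1, n_steps):
    """Integrate dh/dt = f(h, t) from t0 to t1 with fixed-step Euler."""
    dt = (t1 - t0) / n_steps
    h, t = h0, t0
    for _ in range(n_steps):
        h = h + dt * f(h, t)    # with dt = 1 this is a residual block h + f(h)
        t += dt
    return h

one_block = euler_solve(dynamics, 1.0, 0.0, 1.0, 1)      # a single residual layer
many_steps = euler_solve(dynamics, 1.0, 0.0, 1.0, 1000)  # closer to the continuous limit
print(one_block, many_steps)
```

A single Euler step of size 1 reproduces a residual update, and shrinking the step approaches the continuous-depth model that an adaptive solver would handle automatically.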
I heard you like graphs, so we put a graph neural net in your graph generative model, so you can be invariant to order while you add edges to your graph. Scales to 5000 nodes.
Paper: arxiv.org/abs/1910.00760
Code: github.com/lrjconan/GRAN
What if your favorite classifier was also a generative model? We show that ResNets can be made invertible, giving a scalable density model with unrestricted layer architectures. With @JensBehrmann @wgrathwohl @rtqichen @jh_jacobsen
arxiv.org/abs/1811.00995
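A scalar sketch of why a residual block can be inverted, assuming the residual branch g is a contraction (Lipschitz constant below 1). The branch g below is a hypothetical stand-in, and the log-determinant needed for densities is not shown.

```python
import math

def g(x):
    """Hypothetical residual branch with Lipschitz constant 0.5 < 1."""
    return 0.5 * math.tanh(x)

def forward(x):
    """Residual block: y = x + g(x)."""
    return x + g(x)

def inverse(y, n_iters=100):
    """Recover x from y by fixed-point iteration x <- y - g(x).

    Converges because g is a contraction (Banach fixed-point theorem).
    """
    x = y
    for _ in range(n_iters):
        x = y - g(x)
    return x

x = 1.3
y = forward(x)
print(x, inverse(y))   # the two values should agree closely
```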