Mark van der Wilk (@markvanderwilk) / X

Mark van der Wilk

@markvanderwilk

Associate Professor in Machine Learning at the University of Oxford. Interested in automatic inductive bias selection using Bayesian tools.

Oxford, UK

Joined November 2014

Mark van der Wilk
@markvanderwilk
Jan 7, 2020
I'm excited to share that I have joined Imperial College London as a lecturer (asst prof)! I'm convinced it will be a great environment to continue working on GPs, Bayesian Deep Learning, and model-based RL. Do get in touch if you're interested in joining to do a PhD!
Mark van der Wilk
@markvanderwilk
Sep 14, 2023
I'm excited to share that I have joined the Dept of CS at the University of Oxford as an Associate Professor, to continue research in Machine Learning! I'm looking forward to new collaborations in Oxford, as well as continuing great existing ones with Imperial colleagues.👇🧵
Mark van der Wilk
@markvanderwilk
Dec 9, 2021
I am still welcoming PhD applicants for 2022 at Imperial College London. We are a growing research group, with clear goals on what new abilities we want to develop in ML and neural networks. Topics: Invariances, neural arch search, (Bayesian) model selection, Gaussian processes.
Mark van der Wilk
@markvanderwilk
May 30, 2020
Next Friday (5 June) I will speak about our paper "Learning Invariances using the Marginal Likelihood" in the "Deep Learning: Classics and Trends" reading group. I will discuss Bayesian model selection, and how we can use it to learn inductive biases through backprop.
Mark van der Wilk
@markvanderwilk
Dec 9, 2020
Tomorrow 10 Dec at 11am GMT I will speak at the Bayesian Deep Learning Meetup about **Bayesian Model Selection** and how it can help architecture search. In a short 20 minutes we will discuss why we (Bayesians ∪ Deep Learners) should care, and approaches from now and the past.
Mark van der Wilk
@markvanderwilk
Oct 18, 2021
Why does Bayesian model selection find good inductive biases, instead of giving overfitting behaviour? I made some visualisations to illustrate this for my talk at genu.ai. Code is available. Slides: mvdw.uk/talk/input-den… Plots: github.com/markvdw/infere… 1/👇
Mark van der Wilk
@markvanderwilk
Jan 12, 2021
When you take the wide limit of a CNN, the spatial correlations from weight sharing disappear. This is annoying since these make CNNs work! Is this a property that is simply destroyed by taking the infinite limit, like feature learning? We discuss this in our short AABI paper 👇
Mark van der Wilk
@markvanderwilk
Jul 27, 2020
JMLR paper on inducing point selection is out! w/ @BurtDavidR & Carl Rasmussen. New results include a recommendation for initialising them. Long story short, alternately:
- greedily select inducing points with the largest variance of p(f|u, θ),
- maximise ELBO wrt hyperparams θ.
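
The greedy step in the post above can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: it assumes an RBF kernel with fixed hyperparameters, and the names `rbf_kernel` and `greedy_variance_selection` are hypothetical. The alternating ELBO maximisation over θ is left out.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """RBF kernel matrix between rows of A and rows of B (assumed kernel choice)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def greedy_variance_selection(X, num_inducing, kernel):
    """Repeatedly pick the input whose variance, conditioned on the points
    chosen so far, is largest. Equivalent to a pivoted partial Cholesky."""
    n = X.shape[0]
    # Start from the prior marginal variances k(x_i, x_i).
    cond_var = np.array([kernel(X[i:i + 1], X[i:i + 1])[0, 0] for i in range(n)])
    L = np.zeros((num_inducing, n))  # rows of the partial Cholesky factor
    chosen = []
    for m in range(num_inducing):
        j = int(np.argmax(cond_var))
        chosen.append(j)
        k_col = kernel(X[j:j + 1], X)[0]  # k(x_j, X), shape (n,)
        # Remove the part already explained by earlier choices, then normalise.
        L[m] = (k_col - L[:m, j] @ L[:m]) / np.sqrt(cond_var[j])
        # Conditioning on x_j lowers every remaining variance.
        cond_var = np.maximum(cond_var - L[m] ** 2, 0.0)
    return np.array(chosen), cond_var
```

Calling `greedy_variance_selection(X, M, rbf_kernel)` returns the indices of the M selected inputs and the remaining conditional variances. In the alternating scheme, this selection would be re-run after each hyperparameter update.
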
Mark van der Wilk
@markvanderwilk
Jun 11, 2019
Thanks to @icmlconf for the best paper award for our work on "Rates of Convergence for Sparse Variational Gaussian Process Regression"! First author @davidrburt will be giving the invited talk on Thursday 3PM Hall A. We'll be at poster #237 that evening!
Cambridge MLG
@CambridgeMLG
Jun 11, 2019
Congratulations to our student @BurtDavidR for receiving an ICML 2019 Best Paper Award for “Rates of Convergence for Sparse Variational Gaussian Process Regression”, jointly with Carl E. Rasmussen and @markvanderwilk! (arxiv.org/pdf/1903.03571…) @icmlconf #ICML2019 #bestpaper
arxiv.org
Rates of Convergence for Sparse Variational Gaussian Process Regression
Excellent variational approximations to Gaussian process posteriors have been developed which avoid the $\mathcal{O}\left(N^3\right)$ scaling with dataset size $N$. They reduce the computational...
Mark van der Wilk
@markvanderwilk
Aug 17, 2018
Find our paper "Learning invariances using the Marginal Likelihood" at arxiv.org/abs/1808.05563! By describing invariances using samples, we embed them in GPs and learn the invariance distribution. We only need unbiased kernel estimates, and the marginal likelihood is crucial:
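
One way to picture "unbiased kernel estimates" from samples is a Monte Carlo average of a base kernel over independently transformed copies of both inputs. The sketch below is an illustration under stated assumptions, not the paper's code: it assumes 2-D inputs, a uniform distribution over rotation angles, and an RBF base kernel, and all function names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """RBF base kernel between rows of A and rows of B (assumed choice)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def sample_rotations(x, num_samples, max_angle, rng):
    """Rotate one 2-D point by angles drawn uniformly from [-max_angle, max_angle]."""
    angles = rng.uniform(-max_angle, max_angle, size=num_samples)
    c, s = np.cos(angles), np.sin(angles)
    return np.stack([c * x[0] - s * x[1], s * x[0] + c * x[1]], axis=1)

def invariant_kernel_estimate(x, y, num_samples, max_angle, rng):
    """Average the base kernel over all pairs of transformed inputs.
    The two sample sets are drawn independently, so the average is an
    unbiased estimate of the expected kernel under the transformations."""
    xa = sample_rotations(x, num_samples, max_angle, rng)
    ya = sample_rotations(y, num_samples, max_angle, rng)
    return rbf_kernel(xa, ya).mean()
```

Here `max_angle` plays the role of a parameter of the invariance distribution: 0 recovers the base kernel, and π gives a kernel that is invariant to all rotations. Learning it would mean adjusting such a parameter by maximising the marginal likelihood, which this sketch does not do.
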
Mark van der Wilk
@markvanderwilk
Jul 15, 2020
I really enjoyed working on formulating capsnets as probabilistic inference. I find the ideas behind capsules incredibly intriguing. While we ran into issues (see blog), I hope that our view will be useful. I certainly have hope that more research can make these ideas practical.
Yarin
@yaringal
Jul 13, 2020
@OATML_Oxford student Lewis Smith wrote a really interesting blogpost exploring his experience working with capsule networks -- explaining how to formulate a generative version of the model and how this revealed conceptual issues with capsules as a whole oatml.cs.ox.ac.uk/blog/2020/07/1…
Mark van der Wilk
@markvanderwilk
May 11, 2021
I'm very excited about this work. Deep GPs have many properties that DNNs could benefit from. Our work connects the two to such a strong degree, that a common DNN (e.g. with ReLUs) can be used as a point estimate for a DGP. Nonparametric uncertainty in DNNs, here we come!
Vincent Dutordoir
@vdutor
May 11, 2021
Replying to @vdutor
In this paper, we use this RKHS to construct an interdomain inducing variable that leads to GP basis functions c(.) that are approximately identical to the activation functions in neural nets. Left: standard RBF basis functions Right: our ReLU basis functions
Mark van der Wilk
@markvanderwilk
Nov 25, 2018
The final version of my thesis is online. Chapter 2 proposes desiderata for approximating GP models, and discusses what non-parametric properties are maintained. Chapter 5 extends the Convolutional GP paper with additional connections to infinite CNNs. markvdw.github.io/vanderwilk-the…
Mark van der Wilk
@markvanderwilk
Oct 14, 2022
Imo, inductive bias selection is the most interesting property of Bayes. My past work used this in GPs, but now we show the principle in NNs! Using Laplace approx we can adjust invariances on the training set, using gradients. A step towards self-adjusting NN architectures!
Tycho van der Ouderaa
@tychovdo
Oct 14, 2022
Deep neural nets with the right symmetries baked-in (e.g. translational equivariance in CNNs) perform better. But can we learn them from data? We can with differentiable Laplace approximations! 🌟New method: LILA🌟 with @a1mmer @vincefort, @gxr, @markvanderwilk. A thread👇 🧵1/10