Mark van der Wilk
@markvanderwilk
Associate Professor in Machine Learning at the University of Oxford. Interested in automatic inductive bias selection using Bayesian tools.
Oxford, UK
Joined November 2014
Posts
  • I'm excited to share that I have joined Imperial College London as a lecturer (asst prof)! I'm convinced it will be a great environment to continue working on GPs, Bayesian Deep Learning, and model-based RL. Do get in touch if you're interested in joining to do a PhD!
  • I'm excited to share that I have joined the Dept of CS at the University of Oxford as an Associate Professor, to continue research in Machine Learning! I'm looking forward to new collaborations in Oxford, as well as continuing great existing ones with Imperial colleagues.👇🧵
  • I am still welcoming PhD applicants for 2022 at Imperial College London. We are a growing research group, with clear goals on what new abilities we want to develop in ML and neural networks. Topics: Invariances, neural arch search, (Bayesian) model selection, Gaussian processes.
  • Next Friday (5 June) I will speak about our paper "Learning Invariances using the Marginal Likelihood" in the "Deep Learning: Classics and Trends" reading group. I will discuss Bayesian model selection, and how we can use it to learn inductive biases through backprop.
  • Tomorrow 10 Dec at 11am GMT I will speak at the Bayesian Deep Learning Meetup about **Bayesian Model Selection** and how it can help architecture search. In a short 20 minutes we will discuss why we (Bayesians ∪ Deep Learners) should care, and approaches from now and the past.
  • Why does Bayesian model selection find good inductive biases, instead of giving overfitting behaviour? I made some visualisations to illustrate this for my talk at genu.ai. Code is available. Slides: mvdw.uk/talk/input-den… Plots: github.com/markvdw/infere… 1/👇
  • When you take the wide limit of a CNN, the spatial correlations from weight sharing disappear. This is annoying since these make CNNs work! Is this a property that is simply destroyed by taking the infinite limit, like feature learning? We discuss this in our short AABI paper 👇
  • JMLR paper on inducing point selection is out! w/ @BurtDavidR & Carl Rasmussen. New results include a recommendation for initialising them. Long story short, alternately
    - greedily select inducing points with the largest variance of p(f|u, θ),
    - maximise ELBO wrt hyperparams θ.
    Figure showing the selection of the next inducing point in a sequence at the point that maximises the variance of the prior conditioned on currently selected inducing points.
  • Thanks to @icmlconf for the best paper award for our work on "Rates of Convergence for Sparse Variational Gaussian Process Regression"! First author @davidrburt will be giving the invited talk on Thursday 3PM Hall A. We'll be at poster #237 that evening!
  • Find our paper "Learning invariances using the Marginal Likelihood" at arxiv.org/abs/1808.05563! By describing invariances using samples, we embed them in GPs and learn the invariance distribution. We only need unbiased kernel estimates, and the marginal likelihood is crucial:
  • I really enjoyed working on formulating capsnets as probabilistic inference. I find the ideas behind capsules incredibly intriguing. While we ran into issues (see blog), I hope that our view will be useful. I certainly have hope that more research can make these ideas practical.
    @OATML_Oxford student Lewis Smith wrote a really interesting blogpost exploring his experience working with capsule networks -- explaining how to formulate a generative version of the model and how this revealed conceptual issues with capsules as a whole oatml.cs.ox.ac.uk/blog/2020/07/1…
  • I'm very excited about this work. Deep GPs have many properties that DNNs could benefit from. Our work connects the two so strongly that a common DNN (e.g. with ReLUs) can be used as a point estimate for a DGP. Nonparametric uncertainty in DNNs, here we come!
    Replying to @vdutor
    In this paper, we use this RKHS to construct an interdomain inducing variable that leads to GP basis functions c(.) that are approximately identical to the activation functions in neural nets. Left: standard RBF basis functions Right: our ReLU basis functions
  • The final version of my thesis is online. Chapter 2 proposes desiderata for approximating GP models, and discusses what non-parametric properties are maintained. Chapter 5 extends the Convolutional GP paper with additional connections to infinite CNNs. markvdw.github.io/vanderwilk-the…
  • Imo, inductive bias selection is the most interesting property of Bayes. My past work used this in GPs, but now we show the principle in NNs! Using Laplace approx we can adjust invariances on the training set, using gradients. A step towards self-adjusting NN architectures!
    Deep neural nets with the right symmetries baked-in (e.g. translational equivariance in CNNs) perform better. But can we learn them from data? We can with differentiable Laplace approximations! 🌟New method: LILA🌟 with @a1mmer @vincefort, @gxr, @markvanderwilk. A thread👇 🧵1/10
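As an illustration of the greedy step mentioned in the JMLR post on inducing point selection, here is a minimal NumPy sketch of picking, one at a time, the training input with the largest prior variance conditioned on the points already chosen. It is not the paper's code: the function names (`rbf_kernel`, `greedy_variance_selection`), the RBF kernel, and the fixed hyperparameters are assumptions made for this example, and the alternating ELBO maximisation over θ is not shown.

```python
import numpy as np


def rbf_kernel(X, Y, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel (an assumed choice for this sketch)."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def greedy_variance_selection(X, num_inducing, kernel=rbf_kernel):
    """Greedily pick rows of X with the largest conditional prior variance.

    Returns the selected indices and the remaining conditional variance at
    every input. This is a pivoted-Cholesky-style update, so the full
    N x N kernel matrix is never formed.
    """
    N = X.shape[0]
    # Prior marginal variance k(x, x) at each input.
    cond_var = np.array([kernel(x[None], x[None])[0, 0] for x in X])
    L = np.zeros((num_inducing, N))  # rows of the partial Cholesky factor
    selected = []
    for m in range(num_inducing):
        j = int(np.argmax(cond_var))      # point with largest remaining variance
        selected.append(j)
        k_j = kernel(X, X[j:j + 1])[:, 0]  # covariance of all inputs with x_j
        # Remove the part of k_j already explained by earlier selections.
        row = (k_j - L[:m].T @ L[:m, j]) / np.sqrt(cond_var[j])
        L[m] = row
        # Conditioning on x_j reduces the variance everywhere by row**2.
        cond_var = np.clip(cond_var - row**2, 0.0, None)
    return np.array(selected), cond_var


# Hypothetical usage on a 1-D grid of inputs.
X = np.linspace(0.0, 1.0, 50)[:, None]
Z_idx, remaining_var = greedy_variance_selection(X, num_inducing=5)
Z = X[Z_idx]  # inducing inputs to hand to a sparse GP as an initialisation
```

In a full procedure, this selection would be re-run each time the kernel hyperparameters change, since the conditional variances depend on θ.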