Introduction to Bayesian data analysis (home page)
Instructors
Bruno Nicenboim
Shravan Vasishth
Dates and location
March 2020, taught online.
Overview
In recent years, Bayesian methods have come to be widely adopted in all areas of science. This is in large part due to the development of sophisticated software for probabilistic programming; a recent example is the astonishing computing capability afforded by the language Stan (mc-stan.org). However, the underlying theory needed to use this software sensibly is often inaccessible because end-users don't necessarily have the statistical and mathematical background to read the primary textbooks (such as Gelman et al.'s classic Bayesian Data Analysis, 3rd edition). In this course, we seek to bridge this gap by providing a relatively accessible and technically non-demanding introduction to the basic workflow for fitting different kinds of linear models using Stan. To illustrate the capability of Bayesian modeling, we will use the R package RStan and a powerful front-end R package for Stan called brms.
Prerequisites
We assume familiarity with R. Participants will benefit most if they have previously fit linear models and linear mixed models (using lme4) in R, in any scientific domain within linguistics and psychology. No knowledge of calculus or linear algebra is assumed (although it is helpful), but basic school-level mathematics is assumed (this will be quickly revisited in class).
Please install the following software before coming to the course
We will be using the software R and RStudio, so make sure you install these on your computer.
You should also install the R packages rstan and brms.
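One way to check the installation from within R is sketched below. The helper name `missing_pkgs` is made up for this example and is not part of the course materials. Note also that rstan needs a working C++ toolchain to compile models, so installing the package alone may not be enough.

```r
# Packages named in the installation instructions above.
course_pkgs <- c("rstan", "brms")

# Hypothetical helper: return the packages in `pkgs` that cannot be found
# in the current R library.
missing_pkgs <- function(pkgs) {
  found <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!found]
}

to_install <- missing_pkgs(course_pkgs)
if (length(to_install) == 0) {
  message("rstan and brms are both installed.")
} else {
  message("Still missing: ", paste(to_install, collapse = ", "))
  # Uncomment to install the missing packages from CRAN:
  # install.packages(to_install)
}
```

The `install.packages()` call is left commented out so that the check can be run without changing anything on your system.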
Outcomes
After completing this course, participants will be familiar with the foundations of Bayesian inference using Stan (RStan and brms), and will be able to fit a range of multiple regression models and hierarchical models for normally, lognormally, and binomially distributed data. They will know how to calibrate their models using prior and posterior predictive checks, and they will be able to establish true and false discovery rates to validate discovery claims. If there is time, we will discuss how to carry out model comparison using Bayes factors and k-fold cross-validation.
Online interaction
We will use Google Groups and Zoom. A link to the private group will be sent to participants.
Course materials
Click here to download everything. If you use GitHub, you can clone this repository:
https://github.com/vasishth/IntroductionBDA
Textbook (in progress):
See here. PDF version available on request.
Part 1 (Monday-Tuesday): Shravan Vasishth
The lectures correspond roughly to chapters 1 and 2 of our textbook in preparation.
- Monday:
  - Introductory video
  - PDF: 00 Frequentist Foundations (review of some basic ideas)
    Exercises: 00 Frequentist Foundations Exercises
  - PDF: 01 Foundations
    Exercises Part 1: 01 Foundations Exercises Part 1
    Exercises Part 2: 01 Foundations Exercises Part 2
- Tuesday:
  - PDF: 02 Introduction to Bayesian methods
    Exercises: 02 Introduction to Bayesian methods Exercises
  - PDF: 02 Sampling
    02 Sampling, Additional Notes
Part 2 (Wednesday-Friday): Bruno Nicenboim
For this part of the workshop, besides rstan and brms, be sure to have the following packages installed (and loaded in your session):
MASS, dplyr, tidyr, purrr, readr, extraDistr,
ggplot2, brms, bayesplot, tictoc, gridExtra
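A minimal sketch for attaching these packages in one go, assuming they are already installed. The variable names `part2_pkgs`, `attached`, and `not_attached` are illustrative only.

```r
# Packages listed for Part 2; rstan is appended because the text above
# mentions it separately from the list.
part2_pkgs <- c("MASS", "dplyr", "tidyr", "purrr", "readr", "extraDistr",
                "ggplot2", "brms", "bayesplot", "tictoc", "gridExtra", "rstan")

# Try to attach each package. require() returns FALSE (with a warning,
# suppressed here) instead of stopping when a package is not installed.
attached <- vapply(
  part2_pkgs,
  function(p) suppressWarnings(require(p, character.only = TRUE, quietly = TRUE)),
  logical(1)
)

# Report anything that could not be attached.
not_attached <- names(attached)[!attached]
if (length(not_attached) > 0) {
  message("Not installed: ", paste(not_attached, collapse = ", "))
}
```

Because MASS is attached before dplyr here, a bare call to `select()` resolves to the dplyr version; write `MASS::select()` if the MASS function is needed.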
The lectures correspond roughly to chapters 3, 4 and 5 of our textbook in preparation.
- Wednesday - 03 - Computational Bayesian data analysis
Slides and exercises
Stan slides
Part 1
Part 2
Part 3
A brief intro to Stan
- Thursday - 04 - Bayesian regression models
Slides and exercises
More exercises
Part 1 (Linear model)
Part 2 (Log-normal regression)
Part 3 (Logistic regression)
- Friday
05 - Bayesian hierarchical models
Slides
Exercises
06 - Model comparison with Bayes factor
Slides
Exercises
Case studies:
Three case studies (zip archive): meta-analysis, measurement error models, and an example of pre-registration.
Tentative schedule
Depending on the class, we may go faster or slower, so we may not adhere to this exact schedule.
Additional readings
R programming
- Getting started with R
- R for data science
- Efficient R programming
Books
- A Student's Guide to Bayesian Statistics, by Ben Lambert: A good, non-technical introduction to Stan and Bayesian modeling.
- Statistical Rethinking, by Richard McElreath: A classic introduction.
- Doing Bayesian Data Analysis, Second Edition: A Tutorial with R, JAGS, and Stan, by John Kruschke: A good introduction specifically for psychologists.
Tutorial articles and materials
- brms tutorial by the author of the package, Paul Buerkner.
- Ordinal regression models in psychological research: A tutorial, by Buerkner and Vuorre.
- Contrast coding tutorial, by Schad, Hohenstein, Vasishth, Kliegl.
- Bayesian workflow tutorial, by Schad, Betancourt, Vasishth.
- Linear mixed models tutorial, by Sorensen, Hohenstein, Vasishth.
- brms tutorial for phonetics/phonology, by Vasishth, Nicenboim, Beckman, Li, Kong.
- Reproducible workflows tutorial
- Michael Betancourt's resources: These are a must if you want to get deeper into Stan and Bayesian modeling.
- MCMC animations/visualizations, McElreath's blog post on MCMC
Some example articles from our lab and other groups that use Bayesian methods
- Example random-effects meta-analysis (phonetics data on neutralization).
- A second example of a large-scale study and a random-effects meta-analysis (EEG data)
- A third example of a large-scale study and a random-effects meta-analysis (reading data)
- Example of a hierarchical finite mixture model using Stan.
- Replication attempt of a published study.
- Another (large-sample) replication attempt of a published study.
- Bayesian analysis of a relatively large-sample psycholinguistic experiment.
- Examples of regression analyses by Vehtari and colleagues