The goal of this course is to introduce students to classical approaches to statistical learning.
The information age has produced large volumes of multivariate data in many fields, such as finance, marketing, economics, biology, and environmental sciences. Knowing how to analyze such data in a rigorous and self-critical manner is of great importance in both research and industry.
We will give equal weight to the theoretical and practical aspects of statistical learning, presenting several applications in class and offering practical sessions in which students carry out real data analyses using the R software.
This course is intended for ENSIMAG students from IF, ISI, and MMIS, as well as students in the M1AM master's program at UGA.
The course covers the following topics:
- Review of multivariate statistics
- Simple and multivariate linear regression
- Cross-validation, model selection, and bias-variance tradeoff
- Principal component analysis
- Linear classification: discriminative and generative approaches
- Decision trees
- Ensemble methods: bagging and (gradient) boosting
- Performance metrics and overfitting
- Introduction to network analysis and community detection in graphs
The main reference for the course is the book by James et al., "An Introduction to Statistical Learning", which is freely available online.