This repository contains two machine learning tutorials. We will learn how to apply several machine learning algorithms, ranging from polynomial regression and random forests to deep convolutional neural networks, in order to answer a science question (see, e.g., Wu 2020). These notebooks will also help you gain familiarity with the scikit-learn and PyTorch/fastai Python packages for machine learning.
Presented by John F. Wu (@jwuphysics).
Open the Colab notebooks for the introductory machine learning (part 1) and the deep learning (part 2) sessions.
To run these notebooks locally, clone the repository and set up a conda environment with the necessary packages (numpy, scipy, matplotlib, pandas, scikit-learn, pytorch, fastai). The installation process may depend on (a) whether you have an NVIDIA graphics card and (b) which version of CUDA your system is running. To avoid these complications, just use the Colab notebooks!
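If you do set up a local environment, a quick sanity check like the one below may help confirm that the packages listed above can be found and whether PyTorch can see a CUDA device. This is only a sketch and not part of the tutorials: the helper names `check_packages` and `cuda_available` are made up for illustration, and the import names assume the usual conventions (scikit-learn imports as `sklearn`, pytorch as `torch`).

```python
import importlib.util

# Import names for the packages listed above (assumed conventions:
# scikit-learn is imported as "sklearn" and pytorch as "torch").
REQUIRED = ["numpy", "scipy", "matplotlib", "pandas", "sklearn", "torch", "fastai"]


def check_packages(names=REQUIRED):
    """Return {import_name: True/False} without actually importing anything."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def cuda_available():
    """Return True/False if torch imports, or None if torch cannot be imported."""
    try:
        import torch
    except (ImportError, OSError):
        return None
    return torch.cuda.is_available()


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
    print("CUDA available:", cuda_available())
```

If `cuda_available()` reports `False` or `None`, the notebooks may still run on the CPU, but the deep learning session will likely be much slower than on Colab with a GPU runtime.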
- Before getting started
- Can we predict a galaxy's neutral hydrogen (HI) content?
- Examine data with pandas
- A very simplified glossary for xGASS
- Examine and clean features
- Visualize correlations
- Polynomial regression
- Multivariate linear regression
- Train-test split
- Cross-validation
- Quadratic and higher-order polynomial models
- Overfitting
- Decision trees
- How scikit-learn does it
- Random forests
- Optimize hyperparameters
- Feature importances
- Introducing the science problem (again)
- Solving the task with a CNN
- Understanding convolutions
- Other neural network ingredients
- A simple CNN model in action (forward)
- Optimization
- A simple CNN model in action (forward + backward)
- Hyperparameters
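
As a rough preview of the first session's workflow (train-test split, cross-validation, random forests, feature importances), here is a minimal sketch using scikit-learn. It runs on synthetic stand-in data rather than the xGASS catalog used in the notebooks, and the feature definitions, model settings, and variable names are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (NOT xGASS): three features, one target.
# The third feature is pure noise and does not enter the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] - X[:, 1] ** 2 + 0.1 * rng.normal(size=500)

# Hold out a test set that the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)

# 5-fold cross-validation on the training set (the default score is R^2).
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Fit on the full training set, then evaluate once on the held-out test set.
model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)

print("CV R^2 per fold:", np.round(cv_scores, 3))
print("Test R^2:", round(test_r2, 3))
print("Feature importances:", np.round(model.feature_importances_, 3))
```

Because the third feature is noise, its importance should come out close to zero. The notebooks apply the same ideas to real galaxy measurements.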