If you use this code in your research, please cite the following paper:
@article{mcdermott2024closer,
title={A Closer Look at {AUROC} and {AUPRC} under Class Imbalance},
author={McDermott, Matthew and Zhang, Haoran and Hansen, Lasse and Angelotti, Giovanni and Gallifant, Jack},
journal={Advances in Neural Information Processing Systems},
volume={37},
pages={44102--44163},
year={2024}
}
Run the following commands to clone this repo and create the Conda environment:
git clone git@github.com:hzhang0/auc_bias.git
cd auc_bias
conda env create -f environment.yml
conda activate auc_bias
To reproduce the experiments on synthetic data (Section 3.1 of the paper), run the notebooks/synthetic_exps.ipynb notebook top to bottom.
To train a single model, run train.py as a module with the appropriate arguments, for example:
python -m auc_biases.train \
--output_dir /output/dir \
--dataset adult \
--algorithm xgb \
--balance_groups \
--attribute 0 \
--higher_prev_group_weight 3
To obtain the mimic dataset, see instructions here. The other three datasets are included and/or downloaded automatically.
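The training command above can also be assembled programmatically, which is convenient when trying a few values of one argument by hand. The following is a minimal sketch, not part of the repository: the helper name build_train_command is hypothetical, and it uses only the flags and example values shown in the command above.

```python
import shlex


def build_train_command(output_dir, dataset="adult", algorithm="xgb",
                        attribute=0, higher_prev_group_weight=3,
                        balance_groups=True):
    """Return the example training invocation as an argument list.

    Hypothetical helper: the flags mirror the README command and are
    not checked against the argument parser in train.py.
    """
    cmd = [
        "python", "-m", "auc_biases.train",
        "--output_dir", str(output_dir),
        "--dataset", dataset,
        "--algorithm", algorithm,
    ]
    if balance_groups:
        # Boolean flag: present or absent, takes no value.
        cmd.append("--balance_groups")
    cmd += [
        "--attribute", str(attribute),
        "--higher_prev_group_weight", str(higher_prev_group_weight),
    ]
    return cmd


cmd = build_train_command("/output/dir")
print(shlex.join(cmd))
# python -m auc_biases.train --output_dir /output/dir --dataset adult --algorithm xgb --balance_groups --attribute 0 --higher_prev_group_weight 3
```

To execute the command rather than print it, the list can be passed to subprocess.run(cmd, check=True) from the repository root with the Conda environment active.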
To reproduce the experiments in the paper that involve training a grid of models with different hyperparameters, use sweep.py as follows:
python sweep.py launch \
--experiment {experiment_name} \
--output_dir {output_root} \
--command_launcher {launcher}
where:
- experiment_name corresponds to experiments defined as classes in experiments.py.
- output_root is a directory where experimental results will be stored.
- launcher is a string corresponding to a launcher defined in launchers.py (i.e. slurm or local).
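The placeholders in the launch command can be filled in with ordinary string formatting. The sketch below is an illustration under stated assumptions: build_sweep_command is a hypothetical helper, /output/root is a placeholder path, and the experiment and launcher names are the ones mentioned in this README.

```python
import shlex

# Template copied from the launch command, with the same placeholder names.
SWEEP_TEMPLATE = (
    "python sweep.py launch "
    "--experiment {experiment_name} "
    "--output_dir {output_root} "
    "--command_launcher {launcher}"
)


def build_sweep_command(experiment_name, output_root, launcher):
    """Fill the placeholders and return the command as an argument list.

    Each value is shell-quoted before substitution so that paths with
    spaces survive the split back into separate arguments.
    """
    filled = SWEEP_TEMPLATE.format(
        experiment_name=shlex.quote(experiment_name),
        output_root=shlex.quote(str(output_root)),
        launcher=shlex.quote(launcher),
    )
    return shlex.split(filled)


sweep_cmd = build_sweep_command(
    "vary_group_weight_with_seeds",  # experiment named in this README
    "/output/root",                  # hypothetical output directory
    "local",                         # launcher named in this README
)
print(shlex.join(sweep_cmd))
# python sweep.py launch --experiment vary_group_weight_with_seeds --output_dir /output/root --command_launcher local
```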
The experiment vary_group_weight_with_seeds corresponds to Figure 3. We have also uploaded the results of this experiment here. You can download this pickle file and place it in the notebooks folder before continuing to the next step.
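Before opening the aggregation notebook, it can help to confirm that the downloaded pickle is in place and loads. This is a minimal sketch: load_results is a hypothetical helper, the file name in the usage comment is a placeholder for whatever the download is called, and nothing is assumed about the structure of the stored object.

```python
import pickle
from pathlib import Path


def load_results(path):
    """Load a pickled results object, failing early if the file is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Expected a results pickle at {path}")
    with path.open("rb") as f:
        # Only unpickle files from a source you trust.
        return pickle.load(f)


# Usage (hypothetical file name; substitute the downloaded file):
# results = load_results("notebooks/results.pkl")
# print(type(results))
```

Run this inside the auc_bias environment so that any classes stored in the pickle can be imported.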
After an experiment has finished running, run notebooks/agg_results.ipynb to create Figures 3, 7, and 8.