If you use this code in your research, please cite our NeurIPS 2025 paper:
@inproceedings{zhang2025group,
title={On Group Sufficiency Under Label Bias},
author={Zhang, Haoran and Salaudeen, Olawale Elijah and Ghassemi, Marzyeh},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025}
}
Run the following commands to create the Conda environment:
cd sufficiency_label_bias
conda env create -f environment.yml
conda activate label_bias
To run a single evaluation, call train.py with the appropriate arguments, for example:
python -m sufficiency_label_bias.train \
--output_dir /output/dir \
--dataset adult \
--noise_type attr_label \
--noise_mean 0.75 \
--algorithm MI_Forward \
--base_loss GCE \
--balance_groups \
--seed 0
Here, for --algorithm, MI_Forward corresponds to CMI-Reg in the paper and MinSuf corresponds to Suf-Reg. For the synthetic noise setting (--noise_type), attr_label corresponds to the "group asymmetric" setting in the paper and attr_uniform to the "group uniform" setting.
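To launch several seeds of the same configuration without retyping the command, a small Python wrapper can assemble the arguments from the example above. This is a minimal sketch, not part of the repository. The helper names `build_train_command` and `launch`, and the `PAPER_TO_CLI` lookup, are hypothetical, and only flags that appear in the example command are used. Writing each seed to its own output subdirectory is also an assumption, included in case train.py does not separate runs by seed.

```python
import shlex
import subprocess
import sys

# Hypothetical lookup from the names used in the paper to the values
# train.py expects, following the mapping described in this README.
PAPER_TO_CLI = {
    "CMI-Reg": "MI_Forward",
    "Suf-Reg": "MinSuf",
    "group asymmetric": "attr_label",
    "group uniform": "attr_uniform",
}


def build_train_command(output_dir, dataset, noise_type, noise_mean,
                        algorithm, base_loss, seed, balance_groups=True):
    """Return the argument list for one train.py run (hypothetical helper)."""
    cmd = [
        sys.executable, "-m", "sufficiency_label_bias.train",
        "--output_dir", output_dir,
        "--dataset", dataset,
        # Accept either the paper name or the CLI value.
        "--noise_type", PAPER_TO_CLI.get(noise_type, noise_type),
        "--noise_mean", str(noise_mean),
        "--algorithm", PAPER_TO_CLI.get(algorithm, algorithm),
        "--base_loss", base_loss,
    ]
    if balance_groups:
        cmd.append("--balance_groups")  # boolean flag, takes no value
    cmd += ["--seed", str(seed)]
    return cmd


def launch(cmd, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises if the run exits non-zero


if __name__ == "__main__":
    for seed in range(3):
        launch(build_train_command(f"/output/dir/seed{seed}", "adult",
                                   "group asymmetric", 0.75, "CMI-Reg",
                                   "GCE", seed))
```

The default `dry_run=True` only prints the commands, so they can be checked before anything is trained; pass `dry_run=False` to actually run them one after another.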
To reproduce the experiments in the paper, which involve training a grid of models over different hyperparameters, use sweep.py as follows:
python sweep.py launch \
--experiment {experiment_name} \
--output_dir {output_root} \
--command_launcher {launcher}
where:
- experiment_name corresponds to experiments defined as classes in experiments.py.
- output_root is a directory where experimental results will be stored.
- launcher is a string corresponding to a launcher defined in launchers.py (i.e. slurm or local).
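When several sweeps need to be queued, the sweep.py launch command can be assembled programmatically in the same way. The sketch below assumes the valid launcher names are exactly slurm and local, as listed for launchers.py; the helper `build_sweep_command` and the experiment name "MyExperiment" are hypothetical placeholders, not names taken from experiments.py.

```python
import shlex
import sys

# Launchers named in this README; launchers.py is the authoritative source.
KNOWN_LAUNCHERS = {"slurm", "local"}


def build_sweep_command(experiment_name, output_root, launcher="local"):
    """Return the argument list for `python sweep.py launch` (hypothetical helper)."""
    if launcher not in KNOWN_LAUNCHERS:
        raise ValueError(f"unknown launcher {launcher!r}; "
                         f"expected one of {sorted(KNOWN_LAUNCHERS)}")
    return [
        sys.executable, "sweep.py", "launch",
        "--experiment", experiment_name,
        "--output_dir", output_root,
        "--command_launcher", launcher,
    ]


if __name__ == "__main__":
    # "MyExperiment" is a placeholder; substitute a class name from experiments.py.
    print(shlex.join(build_sweep_command("MyExperiment", "/output/root", "slurm")))
```

Checking the launcher name up front surfaces a typo before any jobs are submitted, rather than after sweep.py starts.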
If you are running experiments on Clothing1M or CivilComments, first update add_data_path in experiments.py with the paths to the downloaded datasets.
After an experiment has finished running, run notebooks/agg_results.ipynb to create tables and figures. Once the ERM models have finished training, the estimated transition matrices can be computed using notebooks/compute_transition_matrix.ipynb, which then enables running train.py with --transition_mat "estimated".
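On a machine without a browser (for example, a cluster login node), the two notebooks can be executed headlessly with Jupyter's nbconvert. This sketch assumes Jupyter is installed in the label_bias environment and that the notebooks run without manual edits; whether they need paths changed inside them first is not covered here. The helpers `notebook_command` and `run_notebook` are hypothetical.

```python
import shlex
import subprocess

# Notebook paths as given in this README, relative to the repository root.
NOTEBOOKS = {
    "transition_matrix": "notebooks/compute_transition_matrix.ipynb",
    "aggregate": "notebooks/agg_results.ipynb",
}


def notebook_command(path):
    """Build a command that executes a notebook and saves its outputs in place."""
    return ["jupyter", "nbconvert", "--to", "notebook",
            "--execute", "--inplace", path]


def run_notebook(key, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    cmd = notebook_command(NOTEBOOKS[key])
    print(shlex.join(cmd))
    if not dry_run:
        # Raises CalledProcessError if jupyter exits with a non-zero status.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print both commands instead of executing them.
    run_notebook("transition_matrix")
    run_notebook("aggregate")
```

The transition-matrix notebook should only be run once the ERM models have finished training, since its output is what --transition_mat "estimated" relies on.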