
MLforHealth/sufficiency_label_bias


On Group Sufficiency Under Label Bias

Paper

If you use this code in your research, please cite our NeurIPS 2025 paper:

@inproceedings{zhang2025group,
  title={On Group Sufficiency Under Label Bias},
  author={Zhang, Haoran and Salaudeen, Olawale Elijah and Ghassemi, Marzyeh},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025}
}

Setting Up

Run the following commands to create the Conda environment:

cd sufficiency_label_bias
conda env create -f environment.yml
conda activate label_bias

Running Experiments

To train and evaluate a single model, run train.py with the appropriate arguments. For example:

python -m sufficiency_label_bias.train \
    --output_dir /output/dir \
    --dataset adult \
    --noise_type attr_label \
    --noise_mean 0.75 \
    --algorithm MI_Forward \
    --base_loss GCE \
    --balance_groups \
    --seed 0
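The example above trains with --base_loss GCE. Assuming GCE refers to the Generalized Cross Entropy loss of Zhang & Sabuncu (2018), the following minimal NumPy sketch shows what that loss computes. It is not taken from this repository: the function name gce_loss and the default q = 0.7 (the value suggested in the GCE paper) are assumptions, and the repository's own implementation and hyperparameters may differ.

```python
import numpy as np


def gce_loss(probs: np.ndarray, labels: np.ndarray, q: float = 0.7) -> float:
    """Generalized Cross Entropy: mean over examples of (1 - p_y**q) / q.

    probs:  (n, k) predicted class probabilities (each row sums to 1).
    labels: (n,) integer labels, possibly noisy.
    q:      in (0, 1]; q -> 0 recovers cross entropy, q = 1 gives MAE.
    Hypothetical helper for illustration; q = 0.7 is an assumed default.
    """
    if not 0 < q <= 1:
        raise ValueError("q must lie in (0, 1]")
    # Probability the model assigns to each example's observed label.
    p_y = probs[np.arange(len(labels)), labels]
    return float(np.mean((1.0 - p_y ** q) / q))


probs = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = np.array([0, 1])
print(gce_loss(probs, labels, q=1.0))  # MAE-like end of the family
print(gce_loss(probs, labels, q=1e-6))  # close to the cross-entropy value
```

Smaller q behaves more like cross entropy (fast fitting, less robust to label noise), while q near 1 behaves like MAE (more robust, slower to fit), which is why GCE is a common base loss under noisy labels.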

Here, for --algorithm, MI_Forward corresponds to CMI-Reg in the paper and MinSuf corresponds to Suf-Reg. For the synthetic noise setting --noise_type, attr_label corresponds to "group asymmetric" noise in the paper and attr_uniform to "group uniform" noise.
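To make the "group asymmetric" versus "group uniform" distinction concrete, here is a hypothetical sketch of group-dependent label flipping for binary labels. It is our illustration of the general idea only: the function flip_binary_labels, the (group, label) flip-rate dictionaries, and the specific rates are assumptions, and the exact noise model used by the repository (including how --noise_mean sets the rates) is defined in its code, not here.

```python
import numpy as np


def flip_binary_labels(y, group, flip_rates, rng):
    """Flip binary labels with a probability that depends on (group, true label).

    flip_rates maps (group, label) -> probability of flipping that example's label.
    Hypothetical helper for illustration; not the repository's noise model.
    """
    y = np.asarray(y)
    group = np.asarray(group)
    rates = np.array([flip_rates[(g, lbl)] for g, lbl in zip(group, y)], dtype=float)
    # rng.random draws from [0, 1), so a rate of 0 never flips and a rate of 1 always flips.
    flip = rng.random(len(y)) < rates
    return np.where(flip, 1 - y, y)


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)  # true labels
a = rng.integers(0, 2, size=1000)  # group attribute

# Asymmetric-style (assumed): in group 1, only positives are flipped to negative.
asym = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.25}
# Uniform-style (assumed): in group 1, both labels flip at the same rate.
unif = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.25, (1, 1): 0.25}

y_asym = flip_binary_labels(y, a, asym, rng)
y_unif = flip_binary_labels(y, a, unif, rng)
```

In both cases the noise depends on the group, so the observed labels are biased differently across groups; the difference is whether the flip rate also depends on the true label.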

To reproduce the paper's experiments, which train a grid of models over different hyperparameters, use sweep.py as follows:

python sweep.py launch \
    --experiment {experiment_name} \
    --output_dir {output_root} \
    --command_launcher {launcher} 

where:

  • experiment_name is the name of an experiment defined as a class in experiments.py.
  • output_root is the directory where experimental results will be stored.
  • launcher is a string naming a launcher defined in launchers.py (e.g., slurm or local).
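The sweep launcher expands an experiment's hyperparameter grid into one train.py invocation per configuration. The sketch below illustrates that pattern under stated assumptions: the class HypotheticalExperiment, its fixed/grid attributes, the get_hparams method, and the per-run output directory naming are all hypothetical and do not reproduce experiments.py or launchers.py. Only the train.py flags and values come from the example command in this README.

```python
import itertools
import shlex
import subprocess


class HypotheticalExperiment:
    """Illustrative stand-in for an experiment class (names and fields are hypothetical)."""

    fixed = {"dataset": "adult", "noise_type": "attr_label", "base_loss": "GCE"}
    grid = {
        "algorithm": ["MI_Forward", "MinSuf"],
        "noise_mean": [0.75],
        "seed": [0, 1, 2],
    }

    def get_hparams(self):
        # Cartesian product over every grid axis, merged with the fixed settings.
        keys = list(self.grid)
        for values in itertools.product(*(self.grid[k] for k in keys)):
            yield {**self.fixed, **dict(zip(keys, values))}


def build_commands(experiment, output_root):
    """Turn each hyperparameter setting into a shell command for train.py."""
    commands = []
    for i, hp in enumerate(experiment.get_hparams()):
        args = ["python", "-m", "sufficiency_label_bias.train",
                "--output_dir", f"{output_root}/{i}"]
        for key, value in hp.items():
            args += [f"--{key}", str(value)]
        commands.append(shlex.join(args))
    return commands


def local_launcher(commands):
    """Hypothetical 'local' launcher: run each command sequentially on this machine."""
    for cmd in commands:
        subprocess.run(cmd, shell=True, check=True)


cmds = build_commands(HypotheticalExperiment(), "/output/root")
```

A cluster launcher would submit the same command strings as batch jobs instead of running them in a loop; keeping command generation separate from launching is what lets one experiment definition run either locally or on a scheduler.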

If running experiments on Clothing1M or CivilComments, update add_data_path in experiments.py with the paths to the downloaded datasets.

Aggregating Results

Once an experiment has finished running, run notebooks/agg_results.ipynb to create the tables and figures. After the ERM models have finished training, the estimated transition matrices can be computed using notebooks/compute_transition_matrix.ipynb. This then enables running train.py with --transition_mat "estimated".
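For intuition about what an estimated transition matrix is and how it can be used in training, here is a sketch of one standard approach: the anchor-point estimator from a model trained on noisy labels (as in Patrini et al., 2017), followed by forward loss correction. This is not necessarily what compute_transition_matrix.ipynb does; the function names are hypothetical, and using the argmax as the anchor (rather than a high percentile) is a simplification. With group-dependent noise, one would presumably estimate a separate matrix per group by calling the estimator on each group's rows.

```python
import numpy as np


def estimate_transition_matrix(noisy_probs: np.ndarray) -> np.ndarray:
    """Anchor-point estimate of T[i, j] = P(noisy label = j | clean label = i).

    noisy_probs: (n, k) predicted P(noisy label | x) from a model (e.g. ERM)
    trained on noisy labels. For each class i, the example with the highest
    predicted probability of class i is assumed to truly belong to class i.
    """
    k = noisy_probs.shape[1]
    T = np.empty((k, k))
    for i in range(k):
        anchor = np.argmax(noisy_probs[:, i])
        T[i] = noisy_probs[anchor]
    return T


def forward_corrected_nll(clean_probs, T, noisy_labels):
    """Forward correction: push the model's clean posterior through T, then score noisy labels."""
    # P(noisy = j | x) = sum_i P(clean = i | x) * T[i, j]
    noisy_pred = clean_probs @ T
    p = noisy_pred[np.arange(len(noisy_labels)), noisy_labels]
    return float(-np.mean(np.log(np.clip(p, 1e-12, 1.0))))


noisy_probs = np.array([[0.9, 0.1], [0.3, 0.7], [0.6, 0.4]])
T = estimate_transition_matrix(noisy_probs)
loss = forward_corrected_nll(np.array([[1.0, 0.0]]), T, np.array([1]))
```

Because the correction is applied to the model's outputs rather than the labels, the model itself is still trained to predict the clean label; T only accounts for how clean labels are corrupted into the observed ones.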
