Reproduction of Variational Fair Clustering

This is the codebase for our Machine Learning Reproducibility Challenge (MLRC) reproduction of the paper Variational Fair Clustering.

Requirements

Install Anaconda: https://www.anaconda.com/distribution/
The code has been tested with Python 3.6. Refer to the Getting started section for more details.
Download the Bank, Adult, Census II, Student, and Drugnet datasets and place the files of each dataset in data/[Dataset], i.e. a directory named after the dataset with a capital first letter (e.g. data/Bank).
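A minimal Python sketch for preparing this layout and checking which dataset directories are still empty. It assumes the root is `data/` and that every directory name is the capitalised dataset name; the spelling `CensusII` is an assumption borrowed from the `-d` option, and `prepare_data_dirs` / `empty_data_dirs` are hypothetical helpers, not part of the repository. Check the repository's data loaders for the exact names and files they expect.

```python
from pathlib import Path

# Hypothetical directory names: capitalised dataset names as described above.
# "CensusII" is an assumption based on the -d option of test_fair_clustering.py.
DATASETS = ["Bank", "Adult", "CensusII", "Student", "Drugnet"]


def prepare_data_dirs(root="data", datasets=DATASETS):
    """Create <root>/<Dataset> for every dataset and return the paths."""
    root = Path(root)
    paths = []
    for name in datasets:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        paths.append(path)
    return paths


def empty_data_dirs(root="data", datasets=DATASETS):
    """Return the datasets whose directory is missing or contains no files yet."""
    root = Path(root)
    return [
        name
        for name in datasets
        # is_dir() is checked first, so iterdir() is only called on real directories
        if not (root / name).is_dir() or not any((root / name).iterdir())
    ]
```

Run `prepare_data_dirs()` once from the repository root, copy the downloaded files in, and `empty_data_dirs()` should then return an empty list.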

Getting started

Clone the repository

Create the environment necessary for running the experiments. Choose the command according to your operating system:

Linux and MacOS

conda env create -f linux_macOS_fact_env.yaml

Windows

conda env create -f windowsOS_fact_env.yaml
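The two commands differ only in the environment file. If you script the setup, a minimal Python sketch for picking the file from `sys.platform` could look as follows; the helper names are hypothetical, and treating every non-Windows platform like Linux/macOS is an assumption.

```python
import sys

# Environment files shipped with the repository, keyed by operating system family.
ENV_FILES = {
    "windows": "windowsOS_fact_env.yaml",
    "unix": "linux_macOS_fact_env.yaml",  # Linux and macOS share one file
}


def env_file_for(platform_name=sys.platform):
    """Return the conda environment file for a sys.platform string."""
    # sys.platform is "win32" on Windows, "linux" on Linux and "darwin" on macOS.
    if platform_name.startswith("win"):
        return ENV_FILES["windows"]
    # Assumption: any other platform uses the Linux/macOS file.
    return ENV_FILES["unix"]


def create_env_command(platform_name=sys.platform):
    """Build the `conda env create` command as an argument list."""
    return ["conda", "env", "create", "-f", env_file_for(platform_name)]
```

For example, `subprocess.run(create_env_command(), check=True)` creates the environment for the current machine.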

Usage of the environment

To activate the environment, use:

conda activate fact_vfc

To deactivate the environment, use:

conda deactivate

Running the experiments

Running the entire main.ipynb notebook without any changes displays our results. To reproduce the results yourself, first rename the existing "outputs" directory: running the entire notebook will then create a new "outputs" directory and fill it with freshly computed results.
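Renaming can be done by hand, but a minimal Python sketch that keeps the shipped results under a new name might look like this. It assumes it is run from the repository root; `archive_outputs` and the `outputs_<suffix>` naming scheme are hypothetical choices, not something the notebook relies on.

```python
from datetime import datetime
from pathlib import Path


def archive_outputs(root=".", suffix=None):
    """Rename <root>/outputs so that main.ipynb regenerates a fresh outputs/.

    Returns the new path of the archived directory, or None if there was
    no outputs directory to rename.
    """
    outputs = Path(root) / "outputs"
    if not outputs.is_dir():
        return None
    # Hypothetical naming scheme: outputs_<suffix>, timestamped by default.
    suffix = suffix or datetime.now().strftime("%Y%m%d-%H%M%S")
    target = outputs.with_name(f"outputs_{suffix}")
    if target.exists():
        # Refuse to overwrite an earlier archive.
        raise FileExistsError(target)
    outputs.rename(target)
    return target
```

After `archive_outputs(suffix="original")`, rerunning the whole notebook writes new results to `outputs/`, while the original ones stay available in `outputs_original/` for comparison.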

New experiments can also be conducted using the test_fair_clustering.py file. The usage of the file is specified as follows:

test_fair_clustering.py [--seed SEED] [-d DATASET]
                        [--cluster_option CLUSTER_OPTION]
                        [--kernel_type KERNEL_TYPE]
                        [--kernel_args KERNEL_ARGS]
                        [--lmbda LMBDA] [--lmbda-tune LMBDA-TUNE]
                        [--L L] [--data_dir DATA_DIR]
                        [--output_path OUTPUT_PATH]
                        [--plot_option_clusters_vs_lambda PLOT_OPTION_CLUSTERS_VS_LAMBDA]
                        [--plot_option_fairness_vs_clusterE PLOT_OPTION_FAIRNESS_VS_CLUSTERE]
                        [--plot_option_balance_vs_clusterE PLOT_OPTION_BALANCE_VS_CLUSTERE]
                        [--plot_option_convergence PLOT_OPTION_CONVERGENCE]
                        [--plot_bound_update PLOT_BOUND_UPDATE]
                        [--bera BERA]

optional arguments:
  --seed SEED       Fixed seed to initialise clusters
  -d DATASET        Name of the dataset to be used: Synthetic, Synthetic-unequal, Adult, Bank, CensusII
  --cluster_option CLUSTER_OPTION
                    Name of the cluster algorithm to be used: kmedian, kmean, ncut, kernel
  --kernel_type KERNEL_TYPE
                    Name of the kernel type to be used: poly, rad, tanh
  --kernel_args KERNEL_ARGS
                    Arguments to be used within the kernel function: x_y (where x and y are floats)
  --lmbda LMBDA     Initial lambda value
  --lmbda-tune LMBDA-TUNE
                    Whether lambda is tuned during clustering
  --L L             Lipschitz constant in the bound update
  --data_dir DATA_DIR
                    Directory from which the datasets are loaded
  --output_path OUTPUT_PATH
                    Path where the outputs are stored
  --plot_option_clusters_vs_lambda PLOT_OPTION_CLUSTERS_VS_LAMBDA
                    Plot clusters in 2D w.r.t. lambda
  --plot_option_fairness_vs_clusterE PLOT_OPTION_FAIRNESS_VS_CLUSTERE
                    Plot the original clustering energy w.r.t. fairness
  --plot_option_balance_vs_clusterE PLOT_OPTION_BALANCE_VS_CLUSTERE
                    Plot the original clustering energy w.r.t. balance
  --plot_option_convergence PLOT_OPTION_CONVERGENCE
                    Plot convergence of the fair clustering energy
  --plot_bound_update PLOT_BOUND_UPDATE
                    Plot a single bound update
  --bera BERA       Whether the results of Bera et al. should be loaded and converted to the metrics of Ziko et al.
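Two of these options carry structured text: `--kernel_args` packs two floats into one `x_y` string, and the example run passes `--lmbda-tune False` as a word. A minimal argparse sketch of how such values can be parsed, assuming the `x_y` format described above; `parse_kernel_args`, `str2bool` and `build_parser` are hypothetical helpers covering only a few options, and the real script's defaults are not reproduced. One pitfall worth checking in any parser for these flags: with `type=bool`, argparse turns every non-empty string, including `"False"`, into `True`.

```python
import argparse


def parse_kernel_args(text):
    """Split an 'x_y' string such as '1_2.5' into a pair of floats."""
    parts = text.split("_")
    if len(parts) != 2:
        raise argparse.ArgumentTypeError(f"expected x_y, got {text!r}")
    try:
        return float(parts[0]), float(parts[1])
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"x and y must be floats, got {text!r}") from None


def str2bool(text):
    """Map textual booleans ('True', 'false', '1', ...) to bool."""
    value = text.strip().lower()
    if value in ("true", "t", "yes", "1"):
        return True
    if value in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def build_parser():
    """Partial, hypothetical parser mirroring some documented options."""
    parser = argparse.ArgumentParser(prog="test_fair_clustering.py")
    parser.add_argument("-d", dest="dataset")
    parser.add_argument("--cluster_option",
                        choices=["kmedian", "kmean", "ncut", "kernel"])
    parser.add_argument("--kernel_type", choices=["poly", "rad", "tanh"])
    parser.add_argument("--kernel_args", type=parse_kernel_args)
    parser.add_argument("--lmbda", type=float)
    # str2bool instead of type=bool, so that "False" really becomes False
    parser.add_argument("--lmbda-tune", dest="lmbda_tune", type=str2bool)
    return parser
```

With this parser, `--kernel_type poly --kernel_args 2_0.5` yields `kernel_args == (2.0, 0.5)`, and `--lmbda-tune False` yields `lmbda_tune is False`.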

To view the notebook with our experimental results, run:

jupyter notebook main.ipynb

Example run

$ python test_fair_clustering.py -d Synthetic --cluster_option kmedian --lmbda 10 --lmbda-tune False
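To run several experiments in one go, this example can be generalised into a parameter sweep. A minimal Python sketch, assuming it is executed from the repository root with the `fact_vfc` environment active; `sweep_commands` and `run_sweep` are hypothetical helpers, only flags from the usage above are passed, and the values are placeholders rather than the settings used in our experiments.

```python
import itertools
import subprocess
import sys


def sweep_commands(datasets, cluster_options, lmbdas, lmbda_tune=False):
    """Build one test_fair_clustering.py command per parameter combination."""
    commands = []
    for dataset, option, lmbda in itertools.product(datasets, cluster_options, lmbdas):
        commands.append([
            sys.executable, "test_fair_clustering.py",  # same interpreter as the caller
            "-d", dataset,
            "--cluster_option", option,
            "--lmbda", str(lmbda),
            "--lmbda-tune", str(lmbda_tune),  # str(False) == "False", as in the example run
        ])
    return commands


def run_sweep(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run_sweep(sweep_commands(["Synthetic"], ["kmedian", "ncut"], [1, 10]))` runs four configurations. Building the argument lists separately from running them makes it easy to print and inspect a sweep before launching it.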

MarkiemarkF/FACT
