- The required packages are installed as a new conda environment including both R and Python dependencies with the following command:

      conda create --name envbase -f requirements_conda.yml
- The missing R packages are listed in the "requirements_r.rda" file and can be installed with the following commands:

      # Load the saved `installed_packages` object, then install each entry
      load("requirements_r.rda")
      for (count in seq_along(installed_packages)) {
        install.packages(installed_packages[count])
      }

  ⚠️ For `reticulate`, if asked about a default Python virtual environment, the answer should be `no` so that the default conda environment is taken into consideration.
- The `permimp` package can be installed with the following command:

      install.packages('permimp', repos=NULL, type='source')
- The `sandbox` package can be installed from inside the "code_dcrt" folder with:
  * `python setup.py build_ext --inplace`
  * `pip install -e .`
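Before launching the experiments, it can help to confirm that the pieces named in the installation steps are in place. The following is a minimal sketch, not part of the repository: the helper name `check_setup` and the tools it looks for (`conda`, `Rscript`) are assumptions; only the two requirement file names come from the steps above.

```python
import shutil
from pathlib import Path

# File names taken from the installation steps; the tool list is an assumption.
REQUIRED_FILES = ("requirements_conda.yml", "requirements_r.rda")
REQUIRED_TOOLS = ("conda", "Rscript")


def check_setup(root="."):
    """Return a dict mapping each required file/tool to True if it was found."""
    root = Path(root)
    status = {name: (root / name).is_file() for name in REQUIRED_FILES}
    status.update({tool: shutil.which(tool) is not None for tool in REQUIRED_TOOLS})
    return status


if __name__ == "__main__":
    for name, ok in check_setup().items():
        print(f"{'ok' if ok else 'MISSING':8s}{name}")
```

Running it from the repository root prints one line per file or tool, which makes a missing prerequisite easy to spot before a long simulation is started.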
- For the first 3 experiments, `compute_simulations` is used along with `plot_simulations_all`:
  - Set `N_SIMULATIONS` to 1:100 to perform the 100 runs.
  - Set `N_CPU` according to the reserved resources (parallel) or to 1 (serial).
  - For the first experiment:
    - Set `DEBUG` to FALSE.
    - Uncomment both `permfit` and `cpi`.
    - Set `n_samples` to 300 and `n_features` to 100.
    - Uncomment all the `rho` values.
    - Set `prob_sim_data` to `regression_perm`.
    - In `stat_knockoff`, uncomment (`lasso_cv`).
    - The output csv file `simulation_results_blocks_100_Mi_dnn_dnn_py_300:100` is found in `results/results_csv`.
  - For the second experiment:
    - Set `DEBUG` to FALSE.
    - Uncomment both `permfit` and `cpi`.
    - Set `n_samples` to ``n_samples = `if`(!DEBUG, seq(100, 1000, by = 100), 10L)`` (comment line 84 and uncomment line 85).
    - Set `n_features` to 50.
    - In `prob_sim_data`, comment `regression_perm` and uncomment all the rest.
    - In `stat_knockoff`, uncomment (`lasso_cv`).
    - The output csv file `simulation_results_blocks_100_dnn_dnn_py_perm_100--1000` is found in `results/results_csv`.
  - For the third experiment:
    - Uncomment all methods.
    - Set `n_samples` to 1000 and `n_features` to 50.
    - In `prob_sim_data`, comment `regression_perm` and uncomment all the rest.
    - In `stat_knockoff`, uncomment (`lasso_cv`, `bart` and `deep`).
    - The output csv file `simulation_results_blocks_100_allMethods_pred_final` is found in `results/results_csv`.
  - For the fourth experiment, we move to the `ukbb` folder:
    - The data are the public UK Biobank data, whose use requires signing an agreement beforehand (all personal data have already been removed).
    - In the `process` scripts, change method to `permfit_dnn` or `cpi_dnn` to process the data and explore the importance of the variables using one of the methods.
    - The corresponding results per method are found in the `Results_variables` folder.
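For the second experiment above, the R expression `seq(100, 1000, by = 100)` yields ten sample sizes. As a rough sketch of the resulting workload, the grid can be enumerated in Python. The pairing of every run with every sample size is an assumption about how `compute_simulations` loops, not something the steps state.

```python
from itertools import product

# Python equivalent of R's seq(100, 1000, by = 100): both endpoints included.
n_samples_grid = list(range(100, 1001, 100))

# Python equivalent of R's 1:100 used for N_SIMULATIONS.
runs = list(range(1, 101))

# Assumed pairing: every run is repeated for every sample size.
grid = list(product(n_samples_grid, runs))

print(n_samples_grid)
print(len(grid), "(n_samples, run) pairs per method")
```

Under that assumption each method is fitted 1000 times, which is worth keeping in mind when choosing `N_CPU`.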
  - For section D:
    - Set `DEBUG` to FALSE.
    - Uncomment both `cpi` and `loco_dnn` (the last uncommented item shouldn't be followed by a comma).
    - Set `n_samples` to 1000, `n_features` to 50 and `rho` to 0.8.
    - In `prob_sim_data`, uncomment `regression`.
    - The output csv file `simulation_results_blocks_100_CPI_LOCO_DNN` is found in `results/results_csv`.
  - For section M:
    - We use `compute_simulations_py`.
    - Large scale simulation:
      - The script can be launched with the following command:

            python -u compute_simulations_py.py --n 10000 --p 50 --nsig 20 --nblocks 10 --intra 0.8 --conditional 1 --f 1 --s 100 --njobs 1

        - `--n` stands for the number of samples
        - `--p` stands for the number of variables
        - `--nsig` stands for the number of significant variables randomly chosen
        - `--nblocks` stands for the number of blocks/groups in the data structure
        - `--intra` stands for the intra correlation inside the groups
        - `--conditional` stands for the use of CPI (1) or PI (0)
        - `--f` stands for the first point of the range (Default `1`)
        - `--s` stands for the step-size i.e. range size (Default `100`)
        - `--njobs` stands for the serial/parallel implementation under `Joblib` (Default `1`)
      - The csv output file `simulation_results_blocks_100_n_10000_p_50_cpi_permfit` is found in `results/results_csv`.
    - UK Biobank semi-simulation:
      - The `filename` should be changed to the corresponding UKBB data (not publicly available).
      - The script can be launched with the following commands:

            python -u compute_simulations_py.py --nsig 115 --conditional 1 --f 1 --s 100 --njobs 1
            python -u compute_simulations_py.py --nsig 115 --conditional 0 --f 1 --s 100 --njobs 1

        - `--nsig` stands for the number of significant variables randomly chosen
        - `--conditional` stands for the use of CPI (1) or PI (0)
        - `--f` stands for the first point of the range (Default `1`)
        - `--s` stands for the step-size i.e. range size (Default `100`)
        - `--njobs` stands for the serial/parallel implementation under `Joblib` (Default `1`)
      - The csv output file `simulation_results_blocks_100_UKBB_single` is found in `results/results_csv`.
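The flags documented for section M can be summarised in code. The sketch below is not the actual parser of `compute_simulations_py.py`: it is a hypothetical `argparse` reconstruction from the flag descriptions, where the argument types and every default other than those of `--f`, `--s` and `--njobs` are assumptions. It also shows one way to split the 100 runs into smaller batches, assuming `--f` and `--s` select the runs `f, f+1, ..., f+s-1`.

```python
import argparse
import shlex


def build_parser():
    """Hypothetical parser mirroring the flags documented for section M."""
    parser = argparse.ArgumentParser(prog="compute_simulations_py.py")
    # Defaults of None are placeholders: the real defaults are not documented.
    parser.add_argument("--n", type=int, default=None, help="number of samples")
    parser.add_argument("--p", type=int, default=None, help="number of variables")
    parser.add_argument("--nsig", type=int, default=None,
                        help="number of significant variables randomly chosen")
    parser.add_argument("--nblocks", type=int, default=None,
                        help="number of blocks/groups in the data structure")
    parser.add_argument("--intra", type=float, default=None,
                        help="intra correlation inside the groups")
    parser.add_argument("--conditional", type=int, choices=(0, 1), default=None,
                        help="1 for CPI, 0 for PI")
    # These three defaults are the ones stated in the flag descriptions.
    parser.add_argument("--f", type=int, default=1, help="first point of the range")
    parser.add_argument("--s", type=int, default=100, help="range size")
    parser.add_argument("--njobs", type=int, default=1, help="Joblib workers")
    return parser


def batch_commands(total_runs, batch_size, extra_args):
    """Build one command per batch, assuming runs f..f+s-1 are executed."""
    commands = []
    for first in range(1, total_runs + 1, batch_size):
        size = min(batch_size, total_runs - first + 1)
        argv = ["python", "-u", "compute_simulations_py.py", *extra_args,
                "--f", str(first), "--s", str(size), "--njobs", "1"]
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    args = build_parser().parse_args(
        "--n 10000 --p 50 --nsig 20 --nblocks 10 --intra 0.8 --conditional 1".split()
    )
    print(args.n, args.f, args.s, args.njobs)
    for cmd in batch_commands(100, 25, ["--nsig", "115", "--conditional", "1"]):
        print(cmd)
```

Whether separate batches write to distinct output files is not covered here and would need to be checked in the script before running batches in parallel.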
  - For section N:
    - The Cam-CAN data is not publicly available, thus we provide the script `process_age_prediction_CamCAN` in order to compute the degree of significance for each frequency band.
    - The output csv file `Result_single_FREQ_all_imp_outer_10_inner` is found in `camcan`.
- We move to the `plot_simulations_all`:
  - For the first experiment with `simulation_results_blocks_100_Mi_dnn_dnn_py_300:100` as input:
    - Change `source` (at line 2) to `source("utils/plot_methods_all_Mi.R")`.
    - Set `nb_relevant` to 20 and `N_CPU` to the number of dedicated resources.
    - Set `run_plot_auc`, `run_plot_type1error`, `run_plot_power` and `run_time` one by one to TRUE.
    - Set `run_plot_combine` and `run_all_methods` to FALSE.
    - Uncomment (`Permfit-DNN` and `CPI-DNN`).
    - The output csv files `AUC_blocks_100_Mi_dnn_dnn_py_300:100`, `power_blocks_100_Mi_dnn_dnn_py_300:100`, `type1error_blocks_100_Mi_dnn_dnn_py_300:100` and `time_bars_blocks_100_Mi_dnn_dnn_py_300:100` are found in `results/results_csv`.
  - For the second experiment with `simulation_results_blocks_100_dnn_dnn_py_perm_100--1000` as input:
    - Change `source` (at line 2) to `source("utils/plot_methods_all_increasing_combine.R")`.
    - Set `nb_relevant` to 20 and `N_CPU` to the number of dedicated resources.
    - Set `run_plot_combine` to TRUE.
    - Set `run_all_methods` to FALSE.
    - Set `run_plot_auc`, `run_plot_type1error`, `run_plot_power` and `run_time` one by one to FALSE.
    - Uncomment (`Permfit-DNN` and `CPI-DNN`).
    - The output csv files `AUC_blocks_100_dnn_dnn_py_perm_100--1000` and `type1error_blocks_100_dnn_dnn_py_perm_100--1000` are found in `results/results_csv`.
  - For the third experiment with `simulation_results_blocks_100_allMethods_pred_final` as input:
    - Change `source` (at line 2) to `source("utils/plot_methods_all.R")`.
    - Set `nb_relevant` to 20 and `N_CPU` to the number of dedicated resources.
    - Set `run_plot_auc`, `run_plot_type1error`, `run_plot_power` one by one to TRUE.
    - Set `run_plot_combine` and `run_time` to FALSE.
    - Set `run_all_methods` and `with_pval` to TRUE.
    - Uncomment (`Marg`, `d0CRT`, `Permfit-DNN`, `CPI-DNN`, `CPI-RF`, `lazyvi`, `cpi_knockoff`, `loco` and `Strobl`).
    - The output csv files `AUC_blocks_100_allMethods_pred_imp_final_withPval`, `power_blocks_100_allMethods_pred_imp_final` and `type1error_blocks_100_allMethods_pred_imp_final` are found in `results/results_csv`.
- For the supplementary experiments:
  - For section D with `simulation_results_blocks_100_CPI_LOCO_DNN` as input:
    - Change `source` (at line 2) to `source("utils/plot_methods_all.R")`.
    - Set `nb_relevant` to 20 and `N_CPU` to the number of dedicated resources.
    - Set `run_plot_auc`, `run_plot_type1error`, `run_plot_power` and `run_time` one by one to TRUE.
    - Set `run_plot_combine` and `run_all_methods` to FALSE.
    - Uncomment (`LOCO-DNN` and `CPI-DNN`).
    - The output csv files `AUC_blocks_100_CPI_LOCO_DNN`, `power_blocks_100_CPI_LOCO_DNN`, `type1error_blocks_100_CPI_LOCO_DNN` and `time_bars_blocks_100_CPI_LOCO_DNN` are found in `results/results_csv`.
  - For section I with `simulation_results_blocks_100_allMethods_pred_final` as input:
    - Change `source` (at line 2) to `source("utils/plot_methods_all.R")`.
    - Set `nb_relevant` to 20 and `N_CPU` to the number of dedicated resources.
    - Set `run_plot_auc` to TRUE.
    - Set `run_plot_type1error`, `run_plot_power`, `run_time`, `run_plot_combine` and `with_pval` to FALSE.
    - Set `run_all_methods` to TRUE.
    - Uncomment (`Knockoff_bart`, `Knockoff_lasso`, `Shap`, `SAGE`, `MDI`, `BART`, `Knockoff_deep` and `Knockoff_path`).
    - The output csv file `AUC_blocks_100_allMethods_pred_imp_final_withoutPval` is found in `results/results_csv`.
  - For section K with `simulation_results_blocks_100_allMethods_pred_final` as input:
    - Change `source` (at line 2) to `source("utils/plot_methods_all.R")`.
    - Set `run_time` to TRUE and the rest to FALSE.
    - Uncomment all the methods.
    - The output csv file `time_bars_blocks_100_allMethods_pred_imp_final` is found in `results/results_csv`.
  - For section M:
    - Change `source` (at line 2) to `source("utils/plot_methods_all.R")`.
    - Set `run_plot_auc`, `run_plot_type1error`, `run_plot_power` and `run_time` one by one to TRUE.
    - Set `run_plot_combine`, `run_all_methods` and `with_pval` to FALSE.
    - Uncomment (`Permfit-DNN`, `CPI-DNN`).
    - Large scale simulation with `simulation_results_blocks_100_n_10000_p_50_cpi_permfit` as input:
      - Set `nb_relevant` to 20 and `N_CPU` to the number of dedicated resources.
      - The output csv files are found in `results/results_csv` under `[AUC-type1error-power-time_bars]_blocks_100_groups_CPI_n_10000_p_50_cpi_permfit`.
    - UK Biobank semi-simulation:
      - Set `nb_relevant` to 115 and `N_CPU` to the number of dedicated resources.
      - The output csv files are found in `results/results_csv` under `[AUC-type1error-power-time_bars]_blocks_100_UKBB_single`.
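The bracket notation `[AUC-type1error-power-time_bars]_blocks_...` used for the section M outputs stands for four files, one per prefix. Below is a small sketch that expands the pattern and reports which files are present. It assumes the files carry a `.csv` extension (the steps give only the base names), and the helper name `expected_outputs` is hypothetical.

```python
from pathlib import Path

PREFIXES = ("AUC", "type1error", "power", "time_bars")
# Suffixes as given for the two section M settings.
SUFFIXES = {
    "large_scale": "blocks_100_groups_CPI_n_10000_p_50_cpi_permfit",
    "ukbb": "blocks_100_UKBB_single",
}


def expected_outputs(setting, folder="results/results_csv", ext=".csv"):
    """Expand the bracket notation into one path per prefix (ext is assumed)."""
    suffix = SUFFIXES[setting]
    return [Path(folder) / f"{prefix}_{suffix}{ext}" for prefix in PREFIXES]


if __name__ == "__main__":
    for setting in SUFFIXES:
        for path in expected_outputs(setting):
            print("found  " if path.is_file() else "missing", path)
```

Passing `ext=""` checks for the bare names instead, in case the files are written without an extension.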
- We move to the `visualization` with 5 notebooks `plot_figure_simulations`, `plot_figure_simulations_2`, `plot_figure_simulations_3`, `plot_ukbb_results` and `plot_freqRes`:
  - `plot_figure_simulations` for the plots in the main text.
  - `plot_figure_simulations_2` and `plot_figure_simulations_3` for the plots in the supplement.
  - `plot_ukbb_results` for the plot of the fourth experiment.
  - `plot_freqRes` for the corresponding Cam-CAN plot.