A CNN multi-label classification model that predicts the labels of each sample.
Multi-label stratification was used to split the dataset into a training set and a validation set, and model performance was estimated with focal loss on the validation set. A DenseNet121 serves as the backbone of the model. The outputs of GlobalMaxPool and GlobalAvgPool applied to the final CNN feature map were concatenated and then fed to two fully connected layers that compute the probability of each class.
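The pooling head described above can be illustrated without a deep-learning framework. This is a minimal sketch, not the repository's code: the feature map is a plain nested list of C channels, the max-then-average concatenation order and the ReLU between the two fully connected layers are assumptions, and `concat_pool`, `linear` and `head` are hypothetical stand-ins for the real PyTorch modules. A per-class sigmoid is used because in a multi-label setting each class probability is independent.

```python
import math
from typing import List

Matrix = List[List[float]]


def concat_pool(feature_map: List[Matrix]) -> List[float]:
    """Global max pooling and global average pooling per channel, concatenated.

    feature_map holds C channels, each an H x W matrix; the result has 2*C values.
    """
    maxes: List[float] = []
    means: List[float] = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        maxes.append(max(values))
        means.append(sum(values) / len(values))
    return maxes + means  # assumed order: max features first, then average features


def linear(x: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """Fully connected layer; weights has shape out_features x in_features."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def head(feature_map: List[Matrix], w1: Matrix, b1: List[float],
         w2: Matrix, b2: List[float]) -> List[float]:
    pooled = concat_pool(feature_map)
    hidden = [max(0.0, h) for h in linear(pooled, w1, b1)]  # ReLU (assumed)
    logits = linear(hidden, w2, b2)
    # Independent sigmoid per class: multi-label, so probabilities need not sum to 1.
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]
```

Concatenating both pooled vectors doubles the input width of the first fully connected layer (2·C instead of C), letting it see both the strongest local response and the overall average response of each channel.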
Image augmentation consisted of flipping, rotating by 90 degrees, and randomly cropping 1024 × 1024 pixel (px) patches. To improve the model's predictive power on the test set, multiple random crops were taken from each image and the maximum probability among them was used as the prediction.
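A sketch of the multi-crop test-time prediction, assuming images are nested lists and `predict` is any hypothetical callable that maps a crop to a list of per-class probabilities. Crop positions are drawn uniformly at random, and the class-wise maximum over all crops is returned, matching the maximum-probability rule described above.

```python
import random
from typing import Callable, List, Sequence

Image = List[List[float]]


def random_crop(image: Image, size: int, rng: random.Random) -> Image:
    """Cut a size x size patch at a uniformly random position."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - size)  # randint is inclusive on both ends
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]


def predict_multi_crop(image: Image,
                       predict: Callable[[Image], Sequence[float]],
                       size: int, n_crops: int, seed: int = 0) -> List[float]:
    """Class-wise maximum probability over n_crops random crops."""
    if n_crops < 1:
        raise ValueError("n_crops must be at least 1")
    rng = random.Random(seed)
    best: List[float] = []
    for i in range(n_crops):
        probs = list(predict(random_crop(image, size, rng)))
        best = probs if i == 0 else [max(a, b) for a, b in zip(best, probs)]
    return best
```

Taking the maximum rather than the mean favours recall: a label is kept if any crop shows strong evidence for it, which suits localized staining patterns that may appear in only part of the image.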
The model was trained with a combined loss of focal loss, Lovasz loss, and hard example log loss, and optimized with Adam using a step learning rate of [30, 15, 7.5, 3, 1] × 1e-5 for [25, 5, 5, 5, 5] epochs respectively (45 epochs in total). The output probabilities were thresholded using the ratio of each label in the training set.
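The focal-loss term and the step learning-rate schedule can be sketched as follows. This covers only the focal part of the combined loss (the Lovasz and hard example log loss terms are omitted) and assumes the common binary form with γ = 2 and no α weighting, averaged over all samples and classes; the repository's exact settings may differ. `step_lr` assumes 0-indexed epochs and reproduces the [30, 15, 7.5, 3, 1] × 1e-5 schedule over [25, 5, 5, 5, 5] epochs.

```python
import math
from typing import Sequence


def focal_loss(probs: Sequence[Sequence[float]], targets: Sequence[Sequence[int]],
               gamma: float = 2.0, eps: float = 1e-7) -> float:
    """Mean binary focal loss over all (sample, class) pairs.

    probs[i][c] is the predicted probability of class c for sample i;
    targets[i][c] is 1 if the label is present and 0 otherwise.
    """
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, targets):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            p_t = p if y == 1 else 1.0 - p    # probability assigned to the true outcome
            total += -((1.0 - p_t) ** gamma) * math.log(p_t)
            count += 1
    return total / count


def step_lr(epoch: int,
            lrs: Sequence[float] = (30, 15, 7.5, 3, 1),
            durations: Sequence[int] = (25, 5, 5, 5, 5),
            scale: float = 1e-5) -> float:
    """Learning rate for a 0-indexed epoch under the step schedule."""
    boundary = 0
    for lr, n_epochs in zip(lrs, durations):
        boundary += n_epochs
        if epoch < boundary:
            return lr * scale
    return lrs[-1] * scale  # keep the last rate after the schedule ends
```

With γ = 0 the focal term reduces to ordinary binary cross-entropy; larger γ shrinks the loss of examples that are already classified confidently, so training focuses on the hard ones.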
Using this method, the CNN reached a Macro F1 of 0.565 on the test set when predictions from 5 folds were averaged.
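One way to read thresholding by the training-set label ratio is to pick, for each class, the threshold that makes the fraction of predicted positives match that class's frequency in the training set. The sketch below assumes that interpretation and operates on probabilities that have already been averaged over the folds; `ratio_thresholds` and `apply_thresholds` are hypothetical names.

```python
from typing import List, Sequence


def ratio_thresholds(probs: Sequence[Sequence[float]],
                     train_ratios: Sequence[float]) -> List[float]:
    """Per-class threshold so that about ratio * N samples are predicted positive."""
    n = len(probs)
    thresholds: List[float] = []
    for c, ratio in enumerate(train_ratios):
        column = sorted((row[c] for row in probs), reverse=True)
        k = min(n, round(ratio * n))  # target number of positives for this class
        # Threshold at the k-th highest score; a zero target disables the class.
        thresholds.append(column[k - 1] if k > 0 else float("inf"))
    return thresholds


def apply_thresholds(probs: Sequence[Sequence[float]],
                     thresholds: Sequence[float]) -> List[List[int]]:
    """Binary label matrix: 1 where the probability reaches the class threshold."""
    return [[int(p >= t) for p, t in zip(row, thresholds)] for row in probs]
```

Tied scores at the threshold can yield slightly more positives than the target count.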
The original model source can be found here.
The trained model files can be found here.
-
The basic runtime environment is Python 3.6 and PyTorch 0.4.1; refer to requirements.txt to set up your environment.
-
Data processing
-
Go to subdirectory
cd src/data_process
Download v18 external data
python s1_download_hpa_v18.py
-
Resize TIFF images to 768 and 1536 px
python s2_resize_tif_image.py --dataset train --size 768
python s2_resize_tif_image.py --dataset test --size 768
python s2_resize_tif_image.py --dataset train --size 1536
python s2_resize_tif_image.py --dataset test --size 1536
-
Resize v18 external images to 512, 768 and 1536 px
python s3_resize_external_image.py --size 512
python s3_resize_external_image.py --size 768
python s3_resize_external_image.py --size 1536
-
Generate metadata
python s4_generate_meta.py
-
Search for matching samples between the training set and the v18 external data
python s5_train_match_external.py
-
Search for matching samples within the test set
python s6_test_match_test.py
-
Split training set and validation set
python s7_generate_split.py
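The split script itself is not shown here. As an illustration of multi-label stratification in general, the following is a simplified iterative stratification: samples carrying the rarest remaining label are assigned first, each to the fold that still needs that label most. Labels are assumed to be sets of integer class IDs, and `iterative_stratification` is a hypothetical helper, not the repository's implementation.

```python
from collections import Counter
from typing import List, Sequence, Set


def iterative_stratification(labels: Sequence[Set[int]], n_folds: int) -> List[int]:
    """Assign each sample to a fold, keeping label frequencies balanced across folds."""
    n = len(labels)
    fold_need = [n / n_folds] * n_folds  # samples each fold still wants
    totals = Counter(label for sample in labels for label in sample)
    label_need = [{label: count / n_folds for label, count in totals.items()}
                  for _ in range(n_folds)]  # per-fold, per-label demand
    assignment = [-1] * n
    remaining = set(range(n))

    while remaining:
        counts = Counter(label for i in remaining for label in labels[i])
        if not counts:
            # Only unlabeled samples are left: fill the emptiest folds.
            for i in sorted(remaining):
                fold = max(range(n_folds), key=lambda f: fold_need[f])
                assignment[i] = fold
                fold_need[fold] -= 1
            break
        # Rarest remaining label first (ties broken by label id).
        rare = min(counts, key=lambda label: (counts[label], label))
        for i in sorted(i for i in remaining if rare in labels[i]):
            fold = max(range(n_folds), key=lambda f: (label_need[f][rare], fold_need[f]))
            assignment[i] = fold
            fold_need[fold] -= 1
            for label in labels[i]:
                label_need[fold][label] -= 1
            remaining.discard(i)
    return assignment
```

Handling rare labels first matters because a random split can easily leave a fold with no examples of a class that has only a handful of samples.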
-
Calculate mean and std of images
python s8_generate_images_mean_std.py
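A minimal sketch of per-channel mean and standard deviation, assuming each image is a list of channels of H x W pixel values and that statistics are pooled over all pixels of all images (population standard deviation). The helper name `channel_mean_std` is hypothetical, and the pixel scale (0 to 255 or 0 to 1) used by the script is not specified here.

```python
import math
from typing import List, Sequence, Tuple


def channel_mean_std(images: Sequence[Sequence[Sequence[Sequence[float]]]]
                     ) -> Tuple[List[float], List[float]]:
    """Per-channel mean and population std over every pixel of every image."""
    n_channels = len(images[0])
    sums = [0.0] * n_channels
    sq_sums = [0.0] * n_channels
    counts = [0] * n_channels
    for image in images:
        for c, channel in enumerate(image):
            for row in channel:
                for v in row:
                    sums[c] += v
                    sq_sums[c] += v * v
                    counts[c] += 1
    means = [s / n for s, n in zip(sums, counts)]
    # Var = E[x^2] - E[x]^2, clamped at 0 to absorb rounding error.
    stds = [math.sqrt(max(q / n - m * m, 0.0)) for q, n, m in zip(sq_sums, counts, means)]
    return means, stds
```

The single-pass sum-of-squares formula is simple but can lose precision on very large datasets; Welford's algorithm is the usual remedy.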
-
Split training set and validation set for ArcFace models
python s9_generate_antibody_split.py
-
Correct wrong targets based on the XML file from the Kaggle forum
python s10_generate_correct_external_meta.py
-
Generate leak test set
python s11_generate_test_leak_meta.py
-
-
Training
-
Go to subdirectory
cd src/run
classification model
python train.py \
    --out_dir external_crop512_focal_slov_hardlog_class_densenet121_dropout_i768_aug2_5folds \
    --gpu_id 0,1,2,3 --arch class_densenet121_dropout --scheduler Adam55 --epochs 55 \
    --img_size 768 --crop_size 512 --batch_size 48 --split_name random_ext_folds5 --fold 0

python train.py \
    --out_dir external_crop1024_focal_slov_hardlog_clean_class_densenet121_large_dropout_i1536_aug2_5folds \
    --gpu_id 0,1,2,3 --arch class_densenet121_large_dropout --scheduler adam45 --epochs 45 \
    --img_size 1536 --crop_size 1024 --batch_size 36 --split_name random_ext_noleak_clean_folds5 --fold 0

python train.py \
    --out_dir external_crop512_focal_slov_hardlog_class_inceptionv3_dropout_i768_aug2_5folds \
    --gpu_id 0,1,2,3 --arch class_inceptionv3_dropout --scheduler adam45 --epochs 45 \
    --img_size 768 --crop_size 512 --batch_size 64 --split_name random_ext_noleak_clean_folds5 --fold 0

python train.py \
    --out_dir external_crop1024_focal_slov_hardlog_clean_class_resnet34_dropout_i1536_aug2_5folds \
    --gpu_id 0,1,2,3 --arch class_resnet34_dropout --scheduler adam45 --epochs 45 \
    --img_size 1536 --crop_size 1024 --batch_size 48 --split_name random_ext_noleak_clean_folds5 --fold 0
metric learning model
python train_ml.py \
    --out_dir face_all_class_resnet50_dropout_i768_aug2_5folds \
    --gpu_id 0,1,2,3 --arch class_resnet50_dropout --scheduler FaceAdam --epochs 50 \
    --img_size 768 --batch_size 32
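The split step earlier mentions ArcFace models. The core of ArcFace is an additive angular margin on the target-class logit; the sketch below computes those logits for a single embedding. The scale `s = 30` and margin `m = 0.5` are common defaults, not values taken from this repository, the helper names are hypothetical, and the fallback for θ + m ≥ π follows the widely used reference formulation.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_logits(embedding: Sequence[float], class_weights: Sequence[Sequence[float]],
                   target: int, s: float = 30.0, m: float = 0.5) -> List[float]:
    """Scaled cosine logits with an additive angular margin m on the target class."""
    e = l2_normalize(embedding)
    logits: List[float] = []
    for k, w in enumerate(class_weights):
        cos_t = sum(a * b for a, b in zip(e, l2_normalize(w)))
        cos_t = max(-1.0, min(1.0, cos_t))  # guard acos against rounding error
        if k == target:
            theta = math.acos(cos_t)
            if theta + m < math.pi:
                logits.append(s * math.cos(theta + m))
            else:
                # Keep the logit monotonic in theta when theta + m would pass pi.
                logits.append(s * (cos_t - math.sin(math.pi - m) * m))
        else:
            logits.append(s * cos_t)
    return logits
```

These logits would then go into an ordinary softmax cross-entropy loss. Because the margin makes the target class harder to win, embeddings of the same class are pulled into tighter angular clusters, which is what makes them useful for the similarity search in post processing.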
-
-
Predicting
-
Go to subdirectory
cd src/run
classification model
python test.py \
    --out_dir external_crop512_focal_slov_hardlog_class_densenet121_dropout_i768_aug2_5folds \
    --gpu_id 0 --arch class_densenet121_dropout \
    --img_size 768 --crop_size 512 --seeds 0,1,2,3 --batch_size 12 --fold 0 \
    --augment default,flipud,fliplr,transpose,flipud_lr,flipud_transpose,fliplr_transpose,flipud_lr_transpose

python test.py \
    --out_dir external_crop1024_focal_slov_hardlog_clean_class_densenet121_large_dropout_i1536_aug2_5folds \
    --gpu_id 0 --arch class_densenet121_large_dropout \
    --img_size 1536 --crop_size 1024 --seeds 0,1,2,3 --batch_size 8 --fold 0 \
    --augment default,flipud,fliplr,transpose,flipud_lr,flipud_transpose,fliplr_transpose,flipud_lr_transpose

python test.py \
    --out_dir external_crop512_focal_slov_hardlog_class_inceptionv3_dropout_i768_aug2_5folds \
    --gpu_id 0 --arch class_inceptionv3_dropout \
    --img_size 768 --crop_size 512 --seeds 0,1,2,3 --batch_size 24 --fold 0 \
    --augment default,flipud,fliplr,transpose,flipud_lr,flipud_transpose,fliplr_transpose,flipud_lr_transpose

python test.py \
    --out_dir external_crop1024_focal_slov_hardlog_clean_class_resnet34_dropout_i1536_aug2_5folds \
    --gpu_id 0 --arch class_resnet34_dropout \
    --img_size 1536 --crop_size 1024 --seeds 0,1,2,3 --batch_size 12 --fold 0 \
    --augment default,flipud,fliplr,transpose,flipud_lr,flipud_transpose,fliplr_transpose,flipud_lr_transpose
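The eight `--augment` names cover the symmetries of a square image: the identity, flips, the transpose, and their compositions. The mapping below from names to operations is an assumption inferred from the names (for example, `flipud_lr` is read as a vertical flip followed by a horizontal flip); images are plain nested lists and `AUGMENTS` is a hypothetical lookup table.

```python
from typing import Callable, Dict, List

Image = List[List[int]]


def flipud(img: Image) -> Image:
    return img[::-1]  # reverse row order (vertical flip)


def fliplr(img: Image) -> Image:
    return [row[::-1] for row in img]  # reverse each row (horizontal flip)


def transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]  # swap rows and columns


def compose(*fns: Callable[[Image], Image]) -> Callable[[Image], Image]:
    """Apply fns left to right; with no arguments this is the identity."""
    def run(img: Image) -> Image:
        for fn in fns:
            img = fn(img)
        return img
    return run


AUGMENTS: Dict[str, Callable[[Image], Image]] = {
    "default": compose(),
    "flipud": flipud,
    "fliplr": fliplr,
    "transpose": transpose,
    "flipud_lr": compose(flipud, fliplr),
    "flipud_transpose": compose(flipud, transpose),
    "fliplr_transpose": compose(fliplr, transpose),
    "flipud_lr_transpose": compose(flipud, fliplr, transpose),
}
```

Because the targets are image-level labels that do not change under these transforms, the predictions from each augmented view can be combined directly without inverting the transform.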
metric learning model
python test_ml.py \
    --out_dir face_all_class_resnet50_dropout_i768_aug2_5folds \
    --gpu_id 0,1,2,3 --arch class_resnet50_dropout \
    --img_size 768 --batch_size 32 --dataset test --predict_epoch 45
-
-
Ensemble
-
Go to subdirectory
cd src/ensemble
Make ensemble
python ensemble_augment.py \
    --fold 0 --epoch_name final \
    --model_name external_crop512_focal_slov_hardlog_class_densenet121_dropout_i768_aug2_5folds \
    --augments default,flipud,fliplr,transpose,flipud_lr,flipud_transpose,fliplr_transpose,flipud_lr_transpose \
    --do_valid 0 --do_test 1 --update 1 --seeds 0,1,2,3 --ensemble_type maximum

python ensemble_folds.py \
    --en_cfgs external_crop512_focal_slov_hardlog_class_densenet121_dropout_i768_aug2_5folds \
    --do_valid 1 --do_test 1 --update 1
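The two ensembling steps can be sketched as element-wise reductions over prediction matrices (samples × classes): first a maximum across augmentations and seeds, matching `--ensemble_type maximum`, then an average across folds, as in the 5-fold averaging mentioned at the top. The helper names are hypothetical and the scripts' file handling is omitted.

```python
from typing import Callable, List, Sequence

Matrix = Sequence[Sequence[float]]


def reduce_elementwise(reduce: Callable[[Sequence[float]], float],
                       matrices: Sequence[Matrix]) -> List[List[float]]:
    """Combine same-shaped (samples x classes) matrices value by value."""
    return [[reduce(values) for values in zip(*rows)] for rows in zip(*matrices)]


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def ensemble(per_fold_views: Sequence[Sequence[Matrix]]) -> List[List[float]]:
    """per_fold_views[f] holds one prediction matrix per (augment, seed) of fold f."""
    fold_preds = [reduce_elementwise(max, views) for views in per_fold_views]
    return reduce_elementwise(mean, fold_preds)
```

Maximum within a fold keeps any label that one view detects strongly, while averaging across folds smooths out the variance between independently trained models.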
-
-
Post processing
-
Go to subdirectory
cd src/post_processing
Search for the most similar samples using the metric learning model
python s1_calculate_distance.py \
    --model_name face_all_class_resnet50_dropout_i768_aug2_5folds --epoch_name 045 \
    --do_valid 0 --do_test 1
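A sketch of the similarity search, assuming each sample is represented by an embedding vector from the metric learning model and that similarity is measured by cosine similarity; the script's name suggests it computes distances, so the actual metric may differ. For every query, the index and score of the most similar reference sample are returned. `cosine_similarity` and `most_similar` are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(queries: Sequence[Sequence[float]],
                 references: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """For each query embedding, (index, similarity) of the closest reference."""
    results = []
    for q in queries:
        sims = [cosine_similarity(q, r) for r in references]
        best = max(range(len(sims)), key=sims.__getitem__)
        results.append((best, sims[best]))
    return results
```

This brute-force search costs one comparison per (query, reference) pair; for large reference sets, normalizing all embeddings once and using a matrix product is the usual speed-up.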
Modify submissions
python s2_modify_result.py \
    --model_name external_crop1024_focal_slov_hardlog_clean_class_densenet121_large_dropout_i1536_aug2_5folds \
    --face_model_name face_all_class_resnet50_dropout_i768_aug2_5folds \
    --out_name d121_i1536_aug2_maximum_5folds_f012_max_test_ratio2_face_r50_i768 --threshold 0.65
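The exact rule applied by `s2_modify_result.py` is not documented here. One plausible reading, shown purely as a hypothetical sketch, is that a test sample whose nearest reference sample is similar enough (similarity at or above `--threshold`, 0.65 in the command) takes that reference sample's labels. The `Id,Predicted` row format with space-separated label IDs is also an assumption about the submission file.

```python
from typing import List, Sequence, Set, Tuple


def modify_predictions(pred_labels: Sequence[Set[int]],
                       matches: Sequence[Tuple[int, float]],
                       ref_labels: Sequence[Set[int]],
                       threshold: float = 0.65) -> List[Set[int]]:
    """Replace a prediction with its nearest reference's labels when similar enough.

    matches[i] is (reference index, similarity) for test sample i.
    """
    out: List[Set[int]] = []
    for labels, (ref_idx, sim) in zip(pred_labels, matches):
        out.append(set(ref_labels[ref_idx]) if sim >= threshold else set(labels))
    return out


def to_submission_row(sample_id: str, labels: Set[int]) -> str:
    """Format one row as 'Id,label label ...' (assumed submission format)."""
    return f"{sample_id},{' '.join(str(label) for label in sorted(labels))}"
```

Samples below the threshold keep the CNN ensemble's prediction, so the threshold trades off how often the metric learning match is trusted over the classifier.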
-