Clone this repo.
```bash
git clone https://github.com/jose13579/variable-hyperparameter-image-impainting.git
cd variable-hyperparameter-image-impainting/
```
Our project is built on PyTorch and standard Python libraries. To train and test it, we suggest creating a Conda environment from the provided YAML, e.g.
```bash
conda env create -f environment.yml
conda activate vhii
```
Alternatively, use the Dockerfile to install Conda and the prerequisites, e.g.
```bash
docker build -t vhii-mage .
```
If you have issues installing the environment, we suggest the following steps:
- Remove cupy from environment.yml.
- Create the Conda environment.
- Manually install cupy with the correct version:
```bash
conda install -c conda-forge cupy==7.7.0
```
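Once the environment is ready, a quick import check confirms the installation. This is a minimal sketch of ours (not part of the repo), assuming the environment provides torch and cupy:

```python
# check_env.py -- minimal sanity check for the vhii environment (our sketch,
# not part of the repo; assumes the YAML installs torch and cupy).
import torch
import cupy

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CuPy:", cupy.__version__)  # should print 7.7.0 after the manual install
```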
Next, we prepare the image and mask datasets.
Preparing the Places365 Dataset. The dataset can be downloaded from here. The training set has approximately 1.8 million images from 365 scene categories, with at most 5,000 images per category. We adopt the Places365-Standard dataset (small-image 256 × 256 version) to train and validate our proposed method. The dataset should be arranged in the following directory structure:
```
datasets
 |- places365
     |- data_256
         |- a
             |- airport_terminal
                 |- <image_id>.jpg
                 |- <image_id>.jpg
             |- airplane_cabin
                 |- <image_id>.jpg
                 |- <image_id>.jpg
             |- ...
         |- ...
     |- val_256
         |- <image_id>.jpg
         |- <image_id>.jpg
```
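As a quick sanity check that the layout matches the tree above, a small script of ours (hypothetical, not part of the repo) can count the images per split:

```python
# check_places365.py -- our sketch to verify the Places365 layout shown above.
from pathlib import Path

root = Path("datasets/places365")
n_train = sum(1 for _ in (root / "data_256").rglob("*.jpg"))  # walks letter/category dirs
n_val = sum(1 for _ in (root / "val_256").glob("*.jpg"))
print(f"train images: {n_train}")  # ~1.8 million expected
print(f"val images:   {n_val}")
```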
Preparing the CelebA Dataset. The dataset can be downloaded from here. This dataset contains more than 200,000 celebrity face images. We adopt 162,770 images for training and 19,961 for testing. The dataset should be arranged in the following directory structure:
```
datasets
 |- celeba_dataset
     |- train
         |- <image_id>.jpg
         |- <image_id>.jpg
     |- val
         |- <image_id>.jpg
         |- <image_id>.jpg
     |- test
         |- <image_id>.jpg
         |- <image_id>.jpg
```
Preparing the Paris Street View (PSV) Dataset. The dataset can be downloaded from here. The training and test sets include 14,900 and 100 images, respectively. This dataset was collected from street views of Paris and captures a large number of buildings and structural elements, such as windows and doors. The dataset should be arranged in the following directory structure:
```
datasets
 |- psv_dataset
     |- train
         |- <image_id>.jpg
         |- <image_id>.jpg
     |- test
         |- <image_id>.jpg
         |- <image_id>.jpg
```
Preparing the Mask Dataset. The dataset can be downloaded from here. This mask dataset contains 12,000 irregular masks grouped into six intervals according to the ratio of mask area to total image size, with 2,000 masks per interval. We employ three intervals (20-30%, 30-40%, and 40-50%) for testing. The dataset should be arranged in the following directory structure:
```
datasets
 |- test_mask
     |- mask
         |- testing_mask_dataset
             |- 10-20
                 |- <mask_id>.png
                 |- <mask_id>.png
             |- 20-30
                 |- <mask_id>.png
                 |- <mask_id>.png
             |- ...
```
Alternatively, you can use the dataset already grouped into one directory per interval here.
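The interval names refer to the fraction of masked pixels in each mask. The following sketch (ours, assuming the masks are binary PNGs whose nonzero pixels mark the hole) computes that ratio for one mask, which should fall inside its directory's interval:

```python
# mask_ratio.py -- our sketch: compute the masked-area ratio that defines
# the 20-30%, 30-40%, ... intervals (assumes nonzero pixels mark the hole).
import numpy as np
from PIL import Image

mask_path = "datasets/test_mask/mask/testing_mask_dataset/20-30/..."  # replace with an actual mask file
mask = np.array(Image.open(mask_path).convert("L"))
ratio = (mask > 0).mean()           # fraction of masked pixels
print(f"masked area: {ratio:.1%}")  # should lie within 20-30% here
```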
The model trained on Places365 can be downloaded here: places.
| Model | Seed | Mask set | FID ↓ | LPIPS ↓ | PSNR ↑ | SSIM ↑ | Model Size (MB) | FLOPs (G) | # Params (M) | Config |
|---|---|---|---|---|---|---|---|---|---|---|
| VHII efficient | 0 | 20-30 | 1.1783 | 0.0649 | 26.4769 | 0.8922 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 0 | 30-40 | 2.3969 | 0.0995 | 24.1554 | 0.8368 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 0 | 40-50 | 4.6187 | 0.1404 | 22.3163 | 0.7751 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 42 | 20-30 | 1.1806 | 0.0650 | 26.4727 | 0.8922 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 42 | 30-40 | 2.4146 | 0.0996 | 24.1472 | 0.8366 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 42 | 40-50 | 4.6141 | 0.1405 | 22.3215 | 0.7750 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 123 | 20-30 | 1.1767 | 0.0650 | 26.4536 | 0.8920 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 123 | 30-40 | 2.4125 | 0.0998 | 24.1337 | 0.8366 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 123 | 40-50 | 4.6877 | 0.1407 | 22.3123 | 0.7749 | 71 | 150.272 | 17.552 | config |
The model trained on CelebA can be downloaded here: celeba.
| Model | Seed | Mask set | FID ↓ | LPIPS ↓ | PSNR ↑ | SSIM ↑ | Model Size (MB) | FLOPs (G) | # Params (M) | Config |
|---|---|---|---|---|---|---|---|---|---|---|
| VHII efficient | 0 | 20-30 | 0.7854 | 0.0330 | 31.3488 | 0.9415 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 0 | 30-40 | 1.3521 | 0.0490 | 28.7055 | 0.9096 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 0 | 40-50 | 2.2800 | 0.0686 | 26.5571 | 0.8727 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 42 | 20-30 | 0.7714 | 0.0329 | 31.3497 | 0.9416 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 42 | 30-40 | 1.3552 | 0.0491 | 28.6867 | 0.9096 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 42 | 40-50 | 2.2400 | 0.0684 | 26.5921 | 0.8729 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 123 | 20-30 | 0.7822 | 0.0330 | 31.3313 | 0.9415 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 123 | 30-40 | 1.3489 | 0.0491 | 28.7198 | 0.9097 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 123 | 40-50 | 2.2413 | 0.0685 | 26.5672 | 0.8728 | 71 | 150.272 | 17.552 | config |
The model trained on PSV can be downloaded here: psv.
| Model | Seed | Mask set | FID ↓ | LPIPS ↓ | PSNR ↑ | SSIM ↑ | Model Size (MB) | FLOPs (G) | # Params (M) | Config |
|---|---|---|---|---|---|---|---|---|---|---|
| VHII efficient | 0 | 20-30 | 24.9343 | 0.0535 | 29.9719 | 0.9146 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 0 | 30-40 | 35.9012 | 0.0787 | 27.6982 | 0.8719 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 0 | 40-50 | 46.5952 | 0.1118 | 25.7796 | 0.8209 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 42 | 20-30 | 26.6362 | 0.0542 | 29.8541 | 0.9137 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 42 | 30-40 | 35.4199 | 0.0802 | 27.6240 | 0.8699 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 42 | 40-50 | 47.6322 | 0.1138 | 25.7706 | 0.8187 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 123 | 20-30 | 26.0129 | 0.0568 | 29.4361 | 0.9110 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 123 | 30-40 | 35.1132 | 0.0794 | 27.7138 | 0.8741 | 71 | 150.272 | 17.552 | config |
| VHII efficient | 123 | 40-50 | 47.5585 | 0.1111 | 25.9231 | 0.8231 | 71 | 150.272 | 17.552 | config |
The trained models with different channel configurations can be downloaded:
| Model | Seed | Mask set | FID ↓ | LPIPS ↓ | PSNR ↑ | SSIM ↑ | Model Size (MB) | FLOPs (G) | # Params (M) | Config |
|---|---|---|---|---|---|---|---|---|---|---|
| VHII efficient 256-128-64-32 | 0 | 20-30 | 0.9393 | 0.0359 | 31.0091 | 0.9391 | 69 | 145.54 | 16.975 | config |
| VHII efficient 256-128-64-32 | 0 | 30-40 | 1.6504 | 0.0532 | 28.4032 | 0.9066 | 69 | 145.54 | 16.975 | config |
| VHII efficient 256-128-64-32 | 0 | 40-50 | 2.7938 | 0.0742 | 26.2939 | 0.8694 | 69 | 145.54 | 16.975 | config |
| VHII efficient 128-64-32-16 | 0 | 20-30 | 1.3365 | 0.0418 | 30.2689 | 0.9323 | 21 | 46.412 | 4.868 | config |
| VHII efficient 128-64-32-16 | 0 | 30-40 | 2.0768 | 0.0620 | 27.7036 | 0.8971 | 21 | 46.412 | 4.868 | config |
| VHII efficient 128-64-32-16 | 0 | 40-50 | 4.1076 | 0.0861 | 25.6279 | 0.8572 | 21 | 46.412 | 4.868 | config |
| VHII efficient 64-32-16-8 | 0 | 20-30 | 2.0768 | 0.0509 | 29.4559 | 0.9208 | 7.4 | 20.12 | 1.655 | config |
| VHII efficient 64-32-16-8 | 0 | 30-40 | 3.837 | 0.0751 | 26.9152 | 0.8860 | 7.4 | 20.12 | 1.655 | config |
| VHII efficient 64-32-16-8 | 0 | 40-50 | 6.7535 | 0.1033 | 24.8644 | 0.8432 | 7.4 | 20.12 | 1.655 | config |
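The Model Size (MB) and # Params (M) columns can be reproduced from any downloaded checkpoint. Below is a generic PyTorch sketch of ours; it assumes the .pth file is a plain state_dict of tensors, so adjust the indexing if the checkpoint nests one under a key:

```python
# count_params.py -- our generic sketch to reproduce the "# Params (M)" and
# "Model Size (MB)" columns from a checkpoint file.
import os
import torch

ckpt_path = "trained_models/celeba/celeba_VHII_efficient/gen_00050.pth"
ckpt = torch.load(ckpt_path, map_location="cpu")
# Assumption: ckpt is a flat state_dict; if it wraps one under a key,
# index into that entry first.
n_params = sum(t.numel() for t in ckpt.values() if torch.is_tensor(t))
print(f"# params:   {n_params / 1e6:.3f} M")                       # table reports 17.552 M
print(f"model size: {os.path.getsize(ckpt_path) / 2**20:.0f} MB")  # table reports 71 MB
```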
Once the dataset is prepared, new models can be trained with the following command:
```bash
bash run_train.sh --train_config_file
```
For example:
```bash
bash run_train.sh configs/psv_proposal_efficient_128_64_32_16_channels.json
```
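The training hyperparameters (e.g., the channel configuration named in the file) live in these JSON configs; a quick way of ours to inspect one before launching:

```python
# inspect_config.py -- our sketch: pretty-print a training config, since the
# configs are plain JSON files.
import json

with open("configs/psv_proposal_efficient_128_64_32_16_channels.json") as f:
    print(json.dumps(json.load(f), indent=2))
```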
To test the models:
- Download the trained models and save them in trained_models/.
- Run the test bash script to evaluate the trained model:
```bash
bash run_test_dataset.sh --model_name --model_path --seed --gt_dataset_path --mask_dataset_path --output_dataset_path
```
For example:
```bash
bash run_test_dataset.sh "VHII_efficient" "trained_models/celeba/celeba_VHII_efficient/gen_00050.pth" 0 "/data/celeba/celeba_dataset/test/" "/data/pconv/test_mask/20-30/" "test_output_datasets/trained_celeba_VHII_efficient_seed_0/output_images"
```
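To reproduce the full result tables above, the same command can be looped over the three seeds and mask intervals. A convenience sketch of ours (paths follow the example above; adjust them to your setup):

```python
# run_all_tests.py -- our sketch: loop run_test_dataset.sh over the seeds and
# mask intervals used in the result tables (paths follow the example above).
import subprocess

model = "trained_models/celeba/celeba_VHII_efficient/gen_00050.pth"
gt = "/data/celeba/celeba_dataset/test/"
for seed in (0, 42, 123):
    for interval in ("20-30", "30-40", "40-50"):
        out = f"test_output_datasets/trained_celeba_VHII_efficient_seed_{seed}/{interval}"
        subprocess.run(
            ["bash", "run_test_dataset.sh", "VHII_efficient", model, str(seed),
             gt, f"/data/pconv/test_mask/{interval}/", out],
            check=True,
        )
```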
The output inpainted images are saved in test_output_datasets/.
To measure the quantitative results:
```bash
cd metrics
bash run_metrics.sh --gt_dataset_path --output_dataset_path
```
For example:
```bash
bash run_metrics.sh "/data/celeba/celeba_dataset/test/" "/config/variable-hyperparameter-image-impainting/test_output_datasets/trained_celeba_VHII_efficient_seed_0"
```
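run_metrics.sh reports FID, LPIPS, PSNR, and SSIM. As an independent cross-check of the last two, here is a minimal sketch of ours using scikit-image (it assumes ground-truth and output files share names; adjust the paths to your data):

```python
# quick_metrics.py -- our minimal PSNR/SSIM cross-check using scikit-image.
from pathlib import Path

import numpy as np
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

gt_dir = Path("/data/celeba/celeba_dataset/test/")        # adjust
out_dir = Path("test_output_datasets/.../output_images")  # adjust
psnrs, ssims = [], []
for gt_path in sorted(gt_dir.iterdir()):
    gt = np.array(Image.open(gt_path).convert("RGB"))
    out = np.array(Image.open(out_dir / gt_path.name).convert("RGB"))  # matching names assumed
    psnrs.append(peak_signal_noise_ratio(gt, out, data_range=255))
    ssims.append(structural_similarity(gt, out, channel_axis=-1, data_range=255))
print(f"PSNR: {np.mean(psnrs):.4f}   SSIM: {np.mean(ssims):.4f}")
```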
To run inference on a single image:
```bash
bash run_test_image.sh --model_name --model_path --input_path --mask_path --output_path --output_name
```
For example:
```bash
bash run_test_image.sh "VHII_efficient" trained_models/celeba/celeba_VHII_efficient/gen_00050.pth "examples/img/100_000100_gt.png" "examples/mask/100_000100_mask.png" "examples/output" "100_000100_output"
```
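For intuition about what the model receives, the input image can be composited with the mask before inference. A sketch of ours using the bundled example files (assuming nonzero mask pixels mark the region to inpaint):

```python
# make_masked_input.py -- our sketch: visualize the masked input the model
# must fill, using the bundled example files (assumes nonzero = hole).
import numpy as np
from PIL import Image

img = np.array(Image.open("examples/img/100_000100_gt.png").convert("RGB"))
hole = np.array(Image.open("examples/mask/100_000100_mask.png").convert("L")) > 0
masked = img.copy()
masked[hole] = 255  # blank out the region the model will inpaint
Image.fromarray(masked).save("examples/output/100_000100_masked.png")
```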
If you have problems testing our model (e.g., on CelebA), we suggest the following steps:
- Use the images from the examples/ directory, which contains samples you can use to test the proposed model.
- Download the CelebA model here and put it inside the directory "trained_models/celeba/celeba_VHII_efficient".
- Create the Docker image:
```bash
docker build -t vhii-mage .
```
- Create a container:
```bash
nvidia-docker run --userns=host -it --rm --name vhii-repository -v /work/data/:/data -v /work/code/:/code vhii-mage bash
```
- Inside the Docker container, run the test command:
```bash
bash run_test_image.sh "VHII_efficient" trained_models/celeba/celeba_VHII_efficient/best_model_celeba.pth "examples/img/100_000100_gt.png" "examples/mask/100_000100_mask.png" "examples/output" "100_000100_output"
```
```
@article{Campana2023_Inpainting,
  author={J.L.F. Campana and L.G.L. Decker and M.R. Souza and H.A. Maia and H. Pedrini},
  title={Variable-Hyperparameter Visual Transformer for Efficient Image Inpainting},
  journal={Computers \& Graphics},
  year={2023}
}
```
If you have any questions or suggestions about this paper, feel free to contact me (j209820@dac.unicamp.br).




