Jiwon Kang1 · Yeji Choi1 · JoungBin Lee1 · Wooseok Jang1 · Jinhyeok Choi1 · Taekeun Kang2 · Yongjae Park2 · Myungin Kim2 · Seungryong Kim1

1KAIST AI  2SAMSUNG
Face swapping aims to transfer the identity of a source face onto a target face while preserving target-specific attributes such as pose, expression, lighting, skin tone, and makeup. However, since real ground truth for face swapping is unavailable, achieving both accurate identity transfer and high-quality attribute preservation remains challenging. Recent diffusion-based approaches attempt to improve visual fidelity through conditional inpainting on masked target images, but the masked condition removes crucial appearance cues, resulting in plausible yet misaligned attributes due to the lack of explicit supervision. To address these limitations, we propose APPLE (Attribute-Preserving Pseudo-Labeling), a diffusion-based teacher–student framework that enhances attribute fidelity through attribute-aware pseudo-label supervision. We reformulate face swapping as a conditional deblurring task to more faithfully preserve target-specific attributes such as lighting, skin tone, and makeup. In addition, we introduce an attribute-aware inversion scheme to further improve detailed attribute preservation. Through an elaborate attribute-preserving design for teacher learning, APPLE produces high-quality pseudo triplets that explicitly provide the student with direct face-swapping supervision. Overall, APPLE achieves state-of-the-art performance in terms of attribute preservation and identity transfer, producing more photorealistic and target-faithful results.
For instructions in Korean, please refer to README_kor.md.
- 1. Project Overview
- 2. Installation
- 3. Training (Teacher Model)
- 4. Inference (Teacher Model)
- 5. Inference (Student Model)
This document aims to explain the training and inference process of the Diffusion Model (Teacher Model).
- NVIDIA GPU
- Anaconda (Conda)
```bash
git clone https://github.com/your-repo/fluxswap.git
cd fluxswap
```

> Note: `<PROJECT_ROOT>` refers to the absolute path of this `fluxswap` directory.
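Since later commands reference `<PROJECT_ROOT>` (and, during preprocessing, `<VGGFACE2_HQ_PATH>`) repeatedly, it can be convenient to keep them in environment variables. A minimal sketch; the paths below are placeholders, so substitute your own:

```shell
# Placeholder locations (assumptions): point these at your actual checkout
# and dataset root before running later commands.
export PROJECT_ROOT=/abs/path/to/fluxswap
export VGGFACE2_HQ_PATH=/abs/path/to/VGGFace2-HQ
echo "Project root: $PROJECT_ROOT"
```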
This project uses three Conda environments: `3DDFA_env`, `mediapipe`, and `faceswap_omini`.
1. `3DDFA_env`
   - Original GitHub
   - Used for 3DMM landmark extraction.
   - Follow the instructions in the original GitHub repository to install the 3DDFA checkpoints and environment.
2. `mediapipe`
   - Original GitHub
   - Used for gaze landmark extraction.

   ```bash
   conda env create --file preprocess/mediapipe.yaml
   ```

3. `faceswap_omini`
   - Used for final condition image generation, model training, and inference.

   ```bash
   conda env create --file preprocess/faceswap_omini.yaml
   conda activate faceswap_omini

   # Install mmcv and mmsegmentation
   pip install -e preprocess/mmcv
   pip install -e preprocess/mmsegmentation
   ```

- VGGFace2-HQ: The main dataset used for training.
  - This document assumes the dataset is stored at a specific path (e.g., `<VGGFACE2_HQ_PATH>`).
- FFHQ: Used for evaluation.
  - The FFHQ dataset consists of `src` and `trg` folders, each with a preprocessing structure similar to VGGFace2-HQ (see 4.1. FFHQ Dataset Inference for the detailed structure).
- Use the dataset uploaded to Hugging Face.
- Decompress the `original` folder and use the data.
The VGGFace2-HQ dataset undergoes a total of 3 preprocessing steps.
- Conda Environment: the Conda environment provided by 3DDFA (`3DDFA_env`)
- File to Modify: `<PROJECT_ROOT>/preprocess/3DDFA-V3/demo_from_folder_jiwon_vgg.py`
  - Line 24: set to the VGGFace2-HQ dataset path (`<VGGFACE2_HQ_PATH>`).
- Execution:
  - Single GPU:

    ```bash
    cd <PROJECT_ROOT>/preprocess/3DDFA-V3/
    ./run_vgg.sh
    ```

  - Multi-GPU:

    ```bash
    cd <PROJECT_ROOT>/preprocess/3DDFA-V3/
    ./run_vgg_multigpu.sh
    ```

- Result: Saved in the `<VGGFACE2_HQ_PATH>/3dmm/` folder.
- Conda Environment: `mediapipe`
- File to Modify: `<PROJECT_ROOT>/preprocess/MediaPipe_Iris/inference.py`
  - Line 34, `dataset_path`: set to the VGGFace2-HQ dataset path (`<VGGFACE2_HQ_PATH>`).
- Execution:
  - Single GPU:

    ```bash
    cd <PROJECT_ROOT>/preprocess/MediaPipe_Iris/
    ./inference.sh
    ```

  - Multi-GPU:

    ```bash
    cd <PROJECT_ROOT>/preprocess/MediaPipe_Iris/
    ./inference_torchrun.sh
    ```

- Result: Saved in the `<VGGFACE2_HQ_PATH>/iris/` folder.
- Conda Environment: `faceswap_omini`
- File to Modify: `<PROJECT_ROOT>/preprocess/vgg_preprocess_seg_mask_gaze_multigpu_samsung.py`
  - Line 73, `image_folder_path`: set to the VGGFace2-HQ dataset path (`<VGGFACE2_HQ_PATH>`).
- Execution:

  ```bash
  # Activate faceswap_omini environment
  conda activate faceswap_omini

  # Run script
  python <PROJECT_ROOT>/preprocess/vgg_preprocess_seg_mask_gaze_multigpu_samsung.py
  ```

- Result: Saved in the `<VGGFACE2_HQ_PATH>/condition_blended_image_blurdownsample8_segGlass_landmark_iris` folder.
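Once all three preprocessing steps have run, the expected output folders can be sanity-checked before training. A minimal sketch; here a temporary directory with pre-created folders stands in for the real `<VGGFACE2_HQ_PATH>`:

```shell
# Sanity check for the three preprocessing outputs (3dmm, iris, condition folder).
# A temp dir with the folders pre-created stands in for <VGGFACE2_HQ_PATH>;
# point the variable at your real dataset root instead.
VGGFACE2_HQ_PATH="$(mktemp -d)"
mkdir -p "$VGGFACE2_HQ_PATH/3dmm" "$VGGFACE2_HQ_PATH/iris" \
  "$VGGFACE2_HQ_PATH/condition_blended_image_blurdownsample8_segGlass_landmark_iris"

for d in 3dmm iris condition_blended_image_blurdownsample8_segGlass_landmark_iris; do
  if [ -d "$VGGFACE2_HQ_PATH/$d" ]; then
    echo "OK: $d"
  else
    echo "MISSING: $d"
  fi
done
```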
- Pre-compute LAION-Aesthetics scores for the VGGFace2-HQ images and use them for data filtering.
- You can generate the `score.json` file with `<PROJECT_ROOT>/preprocess/vgg_preprocess_score_multigpu.py`.
- An example file is provided at `<PROJECT_ROOT>/preprocess/score.json`.
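As a sketch of how such scores might be used for filtering, assuming `score.json` is a flat mapping of image paths to aesthetic scores (the actual format produced by the preprocessing script may differ):

```shell
# Hypothetical filtering sketch: keep images whose aesthetic score >= 5.0.
# The demo file below stands in for score.json; python3 is assumed available.
cat > score_demo.json <<'EOF'
{"a.jpg": 6.2, "b.jpg": 3.1}
EOF
KEEP="$(python3 - <<'EOF'
import json
scores = json.load(open("score_demo.json"))
print("\n".join(p for p, s in scores.items() if s >= 5.0))
EOF
)"
echo "$KEEP"
rm score_demo.json
```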
- Conda Environment: `faceswap_omini`
- Config File: `<PROJECT_ROOT>/train/config/baseline_vgg_0.35.yaml`
  - `netarc_path`: set to the Arc2Face model path to use.
  - `dataset_path`: set to the VGGFace2-HQ dataset path (`<VGGFACE2_HQ_PATH>`).
- Execution:

  ```bash
  cd <PROJECT_ROOT>/train/script
  ./baseline_vgg.sh
  ```
- Conda Environment: `faceswap_omini`
- Checkpoint Used (Example): `<PROJECT_ROOT>/checkpoints/teacher`
Example of inference on the FFHQ evaluation dataset.
- `base_path`: Project root path (`<PROJECT_ROOT>`)
- `ffhq_base_path`: Preprocessed FFHQ dataset path. Assumes the following structure:

  ```
  <FFHQ_BASE_PATH>/
  ├── src
  │   ├── 3dmm
  │   ├── condition_...
  │   ├── ...
  │   └── 000000.jpg
  └── trg
      ├── 3dmm
      ├── condition_...
      ├── ...
      └── 000000.jpg
  ```

- `id_guidance_scale`: Higher values strengthen identity transfer but may weaken attribute preservation. (Minimum value: 1.0)
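One hypothetical way to explore the `id_guidance_scale` trade-off between identity transfer and attribute preservation is a small sweep; the values and paths below are placeholders, and the actual `torchrun` invocation is commented out:

```shell
# Hypothetical sweep over id_guidance_scale (>= 1.0): higher values favor
# identity transfer at some cost to attribute preservation.
for ID_SCALE in 1.0 1.5 2.0; do
  echo "would run inference with id_guidance_scale=$ID_SCALE"
  # torchrun --standalone --nproc_per_node=1 pulid_omini_inference_ffhq_args_multigpu.py \
  #   --base_path <PROJECT_ROOT> --ffhq_base_path <FFHQ_BASE_PATH> \
  #   --checkpoint_path <PROJECT_ROOT>/checkpoints/teacher \
  #   --id_guidance_scale "$ID_SCALE" --condition_type 'blur_landmark_iris'
done
```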
Without Inversion

```bash
CUDA_VISIBLE_DEVICES=0,1,2 torchrun --standalone --nproc_per_node=3 pulid_omini_inference_ffhq_args_multigpu.py \
    --base_path <PROJECT_ROOT> \
    --ffhq_base_path <FFHQ_BASE_PATH> \
    --checkpoint_path <PROJECT_ROOT>/checkpoints/teacher \
    --guidance_scale 1.0 \
    --image_guidance_scale 1.0 \
    --id_guidance_scale 1.0 \
    --condition_type 'blur_landmark_iris'
```

With Inversion
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --standalone --nproc_per_node=4 pulid_omini_inference_ffhq_inversion_args_multigpu.py \
    --base_path <PROJECT_ROOT> \
    --ffhq_base_path <FFHQ_BASE_PATH> \
    --checkpoint_path <PROJECT_ROOT>/checkpoints/teacher \
    --guidance_scale 1.0 \
    --image_guidance_scale 1.0 \
    --id_guidance_scale 1.0 \
    --condition_type 'blur_landmark_iris'
```

Generate a pseudo dataset based on VGGFace2-HQ. The VGGFace2-HQ dataset must be preprocessed first.
- Execution:
  - Run the `<PROJECT_ROOT>/pulid_omini_dataset_gen_fluxpseudovgg_multigpu.sh` shell script.
  - Line 34, `lora_file_path`: set the checkpoint path to use within the script.
- Conda Environment: `faceswap_omini`
- Checkpoint Used (Example): `<PROJECT_ROOT>/checkpoints/student`
```bash
CUDA_VISIBLE_DEVICES=0,1,2 torchrun --standalone --nproc_per_node=3 pulid_omini_inference_ffhq_args_multigpu.py \
    --base_path <PROJECT_ROOT> \
    --ffhq_base_path <FFHQ_BASE_PATH> \
    --checkpoint_path <PROJECT_ROOT>/checkpoints/student \
    --guidance_scale 1.0 \
    --image_guidance_scale 1.0 \
    --id_guidance_scale 1.0 \
    --condition_type 'blur_landmark_iris'
```