
High-fidelity and High-efficiency Talking Portrait Synthesis with Detail-aware Neural Radiance Fields

This repository provides a PyTorch implementation of the paper "High-fidelity and High-efficiency Talking Portrait Synthesis with Detail-aware Neural Radiance Fields" (IEEE TVCG 2024, DOI: 10.1109/TVCG.2024.3488960).

A self-driven generated video of our method: here

A cross-driven generated video of our method: here

Installation

Tested on Ubuntu 22.04, PyTorch 2.0.1, and CUDA 11.6.

git clone https://github.com/muyuWang/HHNeRF.git
cd HHNeRF

Install dependencies:

pip install -r requirements.txt

Data pre-processing

Our data preprocessing follows the previous works AD-NeRF, SSP-NeRF, and RAD-NeRF. We provide several HR videos at 900 × 900 resolution. During preprocessing, first downsample them to 450 × 450, then run the RAD-NeRF preprocessing pipeline on the downsampled frames (extract images, detect landmarks, face parsing, extract the background, estimate head poses, ...). With the extracted landmarks, crop patches from the eye regions and use a ResNet model to extract their features, as sketched below.
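The eye-feature step is not spelled out here, so the following is only a minimal sketch of how eye patches could be cropped from the 2D landmarks and encoded with a torchvision ResNet. The landmark indices, patch margin, resize resolution, and use of ResNet-18 are assumptions for illustration, not the exact pipeline used in the paper.

    # Hedged sketch: crop eye patches from 68-point landmarks and encode them
    # with an ImageNet-pretrained ResNet-18. The landmark indices (36-41 left
    # eye, 42-47 right eye), the margin, and the 64x64 resize are assumptions.
    import numpy as np
    import torch
    import torchvision
    from PIL import Image

    resnet = torchvision.models.resnet18(weights="IMAGENET1K_V1")
    resnet.fc = torch.nn.Identity()          # keep the 512-d pooled feature
    resnet.eval()

    preprocess = torchvision.transforms.Compose([
        torchvision.transforms.Resize((64, 64)),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize([0.485, 0.456, 0.406],
                                         [0.229, 0.224, 0.225]),
    ])

    def crop_eye(img, lms, idx, margin=8):
        """Crop a square-ish patch around the landmarks selected by `idx`."""
        pts = lms[idx]
        x0, y0 = pts.min(0).astype(int) - margin
        x1, y1 = pts.max(0).astype(int) + margin
        return img.crop((x0, y0, x1, y1))

    @torch.no_grad()
    def extract_eye_feature(frame_path, lms_path):
        img = Image.open(frame_path).convert("RGB")
        lms = np.loadtxt(lms_path)                        # (68, 2) from *.lms
        left = crop_eye(img, lms, list(range(36, 42)))    # left-eye landmarks
        right = crop_eye(img, lms, list(range(42, 48)))   # right-eye landmarks
        batch = torch.stack([preprocess(left), preprocess(right)])
        feat = resnet(batch)                              # (2, 512) eye features
        return left, right, feat

The patches would be saved as 0_l.png / 0_r.png and the features as 0.pt to match the layout listed below.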

  • Finally, the file structure after finishing all steps should look like this:
    ./data/<ID>
    ├──<ID>.mp4 # original video
    ├──ori_imgs # original images from video
    │  ├──0.png
    │  ├──0.lms # 2D landmarks
    │  ├──...
    ├──hr_imgs # HR ground truth frames (static background)
    │  ├──0.jpg
    │  ├──...
    ├──eye_features # eye patches and features
    │  ├──0_l.png # left eye
    │  ├──0_r.png # right eye
    │  ├──0.pt # eye feature
    │  ├──...
    ├──gt_imgs # ground truth images (static background)
    │  ├──0.jpg
    │  ├──...
    ├──parsing # semantic segmentation
    │  ├──0.png
    │  ├──...
    ├──torso_imgs # inpainted torso images
    │  ├──0.png
    │  ├──...
    ├──aud.wav # original audio 
    ├──aud.npy # audio features (deepspeech)
    ├──bc.jpg # default background
    ├──track_params.pt # raw head tracking results
    ├──transforms_train.json # head poses (train split)
    ├──transforms_val.json # head poses (test split)
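As a quick sanity check of the layout above, a small snippet like the following could load the audio features, the training head poses, and one frame's eye feature. The subject ID ("Sunak"), the "frames" key, and the assumption that 0.pt stores a single tensor are illustrative, not guarantees.

    # Hedged sanity check of the processed data layout; the subject directory
    # and expected contents are assumptions based on the file tree above.
    import json
    import os
    import numpy as np
    import torch

    root = "data/Sunak"                                   # assumed subject directory

    aud = np.load(os.path.join(root, "aud.npy"))          # audio features (deepspeech)
    with open(os.path.join(root, "transforms_train.json")) as f:
        transforms = json.load(f)                         # head poses (train split)
    eye_feat = torch.load(os.path.join(root, "eye_features", "0.pt"))

    print("audio features:", aud.shape)
    print("train frames:", len(transforms["frames"]))     # assumes NeRF-style "frames" key
    print("frame 0 eye feature:", tuple(eye_feat.shape))  # assumes 0.pt is a tensor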

Some HR talking videos and the corresponding processed data can be downloaded from Baidu Netdisk.

Usage

The training scripts are provided in train.sh. Here are two examples.

Training DaNeRF module:

python main.py data/Sunak/ --workspace trial/Sunak/ -O --iters 70000 --data_range 0 -1 --dim_eye 6 --lr 0.005 --lr_net 0.0005 --num_rays 65536 --patch_size 32

Training DaNeRF and ECSR jointly:

python main_sr.py data/Sunak/ --workspace trial/Sunak/ -O --iters 150000 --data_range 0 -1 --dim_eye 6 --patch_size 32 --srtask --num_rays 16384 --lr 0.005 --lr_net 0.0005 --weight_pcp 0.05 --weight_style 0.01 --weight_gan 0.01 --test_tile 450

To use an existing checkpoint, add --ftsr_path 'trial/Sunak/modelsr_ckpt/sresrnet_17.pth'.

Acknowledgement

This project is built on RAD-NeRF by Tang et al. and 4K-NeRF by Wang et al. Thanks to the authors for these great works.
