This paper proposes a novel talking head generation method that combines layered viewpoint simulation (LVS) and continuous lighting simulation (CLS). LVS simulates multiple viewpoints from the multi-scale features of each video frame to construct a global depth representation, which improves the accuracy of volume density estimation and enhances detail rendering. CLS simulates multiple lighting conditions from the brightness changes across consecutive video frames to construct a global brightness representation, thereby alleviating color accumulation errors and eliminating blur. Extensive experiments demonstrate that our method significantly improves detail quality compared to state-of-the-art methods.
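The LVS/CLS code has not been released yet, so the following minimal PyTorch sketch is purely illustrative of the idea described above: multi-scale frame features pooled into a global depth code, and per-frame brightness statistics summarized into a global brightness code. All module names, channel sizes, and tensor shapes here are our own assumptions, not the paper's actual architecture.

```python
# Illustrative sketch only; every module and shape below is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalDepthRepresentation(nn.Module):
    """Toy LVS-style aggregator: pools multi-scale frame features into one
    global feature intended to condition volume-density estimation."""

    def __init__(self, channels=(64, 128, 256), dim=128):
        super().__init__()
        # One 1x1 projection per feature scale (channel counts are assumed).
        self.proj = nn.ModuleList(nn.Conv2d(c, dim, 1) for c in channels)
        self.fuse = nn.Linear(dim * len(channels), dim)

    def forward(self, feats):
        # feats: list of (B, C_i, H_i, W_i) feature maps from a frame encoder.
        pooled = [F.adaptive_avg_pool2d(p(f), 1).flatten(1)
                  for p, f in zip(self.proj, feats)]
        return self.fuse(torch.cat(pooled, dim=1))           # (B, dim)


class GlobalBrightnessRepresentation(nn.Module):
    """Toy CLS-style aggregator: summarizes per-frame brightness statistics
    over a clip into a global lighting code."""

    def __init__(self, dim=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, frames):
        # frames: (B, T, 3, H, W) RGB clip in [0, 1].
        luma = frames.mean(dim=2)                             # (B, T, H, W)
        stats = torch.stack([luma.mean(dim=(2, 3)),           # per-frame mean
                             luma.std(dim=(2, 3))], dim=-1)   # per-frame std
        return self.mlp(stats).mean(dim=1)                    # (B, dim) over time
```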
demo.mp4
Our datasets come from public interviews or TV shows.
- We introduce layered viewpoint simulation (LVS) to construct a global depth representation, which enhances detail rendering.
- We propose continuous lighting simulation (CLS) to construct a global brightness representation, which eliminates blurring artifacts.
- We train and test with Python 3.8.
- ffmpeg: `sudo apt-get install ffmpeg`
- To install the Python dependencies, run: `pip install -r TalkingHead-Simulation.txt`
If you find this repository helpful, please consider citing our paper:
@inproceedings{dong2025talking,
title = {Talking Head Generation via Viewpoint and Lighting Simulation Based on Global Representation},
author = {Biao Dong and Lei Zhang},
booktitle = {Proceedings of the 33rd ACM International Conference on Multimedia (ACM MM)},
year = {2025}
}
Our code is for research purposes only. More details will be released shortly. If you have any questions, please contact us at dongbiao@bit.edu.cn.