{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T12:27:56Z","timestamp":1753878476581,"version":"3.41.2"},"reference-count":54,"publisher":"Wiley","issue":"7","license":[{"start":{"date-parts":[[2023,10,30]],"date-time":"2023-10-30T00:00:00Z","timestamp":1698624000000},"content-version":"vor","delay-in-days":29,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Computer Graphics Forum"],"published-print":{"date-parts":[[2023,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In this work, we introduce a new approach for face stylization. Despite existing methods achieving impressive results in this task, there is still room for improvement in generating high\u2010quality artistic faces with diverse styles and accurate facial reconstruction. Our proposed framework, MMFS, supports multi\u2010modal face stylization by leveraging the strengths of StyleGAN and integrates it into an encoder\u2010decoder architecture. Specifically, we use the mid\u2010resolution and high\u2010resolution layers of StyleGAN as the decoder to generate high\u2010quality faces, while aligning its low\u2010resolution layer with the encoder to extract and preserve input facial details. We also introduce a two\u2010stage training strategy, where we train the encoder in the first stage to align the feature maps with StyleGAN and enable a faithful reconstruction of input faces. In the second stage, the entire network is fine\u2010tuned with artistic data for stylized face generation. To enable the fine\u2010tuned model to be applied in zero\u2010shot and one\u2010shot stylization tasks, we train an additional mapping network from the large\u2010scale Contrastive\u2010Language\u2010Image\u2010Pre\u2010training (CLIP) space to a latent w+ space of fine\u2010tuned StyleGAN. Qualitative and quantitative experiments show that our framework achieves superior performance in both one\u2010shot and zero\u2010shot face stylization tasks, outperforming state\u2010of\u2010the\u2010art methods by a large margin.<\/jats:p>","DOI":"10.1111\/cgf.14952","type":"journal-article","created":{"date-parts":[[2023,10,31]],"date-time":"2023-10-31T06:15:30Z","timestamp":1698732930000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Multi\u2010Modal Face Stylization with a Generative Prior"],"prefix":"10.1111","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6724-6177","authenticated-orcid":false,"given":"Mengtian","family":"Li","sequence":"first","affiliation":[{"name":"Kuaishou Technology  China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-8880-0606","authenticated-orcid":false,"given":"Yi","family":"Dong","sequence":"additional","affiliation":[{"name":"Tsinghua University  China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-5130-5754","authenticated-orcid":false,"given":"Minxuan","family":"Lin","sequence":"additional","affiliation":[{"name":"Kuaishou Technology  China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7787-6428","authenticated-orcid":false,"given":"Haibin","family":"Huang","sequence":"additional","affiliation":[{"name":"Kuaishou Technology  China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7225-565X","authenticated-orcid":false,"given":"Pengfei","family":"Wan","sequence":"additional","affiliation":[{"name":"Kuaishou Technology  China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8243-9513","authenticated-orcid":false,"given":"Chongyang","family":"Ma","sequence":"additional","affiliation":[{"name":"Kuaishou Technology  China"}]}],"member":"311","published-online":{"date-parts":[[2023,10,30]]},"reference":[{"key":"e_1_2_7_2_2","doi-asserted-by":"crossref","unstructured":"AbdalR. LeeH.-Y. ZhuP. ChaiM. SiarohinA. WonkaP. TulyakovS.: 3DAvatarGAN: Bridging Domains for Personalized Editable Avatars. InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2023) pp.4552\u20134562. 2","DOI":"10.1109\/CVPR52729.2023.00442"},{"key":"e_1_2_7_3_2","doi-asserted-by":"crossref","unstructured":"AlalufY. PatashnikO. Cohen-OrD.: ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement. InProceedings of the IEEE\/CVF International Conference on Computer Vision(2021) pp.6711\u20136720. 2","DOI":"10.1109\/ICCV48922.2021.00664"},{"key":"e_1_2_7_4_2","doi-asserted-by":"crossref","unstructured":"AbdalR. QinY. WonkaP.: Image2stylegan: How to embed images into the stylegan latent space? InProceedings of the IEEE\/CVF International Conference on Computer Vision(2019) pp.4432\u20134441. 2","DOI":"10.1109\/ICCV.2019.00453"},{"key":"e_1_2_7_5_2","doi-asserted-by":"crossref","unstructured":"AbdalR. QinY. WonkaP.: Image2stylegan++: How to edit the embedded images? InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2020) pp.8296\u20138305. 2","DOI":"10.1109\/CVPR42600.2020.00832"},{"key":"e_1_2_7_6_2","doi-asserted-by":"crossref","unstructured":"CheferH. BenaimS. PaissR. WolfL.: Image-based clip-guided essence transfer. InEuropean Conference on Computer Vision(2022) pp.695\u2013711. 3","DOI":"10.1007\/978-3-031-19778-9_40"},{"key":"e_1_2_7_7_2","doi-asserted-by":"crossref","unstructured":"ChongM. J. ForsythD.: Jojogan: One shot face stylization. InEuropean Conference on Computer Vision(2022) pp.128\u2013152. 3 4 5","DOI":"10.1007\/978-3-031-19787-1_8"},{"key":"e_1_2_7_8_2","doi-asserted-by":"crossref","unstructured":"CaronM. TouvronH. MisraI. J\u00e9gouH. MairalJ. BojanowskiP. JoulinA.: Emerging properties in self-supervised vision transformers. InProceedings of the IEEE\/CVF International Conference on Computer Vision(2021) pp.9650\u20139660. 4","DOI":"10.1109\/ICCV48922.2021.00951"},{"key":"e_1_2_7_9_2","doi-asserted-by":"crossref","unstructured":"ChoiY. UhY. YooJ. HaJ.-W.: Stargan v2: Diverse image synthesis for multiple domains. InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2020) pp.8188\u20138197. 2 3","DOI":"10.1109\/CVPR42600.2020.00821"},{"key":"e_1_2_7_10_2","doi-asserted-by":"crossref","unstructured":"DengJ. GuoJ. XueN. ZafeiriouS.: Arcface: Additive angular margin loss for deep face recognition. InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2019) pp.4690\u20134699. 6","DOI":"10.1109\/CVPR.2019.00482"},{"key":"e_1_2_7_11_2","unstructured":"DhariwalP. NicholA.: Diffusion models beat GANs on image synthesis. InAdvances in Neural Information Processing Systems(2021) pp.8780\u20138794. 7"},{"key":"e_1_2_7_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3528223.3530164"},{"key":"e_1_2_7_13_2","doi-asserted-by":"crossref","unstructured":"HuangX. BelongieS.: Arbitrary style transfer in real-time with adaptive instance normalization. InProceedings of the IEEE\/CVF International Conference on Computer Vision(2017) pp.1501\u20131510. 3","DOI":"10.1109\/ICCV.2017.167"},{"key":"e_1_2_7_14_2","unstructured":"HuangX. LiuM.-Y. BelongieS. KautzJ.: Multimodal unsupervised image-to-image translation. InEuropean Conference on Computer Vision(2018) pp.172\u2013189. 3 5 7"},{"key":"e_1_2_7_15_2","unstructured":"HeuselM. RamsauerH. UnterthinerT. NesslerB. HochreiterS.: Gans trained by a two time-scale update rule converge to a local nash equilibrium. InAdvances in Neural Information Processing Systems(2017). 5"},{"key":"e_1_2_7_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2021.3114308"},{"key":"e_1_2_7_17_2","unstructured":"JamriskaO.:Ebsynth: Fast example-based image synthesis and style transfer.https:\/\/github.com\/jamriska\/ebsynth 2018. 2"},{"key":"e_1_2_7_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3450626.3459860"},{"key":"e_1_2_7_19_2","doi-asserted-by":"crossref","unstructured":"KangK. KimS. H. ChoS.: Gan inversion for out-of-range images with geometric transformations. InProceedings of the IEEE\/CVF International Conference on Computer Vision(2021) pp.13921\u201313929. 4","DOI":"10.1109\/ICCV48922.2021.01368"},{"key":"e_1_2_7_20_2","doi-asserted-by":"crossref","unstructured":"KarrasT. LaineS. AilaT.: A style-based generator architecture for generative adversarial networks. InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2019) pp.4401\u20134410. 2 4 5","DOI":"10.1109\/CVPR.2019.00453"},{"key":"e_1_2_7_21_2","doi-asserted-by":"crossref","unstructured":"KarrasT. LaineS. AittalaM. HellstenJ. LehtinenJ. AilaT.: Analyzing and improving the image quality of stylegan. InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2020) pp.8110\u20138119. 2 4","DOI":"10.1109\/CVPR42600.2020.00813"},{"key":"e_1_2_7_22_2","doi-asserted-by":"crossref","unstructured":"KwonG. YeJ. C.: Clipstyler: Image style transfer with a single text condition. InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2022) pp.18062\u201318071. 2 5","DOI":"10.1109\/CVPR52688.2022.01753"},{"key":"e_1_2_7_23_2","doi-asserted-by":"crossref","unstructured":"KwonG. YeJ. C.: One-shot adaptation of gan in just one clip.IEEE Transactions on Pattern Analysis and Machine Intelligence(2023). 3","DOI":"10.1109\/TPAMI.2023.3283551"},{"key":"e_1_2_7_24_2","unstructured":"LiuM. LiQ. QinZ. ZhangG. WanP. ZhengW.: Blendgan: implicitly gan blending for arbitrary stylized face generation. InAdvances in Neural Information Processing Systems(2021) pp.29710\u201329722. 2 3 5 7"},{"key":"e_1_2_7_25_2","unstructured":"LiuA. H. LiuY.-C. YehY.-Y. WangY.-C. F.: A unified feature disentangler for multi-domain image translation and manipulation. InAdvances in Neural Information Processing Systems(2018). 2"},{"key":"e_1_2_7_26_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-019-01284-z"},{"key":"e_1_2_7_27_2","unstructured":"LiY. ZhangR. LuJ. ShechtmanE.: Few-shot image generation with elastic weight consolidation.arXiv preprint arXiv:2012.02780(2020). 3"},{"key":"e_1_2_7_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2021.3113786"},{"key":"e_1_2_7_29_2","unstructured":"MoS. ChoM. ShinJ.: Freeze the discriminator: a simple baseline for fine-tuning gans.arXiv preprint arXiv:2002.10964(2020). 3"},{"key":"e_1_2_7_30_2","unstructured":"NicholA. Q. DhariwalP.: Improved denoising diffusion probabilistic models. InInternational Conference on Machine Learning (ICML)(2021) pp.8162\u20138171. 7"},{"key":"e_1_2_7_31_2","doi-asserted-by":"crossref","unstructured":"OjhaU. LiY. LuJ. EfrosA. A. LeeY. J. ShechtmanE. ZhangR.: Few-shot image generation via cross-domain correspondence. InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2021) pp.10743\u201310752. 3","DOI":"10.1109\/CVPR46437.2021.01060"},{"key":"e_1_2_7_32_2","unstructured":"PinkneyJ. N. AdlerD.: Resolution dependent gan interpolation for controllable image synthesis between domains.arXiv preprint arXiv:2010.05334(2020). 2"},{"key":"e_1_2_7_33_2","doi-asserted-by":"crossref","unstructured":"PatashnikO. WuZ. ShechtmanE. Cohen-OrD. LischinskiD.: Styleclip: Text-driven manipulation of stylegan imagery. InProceedings of the IEEE\/CVF International Conference on Computer Vision(2021) pp.2085\u20132094. 3 5","DOI":"10.1109\/ICCV48922.2021.00209"},{"key":"e_1_2_7_34_2","doi-asserted-by":"crossref","unstructured":"RichardsonE. AlalufY. PatashnikO. NitzanY. AzarY. ShapiroS. Cohen-OrD.: Encoding in style: a stylegan encoder for image-to-image translation. InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2021) pp.2287\u20132296. 2 4","DOI":"10.1109\/CVPR46437.2021.00232"},{"key":"e_1_2_7_35_2","doi-asserted-by":"crossref","unstructured":"RombachR. BlattmannA. LorenzD. EsserP. OmmerB.: High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2022) pp.10684\u201310695. 2","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_2_7_36_2","unstructured":"RobbE. ChuW.-S. KumarA. HuangJ.-B.: Few-shot adaptation of generative adversarial networks.arXiv preprint arXiv:2010.11943(2020). 3"},{"key":"e_1_2_7_37_2","doi-asserted-by":"crossref","unstructured":"RutaD. GilbertA. MotiianS. FaietaB. LinZ. CollomosseJ.: HyperNST: Hyper-Networks for Neural Style Transfer. InEuropean Conference on Computer Vision(2023) pp.201\u2013217. 3","DOI":"10.1007\/978-3-031-25056-9_14"},{"key":"e_1_2_7_38_2","first-page":"8748","volume-title":"International Conference on Machine Learning (ICML)","author":"Radford A.","year":"2021"},{"key":"e_1_2_7_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3450626.3459771"},{"key":"e_1_2_7_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3450626.3459838"},{"key":"e_1_2_7_41_2","doi-asserted-by":"crossref","unstructured":"TumanyanN. Bar-TalO. BagonS. DekelT.: Splicing vit features for semantic appearance transfer. InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2022) pp.10748\u201310757. 4","DOI":"10.1109\/CVPR52688.2022.01048"},{"key":"e_1_2_7_42_2","doi-asserted-by":"crossref","unstructured":"WangY. Gonzalez-GarciaA. BergaD. HerranzL. KhanF. S. WeijerJ. v. d.: Minegan: effective knowledge transfer from gans to target domains with few images. InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2020) pp.9332\u20139341. 3","DOI":"10.1109\/CVPR42600.2020.00935"},{"key":"e_1_2_7_43_2","unstructured":"WangY. WuC. HerranzL. van deWeijerJ. Gonzalez-GarciaA. RaducanuB.: Transferring gans: generating images from limited data. InEuropean Conference on Computer Vision(2018) pp.218\u2013234. 3"},{"key":"e_1_2_7_44_2","unstructured":"WangY. YiR. TaiY. WangC. MaL.: CtlGAN: Few-shot Artistic Portraits Generation with Contrastive Transfer Learning.arXiv preprint arXiv:2203.08612(2022). 3"},{"key":"e_1_2_7_45_2","doi-asserted-by":"crossref","unstructured":"YangS. JiangL. LiuZ. LoyC. C.: Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer. InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2022) pp.7693\u20137702. 5 7","DOI":"10.1109\/CVPR52688.2022.00754"},{"key":"e_1_2_7_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3550454.3555437"},{"key":"e_1_2_7_47_2","doi-asserted-by":"crossref","unstructured":"YiR. LiuY.-J. LaiY.-K. RosinP. L.: Apdrawinggan: Generating artistic portrait drawings from face photos with hierarchical gans. InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2019) pp.10743\u201310752. 1","DOI":"10.1109\/CVPR.2019.01100"},{"key":"e_1_2_7_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3306346.3322984"},{"key":"e_1_2_7_49_2","unstructured":"YangC. ShenY. ZhangZ. XuY. ZhuJ. WuZ. ZhouB.: One-shot generative domain adaptation.arXiv preprint arXiv:2111.09876(2021). 3"},{"key":"e_1_2_7_50_2","unstructured":"ZhuP. AbdalR. FemianiJ. WonkaP.: Mind the gap: Domain gap control for single shot domain adaptation for generative adversarial networks. InInternational Conference on Learning Representations(2022). 3 5"},{"key":"e_1_2_7_51_2","unstructured":"ZhuP. AbdalR. QinY. FemianiJ. WonkaP.: Improved stylegan embedding: Where are the good latents?arXiv preprint arXiv:2012.09036(2020). 2 4"},{"key":"e_1_2_7_52_2","doi-asserted-by":"crossref","unstructured":"ZhangR. IsolaP. EfrosA. A. ShechtmanE. WangO.: The unreasonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(2018). 6","DOI":"10.1109\/CVPR.2018.00068"},{"key":"e_1_2_7_53_2","unstructured":"ZhengW. LiQ. GuoX. WanP. WangZ.: Bridging clip and stylegan through latent alignment for image editing.arXiv preprint arXiv:2210.04506(2022). 3"},{"key":"e_1_2_7_54_2","unstructured":"ZhangZ. LiuY. HanC. GuoT. YaoT. MeiT.: Generalized one-shot domain adaptation of generative adversarial networks. InAdvances in Neural Information Processing Systems(2022) pp.13718\u201313730. 3"},{"key":"e_1_2_7_55_2","unstructured":"ZhangY. WeiY. JiZ. BaiJ. ZuoW. et al.: Towards diverse and faithful one-shot adaption of generative adversarial networks. InAdvances in Neural Information Processing Systems(2022) pp.37297\u201337308. 3 5"}],"container-title":["Computer Graphics Forum"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1111\/cgf.14952","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,13]],"date-time":"2024-01-13T08:23:59Z","timestamp":1705134239000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1111\/cgf.14952"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10]]},"references-count":54,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2023,10]]}},"alternative-id":["10.1111\/cgf.14952"],"URL":"https:\/\/doi.org\/10.1111\/cgf.14952","archive":["Portico"],"relation":{},"ISSN":["0167-7055","1467-8659"],"issn-type":[{"type":"print","value":"0167-7055"},{"type":"electronic","value":"1467-8659"}],"subject":[],"published":{"date-parts":[[2023,10]]},"assertion":[{"value":"2023-10-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"e14952"}}