Ashish Patel 🇮🇳’s Post

𝗗𝗮𝘆-𝟰𝟴𝟮 𝗖𝗼𝗺𝗽𝘂𝘁𝗲𝗿 𝗩𝗶𝘀𝗶𝗼𝗻 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴
CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP, by ByteDance Inc.
Follow me for similar posts: Ashish Patel
-------------------------------------------------------------------
𝗜𝗻𝘁𝗲𝗿𝗲𝘀𝘁𝗶𝗻𝗴 𝗙𝗮𝗰𝘁𝘀:
🔸 This paper is published at #CVPR2022.
-------------------------------------------------------------------
𝗜𝗠𝗣𝗢𝗥𝗧𝗔𝗡𝗖𝗘
👉 Training a text-to-image generator in the general domain (e.g., DALL-E, CogView) requires huge amounts of paired text-image data, which is too expensive to collect.
👉 The authors propose a self-supervised scheme named CLIP-GEN for general text-to-image generation, using the language-image priors extracted from a pre-trained CLIP model.
👉 The approach needs only a set of unlabeled images in the general domain to train a text-to-image generator.
👉 Specifically, given an image without text labels, the first step extracts the image's embedding in the unified language-vision embedding space with the image encoder of CLIP.
👉 Next, the image is converted into a sequence of discrete tokens in the VQGAN codebook space (the VQGAN model can be trained on the same unlabeled image dataset).
👉 Finally, an autoregressive transformer is trained to predict an image's tokens from its unified language-vision representation, i.e. its CLIP image embedding.
👉 Once trained, the transformer can generate coherent image tokens from the text embedding that CLIP's text encoder produces for an input text.
👉 This strategy makes it possible to train a strong, general text-to-image generator on large text-free image datasets such as ImageNet.
👉 Qualitative and quantitative evaluations show that the method significantly outperforms optimization-based text-to-image methods in image quality without compromising text-image matching.
👉 The method can even achieve performance comparable to flagship supervised models like CogView.
#computervision #artificialintelligence #deeplearning #datascience #machinelearning #technology
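
For readers who want to see the data flow, here is a toy sketch of the three stages described in the post, in plain Python. Everything in it is an assumption made for illustration and is not the paper's code: `toy_image_encoder` and `toy_text_encoder` stand in for CLIP's two encoders (they fake a shared embedding space by seeding on a hidden `concept` string instead of reading pixels), `toy_tokenize` stands in for VQGAN with a 1-D nearest-codebook lookup, and `ToyGenerator` swaps the autoregressive transformer for a simple nearest-neighbour memory. The point it tries to show is only this: training sees image embeddings and image tokens, never text, and inference swaps in a text embedding as the condition.

```python
import math
import random

EMBED_DIM = 8                            # toy size, not CLIP's real embedding width
CODEBOOK = [16 * i for i in range(16)]   # toy 1-D codebook standing in for VQGAN's


def _unit_vector(seed):
    # Deterministic pseudo-random unit vector derived from a string seed.
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def toy_image_encoder(image):
    # Hypothetical stand-in for CLIP's image encoder (uses a hidden concept, not pixels).
    return _unit_vector(image["concept"])


def toy_text_encoder(text):
    # Hypothetical stand-in for CLIP's text encoder; shares the space with the image encoder.
    return _unit_vector(text)


def toy_tokenize(image):
    # Hypothetical stand-in for VQGAN: index of the nearest codebook entry per pixel value.
    return [min(range(len(CODEBOOK)), key=lambda i: abs(CODEBOOK[i] - p))
            for p in image["pixels"]]


class ToyGenerator:
    """Nearest-neighbour stand-in for the conditional autoregressive transformer."""

    def __init__(self):
        self.memory = []  # list of (image embedding, image tokens) pairs

    def train(self, images):
        # Language-free training: only unlabeled images are used.
        for image in images:
            self.memory.append((toy_image_encoder(image), toy_tokenize(image)))

    def generate(self, text):
        # Inference: condition on the text embedding instead of an image embedding.
        query = toy_text_encoder(text)

        def similarity(entry):
            return sum(a * b for a, b in zip(query, entry[0]))

        return max(self.memory, key=similarity)[1]


images = [
    {"concept": "dog", "pixels": [10, 201, 37, 129]},
    {"concept": "cat", "pixels": [250, 3, 90, 64]},
]
generator = ToyGenerator()
generator.train(images)
dog_tokens = generator.generate("dog")
print(dog_tokens)
```

In a real system the returned tokens would be passed to the VQGAN decoder to produce pixels; that step is left out of this sketch.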

Pooja Jain

Wavicle Data Solutions · 193K followers

3y

Insightful share👍💯

Ashish Singh

Novartis · 52K followers

3y

Insightful share Ashish Patel

Thom Ives, Ph.D.

Unify Consulting · 65K followers

3y

THIS is creative and significant, Ashish. Very nice write-up too, as usual. Thanks!
