Ashish Patel 🇮🇳’s Post

𝗗𝗮𝘆-𝟮𝟭𝟴 𝗖𝗼𝗺𝗽𝘂𝘁𝗲𝗿 𝗩𝗶𝘀𝗶𝗼𝗻 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴
𝗞𝗮𝗹𝗲𝗶𝗱𝗼-𝗕𝗘𝗥𝗧: Vision-Language Pre-training on the Fashion Domain, by Alibaba Group

Follow me for similar posts: 🇮🇳 Ashish Patel

Interesting Facts:
🔸 This paper is on arXiv with over 2 citations.
-------------------------------------------------------------------
𝗔𝗺𝗮𝘇𝗶𝗻𝗴 𝗥𝗲𝘀𝗲𝗮𝗿𝗰𝗵: https://lnkd.in/eXD7-cYp
Code: https://lnkd.in/eps-fszD
-------------------------------------------------------------------
𝗜𝗠𝗣𝗢𝗥𝗧𝗔𝗡𝗖𝗘
🔸 We present a new vision-language (VL) pre-training model, dubbed Kaleido-BERT, which introduces a novel kaleido strategy for learning fashion cross-modality representations with transformers.
🔸 In contrast to the random masking strategy of recent VL models, we design alignment-guided masking so that pre-training focuses jointly on image-text semantic relations.
🔸 To this end, we carry out five novel self-supervised VL pre-training tasks, i.e., rotation, jigsaw, camouflage, grey-to-color, and blank-to-color, on image patches of different scales (see the sketch after this summary).
🔸 Kaleido-BERT is conceptually simple and easy to extend within the existing BERT framework, yet it attains new state-of-the-art results by large margins on four downstream tasks: text retrieval (R@1: 4.03% absolute improvement), image retrieval (R@1: 7.13% abs. imp.), category recognition (ACC: 3.28% abs. imp.), and fashion captioning (BLEU-4: 1.2 abs. imp.).
🔸 We validate the efficiency of Kaleido-BERT on a wide range of e-commerce websites, demonstrating its broader potential in real-world applications.

#computervision #artificialintelligence #deeplearning
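For intuition, here is a minimal sketch (not the authors' code) of what the "kaleido" multi-scale patch split could look like in PyTorch. The five grid levels (1x1 up to 5x5, matching the five pre-training tasks) and the shared 64x64 patch resolution are assumptions for illustration; the function name is hypothetical.

```python
import torch
import torch.nn.functional as F

def kaleido_patches(image: torch.Tensor, levels=(1, 2, 3, 4, 5)):
    """Split a (C, H, W) image into multi-scale patch grids.

    Returns a dict mapping grid size n -> tensor of shape
    (n*n, C, 64, 64), one grid per "kaleido" level.
    """
    c, h, w = image.shape
    out = {}
    for n in levels:
        ph, pw = h // n, w // n
        patches = (
            image[:, : ph * n, : pw * n]   # crop so H, W divide evenly
            .unfold(1, ph, ph)             # slice rows  -> (C, n, W', ph)
            .unfold(2, pw, pw)             # slice cols  -> (C, n, n, ph, pw)
            .permute(1, 2, 0, 3, 4)        # -> (n, n, C, ph, pw)
            .reshape(n * n, c, ph, pw)
        )
        # resize every patch to a shared resolution (64x64 is an assumption)
        out[n] = F.interpolate(
            patches, size=(64, 64), mode="bilinear", align_corners=False
        )
    return out

# Example: a dummy 224x224 RGB image yields 1+4+9+16+25 = 55 patches.
img = torch.randn(3, 224, 224)
grids = kaleido_patches(img)
print({n: tuple(p.shape) for n, p in grids.items()})
```

In the same spirit, a toy sketch of the alignment-guided masking idea: rather than masking tokens and patches independently at random, pick a text token and mask the image patches it aligns with most strongly. The alignment matrix below is a random stand-in purely for shape illustration; the paper derives real token-patch alignments during pre-training.

```python
import torch

def alignment_guided_mask(align: torch.Tensor, token_idx: int, k: int = 3):
    """align: (num_tokens, num_patches) alignment scores.

    Returns indices of the k patches most aligned with token_idx,
    i.e., the patches to mask together with that token.
    """
    return align[token_idx].topk(k).indices

align = torch.rand(12, 55)   # 12 text tokens x 55 kaleido patches (placeholder)
masked_patches = alignment_guided_mask(align, token_idx=4)
print(masked_patches)
```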
