Ashish Patel 🇮🇳’s Post

𝗗𝗮𝘆-𝟮𝟴𝟭 𝗖𝗼𝗺𝗽𝘂𝘁𝗲𝗿 𝗩𝗶𝘀𝗶𝗼𝗻 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴
𝗩𝗶𝗱𝗲𝗼𝗖𝗟𝗜𝗣: Contrastive Pre-training for Zero-shot Video-Text Understanding, by Facebook AI

Follow me for similar posts: 🇮🇳 Ashish Patel

Interesting Facts:
🔸 This paper was published at EMNLP 2021 and has 2 citations.
-------------------------------------------------------------------
𝗔𝗺𝗮𝘇𝗶𝗻𝗴 𝗥𝗲𝘀𝗲𝗮𝗿𝗰𝗵: https://lnkd.in/e93wiKXe
Code: https://lnkd.in/eu-Wmr7u
-------------------------------------------------------------------
𝗜𝗠𝗣𝗢𝗥𝗧𝗔𝗡𝗖𝗘
🔸 VideoCLIP is a contrastive approach to pre-training a unified model for zero-shot video and text understanding, without using any labels on downstream tasks.
🔸 VideoCLIP trains a transformer for video and text by contrasting temporally overlapping positive video-text pairs with hard negatives obtained via nearest-neighbor retrieval.
🔸 Experiments on a diverse set of downstream tasks, including sequence-level text-video retrieval, VideoQA, token-level action localization, and action segmentation, show state-of-the-art performance, surpassing prior work and in some cases even outperforming supervised approaches.

#computervision #artificialintelligence #innovation
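To give a feel for the training objective described above: the contrastive part can be sketched as a symmetric InfoNCE loss, where each temporally overlapping video-text pair is a positive and other pairs in the batch serve as negatives. This is a minimal NumPy illustration, not the paper's actual implementation (which also mines extra hard negatives via nearest-neighbor retrieval and uses learned transformer encoders); the function name and shapes are my own.

```python
import numpy as np

def symmetric_infonce(video_emb, text_emb, temperature=0.07):
    """Illustrative symmetric contrastive loss over a batch of
    (video, text) embedding pairs. Row i of each array is assumed
    to be a temporally overlapping (positive) pair."""
    # Normalize so the dot product is cosine similarity
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature  # (B, B): diagonal = positives

    def cross_entropy_diag(l):
        # Log-softmax over each row, then pick the diagonal (positive) entry
        l = l - l.max(axis=1, keepdims=True)           # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        idx = np.arange(l.shape[0])
        return -log_prob[idx, idx].mean()

    # Average the video->text and text->video directions
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))
```

With perfectly aligned embeddings the loss approaches zero; mismatched pairs push it up, which is what drives the video and text encoders toward a shared space.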


