Day-352 Computer Vision Learning
NVIDIA's AdaViT Halts Token Computation to Adaptively Adjust ViT Inference Cost on Images of Different Complexity

Follow me for similar posts: @🇮🇳 Ashish Patel
-------------------------------------------------------------------
Interesting Facts:
🔸 Paper: AdaViT: Adaptive Tokens for Efficient Vision Transformer
🔸 The paper was published on arXiv in 2021.
🔸 NVIDIA researchers propose AdaViT, an input-dependent mechanism that adaptively adjusts a vision transformer's inference cost by halting the computation of different tokens at different depths, reserving compute for the most discriminative tokens.
-------------------------------------------------------------------
IMPORTANCE
🔸 The authors introduce a method for input-dependent inference in vision transformers that halts computation for different tokens at different depths.
🔸 Adaptive token halting is learned from an existing embedding dimension of the original architecture, so no extra parameters or compute are needed for halting.
🔸 A distributional prior regularization guides halting toward a target distribution and average token depth, which stabilizes ACT training.
🔸 The authors analyze how token depth varies across different images, providing insight into the attention mechanism of vision transformers.
🔸 Empirically, the proposed method improves hardware throughput by up to 62% with only a minor drop in accuracy.
🔸 Compared to baseline models, AdaViT reduces FLOPs by 39% without extra parameters and with only a minor loss in accuracy. It also directly improves the throughput of the DeiT-Small and DeiT-Tiny variants by 38% and 62% respectively, without any hardware modification and with only a 0.3% accuracy drop.
-------------------------------------------------------------------
#computervision #artificialintelligence #innovation
-------------------------------------------------------------------
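The core mechanism described above can be sketched in a few lines: each token's halting score is read from an existing embedding dimension (so no extra halting parameters are added), accumulated across layers, and once a token's cumulative score crosses a threshold it is excluded from further computation. This is a simplified, hypothetical NumPy illustration of the ACT-style idea, not the authors' actual implementation — `layer_fn`, the choice of dimension 0 as the halting channel, and `eps` are all illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_token_halting(tokens, num_layers, layer_fn, eps=0.01):
    """ACT-style token halting sketch.

    The halting probability is read from an existing embedding
    dimension (here, dim 0), so halting adds no extra parameters.
    Halted tokens are skipped in all subsequent layers.
    """
    n, d = tokens.shape
    cum_halt = np.zeros(n)           # accumulated halting probability per token
    active = np.ones(n, dtype=bool)  # tokens still being computed
    depth = np.zeros(n, dtype=int)   # layer at which each token halted

    for layer in range(num_layers):
        if not active.any():
            break
        # Run the transformer layer only on still-active tokens.
        tokens[active] = layer_fn(tokens[active])
        # Halting probability taken from the first embedding dimension.
        h = sigmoid(tokens[active][:, 0])
        cum_halt[active] += h
        # Halt tokens whose cumulative score crosses 1 - eps.
        newly_halted = active & (cum_halt >= 1.0 - eps)
        depth[newly_halted] = layer + 1
        active &= ~newly_halted

    depth[active] = num_layers  # tokens that never halted run the full depth
    return tokens, depth
```

In the paper, training additionally uses a distributional prior regularizer that pushes the halting-depth distribution toward a target average depth; that loss term is omitted here since this sketch only covers inference-time behavior.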