Ashish Patel 🇮🇳’s Post

𝗗𝗮𝘆-𝟯𝟵𝟰 𝗖𝗼𝗺𝗽𝘂𝘁𝗲𝗿 𝗩𝗶𝘀𝗶𝗼𝗻 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗩𝗥𝗧: 𝗔 𝗩𝗶𝗱𝗲𝗼 𝗥𝗲𝘀𝘁𝗼𝗿𝗮𝘁𝗶𝗼𝗻 𝗧𝗿𝗮𝗻𝘀𝗳𝗼𝗿𝗺𝗲𝗿 𝗯𝘆 𝗖𝗼𝗺𝗽𝘂𝘁𝗲𝗿 𝗩𝗶𝘀𝗶𝗼𝗻 𝗟𝗮𝗯, 𝗘𝗧𝗛 𝗭𝘂𝗿𝗶𝗰𝗵, 𝗦𝘄𝗶𝘁𝘇𝗲𝗿𝗹𝗮𝗻𝗱 𝗮𝗻𝗱 𝗠𝗲𝘁𝗮

Follow me for similar posts: Ashish Patel
-------------------------------------------------------------------
𝗜𝗻𝘁𝗲𝗿𝗲𝘀𝘁𝗶𝗻𝗴 𝗙𝗮𝗰𝘁𝘀:
🔸 Paper: 𝗩𝗥𝗧: 𝗔 𝗩𝗶𝗱𝗲𝗼 𝗥𝗲𝘀𝘁𝗼𝗿𝗮𝘁𝗶𝗼𝗻 𝗧𝗿𝗮𝗻𝘀𝗳𝗼𝗿𝗺𝗲𝗿
🔸 Published on arXiv in 2022.
🔸 Proposes the Video Restoration Transformer (VRT) for video restoration. Built on a multi-scale framework, it jointly extracts, aligns, and fuses information from different frames at multiple resolutions using two kinds of modules: temporal mutual self attention (TMSA) and parallel warping.
-------------------------------------------------------------------
𝗜𝗠𝗣𝗢𝗥𝗧𝗔𝗡𝗖𝗘
🔸 Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality ones. Unlike single-image restoration, video restoration generally requires exploiting temporal information from multiple adjacent, but usually misaligned, video frames.
🔸 Existing deep methods generally tackle this with either a sliding-window strategy or a recurrent architecture; the former is restricted to frame-by-frame restoration, while the latter lacks long-range modelling ability. In this paper, the authors propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities.
🔸 More specifically, VRT is composed of multiple scales, each of which consists of two kinds of modules: temporal mutual self attention (TMSA) and parallel warping.
🔸 TMSA divides the video into small clips, on which mutual attention is applied for joint motion estimation, feature alignment, and feature fusion, while self attention is used for feature extraction. To enable cross-clip interactions, the video sequence is shifted for every other layer.
🔸 Besides this, parallel warping is used to further fuse information from neighbouring frames by parallel feature warping.
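The shifted-clip TMSA idea above can be illustrated with a minimal NumPy sketch. This is a toy stand-in, not the paper's code: `partition_clips` and `mutual_attention` are illustrative names, features are flattened vectors, and the attention is single-head over whole frames.

```python
import numpy as np

def partition_clips(frames, clip_size=2, shift=0):
    """Split a frame sequence into non-overlapping temporal clips.
    With shift > 0, the sequence is rolled first so clip boundaries
    move, letting alternating layers exchange information across
    clips (the shifted scheme described in the post)."""
    t = frames.shape[0]
    if shift:
        frames = np.roll(frames, -shift, axis=0)
    return frames.reshape(t // clip_size, clip_size, *frames.shape[1:])

def mutual_attention(q_feat, k_feat):
    """Toy mutual attention: queries from one frame attend to the
    keys/values of another frame, implicitly aligning its features
    without an explicit optical-flow step."""
    scores = q_feat @ k_feat.T / np.sqrt(q_feat.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ k_feat
```

In a layer pair, clips would be formed once with `shift=0` and once with `shift=clip_size // 2`, so every frame eventually interacts with both of its temporal neighbours.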
Experimental results on three tasks, including video super-resolution, video deblurring and video denoising, demonstrate that VRT outperforms the state-of-the-art methods by large margins (up to 2.16 dB) on nine benchmark datasets. #computervision #artificialintelligence #technology
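The parallel-warping idea, fusing neighbouring frames by warping their features toward the current frame, can also be sketched. This is an illustrative nearest-neighbour version with a given displacement field; VRT itself learns the offsets, and these function names are assumptions for the example.

```python
import numpy as np

def warp_features(feat, flow):
    """Warp one neighbouring frame's (H, W, C) feature map toward the
    reference frame using a (H, W, 2) displacement field of (dy, dx),
    with nearest-neighbour sampling and border clamping."""
    h, w, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(ys + flow[..., 0], 0, h - 1).astype(int)
    src_x = np.clip(xs + flow[..., 1], 0, w - 1).astype(int)
    return feat[src_y, src_x]

def parallel_warp(center, neighbors, flows):
    """Warp all temporal neighbours toward the centre frame in
    parallel and concatenate channel-wise for later fusion."""
    warped = [warp_features(f, fl) for f, fl in zip(neighbors, flows)]
    return np.concatenate([center] + warped, axis=-1)
```

In the real model, the concatenated features would then be fused by a learned layer; here the concatenation just shows where that fusion would plug in.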
