Ashish Patel 🇮🇳’s Post

𝗗𝗮𝘆-𝟯𝟵𝟭 𝗖𝗼𝗺𝗽𝘂𝘁𝗲𝗿 𝗩𝗶𝘀𝗶𝗼𝗻 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗥𝗲𝗹𝗧𝗥: 𝗥𝗲𝗹𝗮𝘁𝗶𝗼𝗻 𝗧𝗿𝗮𝗻𝘀𝗳𝗼𝗿𝗺𝗲𝗿 𝗳𝗼𝗿 𝗦𝗰𝗲𝗻𝗲 𝗚𝗿𝗮𝗽𝗵 𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻 𝗯𝘆 𝗟𝗲𝗶𝗯𝗻𝗶𝘇 𝗨𝗻𝗶𝘃𝗲𝗿𝘀𝗶𝘁𝘆 𝗛𝗮𝗻𝗻𝗼𝘃𝗲𝗿, 𝗚𝗲𝗿𝗺𝗮𝗻𝘆

Follow me for similar posts: Ashish Patel

-------------------------------------------------------------------
𝗜𝗻𝘁𝗲𝗿𝗲𝘀𝘁𝗶𝗻𝗴 𝗙𝗮𝗰𝘁𝘀:
🔸 Paper: RelTR: Relation Transformer for Scene Graph Generation
🔸 The paper was published on arXiv in 2022.
🔸 Building on the Transformer's encoder-decoder architecture, the authors propose RelTR, a novel one-stage, end-to-end framework for scene graph generation. Given a fixed number of coupled subject and object queries, a fixed-size set of relationships is predicted using different attention mechanisms in RelTR's triplet decoder.
-------------------------------------------------------------------
𝗜𝗠𝗣𝗢𝗥𝗧𝗔𝗡𝗖𝗘
🔸 Different objects in the same scene are more or less related to each other, but only a limited number of these relationships are noteworthy.
🔸 Inspired by DETR, which excels at object detection, the authors view scene graph generation as a set prediction problem and propose RelTR, an end-to-end scene graph generation model with an encoder-decoder architecture.
🔸 The encoder reasons about the visual feature context, while the decoder infers a fixed-size set of subject-predicate-object triplets using different types of attention mechanisms with coupled subject and object queries.
🔸 A set prediction loss performs the matching between ground-truth and predicted triplets for end-to-end training.
🔸 In contrast to most existing scene graph generation methods, RelTR is a one-stage method that predicts a set of relationships directly from visual appearance alone, without combining entities and labeling all possible predicates.
🔸 Extensive experiments on the Visual Genome and Open Images V6 datasets demonstrate the superior performance and fast inference of the model.

#computervision #artificialintelligence #innovation
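The set prediction loss mentioned above relies on finding the cheapest one-to-one assignment between predicted and ground-truth triplets (DETR-style bipartite matching). A minimal sketch of that matching step, assuming a toy hand-made cost matrix (in the real model each cost would combine classification and box losses; the brute-force search here stands in for the Hungarian algorithm and only works at toy sizes):

```python
from itertools import permutations

def match_triplets(cost):
    """Find the one-to-one assignment of predicted triplets to ground-truth
    triplets with minimal total cost. cost[i][j] is the matching cost between
    prediction i and ground truth j. Brute force over permutations; a real
    implementation would use the Hungarian algorithm instead."""
    n_pred = len(cost)      # rows: predicted triplets (fixed-size query set)
    n_gt = len(cost[0])     # cols: ground-truth triplets (n_gt <= n_pred)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n_pred), n_gt):
        # perm[g] = index of the prediction assigned to ground truth g
        total = sum(cost[p][g] for g, p in enumerate(perm))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost

# Toy cost matrix with 3 predictions and 3 ground-truth triplets.
cost = [
    [0.9, 0.1, 0.8],
    [0.2, 0.7, 0.6],
    [0.5, 0.9, 0.1],
]
perm, total = match_triplets(cost)
print(perm, total)  # assignment of a prediction to each ground truth
```

Predictions left unmatched by this assignment are trained against a "no relation" background class, which is how a fixed-size query set can cover scenes with varying numbers of relationships.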


