rkzheng99/SyncVIS


SyncVIS: Synchronized Video Instance Segmentation

🔥 SyncVIS has been accepted to NeurIPS 2024 (poster)! (2024.10)

☀️ Overview

SyncVIS explicitly introduces video-level query embeddings and designs two key modules to synchronize them with frame-level query embeddings: a synchronized video-frame modeling paradigm and a synchronized embedding optimization strategy. The former promotes mutual learning between frame-level and video-level embeddings, and the latter divides long video sequences into small clips for easier optimization. On this page, we provide further experiments on our approach and additional visualizations, covering both specific scenarios and failure cases together with their analysis.
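The clip division mentioned above can be illustrated with a minimal sketch. It only shows the general idea of cutting a long frame sequence into short clips. The helper name `split_into_clips`, the `clip_len` parameter, and the choice of consecutive, non-overlapping clips are assumptions made for illustration, not the repository's actual implementation.

```python
from typing import List, Sequence


def split_into_clips(frame_indices: Sequence[int], clip_len: int) -> List[List[int]]:
    """Split frame indices into consecutive, non-overlapping clips.

    Hypothetical helper: the real SyncVIS clip sampling may differ
    (for example in overlap or clip length).
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    # Step through the sequence in strides of clip_len; the last clip may be shorter.
    return [
        list(frame_indices[start:start + clip_len])
        for start in range(0, len(frame_indices), clip_len)
    ]


# A 7-frame video cut into clips of at most 3 frames.
clips = split_into_clips(range(7), clip_len=3)
print(clips)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Each clip could then be optimized on its own, which is the motivation the overview gives for the strategy.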

image

✏️ Further Experiments

We list the results of building our method on top of other popular VIS methods besides IDOL and VITA. Note that TMT-VIS is mainly designed for training on multiple datasets, whereas our experiments evaluate the effectiveness of our model when training on the single YTVIS-19 dataset.

Table 1. Results of integrating our design into current VIS methods (ResNet-50)

| Method | AP | Method | AP |
| --- | --- | --- | --- |
| Mask2Former | 45.1 | VITA | 49.5 |
| + Synchronized Video-Frame Modeling | 50.3 | + Synchronized Video-Frame Modeling | 53.0 |
| + Synchronized Embedding Optimization | 46.7 | + Synchronized Embedding Optimization | 51.2 |
| + Both (SyncVIS) | 51.5 | + Both (SyncVIS) | 54.2 |
| TMT-VIS | 47.3 | DVIS | 52.6 |
| + Synchronized Video-Frame Modeling | 51.1 | + Synchronized Video-Frame Modeling | 54.9 |
| + Synchronized Embedding Optimization | 48.7 | + Synchronized Embedding Optimization | 54.0 |
| + Both (SyncVIS) | 51.9 | + Both (SyncVIS) | 55.8 |
| GenVIS | 51.3 | IDOL | 49.5 |
| + Synchronized Video-Frame Modeling | 54.4 | + Synchronized Video-Frame Modeling | 55.1 |
| + Synchronized Embedding Optimization | 52.7 | + Synchronized Embedding Optimization | 51.3 |
| + Both (SyncVIS) | 55.4 | + Both (SyncVIS) | 56.5 |
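To make the table easier to read, the short sketch below computes the AP gain of the full design ("+ Both (SyncVIS)") over each baseline. The numbers are copied from Table 1. The dictionary names are illustrative, and the gains are plain differences, not additional reported results.

```python
# AP values copied from Table 1 (ResNet-50).
baseline_ap = {
    "Mask2Former": 45.1, "VITA": 49.5, "TMT-VIS": 47.3,
    "DVIS": 52.6, "GenVIS": 51.3, "IDOL": 49.5,
}
syncvis_ap = {
    "Mask2Former": 51.5, "VITA": 54.2, "TMT-VIS": 51.9,
    "DVIS": 55.8, "GenVIS": 55.4, "IDOL": 56.5,
}

# Absolute AP improvement, rounded to one decimal to match the table's precision.
gains = {name: round(syncvis_ap[name] - ap, 1) for name, ap in baseline_ap.items()}

for name, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: +{gain:.1f} AP")
```

Under these numbers the gain ranges from +3.2 AP (DVIS) to +7.0 AP (IDOL).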

✨ Visualization

Fast-Moving Instances

In this part, we present several cases showing that our model can track and segment fast-moving instances. These results demonstrate that, with our video-frame synchronization, SyncVIS captures the trajectories and appearances of these fast-moving objects.

Racing Car

SyncVIS segments and tracks fast-moving racing cars with precision and consistency.

Skateboarding

SyncVIS segments and tracks a fast-moving man riding his skateboard, capturing his pose and movement with precision and consistency.

Failure Cases

image

As for limitations, our model has difficulty segmenting very crowded or heavily occluded scenes. As shown in the frames above, it fails to segment the person behind the horseman in the foreground (although it segments most of the horseman), showing that heavy occlusion remains a major challenge. Nevertheless, our model still performs better than previous approaches when segmenting complex scenes with multiple instances and occlusions.
