🔥 Our SyncVIS has been accepted to NeurIPS 2024 as a poster! (2024.10)
SyncVIS explicitly introduces video-level query embeddings and designs two key modules to synchronize them with frame-level query embeddings: a synchronized video-frame modeling paradigm and a synchronized embedding optimization strategy. The former promotes mutual learning between frame-level and video-level embeddings, while the latter divides long video sequences into short clips for easier optimization. On this page, we provide further experiments with our approach and additional visualizations, covering both representative scenarios and failure cases, together with our analysis.
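The two ideas above can be illustrated with a minimal, self-contained sketch. This is not the paper's implementation (which uses transformer attention inside the decoder); `split_into_clips`, `sync_update`, and the mean-pooling update rule are hypothetical simplifications, with `alpha` standing in for how strongly the two embedding levels mix.

```python
def split_into_clips(frames, clip_len):
    """Synchronized embedding optimization (toy view): divide a long
    frame sequence into consecutive sub-clips of at most clip_len frames,
    so each short clip can be optimized more easily than the full video."""
    return [frames[i:i + clip_len] for i in range(0, len(frames), clip_len)]


def sync_update(video_emb, frame_embs, alpha=0.5):
    """Synchronized video-frame modeling (toy view): one round of mutual
    refinement between a video-level embedding and per-frame embeddings,
    using mean pooling instead of the attention used in practice."""
    dim = len(video_emb)
    # Pool frame-level embeddings into a single video-level summary.
    pooled = [sum(f[d] for f in frame_embs) / len(frame_embs) for d in range(dim)]
    # The video-level query absorbs frame-level information...
    new_video = [(1 - alpha) * v + alpha * p for v, p in zip(video_emb, pooled)]
    # ...and each frame-level query is pulled toward the updated video-level query.
    new_frames = [[(1 - alpha) * f[d] + alpha * new_video[d] for d in range(dim)]
                  for f in frame_embs]
    return new_video, new_frames
```

For example, a 10-frame video with `clip_len=4` yields sub-clips of 4, 4, and 2 frames, and repeated `sync_update` calls drive the video-level and frame-level embeddings toward agreement.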
We list the results of building our method upon other popular VIS methods besides IDOL and VITA. It is worth mentioning that TMT-VIS is mainly designed for training on multiple datasets; in our experiments, we evaluate the effectiveness of our model when training on the YouTube-VIS 2019 dataset alone.
**Table 1.** Experiments on integrating our designs into current VIS methods (ResNet-50 backbone).
| Method | AP | Method | AP |
|---|---|---|---|
| Mask2Former | 45.1 | VITA | 49.5 |
| + Synchronized Video-Frame Modeling | 50.3 | + Synchronized Video-Frame Modeling | 53.0 |
| + Synchronized Embedding Optimization | 46.7 | + Synchronized Embedding Optimization | 51.2 |
| + Both (SyncVIS) | 51.5 | + Both (SyncVIS) | 54.2 |
| TMT-VIS | 47.3 | DVIS | 52.6 |
| + Synchronized Video-Frame Modeling | 51.1 | + Synchronized Video-Frame Modeling | 54.9 |
| + Synchronized Embedding Optimization | 48.7 | + Synchronized Embedding Optimization | 54.0 |
| + Both (SyncVIS) | 51.9 | + Both (SyncVIS) | 55.8 |
| GenVIS | 51.3 | IDOL | 49.5 |
| + Synchronized Video-Frame Modeling | 54.4 | + Synchronized Video-Frame Modeling | 55.1 |
| + Synchronized Embedding Optimization | 52.7 | + Synchronized Embedding Optimization | 51.3 |
| + Both (SyncVIS) | 55.4 | + Both (SyncVIS) | 56.5 |
In this part, we present several cases showing that our model is capable of tracking and segmenting fast-moving instances. These results demonstrate that, with our video-frame synchronization, SyncVIS can faithfully capture the trajectories and appearances of fast-moving objects.
We demonstrate that SyncVIS segments and tracks fast-moving racing cars with precision and consistency.
We also demonstrate that SyncVIS segments and tracks a fast-moving skateboarder, capturing the man's pose and movement with precision and consistency.
As for limitations, our model has difficulty segmenting very crowded or heavily occluded scenes. As shown in the frames above, it fails to segment the person behind the front horseman (although it segments most of the horseman himself), showing that heavy occlusion remains a key challenge. Nevertheless, our model still outperforms previous approaches when segmenting complex scenes with multiple instances and occlusions.