Efficient-SAM2: Accelerating SAM2 with Object-Aware Visual Encoding and Memory Retrieval

Authors: Jing Zhang, Zhikai Li✉, Xuewen Liu, Qingyi Gu✉

(✉ denotes corresponding author.)

Introduction

This repository contains the official implementation for the ICLR 2026 paper "Efficient-SAM2: Accelerating SAM2 with Object-Aware Visual Encoding and Memory Retrieval".

Overview

Motivation

SAM2's perception pattern exhibits computational redundancy: i) the focused attention of the mask decoder, contrasted with the broad attention span of the image encoder, reveals unnecessary background computation; ii) in the memory bank, only a small subset of tokens contributes significantly to memory attention, and the salient regions exhibit temporal consistency.

Method

For the image encoder, we introduce object-aware Sparse Window Routing (SWR), which assigns object-irrelevant background windows to a lightweight shortcut branch based on the spatial-temporal consistency and perceptual saliency of the object, thus reducing encoding redundancy. For memory attention, we propose object-aware Sparse Memory Retrieval (SMR), which builds a FIFO mask queue to retrieve the most salient memory tokens, reusing the saliency patterns from their first recollection and thereby reducing computational cost.
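The retrieval step of SMR can be sketched as below. This is a minimal illustration under stated assumptions, not the repository's implementation: the function name `sparse_memory_retrieval`, the token shapes, and the queue length are hypothetical, while the drop ratio mirrors the `--set_drop_ratio` flag used at inference.

```python
import torch
from collections import deque

def sparse_memory_retrieval(memory_tokens, saliency, drop_ratio=0.95):
    # Keep only the (1 - drop_ratio) fraction of most salient memory tokens.
    n = memory_tokens.shape[0]
    k = max(1, int(n * (1.0 - drop_ratio)))
    keep_idx = torch.topk(saliency, k).indices
    return memory_tokens[keep_idx], keep_idx

# FIFO queue caching the saliency pattern from a token's first recollection,
# so later frames reuse it instead of recomputing saliency.
mask_queue = deque(maxlen=7)  # queue length is an assumption

tokens = torch.randn(4096, 256)  # memory-bank tokens (N, C)
saliency = torch.rand(4096)      # per-token saliency scores
kept, idx = sparse_memory_retrieval(tokens, saliency, drop_ratio=0.95)
mask_queue.append(idx)           # cached pattern for subsequent frames
```

With `drop_ratio=0.95`, only 5% of the 4096 tokens (204 here) are passed to memory attention, which is where the claimed speedup comes from.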

Performance

Efficient-SAM2 achieves a well-balanced accuracy–speed trade-off.


Create Environment

Prerequisites

The code requires python>=3.10, as well as torch>=2.5.1 and torchvision>=0.20.1. Please follow the official PyTorch instructions to install both PyTorch and TorchVision dependencies. You can then install Efficient-SAM2 on a GPU machine using:

git clone https://github.com/jingjing0419/Efficient-SAM2.git
cd Efficient-SAM2
pip install -e .

To use the SAM 2 predictor and run the example notebooks, jupyter and matplotlib are required and can be installed by:

pip install -e ".[notebooks]"

Prepare Models

All the model checkpoints can be downloaded by running:

cd checkpoints && \
./download_ckpts.sh && \
cd ..

or individually from:

Usage

Train Bypass

python tools/train_bypass_all.py \
    --apply_bypass \
    --apply_WB \
    --use_wandb \
    --train_epoch=5 \
    --train_step=32 \
    --lr=1e-4 \
    --base_video_dir=<PATH-TO-TRAINING-IMAGES> \
    --input_mask_dir=<PATH-TO-TRAINING-ANNOTATION> \
    --video_list_file=./train_sel_v1.txt \
    --output_mask_dir=./outputs/SAV_train/sav_train_pred_pngs \
    --dataset='sav_train' \
    --sam2_model='base+' \
    --bypass_type='bottleneck'

Inference

The vos_inference_main.py script generates predictions for semi-supervised video object segmentation (VOS) evaluation on datasets such as DAVIS, MOSE, or SA-V.

After installing Efficient-SAM2 and its dependencies, run the script as follows (using the DAVIS 2017 dataset as an example). It saves the prediction PNG files to the --output_mask_dir.

Run Efficient-SAM2 inference:

python tools/vos_inference_main.py \
--sam2_model='base+' --Mem_stride=1 --dataset='SAV_test' \
--apply_bypass --apply_WB --dilate_mask --WB_theta=0.7 \
--bypass_ckpt_base='./bypass/ckpt/bypass_bottleneck_base.pth' \
--prune_memory --topk_mask --set_drop_ratio=0.95 \
--output_mask_dir='./outputs2/'
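For downstream analysis, the saved masks can be read back. The sketch below assumes DAVIS-style PNGs where each pixel stores an object id (0 = background); the helper `load_pred_masks` and the directory layout are illustrative assumptions, not part of the repository.

```python
import tempfile
from pathlib import Path
import numpy as np
from PIL import Image

def load_pred_masks(output_mask_dir, video_name):
    # Read per-frame prediction PNGs (pixel value = object id, 0 = background).
    frame_dir = Path(output_mask_dir) / video_name
    return {p.stem: np.array(Image.open(p)) for p in sorted(frame_dir.glob("*.png"))}

# Round-trip demo with a dummy two-object mask.
out_dir = Path(tempfile.mkdtemp())
(out_dir / "bike-packing").mkdir()
dummy = np.zeros((480, 854), dtype=np.uint8)
dummy[100:200, 100:200] = 1   # object 1
dummy[300:400, 500:600] = 2   # object 2
Image.fromarray(dummy).save(out_dir / "bike-packing" / "00000.png")

masks = load_pred_masks(out_dir, "bike-packing")
```

The same id-per-pixel convention is what the SA-V evaluator in the next section consumes from --pred_root.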

Evaluation

Run SA-V evaluation:

python sav_evaluator.py \
--gt_root <PATH-TO-SAV-TEST/VAL-DATASET-GROUNDTRUTH> \
--pred_root <PATH-TO-MODEL-OUTPUT>

Star this repository if you find it helpful!

License

This project is released under the Apache-2.0 license (see LICENSE); the cctorch components are licensed under BSD-3-Clause (see LICENSE_cctorch).