CongpeiQiu/CLIPRefiner

Refining CLIP's Spatial Awareness: A Visual-centric Perspective

TL;DR

We propose a refiner module that extracts dense, spatially-aware features directly from CLIP, enhancing region-language alignment with a visual-centric focus.

Key Features 🔍

  • Refiner Architecture: Refines CLIP's dense features through an SSL pipeline for enhanced spatial sensitivity
  • SCD-Guidance: Maintains region-language matching capabilities while adding spatial awareness
  • Model-Agnostic Design: Verified effective on multiple CLIP variants
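
For readers unfamiliar with the term, region-language matching on dense features can be sketched as follows. This is a generic toy illustration, not the Refiner or SCD implementation: the function names (`mean_pool_region`, `classify_region`), the 2x2 feature map, and the two-dimensional "text embeddings" are all hypothetical and chosen only to make the idea concrete. It assumes a dense feature map with one vector per spatial location and one embedding per class name.

```python
import math

def mean_pool_region(feature_map, box):
    """Average the feature vectors inside box = (top, left, bottom, right).

    The box is end-exclusive and assumed to cover at least one cell.
    """
    top, left, bottom, right = box
    cells = [feature_map[r][c] for r in range(top, bottom) for c in range(left, right)]
    dim = len(cells[0])
    return [sum(v[d] for v in cells) / len(cells) for d in range(dim)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def classify_region(feature_map, box, text_embeddings):
    """Return the label whose embedding is most similar to the pooled region feature."""
    region = mean_pool_region(feature_map, box)
    scores = {label: cosine(region, emb) for label, emb in text_embeddings.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy data: a 2x2 dense feature map with 2-D features per location.
feature_map = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
]
# Hypothetical text embeddings for two class names.
text_embeddings = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}

# Match the top row of the feature map against the class names.
label, scores = classify_region(feature_map, (0, 0, 1, 2), text_embeddings)
print(label, scores)
```

In this picture, the quality of the match depends on how well each location's feature reflects the content at that location, which is the property the features listed above aim to improve.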

Introduction

Official implementation of the paper Refining CLIP's Spatial Awareness: A Visual-centric Perspective (ICLR 2025).

Refining CLIP's Spatial Awareness: A Visual-centric Perspective
Congpei Qiu, Yanhao Wu, Wei Ke, Xiuxiu Bai, Tong Zhang
[Project Page] [arXiv Paper]

TODO

  • Release trained Refiner models and code
  • Release fine-tuned VLMs with Refiner integration
  • Add SigLIP v2 support

Training Refiner on Frozen VLMs

Video (Refiner_Dynamics.mp4): Training dynamics visualization of Refiner on EVA-CLIP

The code will be released soon. Stay tuned!

Fine-tuning VLMs with Refiner

The code will be released soon. Stay tuned!

License

Released under the MIT License.

Citation

@article{qiu2025refining,
  title={Refining CLIP's Spatial Awareness: A Visual-Centric Perspective},
  author={Qiu, Congpei and Wu, Yanhao and Ke, Wei and Bai, Xiuxiu and Zhang, Tong},
  journal={arXiv preprint arXiv:2504.02328},
  year={2025}
}

Acknowledgement

Our code is based on CLIPSelf and is closely related to OpenCLIP, EVA-CLIP, and MMDetection. We sincerely thank the authors for their high-quality open-source code!

About

[ICLR 2025] Code release for Refining CLIP's Spatial Awareness: A Visual-centric Perspective
