This is the official code of the paper "DALIP: Distribution Alignment-based Language-Image Pre-Training for Domain-Specific Data".


DALIP: Distribution Alignment-based Language-Image Pre-Training for Domain-Specific Data

Framework

(a) Overview of our Distribution Alignment-based Language-Image Pre-Training (DALIP) method for biological data. DALIP optimizes CLIP models by matching the feature distributions of image-text pairs, which are efficiently approximated by the first- and second-order statistics of token features. (b) A Multi-head Brownian Distance Covariance (MBDC) module efficiently acquires the second-order statistics of token features.

Abstract

Recently, Contrastive Language-Image Pre-training (CLIP) has shown promising performance on domain-specific data (e.g., biology) and has attracted increasing research attention. Existing works generally focus on collecting extensive domain-specific data and directly tuning the original CLIP models. Intuitively, such a paradigm does not fully account for the characteristics of domain-specific data (e.g., the fine-grained nature of biological data), which limits model capability while largely sacrificing the original ability of CLIP in the general domain. In this paper, we propose a Distribution Alignment-based Language-Image Pre-Training (DALIP) method for biological data. Specifically, DALIP optimizes CLIP models by matching the feature distributions of image-text pairs instead of the original [cls] tokens, which captures rich yet effective information inherent in image-text pairs as powerful representations, and thus better copes with the fine-grained nature of biological data. In particular, DALIP efficiently approximates each feature distribution by its first- and second-order statistics, and presents a Multi-head Brownian Distance Covariance (MBDC) module to acquire second-order statistics of token features efficiently. Furthermore, we collect a new dataset for the plant domain (a representative domain-specific case in biology), namely PlantMix-13M, comprising 10M plant image-text pairs mixed with 3M general-domain pairs according to data mixing laws. Extensive experiments show that DALIP clearly outperforms existing CLIP counterparts in the biological domain and generalizes well to the remote sensing and medical imaging domains. Moreover, our PlantMix-13M dataset further boosts DALIP's performance in the plant domain while preserving its ability in the general domain.
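For intuition, below is a minimal PyTorch sketch of the two statistics described above, assuming token features of shape (batch, tokens, dim). The first-order statistic is the token mean; the second-order statistic is the classical Brownian distance covariance (a double-centered Euclidean distance matrix), computed channel-wise per head. The function names, head-splitting scheme, and normalization details here are our assumptions for illustration; the repository code and the paper define the actual MBDC module.

```python
import torch

def bdc_matrix(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # x: (B, M, N) holds M variables, each observed over N tokens.
    # Returns the (B, M, M) double-centered Euclidean distance matrix,
    # i.e., the classical Brownian distance covariance estimator.
    sq = (x * x).sum(dim=-1, keepdim=True)                    # (B, M, 1)
    d2 = sq + sq.transpose(1, 2) - 2 * x @ x.transpose(1, 2)
    a = torch.sqrt(d2.clamp_min(0.0) + eps)                   # (B, M, M)
    row = a.mean(dim=2, keepdim=True)
    col = a.mean(dim=1, keepdim=True)
    grand = a.mean(dim=(1, 2), keepdim=True)
    return a - row - col + grand

def multi_head_bdc(tokens: torch.Tensor, num_heads: int = 4) -> torch.Tensor:
    # tokens: (B, N, D). Split the D channels into heads and compute a
    # channel-wise BDC matrix of size (D/H x D/H) per head. The flattened
    # descriptor is D*D/H-dimensional rather than D*D, which is what makes
    # the multi-head variant cheaper than one full BDC matrix.
    B, N, D = tokens.shape
    assert D % num_heads == 0, "D must be divisible by num_heads"
    dh = D // num_heads
    h = tokens.reshape(B, N, num_heads, dh).permute(0, 2, 3, 1)  # (B, H, dh, N)
    b = bdc_matrix(h.reshape(B * num_heads, dh, N))              # (B*H, dh, dh)
    return b.reshape(B, -1)                                      # (B, H*dh*dh)

def distribution_statistics(tokens: torch.Tensor, num_heads: int = 4):
    # First-order statistic (token mean) plus second-order statistic
    # (multi-head BDC). DALIP matches both between the image and text
    # branches; the exact similarity and fusion are defined in the paper.
    return tokens.mean(dim=1), multi_head_bdc(tokens, num_heads)
```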

Installation

  • Clone the repository:

```bash
git clone https://github.com/XavierHeart/DALIP
cd DALIP/
```

  • Install the dependencies:

```bash
pip install -r requirements.txt
```

Dataset

The PlantMix-13M dataset will be made publicly available soon. It contains:

  • 10M plant-domain image-text pairs
  • 3M general-domain image-text pairs
  • a plant-to-general data mixture carefully curated according to data mixing laws

Model

| Model | Arch. | Dataset | ImageNet-1K | CIFAR-100 | Cars | Pets | SUN397 | General Mean | PlantNet | Fungi | PlantVillage | Med. Leaf | PlantDoc | Plant Mean | Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OpenCLIP | ViT-B/16 | PlantMix-13M | 46.8 | 61.3 | 50.2 | 66.2 | 50.1 | 54.9 | 89.9 | 47.0 | 32.3 | 48.9 | 33.0 | 50.2 | 52.6 |
| DALIP | ViT-B/16 | PlantMix-13M | 49.2 | 69.2 | 58.9 | 75.2 | 55.6 | 61.6 | 91.0 | 52.8 | 34.5 | 43.7 | 34.3 | 51.3 | 56.4 |
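Since the project builds on OpenCLIP (see Acknowledgments), a released checkpoint should be loadable through the standard open_clip interface. Below is a minimal zero-shot classification sketch; the checkpoint filename and prompts are hypothetical placeholders, and the actual release may ship its own loading utilities.

```python
import torch
import open_clip
from PIL import Image

# Hypothetical local checkpoint path; DALIP weights are not (yet) a named
# open_clip pretrained tag, so we assume a local file.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16", pretrained="dalip_vitb16_plantmix13m.pt")
tokenizer = open_clip.get_tokenizer("ViT-B-16")
model.eval()

image = preprocess(Image.open("leaf.jpg")).unsqueeze(0)
text = tokenizer(["a photo of a healthy leaf", "a photo of a diseased leaf"])

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

print(probs)  # class probabilities over the text prompts
```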

Acknowledgments

This project is built upon the OpenCLIP codebase. We sincerely thank its authors for their outstanding contribution to the open-source community.
