Android apps which demonstrate the use of OpenAI's CLIP model for zero-shot image classification and text-image retrieval using clip.cpp
- Fully on-device inference of the CLIP model (generate text and image embeddings)
- Uses JNI bindings over clip.cpp, which is itself based on ggml, for efficient inference (see the scoring sketch below)
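Both apps reduce to the same scoring step: encode text and images into embeddings and compare them with cosine similarity. The snippet below is a minimal sketch of that step, assuming a hypothetical `ClipEmbedder` interface as a stand-in for the actual JNI bindings exposed by the `clip` module; the names `embedText`, `embedImage`, and `classify` are illustrative, not part of the library.

```kotlin
import kotlin.math.exp
import kotlin.math.sqrt

// Hypothetical stand-in for the JNI bindings from the clip AAR.
interface ClipEmbedder {
    fun embedText(text: String): FloatArray
    fun embedImage(imagePath: String): FloatArray
}

// Cosine similarity between two embedding vectors of equal length.
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    var dot = 0f
    var normA = 0f
    var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

// Zero-shot classification: score an image against every candidate label and
// turn the similarities into a probability-like distribution with a softmax.
fun classify(
    embedder: ClipEmbedder,
    imagePath: String,
    labels: List<String>
): Map<String, Float> {
    val imageEmbedding = embedder.embedImage(imagePath)
    val logits = labels.map { cosineSimilarity(imageEmbedding, embedder.embedText(it)) * 100f }
    // Numerically stable softmax; CLIP scales similarities by a learned logit scale (~100).
    val maxLogit = logits.maxOrNull() ?: 0f
    val exps = logits.map { exp(it - maxLogit) }
    val sum = exps.sum()
    return labels.zip(exps.map { it / sum }).toMap()
}
```

For text-image search, the same similarity is used the other way around: a text query embedding is compared against the embeddings of a set of images and the results are ranked by score.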
The project consists of two Gradle modules, `app-zero-shot-classify` and `app-text-image-search`, which contain the source files for the 'Zero-Shot Image Classification' and 'Text-Image Search' apps respectively.
- Clone the project and open the resulting directory in Android Studio. An automatic Gradle build should start; if it does not, click on the `Build` menu and select `Make Project`.
```
git clone https://github.com/shubham0204/CLIP-Android --depth=1
```
- Connect the test device to the computer and make sure that it is recognized by the computer.
- Download one of the GGUF models from the HuggingFace repository. For instance, if we download the `CLIP-ViT-B-32-laion2B-s34B-b79K_ggml-model-f16.gguf` model, we need to push it to the test device's file system using `adb push`:
```
adb push CLIP-ViT-B-32-laion2B-s34B-b79K_ggml-model-f16.gguf /data/local/tmp/clip_model_fp16.gguf
```
[!INFO] It is not required to place the GGUF model in the `/data/local/tmp` directory. A path to the app's internal storage or any other directory that the app can access is also allowed.
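For example, here is a minimal sketch of copying a user-selected GGUF file into the app's internal storage, so that `MODEL_PATH` can point inside `filesDir` instead of `/data/local/tmp`; the `Uri`-based source and the destination file name are only illustrative.

```kotlin
import android.content.Context
import android.net.Uri
import java.io.File

// Copy a GGUF model (e.g. picked via the Storage Access Framework) into the
// app's internal storage and return the absolute path to the copy.
fun copyModelToInternalStorage(context: Context, modelUri: Uri): String {
    val destination = File(context.filesDir, "clip_model_fp16.gguf")
    context.contentResolver.openInputStream(modelUri)?.use { input ->
        destination.outputStream().use { output ->
            input.copyTo(output)
        }
    }
    // The returned path can then be assigned to MODEL_PATH.
    return destination.absolutePath
}
```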
- For both modules, in `MainActivityViewModel.kt`, ensure that the `MODEL_PATH` variable points to the correct model path on the test device. For instance, if the model is pushed to `/data/local/tmp/clip_model_fp16.gguf`, then `MODEL_PATH` should be set to `/data/local/tmp/clip_model_fp16.gguf`. The `NUM_THREADS` and `VERBOSITY` variables can be configured as well.
```kotlin
private val MODEL_PATH = "/data/local/tmp/clip_model_fp16.gguf"
private val NUM_THREADS = 4
private val VERBOSITY = 1
```
- Select one of the modules in the `Run / Debug Configuration` dropdown in the top menu bar, and run the app on the test device by clicking on the `Run` button (Shift + F10) in Android Studio.
- Navigate to this fork of clip.cpp, clone the `add-android-sample` branch, and open the resulting directory in Android Studio.
- The project contains two modules, `app` and `clip`. The AAR of the `clip` module can be found in the `app-text-image-search/libs` and `app-zero-shot-classify/libs` directories of this project. Running `gradlew clip:assemble` should build the debug and release versions of `clip` as an AAR.
- The AAR can be placed in the `libs` directory of your project and added as a dependency in the `build.gradle` file of the app module:
```
dependencies {
    // ...
    implementation(files("libs/clip.aar"))
    // ...
}
```
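With the dependency in place, the embeddings produced by the bindings can drive text-image retrieval. The sketch below only covers the ranking step and assumes the query and image embeddings are already available as `FloatArray`s (for instance from a wrapper like the hypothetical `ClipEmbedder` above); `rankImages` is not part of the `clip` AAR.

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors of equal length.
private fun similarity(a: FloatArray, b: FloatArray): Float {
    var dot = 0f
    var normA = 0f
    var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

// Rank image embeddings against a text-query embedding and return the top-k matches.
// `imageEmbeddings` maps an image path to its precomputed embedding.
fun rankImages(
    queryEmbedding: FloatArray,
    imageEmbeddings: Map<String, FloatArray>,
    topK: Int = 5
): List<Pair<String, Float>> =
    imageEmbeddings
        .map { (path, embedding) -> path to similarity(queryEmbedding, embedding) }
        .sortedByDescending { it.second }
        .take(topK)
```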
- CLIP: Connecting Text and Images
- Learning Transferable Visual Models From Natural Language Supervision
- clip.cpp
- ggml
- shubham0204's PR that adds JNI bindings to clip.cpp
@article{DBLP:journals/corr/abs-2103-00020,
author = {Alec Radford and
Jong Wook Kim and
Chris Hallacy and
Aditya Ramesh and
Gabriel Goh and
Sandhini Agarwal and
Girish Sastry and
Amanda Askell and
Pamela Mishkin and
Jack Clark and
Gretchen Krueger and
Ilya Sutskever},
title = {Learning Transferable Visual Models From Natural Language Supervision},
journal = {CoRR},
volume = {abs/2103.00020},
year = {2021}
}

