Project in Deep Learning course - 046211, Technion, 2022-2023
Repository used in this project: OpenAI-CLIP (GitHub)
Helper repositories used along the way: Classifier (GitHub), food-101 (GitHub)
Our goal is to recognize, from an image of a dish, the ingredients it consists of, or at least the significant ingredients that can be inferred from the image. The ability to accurately recognize ingredients in food images has the potential to revolutionize the food industry, from recipe suggestion to dietary management. With the increasing popularity of food-related social media platforms and the growing number of people with dietary restrictions, there is a clear need for a tool that can quickly and easily identify ingredients in food images. This project develops a prototype model for food ingredient recognition from a given image, with the goal of providing a valuable resource for individuals and companies in the food industry.
FOOD 101
A dataset commonly used for research in food recognition and classification:
- 101 Food Classes
- 101,000 images
- 800-1300 images in each class
Our model is based on a ResNet18 CNN and the CLIP image encoder.
The CLIP image features pass through a fully connected (FC) layer and a batch-normalization layer, and are then summed into the skip connections of the ResNet activations.
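The injection described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the class name, channel count, and CLIP feature dimension (512, as in CLIP ViT-B) are assumptions.

```python
import torch
import torch.nn as nn

# Sketch of summing projected CLIP image features into a ResNet-style
# skip connection. Layer names and dimensions are illustrative only.
class ClipInjectedBlock(nn.Module):
    def __init__(self, channels=64, clip_dim=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        # FC layer + batch norm projecting CLIP features to the
        # block's channel count, as described above.
        self.clip_proj = nn.Sequential(
            nn.Linear(clip_dim, channels),
            nn.BatchNorm1d(channels),
        )

    def forward(self, x, clip_feat):
        clip = self.clip_proj(clip_feat)   # (B, C)
        clip = clip[:, :, None, None]      # broadcast over H and W
        # Residual sum: conv output + skip connection + CLIP features.
        return torch.relu(self.conv(x) + x + clip)

x = torch.randn(2, 64, 8, 8)         # dummy activation map
clip_feat = torch.randn(2, 512)      # dummy CLIP image features
out = ClipInjectedBlock()(x, clip_feat)
print(out.shape)  # torch.Size([2, 64, 8, 8])
```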
| Library | Version |
|---|---|
| Python | 3.9 |
| CUDA (for GPU usage) | 11.3 |
```
python3 -m venv venv
source venv/bin/activate
```
Install the requirements and the GUI used for displaying the result on a sample:
```
pip install -r requirements.txt
sudo apt-get install python3-tk
```
The dataset used is FOOD101.
Run the Data/utils.py file as main to download the dataset.
- You can clean the data annotations yourself with Data/annotation_extractor.py.
- Note that on first use you must train the model to create a checkpoint of the model weights.
Modify the hyperparameters in the relevant block of train_model.py.
The output path must already exist.
Run the train_model script to train the model.
You can add CLIP image features to the model.
If dish-name information is available, it can also be injected into the model through CLIP's text encoder.
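The hyperparameter block might look like the sketch below. All variable names and values here are assumptions for illustration; they are not necessarily the names used in train_model.py.

```python
# Illustrative hyperparameter block (names and values are assumptions,
# not the project's actual configuration).
config = {
    "batch_size": 32,
    "lr": 1e-4,
    "epochs": 20,
    "use_clip_image_features": True,   # inject CLIP image features
    "use_clip_text_features": False,   # inject dish-name features via CLIP's text encoder
    "output_path": "checkpoints/",     # must already exist
}

print(config["output_path"])  # checkpoints/
```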
Modify the parameters in test_model.py (e.g., the model path) and set the configuration for testing the model on the FOOD101 test set.
The scores are printed to the output channel (terminal by default).
Modify the parameters section in sample_test.py.
The parameters specify the loaded model and the image path.
The result is displayed in a separate window with the predicted ingredients as the title.
| File name | Purpose |
|---|---|
| train_model.py | Train a ResNet18 model with a configuration that uses CLIP. |
| test_model.py | Load a trained model and test it on a dataset described by a JSON annotation file (default: FOOD101 test set). |
| sample_test.py | Load a trained model, test it on a single image, and display the image with its ingredients. |
| Data/IngredientsLoader.py | Modified data loader for parsing the annotation file and the relevant images. |
| Data/utils.py | Utility functions. |
| Data/annotation_extractor.py | Scripts for extracting ingredients and a dictionary from the Ingredients101 annotations. |
| Data/Ingredients_json/... | Extracted JSON files that pair each image with its ingredients. |
| model/BasicNodule.py | Basic script for the ResNet implementation, from Classifier. |
| model/Resnet.py | Modified ResNet with our additions (CLIP image and text extractors); source code from Classifier. |
| model/Resnet_w_concat_connection.py | Modified ResNet with a different (concatenation) connection in the skip connections; source code from Classifier. |
| model/utils.py | Utility functions. |

