
Image Captioning With Encoder-Decoder Architecture

Project for the course Deep Learning 046211 (Technion) Winter 2022-2023.

Video:

YouTube - https://youtu.be/HsJHZepSWHU (in Hebrew).

Background

Image captioning is the task of generating short sentences that describe the content of an image. The goal of this project is to implement an encoder-decoder network for image captioning. The encoder is a pre-trained CNN, and for the decoder we used both LSTM and Transformer networks. The network is trained on the Flickr8k dataset.
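The README does not show the architecture itself, so here is a minimal sketch of what an LSTM decoder conditioned on pre-trained CNN features might look like. All class names, layer sizes, and the 2048-dim feature assumption (typical of a ResNet-50 encoder) are hypothetical, not the repository's actual code:

```python
import torch
import torch.nn as nn

class CaptionDecoderLSTM(nn.Module):
    """Hypothetical sketch: LSTM decoder conditioned on CNN image features."""

    def __init__(self, feat_dim, embed_dim, hidden_dim, vocab_size):
        super().__init__()
        self.img_proj = nn.Linear(feat_dim, embed_dim)  # project CNN features to embedding space
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, features, captions):
        # Prepend the projected image feature as the first "token",
        # then feed the caption tokens with teacher forcing.
        img_tok = self.img_proj(features).unsqueeze(1)   # (B, 1, E)
        tok_emb = self.embed(captions)                   # (B, T, E)
        inputs = torch.cat([img_tok, tok_emb], dim=1)    # (B, T+1, E)
        out, _ = self.lstm(inputs)
        return self.fc(out)                              # (B, T+1, vocab_size)
```

A Transformer decoder would replace the `nn.LSTM` with masked self-attention over the caption tokens plus cross-attention to the image features, but the overall encode-then-decode flow is the same.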

Prerequisites

The full list of requirements is in the requirements.txt file. The required Python version is 3.10.9. To install the requirements, run: pip install -r requirements.txt

Files in the repository

| File name | Purpose |
|-----------|---------|
| `data.py` | Data loader and additional scripts for the Flickr8k dataset. |
| `models.py` | All the models used in the project (Transformer, LSTM, ResNet-50). |
| `train.py` | Training script. |
| `Example_Images` | Folder with example images for the README.md file. |
| `LSTM_optuna.py` | Optuna hyperparameter tuning script for the LSTM model. |
| `Transformer_optuna.py` | Optuna hyperparameter tuning script for the Transformer model. |
| `Transformer_full.csv` | Results for the Transformer model from the final training run. |
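The internals of `data.py` are not shown here, but a Flickr8k loader needs a vocabulary and a caption-to-token-ID step. The sketch below illustrates that preprocessing; the function names, special tokens, and `min_freq` cutoff are assumptions for illustration, not the repository's actual implementation:

```python
from collections import Counter

def build_vocab(captions, min_freq=2):
    """Build a token-to-index mapping from a list of caption strings.

    Tokens appearing fewer than min_freq times are mapped to <unk>.
    """
    counts = Counter(tok for cap in captions for tok in cap.lower().split())
    itos = ["<pad>", "<start>", "<end>", "<unk>"]
    itos += sorted(t for t, c in counts.items() if c >= min_freq)
    return {t: i for i, t in enumerate(itos)}

def encode(caption, stoi):
    """Convert one caption into a list of token IDs wrapped in <start>/<end>."""
    unk = stoi["<unk>"]
    ids = [stoi["<start>"]]
    ids += [stoi.get(t, unk) for t in caption.lower().split()]
    ids.append(stoi["<end>"])
    return ids
```

In a real loader these encoded captions would be padded to a common length with `<pad>` and batched alongside the transformed images.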

Results

The full results of the Transformer model training are in the 'Transformer_full.csv' file. To replicate them, run train.py without changing the hyperparameters, the seed, or the model class.

Training

To train the model, clone the repository, select the model class (Transformer / LSTM), and set the desired hyperparameters in the script (the optimal hyperparameters we found are already set there).
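The repository's training loop lives in train.py and is not reproduced here, but a single teacher-forcing step for this kind of setup might look like the following sketch. The frozen encoder, the decoder call signature, and the padding index are assumptions, not the project's actual code:

```python
import torch
import torch.nn as nn

def train_step(encoder, decoder, optimizer, images, captions, pad_idx=0):
    """One hypothetical teacher-forcing training step (illustrative sketch).

    Assumes decoder(features, captions_in) returns logits of shape
    (B, captions_in.size(1) + 1, vocab_size), i.e. one prediction per
    input position plus one for the prepended image token.
    """
    encoder.eval()                       # pre-trained CNN kept frozen
    with torch.no_grad():
        features = encoder(images)

    # Feed all tokens but the last; predict every token of the caption.
    logits = decoder(features, captions[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        captions.reshape(-1),
        ignore_index=pad_idx,            # skip padding positions in the loss
    )

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Repeating this step over the Flickr8k batches, with the loss logged per epoch, gives the kind of training curve recorded in Transformer_full.csv.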

Examples

Example images 1, 5, 7, and 8 (see the Example_Images folder).

