Brain-Cap

Generating Captions for Visual Stimuli Out of fMRI Scans

By Yoav Tsoran and Roey Shafran

This repository is part of a final project in the Technion's course 046211 - Deep Learning

Overview

This project proposes a method for generating a textual description of the visual stimulus shown to a subject during an fMRI scan. The method combines two previous works, MinD-Vis and ClipCap. MinD-Vis is used to generate meaningful embeddings for fMRI scans, while ClipCap creates an image embedding that serves as a prefix to the pre-trained GPT2 language model. Our work builds on these methods by applying the ClipCap approach in an fMRI-to-caption setup and training a simple mapping network between the MinD-Vis fMRI encoder embedding space and the GPT2 embedding space. This approach may help improve our understanding of the brain's visual system and open the way to potential technological applications.

Outline

Environment setup
Dataset and folder structure
Creating the Dataset
Training
Results
References

Environment setup

After cloning the repository, please run:

```
conda env create -f environment.yml
conda activate brain-cap
```

Alternatively, using pip, you can run:
```
pip install -r requirements.yml
```

Dataset and folder structure

Due to size limits, the data and pretrains folders aren't included in this repository and need to be downloaded separately. The data folder includes the fMRI-image datasets used in the MinD-Vis work, the captions for the included images, and the checkpoints for our model. The data folder structure is as follows:

data
├── BOLD5000
│   ├── BOLD5000_GLMsingle_ROI_betas
│   ├── BOLD5000_Stimuli
│   ├── COCO-captions
│   │   └── annotations
│   ├── CSI1_no_duplicates.pth
│   └── ImageNet-captions
│       └── imagenet_captions.json
└── Checkpoints

The MinD-Vis repository provides download links for the fMRI-image datasets. The data.zip file needs to be extracted into this repository's data folder, following the structure shown above. The COCO dataset captions can be downloaded from the official COCO dataset website. The ImageNet dataset captions can be downloaded from the mlfoundations/imagenet-captions GitHub repository. We also provide a download for our Checkpoints and the MinD-Vis pretrained encoder. The fMRI_encoder_pretrain_metafile.pth file should be copied to the pretrains folder.
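Once everything is downloaded, a quick check of the layout can catch a misplaced folder before training. The snippet below is only a minimal sketch: the `missing_paths` helper and the `EXPECTED` list are hypothetical names that are not part of this repository, and the paths are copied from the folder tree and the paragraph above.

```python
from pathlib import Path

# Paths taken from the folder structure described above, relative to the
# repository root. CSI1_no_duplicates.pth is left out because it is
# produced later by the dataset creation script.
EXPECTED = [
    "data/BOLD5000/BOLD5000_GLMsingle_ROI_betas",
    "data/BOLD5000/BOLD5000_Stimuli",
    "data/BOLD5000/COCO-captions/annotations",
    "data/BOLD5000/ImageNet-captions/imagenet_captions.json",
    "data/Checkpoints",
    "pretrains/fMRI_encoder_pretrain_metafile.pth",
]


def missing_paths(repo_root, expected=EXPECTED):
    """Return the expected files and folders that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


# Hypothetical usage: run from the repository root and list what is missing.
for rel in missing_paths("."):
    print("missing:", rel)
```

An empty output means every expected file and folder was found.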

Creating the Dataset

To speed up the training we use only one caption for each training sample and save the preprocessed dataset for faster loading.
A script for creating the dataset file is provided.
For example, to create the dataset file only for the first subject (CSI1), as used for our training, please run the following line from the code folder:
```
python create_dataset_no_dup.py --path ../data/BOLD5000 --save-path ../data/BOLD5000/CSI1_no_duplicates.pth --subjects CSI1 --batch_size 8
```
If you saved the BOLD5000 folder in a different location, want to train the model on more subjects, or want to use a larger batch size for preprocessing (which may speed up the script), run the script with the --help flag to see the available options.
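To illustrate the "one caption per training sample" idea, here is a minimal sketch on COCO-style annotations, where each entry carries an `image_id` and a `caption`. It assumes the first caption of each image is the one kept. The actual selection rule in create_dataset_no_dup.py may differ, and `first_caption_per_image` is a hypothetical helper, not a function from this repository.

```python
def first_caption_per_image(annotations):
    """Keep the first caption seen for each image id and drop the rest."""
    chosen = {}
    for ann in annotations:
        # setdefault only stores a caption when the image id is new.
        chosen.setdefault(ann["image_id"], ann["caption"].strip())
    return chosen


# Toy data in the shape of a COCO captions file (made-up captions).
coco_style = {
    "annotations": [
        {"image_id": 1, "caption": "A dog on a beach. "},
        {"image_id": 1, "caption": "A puppy runs on sand."},
        {"image_id": 2, "caption": "Two bikes by a wall."},
    ]
}

captions = first_caption_per_image(coco_style["annotations"])
print(captions)
```

Keeping a single caption per image means each fMRI sample is paired with exactly one target sentence, which keeps the saved dataset file small and quick to load.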

Training

We trained an MLP between the latent spaces while keeping the fMRI encoder and the GPT2 decoder frozen.
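The setup described above can be sketched in PyTorch roughly as follows. This is an illustrative sketch under stated assumptions, not the architecture used in train.ipynb: the `PrefixMapper` class, the `freeze` helper, the layer sizes, the prefix length, and the `nn.Linear` stand-in for the MinD-Vis encoder are all hypothetical.

```python
import torch
from torch import nn


class PrefixMapper(nn.Module):
    """MLP mapping one fMRI embedding to a sequence of prefix embeddings."""

    def __init__(self, fmri_dim: int, gpt_dim: int, prefix_len: int):
        super().__init__()
        self.gpt_dim = gpt_dim
        self.prefix_len = prefix_len
        hidden = (gpt_dim * prefix_len) // 2  # assumed hidden width
        self.mlp = nn.Sequential(
            nn.Linear(fmri_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, gpt_dim * prefix_len),
        )

    def forward(self, fmri_embedding: torch.Tensor) -> torch.Tensor:
        # (batch, fmri_dim) -> (batch, prefix_len, gpt_dim)
        out = self.mlp(fmri_embedding)
        return out.view(-1, self.prefix_len, self.gpt_dim)


def freeze(module: nn.Module) -> nn.Module:
    """Disable gradients for every parameter and switch to eval mode."""
    for param in module.parameters():
        param.requires_grad_(False)
    return module.eval()


# Stand-in for the pretrained fMRI encoder (the real one is loaded from
# the pretrains folder); toy dimensions keep the example fast.
encoder = freeze(nn.Linear(32, 16))
mapper = PrefixMapper(fmri_dim=16, gpt_dim=8, prefix_len=4)

scan = torch.randn(2, 32)            # batch of two fake fMRI inputs
prefix = mapper(encoder(scan))       # would be fed to the frozen GPT2 as a prefix
trainable = [name for name, p in mapper.named_parameters() if p.requires_grad]
print(prefix.shape, trainable)
```

Only the mapper's parameters would be passed to the optimizer, so the encoder and the language model stay unchanged during training.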

To train the model, use the train.ipynb notebook. The notebook supports training our model on the CSI1_no_duplicates dataset from scratch or continuing training from our latest Checkpoints.
At the beginning of the notebook, you can change the locations of the different folders if you have downloaded the datasets or checkpoints to a different directory from the repo.
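The choice between training from scratch and continuing from a checkpoint can be sketched as below. The checkpoint layout (a dictionary with `model`, `optimizer`, and `epoch` keys), the file name, and the helper names are assumptions made for illustration. The files in the Checkpoints folder may be organized differently.

```python
import os
import tempfile

import torch
from torch import nn


def save_checkpoint(path, model, optimizer, epoch):
    """Store model and optimizer state together with the finished epoch."""
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "epoch": epoch,
        },
        path,
    )


def load_or_init(path, model, optimizer):
    """Return the epoch to start from: 0 when no checkpoint exists."""
    if not os.path.exists(path):
        return 0
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["epoch"] + 1


# Toy model standing in for the mapping network.
model = nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "mapper_last.pth")  # hypothetical file name
    start_fresh = load_or_init(ckpt, model, optimizer)    # no file yet
    save_checkpoint(ckpt, model, optimizer, epoch=4)
    start_resumed = load_or_init(ckpt, model, optimizer)  # picks up after epoch 4

print(start_fresh, start_resumed)
```

Saving the optimizer state alongside the weights lets a resumed run continue with the same moment estimates instead of restarting them.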
Example for the training process:

Results

References

Chen, Z., Qing, J., Xiang, T., Yue, W., & Zhou, J. (2022). Seeing Beyond the Brain: Masked Modeling Conditioned Diffusion Model for Human Vision Decoding. In arXiv.
Mokady, R., Hertz, A., & Bermano, A. (2021). Clipcap: Clip prefix for image captioning. arXiv preprint arXiv:2111.09734.
