We provide the CoCoCON evaluation dataset consisting of 1500 samples at ./data/cococon.json. Each sample contains 1-5 contrast sets. See paper for details and a few examples below.
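
Before running any evaluation, it can help to check what the dataset file looks like. The sketch below is not part of this repository: `summarize_dataset` is a hypothetical helper, and it deliberately makes no assumption about the field names inside `cococon.json`. It only reports the top-level JSON type, the number of entries, and the keys of the first entry.

```python
import json
from pathlib import Path


def summarize_dataset(path):
    """Load a JSON file and describe its top-level shape without assuming a schema."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)

    is_container = isinstance(data, (list, dict))
    summary = {
        "type": type(data).__name__,
        "num_entries": len(data) if is_container else None,
    }

    # Peek at the first entry to discover its field names instead of guessing them.
    if isinstance(data, list):
        first = data[0] if data else None
    elif isinstance(data, dict):
        first = next(iter(data.values()), None)
    else:
        first = None
    summary["first_entry_keys"] = sorted(first) if isinstance(first, dict) else None
    return summary


# Example call (path taken from the text above):
# print(summarize_dataset("./data/cococon.json"))
```

If the file matches the description above, `num_entries` should be 1500 when the samples are stored at the top level. That layout is an assumption, so treat a different number as a prompt to inspect the file rather than as an error.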

We evaluate the pretrained checkpoints provided here on CoCoCON.
- Migrate to the directory unified-io, follow instructions in the original repository to create JAX environment.

      cd unified-io

- Download pretrained Unified-IO checkpoints and save in the directory ./checkpoints/.
- To run likelihood-based evaluation of cross-task consistency using CoCoCON, execute the following command. Sizes can be chosen from small, base, large and xl. Output files are saved at ./results/ by default. The path to validation split (val2014) of MS-COCO images is needed as additional input.

      bash evaluate_cococon.sh <size> <path-to-image-directory>

- To generate predictions for the samples in CoCoCON, execute the following command:

      bash evaluate_tasks.sh <size> <path-to-image-directory>

- Follow instructions here for evaluation of task-specific accuracies using output from Step 3.
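
To evaluate several model sizes in one go, the commands above can be wrapped in a small driver. This is a minimal sketch, not a script shipped with the repository: `build_commands` and `run_all` are hypothetical helpers, and the only facts taken from the steps above are the script names, the argument order, and the four size names.

```python
import subprocess

# Sizes listed in the steps above.
SIZES = ("small", "base", "large", "xl")


def build_commands(image_dir, script="evaluate_cococon.sh", sizes=SIZES):
    """Return one argv list per model size, without running anything."""
    unknown = [s for s in sizes if s not in SIZES]
    if unknown:
        raise ValueError(f"unsupported size(s): {unknown}")
    return [["bash", script, size, image_dir] for size in sizes]


def run_all(image_dir, script="evaluate_cococon.sh", sizes=SIZES):
    """Run the script once per size, stopping at the first failure."""
    for argv in build_commands(image_dir, script, sizes):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(argv, check=True)
```

Called from inside the unified-io directory, `run_all("<path-to-image-directory>")` would run the likelihood-based evaluation for every size, and passing `script="evaluate_tasks.sh"` would generate predictions instead. Separating `build_commands` from `run_all` makes it possible to print and review the commands before anything is executed.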
We first finetune pretrained checkpoints of OFA models on the four tasks in CoCoCON and then evaluate them on CoCoCON. Instructions for training OFA models coming soon!
- Migrate to the evaluators directory, i.e. cd evaluators/.
- Install packages required for COCO Caption Evaluation.

      pip install -r requirements.txt

- Run the following command using output files from Unified-IO or OFA.

      python coco_eval.py <path-to-output-file> ../data/cococon.json
We thank the researchers behind Unified-IO and OFA for making their models available for training and inference.