This repository provides the evaluation code and the COCO-DIMCIM benchmark dataset described in our paper "DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models".
The Does-it/Can-it framework, DIMCIM, is a reference-free measurement of default-mode diversity (“Does” the model generate images with expected attributes?) and generalization capacity (“Can” the model generate diverse attributes for a particular concept?) of text-to-image generative models.
The COCO-DIMCIM dataset is a benchmark of concepts, attributes, coarse prompts, and dense prompts derived from the COCO dataset, which we release as part of this work. It can be used to evaluate the attribute-wise default-mode diversity and generalization capacity of text-to-image generative models. COCO-DIMCIM consists of 30 concepts, 494 attributes, 930 coarse prompts, and 14,641 dense prompts.
The dataset files are located in the COCO-DIMCIM directory:
- COCO-DIMCIM/seed_captions_attributes: JSON files containing concept attributes and seed captions from the COCO dataset, used to generate the coarse and dense prompts for image generation.
- COCO-DIMCIM/dense_prompts: JSON files containing the coarse and dense image-generation prompts derived from the COCO seed captions.
- COCO-DIMCIM/simple_attribute_prompts: JSON files containing simple concept-attribute prompts used to compute attribute VQAScores, as described in the paper.
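Each of these files is plain JSON and can be inspected directly. A minimal sketch (the file name below is one of the released files; the sketch assumes nothing about the internal schema beyond the file being valid JSON):

import json
from pathlib import Path

# Load one of the released prompt files and look at its top-level structure.
path = Path("COCO-DIMCIM/simple_attribute_prompts/table_simple_attribute_prompts.json")
with path.open() as f:
    data = json.load(f)

# Print the top-level keys (if a JSON object) or the number of entries (if a list).
if isinstance(data, dict):
    print(list(data)[:5])
else:
    print(len(data))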
The dataset is licensed under CC-BY-NC (see the LICENSE file) and is intended for use as a benchmark. This data is an output from Llama 3.1 and is also subject to the Llama 3.1 terms (link). Use of the data to train, fine-tune, or otherwise improve an AI model, which is distributed or made available, shall also include "Llama" at the beginning of any such AI model name.
1. Install VQAScore by following the instructions in the VQAScore repo, and install the other dependencies:
$ pip install matplotlib numpy tqdm
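To confirm that VQAScore is usable, you can load a scoring model. A minimal sketch, assuming VQAScore installs as the t2v_metrics package and that clip-flant5-xxl is an available model (both are assumptions; follow the VQAScore repo if they differ):

import t2v_metrics  # assumed package name for VQAScore

# Assumed model name; loading it downloads the checkpoint on first use.
vqa_score = t2v_metrics.VQAScore(model="clip-flant5-xxl")
print(vqa_score)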
2. Generate images with your text-to-image model using the coarse prompts and dense prompts in COCO-DIMCIM/dense_prompts.
- For images generated from coarse prompts, save the image paths in the format given in the example DIMCIM-score-calculation/example_image_gen_jsons/table_coarse_prompt_generated_images_paths.json, replacing the list with your generated image paths.
- For images generated from dense prompts, save the image paths in the format given in the example DIMCIM-score-calculation/example_image_gen_jsons/table_dense_prompt_generated_images_paths.json, replacing the list of image paths for each prompt with your generated image paths. (A sketch of this step follows the list.)
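A minimal sketch of this step, assuming a Hugging Face diffusers pipeline as the text-to-image model and assuming the layouts described above (a flat list of image paths for coarse prompts, and a prompt-to-paths mapping for dense prompts). The example JSON files remain the authoritative reference for the exact format, and the prompts below are placeholders:

import json
from pathlib import Path

import torch
from diffusers import StableDiffusionPipeline  # any text-to-image model can be substituted

out_dir = Path("generated_images/table")
out_dir.mkdir(parents=True, exist_ok=True)

# Placeholder prompts; in practice these come from the coarse/dense prompt JSONs
# in COCO-DIMCIM/dense_prompts for the concept being evaluated.
coarse_prompts = ["A photo of a table."]
dense_prompts = ["A photo of a wooden table.", "A photo of a round table."]

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

coarse_paths = []
for i, prompt in enumerate(coarse_prompts):
    image = pipe(prompt).images[0]
    image_path = out_dir / f"coarse_{i}.png"
    image.save(image_path)
    coarse_paths.append(str(image_path))

dense_paths = {}
for i, prompt in enumerate(dense_prompts):
    image = pipe(prompt).images[0]
    image_path = out_dir / f"dense_{i}.png"
    image.save(image_path)
    dense_paths.setdefault(prompt, []).append(str(image_path))

# Assumed layouts: a flat list for coarse prompts, and a prompt -> list-of-paths
# mapping for dense prompts; check the example JSONs for the exact structure.
with open("table_coarse_prompt_generated_images_paths.json", "w") as f:
    json.dump(coarse_paths, f, indent=2)
with open("table_dense_prompt_generated_images_paths.json", "w") as f:
    json.dump(dense_paths, f, indent=2)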
3. Calculate DIM and CIM scores.

For DIM score calculation:
$ cd DIMCIM-score-calculation
$ python calculate_DIM_scores.py --coarse_prompts_generated_images_path <path to coarse prompts generated image paths json> --simple_attribute_prompts_path <path to simple attribute prompts json for the corresponding concept>
DIM score calculation example:
$ python calculate_DIM_scores.py --coarse_prompts_generated_images_path "./example_image_gen_jsons/table_coarse_prompt_generated_images_paths.json" --simple_attribute_prompts_path ../COCO-DIMCIM/simple_attribute_prompts/table_simple_attribute_prompts.json
For CIM score calculation:
$ cd DIMCIM-score-calculation
$ python calculate_CIM_scores.py --dense_prompts_generated_images_path <path to dense prompts generated image paths json> --simple_attribute_prompts_path <path to simple attribute prompts json for the corresponding concept>
CIM score calculation example:
$ python calculate_CIM_scores.py --dense_prompts_generated_images_path "./example_image_gen_jsons/table_dense_prompt_generated_images_paths.json" --simple_attribute_prompts_path ../COCO-DIMCIM/simple_attribute_prompts/table_simple_attribute_prompts.json
After running the above scripts, attribute-wise DIM scores (dim_scores.json) and CIM scores (cim_scores.json) are generated.
Use the notebook DIMCIM-score-calculation/plot_DIMCIM_scores.ipynb to plot the DIM/CIM scores and analyze the attribute-wise default-mode diversity and generalization capacity of text-to-image models.
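For a quick look outside the notebook, the score files can also be plotted directly. A minimal sketch, assuming dim_scores.json and cim_scores.json each map attribute names to scalar scores (an assumption; the notebook is the authoritative reference for their actual structure):

import json
import matplotlib.pyplot as plt

# Assumption: both files map attribute names to scalar scores.
with open("dim_scores.json") as f:
    dim_scores = json.load(f)
with open("cim_scores.json") as f:
    cim_scores = json.load(f)

# Scatter-plot DIM vs. CIM for attributes present in both files.
attributes = sorted(set(dim_scores) & set(cim_scores))
plt.scatter([dim_scores[a] for a in attributes], [cim_scores[a] for a in attributes])
for a in attributes:
    plt.annotate(a, (dim_scores[a], cim_scores[a]), fontsize=8)
plt.xlabel("DIM score")
plt.ylabel("CIM score")
plt.tight_layout()
plt.savefig("dim_cim_scatter.png")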
If you use DIMCIM or COCO-DIMCIM, please cite:

@misc{teotia2025dimcimquantitativeevaluationframework,
title={DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models},
author={Revant Teotia and Candace Ross and Karen Ullrich and Sumit Chopra and Adriana Romero-Soriano and Melissa Hall and Matthew J. Muckley},
year={2025},
eprint={2506.05108},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2506.05108},
}

DIMCIM is CC-BY-NC 4.0 licensed, as found in the LICENSE file.
