This is the official implementation of the paper CAIRe: Cultural Attribution of Images by Retrieval-Augmented Evaluation by Arnav Yayavaram, Siddharth Yayavaram, Simran Khanuja, Michael Saxon and Graham Neubig.
As text-to-image models become increasingly prevalent, ensuring their equitable performance across diverse cultural contexts is critical. Efforts to mitigate cross-cultural biases have been hampered by trade-offs, including a loss in performance, factual inaccuracies, or offensive outputs. Despite widespread recognition of these challenges, an inability to reliably measure these biases has stalled progress. To address this gap, we introduce CAIRE, a novel evaluation metric that assesses the degree of cultural relevance of an image, given a user-defined set of labels. Our framework grounds entities and concepts in the image to a knowledge base and uses factual information to give independent graded judgments for each culture label. On a manually curated dataset of culturally salient but rare items built using language models, CAIRE surpasses all baselines by 28% F1 points. Additionally, we construct two datasets for culturally universal concepts, one comprising T2I-generated outputs and another retrieved from naturally occurring data. CAIRE achieves Pearson’s correlations of 0.56 and 0.66 with human ratings on these sets, based on a 5-point Likert scale of cultural relevance. This demonstrates its strong alignment with human judgment across diverse image sources.
Prerequisites:
- Python version 3.9 or later.
```bash
git clone https://github.com/siddharthyayavaram/CAIRE.git
cd CAIRE
```
- Create a virtual environment:

  ```bash
  python -m venv caire
  source caire/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -e .
  pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
  ```

  For further details regarding the installation of PyTorch, refer to the official PyTorch guide.
Note:
If you are using an Ampere GPU, ensure that your CUDA version is 11.7 or higher, then install FlashAttention with the following command:

```bash
pip install flash-attn --no-build-isolation
```
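To confirm the environment is ready for this optional step, a quick check like the one below can help. This is a minimal sketch using standard PyTorch calls (compute capability 8.x corresponds to Ampere), not part of the CAIRE codebase:

```python
import torch

# Report whether PyTorch sees a CUDA device and its compute capability.
# Ampere GPUs report compute capability 8.x.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    print(f"{torch.cuda.get_device_name(0)}: compute capability {major}.{minor}")
    print("CUDA version used by this PyTorch build:", torch.version.cuda)

# flash-attn is optional; importing it verifies the install above succeeded.
try:
    import flash_attn
    print("flash-attn version:", flash_attn.__version__)
except ImportError:
    print("flash-attn not installed.")
```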
The setup process performs the following tasks:

- Creates necessary directories: Ensures the existence of required folders (`data/` and `src/outputs/`).
- Downloads dataset files (~31GB): Fetches preprocessed datasets and lookup files, storing them in `data/`.
- Downloads predefined target culture lists into `data/`:
  - `country_list.pkl`: A list of 177 countries.
  - `top10_countries.pkl`: 10 countries selected based on annotator availability (population) and diversity:

    ```python
    ["Brazil", "China", "Egypt", "Germany", "India", "Indonesia", "Mexico", "Nigeria", "Russia", "United States of America"]
    ```

  - `indian_states.pkl`: 28 Indian states, excluding Union Territories.
  - `USA_states.pkl`: U.S. states.
  - `common_religions.pkl`: Religions with the highest global population representation:

    ```python
    ["Christianity", "Islam", "Hinduism", "Buddhism", "Sikhism", "Judaism", "Atheism", "Agnosticism"]
    ```
Run the setup with:

```bash
python setup.py download_assets
```

Key settings in `config.py`:

- `DEFAULT_DATASET`: Fallback image folder (`src/examples/`).
- `DATA_PATH`, `OUTPUT_PATH`: Root folders for data files (`.pkl`, indices) and outputs.
- `PREDEFINED_TARGET_LISTS`: Paths to predefined target lists stored under `data/`.
- `INDEX_INFOS`, `FAISS_INDICES`, `LEMMA_EMBEDS`, `BABELNET_WIKI`: Retrieval/index metadata.
- `RETRIEVAL_BATCH_SIZE`, `NUMBER_RETRIEVED_IMAGES`, `MAX_WIKI_DOCS`: Retrieval parameters.
- `PROMPT_TEMPLATE`: Prompt for culture scoring.
Run CAIRE with:

```bash
python -m src.main --target_list <TARGET_LIST> --image_paths <IMAGE_PATHS>
```
- `--target_list`
  - Default: `top10_countries.pkl` (located under `data/`).
  - If you pass a `*.pkl` file that exists in `data/`, CAIRE treats it as a predefined list, e.g. `indian_states.pkl`.
  - To use custom labels, wrap a comma-separated string in quotes: `--target_list "CultureA,CultureB,CultureC"`.
  - You can add your own `.pkl` files, but to use them as predefined target lists you must also add their paths to `PREDEFINED_TARGET_LISTS` in `config.py` (see the sketch after this argument list).
- `--image_paths`
  - Must be either:
    - A single folder path (e.g., `data/image_folder`)
    - A space-separated list of image file paths (e.g., `img1.jpg img2.jpg img3.jpg`)
  - If a folder is passed, CAIRE processes all images inside.
  - If omitted, CAIRE defaults to the example folder in `config.DEFAULT_DATASET` (`src/examples/`).
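To create a custom `.pkl` target list under `data/`, a sketch like the following should suffice, assuming the expected format is a plain pickled Python list of label strings like the predefined lists; the labels and the filename `custom_targets.pkl` are just placeholders:

```python
import pickle
from pathlib import Path

# Placeholder labels; replace with your own culture labels.
labels = ["CultureA", "CultureB", "CultureC"]

# Write the list into data/, assuming predefined lists are plain
# pickled Python lists of strings.
target_path = Path("data") / "custom_targets.pkl"
with open(target_path, "wb") as f:
    pickle.dump(labels, f)

print(f"Wrote {len(labels)} labels to {target_path}")
# To have CAIRE treat this as a predefined list, also add its path to
# PREDEFINED_TARGET_LISTS in config.py.
```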
- `args.timestamp`
  - Automatically set to the current timestamp (format: `YYYYMMDD_HHMMSS`, e.g., `20250531_143210`).
  - Used to create a unique subfolder for storing all intermediate and final output files.
- `log_run_metadata(args)`
  - Appends a row to `src/outputs/run_log.csv` for every run, containing:
    - `timestamp` (e.g., `20250531_143210`)
    - `image_input_type` (`folder` or `list`)
    - `num_images`
    - `image_paths` (folder name or space-separated file paths)
    - `targets` (predefined `target_list` filename or comma-separated custom labels)

All outputs are written under `src/outputs/`, with subfolders identified by timestamps.
`src/outputs/<TIMESTAMP>` contains intermediate and final output files:

- `bids_match.pkl`: Entity matching results (BabelNet ID matching).
- `lemma_match.pkl`: Lemma-based disambiguation.
- `WIKI.pkl`: Retrieved Wikipedia pages.
- `image_embeddings.pkl`: mSigLIP image embeddings.
- `1-5_scores_VLM_qwen.pkl`: Final 1–5 scoring results (using `Qwen2.5-VL-7B-Instruct`).
- `combined_outputs.csv`: Final CSV containing `image_path`, the matched entity, the corresponding Wikipedia link, and the 1–5 scores.
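To work with the final results programmatically, `combined_outputs.csv` can be loaded with pandas; a minimal sketch (the timestamp is a placeholder, and the exact column names beyond `image_path` may differ):

```python
import pandas as pd

# Placeholder run timestamp; use the subfolder created for your run.
run_dir = "src/outputs/20250531_143210"

# combined_outputs.csv lists, per image, the matched entity, its Wikipedia
# link, and the 1-5 cultural relevance scores.
df = pd.read_csv(f"{run_dir}/combined_outputs.csv")
print(df.columns.tolist())  # inspect the actual column names
print(df.head())
```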
For every run, check `run_log.csv` (in `src/outputs/`) to match a timestamp to its input parameters.
The file will have the following structure:
| timestamp | image_input_type | num_images | image_paths | targets |
|-----------------|------------------|------------|--------------------|-----------------------|
| 20250531_143210 | folder | 125 | examples | top10_countries.pkl |
| 20250531_150005 | list | 20 | /path/img1.jpg … | "CultureA,CultureB" |
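The log can also be read programmatically to find the output folder for a given run; a small sketch using pandas, with column names as in the table above:

```python
import pandas as pd

# Read the run log; each row corresponds to one CAIRE run.
log = pd.read_csv("src/outputs/run_log.csv")
print(log.tail())

# The timestamp column names the subfolder under src/outputs/ that
# holds that run's intermediate and final files.
latest = log.iloc[-1]
print("Outputs for the latest run:", f"src/outputs/{latest['timestamp']}")
```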
- Default image folder and target list (uses `data/top10_countries.pkl` and `config.DEFAULT_DATASET`):

  ```bash
  python -m src.main
  ```

- Specify a predefined target list (`indian_states.pkl`) and a folder of images:

  ```bash
  python -m src.main --target_list indian_states.pkl --image_paths image_folder
  ```

- Pass individual image files manually and a custom target list:

  ```bash
  python -m src.main --target_list "CultureA,CultureB,CultureC" \
      --image_paths image_folder/img1.jpg image_folder/img2.jpg
  ```

- Use a custom `.pkl` target list you added under `data/` (suppose you created `data/custom_targets.pkl`):

  ```bash
  python -m src.main --target_list custom_targets.pkl --image_paths image_folder
  ```
Try these examples directly:
python -m src.main --target_list "Canada, Brazil, United States, Mexico, Argentina, United Kingdom, France, Germany, Italy, Egypt, South Africa, Nigeria, India, China, Japan, South Korea, Australia, New Zealand, Saudi Arabia, Indonesia" \
--image_paths examples/t2i/wedding/nanobanana/1.png examples/t2i/wedding/nanobanana/2.png examples/t2i/wedding/nanobanana/3.png examples/t2i/wedding/nanobanana/4.png examples/t2i/wedding/nanobanana/5.png python -m src.main --target_list "West Africa, Caribbean, East Asia" --image_paths examples/djembe.pngeval/visualization.ipynb shows the 1-5 scores and matched Wikipedia pages for the example images with default CAIRE arguments (data/top10_countries.pkl and config.DEFAULT_DATASET).
For web integration, we provide a FastAPI-based REST API.
```bash
cd CAIRE
conda activate caire
pip install -r api/requirements_api.txt
./api/start_server.sh
```

Or manually:

```bash
export PYTHONPATH="${PYTHONPATH}:$(pwd)"
uvicorn api.server:app --host 0.0.0.0 --port 8000 --reload
```

Server: http://localhost:8000 | Docs: http://localhost:8000/docs
- GET `/api/health` - Health check
- GET `/api/predefined-lists` - Get available culture lists
- POST `/api/analyze` - Analyze image with custom cultures
- POST `/api/analyze-with-predefined` - Analyze with predefined culture list
```bash
curl -X POST "http://localhost:8000/api/analyze" \
  -F "image=@image.jpg" \
  -F "cultures=India,China,USA"
```

For complete documentation, see `api/README.md` or `api/QUICKSTART.md`.
Important
Ensure you have sufficient disk space before proceeding:
- `data/` requires ~31GB
If you find this work useful in your research, please cite:
```bibtex
@misc{yayavaram2025caireculturalattributionimages,
      title={CAIRe: Cultural Attribution of Images by Retrieval-Augmented Evaluation},
      author={Arnav Yayavaram and Siddharth Yayavaram and Simran Khanuja and Michael Saxon and Graham Neubig},
      year={2025},
      eprint={2506.09109},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.09109},
}
```