Codebase of InfSplign

Official Implementation for "InfSplign: Inference-Time Spatial Alignment of Text-to-Image Diffusion Models"

(Sarah Rastegar, Violeta Chatalbasheva, Sieger Falkena, Anuj Singh, Yanbo Wang, Tejas Gokhale, Hamid Palangi, Hadi Jamali-Rad)

Abstract

Text-to-image (T2I) diffusion models generate high-quality images but often fail to capture the spatial relations specified in text prompts. This limitation can be traced to two factors: a lack of fine-grained spatial supervision in the training data and the inability of text embeddings to encode spatial semantics. We introduce InfSplign, a training-free inference-time method that improves spatial alignment by adjusting the noise through a compound loss at every denoising step. The proposed loss leverages different levels of cross-attention maps extracted from the U-Net decoder to enforce accurate object placement and balanced object presence during sampling. The method is lightweight, plug-and-play, and compatible with any diffusion backbone. Our comprehensive evaluations on VISOR and T2I-CompBench show that InfSplign establishes a new state of the art (to the best of our knowledge), achieving substantial performance gains over the strongest existing inference-time baselines and even outperforming fine-tuning-based methods.
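
To make the idea of a loss over cross-attention maps concrete, here is a minimal toy sketch. It is not the InfSplign loss, whose exact form is defined in the paper and in this repository. The function names, the hinge formulation, the margin value, and the peak-based presence term are all assumptions chosen for illustration. Attention maps are plain nested lists, so the example runs without any dependencies.

```python
def centroid(attn):
    """Return the (x, y) centre of mass of a 2D attention map, normalised to [0, 1]."""
    h, w = len(attn), len(attn[0])
    total = sum(sum(row) for row in attn)
    if total == 0:
        raise ValueError("attention map has no mass")
    cx = sum(v * x for row in attn for x, v in enumerate(row)) / total / (w - 1)
    cy = sum(v * y for y, row in enumerate(attn) for v in row) / total / (h - 1)
    return cx, cy


def left_of_loss(attn_a, attn_b, margin=0.25):
    """Hinge penalty (illustrative): zero once A's centroid is at least
    `margin` to the left of B's centroid, growing linearly otherwise."""
    ax, _ = centroid(attn_a)
    bx, _ = centroid(attn_b)
    return max(0.0, margin - (bx - ax))


def presence_loss(attn_a, attn_b):
    """Illustrative balance term: gap between the peak responses of the two
    maps, so that neither object dominates the other."""
    peak_a = max(max(row) for row in attn_a)
    peak_b = max(max(row) for row in attn_b)
    return abs(peak_a - peak_b)


# Toy 4x4 maps: object A attends to the left edge, object B to the right edge.
map_a = [[0.0] * 4 for _ in range(4)]
map_b = [[0.0] * 4 for _ in range(4)]
map_a[1][0] = 1.0
map_b[2][3] = 1.0

# Compound loss for the prompt "A to the left of B" (equal weights assumed).
total_loss = left_of_loss(map_a, map_b) + presence_loss(map_a, map_b)
```

In an actual diffusion pipeline the maps would be differentiable tensors, and the gradient of such a loss with respect to the latent would steer each denoising step. The sketch only shows how object placement and presence can be read off attention maps.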

Environment

Create Virtual Environment (conda)

conda env create -f environment.yml
conda activate infsplign
python -m spacy download en  # "en" is a deprecated shortcut in spaCy v3+; the full package name is en_core_web_sm

or

conda create -n infsplign python=3.11
conda activate infsplign
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=12.4 -c pytorch -c nvidia
conda install --yes --file requirements.txt
python -m spacy download en  # "en" is a deprecated shortcut in spaCy v3+; the full package name is en_core_web_sm

T2I-CompBench

Download the detection model weights by following the section "UniDet for 2D/3D-Spatial Relationships and Numeracy evaluation" of the original T2I-CompBench repo. Then run the following steps:

cd UniDet_eval
pip install --user --no-cache-dir git+https://github.com/facebookresearch/detectron2.git@5aeb252b194b93dc2879b4ac34bc51a31b5aee13

# Fix a bug in detectron2:
# In the file "\lib\site-packages\detectron2\data\transforms\transform.py" of your environment,
# change Image.LINEAR to Image.BILINEAR so that the line reads:
def __init__(self, src_rect, output_size, interp=Image.BILINEAR, fill=0):
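
If you prefer not to edit the file by hand, the manual fix above can be scripted. The sketch below is not part of this repository, and `patch_interp_default` and `patch_file` are hypothetical helper names. The fix is likely needed because recent Pillow releases no longer provide `Image.LINEAR`. The path to `transform.py` depends on your environment and has to be supplied by you.

```python
import re
from pathlib import Path


def patch_interp_default(source: str) -> str:
    """Replace every use of Image.LINEAR with Image.BILINEAR.

    The substitution is idempotent: text that already uses Image.BILINEAR
    is returned unchanged.
    """
    return re.sub(r"\bImage\.LINEAR\b", "Image.BILINEAR", source)


def patch_file(path: str) -> bool:
    """Patch the file in place and report whether anything changed."""
    file_path = Path(path)
    original = file_path.read_text(encoding="utf-8")
    patched = patch_interp_default(original)
    if patched != original:
        file_path.write_text(patched, encoding="utf-8")
        return True
    return False


before = "def __init__(self, src_rect, output_size, interp=Image.LINEAR, fill=0):"
after = patch_interp_default(before)
```

Call `patch_file` with the full path to `detectron2/data/transforms/transform.py` inside your environment's `site-packages` directory. It is worth keeping a backup copy of the file first.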

Usage

To generate the images, run the following command:

python pipeline_batch.py --model sd2.1 --benchmark <benchmark_name> --json_filename <prompt_filename> --batch_size 10 --loss_type gelu --strategy diff --energy_loss var

After image generation completes, compute the evaluation scores:

python run_evaluation.py --model sdxl sd2.1 --benchmark <benchmark_name> --json_filename <prompt_filename>

Evaluation benchmarks are:

visor: VISOR
t2i: T2I-CompBench
geneval: GenEval

The corresponding data files are:

VISOR: visor_prompts
T2I-CompBench: t2i_prompts
GenEval: geneval_objects

We provide multiple VISOR subsets in json_files.
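
The pairing of benchmark names and data files can be captured in a small helper that assembles the generation command. This is a sketch rather than a script shipped with the repository. `PROMPT_FILES` and `generation_command` are hypothetical names, and it assumes `--json_filename` accepts the data file names exactly as listed above. The flags and their values are copied from the generation command in this README.

```python
import shlex

# Benchmark key -> prompt file name, as listed in this README.
PROMPT_FILES = {
    "visor": "visor_prompts",
    "t2i": "t2i_prompts",
    "geneval": "geneval_objects",
}


def generation_command(benchmark, model="sd2.1", batch_size=10):
    """Build the argument list for pipeline_batch.py for one benchmark."""
    if benchmark not in PROMPT_FILES:
        raise ValueError(f"unknown benchmark: {benchmark!r}")
    return [
        "python", "pipeline_batch.py",
        "--model", model,
        "--benchmark", benchmark,
        "--json_filename", PROMPT_FILES[benchmark],
        "--batch_size", str(batch_size),
        "--loss_type", "gelu",
        "--strategy", "diff",
        "--energy_loss", "var",
    ]


# Render the command for the VISOR benchmark as a shell-ready string.
visor_command = shlex.join(generation_command("visor"))
```

The returned list can be passed directly to `subprocess.run(..., check=True)` to launch generation, or printed with `shlex.join` to inspect the command before running it.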

Contact

Corresponding author: Violeta Chatalbasheva (violetachatalbasheva@gmail.com)
