Evaluating graphical perception capabilities of Vision Transformers
Vision Transformers (ViTs) have emerged as a powerful alternative to convolutional neural networks (CNNs) in a variety of image-based tasks. While CNNs have previously been evaluated for their ability to perform graphical perception tasks, which are essential for interpreting visualizations, the perceptual capabilities of ViTs remain largely unexplored. In this work, we investigate the performance of ViTs on elementary visual judgment tasks inspired by Cleveland and McGill's foundational studies, which quantified the accuracy of human perception across different visual encodings. Following their methodology, we benchmark ViTs against CNNs and human participants in a series of controlled graphical perception tasks. Our results reveal that, although ViTs demonstrate strong performance in general vision tasks, their alignment with human-like graphical perception in the visualization domain is limited. This study highlights key perceptual gaps and points to important considerations for the application of ViTs in visualization systems and graphical perceptual modeling.
The src directory contains the code necessary to produce the data and train the models for all results reported in the paper.
Prerequisites:
- Python >= 3.10
- CUDA >= 12.4
- PyTorch 2.6.0
Installation of all the requirements can be done either manually or using the conda option. For instructions on installing Conda, see prerequisites.md. Make sure git is available first:

```bash
sudo apt update
sudo apt install git
```

If you want to avoid the manual installation (option 1), clone the repository and run the replicate install script:

```bash
git clone https://github.com/poonam2308/ViTsGraphicalPerception.git
cd ViTsGraphicalPerception
bash install_replicate.sh
```
Conda option: before proceeding, ensure that Conda is installed on your system (see prerequisites.md).

```bash
# to avoid SSH key issues, clone via HTTPS instead of SSH:
# git clone git@github.com:poonam2308/ViTsGraphicalPerception.git
git clone https://github.com/poonam2308/ViTsGraphicalPerception.git
cd ViTsGraphicalPerception
bash setup_conda.sh
```

Note: if you did not accept the license terms during the first Miniconda installation attempt, the installer will prompt you again to review and accept them. Then activate the environment:

```bash
conda activate vitsgp
```

Manual installation (option 1) with venv: on a fresh machine, you can install the basic Python tools with the commands below (do not forget to replace X with the Python version on your machine if you need a specific version):
```bash
sudo apt install python3 python3-venv python3-pip
# for a specific Python version, replace X and run instead:
# sudo apt install python3.X python3.X-venv python3-pip
```

Then clone the repository and set up the virtual environment:

```bash
# to avoid SSH key issues, clone via HTTPS instead of SSH:
# git clone git@github.com:poonam2308/ViTsGraphicalPerception.git
git clone https://github.com/poonam2308/ViTsGraphicalPerception.git
cd ViTsGraphicalPerception
bash setup_venv.sh
source venv/bin/activate
```

Repository structure:
- Stimulus Generation (Data): src/ClevelandMcGill modules to build task-specific images (see the sketch after this list)
- Network: src/Models modules defining the three network architectures used in the paper (CvT, Swin, vViT)
- Training: src/Experiments modules to train CvT, Swin, and vViT on generated data. Note that the data are generated during the training process and are not saved to disk; they can easily be reproduced with the stimulus generation step.
- Evaluation: src/TestEvaluation modules to evaluate the trained checkpoints (weights) on the test dataset.
- Analysis: src/Analysis modules to compare CvT, Swin, and vViT to human performance on the same synthetic stimuli.
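To give a concrete sense of what the stimulus generation step produces, here is a minimal sketch of a Cleveland-McGill style bar-chart stimulus paired with a ratio target. The function name, image size, and encoding details are illustrative assumptions, not the actual src/ClevelandMcGill API.

```python
# Hypothetical sketch of a Cleveland-McGill style stimulus; the real generators
# live in src/ClevelandMcGill and may differ in size, encoding, and labels.
import numpy as np

def make_bar_stimulus(size=100, n_bars=5, rng=None):
    """Render a small grayscale bar chart and return (image, ratio target)."""
    rng = np.random.default_rng() if rng is None else rng
    image = np.zeros((size, size), dtype=np.float32)
    heights = rng.integers(low=10, high=90, size=n_bars)
    bar_width = size // (2 * n_bars)
    for i, h in enumerate(heights):
        x0 = i * 2 * bar_width + bar_width // 2
        image[size - h:, x0:x0 + bar_width] = 1.0  # bars grow up from the bottom edge
    # Elementary judgment: what fraction of the larger bar is the smaller one?
    a, b = rng.choice(n_bars, size=2, replace=False)
    target = min(heights[a], heights[b]) / max(heights[a], heights[b])
    return image, np.float32(target)

image, target = make_bar_stimulus()
print(image.shape, float(target))  # (100, 100) and a ratio in (0, 1]
```

In this setup, a network trained on such images learns to estimate the encoded quantity from the rendered pixels, which is what makes the comparison with human ratio judgments possible.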
The trained checkpoints for the experiments are available on Google Drive; access them here.
- Create the required folders:

```bash
mkdir -p src/Experiments/chkpt
mkdir -p src/Experiments/trainingplots
```

- Train a model (one script per task):

```bash
python src/Experiments/<name_of_file>.py
```

- Customize data sizes, batch size, and epochs. All experiment scripts accept the following flags:
  - `--train_target` number of training samples
  - `--val_target` number of validation samples
  - `--test_target` number of test samples
  - `--batch_size` batch size
  - `--epochs` number of training epochs
  Example:

```bash
python src/Experiments/cvt_bfr.py --train_target 100 --val_target 20 --test_target 20 --batch_size 32 --epochs 100
```
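Because the stimuli are generated during training rather than loaded from disk, these flags essentially size an on-the-fly dataset. The following is a minimal sketch of how a script could wire them into a synthetic Dataset and training loop; the flag names come from the list above, while the class name, data shapes, and defaults are assumptions, not the repo's code.

```python
# Illustrative only: flag names match the README; the Dataset, shapes, and defaults are stand-ins.
import argparse
import torch
from torch.utils.data import Dataset, DataLoader

class OnTheFlyStimuli(Dataset):
    """Generates a stimulus/target pair on demand instead of reading images from disk."""
    def __init__(self, n_samples):
        self.n_samples = n_samples
    def __len__(self):
        return self.n_samples
    def __getitem__(self, idx):
        image = torch.rand(1, 100, 100)   # stand-in for a rendered Cleveland-McGill stimulus
        target = torch.rand(1)            # stand-in for the quantity to estimate (e.g. a ratio)
        return image, target

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--train_target", type=int, default=100)  # defaults mirror the example above
    parser.add_argument("--val_target", type=int, default=20)
    parser.add_argument("--test_target", type=int, default=20)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--epochs", type=int, default=100)
    args = parser.parse_args()

    train_loader = DataLoader(OnTheFlyStimuli(args.train_target),
                              batch_size=args.batch_size, shuffle=True)
    for epoch in range(args.epochs):
        for images, targets in train_loader:
            pass  # forward pass, regression loss, backward pass, optimizer step go here
```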
Reproduce the following via single scripts.

Analysis figures: run the analysis notebook directly.
- Navigate to the directory src/Analysis
- Open and run:

```bash
jupyter notebook Analysis.ipynb
```
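For context on how model and human judgments can be compared: Cleveland-McGill style evaluations commonly report a midmean log absolute error (MLAE). Whether the Analysis notebook uses exactly this metric is an assumption here; the sketch below only shows the general idea on illustrative numbers.

```python
# Sketch of an MLAE-style metric; assumes targets and predictions are ratios in
# [0, 1] that are converted to percent before taking the log error.
import numpy as np

def mlae(predicted, true):
    """Midmean log absolute error: log2(|error in percent| + 1/8),
    averaged over the middle 50% of trials."""
    predicted = np.asarray(predicted, dtype=float)
    true = np.asarray(true, dtype=float)
    errors = np.sort(np.log2(np.abs(predicted - true) * 100.0 + 0.125))
    q1, q3 = len(errors) // 4, (3 * len(errors)) // 4
    return float(np.mean(errors[q1:q3]))

# Illustrative values only, not results from the paper.
print(mlae([0.42, 0.60, 0.18, 0.95], [0.40, 0.55, 0.20, 0.90]))
```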
Baseline evaluation (all models via a single notebook):
- Navigate to the directory src/TestEvaluation
- Download the pretrained checkpoints and place them in `TestEvaluation/chkpt/`
- Open and run:

```bash
jupyter notebook Main_Evaluation.ipynb
```

All evaluation results are saved as CSV files under the results/ subfolders.
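Conceptually, what the notebook does for each trained checkpoint reduces to loading the weights, running the test stimuli through the model, and dumping target/prediction pairs to CSV. The sketch below is self-contained, with a stand-in model, stand-in data, and placeholder file names; it is not the repo's evaluation code.

```python
# Conceptual, self-contained sketch of checkpoint evaluation + CSV export.
# The real notebook uses the repo's CvT/Swin/vViT models and stimuli.
import csv
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Flatten(), nn.Linear(100 * 100, 1))   # stand-in regressor
torch.save(model.state_dict(), "demo_checkpoint.pth")          # pretend this is a trained checkpoint

model.load_state_dict(torch.load("demo_checkpoint.pth", map_location="cpu"))
model.eval()

test_set = TensorDataset(torch.rand(64, 1, 100, 100), torch.rand(64, 1))  # stand-in test stimuli
rows = []
with torch.no_grad():
    for images, targets in DataLoader(test_set, batch_size=32):
        preds = model(images)
        rows.extend(zip(targets.flatten().tolist(), preds.flatten().tolist()))

with open("demo_results.csv", "w", newline="") as f:            # the notebooks write under results/
    writer = csv.writer(f)
    writer.writerow(["target", "prediction"])
    writer.writerows(rows)
```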
Ablation study (all models via a single notebook):
- Navigate to the directory src/TestEvaluation
- Download the pretrained checkpoints and place them in `TestEvaluation/chkpt/`
- Open and run:

```bash
jupyter notebook Ablation_Evaluation.ipynb
```

- OS tested: Linux (Pop!_OS, Arch)
- Python / CUDA: Python 3.10, CUDA 12.4, PyTorch 2.3
- GPUs used: NVIDIA RTX 3090, NVIDIA RTX 6000 Ada, NVIDIA A100
- VRAM required (training): ≥ 16 GB recommended
- VRAM required (evaluation only): ≥ 12 GB
- Typical training time for one model & task: ~4 to 6 hours with epochs=100, batch_size=32 on the GPUs above.
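A quick way to check whether your GPU meets the VRAM requirements listed above (a generic check, not part of the repo):

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB VRAM")
else:
    print("No CUDA device visible; training on CPU is not practical.")
```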
Note: Data are generated on the fly during training, as described above. If you have a smaller GPU with less VRAM, reduce batch_size. Evaluation typically fits in ~12 GB of VRAM. Random seeds are set in the experiment scripts; small numeric differences may occur across hardware/driver versions.
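The experiment scripts set their own seeds; for reference, seeding a PyTorch run generally looks like the generic snippet below (not the repo's exact code). Even with fixed seeds, bitwise determinism across GPUs and driver versions is not guaranteed, which is why small numeric differences can remain.

```python
# Generic reproducibility boilerplate, not copied from the repo.
import random
import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)            # seeds the CPU RNG (and CUDA RNGs in recent PyTorch)
    torch.cuda.manual_seed_all(seed)   # explicit seeding of all visible GPUs

set_seed(42)
```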
- Produce evaluation results with Main_Evaluation.ipynb and analysis figures with Analysis.ipynb.
- Or run both in one go with the convenience script; it reproduces the results and figures from the paper using the provided pretrained checkpoints, not by re-training models from scratch:

```bash
bash replicate.sh
```
