# Vitookit

A comprehensive toolkit for evaluating and analyzing vision models, with a focus on Vision Transformers (ViT). Vitookit provides flexible evaluation protocols, distributed training support, and seamless integration with HPC clusters and Weights & Biases.

## Features
- 🎯 Multiple Evaluation Protocols: Full finetuning, linear probing, k-NN, and more
- ⚡ Fast Data Loading: Support for both PyTorch DataLoader and FFCV for high-performance I/O
- 🔧 Flexible Configuration: Gin-config based configuration system
- 🚀 Distributed Training: Multi-GPU and multi-node training with PyTorch DDP
- 🏭 HPC Integration: Slurm and Condor cluster support with automatic job management
- 📊 WandB Integration: Seamless experiment tracking and model versioning
- 🗂️ Various Datasets: ImageNet variants, CIFAR, transfer learning datasets, and more
## Installation

```bash
pip install git+https://github.com/erow/vitookit.git
```

## Quick Start

`vitrun` is a wrapper around `torchrun` that automatically locates evaluation scripts in `vitookit/evaluation`.
**Single GPU:**

```bash
vitrun eval_cls.py \
    --data_location=/data/IMNET \
    --model vit_tiny_patch16_224 \
    --gin build_model.global_pool='"avg"'
```

**Multi-GPU (distributed):**
```bash
vitrun --nproc_per_node=8 eval_cls.py \
    --data_location=/data/IMNET \
    --model vit_base_patch16_224 \
    --batch_size=128
```
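Under the hood, `vitrun` resolves the script name and hands the rest of the command line to `torchrun`. The call above is therefore roughly equivalent to the following (a sketch, assuming `eval_cls.py` resolves to the `vitookit.evaluation.eval_cls` module):

```bash
# Roughly equivalent direct torchrun invocation; flags pass through unchanged.
torchrun --nproc_per_node=8 -m vitookit.evaluation.eval_cls \
    --data_location=/data/IMNET \
    --model vit_base_patch16_224 \
    --batch_size=128
```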
## HPC Integration

### Slurm

Submit a job to a Slurm cluster:

```bash
sbatch hpc/svitrun.sh eval_cls.py \
    --data_location=/data/IMNET \
    --model vit_tiny_patch16_224 \
    --gin build_model.global_pool='"avg"'
```

### Submitit

Submitit provides automatic checkpointing, job requeuing, and better cluster integration:
```bash
submitit \
    --module vitookit.evaluation.eval_cls \
    --ngpus=8 \
    --nodes=1 \
    --partition=gpu \
    --data_location=/data/IMNET \
    --model vit_base_patch16_224
```

**Available Options:**
- `--module`: Python module to run (e.g., `vitookit.evaluation.eval_cls`)
- `--ngpus`: Number of GPUs per node
- `--nodes`: Number of nodes to request
- `-t, --timeout`: Job duration in minutes
- `--mem`: Memory to request (GB)
- `--partition`: Cluster partition name
- `--job_dir`: Directory for job outputs
- `--fast_dir`: Fast disk directory for dataset caching
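For example, a longer job with an explicit time limit, memory request, and output directory might look like this (a sketch using only the options listed above; the values are illustrative and should be adjusted for your cluster):

```bash
# 24-hour, 8-GPU job with 64 GB of memory and a dedicated output directory.
submitit \
    --module vitookit.evaluation.eval_cls \
    --ngpus=8 \
    --nodes=1 \
    --partition=gpu \
    -t 1440 \
    --mem=64 \
    --job_dir=outputs/submitit_cls \
    --data_location=/data/IMNET \
    --model vit_base_patch16_224
```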
**FFCV Fast Loading Example:**

```bash
submitit \
    --module vitookit.evaluation.eval_cls_ffcv \
    --ngpus=8 \
    --train_path ~/data/ffcv/IN1K_train_500_95.ffcv \
    --val_path ~/data/ffcv/IN1K_val_500_95.ffcv \
    --fast_dir /scratch/local/ \
    -w wandb:dlib/EfficientSSL/lsx2qmys
```

### Condor

Submit a job to an HTCondor cluster:

```bash
condor_submit condor/eval.submit
```

## Evaluation Protocols

Vitookit provides multiple evaluation protocols for comprehensive model assessment. Use `vitrun` to launch any evaluation script.
### Full Fine-tuning

Train all model parameters end-to-end:

```bash
vitrun eval_cls.py \
    --data_location=$DATA_PATH \
    -w <weights.pth> \
    --gin build_model.drop_path_rate=0.1
```

See `doc/finetune.md` for detailed training recipes and hyperparameter settings.
### Linear Probing

Freeze the backbone and train only the classification head:

```bash
vitrun eval_linear.py \
    --data_location=$DATA_PATH \
    -w <weights.pth> \
    --blr=0.1
```

### k-NN Evaluation

Parameter-free evaluation using feature similarity:
```bash
vitrun eval_knn.py \
    --data_location=$DATA_PATH \
    -w <weights.pth>
```

### Learning Rate Finder

Automatically search for a suitable learning rate:
```bash
vitrun lr_finder.py \
    --data_location=$DATA_PATH \
    --model vit_base_patch16_224
```

## Configuration

Vitookit uses gin-config for flexible model and dataset configuration.
Load one or more config files:

```bash
vitrun eval_cls.py --cfgs config.gin another_config.gin
```

Override individual bindings on the command line:

```bash
vitrun eval_cls.py \
    --gin build_model.drop_path_rate=0.1 \
    --gin build_model.global_pool='"avg"'
```

**Note:** String values in gin require nested quotes: `'"value"'`
Key configurables:

- `build_model()` - Model architecture and parameters
- `build_dataset()` - Dataset selection and preprocessing
- `build_transform()` - Data augmentation strategies
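For reuse across runs, the same bindings can live in a gin file passed via `--cfgs`; a minimal sketch using only the configurables listed above:

```
# config.gin - example bindings for the configurables above.
# Inside a gin file, plain quotes suffice; the nested '"value"' form is only
# needed on the command line to survive shell quoting.
build_model.drop_path_rate = 0.1
build_model.global_pool = 'avg'
```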
## Loading Pretrained Weights

Vitookit supports multiple sources for loading pretrained weights via `-w` or `--pretrained_weights`:
| Source Type | Format | Example |
|---|---|---|
| Local file | File path | `-w /path/to/checkpoint.pth` |
| HTTPS URL | URL | `-w https://example.com/model.pth` |
| WandB Run | `wandb:<entity>/<project>/<run_id>` | `-w wandb:dlib/EfficientSSL/lsx2qmys` |
| WandB Artifact | `artifact:<entity>/<project>/<name>` | `-w artifact:dlib/models/vit-base` |
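For instance, running a linear probe directly against the WandB run from the table (a sketch reusing the run ID shown above):

```bash
# Weights are fetched from the WandB run before evaluation starts.
vitrun eval_linear.py \
    --data_location=$DATA_PATH \
    -w wandb:dlib/EfficientSSL/lsx2qmys
```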
### Checkpoint Key Extraction

Extract weights from nested checkpoint structures:

```bash
vitrun eval_cls.py \
    -w checkpoint.pth \
    --checkpoint_key model \
    --prefix "^module\.(.*)"
```

**Options:**
- `--checkpoint_key`: Extract the state dict from a specific key (e.g., `model`, `teacher`, `student`)
- `--prefix`: Regex pattern to remove from state dict keys (e.g., `module.` from DDP models)
## Supported Datasets

Vitookit supports 20+ datasets out of the box:
**ImageNet Variants:**

- `IN1K` - ImageNet-1K (ILSVRC2012)
- `IN100` - ImageNet-100 subset
- `IN1Kv2` - ImageNet-V2
**Transfer Learning Datasets:**

- `CIFAR10`, `CIFAR100` - CIFAR datasets
- `Flowers` - Oxford 102 Flowers
- `Cars` - Stanford Cars (196 classes)
- `Aircraft` - FGVC Aircraft
- `Pets` - Oxford-IIIT Pets (37 classes)
- `CUB200` - Caltech-UCSD Birds
- `Food` - Food-101
- `DTD` - Describable Textures Dataset
- `SUN397` - Scene Understanding (SUN397)
- `STL` - STL-10
- `INAT` - iNaturalist
**Custom:**

- `Folder` - Generic ImageFolder structure
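The `Folder` option follows the standard torchvision `ImageFolder` convention of one subdirectory per class. A minimal layout sketch (the separate `train/` and `val/` directories are an assumption about how `--data_location` is resolved; `ImageFolder` itself only requires class subdirectories):

```
data_location/
├── train/
│   ├── class_a/img_001.jpg
│   └── class_b/img_002.jpg
└── val/
    ├── class_a/img_003.jpg
    └── class_b/img_004.jpg
```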
## Training Hyperparameters

**Learning Rate:**

- `--blr`: Base learning rate (scales with batch size: `lr = blr * batch_size / 256`)
- `--lr`: Absolute learning rate (overrides automatic scaling)
- `--layer_decay`: Layer-wise LR decay (0.5-0.75 recommended for ViT finetuning)
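A worked example of the scaling rule (treating `--batch_size` as per-GPU is an assumption; the global batch is then GPUs × per-GPU batch):

```bash
# Global batch = 8 GPUs x 128 per GPU = 1024,
# so the effective lr = 5e-4 * 1024 / 256 = 2e-3.
vitrun --nproc_per_node=8 eval_cls.py \
    --data_location=$DATA_PATH \
    --batch_size=128 \
    --blr 5e-4
```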
**Augmentation:**

- `--ThreeAugment`: Use 3-Augment instead of RandAugment
- `--mixup`: Mixup alpha (default: 0.8)
- `--cutmix`: CutMix alpha (default: 1.0)
- `--ra N`: Repeated augmentation (N > 1 for batch augmentation)
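Combined in a single run, using only the flags above (the mixup/cutmix values mirror the stated defaults; `--ra 3` is an illustrative choice):

```bash
vitrun eval_cls.py \
    --data_location=$DATA_PATH \
    --ThreeAugment \
    --mixup 0.8 \
    --cutmix 1.0 \
    --ra 3
```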
**Recommended Recipes:**

ViT from scratch:

```bash
--opt adamw --blr 5e-4 --weight_decay 5e-2 --epochs 300
```

ViT finetuning:

```bash
--opt adamw --blr 5e-4 --layer_decay 0.65 --epochs 100
```

ResNet50:

```bash
--opt adamw --blr 5e-4 --weight_decay 2e-5
```

See `doc/finetune.md` for more detailed recipes.
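Putting the ViT finetuning recipe together with a pretrained checkpoint gives a complete command along these lines (a sketch; the dataset path and weights file are placeholders):

```bash
vitrun --nproc_per_node=8 eval_cls.py \
    --data_location=$DATA_PATH \
    -w <weights.pth> \
    --opt adamw --blr 5e-4 --layer_decay 0.65 --epochs 100 \
    --gin build_model.drop_path_rate=0.1
```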
## Outputs

By default, outputs are saved to `--output_dir` (defaults to `outputs/<protocol>-<dataset>`):

- `checkpoint.pth` - Latest checkpoint (saved every `--ckpt_freq` epochs)
- `checkpoint_best.pth` - Best validation accuracy checkpoint
- `config.gin` - Gin configuration snapshot
- `config.yml` - Arguments configuration
- `log.txt` - JSON-formatted training logs
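Because `log.txt` is JSON-formatted, it can be inspected straight from the shell; a sketch, assuming one JSON record per line (`eval_cls-IN1K` is a hypothetical instantiation of the `<protocol>-<dataset>` pattern):

```bash
# Pretty-print the most recent training log entry.
tail -n 1 outputs/eval_cls-IN1K/log.txt | python -m json.tool
```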
## WandB Integration

WandB is automatically initialized when `--output_dir` is set:
- Logs metrics, hyperparameters, and system information
- Supports automatic resuming from checkpoints
- Set a custom run name: `WANDB_NAME=my_experiment vitrun ...`
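WandB's own environment variables apply unchanged; for example, to log offline (`WANDB_MODE` is a standard wandb variable, not a vitookit flag):

```bash
WANDB_MODE=offline vitrun eval_cls.py --data_location=$DATA_PATH ...
```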
## Documentation

- `doc/finetune.md` - Detailed training recipes for different architectures
- `doc/mvit.md` - Multiscale Vision Transformer specifics
- `doc/simlap.md` - SimLap: Simple Representation Learning with Arbitrary Pairs
- `doc/margin_classification.md` - Margin-based classification methods
## Contributing

We welcome contributions! Please feel free to submit issues and pull requests.
## Citation

If you use Vitookit in your research, please cite:

```bibtex
@software{vitookit,
  author = {Gent},
  title  = {Vitookit: Vision Model Evaluation Toolkit},
  year   = {2024},
  url    = {https://github.com/erow/vitookit}
}
```

## License

This project is licensed under the MIT License.