This is the official repository for the paper "Test-Time Canonicalization by Foundation Models for Robust Perception" (ICML 2025).
- Each task is a Python script of its own (in the experiments/ folder).
- Each Python file imports from focal/utils, which contains most of the dependencies and library code.
- In particular, focal/utils/energy.py contains the energy function used for optimization.
- You can run the experiments using the following command:
python -m experiments.rotation_2D # Replace `rotation_2D` with the name of the task you want to run.
The recommended way to install the code is with the uv package manager; however, we also provide a requirements.txt file for pip users.
We have tested the code with Python 3.9 and CUDA 12.8.
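A quick way to compare a local environment against the tested configuration is sketched below. It is a minimal check and not part of the repository: the helper name `describe_environment` is hypothetical, and it assumes PyTorch is the package that reports the CUDA version (if `torch` is not importable, the CUDA field stays `None`).

```python
import sys


def describe_environment():
    """Return the running Python version and, if PyTorch is importable, its CUDA build version."""
    info = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "cuda": None,
    }
    try:
        import torch  # assumption: PyTorch is installed in this environment
    except ImportError:
        return info
    info["cuda"] = torch.version.cuda  # None for CPU-only builds
    return info


if __name__ == "__main__":
    env = describe_environment()
    print(f"Python {env['python']} (tested: 3.9), CUDA {env['cuda']} (tested: 12.8)")
```

A mismatch does not necessarily mean the code will fail; it only indicates that the setup differs from the one the authors tested.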
Recommended installation (uv):
pip install uv
uv sync
uv add . --dev
uv run -m experiments.rotation_2D # Run the 2D rotation experiment
pip installation:
pip install -r requirements.txt
pip install -e .
python -m experiments.rotation_2D
Note: You might need additional dependencies for specific tasks such as active vision (gaussian-splatting) and 3D viewpoints. See below for details.
To run the active vision experiments, you need to download the Gaussian Splatting libraries and datasets. You can do this by running the following commands:
# Install gaussian splatting dependencies
apt-get install python3-opencv # Install OpenCV if not already installed
source .venv/bin/activate # Activate the virtual environment (necessary for uv users)
bash scripts/download_gaussian_splatting_libraries.sh
# Download pretrained models and datasets
cd gaussian_splatting/
wget https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/datasets/pretrained/models.zip
unzip models.zip -d pretrained
wget http://storage.googleapis.com/gresearch/refraw360/360_v2.zip
unzip 360_v2.zip -d data
To run 3D viewpoint experiments, you need to install detectron2 and TRELLIS. You also need to download and install patches for Objaverse (Apache 2.0), TRELLIS (MIT), and OVSeg (CC BY-NC). Please refer to Licenses and Third Parties for more information.
conda create -n focal_3d python=3.10
conda activate focal_3d
pip install -r requirements.txt
./scripts/download_3D_libraries.sh
# Apply OVSeg patch manually.
cd third_party_modified;
git clone https://github.com/facebookresearch/ov-seg.git
cd ov-seg;
git checkout 36f49d496714998058d115ffb6172d9d84c59065
git apply ../../patches/cc_by_nc/ovseg_patch.patch
cd ..;
mv ov-seg ovseg;
cd ..;
pip install -Ue third_party_modified/ovseg/third_party/CLIP/.
The OVSeg checkpoint also needs to be downloaded from their Google Drive: https://drive.google.com/file/d/1cn-ohxgXDrDfkzC1QdO-fi8IjbjXmgKy/view. Place it at: ./third_party/ovseg/ovseg_swinbase_vitL14_ft_mpt.pth.
To run experiments on Objaverse, the dataset needs to be downloaded and filtered. To do so, install Blender and start the xserver (if on a headless server) as described in https://github.com/allenai/objaverse-xl/tree/main/scripts/rendering:
# Blender / xserver install
wget https://download.blender.org/release/Blender3.2/blender-3.2.2-linux-x64.tar.xz && \
tar -xf blender-3.2.2-linux-x64.tar.xz && \
rm blender-3.2.2-linux-x64.tar.xz
sudo apt-get install xserver-xorg -y && \
sudo python3 start_x_server.py start
# Filter
python3 -m scripts.viewpoint_3D_process_objaverse
To run experiments on CO3D, the dataset needs to be downloaded and filtered. Download the data from https://ai.meta.com/datasets/co3d-downloads/. When done, the directory structure should be datasets/co3d/orig/<class>/<uid>. To filter, export an OpenAI API key and run the following:
export OPENAI_API_KEY="<your key>"
python3 -m scripts.viewpoint_3D_process_co3d
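Before filtering, it can help to confirm that the download matches the expected datasets/co3d/orig/<class>/<uid> layout. The following is a minimal sketch and not part of the repository: the function name `list_co3d_sequences` is hypothetical, and it only checks for two directory levels under the root without inspecting the contents of each sequence.

```python
from pathlib import Path


def list_co3d_sequences(root="datasets/co3d/orig"):
    """Map each class directory under `root` to the sorted names of its <uid> subdirectories."""
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"Expected CO3D data under {root}")
    layout = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Stray files (archives, notes) are ignored; only directories count as sequences.
        layout[class_dir.name] = sorted(p.name for p in class_dir.iterdir() if p.is_dir())
    return layout
```

Calling `list_co3d_sequences()` from the repository root returns a dictionary such as `{"<class>": ["<uid>", ...]}`; a class that maps to an empty list likely indicates an incomplete download or a misplaced folder.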
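To run the 3D viewpoint experiments (the trailing comment on each command names the corresponding figure):
python3 -m experiments.viewpoint_3D --mode rank --canon_2d_pattern 0 --dataset objaverse # Fig. 5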
python3 -m experiments.viewpoint_3D --mode gt_prob --canon_2d_pattern 5 --dataset objaverse # Fig. 12
python3 -m experiments.viewpoint_3D --mode gt_prob --canon_2d_pattern 5 --dataset co3d # Fig. 6
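The three commands above differ only in `--mode`, `--canon_2d_pattern`, and `--dataset`, so they can be generated from a small table. This is a sketch rather than a script shipped with the repository: the names `FIGURE_RUNS` and `build_command` are hypothetical, and only the flags shown above are used.

```python
import sys

# Flag values copied from the commands above, keyed by the figure they correspond to.
FIGURE_RUNS = {
    "Fig. 5": {"mode": "rank", "canon_2d_pattern": 0, "dataset": "objaverse"},
    "Fig. 12": {"mode": "gt_prob", "canon_2d_pattern": 5, "dataset": "objaverse"},
    "Fig. 6": {"mode": "gt_prob", "canon_2d_pattern": 5, "dataset": "co3d"},
}


def build_command(figure, python=sys.executable):
    """Return the argv list that runs experiments.viewpoint_3D for one figure."""
    cmd = [python, "-m", "experiments.viewpoint_3D"]
    for flag, value in FIGURE_RUNS[figure].items():
        cmd += [f"--{flag}", str(value)]
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing is launched by accident.
    for figure in FIGURE_RUNS:
        print(figure, " ".join(build_command(figure)))
```

To actually launch a run, the returned list can be passed to `subprocess.run(build_command("Fig. 5"), check=True)` from the repository root.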
If you find this code useful in your research, please cite this paper:
@inproceedings{
singhal2025testtime,
title={Test-Time Canonicalization by Foundation Models for Robust Perception},
author={Utkarsh Singhal and Ryan Feng and Stella X. Yu and Atul Prakash},
booktitle={Forty-second International Conference on Machine Learning},
year={2025},
url={https://openreview.net/forum?id=JMZ7mr19AK}
}
This repo has been created and is maintained by
All original code in confs/, experiments/, focal/, scripts/ is MIT-licensed. You may use, modify, and distribute it commercially.
Some functionality requires patches for third-party libraries:
- TRELLIS (MIT) – can be used commercially.
- Objaverse (Apache 2.0) (dataset) – can be used commercially with notices.
- OVSeg (CC BY-NC 4.0) (classifier for 3D viewpoint experiments) – non-commercial only.
Copies of their licenses / notices are included in patches/.
The CC BY-NC patch is located at patches/cc_by_nc/ovseg_patch.patch.
Do not ship or include this patch in any commercial product.
You may apply it manually for non-commercial purposes only.
