Inspire

Official implementation of the paper "InSpire: Vision-Language-Action Models with Intrinsic Spatial Reasoning".

Note: We are doing our best to improve this work. If you have any questions or suggestions, please feel free to create an issue in this repo or contact us at shihan.wu.koorye@outlook.com.

[Project] [ArXiv] [PDF] [Inspire-FAST]

News

  • 🔥Sep 29, 2025: Evaluation results on CALVIN are now available.

  • 🔥May 23, 2025: Our paper has been updated for better clarity and readability. The optimized version is now available on arXiv.

  • 🔥May 21, 2025: The code is released and the paper is now available on arXiv.

Introduction

Abstract

Leveraging pretrained Vision-Language Models (VLMs) to map language instructions and visual observations to raw low-level actions, Vision-Language-Action models (VLAs) hold great promise for achieving general-purpose robotic systems. Despite their advancements, existing VLAs tend to spuriously correlate task-irrelevant visual features with actions, limiting their generalization capacity beyond the training data. To address this challenge, we propose Intrinsic Spatial Reasoning (InSpire), which mitigates the adverse effects of spurious correlations by boosting the spatial reasoning ability of VLAs. Specifically, InSpire redirects the model's attention to task-relevant visual clues by simply appending the question “In which direction is the [object] relative to the robot” before the language instruction and aligning the VLA's answer “right / left / up / down / front / back / grasp” and predicted actions with the ground truth. Notably, InSpire can be employed as a plugin to enhance existing autoregressive VLAs, requiring no extra data or interaction with other large models. Extensive experimental results in both simulation and real-world environments demonstrate the effectiveness and flexibility of our approach.
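
The core mechanism is simple enough to sketch in a few lines. The snippet below is a minimal illustration (not the repository's actual code) of how the spatial question could be prepended to the instruction and how a ground-truth direction word could be derived from the object's position in the robot frame; the axis convention, grasp threshold, and all function names are our assumptions.

import numpy as np

def spatial_question(obj: str) -> str:
    # Auxiliary question placed before the original language instruction.
    return f"In which direction is the {obj} relative to the robot?"

def direction_label(obj_pos_robot_frame: np.ndarray, grasp_radius: float = 0.05) -> str:
    # Coarse ground-truth answer from the object's position in the robot frame.
    # Assumed convention (illustrative only): x forward, y left, z up; objects
    # closer than grasp_radius are labeled "grasp".
    x, y, z = obj_pos_robot_frame
    if np.linalg.norm(obj_pos_robot_frame) < grasp_radius:
        return "grasp"
    axis = int(np.argmax(np.abs([x, y, z])))
    if axis == 0:
        return "front" if x > 0 else "back"
    if axis == 1:
        return "left" if y > 0 else "right"
    return "up" if z > 0 else "down"

def build_prompt(instruction: str, obj: str) -> str:
    # InSpire's prompt: spatial question first, then the task instruction.
    return f"{spatial_question(obj)} {instruction}"

# Example: the VLA is trained to output both the direction word and the actions,
# each aligned with its ground truth.
print(build_prompt("put the cookies on the towel", "cookies"))
print(direction_label(np.array([0.30, -0.10, 0.05])))  # -> "front"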

[Figures: Motivation and Method overview]

Experiments

Overall Performance

Real-world Environments

[Figure: Real-world environment results]

LIBERO Simulated Environments

[Figure: LIBERO simulated environment results]

CALVIN Simulated Environments

[Figure: CALVIN results]

Attention Maps

[Figure: Attention maps]

Videos

Real-world Environments

Note: The real-world experiments were conducted with the $\pi_0$-FAST model; the relevant code is available at Inspire-FAST.

Seen Tasks

[Videos] $\pi_0$-FAST vs. InSpire on four seen tasks: Cookies Towel, Left Bowl on Middle Bowl, Blue Cup Plate, Pull Bottom Plate.

Unseen Tasks

[Videos] $\pi_0$-FAST vs. InSpire on four unseen tasks: Pick Orange, Banana Towel, Ball Book, Orange Cup Plate.

LIBERO Simulated Environments

Seen Tasks

[Videos] miniVLA vs. InSpire on four seen Libero-90 tasks: Butter Drawer, Moka Stove, Sauce Tray, Book Caddy.

Unseen Tasks

[Videos] miniVLA vs. InSpire on four unseen tasks: Bowl Plate (Libero-Goal), Cheese Basket (Libero-Object), Bowl Plate (Libero-Spatial), Book Caddy (Libero-10).

CALVIN Simulated Environments

ABC -> D

[Videos] miniVLA vs. InSpire on the CALVIN ABC -> D split.

Model Checkpoints

Model      | Dataset                      | Checkpoint
---------- | ---------------------------- | ----------
MiniVLA    | Libero90                     | Download
InspireVLA | Libero90                     | Download
InspireVLA | Libero10+Goal+Object+Spatial | Download

Installation

  1. Clone the repository.
git clone https://github.com/Koorye/Inspire.git
  2. Install the dependencies.
conda create -n inspire python=3.10
conda activate inspire

cd LIBERO
pip install -r requirements.txt
pip install -e .
cd ..

cd vq_bet_official
pip install -r requirements.txt
pip install -e .
cd ..

pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements-min.txt

# (Optional) for Flash Attention
pip install packaging ninja
ninja --version; echo $?  # Verify Ninja --> should return exit code "0"
pip install "flash-attn==2.5.5" --no-build-isolation

Evaluation with Pretrained Checkpoints

  1. Create a .hf_token file in the root directory and add your Hugging Face token.
echo "your_huggingface_token" > .hf_token
  2. Download the pretrained checkpoints (see the Python sketch after these steps).
bash scripts/download_pretrained_weights.sh
  3. Run the evaluation scripts.
bash vla_scripts/eval/eval_baseline_libero90.sh
bash vla_scripts/eval/eval_inspire_libero90.sh
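
For reference, steps 1 and 2 correspond roughly to the Python sketch below, which uses huggingface_hub directly; the repository id shown is hypothetical, so prefer the official download script above for the actual checkpoints.

# Rough Python equivalent of the token/download steps (sketch only; the
# repo_id below is hypothetical, use scripts/download_pretrained_weights.sh
# for the real checkpoints).
from pathlib import Path
from huggingface_hub import login, snapshot_download

token = Path(".hf_token").read_text().strip()
login(token=token)

local_dir = snapshot_download(repo_id="Koorye/InspireVLA-Libero90")  # hypothetical id
print("Checkpoint downloaded to:", local_dir)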

Training Your Own Checkpoints

  1. Prepare the dataset.

See Dataset Preparation.

  2. Run the training scripts.
bash vla_scripts/train/train_baseline_libero90.sh
bash vla_scripts/train/train_inspire_libero90.sh
  3. Run the evaluation scripts.
bash vla_scripts/eval/eval_baseline_libero90.sh
bash vla_scripts/eval/eval_inspire_libero90.sh

Acknowledgements

Our work is built upon the following open-source projects: CALVIN, LIBERO, miniVLA, Pi-0. We thank the authors for releasing their code. If you use our model and code, please consider citing these works as well.
