This is the official implementation of our paper: V-HOP: Visuo-Haptic 6D Object Pose Tracking, accepted by Robotics: Science and Systems (RSS) 2025.
Hongyu Li, Mingxi Jia, Tuluhan Akbulut, Yu Xiang, George Konidaris, and Srinath Sridhar.
Humans naturally integrate vision and haptics for robust object perception during manipulation. The loss of either modality significantly degrades performance. Inspired by this multisensory integration, prior object pose estimation research has attempted to combine visual and haptic/tactile feedback. Although these works demonstrate improvements in controlled environments or synthetic datasets, they often underperform vision-only approaches in real-world settings due to poor generalization across diverse grippers, sensor layouts, or sim-to-real environments. Furthermore, they typically estimate the object pose for each frame independently, resulting in less coherent tracking over sequences in real-world deployments. To address these limitations, we introduce a novel unified haptic representation that effectively handles multiple gripper embodiments. Building on this representation, we introduce a new visuo-haptic transformer-based object pose tracker that seamlessly integrates visual and haptic input. We validate our framework on our dataset and the Feelsight dataset, demonstrating significant performance improvements on challenging sequences. Notably, our method achieves superior generalization and robustness across novel embodiments, objects, and sensor types (both taxel-based and vision-based tactile sensors). In real-world experiments, we demonstrate that our approach outperforms state-of-the-art visual trackers by a large margin. We further show that we can achieve precise manipulation tasks by incorporating our real-time object tracking result into motion plans, underscoring the advantages of visuo-haptic perception.
Our code is packaged in a Docker container, so you do not need to install any dependencies manually.
Clone the repository:

```bash
git clone https://github.com/brown-ivl/v-hop.git
cd v-hop
```

You can pull the Docker image:

```bash
docker pull lhy0807/v-hop:latest
```

Or build the Docker image yourself:

```bash
sh docker/build.sh
```

Set the dataset directory path `DATA_DIR` in `docker/run_container.sh` to the path of your dataset.

Run the Docker container:

```bash
sh docker/run.sh
```

We store our dataset in SquashFS format, so you need to mount it on your local machine. We mount it using Singularity. An alternative is to use a library such as PySquashfsImage to read the dataset directly; however, this is not officially supported by the authors.
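As a sketch, mounting the SquashFS dataset with Singularity's image-bind feature might look like the following. The file names and mount point here are placeholders (not from the released code), and the `--bind ...:image-src=/` syntax requires a recent Singularity/Apptainer release:

```bash
# Hypothetical example: expose the SquashFS dataset at /data inside the container.
# dataset.squashfs and v-hop.sif are placeholder names -- adjust to your setup.
singularity exec \
    --bind /path/to/dataset.squashfs:/data:image-src=/ \
    v-hop.sif \
    ls /data
```

If Singularity is unavailable, a root user can also mount the image directly with `mount -t squashfs -o loop`.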
We are still working on the dataset preparation. In the full dataset, sequence IDs 0-41 are used for training and IDs 42-48 for validation. We have prepared a small subset of the dataset for you to test the code; you can download it from here.
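For clarity, the train/validation split described above can be expressed as follows. This helper (`get_split`) is illustrative only and is not part of the released codebase:

```python
# Train/validation split by sequence ID, as described above.
# Sequences 0-41 are used for training, 42-48 for validation.
TRAIN_IDS = range(0, 42)
VAL_IDS = range(42, 49)

def get_split(seq_id: int) -> str:
    """Return the split ("train" or "val") that a sequence ID belongs to."""
    if seq_id in TRAIN_IDS:
        return "train"
    if seq_id in VAL_IDS:
        return "val"
    raise ValueError(f"Unknown sequence ID: {seq_id}")
```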
To preprocess the dataset, run:

```bash
sh preprocess.sh
```

To train the model, run:

```bash
python train.py
```

Download the checkpoint from here.
To evaluate on the subset, run:

```bash
python test_subset.py
```

Repository structure:

```
v-hop/
├── docker/          # Docker configuration
├── config/          # Configuration files
├── dataset/         # Dataset-related files
├── networks/        # Model files
├── FoundationPose/  # FoundationPose integration
```
This project is licensed under the Attribution-NonCommercial 4.0 International license.
If you find this work useful, please consider citing:

```bibtex
@inproceedings{li2025vhop,
  title={V-HOP: Visuo-Haptic 6D Object Pose Tracking},
  author={Li, Hongyu and Jia, Mingxi and Akbulut, Tuluhan and Xiang, Yu and Konidaris, George and Sridhar, Srinath},
  booktitle={Proceedings of Robotics: Science and Systems},
  year={2025}
}
```

Please contact Hongyu Li (hongyu@brown.edu) with any questions.
This work is supported by the National Science Foundation (NSF) under CAREER grant #2143576, grant #2346528, and the Office of Naval Research (ONR) grant #N00014-22-1-259. We thank Ying Wang, Tao Lu, Zekun Li, and Xiaoyan Cong for their valuable discussions. We thank the area chair and the reviewers for providing constructive feedback on improving the quality and clarity of our paper. This research was conducted using computational resources and services at the Center for Computation and Visualization, Brown University.
Our codebase is built on top of the following projects:
- FoundationPose: We adopt their network and pretrained model.
- dex-urdf: We adopt their collection of URDF models for generating synthetic data.
We thank the authors for their great work.
