
V-HOP: Visuo-Haptic 6D Object Pose Tracking

teaser figure

[Paper] [Website] [arXiv]

This is the official implementation of our paper V-HOP: Visuo-Haptic 6D Object Pose Tracking, accepted to Robotics: Science and Systems (RSS) 2025.

Contributors

Hongyu Li, Mingxi Jia, Tuluhan Akbulut, Yu Xiang, George Konidaris, and Srinath Sridhar.

Overview

Humans naturally integrate vision and haptics for robust object perception during manipulation; the loss of either modality significantly degrades performance. Inspired by this multisensory integration, prior object pose estimation research has attempted to combine visual and haptic/tactile feedback. Although these works demonstrate improvements in controlled environments or on synthetic datasets, they often underperform vision-only approaches in real-world settings due to poor generalization across diverse grippers, sensor layouts, or sim-to-real environments. Furthermore, they typically estimate the object pose for each frame independently, resulting in less coherent tracking over sequences in real-world deployments.

To address these limitations, we introduce a novel unified haptic representation that effectively handles multiple gripper embodiments. Building on this representation, we introduce a new visuo-haptic transformer-based object pose tracker that seamlessly integrates visual and haptic input.

We validate our framework on our dataset and the FeelSight dataset, demonstrating significant performance improvements on challenging sequences. Notably, our method achieves superior generalization and robustness across novel embodiments, objects, and sensor types (both taxel-based and vision-based tactile sensors). In real-world experiments, we demonstrate that our approach outperforms state-of-the-art visual trackers by a large margin. We further show that we can achieve precise manipulation tasks by incorporating our real-time object tracking result into motion plans, underscoring the advantages of visuo-haptic perception.

Installation

Our code ships in a Docker container, so you don't need to install any dependencies manually.

Prerequisites

Setup

Clone the repository:

# Clone the repository
git clone https://github.com/brown-ivl/v-hop.git
cd v-hop

You can pull the docker image:

docker pull lhy0807/v-hop:latest

Or build the docker image:

sh docker/build.sh

In docker/run_container.sh, set the dataset directory path DATA_DIR to the path of your dataset. Then run the Docker container:

sh docker/run.sh

Dataset Preparation

Our dataset is distributed in SquashFS format, so you need to mount it on your local machine; we mount it using Singularity. Alternatively, you can read the dataset with a library such as PySquashfsImage, though this approach is not officially supported.

Download Dataset (WIP)

We are still working on the dataset preparation. In the full dataset, sequence IDs 0–41 are used for training and IDs 42–48 for validation. In the meantime, we have prepared a small subset of the dataset so you can test the code; you can download it from here.
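The train/validation split above can be sketched as a small helper. This is a hypothetical illustration based only on the ID ranges stated here; the actual dataset loader may organize splits differently (the `split_for` name is ours, not part of the codebase):

```python
def split_for(seq_id: int) -> str:
    """Map a sequence ID to its split: IDs 0-41 train, IDs 42-48 validation."""
    if 0 <= seq_id <= 41:
        return "train"
    if 42 <= seq_id <= 48:
        return "val"
    raise ValueError(f"unexpected sequence ID: {seq_id}")
```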

Preprocessing

To preprocess the dataset, run:

sh preprocess.sh

Usage

Training

python train.py

Inference

Download the checkpoint from here.

python test_subset.py

Project Structure

v-hop/
├── docker/         # Docker configuration
├── config/         # Configuration files
├── dataset/        # Dataset related files
├── networks/       # Model files
├── FoundationPose/ # FoundationPose integration

License

This project is licensed under the Attribution-NonCommercial 4.0 International license.

Citation

If you find this work useful, please consider citing:

@inproceedings{li2025vhop,
    title={V-HOP: Visuo-Haptic 6D Object Pose Tracking}, 
    author={Li, Hongyu and Jia, Mingxi and Akbulut, Tuluhan and Xiang, Yu and Konidaris, George and Sridhar, Srinath},
    booktitle={Proceedings of Robotics: Science and Systems},
    year={2025}
}

Contact

Please contact Hongyu Li (hongyu@brown.edu) for any questions.

Acknowledgments

This work is supported by the National Science Foundation (NSF) under CAREER grant #2143576, grant #2346528, and the Office of Naval Research (ONR) grant #N00014-22-1-259. We thank Ying Wang, Tao Lu, Zekun Li, and Xiaoyan Cong for their valuable discussions. We thank the area chair and the reviewers for providing constructive feedback on improving the quality and clarity of our paper. This research was conducted using computational resources and services at the Center for Computation and Visualization, Brown University.

Our codebase is built on top of the following projects:

  • FoundationPose: We adopt their network and pretrained model.
  • dex-urdf: We adopt their collection of URDF models for generating synthetic data.

We thank the authors for their great work.
