AdapTac-Dex: Adaptive Visuo-Tactile Fusion with Predictive Force Attention for Dexterous Manipulation
Authors:
Jinzhou Li* ·
Tianhao Wu* ·
Jiyao Zhang** ·
Zeyuan Chen**
Haotian Jin ·
Mingdong Wu ·
Yujun Shen ·
Yaodong Yang ·
Hao Dong†
Paper | Website | Video | Hardware | Teleoperation | Data
Effectively utilizing multi-sensory data is important for robots to generalize across diverse tasks. However, the heterogeneous nature of these modalities makes fusion challenging. Existing methods propose strategies to obtain comprehensively fused features but often ignore the fact that each modality requires different levels of attention at different manipulation stages. To address this, we propose a force-guided attention fusion module that adaptively adjusts the weights of visual and tactile features without human labeling. We also introduce a self-supervised future force prediction auxiliary task to reinforce the tactile modality, mitigate data imbalance, and encourage proper attention adjustment. Our method achieves an average success rate of 93% across three fine-grained, contact-rich tasks in real-world experiments. Further analysis shows that our policy appropriately adjusts its attention to each modality at different manipulation stages.
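For intuition, the core idea can be sketched in a few lines of PyTorch. This is an illustrative sketch only, not the released implementation; the module name `ForceGuidedFusion` and all dimensions are hypothetical:

```python
import torch
import torch.nn as nn

class ForceGuidedFusion(nn.Module):
    """Illustrative sketch: weight visual vs. tactile features from a force signal."""

    def __init__(self, feat_dim: int, force_dim: int):
        super().__init__()
        # Map the force signal to one attention logit per modality.
        self.gate = nn.Sequential(nn.Linear(force_dim, 64), nn.ReLU(), nn.Linear(64, 2))
        # Auxiliary head: predict the future force from the fused feature.
        self.force_head = nn.Linear(feat_dim, force_dim)

    def forward(self, vis_feat, tac_feat, force):
        w = torch.softmax(self.gate(force), dim=-1)        # (B, 2) weights over {vision, touch}
        fused = w[:, 0:1] * vis_feat + w[:, 1:2] * tac_feat
        future_force = self.force_head(fused)              # regressed against next-step force
        return fused, future_force, w
```

During training, `future_force` would be regressed against the recorded next-step force readings, which requires no human labeling; at inference, the learned weights `w` shift between vision and touch across manipulation stages.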
- Add the hardware setup here
- Released the pre-trained code and checkpoint. We recommend training your own model using the provided code.
- Release the initial code
- Release the dataset
- Further clean up the code
- Clean up the teleoperation code
- Release the tutorial
1. Clone the repository with submodules:

   ```bash
   git clone --recursive https://github.com/kingchou007/adaptac-dex.git
   cd adaptac-dex
   ```

2. Install dependencies (see the Installation section below).

3. Generate the training dataset: configure the parameters in `scripts/generate_data.sh` and run:

   ```bash
   bash scripts/generate_data.sh
   ```

4. Train your policy: modify the configs in `src/adaptac/configs/tasks/` and run:

   ```bash
   bash scripts/command_train.sh
   ```

5. Evaluate your policy:

   ```bash
   bash scripts/command_eval.sh
   ```

For detailed instructions, see the Training Tutorial section.
- OS: Ubuntu 20.04 (tested) or compatible Linux distribution
- CUDA: 11.8 (recommended to avoid compatibility issues)
- Python: 3.8
- Conda: Anaconda or Miniconda
- Git: For cloning repositories
Clone the repository with submodules:

```bash
git clone --recursive https://github.com/kingchou007/adaptac-dex.git
cd adaptac-dex
```

If you've already cloned the repository without submodules, initialize them:

```bash
git submodule update --init --recursive
```

Please follow the instructions below to create the conda environment and install the dependencies of the codebase. We recommend using CUDA 11.8 during installation to avoid compatibility issues. If you are using an RTX 5090 GPU, you may run into compatibility issues with MinkowskiEngine.
1. Create a new conda environment and activate it:

   ```bash
   conda create -n adaptac python=3.8
   conda activate adaptac
   ```

2. Install the necessary dependencies:

   ```bash
   conda install cudatoolkit=11.8
   pip install -r requirements.txt
   ```

3. Install MinkowskiEngine manually, following the official installation instructions.

   Note: MinkowskiEngine is included as a Git submodule. If you cloned with `--recursive`, it should already be available. Otherwise, initialize submodules first (see the Clone Repository section).

   ```bash
   cd dependencies
   conda install openblas-devel -c anaconda
   export CUDA_HOME=/usr/local/cuda-11.8
   git clone https://github.com/NVIDIA/MinkowskiEngine.git
   cd MinkowskiEngine
   python setup.py install --blas_include_dirs=${CONDA_PREFIX}/include --blas=openblas
   cd ../..
   ```

4. Install Pytorch3D manually.

   Note: Pytorch3D is included as a Git submodule. If you cloned with `--recursive`, it should already be available. Otherwise, initialize submodules first (see the Clone Repository section).

   ```bash
   cd dependencies/pytorch3d
   pip install -e .
   cd ../..
   ```

5. (Optional) If you'd like to visualize point clouds in service, install the visualizer package:

   ```bash
   # Install Plotly and Kaleido for point cloud visualization
   pip install kaleido plotly
   cd dependencies
   cd visualizer && pip install -e .
   cd ../..
   ```
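After installation, you can sanity-check the environment with a short script like the one below (a minimal sketch; it only assumes the packages installed above):

```python
# Verify that the core dependencies import and CUDA is visible.
import torch
print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())

import MinkowskiEngine as ME
print("MinkowskiEngine:", ME.__version__)

import pytorch3d
print("PyTorch3D:", pytorch3d.__version__)

try:
    import plotly  # optional, for point cloud visualization
    print("Plotly:", plotly.__version__)
except ImportError:
    print("Plotly not installed (optional)")
```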
- Flexiv Rizon 4 Robotic Arm
- Leap Hand: Please refer to the LEAP Hand API repository for installation and setup instructions
- Intel RealSense RGB-D Camera (D415/D435/L515)
- Paxini Tactile sensor
- Ubuntu 20.04 (tested) with previous environment installed
- If you are using an Intel RealSense RGB-D camera, install the Python wrapper `pyrealsense2` of `librealsense` according to the official installation instructions.
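After installing `pyrealsense2`, you can verify that the camera streams with a short check like this (a minimal sketch; the resolutions and frame rate are illustrative):

```python
import pyrealsense2 as rs

# Open color and depth streams and grab one set of frames.
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)

pipeline.start(config)
try:
    frames = pipeline.wait_for_frames()
    print("Depth frame received:", frames.get_depth_frame() is not None)
    print("Color frame received:", frames.get_color_frame() is not None)
finally:
    pipeline.stop()
```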
Before running the script, please configure these key parameters in `scripts/generate_data.sh`:

- `data_dir`: directory for the raw dataset
- `output_dir`: directory for the processed dataset
- `frame`: transform data into the 'camera' or 'base' frame (a minimal sketch of this transform appears after the script table below)
- `tactile_rep_type`: type of tactile representation

Then run:

```bash
bash scripts/generate_data.sh
```

Modify the configuration file in the `src/adaptac/configs/tasks` directory, then launch the training script:

```bash
bash scripts/command_train.sh
```

Run evaluation:

```bash
bash scripts/command_eval.sh
```

| Task | Script |
|---|---|
| Dataset Generation | scripts/generate_data.sh |
| Training | scripts/command_train.sh |
| Evaluation | scripts/command_eval.sh |
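For context on the `frame` option above: expressing a point cloud in the camera frame versus the robot-base frame is a standard rigid-body transform. A minimal numpy sketch (the extrinsics `T_base_camera` is hypothetical and comes from hand-eye calibration on a specific setup):

```python
import numpy as np

# Hypothetical camera-to-base extrinsics (4x4 homogeneous transform),
# obtained from hand-eye calibration; identity used as a placeholder.
T_base_camera = np.eye(4)

def to_base_frame(points_cam: np.ndarray) -> np.ndarray:
    """Transform an (N, 3) point cloud from the camera frame to the base frame."""
    homo = np.hstack([points_cam, np.ones((points_cam.shape[0], 1))])  # (N, 4)
    return (T_base_camera @ homo.T).T[:, :3]
```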
- Ensure all paths are correct in the scripts before running them
- Make sure the robot is reset to the initial position before starting evaluation
- Check that all hardware devices are properly connected and configured
- Verify USB port assignments match your hardware setup
Submodules not initialized:

- If dependencies are missing, make sure you cloned with the `--recursive` flag
- Or run:

  ```bash
  git submodule update --init --recursive
  ```

Submodule out of sync:

- Update submodules to the latest commit:

  ```bash
  git submodule update --remote
  ```
We assign USB 0 to tactile sensor 1, USB 1 to tactile sensor 2, and USB 2 to the hand. Please make sure the USB port assignments are correct.
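To check this mapping programmatically, a small sketch like the following works (the device paths follow the assignment above):

```python
import os

# Expected mapping from the note above.
expected = {
    "/dev/ttyUSB0": "tactile sensor 1",
    "/dev/ttyUSB1": "tactile sensor 2",
    "/dev/ttyUSB2": "hand",
}

for port, device in expected.items():
    status = "found" if os.path.exists(port) else "MISSING"
    print(f"{port} ({device}): {status}")
```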
Check USB ports:

```bash
ls /dev/ttyUSB*
```

Grant permissions to the USB ports:

```bash
sudo chmod 777 /dev/ttyUSB*  # Replace * with the specific USB port number
```

If you have any questions, please contact Jinzhou Li or Tianhao Wu.
```bibtex
@article{li2025adaptive,
  title={Adaptive Visuo-Tactile Fusion with Predictive Force Attention for Dexterous Manipulation},
  author={Li, Jinzhou and Wu, Tianhao and Zhang, Jiyao and Chen, Zeyuan and Jin, Haotian and Wu, Mingdong and Shen, Yujun and Yang, Yaodong and Dong, Hao},
  journal={arXiv preprint arXiv:2505.13982},
  year={2025}
}
```

We acknowledge the RISE/FoAR authors for their open-source codebase. If you find our work beneficial, please consider citing us.
This repository is released under the MIT License. See the LICENSE file for more details.
