SKIL: Semantic Keypoint Imitation Learning for Generalizable, Data‑efficient Robot Manipulation

Shengjie Wang¹,²,³, Jiacheng You¹,²,³, Yihang Hu¹, Jiongye Li¹, Yang Gao¹,²,³

¹Tsinghua University, ²Shanghai Qi Zhi Institute, ³Shanghai AI Laboratory

🚀 Key Contributions

We propose the Semantic Keypoint Imitation Learning (SKIL) framework, which automatically obtains semantic keypoints through a vision foundation model and forms a descriptor of those keypoints for downstream policy learning.
- The sparsity of semantic keypoint representations enables data-efficient learning.
- The proposed descriptor of semantic keypoints enhances the policy’s robustness.
- Such semantic representations enable effective learning from cross-embodiment human and robot videos.
Across 6 real-world tasks, SKIL reaches a success rate of 72.8% at test time, a 146% increase over the baselines. It can also perform long-horizon tasks, such as hanging a towel or cloth on a rack, with as few as 30 demonstrations, a setting in which previous methods fail completely.
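
This README does not spell out how the keypoints are located in a new image. One common way to do this with features from a vision foundation model is nearest-neighbor matching by cosine similarity, sketched below with NumPy only. Treat it as an illustration under that assumption rather than SKIL's actual procedure: `match_keypoints`, the array shapes, and the random stand-in features are all hypothetical.

```python
import numpy as np


def match_keypoints(ref_descriptors, feature_map):
    """For each reference descriptor, find the most similar pixel in a feature map.

    ref_descriptors: (K, C) array, one feature vector per reference keypoint.
    feature_map:     (H, W, C) array of per-pixel features for the current image.
    Returns (coords, scores): coords is (K, 2) as (row, col), scores is (K,).
    """
    h, w, c = feature_map.shape
    feats = feature_map.reshape(-1, c)

    # L2-normalise so the dot product below is a cosine similarity.
    feats = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    refs = ref_descriptors / (
        np.linalg.norm(ref_descriptors, axis=1, keepdims=True) + 1e-8
    )

    sim = refs @ feats.T                      # (K, H*W) similarity table
    best = sim.argmax(axis=1)                 # flat index of the best pixel per keypoint
    coords = np.stack(np.unravel_index(best, (h, w)), axis=1)
    scores = sim[np.arange(len(refs)), best]
    return coords, scores


if __name__ == "__main__":
    # Random features stand in for the output of a real feature extractor.
    rng = np.random.default_rng(0)
    fmap = rng.normal(size=(8, 8, 16))
    refs = np.stack([fmap[2, 3], fmap[5, 1]])
    coords, scores = match_keypoints(refs, fmap)
    print(coords, scores)
```

Because only K coordinates (plus whatever descriptor is attached to them) reach the policy, the input is far sparser than a full image, which is the property the first bullet above credits for data efficiency.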

🧩 Install

Clone the repository

# Clone the repo
git clone https://github.com/your-org/SKIL.git
cd SKIL

Create and activate the conda environment

conda env create -f conda_environment.yaml
conda activate skill

Install MuJoCo in ~/.mujoco

mkdir -p ~/.mujoco && cd ~/.mujoco
wget https://github.com/deepmind/mujoco/releases/download/2.1.0/mujoco210-linux-x86_64.tar.gz -O mujoco210.tar.gz --no-check-certificate
tar -xvzf mujoco210.tar.gz

Add the following lines to your shell startup file (usually ~/.bashrc), then run source ~/.bashrc or open a new terminal so that they take effect.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HOME}/.mujoco/mujoco210/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
export MUJOCO_GL=egl
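
A quick way to confirm that the variables above are visible to Python is a small check like the following. It is only a convenience sketch: `check_mujoco_env` is a hypothetical helper, not part of this repository, and it only inspects the two settings that are specific to MuJoCo.

```python
import os


def check_mujoco_env(env=None):
    """Return a list of problems; an empty list means both checks passed."""
    env = os.environ if env is None else env
    problems = []

    # LD_LIBRARY_PATH is a colon-separated list of directories.
    lib_dirs = [
        os.path.normpath(d)
        for d in env.get("LD_LIBRARY_PATH", "").split(":")
        if d
    ]
    home = env.get("HOME", os.path.expanduser("~"))
    mujoco_bin = os.path.normpath(
        os.path.join(home, ".mujoco", "mujoco210", "bin")
    )
    if mujoco_bin not in lib_dirs:
        problems.append(f"LD_LIBRARY_PATH is missing {mujoco_bin}")

    if env.get("MUJOCO_GL") != "egl":
        problems.append("MUJOCO_GL is not set to 'egl'")

    return problems


if __name__ == "__main__":
    for line in check_mujoco_env() or ["MuJoCo environment variables look fine"]:
        print(line)
```

Run it in the same terminal you plan to train from; a non-empty report usually means ~/.bashrc has not been sourced yet.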

Then install mujoco-py from the third_party folder:

cd YOUR_PATH_TO_THIRD_PARTY
cd mujoco-py-2.1.2.14
pip install -e .

Install the third-party Metaworld dependency (run from the repository root)

cd third_party/metaworld
pip install -e .

⚙️ Usage

We illustrate the simulation evaluation pipeline using the Metaworld Hammer task as an example. The full process involves four main steps:

1. Generate Expert Demonstrations

Run the following script to generate expert demonstrations for the Hammer task:

bash scripts/generate_data/generate_metaworld_data.sh

💡 If you're using a different Metaworld environment, modify the task_lst variable inside the script accordingly.

2. One-time Selection of Semantic Keypoints

Navigate to the keypoint generation folder:

cd scripts/generate_kp

There are two ways to annotate keypoints:

Option A: Manual Keypoint Selection

Launch the interactive notebook:

jupyter notebook draw_kp_metaworld_skil.ipynb

Select 10 task-relevant keypoints by clicking on the object in the provided visualization interface.

Option B: Automatic Keypoint Extraction (SAM + KMeans)

If the Segment Anything Model (SAM) is installed, you can automatically generate keypoints via KMeans clustering on extracted object masks:

jupyter notebook draw_kp_metaworld_skil_kmeans.ipynb
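
To make the idea behind Option B concrete, here is a minimal NumPy-only sketch of k-means over the foreground pixels of a binary object mask. It assumes the mask has already been produced (for example by SAM) and is not taken from the notebook, whose exact procedure may differ; `kmeans_keypoints` and its parameters are hypothetical.

```python
import numpy as np


def kmeans_keypoints(mask, k=10, iters=50, seed=0):
    """Cluster the foreground pixels of a binary mask; return k (row, col) keypoints."""
    rng = np.random.default_rng(seed)
    pixels = np.argwhere(mask).astype(np.float64)  # (N, 2) foreground coordinates
    if len(pixels) < k:
        raise ValueError("mask has fewer foreground pixels than requested keypoints")

    # Initialise the centroids from k distinct foreground pixels.
    centroids = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        # Assign every pixel to its nearest centroid.
        dists = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its pixels (keep it if its cluster is empty).
        updated = np.array([
            pixels[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated

    # Snap each centroid to the closest foreground pixel so it lies on the object.
    dists = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
    return pixels[dists.argmin(axis=0)].astype(int)


if __name__ == "__main__":
    demo_mask = np.zeros((64, 64), dtype=bool)
    demo_mask[20:40, 10:50] = True  # a rectangular stand-in for an object mask
    print(kmeans_keypoints(demo_mask, k=10))
```

Snapping centroids back onto the mask matters for concave objects, where a cluster mean can fall outside the object.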

3. Preprocess Data into Zarr Format

From the repository root, convert the raw demonstrations and keypoints into training data:

bash scripts/data2zarr/metaworld/metaworld_skil.sh

⚠️ Make sure to update the task_lst in this script if using tasks other than hammer.
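
The on-disk layout written by this step is not documented here. Codebases derived from Diffusion Policy often concatenate all episodes along the time axis and store an index of episode end positions next to the data; the sketch below shows that bookkeeping with NumPy only. Whether SKIL's Zarr files follow the same scheme is an assumption, and the key names `keypoints` and `action` are hypothetical.

```python
import numpy as np


def flatten_episodes(episodes):
    """Concatenate per-episode arrays and record the index at which each episode ends."""
    keys = list(episodes[0].keys())
    data = {k: np.concatenate([ep[k] for ep in episodes], axis=0) for k in keys}
    lengths = [len(ep[keys[0]]) for ep in episodes]
    episode_ends = np.cumsum(lengths)  # e.g. lengths [3, 5] -> ends [3, 8]
    return data, episode_ends


def get_episode(data, episode_ends, idx):
    """Slice one episode back out of the flattened arrays."""
    start = 0 if idx == 0 else int(episode_ends[idx - 1])
    end = int(episode_ends[idx])
    return {k: v[start:end] for k, v in data.items()}


if __name__ == "__main__":
    # Two toy episodes with 10 keypoints (x, y, z) and 4-dimensional actions.
    eps = [
        {"keypoints": np.zeros((3, 10, 3)), "action": np.zeros((3, 4))},
        {"keypoints": np.ones((5, 10, 3)), "action": np.ones((5, 4))},
    ]
    data, ends = flatten_episodes(eps)
    print(ends, get_episode(data, ends, 1)["action"].shape)
```

Storing one flat array per key keeps chunked reads simple, while the end index is enough to recover episode boundaries when sampling training windows.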

4. Train SKIL Policy

Execute the training script:

bash scripts/train/train_skil.sh

You can modify key training parameters directly in the script:

seed=0 – Random seed
gpu_id=0,6,7 – GPU IDs to use
num_epochs=1000 – Number of training epochs
benchmark="metaworld" – Benchmark name
env="hammer" – Task/environment name
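
How train_skil.sh consumes gpu_id is not shown in this README. A common pattern is to pass a comma-separated list through the CUDA_VISIBLE_DEVICES environment variable, in which case the selected GPUs are renumbered from zero inside the training process. The sketch below illustrates that pattern only; `cuda_env` and `local_device_names` are hypothetical helpers, not functions from this repository.

```python
import os


def cuda_env(gpu_id, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set from e.g. "0,6,7"."""
    ids = [part.strip() for part in gpu_id.split(",") if part.strip()]
    if not ids or not all(part.isdigit() for part in ids):
        raise ValueError(f"expected comma-separated GPU indices, got {gpu_id!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(ids)
    return env


def local_device_names(gpu_id):
    """Names the process would use for the visible GPUs, renumbered from zero."""
    count = len([part for part in gpu_id.split(",") if part.strip()])
    return [f"cuda:{i}" for i in range(count)]


if __name__ == "__main__":
    print(cuda_env("0,6,7", base_env={})["CUDA_VISIBLE_DEVICES"])
    print(local_device_names("0,6,7"))
```

Under this pattern, physical GPUs 0, 6, and 7 would appear to the process as cuda:0, cuda:1, and cuda:2.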

🤖 Real Robot

Our real-world robot experiments are built on top of the hardware setup provided by the DROID project. For data collection and policy evaluation, we closely follow the DROID codebase, adapting and extending its tooling where necessary.

In particular, our policy evaluation is implemented by modifying the policy_wrapper.py script from DROID to wrap around the policy classes defined in the Diffusion Policy framework. This integration enables seamless evaluation of our learned policies on real hardware.
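
The modified wrapper itself is not included in this README. As a rough picture of what such an adapter does, the sketch below wraps a policy that predicts a chunk of actions so that a control loop can request one action per step. All names here (`ChunkedPolicyWrapper`, `forward`, `predict_action`, `n_action_steps`, `DummyPolicy`) are illustrative assumptions and should not be read as the actual DROID or Diffusion Policy interfaces.

```python
from collections import deque


class ChunkedPolicyWrapper:
    """Adapts a chunk-predicting policy to a one-action-per-step control loop."""

    def __init__(self, policy, n_action_steps):
        self.policy = policy
        self.n_action_steps = n_action_steps
        self._queue = deque()

    def reset(self):
        """Drop any queued actions, e.g. at the start of a new rollout."""
        self._queue.clear()

    def forward(self, observation):
        # Query the policy only when the previously predicted chunk is used up.
        if not self._queue:
            chunk = self.policy.predict_action(observation)["action"]
            self._queue.extend(chunk[: self.n_action_steps])
        return self._queue.popleft()


class DummyPolicy:
    """Stand-in for a trained policy; returns a chunk of four scalar actions."""

    def __init__(self):
        self.calls = 0

    def predict_action(self, observation):
        self.calls += 1
        base = 10 * self.calls
        return {"action": [base + i for i in range(4)]}


if __name__ == "__main__":
    wrapper = ChunkedPolicyWrapper(DummyPolicy(), n_action_steps=2)
    print([wrapper.forward({}) for _ in range(5)])
```

Executing only the first few actions of each chunk before re-planning trades some inference cost for faster reaction to new observations.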

If you encounter any issues or have questions about replicating our evaluation setup, feel free to contact us.

🧾 Acknowledgements

Our codebase builds upon several influential works in the imitation learning and robotic manipulation community. In particular, we reference and adapt components from:

We sincerely thank the authors of these projects for their contributions and open-sourcing their code. Their work has been instrumental in the development of this project.

Contact Shengjie Wang if you have any questions or suggestions.

📚 Citation

@article{wang2025skil,
  title = {SKIL: Semantic Keypoint Imitation Learning for Generalizable Data‑efficient Manipulation},
  author = {Wang, Shengjie and You, Jiacheng and Hu, Yihang and Li, Jiongye and Gao, Yang},
  journal = {arXiv preprint arXiv:2501.14400},
  year = {2025}
}

