🤖 RoboKit

robokit-banner

Strictly for academic and non-commercial use

📚 Citations

If you use RoboKit or its upstream components in academic work, please cite the toolkit and any relevant upstream works. BibTeX for RoboKit:

@misc{p2024robokit,
  title  = {Robokit: A toolkit for robotic tasks},
  author = {Jishnu Jaykumar P},
  year   = {2024},
  note   = {\url{https://github.com/jishnujayakumar/robokit}},
}

🚀 Projects Using RoboKit

Chronologically listed (latest first):

  • Perception: MRVG
  • Mobile Manipulation: HRT1
  • Interactive Robot Teaching: iTeach
  • Robot Exploration and Navigation: AutoX-SemMap
  • Perception Research: NIDS-Net
  • Grasp Trajectory Optimization: GTO

✨ Features

  • Docker Support

  • Zero-Shot Capabilities

    • 🔍 CLIP-based classification
    • 🎯 Text-to-BBox: GroundingDINO
    • 🧼 BBox-to-Mask: Segment Anything (MobileSAM)
    • 📏 Image-to-Depth: Depth Anything
    • 🔼 Feature Upsampling: FeatUp
    • 🚪 DoorHandle Detection: iTeach–DHYOLO (demo)
    • 📽️ Mask Propagation for Videos: SegmentAnythingV2 (SAMv2)
      • Input: jpg or mp4
      • Supports:
        • Point/BBox prompts across video frames
        • Multi-object point collection
      • Note: As of 11/06/2024, SAMv2 accepts only mp4 or jpg input.
      • For frame-wise prediction on a video, extract its frames as jpg files into a directory.
      • Single-image mask prediction works on a jpg directly; no conversion is needed.
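For the mp4-to-jpg step above, frames can be dumped with ffmpeg (assumed to be installed and on PATH; the helper names below are illustrative, not part of RoboKit). A minimal sketch:

```python
import os
import subprocess

def ffmpeg_frame_cmd(video_path, out_dir, pattern="%05d.jpg"):
    """Build an ffmpeg command that dumps every frame of `video_path`
    into `out_dir` as sequentially numbered jpg files."""
    return ["ffmpeg", "-i", video_path, os.path.join(out_dir, pattern)]

def extract_frames(video_path, out_dir):
    """Run ffmpeg (must be on PATH) to extract all frames as jpg."""
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(ffmpeg_frame_cmd(video_path, out_dir), check=True)
```

For example, `extract_frames("demo.mp4", "frames/")` fills `frames/` with `00001.jpg`, `00002.jpg`, and so on, ready for SAMv2's frame-wise prediction.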

⚙️ Getting Started

🧰 Prerequisites

  • Python 3.7 or higher (tested 3.9.18)
  • torch (tested 2.0)
  • torchvision
  • pytorch-cuda=11.8 (tested)
  • SAMv2 upstream requires Python >= 3.10.0 (here the installation has been tweaked to remove this constraint)
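A quick sanity check against the Python requirement above (a minimal sketch; the version bounds are the ones listed, and the helper name is illustrative):

```python
import sys

def meets_minimum(version=None, minimum=(3, 7)):
    """Return True when the given (or running) interpreter version
    satisfies RoboKit's minimum Python requirement (3.7+; 3.9.18 tested)."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= minimum

print(meets_minimum())  # check the current interpreter
```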

🛠️ Installation

RoboKit relies on upstream git repositories (GroundingDINO, MobileSAM, FeatUp, etc.), so the supported workflow is to clone the repository and install from source.

git clone https://github.com/IRVLUTD/robokit.git
cd robokit
pip install -e .
# or configure extras via robokit.install.cfg and rerun `pip install -e .`
# or selectively: pip install -e '.[gdino,sam2]'

To use the config-based workflow, edit robokit.install.cfg and set the comma-separated extras you want:

[extras]
include = gdino, sam2

Running pip install -e . then installs the listed extras automatically, which is handy for team-wide setups that want to avoid long pip commands.
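Assuming robokit.install.cfg is standard INI syntax (the actual parsing logic inside the build scripts may differ), reading the comma-separated extras list can be sketched with the stdlib configparser:

```python
import configparser

def read_extras(cfg_path="robokit.install.cfg"):
    """Parse the comma-separated `include` list from the [extras] section."""
    parser = configparser.ConfigParser()
    parser.read(cfg_path)
    raw = parser.get("extras", "include", fallback="")
    return [name.strip() for name in raw.split(",") if name.strip()]
```

With the example file above, `read_extras()` would return `["gdino", "sam2"]`.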

Available extras:

| Extra    | Includes                                              |
| -------- | ----------------------------------------------------- |
| gdino    | GroundingDINO + CLIP                                  |
| sam      | MobileSAM                                             |
| sam2     | SAM v2 + Hydra                                        |
| depthany | Depth Anything (Transformers)                         |
| dhyolo   | Ultralytics + DHYOLO toolkit                          |
| featup   | FeatUp (requires CUDA toolkit and CUDA_HOME)          |
| all      | Installs every extra supported on your environment    |

Fetching checkpoints

The default install skips large asset downloads so it works in offline/CI environments. When you need RoboKit to fetch and stage pretrained checkpoints (GDINO, MobileSAM, DHYOLO, SAMv2), rerun the install with:

ROBOTKIT_ENABLE_FILEFETCH=1 pip install -e .

The flag works with either entry point (e.g., ROBOTKIT_ENABLE_FILEFETCH=1 python setup.py install). Without it, installation skips the heavy downloads, and you can trigger them manually later when needed. If a CUDA toolkit is not detected, the FeatUp install is skipped with a warning.

🧩 Known Installation Issues

  • NameError: name '_C' is not defined
  • For SAMv2, ModuleNotFoundError: No module named 'omegaconf.vendor'; fix by reinstalling Hydra:

pip install --upgrade --force-reinstall hydra-core

🧪 Usage

🛣️ Roadmap

Planned improvements:

  • Config-based pretrained checkpoint switching
  • ✨ More features coming soon...

🙏 Acknowledgments

This project is based on the following repositories (license check mandatory):

Special thanks to Dr. Yu Xiang, Sai Haneesh Allu, and Itay Kadosh for their early feedback.

📜 License

This project is licensed under the MIT License. However, before using this tool, please check the upstream works for their specific licenses.
