What it is

Dynamic ROI Object Recognition with Hand Pointing & YOLOv8.

What it does

This project demonstrates an interactive system that uses hand tracking to detect where a user points and then applies object detection on a dynamically defined Region of Interest (ROI). The detected object's label is then announced via text-to-speech.

How it works

  1. Hand Tracking:

The webcam feed is processed using MediaPipe to extract hand landmarks. The positions of the wrist and index fingertip are used to compute a pointing vector.
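As a rough illustration of this step, the sketch below turns two hand landmarks into a pointing vector. It assumes landmarks arrive as (x, y) pairs normalized to [0, 1], as MediaPipe Hands reports them, with the wrist at index 0 and the index fingertip at index 8. The function name `pointing_vector` and its return format are hypothetical and not taken from the project code.

```python
import math

WRIST = 0             # MediaPipe Hands landmark index for the wrist
INDEX_FINGER_TIP = 8  # MediaPipe Hands landmark index for the index fingertip


def pointing_vector(landmarks, frame_w, frame_h):
    """Return (fingertip_xy, unit_direction) in pixel coordinates.

    `landmarks` is a sequence of (x, y) pairs normalized to [0, 1].
    This helper is a hypothetical sketch, not the project's own function.
    """
    wx, wy = landmarks[WRIST]
    tx, ty = landmarks[INDEX_FINGER_TIP]

    # Convert normalized coordinates to pixels.
    wrist = (wx * frame_w, wy * frame_h)
    tip = (tx * frame_w, ty * frame_h)

    # Direction from wrist to fingertip, normalized to unit length.
    dx, dy = tip[0] - wrist[0], tip[1] - wrist[1]
    length = math.hypot(dx, dy)
    if length == 0:
        # Degenerate case: wrist and fingertip coincide.
        return tip, (0.0, 0.0)
    return tip, (dx / length, dy / length)
```

Working in pixel coordinates rather than normalized ones keeps the direction correct on non-square frames, where one normalized unit covers a different number of pixels horizontally and vertically.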

  2. Dynamic ROI Calculation:

The pointing vector is normalized and scaled (using a tunable scale parameter) to determine the center of the ROI. A fixed-size ROI (defined by roi_size) is centered at this computed point. The code ensures that the ROI remains within the frame boundaries.
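One way this calculation could look is sketched below. It assumes the ROI center is found by starting at the fingertip and moving `scale` pixels along the unit pointing direction, and that the ROI is shifted (rather than cropped) to stay inside the frame. Both choices, the default values, and the function name `compute_roi` are assumptions made for illustration. Only the `scale` and `roi_size` parameters come from the description above.

```python
def compute_roi(tip, direction, frame_w, frame_h, scale=150, roi_size=200):
    """Return (x1, y1, x2, y2) of a square ROI kept inside the frame.

    Assumes roi_size is no larger than the frame in either dimension.
    Default values are illustrative, not the project's settings.
    """
    # Project the ROI center along the pointing direction.
    cx = tip[0] + direction[0] * scale
    cy = tip[1] + direction[1] * scale

    # Top-left corner of a square ROI centered on (cx, cy).
    half = roi_size // 2
    x1 = int(round(cx)) - half
    y1 = int(round(cy)) - half

    # Shift the ROI back inside the frame while keeping its size fixed.
    x1 = max(0, min(x1, frame_w - roi_size))
    y1 = max(0, min(y1, frame_h - roi_size))
    return x1, y1, x1 + roi_size, y1 + roi_size
```

Shifting keeps the crop passed to the detector at a constant size. Cropping the ROI at the frame edge would also work but yields variable-sized inputs.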

  3. Object Detection with YOLOv8:

Every few frames (controlled by a frame counter), the ROI is passed to the YOLOv8 model. The model returns detections (bounding boxes and class labels) within the ROI. The highest-confidence detection is selected, and its bounding box is converted from ROI coordinates to frame coordinates for visualization.
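The selection and coordinate shift can be sketched without the model itself. The example below assumes detections have already been unpacked into plain `(x1, y1, x2, y2, confidence, label)` tuples in ROI coordinates. With the Ultralytics package these values would typically be read from the `boxes` attribute of a result (`xyxy`, `conf`, `cls`) together with `model.names`. The helper names and the `every_n` default are hypothetical.

```python
def should_detect(frame_count, every_n=5):
    """Run detection only on every `every_n`-th frame to reduce load."""
    return frame_count % every_n == 0


def best_detection_in_frame(detections, roi_origin):
    """Pick the highest-confidence detection and shift it to frame coordinates.

    `detections` holds (x1, y1, x2, y2, confidence, label) tuples relative
    to the ROI. `roi_origin` is the ROI's top-left corner in the frame.
    Returns None when nothing was detected.
    """
    if not detections:
        return None

    # Highest confidence wins.
    x1, y1, x2, y2, conf, label = max(detections, key=lambda d: d[4])

    # The model only saw the ROI crop, so add the ROI offset back.
    ox, oy = roi_origin
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy, conf, label)
```

For example, a box at (5, 5, 100, 120) inside an ROI whose top-left corner is (440, 140) maps to (445, 145, 540, 260) in the full frame.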

  4. Text-to-Speech:

If the detected object differs from the last one announced, the system uses pyttsx3 to speak its label. This avoids repeating the same announcement on every detection.
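A minimal sketch of this "announce only on change" logic follows. The `Announcer` class is hypothetical. It takes the speech function as a parameter so the logic can be exercised without audio hardware, and the comments show how it might be wired to pyttsx3 through `pyttsx3.init()`, `engine.say()`, and `engine.runAndWait()`.

```python
class Announcer:
    """Speak a label only when it differs from the last one spoken."""

    def __init__(self, speak):
        self._speak = speak      # callable that takes the text to say
        self._last_label = None  # most recently announced label

    def announce(self, label):
        """Return True if the label was spoken, False if it was skipped."""
        if label is None or label == self._last_label:
            return False
        self._speak(label)
        self._last_label = label
        return True


# Possible wiring with pyttsx3 (not executed here):
#
#   import pyttsx3
#   engine = pyttsx3.init()
#
#   def speak(text):
#       engine.say(text)
#       engine.runAndWait()  # blocks until speech finishes
#
#   announcer = Announcer(speak)
```

Because `runAndWait()` blocks until the utterance ends, calling it from the capture loop may stall the video feed. Running speech on a separate thread is one possible way around that, though whether the project does so is not stated.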

  5. Visualization:

The application displays hand landmarks, the pointing vector, the ROI rectangle, detection bounding boxes, and labels on the output frame. An arrow is drawn to show the pointing direction.

How we built it

The system leverages:

  • MediaPipe Hands for real-time hand landmark detection.
  • YOLOv8 for object detection.
  • pyttsx3 for text-to-speech functionality.
  • OpenCV for image processing and visualization.

Dependencies:

You can install the required packages using pip:

pip install opencv-python mediapipe numpy ultralytics pyttsx3

Challenges we ran into

  • Linking the back-end to a separate device
  • Recognition accuracy
  • System overload causing the camera feed to lag
