DevPost Submission Title: RoboChaser
Who: Olivia (odheng), Narek (nharutyu), Julian (jdhanda)
Github: https://github.com/oliviaaheng/robochaser

Introduction

We detect a unique object using computer vision: a convolutional neural network, trained to take in real-time images from an Intel RealSense camera, outputs commands that let a ground iRobot Create 3 follow that object. The project builds on the YOLO model (discussed in Related Work) and is a form of structured prediction in computer vision, since we predict structured outputs and class labels for objects in real-time images.

Related Work

The paper "YOLOv4: Optimal Speed and Accuracy of Object Detection" is relevant to our topic beyond the core idea we are researching. It addresses the need for real-time object detection systems that are both fast and accurate, and it emphasizes building a neural network that runs efficiently on conventional GPUs. The authors focus on optimizing parallel computation and operational speed rather than solely on theoretical indicators such as computation volume. Their contributions include an efficient object detection model that can be trained on a standard GPU, verification of state-of-the-art techniques, and modifications that make those techniques efficient for single-GPU training. For the YOLOv4 model they propose architecture selections and modifications, such as a CSPDarknet53 backbone, an SPP block, a PANet path-aggregation neck, and a YOLOv3 head, aiming for both high speed and high accuracy in object detection tasks.

Relevant links:
https://pytorch.org/hub/ultralytics_yolov5/
https://arxiv.org/abs/2004.10934

Data

We are going to create our own dataset: we will collect many images with the RealSense camera we are using and label them manually. We can also apply augmentation software to the images we take to generate more data to feed into the model. If we follow a common object, such as a toy car, we will additionally use publicly available images from Google and label them manually. We will start by training the model with a couple of thousand collected images; if performance is not good enough, we will grow the dataset.

Methodology

We are going to use a CNN architecture with a feed-forward head. We will code and train the model much as we did in the earlier CNN homework, using the TensorFlow library and the Keras API. We will use existing papers as a guide but will implement our own model. The design feeds an image into the model, and the model outputs the coordinates of the item we are trying to follow; that output is then sent to the iRobot so it can follow the object. As a backup plan, the model would output only the coordinates or location of the item of interest in the input image.

Metrics

We plan to test that the iRobot follows the target, even when the target leaves the field of view. For the model itself, accuracy applies: we can measure where the model thinks the object is versus where it actually is. The full pipeline will be tested experimentally, by checking whether the iRobot moves accordingly. Since we are building a new model, we will assess its performance using loss and accuracy metrics. Our base, target, and stretch goals are:

Base: The robot can follow the unique object (turn to face the correct direction)
Target: Can dynamically track and follow the object
Stretch: Can do so at fast speeds, and find the object even if it leaves the FOV

Ethics

Deep learning is a good approach to this problem: it is a modified image classification problem in which the model detects a pattern in a dynamic image, a classic deep learning task. We will most likely need to build and collect our own dataset, and ideally set up a system or make use of existing tools to help label this data. For our application, the stakes are limited, since the potential consequences of the iRobot getting "lost" are minor. However, if this approach were applied to something like a search-and-rescue mission, where a robot is trained to follow a person or a dog and carry cargo to a location, the downsides could matter a great deal. We will measure error as the iRobot failing to accurately follow the direction of the unique object, and success as the iRobot following that direction accurately.

Division of labor

These are general divisions, but we will all work together on all parts:
Julian - Collecting and labeling the dataset in an efficient manner
Olivia - Building the convolutional neural network
Narek - Integrating the model with the iRobot
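To make the methodology concrete, here is a minimal sketch of the kind of Keras model described above: a small CNN with a feed-forward head that regresses normalized (x, y) coordinates of the target in the frame. The layer sizes, input resolution, and function name `build_tracker_cnn` are illustrative assumptions, not the model we ultimately trained.

```python
import tensorflow as tf

def build_tracker_cnn(input_shape=(64, 64, 3)):
    """Small CNN that regresses the normalized (x, y) position of the target.

    A sketch only: the exact depth, filter counts, and input size are
    placeholder choices, not the final architecture.
    """
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),   # feed-forward head
        tf.keras.layers.Dense(2, activation="sigmoid"), # (x, y) in [0, 1]
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```

Sigmoid outputs keep the predicted coordinates in [0, 1], so they can be scaled by the frame width and height before being turned into robot commands.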

Inspiration

Inspired by the CNN models we learned about in class.

What it does

This is a CNN model that takes an image as input and outputs where in the image the object of interest is. That output is converted into motor speeds and sent to the iRobot, which then follows our object of interest.
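The output-to-motor-speed step can be sketched as a simple proportional controller: steer toward the detected object based on its horizontal offset from the image center. The function name, units, and gain values here are hypothetical, not the tuned values used on the robot.

```python
def offset_to_wheel_speeds(x_center, image_width, base_speed=0.1, gain=0.2):
    """Proportional steering toward a detected object (illustrative only).

    x_center: predicted x pixel of the object; image_width: frame width.
    Returns (left, right) wheel speeds; units and gains are placeholders.
    """
    # Normalized horizontal error in [-1, 1]; negative = object left of center.
    error = (x_center - image_width / 2) / (image_width / 2)
    turn = gain * error
    # Object to the right (error > 0): speed up the left wheel to turn right.
    return base_speed + turn, base_speed - turn
```

With the object dead center the wheels run at the same speed, and the further the object drifts from center the harder the robot turns toward it.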

How we built it

We built our own model and our own dataset with manual labeling, then trained the CNN we came up with. Hardware-wise, we mounted a RealSense camera on the iRobot. The camera sends each picture to a remote computer, which computes the output speed and returns it to the robot.
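Since every frame had to be labeled by hand, simple augmentation was one way to stretch the dataset. The sketch below shows the idea with plain NumPy; the specific transforms and the `augment` helper are illustrative assumptions, not our exact pipeline.

```python
import numpy as np

def augment(image):
    """Produce simple variants of one labeled frame (illustrative sketch).

    image: H x W x 3 uint8 array. Returns a list of augmented copies.
    """
    flipped = image[:, ::-1]  # horizontal mirror (the x label must be mirrored too)
    brighter = np.clip(image.astype(np.int16) + 30, 0, 255).astype(np.uint8)
    darker = np.clip(image.astype(np.int16) - 30, 0, 255).astype(np.uint8)
    return [flipped, brighter, darker]
```

Note that a horizontal flip also mirrors the object's x coordinate, so the label has to be transformed alongside the image.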

Challenges we ran into

Figuring out the hardware setup, and creating and labeling our dataset by hand.

What's next for RoboChaser

Meet with our mentor TA and discuss the options for making our own dataset.

Mentor Meeting April 25, 2024 Reflection

https://docs.google.com/document/d/1hJflfGF4xmyY8LApitxyBlBsHhyKO08nOp8F_Z-7X4I/edit?usp=sharing

DL Day Poster

https://docs.google.com/presentation/d/1zPZQoIDrmw_cp1aWHt8c5BvlSStsQ5TMnywKcHuuZFs/edit?usp=drive_link

DL Day Slides

https://docs.google.com/presentation/d/1Y2LPIE8Tspd7tPz9S2XOh_N6YN2e8xzUt0bKpQ_cI5U/edit?usp=drive_link

Final Report (pdf attached in Additional Information section)

https://docs.google.com/document/d/11RW2Win-hQCTxDUKKneGC04EBVJUOyAgb3Pb43oEhm8/edit?usp=sharing

Built With

  • irobot
  • macbook
  • python
  • tensorflow
  • yolo