This repository contains a Deep Learning project created by @MatanTopel and myself.
We separated this project into two parts:
- A CNN which locates a hand of a diver - using YOLOv5s object detection architecture (for more information visit https://github.com/ultralytics/yolov5).
- A CNN which classifies the gesture of the hand from the cropped image - using our own architecture.
After training, the full end-to-end network achieved 97.85% accuracy on the test set.
Here is a link to a video about our project (It's in low res, we are working on that): https://youtu.be/XiyP-1jyPso
- CADDY is a project focused on developing a robot that communicates with a diver and performs tasks.
- CADDIAN is the sign language the diver uses to communicate with the robot.
- One of the challenges of CADDY is interpreting the hand gestures of the diver from a large, unclear image.
- CNNs are ideal for localization and for classifying images – translating CADDIAN into English!
Project site can be found here:
http://www.caddy-fp7.eu/

Our goal: creating a high-accuracy CNN classifier of a diver's CADDIAN gestures, using stereo images taken underwater in 8 different water conditions.
In this notebook, we will explain how we implemented the network. This is what we are trying to achieve:
- Localization - YOLOv5s
- Classification - our own CNN
There are no published articles on YOLOv5 yet, so we show the architecture of YOLOv4 [1], which has many similarities to YOLOv5.
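The two stages are glued together by cropping each detected hand out of the full frame and feeding the crop to the classifier. Below is a minimal sketch of that cropping step, assuming the detector returns boxes in YOLO format (normalized center x, center y, width, height); the function name and padding parameter are illustrative, not part of the original code.

```python
import numpy as np

def crop_detection(image, box, pad=0.1):
    """Crop a detected hand from an image.

    image: H x W x C array.
    box:   YOLO-format detection (x_center, y_center, w, h),
           all normalized to [0, 1] relative to the image size.
    pad:   fractional padding added around the box before cropping
           (an illustrative choice, not from the original project).
    """
    h, w = image.shape[:2]
    xc, yc, bw, bh = box
    bw, bh = bw * (1 + pad), bh * (1 + pad)   # enlarge the box slightly
    x1 = max(int((xc - bw / 2) * w), 0)
    y1 = max(int((yc - bh / 2) * h), 0)
    x2 = min(int((xc + bw / 2) * w), w)
    y2 = min(int((yc + bh / 2) * h), h)
    return image[y1:y2, x1:x2]

# Example: a 100x200 image with a detection centered in the middle.
img = np.zeros((100, 200, 3), dtype=np.uint8)
crop = crop_detection(img, (0.5, 0.5, 0.2, 0.4), pad=0.0)
print(crop.shape)  # (40, 40, 3)
```

The resized and normalized crop is then the input to the classification CNN.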
Results on the test set after training:
Our CNN architecture is conventional. The network consists of three connected Conv blocks, which increase the number of channels while decreasing the spatial size of each channel, followed by three fully connected layers and a Softmax for classification. We use ReLU activations and the Adam optimizer.
Results on the test set after training (images already resized and normalized):
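The description above (three Conv blocks, three fully connected layers, ReLU, Adam) can be sketched in PyTorch as follows. The channel counts, kernel sizes, input resolution, and number of gesture classes here are illustrative assumptions, not the exact trained configuration stored in `caddy_cnn_ckpt1.pth`.

```python
import torch
import torch.nn as nn

class GestureCNN(nn.Module):
    """Three Conv blocks followed by three fully connected layers.

    Layer sizes and the 3x64x64 input resolution are illustrative
    guesses, not the project's exact trained configuration.
    """
    def __init__(self, num_classes=16):
        super().__init__()

        def block(cin, cout):
            # Each block increases the channels and halves the spatial size.
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),
            )

        self.features = nn.Sequential(block(3, 32), block(32, 64), block(64, 128))
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, num_classes),  # Softmax is applied by the loss
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = GestureCNN(num_classes=16)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
logits = model(torch.randn(1, 3, 64, 64))
print(logits.shape)  # torch.Size([1, 16])
```

Note that `nn.CrossEntropyLoss` applies the Softmax internally during training, so the network itself only outputs raw logits.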
| File name | Purpose |
|---|---|
| (1) CADDYProjectDL.ipynb | Main program for training and merging both networks |
| (2) TestCADDYClassifier.ipynb | Main program for testing the trained end-to-end network |
| (3) caddy_loc.yaml | File containing the train/valid/test directories for YOLOv5 and the number of classes (1 in our case) |
| (4) best.pt | Weights of the trained YOLOv5s on our dataset |
| (5) caddy_cnn_ckpt1.pth | Weights of our trained CNN on our dataset |
| (6) testset_results | Folder containing the results of our trained networks on the test set |
| (7) Underwater Hand Gesture Localization and Classification Using YOLOv5 and Other CNNs.pdf | Report of our project |
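For reference, a YOLOv5 dataset YAML like `caddy_loc.yaml` generally follows the structure below, with the train/valid/test directories and the number of classes (1 in our case, a single "hand" class). The exact paths here are illustrative placeholders, not the ones used in this repo.

```yaml
# Illustrative sketch of a YOLOv5 dataset config (paths are placeholders)
train: CADDY_stereo_gesture_data/train/images
val: CADDY_stereo_gesture_data/valid/images
test: CADDY_stereo_gesture_data/test/images

nc: 1            # number of classes
names: ['hand']  # class names
```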
To use this project, you will need to download the required files and upload them to a designated folder in your Google Drive named 'CADDY_stereo_gesture_data'.
If you want to use the CADDY dataset, upload the full CADDY dataset (Complete dataset – 2.5GB, zipped) to the designated folder from here: http://www.caddian.eu//CADDY-Underwater-Gestures-Dataset.html
To use our trained end-to-end network, you will also need to download files (3)-(5) from this repo and upload them to the designated folder.
Then you can use our example of running our fully trained network in Google Colab here:
To train our end-to-end network, you will also need to download file (3) from this repo and upload it to the designated folder.
After that, click here to use our full project in Google Colab:
[1] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. "YOLOv4: Optimal speed and accuracy of object detection". arXiv preprint arXiv:2004.10934, 2020.