Inspiration
People with disabilities affecting both arms and legs, or who are completely paralyzed, such as Stephen Hawking or Nick Vujicic, can do little more than survive day after day confined to a bed. This project's goal is to help these people not merely survive, but live fully through their most precious remaining sense - vision.
Eye VR is a VR headset powered by machine learning that helps disabled users actively control their own activities: driving their own "Eye Wheelchair" without assistance, unlocking their phones or house doors with "Eye ID", taking multiple-choice examinations without a pencil via "Eye Exam", and many other API-enabled applications. Three different applications spanning unrelated topics were built simultaneously to demonstrate the system's universality and scalability.
What it does
0 - (Universal) Eye VR - Window to Everything [Raspberry Pi Camera - VR headset]
The primary component is a VR headset with a Raspberry Pi-controlled camera. When worn, it continuously records video of the wearer's eye movements and feeds this visual data into a deep neural eye-movement classifier, which outputs the corresponding gesture as the user looks up / down / left / right / center or blinks.
The system follows a centralized control pattern that lets any client application consume the collected eye gestures for its own purposes.
For the hackathon, three applications were developed to fit the themes:
1 - (Space / Environment) Eye Wheelchair - Space Explorer for all [Raspberry Pi car]:
The eye-controlled wheelchair lets disabled users move around, explore their surrounding space, and enjoy life. It is represented here by a remote Raspberry Pi car that is controlled over Wi-Fi by the VR headset. The same setup could also represent an eye-controlled robot car exploring Mars or other uninhabitable environments. Its main functions are:
@ Blinking: toggle run (forward) / stop the car
@ Look Left: turn left
@ Look Right: turn right
2 - (Security) Eye ID - Breakthrough in Security [Web application]:
While a traditional keypad lock, the Android drawing-pattern lock, and Touch ID or Face ID on iPhones may not work for paralyzed or limbless people, or for those whose hands are busy, Eye ID lets these users unlock their phones, doors, or garage with just a glance.
Usage: each gesture (left / down / up / right) is one "digit" of a gesture-sequence password (length 4 or 6) that users perform one gesture at a time in front of the camera to unlock their phones, doors, computers, wardrobes...
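The check described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, the gesture vocabulary constant, and the stored-password representation are assumptions, not the project's actual code.

```python
# Hypothetical sketch of the Eye ID check: a password is a fixed-length
# sequence of gestures that must match the stored sequence exactly.
VALID_GESTURES = {"left", "right", "up", "down"}  # assumed gesture vocabulary

def check_eye_password(entered, stored):
    """Compare a recorded gesture sequence against the stored password."""
    if len(entered) != len(stored):
        return False
    if any(g not in VALID_GESTURES for g in entered):
        return False  # reject sequences containing unknown gestures
    return entered == stored

stored = ["right", "up", "right", "left"]
print(check_eye_password(["right", "up", "right", "left"], stored))  # True
print(check_eye_password(["left", "up", "right", "left"], stored))   # False
```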
3 - (Education) Eye Exam - Exams for Everybody [Web application]:
Students with these disabilities can now take multiple-choice examinations like any other student with Eye Exam. The tool lets pupils choose option A / B / C by performing the corresponding eye gesture (left / right / up) and blink to confirm an answer, switch questions, or submit their attempt.
Usage: the demo app contains a few very easy questions for testing the tool; answer each question in turn by looking left / right / up, then blink to confirm.
How I built it
Eye VR:
A deep neural network with several convolutional layers followed by bidirectional long short-term memory (LSTM) cells is built and trained on a dataset of one-second image sequences of the eye. It learns to classify input videos into one of the predefined categories (left / right / up / down / center / blinking). The dataset is collected by repeatedly recording videos of the user's eye gestures (for now, only mine), a process known as calibration. The model is built with the TensorFlow and Keras libraries.
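The architecture above (per-frame convolutions feeding a bidirectional LSTM) might look roughly like the following Keras sketch. Every size here - frame count, resolution, filter counts, LSTM width - is an assumption, since the write-up does not specify them.

```python
# A minimal sketch of the described architecture: a small CNN applied to
# each frame of a 1-second clip, with a bidirectional LSTM over time.
# All dimensions are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

FRAMES, HEIGHT, WIDTH = 10, 32, 32   # assumed: 10 grayscale frames per clip
NUM_CLASSES = 6                      # left/right/up/down/center/blinking

model = models.Sequential([
    layers.Input(shape=(FRAMES, HEIGHT, WIDTH, 1)),
    # Apply the same convolutional feature extractor to every frame
    layers.TimeDistributed(layers.Conv2D(16, 3, activation="relu")),
    layers.TimeDistributed(layers.MaxPooling2D()),
    layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu")),
    layers.TimeDistributed(layers.MaxPooling2D()),
    layers.TimeDistributed(layers.Flatten()),
    # Aggregate per-frame features across time in both directions
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Training would then call `model.fit` on the calibration clips and their one-hot gesture labels.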
Once the machine learning model is well trained, it is deployed on the Pi and a Flask server is started to accept HTTP requests for eye gestures from any client application. A centralized control pattern is used here: the camera knows nothing about how its output is consumed, so any number of applications can be built on top of these gestures. This greatly improves the scalability of the system.
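A gesture endpoint of this kind could be as small as the sketch below. The route name and the `classify_latest_clip` helper are assumptions for illustration; the real server runs the trained model on the most recent camera clip.

```python
# Minimal sketch of the Flask gesture server on the Pi.
from flask import Flask, jsonify

app = Flask(__name__)

def classify_latest_clip():
    # Placeholder: in the real system this runs the trained classifier
    # on the most recent 1-second clip from the Pi camera.
    return "center"

@app.route("/gesture")
def gesture():
    # Clients poll this endpoint and receive the latest recognized gesture
    return jsonify({"gesture": classify_latest_clip()})

# To serve on the local network (not run here):
# app.run(host="0.0.0.0", port=5000)
```

Any client - the Pi car, Eye ID, Eye Exam - only needs to issue `GET /gesture` and read the JSON field.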
Eye Wheelchair
Kept as simple as possible, the Pi car contains only a controller that repeatedly HTTP-requests Eye VR for the latest eye gesture and maps the returned direction to a movement command for the wheels.
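The gesture-to-command mapping and polling loop might look like this sketch. The Eye VR address, the toggle behaviour on blink, and the `drive()` helper are assumptions inferred from the function list above.

```python
# Illustrative Pi car controller: poll Eye VR, map gestures to wheel commands.
import json
import time
import urllib.request

EYE_VR_URL = "http://192.168.43.10:5000/gesture"  # hypothetical headset address

def next_command(gesture, running):
    """Map a gesture to a wheel command and the new running state."""
    if gesture == "blinking":                 # blink toggles forward / stop
        running = not running
        return ("forward" if running else "stop"), running
    if gesture == "left":
        return "turn_left", running
    if gesture == "right":
        return "turn_right", running
    # center / up / down: keep doing whatever we were doing
    return ("forward" if running else "stop"), running

# Polling loop (not run here; drive() would set the motor GPIO pins):
# running = False
# while True:
#     with urllib.request.urlopen(EYE_VR_URL) as resp:
#         gesture = json.load(resp)["gesture"]
#     command, running = next_command(gesture, running)
#     drive(command)
#     time.sleep(0.2)
```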
Eye ID
Developed with Angular 4. Once activated, Eye ID repeatedly requests eye gestures and appends valid ones to a prospective password. When all four "digits" have been recorded, Eye ID validates the sequence and shows whether the password is correct. An example valid password: right - up - right - left.
Eye Exam
Developed alongside Eye ID with Angular 4. Once the student starts the exam, a clock counts down and the web page continuously requests eye gestures. It selects the corresponding option, re-selects if the gesture changes, and confirms and moves to the next question when the user blinks.
Once all questions are answered, the overall score is calculated.
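The select/re-select/confirm loop and the final scoring can be sketched as below. The option mapping follows the description above; the gesture stream, question count, and function names are made up for illustration.

```python
# Hypothetical sketch of the Eye Exam answer loop and scoring.
GESTURE_TO_OPTION = {"left": "A", "right": "B", "up": "C"}

def grade_exam(answer_key, gesture_stream):
    """Walk a stream of gestures, confirming the current option on a blink."""
    answers, current = [], None
    for g in gesture_stream:
        if g in GESTURE_TO_OPTION:
            current = GESTURE_TO_OPTION[g]   # select / re-select an option
        elif g == "blinking" and current is not None:
            answers.append(current)          # confirm, move to next question
            current = None
        if len(answers) == len(answer_key):
            break                            # all questions answered
    return sum(a == k for a, k in zip(answers, answer_key))

# Student looks left (A), blinks; looks right then up (C), blinks;
# looks left (A), blinks - scoring against key A, C, B:
stream = ["left", "blinking", "right", "up", "blinking", "left", "blinking"]
print(grade_exam(["A", "C", "B"], stream))  # 2 correct
```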
Challenges I ran into
Deep learning model and the lack of data
A successful deep neural network typically requires gigabytes of data to be trained well. The dataset I had was collected on the spot (< 100 MB) and from a single eye (mine). Gathering enough data and diversity would require calibrating on many people, which is time-consuming and cannot be done in 24 hours, so the real-time accuracy is not close to 100%.
Limited computational power of Raspberry Pi
While Google Translate's neural engine is trained and deployed on distributed servers running at teraflops, the Raspberry Pi is severely underpowered: inference adds an extra 0.3 s of latency per request, for a total delay of about 1.3 s. The Pi also draws a lot of current while running the model, and since I do not have a good power bank at the moment, I have to power it from a laptop instead.
Deploying the ML model on a powerful external server and having the camera upload video and request gesture results is not a good option either, as transmission over the internet adds substantial latency.
Connectivity
For now, all devices are connected and communicate through a hotspot with no access to the outside internet (I use my spare phone as the hotspot). This can be inconvenient.
Accomplishments that I'm proud of
Successfully built and trained a new neural network. This is my first hands-on neural image-processing problem, as I had only worked on language and text processing in the past.
Successfully implemented three separate ideas that share the same mechanism: the VR headset system.
What I learned
I learned a lot more about machine learning and about working under time constraints, and picked up more Angular 4 along the way.
What's next for Eye Gesture Control Application
Universality
As stated, many more applications can be developed regardless of topic or theme. The target audience is not limited to paralyzed people; it extends to anyone who wants to perform additional tasks while their hands are busy, such as mountain climbers or high-rise construction workers. This offers great scalability.
2 cameras combined system
Two small cameras can be placed to monitor both eyes simultaneously. This vastly increases the number of complex gestures that can be recognized (such as looking left while blinking the right eye, or one eye closed and one open), improving the complexity of Eye ID passcodes, adding more controls for the Eye Wheelchair, and more.
Huge datasets
Investing in enlarging the eye dataset is strongly encouraged; deep learning models perform much better on large, diverse datasets.
Screen on Eye VR
A small screen can be placed inside the headset so the user can see additional controls. They could then answer phone calls, text, surf the web, and much more using only eye gestures. This elevates the user experience.
Built With
- angular.js
- flask
- javascript
- neural-network
- python
- raspberry-pi
- tensorflow