Inspiration
Do you remember your first time driving? Sitting in the car, nervous to press the gas, but with a parent beside you urging you forward? For many, that feeling of safety is pure nostalgia, and the voice of a parent guiding them on the road brings a sense of comfort and security. In keeping with the theme of UofTHacks 11, we wanted to embody this feeling of nostalgia in our very own driver support interface, using the support of your family to aid you on your adventure.
Technology requirements
Currently, the system runs as a simulation of a car across three laptops, using cameras, microphones, and a monitor that displays a driver dashboard. We also implement an SMS component, which requires an active phone number.
What it does
Three parties interact with the program: the driver, who receives moral support through the program's chat interface; the parent, who submits voice samples for the simulation and receives real-time alerts if the car falls into distress; and a security provider, who receives logs of the situation and stands ready to call emergency services in case of an accident.
We simulate two main scenarios: the driving scenario and the crash scenario.
The program simulates the flow of going on a drive, beginning inside the car. First, we run several ML algorithms to check whether the driver is fit to drive. We use eye-tracking technology through OpenCV to detect if the driver is drowsy, and active voice detection and processing to listen for concerning statements about alcohol or drugs. The driver's speech is continuously sent to Cohere's Classify endpoint to check for concerning statements, and if any are detected, we begin Plan 101: "Lecture the Child".
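The drowsiness check can be sketched with the common eye aspect ratio (EAR) approach over facial landmarks. This is a minimal illustration, not our exact implementation: it assumes six landmark points per eye (as produced by, e.g., dlib's 68-point model feeding OpenCV frames), and the threshold and frame-count values below are illustrative defaults.

```python
import math

def eye_aspect_ratio(eye):
    """eye: six (x, y) landmarks per eye, ordered as in dlib's 68-point model.
    EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|) -- drops sharply when the eye closes."""
    a = math.dist(eye[1], eye[5])  # vertical distance 1
    b = math.dist(eye[2], eye[4])  # vertical distance 2
    c = math.dist(eye[0], eye[3])  # horizontal distance
    return (a + b) / (2.0 * c)

class DrowsinessMonitor:
    """Flags drowsiness when EAR stays below a threshold for N consecutive frames,
    so a normal blink does not trigger an alert."""
    def __init__(self, threshold=0.25, consec_frames=20):
        self.threshold = threshold
        self.consec_frames = consec_frames
        self.counter = 0

    def update(self, ear):
        if ear < self.threshold:
            self.counter += 1
        else:
            self.counter = 0
        return self.counter >= self.consec_frames
```

In the real pipeline, each camera frame would yield fresh landmarks, and `update()` would run once per frame; a `True` return is what kicks off Plan 101.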
We use voice-deepfaking technology to clone a parent's voice from a sample, and prompt engineering to create a Cohere Chat tool that emulates the user's parent. Cohere's reply is then synthesized in the parent's voice and saved as a .wav file to be played.
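The persona side of this comes down to prompt engineering around the chat call. Here is a rough sketch of how such a preamble might be assembled; the field names, wording, and the `quirks` parameter are illustrative assumptions, not our exact prompt, and the commented-out call assumes the `cohere` Python SDK.

```python
def build_parent_preamble(parent_name, child_name, quirks):
    """Assemble a system preamble that makes the chat model speak as the parent.
    quirks: short phrases the real parent actually uses (illustrative field)."""
    quirk_lines = "\n".join(f'- Often says: "{q}"' for q in quirks)
    return (
        f"You are {parent_name}, the parent of {child_name}, who is currently driving.\n"
        f"Speak warmly and informally, in first person, as {parent_name} would.\n"
        f"Keep replies short enough to be spoken aloud in a few seconds.\n"
        f"{quirk_lines}"
    )

# With the real SDK, the preamble steers the chat endpoint, and the reply text
# would then be sent to ElevenLabs for synthesis in the cloned voice:
#
#   import cohere
#   co = cohere.Client("COHERE_API_KEY")
#   reply = co.chat(
#       message=driver_utterance,
#       preamble=build_parent_preamble("Maria", "Alex", ["Eyes on the road, sweetheart"]),
#   )

preamble = build_parent_preamble("Maria", "Alex", ["Eyes on the road, sweetheart"])
```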
But we never know what might happen, and despite our efforts, a driver may still crash. To make sure emergency services are quick and ready to act, we use Cohere's Chat endpoint alongside Twilio's SMS service to inform the driver's emergency contact of the situation, letting them ask questions about what happened and get a status update on the driver.
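The alert half of this flow is a short message composed on the driver's side and pushed out over SMS. A minimal sketch: the message wording and the `format_crash_alert` helper are illustrative, and the commented-out send uses Twilio's standard Messages API with placeholder credentials and numbers.

```python
def format_crash_alert(driver_name, location, status):
    """Compose the SMS body sent to the emergency contact (wording illustrative)."""
    return (
        f"GuardianCruise alert: {driver_name}'s car may have been in an accident "
        f"near {location}. Current status: {status}. "
        f"Reply to this number to ask for updates."
    )

# Sending uses the standard Twilio Python helper library
# (ACCOUNT_SID, AUTH_TOKEN, and both phone numbers are placeholders):
#
#   from twilio.rest import Client
#   client = Client("ACCOUNT_SID", "AUTH_TOKEN")
#   client.messages.create(
#       body=format_crash_alert("Alex", "King St & Spadina Ave", "airbag deployed"),
#       from_="+15550001111",
#       to="+15552223333",
#   )

body = format_crash_alert("Alex", "King St & Spadina Ave", "airbag deployed")
```

Incoming replies from the contact would be handled by a webhook that forwards the question to Cohere's Chat endpoint and texts the answer back.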
We also share this information with a security company, which, upon being notified that a driver may be drunk or overly sleepy, is prepared to call emergency services even before a crash occurs. To ensure the accuracy of the information, the security company receives everything we have: live video, audio clips of the driver talking, and a description of the apparent damage.
How we built it
Front end: Qt and C++
Real-time behavioural analysis: OpenCV and TensorFlow
Voice transcription and vocal behavioural analysis: the Python SpeechRecognition library for transcription and Cohere for voice analysis
Parental deepfake alerts: ElevenLabs
Security-company reporting: LLaVA (Large Language and Vision Assistant) and sockets
Challenges we ran into
One of the significant challenges we faced was fine-tuning the machine learning models to accurately detect signs of drowsiness.
Because our technical stack was so varied, we underestimated the time it would take to integrate everyone's components.
What we learned
We learned a lot about LLaVA and its future as an emerging competitor to ChatGPT. We also learned the importance of good API documentation, having come across so many poorly documented APIs that we could not use. While solving our integration issues, we learned about sockets, which deepened our knowledge of networking.
What's next for GuardianCruise
Though accidental, the networking framework is the beauty of this project: it serves three different user bases seamlessly. Moving forward, we plan to make this networking model more scalable, supporting a larger base of drivers, families, and security providers.
Built With
- c++
- deepfake
- elevenlabs
- google-maps
- javascript
- llava
- machine-learning
- openai
- opencv
- python
- pytorch
- qml
- qt
- sockets
- speechrecognition
- twilio