YOLOLLM Team

Inspiration

We decided to work on optical danger detection for people with visual impairments.

We believe that the combination of YOLO, LLMs, and good-quality text-to-speech technology can deliver incredible results and solutions for real-life problems.

What it does

The setup consists of an NVIDIA Jetson Nano with a USB webcam and headphones. A custom Python program runs on the device, doing the following:

  • Continuously captures image frames from the webcam
  • Processes each frame with YOLO
  • If an object is detected, sends the frame to OpenAI Vision to get a simple description of what it contains
  • Reads the returned text aloud using OpenAI Text-to-Speech
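The control flow of that loop can be sketched roughly as follows. This is an illustration, not the actual script: the function names and the dependency-injection structure are our own, chosen so the gating logic (only call the Vision API when YOLO finds something) can be shown without a camera or API keys.

```python
def run_pipeline(capture_frame, detect_objects, describe_frame, speak):
    """Capture -> YOLO -> OpenAI Vision -> TTS loop (illustrative sketch).

    The four callables are injected so the loop can run without hardware:
      capture_frame()        -> next frame, or None when the stream ends
      detect_objects(frame)  -> list of detections (YOLO in the real setup)
      describe_frame(frame)  -> short text description (OpenAI Vision)
      speak(text)            -> play the description (OpenAI TTS)
    Returns the number of frames processed.
    """
    processed = 0
    while True:
        frame = capture_frame()
        if frame is None:
            break
        processed += 1
        detections = detect_objects(frame)
        if not detections:
            # Nothing of interest: skip the expensive API round-trip.
            continue
        speak(describe_frame(frame))
    return processed
```

With stub callables, only the frame that actually contains a detection triggers a spoken description:

```python
frames = iter(["frame1", "frame2"])
spoken = []
run_pipeline(
    capture_frame=lambda: next(frames, None),
    detect_objects=lambda f: ["person"] if f == "frame2" else [],
    describe_frame=lambda f: "a person ahead on the sidewalk",
    speak=spoken.append,
)
```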

How we built it

  • We first discussed one specific use case: a blind pedestrian walking on the sidewalk and the potentially dangerous situations that could come up
  • Based on the provided example script, we added connections to our other services
  • A FastAPI service triggers the description and the TTS creation/playback

Challenges we ran into

  • Jetson Nano + setup issues: General dependency complications with different Python environments and libraries.
  • Latency vs. camera input quality: We probably don't use the board's full potential due to our lack of direct experience with it. The frames captured by the camera are heavily affected by typical factors like lighting, resolution, and compression, which can lead to less precise output in various cases.

Accomplishments that we're proud of

Getting our image-processing pipeline to analyse the visual input and give feedback & warnings in real time!

What we learned

Aside from having to handle all the complications that arose during implementation (and thereby learning how the device works), we learned a lot about how much we can achieve with existing technologies. It is already possible to quickly come up with solutions for everyday problems. Within the 36 hours we managed to create a working PoC of optical danger detection for people with visual impairments.

What's next for YoloLLM

Our little project can definitely be optimized to run more smoothly and react with more urgency to upcoming emergencies. Continuous improvement of the hardware (and optimization of the services used) will make the current setup much smaller and more accessible to everybody in the future. With the addition of dedicated AI processing units, a regular smartphone could soon be used instead of the Jetson Nano.

Built With

fastapi · jetson-nano · openai · python · yolo
