Inspiration
Despite great innovation in the world at large, our team was frustrated by the stagnant development of tools for the blind and visually impaired community. The average affected individual has access only to common, rudimentary, outdated assistive supports such as walking sticks, braille signage, and tactile paving. This lack of progress, along with personal connections to families affected by impaired vision, drove us to harness modern AI technologies and build a comprehensive digital tool that supports them in every way possible, day in and day out.
What it does
Proximus.ai is a tool that uses a camera to actively analyze a user's environment and provide relevant, necessary, and queried information. It detects oncoming hazards such as a person walking toward the user; identifies threats such as cars, buses, and motorcycles on the road; distinguishes between crosswalk signal states; and then relays natural spoken feedback to guide the user appropriately. When prompted, the system can also read signs and other text from the user's surroundings, helping someone who is having difficulty navigating inside a building find restaurants, exits, stores, room names and numbers, and more.
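The danger-detection flow described above can be sketched as a simple prioritization step over object-detector output. The class names, area thresholds, and function names below are illustrative assumptions for the sketch, not the project's actual values:

```python
# Illustrative sketch: rank detected objects by how urgently the user
# should be warned. Class names and thresholds are assumptions.

VEHICLE_CLASSES = {"car", "bus", "motorcycle", "truck"}

def warning_level(label: str, box_area_fraction: float) -> str:
    """Map one detection to a warning level.

    box_area_fraction: bounding-box area divided by frame area,
    used here as a rough proxy for proximity.
    """
    if label in VEHICLE_CLASSES and box_area_fraction > 0.10:
        return "danger"   # large vehicle close to the user
    if label == "person" and box_area_fraction > 0.15:
        return "caution"  # someone walking toward the user
    return "info"

def most_urgent(detections) -> str:
    """Pick the single most urgent level from (label, area) detections."""
    order = {"danger": 0, "caution": 1, "info": 2}
    levels = [warning_level(lbl, area) for lbl, area in detections] or ["info"]
    return min(levels, key=order.__getitem__)
```

The most urgent level would then be handed to the text-to-speech layer so the user hears only the warning that matters most in that frame.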
How we built it
Proximus.ai is built on a diverse, comprehensive tech stack:
- The Gemini API for LLM processing, handling user queries and contextual environment analysis.
- OpenAI's Whisper API for speech-to-text conversion, letting users talk to their assistant.
- YOLOv8n for object detection and differentiation, classifying warning types.
- OpenCV for image and live video feed management.
- PaddleOCR for extracting text from images at any user orientation, relaying environmental context and text.
- ElevenLabs for user communication, converting computed decisions, warnings, and more into natural, supportive, understandable speech for the visually impaired.
- NumPy, pandas, and Matplotlib for general computation, data presentation, and more.
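As a concrete illustration of the sign-reading step in the stack above, here is a minimal sketch of matching OCR output against a spoken query. The function name and the fuzzy-match rule are assumptions for the sketch, not the project's actual implementation:

```python
# Illustrative sketch: given text snippets recognized by an OCR pass
# (e.g. PaddleOCR results) and a user's spoken query, pick the snippets
# worth reading back aloud. The matching rule here is an assumption.

from difflib import SequenceMatcher

def match_query(ocr_texts, query, threshold=0.6):
    """Return OCR snippets that likely answer the query,
    e.g. query='exit' against signs read from the camera feed."""
    query = query.lower()
    hits = []
    for text in ocr_texts:
        t = text.lower()
        # Accept exact substring hits or close fuzzy matches.
        score = SequenceMatcher(None, t, query).ratio()
        if query in t or score >= threshold:
            hits.append(text)
    return hits
```

In practice the matched snippets would be passed to the text-to-speech layer so the user hears only the signs relevant to what they asked for.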
Challenges we ran into
Managing and optimizing model processing speed. Working with complex frameworks, LLMs, and intricate APIs created heavy memory, CPU, and GPU usage, which caused very slow run times and poor FPS when left unoptimized. We worked hard over the weekend to improve our implementations, reach faster speeds, and build model-adjustment features that adapt to the available hardware. Selecting and researching the best models and frameworks for our purpose also posed challenges along the way.
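The hardware-aware model adjustment mentioned above could look something like the following. The tier thresholds, weight filenames, and frame-skip values are illustrative assumptions, not the project's actual configuration:

```python
# Illustrative sketch: pick a YOLOv8 model size and a frame-skip
# interval based on available resources, so weaker hardware keeps an
# acceptable frame rate. Thresholds and filenames are assumptions.

def select_profile(available_ram_gb: float, has_gpu: bool) -> dict:
    """Choose detector weights and how many frames to skip between
    inferences for the current machine."""
    if has_gpu and available_ram_gb >= 8:
        return {"weights": "yolov8s.pt", "frame_skip": 0}  # every frame
    if available_ram_gb >= 4:
        return {"weights": "yolov8n.pt", "frame_skip": 1}  # every 2nd frame
    return {"weights": "yolov8n.pt", "frame_skip": 3}      # every 4th frame
```

Skipping frames between inferences trades detection latency for throughput, which is usually an acceptable compromise on CPU-only machines.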
Accomplishments that we're proud of
Completing full development of a working model and AI system! At the beginning we thought this would be a challenging task outside our current scope of knowledge and experience, but we were glad to pull together a meaningful model that makes use of active computer-vision AI systems.
We also loved being able to use ElevenLabs for such a meaningful purpose, harnessing the power of quality AI voices and conversational abilities to build a model that users can truly connect and communicate with.
What we learned
We gained exposure to many core technical foundations in the field of AI, including:
- Working with mainstream AI APIs such as the Gemini API and the ElevenLabs API, and understanding how to prompt and integrate them.
- Implementing deep learning frameworks, image processing, SLAM, spatial reasoning systems, and more.
- Working with speech-to-text and text-to-speech AI processes!
What's next for Proximus.ai
Our vision for this product is that it becomes (in theory) a localized implementation within full-scale products like the Ray-Ban Meta glasses. As an AI companion on those glasses, all hardware requirements would be satisfied, since they have cameras for visual input and speakers for audio output, and they would be a comfortable wear for users.
Additionally, for near-term feature improvements, we want to add navigation systems backed by memory, recall, and storage, so our model can describe previously seen landmarks, locations, and more when queried.
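A minimal sketch of that landmark-memory idea, assuming a simple timestamped in-memory store (the class and method names are hypothetical):

```python
# Illustrative sketch of landmark recall: remember places the user has
# passed and answer "where was X?" queries. All names are hypothetical.

import time

class LandmarkMemory:
    def __init__(self):
        self._landmarks = []  # list of (timestamp, label, description)

    def remember(self, label: str, description: str) -> None:
        self._landmarks.append((time.time(), label, description))

    def recall(self, query: str):
        """Return the most recently seen landmark matching the query."""
        q = query.lower()
        for _ts, label, desc in reversed(self._landmarks):
            if q in label.lower():
                return f"{label}: {desc}"
        return None
```

A real version would persist this store and tie entries to positions from the navigation system, but the recall interface would look similar.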