The Inspiration

20 million Americans are visually impaired, and 1 million of them are legally blind. Vision loss has long been a significant health issue in our world, and it will likely stay that way in the near future. Visual impairment hinders daily function and self-sufficiency. The pervasive difficulty these individuals face in interacting with their environment is unacceptable given the capabilities of today’s technology. When building our project, we asked ourselves a question: how can computers see the world for us? The topic of computer vision immediately came to mind - an emerging and exciting field with no shortage of innovative models and solutions. We began brainstorming how we could put this field to use: would it be possible to use a language model to make the interaction between an individual’s desired actions and their “robotic eyes” seamless? This was the driving force behind HorusAI. We wanted our app to be as accessible as possible to impaired users, requiring minimal effort on their part.

What HorusAI Does

HorusAI helps the visually impaired locate objects by acting as their eyes. Enabled by the push of a (very large and visible) button, Horus listens to a user’s requests and discerns the object they want to identify. After giving input, the user can use a handheld camera/phone to scan the room they’re in. Horus will ping the user once their device points in the direction of the object. Currently, HorusAI is capable of detecting over 80 unique common objects.

For users who are able and willing to type and read, HorusAI also features a text-based input system and an object recommendation system. Users can review previously identified objects and request identification for them again.

How We Built It

As freshmen relatively new to the world of hackathons and full-stack development, we opted to use NYU track/sponsor Streamlit’s app framework to deploy our app. Streamlit enables fast and clean development of applications, which was important to us under hackathon conditions. It also fit our UI goals - we wanted a simple interface for a visually impaired demographic who might find a more complex UI challenging to use.

To take in user input, we used OpenAI’s Whisper to convert a user’s spoken commands to text, then used OpenAI’s GPT-3 to identify the user’s object of interest and find the closest matching object within HorusAI’s recognition capabilities.
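The voice pipeline above can be sketched roughly as follows. This is a minimal illustration, not HorusAI’s actual source: it assumes the `openai` Python package (v1+ client) for Whisper, and stands in a local `difflib` match for the GPT-3 label-matching step; function names like `closest_label` are our own.

```python
# Sketch: voice command -> target object label.
import difflib

# A handful of the 80 COCO classes YOLOv8's pretrained model detects (abbreviated).
COCO_LABELS = ["person", "bicycle", "car", "bottle", "cup", "chair",
               "laptop", "cell phone", "book", "scissors"]

def transcribe(audio_path: str) -> str:
    """Send recorded audio to Whisper for speech-to-text."""
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment
    client = OpenAI()
    with open(audio_path, "rb") as f:
        result = client.audio.transcriptions.create(model="whisper-1", file=f)
    return result.text

def closest_label(requested: str, labels=COCO_LABELS):
    """Map a free-form object name to the nearest detectable class."""
    matches = difflib.get_close_matches(requested.lower(), labels, n=1, cutoff=0.4)
    return matches[0] if matches else None
```

For example, `closest_label("my phone")` resolves to `"cell phone"` - the kind of mapping the writeup delegates to GPT-3, with `difflib` serving as a local stand-in here.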

We built HorusAI’s real-time computer vision capabilities using YOLOv8, a trainable computer vision model architecture for detection, classification, and segmentation. Video capture was powered by OpenCV. Alongside the pretrained model, we expanded YOLOv8’s identification capabilities by training additional models on data with our desired labels.
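The detect-and-ping loop described above might look something like the sketch below. It assumes the `ultralytics` and `opencv-python` packages; the function names and the bell-character “ping” are illustrative choices of ours, not HorusAI’s.

```python
def is_centered(box, frame_width, tolerance=0.15):
    """True when a detection box's horizontal center is near the frame center.

    box is (x1, y1, x2, y2) in pixels, as YOLOv8 returns in results.boxes.xyxy.
    """
    box_center = (box[0] + box[2]) / 2
    return abs(box_center - frame_width / 2) <= tolerance * frame_width

def scan_for(target_class: str):
    """Stream camera frames through YOLOv8 and ping when the target is centered."""
    import cv2
    from ultralytics import YOLO
    model = YOLO("yolov8n.pt")  # pretrained on 80 COCO classes
    cap = cv2.VideoCapture(0)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            for result in model(frame, verbose=False):
                for box, cls in zip(result.boxes.xyxy, result.boxes.cls):
                    if result.names[int(cls)] == target_class and \
                       is_centered(box.tolist(), frame.shape[1]):
                        print("\a")  # audible cue; the app's actual cue may differ
    finally:
        cap.release()
```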

Our recommendation system was built using MongoDB Atlas. We hope it will be of use to users who find/need certain objects frequently.
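A simple frequency-based ranking captures the idea: surface the objects a user asks for most often. The sketch below assumes past requests are stored as documents like `{"user": ..., "object": ...}` in MongoDB Atlas and fetched via pymongo; the ranking itself is plain Python, and the field names are illustrative.

```python
from collections import Counter

def recommend(past_requests, k=3):
    """Return the user's k most frequently requested objects.

    past_requests would come from something like
    collection.find({"user": user_id}) via pymongo, each doc holding "object".
    """
    counts = Counter(doc["object"] for doc in past_requests)
    return [obj for obj, _ in counts.most_common(k)]

history = [{"object": "keys"}, {"object": "cup"}, {"object": "keys"},
           {"object": "phone"}, {"object": "keys"}, {"object": "cup"}]
# recommend(history) -> ["keys", "cup", "phone"]
```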

Challenges We Ran Into

We quickly realized that mobile app development would be extremely challenging in the timeframe of the hackathon due to iOS limitations on continuous camera streaming. To mitigate this, we focused our efforts on making an intuitive webapp in the meantime. We also quickly realized that, given the current capabilities of YOLOv8, transfer learning would be nearly impossible, and retraining a pretrained model would take countless hours. To work around this, we created separate models for different object classes and used OpenAI’s API to decide which model to use given the user’s input.
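The per-class model-routing workaround described above can be sketched as a simple registry lookup. The weight filenames and label sets here are hypothetical, and where the writeup delegates the choice to OpenAI’s API, this sketch does a direct lookup instead.

```python
# Map each model's weights file to the label set it was trained on (illustrative).
MODEL_REGISTRY = {
    "yolov8n.pt": {"person", "cup", "chair", "cell phone"},  # pretrained COCO subset
    "household.pt": {"keys", "remote control", "wallet"},    # hypothetical custom model
}

def pick_model(target_class: str):
    """Choose the model whose label set covers the requested class."""
    for weights, labels in MODEL_REGISTRY.items():
        if target_class in labels:
            return weights
    return None
```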

Accomplishments We’re Proud Of

As a team of two freshmen and first-time college hackathon participants, we’re proud of the amount of progress we were able to make in this (relatively) short time frame. Our experiences taught us that with the right drive/determination, almost anything can be accomplished. It was incredibly rewarding to learn about and embrace new technologies to create amazing things, and we were glad we were able to coordinate and work together so effectively on this project.

What We Learned

Before the project, both of us were relatively unfamiliar with machine learning algorithms and web development. Throughout our project’s development, we quickly gained basic familiarity with the world of computer vision and learned how to collaborate effectively on a large-scale project. We also learned how to take advantage of APIs for the sake of efficiency and usability - the APIs we used in our project were immensely helpful and crucial to its success.

What’s next for HorusAI

As mentioned earlier, we hope to focus on mobile development for HorusAI. Currently, HorusAI works in conjunction with a handheld camera/phone but requires a laptop to run. We’re hopeful that we can navigate the kinks and intricacies of iOS development. We also hope to expand the range of classes HorusAI can identify, which is mainly limited by space and time constraints - not the worst problem to have. Finally, we plan on enabling multimodal input/output for HorusAI: we ran into multithreading issues with the camera/microphone during development, and we hope to resolve this issue properly rather than sidestepping it.
