PITCH DECK: https://1drv.ms/p/c/910efb1556a62246/EdCPizFwUuhAjmu0302RsxcB9kRfAICIkXQ8ryzqZNgzxg?e=m8niYu

Signs are an essential wayfinding tool for navigating buildings. They tell us where to go and how to get there. But what about those who are visually impaired and cannot read signs? That is where hyacinthe comes in.

Users provide audio input on their destination, and as they walk, a camera continuously scans their surroundings for signs. Hyacinthe processes this visual data, interpreting the information on the signs to determine the best possible route. It then provides real-time, adaptive guidance, ensuring the user navigates efficiently and accurately based on the available signage.

hyacinthe does not require access to building floor plans - rather, it uses human-style reasoning to predict the best path given the environmental context.

How we built it

Stream video input from the user. We trained a YOLO11 model on indoor signage data to detect when signage is present, then crop each detection down to a smaller image and run OCR to extract the sign's contents.
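
A minimal sketch of that detect-then-crop-then-OCR step, assuming an ultralytics YOLO11 checkpoint and pytesseract as the OCR engine (the weights filename and OCR library here are illustrative, not necessarily what we shipped):

```python
import cv2
import pytesseract
from ultralytics import YOLO

# Hypothetical weights file from the custom indoor-signage training run.
model = YOLO("signage_yolo11.pt")

def read_signs(frame):
    """Detect signs in a video frame, crop each one, and OCR its text."""
    results = model(frame, verbose=False)[0]
    texts = []
    for box in results.boxes.xyxy.cpu().numpy().astype(int):
        x1, y1, x2, y2 = box
        crop = frame[y1:y2, x1:x2]                      # isolate the sign to a small image
        gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)   # OCR works better on grayscale
        texts.append(pytesseract.image_to_string(gray).strip())
    return [t for t in texts if t]
```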

Stream audio input from the user about what they are looking for. We use the faster-whisper library to perform speech recognition and convert the user's speech into prompts.
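
In rough form, the speech-to-text step looks like this (the model size and compute type are illustrative choices for running on a laptop CPU):

```python
from faster_whisper import WhisperModel

# "small" runs acceptably on a laptop CPU; int8 keeps memory usage low.
stt = WhisperModel("small", device="cpu", compute_type="int8")

def transcribe_request(wav_path):
    """Turn the user's spoken destination into a text prompt."""
    segments, _info = stt.transcribe(wav_path)
    return " ".join(seg.text.strip() for seg in segments)
```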

The pathfinding decision model applies common-sense human reasoning to sign contents - e.g., if the room numbers are increasing but the destination has a lower room number, it's time to turn around.
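
A toy sketch of that heuristic - not the actual decision model, just the shape of the reasoning:

```python
def direction_hint(seen_rooms, destination):
    """Hypothetical 'room numbers trending the wrong way' check.

    seen_rooms:  room numbers read from signs, in the order encountered.
    destination: the room number the user asked for.
    """
    if len(seen_rooms) < 2:
        return "continue"
    increasing = seen_rooms[-1] > seen_rooms[-2]
    if increasing and destination < seen_rooms[-1]:
        return "turn around"      # numbers climbing past the target
    if not increasing and destination > seen_rooms[-1]:
        return "turn around"      # numbers dropping below the target
    return "continue"
```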

Output audio instructions to the user using the built-in macOS "say" command.
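
For example (assuming macOS, since `say` is a built-in command there):

```python
import subprocess

def speak(instruction):
    """Speak a navigation instruction aloud via the macOS `say` command."""
    subprocess.run(["say", instruction], check=True)

speak("Turn around. Room numbers are increasing past your destination.")
```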

Challenges we ran into

One of our objectives was to keep costs to a minimum - using open-source data wherever possible, minimizing API calls, and so on. At the same time, we wanted the tool to be fast and reliable. These goals came into conflict at points where the free options were simply non-functional. To work around this, we made many small improvements to the free options - not significant individually, but together they added up to meaningful enhancements that let us meet both criteria.

Another challenge was the limited computing power available to us on laptops - far below what is needed to run complex OCR, TTS/STT, and tracking models. To resolve this, we used lower-compute options combined with creative workarounds - for example, reducing the rate at which frames were sent to the OCR for text extraction.
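
A simplified sketch of that frame-throttling workaround (the sampling interval and the `handle_frame` call are illustrative, not our exact values):

```python
import cv2

OCR_EVERY_N_FRAMES = 15   # assumption: roughly 2 OCR passes per second at 30 fps

cap = cv2.VideoCapture(0)
frame_idx = 0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frame_idx += 1
    if frame_idx % OCR_EVERY_N_FRAMES == 0:
        # Only this subset of frames goes through detection + OCR.
        handle_frame(frame)   # hypothetical downstream pipeline call
cap.release()
```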

Accomplishments that we're proud of

We collected our own dataset (hundreds of pictures), as there were no readily available image databases for indoor signage.

Despite the challenges we faced with libraries (especially OCR), we got everything working in the end, after countless hours of trial and error.

The fact that this is not only a commercially viable product, but one that will deliver tangible benefits to people.

What we learned

How extensively images need to be pre-processed and cleaned up before OCR can extract text from them, and how TTS/STT workflows fit together.
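
As an illustration, this is the general kind of clean-up OCR tends to need - the exact steps and parameters in our pipeline may differ:

```python
import cv2

def preprocess_for_ocr(crop):
    """Typical clean-up before handing a cropped sign to OCR."""
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
    # Upscale small crops so characters have enough pixels to recognize.
    gray = cv2.resize(gray, None, fx=2.0, fy=2.0, interpolation=cv2.INTER_CUBIC)
    # Light blur to suppress sensor noise before thresholding.
    gray = cv2.GaussianBlur(gray, (3, 3), 0)
    # Otsu thresholding gives a clean black-on-white image for the OCR engine.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```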

What's next for hyacinthe

The key when developing a product like this is to embed user-friendliness at every step - and that means extensive consultations with those who are visually impaired. Understanding their concerns will ultimately drive where the project goes next.

A few initial ideas we had, as starting points for further consultation:

In the short term, hyacinthe will need to detect doors, stairwells, and corners so that it can give more accurate navigation instructions.

In the medium term, integration with a mapping tool would enable live mapping of the building as the person walks through it - making it easier for them, and for subsequent users, to find their way.

In the long term, the goal is to develop a wearable device (or partner with an existing one) to make the entire navigation process more seamless and eliminate the need to hold a phone or camera up.
