Inspiration
Have you ever wondered whether the ingredients in your favorite soda or snack could be harmful? The thought usually dissipates after a quick skim of the label, since researching each individual ingredient feels like a waste of time.
What it does
Salus takes a scan of a nutrition label and sends the image (this step is not working yet) to OpenAI's GPT-4-turbo through their API along with our prompt; the model then explains which ingredients are potentially harmful and how they could harm you. After this initial response, a chat box opens where you can ask further questions about the ingredients.
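The flow above can be sketched in Python, where the team notes the vision call already works. This is a minimal sketch, not the app's actual code: the helper name `build_label_request` and the prompt text are illustrative, and the payload follows the GPT-4-turbo vision message format (a text part plus a base64 `image_url` part).

```python
import base64


def build_label_request(image_bytes: bytes, prompt: str) -> dict:
    """Build a chat-completions payload pairing a prompt with a
    base64-encoded photo of a nutrition label (vision message format)."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "model": "gpt-4-turbo",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
                    },
                ],
            }
        ],
    }
```

The resulting dictionary would then be passed to the OpenAI client (e.g. `client.chat.completions.create(**build_label_request(photo, prompt))`); building the payload separately keeps it easy to inspect and test without hitting the API.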
How we built it
The app was built with Swift and a community-made OpenAI package that makes calls to their API.
Challenges we ran into
Originally, we wanted to use GPT-4 or Apple's built-in Vision models for OCR, together with Microsoft's new SLM Phi-3, to scan blood pressure monitors and send the readings to a health app, but the text detection in both models was not accurate enough to make that feasible.
For this reason, we pivoted to reading and explaining nutrition labels: this task doesn't require such high accuracy, but instead needs knowledge of different ingredients, which LLMs handle well.
However, our biggest challenges came from building out the front end, as none of us had experience building an app, and from integrating the GPT-4 API, as it is not yet directly supported by OpenAI in Swift and the packages that bring support for it do not cover the latest GPT-4 vision API.
Accomplishments that we're proud of
- Building an app in Swift
- Using OpenAI's new GPT-4 API for Vision (working in Python, not yet in Swift)
What we learned
Collaborating and testing in Swift is difficult if not all group members have Apple devices. Using frameworks can help reduce compatibility issues with packages and testing.
What's next for Salus
- Fixing the image API call so GPT-4-turbo can describe harmful ingredients
- Adding a context-aware chat feature after scanning an image
- Publishing to the App Store
- Implementing Apple's built-in Vision models for OCR and quantized Microsoft Phi-3 models so that health data never leaves the device
- Expanding to other image-detection cases, like reading a blood pressure monitor
- Expanding to Android and other platforms
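The planned context-aware chat could work by resending the conversation history with each follow-up question, so answers stay grounded in the original label analysis. A minimal sketch in Python, assuming the same chat-completions message format; the class name `LabelChat` and the prompts are hypothetical, not the app's code:

```python
class LabelChat:
    """Track conversation history so each follow-up question is sent
    together with the model's earlier analysis of the scanned label."""

    def __init__(self, system_prompt: str, initial_analysis: str):
        # Seed the history with the system prompt and the first
        # GPT-4-turbo response describing the label's ingredients.
        self.history = [
            {"role": "system", "content": system_prompt},
            {"role": "assistant", "content": initial_analysis},
        ]

    def next_request(self, question: str) -> dict:
        """Append the user's question and return the full payload to
        send on the next API call (the whole history is the context)."""
        self.history.append({"role": "user", "content": question})
        return {"model": "gpt-4-turbo", "messages": list(self.history)}
```

Each API response would be appended back to `history` as an `assistant` message before the next question, keeping the context window a faithful transcript of the conversation.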
Built With
- apple-ocr
- gpt-4
- microsoft-phi-3
- openai
- swift