Inspiration
We wanted to make grocery shopping smarter, more sustainable, and more informed. Every day, people make quick purchasing decisions without realizing the environmental, ethical, and health implications behind each product. Our goal was to bridge that gap by combining AI vision, real-time data, and sustainability insights into a seamless, hands-free experience that empowers users to shop responsibly.
What it does
Cartify identifies grocery items in real time using computer vision and provides instant feedback about each product’s sustainability, nutritional value, and ethical sourcing. As users pick up items, Cartify automatically adds them to a virtual cart, calculates total cost, and assigns a sustainability score. Using Google Gemini, the app explains the reasoning behind each score in simple, conversational language, while ElevenLabs converts that feedback into natural voice responses for an engaging, hands-free shopping experience.
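The cart logic described above (adding detected items, keeping a running total, and assigning a sustainability score) can be sketched roughly like this. This is a minimal illustration, not Cartify's actual code; the `Product` fields and the averaging of per-item scores into a cart score are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Product:
    # Illustrative fields only; the real vision backend's schema is not shown.
    name: str
    price: float
    sustainability_score: int  # assumed 0-100 scale, higher is better

@dataclass
class VirtualCart:
    items: list = field(default_factory=list)

    def add(self, product: Product) -> None:
        # Called when the vision backend confirms an item was picked up.
        self.items.append(product)

    @property
    def total_cost(self) -> float:
        return round(sum(p.price for p in self.items), 2)

    @property
    def cart_score(self) -> float:
        # Assumed scoring rule: average sustainability score across items.
        if not self.items:
            return 0.0
        return round(
            sum(p.sustainability_score for p in self.items) / len(self.items), 1
        )

cart = VirtualCart()
cart.add(Product("oat milk", 3.49, 82))
cart.add(Product("bananas", 1.20, 74))
print(cart.total_cost)   # 4.69
print(cart.cart_score)   # 78.0
```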
How we built it
We developed Cartify using a multi-service architecture. The vision backend captures frames from a live camera feed and identifies products using trained image-recognition models. The backend API processes this data and sends structured product details to the Google Gemini API, which generates sustainability analyses and concise explanations. The ElevenLabs API transforms Gemini’s responses into realistic audio, creating a fully interactive experience. The frontend displays the product info, sustainability score, and running total in real time. Our stack included Python (Flask) for the backend, React for the frontend, and Gemini 2.0 Flash plus ElevenLabs for AI integration.
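The hand-off from the vision backend to Gemini centers on turning structured product details into a prompt. A rough sketch of that step is below; the field names and prompt wording are assumptions, not Cartify's actual schema, and the commented-out API calls are paraphrased from the Gemini and ElevenLabs SDKs (check the current SDK docs before relying on them).

```python
def build_sustainability_prompt(product: dict) -> str:
    """Turn structured product details from the vision backend into a
    Gemini prompt asking for a short, conversational score explanation.
    The dict keys here are illustrative."""
    return (
        f"Product: {product['name']}\n"
        f"Category: {product['category']}\n"
        "In 2-3 conversational sentences, explain this product's "
        "sustainability score, covering packaging, sourcing, and "
        "carbon footprint."
    )

# The backend would then call the LLM and TTS services, roughly:
#   model = genai.GenerativeModel("gemini-2.0-flash")
#   explanation = model.generate_content(prompt).text
# and pass `explanation` to the ElevenLabs text-to-speech client for
# audio playback on the frontend.

prompt = build_sustainability_prompt(
    {"name": "oat milk", "category": "dairy alternative"}
)
print(prompt)
```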
Challenges we ran into
Training our vision model reliably was the biggest hurdle. Distinguishing the product a user is actually reaching for from the other products in the background was difficult, and it took many trial images and videos to get consistent results from the model. Detecting whether an item had actually been added to the cart was a separate issue. Altogether, it took well over 22 hours of iteration before the model was consistent enough for proper trial runs.
Accomplishments that we're proud of
We’re proud of building a fully functional AI-powered assistant that combines computer vision, reasoning, and speech in real time. Seeing the system recognize a product, evaluate its sustainability, and talk back with natural voice feedback felt like bringing futuristic shopping to life. We also successfully integrated multiple complex APIs (Gemini + ElevenLabs) into a cohesive and responsive experience.
What we learned
We learned how powerful and flexible modern multimodal AI systems can be when orchestrated effectively. From handling prompt engineering in Gemini to optimizing audio generation with ElevenLabs, we gained a deeper understanding of building end-to-end intelligent systems. We also improved at managing asynchronous pipelines and handling real-time inference under performance constraints.
What's next for Cartify
We plan to expand Cartify with personalized shopping profiles, nutritional tracking, and community-sourced sustainability data. Integrating live product databases (e.g., OpenFoodFacts) and retailer APIs could make the app even more practical. Long-term, we envision Cartify as a personal sustainability companion, helping users make conscious choices every time they shop — whether online or in-store.