Inspiration

The inspiration for SnapSpeak came from the desire to make visual content accessible to everyone, especially individuals with visual impairments. With recent advances in AI, we saw an opportunity to build an intuitive assistant that describes images and reads those descriptions aloud.

What it does

SnapSpeak processes uploaded images, generates detailed captions using AI, and converts those captions to natural-sounding speech. Users can toggle between simplified and detailed captions, choosing how much detail they hear about the visual content.
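The simplified/detailed toggle can be illustrated with a short sketch. This is a minimal illustration, not SnapSpeak's actual logic: the `simplify_caption` helper and its first-sentence truncation rule are assumptions made for the example.

```python
import re

def simplify_caption(detailed: str, max_words: int = 8) -> str:
    """Derive a short caption from a detailed one.

    Illustrative heuristic: keep the first sentence, then trim it
    to at most `max_words` words.
    """
    first_sentence = re.split(r"(?<=[.!?])\s+", detailed.strip())[0]
    words = first_sentence.rstrip(".!?").split()
    short = " ".join(words[:max_words])
    return short + "." if short else ""

def caption_for(detailed: str, mode: str = "detailed") -> str:
    """Return the caption variant the user has toggled to."""
    return detailed if mode == "detailed" else simplify_caption(detailed)
```

For example, a long BLIP caption like "A golden retriever sits on a sandy beach at sunset, waves breaking behind it." would be trimmed to "A golden retriever sits on a sandy beach." in simplified mode.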

How we built it

  • Backend: Built with FastAPI, using the Salesforce BLIP model for detailed image captions and gTTS for text-to-speech conversion. Redis is integrated for caching, so repeated requests avoid redundant processing.
  • Frontend: Developed with React, ensuring accessibility by adhering to ARIA guidelines and supporting screen readers. It enables seamless interactions with the backend.
  • Deployment: Designed to run both locally and in cloud environments such as Gitpod or Vercel for scalability.
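The backend's request path (hash the upload, check the cache, otherwise caption and synthesize) can be sketched as follows. This is a minimal illustration under stated assumptions, not SnapSpeak's actual code: the BLIP and gTTS calls are stubbed out, and a plain dict stands in for Redis.

```python
import hashlib

# Stand-in for the Salesforce BLIP captioning call.
def generate_caption(image_bytes: bytes) -> str:
    return "a placeholder caption for the uploaded image"

# Stand-in for gTTS; the real call would return MP3 audio bytes.
def synthesize_speech(caption: str) -> bytes:
    return caption.encode("utf-8")

# A plain dict stands in for Redis; in production this would be a
# redis.Redis client with an expiry on each key.
_cache: dict = {}

def describe_image(image_bytes: bytes):
    """Return (caption, audio) for an upload, caching by content hash."""
    key = hashlib.sha256(image_bytes).hexdigest()
    if key not in _cache:
        caption = generate_caption(image_bytes)
        _cache[key] = (caption, synthesize_speech(caption))
    return _cache[key]
```

Keying the cache on a content hash means identical uploads skip both the model and the TTS step, which is where most of the latency lives.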

Challenges we ran into

  • Model fine-tuning: Ensuring captions were both detailed and contextually relevant was a challenge.
  • TTS Integration: Keeping audio playback in sync with the generated captions, and making the speech sound natural, was tricky.
  • Accessibility Compliance: Implementing screen reader compatibility without compromising the design took several iterations.
  • Performance Optimization: Processing high-resolution images while keeping latency low was resource-intensive.

Accomplishments that we're proud of

  • Successfully implemented detailed captioning for diverse images.
  • Added customizable caption toggling for detailed vs. simplified descriptions.
  • Achieved full screen reader compatibility, enhancing accessibility.
  • Optimized the backend to handle real-time requests efficiently.

What we learned

This project taught us the nuances of working with multimodal AI, integrating multiple technologies, and building inclusive solutions. We gained insights into the importance of accessibility features and the challenges of deploying robust AI models.

What's next for SnapSpeak

  • Language Expansion: Incorporate multi-language support for captions and speech.
  • Mobile App: Develop a mobile version to make the assistant more accessible on the go.
  • Advanced Features: Add object detection and contextual understanding for enhanced descriptions.
  • Community Contributions: Open the platform for user feedback and additional features driven by the community.

Built With

fastapi, gtts, react, redis, salesforce-blip