Inspiration

The inspiration for SnapSpeak came from the desire to make visual content accessible to everyone, especially individuals with visual impairments. With recent advances in AI, we saw an opportunity to build an intuitive assistant that describes images and reads those descriptions aloud.

What it does

SnapSpeak processes uploaded images, generates detailed captions using AI, and converts those captions to natural-sounding speech. Users can toggle between simplified and detailed captions, choosing how much detail they hear about the visual content.
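The simplified/detailed toggle can be illustrated with a short sketch. This is a minimal illustration, not SnapSpeak's actual logic: the `simplify_caption` helper and its first-sentence truncation rule are assumptions made for the example.

```python
import re

def simplify_caption(detailed: str, max_words: int = 8) -> str:
    """Derive a short caption from a detailed one.

    Illustrative heuristic: keep the first sentence, then trim it
    to at most `max_words` words.
    """
    first_sentence = re.split(r"(?<=[.!?])\s+", detailed.strip())[0]
    words = first_sentence.rstrip(".!?").split()
    short = " ".join(words[:max_words])
    return short + "." if short else ""

def caption_for(detailed: str, mode: str = "detailed") -> str:
    """Return the caption variant the user has toggled to."""
    return detailed if mode == "detailed" else simplify_caption(detailed)
```

For example, a long BLIP caption like "A golden retriever sits on a sandy beach at sunset, waves breaking behind it." would be trimmed to "A golden retriever sits on a sandy beach." in simplified mode.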

How we built it

  • Backend: Built with FastAPI, using the Salesforce BLIP model for detailed image captions and gTTS for text-to-speech conversion. Redis is integrated for caching, so repeated requests avoid redundant processing.
  • Frontend: Developed with React, ensuring accessibility by adhering to ARIA guidelines and supporting screen readers. It enables seamless interactions with the backend.
  • Deployment: Designed to run both locally and in cloud environments such as Gitpod or Vercel for scalability.
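The backend's request path (hash the upload, check the cache, otherwise caption and synthesize) can be sketched as follows. This is a minimal illustration under stated assumptions, not SnapSpeak's actual code: the BLIP and gTTS calls are stubbed out, and a plain dict stands in for Redis.

```python
import hashlib

# Stand-in for the Salesforce BLIP captioning call.
def generate_caption(image_bytes: bytes) -> str:
    return "a placeholder caption for the uploaded image"

# Stand-in for gTTS; the real call would return MP3 audio bytes.
def synthesize_speech(caption: str) -> bytes:
    return caption.encode("utf-8")

# A plain dict stands in for Redis; in production this would be a
# redis.Redis client with an expiry on each key.
_cache: dict = {}

def describe_image(image_bytes: bytes):
    """Return (caption, audio) for an upload, caching by content hash."""
    key = hashlib.sha256(image_bytes).hexdigest()
    if key not in _cache:
        caption = generate_caption(image_bytes)
        _cache[key] = (caption, synthesize_speech(caption))
    return _cache[key]
```

Keying the cache on a content hash means identical uploads skip both the model and the TTS step, which is where most of the latency lives.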

Challenges we ran into

  • Model fine-tuning: Ensuring captions were both detailed and contextually relevant was a challenge.
  • TTS Integration: Keeping audio playback in sync with the generated captions, and making the speech sound natural, was tricky.
  • Accessibility Compliance: Implementing screen reader compatibility without compromising the design took several iterations.
  • Performance Optimization: Processing high-resolution images while keeping latency low was resource-intensive.

Accomplishments that we're proud of

  • Successfully implemented detailed captioning for diverse images.
  • Added customizable caption toggling for detailed vs. simplified descriptions.
  • Achieved full screen reader compatibility, enhancing accessibility.
  • Optimized the backend to handle real-time requests efficiently.

What we learned

This project taught us the nuances of working with multimodal AI, integrating multiple technologies, and building inclusive solutions. We gained insights into the importance of accessibility features and the challenges of deploying robust AI models.

What's next for SnapSpeak

  • Language Expansion: Incorporate multi-language support for captions and speech.
  • Mobile App: Develop a mobile version to make the assistant more accessible on the go.
  • Advanced Features: Add object detection and contextual understanding for enhanced descriptions.
  • Community Contributions: Open the platform for user feedback and additional features driven by the community.

Built With

fastapi, gtts, react, redis, salesforce-blip