CoCo

Inspiration

We built CoCo because we believe creativity should be collaborative, not outsourced. When OpenAI recently released its new image model, we watched social media flood with AI-generated images. The model was treated as the artist rather than as an assistant: people typed prompts and called it imagination. We wanted to flip the script.

What is CoCo?

CoCo is a computer-vision drawing tool that lets users create sketches with intuitive hand gestures. Its interface and AI toolset let users bring their ideas to life within seconds. Instead of relying on text prompts, we use MediaPipe and OpenCV to track finger movements and recognize custom gestures, giving users a fluid, expressive drawing experience.
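As a rough illustration of how finger movements can become drawing commands, the sketch below classifies a single hand from the 21 normalized landmarks MediaPipe Hands reports per frame. The gesture names (`draw`, `hover`, `erase`, `idle`) and the tip-above-joint rule are assumptions for illustration, not CoCo's actual gesture set.

```typescript
// A normalized MediaPipe hand landmark: x and y in [0, 1], with y growing downward.
interface Landmark {
  x: number;
  y: number;
  z: number;
}

// Hypothetical gesture vocabulary; CoCo's real mapping may differ.
type Gesture = "draw" | "hover" | "erase" | "idle";

// MediaPipe Hands landmark indices for each finger's tip and PIP joint.
// The thumb is omitted because its extension is better judged along the x axis.
const FINGERS = {
  index: { tip: 8, pip: 6 },
  middle: { tip: 12, pip: 10 },
  ring: { tip: 16, pip: 14 },
  pinky: { tip: 20, pip: 18 },
} as const;

// A finger counts as extended when its tip sits above its PIP joint in the image.
// This assumes a roughly upright hand facing the camera.
function isExtended(hand: Landmark[], finger: keyof typeof FINGERS): boolean {
  const { tip, pip } = FINGERS[finger];
  return hand[tip].y < hand[pip].y;
}

function classifyGesture(hand: Landmark[]): Gesture {
  // MediaPipe Hands reports exactly 21 landmarks per detected hand.
  if (hand.length !== 21) return "idle";
  const index = isExtended(hand, "index");
  const middle = isExtended(hand, "middle");
  const ring = isExtended(hand, "ring");
  const pinky = isExtended(hand, "pinky");

  if (index && middle && ring && pinky) return "erase"; // open palm
  if (index && middle && !ring && !pinky) return "hover"; // two fingers: move without drawing
  if (index && !middle && !ring && !pinky) return "draw"; // index alone: ink follows the fingertip
  return "idle";
}
```

In the browser, the landmark arrays would come from MediaPipe's hand tracker (for example, each entry of `result.landmarks` from the Tasks API `HandLandmarker`). Comparing only y coordinates keeps the per-frame check cheap but assumes an upright hand; a sturdier version could compare each fingertip's distance from the wrist instead.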

CoCo is part sketchpad, part story engine, and fully collaborative.

You can draw with friends in real time, watching each other’s ideas unfold on a shared canvas, and CoCo itself joins that collaboration as an assistant. After drawing, users can send their sketches to Gemini to generate polished illustrations based on them. We also added a storyboard mode, where each canvas becomes a scene; with narration and transitions, CoCo weaves these scenes into an animated video. This is an especially magical feature for kids, letting them turn their ideas into living stories.
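One way to assemble storyboard scenes into a video timeline is to hold each scene for the length of its narration clip and overlap neighbors for a crossfade. The sketch below assumes that approach; the `Scene` shape, the padding, and the transition length are hypothetical, since the write-up does not describe how CoCo schedules its scenes.

```typescript
// Hypothetical scene shape: one finished canvas plus its generated narration.
interface Scene {
  id: string;
  narrationSeconds: number; // duration of the narration audio for this scene
}

interface TimelineEntry {
  id: string;
  start: number; // seconds from the start of the video
  end: number;
}

// Holds each scene for its narration plus padding, and starts each later scene
// `transitionSeconds` before the previous one ends so the two can crossfade.
function buildTimeline(
  scenes: Scene[],
  paddingSeconds = 0.5,
  transitionSeconds = 1,
): TimelineEntry[] {
  const timeline: TimelineEntry[] = [];
  let prev: TimelineEntry | undefined;
  for (const scene of scenes) {
    const duration = scene.narrationSeconds + paddingSeconds;
    // Cap the overlap so a crossfade never exceeds either scene's length.
    const overlap = prev ? Math.min(transitionSeconds, prev.end - prev.start, duration) : 0;
    const start = prev ? prev.end - overlap : 0;
    const entry: TimelineEntry = { id: scene.id, start, end: start + duration };
    timeline.push(entry);
    prev = entry;
  }
  return timeline;
}
```

With two scenes whose narration runs 3 s and 4.5 s, the defaults start the second scene at 2.5 s and give the video a total length of 7.5 s.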

How we built it

We built a responsive frontend with React and TypeScript, using lucide-react icons and Tailwind CSS for a clean, intuitive UI. Our custom canvas component lets users draw fluidly with hand gestures detected by MediaPipe. For real-time collaboration, we implemented WebSocket-based sync on a Node.js backend, so multiple users can co-create on the same canvas simultaneously.
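The server side of real-time sync can be as simple as a per-canvas room that stores the drawing history and relays each message to every other participant. The sketch below models that relay without a networking library; the message protocol and the `CanvasRoom` class are hypothetical, and the `send` callbacks stand in for WebSocket connections.

```typescript
// Hypothetical wire protocol for canvas events.
type CanvasMessage =
  | { type: "stroke"; strokeId: string; points: [number, number][]; color: string }
  | { type: "clear" };

// Anything that can deliver a string to one client; in Node.js this would wrap a WebSocket's send().
type Send = (data: string) => void;

// One shared canvas: remembers its history and relays messages between participants.
class CanvasRoom {
  private clients = new Map<string, Send>();
  private history: CanvasMessage[] = [];

  // Replay the history so a late joiner starts from the current canvas.
  join(clientId: string, send: Send): void {
    this.clients.set(clientId, send);
    for (const msg of this.history) send(JSON.stringify(msg));
  }

  leave(clientId: string): void {
    this.clients.delete(clientId);
  }

  // Update the history, then forward the message to everyone except the sender,
  // which has already drawn it locally.
  publish(fromId: string, msg: CanvasMessage): void {
    if (msg.type === "clear") {
      this.history = []; // a cleared canvas needs no replay
    } else {
      this.history.push(msg);
    }
    const data = JSON.stringify(msg);
    this.clients.forEach((send, id) => {
      if (id !== fromId) send(data);
    });
  }
}
```

With the `ws` package on Node.js, each connection's `send` method would be passed to `join`, incoming `message` events would be parsed and handed to `publish`, and the `close` event would call `leave`. Replaying history on join lets a late arrival see the current canvas without a separate snapshot request.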

To power CoCo’s AI assistant, we used a Flask backend that calls Gemini for image and text generation and ElevenLabs for voice narration in storyboard mode. This pipeline turns rough sketches into polished illustrations and narrated stories while keeping the user’s creative intent at the center of the process.

We use Auth0 for authentication, with a playful twist: instead of a traditional CAPTCHA, we built a custom “not a bot” gesture challenge with Auth0 that requires users to perform specific hand motions to gain access. It’s a creative layer of security that matches the interactive spirit of CoCo.
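A gesture challenge like this can be reduced to two steps: pick a random gesture sequence, then check that the per-frame classifier output contains each gesture, in order, held steadily for several frames. The gesture names, the hold length, and both functions are assumptions for illustration; the write-up does not say how CoCo's challenge is scored or how it hooks into Auth0.

```typescript
// Hypothetical challenge vocabulary; each label would come from a per-frame gesture classifier.
type ChallengeGesture = "open_palm" | "fist" | "peace" | "point";

const CHALLENGE_OPTIONS: ChallengeGesture[] = ["open_palm", "fist", "peace", "point"];

// Builds a random sequence with no gesture repeated back to back,
// so each step is a visible change of hand pose.
function makeChallenge(length: number, rand: () => number = Math.random): ChallengeGesture[] {
  const seq: ChallengeGesture[] = [];
  while (seq.length < length) {
    const choices = CHALLENGE_OPTIONS.filter((g) => g !== seq[seq.length - 1]);
    seq.push(choices[Math.floor(rand() * choices.length)]);
  }
  return seq;
}

// Passes once every challenge gesture has been held, in order, for at least
// `holdFrames` consecutive frames. `null` means no hand or no recognized gesture.
function passesChallenge(
  challenge: ChallengeGesture[],
  frames: (ChallengeGesture | null)[],
  holdFrames = 10,
): boolean {
  let step = 0;
  let run = 0; // consecutive frames matching the current step
  for (const frame of frames) {
    if (step === challenge.length) break;
    if (frame === challenge[step]) {
      run += 1;
      if (run >= holdFrames) {
        step += 1;
        run = 0;
      }
    } else {
      run = 0;
    }
  }
  return step === challenge.length;
}
```

Randomizing the sequence makes a recording of an earlier attempt unlikely to match, and requiring a steady hold filters out single-frame misclassifications.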

Challenges we ran into

Integrating multiple complex features (real-time collaboration, gesture-based drawing, AI-enhanced rendering, and video story generation) into a unified React frontend was one of the most challenging, and most rewarding, aspects of the project. Managing shared state across users in real time while keeping the UX smooth required careful coordination between WebSocket events, React state, and custom canvas logic.
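On the client, one way to keep shared state consistent is a pure reducer that treats local and remote strokes alike, drops duplicates (such as a stroke echoed back by the server), and sorts strokes by a key every client computes the same way. The `Stroke` shape and the timestamp-then-client ordering are assumptions for this sketch, not CoCo's documented design.

```typescript
// Hypothetical stroke record shared between clients.
interface Stroke {
  id: string; // unique per stroke, e.g. clientId plus a local counter
  clientId: string;
  timestamp: number; // ms since epoch on the client that drew it
  points: [number, number][];
}

type CanvasAction = { type: "add"; stroke: Stroke } | { type: "clear" };

// Total order every client computes identically: timestamp first, then clientId and id break ties.
function compareStrokes(a: Stroke, b: Stroke): number {
  if (a.timestamp !== b.timestamp) return a.timestamp - b.timestamp;
  if (a.clientId !== b.clientId) return a.clientId < b.clientId ? -1 : 1;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

// Pure reducer (never mutates `state`), usable with React's useReducer.
// Local and remote strokes go through the same "add" path.
function canvasReducer(state: Stroke[], action: CanvasAction): Stroke[] {
  switch (action.type) {
    case "clear":
      return [];
    case "add": {
      const { stroke } = action;
      // Ignore duplicates, such as our own stroke echoed back by the server.
      if (state.some((s) => s.id === stroke.id)) return state;
      return [...state, stroke].sort(compareStrokes);
    }
    default:
      return state;
  }
}
```

In React this would plug into `useReducer(canvasReducer, [])`, with the canvas redrawn from the resulting array. Client clocks can disagree, so timestamp ordering only guarantees that every client layers strokes the same way, not true creation order; a server-assigned sequence number would avoid that.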

We also had difficulty tuning gesture recognition with MediaPipe so that it stayed accurate and responsive without overwhelming the browser. Balancing client-side performance against backend calls (through Flask to Gemini and ElevenLabs) for image enhancement and narration also required careful throttling and asynchronous data handling.
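Throttling is a common way to keep per-frame work, such as running hand tracking on every video frame or sending canvas updates, from overwhelming the browser. The helper below is a generic leading-edge throttle, shown as an illustration rather than CoCo's actual code; the injectable clock exists only to make it testable.

```typescript
// Wraps `fn` so it runs at most once per `intervalMs`; calls arriving sooner are dropped.
// Returns true when the call went through, false when it was skipped.
function throttle<A extends unknown[]>(
  fn: (...args: A) => void,
  intervalMs: number,
  now: () => number = Date.now,
): (...args: A) => boolean {
  let last = -Infinity; // time of the last call that actually ran
  return (...args: A): boolean => {
    const t = now();
    if (t - last < intervalMs) return false; // too soon: skip this call
    last = t;
    fn(...args);
    return true;
  };
}
```

Dropped calls are simply skipped, which suits hand tracking, where the next frame supersedes the last. For a request whose final value matters, such as sending the finished sketch to the Flask backend, a trailing-edge variant or a debounce would be the better fit.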

Applications

Given its interactivity, engagement, and support for collaboration, we see many applications for CoCo in education, software collaboration, and entertainment.

For example, with social media, many kids today grow up as content consumers: scrolling, watching, and absorbing an endless stream of content, which can stifle their creativity. We see CoCo helping them think like creators. With gesture-based drawing and AI-powered storytelling, kids can bring their own ideas to life and watch them unfold as fun, suspenseful animated stories. It’s hands-on, imaginative, and far more engaging than passively watching videos.

Beyond that, CoCo’s real-time collaboration opens the door to educational brainstorming, classroom sketch sessions, and creative mockups for teams. We're thinking: Figma meets VR meets storytelling. The same tech can extend to game design, entertainment, and interactive workshops, empowering users of all ages to build, ideate, and play together in real time.

What's next for CoCo

We’re excited to expand CoCo into an educational and entertainment platform where kids, students, and creatives can storyboard, collaborate, and animate effortlessly. To build on the human-computer interaction side, we plan to add a voice model (most likely Gemini's multimodal live streaming) that lets users interact more directly with CoCo's suggested sketch improvements.

Further extensions include voice-to-animation, VR sketching, and gamification of certain features (for example, a real-time skribbl.io built on CoCo!). We want to better serve children's specific use cases and foster creativity for the general public, so deployment and user feedback will be critical. Overall, we are very passionate about CoCo's applications and want to get it out into the real world to see how creative people can get!
