Project Story

Inspiration

Most AI assistants today live inside a chat box. You type a prompt, the AI replies with text, and then you manually apply that output somewhere else.

But real thinking rarely happens line by line.

It happens visually — on whiteboards, diagrams, mind maps, and messy idea boards where ideas connect through space.

We wanted to build an AI that feels less like a chatbot and more like a spatial teammate that can understand and manipulate your workspace directly.

That idea became Stun.

Stun is a canvas-powered UI navigator where Google Gemini can see your workspace, understand relationships between ideas, and execute actions directly on the board — without forcing users into a chat interface.

Instead of asking AI for answers, you work with it inside the canvas.


What We Built

Stun is an infinite canvas combined with an AI action engine that allows users to interact with their workspace visually.

Users can speak or type commands such as:

  • "Turn this into a roadmap"
  • "Group related ideas"
  • "Connect these concepts"

The AI interprets the board and transforms it in real time.

Key Capabilities

Multimodal understanding
Gemini analyzes both a canvas screenshot and structured node data (nodes and edges) so it understands layout, grouping, and relationships.
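
As a rough illustration of pairing the two inputs, the sketch below bundles a screenshot and the node/edge data into a list of request parts. The `GraphNode`, `GraphEdge`, and `buildParts` names are hypothetical, and the part shape (inline image data plus text) is an assumption about how such a request could be laid out, not Stun's actual code.

```typescript
// Hypothetical graph types; field names are illustrative only.
interface GraphNode { id: string; label: string; x: number; y: number }
interface GraphEdge { source: string; target: string }

// Assumed part shape: either plain text or base64-encoded inline data.
type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

// Bundle the screenshot, the structured graph, and the user command
// into one ordered list of parts for a multimodal request.
function buildParts(
  command: string,
  screenshotBase64: string,
  nodes: GraphNode[],
  edges: GraphEdge[],
): Part[] {
  const graphJson = JSON.stringify({ nodes, edges });
  return [
    { inlineData: { mimeType: "image/png", data: screenshotBase64 } },
    { text: `Board graph (JSON): ${graphJson}` },
    { text: `User command: ${command}` },
  ];
}
```

Sending the graph alongside the image gives the model exact node IDs to reference, while the screenshot carries layout cues that are awkward to express in JSON.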

AI action planning
Instead of generating text responses, Gemini produces a structured JSON action plan describing operations such as move, create, group, connect, and zoom.
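
To make the idea of an action plan concrete, here is one way the plan could be typed. The source names the operations (move, create, group, connect, zoom) but not their fields, so every field name below is an assumption.

```typescript
// Hypothetical action plan schema; only the operation names come from the project description.
type Point = { x: number; y: number };

type CanvasAction =
  | { type: "move"; nodeId: string; to: Point }
  | { type: "create"; nodeId: string; label: string; at: Point }
  | { type: "group"; groupId: string; nodeIds: string[] }
  | { type: "connect"; from: string; to: string }
  | { type: "zoom"; level: number };

interface ActionPlan {
  actions: CanvasAction[];
}

// What a plan for "Connect these concepts" might look like.
const examplePlan: ActionPlan = {
  actions: [
    { type: "connect", from: "idea-1", to: "idea-2" },
    { type: "move", nodeId: "idea-2", to: { x: 320, y: 120 } },
    { type: "zoom", level: 1.25 },
  ],
};

// Count actions per type, e.g. for logging what a plan is about to do.
function summarize(plan: ActionPlan): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const action of plan.actions) {
    counts[action.type] = (counts[action.type] ?? 0) + 1;
  }
  return counts;
}
```

A discriminated union keyed on `type` lets the frontend switch over actions exhaustively, so an unhandled operation becomes a compile-time error rather than a silent no-op.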

Live execution
The frontend executes these actions instantly, allowing the canvas to reorganize itself in real time.
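
A minimal sketch of the execution step, assuming the board is held as plain immutable state (for example in a Zustand store). The `Board` shape and the three action types shown are hypothetical simplifications.

```typescript
type Point = { x: number; y: number };

// Simplified, hypothetical subset of the action types.
type Action =
  | { type: "move"; nodeId: string; to: Point }
  | { type: "create"; nodeId: string; label: string; at: Point }
  | { type: "connect"; from: string; to: string };

interface Board {
  nodes: Record<string, { label: string; position: Point }>;
  edges: { from: string; to: string }[];
}

// Apply a single action and return a new board; the input is never mutated.
function applyAction(board: Board, action: Action): Board {
  switch (action.type) {
    case "move": {
      const node = board.nodes[action.nodeId];
      if (!node) return board; // unknown node: ignore the action
      return {
        ...board,
        nodes: { ...board.nodes, [action.nodeId]: { ...node, position: action.to } },
      };
    }
    case "create":
      return {
        ...board,
        nodes: {
          ...board.nodes,
          [action.nodeId]: { label: action.label, position: action.at },
        },
      };
    case "connect":
      return { ...board, edges: [...board.edges, { from: action.from, to: action.to }] };
  }
}

// Run the whole plan in order.
function applyPlan(board: Board, actions: Action[]): Board {
  return actions.reduce(applyAction, board);
}
```

Returning a fresh board for each action keeps the previous state available, which makes undo and side-by-side comparison straightforward.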

Hybrid canvas architecture
We combine three systems:

  • TLDraw → infinite canvas workspace
  • Excalidraw → drawing and visual editing
  • React Flow → structured knowledge graph for AI reasoning

This creates a workspace that works naturally for humans while remaining machine-readable for AI.

Voice + text interaction
Users can interact with the system using voice commands or text prompts, which are translated into spatial updates.

Real-time collaboration
Firestore synchronizes board state across users so teams can think and build together.


What We Learned

Building spatial AI systems revealed several key insights.

Large language models struggle with spatial reasoning when given only text. We found that pairing visual input (a screenshot) with structured graph data made the generated action plans far more reliable.

Users expect AI interactions to feel instant. Achieving a smooth experience required keeping the entire pipeline — screenshot capture, AI inference, and canvas updates — under roughly one second.

We also learned that Firestore's 1 MiB (1,048,576 bytes) per-document size limit forces careful data modeling when working with large collaborative boards.
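
One common way to stay under a per-document size cap is to split a board's nodes across several documents. The sketch below shows that idea under stated assumptions: `chunkBySize` and the 900,000-byte budget are hypothetical, and the JSON byte count is only an approximation of how Firestore measures document size.

```typescript
// Leave headroom below the 1 MiB (1,048,576 bytes) document limit.
const MAX_CHUNK_BYTES = 900_000;

// Approximate an item's stored size by its UTF-8 encoded JSON length.
function byteSize(value: unknown): number {
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

// Greedily pack items into chunks whose estimated size stays within maxBytes.
// An item larger than maxBytes still gets a chunk of its own.
function chunkBySize<T>(items: T[], maxBytes: number = MAX_CHUNK_BYTES): T[][] {
  const chunks: T[][] = [];
  let current: T[] = [];
  let currentBytes = 0;
  for (const item of items) {
    const size = byteSize(item);
    if (current.length > 0 && currentBytes + size > maxBytes) {
      chunks.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(item);
    currentBytes += size;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Each chunk could then be written as its own document (or the nodes stored individually in a subcollection), so no single write approaches the limit.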

Finally, synchronizing multiple canvas systems in real time is complex and requires thoughtful state management and conflict resolution.


Challenges We Faced

One of the biggest challenges was keeping three different canvas layers synchronized (TLDraw, Excalidraw, and React Flow) while allowing AI to manipulate the board without interrupting user input.

Another challenge was LLM hallucination. Sometimes Gemini would generate actions that didn’t make sense. To address this, we built a validation layer that checks and sanitizes every AI action plan before execution.
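
The source does not list the validation rules, so the sketch below only illustrates the general shape such a layer could take. The specific checks (node IDs must exist on the board, coordinates must be finite, zoom is clamped to an arbitrary 0.1 to 4 range) and the `sanitizePlan` name are assumptions.

```typescript
type Point = { x: number; y: number };

// Hypothetical subset of actions the validator accepts.
type Action =
  | { type: "move"; nodeId: string; to: Point }
  | { type: "connect"; from: string; to: string }
  | { type: "zoom"; level: number };

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

function isPoint(v: unknown): v is Point {
  if (!isRecord(v)) return false;
  const x = v["x"];
  const y = v["y"];
  return typeof x === "number" && Number.isFinite(x) &&
    typeof y === "number" && Number.isFinite(y);
}

// Keep only well-formed actions that reference nodes present on the board.
function sanitizePlan(raw: unknown, knownNodeIds: ReadonlySet<string>): Action[] {
  if (!isRecord(raw)) return [];
  const list = raw["actions"];
  if (!Array.isArray(list)) return [];
  const safe: Action[] = [];
  for (const entry of list as unknown[]) {
    if (!isRecord(entry)) continue;
    const type = entry["type"];
    const nodeId = entry["nodeId"];
    const from = entry["from"];
    const to = entry["to"];
    const level = entry["level"];
    if (type === "move" && typeof nodeId === "string" &&
        knownNodeIds.has(nodeId) && isPoint(to)) {
      safe.push({ type: "move", nodeId, to: { x: to.x, y: to.y } });
    } else if (type === "connect" && typeof from === "string" &&
        typeof to === "string" && from !== to &&
        knownNodeIds.has(from) && knownNodeIds.has(to)) {
      safe.push({ type: "connect", from, to });
    } else if (type === "zoom" && typeof level === "number" && Number.isFinite(level)) {
      // Clamp rather than reject, so a slightly-off zoom still does something sensible.
      safe.push({ type: "zoom", level: Math.min(4, Math.max(0.1, level)) });
    }
  }
  return safe;
}
```

Treating the model output as `unknown` and rebuilding each action field by field means stray or unexpected properties never reach the canvas.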

Performance was another critical challenge. The full round trip of capturing a screenshot, sending it to Gemini, and applying the resulting updates had to stay under about one second for the interaction to feel natural.

Finally, enabling real-time collaboration meant ensuring the system remained consistent even when multiple users issued commands simultaneously.
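
The source does not say how simultaneous commands are reconciled. One possible approach, shown here purely as an illustration, is optimistic concurrency: each plan records the board version it was computed against and is rejected if the board has changed since. All names below are hypothetical.

```typescript
// A plan remembers which board version the AI saw when it produced the actions.
interface PlanRequest { baseVersion: number; actions: string[] }

interface BoardState { version: number; log: string[] }

type CommitResult =
  | { ok: true; board: BoardState }
  | { ok: false; reason: "stale"; currentVersion: number };

// Accept the plan only if nobody else has changed the board in the meantime.
function commitPlan(board: BoardState, req: PlanRequest): CommitResult {
  if (req.baseVersion !== board.version) {
    return { ok: false, reason: "stale", currentVersion: board.version };
  }
  return {
    ok: true,
    board: { version: board.version + 1, log: [...board.log, ...req.actions] },
  };
}
```

A rejected caller can capture a fresh screenshot and re-run the command against the current board. With Firestore, the version check and the write would need to happen inside a transaction to be atomic.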


Built With

Languages & Frameworks

  • TypeScript
  • Next.js
  • Express
  • React
  • Zustand
  • Bun

Cloud & AI

  • Google Cloud Run
  • Firestore (real-time database)
  • Google Gemini 2.5 Flash (GenAI SDK)
  • Firebase Authentication + JWT

Canvas & Interface

  • TLDraw (infinite canvas workspace)
  • Excalidraw (visual editing tools)
  • React Flow (knowledge graph structure)
  • html2canvas (canvas screenshot capture)
  • Web Speech API (voice commands)

Try It Out

Live Demo
https://stun-frontend-dev-ees5yh3pua-uc.a.run.app

Backend health check
https://stun-backend-dev-ees5yh3pua-uc.a.run.app/health

Source Code
https://github.com/Invariants0/Stun

Blog
https://medium.com/@mdatharjamalmakki/we-built-an-ai-that-lives-inside-a-canvas-not-a-chatbox-c6c14b66ec56
