⚡ SAGA — The World's First Living Multimodal Story Engine

Built for the Gemini Live Agent Challenge 2026 · Creative Storyteller Category

SAGA is a cinematic story universe engine where prose, illustrations, narration, ambient score, live voice direction, and persistent world memory flow together in one manuscript. It is designed as an agent, not a chatbot: Gemini Live listens, reasons as a co-author, and can autonomously trigger the next story movement with GENERATING: directions.

Live demo: https://saga-frontend-172547633566.us-central1.run.app
GitHub: https://github.com/Shreyp087/SAGA

Why SAGA

Most AI storytelling tools still act like text boxes. SAGA treats story creation as a living system:

  • See: inline illustrations with consistent character visual profiles
  • Hear: narration and ambient score appear inside the same manuscript flow
  • Speak: Gemini Live acts as a voice co-author, not just transcription
  • Remember: Firestore + Qdrant keep the story world persistent between visits

Architecture Diagram

Mermaid source lives in docs/architecture/SAGA-Architecture.md.

graph TB
    User["User\n(Browser)"] -->|Voice/Text| Frontend["Next.js Frontend\n(Cloud Run)"]
    Frontend <-->|WebSocket| Backend["FastAPI Backend\n(Cloud Run)"]
    Frontend <-->|WebSocket| LiveProxy["Gemini Live Proxy\n(FastAPI /ws/live)"]

    Backend --> GeminiFlash["Gemini 2.0 Flash\n(Story Engine)"]
    Backend --> Imagen["Imagen 4\n(Illustrations)"]
    Backend --> Veo["Veo 2\n(Cinematic Clips)"]
    Backend --> TTS["Gemini TTS\n(Narration)"]
    Backend --> Lyria["Lyria 2\n(Ambient Music)"]

    LiveProxy --> GeminiLive["Gemini Live API\n(Voice Co-Author)"]

    Backend --> Firestore["Firestore\n(Persistent World)"]
    Backend --> GCS["Cloud Storage\n(Media Files)"]
    Backend --> SecretMgr["Secret Manager\n(API Keys)"]
    Backend --> Qdrant["Qdrant Cloud\n(Vector Memory)"]

    GeminiFlash -.->|ADK Orchestration| ADK["Google ADK\n(Agent Framework)"]

Feature Matrix

Six Google AI models and services. One story engine.

| Capability | Google Model / Service | What SAGA does with it |
| --- | --- | --- |
| Story generation | Gemini 2.0 Flash | Writes the next story section with interleaved tags for media, world state, and continuity |
| Live co-authoring | Gemini Live API | Runs a bidirectional voice conversation and emits GENERATING: to trigger autonomous story continuation |
| Illustrations | Imagen 4 | Creates inline 16:9 scene illustrations derived from the immediately preceding passage |
| Cinematic clips | Veo 2 | Generates short scene transitions when a beat deserves motion |
| Narration | Gemini TTS | Adds voiced story passages inline with the manuscript |
| Ambient score | Lyria 2 | Composes scene-level audio beds for major turns |
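
As an illustration of the GENERATING: trigger described in the table, here is a minimal sketch of how a proxy could pull a story direction out of one transcript line. The helper name `extract_generating_direction` and the matching rule (one marker per line, everything after it is the direction) are assumptions; the actual logic in backend/app/api/live.py may differ.

```python
from typing import Optional

GENERATING_PREFIX = "GENERATING:"  # marker named in this README


def extract_generating_direction(transcript_line: str) -> Optional[str]:
    """Return the direction text after GENERATING:, or None if absent.

    Hypothetical helper: assumes the marker appears at most once per line
    and that everything after it is the story direction.
    """
    _, found, rest = transcript_line.partition(GENERATING_PREFIX)
    if not found:
        return None
    direction = rest.strip()
    # Treat a bare marker with no text after it as "no direction".
    return direction or None


print(extract_generating_direction("Great idea. GENERATING: the storm reaches the harbor"))
print(extract_generating_direction("What should the captain fear most?"))
```

A line without the marker returns None, so ordinary clarifying questions from the co-author would pass through without triggering generation.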

Awards-Oriented Product Highlights

  • Interleaved manuscript: text, images, audio, video, and score appear in one stream instead of separate tabs.
  • Gemini Live as an agent: the co-author asks clarifying questions, then autonomously triggers new story generation.
  • Persistent world return: close the browser, return later, and SAGA welcomes you back with story-specific characters and locations.
  • Character Visual Bible: generated character profiles are injected into every illustration prompt for visual consistency.
  • 3D world globe: locations and story connections appear in a live world atlas as the manuscript evolves.
  • Story Bible export: a formatted PDF chronicle with manuscript sections, character archive pages, and story metadata.
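
The Character Visual Bible idea above can be sketched as a small prompt builder. This is an illustration only: the `CharacterProfile` shape, the `build_illustration_prompt` name, and the prompt wording are assumptions, not SAGA's actual prompt format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterProfile:
    """Hypothetical shape of one Character Visual Bible entry."""
    name: str
    appearance: str


def build_illustration_prompt(passage: str, profiles: list[CharacterProfile]) -> str:
    """Prepend every stored character profile to the scene prompt."""
    if not profiles:
        return f"Scene (16:9): {passage}"
    references = "\n".join(f"{p.name}: {p.appearance}" for p in profiles)
    return f"Character references:\n{references}\n\nScene (16:9): {passage}"


bible = [CharacterProfile("Mira", "silver braid, green oilskin coat")]
print(build_illustration_prompt("Mira climbs the lighthouse stairs.", bible))
```

Sending the same profile text with every request is what would keep a character's look stable from one illustration to the next.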

Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | Next.js 15, React 19, Framer Motion, Zustand |
| Backend | FastAPI, WebSockets, Structlog |
| Primary Google SDK | google-genai |
| Agent Layer | Google ADK (backend/app/agents/saga_adk_agent.py) |
| Live Voice | Gemini Live API |
| Persistence | Firestore |
| Media Storage | Google Cloud Storage |
| Vector Memory | Qdrant Cloud |
| Deployment | Cloud Run, Artifact Registry, Secret Manager |
| Infrastructure as Code | Terraform |
| PDF Export | WeasyPrint |

Quick Start (Local)

Prerequisites

  • Python 3.12+
  • Node.js 22+
  • Docker Desktop + Docker Compose
  • Google Cloud SDK (gcloud)
  • Gemini API key from https://aistudio.google.com
  • Optional: Qdrant Cloud URL + API key for remote vector memory

Clone and install

git clone https://github.com/Shreyp087/SAGA.git
cd SAGA
make setup

Configure environment

cp backend/.env.example backend/.env
cp frontend/.env.local.example frontend/.env.local

Fill in at least:

  • backend/.env: GOOGLE_API_KEY, GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_REGION, GCS_BUCKET_NAME, QDRANT_URL
  • frontend/.env.local: NEXT_PUBLIC_BACKEND_URL, NEXT_PUBLIC_WS_URL
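
Before starting the stack, it can help to confirm that the minimum backend settings are present. The sketch below checks a mapping of settings against the names listed above. The helper `missing_vars` is hypothetical and is not part of the repository.

```python
from typing import Mapping

# Names copied from the "Fill in at least" list for backend/.env.
REQUIRED_BACKEND_VARS = (
    "GOOGLE_API_KEY",
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_CLOUD_REGION",
    "GCS_BUCKET_NAME",
    "QDRANT_URL",
)


def missing_vars(env: Mapping[str, str], required=REQUIRED_BACKEND_VARS) -> list[str]:
    """Return the required names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


# Example with a partially filled configuration.
print(missing_vars({"GOOGLE_API_KEY": "abc", "QDRANT_URL": "http://localhost:6333"}))
```

Passing `os.environ` instead of a literal dict would check the current shell, provided the values from backend/.env have been exported into it.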

Run the full stack

make quickstart

Expected local URLs:

  • Frontend: http://localhost:3000
  • Backend API: http://localhost:8000
  • Backend health: http://localhost:8000/health/
  • Qdrant (local compose): http://localhost:6333
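
To confirm the local URLs above respond, a small standard-library probe is enough. This is a sketch, not a script shipped with the repo. The names `LOCAL_ENDPOINTS`, `is_up`, and `report` are assumptions, and it only checks that each URL answers, not what it returns.

```python
import urllib.request

# Local URLs listed in this README.
LOCAL_ENDPOINTS = {
    "frontend": "http://localhost:3000",
    "backend health": "http://localhost:8000/health/",
    "qdrant": "http://localhost:6333",
}


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers without an HTTP or connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        # URLError and HTTPError are OSError subclasses, as are timeouts
        # and refused connections.
        return False


def report(endpoints: dict[str, str], probe=is_up) -> dict[str, bool]:
    """Probe each endpoint; `probe` is injectable so the logic can be tested offline."""
    return {name: probe(url) for name, url in endpoints.items()}
```

Calling `report(LOCAL_ENDPOINTS)` after `make quickstart` returns one True or False per service.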

Expected working state

Landing page -> Begin Your Saga -> /story
            -> WebSocket connects
            -> text streams
            -> inline image appears
            -> narration/score attach
            -> world globe updates

Cloud Deploy (Google Cloud)

Fast path

gcloud auth login
gcloud config set project YOUR_PROJECT_ID
./scripts/deploy.sh YOUR_PROJECT_ID us-central1

That script:

  1. Enables required Google Cloud APIs
  2. Syncs secrets into Secret Manager
  3. Builds and pushes backend/frontend images to Artifact Registry
  4. Deploys saga-backend to Cloud Run
  5. Deploys saga-frontend to Cloud Run using the discovered backend URL

Terraform path

cd infrastructure/terraform
terraform init
terraform apply -auto-approve \
  -var="project_id=YOUR_PROJECT_ID" \
  -var="gemini_api_key=YOUR_GEMINI_KEY" \
  -var="qdrant_url=https://YOUR-QDRANT-CLUSTER"

Environment Variables Reference

Backend

| Variable | Required | Purpose |
| --- | --- | --- |
| GOOGLE_API_KEY | Yes | Gemini / Google GenAI SDK access |
| GOOGLE_CLOUD_PROJECT | Yes | GCP project ID |
| GOOGLE_CLOUD_REGION | Yes | Region for Vertex AI / Cloud Run |
| GCS_BUCKET_NAME | Yes | Bucket for generated media |
| FIRESTORE_DATABASE | Yes | Firestore database ID, usually (default) |
| QDRANT_URL | Yes | Qdrant Cloud URL or local Qdrant URL |
| QDRANT_API_KEY | Optional | Auth for Qdrant Cloud |
| CORS_ORIGINS | Yes | Local/browser allowlist |
| CORS_ORIGIN_REGEX | Optional | Cloud Run frontend allowlist |
| GEMINI_MODEL | Optional | Defaults to gemini-2.0-flash |
| GEMINI_FALLBACK_MODEL | Optional | Defaults to gemini-2.5-flash |
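
The two model variables suggest a primary/fallback arrangement. The sketch below shows one way such a fallback could work, using the documented defaults. When SAGA actually switches models is not described in this README, so the retry-on-any-exception rule is an assumption. `call_model` is a hypothetical stand-in for the real SDK call.

```python
from typing import Callable, Mapping


def resolve_models(env: Mapping[str, str]) -> tuple[str, str]:
    """Return (primary, fallback) model names using the documented defaults."""
    primary = env.get("GEMINI_MODEL") or "gemini-2.0-flash"
    fallback = env.get("GEMINI_FALLBACK_MODEL") or "gemini-2.5-flash"
    return primary, fallback


def generate_with_fallback(
    prompt: str,
    env: Mapping[str, str],
    call_model: Callable[[str, str], str],
) -> str:
    """Try the primary model, then the fallback if the first call raises.

    `call_model(model_name, prompt)` is a hypothetical placeholder, not a
    google-genai function.
    """
    primary, fallback = resolve_models(env)
    try:
        return call_model(primary, prompt)
    except Exception:
        # Assumption: any failure of the primary model triggers the fallback.
        return call_model(fallback, prompt)
```

Passing the model call in as a parameter keeps the fallback rule testable without network access or API keys.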

Frontend

| Variable | Required | Purpose |
| --- | --- | --- |
| NEXT_PUBLIC_BACKEND_URL | Yes | HTTPS backend base URL |
| NEXT_PUBLIC_WS_URL | Yes | WSS backend base URL |
| NEXT_PUBLIC_APP_ENV | Optional | development or production |

Cost Per Story (Illustrative)

| Story mode | What runs | Approx cost |
| --- | --- | --- |
| Text only | Gemini 2.0 Flash | ~$0.002 |
| Text + image | Gemini + Imagen 4 | ~$0.06 |
| Text + narration | Gemini + TTS | ~$0.02 |
| Text + image + narration + score | Gemini + Imagen + TTS + Lyria | ~$0.12 |
| Full cinematic | Gemini + Imagen + TTS + Lyria + Veo | ~$1.06 |

These are planning estimates for hackathon-style usage, not billing guarantees.
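
To see which component dominates, the per-mode figures can be broken down into rough per-component costs. The sketch below assumes the costs are additive, which the table does not state, so the results are only as reliable as the estimates themselves.

```python
# Approximate per-story costs in USD, copied from the table above.
MODE_COST = {
    "text": 0.002,
    "text+image": 0.06,
    "text+narration": 0.02,
    "text+image+narration+score": 0.12,
    "full cinematic": 1.06,
}

# Assumption: each mode's cost is the sum of its components.
text = MODE_COST["text"]
image = MODE_COST["text+image"] - text
narration = MODE_COST["text+narration"] - text
score = MODE_COST["text+image+narration+score"] - (text + image + narration)
video = MODE_COST["full cinematic"] - MODE_COST["text+image+narration+score"]

for name, cost in [
    ("text", text),
    ("image", image),
    ("narration", narration),
    ("score", score),
    ("video", video),
]:
    print(f"{name}: ~${cost:.3f}")
```

Under that assumption, the Veo clip accounts for roughly $0.94 of the $1.06 full cinematic estimate, so skipping video is the largest single saving.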

Third-Party Integrations

| Integration | Role in SAGA | License / Note |
| --- | --- | --- |
| Pollinations.ai | Emergency image fallback if Imagen fails | External fallback service |
| Three.js r128 via cdnjs | 3D globe rendering in the live world atlas | MIT |
| Framer Motion | Cinematic UI motion and media reveals | MIT |
| Zustand | Global story/session state management | MIT |
| WeasyPrint | Story Bible PDF rendering | BSD |

Repository Guide

  • backend/app/agents/story_agent.py: interleaved multimodal story orchestration
  • backend/app/agents/saga_adk_agent.py: explicit ADK agent surface and tool registration
  • backend/app/api/live.py: Gemini Live proxy and GENERATING: trigger path
  • backend/app/api/websocket.py: story streaming, resume, and media restoration
  • backend/app/services/firestore_service.py: persistent worlds and cinematic welcome-back generation
  • frontend/src/components/story/StoryCanvas.tsx: inline multimodal manuscript renderer
  • frontend/src/components/story/WorldMap.tsx: live 3D world atlas

Hackathon Submission Context

SAGA was built specifically for the Gemini Live Agent Challenge 2026 in the Creative Storyteller category. The repo includes:

  • ADK agent definition
  • Cloud Run deployment automation
  • Terraform IaC
  • architecture diagram
  • demo script
  • Devpost submission draft
  • blog post draft
  • judge-facing deployment proof

Hashtag

#GeminiLiveAgentChallenge
