⚡ SAGA — The World's First Living Multimodal Story Engine

Built for the Gemini Live Agent Challenge 2026 · Creative Storyteller Category

SAGA is a cinematic story universe engine where prose, illustrations, narration, ambient score, live voice direction, and persistent world memory flow together in one manuscript. It is designed as an agent, not a chatbot: Gemini Live listens, reasons as a co-author, and can autonomously trigger the next story section by emitting a GENERATING: directive.

Live demo: https://saga-frontend-172547633566.us-central1.run.app
GitHub: https://github.com/Shreyp087/SAGA

Why SAGA

Most AI storytelling tools still act like text boxes. SAGA treats story creation as a living system:

See: inline illustrations with consistent character visual profiles
Hear: narration and ambient score appear inside the same manuscript flow
Speak: Gemini Live acts as a voice co-author, not just transcription
Remember: Firestore + Qdrant keep the story world persistent between visits

Architecture Diagram

Mermaid source lives in docs/architecture/SAGA-Architecture.md.

graph TB
    User["User\n(Browser)"] -->|Voice/Text| Frontend["Next.js Frontend\n(Cloud Run)"]
    Frontend <-->|WebSocket| Backend["FastAPI Backend\n(Cloud Run)"]
    Frontend <-->|WebSocket| LiveProxy["Gemini Live Proxy\n(FastAPI /ws/live)"]

    Backend --> GeminiFlash["Gemini 2.0 Flash\n(Story Engine)"]
    Backend --> Imagen["Imagen 4\n(Illustrations)"]
    Backend --> Veo["Veo 2\n(Cinematic Clips)"]
    Backend --> TTS["Gemini TTS\n(Narration)"]
    Backend --> Lyria["Lyria 2\n(Ambient Music)"]

    LiveProxy --> GeminiLive["Gemini Live API\n(Voice Co-Author)"]

    Backend --> Firestore["Firestore\n(Persistent World)"]
    Backend --> GCS["Cloud Storage\n(Media Files)"]
    Backend --> SecretMgr["Secret Manager\n(API Keys)"]
    Backend --> Qdrant["Qdrant Cloud\n(Vector Memory)"]

    GeminiFlash -.->|ADK Orchestration| ADK["Google ADK\n(Agent Framework)"]

Feature Matrix

5 Google AI Models. One Story Engine.

Capability	Google Model / Service	What SAGA does with it
Story generation	Gemini 2.0 Flash	Writes the next story section with interleaved tags for media, world state, and continuity
Live co-authoring	Gemini Live API	Runs a bidirectional voice conversation and emits `GENERATING:` to trigger autonomous story continuation
Illustrations	Imagen 4	Creates inline 16:9 scene illustrations derived from the immediately preceding passage
Cinematic clips	Veo 2	Generates short scene transitions when a beat deserves motion
Narration	Gemini TTS	Adds voiced story passages inline with the manuscript
Ambient score	Lyria 2	Composes scene-level audio beds for major turns

Awards-Oriented Product Highlights

Interleaved manuscript: text, images, audio, video, and score appear in one stream instead of separate tabs.
Gemini Live as an agent: the co-author asks clarifying questions, then autonomously triggers new story generation.
Persistent world return: close the browser, return later, and SAGA welcomes you back with story-specific characters and locations.
Character Visual Bible: generated character profiles are injected into every illustration prompt for visual consistency.
3D world globe: locations and story connections appear in a live world atlas as the manuscript evolves.
Story Bible export: a formatted PDF chronicle with manuscript sections, character archive pages, and story metadata.
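
The Character Visual Bible idea above amounts to prepending stable appearance notes to each illustration prompt. The following is a minimal sketch under stated assumptions: `CharacterProfile`, its `appearance` field, and `build_illustration_prompt` are hypothetical names, and filtering to characters mentioned in the scene is a choice made for this example rather than SAGA's documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterProfile:
    name: str
    appearance: str  # hypothetical field: a short, stable visual description


def build_illustration_prompt(scene: str, profiles: list[CharacterProfile]) -> str:
    """Combine the scene text with the profile of each character it mentions."""
    scene_lower = scene.lower()
    present = [p for p in profiles if p.name.lower() in scene_lower]
    lines = [f"Scene: {scene.strip()}", "Aspect ratio: 16:9"]
    for profile in present:
        # Repeating the same description on every prompt keeps the look consistent.
        lines.append(f"Character {profile.name}: {profile.appearance}")
    return "\n".join(lines)
```

Because the profile text is reused verbatim each time, the image model sees the same description of a character on every call.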

Tech Stack

Layer	Technology
Frontend	Next.js 15, React 19, Framer Motion, Zustand
Backend	FastAPI, WebSockets, Structlog
Primary Google SDK	`google-genai`
Agent Layer	Google ADK (`backend/app/agents/saga_adk_agent.py`)
Live Voice	Gemini Live API
Persistence	Firestore
Media Storage	Google Cloud Storage
Vector Memory	Qdrant Cloud
Deployment	Cloud Run, Artifact Registry, Secret Manager
Infrastructure as Code	Terraform
PDF Export	WeasyPrint

Quick Start (Local)

Prerequisites

Python 3.12+
Node.js 22+
Docker Desktop + Docker Compose
Google Cloud SDK (gcloud)
Gemini API key from https://aistudio.google.com
Optional: Qdrant Cloud URL + API key for remote vector memory

Clone and install

git clone https://github.com/Shreyp087/SAGA.git
cd SAGA
make setup

Configure environment

cp backend/.env.example backend/.env
cp frontend/.env.local.example frontend/.env.local

Fill in at least:

backend/.env: GOOGLE_API_KEY, GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_REGION, GCS_BUCKET_NAME, QDRANT_URL
frontend/.env.local: NEXT_PUBLIC_BACKEND_URL, NEXT_PUBLIC_WS_URL
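
Before starting the stack, it can help to confirm that the minimum backend keys listed above are actually set. The script below is a small sketch, not part of the repository: `parse_env` and `missing_keys` are hypothetical helpers, and the parser only handles plain `KEY=VALUE` lines.

```python
from pathlib import Path

# Keys this README says to fill in at minimum for the backend.
REQUIRED_BACKEND_KEYS = [
    "GOOGLE_API_KEY",
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_CLOUD_REGION",
    "GCS_BUCKET_NAME",
    "QDRANT_URL",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_file = Path("backend/.env")
    if env_file.exists():
        print("Missing:", missing_keys(env_file.read_text(), REQUIRED_BACKEND_KEYS))
```

Run from the repository root, it prints an empty list when every key has a value.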

Run the full stack

make quickstart

Expected local URLs:

Frontend: http://localhost:3000
Backend API: http://localhost:8000
Backend health: http://localhost:8000/health/
Qdrant (local compose): http://localhost:6333
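
To confirm the URLs above are answering, a short standard-library probe is enough. This is a sketch rather than a script shipped with SAGA: `http_status` and `check_services` are hypothetical names, and treating any status below 500 as "up" is an assumption made for the example.

```python
import urllib.error
import urllib.request
from typing import Callable

LOCAL_URLS = {
    "frontend": "http://localhost:3000",
    "backend": "http://localhost:8000",
    "health": "http://localhost:8000/health/",
    "qdrant": "http://localhost:6333",
}


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0


def check_services(probe: Callable[[str], int] = http_status) -> dict[str, bool]:
    """Map each service name to True when its URL answers with a status below 500."""
    return {name: 0 < probe(url) < 500 for name, url in LOCAL_URLS.items()}
```

Calling `check_services()` with no arguments after `make quickstart` probes the real URLs; passing a custom `probe` makes the function easy to test without a running stack.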

Expected working state

Landing page -> Begin Your Saga -> /story
            -> WebSocket connects
            -> text streams
            -> inline image appears
            -> narration/score attach
            -> world globe updates

Cloud Deploy (Google Cloud)

Fast path

gcloud auth login
gcloud config set project YOUR_PROJECT_ID
./scripts/deploy.sh YOUR_PROJECT_ID us-central1

That script:

Enables required Google Cloud APIs
Syncs secrets into Secret Manager
Builds and pushes backend/frontend images to Artifact Registry
Deploys saga-backend to Cloud Run
Deploys saga-frontend to Cloud Run using the discovered backend URL

Terraform path

cd infrastructure/terraform
terraform init
terraform apply -auto-approve \
  -var="project_id=YOUR_PROJECT_ID" \
  -var="gemini_api_key=YOUR_GEMINI_KEY" \
  -var="qdrant_url=https://YOUR-QDRANT-CLUSTER"

Relevant files: scripts/deploy.sh (fast path) and infrastructure/terraform/ (Terraform path).

Environment Variables Reference

Backend

Variable	Required	Purpose
`GOOGLE_API_KEY`	Yes	Gemini / Google GenAI SDK access
`GOOGLE_CLOUD_PROJECT`	Yes	GCP project id
`GOOGLE_CLOUD_REGION`	Yes	Region for Vertex AI / Cloud Run
`GCS_BUCKET_NAME`	Yes	Bucket for generated media
`FIRESTORE_DATABASE`	Yes	Usually `(default)`
`QDRANT_URL`	Yes	Qdrant Cloud URL or local Qdrant URL
`QDRANT_API_KEY`	Optional	Auth for Qdrant Cloud
`CORS_ORIGINS`	Yes	Local/browser allowlist
`CORS_ORIGIN_REGEX`	Optional	Cloud Run frontend allowlist
`GEMINI_MODEL`	Optional	Defaults to `gemini-2.0-flash`
`GEMINI_FALLBACK_MODEL`	Optional	Defaults to `gemini-2.5-flash`

Frontend

Variable	Required	Purpose
`NEXT_PUBLIC_BACKEND_URL`	Yes	Backend base URL (http://localhost:8000 locally, HTTPS on Cloud Run)
`NEXT_PUBLIC_WS_URL`	Yes	Backend WebSocket base URL (ws:// locally, wss:// on Cloud Run)
`NEXT_PUBLIC_APP_ENV`	Optional	`development` or `production`

Cost Per Story (Illustrative)

Story mode	What runs	Approx cost
Text only	Gemini 2.0 Flash	~$0.002
Text + image	Gemini + Imagen 4	~$0.06
Text + narration	Gemini + TTS	~$0.02
Text + image + narration + score	Gemini + Imagen + TTS + Lyria	~$0.12
Full cinematic	Gemini + Imagen + TTS + Lyria + Veo	~$1.06

These are planning estimates for hackathon-style usage, not billing guarantees.
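
For rough budgeting, the table can be turned into a small calculator. This is a sketch only: the mode keys such as `text_image` are labels invented for the example, and the figures are the approximate per-story costs copied from the table, not measured billing data.

```python
# Approximate per-story costs in USD, copied from the table above.
COST_PER_STORY = {
    "text_only": 0.002,
    "text_image": 0.06,
    "text_narration": 0.02,
    "text_image_narration_score": 0.12,
    "full_cinematic": 1.06,
}


def estimate_budget(story_counts: dict[str, int]) -> float:
    """Sum count * per-story cost over the requested modes, rounded to cents."""
    total = sum(COST_PER_STORY[mode] * count for mode, count in story_counts.items())
    return round(total, 2)
```

For example, 100 text-only stories, 20 text-plus-image stories, and 5 full cinematic stories come to 0.20 + 1.20 + 5.30, or about $6.70.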

Demo Video

Demo video: https://youtu.be/mdONC55NxEU
Full script: docs/DEMO-VIDEO-SCRIPT.md

Third-Party Integrations

Integration	Role in SAGA	License / Note
Pollinations.ai	Emergency image fallback if Imagen fails	External fallback service
Three.js r128 via cdnjs	3D globe rendering in the live world atlas	MIT
Framer Motion	Cinematic UI motion and media reveals	MIT
Zustand	Global story/session state management	MIT
WeasyPrint	Story Bible PDF rendering	BSD

Repository Guide

backend/app/agents/story_agent.py: interleaved multimodal story orchestration
backend/app/agents/saga_adk_agent.py: explicit ADK agent surface and tool registration
backend/app/api/live.py: Gemini Live proxy and GENERATING: trigger path
backend/app/api/websocket.py: story streaming, resume, and media restoration
backend/app/services/firestore_service.py: persistent worlds and cinematic welcome-back generation
frontend/src/components/story/StoryCanvas.tsx: inline multimodal manuscript renderer
frontend/src/components/story/WorldMap.tsx: live 3D world atlas

Hackathon Submission Context

SAGA was built specifically for the Gemini Live Agent Challenge 2026 in the Creative Storyteller category. The repo includes:

ADK agent definition
Cloud Run deployment automation
Terraform IaC
Architecture diagram
Demo script
Devpost submission draft
Blog post draft
Judge-facing deployment proof

Live Links

GitHub repository: https://github.com/Shreyp087/SAGA
Demo video: https://youtu.be/mdONC55NxEU
Blog post: https://dev.to/shreyp087/how-i-built-saga-a-living-multimodal-story-engine-with-5-google-ai-models-102b

Hashtag

#GeminiLiveAgentChallenge
