Built for the Gemini Live Agent Challenge 2026 · Creative Storyteller Category
SAGA is a cinematic story universe engine where prose, illustrations, narration, ambient score, live voice direction, and persistent world memory flow together in one manuscript. It is designed as an agent, not a chatbot: Gemini Live listens, reasons as a co-author, and can autonomously trigger the next story movement with GENERATING: directions.
Live demo: https://saga-frontend-172547633566.us-central1.run.app
GitHub: https://github.com/Shreyp087/SAGA
Most AI storytelling tools still act like text boxes. SAGA treats story creation as a living system:
- See: inline illustrations with consistent character visual profiles
- Hear: narration and ambient score appear inside the same manuscript flow
- Speak: Gemini Live acts as a voice co-author, not just transcription
- Remember: Firestore + Qdrant keep the story world persistent between visits
Mermaid source lives in docs/architecture/SAGA-Architecture.md.
graph TB
User["User\n(Browser)"] -->|Voice/Text| Frontend["Next.js Frontend\n(Cloud Run)"]
Frontend <-->|WebSocket| Backend["FastAPI Backend\n(Cloud Run)"]
Frontend <-->|WebSocket| LiveProxy["Gemini Live Proxy\n(FastAPI /ws/live)"]
Backend --> GeminiFlash["Gemini 2.0 Flash\n(Story Engine)"]
Backend --> Imagen["Imagen 4\n(Illustrations)"]
Backend --> Veo["Veo 2\n(Cinematic Clips)"]
Backend --> TTS["Gemini TTS\n(Narration)"]
Backend --> Lyria["Lyria 2\n(Ambient Music)"]
LiveProxy --> GeminiLive["Gemini Live API\n(Voice Co-Author)"]
Backend --> Firestore["Firestore\n(Persistent World)"]
Backend --> GCS["Cloud Storage\n(Media Files)"]
Backend --> SecretMgr["Secret Manager\n(API Keys)"]
Backend --> Qdrant["Qdrant Cloud\n(Vector Memory)"]
GeminiFlash -.->|ADK Orchestration| ADK["Google ADK\n(Agent Framework)"]
| Capability | Google Model / Service | What SAGA does with it |
|---|---|---|
| Story generation | Gemini 2.0 Flash | Writes the next story section with interleaved tags for media, world state, and continuity |
| Live co-authoring | Gemini Live API | Runs a bidirectional voice conversation and emits GENERATING: to trigger autonomous story continuation |
| Illustrations | Imagen 4 | Creates inline 16:9 scene illustrations derived from the immediately preceding passage |
| Cinematic clips | Veo 2 | Generates short scene transitions when a beat deserves motion |
| Narration | Gemini TTS | Adds voiced story passages inline with the manuscript |
| Ambient score | Lyria 2 | Composes scene-level audio beds for major turns |
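
The GENERATING: trigger in the Live co-authoring row can be illustrated with a small parser. This is a minimal sketch, not the code in backend/app/api/live.py. It assumes the directive arrives as a line of model transcript text that begins with GENERATING:, and the function name extract_direction is hypothetical.

```python
from typing import Optional

# Marker named in this README; the exact wire format is an assumption.
TRIGGER = "GENERATING:"


def extract_direction(transcript: str) -> Optional[str]:
    """Return the story direction after the first GENERATING: marker, or None."""
    for line in transcript.splitlines():
        stripped = line.strip()
        if stripped.startswith(TRIGGER):
            direction = stripped[len(TRIGGER):].strip()
            # An empty direction is treated as "no trigger" in this sketch.
            return direction or None
    return None
```

A proxy along these lines could forward ordinary conversation to the browser unchanged and start story generation only when extract_direction returns a non-empty string.
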
- Interleaved manuscript: text, images, audio, video, and score appear in one stream instead of separate tabs.
- Gemini Live as an agent: the co-author asks clarifying questions, then autonomously triggers new story generation.
- Persistent world return: close the browser, return later, and SAGA welcomes you back with story-specific characters and locations.
- Character Visual Bible: generated character profiles are injected into every illustration prompt for visual consistency.
- 3D world globe: locations and story connections appear in a live world atlas as the manuscript evolves.
- Story Bible export: a formatted PDF chronicle with manuscript sections, character archive pages, and story metadata.
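
The Character Visual Bible idea, where every illustration prompt carries the same character descriptions, might look roughly like the sketch below. The CharacterProfile fields, the prompt wording, and the function name build_illustration_prompt are all assumptions for illustration; the real profile schema is not shown in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterProfile:
    # Hypothetical fields; the actual schema may differ.
    name: str
    appearance: str


def build_illustration_prompt(passage: str, profiles: list[CharacterProfile]) -> str:
    """Prefix the scene passage with every known character description."""
    scene = f"Scene (16:9): {passage}"
    if not profiles:
        return scene
    reference = "\n".join(f"{p.name}: {p.appearance}" for p in profiles)
    # The same reference block is repeated on every call, which is what
    # keeps the characters visually consistent between illustrations.
    return f"Character reference:\n{reference}\n\n{scene}"
```
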
| Layer | Technology |
|---|---|
| Frontend | Next.js 15, React 19, Framer Motion, Zustand |
| Backend | FastAPI, WebSockets, Structlog |
| Primary Google SDK | google-genai |
| Agent Layer | Google ADK (backend/app/agents/saga_adk_agent.py) |
| Live Voice | Gemini Live API |
| Persistence | Firestore |
| Media Storage | Google Cloud Storage |
| Vector Memory | Qdrant Cloud |
| Deployment | Cloud Run, Artifact Registry, Secret Manager |
| Infrastructure as Code | Terraform |
| PDF Export | WeasyPrint |
- Python 3.12+
- Node.js 22+
- Docker Desktop + Docker Compose
- Google Cloud SDK (gcloud)
- Gemini API key from https://aistudio.google.com
- Optional: Qdrant Cloud URL + API key for remote vector memory
git clone https://github.com/Shreyp087/SAGA.git
cd SAGA
make setup
cp backend/.env.example backend/.env
cp frontend/.env.local.example frontend/.env.local

Fill in at least:
- backend/.env: GOOGLE_API_KEY, GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_REGION, GCS_BUCKET_NAME, QDRANT_URL
- frontend/.env.local: NEXT_PUBLIC_BACKEND_URL, NEXT_PUBLIC_WS_URL

make quickstart

Expected local URLs:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- Backend health: http://localhost:8000/health/
- Qdrant (local compose): http://localhost:6333
Landing page -> Begin Your Saga -> /story
-> WebSocket connects
-> text streams
-> inline image appears
-> narration/score attach
-> world globe updates
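
The flow above, where text streams first and media attaches inline, can be pictured as folding WebSocket messages into one ordered manuscript. The sketch below is an illustration only: the JSON shape and the event type names (text, image, narration, score) are assumptions, not the actual protocol used by backend/app/api/websocket.py.

```python
import json

# Hypothetical event types; the real message schema is not documented here.
MEDIA_TYPES = {"image", "narration", "score"}


def apply_event(manuscript: list[dict], raw_message: str) -> list[dict]:
    """Fold one JSON message into the manuscript, preserving arrival order."""
    event = json.loads(raw_message)
    kind = event.get("type")
    if kind == "text" and manuscript and manuscript[-1]["type"] == "text":
        # Consecutive text chunks extend the current passage.
        manuscript[-1]["content"] += event["content"]
    elif kind == "text" or kind in MEDIA_TYPES:
        manuscript.append({"type": kind, "content": event["content"]})
    # Unknown event types are ignored in this sketch.
    return manuscript
```

Keeping a single ordered list is what lets an image or narration clip sit directly after the passage it belongs to, rather than in a separate panel.
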
gcloud auth login
gcloud config set project YOUR_PROJECT_ID
./scripts/deploy.sh YOUR_PROJECT_ID us-central1

That script:
- Enables required Google Cloud APIs
- Syncs secrets into Secret Manager
- Builds and pushes backend/frontend images to Artifact Registry
- Deploys saga-backend to Cloud Run
- Deploys saga-frontend to Cloud Run using the discovered backend URL
cd infrastructure/terraform
terraform init
terraform apply -auto-approve \
-var="project_id=YOUR_PROJECT_ID" \
-var="gemini_api_key=YOUR_GEMINI_KEY" \
-var="qdrant_url=https://YOUR-QDRANT-CLUSTER"

Relevant files:
- scripts/deploy.sh
- infrastructure/terraform/main.tf
- infrastructure/terraform/variables.tf
- infrastructure/terraform/outputs.tf
| Variable | Required | Purpose |
|---|---|---|
| GOOGLE_API_KEY | Yes | Gemini / Google GenAI SDK access |
| GOOGLE_CLOUD_PROJECT | Yes | GCP project id |
| GOOGLE_CLOUD_REGION | Yes | Region for Vertex AI / Cloud Run |
| GCS_BUCKET_NAME | Yes | Bucket for generated media |
| FIRESTORE_DATABASE | Yes | Usually (default) |
| QDRANT_URL | Yes | Qdrant Cloud URL or local Qdrant URL |
| QDRANT_API_KEY | Optional | Auth for Qdrant Cloud |
| CORS_ORIGINS | Yes | Local/browser allowlist |
| CORS_ORIGIN_REGEX | Optional | Cloud Run frontend allowlist |
| GEMINI_MODEL | Optional | Defaults to gemini-2.0-flash |
| GEMINI_FALLBACK_MODEL | Optional | Defaults to gemini-2.5-flash |
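
A quick way to catch a half-filled backend/.env before starting the stack is to check the variables marked Yes in the backend table. This is a minimal sketch, assuming the variables are already exported into the process environment; the helper name missing_backend_vars is hypothetical and not part of the repo.

```python
import os
from typing import Mapping

# Variables marked "Yes" in the backend table.
REQUIRED_BACKEND_VARS = (
    "GOOGLE_API_KEY",
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_CLOUD_REGION",
    "GCS_BUCKET_NAME",
    "FIRESTORE_DATABASE",
    "QDRANT_URL",
    "CORS_ORIGINS",
)


def missing_backend_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_BACKEND_VARS if not env.get(name, "").strip()]
```

Calling missing_backend_vars() with no arguments inspects the current environment; passing a plain dict makes the check easy to test.
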
| Variable | Required | Purpose |
|---|---|---|
| NEXT_PUBLIC_BACKEND_URL | Yes | HTTPS backend base URL |
| NEXT_PUBLIC_WS_URL | Yes | WSS backend base URL |
| NEXT_PUBLIC_APP_ENV | Optional | development or production |
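
Because NEXT_PUBLIC_WS_URL is normally the same host as NEXT_PUBLIC_BACKEND_URL with a WebSocket scheme, one value can be derived from the other when preparing the env file. The helper below is a sketch under that assumption; to_ws_url is a hypothetical name, and a deployment that serves WebSockets from a different host would need the value set by hand.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ws_url(backend_url: str) -> str:
    """Map an http(s) base URL to the matching ws(s) base URL."""
    parts = urlsplit(backend_url)
    scheme = {"https": "wss", "http": "ws"}.get(parts.scheme)
    if scheme is None:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # Drop any trailing slash so paths can be appended consistently.
    return urlunsplit((scheme, parts.netloc, parts.path.rstrip("/"), "", ""))
```
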
| Story mode | What runs | Approx cost |
|---|---|---|
| Text only | Gemini 2.0 Flash | ~$0.002 |
| Text + image | Gemini + Imagen 4 | ~$0.06 |
| Text + narration | Gemini + TTS | ~$0.02 |
| Text + image + narration + score | Gemini + Imagen + TTS + Lyria | ~$0.12 |
| Full cinematic | Gemini + Imagen + TTS + Lyria + Veo | ~$1.06 |
These are planning estimates for hackathon-style usage, not billing guarantees.
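
To turn the table into a rough budget for a planned story, the per-mode figures can simply be summed. The sketch below assumes each figure is the cost of one generated section, which the table does not state explicitly; the mode keys and the function name estimate_story_cost are hypothetical.

```python
# Approximate USD figures copied from the cost table; the per-section
# unit is an assumption.
COST_PER_SECTION = {
    "text": 0.002,
    "text+image": 0.06,
    "text+narration": 0.02,
    "text+image+narration+score": 0.12,
    "full_cinematic": 1.06,
}


def estimate_story_cost(section_modes: list[str]) -> float:
    """Sum the estimate for each planned section; unknown modes raise KeyError."""
    return round(sum(COST_PER_SECTION[mode] for mode in section_modes), 3)
```

For example, five text-only sections, three illustrated sections, and one full cinematic section come to about $1.25, with the single Veo clip accounting for most of it.
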
- Demo video placeholder: DEMO_URL
- Full script: docs/DEMO-VIDEO-SCRIPT.md
| Integration | Role in SAGA | License / Note |
|---|---|---|
| Pollinations.ai | Emergency image fallback if Imagen fails | External fallback service |
| Three.js r128 via cdnjs | 3D globe rendering in the live world atlas | MIT |
| Framer Motion | Cinematic UI motion and media reveals | MIT |
| Zustand | Global story/session state management | MIT |
| WeasyPrint | Story Bible PDF rendering | BSD |
- backend/app/agents/story_agent.py: interleaved multimodal story orchestration
- backend/app/agents/saga_adk_agent.py: explicit ADK agent surface and tool registration
- backend/app/api/live.py: Gemini Live proxy and GENERATING: trigger path
- backend/app/api/websocket.py: story streaming, resume, and media restoration
- backend/app/services/firestore_service.py: persistent worlds and cinematic welcome-back generation
- frontend/src/components/story/StoryCanvas.tsx: inline multimodal manuscript renderer
- frontend/src/components/story/WorldMap.tsx: live 3D world atlas
SAGA was built specifically for the Gemini Live Agent Challenge 2026 in the Creative Storyteller category. The repo includes:
- ADK agent definition
- Cloud Run deployment automation
- Terraform IaC
- architecture diagram
- demo script
- Devpost submission draft
- blog post draft
- judge-facing deployment proof
- GitHub repository: https://github.com/Shreyp087/SAGA
- Demo video: https://youtu.be/mdONC55NxEU
- Blog Link: https://dev.to/shreyp087/how-i-built-saga-a-living-multimodal-story-engine-with-5-google-ai-models-102b
#GeminiLiveAgentChallenge
