This repository was archived by the owner on Jun 23, 2026. It is now read-only.

GameStudioAI/GameStudio

Enterprise Training Studio

Voice-first corporate training studio powered by Deepgram Voice Agent, Gemini (scenario design + assessment), and Browserbase (web research during creation). Learners practice through continuous voice roleplay scenarios with a single scene background and competency scoring at the end.

Run

npm install
npm run dev

Starts the Vite client at http://localhost:5173 and the API server at http://localhost:3001.

Environment (.env)

| Variable | Used for |
| --- | --- |
| DEEPGRAM_API_KEY | Voice Agent WebSocket, speech-to-text, and TTS |
| GEMINI_API_KEY | Scenario generation, runtime agent brain, image generation, and competency scoring |
| GEMINI_MODEL | Agent brain (gemini-3.1-flash-lite) |
| GEMINI_IMAGE_MODEL | Scene backgrounds and marketing thumbnails |
| BROWSERBASE_API_KEY | Web research when creating or editing simulations |
| BROWSERBASE_PROJECT_ID | Optional Browserbase project (inferred from the API key if omitted) |
| ARIZE_API_KEY | Required: exports competency assessment traces to Arize AX (OTLP) |
| ARIZE_SPACE_ID | Required: Arize space ID for trace export |
| ARIZE_PROJECT_NAME | Optional project name in Arize (default: gamestudio-learning) |
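The table implies a simple startup check: the Deepgram and Browserbase keys switch features on, while the Arize keys are marked required. A minimal sketch, assuming a hypothetical `resolveConfig` helper (not part of the documented server) and assuming GEMINI_API_KEY is also mandatory because every flow uses Gemini:

```typescript
// Hypothetical config resolver: variable names come from the table above,
// but the function, its return shape, and the "required" list are assumptions.
type Env = Record<string, string | undefined>;

interface StudioConfig {
  voiceEnabled: boolean;       // voice mode needs DEEPGRAM_API_KEY
  webResearchEnabled: boolean; // Browserbase research needs BROWSERBASE_API_KEY
  arizeProjectName: string;    // falls back to "gamestudio-learning"
  missingRequired: string[];   // required variables that are unset or blank
}

// Treat blank strings the same as unset variables.
function isSet(env: Env, key: string): boolean {
  const value = env[key];
  return value !== undefined && value.trim() !== "";
}

function resolveConfig(env: Env): StudioConfig {
  // Assumption: Gemini is needed by every flow; the Arize keys are marked Required above.
  const required = ["GEMINI_API_KEY", "ARIZE_API_KEY", "ARIZE_SPACE_ID"];
  const project = env.ARIZE_PROJECT_NAME?.trim();
  return {
    voiceEnabled: isSet(env, "DEEPGRAM_API_KEY"),
    webResearchEnabled: isSet(env, "BROWSERBASE_API_KEY"),
    arizeProjectName: project ? project : "gamestudio-learning",
    missingRequired: required.filter((key) => !isSet(env, key)),
  };
}
```

With only GEMINI_API_KEY set, this sketch reports voice and web research as disabled (matching the fallbacks described later) and lists ARIZE_API_KEY and ARIZE_SPACE_ID as missing.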

How it works

  1. Studio (voice or type): describe an enterprise training scenario
  2. Browserbase (creation): opens a cloud browser and gathers factual excerpts from the web (see below)
  3. Gemini (creation): designs a voice roleplay scenario with decision points and generates one scene background from the research
  4. Training (play): the learner has one continuous spoken conversation with a Deepgram-powered voice agent that roleplays the scenario (see below)
  5. Gemini + Arize (assessment): at the natural end of the scenario, Gemini scores the full transcript and the assessment traces are exported to Arize (see below)

Deepgram, Browserbase, and Arize

Deepgram

Deepgram powers all real-time voice and spoken audio in the app. The server does not relay or store live microphone audio: it mints short-lived tokens and builds agent configs, and the browser connects directly to Deepgram via the @deepgram/agents SDK.
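Because the tokens are short-lived, a client that reconnects needs a fresh one. A minimal sketch of a client-side cache, assuming GET /api/deepgram/token returns a token plus an expiry time; the `MintedToken` shape, the `TokenCache` class, and the 10-second refresh margin are hypothetical:

```typescript
// Hypothetical token shape: the README does not document the response of
// GET /api/deepgram/token, so `token` and `expiresAt` are assumptions.
interface MintedToken {
  token: string;
  expiresAt: number; // epoch milliseconds
}

// Caches one short-lived token and asks the server for a new one shortly
// before it expires, so a reconnect never starts with a stale credential.
class TokenCache {
  private current: MintedToken | null = null;
  private readonly mint: () => Promise<MintedToken>;
  private readonly now: () => number;
  private readonly refreshMarginMs: number;

  constructor(mint: () => Promise<MintedToken>, now: () => number = Date.now, refreshMarginMs = 10_000) {
    this.mint = mint; // e.g. a wrapper around fetch("/api/deepgram/token")
    this.now = now;
    this.refreshMarginMs = refreshMarginMs;
  }

  async get(): Promise<string> {
    let cached = this.current;
    // Re-mint when there is no token yet or it is inside the refresh margin.
    if (cached === null || cached.expiresAt - this.now() <= this.refreshMarginMs) {
      cached = await this.mint();
      this.current = cached;
    }
    return cached.token;
  }
}
```

Injecting the clock (`now`) keeps the expiry logic testable without waiting for real time to pass.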

Studio (create / edit flow): on the Studio page, the mic opens a Deepgram Voice Agent session (GET /api/deepgram/token, GET /api/deepgram/config). Deepgram handles speech-to-text and turn detection. Gemini is wired in as the agent’s “think” provider: it interprets what you said and decides which studio tool to call (create_game, edit_game, publish_game, etc.). Spoken replies are synthesized through POST /api/deepgram/speak (Aura TTS) and played in the browser, rather than through the agent’s built-in speak pipeline.
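Gemini's job in this loop is to pick a tool; the app then has to run it. A minimal dispatcher sketch, assuming each tool call arrives as a name plus a JSON arguments object; the handler bodies and argument fields (`topic`, `id`) are hypothetical placeholders:

```typescript
// Hypothetical dispatcher: the tool names come from the README, but the
// argument shape and handler signatures are illustrative assumptions.
type ToolArgs = Record<string, unknown>;
type ToolHandler = (args: ToolArgs) => Promise<string>;

interface ToolCall {
  name: string;
  args: ToolArgs;
}

function createDispatcher(handlers: Record<string, ToolHandler>) {
  return async (call: ToolCall): Promise<string> => {
    // Only own keys count, so inherited names like "toString" are rejected.
    const known = Object.prototype.hasOwnProperty.call(handlers, call.name);
    if (!known) {
      // Report unknown tools back to the model instead of crashing the session.
      return `Unknown tool: ${call.name}`;
    }
    return handlers[call.name](call.args);
  };
}

const dispatch = createDispatcher({
  create_game: async (args) => `Creating module about ${String(args.topic)}`,
  edit_game: async (args) => `Editing module ${String(args.id)}`,
  publish_game: async (args) => `Published module ${String(args.id)}`,
});
```

Returning an error string for unknown names lets the agent recover by choosing another tool instead of ending the session.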

Training (roleplay flow): on the Training page, a separate lesson agent config is built per session (POST /api/deepgram/lesson-config). Here Deepgram listens to the learner’s microphone (with end-of-turn detection) while Gemini drives an in-character roleplay: the agent speaks as the scenario character(s), pushes back on weak answers, and walks through generated decision points. When the scenario reaches a natural endpoint, the agent calls a complete_scenario tool. Agent lines are spoken via the same REST TTS endpoint so captions stay in sync with audio.
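Routing every agent line through one REST TTS endpoint makes caption timing a client-side ordering problem. A minimal sketch, assuming injected `synthesize` (a stand-in for POST /api/deepgram/speak), `play`, and `showCaption` functions; none of these names come from the codebase:

```typescript
// Hypothetical caption/audio sync: `synthesize` stands in for a call to
// POST /api/deepgram/speak and `play` for browser playback; both are injected
// so the ordering logic can be shown without a real audio stack.
interface SpeechIO<Audio> {
  synthesize: (text: string) => Promise<Audio>;
  play: (audio: Audio) => Promise<void>; // resolves when playback ends
  showCaption: (text: string) => void;
}

// Speaks agent lines strictly in order. Each caption is shown only once its
// audio is ready, right before playback starts, so text never runs ahead of speech.
async function speakLines<Audio>(lines: string[], io: SpeechIO<Audio>): Promise<void> {
  for (const line of lines) {
    const audio = await io.synthesize(line);
    io.showCaption(line);
    await io.play(audio);
  }
}
```

This version waits for each clip before showing its caption; prefetching the next line while the current one plays would shorten gaps but needs care with failed requests.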

Where it runs

| Surface | Deepgram role |
| --- | --- |
| Studio mic | STT + turn taking; Gemini decides studio actions |
| Training mic | STT + turn taking; Gemini roleplays the scenario |
| All spoken lines | Aura TTS via POST /api/deepgram/speak |

If DEEPGRAM_API_KEY is missing, voice mode is disabled but typed Studio chat and non-voice flows still work.

Browserbase

Browserbase is used only during creation and editing, not while a learner is training. When you ask Studio to create or edit a module, the server starts a Browserbase cloud browser session and connects to it with Playwright over CDP (server/browserbase.ts).

The session searches the web for material related to your topic:

  1. Wikipedia — runs a search, opens the first relevant article, and extracts a text excerpt.
  2. MDN — same pattern for technical or product-adjacent topics.
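The search-then-excerpt pattern can be illustrated without a browser. A minimal sketch, assuming the cloud browser is pointed at the public Wikipedia and MDN search pages and that excerpts are trimmed before prompting; `searchUrl`, `clipExcerpt`, and the 1,200-character limit are hypothetical, not taken from server/browserbase.ts:

```typescript
// Hypothetical helpers: the README does not show how server/browserbase.ts
// builds its search URLs or trims text, so both functions are assumptions.
type Source = "wikipedia" | "mdn";

// Public search pages a cloud browser could open for each source.
function searchUrl(source: Source, topic: string): string {
  const q = encodeURIComponent(topic.trim());
  return source === "wikipedia"
    ? `https://en.wikipedia.org/w/index.php?search=${q}`
    : `https://developer.mozilla.org/en-US/search?q=${q}`;
}

// Collapses whitespace and cuts the page text to at most `maxChars`,
// preferring to end on a sentence boundary so Gemini sees whole sentences.
function clipExcerpt(text: string, maxChars = 1200): string {
  const flat = text.replace(/\s+/g, " ").trim();
  if (flat.length <= maxChars) return flat;
  const slice = flat.slice(0, maxChars);
  const lastStop = slice.lastIndexOf(". ");
  return lastStop > 0 ? slice.slice(0, lastStop + 1) : slice;
}
```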

Those excerpts (URL, title, and body text) are formatted into research notes and passed to Gemini when generating:

  • The scenario outline (setting, persona, opening line, 3–4 decision points with 3–4 choices each)
  • Edits to an existing module (edit_game)
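Since the outline has a fixed shape, it can be checked before a module is saved. A minimal sketch, assuming Gemini returns JSON with the hypothetical field names below:

```typescript
// Hypothetical outline shape: field names are assumptions based on the
// list above (setting, persona, opening line, decision points, choices).
interface DecisionPoint {
  prompt: string;
  choices: string[];
}

interface ScenarioOutline {
  setting: string;
  persona: string;
  openingLine: string;
  decisionPoints: DecisionPoint[];
}

const inRange = (n: number, min: number, max: number): boolean => n >= min && n <= max;

// Returns a list of problems; an empty list means the outline fits the
// "3-4 decision points with 3-4 choices each" shape described above.
function validateOutline(outline: ScenarioOutline): string[] {
  const problems: string[] = [];
  if (outline.openingLine.trim() === "") problems.push("missing opening line");
  if (!inRange(outline.decisionPoints.length, 3, 4)) {
    problems.push(`expected 3-4 decision points, got ${outline.decisionPoints.length}`);
  }
  outline.decisionPoints.forEach((point, i) => {
    if (!inRange(point.choices.length, 3, 4)) {
      problems.push(`decision point ${i + 1}: expected 3-4 choices, got ${point.choices.length}`);
    }
  });
  return problems;
}
```

Collecting every problem, rather than failing on the first, gives enough detail to feed back to Gemini in a single retry prompt.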

Learners never browse the live web during training — Browserbase is a research step for accurate scenario content, similar to an instructional designer looking up source material before writing a script.

If BROWSERBASE_API_KEY is not set, generation falls back to Gemini’s general knowledge without web research.
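The two paths (research notes vs. general knowledge) can be folded into one prompt fragment. A minimal sketch, assuming a hypothetical `buildResearchNotes` helper, note layout, and fallback wording:

```typescript
// Hypothetical prompt builder: the README says excerpts carry a URL, title,
// and body text; the note layout and fallback wording here are assumptions.
interface ResearchExcerpt {
  url: string;
  title: string;
  text: string;
}

function buildResearchNotes(topic: string, excerpts: ResearchExcerpt[]): string {
  if (excerpts.length === 0) {
    // No Browserbase key (or no hits): Gemini falls back to general knowledge.
    return `No web research is available for "${topic}"; rely on general knowledge.`;
  }
  // Number each excerpt and keep its source URL next to the text.
  const notes = excerpts.map((e, i) => `[${i + 1}] ${e.title}\nSource: ${e.url}\n${e.text}`);
  return `Research notes for "${topic}":\n\n${notes.join("\n\n")}`;
}
```

Keeping the URL beside each excerpt lets the scenario prompt ask Gemini to stay close to the cited material.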

Arize

Arize receives observability traces for competency assessments, not live voice audio or Browserbase sessions. Scoring itself is done by Gemini in POST /api/learning/evaluate.

When a training session ends:

  1. The full conversation transcript and scenario decision points are sent to the evaluate endpoint.
  2. Gemini maps each decision point to what the learner said and assigns quality (strong / adequate / weak).
  3. A weighted mastery score and verdict are computed and returned to the UI.
  4. OpenTelemetry spans are emitted to Arize AX via OTLP with OpenInference attributes (input.value, output.value, openinference.span.kind, session.id).
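Steps 2 and 3 amount to a weighted average over decision points. A minimal sketch, assuming per-quality credits of 1 / 0.5 / 0, an optional per-point weight, and verdict cut-offs at 80 and 50; all of these numbers and labels are hypothetical, since the README does not publish them:

```typescript
// Hypothetical scoring: quality credits, per-point weights, and verdict
// thresholds are illustrative assumptions, not the app's actual values.
type Quality = "strong" | "adequate" | "weak";

interface DecisionResult {
  quality: Quality;
  weight?: number; // defaults to 1
}

const CREDIT: Record<Quality, number> = { strong: 1, adequate: 0.5, weak: 0 };

function masteryScore(results: DecisionResult[]): { score: number; verdict: string } {
  let earned = 0;
  let possible = 0;
  for (const r of results) {
    const w = r.weight ?? 1;
    earned += w * CREDIT[r.quality];
    possible += w;
  }
  // Score is a 0-100 percentage of the weighted credit available.
  const score = possible === 0 ? 0 : Math.round((100 * earned) / possible);
  const verdict = score >= 80 ? "mastered" : score >= 50 ? "developing" : "needs practice";
  return { score, verdict };
}
```

For example, strong, adequate, and weak answers on three equally weighted points earn 1.5 of 3 credits, a score of 50, and the verdict "developing" under these assumed thresholds.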

Trace hierarchy per session:

| Span name | Kind | Purpose |
| --- | --- | --- |
| training.session | CHAIN | Root trace for one assessment |
| llm.analyze_decisions | LLM | Internal Gemini decision-mapping call |
| llm.score_session | LLM | Internal Gemini mastery-scoring call |
| learning.decision.eval | LLM | One span per decision point; use these for the Q&A evaluator |
| training.assessment | CHAIN | Session summary |

Project filter in Arize: set ARIZE_PROJECT_NAME in .env (default gamestudio-learning) and select that project in the Arize UI.

LLM-as-a-Judge template: use Q&A with scope Span and this filter:

name = 'learning.decision.eval'

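(Arize's default preview filter, openinference.span.kind = LLM, also matches these spans, but it matches the internal llm.analyze_decisions and llm.score_session spans too, so use the name filter above.)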

| Template variable | Map to attribute |
| --- | --- |
| {input} | attributes.input.value → JSON field input (reference context + choices) |
| {question} | attributes.evaluation.question |
| {output} | attributes.output.value (learner's spoken answer) |
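Putting the mapping together, each learning.decision.eval span needs a handful of string attributes. A minimal sketch, assuming a hypothetical `decisionEvalAttributes` helper and a simple text layout for the reference context and choices inside the `input` JSON field:

```typescript
// Hypothetical attribute builder for one learning.decision.eval span. The
// attribute keys follow the table above and the OpenInference names listed
// earlier; the argument names and the text layout of `input` are assumptions.
interface DecisionEvalInput {
  sessionId: string;
  question: string;         // the decision point posed to the learner
  referenceContext: string; // scenario context Gemini judged against
  choices: string[];
  learnerAnswer: string;    // what the learner actually said
}

function decisionEvalAttributes(d: DecisionEvalInput): Record<string, string> {
  const input = `${d.referenceContext}\nChoices:\n${d.choices.map((c) => `- ${c}`).join("\n")}`;
  return {
    "openinference.span.kind": "LLM",
    "session.id": d.sessionId,
    "input.value": JSON.stringify({ input }), // {input} reads the JSON field "input"
    "evaluation.question": d.question,        // {question}
    "output.value": d.learnerAnswer,          // {output}
  };
}
```

The returned record could be passed to `span.setAttributes` on an OpenTelemetry span before the span ends.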

Do not target llm.analyze_decisions or llm.score_session — those are internal Gemini prompts, not learner Q&A.

Optionally, add a User Frustration evaluator on training.assessment spans, using the transcript in input.value.

The assessment overlay in the app is the learner-facing result. Arize is required for competency assessment — traces are flushed on every evaluation so they appear in Arize AX within ~30 seconds.

Studio agent flow

  1. Tap the mic on Studio (or use Type mode) → connects to Deepgram Voice Agent or Gemini text chat
  2. Describe a scenario — agent transcribes/responds and calls studio functions
  3. Tap again to disconnect (voice mode)

Example commands:

  • "Create a sales discovery call simulation for new enterprise AEs"
  • "Add a branch for handling pricing objections"
  • "Publish this training module"
  • "Make a B2B ad for sales enablement leaders"

Training flow

  1. Open a module from the Library → single background with voice UI overlay
  2. The agent opens with the scenario’s in-character opening line (spoken via Deepgram TTS)
  3. Learner unmutes and responds; the agent roleplays through decision points until a natural endpoint
  4. After the closing line finishes, the competency assessment overlay appears with score, feedback, transcript download, and restart
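The ordering above, where the overlay waits for the closing line rather than for the complete_scenario tool call itself, can be captured in a small state machine. A minimal sketch with hypothetical state and event names:

```typescript
// Hypothetical session state machine for the Training page: state and event
// names are assumptions; it only encodes the order described above.
type TrainingState = "opening" | "conversing" | "closing" | "assessing" | "results";

type TrainingEvent =
  | "opening_line_spoken" // agent finished the in-character opening line
  | "complete_scenario"   // agent called the complete_scenario tool
  | "closing_line_spoken" // closing line audio finished playing
  | "assessment_ready"    // evaluate endpoint returned a score
  | "restart";

const TRANSITIONS: Record<TrainingState, Partial<Record<TrainingEvent, TrainingState>>> = {
  opening: { opening_line_spoken: "conversing" },
  conversing: { complete_scenario: "closing" },
  closing: { closing_line_spoken: "assessing" },
  assessing: { assessment_ready: "results" },
  results: { restart: "opening" },
};

// Ignores events that do not apply in the current state (e.g. a stray tool call).
function nextState(state: TrainingState, event: TrainingEvent): TrainingState {
  return TRANSITIONS[state][event] ?? state;
}
```

Because "assessing" is only reachable from "closing", the assessment overlay cannot appear while the agent is still speaking its final line.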

Generated assets

| Page | Generated content |
| --- | --- |
| Training | Browserbase research → scenario outline + one Gemini scene background |
| Published | Course catalog copy + thumbnail |
| Ads | B2B ad hook + vertical image |

Training assets are written to public/trainings/{id}/. Marketing assets are served from /api/assets/:id.
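Both locations are keyed by an id, so it is worth validating ids before building paths from them. A minimal sketch, assuming ids are limited to letters, digits, hyphens, and underscores (an assumption, not a documented rule); `trainingAssetDir` and `marketingAssetUrl` are hypothetical helpers:

```typescript
// Hypothetical path helpers for the two locations above; the id format check
// is an assumption added so a module id can never escape its directory.
const SAFE_ID = /^[A-Za-z0-9_-]+$/;

function assertSafeId(id: string): void {
  if (!SAFE_ID.test(id)) throw new Error(`Invalid asset id: ${JSON.stringify(id)}`);
}

// Directory (relative to the project root) holding one training's files.
function trainingAssetDir(id: string): string {
  assertSafeId(id);
  return `public/trainings/${id}/`;
}

// URL path the API serves a marketing asset from.
function marketingAssetUrl(id: string): string {
  assertSafeId(id);
  return `/api/assets/${encodeURIComponent(id)}`;
}
```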
