Enterprise Training Studio

Voice-first corporate training studio powered by Deepgram Voice Agent, Gemini (scenario design + assessment), and Browserbase (web research during creation). Learners practice through continuous voice roleplay scenarios with a single scene background and competency scoring at the end.

Run

npm install
npm run dev

Starts the Vite client at http://localhost:5173 and the API server at http://localhost:3001.

Environment (`.env`)

Variable	Used for
`DEEPGRAM_API_KEY`	Voice Agent WebSocket, speech-to-text, and TTS
`GEMINI_API_KEY`	Scenario generation, runtime agent brain, image generation, and competency scoring
`GEMINI_MODEL`	Agent brain (`gemini-3.1-flash-lite`)
`GEMINI_IMAGE_MODEL`	Scene backgrounds and marketing thumbnails
`BROWSERBASE_API_KEY`	Researches topics on the web when creating/editing simulations
`BROWSERBASE_PROJECT_ID`	Optional Browserbase project (inferred from API key if omitted)
`ARIZE_API_KEY`	Required — competency assessment traces to Arize AX (OTLP)
`ARIZE_SPACE_ID`	Required — Arize space ID for trace export
`ARIZE_PROJECT_NAME`	Optional project name in Arize (default: `gamestudio-learning`)
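
The table above implies a small startup check. The sketch below is one way to express it, not code from this repo: `checkEnv` and `EnvReport` are hypothetical names, and treating `GEMINI_API_KEY` as required is an assumption (the table only marks the two Arize variables as required). The voice and web-research flags mirror the fallbacks described later in this README.

```typescript
// Minimal sketch: classify the .env variables listed in the table.
// Function and type names are illustrative, not taken from the repo.
type EnvMap = Record<string, string | undefined>;

interface EnvReport {
  missingRequired: string[]; // required variables that are unset or blank
  voiceEnabled: boolean;     // false when DEEPGRAM_API_KEY is absent
  webResearch: boolean;      // false when BROWSERBASE_API_KEY is absent
  arizeProject: string;      // falls back to the documented default
}

// ARIZE_* are marked "Required" in the table. GEMINI_API_KEY is an
// assumption: every generation and scoring step depends on it.
const REQUIRED_VARS = ["GEMINI_API_KEY", "ARIZE_API_KEY", "ARIZE_SPACE_ID"];

function checkEnv(env: EnvMap): EnvReport {
  const value = (key: string): string => (env[key] ?? "").trim();
  const project = value("ARIZE_PROJECT_NAME");
  return {
    missingRequired: REQUIRED_VARS.filter((key) => value(key) === ""),
    voiceEnabled: value("DEEPGRAM_API_KEY") !== "",
    webResearch: value("BROWSERBASE_API_KEY") !== "",
    arizeProject: project !== "" ? project : "gamestudio-learning",
  };
}

// Usage: a config with no Deepgram or Browserbase keys still starts,
// with voice mode and web research switched off.
const envReport = checkEnv({
  GEMINI_API_KEY: "example",
  ARIZE_API_KEY: "example",
  ARIZE_SPACE_ID: "example",
});
console.log(envReport);
```

In a real server the argument would likely be `process.env`; a plain object is used here so the sketch runs anywhere.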

How it works

Studio (voice or type): describe an enterprise training scenario
Browserbase (creation): opens a cloud browser and gathers factual excerpts from the web (see below)
Gemini (creation): designs a voice roleplay scenario with decision points and generates one scene background from the research
Training (play): the learner has one continuous spoken conversation with a Deepgram-powered voice agent that roleplays the scenario (see below)
Gemini + Arize (assessment): at the natural end of the scenario, the full transcript is scored and the assessment traces are exported to Arize (see below)

Deepgram, Browserbase, and Arize

Deepgram

Deepgram powers all real-time voice and spoken audio in the app. The server never streams raw audio to the client for storage — it mints short-lived tokens and builds agent configs; the browser connects directly to Deepgram via the @deepgram/agents SDK.

Studio (create / edit flow) — On the Studio page, the mic opens a Deepgram Voice Agent session (GET /api/deepgram/token, GET /api/deepgram/config). Deepgram handles speech-to-text and turn detection. Gemini is wired in as the agent’s “think” provider: it interprets what you said and decides which studio tool to call (create_game, edit_game, publish_game, etc.). Spoken replies are rendered client-side through POST /api/deepgram/speak (Aura TTS), not the agent’s built-in speak pipeline.

Training (roleplay flow) — On the Training page, a separate lesson agent config is built per session (POST /api/deepgram/lesson-config). Here Deepgram listens to the learner’s microphone (with end-of-turn detection) while Gemini drives an in-character roleplay: the agent speaks as the scenario character(s), pushes back on weak answers, and walks through generated decision points. When the scenario reaches a natural endpoint, the agent calls a complete_scenario tool. Agent lines are spoken via the same REST TTS endpoint so captions stay in sync with audio.

Where it runs

Surface	Deepgram role
Studio mic	STT + turn taking; Gemini decides studio actions
Training mic	STT + turn taking; Gemini roleplays the scenario
All spoken lines	Aura TTS via `POST /api/deepgram/speak`

If DEEPGRAM_API_KEY is missing, voice mode is disabled but typed Studio chat and non-voice flows still work.

Browserbase

Browserbase is used only during creation and editing, not while a learner is training. When you ask Studio to create or edit a module, the server starts a Browserbase cloud browser session and connects to it with Playwright over CDP (server/browserbase.ts).

The session searches the web for material related to your topic:

Wikipedia — runs a search, opens the first relevant article, and extracts a text excerpt.
MDN — same pattern for technical or product-adjacent topics.

Those excerpts (URL, title, and body text) are formatted into research notes and passed to Gemini when generating:

The scenario outline (setting, persona, opening line, 3–4 decision points with 3–4 choices each)
Edits to an existing module (edit_game)
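
The outline constraints listed above (setting, persona, opening line, 3 to 4 decision points with 3 to 4 choices each) can be checked mechanically before a module is saved. The sketch below is a hedged illustration: the field names, `ScenarioOutline`, and `validateOutline` are assumptions, not the repo's actual schema.

```typescript
// Hypothetical shape for a generated outline; field names are assumptions.
interface DecisionPoint {
  prompt: string;
  choices: string[]; // README: 3 to 4 choices each
}

interface ScenarioOutline {
  setting: string;
  persona: string;
  openingLine: string;
  decisionPoints: DecisionPoint[]; // README: 3 to 4 decision points
}

// Returns a list of human-readable problems; an empty list means the
// outline satisfies the documented constraints.
function validateOutline(outline: ScenarioOutline): string[] {
  const problems: string[] = [];
  const inRange = (n: number): boolean => n >= 3 && n <= 4;

  if (outline.openingLine.trim() === "") {
    problems.push("openingLine is empty");
  }
  if (!inRange(outline.decisionPoints.length)) {
    problems.push(`expected 3-4 decision points, got ${outline.decisionPoints.length}`);
  }
  outline.decisionPoints.forEach((point, index) => {
    if (!inRange(point.choices.length)) {
      problems.push(`decision point ${index + 1}: expected 3-4 choices, got ${point.choices.length}`);
    }
  });
  return problems;
}
```

A check like this is useful because model output can drift from the requested structure; rejecting or regenerating a malformed outline is cheaper than discovering the problem mid-roleplay.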

Learners never browse the live web during training — Browserbase is a research step for accurate scenario content, similar to an instructional designer looking up source material before writing a script.

If BROWSERBASE_API_KEY is not set, generation falls back to Gemini’s general knowledge without web research.
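
As a rough illustration of the research step, the sketch below turns excerpts (URL, title, body text) into a notes block for the generation prompt and covers the no-research fallback. `Excerpt`, `formatResearchNotes`, the `maxChars` budget, and the exact note layout are all assumptions; the real formatting in `server/browserbase.ts` may differ.

```typescript
// Hypothetical excerpt shape, based on the fields named in this README.
interface Excerpt {
  url: string;
  title: string;
  body: string;
}

// Formats excerpts into numbered research notes. maxChars is an assumed
// per-excerpt budget to keep the prompt small.
function formatResearchNotes(excerpts: Excerpt[], maxChars: number = 1200): string {
  if (excerpts.length === 0) {
    // Mirrors the documented fallback: no web research, general knowledge only.
    return "No web research available; rely on general knowledge.";
  }
  return excerpts
    .map((excerpt, index) => {
      // Collapse whitespace left over from page extraction, then truncate.
      const body = excerpt.body.replace(/\s+/g, " ").trim().slice(0, maxChars);
      return `[${index + 1}] ${excerpt.title}\nSource: ${excerpt.url}\n${body}`;
    })
    .join("\n\n");
}
```

Keeping the source URL next to each excerpt lets the generated scenario be traced back to where a fact came from.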

Arize

Arize receives observability traces for competency assessments, not live voice audio or Browserbase sessions. Scoring itself is done by Gemini in POST /api/learning/evaluate.

When a training session ends:

The full conversation transcript and scenario decision points are sent to the evaluate endpoint.
Gemini maps each decision point to what the learner said and assigns quality (strong / adequate / weak).
A weighted mastery score and verdict are computed and returned to the UI.
OpenTelemetry spans are emitted to Arize AX via OTLP with OpenInference attributes (input.value, output.value, openinference.span.kind, session.id).
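
To make the weighted mastery score concrete, here is a minimal sketch. The quality labels come from the steps above; the per-quality credit, the per-decision `weight` field, and the verdict thresholds are hypothetical values chosen for illustration, not the ones used by `/api/learning/evaluate`.

```typescript
type Quality = "strong" | "adequate" | "weak";

interface DecisionResult {
  id: string;
  quality: Quality;
  weight: number; // relative importance of this decision point (assumed field)
}

// Assumed credit per quality level, in percent.
const CREDIT_PERCENT: Record<Quality, number> = { strong: 100, adequate: 60, weak: 20 };

// Weighted average of per-decision credit, rounded to a whole percent.
function masteryScore(results: DecisionResult[]): { score: number; verdict: string } {
  const totalWeight = results.reduce((sum, r) => sum + r.weight, 0);
  if (totalWeight === 0) {
    return { score: 0, verdict: "not assessed" };
  }
  const earned = results.reduce((sum, r) => sum + r.weight * CREDIT_PERCENT[r.quality], 0);
  const score = Math.round(earned / totalWeight);
  // Assumed thresholds for the verdict label.
  const verdict = score >= 80 ? "proficient" : score >= 50 ? "developing" : "needs practice";
  return { score, verdict };
}

// Usage: one heavily weighted strong answer, one weak, one adequate.
console.log(
  masteryScore([
    { id: "d1", quality: "strong", weight: 2 },
    { id: "d2", quality: "weak", weight: 1 },
    { id: "d3", quality: "adequate", weight: 1 },
  ]),
); // { score: 70, verdict: "developing" }
```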

Trace hierarchy per session:

Span name	Kind	Purpose
`training.session`	CHAIN	Root trace for one assessment
`llm.analyze_decisions`	LLM	Internal Gemini decision-mapping call
`llm.score_session`	LLM	Internal Gemini mastery-scoring call
`learning.decision.eval`	LLM	One span per decision point; use these for the Q&A evaluator
`training.assessment`	CHAIN	Session summary

Project filter in Arize: set ARIZE_PROJECT_NAME in .env (default gamestudio-learning) and select that project in the Arize UI.

LLM-as-a-Judge template: use Q&A with scope Span and this filter:

name = 'learning.decision.eval'

(Arize's default preview filter openinference.span.kind = LLM will match these spans.)

Template variable	Map to attribute
`{input}`	`attributes.input.value` → JSON field `input` (reference context + choices)
`{question}`	`attributes.evaluation.question`
`{output}`	`attributes.output.value` (learner's spoken answer)
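
For orientation, the attribute set on one `learning.decision.eval` span might be assembled roughly as follows. The attribute keys are the ones named in this README; `DecisionEvalInput`, `decisionEvalAttributes`, and the way context and choices are joined inside the `input` field are assumptions.

```typescript
// Hypothetical input for one decision-point span.
interface DecisionEvalInput {
  sessionId: string;
  question: string;         // the decision point posed to the learner
  referenceContext: string; // scenario context the judge can compare against
  choices: string[];
  learnerAnswer: string;    // what the learner actually said
}

// Builds the flat attribute map for a learning.decision.eval span.
function decisionEvalAttributes(d: DecisionEvalInput): Record<string, string> {
  return {
    "openinference.span.kind": "LLM",
    "session.id": d.sessionId,
    // input.value is a JSON string; its `input` field feeds {input}.
    // The layout of context + choices inside it is an assumption.
    "input.value": JSON.stringify({
      input: `${d.referenceContext}\nChoices: ${d.choices.join(" | ")}`,
    }),
    "evaluation.question": d.question, // feeds {question}
    "output.value": d.learnerAnswer,   // feeds {output}
  };
}
```

With an OpenTelemetry tracer, a map like this would typically be passed to `span.setAttributes(...)` on the span named `learning.decision.eval`.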

Do not target llm.analyze_decisions or llm.score_session — those are internal Gemini prompts, not learner Q&A.

Optional: add User Frustration on training.assessment spans using the transcript in input.value.

The assessment overlay in the app is the learner-facing result. Arize is required for competency assessment — traces are flushed on every evaluation so they appear in Arize AX within ~30 seconds.

Studio agent flow

Tap the mic on Studio (or use Type mode) → connects to Deepgram Voice Agent or Gemini text chat
Describe a scenario — agent transcribes/responds and calls studio functions
Tap again to disconnect (voice mode)
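
The "calls studio functions" step can be pictured as a small dispatcher on the receiving side of the agent's function calls. The tool names `create_game`, `edit_game`, and `publish_game` come from this README; the handler bodies, argument names (`topic`, `instruction`), and the result shape are hypothetical.

```typescript
type ToolArgs = Record<string, unknown>;
type ToolHandler = (args: ToolArgs) => string;

// Placeholder handlers; real ones would call the generation and
// publishing code. Argument names are assumptions.
const studioTools: Record<string, ToolHandler> = {
  create_game: (args) => `created draft for: ${String(args["topic"] ?? "untitled")}`,
  edit_game: (args) => `applied edit: ${String(args["instruction"] ?? "")}`,
  publish_game: () => "published",
};

// Routes a function call from the agent to the matching handler and
// reports unknown tool names instead of throwing.
function dispatchStudioTool(toolName: string, args: ToolArgs): { ok: boolean; message: string } {
  // hasOwnProperty guards against inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(studioTools, toolName)) {
    return { ok: false, message: `unknown tool: ${toolName}` };
  }
  return { ok: true, message: studioTools[toolName](args) };
}

// Usage: the model decided the utterance maps to create_game.
console.log(dispatchStudioTool("create_game", { topic: "sales discovery call" }));
```

Returning an error result for unknown names, rather than throwing, gives the agent something to say back to the user when the model picks a tool that does not exist.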

Example commands:

"Create a sales discovery call simulation for new enterprise AEs"
"Add a branch for handling pricing objections"
"Publish this training module"
"Make a B2B ad for sales enablement leaders"

Training flow

Open a module from the Library → single background with voice UI overlay
The agent opens with the scenario’s in-character opening line (spoken via Deepgram TTS)
Learner unmutes and responds; the agent roleplays through decision points until a natural endpoint
After the closing line finishes, the competency assessment overlay appears with score, feedback, transcript download, and restart
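
The ordering in this flow (the assessment overlay appears only after the closing line has finished playing) can be modeled as a small state machine. This is a sketch under assumptions: the phase and event names are invented for illustration, apart from `complete_scenario`, which is the tool named earlier in this README.

```typescript
type Phase = "opening" | "conversation" | "closing" | "assessment";

// Hypothetical client-side events for one training session.
type SessionEvent =
  | { type: "opening_line_finished" }
  | { type: "complete_scenario_called" }
  | { type: "closing_audio_finished" }
  | { type: "restart" };

// Pure transition function: events that arrive in the wrong phase are ignored.
function nextPhase(phase: Phase, event: SessionEvent): Phase {
  switch (event.type) {
    case "opening_line_finished":
      return phase === "opening" ? "conversation" : phase;
    case "complete_scenario_called":
      // The agent signals the natural endpoint; the closing line still has to play.
      return phase === "conversation" ? "closing" : phase;
    case "closing_audio_finished":
      // Only now may the assessment overlay be shown.
      return phase === "closing" ? "assessment" : phase;
    case "restart":
      return "opening";
  }
}

// Usage: fold a sequence of events into the final phase.
const sessionEvents: SessionEvent[] = [
  { type: "opening_line_finished" },
  { type: "complete_scenario_called" },
  { type: "closing_audio_finished" },
];
let currentPhase: Phase = "opening";
for (const sessionEvent of sessionEvents) {
  currentPhase = nextPhase(currentPhase, sessionEvent);
}
console.log(currentPhase); // "assessment"
```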

Generated assets

Page	Generated by
Training	Browserbase research → scenario outline + one Gemini scene background
Published	Course catalog copy + thumbnail
Ads	B2B ad hook + vertical image

Training assets are written to public/trainings/{id}/. Marketing assets are served from /api/assets/:id.
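
As a small illustration of the two locations above, the helpers below build the training asset directory and the marketing asset URL from an id. The helper names and the allowed id format are assumptions; the id check is there because an unvalidated id interpolated into a filesystem path could escape `public/trainings/`.

```typescript
// Assumed id format: letters, digits, underscore, and hyphen only.
function safeAssetId(id: string): string {
  if (!/^[A-Za-z0-9_-]+$/.test(id)) {
    throw new Error(`invalid asset id: ${id}`);
  }
  return id;
}

// Directory where a training's generated assets are written.
function trainingAssetDir(id: string): string {
  return `public/trainings/${safeAssetId(id)}/`;
}

// URL from which a marketing asset is served.
function marketingAssetUrl(id: string): string {
  return `/api/assets/${safeAssetId(id)}`;
}

console.log(trainingAssetDir("sales-discovery-01")); // public/trainings/sales-discovery-01/
console.log(marketingAssetUrl("ad_42"));             // /api/assets/ad_42
```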
