Tagline: To learn it, teach it. To master it, teach yourself.
My Stardust is a personalised AI learning companion that acts as a digital mirror of the user, cloned from their voice and a 3D headshot. Designed for university students, it turns solo revision into an active teaching experience — the user "trains" their Stardust by explaining concepts to it, leveraging active recall and the Feynman Technique through an evolving, holographic companion.
- Architecture Overview
- Tech Stack
- External APIs & Services
- Electron Process Model
- IPC Bridge Layer
- ElevenLabs Conversational Agent
- Client Tool Definitions
- Active Recall Pipeline
- Knowledge Base (Vector Memory)
- Google Gemini Integration
- 3D Particle Avatar System
- Onboarding Pipeline
- Exercise Generation Pipeline
- Visual Generation Pipeline
- Window System
- Project Structure
- Environment Variables
- Scripts & Dev Tools
- Build & Distribution
My Stardust is a desktop application built on the Nextron framework (Electron + Next.js). It runs as a transparent, frameless, full-screen overlay on the user's desktop. The rendering layer is a Next.js 14 app (static export, no SSR) served inside Electron's BrowserWindow. The main process handles all privileged operations (file I/O, native APIs, AI service calls) and communicates with the renderer via a strict IPC bridge.
┌─────────────────────────────────────────────────────────┐
│ Electron Main Process │
│ main/background.js │
│ ┌───────────────────┐ ┌──────────────────────────┐ │
│ │ electron-store │ │ services/knowledge.js │ │
│ │ (user profile) │ │ (vector memory, JSON) │ │
│ └───────────────────┘ └──────────────────────────┘ │
│ ┌───────────────────────────────────────────────────┐ │
│ │ services/gemini.js │ │
│ │ - generateEmbedding (gemini-embedding-001) │ │
│ │ - generateText (gemini-2.5-flash) │ │
│ │ - generateImage (gemini-2.5-flash-image) │ │
│ │ - generateCodeExerciseStream (streaming) │ │
│ └───────────────────────────────────────────────────┘ │
└──────────────────────────┬──────────────────────────────┘
│ contextBridge (ipc)
┌──────────────────────────┴──────────────────────────────┐
│ Electron Renderer Process │
│ Next.js 14 (static export) │
│ ┌───────────────────────────────────────────────────┐ │
│ │ pages/home.jsx │ │
│ │ - useConversation (@elevenlabs/react) │ │
│ │ - clientTools: add_knowledge, retrieve_knowledge │ │
│ │ - question_user_recall, │ │
│ │ check_user_recall_answer, │ │
│ │ generate_visual, │ │
│ │ generate_code_exercise │ │
│ └───────────────────────────────────────────────────┘ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Scene.jsx │ │ Whiteboard │ │ Exercise │ │
│ │ (R3F Canvas)│ │ Overlay.jsx │ │ Overlay.jsx │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ GhostHead.jsx (particle system, 7 morph states) │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
│
│ WebSocket (ElevenLabs Realtime)
▼
┌─────────────────────┐ ┌─────────────────────┐
│ ElevenLabs │────▶│ Anthropic │
│ Conversational AI │ │ claude-3-5-sonnet │
│ (ASR + TTS) │ │ (LLM reasoning) │
└─────────────────────┘ └─────────────────────┘
| Layer | Technology | Version |
|---|---|---|
| Desktop shell | Electron | ^34.0.0 |
| Next.js bridge | Nextron | ^9.5.0 |
| Renderer framework | Next.js | ^14.2.4 |
| UI library | React | ^18.3.1 |
| Language | JavaScript (ESM in main process, JSX in renderer) | — |
| Library | Version | Role |
|---|---|---|
| Three.js | ^0.183.1 | Low-level WebGL geometry, materials, BufferGeometry |
| @react-three/fiber | ^8.17.10 | React reconciler for Three.js (Canvas, useFrame) |
| @react-three/drei | ^9.122.0 | useGLTF, OrbitControls, Suspense helpers |

| Library | Version | Role |
|---|---|---|
| @elevenlabs/react | ^0.14.0 | useConversation hook, WebSocket session management |
| @google/genai | ^1.42.0 | Gemini text, embedding, image, streaming APIs |

| Library | Version | Role |
|---|---|---|
| electron-store | ^8.2.0 | Persistent key-value store for user profile |
| electron-serve | ^1.3.0 | Serves the static Next.js export in production |

| Library | Version | Role |
|---|---|---|
| react-icons | ^5.5.0 | IoIosResize icon for the resize handle |

| Tool | Version | Role |
|---|---|---|
| electron-builder | ^24.13.3 | Packages the app into .dmg / .zip / .exe |
| Next.js webpack | built-in | Bundles renderer JS |
- Base URL: `https://api.elevenlabs.io/v1/`
- WebSocket: Managed by the `@elevenlabs/react` SDK
- Agent ID: `agent_1701khz2fkm6fpatfs601dpfn1w9`
- Agent LLM: `claude-3-5-sonnet` (Anthropic, routed through ElevenLabs)
- TTS Model: `eleven_turbo_v2`
- Default Voice ID: `fVVjLtJgnQI61CoImgHU`
- ASR Provider: ElevenLabs built-in
- ASR Audio Format: PCM 16000 Hz (input)
- TTS Audio Format: PCM 16000 Hz (output)
- Turn Model: `turn_v2`
- Turn Timeout: 7 seconds
- Max Session Duration: 600 seconds (10 minutes)
- Streaming Latency Optimisation: Level 3
- TTS Stability: 0.5 | Similarity Boost: 0.8 | Speed: 1.0
- Client Events subscribed: `audio`, `interruption`, `user_transcript`, `agent_response`, `agent_response_correction`
Onboarding REST endpoints used:

```
POST  /v1/convai/agents/{BASE_AGENT_ID}/duplicate   → clone agent for new user
POST  /v1/voices/add                                → upload 10s webm, get voiceId
PATCH /v1/convai/agents/{newAgentId}                → set tts.voice_id
```
| Model | Use case | API call |
|---|---|---|
| `gemini-embedding-001` | Generate vector embeddings for knowledge base | `models.embedContent` |
| `gemini-2.5-flash` | Text generation — question creation, answer evaluation | `models.generateContent` |
| `gemini-2.5-flash` | Exercise streaming — full HTML/CSS/JS game | `models.generateContentStream` |
| `gemini-2.5-flash-image` | Visual generation — whiteboard diagrams | `models.generateContent` with `responseModalities: ['IMAGE']` |
- API Key env var: `NEXT_PUBLIC_GEMINI_API_KEY`
- Client instantiation: Lazy singleton `new GoogleGenAI({ apiKey })` in `main/services/gemini.js`
- Image prompt constraint: Appends `"Minimalist whiteboard illustration style, clean black ink strokes on a pure white background, hand-drawn diagrammatic style. Aspect ratio 8:5."` to all image prompts.
- Storage upload: `POST https://fal.run/fal-ai/storage/upload`
  - Headers: `Authorization: Key {FAL_KEY}`, `Content-Type: image/png`
  - Body: raw image blob
  - Returns: `{ url: "https://..." }`
- Job enqueue: `POST https://queue.fal.run/fal-ai/meshy/v6/image-to-3d`
  - Body: `{ input: { image_url } }`
  - Returns: `{ request_id }`
- Status polling: `GET https://queue.fal.run/fal-ai/meshy/v6/image-to-3d/requests/{request_id}/status`
  - Polls every 3 seconds, up to 60 attempts (3-minute timeout)
  - Statuses: `IN_QUEUE`, `COMPLETED`, `FAILED`
- Result fetch: `GET https://queue.fal.run/fal-ai/meshy/v6/image-to-3d/requests/{request_id}`
  - Returns: `{ model_glb: { url }, thumbnail: { url } }`
- API Key env var: `NEXT_PUBLIC_FAL_KEY`
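The status-polling step can be sketched as below. This is a minimal sketch, not the project's actual code: `pollMeshyStatus` is an illustrative name, and `fetchFn`/`sleepFn` are injected so the loop can be exercised without network access.

```javascript
// Poll the fal.ai queue endpoint every 3s until the job settles or times out.
// fetchFn / sleepFn are injectable assumptions for testability.
async function pollMeshyStatus(requestId, { fetchFn = fetch, sleepFn, maxAttempts = 60 } = {}) {
  const url = `https://queue.fal.run/fal-ai/meshy/v6/image-to-3d/requests/${requestId}/status`;
  const sleep = sleepFn || (ms => new Promise(r => setTimeout(r, ms)));
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchFn(url, {
      headers: { Authorization: `Key ${process.env.NEXT_PUBLIC_FAL_KEY}` },
    });
    const { status } = await res.json();
    // IN_QUEUE keeps polling; COMPLETED / FAILED are terminal
    if (status === 'COMPLETED' || status === 'FAILED') return status;
    await sleep(3000); // 60 attempts × 3s ≈ the 3-minute timeout described above
  }
  throw new Error('Meshy request timed out after 3 minutes');
}
```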
- Entry point defined in `package.json` → `"main": "app/background.js"` (compiled output)
- Source: `main/background.js` (ESM, bundled with Webpack)
- Responsibilities:
  - Create the transparent, frameless `BrowserWindow`
  - Request macOS microphone permission via `systemPreferences.askForMediaAccess('microphone')`
  - Handle all IPC channels (`ipcMain.handle` / `ipcMain.on`)
  - Manage the `electron-store` user profile
  - Download and replace `avatar.glb` via Node.js `https`/`http` streams
  - Register global shortcuts

`BrowserWindow` config:

```js
{
  width: screenW,           // full work area width
  height: screenH,          // full work area height
  x: 0, y: 0,
  transparent: true,        // desktop shows through
  frame: false,             // no title bar / chrome
  resizable: true,
  hasShadow: false,
  webPreferences: {
    preload: 'preload.js',
    nodeIntegration: false,
    contextIsolation: true, // security: renderer cannot access Node directly
  },
}
```

Chromium command-line switches applied:

```
--disable-features=WebRtcHideLocalIpsWithMdns   (prevents WebRTC crash)
--enable-features=WebRTCPipeWireCapturer        (Linux compatibility)
```

Global shortcuts:

- `Cmd/Ctrl+Shift+I` — Toggle DevTools
- `Cmd/Ctrl+Shift+R` — Clear `electron-store` and reload (dev reset)
- Next.js 14 app, output mode: `export` (static HTML/JS, no server)
- `distDir`: `../app` in production, `.next` in development
- `trailingSlash: true`, `images.unoptimized: true`
- Loaded via `app://./home` (prod) or `http://localhost:{port}/home` (dev)
- All Node.js APIs are strictly off (`nodeIntegration: false`) — the renderer communicates only through the `window.ipc` object injected by the preload script.
Bridges renderer ↔ main using Electron's `contextBridge`:

```js
const { contextBridge, ipcRenderer } = require('electron')
contextBridge.exposeInMainWorld('ipc', {
  send: (channel, value) => ipcRenderer.send(channel, value),         // fire-and-forget
  invoke: (channel, ...args) => ipcRenderer.invoke(channel, ...args), // returns Promise
  on(channel, callback) {                                             // returns unsubscribe fn
    ipcRenderer.on(channel, callback)
    return () => ipcRenderer.removeListener(channel, callback)
  },
})
```

All communication between the React renderer and the Electron main process goes through named IPC channels:
| Channel | Direction | Type | Handler | Description |
|---|---|---|---|---|
| `add-knowledge` | renderer→main | invoke | `addKnowledge(text, kbPath)` | Embed + store knowledge entry |
| `retrieve-knowledge` | renderer→main | invoke | `retrieveKnowledge(query, topK, kbPath)` | Cosine similarity search |
| `gemini-generate` | renderer→main | invoke | `generateText(prompt)` | Gemini 2.5 Flash text |
| `generate-visual` | renderer→main | invoke | `generateImage(prompt)` | Gemini image → base64 |
| `generate-exercise` | renderer→main | send | `generateCodeExerciseStream(query, onChunk)` | Streaming HTML game |
| `exercise-chunk` | main→renderer | send | — | Streamed text chunk |
| `exercise-done` | main→renderer | send | — | Stream complete signal |
| `exercise-error` | main→renderer | send | — | Stream error signal |
| `resize-window` | renderer→main | send | `win.setBounds(...)` | IPC-driven window resize |
| `set-ignore-mouse-events` | renderer→main | send | `win.setIgnoreMouseEvents(...)` | Click-through toggle |
| `get-user-profile` | renderer→main | invoke | `userStore.store` | Read persisted profile |
| `set-user-profile` | renderer→main | invoke | `userStore.set(k, v)` | Write persisted profile |
| `download-and-replace-avatar` | renderer→main | invoke | Downloads GLB via `https.get`, writes to `renderer/public/avatar.glb` | Avatar pipeline step 3 |

Knowledge base path: `app.getPath('userData') + '/knowledge_base.json'`
The agent is configured in agent_config.json (full snapshot of the ElevenLabs agent definition):
```json
{
  "agent_id": "agent_1701khz2fkm6fpatfs601dpfn1w9",
  "conversation_config": {
    "asr": {
      "quality": "high",
      "provider": "elevenlabs",
      "user_input_audio_format": "pcm_16000"
    },
    "turn": {
      "turn_timeout": 7,
      "mode": "turn",
      "turn_eagerness": "normal",
      "turn_model": "turn_v2"
    },
    "tts": {
      "model_id": "eleven_turbo_v2",
      "voice_id": "fVVjLtJgnQI61CoImgHU",
      "agent_output_audio_format": "pcm_16000",
      "optimize_streaming_latency": 3,
      "stability": 0.5,
      "speed": 1,
      "similarity_boost": 0.8
    },
    "conversation": {
      "max_duration_seconds": 600,
      "client_events": ["audio","interruption","user_transcript","agent_response","agent_response_correction"]
    }
  }
}
```

System prompt:

> "You are 'My Stardust', a student's digital twin. When a user wants to be tested, use `question_user_recall`. After the user answers, use `check_user_recall_answer`. If you are correct, you must say 'Correct answer: [Explanation]'. If incorrect, say 'Incorrect answer: [Explanation]'. For all other chat, be warm and empathetic."

LLM: `claude-3-5-sonnet` (temperature: 0, parallel tool calls: false)
React SDK usage (`pages/home.jsx`):

```jsx
const conversation = useConversation({
  clientTools: { add_knowledge, retrieve_knowledge, question_user_recall,
                 check_user_recall_answer, generate_visual, generate_code_exercise },
  onConnect, onDisconnect, onMessage, onError,
})

// Start/stop session
await conversation.startSession({ agentId: "agent_1701khz2fkm6fpatfs601dpfn1w9" })
await conversation.endSession()

// State
conversation.status     // 'connected' | 'disconnected'
conversation.isSpeaking // boolean (drives pulse animation)
```

All 6 tools are registered as `clientTools` in `useConversation`. ElevenLabs calls them on the client; the renderer handles them and returns results back to the agent.
- Trigger: Agent decides to store new information the user has explained
- ElevenLabs timeout: 20 seconds
- Flow:
  1. Sets `isThinkingRef.current = true` (triggers the thinking morph)
  2. Calls `window.ipc.invoke('add-knowledge', text)` → main process
  3. Main calls `addKnowledge(text, kbPath)`:
     - Calls `generateEmbedding(text)` → Gemini `gemini-embedding-001`
     - Appends `{ id, text, embedding, createdAt }` to `knowledge_base.json`
  4. Returns a confirmation string to the agent
- Trigger: Agent needs to recall something the user previously taught it
- ElevenLabs timeout: 20 seconds
- Default topK: 3
- Flow:
  1. Calls `window.ipc.invoke('retrieve-knowledge', query, topK)`
  2. Main calls `retrieveKnowledge(query, topK, kbPath)`:
     - Embeds the query via `gemini-embedding-001`
     - Loads all entries from `knowledge_base.json`
     - Computes cosine similarity for every entry
     - Sorts descending, returns the top-K as `{ id, text, score, createdAt }[]`
  3. Formats results as a `[1] text\n\n[2] text...` string for the agent
- Trigger: User says "quiz me on X" or "ask me a question about Y"
- ElevenLabs timeout: 30 seconds
- Flow:
  1. `window.ipc.invoke('retrieve-knowledge', topic, 3)` — fetch what the student knows about the topic
  2. Builds the prompt: `"This is what the student knows: {notes}\n\nGenerate ONE short-answer question about "{topic}" the question must be answerable using the limited knowledge provided... DO NOT provide the answer."`
  3. `window.ipc.invoke('gemini-generate', prompt)` → `gemini-2.5-flash`
  4. Returns the generated question string to the ElevenLabs agent (the agent speaks it aloud)
- Trigger: User has responded to a quiz question
- ElevenLabs timeout: 30 seconds
- Flow:
  1. Builds a full conversation history string from `chatHistoryRef.current`
  2. Prompt: `"Recent Conversation History:\n{history}\n\nUser's Answer: {user_answer}\n\nEvaluate... If correct, respond EXACTLY starting with 'Correct answer: '... If incorrect, 'Incorrect answer: '..."`
  3. `window.ipc.invoke('gemini-generate', prompt)` → `gemini-2.5-flash`
  4. Parses the evaluation string to determine the morph state:
     - `"correct answer:"` → `resultMorphRef.current = 1` → success morph (3s auto-reset)
     - `"incorrect answer:"` → `resultMorphRef.current = -1` → failure morph (3s auto-reset)
  5. Returns the evaluation string to ElevenLabs (the agent speaks it in the user's cloned voice)
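The verdict parsing that drives the morph state can be sketched as below. `parseVerdict` is an illustrative name; `home.jsx` performs this check inline.

```javascript
// Map Gemini's evaluation string to the morph state used by resultMorphRef:
// 1 = success morph, -1 = failure morph, 0 = no result morph.
function parseVerdict(evaluation) {
  const normalized = evaluation.trim().toLowerCase();
  if (normalized.startsWith('correct answer:')) return 1;
  if (normalized.startsWith('incorrect answer:')) return -1;
  return 0; // anything else leaves the avatar in its current state
}
```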
- Trigger: User asks for a visual explanation, diagram, or illustration
- Flow:
  1. Sets `isGeneratingRef.current = true` (gear morph)
  2. Opens `WhiteboardOverlay` (loading state, no image yet)
  3. `window.ipc.invoke('generate-visual', prompt)` → main process
  4. Main appends the whiteboard style suffix to the prompt, calls `gemini-2.5-flash-image`
  5. Extracts `inlineData.data` (base64 PNG), returns `data:image/png;base64,...`
  6. Sets `generatedImageUrl` → overlay renders the image
  7. Returns the `text` param to the agent to read aloud
- Trigger: User asks for an interactive game or exercise
- Flow:
  1. Sets `isGeneratingRef.current = true` (gear morph)
  2. Opens `DigitalExerciseOverlay` with a loading state
  3. `window.ipc.send('generate-exercise', query)` — fire-and-forget (streaming)
  4. Main calls `generateCodeExerciseStream(query, onChunk)`:
     - Sends chunked HTML to the renderer via `event.sender.send('exercise-chunk', chunk)`
     - Strips markdown fences (```` ```html ````) from chunks before sending
     - Fires `exercise-done` when the stream completes
  5. Renderer appends each chunk to `exerciseCode` state (progressive render)
  6. Returns `text` immediately so the agent starts speaking while the code generates
Full two-stage interactive quiz loop:

```
User: "Quiz me on programming"
        │
        ▼
ElevenLabs detects intent → calls question_user_recall(topic="programming")
        │
        ▼
[Client] window.ipc.invoke('retrieve-knowledge', 'programming', 3)
        │
        ▼
[Main] Embeds "programming" → cosine search → returns top 3 notes
        │
        ▼
[Client] Builds prompt → window.ipc.invoke('gemini-generate', prompt)
        │
        ▼
[Main] gemini-2.5-flash generates question → returns to client
        │
        ▼
Client returns question string to ElevenLabs
        │
        ▼
ElevenLabs speaks question in user's cloned voice

User: "OOP uses objects to store data..."
        │
        ▼
ElevenLabs calls check_user_recall_answer(user_answer="OOP uses objects...")
        │
        ▼
[Client] Builds history + answer → window.ipc.invoke('gemini-generate', evalPrompt)
        │
        ▼
[Main] gemini-2.5-flash evaluates → "Correct answer: ..."
        │
        ▼
[Client] Parses verdict → sets resultMorphRef = 1 (success)
         Avatar morphs: head → ✅ (green, 3 seconds) → resets to head
        │
        ▼
ElevenLabs speaks evaluation in user's cloned voice
```
Implementation: main/services/knowledge.js
The knowledge base is a flat JSON array persisted to the user's userData directory. There is no external database — everything is local and offline.
Storage format (`knowledge_base.json`):

```js
[
  {
    "id": "1751234567890-abc123",
    "text": "OOP uses objects to store data and methods that operate on that data.",
    "embedding": [0.023, -0.041, ...], // 3072-dimensional float array
    "createdAt": "2025-06-01T12:00:00.000Z"
  }
]
```

`addKnowledge(text, dbPath)`:
1. `generateEmbedding(text)` → `GoogleGenAI.models.embedContent({ model: 'gemini-embedding-001', contents: text })`
2. Generates a unique ID: `Date.now() + '-' + Math.random().toString(36).slice(2,8)`
3. Appends the entry, writes JSON with `JSON.stringify(store, null, 2)`

`retrieveKnowledge(query, topK, dbPath)`:
1. `generateEmbedding(query)` → query vector
2. Loads all entries from the JSON file
3. For each entry, computes cosine similarity: `dot / (Math.sqrt(normA) * Math.sqrt(normB))`
4. Sorts descending by score, slices top-K
5. Returns `{ id, text, score, createdAt }[]` (embedding stripped from the response)

User profile store (`electron-store`):
- Store name: `user-profile`
- Keys stored: `agentId` (cloned agent), `voiceId` (cloned voice)
- Cleared on the `Cmd+Shift+R` dev shortcut
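Given the storage format above, the similarity search can be sketched as a pair of pure functions. This is a sketch under the documented formula, not the module's actual code; `rankEntries` is an illustrative name (the real module exposes `cosineSimilarity` and `retrieveKnowledge`).

```javascript
// Cosine similarity: dot / (||a|| * ||b||), matching the formula above.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored entries against a query vector and return the top-K,
// with the (large) embedding stripped from each result.
function rankEntries(queryEmbedding, entries, topK = 3) {
  return entries
    .map(({ embedding, ...rest }) => ({ ...rest, score: cosineSimilarity(queryEmbedding, embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```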
Implementation: main/services/gemini.js
The module runs entirely in the Electron main process (Node.js). It reads the API key from renderer/.env at startup if the env var is not already set.
```js
// .env loading (main process) — 'utf8' is required so readFileSync returns a string
fs.readFileSync('../renderer/.env', 'utf8').split('\n').forEach(line => {
  // Parses KEY=VALUE pairs, strips quotes
})
```

Functions exported:
| Function | Model | Method | Returns |
|---|---|---|---|
| `generateEmbedding(text)` | `gemini-embedding-001` | `models.embedContent` | Float64Array (3072-dim) |
| `cosineSimilarity(a, b)` | — | Pure math | number (-1 to 1) |
| `generateText(prompt)` | `gemini-2.5-flash` | `models.generateContent` | string |
| `generateImage(prompt)` | `gemini-2.5-flash-image` | `models.generateContent` with `responseModalities: ['IMAGE']` | `"data:image/png;base64,..."` |
| `generateCodeExerciseStream(query, onChunk)` | `gemini-2.5-flash` | `models.generateContentStream` | void (callback-driven) |
Exercise prompt template:

```
Act as an expert educational game developer. Create a single-file HTML/CSS/JS solution for an interactive exercise based on: {query}.
Constraint 1: Everything MUST be in one self-contained HTML file (internal CSS/JS).
Constraint 2: No external CDNs or libraries (use raw JS/Canvas).
Constraint 3: Use a 'dark mode' aesthetic with glowing accents (cyber-education).
Constraint 4: The game must be responsive. CRITICAL: the exercise will initially load in a small floating window (approx 400x300 pixels).
```
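The manual KEY=VALUE parsing this module does at load time (see the `.env` loading snippet earlier in this section) can be sketched as a pure function. `parseEnv` is an illustrative name, not the module's actual export.

```javascript
// Parse .env-style KEY=VALUE contents, skipping blanks/comments and
// stripping surrounding quotes — a sketch of the behaviour described above.
function parseEnv(contents) {
  const env = {};
  for (const line of contents.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith('#')) continue; // skip blanks and comments
    const eq = trimmed.indexOf('=');
    if (eq === -1) continue;                           // not a KEY=VALUE line
    const key = trimmed.slice(0, eq).trim();
    const value = trimmed.slice(eq + 1).trim().replace(/^["']|["']$/g, '');
    env[key] = value;
  }
  return env;
}
```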
Implementation: renderer/components/GhostHead.jsx + renderer/components/Scene.jsx
```jsx
<Canvas
  camera={{ position: [0, 0, 3.5], fov: 40, near: 0.01 }}
  gl={{ antialias: true, alpha: true }}
  onCreated={({ gl }) => {
    gl.setClearColor(0x000000, 0) // fully transparent background
  }}
>
  <OrbitControls enablePan={false} minDistance={0.3} maxDistance={5} />
  <GhostHead ... />
</Canvas>
```

- Dynamically imported with `ssr: false` (Three.js is not SSR-compatible)
- Wrapped in a React class-`Component` error boundary
Five GLB files are loaded via `useGLTF`:

- `avatar.glb` — user's 3D head (default or generated by fal-ai/meshy)
- `thinking.glb` — thought-bubble shape
- `success.glb` — checkmark (✅)
- `failure.glb` — X mark (❌)
- `gear.glb` — gear (used during exercise generation)
For each GLB, vertices are extracted and face-sampled:

- Vertex extraction: Traverses all `THREE.Mesh` nodes, applies `matrixWorld` transforms, collects `position` and `normal` attributes.
- Face sampling: For each triangle, samples `SAMPLES_PER_FACE = 3` random barycentric points (r1 + r2 ≤ 1; r3 = 1 - r1 - r2).
- Normalisation: Non-avatar GLBs are scaled to match the avatar's Y-span (`avatarHeight = maxY - minY`) so all morph targets fit in the same visual space.
- Sphere positions: Random uniform sphere sampling (`SPHERE_RADIUS = 0.25`) — used for the speaking state.
All positions stored as Float32Array in memory. A single BufferGeometry is created once (useMemo) and its position attribute is mutated every frame.
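The barycentric face-sampling step can be sketched as below. `sampleTriangle` is an illustrative name; `GhostHead.jsx` inlines the equivalent math. The fold-back trick (reflecting r1, r2 when r1 + r2 > 1) is what keeps samples uniformly inside the triangle.

```javascript
// Sample one uniformly-distributed point on a triangle with vertices
// a, b, c given as [x, y, z] arrays (barycentric weights r1, r2, r3).
function sampleTriangle(a, b, c) {
  let r1 = Math.random(), r2 = Math.random();
  if (r1 + r2 > 1) { r1 = 1 - r1; r2 = 1 - r2; } // fold back into the triangle
  const r3 = 1 - r1 - r2;                         // weights now satisfy r1 + r2 + r3 = 1
  return [
    r1 * a[0] + r2 * b[0] + r3 * c[0],
    r1 * a[1] + r2 * b[1] + r3 * c[1],
    r1 * a[2] + r2 * b[2] + r3 * c[2],
  ];
}
```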
Morph state variables (all smoothly lerped):

| Ref | Target | Rate |
|---|---|---|
| `smoothPulseRef` | `pulseRef.current` (0 or 0.5–1.0) | `TARGET_SMOOTHING = 0.15` |
| `morphRef` | 1.0 if speaking, else 0.0 | 0.15 × 0.4 |
| `thinkMorphRef` | 1.0 if thinking, else 0.0 | 0.15 × 0.55 |
| `generateMorphRef` | 1.0 if generating, else 0.0 | 0.15 |
| `smoothResultRef` | `resultMorphRef` (-1, 0, or 1) | 0.15 |
Morph priority (layered):
- Generating (gear) overrides everything
- Success/failure overrides thinking
- Thinking overrides speaking
- Speaking (sphere) is base
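Treating each coordinate as a scalar, the layered priority can be sketched as a chain of lerps in which each higher-priority weight pulls the result toward its own target. `blendPosition` and the `weights` object are illustrative; `GhostHead.jsx` performs the equivalent lerps inline in `useFrame`.

```javascript
const lerp = (a, b, t) => a + (b - a) * t;

// Resolve one coordinate of one particle from the smoothed morph weights.
// p holds that coordinate's target positions for each shape (illustrative shape).
function blendPosition(p, weights) {
  const { morph, think, generate, result } = weights; // each smoothed toward 0..1 (result: -1..1)
  let v = lerp(p.head, p.sphere, morph);              // base: head ↔ speaking sphere
  v = lerp(v, p.thinking, think);                     // thinking overrides speaking
  v = lerp(v, p.gear, generate);                      // generating overrides everything below
  v = lerp(v, result > 0 ? p.success : p.failure, Math.abs(result)); // verdict morph
  return v;
}
```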
Per-particle position computation (executed for every particle each frame):

```
Jitter:          sin/cos waves with per-particle phase offsets (JITTER_AMPLITUDE = 0.0004)
Head:            homePosition + normal × (pulse × DISTORTION_FACTOR × 1.5)
Sphere:          spherePosition + radial noise × pulse (diffusion + flare noise)
Head↔Sphere:     THREE.MathUtils.lerp(head, sphere, morphRef)
Thinking:        lerp(head/sphere, thinkingPos + breatheWave, thinkMorphRef)
Gear:            lerp(thinking, gearPos, generateMorphRef)
Success/Failure: lerp(gear, successPos/failurePos + breatheWave, activeResultMorph)
Final:           + jitter
```
Rotation logic:

- Idle: gentle sway — `sin(t × 0.3) × MAX_SWAY_RAD` on Y, `sin(t × 0.4) × 0.3 × MAX_SWAY_RAD` on X
- Thinking: continuous spin `+= delta × 0.22` (Y), `+= delta × 0.07` (X)
- Generating: fast gear spin `-= delta × 3.5` (Z), `+= delta × 1.5` (Y)
- All lerped at rate `0.05` so there is no snapping during transitions
Tuning constants:

```js
DISTORTION_FACTOR = 0.19   // how far normals push on voice amplitude
JITTER_AMPLITUDE  = 0.0004 // micro-jitter per particle
TARGET_SMOOTHING  = 0.15   // exponential smoothing rate
PARTICLE_SIZE     = 0.006  // WebGL point size
PARTICLE_OPACITY  = 0.75
MAX_SWAY_RAD      = 0.087  // radians (~5°)
SPHERE_RADIUS     = 0.25   // world units
SAMPLES_PER_FACE  = 3      // barycentric samples per triangle
```

Color states:
| State | Color |
|---|---|
| Idle | White, brightness 1.0 + pulse × 1.5 |
| Thinking | Blue-white breathing RGB(0.55→1, 0.72→1, 0.65→1) |
| Success | Bright green/teal pulse RGB(0.2→1, 0.8→1, 0.4→1) |
| Failure | Hot red/orange RGB(0.9→1, 0.2→1, 0.2→1) |
| Generating | Oscillating light-blue (#87CEFA) ↔ soft-violet (#DDA0DD) |
Material settings:

```jsx
<pointsMaterial
  size={PARTICLE_SIZE}
  transparent
  opacity={PARTICLE_OPACITY}
  color="#ffffff"
  sizeAttenuation      // size scales with camera distance
  depthWrite={false}   // prevents z-fighting with transparent window
/>
```

In `home.jsx`, a `requestAnimationFrame` loop updates `pulseRef.current`:

```js
pulseRef.current = (isConnected && conversation.isSpeaking)
  ? 0.5 + Math.random() * 0.5
  : 0
```

This random 0.5–1.0 range is smoothed in `GhostHead`'s `useFrame` to create organic pulsation. The `pulseRef` is a React ref (not state) to avoid re-renders on every frame.
Implementation: renderer/components/onboarding/
State machine with 3 steps (voice → avatar → done) managed by useReducer.
Audio capture:

- `navigator.mediaDevices.getUserMedia({ audio: { echoCancellation: true, noiseSuppression: true, autoGainControl: true } })`
- `MediaRecorder` with `mimeType: 'audio/webm;codecs=opus'`
- Chunk interval: 250 ms
- Duration: 10 seconds (auto-stop) or manual early stop
- Level metering: `ScriptProcessorNode` (buffer: 2048 samples, 1 channel) with RMS computation — not `AnalyserNode` (returns zeros in Electron's audio context)
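The RMS computation reduces each 2048-sample buffer to one level value. Extracted as a pure function it might look like this (`rmsLevel` is an illustrative name; the real code runs inside the `ScriptProcessorNode`'s `onaudioprocess` handler):

```javascript
// Root-mean-square of a Float32Array of PCM samples in [-1, 1].
// Returns 0 for silence, up to 1.0 for a full-scale signal.
function rmsLevel(samples) {
  let sum = 0;
  for (let i = 0; i < samples.length; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / samples.length);
}
```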
Pipeline (4 REST calls):

```
1. POST /v1/convai/agents/{BASE_AGENT_ID}/duplicate
   → { agent_id: newAgentId }
2. POST /v1/voices/add   (multipart/form-data)
   Fields: name, files (voice-sample.webm), remove_background_noise=true, description
   → { voice_id }
3. PATCH /v1/convai/agents/{newAgentId}
   Body: { conversation_config: { tts: { voice_id } } }
   → 200 OK
4. window.ipc.invoke('set-user-profile', { agentId: newAgentId, voiceId })
   → persisted to electron-store
```
Image capture:

- Webcam: `getUserMedia({ video: { width: 512, height: 512, facingMode: 'user' } })`
- Canvas crop: square-centre crop at 512×512, mirrored horizontally (`ctx.scale(-1, 1)`)
- Output: PNG blob via `canvas.toBlob()`
- File upload: `<input type="file" accept="image/*">` with the same canvas resize path
Pipeline (5 steps):

```
1. POST https://fal.run/fal-ai/storage/upload
   Body: image PNG blob
   Headers: Authorization: Key {FAL_KEY}
   → { url: publicImageUrl }
2. POST https://queue.fal.run/fal-ai/meshy/v6/image-to-3d
   Body: { input: { image_url: publicImageUrl } }
   → { request_id }
3. POLL GET .../requests/{request_id}/status every 3s (max 60 × 3s = 3min)
   States: IN_QUEUE (show position) → COMPLETED
4. GET .../requests/{request_id}
   → { model_glb: { url: glbUrl } }
5. window.ipc.invoke('download-and-replace-avatar', glbUrl)
   Main process: https.get(glbUrl) → fs.createWriteStream(avatarPath)
   Overwrites renderer/public/avatar.glb (dev) or app/avatar.glb (prod)
```
UI Portal animation: Camera viewfinder uses CSS clip-path: circle(0% → 50%) transition for a portal-open effect (cubic-bezier(0.25, 0.8, 0.25, 1), 800ms).
- ElevenLabs agent calls `generate_code_exercise({ query, text })`
- Renderer fires `window.ipc.send('generate-exercise', query)` (non-blocking)
- Returns `text` immediately so the agent starts narrating
- Main process calls `generateCodeExerciseStream(query, onChunk)`, which calls `getAI().models.generateContentStream`
- Each `chunk.text` is cleaned of markdown fences (```` ```html ````) and sent via `event.sender.send('exercise-chunk', cleaned)`
- Renderer accumulates chunks in `exerciseCode` state via `ipc.on('exercise-chunk')`
- `DigitalExerciseOverlay` renders the accumulated HTML inside an `<iframe srcDoc>` (progressively updating)
- `exercise-done` fires → loading state clears, gear morph resets
- ElevenLabs agent calls `generate_visual({ prompt, text })`
- Renderer opens `WhiteboardOverlay` (shows loading shimmer)
- `window.ipc.invoke('generate-visual', prompt)` → main process
- Main appends the whiteboard style suffix, calls `gemini-2.5-flash-image` with `responseModalities: ['IMAGE']`
- Extracts `candidates[0].content.parts[].inlineData.data` (base64)
- Returns a `"data:image/png;base64,{data}"` string
- Renderer sets `generatedImageUrl` → `<img>` renders in overlay
- Agent speaks the `text` parameter aloud
The window uses setIgnoreMouseEvents to be transparent to mouse events when the cursor is not over interactive UI:
- On mount: `set-ignore-mouse-events true { forward: true }` — window is click-through, events forwarded to the OS
- `onMouseEnter` any interactive element: `set-ignore-mouse-events false` — window captures events
- `onMouseLeave`: `set-ignore-mouse-events true { forward: true }` — back to click-through
Top-left 30×30px circle with `WebkitAppRegion: 'drag'` — allows window dragging without a title bar.

The bottom-right resize handle uses the Pointer Capture API (`setPointerCapture`/`releasePointerCapture`) to track the drag delta and calls `window.ipc.send('resize-window', { width, height })`. The main process applies the size via `win.setBounds()` with a minimum of 400×300.
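The resize-handle wiring can be sketched as below. `makeResizeHandler`, `startBounds`, and the injected `ipc` object are illustrative; the real component attaches equivalent pointer handlers to the handle element, and the minimum size is enforced again in the main process (the sketch clamps on the renderer side only for illustration).

```javascript
// Pointer-capture drag: track the delta from pointerdown and send the new
// window size over the 'resize-window' IPC channel.
function makeResizeHandler(ipc, startBounds) {
  let origin = null;
  return {
    onPointerDown(e) {
      e.target.setPointerCapture(e.pointerId); // keep receiving events during fast drags
      origin = { x: e.clientX, y: e.clientY };
    },
    onPointerMove(e) {
      if (!origin) return;
      ipc.send('resize-window', {
        width: Math.max(400, startBounds.width + (e.clientX - origin.x)),
        height: Math.max(300, startBounds.height + (e.clientY - origin.y)),
      });
    },
    onPointerUp(e) {
      e.target.releasePointerCapture(e.pointerId);
      origin = null;
    },
  };
}
```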
```
nextronapp_before_gamma/
├── main/                          # Electron main process (Node.js / ESM)
│   ├── background.js              # Entry point, IPC handlers, BrowserWindow
│   ├── preload.js                 # contextBridge: exposes window.ipc
│   ├── helpers/
│   │   ├── create-window.js       # (nextron scaffold)
│   │   └── index.js
│   └── services/
│       ├── gemini.js              # Gemini SDK wrapper (embed, text, image, stream)
│       └── knowledge.js           # Vector store: add/retrieve (cosine similarity)
│
├── renderer/                      # Next.js app (renderer process)
│   ├── pages/
│   │   ├── _app.jsx               # Global Next.js app wrapper
│   │   ├── home.jsx               # Main page: ElevenLabs session, tool handlers
│   │   └── next.jsx               # (scaffold page)
│   ├── components/
│   │   ├── Scene.jsx              # R3F Canvas + OrbitControls + error boundary
│   │   ├── GhostHead.jsx          # Particle avatar: GLB sampling, 7-morph animation
│   │   ├── WhiteboardOverlay.jsx  # Fullscreen image display for generate_visual
│   │   ├── DigitalExerciseOverlay.jsx  # Iframe-based HTML game display
│   │   └── onboarding/
│   │       ├── OnboardingOverlay.jsx   # State machine (voice → avatar → done)
│   │       ├── StepVoiceCapture.jsx    # MediaRecorder + ElevenLabs voice clone API
│   │       └── StepAvatarCapture.jsx   # Webcam + fal-ai/meshy 3D pipeline
│   ├── styles/
│   │   └── globals.css
│   ├── public/
│   │   ├── avatar.glb             # Default 3D head (replaced after onboarding)
│   │   ├── thinking.glb           # Thought-bubble morph shape
│   │   ├── success.glb            # ✅ morph shape
│   │   ├── failure.glb            # ❌ morph shape
│   │   ├── gear.glb               # Gear morph shape (generating state)
│   │   └── images/logo.png
│   ├── next.config.js             # output: export, distDir: ../app (prod)
│   └── .env                       # API keys (gitignored)
│
├── agent_config.json              # Full ElevenLabs agent definition snapshot
├── package.json                   # Dependencies + scripts
├── electron-builder.yml           # Build config (output: dist/, resources/)
├── resources/
│   ├── icon.icns                  # macOS app icon
│   └── icon.ico                   # Windows app icon
├── scripts/
│   ├── test-active-recall.js      # Dev test for active recall pipeline
│   └── update_agent.js            # Script to push agent config changes to ElevenLabs
├── reset-knowledge.js             # Clears knowledge_base.json
└── test-knowledge.js              # Tests add/retrieve knowledge flow
```
All env vars live in `renderer/.env` (loaded by Next.js in the renderer; also manually parsed by `main/services/gemini.js` for main-process access):
| Variable | Required | Description |
|---|---|---|
| `NEXT_PUBLIC_GEMINI_API_KEY` | Yes | Google AI Studio API key |
| `NEXT_PUBLIC_ELEVENLABS_API_KEY` | Yes | ElevenLabs `xi-api-key` header for REST calls |
| `NEXT_PUBLIC_BASE_AGENT_ID` | Yes | Base agent to duplicate during onboarding |
| `NEXT_PUBLIC_ELEVENLABS_AGENT_ID` | Yes | Active agent ID for conversation sessions |
| `NEXT_PUBLIC_FAL_KEY` | Yes | fal-ai API key (`key_id:key_secret` format) |
```shell
# Development (hot-reload Electron + Next.js)
yarn dev

# Production build (static export to app/ + Electron compile)
yarn build

# Install native Electron deps after npm install
yarn postinstall          # electron-builder install-app-deps

# Clear the local knowledge base (knowledge_base.json)
yarn reset-kb             # node reset-knowledge.js

# Test add/retrieve knowledge pipeline
yarn test-kb              # node test-knowledge.js

# Test the active recall two-stage loop
yarn test-active-recall   # node scripts/test-active-recall.js
```

In-app dev shortcuts:
| Shortcut | Action |
|---|---|
| `Cmd/Ctrl+Shift+I` | Toggle DevTools (detached window in dev, toggleable in prod) |
| `Cmd/Ctrl+Shift+R` | Clear `electron-store` (reset onboarding) + reload renderer |
Tool: electron-builder v24.13.3
Config (`electron-builder.yml`):

```yaml
appId: com.example.nextron
productName: My Nextron App
directories:
  output: dist
  buildResources: resources
files:
  - from: .
    filter: [package.json, app]
```

Build artifacts (`dist/`):

- `My Nextron App-1.0.0-arm64.dmg` — macOS installer (Apple Silicon)
- `My Nextron App-1.0.0-arm64-mac.zip` — macOS zip (Apple Silicon)
- `.blockmap` files for delta updates
Build pipeline:

1. `nextron build` triggers:
   - `next build` inside `renderer/` → static export to `../app/`
   - Webpack compiles `main/background.js` + `main/preload.js` → `app/background.js` + `app/preload.js`
2. `electron-builder` packages `app/` + `package.json` into an Asar archive
3. Output: signed (or unsigned) `.dmg` + `.zip`
Production vs development loading:
```js
// Production: custom protocol
serve({ directory: 'app' })
mainWindow.loadURL('app://./home')

// Development: Next.js dev server
mainWindow.loadURL(`http://localhost:${port}/home`)
mainWindow.webContents.openDevTools({ mode: 'detach' })
```