calebareeveso/My-Startdust-HackLondon

My Stardust

Tagline: To learn it, teach it. To master it, teach yourself.

My Stardust is a personalised AI learning companion that acts as a digital mirror of the user, cloned from their voice and a 3D headshot. Designed for university students, it turns solo revision into an active teaching experience — the user "trains" their Stardust by explaining concepts to it, leveraging active recall and the Feynman Technique through an evolving, holographic companion.


Table of Contents

  1. Architecture Overview
  2. Tech Stack
  3. External APIs & Services
  4. Electron Process Model
  5. IPC Bridge Layer
  6. ElevenLabs Conversational Agent
  7. Client Tool Definitions
  8. Active Recall Pipeline
  9. Knowledge Base (Vector Memory)
  10. Google Gemini Integration
  11. 3D Particle Avatar System
  12. Onboarding Pipeline
  13. Exercise Generation Pipeline
  14. Visual Generation Pipeline
  15. Window System
  16. Project Structure
  17. Environment Variables
  18. Scripts & Dev Tools
  19. Build & Distribution

Architecture Overview

My Stardust is a desktop application built on the Nextron framework (Electron + Next.js). It runs as a transparent, frameless, full-screen overlay on the user's desktop. The rendering layer is a Next.js 14 app (static export, no SSR) served inside Electron's BrowserWindow. The main process handles all privileged operations (file I/O, native APIs, AI service calls) and communicates with the renderer via a strict IPC bridge.

┌─────────────────────────────────────────────────────────┐
│                     Electron Main Process               │
│  main/background.js                                     │
│  ┌───────────────────┐   ┌──────────────────────────┐  │
│  │  electron-store   │   │  services/knowledge.js   │  │
│  │  (user profile)   │   │  (vector memory, JSON)   │  │
│  └───────────────────┘   └──────────────────────────┘  │
│  ┌───────────────────────────────────────────────────┐  │
│  │  services/gemini.js                               │  │
│  │  - generateEmbedding (gemini-embedding-001)       │  │
│  │  - generateText (gemini-2.5-flash)                │  │
│  │  - generateImage (gemini-2.5-flash-image)         │  │
│  │  - generateCodeExerciseStream (streaming)         │  │
│  └───────────────────────────────────────────────────┘  │
└──────────────────────────┬──────────────────────────────┘
                           │ contextBridge (ipc)
┌──────────────────────────┴──────────────────────────────┐
│              Electron Renderer Process                  │
│              Next.js 14 (static export)                 │
│  ┌───────────────────────────────────────────────────┐  │
│  │  pages/home.jsx                                   │  │
│  │  - useConversation (@elevenlabs/react)            │  │
│  │  - clientTools: add_knowledge, retrieve_knowledge │  │
│  │  -              question_user_recall,             │  │
│  │                 check_user_recall_answer,         │  │
│  │                 generate_visual,                  │  │
│  │                 generate_code_exercise            │  │
│  └───────────────────────────────────────────────────┘  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │  Scene.jsx   │  │  Whiteboard  │  │  Exercise    │  │
│  │  (R3F Canvas)│  │  Overlay.jsx │  │  Overlay.jsx │  │
│  └──────────────┘  └──────────────┘  └──────────────┘  │
│  ┌──────────────────────────────────────────────────┐   │
│  │  GhostHead.jsx (particle system, 7 morph states) │   │
│  └──────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘
         │
         │ WebSocket (ElevenLabs Realtime)
         ▼
┌─────────────────────┐     ┌──────────────────────┐
│  ElevenLabs         │────▶│  Anthropic           │
│  Conversational AI  │     │  claude-3-5-sonnet   │
│  (ASR + TTS)        │     │  (LLM reasoning)     │
└─────────────────────┘     └──────────────────────┘

Tech Stack

Runtime & Framework

| Layer | Technology | Version |
| --- | --- | --- |
| Desktop shell | Electron | ^34.0.0 |
| Next.js bridge | Nextron | ^9.5.0 |
| Renderer framework | Next.js | ^14.2.4 |
| UI library | React | ^18.3.1 |
| Language | JavaScript (ESM in main process, JSX in renderer) | — |

3D Rendering

| Library | Version | Role |
| --- | --- | --- |
| Three.js | ^0.183.1 | Low-level WebGL: geometry, materials, BufferGeometry |
| @react-three/fiber | ^8.17.10 | React reconciler for Three.js (Canvas, useFrame) |
| @react-three/drei | ^9.122.0 | useGLTF, OrbitControls, Suspense helpers |

AI SDKs

| Library | Version | Role |
| --- | --- | --- |
| @elevenlabs/react | ^0.14.0 | useConversation hook, WebSocket session management |
| @google/genai | ^1.42.0 | Gemini text, embedding, image, streaming APIs |

Electron Utilities

| Library | Version | Role |
| --- | --- | --- |
| electron-store | ^8.2.0 | Persistent key-value store for user profile |
| electron-serve | ^1.3.0 | Serves the static Next.js export in production |

UI Utilities

| Library | Version | Role |
| --- | --- | --- |
| react-icons | ^5.5.0 | IoIosResize icon for the resize handle |

Build Tooling

| Tool | Version | Role |
| --- | --- | --- |
| electron-builder | ^24.13.3 | Packages the app into .dmg / .zip / .exe |
| Next.js webpack | built-in | Bundles renderer JS |

External APIs & Services

1. ElevenLabs Conversational AI

  • Base URL: https://api.elevenlabs.io/v1/
  • WebSocket: Managed by @elevenlabs/react SDK
  • Agent ID: agent_1701khz2fkm6fpatfs601dpfn1w9
  • Agent LLM: claude-3-5-sonnet (Anthropic, routed through ElevenLabs)
  • TTS Model: eleven_turbo_v2
  • Default Voice ID: fVVjLtJgnQI61CoImgHU
  • ASR Provider: ElevenLabs built-in
  • ASR Audio Format: PCM 16000 Hz (input)
  • TTS Audio Format: PCM 16000 Hz (output)
  • Turn Model: turn_v2
  • Turn Timeout: 7 seconds
  • Max Session Duration: 600 seconds (10 minutes)
  • Streaming Latency Optimisation: Level 3
  • TTS Stability: 0.5 | Similarity Boost: 0.8 | Speed: 1.0
  • Client Events subscribed: audio, interruption, user_transcript, agent_response, agent_response_correction

Onboarding REST endpoints used:

POST /v1/convai/agents/{BASE_AGENT_ID}/duplicate   → clone agent for new user
POST /v1/voices/add                                → upload 10s webm, get voiceId
PATCH /v1/convai/agents/{newAgentId}               → set tts.voice_id

2. Google Gemini (@google/genai)

| Model | Use case | API call |
| --- | --- | --- |
| gemini-embedding-001 | Vector embeddings for the knowledge base | models.embedContent |
| gemini-2.5-flash | Text generation (question creation, answer evaluation) | models.generateContent |
| gemini-2.5-flash | Exercise streaming (full HTML/CSS/JS game) | models.generateContentStream |
| gemini-2.5-flash-image | Visual generation (whiteboard diagrams) | models.generateContent with responseModalities: ['IMAGE'] |
  • API Key env var: NEXT_PUBLIC_GEMINI_API_KEY
  • Client instantiation: Lazy singleton new GoogleGenAI({ apiKey }) in main/services/gemini.js
  • Image prompt constraint: Appends "Minimalist whiteboard illustration style, clean black ink strokes on a pure white background, hand-drawn diagrammatic style. Aspect ratio 8:5." to all image prompts.

3. fal-ai / Meshy (Image-to-3D)

  • Storage upload: POST https://fal.run/fal-ai/storage/upload
    • Headers: Authorization: Key {FAL_KEY}, Content-Type: image/png
    • Body: raw image blob
    • Returns: { url: "https://..." }
  • Job enqueue: POST https://queue.fal.run/fal-ai/meshy/v6/image-to-3d
    • Body: { input: { image_url } }
    • Returns: { request_id }
  • Status polling: GET https://queue.fal.run/fal-ai/meshy/v6/image-to-3d/requests/{request_id}/status
    • Polls every 3 seconds, up to 60 attempts (3-minute timeout)
    • Statuses: IN_QUEUE, COMPLETED, FAILED
  • Result fetch: GET https://queue.fal.run/fal-ai/meshy/v6/image-to-3d/requests/{request_id}
    • Returns: { model_glb: { url }, thumbnail: { url } }
  • API Key env var: NEXT_PUBLIC_FAL_KEY
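The status-polling loop above can be sketched as a small helper. This is a hypothetical reconstruction, not the project's code: `getStatus` is injected (in the app it would be a fetch to the `.../requests/{request_id}/status` endpoint) so the loop itself runs without network access.

```javascript
// Hypothetical sketch of the fal.ai queue polling loop.
// getStatus: async () => ({ status }) — injected so it can be stubbed.
async function pollUntilDone(getStatus, { intervalMs = 3000, maxAttempts = 60, sleep } = {}) {
  const wait = sleep || ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { status } = await getStatus();
    if (status === 'COMPLETED') return true;
    if (status === 'FAILED') throw new Error('image-to-3d job failed');
    await wait(intervalMs); // still IN_QUEUE / processing
  }
  throw new Error('timed out waiting for 3D model'); // 60 × 3s = 3 minutes
}
```

With the defaults this matches the documented 3-second interval and 3-minute ceiling.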

Electron Process Model

Main Process (main/background.js)

  • Entry point defined in package.json: "main": "app/background.js" (compiled output)
  • Source: main/background.js (ESM with Webpack)
  • Responsibilities:
    • Create the transparent, frameless BrowserWindow
    • Request macOS microphone permission via systemPreferences.askForMediaAccess('microphone')
    • Handle all IPC channels (ipcMain.handle / ipcMain.on)
    • Manage the electron-store user profile
    • Download and replace avatar.glb via Node.js https/http streams
    • Register global shortcuts

BrowserWindow config:

{
  width: screenW,          // full work area width
  height: screenH,         // full work area height
  x: 0, y: 0,
  transparent: true,       // desktop shows through
  frame: false,            // no title bar / chrome
  resizable: true,
  hasShadow: false,
  webPreferences: {
    preload: 'preload.js',
    nodeIntegration: false,
    contextIsolation: true, // security: renderer cannot access Node directly
  },
}

Chromium command-line switches applied:

--disable-features=WebRtcHideLocalIpsWithMdns   (prevents WebRTC crash)
--enable-features=WebRTCPipeWireCapturer         (Linux compatibility)

Global shortcuts:

  • Cmd/Ctrl+Shift+I — Toggle DevTools
  • Cmd/Ctrl+Shift+R — Clear electron-store and reload (dev reset)

Renderer Process (renderer/)

  • Next.js 14 app, output mode: export (static HTML/JS, no server)
  • distDir: ../app in production, .next in development
  • trailingSlash: true, images.unoptimized: true
  • Loaded via app://./home (prod) or http://localhost:{port}/home (dev)
  • All Node.js APIs are strictly off (nodeIntegration: false) — renderer communicates only through the window.ipc object injected by the preload script.

Preload Script (main/preload.js)

Bridges renderer ↔ main using Electron's contextBridge:

contextBridge.exposeInMainWorld('ipc', {
  send(channel, value),          // fire-and-forget
  invoke(channel, ...args),      // returns Promise
  on(channel, callback),         // event listener; returns unsubscribe fn
})
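A bridge with this shape is conventionally built on Electron's ipcRenderer. The sketch below is an assumption about how main/preload.js might be structured (the factory name makeIpcBridge is invented here): the wiring is factored into a pure function so it can be exercised with a stubbed ipcRenderer outside Electron.

```javascript
// Hypothetical sketch of the preload bridge object.
function makeIpcBridge(ipcRenderer) {
  return {
    // fire-and-forget message to the main process
    send(channel, value) {
      ipcRenderer.send(channel, value);
    },
    // request/response; resolves with the ipcMain.handle return value
    invoke(channel, ...args) {
      return ipcRenderer.invoke(channel, ...args);
    },
    // subscribe to main→renderer events; returns an unsubscribe fn
    on(channel, callback) {
      const listener = (_event, ...args) => callback(...args);
      ipcRenderer.on(channel, listener);
      return () => ipcRenderer.removeListener(channel, listener);
    },
  };
}

// In the real preload script this would be exposed as:
//   contextBridge.exposeInMainWorld('ipc', makeIpcBridge(ipcRenderer))
```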

IPC Bridge Layer

All communication between the React renderer and the Electron main process goes through named IPC channels:

| Channel | Direction | Type | Handler | Description |
| --- | --- | --- | --- | --- |
| add-knowledge | renderer→main | invoke | addKnowledge(text, kbPath) | Embed + store knowledge entry |
| retrieve-knowledge | renderer→main | invoke | retrieveKnowledge(query, topK, kbPath) | Cosine similarity search |
| gemini-generate | renderer→main | invoke | generateText(prompt) | Gemini 2.5 Flash text |
| generate-visual | renderer→main | invoke | generateImage(prompt) | Gemini image → base64 |
| generate-exercise | renderer→main | send | generateCodeExerciseStream(query, onChunk) | Streaming HTML game |
| exercise-chunk | main→renderer | send | — | Streamed text chunk |
| exercise-done | main→renderer | send | — | Stream complete signal |
| exercise-error | main→renderer | send | — | Stream error signal |
| resize-window | renderer→main | send | win.setBounds(...) | IPC-driven window resize |
| set-ignore-mouse-events | renderer→main | send | win.setIgnoreMouseEvents(...) | Click-through toggle |
| get-user-profile | renderer→main | invoke | userStore.store | Read persisted profile |
| set-user-profile | renderer→main | invoke | userStore.set(k, v) | Write persisted profile |
| download-and-replace-avatar | renderer→main | invoke | Downloads GLB via https.get, writes to renderer/public/avatar.glb | Avatar pipeline step 3 |

Knowledge base path: app.getPath('userData') + '/knowledge_base.json'


ElevenLabs Conversational Agent

The agent is configured in agent_config.json (full snapshot of the ElevenLabs agent definition):

{
  "agent_id": "agent_1701khz2fkm6fpatfs601dpfn1w9",
  "conversation_config": {
    "asr": {
      "quality": "high",
      "provider": "elevenlabs",
      "user_input_audio_format": "pcm_16000"
    },
    "turn": {
      "turn_timeout": 7,
      "mode": "turn",
      "turn_eagerness": "normal",
      "turn_model": "turn_v2"
    },
    "tts": {
      "model_id": "eleven_turbo_v2",
      "voice_id": "fVVjLtJgnQI61CoImgHU",
      "agent_output_audio_format": "pcm_16000",
      "optimize_streaming_latency": 3,
      "stability": 0.5,
      "speed": 1,
      "similarity_boost": 0.8
    },
    "conversation": {
      "max_duration_seconds": 600,
      "client_events": ["audio","interruption","user_transcript","agent_response","agent_response_correction"]
    }
  }
}

System prompt:

"You are 'My Stardust', a student's digital twin. When a user wants to be tested, use question_user_recall. After the user answers, use check_user_recall_answer. If you are correct, you must say 'Correct answer: [Explanation]'. If incorrect, say 'Incorrect answer: [Explanation]'. For all other chat, be warm and empathetic."

LLM: claude-3-5-sonnet (temperature: 0, parallel tool calls: false)

React SDK usage (pages/home.jsx):

const conversation = useConversation({
  clientTools: { add_knowledge, retrieve_knowledge, question_user_recall,
                 check_user_recall_answer, generate_visual, generate_code_exercise },
  onConnect, onDisconnect, onMessage, onError,
})

// Start/stop session
await conversation.startSession({ agentId: "agent_1701khz2fkm6fpatfs601dpfn1w9" })
await conversation.endSession()

// State
conversation.status        // 'connected' | 'disconnected'
conversation.isSpeaking    // boolean (drives pulse animation)

Client Tool Definitions

All six tools are registered as clientTools in useConversation. ElevenLabs invokes them on the client; the renderer handles each call and returns the result to the agent.

add_knowledge({ text })

  • Trigger: Agent decides to store new information the user has explained
  • ElevenLabs timeout: 20 seconds
  • Flow:
    1. Sets isThinkingRef.current = true (triggers thinking morph)
    2. Calls window.ipc.invoke('add-knowledge', text) → main process
    3. Main calls addKnowledge(text, kbPath):
      • Calls generateEmbedding(text) → Gemini gemini-embedding-001
      • Appends { id, text, embedding, createdAt } to knowledge_base.json
    4. Returns confirmation string to agent

retrieve_knowledge({ query, topK? })

  • Trigger: Agent needs to recall something the user previously taught it
  • ElevenLabs timeout: 20 seconds
  • Default topK: 3
  • Flow:
    1. Calls window.ipc.invoke('retrieve-knowledge', query, topK)
    2. Main calls retrieveKnowledge(query, topK, kbPath):
      • Embeds the query via gemini-embedding-001
      • Loads all entries from knowledge_base.json
      • Computes cosine similarity for every entry
      • Sorts descending, returns top-K as { id, text, score, createdAt }[]
    3. Formats results as [1] text\n\n[2] text... string for agent
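Step 3's formatting can be sketched as a one-liner over the retrieve-knowledge response (field names match the documented response shape; the empty-result message is an assumption):

```javascript
// Sketch of the "[1] text\n\n[2] text" string handed back to the agent.
function formatKnowledgeResults(results) {
  if (results.length === 0) return 'No stored knowledge found.'; // assumed fallback
  return results.map((r, i) => `[${i + 1}] ${r.text}`).join('\n\n');
}
```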

question_user_recall({ topic })

  • Trigger: User says "quiz me on X" or "ask me a question about Y"
  • ElevenLabs timeout: 30 seconds
  • Flow:
    1. window.ipc.invoke('retrieve-knowledge', topic, 3) — fetch what student knows about topic
    2. Builds prompt: "This is what the student knows: {notes}\n\nGenerate ONE short-answer question about "{topic}" the question must be answerable using the limited knowledge provided... DO NOT provide the answer."
    3. window.ipc.invoke('gemini-generate', prompt) → gemini-2.5-flash
    4. Returns generated question string to ElevenLabs agent (agent speaks it aloud)

check_user_recall_answer({ user_answer })

  • Trigger: User has responded to a quiz question
  • ElevenLabs timeout: 30 seconds
  • Flow:
    1. Builds full conversation history string from chatHistoryRef.current
    2. Prompt: "Recent Conversation History:\n{history}\n\nUser's Answer: {user_answer}\n\nEvaluate... If correct, respond EXACTLY starting with 'Correct answer: '... If incorrect, 'Incorrect answer: '..."
    3. window.ipc.invoke('gemini-generate', prompt) → gemini-2.5-flash
    4. Parses evaluation string to determine morph state:
      • "correct answer:" → resultMorphRef.current = 1 → success morph (3s auto-reset)
      • "incorrect answer:" → resultMorphRef.current = -1 → failure morph (3s auto-reset)
    5. Returns evaluation string to ElevenLabs (agent speaks it in user's cloned voice)
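The verdict parsing in step 4 amounts to a prefix check on the evaluation string. A minimal sketch (the function name verdictToMorph is invented here; the prefixes come from the documented prompt contract):

```javascript
// Maps Gemini's evaluation string to the value written into
// resultMorphRef.current: 1 = success morph, -1 = failure morph, 0 = none.
function verdictToMorph(evaluation) {
  const s = evaluation.trim().toLowerCase();
  if (s.startsWith('correct answer:')) return 1;
  if (s.startsWith('incorrect answer:')) return -1;
  return 0;
}
```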

generate_visual({ prompt, text })

  • Trigger: User asks for a visual explanation, diagram, or illustration
  • Flow:
    1. Sets isGeneratingRef.current = true (gear morph)
    2. Opens WhiteboardOverlay (loading state, no image yet)
    3. window.ipc.invoke('generate-visual', prompt) → main process
    4. Main appends whiteboard style suffix to prompt, calls gemini-2.5-flash-image
    5. Extracts inlineData.data (base64 PNG), returns data:image/png;base64,...
    6. Sets generatedImageUrl → overlay renders the image
    7. Returns text param to agent to read aloud

generate_code_exercise({ query, text })

  • Trigger: User asks for an interactive game or exercise
  • Flow:
    1. Sets isGeneratingRef.current = true (gear morph)
    2. Opens DigitalExerciseOverlay with loading state
    3. window.ipc.send('generate-exercise', query) — fire-and-forget (streaming)
    4. Main calls generateCodeExerciseStream(query, onChunk):
      • Sends chunked HTML to renderer via event.sender.send('exercise-chunk', chunk)
      • Strips markdown fences (```html) from chunks before sending
      • Fires exercise-done when stream completes
    5. Renderer appends each chunk to exerciseCode state (progressive render)
    6. Returns text immediately so agent starts speaking while code generates

Active Recall Pipeline

Full two-stage interactive quiz loop:

User: "Quiz me on programming"
           │
           ▼
ElevenLabs detects intent → calls question_user_recall(topic="programming")
           │
           ▼
[Client] window.ipc.invoke('retrieve-knowledge', 'programming', 3)
           │
           ▼
[Main] Embeds "programming" → cosine search → returns top 3 notes
           │
           ▼
[Client] Builds prompt → window.ipc.invoke('gemini-generate', prompt)
           │
           ▼
[Main] gemini-2.5-flash generates question → returns to client
           │
           ▼
Client returns question string to ElevenLabs
           │
           ▼
ElevenLabs speaks question in user's cloned voice

User: "OOP uses objects to store data..."
           │
           ▼
ElevenLabs calls check_user_recall_answer(user_answer="OOP uses objects...")
           │
           ▼
[Client] Builds history + answer → window.ipc.invoke('gemini-generate', evalPrompt)
           │
           ▼
[Main] gemini-2.5-flash evaluates → "Correct answer: ..."
           │
           ▼
[Client] Parses verdict → sets resultMorphRef = 1 (success)
         Avatar morphs: head → ✅ (green, 3 seconds) → resets to head
           │
           ▼
ElevenLabs speaks evaluation in user's cloned voice

Knowledge Base (Vector Memory)

Implementation: main/services/knowledge.js

The knowledge base is a flat JSON array persisted to the user's userData directory. There is no external database — everything is local and offline.

Storage format (knowledge_base.json):

[
  {
    "id": "1751234567890-abc123",
    "text": "OOP uses objects to store data and methods that operate on that data.",
    "embedding": [0.023, -0.041, ...],   // 3072-dimensional float array
    "createdAt": "2025-06-01T12:00:00.000Z"
  }
]

addKnowledge(text, dbPath):

  1. generateEmbedding(text) → GoogleGenAI.models.embedContent({ model: 'gemini-embedding-001', contents: text })
  2. Generates unique ID: Date.now() + '-' + Math.random().toString(36).slice(2,8)
  3. Appends entry, writes JSON with JSON.stringify(store, null, 2)

retrieveKnowledge(query, topK, dbPath):

  1. generateEmbedding(query) → query vector
  2. Loads all entries from JSON file
  3. For each entry, computes cosine similarity:
    dot / (Math.sqrt(normA) * Math.sqrt(normB))
  4. Sorts descending by score, slices top-K
  5. Returns { id, text, score, createdAt }[] (embedding stripped from response)
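The similarity formula and the rank-and-strip behaviour can be sketched together (rankEntries is an invented name for illustration; the cosine expression matches the one quoted above):

```javascript
// Cosine similarity between two equal-length vectors: dot / (|a| * |b|).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every entry, sort descending, keep top-K, drop the embedding.
function rankEntries(queryVec, entries, topK = 3) {
  return entries
    .map(({ embedding, ...rest }) => ({
      ...rest,
      score: cosineSimilarity(queryVec, embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```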

User profile store (electron-store):

  • Store name: user-profile
  • Keys stored: agentId (cloned agent), voiceId (cloned voice)
  • Cleared on Cmd+Shift+R dev shortcut

Google Gemini Integration

Implementation: main/services/gemini.js

The module runs entirely in the Electron main process (Node.js). It reads the API key from renderer/.env at startup if the env var is not already set.

// .env loading (main process)
// .env loading (main process) — note the 'utf8' encoding: without it
// readFileSync returns a Buffer, which has no .split method
fs.readFileSync('../renderer/.env', 'utf8').split('\n').forEach(line => {
  // Parses KEY=VALUE pairs, strips quotes
})

Functions exported:

| Function | Model | Method | Returns |
| --- | --- | --- | --- |
| generateEmbedding(text) | gemini-embedding-001 | models.embedContent | Float64Array (3072-dim) |
| cosineSimilarity(a, b) | — | pure math | number (-1 to 1) |
| generateText(prompt) | gemini-2.5-flash | models.generateContent | string |
| generateImage(prompt) | gemini-2.5-flash-image | models.generateContent with responseModalities: ['IMAGE'] | "data:image/png;base64,..." |
| generateCodeExerciseStream(query, onChunk) | gemini-2.5-flash | models.generateContentStream | void (callback-driven) |

Exercise prompt template:

Act as an expert educational game developer. Create a single-file HTML/CSS/JS solution for an interactive exercise based on: {query}.

Constraint 1: Everything MUST be in one self-contained HTML file (internal CSS/JS).
Constraint 2: No external CDNs or libraries (use raw JS/Canvas).
Constraint 3: Use a 'dark mode' aesthetic with glowing accents (cyber-education).
Constraint 4: The game must be responsive. CRITICAL: the exercise will initially load in a small floating window (approx 400x300 pixels).

3D Particle Avatar System

Implementation: renderer/components/GhostHead.jsx + renderer/components/Scene.jsx

Scene Setup (Scene.jsx)

<Canvas
  camera={{ position: [0, 0, 3.5], fov: 40, near: 0.01 }}
  gl={{ antialias: true, alpha: true }}
  onCreated={({ gl }) => {
    gl.setClearColor(0x000000, 0)   // fully transparent background
  }}
>
  <OrbitControls enablePan={false} minDistance={0.3} maxDistance={5} />
  <GhostHead ... />
</Canvas>
  • Dynamically imported with ssr: false (Three.js is not SSR-compatible)
  • Wrapped in a React error boundary component

Particle Geometry Construction (GhostHead.jsx, useMemo)

Five GLB files are loaded via useGLTF:

  • avatar.glb — user's 3D head (default or generated by fal-ai/meshy)
  • thinking.glb — thought-bubble shape
  • success.glb — checkmark (✅)
  • failure.glb — X mark (❌)
  • gear.glb — gear (used during exercise generation)

For each GLB, vertices are extracted and face-sampled:

  1. Vertex extraction: Traverses all THREE.Mesh nodes, applies matrixWorld transforms, collects position and normal attributes.
  2. Face sampling: For each triangle, samples SAMPLES_PER_FACE = 3 random barycentric points (r1 + r2 ≤ 1; r3 = 1 - r1 - r2).
  3. Normalisation: Non-avatar GLBs are scaled to match the avatar's Y-span (avatarHeight = maxY - minY) so all morph targets fit in the same visual space.
  4. Sphere positions: Random uniform sphere sampling (SPHERE_RADIUS = 0.25) — used for speaking state.

All positions stored as Float32Array in memory. A single BufferGeometry is created once (useMemo) and its position attribute is mutated every frame.
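Step 2's barycentric sampling can be sketched for a single triangle. The fold-back step is an assumption about how the r1 + r2 ≤ 1 constraint is enforced (it is the standard trick for uniform sampling over a triangle's area):

```javascript
// Uniformly sample one point on triangle (a, b, c); each vertex is [x, y, z].
function sampleTrianglePoint(a, b, c) {
  let r1 = Math.random(), r2 = Math.random();
  if (r1 + r2 > 1) { r1 = 1 - r1; r2 = 1 - r2; } // fold back into r1 + r2 <= 1
  const r3 = 1 - r1 - r2;                         // barycentric weights sum to 1
  return [
    r3 * a[0] + r1 * b[0] + r2 * c[0],
    r3 * a[1] + r1 * b[1] + r2 * c[1],
    r3 * a[2] + r1 * b[2] + r2 * c[2],
  ];
}
```

With SAMPLES_PER_FACE = 3, this would run three times per triangle, tripling the particle density relative to raw vertices.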

Animation Loop (useFrame — per frame)

Morph state variables (all smoothly lerped):

| Ref | Target | Rate |
| --- | --- | --- |
| smoothPulseRef | pulseRef.current (0 or 0.5–1.0) | TARGET_SMOOTHING = 0.15 |
| morphRef | 1.0 if speaking, else 0.0 | 0.15 × 0.4 |
| thinkMorphRef | 1.0 if thinking, else 0.0 | 0.15 × 0.55 |
| generateMorphRef | 1.0 if generating, else 0.0 | 0.15 |
| smoothResultRef | resultMorphRef (-1, 0, or 1) | 0.15 |

Morph priority (layered):

  1. Generating (gear) overrides everything
  2. Success/failure overrides thinking
  3. Thinking overrides speaking
  4. Speaking (sphere) is base

Per-particle position computation (executed for every particle each frame):

Jitter: sin/cos waves with per-particle phase offsets (JITTER_AMPLITUDE = 0.0004)
Head: homePosition + normal × (pulse × DISTORTION_FACTOR × 1.5)
Sphere: spherePosition + radial noise × pulse (diffusion + flare noise)
Head↔Sphere: THREE.MathUtils.lerp(head, sphere, morphRef)
Thinking: lerp(head/sphere, thinkingPos + breatheWave, thinkMorphRef)
Gear: lerp(thinking, gearPos, generateMorphRef)
Success/Failure: lerp(gear, successPos/failurePos + breatheWave, activeResultMorph)
Final: + jitter

Rotation logic:

  • Idle: gentle sway — sin(t × 0.3) × MAX_SWAY_RAD on Y, sin(t × 0.4) × 0.3 × MAX_SWAY_RAD on X
  • Thinking: continuous spin += delta × 0.22 (Y), += delta × 0.07 (X)
  • Generating: fast gear spin −= delta × 3.5 (Z), += delta × 1.5 (Y)
  • All lerped at rate 0.05 so no snapping during transitions

Tuning constants:

DISTORTION_FACTOR = 0.19   // how far normals push on voice amplitude
JITTER_AMPLITUDE  = 0.0004 // micro-jitter per particle
TARGET_SMOOTHING  = 0.15   // exponential smoothing rate
PARTICLE_SIZE     = 0.006  // WebGL point size
PARTICLE_OPACITY  = 0.75
MAX_SWAY_RAD      = 0.087  // radians (~5°)
SPHERE_RADIUS     = 0.25   // world units
SAMPLES_PER_FACE  = 3      // barycentric samples per triangle

Color states:

| State | Color |
| --- | --- |
| Idle | White, brightness 1.0 + pulse × 1.5 |
| Thinking | Blue-white breathing, RGB(0.55→1, 0.72→1, 0.65→1) |
| Success | Bright green/teal pulse, RGB(0.2→1, 0.8→1, 0.4→1) |
| Failure | Hot red/orange, RGB(0.9→1, 0.2→1, 0.2→1) |
| Generating | Oscillating light-blue (#87CEFA) ↔ soft-violet (#DDA0DD) |

Material settings:

<pointsMaterial
  size={PARTICLE_SIZE}
  transparent
  opacity={PARTICLE_OPACITY}
  color="#ffffff"
  sizeAttenuation    // size scales with camera distance
  depthWrite={false} // prevents z-fighting with transparent window
/>

Voice Pulse Mapping

In home.jsx, a requestAnimationFrame loop updates pulseRef.current:

pulseRef.current = (isConnected && conversation.isSpeaking)
  ? 0.5 + Math.random() * 0.5
  : 0

This random 0.5–1.0 range is smoothed in GhostHead.useFrame to create organic pulsation. The pulseRef is a React ref (not state) to avoid re-renders on every frame.
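The smoothing referred to here is a simple per-frame exponential lerp toward the target, using the TARGET_SMOOTHING constant from the tuning table:

```javascript
// Per-frame exponential smoothing: move a fixed fraction of the
// remaining distance toward the target each frame.
const TARGET_SMOOTHING = 0.15;

function smooth(current, target, rate = TARGET_SMOOTHING) {
  return current + (target - current) * rate;
}
```

Applied every frame, the smoothed value converges geometrically (after n frames the remaining gap is (1 - rate)^n of the original), which is what turns the raw random pulse into an organic-looking pulsation.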


Onboarding Pipeline

Implementation: renderer/components/onboarding/

State machine with 3 steps (voice → avatar → done) managed by useReducer.

Step 1: Voice Cloning (StepVoiceCapture.jsx)

Audio capture:

  • navigator.mediaDevices.getUserMedia({ audio: { echoCancellation: true, noiseSuppression: true, autoGainControl: true } })
  • MediaRecorder with mimeType: 'audio/webm;codecs=opus'
  • Chunk interval: 250ms
  • Duration: 10 seconds (auto-stop) or manual early stop
  • Level metering: ScriptProcessorNode (buffer: 2048 samples, 1 channel) with RMS computation — not AnalyserNode (returns zeros in Electron's audio context)

Pipeline (4 REST calls):

1. POST /v1/convai/agents/{BASE_AGENT_ID}/duplicate
   → { agent_id: newAgentId }

2. POST /v1/voices/add (multipart/form-data)
   Fields: name, files (voice-sample.webm), remove_background_noise=true, description
   → { voice_id }

3. PATCH /v1/convai/agents/{newAgentId}
   Body: { conversation_config: { tts: { voice_id } } }
   → 200 OK

4. window.ipc.invoke('set-user-profile', { agentId: newAgentId, voiceId })
   → persisted to electron-store

Step 2: Avatar Capture (StepAvatarCapture.jsx)

Image capture:

  • Webcam: getUserMedia({ video: { width: 512, height: 512, facingMode: 'user' } })
  • Canvas crop: square-centre crop at 512×512, mirrored horizontally (ctx.scale(-1, 1))
  • Output: PNG blob via canvas.toBlob()
  • File upload: <input type="file" accept="image/*"> with same canvas resize path

Pipeline (5 steps):

1. POST https://fal.run/fal-ai/storage/upload
   Body: image PNG blob
   Headers: Authorization: Key {FAL_KEY}
   → { url: publicImageUrl }

2. POST https://queue.fal.run/fal-ai/meshy/v6/image-to-3d
   Body: { input: { image_url: publicImageUrl } }
   → { request_id }

3. POLL GET .../requests/{request_id}/status every 3s (max 60 × 3s = 3min)
   States: IN_QUEUE (show position) → COMPLETED

4. GET .../requests/{request_id}
   → { model_glb: { url: glbUrl } }

5. window.ipc.invoke('download-and-replace-avatar', glbUrl)
   Main process: https.get(glbUrl) → fs.createWriteStream(avatarPath)
   Overwrites renderer/public/avatar.glb (dev) or app/avatar.glb (prod)

UI Portal animation: Camera viewfinder uses CSS clip-path: circle(0% → 50%) transition for a portal-open effect (cubic-bezier(0.25, 0.8, 0.25, 1), 800ms).


Exercise Generation Pipeline

  1. ElevenLabs agent calls generate_code_exercise({ query, text })
  2. Renderer fires window.ipc.send('generate-exercise', query) (non-blocking)
  3. Returns text immediately so agent starts narrating
  4. Main process calls generateCodeExerciseStream(query, onChunk) which calls getAI().models.generateContentStream
  5. Each chunk.text is cleaned of markdown fences (```html) and sent via event.sender.send('exercise-chunk', cleaned)
  6. Renderer accumulates chunks in exerciseCode state via ipc.on('exercise-chunk')
  7. DigitalExerciseOverlay renders the accumulated HTML inside an <iframe srcDoc> (progressively updating)
  8. exercise-done fires → loading state clears, gear morph resets
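The fence cleaning in step 5 can be sketched as below. This is an assumption about the implementation; handling a fence that straddles a chunk boundary is deliberately ignored to keep the sketch short.

```javascript
// Strip markdown code fences from a streamed chunk before it is
// forwarded to the renderer over 'exercise-chunk'.
function stripFences(chunk) {
  return chunk
    .replace(/```html\s*/g, '') // opening fence with language tag
    .replace(/```/g, '');       // bare closing fences
}
```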

Visual Generation Pipeline

  1. ElevenLabs agent calls generate_visual({ prompt, text })
  2. Renderer opens WhiteboardOverlay (shows loading shimmer)
  3. window.ipc.invoke('generate-visual', prompt) → main process
  4. Main appends whiteboard style suffix, calls gemini-2.5-flash-image with responseModalities: ['IMAGE']
  5. Extracts candidates[0].content.parts[].inlineData.data (base64)
  6. Returns "data:image/png;base64,{data}" string
  7. Renderer sets generatedImageUrl → <img> renders in overlay
  8. Agent speaks the text parameter aloud

Window System

Click-Through Mechanism

The window uses setIgnoreMouseEvents to be transparent to mouse events when the cursor is not over interactive UI:

  • On mount: set-ignore-mouse-events true { forward: true } — window is click-through, events forwarded to OS
  • onMouseEnter any interactive element: set-ignore-mouse-events false — window captures events
  • onMouseLeave: set-ignore-mouse-events true { forward: true } — back to click-through

Drag Handle

Top-left 30×30px circle with WebkitAppRegion: 'drag' — allows window dragging without a title bar.

IPC Resize

Bottom-right resize handle uses Pointer Capture API (setPointerCapture/releasePointerCapture) to track drag delta and calls window.ipc.send('resize-window', { width, height }). Main process uses win.setBounds() with a minimum of 400×300.
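The bounds math on the main-process side can be sketched as a pure function (nextBounds is an invented name; the 400×300 floor matches the documented minimum):

```javascript
// Apply a pointer-drag delta to the window size, clamped to the minimum.
function nextBounds(start, dx, dy) {
  return {
    width: Math.max(400, start.width + dx),
    height: Math.max(300, start.height + dy),
  };
}
```

In the real app the result would be passed to win.setBounds() in response to the 'resize-window' IPC message.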


Project Structure

nextronapp_before_gamma/
├── main/                          # Electron main process (Node.js / ESM)
│   ├── background.js              # Entry point, IPC handlers, BrowserWindow
│   ├── preload.js                 # contextBridge: exposes window.ipc
│   ├── helpers/
│   │   ├── create-window.js       # (nextron scaffold)
│   │   └── index.js
│   └── services/
│       ├── gemini.js              # Gemini SDK wrapper (embed, text, image, stream)
│       └── knowledge.js           # Vector store: add/retrieve (cosine similarity)
│
├── renderer/                      # Next.js app (renderer process)
│   ├── pages/
│   │   ├── _app.jsx               # Global Next.js app wrapper
│   │   ├── home.jsx               # Main page: ElevenLabs session, tool handlers
│   │   └── next.jsx               # (scaffold page)
│   ├── components/
│   │   ├── Scene.jsx              # R3F Canvas + OrbitControls + error boundary
│   │   ├── GhostHead.jsx          # Particle avatar: GLB sampling, 7-morph animation
│   │   ├── WhiteboardOverlay.jsx  # Fullscreen image display for generate_visual
│   │   ├── DigitalExerciseOverlay.jsx  # Iframe-based HTML game display
│   │   └── onboarding/
│   │       ├── OnboardingOverlay.jsx   # State machine (voice → avatar → done)
│   │       ├── StepVoiceCapture.jsx    # MediaRecorder + ElevenLabs voice clone API
│   │       └── StepAvatarCapture.jsx   # Webcam + fal-ai/meshy 3D pipeline
│   ├── styles/
│   │   └── globals.css
│   ├── public/
│   │   ├── avatar.glb             # Default 3D head (replaced after onboarding)
│   │   ├── thinking.glb           # Thought-bubble morph shape
│   │   ├── success.glb            # ✅ morph shape
│   │   ├── failure.glb            # ❌ morph shape
│   │   ├── gear.glb               # Gear morph shape (generating state)
│   │   └── images/logo.png
│   ├── next.config.js             # output: export, distDir: ../app (prod)
│   └── .env                       # API keys (gitignored)
│
├── agent_config.json              # Full ElevenLabs agent definition snapshot
├── package.json                   # Dependencies + scripts
├── electron-builder.yml           # Build config (output: dist/, resources/)
├── resources/
│   ├── icon.icns                  # macOS app icon
│   └── icon.ico                   # Windows app icon
├── scripts/
│   ├── test-active-recall.js      # Dev test for active recall pipeline
│   └── update_agent.js            # Script to push agent config changes to ElevenLabs
├── reset-knowledge.js             # Clears knowledge_base.json
└── test-knowledge.js              # Tests add/retrieve knowledge flow

Environment Variables

All env vars live in renderer/.env (loaded by Next.js in renderer; also manually parsed by main/services/gemini.js for main process access):

| Variable | Required | Description |
| --- | --- | --- |
| NEXT_PUBLIC_GEMINI_API_KEY | Yes | Google AI Studio API key |
| NEXT_PUBLIC_ELEVENLABS_API_KEY | Yes | ElevenLabs xi-api-key header for REST calls |
| NEXT_PUBLIC_BASE_AGENT_ID | Yes | Base agent to duplicate during onboarding |
| NEXT_PUBLIC_ELEVENLABS_AGENT_ID | Yes | Active agent ID for conversation sessions |
| NEXT_PUBLIC_FAL_KEY | Yes | fal-ai API key (key_id:key_secret format) |

Scripts & Dev Tools

# Development (hot-reload Electron + Next.js)
yarn dev

# Production build (static export to app/ + Electron compile)
yarn build

# Install native Electron deps after npm install
yarn postinstall       # electron-builder install-app-deps

# Clear the local knowledge base (knowledge_base.json)
yarn reset-kb          # node reset-knowledge.js

# Test add/retrieve knowledge pipeline
yarn test-kb           # node test-knowledge.js

# Test the active recall two-stage loop
yarn test-active-recall  # node scripts/test-active-recall.js

In-app dev shortcuts:

| Shortcut | Action |
| --- | --- |
| Cmd/Ctrl+Shift+I | Toggle DevTools (detached window in dev, toggleable in prod) |
| Cmd/Ctrl+Shift+R | Clear electron-store (reset onboarding) + reload renderer |

Build & Distribution

Tool: electron-builder v24.13.3

Config (electron-builder.yml):

appId: com.example.nextron
productName: My Nextron App
directories:
  output: dist
  buildResources: resources
files:
  - from: .
    filter: [package.json, app]

Build artifacts (dist/):

  • My Nextron App-1.0.0-arm64.dmg — macOS installer (Apple Silicon)
  • My Nextron App-1.0.0-arm64-mac.zip — macOS zip (Apple Silicon)
  • .blockmap files for delta updates

Build pipeline:

  1. nextron build triggers:
    • next build inside renderer/ → static export to ../app/
    • Webpack compiles main/background.js + main/preload.js → app/background.js + app/preload.js
  2. electron-builder packages app/ + package.json into an Asar archive
  3. Output: signed (or unsigned) .dmg + .zip

Production vs development loading:

// Production: custom protocol
serve({ directory: 'app' })
mainWindow.loadURL('app://./home')

// Development: Next.js dev server
mainWindow.loadURL(`http://localhost:${port}/home`)
mainWindow.webContents.openDevTools({ mode: 'detach' })
