An interactive 3D city builder powered by emotion, neural signals, and AI vision — built with Three.js, FastAPI, and Meta's TRIBE v2.
- Overview
- Features
- Architecture
- Demo
- Getting Started
- Environment Variables
- Usage
- EEG & Neural Symbiosis
- Vision Pipeline (BFL → TRIBE → Classifier)
- TRIBE v2 CLI Pipelines
- RunPod / GPU Setup
- API Reference
- Project Structure
- EEG Channel Map
- Troubleshooting
- License
Inception is a browser-based 3D city builder where every building, prop, and environmental detail is shaped by emotion (arousal, valence, focus), mood, and environment (sun elevation, fog density). Users place objects on a grid via text prompts or EEG blink triggers, and an AI backend (Claude) determines material properties — emissive glow, roughness, metalness, scale — to match the emotional context.
A second dimension uses Meta's TRIBE v2 brain encoder: real photos or AI-generated images are converted to neural feature vectors, classified by a scikit-learn model, and mapped back into the city as specific asset types (bridge, skyscraper, lake, tree, etc.).
- 100+ Kenney city kit GLB assets — commercial, industrial, suburban, roads
- Procedural objects — water, stadiums, gardens, parks, hills, clouds generated in code
- Post-processing — Unreal bloom, ambient occlusion, sky system with real-time sun/fog
- Emotion pad — arousal, valence, and focus control material appearance
- Mood quadrants — liminal, euphoric, melancholic, chaotic states shape the environment
- AI materials — Claude interprets object + emotion → Three.js material parameters
- Offline fallback — heuristic engine mirrors server behavior when backend is down
- Dual-headset "Neural Symbiosis" — g.tec + OpenBCI Cyton, two users, one shared world
- Blink-to-build — EEG blink detection opens the placement dialog
- Live emotion stream — 10 Hz WebSocket feed drives scene atmosphere in real-time
- Text → Image → Video → Neural features → Classification via BFL FLUX + TRIBE v2
- Streaming NDJSON responses for progressive UI updates
- Brain surface visualization of TRIBE vectors via nilearn
- MediaPipe iris tracking via webcam → WebSocket gaze data at 30 fps
- Camera follows gaze direction in the 3D scene
- Perplexity Sonar or Sketchfab fallback for finding and importing external GLB models
- Built-in CORS proxy for cross-origin model downloads
- ElevenLabs TTS integration — AI narration played in-scene after placement
```
┌─────────────────────────────────────────────────────────────────┐
│                      Browser (index.html)                       │
│  Three.js scene · Emotion pad · Mood HUD · Voice · WebSockets   │
│  world-camera/ → orbit rig, pointer, EEG bridge, eye bridge     │
└──────┬────────────────┬──────────────┬─────────────┬────────────┘
       │ HTTP           │ WS :8765     │ WS :8766    │ WS :8767
       ▼                ▼              ▼             ▼
┌──────────────┐ ┌──────────────┐ ┌──────────┐ ┌───────────┐
│ FastAPI      │ │ EEG Decoder  │ │ Eye      │ │ TRIBE WS  │
│ :8000        │ │ (dual/single)│ │ Tracker  │ │ (fMRI     │
│              │ │              │ │          │ │ decoder)  │
│ /api/place   │ │ g.tec +      │ │ MediaPipe│ │           │
│ /api/vision-*│ │ OpenBCI      │ │ + OpenCV │ │ tribe_    │
│ /api/tts     │ │              │ │          │ │ decoding/ │
│ /api/search  │ └──────────────┘ └──────────┘ └───────────┘
│ /api/proxy   │
└──────┬───────┘
       │
  ┌────┴─────────────────────────┐
  │ External APIs                │
  │ · Anthropic Claude           │
  │ · BFL FLUX (image gen)       │
  │ · ElevenLabs (TTS)           │
  │ · Perplexity / Sketchfab     │
  └──────────────────────────────┘
```
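With four local services on fixed ports, it helps to see at a glance which ones are up. The sketch below is a hypothetical helper (not part of the repo) that only checks whether something accepts a TCP connection on each port from the diagram; it does not verify that the listener speaks HTTP or WebSocket.

```python
import socket

# Ports taken from the architecture diagram.
SERVICES = {
    "FastAPI backend": 8000,
    "EEG decoder (WS)": 8765,
    "Eye tracker (WS)": 8766,
    "TRIBE decoder (WS)": 8767,
}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def service_status(services: dict[str, int] = SERVICES) -> dict[str, bool]:
    """Map each service name to whether its port is accepting connections."""
    return {name: is_listening(port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in service_status().items():
        print(f"{name:<20} {'up' if up else 'down'}")
```

A plain connect check keeps the helper dependency-free; a protocol-level check (an HTTP GET or a WebSocket handshake) would need more code per service.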
| Feature | Preview |
|---|---|
| City scene overview | ![]() |
| Emotion pad + placement | ![]() |
| EEG Neural Symbiosis | ![]() |
| Vision pipeline result | ![]() |
- Python 3.10+
- Node.js is not required — the frontend uses vanilla ES modules with Three.js via CDN
- `ffmpeg` on `PATH` (required for the vision pipeline)
- EEG hardware (optional): g.tec BCICore-8 and/or OpenBCI Cyton
```bash
git clone <YOUR_REPO_URL> Inception
cd Inception

cd backend
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt

cd ..  # .env.example lives at the repo root
cp .env.example .env
# Edit .env and add your API keys (see Environment Variables below)
```

Open two terminals:
Terminal 1 (API server):

```bash
cd backend
source .venv/bin/activate
uvicorn app:app --reload --host 127.0.0.1 --port 8000
```

Terminal 2 (static file server):

```bash
# From repo root
python -m http.server 8080
```

Open http://localhost:8080/index.html. The browser console should show `[Backend] online`.
Eye tracking (Terminal 3):

```bash
cd backend
python eye_track.py               # opens webcam debug preview
python eye_track.py --no-preview  # headless mode
python eye_track.py --sens 1.5    # increase gaze sensitivity
```

EEG dual decoder (Terminal 4):

```bash
cd backend
python eeg_decode_dual.py --mock  # simulated (no hardware)
python eeg_decode_dual.py         # real hardware (BLE + USB)
```

Create a `.env` file at the repo root; the backend loads it automatically.
| Variable | Required | Purpose |
|---|---|---|
| `ANTHROPIC_API_KEY` | Recommended | Claude API for intelligent material generation. Alias: `CLAUDE_API_KEY` |
| `ANTHROPIC_MODEL` | No | Model override (default: `claude-3-5-sonnet-20241022`) |
| `BFL_API_KEY` | For vision | BFL FLUX image generation for the vision pipeline |
| `BFL_MODEL` | No | FLUX model (default: `flux-2-klein-4b`) |
| `PERPLEXITY_API_KEY` | For model search | 3D model search via Perplexity Sonar. Alias: `PPLX_API_KEY` |
| `SKETCHFAB_API_TOKEN` | For model search | Fallback 3D model search and download |
| `ELEVEN_LABS_API_KEY` | For narration | ElevenLabs text-to-speech |
Optional TRIBE runtime flags:

| Variable | Default | Purpose |
|---|---|---|
| `TRIBE_FORCE_CPU` | unset | Force CPU inference (set to `1` on Mac / no GPU) |
| `TRIBE_VIDEO_SKIP_WHISPER` | `1` | Skip Whisper ASR on video audio |
| `TRIBE_FEATURES_VIDEO_ONLY` | `1` | Only load the video extractor (skip Llama/Wav2Vec) |
| `TRIBE_CACHE_FOLDER` | unset | Override the model cache directory |
| `TRIBE_DATALOADER_WORKERS` | `4` (CUDA) | DataLoader worker count |
| `TRIBE_CUDNN_BENCHMARK` | `1` (CUDA) | Enable cuDNN benchmark |
| `TRIBE_WHISPER_DEVICE` | auto | Override Whisper device (`cpu` / `cuda`) |
| `TRIBE_WHISPER_COMPUTE_TYPE` | auto | Override Whisper compute type (`float16` / `float32`) |
Without an Anthropic key, the server falls back to a deterministic heuristic engine that mirrors the client-side buildLocalParams() function.
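The key aliases and documented defaults can be resolved with a small lookup helper. This is a hypothetical sketch of that lookup order, not the backend's actual loader; `resolve_key` and `tribe_flag` are illustrative names, and treating empty strings as unset is an assumption.

```python
import os
from typing import Mapping, Optional

# Primary variable -> accepted alias, per the Environment Variables table.
ALIASES = {
    "ANTHROPIC_API_KEY": "CLAUDE_API_KEY",
    "PERPLEXITY_API_KEY": "PPLX_API_KEY",
}

# Documented defaults for the optional model overrides.
DEFAULTS = {
    "ANTHROPIC_MODEL": "claude-3-5-sonnet-20241022",
    "BFL_MODEL": "flux-2-klein-4b",
}


def resolve_key(name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return `name`, else its alias, else a documented default (empty strings count as unset)."""
    for candidate in (name, ALIASES.get(name)):
        if candidate and env.get(candidate):
            return env[candidate]
    return DEFAULTS.get(name)


def tribe_flag(name: str, default: bool, env: Mapping[str, str] = os.environ) -> bool:
    """Interpret a TRIBE_* flag: '1'/'true'/'yes' mean on, '0'/'false'/'no' mean off."""
    raw = env.get(name, "").strip().lower()
    if raw in {"1", "true", "yes"}:
        return True
    if raw in {"0", "false", "no"}:
        return False
    return default
```

In this sketch, `resolve_key("ANTHROPIC_API_KEY")` returning `None` corresponds to the heuristic-fallback case described above.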
- Blink (with EEG + "open build after blink" enabled) or click the ground to open the placement dialog
- Type a building name (e.g. "red massive bridge", "glass skyscraper", "small cottage")
- The client sends the label, emotion state, and environment to `POST /api/place`
- Claude (or the heuristic fallback) returns material parameters
- The object spawns on the grid with emotion-driven materials
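The same placement call can be made outside the browser, which is handy for testing the backend on its own. A minimal sketch, assuming the backend from Getting Started is running on `http://localhost:8000` and accepts the request body shown in the API Reference. `build_place_request` and `place` are illustrative names, and because it is not documented here whether `base_label` and `hints` are required, the sketch lets you pass them optionally.

```python
import json
import urllib.request
from typing import Any, Optional

API_URL = "http://localhost:8000/api/place"


def build_place_request(
    label: str,
    emotion: dict[str, float],
    mood: str,
    sun_elevation_deg: float,
    fog_density: float,
    base_label: Optional[str] = None,
    hints: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a body shaped like the /api/place request example."""
    body: dict[str, Any] = {
        "label": label,
        "emotion": {axis: emotion[axis] for axis in ("arousal", "valence", "focus")},
        "mood": mood,
        "environment": {
            "mood_quadrant": mood,
            "sun_elevation_deg": sun_elevation_deg,
            "fog_density": fog_density,
        },
    }
    if base_label is not None:
        body["base_label"] = base_label
    if hints is not None:
        body["hints"] = hints
    return body


def place(body: dict[str, Any], url: str = API_URL, timeout: float = 30.0) -> dict[str, Any]:
    """POST the body as JSON and return the decoded response (e.g. material_params)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    body = build_place_request(
        "red massive bridge",
        {"arousal": 0.7, "valence": 0.4, "focus": 0.6},
        mood="liminal",
        sun_elevation_deg=18,
        fog_density=0.3,
        base_label="bridge",
    )
    print(place(body)["material_params"])
```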
The emotion pad controls three axes:
- Arousal — energy level (calm → excited)
- Valence — positivity (negative → positive)
- Focus — attention (diffuse → concentrated)
These values influence material properties (emissive intensity, roughness) and environment (fog density, sun position, ambient color).
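To make the idea concrete, here is a hypothetical mapping in the spirit of the heuristic fallback. It is not `buildLocalParams()`: the coefficients, and the assumption that all three axes are normalized to [0, 1], are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Emotion:
    arousal: float  # 0 = calm, 1 = excited (assumed normalization)
    valence: float  # 0 = negative, 1 = positive
    focus: float    # 0 = diffuse, 1 = concentrated


def clamp01(x: float) -> float:
    """Clip a value into [0, 1]."""
    return max(0.0, min(1.0, x))


def material_params(e: Emotion) -> dict[str, float]:
    """Hypothetical: excitement adds glow, focus makes surfaces smoother and more metallic."""
    return {
        "emissiveIntensity": round(0.3 * clamp01(e.arousal), 3),
        "roughness": round(0.9 - 0.6 * clamp01(e.focus), 3),
        "metalness": round(0.1 + 0.3 * clamp01(e.focus), 3),
    }


def environment_params(e: Emotion) -> dict[str, float]:
    """Hypothetical: low focus thickens fog, higher valence raises the sun."""
    return {
        "fog_density": round(0.6 * (1.0 - clamp01(e.focus)), 3),
        "sun_elevation_deg": round(5.0 + 55.0 * clamp01(e.valence), 1),
    }
```

Keeping the mapping a pure function of the emotion state makes it deterministic, which is the property the server-side heuristic fallback is described as having.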
Enable the "vision (BFL→TRIBE→class)" checkbox in the placement dialog:
- Your text prompt generates an image via BFL FLUX
- The image is converted to a short video
- TRIBE v2 extracts neural features from the video
- A sklearn classifier predicts the city element class
- The corresponding 3D asset spawns in the scene
Inception supports live EEG input from one or two headsets simultaneously.
Two users wear EEG headsets (g.tec BCICore-8 + OpenBCI Cyton) and co-create in a shared world. The system tracks:
- Per-user emotion (arousal, valence, focus)
- Blink detection for hands-free building placement
- Symbiosis scores — correlation metrics between the two brain signals
- Active user — determines who has creation permission (toggled via stdin)
```
Python (eeg_decode_dual.py)      Browser (index.html)
───────────────────────────      ─────────────────────
User 1: g.tec BCICore-8     ←    ws://127.0.0.1:8765
User 2: OpenBCI Cyton            dual EEG bridge + symbiosis UI
→ JSON: { user1, user2, active_user, symbiosis, capture }
```
In single-headset mode, one g.tec headset streams flat JSON at 10 Hz:

```json
{
  "present": true,
  "blink": false,
  "emotion": { "arousal": 0.72, "valence": 0.41, "focus": 0.60 }
}
```

Test without hardware:
```bash
python eeg_decode_dual.py --mock  # dual mock
python eeg_decode.py --mock       # single mock
python eeg_mock_ws.py             # minimal blink-only mock
```

The vision pipeline turns a text prompt into a classified city element in several stages:
```
Text prompt
    │
    ▼
BFL FLUX API ──── generates image
    │
    ▼
ffmpeg ────────── image → MP4 (short clip)
    │
    ▼
TRIBE v2 ──────── extracts neural feature vector
    │
    ▼
sklearn ───────── classifies into city element
classifier        (bridge, lake, skyscraper, tree, house, …)
    │
    ▼
3D asset spawns in scene
```
Requirements:

- `BFL_API_KEY` in `.env`
- TRIBE v2 installed: `pip install -r requirements-tribe.txt`
- Trained classifier at `outputs/photo_element_logreg.joblib`
- `ffmpeg` on `PATH`
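The image → MP4 step can be reproduced by hand with ffmpeg, which is useful when debugging the pipeline one stage at a time. This is a sketch under assumptions: the clip length, frame rate, and codec settings below are illustrative and may differ from what `vision_place.py` actually uses; `still_to_mp4_cmd` and `still_to_mp4` are hypothetical names.

```python
import shutil
import subprocess
from pathlib import Path


def still_to_mp4_cmd(image: Path, output: Path, seconds: float = 2.0, fps: int = 24) -> list[str]:
    """Build an ffmpeg command that loops one still image into a short H.264 clip."""
    return [
        "ffmpeg",
        "-y",                      # overwrite output if it exists
        "-loop", "1",              # repeat the single input image
        "-i", str(image),
        "-t", str(seconds),        # clip duration in seconds
        "-r", str(fps),            # output frame rate
        "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2",  # libx264 + yuv420p need even dimensions
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",     # widely playable pixel format
        str(output),
    ]


def still_to_mp4(image: Path, output: Path, **kwargs) -> Path:
    """Run the conversion; raises if ffmpeg is missing or exits non-zero."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg not found on PATH")
    subprocess.run(still_to_mp4_cmd(image, output, **kwargs), check=True, capture_output=True)
    return output
```

Building the argument list separately from running it keeps the command inspectable and avoids shell quoting issues with file paths.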
The backend can render TRIBE feature vectors as cortical surface maps using nilearn, returned as part of the streaming vision pipeline response.
Train and evaluate classifiers from the command line.
| Command | Description |
|---|---|
| `python -m pipeline.neural_matrix --help` | Text CSV → TRIBE neural feature matrix |
| `python -m pipeline.photo_neural_matrix --help` | Photos → MP4 → TRIBE feature matrix |
| `python -m pipeline.train_element_classifier --help` | Train a scikit-learn classifier on an `.npz` feature matrix |
| `python -m pipeline.eval_element_classifier --help` | Evaluate on a holdout `.npz` |
| `python -m pipeline.classify_text --help` | Classify a single text phrase |
| `python -m pipeline.bfl_tribe_classify --help` | BFL text → image → TRIBE → class (needs `BFL_API_KEY`) |
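`photo_neural_matrix` expects one folder per class under `data/photo_dataset/source/`. Before spending GPU time on feature extraction, a quick standard-library check of that layout can catch classes with too few images for `--holdout-per-class`. The helper below is hypothetical; which file extensions count as photos is an assumption, and it does not reproduce how the real tool picks holdout images.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}  # assumed accepted formats


def scan_dataset(dataset_root: Path) -> dict[str, list[Path]]:
    """Return {class_name: sorted image paths} for <root>/source/<class>/*."""
    source = dataset_root / "source"
    classes: dict[str, list[Path]] = {}
    for class_dir in sorted(p for p in source.iterdir() if p.is_dir()):
        images = sorted(
            f for f in class_dir.iterdir() if f.is_file() and f.suffix.lower() in IMAGE_EXTS
        )
        classes[class_dir.name] = images
    return classes


def check_holdout(classes: dict[str, list[Path]], holdout_per_class: int) -> list[str]:
    """List classes that would have no training images left after the holdout."""
    return [name for name, imgs in classes.items() if len(imgs) <= holdout_per_class]


if __name__ == "__main__":
    found = scan_dataset(Path("data/photo_dataset"))
    for name, imgs in found.items():
        print(f"{name:<15} {len(imgs)} images")
    too_small = check_holdout(found, holdout_per_class=2)
    if too_small:
        print("Need more than 2 images in:", ", ".join(too_small))
```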
```bash
# Place images under data/photo_dataset/source/<class>/
# e.g. data/photo_dataset/source/bridge/photo1.jpg
python -m pipeline.photo_neural_matrix \
  --dataset-root data/photo_dataset \
  --output outputs/photo_tribe_neural.npz \
  --holdout-per-class 2  # reserve 2 images per class for testing

python -m pipeline.train_element_classifier \
  --input outputs/photo_tribe_neural.npz \
  --output outputs/photo_element_logreg.joblib

python -m pipeline.eval_element_classifier \
  --model outputs/photo_element_logreg.joblib \
  --input outputs/photo_tribe_neural_holdout.npz
```

macOS / CPU:
```bash
export TRIBE_FORCE_CPU=1
export CUDA_VISIBLE_DEVICES=
pip install -r requirements-tribe.txt
```

WhisperX is forced to `--device cpu` with `float32` on Darwin to avoid ctranslate2 float16 crashes.
GPU / RunPod: see RunPod / GPU Setup below.
For heavy TRIBE workloads, use a GPU cloud instance.
| Tier | GPUs | VRAM |
|---|---|---|
| Minimum | RTX 4000 Ada, A4000, RTX 4080 | 16 GB |
| Recommended | RTX 3090, RTX 4090, L4, A10 | 24 GB |
| Headroom | A6000, A100 | 40–48 GB |
```bash
unset TRIBE_FORCE_CPU
export HF_HOME=/workspace/.cache/huggingface
export UV_CACHE_DIR=/workspace/.cache/uv

# Install ffmpeg
sudo apt-get update && sudo apt-get install -y ffmpeg

# Install uv (tribev2 calls uvx whisperx)
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"

# Clone and set up
cd /workspace
git clone <YOUR_REPO_URL> imagine && cd imagine
python3 -m venv /workspace/.venv && source /workspace/.venv/bin/activate
pip install -U pip && pip install -r requirements-runpod.txt
```

Verify:

```bash
nvidia-smi
python -c "import torch; print('CUDA:', torch.cuda.is_available())"
```

See RUNPOD.md for full details and troubleshooting.
All endpoints are served by the FastAPI backend at http://localhost:8000.
Health check — returns { "status": "ok" }.
`POST /api/place`: place an object with emotion-driven materials.
Request:

```json
{
  "label": "red massive bridge",
  "base_label": "bridge",
  "hints": {
    "color": "#cc3333",
    "size": 1.2,
    "material": { "roughness": 0.4, "metalness": 0.1 }
  },
  "emotion": { "arousal": 0.7, "valence": 0.4, "focus": 0.6 },
  "mood": "liminal",
  "environment": {
    "mood_quadrant": "liminal",
    "sun_elevation_deg": 18,
    "fog_density": 0.3
  }
}
```

Response:
```json
{
  "material_params": {
    "emissive": "#221100",
    "emissiveIntensity": 0.08,
    "roughness": 0.45,
    "metalness": 0.2,
    "scale": 1.0,
    "pointLight": null
  },
  "narration": "optional line shown in the scene",
  "audio_b64": null
}
```

Streaming NDJSON: generates an image, classifies it, and returns a brain visualization.
BFL image → MP4 → TRIBE → classifier label.
Generate an image via BFL FLUX (returns base64).
Classify an existing image via TRIBE.
`/api/search`: search for 3D models via Perplexity Sonar, with Sketchfab as a fallback.
`/api/proxy`: proxy external GLB files to avoid CORS issues.
`/api/tts`: text-to-speech via ElevenLabs; returns audio.
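The streaming vision endpoint returns NDJSON: one JSON object per line, emitted as each stage finishes. The event field names are not documented here, so the sketch below shows only the generic framing (buffer bytes, split on newlines, decode each complete line); `iter_ndjson` is a hypothetical helper, and what you do with each decoded object depends on the actual payloads.

```python
import json
from typing import Any, Iterable, Iterator


def iter_ndjson(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Yield one decoded JSON value per newline-terminated line.

    Chunks may split lines arbitrarily (as HTTP streaming does), so partial
    lines are buffered until their newline arrives. Blank lines are skipped.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line.strip():
                yield json.loads(line)
    if buffer.strip():
        # Tolerate a final object without a trailing newline.
        yield json.loads(buffer)
```

Because a `urllib` response object iterates line by line, it can be passed straight in (`for event in iter_ndjson(resp): ...`), letting the UI update as each stage arrives instead of waiting for the whole response.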
```
Inception/
├── index.html # Main SPA — Three.js scene, UI, placement logic
├── world-camera/ # ES modules for camera, input, and bridges
│ ├── index.js # Orbit rig, pointer ground follow
│ ├── eeg-bridge.js # Single EEG WebSocket bridge
│ ├── eeg-bridge-dual.js # Dual EEG (Neural Symbiosis) bridge
│ ├── eye-bridge.js # Eye tracking WebSocket bridge
│ └── tribe-bridge.js # TRIBE fMRI WebSocket bridge
├── backend/ # FastAPI placement & vision API
│ ├── app.py # All HTTP routes, CORS, env loading
│ ├── vision_place.py # BFL → TRIBE → sklearn classification
│ ├── brain_render.py # Cortical surface visualization (nilearn)
│ ├── eye_track.py # MediaPipe iris → WebSocket (:8766)
│ ├── eeg_decode_dual.py # Dual EEG decoder → WebSocket (:8765)
│ ├── eeg_decode.py # Single EEG decoder
│ ├── eeg_mock_ws.py # Minimal blink-only mock
│ ├── tribe_ws.py # TRIBE fMRI WebSocket server
│ └── requirements.txt
├── pipeline/ # TRIBE CLI tools
│ ├── neural_matrix.py # Text CSV → TRIBE features
│ ├── photo_neural_matrix.py # Photos → MP4 → TRIBE features
│ ├── train_element_classifier.py
│ ├── eval_element_classifier.py
│ ├── classify_text.py
│ └── bfl_api.py # BFL FLUX API client
├── tribe/ # TRIBE v2 helpers
│ ├── model.py # load_model, device selection
│ ├── env_flags.py # Environment variable configuration
│ └── whisper_patch.py # Platform-aware WhisperX patching
├── tribe_decoding/ # sklearn PCA + logistic decoders (fMRI)
├── eeg/ # EEGNet training, blink detection, data
│ ├── models/ # Trained EEG models
│ ├── data/ # Collected EEG data
│ └── eegnet.py # EEGNet architecture
├── emg/ # EMG utilities (separate requirements)
├── gpype/ # Vendored g.tec BCI Python package
├── assets/ # Kenney city kit GLB models
│ ├── commercial/
│ ├── industrial/
│ ├── suburban/
│ └── roads/
├── data/ # Photo datasets for training
│ └── photo_dataset/source/
├── tools/ # Calibration scripts
├── scripts/ # RunPod SSH, venv helpers
├── city_elements_dataset.csv # Training data for text classifier
├── requirements-tribe.txt # TRIBE dependencies (CPU / Mac)
├── requirements-runpod.txt # TRIBE dependencies (GPU / RunPod)
├── .env.example # Template for API keys
├── RUNPOD.md # GPU cloud setup guide
└── README.md # This file
```
g.tec BCICore-8 (User 1):

| Channel | 10-20 Position |
|---|---|
| Ch1 | Fp2 |
| Ch2 | F7 |
| Ch3 | FC5 |
| Ch4 | Fp1 |
| Ch5 | P7 |
| Ch6 | T7 |
| Ch7 | T8 |
| Ch8 | O2 |
OpenBCI Cyton (User 2):

| Channel | Wire Color | 10-20 Position |
|---|---|---|
| Ch0 | Grey | Fp2 |
| Ch1 | Purple | C4 |
| Ch2 | Blue | Cz |
| Ch3 | Green | T3 |
| Ch4 | Yellow | T4 |
| Ch5 | Orange | F3 |
| Ch6 | Red | Fz |
| Ch7 | Brown | F4 |
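The two channel maps can be kept as lookup tables in code, which makes it easy to pull out the frontal-polar channels (Fp1/Fp2) where eye blinks show up most strongly. Note that the Cyton montage above only has Fp2, while the g.tec montage has both. The detector below is a deliberately simple peak-to-peak amplitude threshold, not the project's blink detector; the 100 µV threshold and the assumption of a high-pass-filtered input window are illustrative and would need tuning per headset. `channels_at` and `is_blink` are hypothetical names.

```python
# Channel index -> 10-20 position, copied from the two tables above.
GTEC_CHANNELS = {1: "Fp2", 2: "F7", 3: "FC5", 4: "Fp1", 5: "P7", 6: "T7", 7: "T8", 8: "O2"}
CYTON_CHANNELS = {0: "Fp2", 1: "C4", 2: "Cz", 3: "T3", 4: "T4", 5: "F3", 6: "Fz", 7: "F4"}


def channels_at(channel_map: dict[int, str], positions: set[str]) -> list[int]:
    """Return the channel indices placed at any of the given 10-20 positions."""
    return sorted(ch for ch, pos in channel_map.items() if pos in positions)


def is_blink(window_uv: list[float], threshold_uv: float = 100.0) -> bool:
    """Simple amplitude test: flag a blink when peak-to-peak exceeds the threshold.

    `window_uv` is a short window (e.g. ~0.5 s) of one frontal channel in
    microvolts, assumed already high-pass filtered to remove slow drift.
    """
    if not window_uv:
        return False
    return max(window_uv) - min(window_uv) > threshold_uv
```

A fixed threshold is easy to reason about but sensitive to electrode contact and user differences, which is why a trained model is the more robust choice for production use.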
| Problem | Solution |
|---|---|
| `[Backend] offline` in browser console | Ensure FastAPI is running on port 8000 |
| No materials applied (flat grey) | Check `ANTHROPIC_API_KEY` in `.env`; the heuristic fallback still works without it |
| Vision pipeline returns 503 | Missing `BFL_API_KEY`, TRIBE not installed, or classifier `.joblib` not found |
| `Torch not compiled with CUDA` | Wrong PyTorch wheel: use the CUDA image and don't install CPU torch on top |
| CUDA OOM | Use a 24 GB+ GPU, run one pipeline at a time, or shorten `--duration` |
| WhisperX crash on macOS | Already handled: `tribe/whisper_patch.py` forces CPU + float32 on Darwin |
| `ffmpeg: command not found` | Install ffmpeg: `brew install ffmpeg` (Mac) or `apt install ffmpeg` (Linux) |
| ES module import errors | Serve via an HTTP server, not `file://`: `python -m http.server 8080` |
Built with Three.js · FastAPI · TRIBE v2 · Claude · BFL FLUX