A real-time, interactive music and chat application powered by AI. Chat with AI YZY, a personality inspired by Kanye West, and take control of the music as a DJ using a MIDI controller.
Features:
- AI Chat: Have a conversation with AI YZY, a brilliant and erratic musical genius AI.
- High-Performance Voice Synthesis: AI YZY's responses are spoken aloud using a custom, high-performance voice powered by the Coqui XTTS-v2 model.
- AI Art Generation: Ask the AI to create visual art from a text description.
- Real-time Music DJ: Switch to DJ mode to control generative music in real-time.
- Reactive Visuals: The interface reacts to the music and the AI's speech.
- Frontend: HTML, CSS, TypeScript, Lit
- AI Models (Google Gemini API):
  - Chat & Function Calling: `gemini-2.5-flash`
  - Image Generation: `imagen-4.0-generate-001`
- Text-to-Speech: Coqui XTTS-v2 (self-hosted via Flask/Gunicorn)
This project uses a static frontend and a self-hosted Python backend for voice synthesis.
You need a Google Gemini API key to power the chat, music, and art generation.
- Visit Google AI Studio to create an API key.
- Open the AI YZY application in your browser.
- Click the settings icon in the top-right corner, paste your API key, and click "Save & Start".
High-quality voice synthesis is computationally expensive. The following guide explains how to run the included `server.py` for optimal performance and cost-effectiveness.
Important
Voice File is Required!
The voice server will not work without a voice sample. You must place a high-quality, clean .wav audio sample of the target voice in the project's root directory and name it kanye_voice.wav.
If you see a "CORS" or "NetworkError" message in the browser console, or if the voice falls back to the robotic browser voice, it almost always means the Python server is not running correctly.
Follow these steps to fix it:
- Check the Terminal: Look at the terminal where you started the Python server. Are there any error messages? The error will tell you exactly what's wrong (e.g., `kanye_voice.wav not found`, `ModuleNotFoundError`, etc.).
- Verify Voice File: Make sure the `kanye_voice.wav` file exists in the main project directory and is named correctly.
- Check Dependencies: Ensure you have installed all packages correctly by running `pip install -r requirements.txt`.
- Confirm Port: The server runs on port `9001` by default. Make sure no other process is using this port.
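The port check in the last step can be scripted. A minimal sketch using only the Python standard library (the host and port match the defaults above; the function name is our own):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0  # 0 means the connect succeeded

# Example: check the voice server's default port before chasing CORS errors.
if port_in_use(9001):
    print("port 9001 is in use (the voice server, or a conflicting process)")
else:
    print("nothing listening on 9001 -- the Python server is likely not running")
```

If the port is free, start the server; if it is taken by another process, either stop that process or change the port in both `server.py` and the frontend.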
This setup uses the high-performance XTTS-v2 model, which is significantly faster than many alternatives on a CPU. It is ideal for running in a standard Docker container on platforms like DigitalOcean Apps.
1. Place Voice File: Obtain a high-quality, clean `.wav` audio sample of the target voice (e.g., Kanye West). The sample should be short (5-15 seconds), clear, and contain no background noise. Save it as `kanye_voice.wav` in the project's root directory.

2. Install Dependencies:

   ```bash
   # Create and activate a virtual environment (recommended)
   python3 -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

   # Install Python packages for CPU
   pip install -r requirements.txt
   ```

3. Run the Production Server: Do not use `flask run`. For production, use a robust WSGI server like Gunicorn to handle requests efficiently. The included `server.py` pre-loads the model and caches responses for maximum speed.

   ```bash
   # Use 2-4 workers for a multi-core CPU. Binds to port 9001.
   gunicorn --workers 2 --threads 4 --bind 0.0.0.0:9001 server:app
   ```

   The server will now be running on `http://localhost:9001`. The frontend application will connect to it automatically.
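`server.py` itself is not reproduced here, but the response caching it describes can be sketched in plain Python. This is illustrative only: the class name, method name, and hash-based keying are assumptions, not the real `server.py` API.

```python
import hashlib
from typing import Callable

class SynthCache:
    """In-memory cache of synthesized WAV bytes, keyed by a hash of the text."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}

    def get_or_synthesize(self, text: str, synthesize: Callable[[str], bytes]) -> bytes:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:           # cache miss: run the expensive model once
            self._store[key] = synthesize(text)
        return self._store[key]              # cache hit: reuse the stored bytes

# Usage with a stand-in for the real XTTS call:
calls = []
def fake_tts(text: str) -> bytes:
    calls.append(text)
    return b"RIFF...fake wav..."

cache = SynthCache()
cache.get_or_synthesize("Yo", fake_tts)
cache.get_or_synthesize("Yo", fake_tts)   # served from cache; the model is not re-run
```

This is also why repeating the exact same test sentence returns instantly: the second request never reaches the model.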
For the absolute best performance (nearly instant responses) without the cost of a dedicated GPU, deploy the voice server as a microservice on a "serverless GPU" platform.
- How it works: These services load the model onto a GPU only when a request comes in. You pay per second of processing time, so it's incredibly cheap when idle.
- Platforms: Modal, Banana.dev, Replicate.
- Implementation:
  - Adapt the `server.py` logic for the platform of your choice.
  - Deploy it as a separate service.
  - Update the `fetch` URL in `index.tsx` from `http://localhost:9001/synthesize` to your new serverless API endpoint.
If you have very high, consistent traffic, a dedicated GPU droplet on DigitalOcean or another cloud provider is the most powerful option.
1. Install CUDA: Ensure you have the NVIDIA CUDA Toolkit installed on your system (e.g., version 12.1).

2. Install GPU-enabled Dependencies:
   - Open `requirements.txt`.
   - Comment out the `--extra-index-url` line for `cpu`.
   - Uncomment the line for your CUDA version (e.g., `cu121`).
   - Re-install dependencies: `pip install -r requirements.txt`

3. Run the Server with Gunicorn: When using a GPU, run a single worker to avoid memory conflicts.

   ```bash
   # Use 1 worker for a GPU setup
   gunicorn --workers 1 --threads 4 --bind 0.0.0.0:9001 server:app
   ```
The frontend is composed of static files.
- Serve the project files using a simple local web server. For example, using Node.js: `npx http-server . --port 8080`
- The application will be accessible at `http://localhost:8080`.
This section documents the exact steps we used after initial deployment to stabilize the backend and publish the production frontend.
- The backend expects a speaker file via `SPEAKER_WAV_PATH` in `server.py`, defaulting to `kanye_voice_prompt.wav` in the process working directory.
- Ensure the service WorkingDirectory points to the project root (we use `/opt/ai-yzy`). Verify:

  ```bash
  systemctl show -p WorkingDirectory ai-yzy-backend
  readlink -f /proc/$(pgrep -f 'gunicorn.*9001' | head -n 1)/cwd
  ```

- Place your voice file at `/opt/ai-yzy/kanye_voice_prompt.wav` and ensure read perms:

  ```bash
  ls -l /opt/ai-yzy/kanye_voice_prompt.wav
  chmod 0644 /opt/ai-yzy/kanye_voice_prompt.wav
  ```
Optimal: 16 kHz, mono, 16‑bit PCM WAV, clean 5–15s speech.
- Convert with ffmpeg (install once):

  ```bash
  apt-get update && apt-get install -y ffmpeg
  cd /opt/ai-yzy
  cp -v kanye_voice_prompt.wav kanye_voice_prompt.wav.bak.$(date +%F-%H%M%S)
  ffmpeg -y -i kanye_voice_prompt.wav -ac 1 -ar 16000 -c:a pcm_s16le kanye_voice_prompt_clean.wav
  mv -v kanye_voice_prompt_clean.wav kanye_voice_prompt.wav
  chmod 0644 kanye_voice_prompt.wav
  file kanye_voice_prompt.wav
  ```
Alternative (SoX):

```bash
apt-get update && apt-get install -y sox libsox-fmt-all
cd /opt/ai-yzy
sox kanye_voice_prompt.wav -b 16 -c 1 -r 16000 kanye_voice_prompt_clean.wav
mv -v kanye_voice_prompt_clean.wav kanye_voice_prompt.wav
chmod 0644 kanye_voice_prompt.wav
```

No service restart is required; the server reads the file per request.
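To confirm the converted file actually matches the recommended format without trusting `file`'s summary, a small stdlib checker can be used (the function name is ours; the thresholds mirror the 16 kHz / mono / 16-bit / 5-15 s recommendation above):

```python
import wave

def check_prompt_wav(path: str) -> list[str]:
    """Return deviations from the recommended prompt format.

    Empty list means the file looks good (16 kHz, mono, 16-bit PCM, 5-15 s).
    """
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != 16000:
            problems.append(f"sample rate {w.getframerate()} Hz (want 16000)")
        if w.getnchannels() != 1:
            problems.append(f"{w.getnchannels()} channels (want mono)")
        if w.getsampwidth() != 2:
            problems.append(f"{8 * w.getsampwidth()}-bit samples (want 16-bit)")
        duration = w.getnframes() / w.getframerate()
        if not 5.0 <= duration <= 15.0:
            problems.append(f"duration {duration:.1f} s (want 5-15 s)")
    return problems
```

Run it as `python3 -c 'from check import check_prompt_wav; print(check_prompt_wav("kanye_voice_prompt.wav"))'` (adjust the module name to wherever you save it).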
To restart the backend anyway and inspect recent logs:

```bash
systemctl restart ai-yzy-backend
journalctl -u ai-yzy-backend -n 50 --no-pager
```
```bash
# Direct (bypass Nginx)
curl -fSsv -H "Content-Type: application/json" \
  -d '{"text":"Hello from YZY"}' \
  http://127.0.0.1:9001/synthesize -o /dev/null
```

You should see `HTTP/1.1 200 OK` with `Content-Type: audio/wav` and model load logs such as `INFO: TTS model loaded successfully`.
We use a dedicated site config at /etc/nginx/sites-available/ai-yzy with a symlink in sites-enabled/ and Let’s Encrypt TLS.
Key directives:
```nginx
server {
    listen 443 ssl http2;
    server_name sim.virtual-yzy.com;

    ssl_certificate /etc/letsencrypt/live/sim.virtual-yzy.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/sim.virtual-yzy.com/privkey.pem;

    root /var/www/ai-yzy/dist;
    index index.html;

    location /assets/ {
        try_files $uri =404;
        access_log off;
        expires 1y;
        add_header Cache-Control "public, immutable";
    }

    location / {
        try_files $uri $uri/ /index.html; # SPA fallback
    }

    location /synthesize {
        proxy_pass http://127.0.0.1:9001;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_read_timeout 600s;
        proxy_send_timeout 600s;
    }
}
```

Reload after changes:

```bash
nginx -t && systemctl reload nginx
```

Issue a Let’s Encrypt certificate for the landing domain using the nginx plugin. This mirrors exactly what we used in production.
Prereqs:
- DNS: `A @ -> 178.128.227.142`
- DNS: `CNAME www -> @` (or `www -> virtual-yzy.com`, depending on your DNS UI)
- Port 80 open for the HTTP-01 challenge
- Create an HTTP server block so Certbot can validate:
If you manage sites in sites-available/:
```bash
tee /etc/nginx/sites-available/virtual-yzy.conf >/dev/null <<'NGX'
server {
    listen 80;
    server_name virtual-yzy.com www.virtual-yzy.com;

    root /var/www/ai-yzy/dist;
    index about.html index.html;

    location /assets/ {
        try_files $uri =404;
        access_log off;
        expires 1y;
        add_header Cache-Control "public, immutable";
    }

    location / {
        try_files $uri $uri/ /about.html;
    }
}
NGX
ln -sf /etc/nginx/sites-available/virtual-yzy.conf /etc/nginx/sites-enabled/virtual-yzy.conf
nginx -t && systemctl reload nginx
```

Or, if you use `conf.d/`:
```bash
tee /etc/nginx/conf.d/virtual-yzy.conf >/dev/null <<'NGX'
server {
    listen 80;
    server_name virtual-yzy.com www.virtual-yzy.com;

    root /var/www/ai-yzy/dist;
    index about.html index.html;

    location /assets/ {
        try_files $uri =404;
        access_log off;
        expires 1y;
        add_header Cache-Control "public, immutable";
    }

    location / {
        try_files $uri $uri/ /about.html;
    }
}
NGX
nginx -t && systemctl reload nginx
```

- Install the Certbot nginx plugin:

  ```bash
  apt-get update && apt-get install -y certbot python3-certbot-nginx
  ```

- Issue the certificate for apex + www and enable redirect:
  ```bash
  certbot --nginx -d virtual-yzy.com -d www.virtual-yzy.com --redirect -m dev@virtual-yzy.com --agree-tos -n
  ```

  If www hasn’t propagated yet, issue apex first, then expand later:

  ```bash
  certbot --nginx -d virtual-yzy.com --redirect -m dev@virtual-yzy.com --agree-tos -n
  certbot --nginx -d virtual-yzy.com -d www.virtual-yzy.com --redirect -m dev@virtual-yzy.com --agree-tos -n --expand
  ```

- Verify and auto-renew:

  ```bash
  nginx -t && systemctl reload nginx
  curl -I https://virtual-yzy.com/
  curl -I https://www.virtual-yzy.com/
  certbot certificates | sed -n '1,200p'
  systemctl status certbot.timer
  certbot renew --dry-run
  ```

Build with Vite and upload the built `dist/` to the Nginx root.
Local (Windows):
```powershell
# In C:\Users\<you>\ai-yzy
$env:GEMINI_API_KEY="<your_key_here>"  # or define in .env.local
npm ci
npm run build

$IP = "178.128.227.142"
$KEY = "C:\Users\<you>\sshkey"
ssh -i $KEY root@${IP} "mkdir -p /var/www/ai-yzy/dist"
scp -i $KEY -r dist/* "root@${IP}:/var/www/ai-yzy/dist/"
```

On droplet:
```bash
chown -R www-data:www-data /var/www/ai-yzy
find /var/www/ai-yzy -type d -exec chmod 755 {} \;
find /var/www/ai-yzy -type f -exec chmod 644 {} \;
nginx -t && systemctl reload nginx
```

Verify the deployment:

```bash
curl -I https://sim.virtual-yzy.com/
curl -I https://sim.virtual-yzy.com/assets/index-*.css  # should be text/css
curl -I https://sim.virtual-yzy.com/assets/index-*.js   # should be application/javascript
```

If CSS/JS are returned as `text/html`, the files are missing at the root and the SPA fallback served `index.html`. Ensure `dist/` is fully deployed to `/var/www/ai-yzy/dist`.
This section documents the exact update flow we use after initial deploy. It keeps the app fast and simple by pushing the built static files to the droplet and reloading Nginx.
Use this when you only changed server.py (voice tuning etc.). No Docker. No extra env.
- Upload backend file from Windows (PowerShell):
  ```powershell
  $IP = "178.128.227.142"
  $KEY = "C:\Users\<you>\sshkeyaugust"  # adjust your key path
  scp -i $KEY C:\Users\<you>\ai-yzy\server.py "root@${IP}:/opt/ai-yzy/"
  ```

- Restart the backend on the droplet (run on the droplet shell):

  ```bash
  sudo systemctl restart ai-yzy-backend
  sudo journalctl -u ai-yzy-backend -n 100 --no-pager
  ```

- Generate a test WAV on the droplet (no playback on the server; most droplets have no sound device):
  ```bash
  TEXT='{"text":"Yo, YZY with the new defaults. One two, one two."}'
  curl -fSsv -H "Content-Type: application/json" -d "$TEXT" \
    http://127.0.0.1:9001/synthesize -o /root/yzy_test.wav
  ls -lh /root/yzy_test.wav
  ```

- Pull and play locally (Windows PowerShell):
  ```powershell
  $IP = "178.128.227.142"
  $KEY = "C:\Users\<you>\sshkeyaugust"
  scp -i $KEY "root@${IP}:/root/yzy_test.wav" .\yzy_test.wav
  start .\yzy_test.wav
  ```

Notes:

- Don’t play audio on the droplet. `aplay` will fail with ALSA errors (no sound card). Always download and listen locally.
- Change the input text slightly each test to bypass backend caching.
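Since the droplet has no sound device, a cheap way to sanity-check the generated file without playing it is to inspect the container header. A minimal sketch (stdlib only; the function name is ours):

```python
def looks_like_wav(path: str) -> bool:
    """Cheap header check: a WAV file starts with a RIFF chunk (bytes 0-3)
    whose form type is WAVE (bytes 8-11)."""
    with open(path, "rb") as f:
        header = f.read(12)
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"
```

If this returns `False` for `/root/yzy_test.wav`, the backend likely returned a JSON error body instead of audio; check `journalctl` for the real cause.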
The current backend (`server.py`) uses Coqui XTTS-v2 and intentionally strips SSML via `ssml_to_text()` to avoid browser TTS and watermark side-effects. That means tone/prosody tags are not passed through right now.
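The real `ssml_to_text()` lives in `server.py` and is not reproduced here; one plausible reconstruction of what "strip SSML" means, using only the standard library:

```python
import html
import re

def ssml_to_text(ssml: str) -> str:
    """Strip SSML/XML tags and collapse whitespace, keeping only the spoken text.

    Illustrative sketch -- the shipped ssml_to_text() may differ in details.
    """
    text = re.sub(r"<[^>]+>", " ", ssml)   # drop <speak>, <prosody ...>, </...>
    text = html.unescape(text)             # &amp; -> &, &lt; -> <, etc.
    return re.sub(r"\s+", " ", text).strip()

print(ssml_to_text('<speak>Yo, <prosody rate="fast">one two</prosody></speak>'))
# -> "Yo, one two"
```

Whatever prosody the tags requested is lost at this point, which is exactly why the options below route expressivity through plain request fields instead of SSML.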
If you want to use Resemble’s Chatterbox flow without extra complexity:
- Keep `server.py` as the single synth endpoint.
- Option A: Have the client (Chatterbox) emit plain text and let our backend handle humanization (rate, gap, slight pitch). This is the current setup.
- Option B (next step): Accept optional request fields (e.g., `pitch_semitones`, `rate`, `gap_ms`) in JSON so you can drive expressivity per-utterance from Chatterbox without env or server restarts. Minimal patch, no SSML, no watermark.
If you want me to implement Option B, I’ll add request-level overrides with strict bounds and keep the current defaults.
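The "strict bounds" part of Option B could look like the sketch below. The field names match the ones suggested above, but the defaults and ranges are illustrative assumptions, not a shipped contract:

```python
# (min, max, default) per optional request field -- values are assumptions.
BOUNDS = {
    "pitch_semitones": (-4.0, 4.0, 0.0),
    "rate":            (0.7, 1.3, 1.0),
    "gap_ms":          (0.0, 500.0, 120.0),
}

def apply_overrides(payload: dict) -> dict:
    """Merge optional per-utterance fields from the JSON body, clamped to bounds."""
    settings = {}
    for field, (lo, hi, default) in BOUNDS.items():
        raw = payload.get(field, default)
        try:
            value = float(raw)
        except (TypeError, ValueError):
            value = default               # ignore junk; keep the safe default
        settings[field] = min(max(value, lo), hi)
    return settings

# A request body like {"text": "...", "rate": 2.0} would be clamped to rate=1.3,
# so a misbehaving client can never push the voice outside sane limits.
```

Clamping (rather than rejecting) keeps the endpoint forgiving: any request still synthesizes, just never outside the allowed expressivity range.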
- Build locally (Windows PowerShell):
  ```powershell
  npm ci
  npm run build
  ```

- Upload to droplet:
  ```powershell
  $IP = "178.128.227.142"
  $KEY = "C:\Users\<you>\sshkey"  # adjust to your key path

  # Ensure target exists (first time only)
  ssh -i $KEY root@${IP} "mkdir -p /var/www/ai-yzy/dist"

  # Upload Vite build output
  scp -i $KEY -r .\dist\* "root@${IP}:/var/www/ai-yzy/dist/"

  # Upload the About landing page (not emitted by Vite by default)
  scp -i $KEY .\about.html "root@${IP}:/var/www/ai-yzy/dist/about.html"
  ```

- Make main domain land on About (once):
We set Nginx to prefer about.html as index. Use the correct site file for your setup:
  ```powershell
  # If your site file is /etc/nginx/sites-available/default
  ssh -i $KEY "root@${IP}" "sed -i.bak -E 's/index\s+[^;]+;/index about.html index.html;/' /etc/nginx/sites-available/default && nginx -t && systemctl reload nginx"

  # Or if you use /etc/nginx/conf.d/default.conf
  ssh -i $KEY "root@${IP}" "sed -i.bak -E 's/index\s+[^;]+;/index about.html index.html;/' /etc/nginx/conf.d/default.conf && nginx -t && systemctl reload nginx"
  ```

- Verify:

  ```powershell
  ssh -i $KEY "root@${IP}" "curl -I http://localhost/"
  ```

- Root (`/`) should serve `about.html` by default.
- The app is still available at `/index.html`.
If you track this repo on the server and prefer image rebuilds:
```bash
ssh -i $KEY root@${IP}
cd /opt/ai-yzy  # adjust to your path
git pull
docker compose build --no-cache frontend
docker compose up -d
```

This approach bakes static assets and `nginx.conf` into the image. Useful when you want infra-managed, reproducible deployments.