A real-time, interactive music and chat application powered by AI. Chat with AI YZY, a personality inspired by Kanye West, and take control of the music as a DJ using a MIDI controller.
Features:
- AI Chat: Have a conversation with AI YZY, a brilliant and erratic musical genius AI.
- High-Performance Voice Synthesis: AI YZY's responses are spoken aloud using a custom, high-performance voice powered by the Coqui XTTS-v2 model.
- AI Art Generation: Ask the AI to create visual art from a text description.
- Real-time Music DJ: Switch to DJ mode to control generative music in real-time.
- Reactive Visuals: The interface reacts to the music and the AI's speech.
- Frontend: HTML, CSS, TypeScript, Lit
- AI Models (Google Gemini API):
  - Chat & Function Calling: `gemini-2.5-flash`
  - Image Generation: `imagen-4.0-generate-001`
- Text-to-Speech: Coqui XTTS-v2 (self-hosted via Flask/Gunicorn)
This project uses a static frontend and a self-hosted Python backend for voice synthesis.
You need a Google Gemini API key to power the chat, music, and art generation.
- Visit Google AI Studio to create an API key.
- Open the AI YZY application in your browser.
- Click the settings icon in the top-right corner, paste your API key, and click "Save & Start".
High-quality voice synthesis is computationally expensive. The following guide explains how to run the included `server.py` for optimal performance and cost-effectiveness.
Important
Voice File is Required!
The voice server will not work without a voice sample. You must place a high-quality, clean .wav audio sample of the target voice in the project's root directory and name it kanye_voice.wav.
If you see a "CORS" or "NetworkError" message in the browser console, or if the voice falls back to the robotic browser voice, it almost always means the Python server is not running correctly.
Follow these steps to fix it:
- Check the Terminal: Look at the terminal where you started the Python server. Are there any error messages? The error will tell you exactly what's wrong (e.g., `kanye_voice.wav not found`, `ModuleNotFoundError`, etc.).
- Verify Voice File: Make sure the `kanye_voice.wav` file exists in the main project directory and is named correctly.
- Check Dependencies: Ensure you have installed all packages correctly by running `pip install -r requirements.txt`.
- Confirm Port: The server runs on port `9001` by default. Make sure no other process is using this port.
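The port check in the last step can be scripted. A minimal sketch using only the Python standard library (the host and port match the defaults above; the function name is our own):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0  # 0 means the connect succeeded

# Example: check the voice server's default port before chasing CORS errors.
if port_in_use(9001):
    print("port 9001 is in use (the voice server, or a conflicting process)")
else:
    print("nothing listening on 9001 -- the Python server is likely not running")
```

If the port is free, start the server; if it is taken by another process, either stop that process or change the port in both `server.py` and the frontend.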
This setup uses the high-performance XTTS-v2 model, which is significantly faster than many alternatives on a CPU. It is ideal for running in a standard Docker container on platforms like DigitalOcean Apps.
1. Place Voice File: Obtain a high-quality, clean `.wav` audio sample of the target voice (e.g., Kanye West). The sample should be short (5-15 seconds), clear, and contain no background noise. Save it as `kanye_voice.wav` in the project's root directory.

2. Install Dependencies:

   ```bash
   # Create and activate a virtual environment (recommended)
   python3 -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

   # Install Python packages for CPU
   pip install -r requirements.txt
   ```

3. Run the Production Server: Do not use `flask run`. For production, use a robust WSGI server like Gunicorn to handle requests efficiently. The included `server.py` pre-loads the model and caches responses for maximum speed.

   ```bash
   # Use 2-4 workers for a multi-core CPU. Binds to port 9001.
   gunicorn --workers 2 --threads 4 --bind 0.0.0.0:9001 server:app
   ```

   The server will now be running on `http://localhost:9001`. The frontend application will connect to it automatically.
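`server.py` itself is not reproduced here, but the response caching it describes can be sketched in plain Python. This is illustrative only: the class name, method name, and hash-based keying are assumptions, not the real `server.py` API.

```python
import hashlib
from typing import Callable

class SynthCache:
    """In-memory cache of synthesized WAV bytes, keyed by a hash of the text."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}

    def get_or_synthesize(self, text: str, synthesize: Callable[[str], bytes]) -> bytes:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:           # cache miss: run the expensive model once
            self._store[key] = synthesize(text)
        return self._store[key]              # cache hit: reuse the stored bytes

# Usage with a stand-in for the real XTTS call:
calls = []
def fake_tts(text: str) -> bytes:
    calls.append(text)
    return b"RIFF...fake wav..."

cache = SynthCache()
cache.get_or_synthesize("Yo", fake_tts)
cache.get_or_synthesize("Yo", fake_tts)   # served from cache; the model is not re-run
```

This is also why repeating the exact same test sentence returns instantly: the second request never reaches the model.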
For the absolute best performance (nearly instant responses) without the cost of a dedicated GPU, deploy the voice server as a microservice on a "serverless GPU" platform.
- How it works: These services load the model onto a GPU only when a request comes in. You pay per second of processing time, so it's incredibly cheap when idle.
- Platforms: Modal, Banana.dev, Replicate.
- Implementation:
  - Adapt the `server.py` logic for the platform of your choice.
  - Deploy it as a separate service.
  - Update the `fetch` URL in `index.tsx` from `http://localhost:9001/synthesize` to your new serverless API endpoint.
If you have very high, consistent traffic, a dedicated GPU droplet on DigitalOcean or another cloud provider is the most powerful option.
1. Install CUDA: Ensure you have the NVIDIA CUDA Toolkit installed on your system (e.g., version 12.1).

2. Install GPU-enabled Dependencies:
   - Open `requirements.txt`.
   - Comment out the `--extra-index-url` line for `cpu`.
   - Uncomment the line for your CUDA version (e.g., `cu121`).
   - Re-install dependencies: `pip install -r requirements.txt`

3. Run the Server with Gunicorn: When using a GPU, run a single worker to avoid memory conflicts.

   ```bash
   # Use 1 worker for a GPU setup
   gunicorn --workers 1 --threads 4 --bind 0.0.0.0:9001 server:app
   ```
The frontend is composed of static files.
- Serve the project files using a simple local web server. For example, using Node.js: `npx http-server . --port 8080`
- The application will be accessible at `http://localhost:8080`.
This section documents the exact steps we used after initial deployment to stabilize the backend and publish the production frontend.
- The backend expects a speaker file via `SPEAKER_WAV_PATH` in `server.py`, defaulting to `kanye_voice_prompt.wav` in the process working directory.
- Ensure the service WorkingDirectory points to the project root (we use `/opt/ai-yzy`). Verify:

  ```bash
  systemctl show -p WorkingDirectory ai-yzy-backend
  readlink -f /proc/$(pgrep -f 'gunicorn.*9001' | head -n 1)/cwd
  ```

- Place your voice file at `/opt/ai-yzy/kanye_voice_prompt.wav` and ensure read perms:

  ```bash
  ls -l /opt/ai-yzy/kanye_voice_prompt.wav
  chmod 0644 /opt/ai-yzy/kanye_voice_prompt.wav
  ```
Optimal: 16 kHz, mono, 16‑bit PCM WAV, clean 5–15s speech.
- Convert with ffmpeg (install once):

  ```bash
  apt-get update && apt-get install -y ffmpeg
  cd /opt/ai-yzy
  cp -v kanye_voice_prompt.wav kanye_voice_prompt.wav.bak.$(date +%F-%H%M%S)
  ffmpeg -y -i kanye_voice_prompt.wav -ac 1 -ar 16000 -c:a pcm_s16le kanye_voice_prompt_clean.wav
  mv -v kanye_voice_prompt_clean.wav kanye_voice_prompt.wav
  chmod 0644 kanye_voice_prompt.wav
  file kanye_voice_prompt.wav
  ```
Alternative (SoX):

```bash
apt-get update && apt-get install -y sox libsox-fmt-all
cd /opt/ai-yzy
sox kanye_voice_prompt.wav -b 16 -c 1 -r 16000 kanye_voice_prompt_clean.wav
mv -v kanye_voice_prompt_clean.wav kanye_voice_prompt.wav
chmod 0644 kanye_voice_prompt.wav
```

No service restart is required; the server reads the file per request.
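To confirm the converted file actually matches the recommended format without trusting `file`'s summary, a small stdlib checker can be used (the function name is ours; the thresholds mirror the 16 kHz / mono / 16-bit / 5-15 s recommendation above):

```python
import wave

def check_prompt_wav(path: str) -> list[str]:
    """Return deviations from the recommended prompt format.

    Empty list means the file looks good (16 kHz, mono, 16-bit PCM, 5-15 s).
    """
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != 16000:
            problems.append(f"sample rate {w.getframerate()} Hz (want 16000)")
        if w.getnchannels() != 1:
            problems.append(f"{w.getnchannels()} channels (want mono)")
        if w.getsampwidth() != 2:
            problems.append(f"{8 * w.getsampwidth()}-bit samples (want 16-bit)")
        duration = w.getnframes() / w.getframerate()
        if not 5.0 <= duration <= 15.0:
            problems.append(f"duration {duration:.1f} s (want 5-15 s)")
    return problems
```

Run it as `python3 -c 'from check import check_prompt_wav; print(check_prompt_wav("kanye_voice_prompt.wav"))'` (adjust the module name to wherever you save it).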
To restart the backend anyway and inspect recent logs:

```bash
systemctl restart ai-yzy-backend
journalctl -u ai-yzy-backend -n 50 --no-pager
```
```bash
# Direct (bypass Nginx)
curl -fSsv -H "Content-Type: application/json" \
  -d '{"text":"Hello from YZY"}' \
  http://127.0.0.1:9001/synthesize -o /dev/null
```

You should see `HTTP/1.1 200 OK` with `Content-Type: audio/wav` and model load logs such as `INFO: TTS model loaded successfully`.
We use a dedicated site config at /etc/nginx/sites-available/ai-yzy with a symlink in sites-enabled/ and Let’s Encrypt TLS.
Key directives:
```nginx
server {
    listen 443 ssl http2;
    server_name sim.virtual-yzy.com;

    ssl_certificate /etc/letsencrypt/live/sim.virtual-yzy.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/sim.virtual-yzy.com/privkey.pem;

    root /var/www/ai-yzy/dist;
    index index.html;

    location /assets/ {
        try_files $uri =404;
        access_log off;
        expires 1y;
        add_header Cache-Control "public, immutable";
    }

    location / {
        try_files $uri $uri/ /index.html; # SPA fallback
    }

    location /synthesize {
        proxy_pass http://127.0.0.1:9001;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_read_timeout 600s;
        proxy_send_timeout 600s;
    }
}
```

Reload after changes:

```bash
nginx -t && systemctl reload nginx
```

Issue a Let’s Encrypt certificate for the landing domain using the nginx plugin. This mirrors exactly what we used in production.
Prereqs:
- DNS: `A @ -> 178.128.227.142`
- DNS: `CNAME www -> @` (or `www -> virtual-yzy.com`, depending on your DNS UI)
- Port 80 open for the HTTP-01 challenge
- Create an HTTP server block so Certbot can validate:
If you manage sites in sites-available/:
```bash
tee /etc/nginx/sites-available/virtual-yzy.conf >/dev/null <<'NGX'
server {
    listen 80;
    server_name virtual-yzy.com www.virtual-yzy.com;

    root /var/www/ai-yzy/dist;
    index about.html index.html;

    location /assets/ {
        try_files $uri =404;
        access_log off;
        expires 1y;
        add_header Cache-Control "public, immutable";
    }

    location / {
        try_files $uri $uri/ /about.html;
    }
}
NGX
ln -sf /etc/nginx/sites-available/virtual-yzy.conf /etc/nginx/sites-enabled/virtual-yzy.conf
nginx -t && systemctl reload nginx
```

Or, if you use `conf.d/`:
```bash
tee /etc/nginx/conf.d/virtual-yzy.conf >/dev/null <<'NGX'
server {
    listen 80;
    server_name virtual-yzy.com www.virtual-yzy.com;

    root /var/www/ai-yzy/dist;
    index about.html index.html;

    location /assets/ {
        try_files $uri =404;
        access_log off;
        expires 1y;
        add_header Cache-Control "public, immutable";
    }

    location / {
        try_files $uri $uri/ /about.html;
    }
}
NGX
nginx -t && systemctl reload nginx
```

- Install the Certbot nginx plugin:

  ```bash
  apt-get update && apt-get install -y certbot python3-certbot-nginx
  ```

- Issue the certificate for apex + www and enable redirect:
  ```bash
  certbot --nginx -d virtual-yzy.com -d www.virtual-yzy.com --redirect -m dev@virtual-yzy.com --agree-tos -n
  ```

  If www hasn’t propagated yet, issue apex first, then expand later:

  ```bash
  certbot --nginx -d virtual-yzy.com --redirect -m dev@virtual-yzy.com --agree-tos -n
  certbot --nginx -d virtual-yzy.com -d www.virtual-yzy.com --redirect -m dev@virtual-yzy.com --agree-tos -n --expand
  ```

- Verify and auto-renew:

  ```bash
  nginx -t && systemctl reload nginx
  curl -I https://virtual-yzy.com/
  curl -I https://www.virtual-yzy.com/
  certbot certificates | sed -n '1,200p'
  systemctl status certbot.timer
  certbot renew --dry-run
  ```

Build with Vite and upload the built `dist/` to the Nginx root.
Local (Windows):
```powershell
# In C:\Users\<you>\ai-yzy
$env:GEMINI_API_KEY="<your_key_here>"  # or define in .env.local
npm ci
npm run build

$IP = "178.128.227.142"
$KEY = "C:\Users\<you>\sshkey"
ssh -i $KEY root@${IP} "mkdir -p /var/www/ai-yzy/dist"
scp -i $KEY -r dist/* "root@${IP}:/var/www/ai-yzy/dist/"
```

On droplet:
```bash
chown -R www-data:www-data /var/www/ai-yzy
find /var/www/ai-yzy -type d -exec chmod 755 {} \;
find /var/www/ai-yzy -type f -exec chmod 644 {} \;
nginx -t && systemctl reload nginx
```

Verify the deployment:

```bash
curl -I https://sim.virtual-yzy.com/
curl -I https://sim.virtual-yzy.com/assets/index-*.css  # should be text/css
curl -I https://sim.virtual-yzy.com/assets/index-*.js   # should be application/javascript
```

If CSS/JS are returned as `text/html`, the files are missing at the root and the SPA fallback served `index.html`. Ensure `dist/` is fully deployed to `/var/www/ai-yzy/dist`.
This section documents the exact update flow we use after initial deploy. It keeps the app fast and simple by pushing the built static files to the droplet and reloading Nginx.
Use this when you only changed server.py (voice tuning etc.). No Docker. No extra env.
- Upload backend file from Windows (PowerShell):
  ```powershell
  $IP = "178.128.227.142"
  $KEY = "C:\Users\<you>\sshkeyaugust"  # adjust your key path
  scp -i $KEY C:\Users\<you>\ai-yzy\server.py "root@${IP}:/opt/ai-yzy/"
  ```

- Restart the backend on the droplet (run on the droplet shell):

  ```bash
  sudo systemctl restart ai-yzy-backend
  sudo journalctl -u ai-yzy-backend -n 100 --no-pager
  ```

- Generate a test WAV on the droplet (no playback on the server; most droplets have no sound device):
  ```bash
  TEXT='{"text":"Yo, YZY with the new defaults. One two, one two."}'
  curl -fSsv -H "Content-Type: application/json" -d "$TEXT" \
    http://127.0.0.1:9001/synthesize -o /root/yzy_test.wav
  ls -lh /root/yzy_test.wav
  ```

- Pull and play locally (Windows PowerShell):
  ```powershell
  $IP = "178.128.227.142"
  $KEY = "C:\Users\<you>\sshkeyaugust"
  scp -i $KEY "root@${IP}:/root/yzy_test.wav" .\yzy_test.wav
  start .\yzy_test.wav
  ```

Notes:

- Don’t play audio on the droplet. `aplay` will fail with ALSA errors (no sound card). Always download and listen locally.
- Change the input text slightly each test to bypass backend caching.
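Since the droplet has no sound device, a cheap way to sanity-check the generated file without playing it is to inspect the container header. A minimal sketch (stdlib only; the function name is ours):

```python
def looks_like_wav(path: str) -> bool:
    """Cheap header check: a WAV file starts with a RIFF chunk (bytes 0-3)
    whose form type is WAVE (bytes 8-11)."""
    with open(path, "rb") as f:
        header = f.read(12)
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"
```

If this returns `False` for `/root/yzy_test.wav`, the backend likely returned a JSON error body instead of audio; check `journalctl` for the real cause.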
The current backend (`server.py`) uses Coqui XTTS-v2 and intentionally strips SSML via `ssml_to_text()` to avoid browser TTS and watermark side-effects. That means tone/prosody tags are not passed through right now.
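The real `ssml_to_text()` lives in `server.py` and is not reproduced here; one plausible reconstruction of what "strip SSML" means, using only the standard library:

```python
import html
import re

def ssml_to_text(ssml: str) -> str:
    """Strip SSML/XML tags and collapse whitespace, keeping only the spoken text.

    Illustrative sketch -- the shipped ssml_to_text() may differ in details.
    """
    text = re.sub(r"<[^>]+>", " ", ssml)   # drop <speak>, <prosody ...>, </...>
    text = html.unescape(text)             # &amp; -> &, &lt; -> <, etc.
    return re.sub(r"\s+", " ", text).strip()

print(ssml_to_text('<speak>Yo, <prosody rate="fast">one two</prosody></speak>'))
# -> "Yo, one two"
```

Whatever prosody the tags requested is lost at this point, which is exactly why the options below route expressivity through plain request fields instead of SSML.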
If you want to use Resemble’s Chatterbox flow without extra complexity:
- Keep `server.py` as the single synth endpoint.
- Option A: Have the client (Chatterbox) emit plain text and let our backend handle humanization (rate, gap, slight pitch). This is the current setup.
- Option B (next step): Accept optional request fields (e.g., `pitch_semitones`, `rate`, `gap_ms`) in JSON so you can drive expressivity per-utterance from Chatterbox without env or server restarts. Minimal patch, no SSML, no watermark.
If you want me to implement Option B, I’ll add request-level overrides with strict bounds and keep the current defaults.
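The "strict bounds" part of Option B could look like the sketch below. The field names match the ones suggested above, but the defaults and ranges are illustrative assumptions, not a shipped contract:

```python
# (min, max, default) per optional request field -- values are assumptions.
BOUNDS = {
    "pitch_semitones": (-4.0, 4.0, 0.0),
    "rate":            (0.7, 1.3, 1.0),
    "gap_ms":          (0.0, 500.0, 120.0),
}

def apply_overrides(payload: dict) -> dict:
    """Merge optional per-utterance fields from the JSON body, clamped to bounds."""
    settings = {}
    for field, (lo, hi, default) in BOUNDS.items():
        raw = payload.get(field, default)
        try:
            value = float(raw)
        except (TypeError, ValueError):
            value = default               # ignore junk; keep the safe default
        settings[field] = min(max(value, lo), hi)
    return settings

# A request body like {"text": "...", "rate": 2.0} would be clamped to rate=1.3,
# so a misbehaving client can never push the voice outside sane limits.
```

Clamping (rather than rejecting) keeps the endpoint forgiving: any request still synthesizes, just never outside the allowed expressivity range.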
- Build locally (Windows PowerShell):
  ```powershell
  npm ci
  npm run build
  ```

- Upload to droplet:
  ```powershell
  $IP = "178.128.227.142"
  $KEY = "C:\Users\<you>\sshkey"  # adjust to your key path

  # Ensure target exists (first time only)
  ssh -i $KEY root@${IP} "mkdir -p /var/www/ai-yzy/dist"

  # Upload Vite build output
  scp -i $KEY -r .\dist\* "root@${IP}:/var/www/ai-yzy/dist/"

  # Upload the About landing page (not emitted by Vite by default)
  scp -i $KEY .\about.html "root@${IP}:/var/www/ai-yzy/dist/about.html"
  ```

- Make main domain land on About (once):
We set Nginx to prefer about.html as index. Use the correct site file for your setup:
  ```powershell
  # If your site file is /etc/nginx/sites-available/default
  ssh -i $KEY "root@${IP}" "sed -i.bak -E 's/index\s+[^;]+;/index about.html index.html;/' /etc/nginx/sites-available/default && nginx -t && systemctl reload nginx"

  # Or if you use /etc/nginx/conf.d/default.conf
  ssh -i $KEY "root@${IP}" "sed -i.bak -E 's/index\s+[^;]+;/index about.html index.html;/' /etc/nginx/conf.d/default.conf && nginx -t && systemctl reload nginx"
  ```

- Verify:

  ```powershell
  ssh -i $KEY "root@${IP}" "curl -I http://localhost/"
  ```

- Root (`/`) should serve `about.html` by default.
- The app is still available at `/index.html`.
If you track this repo on the server and prefer image rebuilds:
```bash
ssh -i $KEY root@${IP}
cd /opt/ai-yzy  # adjust to your path
git pull
docker compose build --no-cache frontend
docker compose up -d
```

This approach bakes static assets and `nginx.conf` into the image. Useful when you want infra-managed, reproducible deployments.