vmbbz/virtual-yzy

AI YZY: The Genius Experience

A real-time, interactive music and chat application powered by AI. Chat with AI YZY, a personality inspired by Kanye West, and take control of the music as a DJ using a MIDI controller.

Features

  • AI Chat: Have a conversation with AI YZY, a brilliant and erratic musical genius AI.
  • High-Performance Voice Synthesis: AI YZY's responses are spoken aloud using a custom, high-performance voice powered by the Coqui XTTS-v2 model.
  • AI Art Generation: Ask the AI to create visual art from a text description.
  • Real-time Music DJ: Switch to DJ mode to control generative music in real-time.
  • Reactive Visuals: The interface reacts to the music and the AI's speech.

Tech Stack

  • Frontend: HTML, CSS, TypeScript, Lit
  • AI Models (Google Gemini API):
    • Chat & Function Calling: gemini-2.5-flash
    • Image Generation: imagen-4.0-generate-001
  • Text-to-Speech: Coqui XTTS-v2 (self-hosted via Flask/Gunicorn)

Project Setup

This project uses a static frontend and a self-hosted Python backend for voice synthesis.

1. Gemini API Key

You need a Google Gemini API key to power the chat, music, and art generation.

  1. Visit Google AI Studio to create an API key.
  2. Open the AI YZY application in your browser.
  3. Click the settings icon in the top-right corner, paste your API key, and click "Save & Start".

2. Backend Voice Server: Performance Guide

High-quality voice synthesis is computationally expensive. The following guide explains how to run the included server.py for optimal performance and cost-effectiveness.

Important

Voice File is Required! The voice server will not work without a voice sample. You must place a high-quality, clean .wav audio sample of the target voice in the project's root directory and name it kanye_voice.wav.


Troubleshooting the Voice Server

If you see a "CORS" or "NetworkError" message in the browser console, or if the voice falls back to the robotic browser voice, it almost always means the Python server is not running correctly.

Follow these steps to fix it:

  1. Check the Terminal: Look at the terminal where you started the Python server. Are there any error messages? The error will tell you exactly what's wrong (e.g., kanye_voice.wav not found, ModuleNotFoundError, etc.).
  2. Verify Voice File: Make sure the kanye_voice.wav file exists in the main project directory and is named correctly.
  3. Check Dependencies: Ensure you have installed all packages correctly by running pip install -r requirements.txt.
  4. Confirm Port: The server runs on port 9001 by default. Make sure no other process is using this port.

Option 1: Optimized CPU Deployment (Recommended Default)

This setup uses the high-performance XTTS-v2 model, which is significantly faster than many alternatives on a CPU. It is ideal for running in a standard Docker container on platforms like DigitalOcean Apps.

  1. Place Voice File: Obtain a high-quality, clean .wav audio sample of the target voice (e.g., Kanye West). The sample should be short (5-15 seconds), clear, and contain no background noise. Save it as kanye_voice.wav in the project's root directory.

  2. Install Dependencies:

    # Create and activate a virtual environment (recommended)
    python3 -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
    
    # Install Python packages for CPU
    pip install -r requirements.txt
  3. Run the Production Server: Do not use flask run. For production, use a robust WSGI server like Gunicorn to handle requests efficiently. The included server.py pre-loads the model and caches responses for maximum speed.

    # Use 2-4 workers for a multi-core CPU. Binds to port 9001.
    gunicorn --workers 2 --threads 4 --bind 0.0.0.0:9001 server:app

    The server will now be running on http://localhost:9001. The frontend application will connect to it automatically.
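The pre-load-and-cache pattern described above can be sketched in plain Python (names and stand-ins here are illustrative; see server.py for the real implementation):

```python
import functools
import hashlib

_MODEL = None  # loaded once per Gunicorn worker, not per request

def get_model():
    """Stand-in loader: the real server loads XTTS-v2 here at startup."""
    global _MODEL
    if _MODEL is None:
        _MODEL = object()  # placeholder for the expensive model load
    return _MODEL

@functools.lru_cache(maxsize=128)
def synthesize(text: str) -> bytes:
    """Cache synthesized audio keyed by the exact input text."""
    get_model()
    # placeholder for real WAV bytes from the TTS model
    return hashlib.sha256(text.encode("utf-8")).digest()
```

Because the cache key is the exact text, repeating the same request never re-runs synthesis.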


Option 2: Serverless GPU (Most Cost-Effective Speed)

For the absolute best performance (nearly instant responses) without the cost of a dedicated GPU, deploy the voice server as a microservice on a "serverless GPU" platform.

  • How it works: These services load the model onto a GPU only when a request comes in. You pay per second of processing time, so it's incredibly cheap when idle.
  • Platforms: Modal, Banana.dev, Replicate.
  • Implementation:
    1. Adapt the server.py logic for the platform of your choice.
    2. Deploy it as a separate service.
    3. Update the fetch URL in index.tsx from http://localhost:9001/synthesize to your new serverless API endpoint.
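One way to smoke-test the new endpoint from Python (a hedged sketch; the endpoint URL is whatever your platform assigns):

```python
import json
import urllib.request

def build_synthesize_request(endpoint: str, text: str) -> urllib.request.Request:
    """Build the same JSON POST the frontend sends to /synthesize."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With the service up, fetch WAV bytes like so:
# with urllib.request.urlopen(build_synthesize_request(url, "test"), timeout=120) as resp:
#     wav = resp.read()
```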

Option 3: Dedicated GPU Droplet (Maximum Power)

If you have very high, consistent traffic, a dedicated GPU droplet on DigitalOcean or another cloud provider is the most powerful option.

  1. Install CUDA: Ensure you have the NVIDIA CUDA Toolkit installed on your system (e.g., version 12.1).

  2. Install GPU-enabled Dependencies:

    • Open requirements.txt.
    • Comment out the --extra-index-url line for cpu.
    • Uncomment the line for your CUDA version (e.g., cu121).
    • Re-install dependencies: pip install -r requirements.txt
  3. Run the Server with Gunicorn: When using a GPU, run a single worker to avoid memory conflicts.

    # Use 1 worker for a GPU setup
    gunicorn --workers 1 --threads 4 --bind 0.0.0.0:9001 server:app

3. Frontend Development

The frontend is composed of static files.

  1. Serve the project files using a simple local web server. For example, using Node.js:
    npx http-server . --port 8080
  2. The application will be accessible at http://localhost:8080.

Post-Deploy Guide (GPU Droplet + Nginx)

This section documents the exact steps we used after initial deployment to stabilize the backend and publish the production frontend.

1) Voice Prompt File and Path

  • The backend expects a speaker file via SPEAKER_WAV_PATH in server.py, defaulting to kanye_voice_prompt.wav in the process working directory.
  • Ensure the service WorkingDirectory points to the project root (we use /opt/ai-yzy).
    • Verify:
      systemctl show -p WorkingDirectory ai-yzy-backend
      readlink -f /proc/$(pgrep -f 'gunicorn.*9001' | head -n 1)/cwd
  • Place your voice file at /opt/ai-yzy/kanye_voice_prompt.wav and ensure read perms:
    ls -l /opt/ai-yzy/kanye_voice_prompt.wav
    chmod 0644 /opt/ai-yzy/kanye_voice_prompt.wav

2) Improve Prompt Audio Quality (on droplet)

Optimal: 16 kHz, mono, 16‑bit PCM WAV, clean 5–15s speech.

  • Convert with ffmpeg (install once):
    apt-get update && apt-get install -y ffmpeg
    cd /opt/ai-yzy
    cp -v kanye_voice_prompt.wav kanye_voice_prompt.wav.bak.$(date +%F-%H%M%S)
    ffmpeg -y -i kanye_voice_prompt.wav -ac 1 -ar 16000 -c:a pcm_s16le kanye_voice_prompt_clean.wav
    mv -v kanye_voice_prompt_clean.wav kanye_voice_prompt.wav
    chmod 0644 kanye_voice_prompt.wav
    file kanye_voice_prompt.wav

Alternative (SoX):

apt-get update && apt-get install -y sox libsox-fmt-all
cd /opt/ai-yzy
sox kanye_voice_prompt.wav -b 16 -c 1 -r 16000 kanye_voice_prompt_clean.wav
mv -v kanye_voice_prompt_clean.wav kanye_voice_prompt.wav
chmod 0644 kanye_voice_prompt.wav

No service restart is required; the server reads the file per request.
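To confirm the converted file really matches the recommended format, Python's stdlib wave module can check it (illustrative helper, not part of the repo):

```python
import wave

def is_valid_prompt(path: str) -> bool:
    """True if the WAV is mono, 16 kHz, 16-bit PCM."""
    with wave.open(path, "rb") as w:
        return (w.getnchannels() == 1
                and w.getframerate() == 16000
                and w.getsampwidth() == 2)

# Example: print(is_valid_prompt("/opt/ai-yzy/kanye_voice_prompt.wav"))
```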

3) Backend Health Checks

systemctl restart ai-yzy-backend
journalctl -u ai-yzy-backend -n 50 --no-pager

# Direct (bypass Nginx)
curl -fSsv -H "Content-Type: application/json" \
  -d '{"text":"Hello from YZY"}' \
  http://127.0.0.1:9001/synthesize -o /dev/null

You should see HTTP/1.1 200 OK with Content-Type: audio/wav and model load logs such as INFO: TTS model loaded successfully.

4) Nginx Site and TLS

We use a dedicated site config at /etc/nginx/sites-available/ai-yzy with a symlink in sites-enabled/ and Let’s Encrypt TLS.

Key directives:

server {
  listen 443 ssl http2;
  server_name sim.virtual-yzy.com;

  ssl_certificate     /etc/letsencrypt/live/sim.virtual-yzy.com/fullchain.pem;
  ssl_certificate_key /etc/letsencrypt/live/sim.virtual-yzy.com/privkey.pem;

  root /var/www/ai-yzy/dist;
  index index.html;

  location /assets/ {
    try_files $uri =404;
    access_log off;
    expires 1y;
    add_header Cache-Control "public, immutable";
  }

  location / {
    try_files $uri $uri/ /index.html; # SPA fallback
  }

  location /synthesize {
    proxy_pass http://127.0.0.1:9001;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_read_timeout 600s;
    proxy_send_timeout 600s;
  }
}

Reload after changes:

nginx -t && systemctl reload nginx

4.1) TLS for Landing Domain (virtual-yzy.com + www)

Issue a Let’s Encrypt certificate for the landing domain using the nginx plugin. This mirrors exactly what we used in production.

Prereqs:

  • DNS: A @ -> 178.128.227.142
  • DNS: CNAME www -> @ (or www -> virtual-yzy.com depending on your DNS UI)
  • Port 80 open for HTTP-01 challenge
  1. Create an HTTP server block so Certbot can validate:

If you manage sites in sites-available/:

tee /etc/nginx/sites-available/virtual-yzy.conf >/dev/null <<'NGX'
server {
  listen 80;
  server_name virtual-yzy.com www.virtual-yzy.com;

  root /var/www/ai-yzy/dist;
  index about.html index.html;

  location /assets/ {
    try_files $uri =404;
    access_log off;
    expires 1y;
    add_header Cache-Control "public, immutable";
  }

  location / {
    try_files $uri $uri/ /about.html;
  }
}
NGX

ln -sf /etc/nginx/sites-available/virtual-yzy.conf /etc/nginx/sites-enabled/virtual-yzy.conf
nginx -t && systemctl reload nginx

Or, if you use conf.d/:

tee /etc/nginx/conf.d/virtual-yzy.conf >/dev/null <<'NGX'
server {
  listen 80;
  server_name virtual-yzy.com www.virtual-yzy.com;

  root /var/www/ai-yzy/dist;
  index about.html index.html;

  location /assets/ {
    try_files $uri =404;
    access_log off;
    expires 1y;
    add_header Cache-Control "public, immutable";
  }

  location / {
    try_files $uri $uri/ /about.html;
  }
}
NGX

nginx -t && systemctl reload nginx
  2. Install Certbot nginx plugin:
apt-get update && apt-get install -y certbot python3-certbot-nginx
  3. Issue the certificate for apex + www and enable redirect:
certbot --nginx -d virtual-yzy.com -d www.virtual-yzy.com --redirect -m dev@virtual-yzy.com --agree-tos -n

If www hasn’t propagated yet, issue apex first, then expand later:

certbot --nginx -d virtual-yzy.com --redirect -m dev@virtual-yzy.com --agree-tos -n
certbot --nginx -d virtual-yzy.com -d www.virtual-yzy.com --redirect -m dev@virtual-yzy.com --agree-tos -n --expand
  4. Verify and auto-renew:
nginx -t && systemctl reload nginx
curl -I https://virtual-yzy.com/
curl -I https://www.virtual-yzy.com/
certbot certificates | sed -n '1,200p'
systemctl status certbot.timer
certbot renew --dry-run

5) Build Frontend and Deploy to Nginx

Build with Vite and upload the built dist/ to the Nginx root.

Local (Windows):

# In C:\Users\<you>\ai-yzy
$env:GEMINI_API_KEY="<your_key_here>"   # or define in .env.local
npm ci
npm run build

$IP  = "178.128.227.142"
$KEY = "C:\Users\<you>\sshkey"
ssh -i $KEY root@${IP} "mkdir -p /var/www/ai-yzy/dist"
scp -i $KEY -r dist/* "root@${IP}:/var/www/ai-yzy/dist/"

On droplet:

chown -R www-data:www-data /var/www/ai-yzy
find /var/www/ai-yzy -type d -exec chmod 755 {} \;
find /var/www/ai-yzy -type f -exec chmod 644 {} \;
nginx -t && systemctl reload nginx

6) Verify Static Assets and MIME Types

curl -I https://sim.virtual-yzy.com/
curl -I https://sim.virtual-yzy.com/assets/index-<hash>.css   # should be text/css
curl -I https://sim.virtual-yzy.com/assets/index-<hash>.js    # should be application/javascript

Substitute the actual hashed filenames from dist/assets; curl does not expand * wildcards.

If CSS/JS come back as text/html, the files are missing from the document root and the SPA fallback served index.html instead. Ensure dist/ is fully deployed to /var/www/ai-yzy/dist.


Post-Production Maintenance & Updates

This section documents the exact update flow we use after initial deploy. It keeps the app fast and simple by pushing the built static files to the droplet and reloading Nginx.

Simple Maintenance (Backend) — Minimal, copy/paste

Use this when you only changed server.py (voice tuning etc.). No Docker. No extra env.

  1. Upload backend file from Windows (PowerShell):
$IP  = "178.128.227.142"
$KEY = "C:\Users\<you>\sshkeyaugust"  # adjust your key path
scp -i $KEY C:\Users\<you>\ai-yzy\server.py "root@${IP}:/opt/ai-yzy/"
  2. Restart the backend on the droplet (run on the droplet shell):
sudo systemctl restart ai-yzy-backend
sudo journalctl -u ai-yzy-backend -n 100 --no-pager
  3. Generate a test WAV on the droplet (no playback on server — most droplets have no sound device):
TEXT='{"text":"Yo, YZY with the new defaults. One two, one two."}'
curl -fSsv -H "Content-Type: application/json" -d "$TEXT" \
  http://127.0.0.1:9001/synthesize -o /root/yzy_test.wav
ls -lh /root/yzy_test.wav
  4. Pull and play locally (Windows PowerShell):
$IP  = "178.128.227.142"
$KEY = "C:\Users\<you>\sshkeyaugust"
scp -i $KEY "root@${IP}:/root/yzy_test.wav" .\yzy_test.wav
start .\yzy_test.wav

Notes:

  • Don’t play audio on the droplet. aplay will fail with ALSA errors (no sound card). Always download and listen locally.
  • Change the input text slightly each test to bypass backend caching.

Chatterbox alignment (keep it simple)

Current backend (server.py) uses Coqui XTTS-v2 and intentionally strips SSML via ssml_to_text() to avoid browser TTS and watermark side‑effects. That means tone/prosody tags are not passed through right now.
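The stripping behavior is roughly this (a simplified sketch of what ssml_to_text() does, not the actual server.py code):

```python
import re

def ssml_to_text(markup: str) -> str:
    """Drop SSML tags and collapse whitespace, keeping only the spoken text.
    Rough sketch of the behavior described above, not the real implementation."""
    no_tags = re.sub(r"<[^>]+>", " ", markup)
    return re.sub(r"\s+", " ", no_tags).strip()
```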

If you want to use Resemble’s Chatterbox flow without extra complexity:

  • Keep server.py as the single synth endpoint.
  • Option A: Have the client (Chatterbox) emit plain text and let our backend handle humanization (rate, gap, slight pitch). This is the current setup.
  • Option B (next step): Accept optional request fields (e.g., pitch_semitones, rate, gap_ms) in JSON so you can drive expressivity per‑utterance from Chatterbox without env or server restarts. Minimal patch, no SSML, no watermark.

If you want me to implement Option B, I’ll add request-level overrides with strict bounds and keep the current defaults.
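A minimal sketch of what Option B's request-level overrides could look like (field names, bounds, and defaults here are hypothetical, matching the examples above):

```python
def clamp_overrides(payload: dict) -> dict:
    """Clamp optional expressivity fields to strict bounds; fall back to defaults."""
    bounds = {                          # (min, max, default) -- all hypothetical
        "pitch_semitones": (-4.0, 4.0, 0.0),
        "rate": (0.7, 1.3, 1.0),
        "gap_ms": (0, 800, 250),
    }
    out = {}
    for key, (lo, hi, default) in bounds.items():
        raw = payload.get(key, default)
        try:
            value = type(default)(raw)      # coerce to the default's type
        except (TypeError, ValueError):
            value = default                  # reject garbage, keep default
        out[key] = min(hi, max(lo, value))   # clamp into [lo, hi]
    return out
```

With strict clamping, a misbehaving client can never push the voice outside the tuned range, and omitted fields keep the current defaults.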

A) Update Frontend via SCP (preferred quick path)

  1. Build locally (Windows PowerShell):
npm ci
npm run build
  2. Upload to droplet:
$IP  = "178.128.227.142"
$KEY = "C:\Users\<you>\sshkey"   # adjust to your key path

# Ensure target exists (first time only)
ssh -i $KEY root@${IP} "mkdir -p /var/www/ai-yzy/dist"

# Upload Vite build output
scp -i $KEY -r .\dist\* "root@${IP}:/var/www/ai-yzy/dist/"

# Upload the About landing page (not emitted by Vite by default)
scp -i $KEY .\about.html "root@${IP}:/var/www/ai-yzy/dist/about.html"
  3. Make the main domain land on About (once):

We set Nginx to prefer about.html as index. Use the correct site file for your setup:

# If your site file is /etc/nginx/sites-available/default
ssh -i $KEY "root@${IP}" "sed -i.bak -E 's/index\s+[^;]+;/index about.html index.html;/' /etc/nginx/sites-available/default && nginx -t && systemctl reload nginx"

# Or if you use /etc/nginx/conf.d/default.conf
ssh -i $KEY "root@${IP}" "sed -i.bak -E 's/index\s+[^;]+;/index about.html index.html;/' /etc/nginx/conf.d/default.conf && nginx -t && systemctl reload nginx"
  4. Verify:
ssh -i $KEY "root@${IP}" "curl -I http://localhost/"
  • Root (/) should serve about.html by default.
  • The app is still available at /index.html.

B) Optional: Rebuild via Docker Compose on droplet

If you track this repo on the server and prefer image rebuilds:

ssh -i $KEY root@${IP}
cd /opt/ai-yzy   # adjust to your path
git pull
docker compose build --no-cache frontend
docker compose up -d

This approach bakes static assets and nginx.conf into the image. Useful when you want infra-managed, reproducible deployments.

About

YZY brought to life as a virtual AI DJ. Have fun chopping and mixing. He can defend himself against rug allegations using Google Search validation, stream live music sampled in real time from the MIDI deck to Google Lyria, and tap the Twitter archive. Add more tools as you need. Powered by Gemini.
