aidanburrowes/Talk2Doc

Talk2Doc

A containerized Flask microservice architecture for a medical voice assistant, providing text-to-speech, speech-to-text, OAuth authentication, and a medical AI assistant.

Architecture

The application can run in two modes:

  1. Monolithic: Single service handling all endpoints
  2. Microservices: Separate services for TTS, STT, OAuth, Medical Assistant, and API Gateway

Quick Start

Prerequisites

  • Docker and Docker Compose
  • ElevenLabs API key

Setup

  1. Set your ElevenLabs API key (choose one method):
# Option 1: Create .env file (recommended)
echo "ELEVENLABS_API_KEY=your_api_key_here" > .env

# Option 2: Export as environment variable
export ELEVENLABS_API_KEY=your_api_key_here

# Option 3: Pass inline when running
ELEVENLABS_API_KEY=your_api_key_here docker compose up -d

Note: Docker Compose automatically reads a .env file from the project root if one exists. The file is optional; exported or inline environment variables work as well.

  2. Run the service:
# Using Make (recommended)
make run

# Or using docker compose directly
docker compose up -d
  3. The API will be available at http://localhost:8000

API Endpoints

Text-to-Speech

  • POST /api/tts
  • Request body:
{
  "text": "Your text here",
  "voice_id": "TX3LPaxmHKxFdv7VOQHJ",  // Optional
  "model_id": "eleven_v3"  // Optional
}
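
A minimal Python sketch of calling this endpoint with only the standard library. It assumes the service is reachable at http://localhost:8000 (the Quick Start address) and that the response body is the audio itself, as the integration example later in this README suggests by reading it as a blob. The helper names `build_tts_payload` and `synthesize` are hypothetical and not part of the project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: default address from Quick Start


def build_tts_payload(text, voice_id=None, model_id=None):
    """Build the JSON body for POST /api/tts, omitting unset optional fields."""
    payload = {"text": text}
    if voice_id is not None:
        payload["voice_id"] = voice_id
    if model_id is not None:
        payload["model_id"] = model_id
    return payload


def synthesize(text, voice_id=None, model_id=None, base_url=BASE_URL):
    """POST the payload and return the raw response bytes (assumed to be audio)."""
    body = json.dumps(build_tts_payload(text, voice_id, model_id)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Leaving `voice_id` and `model_id` unset keeps them out of the request, so the server-side defaults (`DEFAULT_VOICE_ID`, `DEFAULT_MODEL_ID`) presumably apply.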

Speech-to-Text

  • POST /api/stt
  • Send audio file as multipart/form-data with key audio
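
The upload can be sketched in Python without third-party packages by encoding the multipart body by hand. This is an illustration under assumptions: the base URL is the Quick Start address, the JSON reply carries a `text` field (as the integration example later in this README reads), and the names `encode_audio_form` and `transcribe` are hypothetical.

```python
import json
import os
import urllib.request
import uuid


def encode_audio_form(filename, audio_bytes, content_type="application/octet-stream"):
    """Encode one file as multipart/form-data under the form key "audio".

    Returns a (body_bytes, content_type_header) tuple.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + audio_bytes + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(path, base_url="http://localhost:8000"):
    """Upload an audio file to POST /api/stt and return the transcribed text."""
    with open(path, "rb") as handle:
        body, header = encode_audio_form(os.path.basename(path), handle.read())
    request = urllib.request.Request(
        f"{base_url}/api/stt",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Assumption: the reply is JSON with a "text" field.
        return json.loads(response.read())["text"]
```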

Health Check

  • GET /health
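
A small readiness probe can be built on this endpoint, for example to wait for the container before sending requests. The sketch below assumes a healthy service answers with HTTP 200; the README does not document the response body, so it is ignored. The function name `is_healthy` is hypothetical.

```python
import urllib.error
import urllib.request


def is_healthy(base_url="http://localhost:8000", timeout=2.0):
    """Return True when GET /health answers with HTTP 200 (assumed healthy status)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as unhealthy.
        return False
```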

OAuth Authentication

  • GET /api/auth/providers - Get available OAuth providers
  • GET /api/auth/login/{provider} - Initiate OAuth login (google, github)
  • GET /api/auth/callback/{provider} - OAuth callback endpoint
  • GET /api/auth/user - Get current user (requires Bearer token)
  • POST /api/auth/verify - Verify JWT token
  • POST /api/auth/logout - Logout endpoint

See OAUTH_SETUP.md for detailed OAuth setup and usage.
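
As an illustration of the Bearer token requirement on GET /api/auth/user, the following Python sketch attaches a JWT to the request. It assumes the token was already obtained through the login flow and that the endpoint replies with JSON; the shape of that JSON is not documented here. The names `build_user_request` and `fetch_current_user` are hypothetical.

```python
import json
import urllib.request


def build_user_request(token, base_url="http://localhost:8000"):
    """Build GET /api/auth/user with the JWT in a Bearer Authorization header."""
    return urllib.request.Request(
        f"{base_url}/api/auth/user",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_current_user(token, base_url="http://localhost:8000"):
    """Send the request and decode the reply (assumed to be JSON)."""
    with urllib.request.urlopen(build_user_request(token, base_url)) as response:
        return json.loads(response.read())
```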

Medical Assistant

  • POST /api/medical/chat - Chat with medical AI assistant
    • Request body:
    {
      "text": "I have a headache",
      "history": "",  // Optional: previous conversation history
      "max_tokens": 512,  // Optional: default 220
      "temperature": 0.2  // Optional: default 0.1
    }
    • Response:
    {
      "response": "Medical guidance here...",
      "history": "User: I have a headache\nAssistant: Medical guidance here...\n",
      "success": true
    }
  • POST /api/medical/reset - Reset conversation history
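
A multi-turn conversation can be sketched by feeding the `history` string from each response into the next request. The Python class below is a hypothetical client, not part of the project; it assumes the Quick Start base URL and treats a missing or false `success` field as a failure. The `send` parameter exists only so the transport can be swapped out, for example in tests.

```python
import json
import urllib.request


def post_json(url, payload):
    """Default transport: POST a JSON body and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


class MedicalChat:
    """Carries the history string returned by one turn into the next."""

    def __init__(self, base_url="http://localhost:8000", send=post_json):
        self.url = f"{base_url}/api/medical/chat"
        self.send = send
        self.history = ""

    def ask(self, text, max_tokens=None, temperature=None):
        payload = {"text": text, "history": self.history}
        # Optional fields are sent only when set, so server defaults apply otherwise.
        if max_tokens is not None:
            payload["max_tokens"] = max_tokens
        if temperature is not None:
            payload["temperature"] = temperature
        reply = self.send(self.url, payload)
        if not reply.get("success"):
            raise RuntimeError("medical chat request did not succeed")
        self.history = reply["history"]
        return reply["response"]
```

A caller would create one `MedicalChat` per conversation and call `ask` for each user message; starting a new instance begins with an empty history on the client side.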

Microservices Mode

To run as separate microservices:

# Start microservices
make microservices-up

# Rebuild and restart with code changes
make microservices-rebuild

# Or step by step:
make microservices-build    # Build images
make microservices-down     # Stop services
make microservices-up       # Start services

This will start:

  • TTS service on port 8001
  • STT service on port 8002
  • OAuth service on port 8003
  • Medical Assistant service on port 8004
  • API Gateway on port 8000

Rebuilding Microservices with Code Changes

When you make code changes, rebuild and restart:

# Rebuild images and restart all services
make microservices-rebuild

# Or manually:
make microservices-down
make microservices-build
make microservices-up

Development

Development Mode (Hot Reload)

For development with automatic code reloading when you make changes:

# Start in development mode (with hot reload)
make dev

# Or run in background
make dev-up

# View logs
make dev-logs

# Stop development containers
make dev-down

Note: In development mode, your code changes are automatically reflected without rebuilding the container. Flask's debug mode is enabled, so the server will restart when you modify Python files.

Local Development (Without Docker)

pip install -r requirements.txt
python run.py

Production Docker Commands

# Build image
make build

# Start services
make up

# View logs
make logs

# Stop services
make down

# Clean up
make clean

Project Structure

.
├── app/
│   ├── __init__.py          # Flask app factory
│   ├── config.py            # Configuration
│   ├── routes/              # API routes
│   │   ├── tts.py
│   │   ├── stt.py
│   │   ├── oauth.py
│   │   ├── medical.py
│   │   └── health.py
│   └── services/            # Business logic
│       ├── elevenlabs_service.py
│       └── oauth_service.py
├── Dockerfile
├── docker-compose.yml       # Monolithic setup
├── docker-compose.microservices.yml  # Microservices setup
├── requirements.txt
└── run.py                   # Entry point

Environment Variables

ElevenLabs Configuration

  • ELEVENLABS_API_KEY: Your ElevenLabs API key (required)
  • DEFAULT_VOICE_ID: Default voice ID (optional)
  • DEFAULT_MODEL_ID: Default model ID (optional)

OAuth Configuration

  • GOOGLE_CLIENT_ID: Google OAuth client ID (optional)
  • GOOGLE_CLIENT_SECRET: Google OAuth client secret (optional)
  • GITHUB_CLIENT_ID: GitHub OAuth client ID (optional)
  • GITHUB_CLIENT_SECRET: GitHub OAuth client secret (optional)
  • JWT_SECRET_KEY: Secret key for JWT token signing (required for OAuth)
  • SECRET_KEY: Secret key for Flask sessions (required for OAuth)
  • OAUTH_REDIRECT_URI: OAuth redirect URI (optional)

Medical Assistant Configuration

  • HF_TOKEN: Hugging Face API token (required for medical service)
  • MEDICAL_BASE_URL: Hugging Face endpoint base URL (optional, defaults to provided endpoint)
  • MEDICAL_MODEL: Medical model name (optional, defaults to "medQA.Q8_0.gguf")

Server Configuration

  • HOST: Server host (default: 0.0.0.0)
  • PORT: Server port (default: 8000)
  • FLASK_ENV: Flask environment (default: production)
  • FLASK_DEBUG: Enable debug mode (default: False)
  • SERVICE_TYPE: Service type for microservices mode (tts, stt, oauth, medical, or empty for monolithic)
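
To make the defaults above concrete, here is a hedged Python sketch of how such variables could be read. It is not the contents of app/config.py; the function name `load_config` and the set of strings accepted as true for `FLASK_DEBUG` are assumptions made for illustration.

```python
import os


def load_config(env=None):
    """Read the documented variables, applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("ELEVENLABS_API_KEY")
    if not api_key:
        raise RuntimeError("ELEVENLABS_API_KEY is required")
    return {
        "ELEVENLABS_API_KEY": api_key,
        "HOST": env.get("HOST", "0.0.0.0"),
        "PORT": int(env.get("PORT", "8000")),
        "FLASK_ENV": env.get("FLASK_ENV", "production"),
        # Assumption: "1" or "true" (any case) enables debug mode.
        "FLASK_DEBUG": env.get("FLASK_DEBUG", "False").lower() in ("1", "true"),
        # An empty SERVICE_TYPE means monolithic mode.
        "SERVICE_TYPE": env.get("SERVICE_TYPE", ""),
        "MEDICAL_MODEL": env.get("MEDICAL_MODEL", "medQA.Q8_0.gguf"),
    }
```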

Complete Voice Assistant Flow

The microservices work together to provide a complete voice-based medical assistant:

  1. User speaks → STT service converts audio to text
  2. Text input → Medical Assistant service generates AI response
  3. Response text → TTS service converts to speech
  4. Audio output → User hears the medical guidance

Example integration flow:

// 1. Convert speech to text
const sttResponse = await fetch('http://localhost:8002/api/stt', {
  method: 'POST',
  body: formData  // Contains audio file
});
const { text } = await sttResponse.json();

// 2. Get medical response
const medicalResponse = await fetch('http://localhost:8004/api/medical/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text, history: previousHistory })
});
const { response, history } = await medicalResponse.json();

// 3. Convert response to speech
const ttsResponse = await fetch('http://localhost:8001/api/tts', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: response })
});
const audioBlob = await ttsResponse.blob();
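
The same three-step flow can be expressed in Python as a single function that takes the service calls as parameters. This is a sketch only: `voice_turn` and its three callables are hypothetical stand-ins for HTTP calls to /api/stt, /api/medical/chat, and /api/tts, and the chat callable is assumed to return a dict with `response` and `history` keys, matching the documented response.

```python
def voice_turn(audio_bytes, history, transcribe, chat, synthesize):
    """Run one STT -> Medical Assistant -> TTS round trip.

    Returns (audio_out, new_history) so the caller can pass new_history
    into the next turn.
    """
    text = transcribe(audio_bytes)             # 1. speech to text
    reply = chat(text, history)                # 2. medical response plus updated history
    audio_out = synthesize(reply["response"])  # 3. response text to speech
    return audio_out, reply["history"]
```

Passing the callables in keeps the pipeline independent of whether requests go through the API Gateway on port 8000 or directly to each service.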
