A Flask-based REST API service that integrates Fish Audio's ASR (Automatic Speech Recognition) and TTS (Text-to-Speech) capabilities. Built for CalHacks 2025.
- ASR Endpoint: Convert audio files to text transcriptions
- TTS Endpoint: Convert text to synthesized speech audio
- JanitorAI Integration: Example script for accessing JanitorAI completions API
- Mock Mode: Development mode with mock responses (no API calls)
- Health Check: Monitor service status and configuration
calhacks25/
├── src/
│ ├── fish/ # Fish Audio integration module
│ │ ├── __init__.py
│ │ ├── client.py # Session manager & API configuration
│ │ ├── asr.py # Speech-to-text functionality
│ │ └── tts.py # Text-to-speech functionality
│ └── app.py # Flask REST API server
├── app/ # Web frontend (optional)
│ ├── static/
│ │ ├── css/
│ │ ├── js/
│ │ └── images/
│ └── templates/
│ └── index.html
├── main.py # JanitorAI example script
├── requirements.txt
├── .env # Environment configuration
└── README.md
source venv/bin/activate # On macOS/Linux
# or
venv\Scripts\activate # On Windowspip install -r requirements.txtCreate or edit .env file with your Fish Audio API credentials:
# Fish Audio API Configuration
FISH_API_KEY=your_api_key_here
FISH_API_BASE=https://api.fish.audio
FISH_MOCK=false # Set to 'true' for mock mode (no API calls)
# Flask Server Configuration
HOST=0.0.0.0
PORT=5000
FLASK_DEBUG=truecd src
python app.pyThe API will be available at http://0.0.0.0:5000
python main.pyGET /healthReturns service status and configuration info.
POST /asr
Content-Type: application/octet-stream
# OR
Content-Type: multipart/form-data (with 'audio' field)
# Body: audio file bytes (WAV, MP3, etc.)Returns JSON with transcribed text.
Example:
curl -X POST http://localhost:5000/asr \
-H "Content-Type: application/octet-stream" \
--data-binary @audio.wavResponse:
{
"transcript": "Hello world",
"mock": false
}POST /tts
Content-Type: application/json
{
"text": "Hello world"
}Returns JSON with base64-encoded audio data.
Example:
curl -X POST http://localhost:5000/tts \
-H "Content-Type: application/json" \
-d '{"text": "Hello world"}'Response:
{
"audio": "base64_encoded_audio_data...",
"format": "wav",
"mock": false
}Set FISH_MOCK=true in .env to use mock responses without making real API calls:
- ASR returns: "This is a mock transcription of your audio."
- TTS returns: A simple 440Hz tone (1 second WAV file)
The project makes actual HTTP calls to Fish Audio API endpoints:
- ASR:
POST {FISH_API_BASE}/v1/asr - TTS:
POST {FISH_API_BASE}/v1/tts
Authentication uses Bearer token from FISH_API_KEY.
- Python 3.8+
- Flask 3.0.0
- python-dotenv 1.0.0
- requests 2.31.0
The main.py script demonstrates how to access the JanitorAI completions API for CalHacks 2025:
- Endpoint:
https://janitorai.com/hackathon/completions - API Key:
calhacks2047 - Model: OpenAI-compatible chat completions format
- The
.envfile contains API keys and should not be committed to version control - Audio files should be in standard formats (WAV, MP3, etc.)
- TTS responses are returned as base64-encoded audio for easy JSON transmission
- The service runs on
0.0.0.0:5000by default (configurable via environment variables)