VoiceGPT

A voice-controlled AI assistant powered by GPT-4o + DALL·E 3

Speak naturally, generate AI images with your voice, and interact with OpenAI models completely hands-free.


Features

Voice Assistant

  • Wake-word activation using "hello"
  • Real-time speech recognition
  • AI-powered conversations using GPT-4o
  • Text-to-speech responses
  • Continuous listening loop after each response

Voice Image Generation

  • Generate images completely through voice commands
  • Uses OpenAI DALL·E 3
  • Automatically saves generated images locally
  • Opens generated images instantly after creation

Simple Setup

  • Minimal dependencies
  • Easy .env configuration
  • Beginner-friendly project structure
  • Runs directly from the terminal

Project Structure

VoiceGPT/
│
├── VoiceGPT.py                    # Main voice assistant
├── VoiceImageGeneration.py        # Voice-controlled image generation
├── requirements.txt               # Python dependencies
├── .env.example                   # Environment variable template
├── README.md
│
└── DALLE Image Generation Photos/ # Generated images

Installation

1. Clone the repository

git clone https://github.com/DagaVedant/VoiceGPT.git
cd VoiceGPT

2. Install dependencies

pip install -r requirements.txt

3. Install PyAudio

Windows

pip install pipwin
pipwin install pyaudio

macOS

brew install portaudio
pip install pyaudio
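
After installing, it can help to confirm that the libraries are importable before starting the assistant. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the module list is taken from the Technologies Used section rather than from requirements.txt, so adjust it if the two differ.

```python
from importlib.util import find_spec

# Import names (not pip package names) for the libraries this README lists.
# Assumption: requirements.txt may contain others not checked here.
REQUIRED_MODULES = ["speech_recognition", "pyaudio", "pyttsx3", "openai"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

If `pyaudio` shows up as missing, the platform-specific steps above are the place to look first.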

Environment Setup

Create a .env file in the project root:

OPENAI_API_KEY=your_api_key_here
IMAGE_SAVE_PATH=DALLE Image Generation Photos/

You can get an API key from:

https://platform.openai.com/account/api-keys
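
As an illustration of how the two variables might be read and validated, here is a small sketch. It assumes the values are already present in the process environment (for example, loaded by python-dotenv's `load_dotenv()`, which this README does not confirm the scripts use). The function name `load_settings` and the returned dictionary keys are hypothetical.

```python
import os
from pathlib import Path

# Default taken from the .env example above.
DEFAULT_IMAGE_DIR = "DALLE Image Generation Photos/"


def load_settings(env=None):
    """Read OPENAI_API_KEY and IMAGE_SAVE_PATH from a mapping and validate them."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    # Reject both a missing key and the untouched placeholder value.
    if not api_key or api_key == "your_api_key_here":
        raise RuntimeError("OPENAI_API_KEY is missing or still set to the placeholder.")
    image_dir = Path(env.get("IMAGE_SAVE_PATH", DEFAULT_IMAGE_DIR))
    return {"api_key": api_key, "image_dir": image_dir}
```

Failing early with a clear message tends to be friendlier than an authentication error halfway through a voice session.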


Running the Voice Assistant

python VoiceGPT.py

Workflow

  1. Wait for microphone calibration
  2. Say:
     hello
  3. Ask any question naturally
  4. The AI responds using text-to-speech
  5. The assistant returns to listening mode
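
The workflow above can be sketched as a simple loop. This is an illustrative outline under stated assumptions, not the code in VoiceGPT.py: `listen`, `ask`, and `speak` are hypothetical hooks standing in for speech recognition, the GPT-4o call, and text-to-speech, and `max_turns` exists only so the loop can be stopped in a test.

```python
import re

WAKE_WORD = "hello"


def heard_wake_word(transcript, wake_word=WAKE_WORD):
    """True if the wake word appears as a whole word in the transcript."""
    return wake_word.lower() in re.findall(r"\w+", transcript.lower())


def run_assistant(listen, ask, speak, max_turns=None):
    """Wait for the wake word, answer one question, then listen again.

    listen() returns recognized text or None, ask(question) returns the
    reply text, and speak(text) says it aloud. All three are hypothetical.
    """
    answered = 0
    while max_turns is None or answered < max_turns:
        transcript = listen()
        if not transcript or not heard_wake_word(transcript):
            continue  # nothing understood, or no wake word yet
        question = listen()
        if not question:
            continue  # wake word heard but no question followed
        speak(ask(question))
        answered += 1
    return answered
```

Passing the three steps in as functions keeps the loop independent of any particular microphone or API client, which also makes it easy to try out with canned input.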

Running Voice Image Generation

python VoiceImageGeneration.py

Example prompts

  • "A futuristic city at sunset"
  • "A cyberpunk samurai standing in neon rain"
  • "A realistic lion in the African savanna"
  • "An astronaut skateboarding on Mars"

Generated images are automatically:

  • Saved locally
  • Opened instantly after generation
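
One way the save-and-open step could look is sketched below. It assumes the script already holds the generated image as raw bytes (how they are obtained from the API is not shown in this README), and the helper names `image_filename`, `save_image`, and `open_image` are hypothetical. Opening through `webbrowser` is a swapped-in, cross-platform choice; the actual script may open images differently.

```python
import re
import webbrowser
from datetime import datetime
from pathlib import Path


def image_filename(prompt, now=None, max_words=6):
    """Build a filesystem-safe PNG name from the spoken prompt and a timestamp."""
    now = now or datetime.now()
    words = re.findall(r"[a-z0-9]+", prompt.lower())[:max_words]
    slug = "-".join(words) or "image"
    return f"{now:%Y%m%d-%H%M%S}_{slug}.png"


def save_image(image_bytes, prompt, save_dir, now=None):
    """Write the bytes under save_dir (created if needed) and return the path."""
    folder = Path(save_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / image_filename(prompt, now)
    path.write_bytes(image_bytes)
    return path


def open_image(path):
    """Ask the system to open the file; which viewer appears is platform-dependent."""
    return webbrowser.open(Path(path).resolve().as_uri())
```

Here `save_dir` would typically be the IMAGE_SAVE_PATH value from the .env file. Including a timestamp in the name keeps repeated prompts from overwriting earlier images.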

Technologies Used

Technology          Purpose
Python              Core programming language
OpenAI GPT-4o       Conversational AI
DALL·E 3            AI image generation
SpeechRecognition   Voice recognition
PyAudio             Microphone input
pyttsx3             Text-to-speech

Future Improvements

  • Custom wake words
  • GUI version
  • Streaming responses
  • Multi-language support
  • Voice customization
  • Conversation memory
  • Faster response handling
  • Modern desktop app interface

Notes

  • A working microphone is required.
  • OpenAI API usage may incur costs.
  • Generated images are excluded from version control through .gitignore.
  • Speech recognition quality depends on microphone quality and background noise.

Contributing

Pull requests, ideas, and improvements are welcome.

If you find bugs or want to add features, feel free to open an issue.


License

This project is open source and available under the MIT License.


If you like this project, consider starring the repo.

Built with Python, AI, and way too much experimentation.