Speak naturally, generate AI images with your voice, and interact with OpenAI models completely hands-free.
- Wake-word activation using "hello"
- Real-time speech recognition
- AI-powered conversations using GPT-4o
- Text-to-speech responses
- Continuous listening loop after each response
- Generate images completely through voice commands
- Uses OpenAI DALL·E 3
- Automatically saves generated images locally
- Opens generated images instantly after creation
- Minimal dependencies
- Easy
.envconfiguration - Beginner-friendly project structure
- Runs directly from the terminal
VoiceGPT/
│
├── VoiceGPT.py # Main voice assistant
├── VoiceImageGeneration.py # Voice-controlled image generation
├── requirements.txt # Python dependencies
├── .env.example # Environment variable template
├── README.md
│
└── DALLE Image Generation Photos/ # Generated imagesgit clone https://github.com/DagaVedant/VoiceGPT.git
cd VoiceGPTpip install -r requirements.txtpip install pipwin
pipwin install pyaudiobrew install portaudio
pip install pyaudioCreate a .env file in the project root:
OPENAI_API_KEY=your_api_key_here
IMAGE_SAVE_PATH=DALLE Image Generation Photos/You can get an API key from:
https://platform.openai.com/account/api-keys
python VoiceGPT.py- Wait for microphone calibration
- Say:
hello
- Ask any question naturally
- The AI responds using text-to-speech
- The assistant returns to listening mode
python VoiceImageGeneration.py- "A futuristic city at sunset"
- "A cyberpunk samurai standing in neon rain"
- "A realistic lion in the African savanna"
- "An astronaut skateboarding on Mars"
Generated images are automatically:
- Saved locally
- Opened instantly after generation
| Technology | Purpose |
|---|---|
| Python | Core programming language |
| OpenAI GPT-4o | Conversational AI |
| DALL·E 3 | AI image generation |
| SpeechRecognition | Voice recognition |
| PyAudio | Microphone input |
| pyttsx3 | Text-to-speech |
- Custom wake words
- GUI version
- Streaming responses
- Multi-language support
- Voice customization
- Conversation memory
- Faster response handling
- Modern desktop app interface
- A working microphone is required.
- OpenAI API usage may incur costs.
- Generated images are ignored through
.gitignore. - Speech recognition quality depends on microphone quality and background noise.
Pull requests, ideas, and improvements are welcome.
If you find bugs or want to add features, feel free to open an issue.
This project is open source and available under the MIT License.