VoiceGPT

A voice-controlled AI assistant powered by GPT-4o + DALL·E 3

Speak naturally, generate AI images with your voice, and interact with OpenAI models completely hands-free.

Features

Voice Assistant

Wake-word activation using "hello"
Real-time speech recognition
AI-powered conversations using GPT-4o
Text-to-speech responses
Continuous listening loop after each response
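
As a rough illustration of wake-word activation, the sketch below checks a recognized transcript for the word "hello" and strips it from the start of a request. The function names `heard_wake_word` and `strip_wake_word` are hypothetical and are not taken from VoiceGPT.py; the real script may match the wake word differently.

```python
import re

WAKE_WORD = "hello"  # the wake word named in the feature list


def heard_wake_word(transcript: str, wake_word: str = WAKE_WORD) -> bool:
    """Return True if the wake word appears as a whole word in the transcript."""
    # Tokenize on letters/apostrophes so "Othello" does not count as "hello".
    words = re.findall(r"[a-z']+", transcript.lower())
    return wake_word.lower() in words


def strip_wake_word(transcript: str, wake_word: str = WAKE_WORD) -> str:
    """Drop everything up to and including the first wake word (hypothetical helper)."""
    match = re.search(rf"\b{re.escape(wake_word)}\b", transcript, flags=re.IGNORECASE)
    if match is None:
        return transcript.strip()
    # Remove punctuation and spaces that directly follow the wake word.
    return transcript[match.end():].lstrip(" ,.!?").strip()
```

Matching on whole words rather than substrings is one way to reduce false activations from words that merely contain the wake word.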

Voice Image Generation

Generate images completely through voice commands
Uses OpenAI DALL·E 3
Automatically saves generated images locally
Opens generated images instantly after creation

Simple Setup

Minimal dependencies
Easy .env configuration
Beginner-friendly project structure
Runs directly from the terminal

Project Structure

VoiceGPT/
│
├── VoiceGPT.py                    # Main voice assistant
├── VoiceImageGeneration.py        # Voice-controlled image generation
├── requirements.txt               # Python dependencies
├── .env.example                   # Environment variable template
├── README.md
│
└── DALLE Image Generation Photos/ # Generated images

Installation

1. Clone the repository

git clone https://github.com/DagaVedant/VoiceGPT.git
cd VoiceGPT

2. Install dependencies

pip install -r requirements.txt

3. Install PyAudio

Windows

pip install pipwin
pipwin install pyaudio

macOS

brew install portaudio
pip install pyaudio

Environment Setup

Create a .env file in the project root:

OPENAI_API_KEY=your_api_key_here
IMAGE_SAVE_PATH=DALLE Image Generation Photos/

You can get an API key from:

https://platform.openai.com/account/api-keys
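
To show how the two settings above might be read and validated, here is a minimal standard-library sketch. It is an assumption about how a script could consume the file, not the project's actual loader (which may well use a dedicated dotenv package); `parse_env` and `load_settings` are hypothetical names.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may contain spaces (e.g. the image folder name); only trim
        # surrounding whitespace and optional quotes.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_settings(env_text: str) -> dict[str, str]:
    """Check the two settings this README names and fail early if the key is missing."""
    settings = parse_env(env_text)
    api_key = settings.get("OPENAI_API_KEY", "")
    if not api_key or api_key == "your_api_key_here":
        raise ValueError("Set OPENAI_API_KEY in .env to a real key")
    # Assumed default: the folder shown in the project structure.
    settings.setdefault("IMAGE_SAVE_PATH", "DALLE Image Generation Photos/")
    return settings
```

A caller could pass in the file contents with `load_settings(open(".env").read())`; failing early on the placeholder key avoids a confusing authentication error later.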

Running the Voice Assistant

python VoiceGPT.py

Workflow

1. Wait for microphone calibration.
2. Say "hello" to wake the assistant.
3. Ask any question naturally.
4. The AI responds using text-to-speech.
5. The assistant returns to listening mode.
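
The workflow above can be sketched as a small loop. This is an illustration under stated assumptions, not the code in VoiceGPT.py: `run_assistant` is a hypothetical name, and the microphone, the GPT-4o call, and text-to-speech are passed in as plain callables so the control flow can be read (and tried) without any audio hardware or API key.

```python
from typing import Callable, Iterable


def run_assistant(
    transcripts: Iterable[str],
    ask_model: Callable[[str], str],
    speak: Callable[[str], None],
    wake_word: str = "hello",
) -> None:
    """Wait for the wake word, answer the next utterance, then wait again."""
    awake = False
    for heard in transcripts:
        text = heard.strip()
        if not text:
            continue  # nothing recognized; keep listening
        if not awake:
            # Compare whole words, ignoring trailing punctuation.
            words = [w.strip(".,!?") for w in text.lower().split()]
            awake = wake_word in words
            continue
        speak(ask_model(text))  # the reply is spoken aloud
        awake = False  # return to listening mode
```

In a real run, `transcripts` would be a generator yielding recognized speech, `ask_model` would wrap the chat request, and `speak` would wrap the text-to-speech engine.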

Running Voice Image Generation

python VoiceImageGeneration.py

Example prompts

"A futuristic city at sunset"
"A cyberpunk samurai standing in neon rain"
"A realistic lion in the African savanna"
"An astronaut skateboarding on Mars"

Generated images are automatically:

Saved locally
Opened instantly after generation
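
One possible way to implement the save-and-open behavior is sketched below. The file naming scheme (timestamp plus a slug of the prompt) and the helper names `image_path_for`, `save_image`, and `open_file` are assumptions made for illustration; VoiceImageGeneration.py may name and open files differently.

```python
import os
import re
import subprocess
import sys
from datetime import datetime
from pathlib import Path


def image_path_for(prompt: str, save_dir: str, now: datetime) -> Path:
    """Build a filesystem-safe, timestamped PNG path from the spoken prompt."""
    # Collapse anything that is not a letter or digit into single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower())[:50].strip("-") or "image"
    return Path(save_dir) / f"{now:%Y%m%d-%H%M%S}-{slug}.png"


def save_image(data: bytes, path: Path) -> Path:
    """Write image bytes to disk, creating the folder if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path


def open_file(path: Path) -> None:
    """Open a file in the platform's default viewer."""
    if sys.platform.startswith("win"):
        os.startfile(path)  # available on Windows only
    elif sys.platform == "darwin":
        subprocess.run(["open", str(path)], check=False)
    else:
        subprocess.run(["xdg-open", str(path)], check=False)
```

Including a timestamp in the name keeps repeated prompts from overwriting earlier images.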

Technologies Used

Technology         Purpose
Python             Core programming language
OpenAI GPT-4o      Conversational AI
DALL·E 3           AI image generation
SpeechRecognition  Voice recognition
PyAudio            Microphone input
pyttsx3            Text-to-speech

Future Improvements

Custom wake words
GUI version
Streaming responses
Multi-language support
Voice customization
Conversation memory
Faster response handling
Modern desktop app interface

Notes

A working microphone is required.
OpenAI API usage may incur costs.
Generated images are excluded from version control via .gitignore.
Speech recognition quality depends on microphone quality and background noise.

Contributing

Pull requests, ideas, and improvements are welcome.

If you find bugs or want to add features, feel free to open an issue.

License

This project is open source and available under the MIT License.

If you like this project, consider starring the repo.

Built with Python, AI, and way too much experimentation.
