This project demonstrates how to create a simple AI agent using GPT models and speech recognition. The agent listens to user input via microphone, transcribes the speech to text, and generates a response using an LLM (Large Language Model).
- Voice Input: Prompts the user for information using the microphone.
- Speech Recognition: Uses Google's speech recognition API to transcribe spoken input.
- LLM Integration: Sends transcribed text to a GPT model (via LiteLLM) and returns a generated response.
- Logging: Logs function calls, return values, and exceptions to a rotating log file.
- Environment Configuration: Uses a `.env` file for API keys and log level.
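The rotating-log behaviour described in the Logging bullet can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code (which lives in `logging_config.py`); the file path, size limit, and backup count here are assumptions:

```python
import logging
from logging.handlers import RotatingFileHandler

def configure_logging(path: str = "log/agent.log", level: str = "INFO") -> logging.Logger:
    """Attach a rotating file handler: ~1 MB per file, 3 backups kept."""
    logger = logging.getLogger("agent")
    logger.setLevel(level)
    handler = RotatingFileHandler(path, maxBytes=1_000_000, backupCount=3)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Once a log file reaches the size limit, `RotatingFileHandler` renames it (`agent.log.1`, `agent.log.2`, …) and starts a fresh file, so disk usage stays bounded.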
```shell
git clone https://github.com/yourusername/agentic-ai.git
cd agentic-ai
```

Make sure you have Homebrew installed.
```shell
brew install python@3.11
brew install portaudio
python3.11 -m venv venv
. venv/bin/activate
pip install -r requirements.txt
```

Copy the sample file and adjust it to match the LLM you want to use:

```shell
cp .env.example .env
```

Key variables:
- `LOGLEVEL` – default logging level.
- `LLM_NAME` – LiteLLM model identifier (e.g., `openai/gpt-4o-mini`, `gemini/gemini-1.5-pro`, `ollama/mistral:7b`).
- `LLM_PROVIDER` – provider hint (`openai`, `gemini`, `ollama`, etc.).
- `LLM_API_KEY` – API key or token (use `ollama` for local Ollama).
- `LLM_API_BASE` – override base URL (set to `http://localhost:11434` for Ollama, blank for OpenAI).
- `LLM_TEMPERATURE`, `LLM_TIMEOUT` – default generation settings.
- `OPENAI_API_KEY` – optional compatibility key for tooling that still expects it.
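A filled-in `.env` might look like this (all values are illustrative only; substitute your own key and model):

```
LOGLEVEL=INFO
LLM_NAME=openai/gpt-4o-mini
LLM_PROVIDER=openai
LLM_API_KEY=sk-your-key-here
LLM_API_BASE=
LLM_TEMPERATURE=0.7
LLM_TIMEOUT=60
```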
Any of these can be overridden at runtime with CLI flags such as `--model`, `--provider`, `--api-key`, `--api-base`, `--temperature`, `--who`, `--question`, or `--stream`.
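The precedence rule (CLI flag wins, then the environment variable, then a built-in default) can be expressed as a small helper. This is a sketch of the idea; `settings.py`'s actual logic may differ:

```python
import os

def resolve(cli_value, env_var: str, default: str, env=os.environ) -> str:
    """Pick a setting: an explicit CLI value wins, then the environment, then the default."""
    if cli_value is not None:
        return cli_value
    return env.get(env_var, default)

# resolve(None, "LLM_NAME", "openai/gpt-4o-mini", env={"LLM_NAME": "ollama/mistral:7b"})
# returns "ollama/mistral:7b" -- the env var beats the default when no flag is given
```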
```shell
./agent.py --model ollama/mistral:7b --provider ollama --stream
```

If you already populated `.env`, the flags are optional. Use `./agent.py --help` to see the full list of overrides.
```shell
venv/bin/python -m unittest discover -s tests
venv/bin/python -m pyright
```

- `agent.py` – CLI entry point that wires modules together.
- `cli.py` – Argument parser definition.
- `settings.py` – Environment/CLI configuration loader.
- `llm_client.py` – LiteLLM wrapper responsible for routing and streaming output.
- `speech_service.py` – Microphone capture and transcription logic.
- `utils.py` – Utility functions and decorators (e.g., logging).
- `logging_config.py` – Centralized logging configuration.
- `graph.py` – Example DFS helper used by the exercises/tests.
- `.env` – Environment variables (not committed to git).
- `log/` – Directory for log files (ignored by git).
- `requirements.txt` – Python dependencies.
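The logging of function calls, return values, and exceptions mentioned in the features is typically implemented as a decorator. A minimal sketch of what the decorator in `utils.py` might look like (the name `logged` is an assumption, not the project's actual API):

```python
import functools
import logging

logger = logging.getLogger("agent")

def logged(func):
    """Log a function's call, its return value, and any exception it raises."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("calling %s args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            logger.exception("%s raised", func.__name__)
            raise  # re-raise so callers still see the failure
        logger.debug("%s returned %r", func.__name__, result)
        return result
    return wrapper
```

Because the wrapper re-raises, decorated functions behave exactly as before; the decorator only adds observability.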
- Python 3.11+
- PortAudio (for microphone input)
- See `requirements.txt` for Python packages.
- Microphone not working: Ensure your microphone is connected and accessible. You may need to grant microphone permissions.
- No response from LLM: Check your `.env` file for a valid `OPENAI_API_KEY`.
- Logs not appearing: Ensure the `log/` directory exists and is writable.
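If the `log/` directory is missing (assuming the app does not create it on startup), recreating it is enough:

```shell
mkdir -p log
```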
- API Keys: Never commit your `.env` file or API keys to version control.
- Logs: Log files may contain sensitive information. Handle them appropriately.
Further improvements and features will be added. Contributions are welcome!