AudioInsight

Overview 🎙️

AudioInsight turns podcast audio into searchable, interactive text. It splits podcast MP3 files into chunks, transcribes them with Whisper, stores embeddings of the transcript in Pinecone, and answers users' questions by retrieving the most relevant transcript passages. This lets users search podcast content and extract insights by asking questions instead of scrubbing through the audio.

Key Features ✨

Audio Chunking: Podcast MP3 files are split into smaller audio chunks for better handling and processing
Speech-to-Text: Whisper transcribes each audio chunk into text
Q&A System: Users can ask questions about the podcast. The system uses the transcript's embeddings to find the most relevant passages and answers from them
Embeddings with Pinecone: Text chunks are embedded into vector format and stored in Pinecone for quick retrieval during the question-answering process
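The Audio Chunking and Speech-to-Text features can be sketched as follows. This is a minimal illustration, assuming pydub (with ffmpeg installed) for slicing and the open-source openai-whisper package for transcription; the function names, the 60-second chunk length, and the "base" model size are illustrative choices, not taken from this project's code.

```python
import os
from typing import Callable, List, Optional, Tuple


def chunk_boundaries(duration_ms: int, chunk_ms: int = 60_000) -> List[Tuple[int, int]]:
    """Return (start, end) millisecond ranges that cover the whole recording."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [(start, min(start + chunk_ms, duration_ms))
            for start in range(0, duration_ms, chunk_ms)]


def split_mp3(path: str, out_dir: str, chunk_ms: int = 60_000) -> List[str]:
    """Split an MP3 into smaller MP3 files (requires pydub and ffmpeg)."""
    from pydub import AudioSegment  # lazy import so chunk_boundaries works without pydub

    audio = AudioSegment.from_mp3(path)
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, (start, end) in enumerate(chunk_boundaries(len(audio), chunk_ms)):
        chunk_path = os.path.join(out_dir, f"chunk_{i:03d}.mp3")
        audio[start:end].export(chunk_path, format="mp3")  # pydub slices in milliseconds
        paths.append(chunk_path)
    return paths


def transcribe_chunks(paths: List[str],
                      transcribe: Optional[Callable[[str], str]] = None) -> str:
    """Transcribe each chunk in order and join the pieces into one transcript."""
    if transcribe is None:
        import whisper  # the open-source openai-whisper package

        model = whisper.load_model("base")

        def _whisper(chunk_path: str) -> str:
            return model.transcribe(chunk_path)["text"]

        transcribe = _whisper
    return " ".join(transcribe(p).strip() for p in paths)
```

For example, `transcribe_chunks(split_mp3("episode.mp3", "chunks"))` would return a single transcript string for the whole episode; `episode.mp3` and `chunks` are placeholder paths. Passing a custom `transcribe` callable makes it easy to swap in a hosted transcription service.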

Project Workflow 🔄

Upload Podcast MP3: The user uploads a podcast MP3 file
Chunking: The MP3 file is split into smaller chunks
Speech-to-Text: Each chunk is transcribed into text using the Whisper model
Embeddings Generation: The transcribed text is split into smaller chunks and converted into embeddings using sentence-transformers or other embedding models
Storing Embeddings: The embeddings are stored in Pinecone, a vector database for fast retrieval
Question Answering: Users ask questions about the podcast, and the system queries Pinecone for the most relevant transcript chunks and uses them to answer
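The Embeddings Generation step (splitting the transcript before embedding) can be sketched as a sliding window over tokens. This assumes whitespace-separated words as tokens by default; a tiktoken encoding such as `tiktoken.get_encoding("cl100k_base")` can be passed instead to count model tokens. The 200-token window and 40-token overlap are illustrative values, not this project's settings.

```python
from typing import Any, List, Optional


def split_text(text: str, chunk_size: int = 200, overlap: int = 40,
               encoding: Optional[Any] = None) -> List[str]:
    """Split text into overlapping windows of at most `chunk_size` tokens.

    With encoding=None, tokens are whitespace-separated words. Any object with
    encode(str) -> list[int] and decode(list[int]) -> str (for example a
    tiktoken Encoding) can be passed to count model tokens instead.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    tokens = encoding.encode(text) if encoding is not None else text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(encoding.decode(window) if encoding is not None else " ".join(window))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either neighbouring chunk, at the cost of storing a few duplicate tokens.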

Project Architecture 🏗️

Podcast MP3 → MP3 Chunks: The MP3 is split into smaller audio chunks
Speech-to-Text Conversion: Whisper converts these audio chunks into text
Embeddings Generation: Text chunks are converted into vector embeddings for storage
Embeddings Storage (Pinecone): Vector embeddings are stored and managed in Pinecone
Interactive Q&A: Users can ask questions and receive answers from the embedded text data
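The Interactive Q&A stage comes down to a nearest-neighbour lookup followed by prompt assembly. The sketch below uses an in-memory cosine-similarity search as a stand-in for a Pinecone query, so it runs without an API key; `top_k_chunks` and `build_prompt` are hypothetical names, and the prompt wording is an assumption rather than the project's actual prompt.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec: Sequence[float],
                 store: List[Tuple[str, Sequence[float]]],
                 k: int = 3) -> List[str]:
    """Return the k stored texts whose embeddings are most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, contexts: List[str]) -> str:
    """Assemble a prompt asking the model to answer only from the retrieved excerpts."""
    context_block = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return (
        "Answer the question using only the podcast transcript excerpts below.\n\n"
        f"{context_block}\n\nQuestion: {question}\nAnswer:"
    )
```

In the full pipeline, the query vector would come from the same embedding model used for the transcript chunks, and the assembled prompt would be passed to whichever language model produces the answer.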

Requirements 📋

Python 3.8 or higher
Virtual environment management tool (venv, conda, etc.)
All dependencies listed in requirements.txt

Installation and Setup 🛠️

1. Clone the repository

git clone <repository-url>
cd <repository-folder>

2. Create and activate virtual environment

Using venv (Python's built-in virtual environment)

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows
venv\Scripts\activate
# On macOS/Linux
source venv/bin/activate

Using conda

# Create conda environment
conda create -n venv python=3.12

# Activate conda environment
conda activate venv

3. Install dependencies

# Make sure your virtual environment is activated
pip install -r requirements.txt

4. Run the Streamlit app

streamlit run app.py

5. Usage

Upload a podcast MP3 and interact with the transcription through the Q&A interface.

Tech Stack 🛠️

Groq: For optimizing the audio processing pipeline
Langchain: For creating modular and scalable components for question-answering
Pinecone: A vector database for managing embeddings and facilitating the Q&A process
Streamlit: A lightweight web app framework for creating interactive user interfaces
Whisper: OpenAI's speech recognition model, used to convert speech to text
Pydub: Used for handling and chunking audio files
Sentence-Transformers: For generating embeddings from text chunks
Tiktoken: For tokenizing text
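One way Sentence-Transformers and Pinecone could be wired together is sketched below. It assumes the current `pinecone` Python client (`Pinecone(api_key=...)`, `Index.upsert`) and the `all-MiniLM-L6-v2` model, which produces 384-dimensional vectors, so the target index must already exist with dimension 384. The index name, ID scheme, and metadata layout are hypothetical; only `make_records` runs without network access.

```python
from typing import Dict, List, Sequence


def make_records(chunks: List[str], vectors: Sequence[Sequence[float]],
                 prefix: str = "chunk") -> List[Dict]:
    """Pair each text chunk with its embedding in the dict shape Pinecone's upsert accepts."""
    if len(chunks) != len(vectors):
        raise ValueError("need exactly one vector per chunk")
    return [
        {"id": f"{prefix}-{i}", "values": [float(v) for v in vec], "metadata": {"text": text}}
        for i, (text, vec) in enumerate(zip(chunks, vectors))
    ]


def index_chunks(chunks: List[str], api_key: str, index_name: str = "audioinsight") -> None:
    """Embed chunks and upsert them into an existing Pinecone index (hypothetical name)."""
    from pinecone import Pinecone
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings
    vectors = model.encode(chunks).tolist()
    index = Pinecone(api_key=api_key).Index(index_name)
    index.upsert(vectors=make_records(chunks, vectors))
```

At question time, `index.query(vector=..., top_k=3, include_metadata=True)` returns the closest matches, and their `metadata["text"]` fields can supply the context for the answer. Storing the chunk text as metadata avoids a second lookup after retrieval; very long transcripts may need the upsert split into batches.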

Troubleshooting 🔍

If you encounter any dependency-related issues:

Make sure your virtual environment is activated
Verify Python version compatibility: python --version
Try upgrading pip: pip install --upgrade pip
If using conda, ensure conda-forge channel is added: conda config --add channels conda-forge
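The checks above can be bundled into a small diagnostic script run inside the activated environment. This is a sketch: the module names are assumptions inferred from the Tech Stack list (for example, Whisper installs from the `openai-whisper` package but imports as `whisper`) and should be compared with requirements.txt. It also checks for ffmpeg, which pydub needs to decode MP3 files.

```python
import importlib.util
import shutil
import sys
from typing import Iterable, List, Tuple

# Assumed import names for the Tech Stack packages; compare them with requirements.txt.
EXPECTED_MODULES = ("groq", "langchain", "pinecone", "streamlit",
                    "whisper", "pydub", "sentence_transformers", "tiktoken")


def python_ok(minimum: Tuple[int, int] = (3, 8)) -> bool:
    """True if the running interpreter meets the minimum version in Requirements."""
    return sys.version_info[:2] >= minimum


def missing_modules(names: Iterable[str] = EXPECTED_MODULES) -> List[str]:
    """Return the module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ffmpeg_available() -> bool:
    """pydub relies on ffmpeg (or libav) to decode MP3; check that ffmpeg is on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    version = ".".join(map(str, sys.version_info[:3]))
    print(f"Python {version}: {'OK' if python_ok() else 'too old, need 3.8+'}")
    missing = missing_modules()
    print("Missing packages: " + ", ".join(missing) if missing else "All expected packages found")
    print("ffmpeg found" if ffmpeg_available() else "ffmpeg not found on PATH")
```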

License 📄

The source code for the project is licensed under the MIT license, which you can find in the LICENSE.md file.

Contributing 🤝

Fork the project
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request
