An intelligent AI Note-Generator that extracts, summarizes, translates, and speaks back notes from audio – powered by Flask, the Gemini API, and an AI-powered ETL (Extract → Transform → Load) pipeline.
EduHacks AI Note-Generator is a Flask-based backend application that automates lecture and meeting note-taking. You simply upload an audio file, and it performs a full AI-powered ETL pipeline:
- Transcription – converts speech to text.
- Summarization – generates bullet-point summaries using LexRank.
- Flashcards – creates Q&A flashcards from the summary.
- Text-to-Speech (TTS) – converts summaries back to audio.
- Translation – translates summaries into your preferred language via the Gemini API (default: Urdu).
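As an illustration of the flashcard step, here is a minimal sketch that turns summary bullets into Q&A pairs. The function name and heuristic are assumptions for illustration, not the project's actual flashcard logic:

```python
def bullets_to_flashcards(bullets):
    """Naive sketch: turn each summary bullet into a Q&A flashcard.

    Illustrative only - the real backend may use an LLM or richer
    NLP heuristics to phrase questions.
    """
    cards = []
    for bullet in bullets:
        text = bullet.strip().rstrip(".")
        if not text:
            continue  # skip empty bullets
        cards.append({
            "question": f"What is meant by: '{text}'?",
            "answer": text + ".",
        })
    return cards
```

The resulting list of `{"question", "answer"}` dicts matches the shape of the `flashcards` field in the API response shown below.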
| Layer | Tools / Libraries |
|---|---|
| Backend | Flask, Flask-CORS, Werkzeug |
| AI & NLP | SpeechRecognition, Sumy, NLTK |
| Text-to-Speech | pyttsx3 (offline), gTTS (online fallback) |
| Translation | Google Gemini API |
| Audio Processing | pydub |
| Data Validation | Pydantic |
| Runtime | Python 3.9+ |
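The offline-first TTS with online fallback noted in the table can be sketched as a small dispatcher. The engine wiring here is an assumption – engines are injected as callables so the fallback logic is testable without audio hardware:

```python
def synthesize_summary(text, offline_engine, online_engine):
    """Try the offline engine first (e.g. pyttsx3); fall back to the
    online one (e.g. gTTS) if it raises.

    Illustrative sketch, not the project's exact implementation.
    """
    try:
        return offline_engine(text)
    except Exception:
        return online_engine(text)
```

In the real backend, `offline_engine` would wrap pyttsx3's save-to-file call and `online_engine` would wrap `gTTS(...).save(...)`.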
```shell
git clone https://github.com/ck-ahmad/EduHacks_AI_Note_Creator.git
cd eduHacks-ai-note-taker
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

Create a `.env` file (or set the variable in your environment):

```
API_Key_2=YOUR_GEMINI_API_KEY
```
You can get your API key from:
https://aistudio.google.com/app/apikey
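A minimal sketch of reading that key in Python – the helper name is an assumption, and the backend may instead load `.env` via a package such as python-dotenv:

```python
import os

def load_gemini_key():
    # Reads the same variable name used in the .env example above.
    key = os.environ.get("API_Key_2")
    if not key:
        raise RuntimeError("API_Key_2 is not set; see the .env instructions")
    return key
```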
Download the required NLTK tokenizer data, then start the server:

```shell
python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab')"
python app.py
```

The server runs at http://localhost:5000.
GET /health
Response:
```json
{
  "status": "ok",
  "message": "EduHacks AI Note-Taker backend running"
}
```

POST /api/upload
Form Data:
- `file`: audio file (`.mp3`, `.wav`, `.m4a`, etc.)
- `lang`: optional target language code (default `"ur"` for Urdu)
Response:
```json
{
  "transcript": "...",
  "bullets": ["...", "..."],
  "flashcards": [
    {"question": "...", "answer": "..."}
  ],
  "translated_summary": "...",
  "files": {
    "transcript_txt": "/outputs/xxxx_transcript.txt",
    "summary_txt": "/outputs/xxxx_summary.txt",
    "flashcards_txt": "/outputs/xxxx_flashcards.txt",
    "translated_summary_txt": "/outputs/xxxx_summary_translated_ur.txt",
    "audio_summary": "/outputs/xxxx_summary.mp3"
  }
}
```

Files are saved in the `/outputs` folder:
- `transcript.txt` – raw speech-to-text output
- `summary.txt` – bullet-point summary
- `flashcards.txt` – generated Q&A pairs
- `summary_translated_ur.txt` – translated summary
- `summary.mp3` – AI-generated audio version
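As a usage sketch for the upload endpoint, the helper below assembles the pieces of the request. It assumes the third-party `requests` package and a server on localhost:5000; the helper name is illustrative:

```python
def build_upload(path, lang="ur", base_url="http://localhost:5000"):
    """Assemble a POST /api/upload request.

    Returns (url, files, data) suitable for
    requests.post(url, files=files, data=data).
    """
    url = f"{base_url}/api/upload"
    files = {"file": open(path, "rb")}  # caller is responsible for closing
    data = {"lang": lang}
    return url, files, data

# Usage (requires a running server and the `requests` package):
# import requests
# url, files, data = build_upload("lecture.mp3")
# resp = requests.post(url, files=files, data=data)
```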
You can access them directly via:
GET /outputs/<filename>
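Serving those files is typically a one-liner in Flask. A hedged sketch – the route and folder mirror this README, but this is not necessarily the project's exact code:

```python
import os
from flask import Flask, send_from_directory

app = Flask(__name__)
OUTPUT_DIR = os.path.join(os.getcwd(), "outputs")

@app.route("/outputs/<path:filename>")
def serve_output(filename):
    # send_from_directory rejects paths that escape OUTPUT_DIR,
    # which guards against directory-traversal requests.
    return send_from_directory(OUTPUT_DIR, filename)
```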
Supported audio formats: `.wav`, `.mp3`, `.m4a`, `.aac`, `.ogg`
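Validating uploads against these extensions can be sketched with the standard Flask pattern – the helper name and the exact set are assumptions based on the list above:

```python
ALLOWED_EXTENSIONS = {"wav", "mp3", "m4a", "aac", "ogg"}

def allowed_file(filename):
    """True if the filename has one of the supported audio extensions."""
    return (
        "." in filename
        and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS
    )
```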
The backend uses Gemini 2.5 Flash via the official `google-genai` SDK.
Example (`translator.py`):
```python
from google import genai

class GeminiTranslator:
    def __init__(self, api_key, model="gemini-2.5-flash"):
        self.client = genai.Client(api_key=api_key)
        self.model = model

    def translate(self, text, target_language):
        prompt = (
            f"Translate the following text to {target_language}:\n\n"
            f"{text}\n\nOnly return translation."
        )
        response = self.client.models.generate_content(
            model=self.model, contents=prompt
        )
        return response.text.strip()
```
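Note that the `lang` form field is a code (`"ur"`), while the prompt expects a language name. One way to bridge the two – the mapping contents and helper name are assumptions, not from the project:

```python
# Hypothetical code-to-name mapping for building the translation prompt.
LANG_NAMES = {"ur": "Urdu", "en": "English", "ar": "Arabic", "hi": "Hindi"}

def target_language_name(code, default="Urdu"):
    """Map a request language code to a name Gemini can act on."""
    return LANG_NAMES.get(code.lower(), default)
```

The backend could then call `translator.translate(text, target_language_name(lang))`.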
Project structure:

```
eduHacks-ai-note-taker
├── app.py
├── translator.py
├── speech_to_text.py
├── summarizer.py
├── flashcards.py
├── tts.py
├── uploads/          # Uploaded audio files
├── outputs/          # Generated files (txt, mp3)
├── templates/
│   └── home.html     # Optional frontend
├── static/
├── requirements.txt
└── README.md
```
Dependencies (`requirements.txt`):

```
flask
flask-cors
werkzeug
pydub
SpeechRecognition
sumy
nltk
pyttsx3
gTTS
pydantic==2.8.2
google-genai
```

Planned features:

- Multi-language UI (frontend translation toggle)
- Database support (store notes, summaries, and metadata)
- Real-time transcription (WebSocket)
- User authentication & dashboards
- Audio segmentation for longer files
- Front-end React or Next.js integration
| Name | Role |
|---|---|
| Ahmad | Developer & ML Integrator |
| Aizazullah | Assistant in Frontend Development |
This project is licensed under the MIT License – free for educational and personal use.