🎧 EduHacks AI Note-Generator: Backend (Flask + Gemini) & Frontend (HTML, CSS & JS)

An intelligent AI Note-Generator that extracts, summarizes, translates, and speaks back notes from audio, powered by Flask, the Gemini API, and AI-powered ETL (Extract–Transform–Load) processing.


🚀 Overview

EduHacks AI Note-Generator is a Flask-based backend application that automates lecture and meeting note-taking. You simply upload an audio file, and it performs a full AI-powered ETL pipeline:

  1. 🎀 Transcription: converts speech to text.
  2. 🧠 Summarization: generates bullet-point summaries using LexRank.
  3. 🗂 Flashcards: creates intelligent Q&A flashcards.
  4. 🔊 Text-to-Speech (TTS): converts summaries back to audio.
  5. 🌍 Translation: translates summaries into your preferred language using the Gemini API (default: Urdu 🇵🇰).
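Conceptually, the five stages above compose into a single pipeline function. The sketch below uses placeholder callables rather than the project's actual module APIs (the real implementations live in speech_to_text.py, summarizer.py, flashcards.py, tts.py, and translator.py; the signatures here are illustrative only):

```python
# Hedged sketch of the ETL pipeline: each stage is passed in as a plain
# callable, so the orchestration logic is independent of any one backend.

def run_pipeline(audio_path, transcribe, summarize, make_flashcards,
                 synthesize, translate, target_lang="ur"):
    """Run the full audio -> notes pipeline and return one result dict."""
    transcript = transcribe(audio_path)           # 1. speech to text
    bullets = summarize(transcript)               # 2. bullet-point summary
    cards = make_flashcards(transcript)           # 3. Q&A flashcards
    audio_out = synthesize(" ".join(bullets))     # 4. summary back to audio
    translated = translate(" ".join(bullets), target_lang)  # 5. translation
    return {
        "transcript": transcript,
        "bullets": bullets,
        "flashcards": cards,
        "translated_summary": translated,
        "audio_summary": audio_out,
    }
```

Because the stages are plain callables, each one can be swapped or stubbed out in tests without touching the orchestration.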

🧩 Tech Stack

| Layer | Tools / Libraries |
|---|---|
| Backend | Flask, Flask-CORS, Werkzeug |
| AI & NLP | SpeechRecognition, Sumy, NLTK |
| Text-to-Speech | pyttsx3 (offline), gTTS (online fallback) |
| Translation | Google Gemini API |
| Audio Processing | pydub |
| Data Validation | Pydantic |
| Runtime | Python 3.9+ |

βš™οΈ Installation & Setup

1️⃣ Clone the Repository

git clone https://github.com/ck-ahmad/EduHacks_AI_Note_Creator.git
cd EduHacks_AI_Note_Creator

2️⃣ Create a Virtual Environment

python -m venv .venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Configure Gemini API Key

Create a .env file (or set the variable directly in your environment):

API_Key_2=YOUR_GEMINI_API_KEY

You can get your API key from:
🔗 https://aistudio.google.com/app/apikey
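A minimal sketch of reading the key at startup, assuming the `API_Key_2` variable name from the step above (loading the `.env` file itself would additionally need a package such as python-dotenv; the helper name here is hypothetical, not from the project code):

```python
import os


def load_gemini_key(var_name="API_Key_2"):
    """Read the Gemini API key from the environment, failing loudly if unset."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set {var_name} in your environment or .env file")
    return key
```

Failing at startup with a clear message is friendlier than a cryptic authentication error on the first translation request.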

5️⃣ Download NLTK Data (First Run)

python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab')"

6️⃣ Run the App

python app.py

Server runs on 👉 http://localhost:5000


🧠 API Endpoints

✅ Health Check

GET /health

Response:

{
  "status": "ok",
  "message": "EduHacks AI Note-Taker backend running"
}

🎀 Upload Audio (Full ETL Pipeline)

POST /api/upload

Form Data:

  • file: audio file (.mp3, .wav, .m4a, etc.)
  • lang: optional target language code (default "ur" for Urdu)

Response:

{
  "transcript": "...",
  "bullets": ["...", "..."],
  "flashcards": [
    {"question": "...", "answer": "..."}
  ],
  "translated_summary": "...",
  "files": {
    "transcript_txt": "/outputs/xxxx_transcript.txt",
    "summary_txt": "/outputs/xxxx_summary.txt",
    "flashcards_txt": "/outputs/xxxx_flashcards.txt",
    "translated_summary_txt": "/outputs/xxxx_summary_translated_ur.txt",
    "audio_summary": "/outputs/xxxx_summary.mp3"
  }
}
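The `files` paths in the response all share a per-upload ID prefix. A hedged sketch of how such paths might be assembled (the naming convention is inferred from the example response above, not taken from the project's source; the function name is hypothetical):

```python
def output_paths(upload_id, lang="ur"):
    """Build the /outputs paths for one upload, mirroring the response example."""
    base = f"/outputs/{upload_id}"
    return {
        "transcript_txt": f"{base}_transcript.txt",
        "summary_txt": f"{base}_summary.txt",
        "flashcards_txt": f"{base}_flashcards.txt",
        "translated_summary_txt": f"{base}_summary_translated_{lang}.txt",
        "audio_summary": f"{base}_summary.mp3",
    }
```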

🎧 Outputs

Files are saved in the /outputs folder:

  • transcript.txt → raw speech-to-text output
  • summary.txt → bullet summary
  • flashcards.txt → generated Q&A pairs
  • summary_translated_ur.txt → translated summary
  • summary.mp3 → AI-generated audio version

You can access them directly via:

GET /outputs/<filename>

🗣 Supported Audio Formats

  • .wav
  • .mp3
  • .m4a
  • .aac
  • .ogg
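A server accepting uploads would typically whitelist these extensions before running the pipeline. A small sketch of such a check (the helper name and constant are illustrative, not from the project code):

```python
import os

# Extensions from the supported-formats list above.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".aac", ".ogg"}


def is_supported_audio(filename):
    """Return True when the filename carries one of the supported extensions."""
    return os.path.splitext(filename.lower())[1] in ALLOWED_EXTENSIONS
```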

🌍 Translation with Gemini

The backend uses Gemini 2.5 Flash via the official google-genai SDK.

Example (translator.py):

```python
from google import genai


class GeminiTranslator:
    def __init__(self, api_key, model="gemini-2.5-flash"):
        self.client = genai.Client(api_key=api_key)
        self.model = model

    def translate(self, text, target_language):
        prompt = (
            f"Translate the following text to {target_language}:\n\n"
            f"{text}\n\nOnly return translation."
        )
        response = self.client.models.generate_content(
            model=self.model, contents=prompt
        )
        return response.text.strip()
```
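Network calls to the Gemini API can fail or time out, and a failed translation should not sink the rest of the pipeline. One defensive pattern is a wrapper that falls back to the untranslated text; this wrapper is a suggestion, not part of the project, and is written against a generic callable so it could wrap `GeminiTranslator.translate` or any other translator:

```python
def translate_or_fallback(translate_fn, text, target_language):
    """Try the translator; on any error, return the original text unchanged."""
    try:
        return translate_fn(text, target_language)
    except Exception:
        # Degrade gracefully: an untranslated summary beats a failed request.
        return text
```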

🧾 Folder Structure

```
📦 EduHacks_AI_Note_Creator
│
├── app.py
├── translator.py
├── speech_to_text.py
├── summarizer.py
├── flashcards.py
├── tts.py
│
├── uploads/            # Uploaded audio files
├── outputs/            # Generated files (txt, mp3)
├── templates/
│   └── home.html       # Optional frontend
├── static/
│
├── requirements.txt
└── README.md
```

🧰 Requirements.txt

flask
flask-cors
werkzeug
pydub
SpeechRecognition
sumy
nltk
pyttsx3
gTTS
pydantic==2.8.2
google-genai

💡 Future Enhancements

  • 🔹 Multi-language UI (frontend translation toggle)
  • 🔹 Database support (store notes, summaries, and metadata)
  • 🔹 Real-time transcription (WebSocket)
  • 🔹 User authentication & dashboards
  • 🔹 Audio segmentation for longer files
  • 🔹 React or Next.js frontend integration

πŸ§‘β€πŸ’» Contributors

| Name | Role |
|---|---|
| Ahmad | Developer & ML Integrator |
| Aizazullah | Frontend Development Assistant |

🏁 License

This project is licensed under the MIT License and is free for educational and personal use.
