This is an AI-powered chatbot that allows users to upload PDF documents and ask questions about the content. The chatbot extracts text from the PDF, creates a vector store using FAISS, and generates responses using OpenAI GPT models.
- Upload PDF files and extract text.
- Store and retrieve text using FAISS vector search.
- Ask questions based on the document content.
- Uses LangChain and OpenAI API for question answering.
- FastAPI backend with a React.js frontend.
- Stylish UI with dark mode and modern design.
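FAISS does the nearest-neighbour lookup behind the "store and retrieve" feature. As a rough mental model, the brute-force search that an exact flat index performs can be sketched in plain Python. This is not the project's vector_store.py. The toy vectors are made up, and `cosine` and `top_k` are hypothetical helper names.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], vectors: list[Sequence[float]], k: int) -> list[int]:
    """Return the indices of the k vectors most similar to the query, best first."""
    scores = [(cosine(query, v), i) for i, v in enumerate(vectors)]
    scores.sort(key=lambda s: s[0], reverse=True)
    return [i for _, i in scores[:k]]


# Hypothetical toy "embeddings" for three text chunks.
chunk_vectors = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
print(top_k([0.9, 0.1], chunk_vectors, k=2))  # [0, 1]
```

Here `k` plays the role of the FAISS_K setting. FAISS's IndexFlatL2 ranks by Euclidean distance rather than cosine similarity, but for unit-length vectors the two orderings coincide, since the squared distance equals 2 minus twice the cosine.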
- FastAPI (for API endpoints)
- FAISS (for vector storage and similarity search)
- LangChain (for text chunking and embedding generation)
- OpenAI GPT-4 (for answering questions)
- pdfplumber & pytesseract (for text extraction)
- Uvicorn (for running the FastAPI server)
- React.js (for UI)
- Axios (for API calls)
- Styled Components / CSS (for UI styling)
- React Loader Spinner (for better user experience)
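LangChain handles text chunking in this project. The basic idea behind a character-based splitter with overlap can be sketched without LangChain. `chunk_text`, `chunk_size` and `overlap` are hypothetical names and values. They are not LangChain's API and not the project's actual settings.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text so the tail is not repeated.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

With overlap, a sentence that straddles a chunk boundary can still be retrieved from either chunk. LangChain's splitters, such as RecursiveCharacterTextSplitter, also try natural separators (blank lines, newlines, spaces) before falling back to raw character positions.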
AI-PDF-CHATBOT/
│── backend/
│ ├── main.py # FastAPI backend logic
│ ├── pdf_processing.py # PDF text extraction logic
│ ├── vector_store.py # FAISS vector storage and retrieval
│ ├── settings.py # Centralized configuration (env-based)
│ ├── uploads/ # Directory to store uploaded PDFs
│ ├── cache/ # Directory to store cached embeddings
│ ├── .env # API keys and environment variables (not committed)
│ ├── .env.example # Example backend environment variables
│ ├── venv/ # Virtual environment (ignored in Git)
│
│── frontend/
│ ├── src/
│ │ ├── components/
│ │ │ ├── FileUpload.js # Component to upload PDF
│ │ │ ├── Chat.js # Component for asking questions
│ │ │ ├── ErrorMessage.js # Shared error display component
│ │ ├── App.js # Main React app logic
│ │ ├── styles/App.css # Styling for UI
│ ├── public/
│ ├── package.json # Frontend dependencies
│ ├── .env.example # Example frontend environment variables
│
│── README.md # Project documentation
│── requirements.txt # Python dependencies
│── package.json # React dependencies
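The cache/ directory holds cached embeddings, so re-uploading the same PDF does not have to re-embed it. One common way to key such a cache is a SHA-256 hash of the file's bytes. This is only an assumption, not necessarily what vector_store.py does, and `cache_path_for` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def cache_path_for(pdf_bytes: bytes, cache_dir: str = "cache") -> Path:
    """Derive a deterministic cache location from the PDF's content (hypothetical scheme)."""
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    # Identical files map to the same location regardless of their upload filename.
    return Path(cache_dir) / digest


first = cache_path_for(b"%PDF-1.7 example")
again = cache_path_for(b"%PDF-1.7 example")
print(first == again)  # True
```

Keying on content rather than filename means a renamed copy of the same PDF still hits the cache, while an edited file with the same name does not.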
- Clone the repository:
git clone https://github.com/yourusername/AI-PDF-CHATBOT.git
cd AI-PDF-CHATBOT/backend
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # macOS/Linux
venv\Scripts\activate  # Windows
- Install dependencies:
pip install -r requirements.txt
- Set up your .env file (based on .env.example):
cp .env.example .env  # then edit .env to include your real values
- Run the FastAPI server (development):
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
- Visit http://127.0.0.1:8000/docs to test the API, or http://127.0.0.1:8000/health for a simple health check.
For a production-style run (no auto-reload), you can omit the --reload flag:
uvicorn main:app --host 0.0.0.0 --port 8000
- Navigate to the frontend folder:
cd ../frontend
- Install dependencies:
npm install
- (Optional) Create a .env file from the example to point the frontend at a non-local backend:
cp .env.example .env  # REACT_APP_API_URL defaults to http://localhost:8000 if not set
- Start the React app:
npm start
- Open http://localhost:3000 in your browser.
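With both servers running, the /health endpoint gives you a quick way to confirm from a script that the backend is reachable. The sketch below uses only the standard library. It assumes the endpoint answers with HTTP 200 when healthy and does not inspect the response body, whose format is not documented here. `backend_is_healthy` is a hypothetical helper.

```python
import urllib.request


def backend_is_healthy(base_url: str = "http://127.0.0.1:8000", timeout: float = 3.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200 (assumed healthy status)."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx statuses) and timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    print("backend healthy:", backend_is_healthy())
```

If this prints False while uvicorn is running, check that the host and port match the uvicorn command above.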
Backend (backend/.env, based on backend/.env.example):
- OPENAI_API_KEY – your OpenAI API key (required).
- OPENAI_MODEL – OpenAI model name to use (default: gpt-4-turbo).
- ALLOWED_ORIGINS – comma-separated list of allowed frontend origins for CORS (default includes http://localhost:3000).
- UPLOAD_DIR – directory for uploaded PDFs (default: uploads).
- CACHE_DIR – directory for cached embeddings (default: cache).
- MAX_PDF_SIZE_MB – maximum allowed PDF size in megabytes (default: 20).
- MAX_PDF_PAGES – maximum number of pages to process (default: 200).
- FAISS_K – number of chunks to retrieve from FAISS (default: 5).
- CONTEXT_MAX_CHARS – maximum number of characters of context sent to the language model (default: 8000).
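settings.py centralizes these values. The sketch below shows how such a loader might read them with only the standard library, using the defaults listed above. A few things are assumptions. `Settings` and `load_settings` are hypothetical names. The full default for ALLOWED_ORIGINS is not documented (only that it includes http://localhost:3000), so the sketch uses that single origin. The real module may also populate the environment from .env first, for example with python-dotenv's `load_dotenv()`.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    openai_model: str
    allowed_origins: tuple[str, ...]
    upload_dir: str
    cache_dir: str
    max_pdf_size_mb: int
    max_pdf_pages: int
    faiss_k: int
    context_max_chars: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, applying the documented defaults."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required")
    # Assumed default: only the local React dev server.
    origins = env.get("ALLOWED_ORIGINS", "http://localhost:3000")
    return Settings(
        openai_api_key=api_key,
        openai_model=env.get("OPENAI_MODEL", "gpt-4-turbo"),
        # Comma-separated list; surrounding whitespace and empty entries are dropped.
        allowed_origins=tuple(o.strip() for o in origins.split(",") if o.strip()),
        upload_dir=env.get("UPLOAD_DIR", "uploads"),
        cache_dir=env.get("CACHE_DIR", "cache"),
        max_pdf_size_mb=int(env.get("MAX_PDF_SIZE_MB", "20")),
        max_pdf_pages=int(env.get("MAX_PDF_PAGES", "200")),
        faiss_k=int(env.get("FAISS_K", "5")),
        context_max_chars=int(env.get("CONTEXT_MAX_CHARS", "8000")),
    )
```

Passing a plain dict, such as `load_settings({"OPENAI_API_KEY": "sk-test"})`, makes the loader easy to exercise without touching the real environment.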
Frontend (frontend/.env, based on frontend/.env.example):
- REACT_APP_API_URL – base URL of the backend API (default: http://localhost:8000).
- Upload a PDF file using the Upload PDF button.
- Once uploaded, ask a question related to the PDF content.
- The chatbot will retrieve the most relevant context and generate an AI-powered response.
- You can clear chat history and re-upload different PDFs.
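Before the retrieved context is sent to the model, it has to fit within CONTEXT_MAX_CHARS. The sketch below shows one plausible approach. It assumes the chunks arrive ranked best-first from the FAISS search and keeps whole chunks in order, truncating only the one that would overflow the budget. `build_context` and its separator are hypothetical.

```python
def build_context(ranked_chunks: list[str], max_chars: int = 8000, sep: str = "\n\n") -> str:
    """Join best-first chunks so the result is at most max_chars characters long."""
    parts: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        joiner = len(sep) if parts else 0
        if used + joiner + len(chunk) > max_chars:
            # Fill whatever budget is left with the start of this chunk, then stop.
            remaining = max_chars - used - joiner
            if remaining > 0:
                parts.append(chunk[:remaining])
            break
        parts.append(chunk)
        used += joiner + len(chunk)
    return sep.join(parts)


print(repr(build_context(["aaaa", "bbbb", "cccc"], max_chars=10, sep="\n")))  # 'aaaa\nbbbb'
```

Truncating by characters is a simple stand-in for a token budget. Because the most relevant chunks come first, anything dropped is the least relevant material.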
- If FastAPI does not start, ensure your virtual environment is activated.
- If you get an OpenAI API key error, make sure OPENAI_API_KEY is set correctly in backend/.env.
- If dependencies are missing, reinstall them with the virtual environment activated:
pip install -r requirements.txt
- If the React app does not start, make sure all dependencies are installed (run this in frontend/):
npm install
- If the backend is not reachable, ensure FastAPI is running at http://127.0.0.1:8000.
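- If CORS issues occur, make sure your frontend's origin (for example http://localhost:3000) is listed in ALLOWED_ORIGINS in backend/.env. CORS itself is configured in main.py.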
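CORS failures are often an exact-match problem. The browser sends an Origin header made of scheme, host and port, and FastAPI's CORSMiddleware (from Starlette) compares it against `allow_origins` by exact string equality unless a regex or "*" is configured. Whether main.py wires ALLOWED_ORIGINS into that middleware is an assumption here. The hypothetical helpers below mimic the comparison so you can check a frontend URL against your setting.

```python
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Reduce a URL to the scheme://host[:port] form a browser sends in the Origin header."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def is_allowed(frontend_url: str, allowed_origins_env: str) -> bool:
    """Exact-match check of the frontend's origin against a comma-separated ALLOWED_ORIGINS value."""
    allowed = {o.strip() for o in allowed_origins_env.split(",") if o.strip()}
    return origin_of(frontend_url) in allowed


print(is_allowed("http://localhost:3000/chat", "http://localhost:3000"))  # True
print(is_allowed("http://127.0.0.1:3000/", "http://localhost:3000"))  # False
```

As the second call shows, http://127.0.0.1:3000 and http://localhost:3000 are different origins. An ALLOWED_ORIGINS entry with a trailing slash, such as http://localhost:3000/, would also fail to match.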
- Multi-PDF Support: Handle multiple document uploads and queries.
- Better Search Optimization: Improve FAISS vector search accuracy.
- UI Enhancements: Dark mode toggle, better animations.
- Deploy to Cloud: Host on AWS, Vercel, or Heroku.
Developed by Rohan Jain. Contributions are welcome! Feel free to fork and improve the project.