An AI-powered academic advisor for Georgia State University students, providing information about courses, programs, and requirements based on the 2020-21 Academic catalog.
- Interactive Chat Interface: Ask questions about GSU courses, requirements, and programs
- Document-Based Responses: Answers are grounded in official GSU catalog information
- Source Citation: View the source documents used to generate each response
- Real-time Processing: Instant answers using Pinecone vector database
- Customizable System Prompts: Adjust the AI's response style and focus
- Frontend: Streamlit web application
- Backend: LangChain for RAG (Retrieval-Augmented Generation)
- Vector Database: Pinecone for semantic search
- Embeddings: OpenAI text-embedding-3-large model
- LLM: OpenAI GPT-3.5-turbo
- Document Processing: PDF text extraction with pdfplumber
Create a .env file in the root directory with your API keys:
OPENAI_API_KEY=sk-your-openai-api-key-here
PINECONE_API_KEY=your-pinecone-api-key-here
PINECONE_ENVIRONMENT=us-east-1-aws
PINECONE_INDEX_NAME=gsu-aipip install -r requirements.txtRequired packages include:
- streamlit
- langchain
- langchain-openai
- langchain-pinecone
- langchain-community
- pinecone-client
- pdfplumber
- python-dotenv
The system will automatically check and use existing Pinecone vectors. If you need to set up from scratch:
python initialize_knowledge_base.pystreamlit run app/streamlit_app.pyThe application will be available at http://localhost:8501
GSU-AI-Advisor/
├── app/
│ ├── streamlit_app.py # Main Streamlit application
│ ├── pinecone_setup.py # Pinecone database management
│ ├── retriever.py # Document retrieval setup
│ ├── pdf_processor.py # PDF processing and embedding
│ └── batch_processor.py # Batch document processing
├── initialize_knowledge_base.py # Setup script
├── requirements.txt # Python dependencies
├── .env # Environment variables (create this)
└── README.md # This file
- Document Processing: PDFs are downloaded, text extracted, and chunked
- Embedding Generation: Text chunks are converted to vectors using OpenAI
- Vector Storage: Embeddings are stored in Pinecone with metadata
- Query Processing: User questions are embedded and matched against stored vectors
- Response Generation: Retrieved context is used by GPT-3.5 to generate answers
The default system prompt can be modified in the Streamlit interface:
You are an AI Academic Advisor assistant. Use the provided context to answer questions accurately and helpfully. If the answer cannot be found in the context, say so clearly. Provide detailed, well-structured responses based on the available information. All answers must be relevant to Georgia State University. Provide course codes, their prerequisites and co-requisites, and all other necessary information along with the answer for the user to be aware of.
- Embedding Model: text-embedding-3-large (3072 dimensions)
- Chunk Size: 1000 characters with 200 character overlap
- Index Name: gsu-ai (configurable via environment)
-
Connection Errors
- Verify API keys in
.envfile - Check Pinecone index exists and is accessible
- Ensure OpenAI API key has sufficient credits
- Verify API keys in
-
No Knowledge Base Content
- Run
initialize_knowledge_base.pyto set up initial documents - Check Pinecone index has vectors loaded
- Run
-
Slow Responses
- Pinecone queries may have latency
- Check OpenAI API rate limits
Use the PDF processor to add new documents:
from app.pdf_processor import PDFProcessor
processor = PDFProcessor()
success, chunk_count = processor.process_pdf_url(
url="https://example.com/document.pdf",
title="Document Title"
)Modify retrieval parameters in app/retriever.py:
retriever = vector_store.as_retriever(
search_kwargs={"k": 5} # Number of documents to retrieve
)The system is trained on:
- GSU 2020-21 Undergraduate Catalog
- Additional institutional documents (as configured)
All information is accurate as of the 2020-21 academic year. Users should verify current requirements with official GSU sources.
This project is intended for educational purposes. GSU catalog content remains the property of Georgia State University.