Inspiration
Navigating large websites to find specific information is difficult, which inspired us to build a chatbot that answers those questions directly.
What it does
The user enters a query in the chat interface; the bot searches its knowledge base for the most relevant content and displays a generated answer.
How we built it
Data Collection and Preprocessing
We began by collecting relevant information from the University of Texas at Arlington (UTA) website. This involved:
• Web scraping with BeautifulSoup to extract structured text data from various web pages.
• Removing unnecessary HTML tags, scripts, and duplicate content to ensure relevance and accuracy.
• Cleaning and formatting the text to prepare it for embedding with the HuggingFace and LangChain libraries.
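The scraping and cleaning step can be sketched roughly as follows (a minimal illustration, not our exact script; the URL handling and tag list are assumptions):

```python
import urllib.request

from bs4 import BeautifulSoup


def clean_html(html: str) -> str:
    """Strip markup and non-content tags, returning normalized visible text."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop script/style blocks so only readable content remains.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    # Collapse runs of whitespace left behind by the removed markup.
    return " ".join(soup.get_text(separator=" ").split())


def scrape_page(url: str) -> str:
    """Fetch a page and return its cleaned visible text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return clean_html(resp.read().decode("utf-8", errors="ignore"))
```

Deduplication and site-specific filtering would then run over the cleaned strings before embedding.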
Document Ingestion and Embedding
Once the raw text was cleaned:
• We used LangChain and Hugging Face Transformers to load the textual data and split it into manageable chunks.
• These chunks were transformed into high-dimensional embeddings using a pre-trained sentence-transformer model.
• The resulting vectors were stored in a FAISS vector store, enabling fast and accurate similarity search.
LLM Integration and Prompt Engineering
We used Gemini, a state-of-the-art large language model (LLM), to generate responses based on the retrieved knowledge. Our pipeline follows a Retrieval-Augmented Generation (RAG) approach:
• User query → embedding → similarity search in FAISS.
• The retrieved document chunks were combined into a structured prompt, which was passed to the Gemini model.
• The prompt format ensured context retention, enabling Gemini to generate domain-specific, coherent responses.
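The chunking step above can be illustrated with a simplified splitter (in practice a LangChain text splitter handles this; the character-based windowing and default sizes here are assumptions for illustration):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size character chunks.

    Overlap between consecutive chunks helps preserve context that would
    otherwise be cut at a chunk boundary before embedding.
    """
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each chunk is then encoded with the sentence-transformer model and the vectors are added to the FAISS index for nearest-neighbor lookup at query time.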
Frontend Development
The chatbot’s interface was built with React:
• A simple, intuitive chat UI built on state-management hooks.
• User messages were captured, displayed, and passed to the backend.
• The LLM’s response was then rendered in the UI in real time, with scroll handling via useRef.
Backend Integration
The backend handled:
• Receiving user input and converting it to an embedding.
• Querying FAISS for the top-k most relevant document chunks.
• Constructing a prompt with the retrieved context.
• Sending the prompt to Gemini and returning the model’s response to the frontend.
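The prompt-construction step can be sketched as a small helper (the exact wording and numbered-context format are assumptions, not our production prompt):

```python
def build_prompt(query: str, chunks: list[str]) -> str:
    """Assemble retrieved chunks and the user query into a RAG-style prompt."""
    # Number each chunk so the model can ground its answer in the context.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

The backend sends this string to Gemini and relays the completion back to the React frontend.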
Built With
- beautiful-soup
- faiss
- gemini
- huggingface
- langchain
- react