# Elasticsearch RAG Chat Assistant

A complete Retrieval-Augmented Generation (RAG) chat assistant built with Elasticsearch as the vector database. This project demonstrates how to build a production-ready RAG system using Elasticsearch 8.x for semantic search.
## Features

- **Vector Search**: Semantic document retrieval using dense vector embeddings
- **Hybrid Search**: Combines vector search (kNN) with BM25 keyword search
- **LLM Integration**: OpenAI API support with a mock fallback for demos
- **Scalable**: Built on Elasticsearch for enterprise-scale deployments
- **Easy Setup**: Simple configuration and sample data included

## Prerequisites

- Python 3.8+
- Elasticsearch 8.x (running on localhost:9200 or a cloud instance)
- (Optional) OpenAI API key for LLM generation

## Quick Start
### 1. Install Dependencies

```bash
pip install -r requirements.txt
```
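The authoritative pin list is in `requirements.txt`; as a rough guide, the stack described in this README implies packages along these lines (names and version bounds are illustrative, not copied from the file):

```
elasticsearch>=8.0
sentence-transformers
openai
python-dotenv
```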
### 2. Configure Environment

Copy `.env.example` to `.env` and update it if needed:

```bash
cp .env.example .env
```

Edit `.env`:

- Set `ELASTICSEARCH_HOST` and `ELASTICSEARCH_PORT` if not using localhost
- Add `OPENAI_API_KEY` (optional; a mock LLM is used if it is not provided)
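A minimal `.env` for a local setup might look like this (the variable names come from the steps above; the values are illustrative):

```
ELASTICSEARCH_HOST=localhost
ELASTICSEARCH_PORT=9200
OPENAI_API_KEY=
```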
### 3. Set Up the Elasticsearch Index

```bash
python elasticsearch_setup.py
```
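The actual mapping lives in `elasticsearch_setup.py`, but an index for this kind of RAG setup typically pairs text fields with a `dense_vector` field. A hedged sketch of what the setup script creates — the index name, field names, and the 384-dimension size (the output size of `all-MiniLM-L6-v2`) are assumptions, not taken from the script:

```python
# Sketch of an Elasticsearch 8.x index mapping for RAG.
# Index/field names and the embedding dimension are illustrative.
INDEX_NAME = "rag-documents"

MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "text": {"type": "text"},          # searched by BM25 in hybrid mode
            "metadata": {"type": "object"},
            "embedding": {
                "type": "dense_vector",
                "dims": 384,                   # all-MiniLM-L6-v2 output size
                "index": True,                 # enable kNN search
                "similarity": "cosine",        # cosine similarity for retrieval
            },
        }
    }
}

# With a connected client, creation would look like:
# es.indices.create(index=INDEX_NAME, mappings=MAPPING["mappings"])
```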
### 4. Run the Chat Assistant

```bash
python chat_assistant.py
```

The script will:

1. Create the Elasticsearch index
2. Index sample documents
3. Start an interactive chat session

## Project Structure

```
elasticsearch-rag-chat/
├── config.py                # Configuration settings
├── elasticsearch_setup.py   # ES index creation
├── embeddings.py            # Sentence Transformers integration
├── document_indexer.py      # Document indexing
├── rag_retriever.py         # Vector/hybrid search retrieval
├── llm_generator.py         # LLM response generation
├── chat_assistant.py        # Main chat interface
├── requirements.txt         # Python dependencies
├── .env.example             # Environment template
└── README.md                # This file
```

## 🔧 Usage Examples

### Basic Vector Search

```python
from elasticsearch_setup import get_elasticsearch_client
from rag_retriever import RAGRetriever

es = get_elasticsearch_client()
retriever = RAGRetriever(es)

results = retriever.retrieve("What is RAG?", top_k=3)
for doc in results:
    print(f"{doc['title']}: {doc['text']}")
```

### Hybrid Search (BM25 + Vector)

```python
results = retriever.hybrid_search("What is RAG?", top_k=3)
```

### Index Custom Documents

```python
from document_indexer import DocumentIndexer

indexer = DocumentIndexer(es)
documents = [
    {
        "title": "My Document",
        "text": "Document content here...",
        "metadata": {"source": "custom"}
    }
]
indexer.index_documents_batch(documents)
```

## Architecture

```
User Query
    ↓
[Embedding Generation]
    ↓
[Vector Search in Elasticsearch]
    ↓
[Retrieve Top-K Documents]
    ↓
[Build Context]
    ↓
[LLM Generation]
    ↓
Response
```

## How It Works

1. **Document Indexing**: Documents are converted to embeddings using Sentence Transformers and stored in Elasticsearch with vector fields
2. **Query Processing**: User queries are embedded and used for vector similarity search
3. **Retrieval**: The top-k most relevant documents are retrieved based on cosine similarity
4. **Generation**: The retrieved context is passed to an LLM to generate accurate, context-aware responses

## Customization

### Change Embedding Model

Edit `config.py`:
```python
EMBEDDING_MODEL = "sentence-transformers/all-mpnet-base-v2"  # More accurate, slower
```

### Adjust Retrieval Parameters

```python
TOP_K_RESULTS = 5  # Retrieve more documents
```

### Enable Hybrid Search

```python
assistant = ChatAssistant(use_hybrid_search=True)
```

## Performance Tips

- Use batch indexing for large document sets
- Increase `num_candidates` in the kNN search for better recall
- Tune `vector_weight` and `bm25_weight` in hybrid search
- Use a smaller embedding model (e.g. `all-MiniLM-L6-v2`) for faster indexing

## Troubleshooting

**Connection Error**: Ensure Elasticsearch is running on the configured host/port:

```bash
curl http://localhost:9200
```

**Index Already Exists**: The setup script deletes and recreates the index.

**Out of Memory**: Reduce the batch size or use a smaller embedding model.
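The `vector_weight` and `bm25_weight` tuning mentioned under Performance Tips amounts to fusing two ranked result lists. A minimal sketch of weighted score fusion, assuming min-max normalization — the function name, parameters, and defaults are illustrative, not the actual implementation in `rag_retriever.py`:

```python
def fuse_scores(vector_hits, bm25_hits, vector_weight=0.7, bm25_weight=0.3):
    """Combine kNN and BM25 results by weighted, min-max-normalized scores.

    Each hits argument is a mapping of doc_id -> raw score.
    Returns (doc_id, fused_score) pairs, best first.
    """
    def normalize(hits):
        if not hits:
            return {}
        lo, hi = min(hits.values()), max(hits.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
        return {doc: (score - lo) / span for doc, score in hits.items()}

    v, b = normalize(vector_hits), normalize(bm25_hits)
    fused = {}
    for doc in set(v) | set(b):
        fused[doc] = vector_weight * v.get(doc, 0.0) + bm25_weight * b.get(doc, 0.0)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Normalization matters here because raw BM25 scores are unbounded while cosine similarities fall in a fixed range; without it, one signal silently dominates regardless of the weights.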
## Built With
- database
- docker
- dotnet
- elastichosts-cloud-computing
- elasticsearch
- fastapi
- hybrid
- knn
- llm
- python
- rag
- vector