This project demonstrates a fully local Retrieval-Augmented Generation (RAG) pipeline implemented in Java 21 with Gradle.
Apart from the final LLM call, it needs no cloud services: all processing, including embedding generation, happens locally.
The system performs:
- Exporting documents (example: Confluence REST API)
- Cleaning HTML → plain text
- Chunking documents
- Generating embeddings locally (via a Python SentenceTransformers server)
- Performing similarity search
- Building an LLM prompt + querying an LLM endpoint
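The similarity-search and prompt-building stages in the list above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the project's own code. The nested `Chunk` record and the `cosine`, `topK` and `buildPrompt` helpers are hypothetical. The search is a brute-force cosine-similarity scan over all chunks, and the prompt wording is only an example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical retrieval helpers for illustration; not the project's actual classes.
class Retriever {

    // A text chunk together with its embedding vector.
    record Chunk(String text, double[] embedding) {}

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vector dimensions differ");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0; // treat zero vectors as unrelated
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Brute-force search: score every chunk and keep the k most similar ones.
    static List<Chunk> topK(double[] query, List<Chunk> chunks, int k) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator.comparingDouble((Chunk c) -> cosine(query, c.embedding())).reversed());
        return sorted.subList(0, Math.min(k, sorted.size()));
    }

    // Assemble the LLM prompt: retrieved context first, then the user's question.
    static String buildPrompt(String question, List<Chunk> context) {
        StringBuilder sb = new StringBuilder("Answer the question using only the context below.\n\nContext:\n");
        for (Chunk c : context) {
            sb.append("- ").append(c.text()).append('\n');
        }
        sb.append("\nQuestion: ").append(question).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        // Toy 3-dimensional vectors; all-MiniLM-L6-v2 produces 384-dimensional embeddings.
        List<Chunk> chunks = List.of(
                new Chunk("Deployments run every Friday.", new double[] {1, 0, 0}),
                new Chunk("The VPN requires a hardware token.", new double[] {0, 1, 0}),
                new Chunk("Release notes are published after deployment.", new double[] {0.8, 0.2, 0}));
        double[] query = {0.9, 0.1, 0};
        List<Chunk> best = topK(query, chunks, 2);
        System.out.println(buildPrompt("When do deployments happen?", best));
    }
}
```

A linear scan like this is usually fast enough for a few thousand chunks; much larger collections typically call for an approximate nearest-neighbor index instead.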
You can find my blog post about this example right here:
Requirements:
- Java 21
- Gradle
- Python, with the packages for the embedding server:
pip install flask sentence-transformers
Optional, if PyTorch is not already installed:
pip install torch
You also need a local embedding model:
📥 Download model (all-MiniLM-L6-v2):
https://www.kaggle.com/datasets/sircausticmail/all-minilm-l6-v2zip
Extract it into a local folder, for example:
D:/embedding_models/all-MiniLM-L6-v2/
Then update the path in embedding_service.py:
MODEL_PATH = r"D:\embedding_models\all-MiniLM-L6-v2"
Then start the embedding server from inside your Python environment:
python embedding_service.py
You should see:
Loading model from: D:\embedding_models\all-MiniLM-L6-v2
Embedding server running on http://localhost:5005 ...
Your Java code will now POST requests to:
http://localhost:5005/embed
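A Java client for this endpoint can be built with the JDK's `java.net.http.HttpClient`. The sketch below is hypothetical. The `EmbeddingClient` class is illustrative, and the request shape `{"texts": [...]}` is an assumption, so check which JSON fields `embedding_service.py` actually reads and returns and adjust accordingly. Parsing the response is left to a JSON library.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical client for the local embedding server.
// ASSUMPTION: the server accepts {"texts": [...]}; match this to embedding_service.py.
class EmbeddingClient {
    static final URI ENDPOINT = URI.create("http://localhost:5005/embed");

    // Escape a Java string as a JSON string literal.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    // Build the JSON request body for a batch of texts.
    static String requestBody(List<String> texts) {
        return texts.stream()
                .map(EmbeddingClient::jsonString)
                .collect(Collectors.joining(",", "{\"texts\":[", "]}"));
    }

    static HttpRequest buildRequest(List<String> texts) {
        return HttpRequest.newBuilder(ENDPOINT)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody(texts)))
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                buildRequest(List.of("How do I reset my VPN token?")),
                HttpResponse.BodyHandlers.ofString());
        // The raw JSON response holds the vectors; parse it with a JSON library of your choice.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Sending several texts per request, as `requestBody` allows, keeps the number of HTTP round trips low when embedding many chunks.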
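Run the Java components in this order.
Export the documents from Confluence:
new ConfluenceExporter().exportSpace("<YOUR_CONFLUENCE_TOKEN>");
Clean the exported HTML and split it into chunks:
new HtmlCleanerAndChunker(600, 100).process();
Embed all chunks (the embedding server must be running):
new ChunkEmbedder().embedAllChunks();
Finally, run the application:
./gradlew run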
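The cleaning and chunking step can be approximated as below. This sketch assumes, without confirmation from the project, that the two arguments of `HtmlCleanerAndChunker(600, 100)` are a chunk size and an overlap measured in characters. `ChunkingSketch` and its methods are hypothetical. Tags are stripped with regular expressions for brevity, which is fragile compared with a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for HtmlCleanerAndChunker(600, 100).
// ASSUMPTIONS: 600 = max chunk size in characters, 100 = overlap in characters.
class ChunkingSketch {

    // Strip tags and collapse whitespace; only a few common entities are decoded.
    static String htmlToText(String html) {
        String text = html
                .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ") // drop script/style bodies
                .replaceAll("<[^>]+>", " ")                            // drop remaining tags
                .replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&"); // decoded last so "&amp;lt;" is not double-decoded
        return text.replaceAll("\\s+", " ").trim();
    }

    // Fixed-size sliding window: each chunk starts (size - overlap) characters after the previous one.
    static List<String> chunk(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("Require size > 0 and 0 <= overlap < size");
        }
        List<String> chunks = new ArrayList<>();
        int step = size - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + size, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // this window reached the end; a further one would be fully contained in it
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        String html = "<h1>Release process</h1><p>Deployments run every Friday &amp; are announced in chat.</p>";
        String text = htmlToText(html);
        System.out.println(text);
        for (String c : chunk(text, 600, 100)) {
            System.out.println(c.length() + ": " + c);
        }
    }
}
```

With an overlap, text near a chunk boundary appears in two adjacent chunks, so any sentence no longer than the overlap that is cut at one boundary is still complete in the next chunk.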
The LLM request goes to Azure OpenAI. Set these environment variables before running ./gradlew run:
export AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
export AZURE_OPENAI_DEPLOYMENT=gpt-4o
export AZURE_OPENAI_API_KEY=your-secret-key
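With these variables set, the final LLM call can be made against Azure OpenAI's chat completions REST endpoint, `{endpoint}/openai/deployments/{deployment}/chat/completions`, authenticated with an `api-key` header. The sketch below is illustrative rather than the project's implementation. `AzureChatSketch` is a hypothetical name, and the `api-version` value `2024-02-01` is an assumption, so use a version your resource supports.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical sketch of the final LLM call to Azure OpenAI chat completions.
// ASSUMPTION: api-version "2024-02-01"; pick one your resource supports.
class AzureChatSketch {
    static final String API_VERSION = "2024-02-01";

    // Build {endpoint}/openai/deployments/{deployment}/chat/completions?api-version=...
    static URI chatUri(String endpoint, String deployment) {
        String base = endpoint.endsWith("/") ? endpoint : endpoint + "/";
        return URI.create(base + "openai/deployments/" + deployment
                + "/chat/completions?api-version=" + API_VERSION);
    }

    // Minimal JSON string escaping for the prompt text.
    static String json(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c == '\n') sb.append("\\n");
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }

    static HttpRequest buildRequest(String endpoint, String deployment, String apiKey, String prompt) {
        String body = "{\"messages\":[{\"role\":\"user\",\"content\":" + json(prompt) + "}]}";
        return HttpRequest.newBuilder(chatUri(endpoint, deployment))
                .header("Content-Type", "application/json")
                .header("api-key", apiKey) // Azure OpenAI key-based authentication
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String endpoint = System.getenv("AZURE_OPENAI_ENDPOINT");
        String deployment = System.getenv("AZURE_OPENAI_DEPLOYMENT");
        String apiKey = System.getenv("AZURE_OPENAI_API_KEY");
        if (endpoint == null || deployment == null || apiKey == null) {
            System.err.println("Set AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_DEPLOYMENT and AZURE_OPENAI_API_KEY first.");
            return;
        }
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                buildRequest(endpoint, deployment, apiKey, "Say hello."),
                HttpResponse.BodyHandlers.ofString());
        // The generated answer is in choices[0].message.content of the JSON response.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

In the full pipeline, the prompt passed to `buildRequest` would be the one assembled from the retrieved chunks rather than a fixed string.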
This project gives you a full, local RAG pipeline:
✔ Extract internal documents
✔ Clean them
✔ Chunk them
✔ Embed locally
✔ Perform similarity search
✔ Send a prompt to your preferred LLM
Everything runs fully offline except the final LLM request.