Local RAG Example (Java 21 + Gradle)

This project demonstrates a fully local Retrieval-Augmented Generation (RAG) pipeline implemented in Java 21 with Gradle.
Apart from the final LLM request, it needs no cloud services: all processing, including embedding generation, happens locally.

The system performs:

Exporting documents (example: Confluence REST API)
Cleaning HTML → plain text
Chunking documents
Generating embeddings locally (via a Python SentenceTransformers server)
Performing similarity search
Building an LLM prompt + querying an LLM endpoint

📓 Blog post

You can find my blog post about this example right here:

https://tuhrig.de/local-rag

🚀 Getting Started

1. Install Requirements

Java & Gradle

Java 21
Gradle

Python (for embedding server)

pip install flask sentence-transformers

Optional, if PyTorch is not already installed:

pip install torch

You also need a local embedding model:

📥 Download model (all-MiniLM-L6-v2):

https://www.kaggle.com/datasets/sircausticmail/all-minilm-l6-v2zip

Extract it to a local directory, for example:

D:/embedding_models/all-MiniLM-L6-v2/

Then update the path in embedding_service.py:

MODEL_PATH = r"D:\embedding_models\all-MiniLM-L6-v2"

🧠 Start the Embedding Server (Python)

Inside your Python environment:

python embedding_service.py

You should see:

Loading model from: D:\embedding_models\all-MiniLM-L6-v2
Embedding server running on http://localhost:5005 ...

Your Java code will now POST requests to:

http://localhost:5005/embed
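On the Java side, a request to this endpoint could be built with the JDK's java.net.http client. The sketch below is an assumption, not the project's ChunkEmbedder: the JSON field name "text" and the shape of the response are guesses, so match them to what embedding_service.py actually reads and returns. The class name EmbeddingClient is hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Minimal sketch of a client for the local embedding server.
 * ASSUMPTION: the server accepts {"text": "..."} as JSON; check embedding_service.py.
 */
final class EmbeddingClient {

    static final URI EMBED_URI = URI.create("http://localhost:5005/embed");

    private static final HttpClient HTTP = HttpClient.newHttpClient();

    /** Encodes a Java string as a JSON string literal, escaping quotes and control characters. */
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    /** Builds the POST request; the "text" field name is a hypothetical choice. */
    static HttpRequest buildRequest(String text) {
        String body = "{\"text\": " + jsonString(text) + "}";
        return HttpRequest.newBuilder(EMBED_URI)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** Sends the request and returns the raw JSON body; parsing the vector is left to the caller. */
    static String embedRaw(String text) throws IOException, InterruptedException {
        HttpResponse<String> response =
                HTTP.send(buildRequest(text), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Embedding server returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

Keeping request construction separate from sending makes it easy to inspect the payload without a running server.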

📄 Step-by-Step RAG Pipeline

Run the Java components in the following order:

2. Export Confluence Pages

new ConfluenceExporter().exportSpace("<YOUR_CONFLUENCE_TOKEN>");
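ConfluenceExporter is the project's own class. As a rough illustration of the paged REST calls such an exporter might make, the sketch below builds GET requests against Confluence's /rest/api/content endpoint with expand=body.storage, which returns each page's HTML storage format. It assumes a Server/Data Center instance where the token is a personal access token sent as a Bearer header (Confluence Cloud uses basic auth with an email address and API token instead). The base URL, space key and the class name ConfluencePageRequests are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/**
 * Sketch of the paged requests a Confluence space export might issue.
 * ASSUMPTIONS: Server/Data Center instance, personal access token as Bearer token.
 */
final class ConfluencePageRequests {

    /** URI for one page of results: pages of the given space, with their storage-format HTML. */
    static URI pageUri(String baseUrl, String spaceKey, int start, int limit) {
        String query = "spaceKey=" + URLEncoder.encode(spaceKey, StandardCharsets.UTF_8)
                + "&type=page&expand=body.storage"
                + "&start=" + start + "&limit=" + limit;
        return URI.create(baseUrl + "/rest/api/content?" + query);
    }

    /** Authenticated GET request for one page of results. */
    static HttpRequest pageRequest(String baseUrl, String spaceKey, String token, int start, int limit) {
        return HttpRequest.newBuilder(pageUri(baseUrl, spaceKey, start, limit))
                .header("Authorization", "Bearer " + token)
                .header("Accept", "application/json")
                .GET()
                .build();
    }
}
```

To export a whole space, increase start by limit until a response contains fewer than limit results.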

3. Clean HTML + Chunk Documents

new HtmlCleanerAndChunker(600, 100).process();
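The two constructor arguments presumably set the chunk size (600) and the overlap between neighbouring chunks (100); whether they count characters, words or tokens is not stated here. Assuming characters, a minimal sketch of both steps could look like the code below. The regex-based tag stripping is a deliberately crude stand-in for a real HTML parser, and CleanAndChunk is a hypothetical name, not the project's HtmlCleanerAndChunker.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of HTML cleaning plus overlapping fixed-size chunking (sizes in characters, an assumption). */
final class CleanAndChunk {

    /** Crude HTML-to-text: drops script/style blocks and tags, decodes a few entities, collapses whitespace. */
    static String htmlToText(String html) {
        String text = html
                .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
                .replaceAll("(?s)<[^>]+>", " ")
                .replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&"); // decoded last so "&amp;lt;" does not become "<"
        return text.replaceAll("\\s+", " ").trim();
    }

    /** Splits text into windows of chunkSize characters; consecutive windows share `overlap` characters. */
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break; // last window reached the end of the text
        }
        return chunks;
    }
}
```

With chunk(text, 600, 100), each new chunk starts 500 characters after the previous one, so any passage no longer than the overlap appears intact in at least one chunk even when it straddles a boundary.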

4. Generate Embeddings for All Chunks

new ChunkEmbedder().embedAllChunks();
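Once every chunk has a vector, the vectors have to be kept somewhere for the search step. The sketch below shows one possible approach, assuming a simple tab-separated line format (chunk id, then comma-separated values); the real ChunkEmbedder may store them differently. The embedder is passed in as a function so the sketch stays independent of the HTTP call. EmbeddingStore and the line format are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Sketch: embed every chunk and (de)serialize vectors as "id<TAB>v1,v2,..." lines (hypothetical format). */
final class EmbeddingStore {

    /** Applies the embedder to each chunk text, preserving the input iteration order. */
    static Map<String, double[]> embedAll(Map<String, String> chunksById, Function<String, double[]> embedder) {
        Map<String, double[]> vectors = new LinkedHashMap<>();
        chunksById.forEach((id, text) -> vectors.put(id, embedder.apply(text)));
        return vectors;
    }

    /** One line per chunk; ids must not contain tabs or line breaks. */
    static String toLine(String id, double[] vector) {
        String values = Arrays.stream(vector)
                .mapToObj(Double::toString)
                .collect(Collectors.joining(","));
        return id + "\t" + values;
    }

    /** Parses a line written by toLine back into (id, vector). */
    static Map.Entry<String, double[]> fromLine(String line) {
        int tab = line.indexOf('\t');
        String id = line.substring(0, tab);
        String rest = line.substring(tab + 1);
        double[] vector = rest.isEmpty()
                ? new double[0]
                : Arrays.stream(rest.split(",")).mapToDouble(Double::parseDouble).toArray();
        return Map.entry(id, vector);
    }
}
```

Double.toString and Double.parseDouble round-trip exactly, so vectors read back are identical to the ones written.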

5. Run Semantic Search + Ask the LLM

./gradlew run
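The search step compares the question's embedding with every stored chunk vector. A common choice, assumed here, is cosine similarity with a brute-force top-k scan, which is usually fast enough for a few thousand chunks held in memory. SimilaritySearch is a hypothetical name; the project's own search code may rank chunks differently.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Sketch of brute-force cosine-similarity search over in-memory vectors. */
final class SimilaritySearch {

    /** Cosine of the angle between a and b; returns 0 if either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Ids of the k vectors most similar to the query, best match first. */
    static List<String> topK(double[] query, Map<String, double[]> vectorsById, int k) {
        return vectorsById.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> cosine(query, e.getValue())).reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

all-MiniLM-L6-v2 produces 384-dimensional vectors. If the server returns normalized vectors, the dot product alone yields the same ranking.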

🔌 Optional: LLM Endpoint (Azure OpenAI)

Set these environment variables:

export AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
export AZURE_OPENAI_DEPLOYMENT=gpt-4o
export AZURE_OPENAI_API_KEY=your-secret-key
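With these variables set, the final step sends the retrieved chunks plus the question to Azure OpenAI's chat completions endpoint, which has the form {endpoint}/openai/deployments/{deployment}/chat/completions?api-version=... and authenticates with an api-key header. The sketch below builds such a request. The prompt wording, the api-version value 2024-02-01 and the class name AzureChatRequest are assumptions; check which API versions your resource supports.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.Objects;

/** Sketch of the final RAG step: build a prompt and an Azure OpenAI chat completions request. */
final class AzureChatRequest {

    // ASSUMPTION: a GA api-version; check which versions your resource supports.
    static final String API_VERSION = "2024-02-01";

    /** Hypothetical prompt template: numbered context chunks followed by the question. */
    static String buildPrompt(String question, List<String> contextChunks) {
        StringBuilder sb = new StringBuilder("Answer the question using only the context below.\n\n");
        for (int i = 0; i < contextChunks.size(); i++) {
            sb.append('[').append(i + 1).append("] ").append(contextChunks.get(i)).append("\n\n");
        }
        return sb.append("Question: ").append(question).toString();
    }

    /** Encodes a Java string as a JSON string literal. */
    static String json(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    /** Builds the POST request; the endpoint may or may not end with a slash. */
    static HttpRequest build(String endpoint, String deployment, String apiKey, String prompt) {
        String base = endpoint.endsWith("/") ? endpoint.substring(0, endpoint.length() - 1) : endpoint;
        URI uri = URI.create(base + "/openai/deployments/" + deployment
                + "/chat/completions?api-version=" + API_VERSION);
        String body = "{\"messages\": [{\"role\": \"user\", \"content\": " + json(prompt) + "}]}";
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .header("api-key", apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** Reads the three environment variables listed above; fails fast if one is missing. */
    static HttpRequest fromEnvironment(String prompt) {
        return build(env("AZURE_OPENAI_ENDPOINT"), env("AZURE_OPENAI_DEPLOYMENT"),
                env("AZURE_OPENAI_API_KEY"), prompt);
    }

    private static String env(String name) {
        return Objects.requireNonNull(System.getenv(name), name + " is not set");
    }
}
```

Sending the request with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) returns a JSON response whose choices[0].message.content field holds the model's answer.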

🧱 Summary

This project gives you a full, local RAG pipeline:

✔ Extract internal documents
✔ Clean them
✔ Chunk them
✔ Embed locally
✔ Perform similarity search
✔ Send a prompt to your preferred LLM

Everything runs fully offline except the final LLM request.

