Skip to content

c4geeks/cfg-rag-demo

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 

Repository files navigation

cfg-rag-demo

Self-hosted RAG pipeline with Ollama and pgvector. No API keys, no telemetry, no SaaS dependencies. Companion code for the Self-Hosted RAG Pipeline with Ollama and pgvector guide on computingforgeeks.com.

What it does

Ingests a folder of Markdown or text files into PostgreSQL + pgvector, indexes them with HNSW (cosine) for vector retrieval and GIN (tsvector) for full-text retrieval, then answers questions using a hybrid Reciprocal Rank Fusion query and a local Ollama chat model. Source citations included.

The whole thing is one Python file (~290 lines).

Stack

Component Purpose Why
PostgreSQL 17 + pgvector 0.8 Vector + full-text store ACID, backups, joins, one ops story
Ollama 0.23+ Embedding + chat models One service, open weights, no API
nomic-embed-text (768 dim) Embedding model Fast, accurate, small (274 MB)
llama3.1:8b Chat model Fits 6 GB VRAM at Q4_K_M
Python 3.10+ Glue psycopg, pgvector, ollama

Quick start

Prerequisites: PostgreSQL 17 with pgvector, Ollama running. See the guide for install instructions on Rocky Linux and Ubuntu.

# Database role + DB
sudo -u postgres psql <<'SQL'
CREATE ROLE rag LOGIN;
CREATE DATABASE rag OWNER rag;
\c rag
CREATE EXTENSION vector;
SQL

# Models
ollama pull nomic-embed-text
ollama pull llama3.1:8b

# Python
python3 -m venv .venv
. .venv/bin/activate
pip install "psycopg[binary]>=3.2" "pgvector>=0.3.6" "ollama>=0.4"

# Initialise schema and run
export RAG_DSN="postgresql://rag@/rag?host=/var/run/postgresql"
python rag.py init
python rag.py ingest path/to/docs/
python rag.py query "What does the documentation say about X?"
python rag.py stats

Commands

Command What it does
python rag.py init Create schema (documents, chunks) and HNSW + GIN indexes
python rag.py ingest <dir> Recursively ingest *.md and *.txt from <dir> (idempotent via sha256)
python rag.py query "..." Hybrid retrieve top-5, generate answer with citations
python rag.py stats Document count, chunk count, table and index sizes

Configuration

All via environment variables:

Variable Default Purpose
RAG_DSN postgresql:///rag PostgreSQL connection string
RAG_EMBED nomic-embed-text Embedding model name
RAG_EMBED_DIM 768 Embedding dimension (must match the model)
RAG_CHAT llama3.1:8b Chat model name
RAG_CHUNK_TOKENS 800 Target tokens per chunk
RAG_OVERLAP 100 Overlap tokens between adjacent chunks
RAG_TOP_K 5 How many chunks to retrieve per query
OLLAMA_HOST http://127.0.0.1:11434 Ollama API URL

Why hybrid retrieval

Pure cosine retrieval fails on technical documentation. Ask the demo corpus "How do I install MetalLB on Kubernetes?" using only embedding <=> query and the top 5 are all unrelated install guides — the embedding model matches the "install on platform" pattern more strongly than the rare token "metallb".

The RRF fusion of cosine top-k + tsvector top-k surfaces the right document in 50 ms. See the full explanation in the article.

License

MIT. See LICENSE.

See also

About

Companion repo for the Self-Hosted RAG with Ollama + pgvector guide on computingforgeeks.com. Hybrid retrieval (HNSW + tsvector RRF), no API keys.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages