| description | RAG pipeline: document loading, chunking, embedding, hybrid search, and reranking |
|---|---|
| tags | |
Import: from selectools.rag import RAGAgent, DocumentLoader, VectorStore, TextSplitter
Stability: stable
Since: v0.14.0
```python
from selectools import Agent, AgentConfig, tool
from selectools.providers.stubs import LocalProvider
from selectools.rag import DocumentLoader, TextSplitter

# Load and chunk documents
docs = DocumentLoader.from_text(
    "Selectools supports OpenAI, Anthropic, Gemini, and Ollama providers. "
    "It provides RAG, tool calling, guardrails, and multi-agent orchestration.",
    metadata={"source": "overview.txt"},
)
splitter = TextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
print(f"Loaded {len(chunks)} chunks from {len(docs)} documents")

# In production, embed chunks into a VectorStore and use RAGAgent:
# store = VectorStore.create("memory", embedder=embedder)
# store.add_documents(chunks)
# agent = RAGAgent.from_documents(docs, provider=provider, vector_store=store)
```

```mermaid
graph LR
    D[Documents] --> C[Chunker]
    C --> E[Embedder]
    E --> V[Vector Store]
    Q[Query] --> H[Hybrid Search]
    V --> H
    H --> RR[Reranker]
    RR --> A[Agent]
```
!!! tip "See Also"
    - Embeddings -- OpenAI, Anthropic, Gemini, Cohere embedding providers
    - Vector Stores -- 7 backends: Memory, SQLite, Chroma, Pinecone, FAISS, Qdrant, pgvector
    - Advanced Chunking -- semantic and contextual chunking
    - Hybrid Search -- BM25 + vector fusion with reranking
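The hybrid search mentioned above fuses a lexical (BM25) ranking with a vector ranking. As a rough, library-independent illustration of how two ranked lists can be fused, here is a minimal reciprocal-rank-fusion (RRF) sketch; the document IDs and the constant `k=60` are assumptions, and this is not the selectools API.

```python
from typing import Dict, List

def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of doc IDs with reciprocal-rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked highly by either retriever float to the top.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs from a BM25 ranker and a vector ranker
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_c", "doc_a"]

fused = rrf_fuse([bm25_ranking, vector_ranking])
print(fused)  # ['doc_b', 'doc_a', 'doc_c']
```

RRF needs only ranks, not comparable scores, which is why it is a common default for fusing retrievers whose raw scores live on different scales.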
Directory: src/selectools/rag/
Files: __init__.py, vector_store.py, loaders.py, chunking.py, tools.py
- Overview
- RAG Pipeline
- Document Loading
- Text Chunking
- Vector Storage
- RAG Tools
- RAGAgent High-Level API
- Cost Tracking
The RAG (Retrieval-Augmented Generation) system enables agents to answer questions about your documents by:
- Loading documents from various sources
- Chunking them into manageable pieces
- Generating vector embeddings
- Storing in a vector database
- Retrieving relevant chunks during queries
- Providing context to the LLM
DocumentLoader → TextSplitter → EmbeddingProvider → VectorStore → RAGTool → Agent
```mermaid
graph TD
    A["Stage 1: Ingestion\nDocumentLoader\nfrom_file / from_directory / from_pdf"] --> B["Stage 2: Chunking\nTextSplitter / RecursiveTextSplitter\nchunk_size, chunk_overlap"]
    B --> C["Stage 3: Embedding\nEmbeddingProvider\nOpenAI / Anthropic / Gemini"]
    C --> D["Stage 4: Storage\nVectorStore\nMemory / SQLite / Chroma"]
    Q["User Question"] --> E["Stage 5: Query & Retrieval\nembed_query() + VectorStore.search()\ncosine similarity, top_k"]
    D --> E
    E --> F["Stage 6: Generation\nRAGTool formats results with sources\nLLM generates answer with citations"]
```
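To make Stage 5 concrete, here is a dependency-free sketch of cosine-similarity retrieval over a toy index. The chunk texts and 3-dimensional vectors are made up for illustration; in the real pipeline the embedding provider produces the vectors and the vector store does the ranking.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy index: (chunk text, embedding) pairs with made-up vectors
index = [
    ("Install with pip install selectools", [0.9, 0.1, 0.0]),
    ("Agents call tools in a loop",         [0.1, 0.9, 0.2]),
    ("Vector stores persist embeddings",    [0.2, 0.3, 0.9]),
]

query_embedding = [0.85, 0.15, 0.05]  # pretend embed_query("how do I install?")

# Stage 5: rank chunks by cosine similarity, keep top_k
top_k = 2
ranked = sorted(index, key=lambda item: cosine(query_embedding, item[1]), reverse=True)
for text, _ in ranked[:top_k]:
    print(text)
```

The retained chunks are then formatted with their sources and handed to the LLM as context (Stage 6).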
```python
from selectools.rag import DocumentLoader

# From text
docs = DocumentLoader.from_text("Hello world", metadata={"source": "memory"})

# From file
docs = DocumentLoader.from_file("document.txt")

# From directory
docs = DocumentLoader.from_directory(
    directory="./docs",
    glob_pattern="**/*.md",
    recursive=True
)

# From PDF
docs = DocumentLoader.from_pdf("manual.pdf")
```

```python
from selectools.rag import DocumentLoader

# One document per row; text_column selects the content field
docs = DocumentLoader.from_csv(
    "data.csv",
    text_column="content",
    metadata_columns=["author", "category"],
    delimiter=",",
)
# When text_column is None, all columns are concatenated as "key: value" pairs
```

```python
# Handles JSON arrays (one Document per item) or a single object
docs = DocumentLoader.from_json(
    "articles.json",
    text_field="body",                    # Key whose value becomes the text
    metadata_fields=["title", "author"],  # Keys for metadata (None = all)
)
```

```python
# Full page
docs = DocumentLoader.from_html("page.html")

# With CSS selector (requires beautifulsoup4)
docs = DocumentLoader.from_html("page.html", selector="article")
```

```python
# Fetch a web page and extract text content
docs = DocumentLoader.from_url(
    "https://example.com/article",
    selector="main",  # Optional CSS selector (requires beautifulsoup4)
    timeout=30.0,
)
```

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class Document:
    text: str                                # Document content
    metadata: Dict[str, Any]                 # Source, page, etc.
    embedding: Optional[List[float]] = None  # Pre-computed embedding
```

Automatically added:

- `source`: File path
- `filename`: File name only
- `page`: Page number (PDFs)
- `total_pages`: Total pages (PDFs)
Large documents must be split because:
- Embedding models have token limits
- Retrieving entire documents is inefficient
- Smaller chunks improve retrieval precision
```python
from selectools.rag import TextSplitter

splitter = TextSplitter(
    chunk_size=1000,    # Max characters per chunk
    chunk_overlap=200,  # Overlap for context continuity
    separator="\n\n"    # Prefer splitting on paragraphs
)

chunks = splitter.split_text(long_text)
chunked_docs = splitter.split_documents(documents)
```

More intelligent splitting that respects natural boundaries:

```python
from selectools.rag import RecursiveTextSplitter

splitter = RecursiveTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""]  # Try in order
)

# Tries to split on:
# 1. Double newlines (paragraphs) - preferred
# 2. Single newlines (lines)
# 3. Sentences (". ")
# 4. Words (" ")
# 5. Characters - last resort
```

Each chunk carries metadata like:

```python
{
    "source": "docs/guide.md",
    "filename": "guide.md",
    "chunk": 0,         # Chunk index
    "total_chunks": 5   # Total chunks from this doc
}
```

For semantic (topic-boundary) splitting and LLM-context enrichment, see Advanced Chunking.
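The `chunk_size` / `chunk_overlap` mechanics can be illustrated with a minimal standalone splitter. This is a sketch of the sliding-window idea only, not the `TextSplitter` implementation (which also respects separators):

```python
def naive_split(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Slide a fixed window over the text, stepping by chunk_size - chunk_overlap.

    Assumes 0 <= chunk_overlap < chunk_size; each chunk repeats the last
    chunk_overlap characters of the previous one for context continuity.
    """
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = naive_split("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The overlap means a sentence cut at a chunk boundary still appears intact in the neighbouring chunk, which is why 10-20% overlap is the usual recommendation.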
```python
from selectools.rag import VectorStore
from selectools.embeddings import OpenAIEmbeddingProvider

embedder = OpenAIEmbeddingProvider()

# In-memory (fast, not persistent)
store = VectorStore.create("memory", embedder=embedder)

# SQLite (persistent, local)
store = VectorStore.create("sqlite", embedder=embedder, db_path="docs.db")

# Chroma (advanced features)
store = VectorStore.create("chroma", embedder=embedder, persist_directory="./chroma")

# Pinecone (cloud-hosted, scalable)
store = VectorStore.create("pinecone", embedder=embedder, index_name="my-index")
```

Stability: beta
Fast local similarity search using Facebook AI Similarity Search. Uses IndexFlatIP with L2-normalized vectors for exact cosine similarity. Thread-safe with persistence support.
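Why an inner-product index can return exact cosine scores: after L2-normalization, the dot product of two vectors equals their cosine similarity. A quick dependency-free check of that identity (illustrative arithmetic, not the FAISS API):

```python
import math

def norm(v):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def dot(a, b):
    """Plain inner product."""
    return sum(x * y for x, y in zip(a, b))

a = [3.0, 1.0, 2.0]
b = [1.0, 4.0, 2.0]

# Cosine similarity computed directly
cosine = dot(a, b) / (norm(a) * norm(b))

# L2-normalize first, then take a plain inner product
a_n = [x / norm(a) for x in a]
b_n = [x / norm(b) for x in b]
inner = dot(a_n, b_n)

assert math.isclose(cosine, inner)
```

This is the trick behind using `IndexFlatIP`: normalize every stored vector and every query once, and inner-product search becomes exact cosine-similarity search.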
```bash
pip install faiss-cpu numpy
```

```python
from selectools.embeddings import OpenAIEmbeddingProvider
from selectools.rag.stores import FAISSVectorStore

embedder = OpenAIEmbeddingProvider()
store = FAISSVectorStore(embedder, dimension=1536)

# Add documents
docs = [Document(text="Hello world", metadata={"source": "test"})]
ids = store.add_documents(docs)

# Search
query_emb = embedder.embed_query("hi")
results = store.search(query_emb, top_k=3)

# Persist to disk and reload
store.save("/tmp/my_index")
loaded = FAISSVectorStore.load("/tmp/my_index", embedder)
```

Stability: beta
Production vector search with gRPC transport, automatic collection management, and advanced metadata filtering. Supports both self-hosted and Qdrant Cloud.
```bash
pip install qdrant-client
```

```python
from selectools.embeddings import OpenAIEmbeddingProvider
from selectools.rag.stores import QdrantVectorStore

embedder = OpenAIEmbeddingProvider()
store = QdrantVectorStore(
    embedder,
    collection_name="my_docs",
    url="http://localhost:6333",  # Qdrant server URL
    api_key="...",                # Optional: for Qdrant Cloud
    prefer_grpc=True,             # Default: use gRPC transport
)

# Collection is auto-created on first add_documents() call
docs = [Document(text="Hello world", metadata={"source": "test"})]
ids = store.add_documents(docs)

# Search with metadata filtering
query_emb = embedder.embed_query("hi")
results = store.search(query_emb, top_k=3, filter={"source": "test"})
```

Stability: beta
PostgreSQL-native vector search using the pgvector extension with HNSW indexing for fast approximate nearest-neighbour queries. Automatic table and index creation, JSONB metadata, and parameterized queries throughout.
```bash
pip install 'selectools[postgres]'
# or: pip install psycopg2-binary
```

Requires a PostgreSQL server with the `vector` extension installed.
```python
from selectools.embeddings import OpenAIEmbeddingProvider
from selectools.rag.stores.pgvector import PgVectorStore

embedder = OpenAIEmbeddingProvider()
store = PgVectorStore(
    embedder=embedder,
    connection_string="postgresql://user:pass@localhost:5432/mydb",
    table_name="selectools_documents",  # Optional custom table name
)

# Table and HNSW index are created automatically on first use
docs = [Document(text="Hello world", metadata={"source": "test"})]
ids = store.add_documents(docs)

# Search with JSONB metadata filtering
query_emb = embedder.embed_query("hi")
results = store.search(query_emb, top_k=3, filter={"source": "test"})
```

| Backend | Install | Persistent | Scalable | Metadata Filter | Best For |
|---|---|---|---|---|---|
| Memory | built-in | No | No | Yes | Prototyping, tests |
| SQLite | built-in | Yes | No | Yes | Local apps |
| Chroma | `chromadb` | Yes | No | Yes | Local + advanced |
| Pinecone | `pinecone-client` | Yes | Yes | Yes | Cloud scale |
| FAISS | `faiss-cpu` | Yes (save/load) | No | Yes | Fast local search |
| Qdrant | `qdrant-client` | Yes | Yes | Yes (advanced) | Production self-hosted/cloud |
| pgvector | `psycopg2-binary` | Yes | Yes | Yes (JSONB) | PostgreSQL-native apps |
```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

class VectorStore(ABC):
    @abstractmethod
    def add_documents(
        self,
        documents: List[Document],
        embeddings: Optional[List[List[float]]] = None
    ) -> List[str]:
        """Add documents, return IDs."""
        pass

    @abstractmethod
    def search(
        self,
        query_embedding: List[float],
        top_k: int = 5,
        filter: Optional[Dict[str, Any]] = None
    ) -> List[SearchResult]:
        """Search for similar documents."""
        pass

    @abstractmethod
    def delete(self, ids: List[str]) -> None:
        """Delete documents by ID."""
        pass

    @abstractmethod
    def clear(self) -> None:
        """Clear all documents."""
        pass
```

```python
# Add documents
ids = store.add_documents(chunked_docs)
# Embeddings are generated automatically

# Search
query_embedding = embedder.embed_query("What are the features?")
results = store.search(query_embedding, top_k=3)

for result in results:
    print(f"Score: {result.score}")
    print(f"Text: {result.document.text}")
    print(f"Source: {result.document.metadata['source']}")
```

Pre-built tool for knowledge base search:
```python
from selectools.rag import RAGTool

rag_tool = RAGTool(
    vector_store=store,
    top_k=3,              # Retrieve top 3 chunks
    score_threshold=0.5,  # Minimum similarity
    include_scores=True   # Show relevance scores
)

# Use with agent
from selectools import Agent, Message, Role

agent = Agent(
    tools=[rag_tool.search_knowledge_base],
    provider=provider
)

response = agent.run([
    Message(role=Role.USER, content="What are the installation steps?")
])
```

The tool returns formatted context like:

```
[Source 1: README.md, Relevance: 0.89]
Installation is simple:
1. pip install selectools
2. Set OPENAI_API_KEY
3. Create an agent

[Source 2: docs/quickstart.md (page 1), Relevance: 0.82]
Quick start guide:
First, install the package...

[Source 3: docs/setup.md, Relevance: 0.75]
Setup instructions for production...
```
The LLM uses this context to generate an accurate answer.
```python
from selectools.rag import RAGAgent

# 1. From documents
docs = DocumentLoader.from_file("doc.txt")
agent = RAGAgent.from_documents(
    documents=docs,
    provider=OpenAIProvider(),
    vector_store=store
)

# 2. From directory (most common)
agent = RAGAgent.from_directory(
    directory="./docs",
    provider=OpenAIProvider(),
    vector_store=store,
    glob_pattern="**/*.md",
    chunk_size=1000,
    top_k=3
)

# 3. From specific files
agent = RAGAgent.from_files(
    file_paths=["doc1.txt", "doc2.pdf"],
    provider=OpenAIProvider(),
    vector_store=store
)
```

RAGAgent automatically:

- Loads documents
- Chunks them
- Generates embeddings
- Stores in vector database
- Creates RAGTool
- Returns configured Agent

```python
# Ask questions
response = agent.run("What are the main features?")
print(response.content)

# Check costs (includes embeddings)
print(agent.get_usage_summary())

# Continue conversation
response = agent.run("Tell me more about feature X")
```

RAG operations incur two types of costs:
- Embedding Costs: Generating vectors from text
- LLM Costs: Generating responses
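As a back-of-envelope illustration of how the two cost types compose, with hypothetical prices and token counts (not actual provider pricing):

```python
# Hypothetical per-million-token prices (check your provider's price sheet)
EMBED_PRICE = 0.02   # $ per 1M embedding tokens
PROMPT_PRICE = 0.15  # $ per 1M LLM prompt tokens
OUTPUT_PRICE = 0.60  # $ per 1M LLM completion tokens

# One-time indexing: 1,000 chunks averaging 250 tokens each
indexing_cost = 1_000 * 250 / 1_000_000 * EMBED_PRICE

# Per query: embed the question, then prompt the LLM with retrieved context
query_cost = (
    20 / 1_000_000 * EMBED_PRICE        # query embedding
    + 1_500 / 1_000_000 * PROMPT_PRICE  # prompt incl. retrieved chunks
    + 300 / 1_000_000 * OUTPUT_PRICE    # completion
)

print(f"indexing ~ ${indexing_cost:.4f}, per query ~ ${query_cost:.6f}")
```

Embedding cost is paid once at indexing time; LLM cost recurs on every query and usually dominates, since the prompt carries the retrieved chunks.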
```python
agent = RAGAgent.from_directory("./docs", provider, store)
response = agent.run("What are the features?")
print(agent.usage)
```

```
============================================================
📊 Usage Summary
============================================================
Total Tokens: 5,432
- Prompt: 3,210
- Completion: 1,200
- Embeddings: 1,022
Total Cost: $0.002150
- LLM: $0.002000
- Embeddings: $0.000150
============================================================
```
```python
# Embedding cost (one-time, during indexing)
embedding_cost = (num_chunks * avg_chunk_tokens / 1_000_000) * embedding_model_cost

# Per-query cost
query_cost = (
    (query_tokens / 1_000_000) * embedding_model_cost        # Query embedding
    + (prompt_tokens / 1_000_000) * llm_prompt_cost          # LLM prompt
    + (completion_tokens / 1_000_000) * llm_completion_cost  # LLM completion
)
```

Chunk size guidelines:

```python
# Short, focused documents
chunk_size=500

# Standard documents
chunk_size=1000

# Technical documentation
chunk_size=1500
```

Chunk overlap:

```python
# Recommended overlap: 10-20% of chunk_size
splitter = TextSplitter(
    chunk_size=1000,
    chunk_overlap=200  # 20%
)
```

Retrieval depth:

```python
# Simple queries
top_k=1

# Standard queries
top_k=3

# Complex queries
top_k=5
```

Score threshold:

```python
rag_tool = RAGTool(
    vector_store=store,
    top_k=3,
    score_threshold=0.7  # Filter low-relevance results
)
```

Choosing a backend:

```python
# Prototyping
store = VectorStore.create("memory", embedder)

# Production (local, fast)
store = FAISSVectorStore(embedder, dimension=1536)

# Production (local, SQL)
store = VectorStore.create("sqlite", embedder, db_path="prod.db")

# Production (PostgreSQL-native)
store = PgVectorStore(embedder, connection_string="postgresql://...")

# Production (managed, scalable)
store = QdrantVectorStore(embedder, url="http://qdrant:6333")
store = VectorStore.create("pinecone", embedder, index_name="prod")
```

```python
from selectools.embeddings import GeminiEmbeddingProvider

# Gemini embeddings are FREE
embedder = GeminiEmbeddingProvider()
store = VectorStore.create("sqlite", embedder=embedder)
```

End-to-end example:

```python
from selectools import OpenAIProvider, Message, Role
from selectools.embeddings import OpenAIEmbeddingProvider
from selectools.rag import RAGAgent, VectorStore
from selectools.models import OpenAI

# 1. Set up embedding provider
embedder = OpenAIEmbeddingProvider(
    model=OpenAI.Embeddings.TEXT_EMBEDDING_3_SMALL.id
)

# 2. Create vector store
store = VectorStore.create("sqlite", embedder=embedder, db_path="knowledge.db")

# 3. Create RAG agent from documents
agent = RAGAgent.from_directory(
    directory="./docs",
    glob_pattern="**/*.md",
    provider=OpenAIProvider(default_model=OpenAI.GPT_4O_MINI.id),
    vector_store=store,
    chunk_size=1000,
    chunk_overlap=200,
    top_k=3,
    score_threshold=0.5
)

# 4. Ask questions
questions = [
    "What are the installation steps?",
    "How do I create an agent?",
    "What providers are supported?"
]

for question in questions:
    print(f"\nQ: {question}")
    response = agent.run([Message(role=Role.USER, content=question)])
    print(f"A: {response.content}\n")

# 5. Check costs
print("=" * 60)
print(agent.get_usage_summary())
```

```python
# Issue: score_threshold too high
rag_tool = RAGTool(score_threshold=0.9)  # Too strict

# Fix: Lower threshold
rag_tool = RAGTool(score_threshold=0.5)
```

```python
# Issue: chunk_size too large
splitter = TextSplitter(chunk_size=5000)  # Too big

# Fix: Smaller chunks
splitter = TextSplitter(chunk_size=1000)
```

```python
# Issue: Expensive embedding model
embedder = OpenAIEmbeddingProvider(model="text-embedding-3-large")

# Fix: Use cheaper or free model
embedder = GeminiEmbeddingProvider()  # FREE
```

| # | Script | Description |
|---|---|---|
| 14 | `14_rag_basic.py` | Basic RAG pipeline with document loading |
| 15 | `15_semantic_search.py` | Semantic search over embedded documents |
| 16 | `16_rag_advanced.py` | Advanced RAG with chunking and score thresholds |
| 18 | `18_hybrid_search.py` | BM25 + vector hybrid search with reranking |
| 19 | `19_advanced_chunking.py` | Semantic and contextual chunking strategies |
- Advanced Chunking - SemanticChunker and ContextualChunker
- Embeddings Module - Embedding providers
- Vector Stores Module - Storage implementations
- Usage Module - Cost tracking
Next Steps: Understand embedding providers in the Embeddings Module.