feat: add Ollama integration example (closes #118) by jwchmodx · Pull Request #238 · HKUDS/RAG-Anything

jwchmodx · 2026-04-03T03:25:18Z

Problem

Ollama does not expose the /v1/embeddings endpoint that the existing
openai_embed helper targets. Its embedding API is /api/embed and must
be called via the native ollama Python client:

# Ollama-native (works for all models)
response = await ollama.AsyncClient(host=host).embed(model=model, input=texts)

# OpenAI-compat shim — only works if Ollama's /v1/embeddings is enabled,
# which is NOT the default for most models
embeddings = await openai_embed(texts, model=model, base_url=f"{host}/v1")

As reported in #118, this caused silent embedding failures when users
followed the existing example and pointed it at an Ollama host.

Solution

Add examples/ollama_integration_example.py modelled after the existing
lmstudio_integration_example.py. Key differences:

	LM Studio	Ollama
LLM endpoint	`/v1/chat/completions`	`/v1/chat/completions` ✅ same
Embedding endpoint	`/v1/embeddings`	`/api/embed` (native client)
Auth	arbitrary key	`"ollama"` (ignored)

The example includes:

Connection check — lists available models, warns if required ones are missing with the correct ollama pull command
Embedding sanity-check — calls ollama.AsyncClient.embed() and validates the vector dimension against OLLAMA_EMBEDDING_DIM
Chat sanity-check — one-shot prompt via openai_complete_if_cache
RAG init + sample query — inserts a text snippet and runs a hybrid query

Environment variables (all optional, defaults shown):

OLLAMA_HOST=http://localhost:11434
OLLAMA_LLM_MODEL=llama3.2
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
OLLAMA_EMBEDDING_DIM=768

Quick start

ollama pull llama3.2
ollama pull nomic-embed-text
pip install ollama raganything
python examples/ollama_integration_example.py

Checklist

No changes to library code — example file only
ruff check --ignore=E402 + ruff format --check pass (E402 is ignored in pre-commit config)
Follows same structure as lmstudio_integration_example.py
No new required dependencies (ollama is already an optional extra)

HKUDS#159) Reasoning models (DeepSeek-R1, Qwen2.5-think, etc.) wrap their chain-of-thought in <think>…</think> blocks before emitting the final answer. When _robust_json_parse fails to extract a valid JSON object from the response, the four modal-processor parse methods (_parse_response, _parse_table_response, _parse_equation_response, _parse_generic_response) were returning the **raw** LLM response as the fallback caption and summary. This caused internal model reasoning to be stored in the knowledge graph instead of the actual content description. Fix: add a static helper `BaseModalProcessor._strip_thinking_tags` that removes <think>/<thinking> blocks (case-insensitive, multiline) and apply it in every fallback branch so only the final-answer text is stored or returned. The helper is tested in tests/test_strip_thinking_tags.py with 13 unit tests covering: tag variants, multiline blocks, multiple blocks, case-insensitivity, and the full fallback path for all four processor classes.

HKUDS#230) On systems where only 'soffice' is on PATH (common on macOS), the existing fallback loop logged a WARNING for the 'libreoffice' candidate before successfully converting via 'soffice'. This caused users to see: WARNING: LibreOffice command 'libreoffice' not found INFO: Successfully converted file.pptx to PDF using soffice …and conclude that something was broken, even though the conversion succeeded. Fix: log FileNotFoundError at DEBUG level for any non-final candidate so that routine 'libreoffice' → 'soffice' fallback stays silent in normal logs. The WARNING is preserved only when the last candidate in the list is not found (meaning no usable LibreOffice binary exists at all and the conversion is about to fail).

Ollama uses a different embedding API (/api/embed via the native ollama Python client) compared to the OpenAI-compatible /v1/embeddings endpoint assumed by the existing openai_embed helper. Pointing that helper at an Ollama host causes embedding failures for most models. Add examples/ollama_integration_example.py that: - Uses openai_complete_if_cache against Ollama's /v1 chat endpoint (works out of the box — Ollama exposes OpenAI-compatible chat) - Calls ollama.AsyncClient.embed() for embeddings so every model in the Ollama registry is supported without extra configuration - Follows the same structure as lmstudio_integration_example.py: connection check, embedding sanity-check, chat sanity-check, RAG init, sample insert + query - Supports OLLAMA_HOST / OLLAMA_LLM_MODEL / OLLAMA_EMBEDDING_MODEL / OLLAMA_EMBEDDING_DIM env vars with sensible defaults (llama3.2 + nomic-embed-text / 768-dim)

jwchmodx added 3 commits April 3, 2026 12:16

LarFii merged commit 8b622b8 into HKUDS:main Apr 7, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add Ollama integration example (closes #118)#238

feat: add Ollama integration example (closes #118)#238
LarFii merged 3 commits intoHKUDS:mainfrom
jwchmodx:feat/ollama-integration-example

jwchmodx commented Apr 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jwchmodx commented Apr 3, 2026

Problem

Solution

Quick start

Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants