feat: Add optional Ollama support for embeddings #100

Closed
niallobrien wants to merge 3 commits into garrytan:master from niallobrien:feat/ollama-embeddings
@niallobrien niallobrien commented Apr 13, 2026

I wanted gbrain to optionally support local embeddings via Ollama. The work was done by Codex 5.4 and verified working.

Summary

  • use provider-native embedding dimensions by default, including OpenAI native dimensions unless an override is configured
  • select the correct storage and index type for the active embedding size and reconcile schema automatically when provider, model, or dimensions change
  • clear stale embeddings and require re-embedding after dimension-changing migrations instead of allowing mixed or invalid vectors
  • surface embedding provider, model, dimension, and reset status in doctor and health output
  • resolve embedding metadata in one place, including provider, model, effective dimensions, and whether dimensions were explicitly overridden
  • update docs and tests for Ollama nomic-embed-text, OpenAI defaults, and migration flows
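The reconcile behaviour described in the bullets above could look roughly like the sketch below. It is illustrative only: `EmbeddingMetadata`, `planReconcile`, and the action names are hypothetical and not taken from the PR's code. The rule that a same-size model change also triggers re-embedding is an assumption; the summary only states it for dimension changes.

```typescript
// Hypothetical shape; field names mirror the summary bullets, not the PR's code.
interface EmbeddingMetadata {
  provider: "openai" | "ollama";
  model: string;
  dimensions: number;            // effective dimensions after any override
  dimensionsOverridden: boolean; // true when config set embedding_dimensions
}

type ReconcileAction = "none" | "re-embed" | "reset-and-re-embed";

// Decide what has to happen when the active settings differ from what is stored.
function planReconcile(
  stored: EmbeddingMetadata | null,
  active: EmbeddingMetadata,
): ReconcileAction {
  if (stored === null) return "none"; // fresh brain, nothing to migrate
  if (stored.dimensions !== active.dimensions) {
    // Vectors of different sizes cannot share one column: clear them and rebuild.
    return "reset-and-re-embed";
  }
  if (stored.provider !== active.provider || stored.model !== active.model) {
    // ASSUMPTION: same size but a different model means a different vector
    // space, so existing vectors are treated as stale.
    return "re-embed";
  }
  return "none";
}
```

Returning an action rather than performing it keeps the decision testable without a database.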

Testing

I'm running Hermes and gbrain within Docker, so all testing was conducted in that environment.

  • docker compose exec -T hermes sh -lc 'cd /data/repos/gbrain && bun test test/embedding-ollama.test.ts test/pglite-engine.test.ts test/doctor.test.ts'
  • docker compose exec -T hermes sh -lc 'cd /data/repos/gbrain && HOME=/data/hermes/gbrain-home bun run src/cli.ts doctor --json'

Notes

  • live Hermes-mounted GBrain was migrated to schema v6 and reports ollama/nomic-embed-text at 768d

niallobrien and others added 2 commits April 13, 2026 10:30
Add EmbeddingProvider interface with two implementations:
- OpenAIEmbeddingProvider: text-embedding-3-large, 1536d (default, unchanged)
- OllamaEmbeddingProvider: local embeddings via /api/embed (default: nomic-embed-text, 768d)

Provider selection via:
1. GBRAIN_EMBEDDING_PROVIDER env var ('openai' | 'ollama')
2. embedding_provider in ~/.gbrain/config.json
3. Auto-detect: OpenAI if API key set, else Ollama

New init flags: --ollama, --openai, --embedding-model <name>
Backward-compatible: embed() / embedBatch() still work as before.
Tracks model name per chunk in DB for provenance.

9 files changed, +827/-86 lines
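The selection order in the commit message (env var, then config file, then auto-detect) can be sketched as a single pure function. This is a minimal illustration, not the PR's implementation: `selectProvider` and `SelectionInput` are hypothetical names, `OPENAI_API_KEY` is assumed to be the variable meant by "API key set", and letting an unrecognised value fall through to the next source is also an assumption.

```typescript
type ProviderName = "openai" | "ollama";

interface SelectionInput {
  env: Record<string, string | undefined>;  // e.g. process.env
  config: { embedding_provider?: string };  // parsed ~/.gbrain/config.json
}

function isProviderName(value: string | undefined): value is ProviderName {
  return value === "openai" || value === "ollama";
}

function selectProvider({ env, config }: SelectionInput): ProviderName {
  // 1. An explicit environment variable wins.
  const fromEnv = env.GBRAIN_EMBEDDING_PROVIDER;
  if (isProviderName(fromEnv)) return fromEnv;

  // 2. Then the config file.
  const fromConfig = config.embedding_provider;
  if (isProviderName(fromConfig)) return fromConfig;

  // 3. Auto-detect: OpenAI if an API key is set, otherwise Ollama.
  // ASSUMPTION: OPENAI_API_KEY is the key being checked.
  return env.OPENAI_API_KEY ? "openai" : "ollama";
}
```

Passing `env` and `config` in as arguments, rather than reading `process.env` inside the function, makes the precedence easy to unit-test.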
niallobrien changed the title from "feat: align embedding dimensions with provider defaults" to "feat: Add optional Ollama support for embeddings" Apr 13, 2026
niallobrien marked this pull request as ready for review April 13, 2026 14:26
@SanthoshMReddy

Trialed this PR end-to-end on a 128-page Obsidian-backed vault (macOS, PGLite engine, local Ollama). Confirming the Ollama integration works — gbrain init --ollama --embedding-model nomic-embed-text + gbrain embed --stale cleanly embedded 253 chunks at 768d.

Some observations from real-world use that might be worth folding in:

1. Extend OLLAMA_MODEL_DIMENSIONS with a few more common models

const OLLAMA_MODEL_DIMENSIONS: Record<string, number> = {
  'nomic-embed-text': 768,
  'mxbai-embed-large': 1024,
  'snowflake-arctic-embed': 1024,
  'snowflake-arctic-embed2': 1024,   // top open-source English retrieval, 568M
  'bge-m3': 1024,                    // multilingual, 8k context
  'embeddinggemma': 768,             // Google's recent small-model leader
  'all-minilm': 384,
};

When I ran gbrain init --ollama --embedding-model snowflake-arctic-embed2 on a clean brain, the absence of the entry made it fall through to the || 768 default, which then failed on insert since the model actually returns 1024d. Adding it to the table avoids users needing the embedding_dimensions override for these popular models.
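One way to avoid the silent `|| 768` fallthrough described above is to fail at init time when a model's size is unknown, instead of failing later on insert. The sketch below is hypothetical: `resolveDimensions` and the trimmed `KNOWN_DIMENSIONS` table are illustrative names, not the PR's code.

```typescript
// Trimmed copy of the table above, for illustration only.
const KNOWN_DIMENSIONS: Record<string, number> = {
  "nomic-embed-text": 768,
  "snowflake-arctic-embed2": 1024,
};

// Resolve the vector size for a model: explicit override first, then the table.
// Unknown models raise an error rather than guessing a default.
function resolveDimensions(model: string, override?: number): number {
  if (override !== undefined) return override;
  const known = KNOWN_DIMENSIONS[model];
  if (known === undefined) {
    throw new Error(
      `Unknown embedding size for "${model}"; set embedding_dimensions explicitly.`,
    );
  }
  return known;
}
```

With this shape, `gbrain init` on an unlisted model would stop with an actionable message instead of creating a 768d column that the first insert then rejects.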

2. Dynamic dimension probe via /api/show

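For anything not in the hardcoded table, POST http://localhost:11434/api/show with {"name":"<model>"} returns the model's embedding length in the response (in the JSON it sits under model_info, keyed as "<architecture>.embedding_length"). The same value as printed by ollama show <model>: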

"Model":
  architecture        nomic-bert
  parameters          137M
  embedding length    768
  ...

Could probe this at init time if embedding_dimensions isn't in config, eliminating the hardcoded table entirely. Happy to open a separate PR for this if you're interested.
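A probe along these lines might look like the sketch below. It assumes the `/api/show` response carries a `model_info` object whose embedding size is stored under an architecture-prefixed key such as `nomic-bert.embedding_length`. The request field is sent as `model` here, whereas the comment above uses `name`; which one is accepted should be checked against the installed Ollama version. `extractEmbeddingLength` and `probeDimensions` are hypothetical names.

```typescript
interface ShowResponse {
  model_info?: Record<string, unknown>;
}

// Pull "<architecture>.embedding_length" out of an /api/show response body.
function extractEmbeddingLength(show: ShowResponse): number | undefined {
  for (const [key, value] of Object.entries(show.model_info ?? {})) {
    if (key.endsWith(".embedding_length") && typeof value === "number") {
      return value;
    }
  }
  return undefined; // not an embedding model, or an unexpected response shape
}

// Ask a local Ollama server for a model's embedding size.
// Returns undefined when the request is rejected or the field is absent.
async function probeDimensions(
  model: string,
  baseUrl = "http://localhost:11434",
): Promise<number | undefined> {
  const res = await fetch(`${baseUrl}/api/show`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model }),
  });
  if (!res.ok) return undefined;
  const body = (await res.json()) as ShowResponse;
  return extractEmbeddingLength(body);
}
```

Keeping the parsing separate from the HTTP call means the parsing can be tested against a recorded response without a running Ollama instance.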

3. halfvec for models >4000d (qwen3-embedding:8b, etc.)

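pgvector's HNSW index caps at 2,000 dimensions on a vector column and 4,000 on a halfvec column. qwen3-embedding:8b (4096d) is a popular, very strong open-source model that exceeds the vector limit. Your codebase already has EmbeddingStorageType = 'vector' | 'halfvec', so exposing a --halfvec init flag (or auto-selecting halfvec when embedding_dimensions > 2000) would unlock larger models, although 4096d is still above the 4,000-dimension halfvec index limit and would need reduced-dimension output or an unindexed column.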

Encountered this when trying qwen3-embedding:8b for a vault trial. Settled on snowflake-arctic-embed2 which fits comfortably in the vector+HNSW default.
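Auto-selecting the storage type from the embedding size could be sketched as follows. The thresholds are pgvector's documented index limits (2,000 dimensions for `vector`, 4,000 for `halfvec`). `pickStorageType` is a hypothetical helper, and throwing above the `halfvec` limit is one possible policy, not what the PR does.

```typescript
type EmbeddingStorageType = "vector" | "halfvec";

const VECTOR_INDEX_MAX = 2000;  // pgvector index limit for vector columns
const HALFVEC_INDEX_MAX = 4000; // pgvector index limit for halfvec columns

// Choose the narrowest column type whose index can hold the given size.
function pickStorageType(dimensions: number): EmbeddingStorageType {
  if (!Number.isInteger(dimensions) || dimensions <= 0) {
    throw new RangeError(`Invalid embedding size: ${dimensions}`);
  }
  if (dimensions <= VECTOR_INDEX_MAX) return "vector";
  if (dimensions <= HALFVEC_INDEX_MAX) return "halfvec";
  // ASSUMPTION: refuse rather than silently create an unindexable column.
  throw new RangeError(
    `${dimensions}d exceeds the ${HALFVEC_INDEX_MAX}d halfvec index limit`,
  );
}
```

Under this policy a 768d or 1024d model stays on `vector`, a 3072d model moves to `halfvec`, and a 4096d model is rejected until its output size is reduced.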


Not blocking this PR — Ollama integration as designed is already a huge win. Just notes from actually using it. Thanks for the work!

@niallobrien

Author

Awesome stuff, many thanks for the extensive review. I'll address the issues you highlighted later today and update this PR when ready.

@niallobrien

Author

2. Dynamic dimension probe via /api/show

Nice idea. I agree probing POST /api/show for embedding_length is a better long-term fix than expanding the hardcoded table. I’d prefer to land that as a separate PR so this one stays scoped to fixing the immediate mismatch for common models, but I’d definitely be interested in the follow-up.

@niallobrien

niallobrien commented Apr 14, 2026

Author

3. halfvec for models >4000d (qwen3-embedding:8b, etc.)

I think this is worth a follow-up, but the main missing piece may be dimension detection rather than halfvec support itself. The code already switches to halfvec for larger embeddings; the problem is that Ollama models not in the known-dimensions path can fall back to the wrong size at init. If we probe /api/show and get the real embedding_length, models like qwen3-embedding:8b should be able to use halfvec automatically. This could be handled in the separate PR we discussed above, wdyt?

@garrytan

garrytan commented Jun 8, 2026

Owner

Thanks for this contribution — and apologies for the slow triage. We did a full pass over the entire PR backlog. gbrain has moved fast, and the maintainer's larger "cathedral" rewrites have superseded a big share of community PRs: the AI gateway + recipes + user_provided_models system replaced almost all individual provider PRs; #1805 fixed the whole Postgres module-singleton class; #1542 unified the type taxonomy; #1657 the retrieval path; #1802 the doctor; and so on.

We're closing this one in that cleanup — either the fix already landed on master, it duplicates another PR or merged change, or it's outside the current merge bar. Where a closed PR carried a genuinely valuable idea, we've recorded it in docs/designs/COMMUNITY_IDEAS.md so nothing good is lost (a few may graduate into TODOs).

Please don't read the close as a judgment of the work — thank you for contributing. If you believe the underlying issue is still live on the latest master, reopen with a quick note and we'll take another look. 🙏

@garrytan garrytan closed this Jun 8, 2026