feat: add configurable embedding providers by tonyxu-io · Pull Request #450 · garrytan/gbrain

tonyxu-io · 2026-04-26T14:47:37Z

Summary

Add a generic embedding provider layer with openai, openai-compatible, and copilot providers
Configure OpenAI-compatible providers with GBRAIN_EMBEDDING_BASE_URL, GBRAIN_EMBEDDING_API_KEY, GBRAIN_EMBEDDING_MODEL, and GBRAIN_EMBEDDING_DIMENSIONS
Keep the Copilot Blackbird shortcut as one provider using the same provider config path
Store embedding model metadata with dimensions on import/embed refresh paths
Add focused tests for generic provider request formatting, Copilot request formatting, and existing embed behavior
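
The environment-driven configuration described above could be resolved roughly as follows. This is a minimal sketch, not the PR's code: the `EmbeddingConfig` shape, the `resolveEmbeddingConfig` name, the `openai` default, and the rule that `openai-compatible` requires a base URL are all assumptions made for illustration. Only the provider names and the `GBRAIN_EMBEDDING_*` variable names come from the summary.

```typescript
// Hypothetical types; the PR's real provider layer is not shown in this thread.
type EmbeddingProvider = "openai" | "openai-compatible" | "copilot";

interface EmbeddingConfig {
  provider: EmbeddingProvider;
  baseUrl?: string;
  apiKey?: string;
  model?: string;
  dimensions?: number;
}

const KNOWN_PROVIDERS: readonly string[] = ["openai", "openai-compatible", "copilot"];

// Takes the environment as a plain record so the function stays easy to test.
function resolveEmbeddingConfig(env: Record<string, string | undefined>): EmbeddingConfig {
  // Defaulting to "openai" is an assumption, not something the PR states.
  const rawProvider = env["GBRAIN_EMBEDDING_PROVIDER"] ?? "openai";
  if (KNOWN_PROVIDERS.indexOf(rawProvider) === -1) {
    throw new Error(`Unknown embedding provider: ${rawProvider}`);
  }
  const provider = rawProvider as EmbeddingProvider;

  // Parse dimensions strictly so a typo fails here rather than at insert time.
  let dimensions: number | undefined;
  const rawDimensions = env["GBRAIN_EMBEDDING_DIMENSIONS"];
  if (rawDimensions !== undefined) {
    dimensions = Number(rawDimensions);
    if (!Number.isInteger(dimensions) || dimensions <= 0) {
      throw new Error(`GBRAIN_EMBEDDING_DIMENSIONS must be a positive integer, got: ${rawDimensions}`);
    }
  }

  const baseUrl = env["GBRAIN_EMBEDDING_BASE_URL"];
  if (provider === "openai-compatible" && !baseUrl) {
    throw new Error("GBRAIN_EMBEDDING_BASE_URL is required for openai-compatible (assumed rule)");
  }

  return {
    provider,
    baseUrl,
    apiKey: env["GBRAIN_EMBEDDING_API_KEY"],
    model: env["GBRAIN_EMBEDDING_MODEL"],
    dimensions,
  };
}
```

Passing the environment in as an argument, rather than reading `process.env` inside the function, keeps the sketch free of runtime-specific globals.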

Example

GBRAIN_EMBEDDING_PROVIDER=openai-compatible \
GBRAIN_EMBEDDING_BASE_URL=https://api.example.com/v1 \
GBRAIN_EMBEDDING_API_KEY=... \
GBRAIN_EMBEDDING_MODEL=nomic-embed-text \
GBRAIN_EMBEDDING_DIMENSIONS=768 \
gbrain embed --stale
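
For readers unfamiliar with the OpenAI-style embeddings endpoint, the request that such a configuration would produce might look like the sketch below. The `buildEmbeddingRequest` helper and its return shape are hypothetical. The `POST {base}/embeddings` path, the bearer token header, and the `model`/`input`/`dimensions` body fields follow the OpenAI embeddings API, though not every compatible server accepts `dimensions`.

```typescript
// Hypothetical request description; nothing is sent over the network here.
interface EmbeddingRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildEmbeddingRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  input: string[],
  dimensions?: number,
): EmbeddingRequest {
  // Strip trailing slashes so ".../v1" and ".../v1/" yield the same URL.
  const url = `${baseUrl.replace(/\/+$/, "")}/embeddings`;

  const payload: { model: string; input: string[]; dimensions?: number } = { model, input };
  // Only include `dimensions` when configured; some servers reject unknown fields.
  if (dimensions !== undefined) {
    payload.dimensions = dimensions;
  }

  return {
    url,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(payload),
  };
}
```

The result can be handed to `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`. Keeping the builder pure makes request formatting testable without a live endpoint.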

Test Plan

bun run typecheck
bun test test/aa-copilot-embedding.test.ts test/embed.test.ts test/import-file.test.ts test/sync-cost-preview.test.ts

sean-codevasp · 2026-04-28T06:14:44Z

I was going to try it, and you guys already did... haha

Qodo-Free-For-OSS · 2026-04-29T07:12:14Z

Hi, the new provider config allows non-1536 embedding dimensions, but the DB schema defines content_chunks.embedding as vector(1536), so inserting 768- or 1024-length embeddings will fail and vector search will be incompatible.

Severity: action required | Category: correctness

How to fix: Align schema with dimensions

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

Non-1536 embeddings will not fit into content_chunks.embedding vector(1536). This makes Copilot (1024) and any custom dimension (e.g. 768) unusable for persistence/search.

Issue Context

The PR advertises configurable GBRAIN_EMBEDDING_DIMENSIONS and uses it in requests and chunk metadata.

Fix Focus Areas

src/core/schema-embedded.ts[92-121]

src/core/pglite-schema.ts[71-93]

src/core/migrate.ts[41-200] (add migration)

src/core/postgres-engine.ts[541-593]

src/core/pglite-engine.ts[460-513]

Expected fix

Choose one supported approach:

Keep fixed 1536: disallow configuring other dimensions (validate GBRAIN_EMBEDDING_DIMENSIONS must be 1536; reject Copilot/defaults or add a separate storage path).

Support variable dimensions via migrations: add a migration to ALTER the content_chunks.embedding type to the configured dimension (and rebuild HNSW index), plus ensure PGLite schema/init is consistent.

Store provider-specific embeddings separately: new table/column per dimension/provider.

Also add a guard in upsertChunks (before DB write) to fail fast with a clear error when vector length != expected DB dimension.
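
A fail-fast guard of the kind suggested here could be as small as the sketch below. The `ChunkInput` shape and the `assertEmbeddingDimensions` name are hypothetical; the real `upsertChunks` signature is not shown in this thread.

```typescript
// Hypothetical chunk shape: only the fields the guard needs.
interface ChunkInput {
  chunkIndex: number;
  embedding?: ArrayLike<number>; // number[] or Float32Array
}

// Throws before any DB write if a vector does not match the column dimension.
function assertEmbeddingDimensions(chunks: ChunkInput[], expected: number): void {
  for (const chunk of chunks) {
    // Chunks without an embedding (not yet embedded) are allowed through.
    if (chunk.embedding === undefined) continue;
    if (chunk.embedding.length !== expected) {
      throw new Error(
        `Embedding dimension mismatch for chunk ${chunk.chunkIndex}: ` +
          `got ${chunk.embedding.length}, expected ${expected}. ` +
          `Check GBRAIN_EMBEDDING_DIMENSIONS against the content_chunks.embedding column.`,
      );
    }
  }
}
```

Calling this at the top of the write path turns an opaque database error into a message that names the offending chunk and the setting to check.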

Found by Qodo code review

tonyxu-io · 2026-04-29T08:53:26Z

Addressed the embedding-dimension storage issue from the review.

Changes:

Added migration embedding_dimension_realign to align content_chunks.embedding with configured dimensions, clear stale embeddings when the vector type changes, and rebuild the HNSW index.
Passed the configured embedding model/dimensions into schema init before migrations for both Postgres and PGLite.
Added upsertChunks fail-fast validation when an embedding vector length does not match the configured dimension.
Added regression coverage for 1024-dim Copilot/PGLite schema init, 768-dim config SQL, migration shape, and vector-length guard.
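
To make the migration's shape concrete, the statements it would need to issue can be sketched as a pure SQL builder. This is an illustration under assumptions, not the contents of `embedding_dimension_realign`: the index name, the cosine operator class, and the unconditional clearing of embeddings are all hypothetical (the comment above says stale embeddings are cleared only when the vector type changes). The `vector(n)` type and `USING hnsw` syntax are pgvector's.

```typescript
// Builds the SQL for realigning content_chunks.embedding to a new dimension.
// Index name and operator class are assumptions made for this sketch.
function buildRealignSql(dimensions: number, indexName: string = "idx_chunks_embedding"): string[] {
  // Validate before interpolating: the value ends up inside DDL.
  if (!Number.isInteger(dimensions) || dimensions <= 0) {
    throw new Error(`Invalid embedding dimension: ${dimensions}`);
  }
  if (!/^[a-z_][a-z0-9_]*$/.test(indexName)) {
    throw new Error(`Invalid index name: ${indexName}`);
  }
  return [
    // The index is tied to the old column type, so drop it first.
    `DROP INDEX IF EXISTS ${indexName}`,
    // Old vectors cannot be cast to a different dimension; clear them for re-embedding.
    `UPDATE content_chunks SET embedding = NULL`,
    `ALTER TABLE content_chunks ALTER COLUMN embedding TYPE vector(${dimensions})`,
    `CREATE INDEX ${indexName} ON content_chunks USING hnsw (embedding vector_cosine_ops)`,
  ];
}
```

A real migration would presumably compare the current column dimension with the configured one first and skip all four statements when they already match, so that existing embeddings are not discarded needlessly.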

Verified:

bun run typecheck
bun test test/embedding-dimensions.test.ts test/aa-copilot-embedding.test.ts test/embed.test.ts test/import-file.test.ts test/migrate.test.ts
git diff --check

tonyxu-io · 2026-04-29T09:02:57Z

Follow-up: rebased/merged the PR branch onto current origin/master and resolved the merge conflicts in the engine init paths.

Current status:

PR is now mergeable against master.
Kept the upstream forward-reference bootstrap and schema verification paths.
Kept the embedding dimension settings/migration path before schema replay, so configurable 1024/768 dimensions still work after the merge.

Verified after the merge update:

bun run typecheck
bun test test/embedding-dimensions.test.ts test/aa-copilot-embedding.test.ts test/embed.test.ts test/import-file.test.ts test/migrate.test.ts test/schema-bootstrap-coverage.test.ts test/schema-verify.test.ts test/reconcile-links.test.ts test/graph-query.test.ts test/minions-shell.test.ts — 155 pass
git diff --check HEAD~1..HEAD

I also tried full bun test; it hit Bun 1.3.9 native segfaults in large-suite runs, while the crash-adjacent files pass standalone (graph-query, minions-shell). No GitHub checks are configured/reported for this branch.

tonyxu-io · 2026-05-06T19:01:15Z

Closing this PR because the embedding architecture has changed substantially since this branch was opened.

This branch was based on the old v0.22-era embedding path. Current master uses the v0.27 AI gateway / recipe provider system, so carrying this PR forward as-is would touch the wrong files and create unnecessary conflicts.

I’m opening a fresh, smaller PR against latest master that restores Copilot / Blackbird embeddings using the current provider architecture.

Adds native Copilot embedding provider for GitHub Copilot/Blackbird Metis embeddings. Calls GitHub /embeddings endpoint directly with the Copilot request/response shape; not OpenAI-compatible.

- New recipe: src/core/ai/recipes/copilot.ts with metis-1024-I16-Binary
- New implementation kind: 'native-copilot' in types.ts
- gateway.ts: native-copilot branches in instantiateEmbedding/Expansion/Chat (embedding does real fetch; chat+expansion throw clear errors since this provider is embedding-only)
- embedSubBatch short-circuits when model is a Copilot model, bypassing Vercel AI SDK and threading abortSignal into native fetch

Replaces garrytan#450. Built on current master (v0.36) instead of v0.22 legacy path.

Auth: GBRAIN_COPILOT_TOKEN, COPILOT_GITHUB_TOKEN, GH_TOKEN, GITHUB_TOKEN, or ~/.copilot/config.json (Copilot CLI login).

Config:

export GBRAIN_EMBEDDING_MODEL=copilot:metis-1024-I16-Binary
export GBRAIN_EMBEDDING_DIMENSIONS=1024
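
The auth sources and the `provider:model` identifier listed above suggest a lookup along the following lines. Everything here is a hedged sketch: the function names are hypothetical, treating the listed order as precedence is an assumption, and because the layout of `~/.copilot/config.json` is not described in this thread, reading it is left to a caller-supplied callback.

```typescript
// Env vars named in the PR description; treating list order as precedence is an assumption.
const COPILOT_TOKEN_ENV_VARS: readonly string[] = [
  "GBRAIN_COPILOT_TOKEN",
  "COPILOT_GITHUB_TOKEN",
  "GH_TOKEN",
  "GITHUB_TOKEN",
];

// `readCliToken` stands in for parsing ~/.copilot/config.json, whose format is not shown here.
function resolveCopilotToken(
  env: Record<string, string | undefined>,
  readCliToken: () => string | undefined,
): string {
  for (const name of COPILOT_TOKEN_ENV_VARS) {
    const value = env[name];
    if (value !== undefined && value.trim() !== "") {
      return value.trim();
    }
  }
  const fromCli = readCliToken();
  if (fromCli !== undefined && fromCli.trim() !== "") {
    return fromCli.trim();
  }
  throw new Error(`No Copilot token found; set one of ${COPILOT_TOKEN_ENV_VARS.join(", ")} or log in with the Copilot CLI`);
}

// Splits "copilot:metis-1024-I16-Binary" into provider and model at the first colon.
function parseModelId(id: string): { provider: string; model: string } {
  const sep = id.indexOf(":");
  if (sep <= 0 || sep === id.length - 1) {
    throw new Error(`Expected "provider:model", got: ${id}`);
  }
  return { provider: id.slice(0, sep), model: id.slice(sep + 1) };
}
```

Splitting at the first colon only means a model name that itself contains a colon would still parse, with the remainder kept intact as the model.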

tonyxu-io force-pushed the feat/copilot-embeddings branch from c09cb59 to 8306aa6 Compare April 26, 2026 14:53

feat: add configurable embedding providers

263e850

tonyxu-io force-pushed the feat/copilot-embeddings branch from 8306aa6 to 263e850 Compare April 26, 2026 14:53

tonyxu-io changed the title ~~feat: add Copilot embedding provider~~ feat: add configurable embedding providers Apr 26, 2026

Cl3MM mentioned this pull request Apr 27, 2026

feat: parameterize content_chunks.embedding dimension #469

Closed

fix: align embedding schema dimensions

9685388

Merge remote-tracking branch 'origin/master' into feat/copilot-embedd…

a5b82d7

…ings-pr450 # Conflicts: # src/core/pglite-engine.ts # src/core/postgres-engine.ts

Cl3MM mentioned this pull request Apr 29, 2026

Make embedding provider, model, and dimensions configurable #133

Closed

tonyxu-io closed this May 6, 2026

tonyxu-io mentioned this pull request May 6, 2026

feat: add Copilot embedding provider #691

Closed

feat: add configurable embedding providers #450
tonyxu-io wants to merge 3 commits into garrytan:master from tonyxu-io:feat/copilot-embeddings