Summary
Add support for gemini-embedding-2-preview as an option alongside the existing gemini-embedding-001 default. Users should be able to opt in via config.
Background
Google released gemini-embedding-2-preview (docs) with significantly better specs than gemini-embedding-001:
| | gemini-embedding-001 | gemini-embedding-2-preview |
|---|---|---|
| Modalities | Text only | Text, image, video, audio, PDF |
| Dimensions | 768 | 3072 (configurable: 768, 1536, 3072) |
| Input tokens | 2048 | 8192 |
| Task types | ❌ | ✅ (RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, etc.) |
| Matryoshka | ❌ | ✅ (truncatable dimensions) |
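The Matryoshka row means a leading prefix of a full 3072-dimension vector is itself a usable, smaller embedding. Below is a minimal sketch of client-side truncation. It assumes the truncated prefix should be L2-renormalized before cosine comparison. Whether reduced-dimension outputs returned by the API are already normalized should be checked against the model docs. `truncateEmbedding` is a hypothetical helper, not an existing function in `src/memory`.

```typescript
// Hypothetical helper: keep the first `dims` components of a Matryoshka
// embedding and rescale that prefix to unit L2 norm.
export function truncateEmbedding(vec: readonly number[], dims: number): number[] {
  if (!Number.isInteger(dims) || dims <= 0 || dims > vec.length) {
    throw new RangeError(`dims must be an integer in 1..${vec.length}, got ${dims}`);
  }
  const prefix = vec.slice(0, dims);
  const norm = Math.sqrt(prefix.reduce((sum, x) => sum + x * x, 0));
  // A zero vector cannot be normalized; return it unchanged.
  return norm === 0 ? prefix : prefix.map((x) => x / norm);
}

// Usage: shrink a stored 3072-d vector to 768 dimensions.
// const small = truncateEmbedding(fullVector, 768);
```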
The multimodal support is the headline feature: a single embedding space covers text, images, video frames, audio clips, and PDFs, enabling cross-modal retrieval (e.g. query with text and get an image back, or vice versa). The higher input token limit (8192 vs. 2048) means fewer chunk splits for long memory entries.
Required Changes
1. src/memory/embedding-model-limits.ts
Add the new model's token limit:
```ts
"gemini:gemini-embedding-2-preview": 8192,
```
2. src/memory/embeddings-gemini.ts
- Support the outputDimensionality parameter (3072 default; configurable)
- Support the taskType field: pass RETRIEVAL_DOCUMENT for storage, RETRIEVAL_QUERY for search
- Support multimodal parts in the request body: inlineData (base64 image/audio/video) and fileData (a File API URI, e.g. for PDFs and videos)
- Omit both outputDimensionality and taskType for older models (backward compatible)
- Both the single (embedContent) and batch (batchEmbedContents) endpoints support multimodal parts
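The rules above can be sketched as a request-body builder. The fields `taskType` and `outputDimensionality` are set only for the new model and left out entirely for older ones. The field names (`model`, `content.parts`, `inlineData`, `fileData`, `taskType`, `outputDimensionality`) are taken from the request shapes in this issue. `buildEmbedRequest`, the type names, and the set of models that receive the extra fields are hypothetical.

```typescript
// Part shapes matching the request bodies in this issue.
type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } } // base64 payload
  | { fileData: { mimeType: string; fileUri: string } }; // File API URI

type TaskType = "RETRIEVAL_DOCUMENT" | "RETRIEVAL_QUERY";

interface EmbedRequest {
  model: string;
  content: { parts: Part[] };
  taskType?: TaskType;
  outputDimensionality?: number;
}

// Assumption: only the new model receives the extra fields.
const MODELS_WITH_TASK_AND_DIMS = new Set(["gemini-embedding-2-preview"]);

export function buildEmbedRequest(
  model: string,
  parts: Part[],
  purpose: "store" | "search",
  dimensionality = 3072,
): EmbedRequest {
  const body: EmbedRequest = { model: `models/${model}`, content: { parts } };
  if (MODELS_WITH_TASK_AND_DIMS.has(model)) {
    // Memory writes are embedded as documents; searches as queries.
    body.taskType = purpose === "store" ? "RETRIEVAL_DOCUMENT" : "RETRIEVAL_QUERY";
    body.outputDimensionality = dimensionality;
  }
  // Older models: neither field is set, so the body matches today's requests.
  return body;
}
```

Keeping the model check in one set means future models (or a GA alias) can opt in without touching the builder logic.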
Text request shape (backward compat):
```json
{
  "model": "models/gemini-embedding-2-preview",
  "content": { "parts": [{ "text": "..." }] },
  "taskType": "RETRIEVAL_DOCUMENT",
  "outputDimensionality": 3072
}
```
Multimodal request shape (image example):
```json
{
  "model": "models/gemini-embedding-2-preview",
  "content": {
    "parts": [
      { "inlineData": { "mimeType": "image/png", "data": "<base64>" } },
      { "text": "optional caption or context" }
    ]
  },
  "taskType": "RETRIEVAL_DOCUMENT",
  "outputDimensionality": 3072
}
```
Supported MIME types: image/png, image/jpeg, image/webp, image/gif, video/mp4, audio/mp3, audio/wav, application/pdf, and others supported by the Gemini API.
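Here is a sketch of turning a file's bytes into an `inlineData` part, with the MIME type detected from the file extension. The extension table covers only the types listed above and is an illustrative assumption. Real code might sniff magic bytes or use a MIME library instead. The sketch assumes a Node.js runtime for `Buffer`. `mimeTypeFromPath` and `toInlinePart` are hypothetical names.

```typescript
// Extension -> MIME type for the types listed above (illustrative, not exhaustive).
// A Map avoids accidental hits on Object.prototype keys like "constructor".
const MIME_BY_EXTENSION = new Map<string, string>([
  ["png", "image/png"],
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["webp", "image/webp"],
  ["gif", "image/gif"],
  ["mp4", "video/mp4"],
  ["mp3", "audio/mp3"],
  ["wav", "audio/wav"],
  ["pdf", "application/pdf"],
]);

export function mimeTypeFromPath(path: string): string | undefined {
  const dot = path.lastIndexOf(".");
  const slash = path.lastIndexOf("/");
  if (dot <= slash) return undefined; // no extension in the final path segment
  return MIME_BY_EXTENSION.get(path.slice(dot + 1).toLowerCase());
}

// Builds an inlineData part with a base64 payload (Node.js Buffer).
export function toInlinePart(path: string, bytes: Uint8Array) {
  const mimeType = mimeTypeFromPath(path);
  if (!mimeType) throw new Error(`Unsupported file type: ${path}`);
  return { inlineData: { mimeType, data: Buffer.from(bytes).toString("base64") } };
}
```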
3. src/memory/embeddings-model-normalize.ts
Ensure gemini-embedding-2-preview passes through normalization without being mangled.
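The contents of embeddings-model-normalize.ts are not shown in this issue, so the following is only a hypothetical sketch of the intended behavior. It trims whitespace and an optional `models/` prefix, but never rewrites the model ID itself. For example, it does not drop the `-2-preview` suffix or map unknown names back to the default.

```typescript
// Hypothetical normalizer: trims whitespace and an optional "models/" prefix,
// leaving the model ID itself untouched so "gemini-embedding-2-preview" survives.
export function normalizeGeminiModel(raw: string): string {
  const trimmed = raw.trim();
  return trimmed.startsWith("models/") ? trimmed.slice("models/".length) : trimmed;
}
```

A round-trip test for both model IDs is a cheap way to lock this requirement in.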
4. New: src/memory/embeddings-gemini-multimodal.ts (or extend existing)
Helper to build a multimodal content object from a file path or URL, detecting MIME type and encoding inline vs. File API upload based on size. This keeps embeddings-gemini.ts clean.
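The decision between inlining and uploading can be sketched as a pure function, leaving the actual upload to whatever File API client the project uses. The 20 MB budget reflects Google's guidance that large inline payloads should go through the File API instead. Treat the exact figure, and whether it applies per file or per request, as an assumption to verify. `planMediaPart` and `PartPlan` are hypothetical; the sketch assumes Node.js for `Buffer`.

```typescript
// Assumed inline budget, based on Gemini's ~20 MB request-size guidance.
const MAX_INLINE_BYTES = 20 * 1024 * 1024;

type PartPlan =
  | { kind: "inline"; mimeType: string; data: string } // base64, send as inlineData
  | { kind: "upload"; mimeType: string; byteLength: number }; // send via File API

export function planMediaPart(mimeType: string, bytes: Uint8Array): PartPlan {
  // Base64 inflates payloads by 4/3, so budget against the encoded length.
  const encodedLength = 4 * Math.ceil(bytes.byteLength / 3);
  if (encodedLength <= MAX_INLINE_BYTES) {
    return { kind: "inline", mimeType, data: Buffer.from(bytes).toString("base64") };
  }
  return { kind: "upload", mimeType, byteLength: bytes.byteLength };
}
```

For an `upload` result, the caller uploads the bytes and then emits a fileData part with the returned URI. This keeps network I/O out of the helper and makes the threshold easy to test.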
5. Config / docs
- Update the memorySearch.model config reference to list gemini-embedding-2-preview as a valid option
- Document supported input types beyond text
- Add a note that switching models requires re-indexing existing memory: vectors from different models are not comparable, and the default vector size also changes (768 → 3072)
- DEFAULT_GEMINI_EMBEDDING_MODEL stays gemini-embedding-001, so existing configs are unaffected
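Here is a sketch of how the config could be resolved and validated, including the re-index check. The keys `memorySearch.model` and `memorySearch.dimensionality` come from this issue. The `MemorySearchConfig` shape, `resolveEmbeddingSettings`, `needsReindex`, and the stored-index metadata are hypothetical. The check compares the model as well as the dimension, because two models can emit vectors of the same size that still live in different spaces.

```typescript
interface MemorySearchConfig {
  model?: string;
  dimensionality?: number;
}

const DEFAULT_GEMINI_EMBEDDING_MODEL = "gemini-embedding-001";
const ALLOWED_DIMS = new Set([768, 1536, 3072]);

// Resolves the model and the vector size the index should expect.
export function resolveEmbeddingSettings(cfg: MemorySearchConfig = {}) {
  const model = cfg.model ?? DEFAULT_GEMINI_EMBEDDING_MODEL;
  if (model !== "gemini-embedding-2-preview") {
    // Older model: dimensionality is not sent (768 per the table above).
    return { model, dimensions: 768, sendDimensionality: false };
  }
  const dimensions = cfg.dimensionality ?? 3072;
  if (!ALLOWED_DIMS.has(dimensions)) {
    throw new Error(`memorySearch.dimensionality must be 768, 1536, or 3072; got ${dimensions}`);
  }
  return { model, dimensions, sendDimensionality: true };
}

// A model or size change makes stored vectors incomparable, so either forces a rebuild.
export function needsReindex(
  stored: { model: string; dimensions: number },
  cfg?: MemorySearchConfig,
): boolean {
  const next = resolveEmbeddingSettings(cfg);
  return stored.model !== next.model || stored.dimensions !== next.dimensions;
}
```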
Acceptance Criteria
- model: "gemini-embedding-2-preview" in config produces valid text embeddings
- outputDimensionality defaults to 3072; configurable via optional memorySearch.dimensionality field
- taskType passed appropriately for write vs. search operations
- gemini-embedding-001 behavior unchanged (text-only, no extra fields)
Notes
- gemini-embedding-2-preview is in preview; the model name may change at GA. Consider aliasing once stable.
- Batch endpoint (batchEmbedContents): each entry in its requests array is a full embed request and carries its own outputDimensionality and taskType. Ensure both the single and batch paths pass them.
- For large files (video, long audio), use the Gemini File API upload path rather than inline base64.
- Cross-modal retrieval (embed an image, search with text) works out of the box since both live in the same vector space.
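The batch note can be sketched as a batchEmbedContents body whose requests entries each repeat the single-request fields, mirroring the request shapes shown earlier in this issue. `buildBatchBody` and the text-only part type are simplifications for illustration. Multimodal parts would go into `content.parts` the same way.

```typescript
type TextPart = { text: string };
type TaskType = "RETRIEVAL_DOCUMENT" | "RETRIEVAL_QUERY";

interface BatchEntry {
  model: string;
  content: { parts: TextPart[] };
  taskType?: TaskType;
  outputDimensionality?: number;
}

// Builds { requests: [...] } for batchEmbedContents. taskType and
// outputDimensionality are set on every entry, and only for the new model.
export function buildBatchBody(
  model: string,
  texts: string[],
  taskType: TaskType,
  outputDimensionality = 3072,
): { requests: BatchEntry[] } {
  const supportsExtras = model === "gemini-embedding-2-preview";
  return {
    requests: texts.map((text) => ({
      model: `models/${model}`,
      content: { parts: [{ text }] },
      ...(supportsExtras ? { taskType, outputDimensionality } : {}),
    })),
  };
}
```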