Description
Problem Statement
Summary
I would like to request first-class support for Gemini Embedding 2 in OpenViking.
This feels like a strong fit for OpenViking because the project is positioned as a context database for AI agents, not just a text vector store. Gemini Embedding 2 is especially interesting because it can embed multiple modalities into a shared semantic space, which matches OpenViking’s broader context and retrieval goals.
What matters most
The key point is that this should be treated as a true multimodal retrieval backend, not just as another text embedding option.
To be valuable in OpenViking, support should preserve the multimodal nature of the model through the retrieval pipeline. In particular:
- it should not reduce everything to plain text before embedding
- it should preserve modality where possible during ingestion and indexing
- it should support meaningful cross-modal retrieval use cases, not only text-to-text search
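As a toy illustration of what the shared-space idea buys you (every vector below is a hand-made 3-d stand-in, not a real Gemini Embedding 2 output), cross-modal retrieval reduces to nearest-neighbor search over embeddings that keep their source modality as metadata instead of flattening it away:

```python
import math

# Toy shared embedding space: hand-made vectors standing in for real
# model outputs (illustration only, no real embedding calls are made).
index = [
    ("report.pdf",  "document", [0.9, 0.1, 0.0]),
    ("diagram.png", "image",    [0.5, 0.5, 0.1]),
    ("memo.wav",    "audio",    [0.0, 0.1, 0.9]),
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, top_k=1):
    # Modality rides along through ranking as first-class metadata.
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[2]),
                    reverse=True)
    return ranked[:top_k]

# A query vector that happens to land near the document's region of the
# space retrieves the document, regardless of what modality produced it.
hits = search([0.85, 0.15, 0.05])
print(hits[0][0], hits[0][1])  # report.pdf document
```

The point of the sketch is only that nothing in the retrieval path depends on the query and the result sharing a modality; that is the property the bullets above ask OpenViking to preserve.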
Why this matters
A lot of systems already support standard text embeddings well. What could make OpenViking stand out is stronger retrieval across mixed context such as documents, images, audio, video, and other agent resources.
Gemini Embedding 2 seems like a very natural match for that direction.
Expected outcome
From a user perspective, the important result is:
- first-class support for Gemini Embedding 2
- multimodal inputs handled as multimodal, not flattened by default
- retrieval that benefits from a shared embedding space across resource types
- clear documentation of capabilities and limitations
I am intentionally not proposing a specific implementation path here. The main request is that, if added, Gemini Embedding 2 should be integrated in a way that reflects its value as a multimodal retrieval model.
Thanks for considering this.
Proposed Solution
Just use Codex ;)
Alternatives Considered
Feature Area
Model Integration
Use Case
OpenViking is meant to serve as a context layer for AI agents that work with more than plain text. A useful next step would be support for Gemini Embedding 2 so OpenViking can better handle retrieval across mixed resource types such as documents, images, audio, video, and screenshots. The main use case is multimodal retrieval in a shared embedding space, for example finding the right document from an image query, retrieving notes from an audio clip, or matching a text query to relevant visual or document context. This would make OpenViking more useful for real agent workflows where context is heterogeneous and should not always be flattened into text before retrieval.
Example API (Optional)
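One hypothetical shape such a backend could take. Every name here (`Resource`, `MultimodalIndex`, `add`, `query`) is invented for illustration and does not come from OpenViking's actual API; a minimal in-memory stub is used so the sketch is self-contained and runnable:

```python
from dataclasses import dataclass, field

# Hypothetical API sketch; none of these names are OpenViking's real API.
@dataclass
class Resource:
    uri: str
    modality: str        # e.g. "text" | "image" | "audio" | "video" | "document"
    embedding: list      # would come from Gemini Embedding 2 in practice

@dataclass
class MultimodalIndex:
    resources: list = field(default_factory=list)

    def add(self, resource: Resource) -> None:
        # Ingestion keeps the original modality as first-class metadata
        # rather than flattening everything to text first.
        self.resources.append(resource)

    def query(self, embedding, modalities=None, top_k=3):
        # Cross-modal by default; an optional modality filter narrows results.
        candidates = [
            r for r in self.resources
            if modalities is None or r.modality in modalities
        ]
        return sorted(
            candidates,
            key=lambda r: sum(q * v for q, v in zip(embedding, r.embedding)),
            reverse=True,
        )[:top_k]

idx = MultimodalIndex()
idx.add(Resource("notes/meeting.md", "text", [1.0, 0.0]))
idx.add(Resource("shots/login.png", "image", [0.9, 0.4]))
result = idx.query([0.8, 0.5], modalities={"image"}, top_k=1)
print(result[0].uri)  # shots/login.png
```

The design choice being illustrated: the modality filter is opt-in, so text-to-image, image-to-document, and similar cross-modal queries are the default behavior rather than a special case.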
Additional Context
No response
Contribution
- I am willing to contribute to implementing this feature