model: use text task prefixes for gemini-embedding-2; improve doc prefix #4851

Merged
KennethEnevoldsen merged 1 commit into main from fix-gemini-embedding-2-prompts on Jun 23, 2026

Conversation

@gowitheflow-1998

Member

Two fixes:

  1. Pass query/task prompts directly as text prefixes instead of through taskType. Ref: https://ai.google.dev/gemini-api/docs/embeddings#gemini-embedding-2. taskType is documented for gemini-embedding-001; for gemini-embedding-2, the recommendation is to format task prompts directly as text prefixes, e.g., `task: fact checking | query: {content}` for fact verification tasks, `task: sentence similarity | query: {content}` for STS, and so on.
    Note: I ran a small API check and verified that passing different taskType values yields identical embeddings for gemini-embedding-2, so taskType is removed from EmbedContentConfig.
  2. Format documents without titles as `title: none | text: {text}` to align with the official docs. In the previous implementation, documents were only formatted when a title existed.
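
The two formatting rules above can be sketched as plain string helpers. This is a minimal illustration, not the code merged in this PR: the function names `format_query` and `format_document` are hypothetical, and the `title: {title} | text: {text}` form for titled documents is inferred from the untitled case described above.

```python
from typing import Optional


def format_query(content: str, task: str) -> str:
    """Prefix a query with its task description (hypothetical helper)."""
    # e.g. task="fact checking" or task="sentence similarity"
    return f"task: {task} | query: {content}"


def format_document(text: str, title: Optional[str] = None) -> str:
    """Prefix a document with its title, or 'none' when no title exists."""
    # Treat a missing or whitespace-only title as absent.
    title_part = title.strip() if title and title.strip() else "none"
    return f"title: {title_part} | text: {text}"


if __name__ == "__main__":
    print(format_query("The Eiffel Tower is in Paris.", "fact checking"))
    print(format_document("Embeddings map text to vectors."))
    print(format_document("Embeddings map text to vectors.", title="Intro"))
```

The formatted strings would then be sent as the text to embed, with no taskType set in the request config.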

Ran a few tasks before fix vs. after fix to validate:

| Task | Before | This PR | Δ |
| --- | --- | --- | --- |
| ArguAna | 41.9210 | 78.7090 | +36.7880 |
| Banking77Classification.v2 | 93.1242 | 93.3420 | +0.2178 |
| BiorxivClusteringP2P.v2 | 51.8584 | 53.1959 | +1.3375 |
| STS12 | 72.8689 | 81.1285 | +8.2596 |
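
As a quick sanity check, the Δ column can be recomputed from the before/after scores reported above. This is a small sketch; the dictionary layout is only for illustration.

```python
# (before, after) scores copied from the table above
results = {
    "ArguAna": (41.9210, 78.7090),
    "Banking77Classification.v2": (93.1242, 93.3420),
    "BiorxivClusteringP2P.v2": (51.8584, 53.1959),
    "STS12": (72.8689, 81.1285),
}

# Difference between the score with this PR and the score before it
deltas = {task: round(after - before, 4) for task, (before, after) in results.items()}

for task, delta in deltas.items():
    print(f"{task}: {delta:+.4f}")
```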

@KennethEnevoldsen changed the title from "fix: use text task prefixes for gemini-embedding-2; improve doc prefix" to "model: use text task prefixes for gemini-embedding-2; improve doc prefix" on Jun 23, 2026
@KennethEnevoldsen merged commit b243378 into main on Jun 23, 2026
13 checks passed
@KennethEnevoldsen deleted the fix-gemini-embedding-2-prompts branch on June 23, 2026 19:55

3 participants