
feat(embedder): add GeminiDenseEmbedder text embedding provider#4

Open
chethanuk wants to merge 5 commits into `main` from `feat/gemini-landing`

Conversation

@chethanuk
Owner

@chethanuk chethanuk commented Mar 18, 2026

Summary

Adds `GeminiDenseEmbedder` — a production-ready dense embedding provider backed by the official `google-genai` SDK. Includes full `EmbeddingConfig` integration, non-symmetric task-type routing, async concurrent batching, empty-text guard, and 50 unit tests + 5 integration test files.


Files Changed (11 files, +1,429 lines)

`openviking/models/embedder/gemini_embedders.py` (new)

Core implementation of `GeminiDenseEmbedder(DenseEmbedderBase)`:

  • 3 models with `KNOWN_DIMENSIONS` registry + prefix/fallback rules for future models:
    • `gemini-embedding-2-preview` → 3072 (MRL, 8192-token limit)
    • `gemini-embedding-001` → 3072 (MRL, 2048-token limit)
    • `text-embedding-004` → 768 (fixed-dim legacy)
  • MRL dimension control via `output_dimensionality` param (1–3072, validated at construction)
  • 8 task types validated at construction (`RETRIEVAL_QUERY`, `RETRIEVAL_DOCUMENT`, `SEMANTIC_SIMILARITY`, `CLASSIFICATION`, `CLUSTERING`, `QUESTION_ANSWERING`, `FACT_VERIFICATION`, `CODE_RETRIEVAL_QUERY`)
  • `embed(text, is_query=False)` — single text; empty/whitespace-only input → zero vector without API call
  • `embed_batch(texts, is_query=False)` — up to 100 texts per SDK call; empty strings filtered pre-API and filled with zero vectors at original positions; per-item fallback on batch error
  • `async_embed_batch(texts)` — concurrent async batching via `anyio` + semaphore (`max_concurrent_batches=10`); optional dep (`pip install openviking[gemini-async]`)
  • Structured error messages with per-HTTP-status hints (400 key detection, 404 model-not-found, 429 quota, 500/503 retry)
  • No `_l2_normalize` — Gemini API returns L2-normalized vectors; `truncate_and_normalize` from base class is sufficient
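The empty-text guard and 100-item chunking described above can be sketched independently of the SDK. This is an illustrative outline, not the PR's actual code: `call_api` is a hypothetical stand-in for the google-genai batch call, and the function name is invented.

```python
from typing import Callable, List

def embed_batch_sketch(
    texts: List[str],
    call_api: Callable[[List[str]], List[List[float]]],
    dimension: int,
    batch_size: int = 100,
) -> List[List[float]]:
    """Position-preserving batch embedding: empty/whitespace-only texts
    never reach the API and keep a zero vector at their original index."""
    results: List[List[float]] = [[0.0] * dimension for _ in texts]
    non_empty = [(i, t) for i, t in enumerate(texts) if t.strip()]
    # Split the surviving texts into chunks of at most `batch_size`
    # (the per-call SDK limit described above).
    for start in range(0, len(non_empty), batch_size):
        chunk = non_empty[start:start + batch_size]
        vectors = call_api([t for _, t in chunk])
        for (i, _), vec in zip(chunk, vectors):
            results[i] = vec
    return results
```

With five inputs of which two are blank, only three texts are sent to the API and the blanks come back as zero vectors in their original slots.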

`openviking_cli/utils/config/embedding_config.py` (+135 lines)

  • `EmbeddingModelConfig`: new `task_type` field with validation; fixed typo in `sk` description ("Secretfor" → "Secret for")
  • `EmbeddingConfig._create_embedder`: Gemini factory entry wiring `model_name`, `api_key`, `dimension`, and context-resolved `task_type`
  • `get_query_embedder()` / `get_document_embedder()`: non-symmetric routing for Gemini via `query_param` / `document_param`
  • `EmbeddingModelConfig.get_effective_dimension()`: Gemini path calls `GeminiDenseEmbedder._default_dimension(model)`
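The non-symmetric routing above can be reduced to a small resolver: a query context prefers `query_param`, a document context prefers `document_param`, and anything else falls back to the configured `task_type`. A minimal sketch under assumed names (this helper does not exist in the PR):

```python
from typing import Optional

def resolve_task_type(
    context: str,
    query_param: Optional[str] = None,
    document_param: Optional[str] = None,
    task_type: Optional[str] = None,
    default: str = "RETRIEVAL_DOCUMENT",
) -> str:
    # Queries and documents get different task types when the provider
    # is non-symmetric; otherwise fall back to the configured task_type.
    if context == "query" and query_param:
        return query_param
    if context == "document" and document_param:
        return document_param
    return task_type or default
```

Under this scheme, `get_query_embedder()` and `get_document_embedder()` differ only in the context string they pass in.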

`openviking/models/embedder/__init__.py` (+4 lines)

Exports `GeminiDenseEmbedder` in public API; updates module docstring.

`examples/ov.conf.example` (+11 lines)

New `embedding_gemini_example` snippet demonstrating `gemini-embedding-2-preview` with non-symmetric `RETRIEVAL_QUERY` / `RETRIEVAL_DOCUMENT` routing and `${GOOGLE_API_KEY}` substitution.

`pyproject.toml` (+8 lines)

  • `google-genai>=0.8.0` added to core `dependencies`
  • `anyio>=4.0` in `test` optional deps; `gemini-async` optional extra
  • `integration` pytest marker
  • `pytest-xdist>=3.0` in `[dependency-groups].dev` for parallel test runs

`tests/integration/conftest.py` (+119 lines)

  • Shared fixtures: `GEMINI_MODELS`, `GEMINI_MODELS_FIXTURE`, `EMBED_PARAMS` parametrize lists
  • `gemini_embedder` (module-scoped, parametrized over known models at dim=768)
  • `gemini_ov_client` (function-scoped, 5 model×dim combinations)
  • `make_ov_client` / `teardown_ov_client` helpers; `gemini_config_dict` builder
  • `vectordb_engine_available()` — narrowed exception handling (`ImportError, ModuleNotFoundError, AttributeError`)
  • `requires_api_key` / `requires_engine` skip markers

`tests/integration/test_gemini_embedding_it.py` (+123 lines)

Real API integration tests (auto-skipped without `GOOGLE_API_KEY`):

| Test | Verifies |
| --- | --- |
| `test_embed_returns_correct_dimension` | dim=768, L2 norm 0.99–1.01 |
| `test_embed_batch_count` | 5 texts → 5 results at correct dim |
| `test_batch_over_100` | 150 texts auto-split into 2 SDK calls |
| `test_large_text_chunking` (×2 models) | Text > token limit, L2-normalized result |
| `test_all_task_types_accepted` (×8) | All task types accepted by API |
| `test_config_nonsymmetric_routing` | `query_param`/`document_param` wire correct `task_type` |
| `test_invalid_api_key_error_message` | RuntimeError with "Invalid API key" hint |
| `test_invalid_model_error_message` | RuntimeError with "Model not found" hint |

`tests/integration/test_gemini_openviking_it.py` (+209 lines)

Full OpenViking add-memory + search workflows (requires `GOOGLE_API_KEY` + compiled VectorDB engine):

| Test | Verifies |
| --- | --- |
| `test_add_and_search_basic` | Single doc indexed and returned by `find()` |
| `test_batch_documents_search` | 5-topic batch; relevant doc found first |
| `test_large_text_add_and_search` (×2 models) | Chunked large doc is searchable |
| `test_retrieval_routing_workflow` (×2 configs) | Non-symmetric task-type routing add+search |
| `test_dimension_variant_add_search` (×4 dims) | 512/768/1536/3072; dim assertion on `OpenVikingConfigSingleton` |
| `test_session_search_smoke` | Session + `find()` returns indexed doc (`total > 0`) |

`tests/integration/test_gemini_e2e.py` (+114 lines)

E2E HTTP server integration tests using the `server_url` uvicorn fixture.

`tests/unit/test_gemini_embedder.py` (+352 lines)

50 unit tests (all mocked, no API key needed):

  • Constructor: missing key, invalid task_type, invalid dimension bounds
  • `embed()`: SDK call shape, task_type in config, error mapping, empty/whitespace → zero vector (no API call)
  • `embed_batch()`: empty list, empty string filtering with position-preserving zero vectors, chunk splitting at 100, batch fallback, `is_query` forwarded
  • `async_embed_batch()`: empty list, concurrent dispatch, per-batch fallback
  • `_default_dimension()`: exact match, prefix rule (`text-embedding-*`), fallback
  • `close()`: graceful cleanup
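The exact-match/prefix/fallback resolution these tests cover can be sketched as below. The dimensions come from the registry listed earlier in the PR; treating 3072 as the fallback for unknown models is an assumption for illustration.

```python
# Registry from the PR description; fallback value is assumed.
KNOWN_DIMENSIONS = {
    "gemini-embedding-2-preview": 3072,
    "gemini-embedding-001": 3072,
    "text-embedding-004": 768,
}

def default_dimension(model: str, fallback: int = 3072) -> int:
    if model in KNOWN_DIMENSIONS:            # exact match wins
        return KNOWN_DIMENSIONS[model]
    if model.startswith("text-embedding-"):  # prefix rule for the legacy family
        return 768
    return fallback                          # future/unknown models
```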

`tests/unit/test_embedding_config_gemini.py` (+111 lines)

Unit tests for `EmbeddingConfig` Gemini path: valid construction, task_type validation, missing key rejection, `get_effective_dimension()`, factory kwargs, non-symmetric routing.


Post-Review Fixes Applied

| Finding | Fix |
| --- | --- |
| `embed()`/`embed_batch()` missing `is_query: bool = False` | Added; satisfies `DenseEmbedderBase` interface |
| `_l2_normalize` redundant (Gemini returns normalized vectors) | Removed function + all 3 call sites; removed unused `import math` |
| Empty text → non-descriptive API error | Zero vector returned immediately; batch filters empties pre-API |
| `sk` description typo "Secretfor" | Fixed to "Secret for" |
| `vectordb_engine_available()` broad `except Exception` | Narrowed to `(ImportError, ModuleNotFoundError, AttributeError)` |
| Hardcoded fake API key string in test | Dynamic: `"INVALID_KEY_" + "XYZZY_123"` |
| `test_dimension_variant_add_search` had no dim assertion | Added `OpenVikingConfigSingleton` dimension check |
| `test_session_search_smoke` tautology `>= 0` | Fixed to `> 0` with meaningful failure message |

Verification

```shell
# Unit tests (50, no API key needed) — all pass
uv run pytest tests/unit/test_gemini_embedder.py tests/unit/test_embedding_config_gemini.py \
  --override-ini="addopts=" -q
# → 50 passed in 0.11s

# Integration (auto-skip without key)
GOOGLE_API_KEY=$GEMINI_KEY uv run pytest \
  tests/integration/test_gemini_embedding_it.py \
  tests/integration/test_gemini_openviking_it.py \
  tests/integration/test_gemini_e2e.py \
  --override-ini="addopts=" -v

# Parallel run (after `uv sync --group dev` installs pytest-xdist)
GOOGLE_API_KEY=$GEMINI_KEY uv run pytest \
  tests/unit/ tests/integration/test_gemini_*.py \
  --override-ini="addopts=" -n 4 -v
```

Summary by CodeRabbit

Release Notes

New Features

  • Added Google Gemini as a supported embedding provider with multiple model options
  • Introduced configurable output dimensions for Gemini embeddings
  • Added task type selection for optimized query vs. document embeddings
  • Enabled asynchronous batch processing for improved performance

Chores

  • Updated dependencies to support Gemini integration

@coderabbitai

coderabbitai bot commented Mar 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds Google Gemini dense-embedding support: new GeminiDenseEmbedder implementation, wiring into embedding configuration/factory with context-aware query/document routing, example config and dependency updates, and extensive unit and integration tests (including async paths and real-Gemini integration).

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Core Gemini Embedder Implementation**<br>`openviking/models/embedder/gemini_embedders.py` | New GeminiDenseEmbedder (sync + async batch), token-aware chunking, per-batch fallback to per-item, dimension management and defaults, L2 normalization/truncation, error mapping, concurrency controls, and lifecycle methods. |
| **Module API & Exports**<br>`openviking/models/embedder/__init__.py` | Exports GeminiDenseEmbedder and updates public exports/docstring to include Google Gemini (dense-only). |
| **Embedding Configuration & Wiring**<br>`openviking_cli/utils/config/embedding_config.py` | Adds `input_type`, `max_tokens`, `task_type`; contextual getters (`get_query_embedder`, `get_document_embedder`, `_get_contextual_embedder`); integrates gemini into the factory registry and delegates dimension resolution to the Gemini embedder. |
| **Example Configuration**<br>`examples/ov.conf.example` | Adds top-level `embedding_gemini_example` dense block with model, provider, api_key, dimension, and query/document param names. |
| **Project Metadata & Dependencies**<br>`pyproject.toml` | Adds `google-genai>=0.8.0`, optional `anyio>=4.0` (test/gemini-async), updates test/dev extras and adds a pytest integration marker. |
| **Integration Test Helpers & Fixtures**<br>`tests/integration/conftest.py` | Adds Gemini constants, fixtures (`gemini_embedder`, `gemini_ov_client`), helpers (`l2_norm`, `gemini_config_dict`, client lifecycle), and pytest markers for API/engine availability. |
| **Integration Tests — Embedder E2E**<br>`tests/integration/test_gemini_e2e.py` | E2E tests covering dimensions, L2 normalization, batch vs single parity, semantic similarity, and async batch behavior against the real Gemini API (guarded by markers). |
| **Integration Tests — Embedder IT**<br>`tests/integration/test_gemini_embedding_it.py` | Integration tests for single/batch embeddings, large-text chunking, task-type coverage, non-symmetric routing config, and error-message assertions. |
| **Integration Tests — OpenViking Workflows**<br>`tests/integration/test_gemini_openviking_it.py` | Full OpenViking add/search workflows using Gemini embeddings: add/index, retrieval routing, multi-dimension checks, large-text handling, and session smoke tests. |
| **Unit Tests**<br>`tests/unit/test_gemini_embedder.py`, `tests/unit/test_embedding_config_gemini.py` | Unit coverage for init validation, token limits, default/fallback dimensions, sync/async batch behavior, error mapping, and embedding-config routing/validation. |
| **Manifest / Example Files**<br>manifest_file, `requirements.txt` | Manifest entries and requirements updated to reflect added dependencies/tests and the example config addition. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Config as EmbeddingConfig
    participant Factory as EmbedderFactory
    participant Gemini as GeminiDenseEmbedder
    participant API as Google Gemini API

    User->>Config: request embedder(provider="gemini", model, task_type?, context?)
    Config->>Config: validate fields (api_key, task_type, dimension)
    Config->>Factory: create(provider="gemini", type="dense", context)
    Factory->>Gemini: instantiate(model, api_key, task_type, dimension, max_tokens, context)
    Gemini->>API: init genai Client (api_key)
    Factory-->>User: return embedder instance

    User->>Gemini: embed(text) / embed_batch(texts)
    Gemini->>Gemini: truncate/normalize inputs, chunk as needed
    alt batch call success
        Gemini->>API: batch embed request (task_type)
        API-->>Gemini: vectors
    else batch fails
        loop per-item fallback
            Gemini->>API: single embed request
            API-->>Gemini: vector
        end
    end
    Gemini->>Gemini: L2 normalize/truncate to target dimension
    Gemini-->>User: return vectors
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 I nibble tokens in the starlit glen,
Gemini hums and folds them into ken,
Batches hop, async whiskers tap the key,
Query and doc find vectors, wild and free,
OpenViking sings — embeddings bloom for me.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 28.44%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title accurately describes the main change: adding a new Gemini dense embedding provider implementation to the codebase. |


@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the embedding capabilities by introducing native support for Google's Gemini Embedding models. It provides a robust and flexible integration, allowing users to leverage Gemini's dense embeddings with various task types and dimensions, including efficient asynchronous batch processing. The changes also refine the embedding configuration system to accommodate the new provider and its specific parameters, ensuring seamless integration into existing workflows. This enhancement broadens the range of available embedding providers, offering more choices and potentially improved performance for different use cases.

Highlights

  • New GeminiDenseEmbedder: Added a new GeminiDenseEmbedder class, integrating Google's Gemini Embedding models via the google-genai SDK. This new embedder supports all 8 Gemini task types and output dimensions from 1 to 3072.
  • Enhanced Embedding Configuration: Updated EmbeddingModelConfig to include the new gemini provider, task_type parameter, and max_tokens for better control over embedding requests. The configuration now also supports context-aware routing for query and document embeddings.
  • Asynchronous Batching Support: Implemented asynchronous concurrent batching for Gemini embeddings using anyio, improving performance for large volumes of text.
  • Comprehensive Testing: Introduced extensive unit and integration tests for the GeminiDenseEmbedder and its configuration, covering API key validation, dimension handling, task types, batching, and end-to-end OpenViking workflows.
  • Dependency Updates: Added google-genai>=0.8.0 and anyio>=4.0 as core dependencies, along with pytest-asyncio and pytest-cov for development and testing.
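The semaphore-bounded concurrent batching highlighted above can be sketched as follows. This is an illustration using asyncio for self-containment (the PR itself uses anyio), and `embed_one_batch` is a hypothetical async callable standing in for one SDK batch request.

```python
import asyncio
from typing import Awaitable, Callable, List

async def async_embed_batches(
    batches: List[List[str]],
    embed_one_batch: Callable[[List[str]], Awaitable[List[List[float]]]],
    max_concurrent_batches: int = 10,
) -> List[List[float]]:
    # A semaphore caps how many batch requests are in flight at once;
    # gather preserves submission order, so results line up with inputs.
    sem = asyncio.Semaphore(max_concurrent_batches)

    async def run(batch: List[str]) -> List[List[float]]:
        async with sem:
            return await embed_one_batch(batch)

    nested = await asyncio.gather(*(run(b) for b in batches))
    return [vec for group in nested for vec in group]
```

Bounding concurrency this way keeps throughput high for large inputs while staying under provider rate limits.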
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions

Failed to generate code suggestions for PR


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces the GeminiDenseEmbedder, significantly expanding the embedding provider options. The implementation includes robust error handling, efficient batching, and asynchronous capabilities using anyio. The configuration system has been updated to support Gemini-specific parameters and routing for non-symmetric embedding tasks. Comprehensive unit and integration tests have been added, covering various aspects like dimension handling, task type validation, batching, and error scenarios, ensuring the new functionality is well-tested and reliable. Overall, this is a well-executed feature addition.

Comment on lines +162 to +163
```python
vector = _l2_normalize(
    truncate_and_normalize(list(result.embeddings[0].values), self._dimension)
```

medium

The truncate_and_normalize function already performs L2 normalization. Calling _l2_normalize on the result is redundant, as the vector will be normalized twice. Please remove the call to _l2_normalize here.

```python
vector = truncate_and_normalize(list(result.embeddings[0].values), self._dimension)
```
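The redundancy flagged in this comment is easy to verify numerically: applying L2 normalization to a vector that is already unit-length is a no-op up to floating-point noise. A generic sketch (names here are illustrative, not the PR's exact helpers):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 length (zero vectors pass through)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

once = l2_normalize([3.0, 4.0])
twice = l2_normalize(once)
# Normalizing a second time changes nothing beyond floating-point noise.
assert max(abs(a - b) for a, b in zip(once, twice)) < 1e-12
```

This is why the PR's post-review fix removes `_l2_normalize` entirely and relies on `truncate_and_normalize` alone.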

Comment on lines +183 to +184
```python
    truncate_and_normalize(list(emb.values), self._dimension)
)
```

medium

Similar to the embed method, the truncate_and_normalize function already performs L2 normalization. Calling _l2_normalize again is redundant. Please remove the call to _l2_normalize here.

```python
vector = truncate_and_normalize(list(emb.values), self._dimension)
```

Comment on lines +220 to +221
```python
dense_vector=_l2_normalize(
    truncate_and_normalize(list(emb.values), self._dimension)
```

medium

In the async_embed_batch method, the truncate_and_normalize function already performs L2 normalization. Calling _l2_normalize again is redundant. Please remove the call to _l2_normalize here.

```python
dense_vector=truncate_and_normalize(list(emb.values), self._dimension)
```

Comment on lines 94 to 97
```python
for key in ("input_type",):
    value = data.get(key)
    if isinstance(value, str):
        data[key] = value.lower()
```

medium

The sync_provider_backend method currently only lowercases input_type. However, task_type, query_param, and document_param for Gemini are expected to be uppercase by GeminiDenseEmbedder's validation. If a user provides these values in lowercase, it will lead to a ValueError. To ensure consistency and prevent runtime errors, these fields should also be lowercased (or uppercased) here, or the GeminiDenseEmbedder should handle case-insensitivity.

Suggested change:

```diff
-for key in ("input_type",):
-    value = data.get(key)
-    if isinstance(value, str):
-        data[key] = value.lower()
+for key in ("input_type", "query_param", "document_param", "task_type"):
+    value = data.get(key)
+    if isinstance(value, str):
+        data[key] = value.upper()
```

Comment on lines +155 to +159
```python
_GEMINI_TASK_TYPES = {
    "RETRIEVAL_QUERY", "RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY",
    "CLASSIFICATION", "CLUSTERING", "QUESTION_ANSWERING",
    "FACT_VERIFICATION", "CODE_RETRIEVAL_QUERY",
}
```

medium

The _GEMINI_TASK_TYPES frozenset is duplicated here. It is already defined as _VALID_TASK_TYPES in openviking.models.embedder.gemini_embedders. To improve maintainability and ensure a single source of truth, please import _VALID_TASK_TYPES from gemini_embedders instead of redefining it.

```python
from openviking.models.embedder.gemini_embedders import _VALID_TASK_TYPES as _GEMINI_TASK_TYPES
```


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pyproject.toml (1)

82-88: ⚠️ Potential issue | 🟠 Major

pytest-asyncio version mismatch requires careful resolution.

There's a version inconsistency for pytest-asyncio:

  • [project.optional-dependencies] test (Line 82): pytest-asyncio>=0.21.0
  • [dependency-groups] dev (Line 255): pytest-asyncio>=1.3.0

However, upgrading to 1.3.0 cannot be done with a simple version bump. The codebase defines custom event_loop() fixtures in tests/conftest.py and bot/tests/conftest.py that manually manage event loops with asyncio.new_event_loop(). pytest-asyncio 1.3.0 removed the event_loop fixture entirely and overhauled event loop scoping, requiring these fixtures to be refactored to use asyncio.get_running_loop() and explicit loop_scope configuration.

Either align both to >=0.21.0 for consistency, or upgrade to >=1.3.0 after migrating the test fixtures according to the pytest-asyncio migration guide.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` around lines 82 - 88, The project has conflicting
pytest-asyncio requirements; choose one: (A) keep both declarations at
pytest-asyncio>=0.21.0 by changing the [dependency-groups] dev entry to
pytest-asyncio>=0.21.0 so current event_loop fixtures continue to work, or (B)
fully upgrade to pytest-asyncio>=1.3.0 and refactor tests/conftest.py and
bot/tests/conftest.py by removing the custom event_loop fixtures that call
asyncio.new_event_loop(), adopt the new scoped loop behavior using loop_scope
configuration and replace uses with asyncio.get_running_loop() in async tests
and fixtures, following the pytest-asyncio migration guide; update the pyproject
entries so both places list the same version chosen.
🧹 Nitpick comments (7)
tests/integration/test_gemini_e2e.py (2)

92-92: Timing assertion may cause flaky tests.

assert elapsed < 15 is network-dependent and could fail due to API latency spikes, CI resource contention, or regional network conditions. Consider either:

  1. Using a more generous threshold (e.g., 60s)
  2. Removing the timing assertion and validating only correctness
  3. Marking this test with @pytest.mark.flaky if available
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_gemini_e2e.py` at line 92, The timing assertion assert
elapsed < 15 is flaky; update the test in tests/integration/test_gemini_e2e.py
to avoid brittle timing checks by either increasing the threshold (e.g., replace
the 15s limit with 60s for elapsed), removing the timing assertion altogether
and only asserting correctness, or decorating the test with a flaky marker
(e.g., `@pytest.mark.flaky`) so transient API/CI latency won't fail the test;
modify the assertion referencing elapsed accordingly and add a brief comment
explaining the reason.

56-63: Batch vs individual comparison may have false positives.

The test compares batch results to individual results expecting sim > 0.99. However, if the Gemini API is non-deterministic (e.g., different request batching), this could occasionally fail. The threshold of 0.99 is quite strict.

Consider loosening to 0.95 or documenting that exact reproducibility is expected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_gemini_e2e.py` around lines 56 - 63, The test
test_embed_batch_matches_individual is too strict and can false-fail if
embeddings vary; update the assertion threshold from 0.99 to a looser value
(e.g., 0.95) in the loop that compares batch_results from embed_batch to
individual_results from embed using _cosine_similarity, or add a clear comment
documenting that exact reproducibility is required; change the numeric check
(sim > 0.99) to sim > 0.95 (or configurable) so the test tolerates minor
non-determinism.
openviking_cli/utils/config/embedding_config.py (2)

155-159: Duplicate task type constants across files.

_GEMINI_TASK_TYPES is defined here and _VALID_TASK_TYPES is defined in gemini_embedders.py (line 40-49). These should be consolidated to a single source of truth to prevent drift.

♻️ Proposed approach

Import from the embedder module:

```diff
+from openviking.models.embedder.gemini_embedders import _VALID_TASK_TYPES as _GEMINI_TASK_TYPES
-            _GEMINI_TASK_TYPES = {
-                "RETRIEVAL_QUERY", "RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY",
-                "CLASSIFICATION", "CLUSTERING", "QUESTION_ANSWERING",
-                "FACT_VERIFICATION", "CODE_RETRIEVAL_QUERY",
-            }
```

Or extract to a shared constants module.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openviking_cli/utils/config/embedding_config.py` around lines 155 - 159, The
duplicate task-type sets (_GEMINI_TASK_TYPES in embedding_config.py and
_VALID_TASK_TYPES in gemini_embedders.py) should be consolidated to a single
source of truth: either export the canonical set from gemini_embedders.py (e.g.,
rename or re-export _VALID_TASK_TYPES) and import it into embedding_config.py,
or move the set into a new shared constants module and import it from both
places; update references to use the single exported symbol and remove the
duplicate definition in embedding_config.py so both modules rely on the same
constant.

438-468: New embedder methods not integrated into service layer.

The get_query_embedder() and get_document_embedder() methods are defined but unused. The service layer (openviking/service/core.py, lines 101 and 247) continues to call config.embedding.get_embedder(), so context-aware embeddings for non-symmetric providers (Gemini, Jina) won't be utilized without explicit service layer updates.

Consider integrating these methods into the service layer or documenting whether this is intentional (opt-in behavior to be used elsewhere).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openviking_cli/utils/config/embedding_config.py` around lines 438 - 468, The
new methods get_query_embedder and get_document_embedder are never used; update
the service layer code that currently calls config.embedding.get_embedder() to
instead call get_query_embedder when producing/querying embeddings and
get_document_embedder when creating/indexing document embeddings (identify the
call sites by the functions that currently invoke
config.embedding.get_embedder()); ensure the service logic preserves the
fallback behavior (call get_embedder if contextual methods are unavailable) and
keep existing behavior for symmetric providers by relying on
_get_contextual_embedder's effective_context logic.
openviking/models/embedder/gemini_embedders.py (3)

186-192: Fallback logic may re-raise error for persistent API failures.

When embed_batch falls back to individual embed() calls on line 192, if the API error is persistent (e.g., 401 auth error), each individual call will also fail and raise a RuntimeError. This will abort processing mid-batch rather than gracefully handling all items.

Consider whether this is the intended behavior or if the fallback should only apply to transient errors (5xx).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openviking/models/embedder/gemini_embedders.py` around lines 186 - 192, The
fallback in embed_batch that iterates calling self.embed(text) for each item can
re-raise persistent API errors (e.g., 401) and abort processing; change
embed_batch (and the except block that catches (APIError, ClientError) as e) to
only perform per-item fallback for transient server errors (HTTP 5xx) and
re-raise for persistent client/auth errors (4xx). Concretely, inspect e.code (or
error.kind) inside the except in embed_batch: if it's a 5xx, loop and attempt
per-item embedding but wrap each self.embed(text) in its own try/except to
catch/record individual failures (log and append a sentinel or result
placeholder); if e.code is 4xx (or other non-transient), re-raise the original
exception immediately. This targets the embed_batch method and the call to
self.embed to avoid aborting mid-batch on persistent errors.

226-233: Async fallback runs sync embed() sequentially, negating concurrency benefits.

When async batch embedding fails, the fallback at line 232 runs self.embed(text) via anyio.to_thread.run_sync for each text sequentially within the list comprehension. This makes the fallback path significantly slower than the primary async path.

Consider parallelizing the fallback or documenting this as expected degraded behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openviking/models/embedder/gemini_embedders.py` around lines 226 - 233, The
fallback currently runs self.embed(text) sequentially inside a list
comprehension (via anyio.to_thread.run_sync) which loses concurrency; update the
except handler to run the per-text fallbacks in parallel by spawning concurrent
tasks (e.g., using anyio.create_task_group or anyio.from_thread.run) that each
call anyio.to_thread.run_sync(self.embed, text) and then await/join and collect
results into results[idx], so each embed call executes concurrently rather than
sequentially in the except block handling APIError/ClientError.

244-249: Use the public client.close() method instead of accessing the private _http_client attribute.

The google-genai SDK provides an official close() method on the Client class for cleanup. Directly accessing the private _http_client attribute is fragile and can break across SDK versions. Replace self.client._http_client.close() with self.client.close().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openviking/models/embedder/gemini_embedders.py` around lines 244 - 249,
Replace the private attribute access in the close routine: instead of checking
for and calling self.client._http_client.close(), call the public
self.client.close() method; update the close method in the Gemini embedder class
(the method named close that currently references self.client and _http_client)
to invoke self.client.close() inside a try/except to handle errors and remove
reliance on the private _http_client attribute.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@openviking_cli/utils/config/embedding_config.py`:
- Line 58: The Field description for the pydantic variable sk is missing a space
("Secretfor"); update the description string in the Field(...) for sk to read
"Access Key Secret for VikingDB API" so the text is corrected; locate the sk
Field declaration in embedding_config.py and adjust only the description value.

In `@openviking/models/embedder/__init__.py`:
- Line 15: The package docstring in openviking.models.embedder.__init__.py
incorrectly claims "Google Gemini: Dense only (multimodal)"; make it consistent
with the implementation by removing or correcting the "multimodal" note (or
explicitly state "not multimodal") so it matches
GeminiDenseEmbedder.supports_multimodal in gemini_embedders.py; update the
__init__ module docstring text to reflect that GeminiDenseEmbedder is dense-only
and does not support multimodal inputs.

In `@openviking/models/embedder/gemini_embedders.py`:
- Around line 169-193: The embed_batch signature must match
DenseEmbedderBase.embed_batch by adding the is_query: bool = False parameter;
update embed_batch(self, texts: List[str], is_query: bool = False) and pass
is_query into the client call (client.models.embed_content(...,
is_query=is_query, ...)) and into the fallback calls to self.embed(text) (i.e.,
self.embed(text, is_query=is_query)) so the flag is propagated consistently;
adjust any internal uses of is_query if needed.
- Around line 154-167: The embed method in gemini_embedders.py must match the
base class DenseEmbedderBase signature to avoid TypeError; change the method
signature of embed (the function named embed in this file) to include the
missing parameter is_query: bool = False (i.e., def embed(self, text: str,
is_query: bool = False) -> EmbedResult) and keep the existing implementation
unchanged (or pass the flag through to the client if required later) so the
method is polymorphically compatible with DenseEmbedderBase.embed.

In `@tests/integration/conftest.py`:
- Around line 118-123: vectordb_engine_available() currently swallows all
exceptions; change it to only treat import/attribute-not-found as "engine
unavailable" so real syntax/runtime errors surface. In the function
(vectordb_engine_available) import openviking.storage.vectordb.engine and check
that PersistStore and VolatileStore are types, but replace the broad "except
Exception" with a narrow catch for ImportError, ModuleNotFoundError and
AttributeError (or similar import/lookup errors) and return False in those
cases, and re-raise any other exceptions so syntax/runtime failures are not
hidden.
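The narrowed-catch pattern can be shown generically (a sketch: an importlib-based helper stands in for the conftest function, so module and class names here are arbitrary):

```python
import importlib

def engine_available(module_name, *class_names):
    # Only import/lookup failures mean "unavailable"; anything else
    # (syntax errors, runtime failures inside the module) propagates.
    try:
        mod = importlib.import_module(module_name)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return False
    try:
        return all(isinstance(getattr(mod, name), type) for name in class_names)
    except AttributeError:
        return False
```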

In `@tests/integration/test_gemini_embedding_it.py`:
- Around line 90-95: The test test_invalid_api_key_error_message currently
hardcodes a fake API key that triggers secret scanners; instead construct the
bogus key at runtime (e.g., concatenate fragments or join a list) before passing
it to GeminiDenseEmbedder so the value is identical at runtime but not present
as a contiguous secret in source. Update the instantiation line that creates bad
= GeminiDenseEmbedder(..., api_key="INVALID_KEY_XYZZY_123") to build the string
dynamically and leave the rest of the test (the pytest.raises(RuntimeError,
match="Invalid API key") and bad.embed("hello")) unchanged.
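Building the bogus key at runtime might look like this; the runtime value is identical, only the source representation changes:

```python
# Assemble the fake key from fragments so no contiguous secret-like
# literal appears in source for scanners to flag.
fake_key = "_".join(["INVALID", "KEY", "XYZZY", "123"])

# The runtime value matches what the test previously hardcoded.
assert fake_key == "INVALID_KEY_XYZZY_123"
```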

In `@tests/integration/test_gemini_openviking_it.py`:
- Around line 142-160: The test currently only verifies add/find but not that
the requested embedding dimension is actually used; update
test_dimension_variant_add_search to assert the active embedder/index schema
dimension after client creation (e.g., inspect the client's embedder instance or
index schema) so the requested dim is honored—locate the client returned by
make_ov_client and check its embedder class/attribute (referencing
GeminiDenseEmbedder in openviking/models/embedder/gemini_embedders.py) or the
index schema’s vector dimension property and assert it equals dim before
proceeding with add_resource and search.
- Around line 167-186: The test_session_search_smoke creates a session via
session = client.session(session_id="gemini_it_session") and adds a message but
then calls client.find(...) which ignores that session and asserts result.total
>= 0 (a tautology); either exercise the session-scoped retrieval API or tighten
the assertion. Fix by invoking the session-aware find/search method (e.g.,
session.find(...) or client.find(..., session_id="gemini_it_session") depending
on API) so the added message is used, and change the assertion to a meaningful
check such as result.total > 0 or assert that returned hits relate to the added
resource; update references in the test (test_session_search_smoke, session,
client.find) accordingly.

In `@tests/unit/test_gemini_embedder.py`:
- Line 180: Tests are using the AnyIO marker `@pytest.mark.anyio` which conflicts
with pytest-asyncio in auto mode; either remove all occurrences of
`@pytest.mark.anyio` from the async test functions in test_gemini_embedder.py (so
pytest-asyncio auto-detection runs them) or replace those markers with
`@pytest.mark.asyncio` and switch pytest-asyncio configuration to
asyncio_mode="strict" in pyproject.toml; search for the decorator occurrences
(e.g., the `@pytest.mark.anyio` lines around the async test functions) and update
the decorators and/or the pyproject.toml asyncio_mode accordingly.

---

Outside diff comments:
In `@pyproject.toml`:
- Around line 82-88: The project has conflicting pytest-asyncio requirements;
choose one: (A) keep both declarations at pytest-asyncio>=0.21.0 by changing the
[dependency-groups] dev entry to pytest-asyncio>=0.21.0 so current event_loop
fixtures continue to work, or (B) fully upgrade to pytest-asyncio>=1.3.0 and
refactor tests/conftest.py and bot/tests/conftest.py by removing the custom
event_loop fixtures that call asyncio.new_event_loop(), adopt the new scoped
loop behavior using loop_scope configuration and replace uses with
asyncio.get_running_loop() in async tests and fixtures, following the
pytest-asyncio migration guide; update the pyproject entries so both places list
the same version chosen.

---

Nitpick comments:
In `@openviking_cli/utils/config/embedding_config.py`:
- Around line 155-159: The duplicate task-type sets (_GEMINI_TASK_TYPES in
embedding_config.py and _VALID_TASK_TYPES in gemini_embedders.py) should be
consolidated to a single source of truth: either export the canonical set from
gemini_embedders.py (e.g., rename or re-export _VALID_TASK_TYPES) and import it
into embedding_config.py, or move the set into a new shared constants module and
import it from both places; update references to use the single exported symbol
and remove the duplicate definition in embedding_config.py so both modules rely
on the same constant.
- Around line 438-468: The new methods get_query_embedder and
get_document_embedder are never used; update the service layer code that
currently calls config.embedding.get_embedder() to instead call
get_query_embedder when producing/querying embeddings and get_document_embedder
when creating/indexing document embeddings (identify the call sites by the
functions that currently invoke config.embedding.get_embedder()); ensure the
service logic preserves the fallback behavior (call get_embedder if contextual
methods are unavailable) and keep existing behavior for symmetric providers by
relying on _get_contextual_embedder's effective_context logic.
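The routing-with-fallback described above could take a shape like this in the service layer (a sketch under stated assumptions: the getter names mirror the review's description, and the stub config/embedder interfaces are hypothetical):

```python
def embed_for(config, role, text):
    # Prefer the contextual embedder when the config exposes one,
    # falling back to the generic get_embedder() otherwise.
    getter = {
        "query": getattr(config, "get_query_embedder", None),
        "document": getattr(config, "get_document_embedder", None),
    }.get(role)
    embedder = getter() if getter else config.get_embedder()
    return embedder.embed(text, is_query=(role == "query"))
```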

In `@openviking/models/embedder/gemini_embedders.py`:
- Around line 186-192: The fallback in embed_batch that iterates calling
self.embed(text) for each item can re-raise persistent API errors (e.g., 401)
and abort processing; change embed_batch (and the except block that catches
(APIError, ClientError) as e) to only perform per-item fallback for transient
server errors (HTTP 5xx) and re-raise for persistent client/auth errors (4xx).
Concretely, inspect e.code (or error.kind) inside the except in embed_batch: if
it's a 5xx, loop and attempt per-item embedding but wrap each self.embed(text)
in its own try/except to catch/record individual failures (log and append a
sentinel or result placeholder); if e.code is 4xx (or other non-transient),
re-raise the original exception immediately. This targets the embed_batch method
and the call to self.embed to avoid aborting mid-batch on persistent errors.
- Around line 226-233: The fallback currently runs self.embed(text) sequentially
inside a list comprehension (via anyio.to_thread.run_sync) which loses
concurrency; update the except handler to run the per-text fallbacks in parallel
by spawning concurrent tasks (e.g., using anyio.create_task_group or
anyio.from_thread.run) that each call anyio.to_thread.run_sync(self.embed, text)
and then await/join and collect results into results[idx], so each embed call
executes concurrently rather than sequentially in the except block handling
APIError/ClientError.
- Around line 244-249: Replace the private attribute access in the close
routine: instead of checking for and calling self.client._http_client.close(),
call the public self.client.close() method; update the close method in the
Gemini embedder class (the method named close that currently references
self.client and _http_client) to invoke self.client.close() inside a try/except
to handle errors and remove reliance on the private _http_client attribute.

In `@tests/integration/test_gemini_e2e.py`:
- Line 92: The timing assertion assert elapsed < 15 is flaky; update the test in
tests/integration/test_gemini_e2e.py to avoid brittle timing checks by either
increasing the threshold (e.g., replace the 15s limit with 60s for elapsed),
removing the timing assertion altogether and only asserting correctness, or
decorating the test with a flaky marker (e.g., `@pytest.mark.flaky`) so transient
API/CI latency won't fail the test; modify the assertion referencing elapsed
accordingly and add a brief comment explaining the reason.
- Around line 56-63: The test test_embed_batch_matches_individual is too strict
and can false-fail if embeddings vary; update the assertion threshold from 0.99
to a looser value (e.g., 0.95) in the loop that compares batch_results from
embed_batch to individual_results from embed using _cosine_similarity, or add a
clear comment documenting that exact reproducibility is required; change the
numeric check (sim > 0.99) to sim > 0.95 (or configurable) so the test tolerates
minor non-determinism.
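The loosened comparison can be sketched as follows (the helper mirrors what the test's `_cosine_similarity` presumably computes; the vectors are illustrative, not real embeddings):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# 0.95 tolerates minor non-determinism between batch and per-item
# embeddings while still catching gross mismatches.
SIM_THRESHOLD = 0.95
batch_vec, single_vec = [1.0, 0.0], [0.99, 0.01]
assert cosine_similarity(batch_vec, single_vec) > SIM_THRESHOLD
```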

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5e01e6e7-9faa-44c8-abdb-8d6be2909a5f

📥 Commits

Reviewing files that changed from the base of the PR and between 1efb9fb and 4fb7452.

📒 Files selected for processing (11)
  • examples/ov.conf.example
  • openviking/models/embedder/__init__.py
  • openviking/models/embedder/gemini_embedders.py
  • openviking_cli/utils/config/embedding_config.py
  • pyproject.toml
  • tests/integration/conftest.py
  • tests/integration/test_gemini_e2e.py
  • tests/integration/test_gemini_embedding_it.py
  • tests/integration/test_gemini_openviking_it.py
  • tests/unit/test_embedding_config_gemini.py
  • tests/unit/test_gemini_embedder.py

"""Unit tests for async_embed_batch (uses AsyncMock, no real API)."""

@patch("openviking.models.embedder.gemini_embedders.genai.Client")
@pytest.mark.anyio

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

pytest-asyncio asyncio_mode auto vs pytest.mark.anyio compatibility

💡 Result:

pytest-asyncio's asyncio_mode=auto automatically detects and runs all async test functions (adding @pytest.mark.asyncio implicitly) and takes ownership of all async fixtures. This is ideal for pure asyncio projects but conflicts with pytest.mark.anyio from the AnyIO pytest plugin. pytest.mark.anyio marks specific tests to run with AnyIO's anyio_backend fixture, supporting multiple backends (asyncio, trio, etc.). It requires explicit marking or anyio_mode=auto.

Compatibility:
- In asyncio_mode=auto (pytest-asyncio), it conflicts with AnyIO's plugin: pytest-asyncio grabs all async tests/fixtures first, preventing pytest.mark.anyio from working properly (AnyIO docs explicitly warn: "This does not work if pytest-asyncio is installed and configured to use its own auto mode").
- Solution for coexistence: use pytest-asyncio's default strict mode (no asyncio_mode, or asyncio_mode=strict). Then explicitly mark asyncio tests with @pytest.mark.asyncio and AnyIO tests with @pytest.mark.anyio. Strict mode only handles explicitly marked asyncio tests, allowing other plugins like AnyIO to handle their marked tests.

Key differences:
- asyncio_mode=auto: simplest for asyncio-only suites, no markers needed, but no multi-backend or AnyIO compatibility.
- Strict + markers: more verbose, but enables mixing asyncio-specific and AnyIO/multi-backend tests in the same suite.

Current versions (pytest-asyncio ~1.3.0, AnyIO latest as of 2026): behavior unchanged per docs and issues.


🏁 Script executed:

# Find and check pyproject.toml for asyncio configuration
find . -name "pyproject.toml" -type f | head -5

Repository: chethanuk/OpenViking

Length of output: 238


🏁 Script executed:

# Check if pytest-anyio is mentioned in dependencies
if [ -f pyproject.toml ]; then
  cat pyproject.toml | grep -A 20 -B 5 "asyncio\|anyio\|pytest" | head -80
fi

Repository: chethanuk/OpenViking

Length of output: 1850


🏁 Script executed:

# Check for requirements files
ls -la | grep -i req

Repository: chethanuk/OpenViking

Length of output: 46


🏁 Script executed:

# Check the test file for anyio markers at the specified lines
sed -n '175,250p' tests/unit/test_gemini_embedder.py | cat -n

Repository: chethanuk/OpenViking

Length of output: 4716


Replace @pytest.mark.anyio with @pytest.mark.asyncio or remove markers entirely.

The project uses pytest-asyncio>=0.21.0 with asyncio_mode = "auto" in pyproject.toml. This configuration is incompatible with @pytest.mark.anyio: pytest-asyncio's auto mode automatically detects and runs all async tests, preventing AnyIO's plugin from recognizing @pytest.mark.anyio markers (as explicitly documented in AnyIO's testing guide).

Choose one solution:

  1. Remove all @pytest.mark.anyio markers—auto mode will detect and run async tests automatically.
  2. Switch asyncio_mode to "strict" in pyproject.toml and replace @pytest.mark.anyio with @pytest.mark.asyncio for explicit pytest-asyncio compatibility.

Applies to lines: 180, 195, 215, 232, 240
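Option 2 as a pyproject.toml fragment (a sketch of the strict-mode configuration, not the project's current file):

```toml
[tool.pytest.ini_options]
asyncio_mode = "strict"
# Async tests then need explicit markers:
#   @pytest.mark.asyncio -> run by pytest-asyncio
#   @pytest.mark.anyio   -> run by the AnyIO plugin
```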

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_gemini_embedder.py` at line 180, Tests are using the AnyIO
marker `@pytest.mark.anyio` which conflicts with pytest-asyncio in auto mode;
either remove all occurrences of `@pytest.mark.anyio` from the async test
functions in test_gemini_embedder.py (so pytest-asyncio auto-detection runs
them) or replace those markers with `@pytest.mark.asyncio` and switch
pytest-asyncio configuration to asyncio_mode="strict" in pyproject.toml; search
for the decorator occurrences (e.g., the `@pytest.mark.anyio` lines around the
async test functions) and update the decorators and/or the pyproject.toml
asyncio_mode accordingly.

chethanuk added a commit that referenced this pull request Mar 18, 2026
- embed()/embed_batch(): add is_query: bool = False to match base class signature
- embed_batch() fallback: pass is_query through to self.embed()
- Remove redundant _l2_normalize() calls in embed/embed_batch/async_embed_batch;
  Gemini API already returns L2-normalized vectors, truncate_and_normalize is sufficient
- Remove now-dead _l2_normalize() function and unused math import
- embedding_config.py: fix typo "Secretfor" → "Secret for" in sk field description
- conftest.py: narrow broad except Exception to (ImportError, ModuleNotFoundError, AttributeError)
- test_gemini_embedding_it.py: construct fake API key dynamically to avoid hardcoded secret pattern
- test_gemini_openviking_it.py: add dimension assertion in test_dimension_variant_add_search;
  fix tautology assertion (>= 0 → > 0) in test_session_search_smoke
@chethanuk chethanuk force-pushed the feat/gemini-landing branch 2 times, most recently from 7d10057 to df86558 on March 18, 2026 at 22:18
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
tests/integration/test_gemini_openviking_it.py (2)

185-189: ⚠️ Potential issue | 🟠 Major

Use the session in the search you assert.

The stronger > 0 check helps, but client.find(...) still ignores the session created on Lines 185-186. This remains a plain find smoke test, so a regression in session-scoped retrieval would still slip through.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_gemini_openviking_it.py` around lines 185 - 189, The
test creates a session object (session =
client.session(session_id="gemini_it_session") and adds a message, but then
calls client.find(...) without scoping to that session; update the find call to
use the created session (pass the session object or its session_id) so the
search is performed against the session-scoped data (e.g., client.find(...,
session=session) or client.find(..., session_id="gemini_it_session") instead of
the current call to client.find(query="pytest testing tool")).

149-151: ⚠️ Potential issue | 🟠 Major

Assert the runtime dimension, not the config echo.

OpenVikingConfigSingleton.instance().embedding.dimension is still derived from config, so this test stays green even if the factory drops dimension or the active embedder/index schema falls back to a different size. Please inspect the created embedder or the index schema directly here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_gemini_openviking_it.py` around lines 149 - 151, The
test currently asserts OpenVikingConfigSingleton.instance().embedding.dimension
(a config echo) so change it to assert the runtime embedder/index schema
dimension instead: after creating the embedder or index in the test, obtain the
actual runtime object (e.g. the created embedder instance or the index schema
from the component under test) and assert its reported dimension equals dim
rather than checking OpenVikingConfigSingleton.instance().embedding.dimension;
locate uses of OpenVikingConfigSingleton.instance() in the test and replace the
assertion with a check against the embedder.dimension or index.schema.dimension
on the actual created object.
🧹 Nitpick comments (1)
tests/integration/conftest.py (1)

173-178: Close the module-scoped Gemini client in fixture teardown.

This fixture creates a real GeminiDenseEmbedder and never calls close(), so the underlying HTTP client stays open for the life of the test process. Switching to yield keeps the live suite cleaner and less flaky.

♻️ Suggested change
 `@pytest.fixture`(scope="module", params=GEMINI_MODELS_FIXTURE)
 def gemini_embedder(request):
     """Module-scoped GeminiDenseEmbedder at dim=768, parametrized over known models."""
     from openviking.models.embedder.gemini_embedders import GeminiDenseEmbedder
     model_name, _, _ = request.param
-    return GeminiDenseEmbedder(model_name, api_key=GOOGLE_API_KEY, dimension=768)
+    embedder = GeminiDenseEmbedder(model_name, api_key=GOOGLE_API_KEY, dimension=768)
+    yield embedder
+    embedder.close()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/conftest.py` around lines 173 - 178, The gemini_embedder
fixture creates a GeminiDenseEmbedder but never closes it; change the
module-scoped fixture "gemini_embedder" to use yield instead of return:
instantiate GeminiDenseEmbedder(model_name, api_key=GOOGLE_API_KEY,
dimension=768), yield the instance, and after yield call instance.close() in the
teardown so the underlying HTTP client is properly closed; reference the fixture
name gemini_embedder and the class GeminiDenseEmbedder to locate the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@openviking_cli/utils/config/embedding_config.py`:
- Around line 357-368: The builder for ("gemini", "dense") currently constructs
GeminiDenseEmbedder without passing EmbeddingConfig.max_concurrent, so changes
to cfg.max_concurrent have no effect; update the factory lambda that builds
GeminiDenseEmbedder to include "max_concurrent": cfg.max_concurrent (or the
appropriate cfg field name) in the kwargs passed to GeminiDenseEmbedder so the
embedder receives and uses the configured concurrency limit. Ensure the key name
matches the GeminiDenseEmbedder constructor parameter and keep the rest of the
existing keys ("model_name", "api_key", "dimension", "task_type") intact.

In `@openviking/models/embedder/gemini_embedders.py`:
- Around line 140-145: The class currently sets a per-model token ceiling in
_token_limit but never enforces it before calling the Gemini SDK; update the
request paths (the embed() and embed_batch() methods in the same class, which
construct requests using self._embed_config and send contents to Gemini) to
validate and enforce self._token_limit: for single-item embed(), check the token
count of the input and either chunk, truncate, or raise a clear local error when
it exceeds _token_limit; for embed_batch(), perform the same per-item check and
also split oversized items into multiple requests or reject with a descriptive
exception. Ensure the enforcement uses the existing _token_limit (fallbacking to
_DEFAULT_TOKEN_LIMIT) and happens before any network call so oversized payloads
never go over the wire.

---

Duplicate comments:
In `@tests/integration/test_gemini_openviking_it.py`:
- Around line 185-189: The test creates a session object (session =
client.session(session_id="gemini_it_session") and adds a message, but then
calls client.find(...) without scoping to that session; update the find call to
use the created session (pass the session object or its session_id) so the
search is performed against the session-scoped data (e.g., client.find(...,
session=session) or client.find(..., session_id="gemini_it_session") instead of
the current call to client.find(query="pytest testing tool")).
- Around line 149-151: The test currently asserts
OpenVikingConfigSingleton.instance().embedding.dimension (a config echo) so
change it to assert the runtime embedder/index schema dimension instead: after
creating the embedder or index in the test, obtain the actual runtime object
(e.g. the created embedder instance or the index schema from the component under
test) and assert its reported dimension equals dim rather than checking
OpenVikingConfigSingleton.instance().embedding.dimension; locate uses of
OpenVikingConfigSingleton.instance() in the test and replace the assertion with
a check against the embedder.dimension or index.schema.dimension on the actual
created object.

---

Nitpick comments:
In `@tests/integration/conftest.py`:
- Around line 173-178: The gemini_embedder fixture creates a GeminiDenseEmbedder
but never closes it; change the module-scoped fixture "gemini_embedder" to use
yield instead of return: instantiate GeminiDenseEmbedder(model_name,
api_key=GOOGLE_API_KEY, dimension=768), yield the instance, and after yield call
instance.close() in the teardown so the underlying HTTP client is properly
closed; reference the fixture name gemini_embedder and the class
GeminiDenseEmbedder to locate the code.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4ee682f8-b2b7-4274-8e24-320796a70f6e

📥 Commits

Reviewing files that changed from the base of the PR and between 4fb7452 and 7d10057.

📒 Files selected for processing (5)
  • openviking/models/embedder/gemini_embedders.py
  • openviking_cli/utils/config/embedding_config.py
  • tests/integration/conftest.py
  • tests/integration/test_gemini_embedding_it.py
  • tests/integration/test_gemini_openviking_it.py

Comment on lines +357 to +368
        ("gemini", "dense"): (
            GeminiDenseEmbedder,
            lambda cfg: {
                "model_name": cfg.model,
                "api_key": cfg.api_key,
                "dimension": cfg.dimension,
                "task_type": (
                    cfg.query_param if context == "query" and cfg.query_param
                    else cfg.document_param if context == "document" and cfg.document_param
                    else cfg.task_type
                ),
            },

⚠️ Potential issue | 🟡 Minor

max_concurrent never reaches Gemini’s async batch limiter.

The config model exposes EmbeddingConfig.max_concurrent, but this builder always instantiates GeminiDenseEmbedder with its constructor default. Changing the config will not actually throttle or widen Gemini batch concurrency.

♻️ Suggested wiring
             ("gemini", "dense"): (
                 GeminiDenseEmbedder,
                 lambda cfg: {
                     "model_name": cfg.model,
                     "api_key": cfg.api_key,
                     "dimension": cfg.dimension,
+                    "max_concurrent_batches": self.max_concurrent,
                     "task_type": (
                         cfg.query_param if context == "query" and cfg.query_param
                         else cfg.document_param if context == "document" and cfg.document_param
                         else cfg.task_type
                     ),
                 },
             ),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
         ("gemini", "dense"): (
             GeminiDenseEmbedder,
             lambda cfg: {
                 "model_name": cfg.model,
                 "api_key": cfg.api_key,
                 "dimension": cfg.dimension,
+                "max_concurrent_batches": self.max_concurrent,
                 "task_type": (
                     cfg.query_param if context == "query" and cfg.query_param
                     else cfg.document_param if context == "document" and cfg.document_param
                     else cfg.task_type
                 ),
             },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openviking_cli/utils/config/embedding_config.py` around lines 357 - 368, The
builder for ("gemini", "dense") currently constructs GeminiDenseEmbedder without
passing EmbeddingConfig.max_concurrent, so changes to cfg.max_concurrent have no
effect; update the factory lambda that builds GeminiDenseEmbedder to include
"max_concurrent": cfg.max_concurrent (or the appropriate cfg field name) in the
kwargs passed to GeminiDenseEmbedder so the embedder receives and uses the
configured concurrency limit. Ensure the key name matches the
GeminiDenseEmbedder constructor parameter and keep the rest of the existing keys
("model_name", "api_key", "dimension", "task_type") intact.

Comment on lines +140 to +145
        self._token_limit = _MODEL_TOKEN_LIMITS.get(model_name, _DEFAULT_TOKEN_LIMIT)
        self._max_concurrent_batches = max_concurrent_batches
        config_kwargs: Dict[str, Any] = {"output_dimensionality": self._dimension}
        if self.task_type:
            config_kwargs["task_type"] = self.task_type
        self._embed_config = types.EmbedContentConfig(**config_kwargs)

⚠️ Potential issue | 🟠 Major

Enforce the configured token limit before hitting Gemini.

Line 140 stores the per-model ceiling, but none of the request paths use it before sending contents to the SDK. Direct embed() / embed_batch() calls on oversized inputs will still go over the wire and fail remotely instead of being chunked, truncated, or rejected locally with a clear error.
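A pre-flight guard along these lines could sit in front of both request paths. This is a sketch under a stated assumption: whitespace word count stands in for the model tokenizer, which a real implementation would use instead.

```python
def enforce_token_limit(text, limit, mode="error"):
    # Approximate token count via whitespace split (assumption: a real
    # guard would count tokens with the model tokenizer). Runs before
    # any network call so oversized payloads never go over the wire.
    tokens = text.split()
    if len(tokens) <= limit:
        return text
    if mode == "truncate":
        return " ".join(tokens[:limit])
    raise ValueError(f"input exceeds token limit ({len(tokens)} > {limit})")
```

`embed()` would call this once per input; `embed_batch()` would apply it per item, splitting or rejecting oversized entries with a descriptive error.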

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openviking/models/embedder/gemini_embedders.py` around lines 140 - 145, The
class currently sets a per-model token ceiling in _token_limit but never
enforces it before calling the Gemini SDK; update the request paths (the embed()
and embed_batch() methods in the same class, which construct requests using
self._embed_config and send contents to Gemini) to validate and enforce
self._token_limit: for single-item embed(), check the token count of the input
and either chunk, truncate, or raise a clear local error when it exceeds
_token_limit; for embed_batch(), perform the same per-item check and also split
oversized items into multiple requests or reject with a descriptive exception.
Ensure the enforcement uses the existing _token_limit (fallbacking to
_DEFAULT_TOKEN_LIMIT) and happens before any network call so oversized payloads
never go over the wire.

@chethanuk chethanuk force-pushed the feat/gemini-landing branch from df86558 to 148f6e3 on March 18, 2026 at 22:27
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
openviking_cli/utils/config/embedding_config.py (1)

278-291: ⚠️ Potential issue | 🟠 Major

Wire input_type into the OpenAI dense factory.

EmbeddingModelConfig.input_type is added and normalized above, but this builder never forwards it to OpenAIDenseEmbedder. In the symmetric OpenAI path that makes input_type a silent no-op, so configs cannot actually force query / document / passage behavior here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openviking_cli/utils/config/embedding_config.py` around lines 278 - 291, The
mapping for ("openai","dense") doesn't forward EmbeddingModelConfig.input_type
to OpenAIDenseEmbedder, so input_type is ignored; update the lambda that builds
kwargs for OpenAIDenseEmbedder to include "input_type": cfg.input_type (or
normalized property name if different) alongside model_name, api_key, api_base,
dimension, etc., so OpenAIDenseEmbedder receives and can act on the configured
input_type; locate the dict in the ("openai","dense") branch where
OpenAIDenseEmbedder and the lambda are returned and add the input_type key to
that kwargs dictionary.
♻️ Duplicate comments (3)
tests/integration/test_gemini_openviking_it.py (2)

205-209: ⚠️ Potential issue | 🟠 Major

The session setup is still outside the asserted behavior.

client.find() on Lines 208-209 does not consume the session or the message added on Line 206, so this test only covers plain embedding search. Use the session-aware retrieval path here or drop the unused session scaffolding.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_gemini_openviking_it.py` around lines 205 - 209, The
test creates a session via client.session(...) and adds a message with
session.add_message(...) but then calls client.find(...) without using that
session; either remove the unused session setup or switch to the session-aware
retrieval API. Fix by either deleting the session/session.add_message lines, or
call the session-aware finder (e.g., pass the session into the find call like
client.find(session=session, query="pytest testing tool") or use whatever
session.find(...) helper exists) so the added message is actually consumed by
the retrieval assertion.

163-168: ⚠️ Potential issue | 🟠 Major

This still only verifies the configured dimension, not the runtime one.

OpenVikingConfigSingleton.instance().embedding.dimension on Lines 166-168 just re-reads the requested config value. The test still goes green if the live embedder or index schema later falls back to 3072, so please assert the runtime dimension instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_gemini_openviking_it.py` around lines 163 - 168,
Replace the config-only assertion with a runtime check: after creating client
via make_ov_client(...), inspect the actual embedder/index created by the client
(e.g., client.embedder.dimension or client._embedder.dimension, or the index
schema such as client.index.schema.embedding_dimension) and assert that that
runtime dimension equals dim instead of using
OpenVikingConfigSingleton.instance().embedding.dimension; update the test to
pull the dimension from the live client/embedder/index and assert equality to
dim.
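A toy illustration of why the config-only check is weak (stub class, not the real embedder; the silent 3072 fallback is the failure mode the review warns about):

```python
class StubEmbedder:
    """Toy model of a provider that silently falls back to 3072 dims."""
    def __init__(self, requested_dim: int, supported: set):
        self.dimension = requested_dim if requested_dim in supported else 3072

config_dimension = 768                      # what the config singleton echoes back
embedder = StubEmbedder(768, {128, 256, 3072})
# The config-only assertion passes even though the live embedder fell back:
assert config_dimension == 768
# A runtime assertion against the live object catches the regression:
runtime_mismatch = embedder.dimension != config_dimension
```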
openviking_cli/utils/config/embedding_config.py (1)

370-383: ⚠️ Potential issue | 🟠 Major

Gemini still ignores the configured concurrency limit.

EmbeddingConfig.max_concurrent on Line 225 is the public throttle, but this lambda always instantiates GeminiDenseEmbedder with its constructor default. Any override is ignored, which makes rate-limit tuning and throughput control ineffective.

♻️ Suggested change
             ("gemini", "dense"): (
                 GeminiDenseEmbedder,
                 lambda cfg: {
                     "model_name": cfg.model,
                     "api_key": cfg.api_key,
                     "dimension": cfg.dimension,
+                    "max_concurrent_batches": self.max_concurrent,
                     "task_type": (
                         cfg.query_param
                         if context == "query" and cfg.query_param
                         else cfg.document_param
                         if context == "document" and cfg.document_param
                         else cfg.task_type
                     ),
                 },
             ),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openviking_cli/utils/config/embedding_config.py` around lines 370 - 383, The
mapping for ("gemini", "dense") ignores EmbeddingConfig.max_concurrent by not
passing it into GeminiDenseEmbedder; update the lambda that constructs
GeminiDenseEmbedder to include the concurrency override (use cfg.max_concurrent
or EmbeddingConfig.max_concurrent as exposed on cfg) in the args passed to
GeminiDenseEmbedder so the instance respects the configured throttle, e.g., add
a "max_concurrent" (or the exact constructor param name used by
GeminiDenseEmbedder) entry to the returned dict alongside model_name, api_key,
dimension and task_type.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/conftest.py`:
- Around line 182-188: The fixture gemini_embedder returns a live
GeminiDenseEmbedder instance but never closes it; change the fixture to use a
yield-based teardown so after yielding the GeminiDenseEmbedder (created in
gemini_embedder) you call its close method (e.g., embedder.close()) in the
teardown to release SDK clients/sockets; locate the fixture function
gemini_embedder and replace the direct return with yield and a post-yield
cleanup that calls GeminiDenseEmbedder.close() (or the appropriate shutdown
method on that class) to ensure the client is closed for each parametrized
module.

In `@tests/integration/test_gemini_e2e.py`:
- Around line 83-90: Remove the hard latency assertion that flakes against the
live Gemini API: delete or guard the lines that record t0, compute elapsed, and
assert elapsed < 15 (related to variables t0 and elapsed) in the
GeminiDenseEmbedder async_embed_batch test; keep the functional assertions
(len(results) == 300 and vector length checks). If you want to keep a timing
check, move it to a separate perf/benchmark test or gate it behind an opt-in env
flag (e.g., RUN_PERF) so the integration test only verifies correctness and not
wall-clock SLA.

---

Outside diff comments:
In `@openviking_cli/utils/config/embedding_config.py`:
- Around line 278-291: The mapping for ("openai","dense") doesn't forward
EmbeddingModelConfig.input_type to OpenAIDenseEmbedder, so input_type is
ignored; update the lambda that builds kwargs for OpenAIDenseEmbedder to include
"input_type": cfg.input_type (or normalized property name if different)
alongside model_name, api_key, api_base, dimension, etc., so OpenAIDenseEmbedder
receives and can act on the configured input_type; locate the dict in the
("openai","dense") branch where OpenAIDenseEmbedder and the lambda are returned
and add the input_type key to that kwargs dictionary.

---

Duplicate comments:
In `@openviking_cli/utils/config/embedding_config.py`:
- Around line 370-383: The mapping for ("gemini", "dense") ignores
EmbeddingConfig.max_concurrent by not passing it into GeminiDenseEmbedder;
update the lambda that constructs GeminiDenseEmbedder to include the concurrency
override (use cfg.max_concurrent or EmbeddingConfig.max_concurrent as exposed on
cfg) in the args passed to GeminiDenseEmbedder so the instance respects the
configured throttle, e.g., add a "max_concurrent" (or the exact constructor
param name used by GeminiDenseEmbedder) entry to the returned dict alongside
model_name, api_key, dimension and task_type.

In `@tests/integration/test_gemini_openviking_it.py`:
- Around line 205-209: The test creates a session via client.session(...) and
adds a message with session.add_message(...) but then calls client.find(...)
without using that session; either remove the unused session setup or switch to
the session-aware retrieval API. Fix by either deleting the
session/session.add_message lines, or call the session-aware finder (e.g., pass
the session into the find call like client.find(session=session, query="pytest
testing tool") or use whatever session.find(...) helper exists) so the added
message is actually consumed by the retrieval assertion.
- Around line 163-168: Replace the config-only assertion with a runtime check:
after creating client via make_ov_client(...), inspect the actual embedder/index
created by the client (e.g., client.embedder.dimension or
client._embedder.dimension, or the index schema such as
client.index.schema.embedding_dimension) and assert that that runtime dimension
equals dim instead of using
OpenVikingConfigSingleton.instance().embedding.dimension; update the test to
pull the dimension from the live client/embedder/index and assert equality to
dim.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a0b79044-2113-46c2-9d65-dd5b53901d07

📥 Commits

Reviewing files that changed from the base of the PR and between 7d10057 and df86558.

📒 Files selected for processing (8)
  • openviking/models/embedder/gemini_embedders.py
  • openviking_cli/utils/config/embedding_config.py
  • tests/integration/conftest.py
  • tests/integration/test_gemini_e2e.py
  • tests/integration/test_gemini_embedding_it.py
  • tests/integration/test_gemini_openviking_it.py
  • tests/unit/test_embedding_config_gemini.py
  • tests/unit/test_gemini_embedder.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/unit/test_gemini_embedder.py
  • openviking/models/embedder/gemini_embedders.py
  • tests/unit/test_embedding_config_gemini.py

Comment on lines +182 to +188
@pytest.fixture(scope="module", params=GEMINI_MODELS_FIXTURE)
def gemini_embedder(request):
"""Module-scoped GeminiDenseEmbedder at dim=768, parametrized over known models."""
from openviking.models.embedder.gemini_embedders import GeminiDenseEmbedder

model_name, _, _ = request.param
return GeminiDenseEmbedder(model_name, api_key=GOOGLE_API_KEY, dimension=768)

⚠️ Potential issue | 🟡 Minor

Tear down the module-scoped Gemini client.

This fixture returns a live GeminiDenseEmbedder but never closes it. Because it is parametrized over models and reused across tests, the suite can accumulate open SDK clients/sockets until process exit.

♻️ Suggested change
 @pytest.fixture(scope="module", params=GEMINI_MODELS_FIXTURE)
 def gemini_embedder(request):
     """Module-scoped GeminiDenseEmbedder at dim=768, parametrized over known models."""
     from openviking.models.embedder.gemini_embedders import GeminiDenseEmbedder
 
     model_name, _, _ = request.param
-    return GeminiDenseEmbedder(model_name, api_key=GOOGLE_API_KEY, dimension=768)
+    embedder = GeminiDenseEmbedder(model_name, api_key=GOOGLE_API_KEY, dimension=768)
+    try:
+        yield embedder
+    finally:
+        embedder.close()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@pytest.fixture(scope="module", params=GEMINI_MODELS_FIXTURE)
def gemini_embedder(request):
    """Module-scoped GeminiDenseEmbedder at dim=768, parametrized over known models."""
    from openviking.models.embedder.gemini_embedders import GeminiDenseEmbedder

    model_name, _, _ = request.param
    embedder = GeminiDenseEmbedder(model_name, api_key=GOOGLE_API_KEY, dimension=768)
    try:
        yield embedder
    finally:
        embedder.close()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/conftest.py` around lines 182 - 188, The fixture
gemini_embedder returns a live GeminiDenseEmbedder instance but never closes it;
change the fixture to use a yield-based teardown so after yielding the
GeminiDenseEmbedder (created in gemini_embedder) you call its close method
(e.g., embedder.close()) in the teardown to release SDK clients/sockets; locate
the fixture function gemini_embedder and replace the direct return with yield
and a post-yield cleanup that calls GeminiDenseEmbedder.close() (or the
appropriate shutdown method on that class) to ensure the client is closed for
each parametrized module.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (6)
tests/integration/test_gemini_e2e.py (1)

81-90: ⚠️ Potential issue | 🟠 Major

Remove the hard latency SLA from the real-API test.

elapsed < 15 will flap with runner load, network jitter, or transient throttling even when batching is correct. Keep the functional assertions here and move timing checks to an opt-in perf/benchmark test.

♻️ Suggested fix
-        import time
-
         e = GeminiDenseEmbedder("gemini-embedding-2-preview", api_key=GOOGLE_API_KEY, dimension=128)
         texts = [f"sentence {i}" for i in range(300)]  # 3 batches of 100
-        t0 = time.monotonic()
         results = await e.async_embed_batch(texts)
-        elapsed = time.monotonic() - t0
         assert len(results) == 300
         assert all(len(r.dense_vector) == 128 for r in results)
-        assert elapsed < 15  # concurrent should be << 3× serial RTT
         e.close()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_gemini_e2e.py` around lines 81 - 90, The test is
asserting a hard latency SLA which can flap; remove the timing assertion and
keep only the functional checks: instantiate GeminiDenseEmbedder, call
async_embed_batch, assert len(results) == 300 and each r.dense_vector has length
128, and drop the elapsed < 15 assertion (or move any timing assertions into a
separate, opt-in perf/benchmark test). Locate the timing logic around
GeminiDenseEmbedder and async_embed_batch in the test and delete the elapsed
measurement assertion while preserving the setup and functional assertions.
pyproject.toml (1)

82-89: ⚠️ Potential issue | 🟠 Major

pytest-asyncio auto mode still conflicts with the new AnyIO tests.

Line 212 still sets asyncio_mode = "auto", while this PR adds anyio support and @pytest.mark.anyio tests. AnyIO’s testing docs explicitly call out pytest-asyncio auto mode as incompatible with the AnyIO plugin, and pytest-asyncio documents strict mode as the coexistence mode when other async test plugins are in play. Either switch to strict mode and mark asyncio tests explicitly, or drop the AnyIO markers. (anyio.readthedocs.io)

Also applies to: 214-216

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` around lines 82 - 89, The test configuration currently sets
pytest-asyncio asyncio_mode = "auto" while the PR adds AnyIO-marked tests
(`@pytest.mark.anyio`), which conflicts; update the test config to use
asyncio_mode = "strict" (so pytest-asyncio coexists with the AnyIO plugin) and
then explicitly mark asyncio tests with `@pytest.mark.asyncio` where needed, or
alternatively remove the AnyIO markers and keep auto mode—change the
asyncio_mode setting and adjust test decorators (`@pytest.mark.anyio` /
`@pytest.mark.asyncio`) accordingly.
openviking/models/embedder/gemini_embedders.py (1)

143-143: ⚠️ Potential issue | 🟠 Major

_token_limit is still never enforced before Gemini calls.

Line 143 records the per-model ceiling, but embed(), embed_batch(), and async_embed_batch() never check it before sending contents to the SDK. Oversized inputs will still fail remotely instead of being chunked, truncated, or rejected locally with a clear error.

Also applies to: 150-160, 165-184, 223-225

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openviking/models/embedder/gemini_embedders.py` at line 143, The per-model
ceiling stored in self._token_limit is never enforced before calling Gemini;
update embed, embed_batch, and async_embed_batch to validate inputs against
self._token_limit (locations around the current embed() implementation and the
batch methods referenced) by tokenizing each content (use your existing
tokenizer utility) and either chunking long inputs into acceptable-sized pieces,
truncating with a clear warning, or raising a descriptive local error instead of
sending oversized payloads to the SDK; ensure the chosen behavior is applied
consistently in embed(), embed_batch(), and async_embed_batch(), and reference
_MODEL_TOKEN_LIMITS/_DEFAULT_TOKEN_LIMIT where needed to determine limits.
tests/integration/conftest.py (1)

182-188: ⚠️ Potential issue | 🟡 Minor

Close the module-scoped Gemini fixture.

gemini_embedder returns a live GeminiDenseEmbedder but never closes it. With multiple parametrized models, the suite can keep SDK clients/sockets open until process exit.

♻️ Suggested fix
 @pytest.fixture(scope="module", params=GEMINI_MODELS_FIXTURE)
 def gemini_embedder(request):
     """Module-scoped GeminiDenseEmbedder at dim=768, parametrized over known models."""
     from openviking.models.embedder.gemini_embedders import GeminiDenseEmbedder
 
     model_name, _, _ = request.param
-    return GeminiDenseEmbedder(model_name, api_key=GOOGLE_API_KEY, dimension=768)
+    embedder = GeminiDenseEmbedder(model_name, api_key=GOOGLE_API_KEY, dimension=768)
+    try:
+        yield embedder
+    finally:
+        embedder.close()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/conftest.py` around lines 182 - 188, The fixture
gemini_embedder returns a GeminiDenseEmbedder without closing it; change the
fixture to assign the instance to a local variable, register a teardown via
request.addfinalizer that calls the embedder's cleanup method (e.g.,
embedder.close() or embedder.shutdown(), or await embedder.aclose() if async)
and then return the instance; reference the fixture name gemini_embedder and the
class GeminiDenseEmbedder so the finalizer ensures SDK clients/sockets are
closed after each module param.
openviking_cli/utils/config/embedding_config.py (1)

225-227: ⚠️ Potential issue | 🟡 Minor

EmbeddingConfig.max_concurrent still does nothing for Gemini.

Lines 225-227 expose the limiter, but the Gemini factory never passes it as max_concurrent_batches, so tuning max_concurrent has no effect on async_embed_batch().

♻️ Suggested fix
             ("gemini", "dense"): (
                 GeminiDenseEmbedder,
                 lambda cfg: {
                     "model_name": cfg.model,
                     "api_key": cfg.api_key,
                     "dimension": cfg.dimension,
+                    "max_concurrent_batches": self.max_concurrent,
                     "task_type": (
                         cfg.query_param
                         if context == "query" and cfg.query_param
                         else cfg.document_param
                         if context == "document" and cfg.document_param

Also applies to: 370-383

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openviking_cli/utils/config/embedding_config.py` around lines 225 - 227,
EmbeddingConfig.max_concurrent is defined but not forwarded to Gemini's batching
limiter; update the Gemini embedding provider/factory to pass
EmbeddingConfig.max_concurrent as the max_concurrent_batches argument so
async_embed_batch actually uses it. Locate where the Gemini provider/factory
(the function/class that instantiates the Gemini embeddings client) constructs
or calls async_embed_batch and modify the call/constructor to accept and forward
embedding_config.max_concurrent (also update the second occurrence around the
other factory/initialization block noted in the review). Ensure the parameter
name matches the Gemini client API (max_concurrent_batches) and propagate the
value from EmbeddingConfig.max_concurrent through the factory to the
async_embed_batch invocations.
tests/integration/test_gemini_openviking_it.py (1)

163-168: ⚠️ Potential issue | 🟠 Major

This still checks config, not the active embedding dimension.

OpenVikingConfigSingleton.get_instance().embedding.dimension just echoes the configured value. If the Gemini factory silently falls back to 3072 and both indexing/search use that default, this test still passes, so it won't catch the regression it is meant to guard against. Assert the live embedder or index schema dimension instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_gemini_openviking_it.py` around lines 163 - 168, The
test is asserting the configured dimension via OpenVikingConfigSingleton rather
than the runtime/embedder/index dimension, so replace the config check with an
assertion against the live component returned by make_ov_client: after creating
client = await make_ov_client(...), query the actual embedder or index schema on
that client (e.g., inspect client.embedder.dimension or
client.index.schema.dimension or similar runtime property provided by the
OpenViking client) and assert it equals dim to ensure the active embedder/index
uses the expected dimension instead of relying on the config value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@openviking_cli/utils/config/embedding_config.py`:
- Around line 453-483: The live embedder built by get_embedder() bypasses
Gemini/OpenAI per-context routing; modify get_embedder() so when self.dense and
self.dense.provider (e.g., "gemini" or "openai") exist and
query_param/document_param are present it returns a context-aware wrapper (or
factory callable/object) that accepts a context flag (e.g., is_query or context)
and delegates to _create_embedder(...) with effective_context similar to
_get_contextual_embedder logic; reference get_embedder(),
_get_contextual_embedder(), get_query_embedder(), get_document_embedder(), and
_create_embedder() to locate and implement this wrapper so calls that currently
use get_embedder() (and later invoke embed(..., is_query=...)) will route to the
correct task_type/params.

In `@openviking/models/embedder/gemini_embedders.py`:
- Around line 216-240: async_embed_batch currently sends raw batch items to
client.aio.models.embed_content which breaks parity with embed_batch's handling
of empty/whitespace inputs; change the logic in _embed_one so you first split
batch into non-empty texts (trimmed), keep their original positions, call
client.aio.models.embed_content only with the non-empty list, then reconstruct
results[idx] by inserting zero vectors (use truncate_and_normalize on a zero
vector of length _dimension) for any empty inputs and the normalized embeddings
from the API response for non-empty positions; in the except block do the same
reconstruction but obtain embeddings for non-empty texts via await
anyio.to_thread.run_sync(self.embed, text) and still insert zero vectors for
empties so a single empty item doesn't force the whole chunk to become slow or
change semantics compared to embed_batch.

---

Duplicate comments:
In `@openviking_cli/utils/config/embedding_config.py`:
- Around line 225-227: EmbeddingConfig.max_concurrent is defined but not
forwarded to Gemini's batching limiter; update the Gemini embedding
provider/factory to pass EmbeddingConfig.max_concurrent as the
max_concurrent_batches argument so async_embed_batch actually uses it. Locate
where the Gemini provider/factory (the function/class that instantiates the
Gemini embeddings client) constructs or calls async_embed_batch and modify the
call/constructor to accept and forward embedding_config.max_concurrent (also
update the second occurrence around the other factory/initialization block noted
in the review). Ensure the parameter name matches the Gemini client API
(max_concurrent_batches) and propagate the value from
EmbeddingConfig.max_concurrent through the factory to the async_embed_batch
invocations.

In `@openviking/models/embedder/gemini_embedders.py`:
- Line 143: The per-model ceiling stored in self._token_limit is never enforced
before calling Gemini; update embed, embed_batch, and async_embed_batch to
validate inputs against self._token_limit (locations around the current embed()
implementation and the batch methods referenced) by tokenizing each content (use
your existing tokenizer utility) and either chunking long inputs into
acceptable-sized pieces, truncating with a clear warning, or raising a
descriptive local error instead of sending oversized payloads to the SDK; ensure
the chosen behavior is applied consistently in embed(), embed_batch(), and
async_embed_batch(), and reference _MODEL_TOKEN_LIMITS/_DEFAULT_TOKEN_LIMIT
where needed to determine limits.

In `@pyproject.toml`:
- Around line 82-89: The test configuration currently sets pytest-asyncio
asyncio_mode = "auto" while the PR adds AnyIO-marked tests (`@pytest.mark.anyio`),
which conflicts; update the test config to use asyncio_mode = "strict" (so
pytest-asyncio coexists with the AnyIO plugin) and then explicitly mark asyncio
tests with `@pytest.mark.asyncio` where needed, or alternatively remove the AnyIO
markers and keep auto mode—change the asyncio_mode setting and adjust test
decorators (`@pytest.mark.anyio` / `@pytest.mark.asyncio`) accordingly.

In `@tests/integration/conftest.py`:
- Around line 182-188: The fixture gemini_embedder returns a GeminiDenseEmbedder
without closing it; change the fixture to assign the instance to a local
variable, register a teardown via request.addfinalizer that calls the embedder's
cleanup method (e.g., embedder.close() or embedder.shutdown(), or await
embedder.aclose() if async) and then return the instance; reference the fixture
name gemini_embedder and the class GeminiDenseEmbedder so the finalizer ensures
SDK clients/sockets are closed after each module param.

In `@tests/integration/test_gemini_e2e.py`:
- Around line 81-90: The test is asserting a hard latency SLA which can flap;
remove the timing assertion and keep only the functional checks: instantiate
GeminiDenseEmbedder, call async_embed_batch, assert len(results) == 300 and each
r.dense_vector has length 128, and drop the elapsed < 15 assertion (or move any
timing assertions into a separate, opt-in perf/benchmark test). Locate the
timing logic around GeminiDenseEmbedder and async_embed_batch in the test and
delete the elapsed measurement assertion while preserving the setup and
functional assertions.

In `@tests/integration/test_gemini_openviking_it.py`:
- Around line 163-168: The test is asserting the configured dimension via
OpenVikingConfigSingleton rather than the runtime/embedder/index dimension, so
replace the config check with an assertion against the live component returned
by make_ov_client: after creating client = await make_ov_client(...), query the
actual embedder or index schema on that client (e.g., inspect
client.embedder.dimension or client.index.schema.dimension or similar runtime
property provided by the OpenViking client) and assert it equals dim to ensure
the active embedder/index uses the expected dimension instead of relying on the
config value.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 72c90a07-7cf7-45f5-a5db-b5200720e42b

📥 Commits

Reviewing files that changed from the base of the PR and between df86558 and 3f0b33f.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • openviking/models/embedder/gemini_embedders.py
  • openviking_cli/utils/config/embedding_config.py
  • pyproject.toml
  • tests/integration/conftest.py
  • tests/integration/test_gemini_e2e.py
  • tests/integration/test_gemini_embedding_it.py
  • tests/integration/test_gemini_openviking_it.py
  • tests/unit/test_embedding_config_gemini.py
  • tests/unit/test_gemini_embedder.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/test_embedding_config_gemini.py

Comment on lines +453 to +483
def get_query_embedder(self):
"""Get embedder instance for query embeddings."""
return self._get_contextual_embedder("query")

def get_document_embedder(self):
"""Get embedder instance for document/passage embeddings."""
return self._get_contextual_embedder("document")

def _get_contextual_embedder(self, context: str):
if not self.dense:
return self.get_embedder()

provider = (self.dense.provider or "").lower()
if provider == "openai":
non_symmetric = (
self.dense.query_param is not None or self.dense.document_param is not None
)
effective_context = context if non_symmetric else None
return self._create_embedder(provider, "dense", self.dense, context=effective_context)

if provider == "jina":
return self._create_embedder(provider, "dense", self.dense, context=context)

if provider == "gemini":
non_symmetric = (
self.dense.query_param is not None or self.dense.document_param is not None
)
effective_context = context if non_symmetric else None
return self._create_embedder(provider, "dense", self.dense, context=effective_context)

return self.get_embedder()

⚠️ Potential issue | 🟠 Major

get_embedder() still bypasses Gemini's query/document routing.

These helpers only work when callers opt into them, but openviking/service/core.py:1-50 still builds the live embedder via config.embedding.get_embedder(). That path creates a single Gemini embedder with a fixed task_type, and openviking/retrieve/hierarchical_retriever.py:1-50 later calls embed(..., is_query=True) on that same instance. So query_param / document_param never affect normal indexing vs retrieval unless the service layer switches to these helpers or get_embedder() returns a context-aware wrapper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openviking_cli/utils/config/embedding_config.py` around lines 453 - 483, The
live embedder built by get_embedder() bypasses Gemini/OpenAI per-context
routing; modify get_embedder() so when self.dense and self.dense.provider (e.g.,
"gemini" or "openai") exist and query_param/document_param are present it
returns a context-aware wrapper (or factory callable/object) that accepts a
context flag (e.g., is_query or context) and delegates to _create_embedder(...)
with effective_context similar to _get_contextual_embedder logic; reference
get_embedder(), _get_contextual_embedder(), get_query_embedder(),
get_document_embedder(), and _create_embedder() to locate and implement this
wrapper so calls that currently use get_embedder() (and later invoke embed(...,
is_query=...)) will route to the correct task_type/params.
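The wrapper the prompt describes could look roughly like this (the class itself is hypothetical; only the embed(text, is_query=...) call shape is taken from the review):

```python
# Hedged sketch of a context-aware wrapper that get_embedder() could return.
class ContextRoutingEmbedder:
    def __init__(self, query_embedder, document_embedder):
        self._query = query_embedder
        self._document = document_embedder

    def embed(self, text, is_query=False):
        # Delegate to the embedder built with the matching task_type.
        target = self._query if is_query else self._document
        return target.embed(text, is_query=is_query)
```

Returning this from get_embedder() would let existing call sites keep using embed(..., is_query=True) while still reaching the query- or document-specific task_type.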

Comment on lines +216 to +240
batches = [texts[i : i + _TEXT_BATCH_SIZE] for i in range(0, len(texts), _TEXT_BATCH_SIZE)]
results: List[Optional[List[EmbedResult]]] = [None] * len(batches)
sem = anyio.Semaphore(self._max_concurrent_batches)

async def _embed_one(idx: int, batch: List[str]) -> None:
async with sem:
try:
response = await self.client.aio.models.embed_content(
model=self.model_name, contents=batch, config=self._embed_config
)
results[idx] = [
EmbedResult(
dense_vector=truncate_and_normalize(list(emb.values), self._dimension)
)
for emb in response.embeddings
]
except (APIError, ClientError) as e:
logger.warning(
"Gemini async batch embed failed (HTTP %d) for batch of %d, falling back",
e.code,
len(batch),
)
results[idx] = [
await anyio.to_thread.run_sync(self.embed, text) for text in batch
]

⚠️ Potential issue | 🟡 Minor

Make async_embed_batch() mirror the sync blank-input behavior.

embed_batch() strips empty/whitespace strings and injects zero vectors, but async_embed_batch() sends the raw batch to client.aio.models.embed_content(). That changes semantics for blank inputs and lets a single empty item force the whole chunk into slow fallback mode.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openviking/models/embedder/gemini_embedders.py` around lines 216 - 240,
async_embed_batch currently sends raw batch items to
client.aio.models.embed_content which breaks parity with embed_batch's handling
of empty/whitespace inputs; change the logic in _embed_one so you first split
batch into non-empty texts (trimmed), keep their original positions, call
client.aio.models.embed_content only with the non-empty list, then reconstruct
results[idx] by inserting zero vectors (use truncate_and_normalize on a zero
vector of length _dimension) for any empty inputs and the normalized embeddings
from the API response for non-empty positions; in the except block do the same
reconstruction but obtain embeddings for non-empty texts via await
anyio.to_thread.run_sync(self.embed, text) and still insert zero vectors for
empties so a single empty item doesn't force the whole chunk to become slow or
change semantics compared to embed_batch.
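The reconstruction described above can be sketched as a standalone function. fake_embed_content stands in for client.aio.models.embed_content and DIMENSION for self._dimension; both are assumptions for the sketch.

```python
import asyncio
from typing import List, Optional

DIMENSION = 4  # stand-in for self._dimension


async def fake_embed_content(texts: List[str]) -> List[List[float]]:
    # Stand-in for client.aio.models.embed_content: one vector per input text.
    return [[1.0] * DIMENSION for _ in texts]


async def embed_batch_with_blank_parity(batch: List[str]) -> List[List[float]]:
    """Mirror the sync path: blanks become zero vectors and never reach the API."""
    # Remember the original positions of non-blank texts.
    nonempty = [(i, t) for i, t in enumerate(batch) if t.strip()]
    results: List[Optional[List[float]]] = [None] * len(batch)
    if nonempty:
        vectors = await fake_embed_content([t for _, t in nonempty])
        for (i, _), vec in zip(nonempty, vectors):
            results[i] = vec
    # Zero-fill the blank positions without any API call.
    for i in range(len(batch)):
        if results[i] is None:
            results[i] = [0.0] * DIMENSION
    return results


out = asyncio.run(embed_batch_with_blank_parity(["hello", "   ", "world", ""]))
```

The same reconstruction applies in the except branch, with the per-item thread fallback only iterating over the non-empty positions.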


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
openviking/models/embedder/gemini_embedders.py (1)

311-316: close() relies on SDK private attribute _http_client.

Accessing self.client._http_client is fragile since _http_client is an internal implementation detail of the google-genai SDK and may change or be removed without notice.

💡 Consider a more defensive approach
     def close(self):
-        if hasattr(self.client, "_http_client"):
-            try:
-                self.client._http_client.close()
-            except Exception:
-                pass
+        # The google-genai SDK manages its own connection lifecycle.
+        # If explicit cleanup is needed in the future, check SDK docs for a public API.
+        pass

Alternatively, if cleanup is critical, document that this relies on SDK internals and may need updates when upgrading the SDK version.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openviking/models/embedder/gemini_embedders.py` around lines 311 - 316, The
close() method is accessing the private SDK attribute self.client._http_client
which is fragile; update close() to avoid relying on that internal name by first
checking for public/official cleanup methods (e.g., a close/async_close method
or transport attribute) via getattr/hasattr on self.client and invoking those if
present, otherwise fall back to a safe getattr(self.client, "_http_client",
None) check and then calling close on it inside a try/except; also add a short
comment documenting the fallback to the private attribute so future maintainers
know this is an SDK-internal fallback that may need updating when upgrading
google-genai.
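A defensive sketch along the lines the prompt suggests is below. The fake client classes are stand-ins for the google-genai client, which may or may not expose a public close(); the fallback to the private attribute is the SDK-internal path the comment warns about.

```python
class _FakeHttpClient:
    """Stand-in for the SDK's internal HTTP client."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class _FakeGenaiClient:
    """Simulates an SDK client that only exposes a private _http_client."""

    def __init__(self):
        self._http_client = _FakeHttpClient()


def safe_close(client) -> None:
    """Prefer a public close() if the SDK offers one; otherwise fall back to
    the private _http_client attribute (SDK-internal; may need updating when
    upgrading google-genai)."""
    public_close = getattr(client, "close", None)
    if callable(public_close):
        public_close()
        return
    http_client = getattr(client, "_http_client", None)
    if http_client is not None:
        try:
            http_client.close()
        except Exception:
            pass


client = _FakeGenaiClient()
safe_close(client)
```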

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0d15807f-797f-4cef-8d86-8542b28b6467

📥 Commits

Reviewing files that changed from the base of the PR and between 3f0b33f and 1c20967.

📒 Files selected for processing (2)
  • openviking/models/embedder/gemini_embedders.py
  • tests/unit/test_gemini_embedder.py

@chethanuk chethanuk force-pushed the feat/gemini-landing branch 2 times, most recently from 1586f65 to 893a1da on March 19, 2026 13:00
qin-ctx and others added 4 commits March 20, 2026 16:24
…cengine#816)

pull_request event blocks fork PRs behind manual approval and denies
access to secrets, so DOUBAO_API_KEY is unavailable and the review
silently fails. pull_request_target runs in the base-repo context,
giving the workflow access to secrets while still executing the
workflow file from main (not from the fork), which keeps the API key
safe.
Closes volcengine#724

Adds a complete Helm chart at deploy/helm/openviking/ with:
- Deployment with Recreate strategy (RocksDB single-writer)
- PVC for persistent data storage
- ConfigMap for ov.conf configuration
- Optional ingress support
- Health probes matching Dockerfile HEALTHCHECK
- Configurable resources, replicas, and all ov.conf settings
- Installation guide at deploy/helm/README.md
* feat: add Azure OpenAI support for embedding and VLM

Add `azure` as a first-class provider for both embedding models and VLM,
using the official `openai.AzureOpenAI` / `openai.AsyncAzureOpenAI` clients.

Changes:
- embedder: OpenAIDenseEmbedder now accepts `provider` and `api_version`
  params; when provider is "azure", initializes AzureOpenAI client with
  azure_endpoint and api_version
- VLM: OpenAIVLM switches between OpenAI and AzureOpenAI clients based
  on the provider field; VLMFactory maps "azure" to OpenAIVLM
- config: EmbeddingModelConfig and VLMConfig gain `api_version` field;
  embedding_config adds ("azure", "dense") factory entry with validation
- registry: add "azure" to VALID_PROVIDERS
- docs: update README_CN.md with Azure provider table entry, config
  template fields, usage examples, and full config sample

Made-with: Cursor

* refactor: address PR review — extract helper, add validation, shared constant

- Extract `_build_openai_client_kwargs()` helper in openai_vlm.py to
  eliminate duplicated Azure/OpenAI client construction across
  get_client() and get_async_client()
- Add `api_base` validation for Azure provider in VLM client methods
  (was missing, unlike the embedder which already validated it)
- Extract `DEFAULT_AZURE_API_VERSION` constant in registry.py and
  reference it from both embedder and VLM, so future Azure API version
  bumps only require a single change
- Note: removing the hardcoded `self.provider = "openai"` override in
  OpenAIVLM.__init__ (done in the previous commit) is an intentional
  bug fix — it allows VLMBase.provider to correctly reflect the
  configured value (e.g. "azure"), which fixes token usage tracking

Made-with: Cursor
…ngine#702 pattern

- GeminiDenseEmbedder: accept query_param/document_param, use is_query
  in embed() and embed_batch() to select task_type at call time
- EmbeddingConfig: add Gemini provider, factory, validation, dimension
- No get_query_embedder/get_document_embedder/_get_contextual_embedder
  (removed in volcengine#702; embed(is_query=True/False) is the pattern)
- Tests use embed(text, is_query=True/False) pattern throughout
- Rebased onto current upstream/main
@chethanuk chethanuk force-pushed the feat/gemini-landing branch from 98f7ae4 to fba32cf on March 20, 2026 09:12
…r CI

- Remove task_type from EmbeddingModelConfig (query_param/document_param suffice)
- Wrap GeminiDenseEmbedder import in try/except (google-genai is optional)
- Update tests for removed field