feat(embedder): add GeminiDenseEmbedder text embedding provider #4
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
📝 Walkthrough

Adds Google Gemini dense-embedding support: new GeminiDenseEmbedder implementation, wiring into embedding configuration/factory with context-aware query/document routing, example config and dependency updates, and extensive unit and integration tests (including async paths and real-Gemini integration).
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Config as EmbeddingConfig
    participant Factory as EmbedderFactory
    participant Gemini as GeminiDenseEmbedder
    participant API as Google Gemini API
    User->>Config: request embedder(provider="gemini", model, task_type?, context?)
    Config->>Config: validate fields (api_key, task_type, dimension)
    Config->>Factory: create(provider="gemini", type="dense", context)
    Factory->>Gemini: instantiate(model, api_key, task_type, dimension, max_tokens, context)
    Gemini->>API: init genai Client (api_key)
    Factory-->>User: return embedder instance
    User->>Gemini: embed(text) / embed_batch(texts)
    Gemini->>Gemini: truncate/normalize inputs, chunk as needed
    alt batch call success
        Gemini->>API: batch embed request (task_type)
        API-->>Gemini: vectors
    else batch fails
        loop per-item fallback
            Gemini->>API: single embed request
            API-->>Gemini: vector
        end
    end
    Gemini->>Gemini: L2 normalize/truncate to target dimension
    Gemini-->>User: return vectors
```
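The batch-with-per-item-fallback flow in the diagram can be sketched as follows. This is a minimal illustration of the pattern only, not the PR's actual implementation; `api_embed_batch` and `api_embed_single` are hypothetical stand-ins for the real Gemini client calls:

```python
from typing import Callable, List

def embed_with_fallback(
    texts: List[str],
    api_embed_batch: Callable[[List[str]], List[List[float]]],
    api_embed_single: Callable[[str], List[float]],
) -> List[List[float]]:
    """Try one batch request; on any failure, fall back to per-item requests."""
    try:
        return api_embed_batch(texts)
    except Exception:
        # Per-item fallback: slower, but isolates failures to single inputs.
        return [api_embed_single(t) for t in texts]

# Stubs: the batch path fails, so every text is embedded individually.
def flaky_batch(texts):
    raise RuntimeError("batch endpoint unavailable")

def single(text):
    return [float(len(text)), 0.0]

vectors = embed_with_fallback(["a", "bb"], flaky_batch, single)  # [[1.0, 0.0], [2.0, 0.0]]
```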
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly expands the embedding capabilities by introducing native support for Google's Gemini Embedding models. It provides a robust and flexible integration, allowing users to leverage Gemini's dense embeddings with various task types and dimensions, including efficient asynchronous batch processing. The changes also refine the embedding configuration system to accommodate the new provider and its specific parameters, ensuring seamless integration into existing workflows. This enhancement broadens the range of available embedding providers, offering more choices and potentially improved performance for different use cases.
Code Review
This pull request introduces the GeminiDenseEmbedder, significantly expanding the embedding provider options. The implementation includes robust error handling, efficient batching, and asynchronous capabilities using anyio. The configuration system has been updated to support Gemini-specific parameters and routing for non-symmetric embedding tasks. Comprehensive unit and integration tests have been added, covering various aspects like dimension handling, task type validation, batching, and error scenarios, ensuring the new functionality is well-tested and reliable. Overall, this is a well-executed feature addition.
```python
vector = _l2_normalize(
    truncate_and_normalize(list(result.embeddings[0].values), self._dimension)
)
```

```python
dense_vector=_l2_normalize(
    truncate_and_normalize(list(emb.values), self._dimension)
)
```
```python
for key in ("input_type",):
    value = data.get(key)
    if isinstance(value, str):
        data[key] = value.lower()
```
The sync_provider_backend method currently only lowercases input_type. However, task_type, query_param, and document_param for Gemini are expected to be uppercase by GeminiDenseEmbedder's validation. If a user provides these values in lowercase, it will lead to a ValueError. To ensure consistency and prevent runtime errors, these fields should also be lowercased (or uppercased) here, or the GeminiDenseEmbedder should handle case-insensitivity.
```diff
-for key in ("input_type",):
-    value = data.get(key)
-    if isinstance(value, str):
-        data[key] = value.lower()
+for key in ("input_type", "query_param", "document_param", "task_type"):
+    value = data.get(key)
+    if isinstance(value, str):
+        data[key] = value.upper()
```
```python
_GEMINI_TASK_TYPES = {
    "RETRIEVAL_QUERY", "RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY",
    "CLASSIFICATION", "CLUSTERING", "QUESTION_ANSWERING",
    "FACT_VERIFICATION", "CODE_RETRIEVAL_QUERY",
}
```
The _GEMINI_TASK_TYPES frozenset is duplicated here. It is already defined as _VALID_TASK_TYPES in openviking.models.embedder.gemini_embedders. To improve maintainability and ensure a single source of truth, please import _VALID_TASK_TYPES from gemini_embedders instead of redefining it.
```python
from openviking.models.embedder.gemini_embedders import _VALID_TASK_TYPES as _GEMINI_TASK_TYPES
```
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pyproject.toml (1)
82-88: ⚠️ Potential issue | 🟠 Major: pytest-asyncio version mismatch requires careful resolution.

There's a version inconsistency for pytest-asyncio:

- `[project.optional-dependencies] test` (line 82): `pytest-asyncio>=0.21.0`
- `[dependency-groups] dev` (line 255): `pytest-asyncio>=1.3.0`

However, upgrading to 1.3.0 cannot be done with a simple version bump. The codebase defines custom `event_loop()` fixtures in tests/conftest.py and bot/tests/conftest.py that manually manage event loops with `asyncio.new_event_loop()`. pytest-asyncio 1.3.0 removed the event_loop fixture entirely and overhauled event loop scoping, requiring these fixtures to be refactored to use `asyncio.get_running_loop()` and explicit `loop_scope` configuration.

Either align both to `>=0.21.0` for consistency, or upgrade to `>=1.3.0` after migrating the test fixtures according to the pytest-asyncio migration guide.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pyproject.toml` around lines 82 - 88, The project has conflicting pytest-asyncio requirements; choose one: (A) keep both declarations at pytest-asyncio>=0.21.0 by changing the [dependency-groups] dev entry to pytest-asyncio>=0.21.0 so current event_loop fixtures continue to work, or (B) fully upgrade to pytest-asyncio>=1.3.0 and refactor tests/conftest.py and bot/tests/conftest.py by removing the custom event_loop fixtures that call asyncio.new_event_loop(), adopt the new scoped loop behavior using loop_scope configuration and replace uses with asyncio.get_running_loop() in async tests and fixtures, following the pytest-asyncio migration guide; update the pyproject entries so both places list the same version chosen.
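For option (A), the quickest fix is simply aligning the two declarations. A sketch of the relevant pyproject.toml fragments (surrounding entries omitted; exact section contents are an assumption):

```toml
[project.optional-dependencies]
test = ["pytest-asyncio>=0.21.0"]

[dependency-groups]
dev = ["pytest-asyncio>=0.21.0"]  # was >=1.3.0; aligned until fixtures are migrated
```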
🧹 Nitpick comments (7)
tests/integration/test_gemini_e2e.py (2)
92-92: Timing assertion may cause flaky tests.
`assert elapsed < 15` is network-dependent and could fail due to API latency spikes, CI resource contention, or regional network conditions. Consider either:
- Using a more generous threshold (e.g., 60s)
- Removing the timing assertion and validating only correctness
- Marking this test with `@pytest.mark.flaky` if available

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_gemini_e2e.py` at line 92, The timing assertion assert elapsed < 15 is flaky; update the test in tests/integration/test_gemini_e2e.py to avoid brittle timing checks by either increasing the threshold (e.g., replace the 15s limit with 60s for elapsed), removing the timing assertion altogether and only asserting correctness, or decorating the test with a flaky marker (e.g., `@pytest.mark.flaky`) so transient API/CI latency won't fail the test; modify the assertion referencing elapsed accordingly and add a brief comment explaining the reason.
56-63: Batch vs individual comparison may have false positives.

The test compares batch results to individual results expecting `sim > 0.99`. However, if the Gemini API is non-deterministic (e.g., different request batching), this could occasionally fail. The threshold of 0.99 is quite strict.

Consider loosening to 0.95 or documenting that exact reproducibility is expected.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_gemini_e2e.py` around lines 56 - 63, The test test_embed_batch_matches_individual is too strict and can false-fail if embeddings vary; update the assertion threshold from 0.99 to a looser value (e.g., 0.95) in the loop that compares batch_results from embed_batch to individual_results from embed using _cosine_similarity, or add a clear comment documenting that exact reproducibility is required; change the numeric check (sim > 0.99) to sim > 0.95 (or configurable) so the test tolerates minor non-determinism.

openviking_cli/utils/config/embedding_config.py (2)
155-159: Duplicate task type constants across files.
`_GEMINI_TASK_TYPES` is defined here and `_VALID_TASK_TYPES` is defined in gemini_embedders.py (lines 40-49). These should be consolidated to a single source of truth to prevent drift.

♻️ Proposed approach
Import from the embedder module:
```diff
+from openviking.models.embedder.gemini_embedders import _VALID_TASK_TYPES as _GEMINI_TASK_TYPES
-_GEMINI_TASK_TYPES = {
-    "RETRIEVAL_QUERY", "RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY",
-    "CLASSIFICATION", "CLUSTERING", "QUESTION_ANSWERING",
-    "FACT_VERIFICATION", "CODE_RETRIEVAL_QUERY",
-}
```

Or extract to a shared constants module.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openviking_cli/utils/config/embedding_config.py` around lines 155 - 159, The duplicate task-type sets (_GEMINI_TASK_TYPES in embedding_config.py and _VALID_TASK_TYPES in gemini_embedders.py) should be consolidated to a single source of truth: either export the canonical set from gemini_embedders.py (e.g., rename or re-export _VALID_TASK_TYPES) and import it into embedding_config.py, or move the set into a new shared constants module and import it from both places; update references to use the single exported symbol and remove the duplicate definition in embedding_config.py so both modules rely on the same constant.
438-468: New embedder methods not integrated into service layer.

The `get_query_embedder()` and `get_document_embedder()` methods are defined but unused. The service layer (openviking/service/core.py, lines 101 and 247) continues to call `config.embedding.get_embedder()`, so context-aware embeddings for non-symmetric providers (Gemini, Jina) won't be utilized without explicit service layer updates.

Consider integrating these methods into the service layer or documenting whether this is intentional (opt-in behavior to be used elsewhere).
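One way to wire the context-aware methods into a service layer while preserving the fallback the review asks for. The method names come from the PR; the dispatch shape is a hypothetical sketch, not the project's actual service code:

```python
def select_embedder(embedding_config, context: str):
    """Prefer a context-aware embedder when the config provides one;
    otherwise fall back to the symmetric get_embedder()."""
    getter = {
        "query": getattr(embedding_config, "get_query_embedder", None),
        "document": getattr(embedding_config, "get_document_embedder", None),
    }.get(context)
    if getter is not None:
        return getter()
    return embedding_config.get_embedder()

# A stub config without the contextual methods falls back cleanly:
class SymmetricConfig:
    def get_embedder(self):
        return "symmetric-embedder"

assert select_embedder(SymmetricConfig(), "query") == "symmetric-embedder"
```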
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openviking_cli/utils/config/embedding_config.py` around lines 438 - 468, The new methods get_query_embedder and get_document_embedder are never used; update the service layer code that currently calls config.embedding.get_embedder() to instead call get_query_embedder when producing/querying embeddings and get_document_embedder when creating/indexing document embeddings (identify the call sites by the functions that currently invoke config.embedding.get_embedder()); ensure the service logic preserves the fallback behavior (call get_embedder if contextual methods are unavailable) and keep existing behavior for symmetric providers by relying on _get_contextual_embedder's effective_context logic.

openviking/models/embedder/gemini_embedders.py (3)
186-192: Fallback logic may re-raise errors for persistent API failures.

When `embed_batch` falls back to individual `embed()` calls on line 192, if the API error is persistent (e.g., a 401 auth error), each individual call will also fail and raise a `RuntimeError`. This will abort processing mid-batch rather than gracefully handling all items.

Consider whether this is the intended behavior or if the fallback should only apply to transient errors (5xx).
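A status-aware variant of the fallback might look like this. The `code` attribute on the error is an assumption taken from the review prompt, and the helper names are illustrative stand-ins, not the PR's actual code:

```python
from typing import Callable, List

class ApiError(Exception):
    """Stand-in for the SDK's APIError; real errors expose a status code similarly."""
    def __init__(self, code: int, message: str = ""):
        super().__init__(message)
        self.code = code

def embed_batch_with_fallback(
    texts: List[str],
    batch_call: Callable[[List[str]], List[List[float]]],
    single_call: Callable[[str], List[float]],
) -> List[List[float]]:
    """Fall back per item only on transient 5xx errors; re-raise 4xx immediately."""
    try:
        return batch_call(texts)
    except ApiError as e:
        if 500 <= e.code < 600:
            return [single_call(t) for t in texts]
        raise  # auth/client errors (4xx) would fail per item too

def unavailable(texts):
    raise ApiError(503, "service unavailable")

vectors = embed_batch_with_fallback(["x"], unavailable, lambda t: [0.0])  # [[0.0]]
```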
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openviking/models/embedder/gemini_embedders.py` around lines 186 - 192, The fallback in embed_batch that iterates calling self.embed(text) for each item can re-raise persistent API errors (e.g., 401) and abort processing; change embed_batch (and the except block that catches (APIError, ClientError) as e) to only perform per-item fallback for transient server errors (HTTP 5xx) and re-raise for persistent client/auth errors (4xx). Concretely, inspect e.code (or error.kind) inside the except in embed_batch: if it's a 5xx, loop and attempt per-item embedding but wrap each self.embed(text) in its own try/except to catch/record individual failures (log and append a sentinel or result placeholder); if e.code is 4xx (or other non-transient), re-raise the original exception immediately. This targets the embed_batch method and the call to self.embed to avoid aborting mid-batch on persistent errors.
226-233: Async fallback runs sync `embed()` sequentially, negating concurrency benefits.

When async batch embedding fails, the fallback at line 232 runs `self.embed(text)` via `anyio.to_thread.run_sync` for each text sequentially within the list comprehension. This makes the fallback path significantly slower than the primary async path.

Consider parallelizing the fallback or documenting this as expected degraded behavior.
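The review suggests `anyio.create_task_group` for the parallel fallback; the equivalent pattern is sketched here with stdlib asyncio so the example stays dependency-free (`blocking_embed` is a stand-in for the sync `embed()` call):

```python
import asyncio
from typing import List

def blocking_embed(text: str) -> List[float]:
    # Stand-in for the synchronous embed() call that hits the API.
    return [float(len(text))]

async def fallback_parallel(texts: List[str]) -> List[List[float]]:
    loop = asyncio.get_running_loop()
    # Each sync embed runs on the default thread pool; gather awaits them
    # concurrently and preserves input order in the result list.
    tasks = [loop.run_in_executor(None, blocking_embed, t) for t in texts]
    return await asyncio.gather(*tasks)

vectors = asyncio.run(fallback_parallel(["a", "bbb"]))  # [[1.0], [3.0]]
```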
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openviking/models/embedder/gemini_embedders.py` around lines 226 - 233, The fallback currently runs self.embed(text) sequentially inside a list comprehension (via anyio.to_thread.run_sync) which loses concurrency; update the except handler to run the per-text fallbacks in parallel by spawning concurrent tasks (e.g., using anyio.create_task_group or anyio.from_thread.run) that each call anyio.to_thread.run_sync(self.embed, text) and then await/join and collect results into results[idx], so each embed call executes concurrently rather than sequentially in the except block handling APIError/ClientError.
244-249: Use the public `client.close()` method instead of accessing the private `_http_client` attribute.

The google-genai SDK provides an official `close()` method on the `Client` class for cleanup. Directly accessing the private `_http_client` attribute is fragile and can break across SDK versions. Replace `self.client._http_client.close()` with `self.client.close()`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openviking/models/embedder/gemini_embedders.py` around lines 244 - 249, Replace the private attribute access in the close routine: instead of checking for and calling self.client._http_client.close(), call the public self.client.close() method; update the close method in the Gemini embedder class (the method named close that currently references self.client and _http_client) to invoke self.client.close() inside a try/except to handle errors and remove reliance on the private _http_client attribute.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@openviking_cli/utils/config/embedding_config.py`:
- Line 58: The Field description for the pydantic variable sk is missing a space
("Secretfor"); update the description string in the Field(...) for sk to read
"Access Key Secret for VikingDB API" so the text is corrected; locate the sk
Field declaration in embedding_config.py and adjust only the description value.
In `@openviking/models/embedder/__init__.py`:
- Line 15: The package docstring in openviking.models.embedder.__init__.py
incorrectly claims "Google Gemini: Dense only (multimodal)"; make it consistent
with the implementation by removing or correcting the "multimodal" note (or
explicitly state "not multimodal") so it matches
GeminiDenseEmbedder.supports_multimodal in gemini_embedders.py; update the
__init__ module docstring text to reflect that GeminiDenseEmbedder is dense-only
and does not support multimodal inputs.
In `@openviking/models/embedder/gemini_embedders.py`:
- Around line 169-193: The embed_batch signature must match
DenseEmbedderBase.embed_batch by adding the is_query: bool = False parameter;
update embed_batch(self, texts: List[str], is_query: bool = False) and pass
is_query into the client call (client.models.embed_content(...,
is_query=is_query, ...)) and into the fallback calls to self.embed(text) (i.e.,
self.embed(text, is_query=is_query)) so the flag is propagated consistently;
adjust any internal uses of is_query if needed.
- Around line 154-167: The embed method in gemini_embedders.py must match the
base class DenseEmbedderBase signature to avoid TypeError; change the method
signature of embed (the function named embed in this file) to include the
missing parameter is_query: bool = False (i.e., def embed(self, text: str,
is_query: bool = False) -> EmbedResult) and keep the existing implementation
unchanged (or pass the flag through to the client if required later) so the
method is polymorphically compatible with DenseEmbedderBase.embed.
In `@tests/integration/conftest.py`:
- Around line 118-123: vectordb_engine_available() currently swallows all
exceptions; change it to only treat import/attribute-not-found as "engine
unavailable" so real syntax/runtime errors surface. In the function
(vectordb_engine_available) import openviking.storage.vectordb.engine and check
that PersistStore and VolatileStore are types, but replace the broad "except
Exception" with a narrow catch for ImportError, ModuleNotFoundError and
AttributeError (or similar import/lookup errors) and return False in those
cases, and re-raise any other exceptions so syntax/runtime failures are not
hidden.
In `@tests/integration/test_gemini_embedding_it.py`:
- Around line 90-95: The test test_invalid_api_key_error_message currently
hardcodes a fake API key that triggers secret scanners; instead construct the
bogus key at runtime (e.g., concatenate fragments or join a list) before passing
it to GeminiDenseEmbedder so the value is identical at runtime but not present
as a contiguous secret in source. Update the instantiation line that creates bad
= GeminiDenseEmbedder(..., api_key="INVALID_KEY_XYZZY_123") to build the string
dynamically and leave the rest of the test (the pytest.raises(RuntimeError,
match="Invalid API key") and bad.embed("hello")) unchanged.
In `@tests/integration/test_gemini_openviking_it.py`:
- Around line 142-160: The test currently only verifies add/find but not that
the requested embedding dimension is actually used; update
test_dimension_variant_add_search to assert the active embedder/index schema
dimension after client creation (e.g., inspect the client's embedder instance or
index schema) so the requested dim is honored—locate the client returned by
make_ov_client and check its embedder class/attribute (referencing
GeminiDenseEmbedder in openviking/models/embedder/gemini_embedders.py) or the
index schema’s vector dimension property and assert it equals dim before
proceeding with add_resource and search.
- Around line 167-186: The test_session_search_smoke creates a session via
session = client.session(session_id="gemini_it_session") and adds a message but
then calls client.find(...) which ignores that session and asserts result.total
>= 0 (a tautology); either exercise the session-scoped retrieval API or tighten
the assertion. Fix by invoking the session-aware find/search method (e.g.,
session.find(...) or client.find(..., session_id="gemini_it_session") depending
on API) so the added message is used, and change the assertion to a meaningful
check such as result.total > 0 or assert that returned hits relate to the added
resource; update references in the test (test_session_search_smoke, session,
client.find) accordingly.
In `@tests/unit/test_gemini_embedder.py`:
- Line 180: Tests are using the AnyIO marker `@pytest.mark.anyio` which conflicts
with pytest-asyncio in auto mode; either remove all occurrences of
`@pytest.mark.anyio` from the async test functions in test_gemini_embedder.py (so
pytest-asyncio auto-detection runs them) or replace those markers with
`@pytest.mark.asyncio` and switch pytest-asyncio configuration to
asyncio_mode="strict" in pyproject.toml; search for the decorator occurrences
(e.g., the `@pytest.mark.anyio` lines around the async test functions) and update
the decorators and/or the pyproject.toml asyncio_mode accordingly.
---
Outside diff comments:
In `@pyproject.toml`:
- Around line 82-88: The project has conflicting pytest-asyncio requirements;
choose one: (A) keep both declarations at pytest-asyncio>=0.21.0 by changing the
[dependency-groups] dev entry to pytest-asyncio>=0.21.0 so current event_loop
fixtures continue to work, or (B) fully upgrade to pytest-asyncio>=1.3.0 and
refactor tests/conftest.py and bot/tests/conftest.py by removing the custom
event_loop fixtures that call asyncio.new_event_loop(), adopt the new scoped
loop behavior using loop_scope configuration and replace uses with
asyncio.get_running_loop() in async tests and fixtures, following the
pytest-asyncio migration guide; update the pyproject entries so both places list
the same version chosen.
---
Nitpick comments:
In `@openviking_cli/utils/config/embedding_config.py`:
- Around line 155-159: The duplicate task-type sets (_GEMINI_TASK_TYPES in
embedding_config.py and _VALID_TASK_TYPES in gemini_embedders.py) should be
consolidated to a single source of truth: either export the canonical set from
gemini_embedders.py (e.g., rename or re-export _VALID_TASK_TYPES) and import it
into embedding_config.py, or move the set into a new shared constants module and
import it from both places; update references to use the single exported symbol
and remove the duplicate definition in embedding_config.py so both modules rely
on the same constant.
- Around line 438-468: The new methods get_query_embedder and
get_document_embedder are never used; update the service layer code that
currently calls config.embedding.get_embedder() to instead call
get_query_embedder when producing/querying embeddings and get_document_embedder
when creating/indexing document embeddings (identify the call sites by the
functions that currently invoke config.embedding.get_embedder()); ensure the
service logic preserves the fallback behavior (call get_embedder if contextual
methods are unavailable) and keep existing behavior for symmetric providers by
relying on _get_contextual_embedder's effective_context logic.
In `@openviking/models/embedder/gemini_embedders.py`:
- Around line 186-192: The fallback in embed_batch that iterates calling
self.embed(text) for each item can re-raise persistent API errors (e.g., 401)
and abort processing; change embed_batch (and the except block that catches
(APIError, ClientError) as e) to only perform per-item fallback for transient
server errors (HTTP 5xx) and re-raise for persistent client/auth errors (4xx).
Concretely, inspect e.code (or error.kind) inside the except in embed_batch: if
it's a 5xx, loop and attempt per-item embedding but wrap each self.embed(text)
in its own try/except to catch/record individual failures (log and append a
sentinel or result placeholder); if e.code is 4xx (or other non-transient),
re-raise the original exception immediately. This targets the embed_batch method
and the call to self.embed to avoid aborting mid-batch on persistent errors.
- Around line 226-233: The fallback currently runs self.embed(text) sequentially
inside a list comprehension (via anyio.to_thread.run_sync) which loses
concurrency; update the except handler to run the per-text fallbacks in parallel
by spawning concurrent tasks (e.g., using anyio.create_task_group or
anyio.from_thread.run) that each call anyio.to_thread.run_sync(self.embed, text)
and then await/join and collect results into results[idx], so each embed call
executes concurrently rather than sequentially in the except block handling
APIError/ClientError.
- Around line 244-249: Replace the private attribute access in the close
routine: instead of checking for and calling self.client._http_client.close(),
call the public self.client.close() method; update the close method in the
Gemini embedder class (the method named close that currently references
self.client and _http_client) to invoke self.client.close() inside a try/except
to handle errors and remove reliance on the private _http_client attribute.
In `@tests/integration/test_gemini_e2e.py`:
- Line 92: The timing assertion assert elapsed < 15 is flaky; update the test in
tests/integration/test_gemini_e2e.py to avoid brittle timing checks by either
increasing the threshold (e.g., replace the 15s limit with 60s for elapsed),
removing the timing assertion altogether and only asserting correctness, or
decorating the test with a flaky marker (e.g., `@pytest.mark.flaky`) so transient
API/CI latency won't fail the test; modify the assertion referencing elapsed
accordingly and add a brief comment explaining the reason.
- Around line 56-63: The test test_embed_batch_matches_individual is too strict
and can false-fail if embeddings vary; update the assertion threshold from 0.99
to a looser value (e.g., 0.95) in the loop that compares batch_results from
embed_batch to individual_results from embed using _cosine_similarity, or add a
clear comment documenting that exact reproducibility is required; change the
numeric check (sim > 0.99) to sim > 0.95 (or configurable) so the test tolerates
minor non-determinism.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5e01e6e7-9faa-44c8-abdb-8d6be2909a5f
📒 Files selected for processing (11)
examples/ov.conf.example
openviking/models/embedder/__init__.py
openviking/models/embedder/gemini_embedders.py
openviking_cli/utils/config/embedding_config.py
pyproject.toml
tests/integration/conftest.py
tests/integration/test_gemini_e2e.py
tests/integration/test_gemini_embedding_it.py
tests/integration/test_gemini_openviking_it.py
tests/unit/test_embedding_config_gemini.py
tests/unit/test_gemini_embedder.py
| """Unit tests for async_embed_batch (uses AsyncMock, no real API).""" | ||
|
|
||
| @patch("openviking.models.embedder.gemini_embedders.genai.Client") | ||
| @pytest.mark.anyio |
🧩 Analysis chain
🌐 Web query:
pytest-asyncio asyncio_mode auto vs pytest.mark.anyio compatibility
💡 Result:
pytest-asyncio's asyncio_mode=auto automatically detects and runs all async test functions (adding @pytest.mark.asyncio implicitly) and takes ownership of all async fixtures. This is ideal for pure asyncio projects but conflicts with pytest.mark.anyio from the AnyIO pytest plugin. pytest.mark.anyio marks specific tests to run with AnyIO's anyio_backend fixture, supporting multiple backends (asyncio, trio, etc.). It requires explicit marking or anyio_mode=auto.

Compatibility:
- In asyncio_mode=auto (pytest-asyncio), it conflicts with AnyIO's plugin: pytest-asyncio grabs all async tests/fixtures first, preventing pytest.mark.anyio from working properly (AnyIO docs explicitly warn: "This does not work if pytest-asyncio is installed and configured to use its own auto mode").
- Solution for coexistence: Use pytest-asyncio's default strict mode (no asyncio_mode or asyncio_mode=strict). Then explicitly mark asyncio tests with @pytest.mark.asyncio and AnyIO tests with @pytest.mark.anyio. Strict mode only handles explicitly marked asyncio tests, allowing other plugins like AnyIO to handle their marked tests.

Key differences:
- asyncio_mode=auto: Simplest for asyncio-only projects, no markers needed, but no multi-backend or AnyIO compatibility.
- Strict mode + markers: More verbose but enables mixing asyncio-specific and AnyIO/multi-backend tests in the same suite.

Current versions (pytest-asyncio ~1.3.0, AnyIO latest as of 2026): behavior unchanged per docs and issues.
Citations:
- 1: https://pytest-asyncio.readthedocs.io/en/stable/reference/markers/
- 2: https://stackoverflow.com/questions/57461868/what-is-pytest-mark-asyncio-is-used-for
- 3: https://pytest-asyncio.readthedocs.io/en/v0.25.3/reference/configuration.html
- 4: https://anyio.readthedocs.io/en/stable/testing.html
- 5: https://pytest-asyncio.readthedocs.io/en/stable/reference/configuration.html
- 6: https://pytest-asyncio.readthedocs.io/en/stable/concepts.html
- 7: https://pytest-asyncio.readthedocs.io/en/v0.20.3/concepts.html
- 8: https://anyio.readthedocs.io/en/latest/testing.html
- 9: tests/test_pytest_plugin.py 4 failures when pytest-asyncio is installed agronholm/anyio#328
🏁 Script executed:
```shell
# Find and check pyproject.toml for asyncio configuration
find . -name "pyproject.toml" -type f | head -5
```

Repository: chethanuk/OpenViking
Length of output: 238
🏁 Script executed:
```shell
# Check if pytest-anyio is mentioned in dependencies
if [ -f pyproject.toml ]; then
  cat pyproject.toml | grep -A 20 -B 5 "asyncio\|anyio\|pytest" | head -80
fi
```

Repository: chethanuk/OpenViking
Length of output: 1850
🏁 Script executed:
```shell
# Check for requirements files
ls -la | grep -i req
```

Repository: chethanuk/OpenViking
Length of output: 46
🏁 Script executed:
```shell
# Check the test file for anyio markers at the specified lines
sed -n '175,250p' tests/unit/test_gemini_embedder.py | cat -n
```

Repository: chethanuk/OpenViking
Length of output: 4716
Replace @pytest.mark.anyio with @pytest.mark.asyncio or remove markers entirely.
The project uses pytest-asyncio>=0.21.0 with asyncio_mode = "auto" in pyproject.toml. This configuration is incompatible with @pytest.mark.anyio: pytest-asyncio's auto mode automatically detects and runs all async tests, preventing AnyIO's plugin from recognizing @pytest.mark.anyio markers (as explicitly documented in AnyIO's testing guide).
Choose one solution:
- Remove all `@pytest.mark.anyio` markers; auto mode will detect and run async tests automatically.
- Switch `asyncio_mode` to `"strict"` in pyproject.toml and replace `@pytest.mark.anyio` with `@pytest.mark.asyncio` for explicit pytest-asyncio compatibility.
Applies to lines: 180, 195, 215, 232, 240
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/test_gemini_embedder.py` at line 180, Tests are using the AnyIO
marker `@pytest.mark.anyio` which conflicts with pytest-asyncio in auto mode;
either remove all occurrences of `@pytest.mark.anyio` from the async test
functions in test_gemini_embedder.py (so pytest-asyncio auto-detection runs
them) or replace those markers with `@pytest.mark.asyncio` and switch
pytest-asyncio configuration to asyncio_mode="strict" in pyproject.toml; search
for the decorator occurrences (e.g., the `@pytest.mark.anyio` lines around the
async test functions) and update the decorators and/or the pyproject.toml
asyncio_mode accordingly.
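As a minimal sketch of the strict-mode option (assuming the project keeps its pytest settings under `[tool.pytest.ini_options]` in pyproject.toml; the comment text is illustrative):

```toml
[tool.pytest.ini_options]
# Strict mode: pytest-asyncio only runs tests explicitly marked with
# @pytest.mark.asyncio, leaving AnyIO's plugin free to claim its own markers.
asyncio_mode = "strict"
```

With strict mode in place, every pytest-asyncio test then needs an explicit `@pytest.mark.asyncio` decorator.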
- embed()/embed_batch(): add is_query: bool = False to match base class signature
- embed_batch() fallback: pass is_query through to self.embed()
- Remove redundant _l2_normalize() calls in embed/embed_batch/async_embed_batch; Gemini API already returns L2-normalized vectors, truncate_and_normalize is sufficient
- Remove now-dead _l2_normalize() function and unused math import
- embedding_config.py: fix typo "Secretfor" → "Secret for" in sk field description
- conftest.py: narrow broad except Exception to (ImportError, ModuleNotFoundError, AttributeError)
- test_gemini_embedding_it.py: construct fake API key dynamically to avoid hardcoded secret pattern
- test_gemini_openviking_it.py: add dimension assertion in test_dimension_variant_add_search; fix tautology assertion (>= 0 → > 0) in test_session_search_smoke
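For context on the truncate-then-normalize step mentioned above, here is a hedged sketch of what such a helper does (the name `truncate_and_normalize` and its signature are assumptions, not the repository's actual code): Gemini returns unit-length vectors at full size, but truncating to a smaller output dimensionality breaks unit length, so a re-normalization is needed after the cut.

```python
import math
from typing import List

def truncate_and_normalize(vector: List[float], dimension: int) -> List[float]:
    """Truncate a raw embedding to the target dimension, then L2-normalize.

    Truncation alone would leave the vector shorter than unit length,
    so we rescale the surviving components back to norm 1.
    """
    truncated = vector[:dimension]
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0.0:
        return truncated
    return [x / norm for x in truncated]

# Keep the first two components of a 3-d vector, then renormalize.
vec = truncate_and_normalize([3.0, 4.0, 12.0], 2)
```

This is why a separate `_l2_normalize()` pass becomes redundant: the normalization is already folded into the truncation helper.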
7d10057 to
df86558
Compare
Actionable comments posted: 2
♻️ Duplicate comments (2)
tests/integration/test_gemini_openviking_it.py (2)
185-189: ⚠️ Potential issue | 🟠 Major
Use the session in the search you assert.
The stronger `> 0` check helps, but `client.find(...)` still ignores the session created on Lines 185-186. This remains a plain find smoke test, so a regression in session-scoped retrieval would still slip through.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_gemini_openviking_it.py` around lines 185 - 189, The test creates a session object (session = client.session(session_id="gemini_it_session") and adds a message, but then calls client.find(...) without scoping to that session; update the find call to use the created session (pass the session object or its session_id) so the search is performed against the session-scoped data (e.g., client.find(..., session=session) or client.find(..., session_id="gemini_it_session") instead of the current call to client.find(query="pytest testing tool")).
149-151: ⚠️ Potential issue | 🟠 Major
Assert the runtime dimension, not the config echo.
`OpenVikingConfigSingleton.instance().embedding.dimension` is still derived from config, so this test stays green even if the factory drops `dimension` or the active embedder/index schema falls back to a different size. Please inspect the created embedder or the index schema directly here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_gemini_openviking_it.py` around lines 149 - 151, The test currently asserts OpenVikingConfigSingleton.instance().embedding.dimension (a config echo) so change it to assert the runtime embedder/index schema dimension instead: after creating the embedder or index in the test, obtain the actual runtime object (e.g. the created embedder instance or the index schema from the component under test) and assert its reported dimension equals dim rather than checking OpenVikingConfigSingleton.instance().embedding.dimension; locate uses of OpenVikingConfigSingleton.instance() in the test and replace the assertion with a check against the embedder.dimension or index.schema.dimension on the actual created object.
🧹 Nitpick comments (1)
tests/integration/conftest.py (1)
173-178: Close the module-scoped Gemini client in fixture teardown.
This fixture creates a real `GeminiDenseEmbedder` and never calls `close()`, so the underlying HTTP client stays open for the life of the test process. Switching to `yield` keeps the live suite cleaner and less flaky.
♻️ Suggested change

```diff
 @pytest.fixture(scope="module", params=GEMINI_MODELS_FIXTURE)
 def gemini_embedder(request):
     """Module-scoped GeminiDenseEmbedder at dim=768, parametrized over known models."""
     from openviking.models.embedder.gemini_embedders import GeminiDenseEmbedder
     model_name, _, _ = request.param
-    return GeminiDenseEmbedder(model_name, api_key=GOOGLE_API_KEY, dimension=768)
+    embedder = GeminiDenseEmbedder(model_name, api_key=GOOGLE_API_KEY, dimension=768)
+    yield embedder
+    embedder.close()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/conftest.py` around lines 173 - 178, The gemini_embedder fixture creates a GeminiDenseEmbedder but never closes it; change the module-scoped fixture "gemini_embedder" to use yield instead of return: instantiate GeminiDenseEmbedder(model_name, api_key=GOOGLE_API_KEY, dimension=768), yield the instance, and after yield call instance.close() in the teardown so the underlying HTTP client is properly closed; reference the fixture name gemini_embedder and the class GeminiDenseEmbedder to locate the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@openviking_cli/utils/config/embedding_config.py`:
- Around line 357-368: The builder for ("gemini", "dense") currently constructs
GeminiDenseEmbedder without passing EmbeddingConfig.max_concurrent, so changes
to cfg.max_concurrent have no effect; update the factory lambda that builds
GeminiDenseEmbedder to include "max_concurrent": cfg.max_concurrent (or the
appropriate cfg field name) in the kwargs passed to GeminiDenseEmbedder so the
embedder receives and uses the configured concurrency limit. Ensure the key name
matches the GeminiDenseEmbedder constructor parameter and keep the rest of the
existing keys ("model_name", "api_key", "dimension", "task_type") intact.
In `@openviking/models/embedder/gemini_embedders.py`:
- Around line 140-145: The class currently sets a per-model token ceiling in
_token_limit but never enforces it before calling the Gemini SDK; update the
request paths (the embed() and embed_batch() methods in the same class, which
construct requests using self._embed_config and send contents to Gemini) to
validate and enforce self._token_limit: for single-item embed(), check the token
count of the input and either chunk, truncate, or raise a clear local error when
it exceeds _token_limit; for embed_batch(), perform the same per-item check and
also split oversized items into multiple requests or reject with a descriptive
exception. Ensure the enforcement uses the existing _token_limit (fallbacking to
_DEFAULT_TOKEN_LIMIT) and happens before any network call so oversized payloads
never go over the wire.
---
Duplicate comments:
In `@tests/integration/test_gemini_openviking_it.py`:
- Around line 185-189: The test creates a session object (session =
client.session(session_id="gemini_it_session") and adds a message, but then
calls client.find(...) without scoping to that session; update the find call to
use the created session (pass the session object or its session_id) so the
search is performed against the session-scoped data (e.g., client.find(...,
session=session) or client.find(..., session_id="gemini_it_session") instead of
the current call to client.find(query="pytest testing tool")).
- Around line 149-151: The test currently asserts
OpenVikingConfigSingleton.instance().embedding.dimension (a config echo) so
change it to assert the runtime embedder/index schema dimension instead: after
creating the embedder or index in the test, obtain the actual runtime object
(e.g. the created embedder instance or the index schema from the component under
test) and assert its reported dimension equals dim rather than checking
OpenVikingConfigSingleton.instance().embedding.dimension; locate uses of
OpenVikingConfigSingleton.instance() in the test and replace the assertion with
a check against the embedder.dimension or index.schema.dimension on the actual
created object.
---
Nitpick comments:
In `@tests/integration/conftest.py`:
- Around line 173-178: The gemini_embedder fixture creates a GeminiDenseEmbedder
but never closes it; change the module-scoped fixture "gemini_embedder" to use
yield instead of return: instantiate GeminiDenseEmbedder(model_name,
api_key=GOOGLE_API_KEY, dimension=768), yield the instance, and after yield call
instance.close() in the teardown so the underlying HTTP client is properly
closed; reference the fixture name gemini_embedder and the class
GeminiDenseEmbedder to locate the code.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4ee682f8-b2b7-4274-8e24-320796a70f6e
📒 Files selected for processing (5)
- openviking/models/embedder/gemini_embedders.py
- openviking_cli/utils/config/embedding_config.py
- tests/integration/conftest.py
- tests/integration/test_gemini_embedding_it.py
- tests/integration/test_gemini_openviking_it.py
```python
("gemini", "dense"): (
    GeminiDenseEmbedder,
    lambda cfg: {
        "model_name": cfg.model,
        "api_key": cfg.api_key,
        "dimension": cfg.dimension,
        "task_type": (
            cfg.query_param if context == "query" and cfg.query_param
            else cfg.document_param if context == "document" and cfg.document_param
            else cfg.task_type
        ),
    },
```
max_concurrent never reaches Gemini’s async batch limiter.
The config model exposes EmbeddingConfig.max_concurrent, but this builder always instantiates GeminiDenseEmbedder with its constructor default. Changing the config will not actually throttle or widen Gemini batch concurrency.
♻️ Suggested wiring
```diff
 ("gemini", "dense"): (
     GeminiDenseEmbedder,
     lambda cfg: {
         "model_name": cfg.model,
         "api_key": cfg.api_key,
         "dimension": cfg.dimension,
+        "max_concurrent_batches": self.max_concurrent,
         "task_type": (
             cfg.query_param if context == "query" and cfg.query_param
             else cfg.document_param if context == "document" and cfg.document_param
             else cfg.task_type
         ),
     },
 ),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@openviking_cli/utils/config/embedding_config.py` around lines 357 - 368, The
builder for ("gemini", "dense") currently constructs GeminiDenseEmbedder without
passing EmbeddingConfig.max_concurrent, so changes to cfg.max_concurrent have no
effect; update the factory lambda that builds GeminiDenseEmbedder to include
"max_concurrent": cfg.max_concurrent (or the appropriate cfg field name) in the
kwargs passed to GeminiDenseEmbedder so the embedder receives and uses the
configured concurrency limit. Ensure the key name matches the
GeminiDenseEmbedder constructor parameter and keep the rest of the existing keys
("model_name", "api_key", "dimension", "task_type") intact.
```python
self._token_limit = _MODEL_TOKEN_LIMITS.get(model_name, _DEFAULT_TOKEN_LIMIT)
self._max_concurrent_batches = max_concurrent_batches
config_kwargs: Dict[str, Any] = {"output_dimensionality": self._dimension}
if self.task_type:
    config_kwargs["task_type"] = self.task_type
self._embed_config = types.EmbedContentConfig(**config_kwargs)
```
Enforce the configured token limit before hitting Gemini.
Line 140 stores the per-model ceiling, but none of the request paths use it before sending contents to the SDK. Direct embed() / embed_batch() calls on oversized inputs will still go over the wire and fail remotely instead of being chunked, truncated, or rejected locally with a clear error.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@openviking/models/embedder/gemini_embedders.py` around lines 140 - 145, The
class currently sets a per-model token ceiling in _token_limit but never
enforces it before calling the Gemini SDK; update the request paths (the embed()
and embed_batch() methods in the same class, which construct requests using
self._embed_config and send contents to Gemini) to validate and enforce
self._token_limit: for single-item embed(), check the token count of the input
and either chunk, truncate, or raise a clear local error when it exceeds
_token_limit; for embed_batch(), perform the same per-item check and also split
oversized items into multiple requests or reject with a descriptive exception.
Ensure the enforcement uses the existing _token_limit (fallbacking to
_DEFAULT_TOKEN_LIMIT) and happens before any network call so oversized payloads
never go over the wire.
df86558 to
148f6e3
Compare
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
openviking_cli/utils/config/embedding_config.py (1)
278-291: ⚠️ Potential issue | 🟠 Major
Wire `input_type` into the OpenAI dense factory.
`EmbeddingModelConfig.input_type` is added and normalized above, but this builder never forwards it to `OpenAIDenseEmbedder`. In the symmetric OpenAI path that makes `input_type` a silent no-op, so configs cannot actually force `query`/`document`/`passage` behavior here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openviking_cli/utils/config/embedding_config.py` around lines 278 - 291, The mapping for ("openai","dense") doesn't forward EmbeddingModelConfig.input_type to OpenAIDenseEmbedder, so input_type is ignored; update the lambda that builds kwargs for OpenAIDenseEmbedder to include "input_type": cfg.input_type (or normalized property name if different) alongside model_name, api_key, api_base, dimension, etc., so OpenAIDenseEmbedder receives and can act on the configured input_type; locate the dict in the ("openai","dense") branch where OpenAIDenseEmbedder and the lambda are returned and add the input_type key to that kwargs dictionary.
♻️ Duplicate comments (3)
tests/integration/test_gemini_openviking_it.py (2)
205-209: ⚠️ Potential issue | 🟠 Major
The session setup is still outside the asserted behavior.
`client.find()` on Lines 208-209 does not consume the session or the message added on Line 206, so this test only covers plain embedding search. Use the session-aware retrieval path here or drop the unused session scaffolding.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_gemini_openviking_it.py` around lines 205 - 209, The test creates a session via client.session(...) and adds a message with session.add_message(...) but then calls client.find(...) without using that session; either remove the unused session setup or switch to the session-aware retrieval API. Fix by either deleting the session/session.add_message lines, or call the session-aware finder (e.g., pass the session into the find call like client.find(session=session, query="pytest testing tool") or use whatever session.find(...) helper exists) so the added message is actually consumed by the retrieval assertion.
163-168: ⚠️ Potential issue | 🟠 Major
This still only verifies the configured dimension, not the runtime one.
`OpenVikingConfigSingleton.instance().embedding.dimension` on Lines 166-168 just re-reads the requested config value. The test still goes green if the live embedder or index schema later falls back to 3072, so please assert the runtime dimension instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_gemini_openviking_it.py` around lines 163 - 168, Replace the config-only assertion with a runtime check: after creating client via make_ov_client(...), inspect the actual embedder/index created by the client (e.g., client.embedder.dimension or client._embedder.dimension, or the index schema such as client.index.schema.embedding_dimension) and assert that that runtime dimension equals dim instead of using OpenVikingConfigSingleton.instance().embedding.dimension; update the test to pull the dimension from the live client/embedder/index and assert equality to dim.
openviking_cli/utils/config/embedding_config.py (1)
370-383: ⚠️ Potential issue | 🟠 Major
Gemini still ignores the configured concurrency limit.
`EmbeddingConfig.max_concurrent` on Line 225 is the public throttle, but this lambda always instantiates `GeminiDenseEmbedder` with its constructor default. Any override is ignored, which makes rate-limit tuning and throughput control ineffective.
♻️ Suggested change

```diff
 ("gemini", "dense"): (
     GeminiDenseEmbedder,
     lambda cfg: {
         "model_name": cfg.model,
         "api_key": cfg.api_key,
         "dimension": cfg.dimension,
+        "max_concurrent_batches": self.max_concurrent,
         "task_type": (
             cfg.query_param if context == "query" and cfg.query_param
             else cfg.document_param if context == "document" and cfg.document_param
             else cfg.task_type
         ),
     },
 ),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openviking_cli/utils/config/embedding_config.py` around lines 370 - 383, The mapping for ("gemini", "dense") ignores EmbeddingConfig.max_concurrent by not passing it into GeminiDenseEmbedder; update the lambda that constructs GeminiDenseEmbedder to include the concurrency override (use cfg.max_concurrent or EmbeddingConfig.max_concurrent as exposed on cfg) in the args passed to GeminiDenseEmbedder so the instance respects the configured throttle, e.g., add a "max_concurrent" (or the exact constructor param name used by GeminiDenseEmbedder) entry to the returned dict alongside model_name, api_key, dimension and task_type.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integration/conftest.py`:
- Around line 182-188: The fixture gemini_embedder returns a live
GeminiDenseEmbedder instance but never closes it; change the fixture to use a
yield-based teardown so after yielding the GeminiDenseEmbedder (created in
gemini_embedder) you call its close method (e.g., embedder.close()) in the
teardown to release SDK clients/sockets; locate the fixture function
gemini_embedder and replace the direct return with yield and a post-yield
cleanup that calls GeminiDenseEmbedder.close() (or the appropriate shutdown
method on that class) to ensure the client is closed for each parametrized
module.
In `@tests/integration/test_gemini_e2e.py`:
- Around line 83-90: Remove the hard latency assertion that flakes against the
live Gemini API: delete or guard the lines that record t0, compute elapsed, and
assert elapsed < 15 (related to variables t0 and elapsed) in the
GeminiDenseEmbedder async_embed_batch test; keep the functional assertions
(len(results) == 300 and vector length checks). If you want to keep a timing
check, move it to a separate perf/benchmark test or gate it behind an opt-in env
flag (e.g., RUN_PERF) so the integration test only verifies correctness and not
wall-clock SLA.
---
Outside diff comments:
In `@openviking_cli/utils/config/embedding_config.py`:
- Around line 278-291: The mapping for ("openai","dense") doesn't forward
EmbeddingModelConfig.input_type to OpenAIDenseEmbedder, so input_type is
ignored; update the lambda that builds kwargs for OpenAIDenseEmbedder to include
"input_type": cfg.input_type (or normalized property name if different)
alongside model_name, api_key, api_base, dimension, etc., so OpenAIDenseEmbedder
receives and can act on the configured input_type; locate the dict in the
("openai","dense") branch where OpenAIDenseEmbedder and the lambda are returned
and add the input_type key to that kwargs dictionary.
---
Duplicate comments:
In `@openviking_cli/utils/config/embedding_config.py`:
- Around line 370-383: The mapping for ("gemini", "dense") ignores
EmbeddingConfig.max_concurrent by not passing it into GeminiDenseEmbedder;
update the lambda that constructs GeminiDenseEmbedder to include the concurrency
override (use cfg.max_concurrent or EmbeddingConfig.max_concurrent as exposed on
cfg) in the args passed to GeminiDenseEmbedder so the instance respects the
configured throttle, e.g., add a "max_concurrent" (or the exact constructor
param name used by GeminiDenseEmbedder) entry to the returned dict alongside
model_name, api_key, dimension and task_type.
In `@tests/integration/test_gemini_openviking_it.py`:
- Around line 205-209: The test creates a session via client.session(...) and
adds a message with session.add_message(...) but then calls client.find(...)
without using that session; either remove the unused session setup or switch to
the session-aware retrieval API. Fix by either deleting the
session/session.add_message lines, or call the session-aware finder (e.g., pass
the session into the find call like client.find(session=session, query="pytest
testing tool") or use whatever session.find(...) helper exists) so the added
message is actually consumed by the retrieval assertion.
- Around line 163-168: Replace the config-only assertion with a runtime check:
after creating client via make_ov_client(...), inspect the actual embedder/index
created by the client (e.g., client.embedder.dimension or
client._embedder.dimension, or the index schema such as
client.index.schema.embedding_dimension) and assert that that runtime dimension
equals dim instead of using
OpenVikingConfigSingleton.instance().embedding.dimension; update the test to
pull the dimension from the live client/embedder/index and assert equality to
dim.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a0b79044-2113-46c2-9d65-dd5b53901d07
📒 Files selected for processing (8)
- openviking/models/embedder/gemini_embedders.py
- openviking_cli/utils/config/embedding_config.py
- tests/integration/conftest.py
- tests/integration/test_gemini_e2e.py
- tests/integration/test_gemini_embedding_it.py
- tests/integration/test_gemini_openviking_it.py
- tests/unit/test_embedding_config_gemini.py
- tests/unit/test_gemini_embedder.py
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/unit/test_gemini_embedder.py
- openviking/models/embedder/gemini_embedders.py
- tests/unit/test_embedding_config_gemini.py
tests/integration/conftest.py
Outdated
```python
@pytest.fixture(scope="module", params=GEMINI_MODELS_FIXTURE)
def gemini_embedder(request):
    """Module-scoped GeminiDenseEmbedder at dim=768, parametrized over known models."""
    from openviking.models.embedder.gemini_embedders import GeminiDenseEmbedder

    model_name, _, _ = request.param
    return GeminiDenseEmbedder(model_name, api_key=GOOGLE_API_KEY, dimension=768)
```
Tear down the module-scoped Gemini client.
This fixture returns a live GeminiDenseEmbedder but never closes it. Because it is parametrized over models and reused across tests, the suite can accumulate open SDK clients/sockets until process exit.
♻️ Suggested change
```diff
 @pytest.fixture(scope="module", params=GEMINI_MODELS_FIXTURE)
 def gemini_embedder(request):
     """Module-scoped GeminiDenseEmbedder at dim=768, parametrized over known models."""
     from openviking.models.embedder.gemini_embedders import GeminiDenseEmbedder
     model_name, _, _ = request.param
-    return GeminiDenseEmbedder(model_name, api_key=GOOGLE_API_KEY, dimension=768)
+    embedder = GeminiDenseEmbedder(model_name, api_key=GOOGLE_API_KEY, dimension=768)
+    try:
+        yield embedder
+    finally:
+        embedder.close()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integration/conftest.py` around lines 182 - 188, The fixture
gemini_embedder returns a live GeminiDenseEmbedder instance but never closes it;
change the fixture to use a yield-based teardown so after yielding the
GeminiDenseEmbedder (created in gemini_embedder) you call its close method
(e.g., embedder.close()) in the teardown to release SDK clients/sockets; locate
the fixture function gemini_embedder and replace the direct return with yield
and a post-yield cleanup that calls GeminiDenseEmbedder.close() (or the
appropriate shutdown method on that class) to ensure the client is closed for
each parametrized module.
Actionable comments posted: 2
♻️ Duplicate comments (6)
tests/integration/test_gemini_e2e.py (1)
81-90: ⚠️ Potential issue | 🟠 Major
Remove the hard latency SLA from the real-API test.
`elapsed < 15` will flap with runner load, network jitter, or transient throttling even when batching is correct. Keep the functional assertions here and move timing checks to an opt-in perf/benchmark test.
♻️ Suggested fix

```diff
-import time
-
 e = GeminiDenseEmbedder("gemini-embedding-2-preview", api_key=GOOGLE_API_KEY, dimension=128)
 texts = [f"sentence {i}" for i in range(300)]  # 3 batches of 100
-t0 = time.monotonic()
 results = await e.async_embed_batch(texts)
-elapsed = time.monotonic() - t0
 assert len(results) == 300
 assert all(len(r.dense_vector) == 128 for r in results)
-assert elapsed < 15  # concurrent should be << 3× serial RTT
 e.close()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_gemini_e2e.py` around lines 81 - 90, The test is asserting a hard latency SLA which can flap; remove the timing assertion and keep only the functional checks: instantiate GeminiDenseEmbedder, call async_embed_batch, assert len(results) == 300 and each r.dense_vector has length 128, and drop the elapsed < 15 assertion (or move any timing assertions into a separate, opt-in perf/benchmark test). Locate the timing logic around GeminiDenseEmbedder and async_embed_batch in the test and delete the elapsed measurement assertion while preserving the setup and functional assertions.
pyproject.toml (1)
82-89: ⚠️ Potential issue | 🟠 Major
`pytest-asyncio` auto mode still conflicts with the new AnyIO tests.
Line 212 still sets `asyncio_mode = "auto"`, while this PR adds `anyio` support and `@pytest.mark.anyio` tests. AnyIO's testing docs explicitly call out `pytest-asyncio` auto mode as incompatible with the AnyIO plugin, and `pytest-asyncio` documents strict mode as the coexistence mode when other async test plugins are in play. Either switch to strict mode and mark asyncio tests explicitly, or drop the AnyIO markers. (anyio.readthedocs.io)
Also applies to: 214-216
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pyproject.toml` around lines 82 - 89, The test configuration currently sets pytest-asyncio asyncio_mode = "auto" while the PR adds AnyIO-marked tests (`@pytest.mark.anyio`), which conflicts; update the test config to use asyncio_mode = "strict" (so pytest-asyncio coexists with the AnyIO plugin) and then explicitly mark asyncio tests with `@pytest.mark.asyncio` where needed, or alternatively remove the AnyIO markers and keep auto mode—change the asyncio_mode setting and adjust test decorators (`@pytest.mark.anyio` / `@pytest.mark.asyncio`) accordingly.openviking/models/embedder/gemini_embedders.py (1)
143-143: ⚠️ Potential issue | 🟠 Major
`_token_limit` is still never enforced before Gemini calls.
Line 143 records the per-model ceiling, but `embed()`, `embed_batch()`, and `async_embed_batch()` never check it before sending `contents` to the SDK. Oversized inputs will still fail remotely instead of being chunked, truncated, or rejected locally with a clear error.
Also applies to: 150-160, 165-184, 223-225
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openviking/models/embedder/gemini_embedders.py` at line 143, The per-model ceiling stored in self._token_limit is never enforced before calling Gemini; update embed, embed_batch, and async_embed_batch to validate inputs against self._token_limit (locations around the current embed() implementation and the batch methods referenced) by tokenizing each content (use your existing tokenizer utility) and either chunking long inputs into acceptable-sized pieces, truncating with a clear warning, or raising a descriptive local error instead of sending oversized payloads to the SDK; ensure the chosen behavior is applied consistently in embed(), embed_batch(), and async_embed_batch(), and reference _MODEL_TOKEN_LIMITS/_DEFAULT_TOKEN_LIMIT where needed to determine limits.tests/integration/conftest.py (1)
182-188:⚠️ Potential issue | 🟡 MinorClose the module-scoped Gemini fixture.
gemini_embedderreturns a liveGeminiDenseEmbedderbut never closes it. With multiple parametrized models, the suite can keep SDK clients/sockets open until process exit.♻️ Suggested fix
`@pytest.fixture`(scope="module", params=GEMINI_MODELS_FIXTURE) def gemini_embedder(request): """Module-scoped GeminiDenseEmbedder at dim=768, parametrized over known models.""" from openviking.models.embedder.gemini_embedders import GeminiDenseEmbedder model_name, _, _ = request.param - return GeminiDenseEmbedder(model_name, api_key=GOOGLE_API_KEY, dimension=768) + embedder = GeminiDenseEmbedder(model_name, api_key=GOOGLE_API_KEY, dimension=768) + try: + yield embedder + finally: + embedder.close()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/conftest.py` around lines 182 - 188, The fixture gemini_embedder returns a GeminiDenseEmbedder without closing it; change the fixture to assign the instance to a local variable, register a teardown via request.addfinalizer that calls the embedder's cleanup method (e.g., embedder.close() or embedder.shutdown(), or await embedder.aclose() if async) and then return the instance; reference the fixture name gemini_embedder and the class GeminiDenseEmbedder so the finalizer ensures SDK clients/sockets are closed after each module param.openviking_cli/utils/config/embedding_config.py (1)
225-227: ⚠️ Potential issue | 🟡 Minor

`EmbeddingConfig.max_concurrent` still does nothing for Gemini.

Lines 225-227 expose the limiter, but the Gemini factory never passes it as `max_concurrent_batches`, so tuning `max_concurrent` has no effect on `async_embed_batch()`.

♻️ Suggested fix

```diff
 ("gemini", "dense"): (
     GeminiDenseEmbedder,
     lambda cfg: {
         "model_name": cfg.model,
         "api_key": cfg.api_key,
         "dimension": cfg.dimension,
+        "max_concurrent_batches": self.max_concurrent,
         "task_type": (
             cfg.query_param
             if context == "query" and cfg.query_param
             else cfg.document_param
             if context == "document" and cfg.document_param
```

Also applies to: 370-383
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `openviking_cli/utils/config/embedding_config.py` around lines 225-227: `EmbeddingConfig.max_concurrent` is defined but not forwarded to Gemini's batching limiter. Update the Gemini embedding provider/factory to pass `EmbeddingConfig.max_concurrent` as the `max_concurrent_batches` argument so `async_embed_batch` actually uses it. Locate where the Gemini provider/factory (the function/class that instantiates the Gemini embeddings client) constructs or calls `async_embed_batch` and modify the call/constructor to accept and forward `embedding_config.max_concurrent` (also update the second occurrence around the other factory/initialization block noted in the review). Ensure the parameter name matches the Gemini client API (`max_concurrent_batches`) and propagate the value from `EmbeddingConfig.max_concurrent` through the factory to the `async_embed_batch` invocations.

tests/integration/test_gemini_openviking_it.py (1)
163-168: ⚠️ Potential issue | 🟠 Major

This still checks config, not the active embedding dimension.

`OpenVikingConfigSingleton.get_instance().embedding.dimension` just echoes the configured value. If the Gemini factory silently falls back to 3072 and both indexing/search use that default, this test still passes, so it won't catch the regression it is meant to guard against. Assert the live embedder or index schema dimension instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_gemini_openviking_it.py` around lines 163 - 168, The test is asserting the configured dimension via OpenVikingConfigSingleton rather than the runtime/embedder/index dimension, so replace the config check with an assertion against the live component returned by make_ov_client: after creating client = await make_ov_client(...), query the actual embedder or index schema on that client (e.g., inspect client.embedder.dimension or client.index.schema.dimension or similar runtime property provided by the OpenViking client) and assert it equals dim to ensure the active embedder/index uses the expected dimension instead of relying on the config value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@openviking_cli/utils/config/embedding_config.py`:
- Around line 453-483: The live embedder built by get_embedder() bypasses
Gemini/OpenAI per-context routing; modify get_embedder() so when self.dense and
self.dense.provider (e.g., "gemini" or "openai") exist and
query_param/document_param are present it returns a context-aware wrapper (or
factory callable/object) that accepts a context flag (e.g., is_query or context)
and delegates to _create_embedder(...) with effective_context similar to
_get_contextual_embedder logic; reference get_embedder(),
_get_contextual_embedder(), get_query_embedder(), get_document_embedder(), and
_create_embedder() to locate and implement this wrapper so calls that currently
use get_embedder() (and later invoke embed(..., is_query=...)) will route to the
correct task_type/params.
In `@openviking/models/embedder/gemini_embedders.py`:
- Around line 216-240: async_embed_batch currently sends raw batch items to
client.aio.models.embed_content which breaks parity with embed_batch's handling
of empty/whitespace inputs; change the logic in _embed_one so you first split
batch into non-empty texts (trimmed), keep their original positions, call
client.aio.models.embed_content only with the non-empty list, then reconstruct
results[idx] by inserting zero vectors (use truncate_and_normalize on a zero
vector of length _dimension) for any empty inputs and the normalized embeddings
from the API response for non-empty positions; in the except block do the same
reconstruction but obtain embeddings for non-empty texts via await
anyio.to_thread.run_sync(self.embed, text) and still insert zero vectors for
empties so a single empty item doesn't force the whole chunk to become slow or
change semantics compared to embed_batch.
---
Duplicate comments:
In `@openviking_cli/utils/config/embedding_config.py`:
- Around line 225-227: EmbeddingConfig.max_concurrent is defined but not
forwarded to Gemini's batching limiter; update the Gemini embedding
provider/factory to pass EmbeddingConfig.max_concurrent as the
max_concurrent_batches argument so async_embed_batch actually uses it. Locate
where the Gemini provider/factory (the function/class that instantiates the
Gemini embeddings client) constructs or calls async_embed_batch and modify the
call/constructor to accept and forward embedding_config.max_concurrent (also
update the second occurrence around the other factory/initialization block noted
in the review). Ensure the parameter name matches the Gemini client API
(max_concurrent_batches) and propagate the value from
EmbeddingConfig.max_concurrent through the factory to the async_embed_batch
invocations.
In `@openviking/models/embedder/gemini_embedders.py`:
- Line 143: The per-model ceiling stored in self._token_limit is never enforced
before calling Gemini; update embed, embed_batch, and async_embed_batch to
validate inputs against self._token_limit (locations around the current embed()
implementation and the batch methods referenced) by tokenizing each content (use
your existing tokenizer utility) and either chunking long inputs into
acceptable-sized pieces, truncating with a clear warning, or raising a
descriptive local error instead of sending oversized payloads to the SDK; ensure
the chosen behavior is applied consistently in embed(), embed_batch(), and
async_embed_batch(), and reference _MODEL_TOKEN_LIMITS/_DEFAULT_TOKEN_LIMIT
where needed to determine limits.
In `@pyproject.toml`:
- Around line 82-89: The test configuration currently sets pytest-asyncio
asyncio_mode = "auto" while the PR adds AnyIO-marked tests (`@pytest.mark.anyio`),
which conflicts; update the test config to use asyncio_mode = "strict" (so
pytest-asyncio coexists with the AnyIO plugin) and then explicitly mark asyncio
tests with `@pytest.mark.asyncio` where needed, or alternatively remove the AnyIO
markers and keep auto mode—change the asyncio_mode setting and adjust test
decorators (`@pytest.mark.anyio` / `@pytest.mark.asyncio`) accordingly.
In `@tests/integration/conftest.py`:
- Around line 182-188: The fixture gemini_embedder returns a GeminiDenseEmbedder
without closing it; change the fixture to assign the instance to a local
variable, register a teardown via request.addfinalizer that calls the embedder's
cleanup method (e.g., embedder.close() or embedder.shutdown(), or await
embedder.aclose() if async) and then return the instance; reference the fixture
name gemini_embedder and the class GeminiDenseEmbedder so the finalizer ensures
SDK clients/sockets are closed after each module param.
In `@tests/integration/test_gemini_e2e.py`:
- Around line 81-90: The test is asserting a hard latency SLA which can flap;
remove the timing assertion and keep only the functional checks: instantiate
GeminiDenseEmbedder, call async_embed_batch, assert len(results) == 300 and each
r.dense_vector has length 128, and drop the elapsed < 15 assertion (or move any
timing assertions into a separate, opt-in perf/benchmark test). Locate the
timing logic around GeminiDenseEmbedder and async_embed_batch in the test and
delete the elapsed measurement assertion while preserving the setup and
functional assertions.
In `@tests/integration/test_gemini_openviking_it.py`:
- Around line 163-168: The test is asserting the configured dimension via
OpenVikingConfigSingleton rather than the runtime/embedder/index dimension, so
replace the config check with an assertion against the live component returned
by make_ov_client: after creating client = await make_ov_client(...), query the
actual embedder or index schema on that client (e.g., inspect
client.embedder.dimension or client.index.schema.dimension or similar runtime
property provided by the OpenViking client) and assert it equals dim to ensure
the active embedder/index uses the expected dimension instead of relying on the
config value.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 72c90a07-7cf7-45f5-a5db-b5200720e42b
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (9)
- openviking/models/embedder/gemini_embedders.py
- openviking_cli/utils/config/embedding_config.py
- pyproject.toml
- tests/integration/conftest.py
- tests/integration/test_gemini_e2e.py
- tests/integration/test_gemini_embedding_it.py
- tests/integration/test_gemini_openviking_it.py
- tests/unit/test_embedding_config_gemini.py
- tests/unit/test_gemini_embedder.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/test_embedding_config_gemini.py
```python
def get_query_embedder(self):
    """Get embedder instance for query embeddings."""
    return self._get_contextual_embedder("query")

def get_document_embedder(self):
    """Get embedder instance for document/passage embeddings."""
    return self._get_contextual_embedder("document")

def _get_contextual_embedder(self, context: str):
    if not self.dense:
        return self.get_embedder()

    provider = (self.dense.provider or "").lower()
    if provider == "openai":
        non_symmetric = (
            self.dense.query_param is not None or self.dense.document_param is not None
        )
        effective_context = context if non_symmetric else None
        return self._create_embedder(provider, "dense", self.dense, context=effective_context)

    if provider == "jina":
        return self._create_embedder(provider, "dense", self.dense, context=context)

    if provider == "gemini":
        non_symmetric = (
            self.dense.query_param is not None or self.dense.document_param is not None
        )
        effective_context = context if non_symmetric else None
        return self._create_embedder(provider, "dense", self.dense, context=effective_context)

    return self.get_embedder()
```
get_embedder() still bypasses Gemini's query/document routing.
These helpers only work when callers opt into them, but openviking/service/core.py:1-50 still builds the live embedder via config.embedding.get_embedder(). That path creates a single Gemini embedder with a fixed task_type, and openviking/retrieve/hierarchical_retriever.py:1-50 later calls embed(..., is_query=True) on that same instance. So query_param / document_param never affect normal indexing vs retrieval unless the service layer switches to these helpers or get_embedder() returns a context-aware wrapper.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@openviking_cli/utils/config/embedding_config.py` around lines 453 - 483, The
live embedder built by get_embedder() bypasses Gemini/OpenAI per-context
routing; modify get_embedder() so when self.dense and self.dense.provider (e.g.,
"gemini" or "openai") exist and query_param/document_param are present it
returns a context-aware wrapper (or factory callable/object) that accepts a
context flag (e.g., is_query or context) and delegates to _create_embedder(...)
with effective_context similar to _get_contextual_embedder logic; reference
get_embedder(), _get_contextual_embedder(), get_query_embedder(),
get_document_embedder(), and _create_embedder() to locate and implement this
wrapper so calls that currently use get_embedder() (and later invoke embed(...,
is_query=...)) will route to the correct task_type/params.
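One shape such a context-aware wrapper could take (a minimal sketch with illustrative names, assuming the underlying embedders expose `embed`/`embed_batch`; not the project's actual API):

```python
class ContextRoutingEmbedder:
    """Route embed() calls to a query- or document-configured embedder.

    Lets callers that only hold the object returned by get_embedder()
    still benefit from per-context task_type routing via the is_query flag.
    """

    def __init__(self, query_embedder, document_embedder):
        self._query = query_embedder
        self._document = document_embedder

    def embed(self, text, is_query=False):
        # Pick the embedder configured for the requested context.
        target = self._query if is_query else self._document
        return target.embed(text)

    def embed_batch(self, texts, is_query=False):
        target = self._query if is_query else self._document
        return target.embed_batch(texts)
```

`get_embedder()` could return this wrapper whenever `query_param`/`document_param` are set, building each side through `_create_embedder(..., context=...)`.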
```python
batches = [texts[i : i + _TEXT_BATCH_SIZE] for i in range(0, len(texts), _TEXT_BATCH_SIZE)]
results: List[Optional[List[EmbedResult]]] = [None] * len(batches)
sem = anyio.Semaphore(self._max_concurrent_batches)

async def _embed_one(idx: int, batch: List[str]) -> None:
    async with sem:
        try:
            response = await self.client.aio.models.embed_content(
                model=self.model_name, contents=batch, config=self._embed_config
            )
            results[idx] = [
                EmbedResult(
                    dense_vector=truncate_and_normalize(list(emb.values), self._dimension)
                )
                for emb in response.embeddings
            ]
        except (APIError, ClientError) as e:
            logger.warning(
                "Gemini async batch embed failed (HTTP %d) for batch of %d, falling back",
                e.code,
                len(batch),
            )
            results[idx] = [
                await anyio.to_thread.run_sync(self.embed, text) for text in batch
            ]
```
Make async_embed_batch() mirror the sync blank-input behavior.
embed_batch() strips empty/whitespace strings and injects zero vectors, but async_embed_batch() sends the raw batch to client.aio.models.embed_content(). That changes semantics for blank inputs and lets a single empty item force the whole chunk into slow fallback mode.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@openviking/models/embedder/gemini_embedders.py` around lines 216 - 240,
async_embed_batch currently sends raw batch items to
client.aio.models.embed_content which breaks parity with embed_batch's handling
of empty/whitespace inputs; change the logic in _embed_one so you first split
batch into non-empty texts (trimmed), keep their original positions, call
client.aio.models.embed_content only with the non-empty list, then reconstruct
results[idx] by inserting zero vectors (use truncate_and_normalize on a zero
vector of length _dimension) for any empty inputs and the normalized embeddings
from the API response for non-empty positions; in the except block do the same
reconstruction but obtain embeddings for non-empty texts via await
anyio.to_thread.run_sync(self.embed, text) and still insert zero vectors for
empties so a single empty item doesn't force the whole chunk to become slow or
change semantics compared to embed_batch.
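The position-preserving reconstruction described above can be sketched with two small helpers (hypothetical names, independent of the SDK):

```python
def split_blank_inputs(batch):
    """Separate blank (empty/whitespace-only) items from a batch, keeping positions.

    Returns (non_empty_texts, non_empty_positions, blank_positions) so that
    embeddings from the API can be merged back with zero vectors for blanks.
    """
    non_empty, positions, blanks = [], [], []
    for i, text in enumerate(batch):
        if text and text.strip():
            non_empty.append(text)
            positions.append(i)
        else:
            blanks.append(i)
    return non_empty, positions, blanks


def merge_embeddings(batch_len, positions, vectors, blanks, dimension):
    """Rebuild the full result list: API vectors at their original
    positions, a fresh zero vector of the target dimension at each blank."""
    out = [None] * batch_len
    for pos, vec in zip(positions, vectors):
        out[pos] = vec
    for pos in blanks:
        out[pos] = [0.0] * dimension
    return out
```

`_embed_one()` would then call the API only with `non_empty_texts` and use `merge_embeddings()` in both the success path and the per-item fallback, so a single blank item never forces the whole chunk into the slow path.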
🧹 Nitpick comments (1)
openviking/models/embedder/gemini_embedders.py (1)
311-316: `close()` relies on SDK private attribute `_http_client`.

Accessing `self.client._http_client` is fragile since `_http_client` is an internal implementation detail of the `google-genai` SDK and may change or be removed without notice.

💡 Consider a more defensive approach

```diff
 def close(self):
-    if hasattr(self.client, "_http_client"):
-        try:
-            self.client._http_client.close()
-        except Exception:
-            pass
+    # The google-genai SDK manages its own connection lifecycle.
+    # If explicit cleanup is needed in the future, check SDK docs for a public API.
+    pass
```

Alternatively, if cleanup is critical, document that this relies on SDK internals and may need updates when upgrading the SDK version.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openviking/models/embedder/gemini_embedders.py` around lines 311 - 316, The close() method is accessing the private SDK attribute self.client._http_client which is fragile; update close() to avoid relying on that internal name by first checking for public/official cleanup methods (e.g., a close/async_close method or transport attribute) via getattr/hasattr on self.client and invoking those if present, otherwise fall back to a safe getattr(self.client, "_http_client", None) check and then calling close on it inside a try/except; also add a short comment documenting the fallback to the private attribute so future maintainers know this is an SDK-internal fallback that may need updating when upgrading google-genai.
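A defensive cleanup along the lines suggested might look like this (a sketch; `close_client` and the returned status strings are illustrative, and whether `google-genai` exposes a public `close()` should be checked against the installed SDK version):

```python
def close_client(client):
    """Best-effort cleanup for an SDK client.

    Prefer any public close() method; fall back to the private
    _http_client attribute only if nothing public exists. The private
    fallback is SDK-internal and may break on google-genai upgrades.
    """
    public_close = getattr(client, "close", None)
    if callable(public_close):
        public_close()
        return "public"
    # SDK-internal fallback: may need updating when upgrading google-genai.
    http_client = getattr(client, "_http_client", None)
    if http_client is not None:
        try:
            http_client.close()
            return "private"
        except Exception:
            return "failed"
    return "noop"
```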
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@openviking/models/embedder/gemini_embedders.py`:
- Around line 311-316: The close() method is accessing the private SDK attribute
self.client._http_client which is fragile; update close() to avoid relying on
that internal name by first checking for public/official cleanup methods (e.g.,
a close/async_close method or transport attribute) via getattr/hasattr on
self.client and invoking those if present, otherwise fall back to a safe
getattr(self.client, "_http_client", None) check and then calling close on it
inside a try/except; also add a short comment documenting the fallback to the
private attribute so future maintainers know this is an SDK-internal fallback
that may need updating when upgrading google-genai.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0d15807f-797f-4cef-8d86-8542b28b6467
📒 Files selected for processing (2)
- openviking/models/embedder/gemini_embedders.py
- tests/unit/test_gemini_embedder.py
1586f65 to
893a1da
Compare
…cengine#816) pull_request event blocks fork PRs behind manual approval and denies access to secrets, so DOUBAO_API_KEY is unavailable and the review silently fails. pull_request_target runs in the base-repo context, giving the workflow access to secrets while still executing the workflow file from main (not from the fork), which keeps the API key safe.
Closes volcengine#724

Adds a complete Helm chart at deploy/helm/openviking/ with:
- Deployment with Recreate strategy (RocksDB single-writer)
- PVC for persistent data storage
- ConfigMap for ov.conf configuration
- Optional ingress support
- Health probes matching Dockerfile HEALTHCHECK
- Configurable resources, replicas, and all ov.conf settings
- Installation guide at deploy/helm/README.md
* feat: add Azure OpenAI support for embedding and VLM
Add `azure` as a first-class provider for both embedding models and VLM,
using the official `openai.AzureOpenAI` / `openai.AsyncAzureOpenAI` clients.
Changes:
- embedder: OpenAIDenseEmbedder now accepts `provider` and `api_version`
params; when provider is "azure", initializes AzureOpenAI client with
azure_endpoint and api_version
- VLM: OpenAIVLM switches between OpenAI and AzureOpenAI clients based
on the provider field; VLMFactory maps "azure" to OpenAIVLM
- config: EmbeddingModelConfig and VLMConfig gain `api_version` field;
embedding_config adds ("azure", "dense") factory entry with validation
- registry: add "azure" to VALID_PROVIDERS
- docs: update README_CN.md with Azure provider table entry, config
template fields, usage examples, and full config sample
Made-with: Cursor
* refactor: address PR review — extract helper, add validation, shared constant
- Extract `_build_openai_client_kwargs()` helper in openai_vlm.py to
eliminate duplicated Azure/OpenAI client construction across
get_client() and get_async_client()
- Add `api_base` validation for Azure provider in VLM client methods
(was missing, unlike the embedder which already validated it)
- Extract `DEFAULT_AZURE_API_VERSION` constant in registry.py and
reference it from both embedder and VLM, so future Azure API version
bumps only require a single change
- Note: removing the hardcoded `self.provider = "openai"` override in
OpenAIVLM.__init__ (done in the previous commit) is an intentional
bug fix — it allows VLMBase.provider to correctly reflect the
configured value (e.g. "azure"), which fixes token usage tracking
Made-with: Cursor
…ngine#702 pattern

- GeminiDenseEmbedder: accept query_param/document_param, use is_query in embed() and embed_batch() to select task_type at call time
- EmbeddingConfig: add Gemini provider, factory, validation, dimension
- No get_query_embedder/get_document_embedder/_get_contextual_embedder (removed in volcengine#702; embed(is_query=True/False) is the pattern)
- Tests use embed(text, is_query=True/False) pattern throughout
- Rebased onto current upstream/main
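The call-time `task_type` selection this commit describes can be sketched as a small helper (the function name and the symmetric default are assumptions for illustration, not the project's actual code):

```python
def select_task_type(is_query, query_param=None, document_param=None,
                     default="SEMANTIC_SIMILARITY"):
    """Pick the Gemini task_type at call time from the is_query flag.

    Mirrors the embed(is_query=...) pattern: when per-context params are
    configured they win; otherwise fall back to a symmetric default.
    """
    if is_query and query_param:
        return query_param
    if not is_query and document_param:
        return document_param
    return default
```

With `query_param="RETRIEVAL_QUERY"` and `document_param="RETRIEVAL_DOCUMENT"`, retrieval and indexing calls get different task types from the same embedder instance.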
98f7ae4 to
fba32cf
Compare
…r CI

- Remove task_type from EmbeddingModelConfig (query_param/document_param suffice)
- Wrap GeminiDenseEmbedder import in try/except (google-genai is optional)
- Update tests for removed field
Summary
Adds `GeminiDenseEmbedder` — a production-ready dense embedding provider backed by the official `google-genai` SDK. Includes full `EmbeddingConfig` integration, non-symmetric task-type routing, async concurrent batching, empty-text guard, and 50 unit tests + 5 integration test files.
Files Changed (11 files, +1,429 lines)
`openviking/models/embedder/gemini_embedders.py` (new)
Core implementation of `GeminiDenseEmbedder(DenseEmbedderBase)`:
`openviking_cli/utils/config/embedding_config.py` (+135 lines)
`openviking/models/embedder/__init__.py` (+4 lines)
Exports `GeminiDenseEmbedder` in public API; updates module docstring.
`examples/ov.conf.example` (+11 lines)
New `embedding_gemini_example` snippet demonstrating `gemini-embedding-2-preview` with non-symmetric `RETRIEVAL_QUERY` / `RETRIEVAL_DOCUMENT` routing and `${GOOGLE_API_KEY}` substitution.
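A rough sketch of what such a config snippet could look like (the surrounding structure and field names are guesses based on this description, not a copy of `examples/ov.conf.example`):

```json
{
  "embedding_gemini_example": {
    "dense": {
      "provider": "gemini",
      "model": "gemini-embedding-2-preview",
      "api_key": "${GOOGLE_API_KEY}",
      "dimension": 768,
      "query_param": "RETRIEVAL_QUERY",
      "document_param": "RETRIEVAL_DOCUMENT"
    }
  }
}
```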
`pyproject.toml` (+8 lines)
`tests/integration/conftest.py` (+119 lines)
`tests/integration/test_gemini_embedding_it.py` (+123 lines)
Real API integration tests (auto-skipped without `GOOGLE_API_KEY`):
`tests/integration/test_gemini_openviking_it.py` (+209 lines)
Full OpenViking add-memory + search workflows (requires `GOOGLE_API_KEY` + compiled VectorDB engine):
`tests/integration/test_gemini_e2e.py` (+114 lines)
E2E HTTP server integration tests using the `server_url` uvicorn fixture.
`tests/unit/test_gemini_embedder.py` (+352 lines)
50 unit tests (all mocked, no API key needed):
`tests/unit/test_embedding_config_gemini.py` (+111 lines)
Unit tests for `EmbeddingConfig` Gemini path: valid construction, task_type validation, missing key rejection, `get_effective_dimension()`, factory kwargs, non-symmetric routing.
Post-Review Fixes Applied
Verification
Summary by CodeRabbit
Release Notes
New Features
Chores