
feat: add mistral embedding function support #153

Merged: hnwyllmm merged 2 commits into oceanbase:develop from chakkk309:add-Mistral-embedding-function-support on Jan 31, 2026

@chakkk309 chakkk309 commented Jan 29, 2026

Summary

Fix #135.

Integrate Mistral embedding function

Solution Description

Mistral test result:

uv run pytest tests/unit_tests/test_mistral_embedding_function.py -vv   
================================================================================ test session starts ================================================================================
platform darwin -- Python 3.11.13, pytest-9.0.2, pluggy-1.6.0 -- /pyseekdb/.venv/bin/python3
cachedir: .pytest_cache
rootdir: /pyseekdb
configfile: pyproject.toml
plugins: anyio-4.12.1
collected 12 items                                                                                                                                                                  

tests/unit_tests/test_mistral_embedding_function.py::TestMistralEmbeddingFunction::test_mistral_env PASSED                                                                    [  8%]
tests/unit_tests/test_mistral_embedding_function.py::TestMistralEmbeddingFunction::test_initialization_with_defaults PASSED                                                   [ 16%]
tests/unit_tests/test_mistral_embedding_function.py::TestMistralEmbeddingFunction::test_initialization_with_custom_api_key_env PASSED                                         [ 25%]
tests/unit_tests/test_mistral_embedding_function.py::TestMistralEmbeddingFunction::test_initialization_with_custom_api_base PASSED                                            [ 33%]
tests/unit_tests/test_mistral_embedding_function.py::TestMistralEmbeddingFunction::test_initialization_with_additional_kwargs PASSED                                          [ 41%]
tests/unit_tests/test_mistral_embedding_function.py::TestMistralEmbeddingFunction::test_initialization_with_missing_api_key PASSED                                            [ 50%]
tests/unit_tests/test_mistral_embedding_function.py::TestMistralEmbeddingFunction::test_dimension_property_for_known_model PASSED                                             [ 58%]
tests/unit_tests/test_mistral_embedding_function.py::TestMistralEmbeddingFunction::test_embedding_generation_single_document PASSED                                           [ 66%]
tests/unit_tests/test_mistral_embedding_function.py::TestMistralEmbeddingFunction::test_embedding_generation_multiple_documents PASSED                                        [ 75%]
tests/unit_tests/test_mistral_embedding_function.py::TestMistralEmbeddingFunction::test_embedding_with_empty_input PASSED                                                     [ 83%]
tests/unit_tests/test_mistral_embedding_function.py::TestMistralEmbeddingFunction::test_dimension_of_function PASSED                                                          [ 91%]
tests/unit_tests/test_mistral_embedding_function.py::TestMistralEmbeddingFunction::test_persistence PASSED                                                                    [100%]

================================================================================ 12 passed in 3.22s =================================================================================


## Summary by CodeRabbit

* **New Features**
  * Added support for Mistral text embeddings: a new embedding provider is available and can be configured via environment variable and standard embedding settings; supports batch inputs and serialization.

* **Tests**
  * Added comprehensive unit tests for the Mistral embedding provider (guarded for runtime API availability).


coderabbitai Bot commented Jan 29, 2026

Walkthrough

Adds a new MistralEmbeddingFunction class, exports it in the embedding_functions package, registers it under "mistral" in EmbeddingFunctionRegistry, and includes unit tests for initialization, usage, and config serialization.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **New Embedding Implementation**<br>`src/pyseekdb/utils/embedding_functions/mistral_embedding_function.py` | Adds `MistralEmbeddingFunction` (subclassing `OpenAIBaseEmbeddingFunction`) with defaults for model, API base, API key env, dimension lookup, call handling, config serialization, and validation. |
| **Module Exports**<br>`src/pyseekdb/utils/embedding_functions/__init__.py` | Exports `MistralEmbeddingFunction` in the package public surface (`__all__`). |
| **Registry Integration**<br>`src/pyseekdb/client/embedding_function.py` | Registers `"mistral"` -> `MistralEmbeddingFunction` in `EmbeddingFunctionRegistry` initialization. |
| **Tests**<br>`tests/unit_tests/test_mistral_embedding_function.py` | Adds unit tests covering initialization variants, embedding calls (single/multiple), empty input handling, dimension checks, config build/restore, and runtime guards for missing env/deps. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Registry
    participant MistralEmbed as MistralEmbeddingFunction
    participant MistralAPI as External Mistral API
    Client->>Registry: request embedding function("mistral")
    Registry-->>Client: return MistralEmbeddingFunction class
    Client->>MistralEmbed: instantiate / call(documents)
    MistralEmbed->>MistralAPI: POST /v1/embeddings (model, input, api_key)
    MistralAPI-->>MistralEmbed: embeddings response
    MistralEmbed-->>Client: list[embeddings]
```
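
The lookup-then-call flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the pyseekdb implementation: `Registry` and `FakeMistralEmbeddingFunction` are hypothetical stand-ins, and the stub returns made-up vectors instead of calling the Mistral API.

```python
# Hypothetical stand-ins; the real pyseekdb classes may look different.
class FakeMistralEmbeddingFunction:
    """Stub that returns fixed-size vectors instead of calling the Mistral API."""

    def __init__(self, model_name="mistral-embed", dim=4):
        self.model_name = model_name
        self.dim = dim

    def __call__(self, documents):
        # Accept a single string or a list of strings.
        if isinstance(documents, str):
            documents = [documents]
        # One deterministic vector per document; a real implementation
        # would send the documents to the embeddings endpoint here.
        return [[float(len(doc))] * self.dim for doc in documents]


class Registry:
    """Minimal name -> class registry (assumed shape, for illustration only)."""

    def __init__(self):
        self._classes = {}

    def register(self, name, cls):
        self._classes[name] = cls

    def get(self, name):
        return self._classes[name]


registry = Registry()
registry.register("mistral", FakeMistralEmbeddingFunction)

# Client side: look up the class by name, instantiate it, then call it.
ef_class = registry.get("mistral")
ef = ef_class()
vectors = ef(["hello", "hi"])
```

The registry hands back the class rather than an instance, matching the "return MistralEmbeddingFunction class" step in the diagram, so the caller stays in control of constructor arguments.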

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • hnwyllmm

Poem

🐰 I hopped into code with a curious grin,
Mistral embeddings tucked neatly within,
Keys and models aligned, vectors take flight,
Registry welcomed them, morning to night,
Hooray — new ways to search, shiny and light! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 72.73% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding Mistral embedding function support to the codebase. |
| Linked Issues check | ✅ Passed | The PR successfully implements Mistral text embedding function support as required by issue #135, with proper integration into the registry and comprehensive unit tests. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing Mistral embedding function support; no unrelated or out-of-scope modifications were introduced. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@tests/unit_tests/test_mistral_embedding_function.py`:
- Around line 70-83: The test test_initialization_with_custom_api_key_env sets
os.environ[custom_key_env] but never restores or removes it; modify the test to
set the env var using a safe fixture or helper (e.g., env_guard or pytest's
monkeypatch) so the original environment is restored after the test, or
explicitly save the original value and restore/del the custom_key_env in
teardown; ensure you reference the custom_key_env variable used in
test_initialization_with_custom_api_key_env and that the
MistralEmbeddingFunction instantiation still uses api_key_env=custom_key_env.
- Around line 84-94: The test test_initialization_with_custom_api_base currently
uses the default endpoint, so change the custom_base value to a truly different
URL (e.g., "https://custom.mistral.local" or "http://localhost:8000") and
re-initialize MistralEmbeddingFunction(model_name="mistral-embed",
api_base=custom_base) to assert ef.api_base == custom_base; update only the
custom_base string in the test_initialization_with_custom_api_base test to a
non-default URL so the test actually verifies the API base is stored by
MistralEmbeddingFunction.
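
One way to address the first item (restoring the environment after the test) is sketched below. It swaps in the standard library's `unittest.mock.patch.dict` rather than the `env_guard` or `monkeypatch` helpers named above, so that the example has no third-party dependency. `CUSTOM_KEY_ENV` and `read_api_key` are hypothetical names used only for illustration; the real test would construct `MistralEmbeddingFunction(api_key_env=...)` inside the `with` block instead.

```python
import os
from unittest import mock

CUSTOM_KEY_ENV = "CUSTOM_MISTRAL_API_KEY"  # hypothetical variable name


def read_api_key(api_key_env):
    """Hypothetical stand-in for how an embedding function might resolve its key."""
    value = os.environ.get(api_key_env)
    if not value:
        raise ValueError(f"{api_key_env} is not set")
    return value


def test_custom_api_key_env_is_restored():
    before = os.environ.get(CUSTOM_KEY_ENV)
    # patch.dict restores os.environ to its previous state on exit,
    # even if the body raises.
    with mock.patch.dict(os.environ, {CUSTOM_KEY_ENV: "test-key"}):
        assert read_api_key(CUSTOM_KEY_ENV) == "test-key"
    # The variable is back to whatever it was before the test ran.
    assert os.environ.get(CUSTOM_KEY_ENV) == before
```

With pytest's `monkeypatch` fixture the same effect would come from `monkeypatch.setenv(CUSTOM_KEY_ENV, "test-key")`, which is undone automatically at the end of the test.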
🧹 Nitpick comments (2)
tests/unit_tests/test_mistral_embedding_function.py (1)

34-51: Consider using pytest.fail() instead of raise AssertionError.

The test_mistral_env method manually raises AssertionError. Using pytest.fail() is more idiomatic and provides better integration with pytest's reporting.

♻️ Suggested improvement
     def test_mistral_env(self):
         """Test if openai package is installed and required environment variables are set."""
         if not is_openai_available():
             print("openai package is not installed")
-            raise AssertionError("openai package is not installed")
+            pytest.fail("openai package is not installed")

         if not os.environ.get("MISTRAL_API_KEY"):
             print("MISTRAL_API_KEY environment variable is not set")
-            raise AssertionError("MISTRAL_API_KEY environment variable is not set")
+            pytest.fail("MISTRAL_API_KEY environment variable is not set")
src/pyseekdb/utils/embedding_functions/mistral_embedding_function.py (1)

110-129: Remove the redundant __call__ override.

The base class OpenAIBaseEmbeddingFunction.__call__ (lines 158-195) implements identical logic. Since MistralEmbeddingFunction passes dimensions=None to super().__init__(), the base class's conditional dimensions check (line 182) will never execute for Mistral instances, making the override unnecessary.
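
The reasoning can be illustrated with a small sketch. The classes below are hypothetical and only mirror the behaviour described in this comment (a base `__call__` that forwards `dimensions` only when it is set); they are not the pyseekdb source.

```python
class BaseEmbeddingFunction:
    """Hypothetical base class; mirrors only the behaviour described in the review."""

    def __init__(self, model_name, dimensions=None):
        self.model_name = model_name
        self.dimensions = dimensions

    def _request(self, **params):
        # Stand-in for the HTTP call: echo the request parameters back.
        return params

    def __call__(self, documents):
        if isinstance(documents, str):
            documents = [documents]
        params = {"model": self.model_name, "input": documents}
        # Forwarded only when a subclass asked for a specific size.
        if self.dimensions is not None:
            params["dimensions"] = self.dimensions
        return self._request(**params)


class MistralLikeEmbeddingFunction(BaseEmbeddingFunction):
    def __init__(self, model_name="mistral-embed"):
        # dimensions=None, so the inherited __call__ never sends "dimensions".
        super().__init__(model_name, dimensions=None)

    # No __call__ override: the inherited one already does the right thing.


sent = MistralLikeEmbeddingFunction()("hello")
```

Because the subclass always passes `dimensions=None`, the conditional branch in the base class is dead for its instances, and a copy of `__call__` in the subclass would add nothing but maintenance cost.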

Comment thread tests/unit_tests/test_mistral_embedding_function.py
Comment on lines +84 to +94
    def test_initialization_with_custom_api_base(self):
        """Test MistralEmbeddingFunction initialization with custom API base"""
        print("\nTesting MistralEmbeddingFunction initialization with custom API base")

        self.test_mistral_env()

        custom_base = "https://api.mistral.ai/v1"
        ef = MistralEmbeddingFunction(model_name="mistral-embed", api_base=custom_base)
        assert ef.api_base == custom_base
        print(f"   Custom API base: {ef.api_base}")

⚠️ Potential issue | 🟡 Minor

Test uses default URL instead of an actual custom API base.

The custom_base is set to "https://api.mistral.ai/v1", which is the same as the default API base. This test doesn't actually verify that a custom base URL is correctly applied.

🔧 Suggested fix
     def test_initialization_with_custom_api_base(self):
         """Test MistralEmbeddingFunction initialization with custom API base"""
         print("\nTesting MistralEmbeddingFunction initialization with custom API base")

         self.test_mistral_env()

-        custom_base = "https://api.mistral.ai/v1"
+        custom_base = "https://custom.mistral.example.com/v1"
         ef = MistralEmbeddingFunction(model_name="mistral-embed", api_base=custom_base)
         assert ef.api_base == custom_base
         print(f"   Custom API base: {ef.api_base}")

Note: This test only verifies the parameter is stored correctly, not that API calls use it (which would require mocking).
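
A mocked variant that checks the base URL is actually handed to the client might look like the sketch below. `SketchEmbeddingFunction` and its injectable `client_factory` parameter are assumptions made for illustration; the real `MistralEmbeddingFunction` may build its client differently, in which case the patch target would change. The `embeddings.create(model=..., input=...)` call shape is modelled on an OpenAI-style client.

```python
from unittest import mock


class SketchEmbeddingFunction:
    """Hypothetical class that builds its client through an injectable factory."""

    def __init__(self, api_base, client_factory):
        self.api_base = api_base
        # The factory receives the base URL, so a test can observe it.
        self._client = client_factory(base_url=api_base)

    def __call__(self, documents):
        response = self._client.embeddings.create(
            model="mistral-embed", input=documents
        )
        return [item.embedding for item in response.data]


custom_base = "https://custom.mistral.example.com/v1"

# Mock factory: returns a mock client whose embeddings.create() yields one vector.
factory = mock.Mock()
factory.return_value.embeddings.create.return_value.data = [
    mock.Mock(embedding=[0.1, 0.2])
]

ef = SketchEmbeddingFunction(api_base=custom_base, client_factory=factory)
result = ef(["hello"])

# Verifies the custom base URL reached the client, not just the attribute.
factory.assert_called_once_with(base_url=custom_base)
```

No network traffic occurs here, so a test along these lines would not need `MISTRAL_API_KEY` to be set.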

@hnwyllmm hnwyllmm merged commit 2dfa0e6 into oceanbase:develop Jan 31, 2026
7 checks passed
@chakkk309 chakkk309 deleted the add-Mistral-embedding-function-support branch January 31, 2026 07:03
Development

Successfully merging this pull request may close these issues.

[Enhancement]: Integrate Mistral embedding function in pyseekdb
