feat: add mistral embedding function support#153
Conversation
📝 WalkthroughWalkthroughAdds a new MistralEmbeddingFunction class, exports it in the embedding_functions package, registers it under "mistral" in EmbeddingFunctionRegistry, and includes unit tests for initialization, usage, and config serialization. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@tests/unit_tests/test_mistral_embedding_function.py`:
- Around line 70-83: The test test_initialization_with_custom_api_key_env sets
os.environ[custom_key_env] but never restores or removes it; modify the test to
set the env var using a safe fixture or helper (e.g., env_guard or pytest's
monkeypatch) so the original environment is restored after the test, or
explicitly save the original value and restore/del the custom_key_env in
teardown; ensure you reference the custom_key_env variable used in
test_initialization_with_custom_api_key_env and that the
MistralEmbeddingFunction instantiation still uses api_key_env=custom_key_env.
- Around line 84-94: The test test_initialization_with_custom_api_base currently
uses the default endpoint, so change the custom_base value to a truly different
URL (e.g., "https://custom.mistral.local" or "http://localhost:8000") and
re-initialize MistralEmbeddingFunction(model_name="mistral-embed",
api_base=custom_base) to assert ef.api_base == custom_base; update only the
custom_base string in the test_initialization_with_custom_api_base test to a
non-default URL so the test actually verifies the API base is stored by
MistralEmbeddingFunction.
🧹 Nitpick comments (2)
tests/unit_tests/test_mistral_embedding_function.py (1)
34-51: Consider usingpytest.fail()instead ofraise AssertionError.The
test_mistral_envmethod manually raisesAssertionError. Usingpytest.fail()is more idiomatic and provides better integration with pytest's reporting.♻️ Suggested improvement
def test_mistral_env(self): """Test if openai package is installed and required environment variables are set.""" if not is_openai_available(): print("openai package is not installed") - raise AssertionError("openai package is not installed") + pytest.fail("openai package is not installed") if not os.environ.get("MISTRAL_API_KEY"): print("MISTRAL_API_KEY environment variable is not set") - raise AssertionError("MISTRAL_API_KEY environment variable is not set") + pytest.fail("MISTRAL_API_KEY environment variable is not set")src/pyseekdb/utils/embedding_functions/mistral_embedding_function.py (1)
110-129: Remove the redundant__call__override.The base class
OpenAIBaseEmbeddingFunction.__call__(lines 158-195) implements identical logic. SinceMistralEmbeddingFunctionpassesdimensions=Nonetosuper().__init__(), the base class's conditional dimensions check (line 182) will never execute for Mistral instances, making the override unnecessary.
| def test_initialization_with_custom_api_base(self): | ||
| """Test MistralEmbeddingFunction initialization with custom API base""" | ||
| print("\nTesting MistralEmbeddingFunction initialization with custom API base") | ||
|
|
||
| self.test_mistral_env() | ||
|
|
||
| custom_base = "https://api.mistral.ai/v1" | ||
| ef = MistralEmbeddingFunction(model_name="mistral-embed", api_base=custom_base) | ||
| assert ef.api_base == custom_base | ||
| print(f" Custom API base: {ef.api_base}") | ||
|
|
There was a problem hiding this comment.
Test uses default URL instead of an actual custom API base.
The custom_base is set to "https://api.mistral.ai/v1", which is the same as the default API base. This test doesn't actually verify that a custom base URL is correctly applied.
🔧 Suggested fix
def test_initialization_with_custom_api_base(self):
"""Test MistralEmbeddingFunction initialization with custom API base"""
print("\nTesting MistralEmbeddingFunction initialization with custom API base")
self.test_mistral_env()
- custom_base = "https://api.mistral.ai/v1"
+ custom_base = "https://custom.mistral.example.com/v1"
ef = MistralEmbeddingFunction(model_name="mistral-embed", api_base=custom_base)
assert ef.api_base == custom_base
print(f" Custom API base: {ef.api_base}")Note: This test only verifies the parameter is stored correctly, not that API calls use it (which would require mocking).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_initialization_with_custom_api_base(self): | |
| """Test MistralEmbeddingFunction initialization with custom API base""" | |
| print("\nTesting MistralEmbeddingFunction initialization with custom API base") | |
| self.test_mistral_env() | |
| custom_base = "https://api.mistral.ai/v1" | |
| ef = MistralEmbeddingFunction(model_name="mistral-embed", api_base=custom_base) | |
| assert ef.api_base == custom_base | |
| print(f" Custom API base: {ef.api_base}") | |
| def test_initialization_with_custom_api_base(self): | |
| """Test MistralEmbeddingFunction initialization with custom API base""" | |
| print("\nTesting MistralEmbeddingFunction initialization with custom API base") | |
| self.test_mistral_env() | |
| custom_base = "https://custom.mistral.example.com/v1" | |
| ef = MistralEmbeddingFunction(model_name="mistral-embed", api_base=custom_base) | |
| assert ef.api_base == custom_base | |
| print(f" Custom API base: {ef.api_base}") |
🤖 Prompt for AI Agents
In `@tests/unit_tests/test_mistral_embedding_function.py` around lines 84 - 94,
The test test_initialization_with_custom_api_base currently uses the default
endpoint, so change the custom_base value to a truly different URL (e.g.,
"https://custom.mistral.local" or "http://localhost:8000") and re-initialize
MistralEmbeddingFunction(model_name="mistral-embed", api_base=custom_base) to
assert ef.api_base == custom_base; update only the custom_base string in the
test_initialization_with_custom_api_base test to a non-default URL so the test
actually verifies the API base is stored by MistralEmbeddingFunction.
Summary
Fix #135.
Integrate Mistral embedding function
Solution Description
Mistral test result: