chore: intro ruff check and make it happy by PsiACE · Pull Request #125 · oceanbase/pyseekdb

PsiACE · 2026-01-20T12:00:41Z

Summary

Ruff is an extremely fast linter and formatter. We have previously introduced it for code formatting, and using it as a linter will help us adhere to consistent coding standards.

Solution Description

I added it to the pre-commit configuration, and it fixed almost all the issues.

We might need a little rebasing to ensure we're aligned with the main branch, but it won't be too difficult. I'll check these tomorrow.

Summary by CodeRabbit

New Features
- Expanded public API surface with additional exports: AdminAPI, AdminClient, Client, Database, DefaultEmbeddingFunction, EmbeddingFunction, and utility functions.
Refactor
- Modernized type hints across the codebase using Python 3.9+ syntax for improved clarity.
- Standardized embedding function interfaces for consistency.
- Enhanced code organization and imports.
Chores
- Added Ruff linting configuration for code quality.
- Improved error handling and logging throughout.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Signed-off-by: Chojan Shang <psiace@apache.org>

coderabbitai · 2026-01-20T12:00:47Z

📝 Walkthrough

Walkthrough

This pull request modernizes the codebase's Python typing system by migrating from legacy typing imports (Optional, Union, List, Dict) to PEP 604 union syntax and built-in generics, renames embedding function parameters from input to documents, expands public API exports, adds ruff linting configuration, and applies consistent formatting improvements across source code and tests.

Changes

Cohort / File(s)	Summary
Type Hint Modernization (PEP 604) `src/pyseekdb/client/`, `src/pyseekdb/utils/embedding_functions/`, `pyproject.toml`	Systematic conversion from `Optional[T]` → `T \| None`, `Union[A, B]` → `A \| B`, `List[T]` → `list[T]`, `Dict[K, V]` → `dict[K, V]` across method signatures, properties, and type aliases.
Embedding Function API Refactoring `src/pyseekdb/client/embedding_function.py`, `src/pyseekdb/utils/embedding_functions/*`, `demo/rag/embedding_function_factory.py`, `examples/complete_example.py`, `examples/hybrid_search_example.py`	Renamed `__call__` parameter from `input` to `documents`; updated return types and method signatures; adjusted default embedding function initialization from explicit instance to `None`-based lazy instantiation.
Public API Expansion `src/pyseekdb/__init__.py`, `src/pyseekdb/client/__init__.py`	Added exports for `AdminAPI`, `AdminClient`, `BaseConnection`, `Client`, `Database`, `DefaultEmbeddingFunction`, `EmbeddingFunction`, constants (`DOCUMENT`, `DOCUMENTS`, `EMBEDDINGS`, etc.), and utility functions (`get_default_embedding_function`, `register_embedding_function`).
Configuration & Factory Refactoring `src/pyseekdb/client/client_base.py`, `src/pyseekdb/client/__init__.py`	Introduced unified `_create_server_client` helper; refactored `Client()` and `AdminClient()` factory signatures to use union types; updated parameter handling for path/host/port/user.
Linting & Code Quality `.pre-commit-config.yaml`, `pyproject.toml`	Added ruff-check pre-commit hook with `--exit-non-zero-on-fix`; added comprehensive ruff linting configuration with target-version, select rules, ignore rules, and per-file ignores.
Collection & Query API Updates `src/pyseekdb/client/collection.py`, `src/pyseekdb/client/hybrid_search.py`, `src/pyseekdb/client/filters.py`	Updated method signatures for add/update/upsert/query/get/hybrid_search to use union types; narrowed parameter types for filter builders; added stricter type annotations for payload validation.
Database & Connection Utilities `src/pyseekdb/client/database.py`, `src/pyseekdb/client/base_connection.py`, `src/pyseekdb/client/sql_utils.py`	Updated `Database.__init__` parameter types; added module-level logger to `BaseConnection`; replaced `zip()` with `strict=True` for tighter sequence alignment.
Embedded & Remote Client Implementations `src/pyseekdb/client/client_seekdb_embedded.py`, `src/pyseekdb/client/client_seekdb_server.py`, `src/pyseekdb/client/admin_client.py`	Updated `list_databases` signatures to use union types; refactored type-ignore annotations; consolidated log messages to single-line format.
Utility & Helper Functions `src/pyseekdb/client/version.py`, `src/pyseekdb/client/meta_info.py`, `src/pyseekdb/client/configuration.py`	Updated version parsing error messages; added `ClassVar` annotations for class-level constants; refactored distance validation in `HNSWConfiguration`.
Demo & Example Updates `demo/rag/llm.py`, `demo/rag/seekdb_app.py`, `demo/rag/seekdb_insert.py`, `demo/rag/seekdb_utils.py`, `examples/simple_example.py`, `examples/hybrid_search_example.py`	Updated type hints to modern syntax; removed unused imports; refactored default embedding function handling; consolidated multi-line calls and improved formatting.
Documentation Metadata `docs/conf.py`	Added linter suppression comment (`# noqa: A001`) to copyright declaration.
Integration Tests `tests/integration_tests/`, `tests/unit_tests/`	Updated embedding function test implementations to use `documents` parameter; replaced explicit try/except cleanup with `contextlib.suppress()`; standardized type hints across test fixtures and helper methods; consolidated multi-line calls and print statements.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Rationale: Large scope (~1400+ lines across 60+ files) with mixed homogeneity. While most changes are straightforward type hint modernizations (promoting consistency), critical semantic changes require careful review: embedding function parameter renames affecting public API contracts, expanded public exports that may introduce unintended surface area, default value changes for embedding function initialization, stricter validation in filters and hybrid search, and consolidation of error handling patterns across multiple client implementations. The heterogeneity of changes across client implementations, embedding functions, and test files demands per-domain reasoning despite repetitive patterns.

Possibly related PRs

chore: add pre commit #112: Expanded pre-commit hook configuration and added ruff linting setup with related CI/Makefile updates — directly overlaps with the ruff configuration additions in .pre-commit-config.yaml and pyproject.toml.
support embedding function metadata persistence #119: Extended embedding function registry and public exports for register_embedding_function — directly related to the embedding function API changes and expanded __init__.py exports.
Support more embedding function platform #110: Modified embedding function implementations (OpenAI, LiteLLM, Qwen, SentenceTransformer) — overlaps with the parameter rename (input → documents) and type hint updates across all embedding function files.

Suggested reviewers

hnwyllmm

Poem

🐰 With whiskers twitching, types now gleam,
From Optional dreams to union's stream,
Parameters renamed, embeddings dance,
Public API blooms with cleaner stance,
Modern Python syntax takes its flight,
Our code now shines so crisp and bright! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 70.38% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly describes the main objective: introducing ruff check and fixing linting issues to satisfy it. It directly relates to the primary changes in the PR.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)

tests/integration_tests/test_admin_database_management.py (1)
44-92: Use try/finally for cleanup instead of broad except Exception (BLE001).

Catching all exceptions masks the original traceback and violates Ruff's BLE001 rule. A try/finally structure ensures cleanup runs while allowing the exception to propagate naturally, which pytest will catch and fail the test.
💡 Proposed refactor
         try:
             # Step 1: List all databases before test
             print("\n📋 Step 1: List all databases")
             databases_before = admin_client.list_databases()
             assert databases_before is not None
             assert isinstance(databases_before, (list, tuple))
             print(f"   Found {len(databases_before)} databases before test")
             for db in databases_before[:3]:
                 print(f"   - {db.name} (tenant={db.tenant})")
@@ Step 3 through Step 5 omitted for brevity @@
             print("\n🎉 All database management operations completed successfully!")
-        except Exception as e:
-            # Cleanup: try to delete test database if it exists
-            with contextlib.suppress(Exception):
-                admin_client.delete_database(test_db_name)
-            pytest.fail(f"Admin client test failed: {e}")
+        finally:
+            # Cleanup: try to delete test database if it exists
+            with contextlib.suppress(Exception):
+                admin_client.delete_database(test_db_name)
src/pyseekdb/utils/embedding_functions/openai_embedding_function.py (1)
151-163: Document the new TypeError in build_from_config

build_from_config now raises TypeError for non-dict client_kwargs, but the docstring still lists only ValueError. Please update the Raises section to reflect the new error type and keep the message consistent with your linting preferences.
📝 Suggested update
         Raises:
             ValueError: If the configuration is invalid or missing required fields
+            TypeError: If client_kwargs is not a dictionary
@@
         client_kwargs = config.get("client_kwargs", {})
         if not isinstance(client_kwargs, dict):
-            raise TypeError(f"client_kwargs must be a dictionary, but got {client_kwargs}")
+            raise TypeError(f"client_kwargs must be a dictionary, but got {client_kwargs!r}")
tests/integration_tests/test_default_embedding_function.py (1)

83-147: Avoid swallowing unexpected hybrid_search failures

The broad except Exception blocks here can mask real regressions. Consider narrowing to expected exception types or re-raising after logging so unexpected failures still surface.
src/pyseekdb/client/client_base.py (1)
872-886: Explicit None is ignored when persistence exists.

When a persisted embedding function exists and callers pass embedding_function=None, the current logic returns the persisted function anyway. This contradicts the docstring (“explicitly set to None”) and the updated tests that expect a conflict. Please align the behavior (either honor explicit None or raise on any provided parameter).
🛠️ One possible fix (honor explicit None)
-        if embedding_function_persistence is not None and embedding_function is not _NOT_PROVIDED:
-            if embedding_function and embedding_function_persistence.name() != embedding_function.name():
-                raise ValueError(
-                    "Both embedding function from parameter (not _NOT_PROVIDED, default value) and embedding function from persistence provided."
-                )
-            else:
-                return embedding_function_persistence
+        if embedding_function_persistence is not None and embedding_function is not _NOT_PROVIDED:
+            if embedding_function is None:
+                return None
+            if embedding_function_persistence.name() != embedding_function.name():
+                raise ValueError(
+                    "Both embedding function from parameter (not _NOT_PROVIDED, default value) and embedding function from persistence provided."
+                )
+            return embedding_function_persistence
src/pyseekdb/client/hybrid_search.py (1)
268-305: Metadata filters with $and/$or can be misrouted to document filters.

The heuristic treats any dict containing $and/$or as a document filter, but metadata filters also use those operators. This can shift metadata filters into where_document unexpectedly. Consider only classifying as document filters when $contains/$not_contains is present, and require DocumentExpression for explicit document logic.
🐛 Suggested fix
-                if any(key in item for key in ("$contains", "$not_contains", "$and", "$or")):
+                if any(key in item for key in ("$contains", "$not_contains")):

🤖 Fix all issues with AI agents

In `@docs/conf.py`:
- Line 10: Remove the unused noqa suppression from the copyright assignment
line; locate the statement setting copyright = "2025, OceanBase" (the line
containing `# noqa: A001`) and delete the trailing `# noqa: A001` so the lint
directive is no longer present.

In `@src/pyseekdb/client/client_base.py`:
- Around line 1514-1519: In _collection_update(), the embedding is being
serialized as a JSON-like string which differs from add/upsert; change the
serialization to use the same helper used elsewhere by calling
_embedding_to_hexstring(vec_val) for non-None embeddings and append the
resulting value to set_clauses for CollectionFieldNames.EMBEDDING (keeping the
same quoting/formatting used by inserts/upserts); ensure you still skip None
embeddings and preserve existing set_clauses handling so updates store
embeddings in the same binary/hex form as _embedding_to_hexstring.

In `@src/pyseekdb/utils/embedding_functions/litellm_embedding_function.py`:
- Around line 171-172: In LiteLLM embedding function validate the mismatch check
to use the renamed parameter documents instead of the undefined input: inside
the function/method that produces embeddings (references to embeddings and the
previous input), replace the length check and error message to compare
len(embeddings) to len(documents) and raise a ValueError that mentions documents
(e.g., "Expected {len(documents)} embeddings but got {len(embeddings)} from
LiteLLM") so the runtime NameError is removed and the message reflects the
correct parameter.

In `@tests/integration_tests/test_client_creation.py`:
- Around line 193-205: The cleanup currently catches all Exception around
db_client.delete_collection for test_collection_name and
test_collection_name_default; narrow this by catching the specific expected
exception(s) (e.g., CollectionNotFoundError or the client library's specific
error type) and handle those (log/ignore) while allowing unexpected exceptions
to propagate (or re-raise) so test failures are visible; update the two
try/except blocks around db_client.delete_collection to catch only the expected
error type(s) and re-raise or not catch other Exceptions.

In `@tests/integration_tests/test_empty_value_handling.py`:
- Around line 53-56: The broad except in the cleanup block around
delete_collection(collection_name) triggers Ruff BLE001; either narrow the
except to the specific ValueError raised for "collection not found" (e.g.
replace "except Exception as cleanup_error:" with "except ValueError as
cleanup_error:") or explicitly suppress the lint rule for this best-effort test
cleanup by appending "# noqa: BLE001" to the existing except line (keep the
print using cleanup_error and collection_name unchanged).

🧹 Nitpick comments (17)

tests/integration_tests/test_security_sql_injection.py (1)

269-271: Consider catching more specific exceptions.

Ruff flagged except Exception (BLE001) as overly broad. While catching broad exceptions in test code is common practice to ensure clear failure reporting, you could narrow this to catch only expected database/query exceptions for better signal.

That said, since this is test code and the exception details are passed to pytest.fail() for visibility, this is acceptable as-is.
tests/unit_tests/test_collection_name_validation.py (1)
16-16: Remove unused noqa: E402 directive.

The E402 rule (module-level import not at top of file) is not enabled in the Ruff configuration. The noqa comment is unnecessary and can be removed.
Suggested fix
-from pyseekdb.client.client_base import _validate_collection_name  # noqa: E402
+from pyseekdb.client.client_base import _validate_collection_name
pyproject.toml (1)

100-101: preview = true can cause formatting diffs across Ruff versions.

Preview features may change in patch releases. For consistent formatting in CI, either remove this setting or pin the Ruff version strictly (including patch version in lock files).
demo/rag/llm.py (2)
76-80: The try/except/else pattern works but could be simplified.

Using else after try/except is valid Python, but placing return inside the try block (before the except) would be more conventional and slightly clearer. The current approach works correctly.
♻️ Optional simplification
     try:
         # Prepare request parameters
         request_params = {
             "model": model,
             "messages": [
                 {"role": "system", "content": SYSTEM_PROMPT},
                 {"role": "user", "content": USER_PROMPT},
             ],
             "temperature": temperature,
         }

         # Add max_tokens if specified
         if max_tokens:
             request_params["max_tokens"] = max_tokens

         response = client.chat.completions.create(**request_params)
+        return response.choices[0].message.content
     except Exception as e:
         print(f"Error generating LLM response: {e}")
         return f"Sorry, an error occurred while generating the answer: {e!s}"
-    else:
-        return response.choices[0].message.content
120-124: Same pattern as above - consider simplifying.

The else block is valid but unconventional. Same optional refactor applies here.
src/pyseekdb/client/filters.py (2)
84-84: Remove unused noqa directive.

The # noqa: C901 directive is unused since this rule is not enabled in ruff configuration. Per static analysis hint (RUF100).
♻️ Proposed fix
-    def _build_condition(  # noqa: C901
+    def _build_condition(
276-278: Remove unused noqa directive.

Same issue - the # noqa: C901 directive is unused. Per static analysis hint (RUF100).
♻️ Proposed fix
-    def _build_search_filter_condition(  # noqa: C901
+    def _build_search_filter_condition(
tests/unit_tests/test_configuration.py (1)
8-14: Remove unused noqa directive.

The # noqa: E402 directive is unused since this rule is not enabled in ruff configuration. Per static analysis hint (RUF100).
♻️ Proposed fix
-from pyseekdb import Configuration, FulltextAnalyzerConfig, HNSWConfiguration  # noqa: E402
+from pyseekdb import Configuration, FulltextAnalyzerConfig, HNSWConfiguration
tests/integration_tests/test_default_embedding_function.py (1)
215-223: Mark unused documents parameter explicitly

EmptyResultEF.__call__ doesn’t use documents; prefixing it with _ (or adding an inline suppression) makes the intent explicit and avoids unused-arg noise.
♻️ Suggested tweak
-        class EmptyResultEF:
-            def __call__(self, documents):
+        class EmptyResultEF:
+            def __call__(self, _documents):
                 return []
src/pyseekdb/client/client_seekdb_embedded.py (1)
117-119: Remove the unused noqa directive.

Ruff indicates that C901 (complexity) is not enabled, making # noqa: C901 unnecessary. The use_context_manager parameter is intentionally unused as part of interface compliance with BaseClient, which is acceptable.
🔧 Suggested fix
-    def _execute_query_with_cursor(  # noqa: C901
+    def _execute_query_with_cursor(
         self, conn: Any, sql: str, params: list[Any], use_context_manager: bool = True
     ) -> list[dict[str, Any]]:
tests/integration_tests/test_collection_hybrid_search.py (1)

37-120: Test data helper mirrors the builder integration tests.

The _insert_test_data implementation is identical to the one in test_collection_hybrid_search_builder_integration.py. Consider extracting this to a shared test utility to reduce duplication.

The _insert_test_data method is duplicated between this file and test_collection_hybrid_search_builder_integration.py. Consider extracting it to a shared fixture or utility in conftest.py to reduce maintenance burden.
src/pyseekdb/utils/embedding_functions/litellm_embedding_function.py (1)
98-98: Remove unused noqa directive.

Ruff reports that # noqa: C901 is unused because this rule is not enabled. The function complexity appears acceptable without suppression.
♻️ Proposed fix
-    def __call__(self, documents: Documents) -> Embeddings:  # noqa: C901
+    def __call__(self, documents: Documents) -> Embeddings:
tests/unit_tests/test_embedding_function_registry.py (1)

135-136: Consider renaming _input to documents for consistency.

Some test methods use _input while others use documents. While both work functionally, using documents consistently would align better with the protocol definition and other test methods in this file.
demo/rag/embedding_function_factory.py (1)
7-8: Consider importing type aliases from pyseekdb instead of redefining.

Documents and Embeddings are already defined in pyseekdb.client.embedding_function. Redefining them here could lead to inconsistencies if the upstream definitions change.
♻️ Proposed fix
-from pyseekdb import DefaultEmbeddingFunction, EmbeddingFunction
+from pyseekdb import DefaultEmbeddingFunction, EmbeddingFunction
+from pyseekdb.client.embedding_function import Documents, Embeddings

-Documents = str | list[str]
-Embeddings = list[list[float]]
src/pyseekdb/client/client_base.py (2)

257-257: Remove unused # noqa: C901 markers.

These are now flagged as unused; please drop them or enable the complexity rule if you still want the suppression.

Also applies to: 479-479, 763-763, 1230-1230, 1389-1389, 1531-1531, 2163-2163, 2355-2355, 2578-2578, 2684-2684, 2813-2813, 2900-2900, 3010-3010

843-855: Return type should allow None.

_resolve_embedding_function() returns None when no persisted config exists, but the annotation is non-optional. Align the return type to avoid misleading type-checkers.

tests/integration_tests/test_collection_embedding_function.py (1)

642-642: Remove unused # noqa: C901.

Ruff flags this as unused; consider dropping it.

coderabbitai · 2026-01-22T05:23:56Z

 # Project information
 project = "pyseekdb"
-copyright = "2025, OceanBase"
+copyright = "2025, OceanBase"  # noqa: A001


⚠️ Potential issue | 🟡 Minor

Remove unused noqa suppression.
Ruff flags this directive as unused; dropping it keeps lint clean.

🧹 Proposed fix

-copyright = "2025, OceanBase" # noqa: A001 +copyright = "2025, OceanBase"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

copyright = "2025, OceanBase" # noqa: A001

copyright = "2025, OceanBase"

🧰 Tools

🪛 Ruff (0.14.13)

10-10: Unused noqa directive (non-enabled: A001)

Remove unused noqa directive

(RUF100)

🤖 Prompt for AI Agents

In `@docs/conf.py` at line 10, Remove the unused noqa suppression from the copyright assignment line; locate the statement setting copyright = "2025, OceanBase" (the line containing `# noqa: A001`) and delete the trailing `# noqa: A001` so the lint directive is no longer present.

coderabbitai · 2026-01-22T05:23:56Z

            if embeddings:
                vec_val = embeddings[i]
                if vec_val is not None:
                    vec_str = "[" + ",".join(map(str, vec_val)) + "]"
-                    set_clauses.append(
-                        f"{CollectionFieldNames.EMBEDDING} = '{vec_str}'"
-                    )
+                    set_clauses.append(f"{CollectionFieldNames.EMBEDDING} = '{vec_str}'")



⚠️ Potential issue | 🟠 Major

Embedding update uses a different serialization than add/upsert.

_collection_update() currently writes embeddings as a JSON-like string, while add/upsert use _embedding_to_hexstring() (binary literal). This is likely to break vector columns or cause inconsistent storage. Please serialize updates the same way as inserts.

🐛 Suggested fix

- if embeddings: - vec_val = embeddings[i] - if vec_val is not None: - vec_str = "[" + ",".join(map(str, vec_val)) + "]" - set_clauses.append(f"{CollectionFieldNames.EMBEDDING} = '{vec_str}'") + if embeddings: + vec_val = embeddings[i] + vec_sql = "NULL" if vec_val is None else _embedding_to_hexstring(vec_val) + set_clauses.append(f"{CollectionFieldNames.EMBEDDING} = {vec_sql}")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if embeddings:

vec_val = embeddings[i]

if vec_val is not None:

vec_str = "[" + ",".join(map(str, vec_val)) + "]"

set_clauses.append(

f"{CollectionFieldNames.EMBEDDING} = '{vec_str}'"

)

set_clauses.append(f"{CollectionFieldNames.EMBEDDING} = '{vec_str}'")

if embeddings:

vec_val = embeddings[i]

vec_sql = "NULL" if vec_val is None else _embedding_to_hexstring(vec_val)

set_clauses.append(f"{CollectionFieldNames.EMBEDDING} = {vec_sql}")

🤖 Prompt for AI Agents

In `@src/pyseekdb/client/client_base.py` around lines 1514 - 1519, In _collection_update(), the embedding is being serialized as a JSON-like string which differs from add/upsert; change the serialization to use the same helper used elsewhere by calling _embedding_to_hexstring(vec_val) for non-None embeddings and append the resulting value to set_clauses for CollectionFieldNames.EMBEDDING (keeping the same quoting/formatting used by inserts/upserts); ensure you still skip None embeddings and preserve existing set_clauses handling so updates store embeddings in the same binary/hex form as _embedding_to_hexstring.

coderabbitai · 2026-01-22T05:23:56Z

        if len(embeddings) != len(input):
-            raise ValueError(
-                f"Expected {len(input)} embeddings but got {len(embeddings)} from LiteLLM"
-            )
+            raise ValueError(f"Expected {len(input)} embeddings but got {len(embeddings)} from LiteLLM")


⚠️ Potential issue | 🔴 Critical

Critical bug: input variable is undefined after parameter rename.

The parameter was renamed from input to documents, but the validation on lines 171-172 still references the old input variable. This will raise a NameError at runtime.

🐛 Proposed fix

# Validate that we got the expected number of embeddings - if len(embeddings) != len(input): - raise ValueError(f"Expected {len(input)} embeddings but got {len(embeddings)} from LiteLLM") + if len(embeddings) != len(documents): + raise ValueError(f"Expected {len(documents)} embeddings but got {len(embeddings)} from LiteLLM")

🧰 Tools

🪛 Ruff (0.14.13)

172-172: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents

In `@src/pyseekdb/utils/embedding_functions/litellm_embedding_function.py` around lines 171 - 172, In LiteLLM embedding function validate the mismatch check to use the renamed parameter documents instead of the undefined input: inside the function/method that produces embeddings (references to embeddings and the previous input), replace the length check and error message to compare len(embeddings) to len(documents) and raise a ValueError that mentions documents (e.g., "Expected {len(documents)} embeddings but got {len(embeddings)} from LiteLLM") so the runtime NameError is removed and the message reflects the correct parameter.

coderabbitai · 2026-01-22T05:23:56Z

        # Clean up: delete all test collections
        try:
            db_client.delete_collection(test_collection_name)
            print(f"   Cleaned up collection: {test_collection_name}")
        except Exception as cleanup_error:
-            print(
-                f"   Warning: Failed to cleanup {test_collection_name}: {cleanup_error}"
-            )
+            print(f"   Warning: Failed to cleanup {test_collection_name}: {cleanup_error}")

        try:
            db_client.delete_collection(test_collection_name_default)
            print(f"   Cleaned up collection: {test_collection_name_default}")
        except Exception as cleanup_error:
-            print(
-                f"   Warning: Failed to cleanup {test_collection_name_default}: {cleanup_error}"
-            )
+            print(f"   Warning: Failed to cleanup {test_collection_name_default}: {cleanup_error}")



⚠️ Potential issue | 🟡 Minor

Narrow cleanup exception handling

Catching all Exception during cleanup can hide unexpected failures. Consider narrowing to expected exception types or re-raising after logging if cleanup failures should be visible.

🧰 Tools

🪛 Ruff (0.14.13)

197-197: Do not catch blind exception: Exception

(BLE001)

203-203: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents

In `@tests/integration_tests/test_client_creation.py` around lines 193 - 205, The cleanup currently catches all Exception around db_client.delete_collection for test_collection_name and test_collection_name_default; narrow this by catching the specific expected exception(s) (e.g., CollectionNotFoundError or the client library's specific error type) and handle those (log/ignore) while allowing unexpected exceptions to propagate (or re-raise) so test failures are visible; update the two try/except blocks around db_client.delete_collection to catch only the expected error type(s) and re-raise or not catch other Exceptions.

coderabbitai · 2026-01-22T05:23:56Z

            try:
                db_client.delete_collection(name=collection_name)
            except Exception as cleanup_error:
-                print(
-                    f"Warning: failed to cleanup collection '{collection_name}': {cleanup_error}"
-                )
+                print(f"Warning: failed to cleanup collection '{collection_name}': {cleanup_error}")


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Get repository overview and find the test file git ls-files | head -20 find . -name "test_empty_value_handling.py" -type f

Repository: oceanbase/pyseekdb

Length of output: 583

🏁 Script executed:

# Read the test file to examine the cleanup code and zip usage wc -l tests/integration_tests/test_empty_value_handling.py

Repository: oceanbase/pyseekdb

Length of output: 118

🏁 Script executed:

# Check lines around 53-56 for the cleanup code sed -n '50,60p' tests/integration_tests/test_empty_value_handling.py

Repository: oceanbase/pyseekdb

Length of output: 593

🏁 Script executed:

# Check for zip(strict=True) usage around line 221 sed -n '215,225p' tests/integration_tests/test_empty_value_handling.py

Repository: oceanbase/pyseekdb

Length of output: 670

🏁 Script executed:

# Find the db_client.delete_collection implementation and its exceptions rg -n "def delete_collection" -A 10

Repository: oceanbase/pyseekdb

Length of output: 2550

🏁 Script executed:

# Check project's Python version requirements find . -name "pyproject.toml" -o -name "setup.py" -o -name "setup.cfg" -o -name "requirements.txt" | xargs grep -l "python" 2>/dev/null | head -5

Repository: oceanbase/pyseekdb

Length of output: 104

🏁 Script executed:

# Look for Python version specification in key config files cat pyproject.toml 2>/dev/null | grep -A 5 "python"

Repository: oceanbase/pyseekdb

Length of output: 189

🏁 Script executed:

# Check Ruff configuration in pyproject.toml or ruff.toml cat pyproject.toml 2>/dev/null | grep -A 20 "\[tool.ruff"

Repository: oceanbase/pyseekdb

Length of output: 656

🏁 Script executed:

# Get complete Ruff configuration cat pyproject.toml | sed -n '/\[tool.ruff.lint\]/,/\[/p' | head -100

Repository: oceanbase/pyseekdb

Length of output: 89

🏁 Script executed:

# Check if BLE rules are enabled - search for "BLE" in config rg -i "BLE" pyproject.toml

Repository: oceanbase/pyseekdb

Length of output: 44

🏁 Script executed:

# Get full delete_collection implementation to see what exceptions it raises sed -n '952,980p' src/pyseekdb/client/client_base.py

Repository: oceanbase/pyseekdb

Length of output: 1243

🏁 Script executed:

# Check if tests directory has specific Ruff ignore rules rg -A 5 '"tests/\*' pyproject.toml

Repository: oceanbase/pyseekdb

Length of output: 166

🏁 Script executed:

# Run Ruff on the specific test file to see if BLE001 is actually flagged rg --version 2>&1 || echo "rg available" # Try to understand what Ruff would flag python3 << 'EOF' import subprocess # Check if ruff is available and run it on the test file try: result = subprocess.run(['ruff', 'check', 'tests/integration_tests/test_empty_value_handling.py', '--select', 'BLE'], capture_output=True, text=True, timeout=10) print("RUFF OUTPUT:") print(result.stdout) if result.stderr: print("STDERR:", result.stderr) print("RETURN CODE:", result.returncode) except FileNotFoundError: print("Ruff not available in sandbox") except Exception as e: print(f"Error: {e}") EOF

Repository: oceanbase/pyseekdb

Length of output: 711

🏁 Script executed:

# Look at what exceptions can be raised in the implementation sed -n '952,1000p' src/pyseekdb/client/client_base.py

Repository: oceanbase/pyseekdb

Length of output: 1823

Resolve Ruff BLE001 by narrowing the exception or explicitly suppressing with # noqa.

The broad except Exception at line 55 violates Ruff's BLE001 rule ("Do not catch blind exception"). Since delete_collection explicitly raises ValueError for collection-not-found cases, and may raise other exceptions from database operations, you have two options:

Catch specific exceptions (preferred if you know the scope):

except ValueError:

Explicitly suppress with # noqa (appropriate for best-effort test cleanup):

except Exception as cleanup_error: # noqa: BLE001

The noqa approach is reasonable here since this is test cleanup code where best-effort cleanup is intentional, but narrowing to ValueError would also work if database operations are guaranteed not to raise other uncaught exceptions.

🧰 Tools

🪛 Ruff (0.14.13)

55-55: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents

In `@tests/integration_tests/test_empty_value_handling.py` around lines 53 - 56, The broad except in the cleanup block around delete_collection(collection_name) triggers Ruff BLE001; either narrow the except to the specific ValueError raised for "collection not found" (e.g. replace "except Exception as cleanup_error:" with "except ValueError as cleanup_error:") or explicitly suppress the lint rule for this best-effort test cleanup by appending "# noqa: BLE001" to the existing except line (keep the print using cleanup_error and collection_name unchanged).

PsiACE added 2 commits January 20, 2026 11:53

chore: intro ruff check

ec1881f

*: make ruff happy

5f6ded1

Signed-off-by: Chojan Shang <psiace@apache.org>

PsiACE force-pushed the ruff-check branch from d4784ff to 5f6ded1 Compare January 22, 2026 03:32

Merge upstream/develop

476c279

PsiACE marked this pull request as ready for review January 22, 2026 05:12

coderabbitai Bot reviewed Jan 22, 2026

View reviewed changes

hnwyllmm approved these changes Jan 22, 2026

View reviewed changes

hnwyllmm merged commit d8e6ee6 into oceanbase:develop Jan 22, 2026
7 checks passed

PsiACE deleted the ruff-check branch January 22, 2026 06:14

This was referenced Jan 22, 2026

Support more embedding function platforms #132

Merged

rollback enhanced hybrid search #151

Merged

coderabbitai Bot mentioned this pull request Feb 26, 2026

Refine index configuration validation and typed helpers #173

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: intro ruff check and make it happy#125

chore: intro ruff check and make it happy#125
hnwyllmm merged 3 commits into
oceanbase:developfrom
PsiACE:ruff-check

PsiACE commented Jan 20, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jan 20, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jan 22, 2026

Uh oh!

coderabbitai Bot Jan 22, 2026

Uh oh!

coderabbitai Bot Jan 22, 2026

Uh oh!

coderabbitai Bot Jan 22, 2026

Uh oh!

coderabbitai Bot Jan 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	copyright = "2025, OceanBase" # noqa: A001
	copyright = "2025, OceanBase"

Conversation

PsiACE commented Jan 20, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Solution Description

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

PsiACE commented Jan 20, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jan 20, 2026 •

edited

Loading