Skip to content

chore: intro ruff check and make it happy#125

Merged
hnwyllmm merged 3 commits into
oceanbase:developfrom
PsiACE:ruff-check
Jan 22, 2026
Merged

chore: intro ruff check and make it happy#125
hnwyllmm merged 3 commits into
oceanbase:developfrom
PsiACE:ruff-check

Conversation

@PsiACE

@PsiACE PsiACE commented Jan 20, 2026

Copy link
Copy Markdown
Member

Summary

Ruff is an extremely fast linter and formatter. We have previously introduced it for code formatting, and using it as a linter will help us adhere to consistent coding standards.

Solution Description

I added it to the pre-commit configuration, and it fixed almost all the issues.

We might need a little rebasing to ensure we're aligned with the main branch, but it won't be too difficult. I'll check these tomorrow.

Summary by CodeRabbit

  • New Features

    • Expanded public API surface with additional exports: AdminAPI, AdminClient, Client, Database, DefaultEmbeddingFunction, EmbeddingFunction, and utility functions.
  • Refactor

    • Modernized type hints across the codebase using Python 3.9+ syntax for improved clarity.
    • Standardized embedding function interfaces for consistency.
    • Enhanced code organization and imports.
  • Chores

    • Added Ruff linting configuration for code quality.
    • Improved error handling and logging throughout.

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: Chojan Shang <psiace@apache.org>
@coderabbitai

coderabbitai Bot commented Jan 20, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

This pull request modernizes the codebase's Python typing system by migrating from legacy typing imports (Optional, Union, List, Dict) to PEP 604 union syntax and built-in generics, renames embedding function parameters from input to documents, expands public API exports, adds ruff linting configuration, and applies consistent formatting improvements across source code and tests.

Changes

Cohort / File(s) Summary
Type Hint Modernization (PEP 604)
src/pyseekdb/client/, src/pyseekdb/utils/embedding_functions/, pyproject.toml
Systematic conversion from Optional[T]T | None, Union[A, B]A | B, List[T]list[T], Dict[K, V]dict[K, V] across method signatures, properties, and type aliases.
Embedding Function API Refactoring
src/pyseekdb/client/embedding_function.py, src/pyseekdb/utils/embedding_functions/*, demo/rag/embedding_function_factory.py, examples/complete_example.py, examples/hybrid_search_example.py
Renamed __call__ parameter from input to documents; updated return types and method signatures; adjusted default embedding function initialization from explicit instance to None-based lazy instantiation.
Public API Expansion
src/pyseekdb/__init__.py, src/pyseekdb/client/__init__.py
Added exports for AdminAPI, AdminClient, BaseConnection, Client, Database, DefaultEmbeddingFunction, EmbeddingFunction, constants (DOCUMENT, DOCUMENTS, EMBEDDINGS, etc.), and utility functions (get_default_embedding_function, register_embedding_function).
Configuration & Factory Refactoring
src/pyseekdb/client/client_base.py, src/pyseekdb/client/__init__.py
Introduced unified _create_server_client helper; refactored Client() and AdminClient() factory signatures to use union types; updated parameter handling for path/host/port/user.
Linting & Code Quality
.pre-commit-config.yaml, pyproject.toml
Added ruff-check pre-commit hook with --exit-non-zero-on-fix; added comprehensive ruff linting configuration with target-version, select rules, ignore rules, and per-file ignores.
Collection & Query API Updates
src/pyseekdb/client/collection.py, src/pyseekdb/client/hybrid_search.py, src/pyseekdb/client/filters.py
Updated method signatures for add/update/upsert/query/get/hybrid_search to use union types; narrowed parameter types for filter builders; added stricter type annotations for payload validation.
Database & Connection Utilities
src/pyseekdb/client/database.py, src/pyseekdb/client/base_connection.py, src/pyseekdb/client/sql_utils.py
Updated Database.__init__ parameter types; added module-level logger to BaseConnection; replaced zip() with strict=True for tighter sequence alignment.
Embedded & Remote Client Implementations
src/pyseekdb/client/client_seekdb_embedded.py, src/pyseekdb/client/client_seekdb_server.py, src/pyseekdb/client/admin_client.py
Updated list_databases signatures to use union types; refactored type-ignore annotations; consolidated log messages to single-line format.
Utility & Helper Functions
src/pyseekdb/client/version.py, src/pyseekdb/client/meta_info.py, src/pyseekdb/client/configuration.py
Updated version parsing error messages; added ClassVar annotations for class-level constants; refactored distance validation in HNSWConfiguration.
Demo & Example Updates
demo/rag/llm.py, demo/rag/seekdb_app.py, demo/rag/seekdb_insert.py, demo/rag/seekdb_utils.py, examples/simple_example.py, examples/hybrid_search_example.py
Updated type hints to modern syntax; removed unused imports; refactored default embedding function handling; consolidated multi-line calls and improved formatting.
Documentation Metadata
docs/conf.py
Added linter suppression comment (# noqa: A001) to copyright declaration.
Integration Tests
tests/integration_tests/, tests/unit_tests/
Updated embedding function test implementations to use documents parameter; replaced explicit try/except cleanup with contextlib.suppress(); standardized type hints across test fixtures and helper methods; consolidated multi-line calls and print statements.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Rationale: Large scope (~1400+ lines across 60+ files) with mixed homogeneity. While most changes are straightforward type hint modernizations (promoting consistency), critical semantic changes require careful review: embedding function parameter renames affecting public API contracts, expanded public exports that may introduce unintended surface area, default value changes for embedding function initialization, stricter validation in filters and hybrid search, and consolidation of error handling patterns across multiple client implementations. The heterogeneity of changes across client implementations, embedding functions, and test files demands per-domain reasoning despite repetitive patterns.

Possibly related PRs

  • chore: add pre commit #112: Expanded pre-commit hook configuration and added ruff linting setup with related CI/Makefile updates — directly overlaps with the ruff configuration additions in .pre-commit-config.yaml and pyproject.toml.
  • support embedding function metadata persistence #119: Extended embedding function registry and public exports for register_embedding_function — directly related to the embedding function API changes and expanded __init__.py exports.
  • Support more embedding function platform #110: Modified embedding function implementations (OpenAI, LiteLLM, Qwen, SentenceTransformer) — overlaps with the parameter rename (inputdocuments) and type hint updates across all embedding function files.

Suggested reviewers

  • hnwyllmm

Poem

🐰 With whiskers twitching, types now gleam,
From Optional dreams to union's stream,
Parameters renamed, embeddings dance,
Public API blooms with cleaner stance,
Modern Python syntax takes its flight,
Our code now shines so crisp and bright!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 70.38% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main objective: introducing ruff check and fixing linting issues to satisfy it. It directly relates to the primary changes in the PR.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@PsiACE PsiACE marked this pull request as ready for review January 22, 2026 05:12

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
tests/integration_tests/test_admin_database_management.py (1)

44-92: Use try/finally for cleanup instead of broad except Exception (BLE001).

Catching all exceptions masks the original traceback and violates Ruff's BLE001 rule. A try/finally structure ensures cleanup runs while allowing the exception to propagate naturally, which pytest will catch and fail the test.

💡 Proposed refactor
         try:
             # Step 1: List all databases before test
             print("\n📋 Step 1: List all databases")
             databases_before = admin_client.list_databases()
             assert databases_before is not None
             assert isinstance(databases_before, (list, tuple))
             print(f"   Found {len(databases_before)} databases before test")
             for db in databases_before[:3]:
                 print(f"   - {db.name} (tenant={db.tenant})")
@@ Step 3 through Step 5 omitted for brevity @@
             print("\n🎉 All database management operations completed successfully!")
-        except Exception as e:
-            # Cleanup: try to delete test database if it exists
-            with contextlib.suppress(Exception):
-                admin_client.delete_database(test_db_name)
-            pytest.fail(f"Admin client test failed: {e}")
+        finally:
+            # Cleanup: try to delete test database if it exists
+            with contextlib.suppress(Exception):
+                admin_client.delete_database(test_db_name)
src/pyseekdb/utils/embedding_functions/openai_embedding_function.py (1)

151-163: Document the new TypeError in build_from_config

build_from_config now raises TypeError for non-dict client_kwargs, but the docstring still lists only ValueError. Please update the Raises section to reflect the new error type and keep the message consistent with your linting preferences.

📝 Suggested update
         Raises:
             ValueError: If the configuration is invalid or missing required fields
+            TypeError: If client_kwargs is not a dictionary
@@
         client_kwargs = config.get("client_kwargs", {})
         if not isinstance(client_kwargs, dict):
-            raise TypeError(f"client_kwargs must be a dictionary, but got {client_kwargs}")
+            raise TypeError(f"client_kwargs must be a dictionary, but got {client_kwargs!r}")
tests/integration_tests/test_default_embedding_function.py (1)

83-147: Avoid swallowing unexpected hybrid_search failures

The broad except Exception blocks here can mask real regressions. Consider narrowing to expected exception types or re-raising after logging so unexpected failures still surface.

src/pyseekdb/client/client_base.py (1)

872-886: Explicit None is ignored when persistence exists.

When a persisted embedding function exists and callers pass embedding_function=None, the current logic returns the persisted function anyway. This contradicts the docstring (“explicitly set to None”) and the updated tests that expect a conflict. Please align the behavior (either honor explicit None or raise on any provided parameter).

🛠️ One possible fix (honor explicit None)
-        if embedding_function_persistence is not None and embedding_function is not _NOT_PROVIDED:
-            if embedding_function and embedding_function_persistence.name() != embedding_function.name():
-                raise ValueError(
-                    "Both embedding function from parameter (not _NOT_PROVIDED, default value) and embedding function from persistence provided."
-                )
-            else:
-                return embedding_function_persistence
+        if embedding_function_persistence is not None and embedding_function is not _NOT_PROVIDED:
+            if embedding_function is None:
+                return None
+            if embedding_function_persistence.name() != embedding_function.name():
+                raise ValueError(
+                    "Both embedding function from parameter (not _NOT_PROVIDED, default value) and embedding function from persistence provided."
+                )
+            return embedding_function_persistence
src/pyseekdb/client/hybrid_search.py (1)

268-305: Metadata filters with $and/$or can be misrouted to document filters.

The heuristic treats any dict containing $and/$or as a document filter, but metadata filters also use those operators. This can shift metadata filters into where_document unexpectedly. Consider only classifying as document filters when $contains/$not_contains is present, and require DocumentExpression for explicit document logic.

🐛 Suggested fix
-                if any(key in item for key in ("$contains", "$not_contains", "$and", "$or")):
+                if any(key in item for key in ("$contains", "$not_contains")):
🤖 Fix all issues with AI agents
In `@docs/conf.py`:
- Line 10: Remove the unused noqa suppression from the copyright assignment
line; locate the statement setting copyright = "2025, OceanBase" (the line
containing `# noqa: A001`) and delete the trailing `# noqa: A001` so the lint
directive is no longer present.

In `@src/pyseekdb/client/client_base.py`:
- Around line 1514-1519: In _collection_update(), the embedding is being
serialized as a JSON-like string which differs from add/upsert; change the
serialization to use the same helper used elsewhere by calling
_embedding_to_hexstring(vec_val) for non-None embeddings and append the
resulting value to set_clauses for CollectionFieldNames.EMBEDDING (keeping the
same quoting/formatting used by inserts/upserts); ensure you still skip None
embeddings and preserve existing set_clauses handling so updates store
embeddings in the same binary/hex form as _embedding_to_hexstring.

In `@src/pyseekdb/utils/embedding_functions/litellm_embedding_function.py`:
- Around line 171-172: In LiteLLM embedding function validate the mismatch check
to use the renamed parameter documents instead of the undefined input: inside
the function/method that produces embeddings (references to embeddings and the
previous input), replace the length check and error message to compare
len(embeddings) to len(documents) and raise a ValueError that mentions documents
(e.g., "Expected {len(documents)} embeddings but got {len(embeddings)} from
LiteLLM") so the runtime NameError is removed and the message reflects the
correct parameter.

In `@tests/integration_tests/test_client_creation.py`:
- Around line 193-205: The cleanup currently catches all Exception around
db_client.delete_collection for test_collection_name and
test_collection_name_default; narrow this by catching the specific expected
exception(s) (e.g., CollectionNotFoundError or the client library's specific
error type) and handle those (log/ignore) while allowing unexpected exceptions
to propagate (or re-raise) so test failures are visible; update the two
try/except blocks around db_client.delete_collection to catch only the expected
error type(s) and re-raise or not catch other Exceptions.

In `@tests/integration_tests/test_empty_value_handling.py`:
- Around line 53-56: The broad except in the cleanup block around
delete_collection(collection_name) triggers Ruff BLE001; either narrow the
except to the specific ValueError raised for "collection not found" (e.g.
replace "except Exception as cleanup_error:" with "except ValueError as
cleanup_error:") or explicitly suppress the lint rule for this best-effort test
cleanup by appending "# noqa: BLE001" to the existing except line (keep the
print using cleanup_error and collection_name unchanged).
🧹 Nitpick comments (17)
tests/integration_tests/test_security_sql_injection.py (1)

269-271: Consider catching more specific exceptions.

Ruff flagged except Exception (BLE001) as overly broad. While catching broad exceptions in test code is common practice to ensure clear failure reporting, you could narrow this to catch only expected database/query exceptions for better signal.

That said, since this is test code and the exception details are passed to pytest.fail() for visibility, this is acceptable as-is.

tests/unit_tests/test_collection_name_validation.py (1)

16-16: Remove unused noqa: E402 directive.

The E402 rule (module-level import not at top of file) is not enabled in the Ruff configuration. The noqa comment is unnecessary and can be removed.

Suggested fix
-from pyseekdb.client.client_base import _validate_collection_name  # noqa: E402
+from pyseekdb.client.client_base import _validate_collection_name
pyproject.toml (1)

100-101: preview = true can cause formatting diffs across Ruff versions.

Preview features may change in patch releases. For consistent formatting in CI, either remove this setting or pin the Ruff version strictly (including patch version in lock files).

demo/rag/llm.py (2)

76-80: The try/except/else pattern works but could be simplified.

Using else after try/except is valid Python, but placing return inside the try block (before the except) would be more conventional and slightly clearer. The current approach works correctly.

♻️ Optional simplification
     try:
         # Prepare request parameters
         request_params = {
             "model": model,
             "messages": [
                 {"role": "system", "content": SYSTEM_PROMPT},
                 {"role": "user", "content": USER_PROMPT},
             ],
             "temperature": temperature,
         }

         # Add max_tokens if specified
         if max_tokens:
             request_params["max_tokens"] = max_tokens

         response = client.chat.completions.create(**request_params)
+        return response.choices[0].message.content
     except Exception as e:
         print(f"Error generating LLM response: {e}")
         return f"Sorry, an error occurred while generating the answer: {e!s}"
-    else:
-        return response.choices[0].message.content

120-124: Same pattern as above - consider simplifying.

The else block is valid but unconventional. Same optional refactor applies here.

src/pyseekdb/client/filters.py (2)

84-84: Remove unused noqa directive.

The # noqa: C901 directive is unused since this rule is not enabled in ruff configuration. Per static analysis hint (RUF100).

♻️ Proposed fix
-    def _build_condition(  # noqa: C901
+    def _build_condition(

276-278: Remove unused noqa directive.

Same issue - the # noqa: C901 directive is unused. Per static analysis hint (RUF100).

♻️ Proposed fix
-    def _build_search_filter_condition(  # noqa: C901
+    def _build_search_filter_condition(
tests/unit_tests/test_configuration.py (1)

8-14: Remove unused noqa directive.

The # noqa: E402 directive is unused since this rule is not enabled in ruff configuration. Per static analysis hint (RUF100).

♻️ Proposed fix
-from pyseekdb import Configuration, FulltextAnalyzerConfig, HNSWConfiguration  # noqa: E402
+from pyseekdb import Configuration, FulltextAnalyzerConfig, HNSWConfiguration
tests/integration_tests/test_default_embedding_function.py (1)

215-223: Mark unused documents parameter explicitly

EmptyResultEF.__call__ doesn’t use documents; prefixing it with _ (or adding an inline suppression) makes the intent explicit and avoids unused-arg noise.

♻️ Suggested tweak
-        class EmptyResultEF:
-            def __call__(self, documents):
+        class EmptyResultEF:
+            def __call__(self, _documents):
                 return []
src/pyseekdb/client/client_seekdb_embedded.py (1)

117-119: Remove the unused noqa directive.

Ruff indicates that C901 (complexity) is not enabled, making # noqa: C901 unnecessary. The use_context_manager parameter is intentionally unused as part of interface compliance with BaseClient, which is acceptable.

🔧 Suggested fix
-    def _execute_query_with_cursor(  # noqa: C901
+    def _execute_query_with_cursor(
         self, conn: Any, sql: str, params: list[Any], use_context_manager: bool = True
     ) -> list[dict[str, Any]]:
tests/integration_tests/test_collection_hybrid_search.py (1)

37-120: Test data helper mirrors the builder integration tests.

The _insert_test_data implementation is identical to the one in test_collection_hybrid_search_builder_integration.py. Consider extracting this to a shared test utility to reduce duplication.

The _insert_test_data method is duplicated between this file and test_collection_hybrid_search_builder_integration.py. Consider extracting it to a shared fixture or utility in conftest.py to reduce maintenance burden.

src/pyseekdb/utils/embedding_functions/litellm_embedding_function.py (1)

98-98: Remove unused noqa directive.

Ruff reports that # noqa: C901 is unused because this rule is not enabled. The function complexity appears acceptable without suppression.

♻️ Proposed fix
-    def __call__(self, documents: Documents) -> Embeddings:  # noqa: C901
+    def __call__(self, documents: Documents) -> Embeddings:
tests/unit_tests/test_embedding_function_registry.py (1)

135-136: Consider renaming _input to documents for consistency.

Some test methods use _input while others use documents. While both work functionally, using documents consistently would align better with the protocol definition and other test methods in this file.

demo/rag/embedding_function_factory.py (1)

7-8: Consider importing type aliases from pyseekdb instead of redefining.

Documents and Embeddings are already defined in pyseekdb.client.embedding_function. Redefining them here could lead to inconsistencies if the upstream definitions change.

♻️ Proposed fix
-from pyseekdb import DefaultEmbeddingFunction, EmbeddingFunction
+from pyseekdb import DefaultEmbeddingFunction, EmbeddingFunction
+from pyseekdb.client.embedding_function import Documents, Embeddings

-Documents = str | list[str]
-Embeddings = list[list[float]]
src/pyseekdb/client/client_base.py (2)

257-257: Remove unused # noqa: C901 markers.

These are now flagged as unused; please drop them or enable the complexity rule if you still want the suppression.

Also applies to: 479-479, 763-763, 1230-1230, 1389-1389, 1531-1531, 2163-2163, 2355-2355, 2578-2578, 2684-2684, 2813-2813, 2900-2900, 3010-3010


843-855: Return type should allow None.

_resolve_embedding_function() returns None when no persisted config exists, but the annotation is non-optional. Align the return type to avoid misleading type-checkers.

tests/integration_tests/test_collection_embedding_function.py (1)

642-642: Remove unused # noqa: C901.

Ruff flags this as unused; consider dropping it.

Comment thread docs/conf.py
# Project information
project = "pyseekdb"
copyright = "2025, OceanBase"
copyright = "2025, OceanBase" # noqa: A001

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove unused noqa suppression.
Ruff flags this directive as unused; dropping it keeps lint clean.

🧹 Proposed fix
-copyright = "2025, OceanBase"  # noqa: A001
+copyright = "2025, OceanBase"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
copyright = "2025, OceanBase" # noqa: A001
copyright = "2025, OceanBase"
🧰 Tools
🪛 Ruff (0.14.13)

10-10: Unused noqa directive (non-enabled: A001)

Remove unused noqa directive

(RUF100)

🤖 Prompt for AI Agents
In `@docs/conf.py` at line 10, Remove the unused noqa suppression from the
copyright assignment line; locate the statement setting copyright = "2025,
OceanBase" (the line containing `# noqa: A001`) and delete the trailing `# noqa:
A001` so the lint directive is no longer present.

Comment on lines 1514 to 1519
if embeddings:
vec_val = embeddings[i]
if vec_val is not None:
vec_str = "[" + ",".join(map(str, vec_val)) + "]"
set_clauses.append(
f"{CollectionFieldNames.EMBEDDING} = '{vec_str}'"
)
set_clauses.append(f"{CollectionFieldNames.EMBEDDING} = '{vec_str}'")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Embedding update uses a different serialization than add/upsert.

_collection_update() currently writes embeddings as a JSON-like string, while add/upsert use _embedding_to_hexstring() (binary literal). This is likely to break vector columns or cause inconsistent storage. Please serialize updates the same way as inserts.

🐛 Suggested fix
-            if embeddings:
-                vec_val = embeddings[i]
-                if vec_val is not None:
-                    vec_str = "[" + ",".join(map(str, vec_val)) + "]"
-                    set_clauses.append(f"{CollectionFieldNames.EMBEDDING} = '{vec_str}'")
+            if embeddings:
+                vec_val = embeddings[i]
+                vec_sql = "NULL" if vec_val is None else _embedding_to_hexstring(vec_val)
+                set_clauses.append(f"{CollectionFieldNames.EMBEDDING} = {vec_sql}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if embeddings:
vec_val = embeddings[i]
if vec_val is not None:
vec_str = "[" + ",".join(map(str, vec_val)) + "]"
set_clauses.append(
f"{CollectionFieldNames.EMBEDDING} = '{vec_str}'"
)
set_clauses.append(f"{CollectionFieldNames.EMBEDDING} = '{vec_str}'")
if embeddings:
vec_val = embeddings[i]
vec_sql = "NULL" if vec_val is None else _embedding_to_hexstring(vec_val)
set_clauses.append(f"{CollectionFieldNames.EMBEDDING} = {vec_sql}")
🤖 Prompt for AI Agents
In `@src/pyseekdb/client/client_base.py` around lines 1514 - 1519, In
_collection_update(), the embedding is being serialized as a JSON-like string
which differs from add/upsert; change the serialization to use the same helper
used elsewhere by calling _embedding_to_hexstring(vec_val) for non-None
embeddings and append the resulting value to set_clauses for
CollectionFieldNames.EMBEDDING (keeping the same quoting/formatting used by
inserts/upserts); ensure you still skip None embeddings and preserve existing
set_clauses handling so updates store embeddings in the same binary/hex form as
_embedding_to_hexstring.

Comment on lines 171 to +172
if len(embeddings) != len(input):
raise ValueError(
f"Expected {len(input)} embeddings but got {len(embeddings)} from LiteLLM"
)
raise ValueError(f"Expected {len(input)} embeddings but got {len(embeddings)} from LiteLLM")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical bug: input variable is undefined after parameter rename.

The parameter was renamed from input to documents, but the validation on lines 171-172 still references the old input variable. This will raise a NameError at runtime.

🐛 Proposed fix
         # Validate that we got the expected number of embeddings
-        if len(embeddings) != len(input):
-            raise ValueError(f"Expected {len(input)} embeddings but got {len(embeddings)} from LiteLLM")
+        if len(embeddings) != len(documents):
+            raise ValueError(f"Expected {len(documents)} embeddings but got {len(embeddings)} from LiteLLM")
🧰 Tools
🪛 Ruff (0.14.13)

172-172: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In `@src/pyseekdb/utils/embedding_functions/litellm_embedding_function.py` around
lines 171 - 172, In LiteLLM embedding function validate the mismatch check to
use the renamed parameter documents instead of the undefined input: inside the
function/method that produces embeddings (references to embeddings and the
previous input), replace the length check and error message to compare
len(embeddings) to len(documents) and raise a ValueError that mentions documents
(e.g., "Expected {len(documents)} embeddings but got {len(embeddings)} from
LiteLLM") so the runtime NameError is removed and the message reflects the
correct parameter.

Comment on lines 193 to 205
# Clean up: delete all test collections
try:
db_client.delete_collection(test_collection_name)
print(f" Cleaned up collection: {test_collection_name}")
except Exception as cleanup_error:
print(
f" Warning: Failed to cleanup {test_collection_name}: {cleanup_error}"
)
print(f" Warning: Failed to cleanup {test_collection_name}: {cleanup_error}")

try:
db_client.delete_collection(test_collection_name_default)
print(f" Cleaned up collection: {test_collection_name_default}")
except Exception as cleanup_error:
print(
f" Warning: Failed to cleanup {test_collection_name_default}: {cleanup_error}"
)
print(f" Warning: Failed to cleanup {test_collection_name_default}: {cleanup_error}")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Narrow cleanup exception handling

Catching all Exception during cleanup can hide unexpected failures. Consider narrowing to expected exception types or re-raising after logging if cleanup failures should be visible.

🧰 Tools
🪛 Ruff (0.14.13)

197-197: Do not catch blind exception: Exception

(BLE001)


203-203: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In `@tests/integration_tests/test_client_creation.py` around lines 193 - 205, The
cleanup currently catches all Exception around db_client.delete_collection for
test_collection_name and test_collection_name_default; narrow this by catching
the specific expected exception(s) (e.g., CollectionNotFoundError or the client
library's specific error type) and handle those (log/ignore) while allowing
unexpected exceptions to propagate (or re-raise) so test failures are visible;
update the two try/except blocks around db_client.delete_collection to catch
only the expected error type(s) and re-raise or not catch other Exceptions.

Comment on lines 53 to +56
try:
db_client.delete_collection(name=collection_name)
except Exception as cleanup_error:
print(
f"Warning: failed to cleanup collection '{collection_name}': {cleanup_error}"
)
print(f"Warning: failed to cleanup collection '{collection_name}': {cleanup_error}")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Get repository overview and find the test file
git ls-files | head -20
find . -name "test_empty_value_handling.py" -type f

Repository: oceanbase/pyseekdb

Length of output: 583


🏁 Script executed:

# Read the test file to examine the cleanup code and zip usage
wc -l tests/integration_tests/test_empty_value_handling.py

Repository: oceanbase/pyseekdb

Length of output: 118


🏁 Script executed:

# Check lines around 53-56 for the cleanup code
sed -n '50,60p' tests/integration_tests/test_empty_value_handling.py

Repository: oceanbase/pyseekdb

Length of output: 593


🏁 Script executed:

# Check for zip(strict=True) usage around line 221
sed -n '215,225p' tests/integration_tests/test_empty_value_handling.py

Repository: oceanbase/pyseekdb

Length of output: 670


🏁 Script executed:

# Find the db_client.delete_collection implementation and its exceptions
rg -n "def delete_collection" -A 10

Repository: oceanbase/pyseekdb

Length of output: 2550


🏁 Script executed:

# Check project's Python version requirements
find . -name "pyproject.toml" -o -name "setup.py" -o -name "setup.cfg" -o -name "requirements.txt" | xargs grep -l "python" 2>/dev/null | head -5

Repository: oceanbase/pyseekdb

Length of output: 104


🏁 Script executed:

# Look for Python version specification in key config files
cat pyproject.toml 2>/dev/null | grep -A 5 "python"

Repository: oceanbase/pyseekdb

Length of output: 189


🏁 Script executed:

# Check Ruff configuration in pyproject.toml or ruff.toml
cat pyproject.toml 2>/dev/null | grep -A 20 "\[tool.ruff"

Repository: oceanbase/pyseekdb

Length of output: 656


🏁 Script executed:

# Get complete Ruff configuration
cat pyproject.toml | sed -n '/\[tool.ruff.lint\]/,/\[/p' | head -100

Repository: oceanbase/pyseekdb

Length of output: 89


🏁 Script executed:

# Check if BLE rules are enabled - search for "BLE" in config
rg -i "BLE" pyproject.toml

Repository: oceanbase/pyseekdb

Length of output: 44


🏁 Script executed:

# Get full delete_collection implementation to see what exceptions it raises
sed -n '952,980p' src/pyseekdb/client/client_base.py

Repository: oceanbase/pyseekdb

Length of output: 1243


🏁 Script executed:

# Check if tests directory has specific Ruff ignore rules
rg -A 5 '"tests/\*' pyproject.toml

Repository: oceanbase/pyseekdb

Length of output: 166


🏁 Script executed:

# Run Ruff on the specific test file to see if BLE001 is actually flagged
rg --version 2>&1 || echo "rg available"
# Try to understand what Ruff would flag
python3 << 'EOF'
import subprocess
# Check if ruff is available and run it on the test file
try:
    result = subprocess.run(['ruff', 'check', 'tests/integration_tests/test_empty_value_handling.py', '--select', 'BLE'], 
                          capture_output=True, text=True, timeout=10)
    print("RUFF OUTPUT:")
    print(result.stdout)
    if result.stderr:
        print("STDERR:", result.stderr)
    print("RETURN CODE:", result.returncode)
except FileNotFoundError:
    print("Ruff not available in sandbox")
except Exception as e:
    print(f"Error: {e}")
EOF

Repository: oceanbase/pyseekdb

Length of output: 711


🏁 Script executed:

# Look at what exceptions can be raised in the implementation
sed -n '952,1000p' src/pyseekdb/client/client_base.py

Repository: oceanbase/pyseekdb

Length of output: 1823


Resolve Ruff BLE001 by narrowing the exception or explicitly suppressing with # noqa.

The broad except Exception at line 55 violates Ruff's BLE001 rule ("Do not catch blind exception"). Since delete_collection explicitly raises ValueError for collection-not-found cases, and may raise other exceptions from database operations, you have two options:

  1. Catch specific exceptions (preferred if you know the scope):

    except ValueError:
  2. Explicitly suppress with # noqa (appropriate for best-effort test cleanup):

    except Exception as cleanup_error:  # noqa: BLE001

The noqa approach is reasonable here since this is test cleanup code where best-effort cleanup is intentional, but narrowing to ValueError would also work if database operations are guaranteed not to raise other uncaught exceptions.

🧰 Tools
🪛 Ruff (0.14.13)

55-55: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In `@tests/integration_tests/test_empty_value_handling.py` around lines 53 - 56,
The broad except in the cleanup block around delete_collection(collection_name)
triggers Ruff BLE001; either narrow the except to the specific ValueError raised
for "collection not found" (e.g. replace "except Exception as cleanup_error:"
with "except ValueError as cleanup_error:") or explicitly suppress the lint rule
for this best-effort test cleanup by appending "# noqa: BLE001" to the existing
except line (keep the print using cleanup_error and collection_name unchanged).

@hnwyllmm hnwyllmm merged commit d8e6ee6 into oceanbase:develop Jan 22, 2026
7 checks passed
@PsiACE PsiACE deleted the ruff-check branch January 22, 2026 06:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants