Skip to content

fix query ids with % char#117

Merged
hnwyllmm merged 4 commits into
oceanbase:developfrom
hnwyllmm:special-char
Jan 19, 2026
Merged

fix query ids with % char#117
hnwyllmm merged 4 commits into
oceanbase:developfrom
hnwyllmm:special-char

Conversation

@hnwyllmm

@hnwyllmm hnwyllmm commented Jan 19, 2026

Copy link
Copy Markdown
Member

Summary

close #116
If there's '%' in the id field, the collection.get would failed to execute.
I use parameterized query with pymysql, so we should treat the id parameter as a parameter in cursor.execute. But it's serialized in SQL now.

Currently:

cursor.execute('SELECT ID, DOCUMENT FROM COLLECTION WHERE id in (CAST('id_%_percent' AS BINARY)) LIMIT %s, OFFSET %s', 5, 0)

After I fix the bug:

cursor.execute('SELECT ID, DOCUMENT FROM COLLECTION WHERE id in (CAST(%s AS BINARY)) LIMIT %s, OFFSET %s', ('id_%_percent', 5, 0))

Summary by CodeRabbit

  • Bug Fixes

    • IDs are now handled safely to avoid formatting issues; percent signs and backslashes in IDs/fields are better escaped.
    • Several info logs were toned down to debug to reduce log noise.
  • Tests

    • Added integration tests covering percent signs, backslashes, quotes, and combined special-character scenarios for insert/get flows.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai

coderabbitai Bot commented Jan 19, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

This PR parameterizes ID values in SQL WHERE clauses, adds a percent-escaping utility for SQL strings, lowers some log levels to debug, and adds integration tests for special-character handling in IDs, documents, and metadata.

Changes

Cohort / File(s) Summary
Parameterized ID SQL Construction
src/pyseekdb/client/client_base.py
Added _convert_id_to_sql_with_paramters(id_val) returning CAST(%s AS BINARY) + parameter; _build_where_clause() updated to use this helper; several log statements changed from info→debug.
SQL Escaping Utility
src/pyseekdb/client/sql_utils.py
Added escape_percent_for_sql(value: str) to replace % with %%; minor refactor of a ValueError raise in render_sql_with_params.
Integration Tests for Special Characters
tests/integration_tests/test_special_characters_bugs.py
New integration tests covering backslashes in documents, percent signs in IDs, double quotes in metadata, and a combined special-character scenario using temporary HNSW collections.
Logging & Formatting Tidy-up
src/pyseekdb/client/__init__.py, src/pyseekdb/client/client_seekdb_embedded.py, src/pyseekdb/client/client_seekdb_server.py, src/pyseekdb/client/embedding_function.py, src/pyseekdb/utils/embedding_functions/..., tests/...
Miscellaneous log level reductions (info→debug) and whitespace/formatting changes across client, embedding, and test files; no behavioral changes beyond the ID parameterization and new util.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant BaseClient
    participant SQLUtils
    participant DB
    Client->>BaseClient: collection.get(ids / filters)
    BaseClient->>SQLUtils: escape_percent_for_sql(id) (if needed)
    BaseClient->>BaseClient: _convert_id_to_sql_with_paramters(id) -> (sql_fragment, param)
    BaseClient->>DB: execute SQL with parameterized ID and params
    DB-->>BaseClient: query result
    BaseClient-->>Client: return documents/metadatas
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • PR #77 — Overlapping changes to SQL escaping/parameterization and client SQL construction; touches similar helper logic.
  • PR #111 — Related modifications to SQL rendering/parameter helpers and client-side SQL handling affecting ID processing.

Poem

🐰
I nibbled bugs that hid in "%",
Cast IDs safe so queries mend,
I doubled "%" and ran the test,
Backslashes bowed and quotes looked blessed,
Hooray — the queries now ascend! 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix query ids with % char' clearly and concisely describes the main change: fixing issues with percent characters in query IDs by implementing parameterized queries.
Linked Issues check ✅ Passed The PR successfully addresses issue #116 by implementing parameterized query handling for IDs containing '%' characters, switching from inline SQL to parameter binding in cursor.execute calls.
Out of Scope Changes check ✅ Passed All changes are scoped to fixing the '%' character issue: parameterized ID handling, helper functions for SQL escaping, test coverage, and log-level adjustments for clarity. No unrelated modifications detected.
Docstring Coverage ✅ Passed Docstring coverage is 89.47% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/pyseekdb/client/client_base.py`:
- Around line 1794-1799: The loop currently calls
_convert_id_to_sql_with_paramters(id_val) which pre-escapes the ID and returns
an escaped parameter that then gets passed into cursor.execute, causing
double-escaping; change the behavior so the raw ID value is passed as the
parameter instead of an escaped string—either update
_convert_id_to_sql_with_paramters to return an unescaped parameter (raw id_val)
and an SQL placeholder (e.g., "%s"), or stop using its escaped output and pass
the original id_val as the param when appending to params; ensure processed_ids
still contains the SQL fragment/placeholder while params contains the raw id
value so PyMySQL can perform the proper escaping during cursor.execute.
🧹 Nitpick comments (1)
tests/integration_tests/test_special_characters_bugs.py (1)

18-20: Consider adding explicit collection cleanup.

Each test creates a new collection; adding a try/finally teardown (or a small helper) to drop it would keep the test DB clean and reduce long‑run flakiness.

Comment thread src/pyseekdb/client/client_base.py
@hnwyllmm hnwyllmm merged commit 2f68610 into oceanbase:develop Jan 19, 2026
7 checks passed
@hnwyllmm hnwyllmm deleted the special-char branch January 19, 2026 07:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: query failed with % character in ids field

1 participant