fix: prevent SQL injection in Databricks vector store #4558
Merged
kartik-mem0 merged 3 commits into main on Mar 26, 2026
Replace f-string interpolation with Databricks parameterized queries using StatementParameterListItem in delete(), update(), and insert() methods. Add column name validation in update() to reject invalid SQL identifiers from payload keys. Embedding vectors remain inlined as they are numeric arrays not supported by the parameterization API.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add explicit type='TIMESTAMP' on StatementParameterListItem for created_at/updated_at columns instead of relying on implicit STRING->TIMESTAMP casting
- Fix pre-existing bug: update() used Python list repr [0.1, 0.2] for embedding, which is invalid Databricks SQL; it now uses _format_sql_value() to produce array(0.1, 0.2) syntax
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
kartik-mem0 approved these changes on Mar 26, 2026
Linked Issue
Closes #4073
Description
The Databricks vector store implementation (`mem0/vector_stores/databricks.py`) used f-string interpolation to build SQL queries, creating SQL injection vulnerabilities in three methods. A malicious `memory_id` like `'; DELETE FROM table; --` would allow arbitrary SQL execution.
Problem
All three write methods were vulnerable:

| Method | Vulnerable code | Issue |
|---|---|---|
| `delete()` | `f"...WHERE memory_id = '{vector_id}'"` | `vector_id` injected directly into SQL |
| `update()` | `f"{key} = '{value}'"` and `f"...WHERE memory_id = '{vector_id}'"` | `vector_id` injected into SQL. Payload keys used as column names without validation |
| `insert()` | `_format_sql_value()` manual escaping | `replace("'", "''")` escaping is fragile and not equivalent to parameterized queries |

Other vector stores in this codebase (pgvector, azure_mysql) already use parameterized queries correctly.
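To make the problem concrete, here is a minimal sketch of the f-string pattern quoted above. The helper name `build_delete_unsafe` and the table name `memories` are hypothetical and only serve the illustration; the real method in `databricks.py` may be structured differently.

```python
# Illustrative only: helper and table names are hypothetical,
# modelled on the f-string pattern quoted in the PR description.
def build_delete_unsafe(table: str, vector_id: str) -> str:
    # The caller-supplied id is pasted straight into the statement text.
    return f"DELETE FROM {table} WHERE memory_id = '{vector_id}'"


malicious_id = "'; DELETE FROM table; --"
unsafe_sql = build_delete_unsafe("memories", malicious_id)
print(unsafe_sql)
# DELETE FROM memories WHERE memory_id = ''; DELETE FROM table; --'
```

The leading quote in the payload closes the string literal early, so the rest of the payload becomes part of the SQL text instead of staying data.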
Solution
- `delete()`: Replaced `'{vector_id}'` with a `:vector_id` parameter marker + `StatementParameterListItem`
- `update()`: Parameterized `vector_id` and all payload values with `:payload_{key}` markers. Added regex validation (`^[A-Za-z_][A-Za-z0-9_]*$`) on payload keys used as column names to prevent column name injection
- `insert()`: Replaced `_format_sql_value()` calls for all user-controlled values with `:{col}_{row_index}` parameter markers. NULL values use literal `NULL`. Embedding vectors remain inlined via `_format_sql_value()` since `StatementParameterListItem` does not support `ARRAY` types; these are safe as they contain only numeric floats from the embedding model

All parameterized queries use the Databricks SDK's `StatementParameterListItem` class, which binds values server-side and never interpolates them into the SQL string.
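The three changes can be sketched roughly as follows. This is not the merged code: the builder functions, the `COLUMN_TYPES` map, the column names `data` and `embedding`, and `format_embedding()` (a stand-in for `_format_sql_value()`) are assumptions made for illustration. Skipping invalid keys and emitting a literal `NULL` in `update()` are also assumptions. If the Databricks SDK is not installed, a dataclass with the same `name`/`type`/`value` fields is used so the sketch still runs.

```python
import re
from dataclasses import dataclass
from typing import Any, Optional

try:
    from databricks.sdk.service.sql import StatementParameterListItem
except ImportError:
    # Stand-in with the same fields, so the sketch runs without the SDK.
    @dataclass
    class StatementParameterListItem:
        name: str
        type: Optional[str] = None
        value: Optional[str] = None

# Pattern from the PR description. fullmatch() is used because `$` alone
# would also accept a key that ends in a newline.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
# Hypothetical column-to-type map; only created_at/updated_at come from the PR.
COLUMN_TYPES = {"created_at": "TIMESTAMP", "updated_at": "TIMESTAMP"}


def param(name: str, value: Any, column: str = "") -> StatementParameterListItem:
    """Bind one value by name; it travels separately from the SQL text."""
    return StatementParameterListItem(
        name=name, value=str(value), type=COLUMN_TYPES.get(column)
    )


def format_embedding(vector) -> str:
    """Inline a vector as array(...); float() raises on non-numeric input."""
    return "array(" + ", ".join(repr(float(x)) for x in vector) + ")"


def build_delete(table: str, vector_id: str):
    # The table name is assumed to come from trusted configuration.
    sql = f"DELETE FROM {table} WHERE memory_id = :vector_id"
    return sql, [param("vector_id", vector_id)]


def build_update(table: str, vector_id: str, payload: dict):
    assignments, params = [], [param("vector_id", vector_id)]
    for key, value in payload.items():
        if not IDENTIFIER.fullmatch(key):
            continue  # assumption: invalid column names are skipped
        if value is None:
            assignments.append(f"{key} = NULL")  # assumption: literal NULL
        else:
            assignments.append(f"{key} = :payload_{key}")
            params.append(param(f"payload_{key}", value, key))
    if not assignments:
        raise ValueError("no valid columns to update")
    sql = f"UPDATE {table} SET {', '.join(assignments)} WHERE memory_id = :vector_id"
    return sql, params


def build_insert(table: str, rows: list):
    columns = ("memory_id", "data", "created_at")  # hypothetical scalar columns
    groups, params = [], []
    for i, row in enumerate(rows):
        cells = []
        for col in columns:
            if row.get(col) is None:
                cells.append("NULL")
            else:
                cells.append(f":{col}_{i}")  # unique marker per column and row
                params.append(param(f"{col}_{i}", row[col], col))
        cells.append(format_embedding(row["embedding"]))
        groups.append("(" + ", ".join(cells) + ")")
    header = f"INSERT INTO {table} ({', '.join(columns)}, embedding) VALUES "
    return header + ", ".join(groups), params


sql, params = build_update(
    "memories", "'; DROP TABLE memories; --", {"data": "hello", "bad; DROP": "x"}
)
print(sql)
# UPDATE memories SET data = :payload_data WHERE memory_id = :vector_id
```

The returned pair would then presumably be handed to the SDK's statement execution call through its `parameters` argument. The malicious id never reaches the statement text; it only appears as the `value` of the `vector_id` parameter.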
Breaking Changes
N/A
Test Coverage
Testing details
57 tests total (28 new, 29 updated to verify parameterization). All pass.
SQL injection prevention tests (20 tests): Each of these 4 payloads is tested against `delete()`, `update()` (vector_id and payload values), and `insert()` (IDs and data):
- `'; DELETE FROM table; --`
- `' OR '1'='1`
- `'; DROP TABLE memories; --`
- `1' UNION SELECT * FROM secrets --`

Each test verifies: (1) the malicious string does NOT appear in the SQL statement, (2) it IS safely passed via `StatementParameterListItem` parameters.

Column name injection tests (4 tests): Verifies that `update()` rejects payload keys containing SQL metacharacters (`; DROP`, `' OR`, newlines) while still processing valid keys.

Functional tests (4 new): Multi-row insert with unique parameter names, NULL value handling, payload-only update, vector-only update.

Existing tests (29 updated): All pre-existing tests updated to assert parameterized query usage instead of inline values.
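The two assertions each injection test makes can be sketched without a Databricks connection. The version below is a simplified stand-in, not the PR's test suite: `Param` mimics the fields of `StatementParameterListItem`, `build_delete()` is a hypothetical builder, and the entries in `BAD_KEYS` are illustrative examples of the metacharacter classes listed above.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Param:
    """Stand-in for StatementParameterListItem (name/type/value fields)."""
    name: str
    value: Optional[str] = None
    type: Optional[str] = None


PAYLOADS = [
    "'; DELETE FROM table; --",
    "' OR '1'='1",
    "'; DROP TABLE memories; --",
    "1' UNION SELECT * FROM secrets --",
]
# Illustrative keys containing "; DROP", "' OR", and a newline.
BAD_KEYS = ["name; DROP", "x' OR", "a\nb"]
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_delete(table: str, vector_id: str):
    # Hypothetical builder: the id is bound by name, never interpolated.
    sql = f"DELETE FROM {table} WHERE memory_id = :vector_id"
    return sql, [Param("vector_id", str(vector_id))]


def check_parameterized(sql: str, params: list, payload: str) -> None:
    # (1) the malicious string must not appear in the statement text
    assert payload not in sql, "payload leaked into the SQL text"
    # (2) it must be carried as a bound parameter value instead
    assert any(p.value == payload for p in params), "payload not bound"


def run_checks() -> int:
    for payload in PAYLOADS:
        sql, params = build_delete("memories", payload)
        check_parameterized(sql, params, payload)
    for key in BAD_KEYS:
        assert IDENTIFIER.fullmatch(key) is None  # rejected as a column name
    assert IDENTIFIER.fullmatch("user_id") is not None  # valid key still accepted
    return len(PAYLOADS) + len(BAD_KEYS) + 1


checked = run_checks()
print(f"{checked} checks passed")
```

Asserting on both the statement text and the parameter list catches a regression in either direction: a value that slips back into the SQL string, or a value that is silently dropped.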
Checklist