fix: align Databricks docs with config and fix query mode selection #4477
Merged
whysosaket merged 2 commits into main on Mar 23, 2026
…de selection (#3638)
- Fix docs to use correct parameter names (collection_name, catalog, schema, table_name, client_id, client_secret) instead of non-existent ones (index_name, source_table_name, service_principal_client_id, etc.)
- Fix docstring: index_name → collection_name
- Fix hardcoded table name and PK constraint name in _ensure_source_table_exists
- Fix enum comparison: normalize string index_type to VectorIndexType enum
- Fix query mode in search/get/list to match Databricks SDK contract: query_text for Delta Sync with model endpoint, query_vector otherwise
- Fix duplicate assignment: columns = columns = → columns =
- Add pytest.importorskip for CI environments without databricks-sdk
- Add comprehensive tests covering DIRECT_ACCESS, self-managed vectors, config validation, and end-to-end config→factory→CRUD lifecycle
Closes #3638
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
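The enum normalization and the query mode rule from this commit message can be sketched together. This is a minimal illustration, not the code in the PR: `VectorIndexType` here is a stand-in for the project's enum of the same name, and `normalize_index_type`, `build_query_kwargs`, and the `model_endpoint_name` parameter are hypothetical names chosen for the example.

```python
from enum import Enum
from typing import Dict, List, Optional, Union


class VectorIndexType(Enum):
    """Stand-in for the project's enum: a plain Enum, not a str subclass."""
    DELTA_SYNC = "DELTA_SYNC"
    DIRECT_ACCESS = "DIRECT_ACCESS"


# A plain Enum member never compares equal to its string value,
# so this comparison is False even though the values match.
assert ("DELTA_SYNC" == VectorIndexType.DELTA_SYNC) is False


def normalize_index_type(value: Union[str, VectorIndexType]) -> VectorIndexType:
    """Hypothetical helper: accept the enum member or its string value."""
    if isinstance(value, VectorIndexType):
        return value
    # Lookup by value; raises ValueError for an unknown string.
    return VectorIndexType(value)


def build_query_kwargs(
    index_type: Union[str, VectorIndexType],
    model_endpoint_name: Optional[str],  # assumed name for the endpoint setting
    query: Optional[str] = None,
    vectors: Optional[List[float]] = None,
) -> Dict[str, object]:
    """Pick query_text or query_vector following the rule in the commit message."""
    index_type = normalize_index_type(index_type)

    # Delta Sync with a model endpoint: the service embeds the text itself.
    if index_type is VectorIndexType.DELTA_SYNC and model_endpoint_name:
        if not query:
            raise ValueError("query text is required for Delta Sync with a model endpoint")
        return {"query_text": query}

    # Direct Access, or Delta Sync with self-managed vectors: caller supplies the vector.
    if vectors is None:
        raise ValueError("a query vector is required when no model endpoint is used")
    return {"query_vector": vectors}
```

Normalizing once at construction time means every later comparison can use `is` against enum members, whether the value arrived as an enum (direct construction) or as a string (config dump).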
whysosaket approved these changes on Mar 23, 2026
jamebobob pushed a commit to jamebobob/mem0-vigil-recall that referenced this pull request on Mar 29, 2026
…em0ai#4477)
Co-authored-by: utkarsh240799 <utkarsh240799@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Description
Fixes the Databricks vector store documentation and several related bugs discovered during investigation.
The docs referenced parameter names (`index_name`, `source_table_name`, `service_principal_client_id`, etc.) that don't exist in `DatabricksConfig`, causing users to get `ValidationError: Extra fields not allowed` when following the documentation.
Changes:
- Docs: use the actual `DatabricksConfig` fields (`catalog`, `schema`, `table_name`, `collection_name`, `client_id`, `client_secret`, etc.). All 18 documented params now match the config 1:1.
- Docstring: `index_name` → `collection_name` in the `Databricks.__init__` docstring.
- Hardcoded values: replaced `"logistics_dev.ai.dev_memory"` and `"pk_dev_memory"` with `self.fully_qualified_table_name` and `f"pk_{self.table_name}"` in `_ensure_source_table_exists`.
- Enum comparison: `DatabricksConfig.model_dump()` outputs the string `"DELTA_SYNC"` for the default `index_type`, but `VectorIndexType` is not a `StrEnum`, so `"DELTA_SYNC" == VectorIndexType.DELTA_SYNC` is `False`. This broke all enum comparisons for config→factory users. Fixed by normalizing the string to the enum in `__init__`.
- Query mode: aligned `search()`, `get()`, and `list()` with the Databricks SDK contract: `query_text` for Delta Sync with a model endpoint, `query_vector` for Direct Access and for Delta Sync with self-managed vectors.
- Duplicate assignment: `columns = columns =` → `columns =`.
Type of change
Backwards Compatibility
No existing working functionality is affected. The only scenario that worked before (DELTA_SYNC + model endpoint + direct construction with enum values) produces identical behavior. All other scenarios were already broken:
- `Memory.from_config()`: `ValueError` in `__init__` (enum bug)
- `get()` / `list()`
How Has This Been Tested?
Unit Tests (29 tests, all passing)
Added `pytest.importorskip("databricks")` for CI environments without `databricks-sdk`.
Config validation tests:
- `test_config_rejects_old_doc_params`: verifies `index_name` from old docs is rejected
- `test_config_rejects_source_table_name`: verifies `source_table_name` from old docs is rejected
- `test_config_accepts_correct_params`: verifies correct param names are accepted
Query mode tests (SDK alignment):
- `test_search_delta_sync_text`: DELTA_SYNC + model endpoint uses `query_text`
- `test_search_direct_access_vector`: DIRECT_ACCESS uses `query_vector`
- `test_search_delta_sync_self_managed_vectors`: DELTA_SYNC without model endpoint uses `query_vector`
- `test_search_missing_params_raises`: DELTA_SYNC + model endpoint with empty query raises `ValueError`
- `test_get_vector` / `test_get_vector_direct_access` / `test_get_vector_delta_sync_self_managed`: `get()` uses correct query param per config
- `test_list_memories` / `test_list_memories_direct_access` / `test_list_memories_delta_sync_self_managed`: `list()` uses correct query param per config
- `test_list_memories_default_limit`: `list(limit=None)` defaults to 100
Hardcoded value fix:
- `test_ensure_source_table_uses_dynamic_names`: PK constraint uses dynamic table name, not hardcoded
End-to-end tests (config → factory → Databricks → CRUD):
- `test_e2e_config_to_factory_delta_sync`: full config→factory path for DELTA_SYNC
- `test_e2e_config_to_factory_direct_access`: full config→factory path for DIRECT_ACCESS
- `test_e2e_old_docs_config_rejected`: old docs config rejected at `VectorStoreConfig` level
- `test_e2e_crud_lifecycle_delta_sync`: insert→search→get→list→update→delete for DELTA_SYNC
- `test_e2e_crud_lifecycle_direct_access`: insert→search→get→list for DIRECT_ACCESS
Real Databricks API Validation
Tested against a live Databricks Free Edition workspace (DELTA_SYNC + `databricks-bge-large-en` model endpoint):
- `db.insert()`
- `db.search(query='sci-fi movies')`: `query_text`
- `db.get('mem-real-001')`: `query_text` + filter
- `db.list(filters={'user_id': ...})`: `query_text` + filter
- `db.list(limit=10)`: `query_text`
- `db.update(vector_id=...)`
- `db.delete(...)`
DIRECT_ACCESS could not be tested on the free tier (only DELTA_SYNC is available), but the SDK contract is well-documented and the code logic is identical apart from swapping `query_text` for `query_vector`.
Checklist:
Maintainer Checklist