fix: align Databricks docs with config and fix query mode selection#4477

Merged: whysosaket merged 2 commits into main from fix/databricks-docs-and-direct-access on Mar 23, 2026

utkarsh240799 commented on Mar 21, 2026

Description

Fixes the Databricks vector store documentation and several related bugs discovered during investigation.

The docs referenced parameter names (index_name, source_table_name, service_principal_client_id, etc.) that don't exist in DatabricksConfig, causing users to get ValidationError: Extra fields not allowed when following the documentation.

Changes:

  1. Docs: replaced all incorrect parameter names with the actual DatabricksConfig fields (catalog, schema, table_name, collection_name, client_id, client_secret, etc.). All 18 documented params now match the config 1:1.
  2. Docstring: fixed index_name → collection_name in the Databricks.__init__ docstring.
  3. Hardcoded values: replaced hardcoded "logistics_dev.ai.dev_memory" and "pk_dev_memory" with self.fully_qualified_table_name and f"pk_{self.table_name}" in _ensure_source_table_exists.
  4. Enum normalization: DatabricksConfig.model_dump() outputs string "DELTA_SYNC" for the default index_type, but VectorIndexType is not a StrEnum, so "DELTA_SYNC" == VectorIndexType.DELTA_SYNC is False. This broke all enum comparisons for config→factory users. Fixed by normalizing string to enum in __init__.
  5. Query mode selection: aligned search(), get(), and list() with the Databricks SDK contract — query_text for Delta Sync with model endpoint, query_vector for Direct Access and Delta Sync with self-managed vectors.
  6. Minor: fixed duplicate assignment columns = columns = → columns =.
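
The enum issue in item 4 can be reproduced without the SDK. The sketch below uses a stand-in VectorIndexType (an assumption that mirrors the plain-Enum behavior described above, not the SDK class itself) and a hypothetical normalize_index_type helper to show why the string comparison fails and how normalizing in __init__ could look:

```python
from enum import Enum


class VectorIndexType(Enum):
    """Stand-in for the SDK enum: a plain Enum, not a StrEnum (assumption)."""

    DELTA_SYNC = "DELTA_SYNC"
    DIRECT_ACCESS = "DIRECT_ACCESS"


def normalize_index_type(value):
    """Hypothetical helper: accept an enum member or its string value."""
    if isinstance(value, VectorIndexType):
        return value
    # Lookup by value; raises ValueError for strings that match no member.
    return VectorIndexType(value)


# A plain Enum member does not compare equal to its string value,
# so every `==` check against the dumped config string fails.
string_equals_enum = "DELTA_SYNC" == VectorIndexType.DELTA_SYNC

# After normalization, identity and equality checks behave as expected.
normalized = normalize_index_type("DELTA_SYNC")
```

Normalizing once at construction time keeps the rest of the class free of string-versus-enum branching.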

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Documentation update

Backwards Compatibility

No existing working functionality is affected. The only scenario that worked before (DELTA_SYNC + model endpoint + direct construction with enum values) produces identical behavior. All other scenarios were already broken:

| Scenario | Before | After |
| --- | --- | --- |
| DELTA_SYNC + model endpoint (direct construction) | Worked | Identical |
| Any config via Memory.from_config() | ValueError in __init__ (enum bug) | Works |
| DIRECT_ACCESS get()/list() | API error (wrong query param) | Works |
| DELTA_SYNC + self-managed vectors | API error (wrong query param) | Works |

How Has This Been Tested?

Unit Tests (29 tests, all passing)

Added pytest.importorskip("databricks") for CI environments without databricks-sdk.

Config validation tests:

  • test_config_rejects_old_doc_params — verifies index_name from old docs is rejected
  • test_config_rejects_source_table_name — verifies source_table_name from old docs is rejected
  • test_config_accepts_correct_params — verifies correct param names are accepted

Query mode tests (SDK alignment):

  • test_search_delta_sync_text — DELTA_SYNC + model endpoint uses query_text
  • test_search_direct_access_vector — DIRECT_ACCESS uses query_vector
  • test_search_delta_sync_self_managed_vectors — DELTA_SYNC without model endpoint uses query_vector
  • test_search_missing_params_raises — DELTA_SYNC + model endpoint with empty query raises ValueError
  • test_get_vector / test_get_vector_direct_access / test_get_vector_delta_sync_self_managed: get() uses the correct query param per config
  • test_list_memories / test_list_memories_direct_access / test_list_memories_delta_sync_self_managed: list() uses the correct query param per config
  • test_list_memories_default_limit: list(limit=None) defaults to 100
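
The selection rule these tests exercise can be summarized as a small pure function. This is a minimal sketch, not the code in the PR: select_query_kwargs and its model_endpoint parameter are hypothetical names, VectorIndexType is a local stand-in for the SDK enum, and raising ValueError when no vector is supplied is an assumption (the source only confirms the error for an empty text query).

```python
from enum import Enum
from typing import Dict, List, Optional, Sequence, Union


class VectorIndexType(Enum):
    """Local stand-in for the SDK enum (assumption)."""

    DELTA_SYNC = "DELTA_SYNC"
    DIRECT_ACCESS = "DIRECT_ACCESS"


def select_query_kwargs(
    index_type: VectorIndexType,
    model_endpoint: Optional[str],
    query: Optional[str] = None,
    vectors: Optional[Sequence[float]] = None,
) -> Dict[str, Union[str, List[float]]]:
    """Pick query_text or query_vector for an index query (hypothetical helper)."""
    # Delta Sync with a model endpoint: the index computes embeddings itself,
    # so the query is sent as text.
    if index_type is VectorIndexType.DELTA_SYNC and model_endpoint:
        if not query:
            raise ValueError("query text is required when a model endpoint is configured")
        return {"query_text": query}
    # Direct Access, or Delta Sync with self-managed vectors: the caller
    # supplies the embedding. Raising here is an assumption of this sketch.
    if vectors is None or len(vectors) == 0:
        raise ValueError("a query vector is required for this index configuration")
    return {"query_vector": list(vectors)}
```

Keeping the decision in one place means search(), get(), and list() cannot drift apart in which parameter they send.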

Hardcoded value fix:

  • test_ensure_source_table_uses_dynamic_names — PK constraint uses dynamic table name, not hardcoded
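
For reference, the dynamic names this test checks can be derived as below. This is a sketch under assumptions: the two free functions are hypothetical stand-ins for self.fully_qualified_table_name and f"pk_{self.table_name}", and the catalog.schema.table layout is inferred from the previously hardcoded value.

```python
def fully_qualified_table_name(catalog: str, schema: str, table_name: str) -> str:
    """Join the three config fields into a catalog.schema.table name (assumed layout)."""
    return f"{catalog}.{schema}.{table_name}"


def pk_constraint_name(table_name: str) -> str:
    """Derive the primary-key constraint name from the table name."""
    return f"pk_{table_name}"


# With the values that used to be hardcoded, the helpers reproduce the old strings.
old_table = fully_qualified_table_name("logistics_dev", "ai", "dev_memory")
old_pk = pk_constraint_name("dev_memory")
```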

End-to-end tests (config → factory → Databricks → CRUD):

  • test_e2e_config_to_factory_delta_sync — full config→factory path for DELTA_SYNC
  • test_e2e_config_to_factory_direct_access — full config→factory path for DIRECT_ACCESS
  • test_e2e_old_docs_config_rejected — old docs config rejected at VectorStoreConfig level
  • test_e2e_crud_lifecycle_delta_sync — insert→search→get→list→update→delete for DELTA_SYNC
  • test_e2e_crud_lifecycle_direct_access — insert→search→get→list for DIRECT_ACCESS

Real Databricks API Validation

Tested against a live Databricks Free Edition workspace (DELTA_SYNC + databricks-bge-large-en model endpoint):

| Operation | Method | Query Mode | Result |
| --- | --- | --- | --- |
| INSERT | db.insert() | SQL via warehouse | PASSED |
| SEARCH | db.search(query='sci-fi movies') | query_text | PASSED: returned 2 results with relevance scores |
| GET | db.get('mem-real-001') | query_text + filter | PASSED: returned correct memory with metadata |
| LIST (with filter) | db.list(filters={'user_id': ...}) | query_text + filter | PASSED: returned 2 memories |
| LIST (no filter) | db.list(limit=10) | query_text | PASSED: returned 2 memories |
| UPDATE | db.update(vector_id=...) | SQL via warehouse | PASSED |
| DELETE | db.delete(...) | SQL via warehouse | PASSED |

DIRECT_ACCESS could not be tested on the free tier (only DELTA_SYNC is available), but the SDK contract is well-documented and the code logic is identical — just swapping query_text for query_vector.

  • Unit Tests
  • End-to-end Tests (mocked)
  • Real Databricks API Validation

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Maintainer Checklist

…de selection (#3638)

- Fix docs to use correct parameter names (collection_name, catalog, schema,
  table_name, client_id, client_secret) instead of non-existent ones
  (index_name, source_table_name, service_principal_client_id, etc.)
- Fix docstring: index_name → collection_name
- Fix hardcoded table name and PK constraint name in _ensure_source_table_exists
- Fix enum comparison: normalize string index_type to VectorIndexType enum
- Fix query mode in search/get/list to match Databricks SDK contract:
  query_text for Delta Sync with model endpoint, query_vector otherwise
- Fix duplicate assignment: columns = columns = → columns =
- Add pytest.importorskip for CI environments without databricks-sdk
- Add comprehensive tests covering DIRECT_ACCESS, self-managed vectors,
  config validation, and end-to-end config→factory→CRUD lifecycle

Closes #3638

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mintlify bot commented Mar 21, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| mem0 | 🟢 Ready | View Preview | Mar 21, 2026, 6:21 PM |

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
whysosaket merged commit d8a6960 into main on Mar 23, 2026
9 checks passed
whysosaket deleted the fix/databricks-docs-and-direct-access branch on March 23, 2026 13:50
jamebobob pushed a commit to jamebobob/mem0-vigil-recall that referenced this pull request Mar 29, 2026
…em0ai#4477)

Co-authored-by: utkarsh240799 <utkarsh240799@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Issue in docs related to supported vector databases in databricks section

2 participants