feat: integrate turbopuffer as vector database provider #4428

Merged
whysosaket merged 6 commits into main from feat/turbopuffer-integration on Mar 21, 2026

@utkarsh240799 utkarsh240799 commented Mar 19, 2026

Description

Integrates Turbopuffer as a vector database provider for mem0.

  • Adds TurbopufferDB vector store implementation using the official turbopuffer Python SDK (v1.19+)
  • Adds TurbopufferConfig Pydantic config class
  • Registers the provider in VectorStoreFactory and VectorStoreConfig
  • Adds documentation page and navigation entries

Fixes #2543

Type of change

  • New feature (non-breaking change which adds functionality)

Implementation Details

  • Uses the new Turbopuffer client API (not the legacy tpuf.api_key global pattern)
  • insert() uses row-based namespace.write(upsert_rows=...) with batching
  • update() with vector=None uses patch_rows for payload-only updates (avoids corrupting existing vectors)
  • delete() uses namespace.write(deletes=[id])
  • delete_col() uses namespace.delete_all()
  • search() uses namespace.query(rank_by=("vector", "ANN", ...)) with native filter support
  • get() uses ANN query with ID filter (turbopuffer applies filters before ranking, guaranteeing correctness)
  • list() returns wrapped [results] format for compatibility with main.py's _get_all_from_vector_store and delete_all
  • count() uses namespace.metadata().approx_row_count instead of querying with top_k=10000
  • col_info() uses namespace.metadata() for real namespace stats
  • _parse_output() correctly extracts $dist and attributes from turbopuffer Row objects via model_dump()
  • Score conversion: score = 1 - cosine_distance, which is the cosine similarity (a distance in the [0, 2] range maps to a similarity in [-1, 1])
  • Payload key collision protection: id/vector are set after payload.update() to prevent overwrites
  • region defaults to gcp-us-central1 (required by the turbopuffer SDK)
  • Graceful error handling for list_cols(), list(), delete_col(), col_info(), get(), and count() when namespace doesn't exist or permissions are limited
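
A minimal sketch of three of the points above (score conversion, key collision protection, and batching), written as standalone helpers. The names `distance_to_score`, `build_row`, and `batched`, and the batch size used in the usage note, are hypothetical and are not taken from the PR's code; no turbopuffer SDK call is made.

```python
from typing import Any, Iterator, Optional


def distance_to_score(cosine_distance: Optional[float]) -> Optional[float]:
    """Map a cosine distance in [0, 2] to a cosine similarity in [-1, 1]."""
    if cosine_distance is None:
        # A row without a distance gets no score.
        return None
    return 1.0 - cosine_distance


def build_row(vector_id: str, vector: list, payload: Optional[dict] = None) -> dict:
    """Build one upsert row (hypothetical helper).

    The payload is copied in first; id and vector are assigned afterwards,
    so a payload that happens to contain "id" or "vector" cannot overwrite them.
    """
    row: dict[str, Any] = {}
    row.update(payload or {})
    row["id"] = vector_id
    row["vector"] = vector
    return row


def batched(rows: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]
```

For example, `build_row("a", [0.1, 0.2], {"id": "other", "user_id": "u1"})` keeps `"a"` as the row ID, and `list(batched(rows, 2))` splits five rows into batches of two, two, and one.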

Testing

Testing Methodology

The integration was validated through two layers of testing:

1. Unit Tests (64 tests) — mocked SDK

All turbopuffer SDK calls are mocked, testing every method in isolation with edge cases. Tests use real turbopuffer.types.Row objects (via model_validate) to ensure _parse_output handles the actual SDK data model correctly. Tests are skipped via pytest.importorskip when the SDK is not installed (CI compatibility).
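
A rough sketch of the mocked-SDK style described above, using only `unittest.mock`. `TinyStore` is a hypothetical stand-in for the real store class, and coercing int IDs to strings is an assumption based on the "int ID coercion" entry in the coverage table. In the actual test module, a `pytest.importorskip("turbopuffer")` line near the top would skip the file when the SDK is missing; it is left out here so the sketch runs without pytest.

```python
from unittest.mock import MagicMock


class TinyStore:
    """Hypothetical stand-in for the vector store; only delete() is modeled."""

    def __init__(self, namespace):
        self.namespace = namespace

    def delete(self, vector_id):
        # Assumption: IDs are coerced to str before being sent to the namespace.
        self.namespace.write(deletes=[str(vector_id)])


def test_delete_coerces_int_id():
    # The namespace is a mock, so no network call is made.
    namespace = MagicMock()
    TinyStore(namespace).delete(42)
    # The mock records the call, which lets the test check the exact arguments.
    namespace.write.assert_called_once_with(deletes=["42"])
```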

2. End-to-End Tests (12 tests) — real turbopuffer API

Ran against the live turbopuffer API to validate real behavior. This caught two issues that unit tests could not, and confirmed one behavior:

  • list_cols() returns 403 when the API key lacks namespace listing permissions → added graceful error handling
  • list() after reset() returns 404 because the namespace no longer exists → added try/except
  • Confirmed that patch_rows (payload-only update) truly preserves the original vector by searching with the old vector after patching
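
The try/except handling mentioned above could look roughly like the sketch below. `FakeNotFound` and `list_rows_safely` are hypothetical names, and returning `[[]]` on failure is an assumption chosen to match the wrapped `[results]` format; the PR's actual fallback value and exception types are not shown in this description.

```python
import logging

logger = logging.getLogger(__name__)


class FakeNotFound(Exception):
    """Hypothetical stand-in for the SDK error raised on a 404."""


def list_rows_safely(query_fn):
    """Call query_fn and wrap its rows as [rows]; fall back to [[]] on error."""
    try:
        return [query_fn()]
    except Exception as exc:  # broad on purpose in this sketch; real code may narrow it
        logger.warning("list() failed, returning an empty result: %s", exc)
        return [[]]


def missing_namespace():
    # Simulates querying a namespace that was removed by reset().
    raise FakeNotFound("namespace not found")
```

With this shape, a caller that indexes the first element of the result keeps working whether or not the namespace still exists.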

Unit Test Coverage

| Category | Tests | Coverage |
|---|---|---|
| Init | 5 | API key (param, env var, missing), extra params, default region |
| create_col | 2 | No-op with/without args |
| insert | 5 | IDs + payloads, auto IDs, no payloads, payload key collision protection, batch splitting |
| _parse_output | 4 | dist→score, missing dist, strip vector/id/$dist, empty |
| _convert_filters | 8 | None, empty, single eq, multiple eq, gte+lte, gte-only, lte-only, mixed |
| search | 4 | Basic, with filters, null rows, empty rows |
| delete | 2 | String ID, int ID coercion |
| update | 5 | Vector+payload (upsert), vector-only, payload-only (patch), key collision, no-op |
| get | 4 | Found, not found, null rows, exception |
| list | 6 | Wrapped format, filters, empty, zero vector, _get_all compat, delete_all compat |
| list_cols/delete_col/col_info | 5 | Success + error handling |
| count/reset | 3 | Success, error fallback, delegates to delete_all |
| Config | 5 | Defaults, custom values, extra field rejection, missing key, env var |
| Factory | 3 | Factory registration, config registry, validation pipeline |
| OutputData | 3 | Fields, nullable, main.py payload access pattern |
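
To illustrate the filter cases listed for `_convert_filters`, here is a hedged sketch of one way such a conversion could work. The function name `convert_filters`, the input shape (plain values for equality, `{"gte": ..., "lte": ...}` dicts for ranges), and the tuple output modeled on turbopuffer's `(attribute, operator, value)` filter style are all assumptions; the PR's real implementation may differ.

```python
from typing import Any, Optional


def convert_filters(filters: Optional[dict]) -> Optional[tuple]:
    """Convert a flat filter dict into tuple-style conditions (hypothetical)."""
    if not filters:
        # Covers both None and an empty dict.
        return None

    conditions: list[tuple[str, str, Any]] = []
    for key, value in filters.items():
        if isinstance(value, dict):
            # Range filter: either bound may be present on its own.
            if "gte" in value:
                conditions.append((key, "Gte", value["gte"]))
            if "lte" in value:
                conditions.append((key, "Lte", value["lte"]))
        else:
            # Plain value: equality match.
            conditions.append((key, "Eq", value))

    if not conditions:
        return None
    if len(conditions) == 1:
        # A single condition needs no And wrapper.
        return conditions[0]
    return ("And", conditions)
```

Under these assumptions, `convert_filters({"user_id": "alice"})` yields a single `Eq` condition, while a mixed filter with an equality and a range yields an `And` over three conditions.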

E2E Test Results (against real turbopuffer API)

| Step | Operation | Result |
|---|---|---|
| 1 | Insert 3 vectors with payloads | PASS |
| 2 | Search (ANN, top match score=1.0) | PASS |
| 3 | Search with user_id filter | PASS |
| 4 | Get by ID (existing + missing) | PASS |
| 5 | Update vector + payload | PASS |
| 6 | Update payload only (patch) + verify vector preserved | PASS |
| 7 | List (wrapped format [[results]]) | PASS |
| 8 | List with filters | PASS |
| 9 | Col info (namespace metadata) | PASS |
| 10 | Delete single vector | PASS |
| 11 | List cols (graceful 403 handling) | PASS |
| 12 | Reset + verify empty | PASS |

Regression Testing

No regressions in existing tests:

  • Qdrant tests: 19/19 passed
  • Safe deepcopy config tests: 94/94 passed

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Maintainer Checklist

🤖 Generated with Claude Code

utkarsh240799 and others added 4 commits March 19, 2026 17:55
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ollisions

- region defaults to 'gcp-us-central1' (turbopuffer SDK requires it)
- insert/update now set id/vector after payload to prevent key collisions
- added tests for key collision protection and default region

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@kartik-mem0 kartik-mem0 left a comment

please address the ci errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

utkarsh240799 commented Mar 19, 2026

> please address the ci errors.

ci errors addressed @kartik-mem0

…espaces

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@whysosaket whysosaket merged commit bf9a570 into main Mar 21, 2026
9 checks passed
@whysosaket whysosaket deleted the feat/turbopuffer-integration branch March 21, 2026 13:58
lukaj99 added a commit to lukaj99/mem0 that referenced this pull request Mar 21, 2026
jamebobob pushed a commit to jamebobob/mem0-vigil-recall that referenced this pull request Mar 29, 2026
Co-authored-by: utkarsh240799 <utkarsh240799@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>