
Feature: Zero-Disk-I/O Vector Search with Product Quantization #3150

@lvca

Summary

This PR introduces Product Quantization (PQ) support for LSMVectorIndex, enabling zero-disk-I/O approximate vector search with microsecond-level latency. PQ compresses vectors into memory-resident codes, allowing HNSW traversal and scoring without reading full-precision vectors from disk.

Motivation

In RAG systems, vector search latency is critical. Traditional HNSW search requires loading full-precision vectors from disk during graph traversal and re-ranking, resulting in 10-50ms latencies. By using PQ vectors already resident in memory, we can achieve sub-millisecond latencies while accepting a small recall trade-off.

New Features

1. New Quantization Type: PRODUCT

db.getSchema().buildTypeVectorIndex("Document", "embedding")
    .withDimensions(1024)
    .withSimilarity("COSINE")
    .withQuantization("PRODUCT")  // Enable Product Quantization
    .create();

2. PQ Configuration Options

| Parameter | Default | Description |
|---|---|---|
| `pqSubspaces` | auto (dims/4, max 512) | Number of subspaces (M) |
| `pqClusters` | 256 | Clusters per subspace (K) |
| `pqCenterGlobally` | `true` | Center vectors globally before encoding |
| `pqTrainingLimit` | 128000 | Maximum vectors used for codebook training |
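As a small sketch of the "auto" default above: with no explicit `pqSubspaces`, the number of subspaces is derived from the vector dimensionality as dims/4, capped at 512. The helper name below is hypothetical, not part of the ArcadeDB API.

```java
// Sketch of the default pqSubspaces derivation described in the table:
// M = dims/4, capped at 512. Illustrative only.
public class PqDefaults {
  // Hypothetical helper mirroring the documented "auto (dims/4, max 512)" rule
  static int defaultSubspaces(final int dimensions) {
    return Math.min(dimensions / 4, 512);
  }

  public static void main(final String[] args) {
    System.out.println(defaultSubspaces(1024)); // 256 subspaces
    System.out.println(defaultSubspaces(4096)); // capped at 512
  }
}
```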

3. Zero-Disk-I/O Search Method

LSMVectorIndex index = (LSMVectorIndex) db.getSchema().getIndexByName("myIndex");

// Approximate search using in-memory PQ vectors (microsecond latency)
List<Pair<RID, Float>> results = index.findNeighborsFromVectorApproximate(queryVector, 10);

// Falls back to exact search if PQ not available

4. Helper Methods

// Check if PQ is ready
boolean ready = index.isPQSearchAvailable();

// Get PQ vector count
int count = index.getPQVectorCount();

// Force rebuild graph + PQ
index.buildVectorGraphNow();

How It Works

Build Phase

  1. During graph construction, the PQ codebook is trained on a sample of the indexed vectors (up to `pqTrainingLimit`)
  2. All vectors are encoded into compressed PQ codes
  3. PQ data is persisted to a `.vecpq` file alongside the graph
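The encode step above can be sketched as follows: each vector is split into M subspaces, and each subvector is replaced by the id of its nearest codebook centroid. This is a generic illustration of standard PQ encoding, not ArcadeDB's internal implementation; all names here are hypothetical.

```java
import java.util.Arrays;

// Generic sketch of PQ encoding: one byte code per subspace, each code being
// the index of the nearest centroid for that subvector. Illustrative only.
public class PqEncodeSketch {
  // codebook[m][k] = centroid k of subspace m, each of length dims/M
  static byte[] encode(final float[] vector, final float[][][] codebook) {
    final int m = codebook.length;
    final int subDim = vector.length / m;
    final byte[] codes = new byte[m];
    for (int s = 0; s < m; s++) {
      float best = Float.MAX_VALUE;
      int bestK = 0;
      for (int k = 0; k < codebook[s].length; k++) {
        float dist = 0f;
        for (int d = 0; d < subDim; d++) {
          final float diff = vector[s * subDim + d] - codebook[s][k][d];
          dist += diff * diff; // squared L2 distance in this subspace
        }
        if (dist < best) {
          best = dist;
          bestK = k;
        }
      }
      codes[s] = (byte) bestK;
    }
    return codes;
  }

  public static void main(final String[] args) {
    // 4-dim vector, M=2 subspaces, K=2 centroids per subspace
    final float[][][] codebook = {
        {{0f, 0f}, {1f, 1f}}, // subspace 0
        {{0f, 0f}, {5f, 5f}}  // subspace 1
    };
    final byte[] codes = encode(new float[]{0.9f, 1.1f, 4.8f, 5.2f}, codebook);
    System.out.println(Arrays.toString(codes)); // [1, 1]
  }
}
```

With K = 256 clusters, each code fits in one byte, which is what makes the codes cheap enough to keep fully resident in memory.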

Search Phase

  1. The query vector is used to precompute per-subspace PQ distance tables
  2. HNSW traversal scores candidates from their PQ codes (no disk I/O for vectors)
  3. Final ranking uses the same PQ scores (no re-ranking from disk)
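The table trick in steps 1-2 is standard asymmetric distance computation (ADC): the query stays full-precision, its distance to every centroid of every subspace is computed once, and scoring a candidate then costs only M table lookups and additions. A generic sketch, assuming squared-L2 distance and hypothetical names (not the ArcadeDB implementation):

```java
// Generic ADC sketch: precompute query-to-centroid distances per subspace,
// then score each candidate's PQ code by table lookups alone.
public class PqAdcSketch {
  // table[m][k] = squared distance of query subvector m to centroid k
  static float[][] buildDistanceTable(final float[] query, final float[][][] codebook) {
    final int m = codebook.length;
    final int subDim = query.length / m;
    final float[][] table = new float[m][];
    for (int s = 0; s < m; s++) {
      table[s] = new float[codebook[s].length];
      for (int k = 0; k < codebook[s].length; k++) {
        float dist = 0f;
        for (int d = 0; d < subDim; d++) {
          final float diff = query[s * subDim + d] - codebook[s][k][d];
          dist += diff * diff;
        }
        table[s][k] = dist;
      }
    }
    return table;
  }

  // Approximate distance to a candidate: one table lookup per subspace
  static float score(final byte[] codes, final float[][] table) {
    float sum = 0f;
    for (int s = 0; s < codes.length; s++)
      sum += table[s][codes[s] & 0xFF]; // mask: codes are unsigned bytes
    return sum;
  }

  public static void main(final String[] args) {
    final float[][][] codebook = {{{0f}, {1f}}, {{0f}, {1f}}};
    final float[][] table = buildDistanceTable(new float[]{0.5f, 1.0f}, codebook);
    System.out.println(score(new byte[]{0, 1}, table)); // 0.25
  }
}
```

Because both traversal and final ranking read only these in-memory tables and codes, no full-precision vector is touched during the search.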

Update Behavior (Eventual Consistency)

  • Single vector updates do NOT immediately update PQ
  • PQ is rebuilt with graph on:
    • Next search (lazy rebuild when mutations pending)
    • Mutation threshold reached (default: 1000)
    • Explicit buildVectorGraphNow() call

Performance Characteristics

| Metric | Exact Search | PQ Approximate Search |
|---|---|---|
| Latency | 10-50ms (disk-bound) | <1ms (CPU/RAM-bound) |
| Memory | Vectors on disk | ~1/16 to 1/64 of full vector size |
| Recall | 100% | ~95-99% (depends on PQ config) |
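The memory row follows from simple arithmetic: a full-precision vector costs 4 bytes per dimension, while a PQ code costs 1 byte per subspace (with K = 256 clusters). A small worked example (helper names are illustrative only):

```java
// Worked example behind the "~1/16 to 1/64" memory figure: bytes of a raw
// float vector versus bytes of its PQ code (1 byte per subspace, K <= 256).
public class PqMemorySketch {
  static int compressionRatio(final int dimensions, final int subspaces) {
    final int rawBytes = dimensions * Float.BYTES; // 4 bytes per dimension
    final int pqBytes = subspaces;                 // 1 byte per subspace code
    return rawBytes / pqBytes;
  }

  public static void main(final String[] args) {
    System.out.println(compressionRatio(1024, 1024 / 4)); // default M=256: 16x
    System.out.println(compressionRatio(1024, 64));       // M=64: 64x
  }
}
```

So the documented default (M = dims/4) gives the 1/16 end of the range, and more aggressive subspace settings approach 1/64.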

Files Changed

  • VectorQuantizationType.java - Added PRODUCT enum
  • LSMVectorIndexMetadata.java - Added PQ configuration fields
  • LSMVectorIndexPQFile.java - New file for PQ persistence
  • LSMVectorIndex.java - PQ building, loading, and approximate search

Usage Example

// Create database and type
Database db = new DatabaseFactory("./mydb").create();
db.getSchema().createDocumentType("Document")
    .createProperty("embedding", Type.ARRAY_OF_FLOATS);

// Create PQ-enabled vector index
db.getSchema().buildTypeVectorIndex("Document", "embedding")
    .withDimensions(768)
    .withSimilarity("COSINE")
    .withQuantization("PRODUCT")
    .withPQSubspaces(192)  // 768/4 = 192 subspaces
    .create();

// Insert vectors
db.begin();
for (float[] vector : vectors) {
    MutableDocument doc = db.newDocument("Document");
    doc.set("embedding", vector);
    doc.save();
}
db.commit();

// Search (first search triggers PQ build)
LSMVectorIndex index = (LSMVectorIndex) db.getSchema()
    .getIndexByName("Document[embedding]");
    
List<Pair<RID, Float>> results = index.findNeighborsFromVectorApproximate(query, 10);
