## Summary
This PR introduces Product Quantization (PQ) support for LSMVectorIndex, enabling zero-disk-I/O approximate vector search with microsecond-level latency. PQ compresses vectors into memory-resident codes, allowing HNSW traversal and scoring without reading full-precision vectors from disk.
## Motivation

In RAG systems, vector search latency is critical. Traditional HNSW search requires loading full-precision vectors from disk during graph traversal and re-ranking, resulting in 10-50 ms latencies. By using PQ vectors already resident in memory, we can achieve sub-millisecond latencies while accepting a small recall trade-off.
## New Features

### 1. New Quantization Type: PRODUCT

```java
db.getSchema().buildTypeVectorIndex("Document", "embedding")
    .withDimensions(1024)
    .withSimilarity("COSINE")
    .withQuantization("PRODUCT") // Enable Product Quantization
    .create();
```

### 2. PQ Configuration Options
| Parameter | Default | Description |
|---|---|---|
| `pqSubspaces` | auto (dims/4, max 512) | Number of subspaces (M) |
| `pqClusters` | 256 | Clusters per subspace (K) |
| `pqCenterGlobally` | `true` | Global centering before encoding |
| `pqTrainingLimit` | 128000 | Max vectors for codebook training |
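To make the footprint concrete: with K = 256 clusters, each subspace index fits in one byte, so a PQ code is M bytes per vector. The following sketch works through the arithmetic using the defaults from the table above (the class and variable names are illustrative, not part of the ArcadeDB API):

```java
public class PQFootprint {
  public static void main(String[] args) {
    int dims = 1024;                 // vector dimensions
    int m = Math.min(dims / 4, 512); // pqSubspaces default: auto (dims/4, max 512)

    // K = 256 clusters per subspace -> each subspace index fits in one byte.
    int rawBytes = dims * Float.BYTES; // 1024 * 4 = 4096 bytes per full-precision vector
    int codeBytes = m;                 // 256 bytes per PQ code

    System.out.println("compression ratio = " + (rawBytes / codeBytes) + "x"); // prints 16x
  }
}
```

Larger subspaces (smaller M) push the ratio toward the 1/64 end of the range quoted in the performance table below, at the cost of recall.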
### 3. Zero-Disk-I/O Search Method

```java
LSMVectorIndex index = (LSMVectorIndex) db.getSchema().getIndexByName("myIndex");
// Approximate search using in-memory PQ vectors (microsecond latency)
List<Pair<RID, Float>> results = index.findNeighborsFromVectorApproximate(queryVector, 10);
// Falls back to exact search if PQ is not available
```

### 4. Helper Methods

```java
// Check if PQ is ready
boolean ready = index.isPQSearchAvailable();
// Get PQ vector count
int count = index.getPQVectorCount();
// Force rebuild of graph + PQ
index.buildVectorGraphNow();
```

## How It Works
### Build Phase

- During graph construction, the PQ codebook is trained on the stored vectors
- All vectors are encoded into compressed PQ codes
- PQ data is persisted to a `.vecpq` file alongside the graph
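The encoding step above can be sketched as follows: each vector is split into M subvectors, and each subvector is replaced by the index of its nearest centroid in that subspace's codebook. This is a minimal illustration of standard PQ encoding, not the actual ArcadeDB implementation:

```java
public class PQEncodeSketch {
  /**
   * Encodes a vector into a PQ code of codebooks.length bytes.
   * codebooks[s][c] is centroid c of subspace s (each of length dims / M).
   */
  public static byte[] encode(float[] vector, float[][][] codebooks) {
    int m = codebooks.length;       // number of subspaces (M)
    int subDim = vector.length / m; // dimensions per subspace
    byte[] code = new byte[m];
    for (int s = 0; s < m; s++) {
      int best = 0;
      float bestDist = Float.MAX_VALUE;
      for (int c = 0; c < codebooks[s].length; c++) {
        // Squared L2 distance between the subvector and centroid c
        float d = 0;
        for (int j = 0; j < subDim; j++) {
          float diff = vector[s * subDim + j] - codebooks[s][c][j];
          d += diff * diff;
        }
        if (d < bestDist) { bestDist = d; best = c; }
      }
      code[s] = (byte) best; // K <= 256 clusters: index fits in one byte
    }
    return code;
  }
}
```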
### Search Phase

- The query vector is used to precompute PQ distance tables
- HNSW traversal uses PQ scores (no disk I/O for vectors)
- Final ranking uses the same PQ scores (no re-ranking from disk)
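The distance-table trick described above is standard asymmetric distance computation (ADC); the sketch below illustrates the idea and is not the literal ArcadeDB code. Per subspace, the distance from the query subvector to every centroid is computed once; scoring any stored code is then just M table lookups and additions, which is what makes traversal CPU/RAM-bound:

```java
public class AdcSketch {
  /** tables[s][c] = squared L2 distance from the query's subvector s to centroid c. */
  public static float[][] buildTables(float[] query, float[][][] codebooks) {
    int m = codebooks.length, subDim = query.length / m;
    float[][] tables = new float[m][];
    for (int s = 0; s < m; s++) {
      tables[s] = new float[codebooks[s].length];
      for (int c = 0; c < codebooks[s].length; c++) {
        float d = 0;
        for (int j = 0; j < subDim; j++) {
          float diff = query[s * subDim + j] - codebooks[s][c][j];
          d += diff * diff;
        }
        tables[s][c] = d;
      }
    }
    return tables;
  }

  /** Approximate squared L2 distance to the query: M lookups, no disk I/O. */
  public static float score(byte[] code, float[][] tables) {
    float sum = 0;
    for (int s = 0; s < code.length; s++)
      sum += tables[s][code[s] & 0xFF]; // mask to treat the byte as unsigned
    return sum;
  }
}
```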
### Update Behavior (Eventual Consistency)

- Single-vector updates do NOT immediately update the PQ codes
- PQ is rebuilt together with the graph on:
  - the next search (lazy rebuild when mutations are pending)
  - the mutation threshold being reached (default: 1000)
  - an explicit `buildVectorGraphNow()` call
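The rebuild policy above can be summarized in a few lines. This is a hypothetical sketch of the eventual-consistency logic as described, with illustrative names (`PQRebuildPolicy`, `pendingMutations`), not the real `LSMVectorIndex` internals:

```java
public class PQRebuildPolicy {
  static final int MUTATION_THRESHOLD = 1000; // default threshold from this PR
  int pendingMutations = 0;

  /** Called on each single-vector insert/update/delete; PQ is NOT updated here. */
  void onVectorMutated() {
    pendingMutations++;
    if (pendingMutations >= MUTATION_THRESHOLD)
      rebuildGraphAndPQ(); // threshold reached: rebuild eagerly
  }

  /** Called at the start of a search. */
  void beforeSearch() {
    if (pendingMutations > 0)
      rebuildGraphAndPQ(); // lazy rebuild when mutations are pending
  }

  /** Also reachable directly via an explicit buildVectorGraphNow() call. */
  void rebuildGraphAndPQ() {
    // ... retrain codebook, re-encode vectors, persist the .vecpq file ...
    pendingMutations = 0;
  }
}
```

Between a mutation and the next rebuild, approximate search scores stale PQ codes, which is the recall trade-off the "eventual consistency" label refers to.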
## Performance Characteristics

| Metric | Exact Search | PQ Approximate Search |
|---|---|---|
| Latency | 10-50 ms (disk-bound) | <1 ms (CPU/RAM-bound) |
| Memory | vectors read from disk | ~1/16 to 1/64 of raw vector size, resident in RAM |
| Recall | 100% | ~95-99% (depends on PQ config) |
## Files Changed

- `VectorQuantizationType.java` - Added `PRODUCT` enum
- `LSMVectorIndexMetadata.java` - Added PQ configuration fields
- `LSMVectorIndexPQFile.java` - New file for PQ persistence
- `LSMVectorIndex.java` - PQ building, loading, and approximate search
## Usage Example

```java
// Create database and type
Database db = new DatabaseFactory("./mydb").create();
db.getSchema().createDocumentType("Document")
    .createProperty("embedding", Type.ARRAY_OF_FLOATS);

// Create PQ-enabled vector index
db.getSchema().buildTypeVectorIndex("Document", "embedding")
    .withDimensions(768)
    .withSimilarity("COSINE")
    .withQuantization("PRODUCT")
    .withPQSubspaces(192) // 768/4 = 192 subspaces
    .create();

// Insert vectors
db.begin();
for (float[] vector : vectors) {
  MutableDocument doc = db.newDocument("Document");
  doc.set("embedding", vector);
  doc.save();
}
db.commit();

// Search (the first search triggers the PQ build)
LSMVectorIndex index = (LSMVectorIndex) db.getSchema()
    .getIndexByName("Document[embedding]");
List<Pair<RID, Float>> results = index.findNeighborsFromVectorApproximate(query, 10);
```

## References
- JVector Library - underlying vector search engine
- Product Quantization paper - original PQ research