
Add multi-vector late interaction (ColBERT-style) index support #3970

@lvca

Description

Summary

Add a new index type for multi-vector (ColBERT/ColPali-style) retrieval alongside the existing LSM_VECTOR (HNSW) index. This unlocks citation-grade RAG for agentic use cases (primary consumer: ArcadeBrain).

Motivation

The current LSM_VECTOR index stores one vector per document, which averages token-level signal and loses precision on:

  • multi-concept queries
  • rare terms / proper nouns
  • long documents (books)
  • citation-grade retrieval for LLM agents

ColBERT-style late interaction keeps one vector per token and computes MaxSim at query time. Published benchmarks (BEIR) show recall@10 improvements from ~60-70% (dense) to ~85-95% (late interaction) on hard queries. This is the dominant pattern in 2026 for high-precision RAG.
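
For reference, the MaxSim scoring described above can be expressed in a few lines. This is a plain-Java sketch (class and method names are illustrative, not ArcadeDB API): for each query token vector, take the maximum dot-product similarity over all document token vectors, then sum the per-token maxima.

```java
// Sketch of ColBERT-style MaxSim scoring (illustrative, not ArcadeDB API).
public final class MaxSim {

  // Dot product of two equal-length vectors.
  static float dot(final float[] a, final float[] b) {
    float s = 0f;
    for (int i = 0; i < a.length; i++)
      s += a[i] * b[i];
    return s;
  }

  // MaxSim(Q, D) = sum over q in Q of max over d in D of sim(q, d).
  static float maxSim(final float[][] queryTokens, final float[][] docTokens) {
    float score = 0f;
    for (final float[] q : queryTokens) {
      float best = Float.NEGATIVE_INFINITY;
      for (final float[] d : docTokens)
        best = Math.max(best, dot(q, d)); // best-matching doc token for this query token
      score += best;
    }
    return score;
  }
}
```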

Competitors (Vespa, Qdrant, Weaviate, Milvus) already support this natively. ArcadeDB is the only graph+vector DB that does not.

Scope (Phase 1: denormalized approach)

Each token of a document is indexed as its own HNSW entry carrying a back-pointer RID to the parent document. At query time, each query token is expanded against the HNSW graph, candidates are aggregated by parent RID, and MaxSim is applied per candidate (the existing vector.multiScore(..., 'MAX') function can be reused for the fusion step).

Schema / Metadata

  • Add multiVector: boolean and parentRidProperty: string fields to LSMVectorIndexMetadata
  • Extend TypeLSMVectorIndexBuilder with .withMultiVector(true).withParentProperty("docRid")
  • Parse new multiVector flag in SQL CREATE INDEX ... METADATA {...} path
  • Validate: multi-vector index requires a non-indexed LINK property for parent RID
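
A hypothetical DDL shape for the new flags, assuming the metadata keys land as proposed above (exact CREATE INDEX syntax to be confirmed against the existing LSM_VECTOR path):

```sql
-- Hypothetical: multiVector / parentRidProperty as proposed in this issue
CREATE INDEX ON DocToken (embedding) LSM_VECTOR
  METADATA { "multiVector": true, "parentRidProperty": "docRid" }
```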

Storage / Insert path

  • Add LSMVectorIndex.putMulti(RID parentRid, float[][] tokens) that loops and inserts N entries, each tagged with parentRid
  • Extend VectorLocationIndex entry with parentRid field
  • Hook into record-update path: on parent update, delete old token set, insert new
  • Hook into record-delete path: cascade-delete all tokens where parentRid = deletedRid
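
A minimal in-memory model of this insert/update/delete bookkeeping (illustrative only; `DenormalizedTokenStore` is not the proposed `LSMVectorIndex` API): one entry per token, all back-pointing to the parent, so update and delete can cascade by parent RID.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory model of the denormalized layout (not ArcadeDB's real storage).
public final class DenormalizedTokenStore {
  record Entry(String parentRid, float[] vector) {}

  private final List<Entry> entries = new ArrayList<>();

  // putMulti: one index entry per token, each tagged with the parent RID.
  void putMulti(final String parentRid, final float[][] tokens) {
    for (final float[] t : tokens)
      entries.add(new Entry(parentRid, t));
  }

  // Record-update hook: drop the stale token set, then re-insert the new one.
  void updateParent(final String parentRid, final float[][] newTokens) {
    deleteParent(parentRid);
    putMulti(parentRid, newTokens);
  }

  // Record-delete hook: cascade-delete every entry whose parentRid matches.
  void deleteParent(final String parentRid) {
    entries.removeIf(e -> e.parentRid().equals(parentRid));
  }

  int size() {
    return entries.size();
  }
}
```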

Query path

  • New SQL function SQLFunctionVectorMultiNeighbors in function/sql/vector/ - signature vector.multiNeighbors('Type[prop]', float[][], k, {efSearch, candidateMultiplier})
  • Algorithm: for each query token -> HNSW search top-k' (default k'=k*10); collect candidate parent RIDs; compute MaxSim per candidate; sort, return top-k
  • Reuse existing vector.multiScore(..., 'MAX') aggregation
  • Register alias vectorMultiNeighbors for naming consistency
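
The algorithm above can be sketched end-to-end with a brute-force scan standing in for the HNSW search (illustrative names only; `Token`, `candidates`, and `topK` are not ArcadeDB API):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the proposed multiNeighbors flow; a brute-force scan stands in
// for the per-token HNSW search.
public final class MultiNeighborsSketch {
  record Token(String parentRid, float[] vec) {}

  static float dot(final float[] a, final float[] b) {
    float s = 0f;
    for (int i = 0; i < a.length; i++)
      s += a[i] * b[i];
    return s;
  }

  // Steps 1-2: per query token, take the top-k' nearest token entries and
  // collect their parent RIDs as candidates.
  static Set<String> candidates(final List<Token> index, final float[][] query, final int kPrime) {
    final Set<String> out = new HashSet<>();
    for (final float[] q : query)
      index.stream()
          .sorted((x, y) -> Float.compare(dot(q, y.vec()), dot(q, x.vec())))
          .limit(kPrime)
          .forEach(t -> out.add(t.parentRid()));
    return out;
  }

  // Steps 3-4: exact MaxSim per candidate parent, then sort and keep top-k.
  static List<String> topK(final List<Token> index, final float[][] query,
                           final int k, final int candidateMultiplier) {
    final Map<String, Float> scores = new HashMap<>();
    for (final String rid : candidates(index, query, k * candidateMultiplier)) {
      float score = 0f;
      for (final float[] q : query) {
        float best = Float.NEGATIVE_INFINITY;
        for (final Token t : index)
          if (t.parentRid().equals(rid))
            best = Math.max(best, dot(q, t.vec()));
        score += best;
      }
      scores.put(rid, score);
    }
    return scores.entrySet().stream()
        .sorted(Map.Entry.<String, Float>comparingByValue().reversed())
        .limit(k)
        .map(Map.Entry::getKey)
        .toList();
  }
}
```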

Cypher integration

  • Add procedure db.index.vector.queryMultiNodes(indexName, k, float[][]) in query/opencypher/procedures/db/

Parameter binding

  • Verify float[][] / nested JSON array binds correctly through PostCommandHandler / PostQueryHandler
  • Test both SQL literal [[0.1, 0.2],[0.3, 0.4]] and HTTP JSON body param
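
A hedged sketch of the HTTP JSON body for the parameterized case, assuming ArcadeDB's usual {language, command, params} request shape (the :tokens parameter name is illustrative):

```json
{
  "language": "sql",
  "command": "SELECT vector.multiNeighbors('Doc[embedding]', :tokens, 5)",
  "params": { "tokens": [[0.1, 0.2], [0.3, 0.4]] }
}
```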

Tests (TDD)

  • LSMMultiVectorIndexTest - 10 docs x 5 tokens x 64 dims, assert top-k matches brute-force MaxSim
  • Regression: cascade delete, update parent vectors
  • HTTP-level test in server/ module with nested array param
  • Cypher test for db.index.vector.queryMultiNodes
  • 1k-docs end-to-end test tagged @Tag("slow")

Out of Scope (future phases)

  • Phase 2: Contiguous binary serialization for 2D float arrays (ARRAY_OF_FLOATS_2D type). The current ARRAY_OF_FLOATS encoding wraps each IEEE754 float in a VarInt (~5 bytes/float worst case); a dense fixed-width format stores exactly 4 bytes/float and removes that per-element overhead.
  • Phase 3: Native LSM_MULTIVECTOR index type with PLAID centroid pre-filter, product quantization for multi-vector.
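
For illustration, a dense 2D float layout of the kind Phase 2 describes could be as simple as a rows/cols header followed by raw 4-byte floats. This is a sketch, not a proposed on-disk format:

```java
import java.nio.ByteBuffer;

// Sketch of a dense 2D float encoding: 8-byte header (rows, cols), then
// raw IEEE754 floats at exactly 4 bytes each. Assumes a rectangular matrix.
public final class DenseFloat2D {

  static byte[] serialize(final float[][] m) {
    final int rows = m.length;
    final int cols = rows == 0 ? 0 : m[0].length;
    final ByteBuffer buf = ByteBuffer.allocate(8 + rows * cols * 4);
    buf.putInt(rows).putInt(cols);
    for (final float[] row : m)
      for (final float v : row)
        buf.putFloat(v);
    return buf.array();
  }

  static float[][] deserialize(final byte[] bytes) {
    final ByteBuffer buf = ByteBuffer.wrap(bytes);
    final int rows = buf.getInt();
    final int cols = buf.getInt();
    final float[][] m = new float[rows][cols];
    for (int r = 0; r < rows; r++)
      for (int c = 0; c < cols; c++)
        m[r][c] = buf.getFloat();
    return m;
  }
}
```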

Compatibility

  • Additive only - new index type, existing LSM_VECTOR untouched.
  • No new dependencies (JVector 4.0 already handles it).
  • Zero breaking changes.

Estimated effort

~2 weeks for one engineer.
