Vector data

CrateDB natively supports vector embeddings and efficient similarity search over them using k-nearest neighbour (kNN) algorithms. This makes it a capable engine for AI-powered applications such as semantic search, recommendations, anomaly detection, and multimodal analytics, all expressed in plain SQL.

Whether you’re working with text, images, sensor data, or any domain represented as high-dimensional embeddings, CrateDB enables real-time vector search at scale, in combination with other data types like full-text, geospatial, and time-series.

Data Type: FLOAT_VECTOR

CrateDB has a native FLOAT_VECTOR type with the following key characteristics:

Fixed-length float arrays (1-2048 dimensions)
Backed by Lucene’s HNSW approximate nearest neighbor (ANN) search
Similarity and scoring exposed via KNN_MATCH and VECTOR_SIMILARITY

Example: Define a Table with Vector Embeddings

```
CREATE TABLE documents (
  title TEXT,
  content TEXT,
  embedding FLOAT_VECTOR(3)
);
```

FLOAT_VECTOR(3) declares a vector column with 3 floats.
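In practice the dimension is dictated by the embedding model rather than chosen freely. The sketch below generates the table definition for a given dimension and rejects values outside the 1-2048 range stated above. The helper name `float_vector_ddl` and the dimension 384 are illustrative assumptions, not part of CrateDB.

```python
MAX_DIMENSIONS = 2048  # upper bound taken from the type description above


def float_vector_ddl(table: str, column: str, dimensions: int) -> str:
    """Return a CREATE TABLE statement with one FLOAT_VECTOR column.

    Hypothetical helper: `table` and `column` are not escaped,
    so only pass trusted identifiers.
    """
    if not 1 <= dimensions <= MAX_DIMENSIONS:
        raise ValueError(
            f"dimensions must be between 1 and {MAX_DIMENSIONS}, got {dimensions}"
        )
    return (
        f"CREATE TABLE {table} (\n"
        f"  title TEXT,\n"
        f"  content TEXT,\n"
        f"  {column} FLOAT_VECTOR({dimensions})\n"
        f");"
    )


# 384 is an assumed model output size, used here only as an example.
ddl = float_vector_ddl("documents", "embedding", 384)
print(ddl)
```

Checking the dimension when the statement is generated surfaces a mismatch before any data is loaded.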

Ingestion: Working with Embeddings

You can ingest vectors in several ways:

Precomputed embeddings from models:
```
INSERT INTO documents (title, embedding)
VALUES ('AI and Databases', [0.12, 0.34, 0.01]);
```
The number of values must exactly match the dimension declared for the column; otherwise the insert fails with an error.
Batched imports via COPY FROM using JSON or CSV.
CrateDB doesn’t currently compute embeddings internally — you bring your own model or use pipelines that call CrateDB.
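Because a length mismatch is rejected at insert time, it can help to validate vectors on the client before sending them. The following is a minimal sketch of both ingestion paths: it builds a parameterized INSERT and serializes rows as one JSON object per line for use with COPY FROM. The helper names are hypothetical, and the `?` placeholder style is an assumption that should be checked against your client library.

```python
import json

EXPECTED_DIMENSIONS = 3  # must match FLOAT_VECTOR(3) in the table definition


def check_dimensions(vector, expected=EXPECTED_DIMENSIONS):
    """Return the vector as a list of floats, or raise if the length is wrong."""
    values = [float(x) for x in vector]
    if len(values) != expected:
        raise ValueError(f"expected {expected} floats, got {len(values)}")
    return values


def insert_statement(title, vector):
    """Build a parameterized INSERT and its arguments.

    Assumption: the client accepts `?` placeholders (qmark style).
    """
    sql = "INSERT INTO documents (title, embedding) VALUES (?, ?)"
    return sql, [title, check_dimensions(vector)]


def to_json_lines(rows):
    """Serialize (title, vector) pairs as one JSON object per line."""
    return "\n".join(
        json.dumps({"title": title, "embedding": check_dimensions(vector)})
        for title, vector in rows
    )


sql, params = insert_statement("AI and Databases", [0.12, 0.34, 0.01])
print(sql, params)
print(to_json_lines([("AI and Databases", [0.12, 0.34, 0.01])]))
```

The statement and arguments would typically be passed to a database cursor's `execute` call, while the JSON lines would be written to a file that COPY FROM can read.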

Querying Vectors with SQL

Use KNN_MATCH to perform similarity search:

```
SELECT title, content, _score
FROM documents
WHERE knn_match(embedding, [3.14, 5.1, 8.2], 2)
ORDER BY _score DESC;
```

This query searches for the 2 nearest neighbours of the supplied vector and orders the matches by _score, so the most similar rows come first.
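To make the semantics concrete, here is a small brute-force nearest-neighbour search in Python. It is not how CrateDB executes the query: the HNSW index is approximate and avoids scanning every row, whereas this sketch compares the query against all vectors. The scoring formula, 1 / (1 + squared Euclidean distance), is an assumption based on Lucene's Euclidean similarity; consult the CrateDB reference for the exact definition of _score. The sample rows are made up for illustration.

```python
def euclidean_similarity(a, b):
    """Score in (0, 1]; identical vectors score 1.0.

    Assumption: 1 / (1 + squared Euclidean distance).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + squared_distance)


def exact_knn(rows, query, k):
    """Return the k (title, score) pairs most similar to `query`, best first."""
    scored = [(title, euclidean_similarity(vec, query)) for title, vec in rows]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Made-up rows standing in for the documents table.
rows = [
    ("A", [3.0, 5.0, 8.0]),
    ("B", [0.0, 0.0, 0.0]),
    ("C", [3.14, 5.1, 8.2]),
]

top = exact_knn(rows, [3.14, 5.1, 8.2], 2)
for title, score in top:
    print(title, round(score, 4))
```

An exact scan like this costs time proportional to the number of rows per query, which is the cost an approximate index is designed to avoid on large tables.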
