Implement QBit data type for approximate vector search#87922

Merged: rienath merged 11 commits into master from qbit on Oct 2, 2025

rienath (Member) commented Sep 30, 2025

Changelog category (leave one):

  • Experimental Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Implement the QBit data type, which stores vectors in a bit-sliced format, and the L2DistanceTransposed function, which allows approximate vector search where the precision-speed trade-off is controlled by a parameter.

Closes: #77088

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

QBit type

The QBit data type provides efficient vector storage for approximate similarity search. Under the hood, QBit stores data as a tuple of FixedStrings. Each FixedString column stores one bit position from all vector elements. For example:

  • Column 1 stores the 1st bit from every element of every vector
  • Column 2 stores the 2nd bit from every element of every vector
  • And so on...

Within each column, each row contains the bits from one vector's elements at that bit position. This allows reading only the most significant bits for faster approximate calculations, or more bits for higher accuracy.
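
To make the layout concrete, here is a minimal Python sketch of bit slicing for a single Float32 vector. It is an illustration only, not the ClickHouse implementation: the helper names `float32_bits` and `to_bit_planes` are hypothetical, planes are kept as lists of 0/1 values instead of packed FixedString bytes, and it assumes that plane 1 holds the most significant bit.

```python
import struct


def float32_bits(x):
    """Return the IEEE 754 binary32 pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def to_bit_planes(vector, width=32):
    """Transpose a vector into `width` bit planes (hypothetical helper).

    planes[0] holds the most significant bit of every element,
    planes[width - 1] the least significant one.
    """
    patterns = [float32_bits(x) for x in vector]
    planes = []
    for bit in range(width):
        shift = width - 1 - bit
        planes.append([(p >> shift) & 1 for p in patterns])
    return planes


planes = to_bit_planes([0.0, 1.0, -2.0])
print(planes[0])  # sign bits of the three elements: [0, 0, 1]
```

Reading only the first few planes yields the sign and leading exponent bits of every element, which is why a prefix of the planes is enough for a coarse distance estimate.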

Key features:
  • Vectors are stored at full precision while allowing fine-grained quantization at search time
  • Choose how many bits to read during search – fewer bits for faster queries, more bits for higher accuracy

Creating QBit

column_name QBit(element_type, dimension)

Where:

  • element_type: BFloat16, Float32, or Float64
  • dimension: number of elements in each vector

Example:

CREATE TABLE test (id UInt32, vec QBit(Float64, 3)) ENGINE = Memory;
INSERT INTO test VALUES (1, [0, 1, 2]);
SELECT * FROM test;
┌─id─┬─vec─────┐
│  1 │ [0,1,2] │
└────┴─────────┘

QBit subcolumns

Access individual bit planes using .N syntax where N is the bit position:

  • BFloat16: 16 subcolumns (.1 to .16)
  • Float32: 32 subcolumns (.1 to .32)
  • Float64: 64 subcolumns (.1 to .64)

Vector search functions

L2DistanceTransposed(vector1, vector2, p) - Calculates approximate Euclidean distance between vectors

  • vector1: QBit vector
  • vector2: Array vector
  • p: Number of bits to use, from 1 to the bit width of the element type. It controls the precision-speed trade-off: data is read only up to precision p, which saves I/O.

Example:

-- Quantize Float64 to 16 bits
SELECT L2DistanceTransposed(vec, array(1.0, 2.0, 3.0), 16) FROM test;
┌─L2DistanceTr⋯., 3.], 16)─┐
│       1.7320508075688772 │
└──────────────────────────┘
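
The effect of p can be modelled in a few lines of Python. This sketch assumes that reading p bits is equivalent to keeping the p most significant bits of each stored Float64 and zeroing the rest. That is an assumption about the behaviour, not something taken from the implementation, and `truncate_float64` and `l2_distance_truncated` are hypothetical names.

```python
import math
import struct


def truncate_float64(x, p):
    """Keep the p most significant bits of x's binary64 pattern, zero the rest."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    mask = ((1 << p) - 1) << (64 - p)
    return struct.unpack(">d", struct.pack(">Q", bits & mask))[0]


def l2_distance_truncated(stored, query, p):
    """Euclidean distance between a truncated stored vector and a full-precision query."""
    approx = [truncate_float64(x, p) for x in stored]
    return math.sqrt(sum((a - q) ** 2 for a, q in zip(approx, query)))


print(l2_distance_truncated([0.0, 1.0, 2.0], [1.0, 2.0, 3.0], 16))  # 1.7320508075688772
```

With the vector [0, 1, 2] and 16 bits (sign, 11 exponent bits, 4 mantissa bits), all three values are still represented exactly, so the sketch reproduces the exact distance √3 shown in the query output above. A value such as 1.1 would be truncated to 1.0625 at the same setting.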

clickhouse-gh bot (Contributor) commented Sep 30, 2025

Workflow [PR], commit [f74b120]

@clickhouse-gh clickhouse-gh bot added the pr-experimental Experimental Feature label Sep 30, 2025
clickhouse-gh bot (Contributor) commented Sep 30, 2025

@ClickHouse/integrations team, please, take a look

@Avogar Avogar self-assigned this Sep 30, 2025
rienath (Member, Author) commented Oct 2, 2025

PR / Install packages (amd_debug) (pull_request)

  • Unrelated: Not enough space for clickhouse binary in /usr/bin, required 5.81 GiB, available 5.30 GiB. (NOT_ENOUGH_SPACE)

The performance test failures are due to QBit being unavailable on master.

@rienath rienath enabled auto-merge October 2, 2025 09:33
@rienath rienath added this pull request to the merge queue Oct 2, 2025
Merged via the queue into master with commit 045e795 Oct 2, 2025
12 of 22 checks passed
@rienath rienath deleted the qbit branch October 2, 2025 10:04
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Oct 2, 2025
@rienath rienath mentioned this pull request Oct 10, 2025
1 task
Labels

pr-experimental Experimental Feature pr-synced-to-cloud The PR is synced to the cloud repo

Development

Successfully merging this pull request may close these issues.

Bit-sliced format for vectors

3 participants