Intern task: Quantization #77018

Closed
nikita4109 wants to merge 36 commits into ClickHouse:master from nikita4109:quantization

@nikita4109 nikita4109 commented Mar 2, 2025

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Functions for quantization

  • Added new functions for packing and unpacking floating-point vector arrays into/from FixedStrings with various quantization levels (16-bit, 8-bit, 4-bit, and 1-bit)
  • Implemented optimized distance calculation functions (L2 distance, cosine similarity) that operate directly on quantized vectors
  • These functions significantly improve performance for vector embedding search operations while reducing memory usage

These additions enable more efficient storage and searching of vector embeddings, which are essential for semantic search, recommendation systems, and other machine learning applications.
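To illustrate the idea behind the 1-bit level, here is a minimal sketch, not the PR's implementation. It assumes that 1-bit quantization keeps only the sign of each component and that the distance on packed vectors is a Hamming distance; the names `quantizeSignBits` and `hammingDistance` are hypothetical and do not appear in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed rule: bit i is set when values[i] is strictly positive.
// Eight dimensions are packed into each output byte, lowest bit first.
std::vector<std::uint8_t> quantizeSignBits(const std::vector<float> & values)
{
    std::vector<std::uint8_t> packed((values.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < values.size(); ++i)
        if (values[i] > 0.0f)
            packed[i / 8] = static_cast<std::uint8_t>(packed[i / 8] | (1u << (i % 8)));
    return packed;
}

// Number of bit positions in which two packed vectors differ.
std::size_t hammingDistance(const std::vector<std::uint8_t> & a, const std::vector<std::uint8_t> & b)
{
    assert(a.size() == b.size());
    std::size_t distance = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        std::uint8_t diff = static_cast<std::uint8_t>(a[i] ^ b[i]);
        while (diff != 0)
        {
            diff = static_cast<std::uint8_t>(diff & (diff - 1)); // clear the lowest set bit
            ++distance;
        }
    }
    return distance;
}
```

For vectors whose components are all +1 or -1, the squared L2 distance equals four times this Hamming distance, which is why counting bits can stand in for a floating-point loop.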

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Performance Benchmark Results

Random vectors

| Quantization Level | L2 Distance (seconds) | Quantized L2 Distance (seconds) |
|---|---|---|
| 16-bit | 0.5948 | 0.5588 |
| 8-bit | 0.5823 | 0.3221 |
| 4-bit | 0.5866 | 0.1694 |
| 1-bit | 0.5836 | 0.0353 |

| Quantization Level | Cosine Distance (seconds) | Quantized Cosine Distance (seconds) |
|---|---|---|
| 16-bit | 0.9454 | 0.6906 |
| 8-bit | 0.9288 | 0.5229 |
| 4-bit | 0.9437 | 0.2794 |
| 1-bit | 0.9239 | 0.0576 |

Hacker News comments

| Quantization Level | L2 Distance (seconds) | Quantized L2 Distance (seconds) |
|---|---|---|
| 16-bit | 26.6642 | 20.4774 |
| 8-bit | 26.8680 | 6.8990 |
| 4-bit | 20.1942 | 4.1797 |
| 1-bit | 29.3374 | 1.0389 |

| Quantization Level | Cosine Distance (seconds) | Quantized Cosine Distance (seconds) |
|---|---|---|
| 16-bit | 32.4704 | 24.6458 |
| 8-bit | 38.5771 | 10.2922 |
| 4-bit | 33.7280 | 6.0128 |
| 1-bit | 32.7070 | 1.4331 |

Quality Comparison

Experimental Setup

  • Input Data: 500 random vectors with 2048 dimensions
  • Query Set: First 10 vectors used as query vectors
  • Evaluation Metric: Precision and recall based on top-10 nearest neighbors
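As a rough sketch of the evaluation metric, recall for one query can be computed by counting how many of the exact top-k neighbours also appear in the top-k returned on quantized vectors. The function name `recallAtK` and the use of integer row ids are assumptions for illustration; the PR does not show its evaluation code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Fraction of the exact top-k neighbour ids that also appear in the approximate top-k list.
double recallAtK(const std::vector<std::size_t> & exact_ids, const std::vector<std::size_t> & approx_ids)
{
    if (exact_ids.empty())
        return 0.0;

    std::size_t matches = 0;
    for (std::size_t id : exact_ids)
        if (std::find(approx_ids.begin(), approx_ids.end(), id) != approx_ids.end())
            ++matches;

    return static_cast<double>(matches) / static_cast<double>(exact_ids.size());
}
```

Precision divides the same match count by the size of the approximate list instead of the exact one.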

Methods Tested

  1. Quantization Methods:

    • 16-bit quantization (half-precision float)
    • 8-bit quantization (two formats: SFP and Minifloat)
    • 4-bit quantization
    • 1-bit quantization (binary)
  2. Distance Metrics:

    • L2 (Euclidean) distance
    • Cosine distance

| Quantization Method | Distance Metric | Precision | Recall | Avg. Matches (of 10) | Storage Reduction |
|---|---|---|---|---|---|
| 16-bit | L2 | 99.11% | 99.11% | 9.82 | 50% |
| SFP 8-bit | L2 | 7.64% | 7.34% | 0.56 | 75% |
| Minifloat 8-bit | L2 | 4.07% | 4.06% | 0.38 | 75% |
| 4-bit | L2 | 1.98% | 1.98% | 0.20 | 87.5% |
| 1-bit | L2 | 1.98% | 1.98% | 0.20 | 97.5% |
| 16-bit | Cosine | 99.11% | 99.11% | 9.82 | 50% |
| SFP 8-bit | Cosine | 92.33% | 92.33% | 9.14 | 75% |
| Minifloat 8-bit | Cosine | 52.95% | 52.99% | 5.23 | 75% |
| 4-bit | Cosine | 8.89% | 8.99% | 0.88 | 87.5% |
| 1-bit | Cosine | 1.98% | 1.98% | 0.20 | 97.5% |
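
The storage reduction column can be reproduced with simple arithmetic. The sketch below assumes the unquantized vectors are stored as 32-bit floats and that a quantized value occupies exactly the stated number of bits with no per-vector metadata; `storageReductionPercent` is a hypothetical helper, not part of the PR.

```cpp
#include <cassert>

// Percentage of storage saved when each value shrinks from baseline_bits to bits_per_value.
// Assumption: the baseline is a 32-bit float unless stated otherwise.
double storageReductionPercent(unsigned bits_per_value, unsigned baseline_bits = 32)
{
    assert(bits_per_value > 0 && bits_per_value <= baseline_bits);
    return 100.0 * (1.0 - static_cast<double>(bits_per_value) / baseline_bits);
}
```

Under these assumptions the 16-, 8- and 4-bit levels come out at 50%, 75% and 87.5%, and 1 bit per value comes out at 96.875%, slightly below the 97.5% listed for the 1-bit rows.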

@thevar1able added the "can be tested" label (Allows running workflows for external contributors) on Mar 2, 2025
Contributor

clickhouse-gh bot commented Mar 2, 2025

Workflow [PR], commit [126c869]

@clickhouse-gh bot added the "pr-feature" label (Pull request with new product feature) on Mar 2, 2025
@alexey-milovidov
Member

Thanks! Please take a look at the bugs found by the fuzzers.
Can we also add a quality comparison?

@rschu1ze mentioned this pull request on Mar 5, 2025
@rschu1ze changed the title from "Quantization" to "Intern task: Quantization" on Mar 22, 2025
Contributor

clickhouse-gh bot commented Sep 11, 2025

Workflow [PR], commit [174dcc9]

Summary:
15 failures out of 106 shown:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Fast test | | failure | | |
| | 02415_all_new_functions_must_be_documented | FAIL | | |
| Build (amd_debug) | | dropped | | |
| Build (amd_release) | | dropped | | |
| Build (amd_asan) | | dropped | | |
| Build (amd_tsan) | | dropped | | |
| Build (amd_msan) | | dropped | | |
| Build (amd_ubsan) | | dropped | | |
| Build (amd_binary) | | dropped | | |
| Build (arm_release) | | dropped | | |
| Build (arm_asan) | | dropped | | |
| Build (arm_coverage) | | dropped | | |
| Build (arm_binary) | | dropped | | |
| Build (amd_darwin) | | dropped | | |
| Build (arm_darwin) | | dropped | | |
| Build (arm_v80compat) | | dropped | | |

Comment on lines +3 to +11
#include <Columns/ColumnConst.h>
#include <Columns/ColumnFixedString.h>
#include <DataTypes/DataTypeFixedString.h>
#include <DataTypes/DataTypeNullable.h>
#include <DataTypes/DataTypesNumber.h>
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionHelpers.h>
#include <Functions/IFunction.h>
#include <base/types.h>
Member

Not all includes are actually used. Removing the unnecessary ones (in all committed .cpp/.h files) would improve readability and build times and prevent accidental transitive includes.

Comment on lines +399 to +410
REGISTER_FUNCTION(Quantize16Bit)
{
    FunctionDocumentation::Description description = " ";
    FunctionDocumentation::Syntax syntax = " ";
    FunctionDocumentation::Arguments argument = {{" ", " "}};
    FunctionDocumentation::ReturnedValue returned_value = {" "};
    FunctionDocumentation::Examples examples = {{" ", " ", " "}};
    FunctionDocumentation::IntroducedIn introduced_in = {25, 10};
    FunctionDocumentation::Category categories = FunctionDocumentation::Category::Unknown;
    FunctionDocumentation documentation = {description, syntax, argument, returned_value, examples, introduced_in, categories};
    factory.registerFunction<FunctionQuantize16Bit>(documentation);
}
Member

I added documentation templates in all files where needed. Please fill them in so that users can understand how to use this feature.

#include <cstdint>


namespace DB
Member

It is difficult to understand what is happening without any context. To make it easier, add a comment at the top of each .h file explaining what the file is for. Refer to any of these for examples:


String getName() const override { return name; }
size_t getNumberOfArguments() const override { return 1; }
bool isInjective(const ColumnsWithTypeAndName &) const override { return false; }
Member

It's false by default, so we don't need to override it.
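
The point can be illustrated with a small stand-alone sketch. `FunctionBase` and `QuantizeSketch` are hypothetical stand-ins, not ClickHouse's actual classes; the sketch only assumes, as the comment states, that the base interface already returns false for `isInjective`.

```cpp
#include <string>

// Hypothetical stand-in for a function interface; not ClickHouse's IFunction.
struct FunctionBase
{
    virtual ~FunctionBase() = default;
    virtual std::string getName() const = 0;
    // Default used by every function that does not say otherwise.
    virtual bool isInjective() const { return false; }
};

struct QuantizeSketch : FunctionBase
{
    std::string getName() const override { return "quantizeSketch"; }
    // No isInjective() override here: the inherited default already returns false.
};
```

Dropping the redundant override leaves behaviour unchanged and keeps the class focused on what it actually customises.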

<fill_query>

ALTER TABLE test.vectors
UPDATE vector_quantized = quantize8Bit(vector, 2048)
Member

This function does not exist

Comment on lines +68 to +75
ADD COLUMN vector_quantized FixedString(4096);

</fill_query>

<fill_query>

ALTER TABLE test.vectors
UPDATE vector_quantized = quantize16Bit(vector, 2048)
Member

Why do we need FixedString(4096) if we then use only 2048 bytes?
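
The arithmetic behind this question can be sketched as follows. The helper assumes the second argument of the quantize functions is the number of dimensions and that each value is stored in exactly the stated number of bits with no header or padding beyond rounding up to whole bytes; neither assumption is confirmed by the PR, and `packedSizeBytes` is a hypothetical name.

```cpp
#include <cstddef>

// Bytes needed to store `dimensions` values at `bits_per_value` bits each,
// rounded up to a whole number of bytes.
constexpr std::size_t packedSizeBytes(std::size_t dimensions, std::size_t bits_per_value)
{
    return (dimensions * bits_per_value + 7) / 8;
}
```

Under these assumptions, 2048 dimensions need 4096 bytes at 16 bits and 2048 bytes at 8 bits, so a FixedString(4096) column is fully used only by the 16-bit variant.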

return std::make_shared<DataTypeFloat32>();
}

ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const override
Member

@rienath, Sep 12, 2025

:) CREATE TABLE sfp8 (`id` String, `quantized` FixedString(384)) ENGINE = MergeTree ORDER BY id;
:) INSERT INTO sfp8 SELECT id, quantizeSFP8Bit(vector, 384) FROM hackernews;

Code: 49. DB::Exception: Block structure mismatch in function connect between ApplySquashingTransform and ConvertingTransform stream: different columns:
quantized FixedString(384) FixedString(size = 0)
quantized FixedString(384) FixedString(size = 0). (LOGICAL_ERROR)

But with an extra byte it works

:) CREATE TABLE sfp8 (`id` String, `quantized` FixedString(385)) ENGINE = MergeTree ORDER BY id;

Same story with quantizeMini8Bit

Member

You can use the small version of hackernews so that you don't have to download the huge one

@rienath self-assigned this on Nov 24, 2025
Member

rienath commented Nov 24, 2025

Closing for now due to missing documentation and a bug, but hope someone can build on this work later. @nikita4109 maybe you will finish it one day :)

Labels

can be tested (Allows running workflows for external contributors), pr-feature (Pull request with new product feature), unfinished code