
Storages: Optimize vector search in scenarios with updates#9597

Merged
ti-chi-bot[bot] merged 6 commits into pingcap:master from Lloyd-Pottiger:op-vs
Nov 13, 2024

Conversation

@Lloyd-Pottiger (Contributor) commented Nov 8, 2024

What problem does this PR solve?

Issue Number: ref #9600 , close #9599

Problem Summary:

What is changed and how it works?

Before this PR, the read path of vector search was:

  1. load stable vector index.
  2. read all packs that contain TopK results.
  3. load all ColumnFileTiny vector index.
  4. read all ColumnFileTiny that contain TopK results.
  5. read Memtable.

A simple example:

Suppose we have cf_0 (indexed, offsets 100-110), cf_1 (indexed, offsets 110-120), cf_3 (indexed, offsets 120-130), cf_4 (memtable, offsets 130-140), and dmfile_0 (indexed, offsets 0-100, with every 10 rows forming a pack). To find top5:

  1. load stable vector index: top5 is [0, 10, 13, 25, 99]
  2. read all packs that contain TopK results: read pack_0, pack_1, pack_2, pack_9
  3. load all ColumnFileTiny vector index: top5 is [100, 101, 115, 116, 119]
  4. read all ColumnFileTiny that contain TopK results: read cf_0, cf_1
  5. read Memtable: read cf_4.

So, in this example, we need to read 4 packs and 3 ColumnFileTinys.

This PR changes the read path to:

  1. load stable vector index and all ColumnFileTiny vector index.
  2. read all packs that contain TopK results.
  3. read all ColumnFileTiny that contain TopK results.
  4. read Memtable.

Still using the above example, now to find top5:

  1. load stable vector index and all ColumnFileTiny vector index: top5 is [0, 10, 13, 100, 101]
  2. read all packs that contain TopK results: read pack_0, pack_1
  3. read all ColumnFileTiny that contain TopK results: read cf_0
  4. read Memtable: read cf_4

Now, we only need to read 2 packs and 2 ColumnFileTinys.

Benchmark

Run VectorDBBench with about 750k rows in the stable layer, 250k rows in the delta layer with vector index, and 4k rows in the delta layer without vector index.

Note: compacting the tiflash replica should be disabled, as in the following patch to VectorDBBench:

diff --git a/vectordb_bench/backend/clients/tidb_serverless/tidb.py b/vectordb_bench/backend/clients/tidb_serverless/tidb.py
index b927568..fe9d5e3 100644
--- a/vectordb_bench/backend/clients/tidb_serverless/tidb.py
+++ b/vectordb_bench/backend/clients/tidb_serverless/tidb.py
@@ -104,9 +104,9 @@ class TiDBServeless(VectorDB):
             else:
                 break
 
-        log.info("Begin compact tiflash replica")
-        self._compact_tiflash()
-        log.info("Successful compacted tiflash replica")
+        # log.info("Begin compact tiflash replica")
+        # self._compact_tiflash()
+        # log.info("Successful compacted tiflash replica")
 
         log_reduce_seq = 0
         while True:

Without this PR (after the fix in #9599):

"metrics": {
    "max_load_count": 0,
    "load_duration": 747.9345,
    "qps": 71.4146,
    "serial_latency_p99": 0.1081,
    "recall": 0.9327
}

With this PR

"metrics": {
    "max_load_count": 0,
    "load_duration": 778.9001,
    "qps": 125.046,
    "serial_latency_p99": 0.0851,
    "recall": 0.931
}

For comparison, with all data in the stable layer:

"metrics": {
    "max_load_count": 0,
    "load_duration": 0.0,
    "qps": 205.3793,
    "serial_latency_p99": 0.0282,
    "recall": 0.9016
}
This improves the QPS of vector search in scenarios with updates by about 75% (71.4 → 125.0).

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 8, 2024
@JaySon-Huang (Contributor) commented:

Can you briefly describe what changes you've done in this PR?

@Lloyd-Pottiger Lloyd-Pottiger marked this pull request as draft November 11, 2024 03:03
@ti-chi-bot ti-chi-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 11, 2024
@Lloyd-Pottiger Lloyd-Pottiger changed the title Storages: refactor vector search Storages: Optimize vector search in update scenario Nov 11, 2024
@Lloyd-Pottiger Lloyd-Pottiger marked this pull request as ready for review November 11, 2024 04:02
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 11, 2024
@Lloyd-Pottiger Lloyd-Pottiger changed the title Storages: Optimize vector search in update scenario Storages: Optimize vector search in scenarios with updates Nov 11, 2024
@Lloyd-Pottiger (Contributor, Author) commented:

Can you briefly describe what changes you've done in this PR?

done

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>

fix

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>

fix

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
Comment on lines +41 to +42
// Each ColumnFileTiny has its own vector index; rowid_start_offset is the offset of this ColumnFile within the ColumnFilePersistSet.
vec_index->get(rowid - rowid_start_offset, value);
@JaySon-Huang (Contributor) commented Nov 11, 2024:

Note: these lines fix the recall issue: #9599

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
@CalvinNeo (Member) commented Nov 12, 2024:


It seems that what changed is the order of these operations (the yellow and green lines swapped their order). Could you explain more about why changing the order can improve performance? I guess you can somehow reduce the vector index loading from either the ColumnFileTiny or Stable layer, but I am not so clear.

@Lloyd-Pottiger (Contributor, Author) commented:

It seems that what changed is the order of these operations (the yellow and green lines swapped their order). Could you explain more about why changing the order can improve performance? I guess you can somehow reduce the vector index loading from either the ColumnFileTiny or Stable layer, but I am not so clear.

Add an example in PR description.

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
@Lloyd-Pottiger (Contributor, Author) commented:

Flamegraph: tiflash_127.speedscope.json

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Nov 13, 2024
@ti-chi-bot (bot) commented Nov 13, 2024:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: breezewish, JinheLin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: needs approval from an approver in each of these files:
  • OWNERS [JinheLin,breezewish]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Nov 13, 2024
@ti-chi-bot (bot) commented Nov 13, 2024:

[LGTM Timeline notifier]

Timeline:

  • 2024-11-13 06:21:12.963395754 +0000 UTC m=+423635.154264753: ☑️ agreed by JinheLin.
  • 2024-11-13 08:20:06.557816524 +0000 UTC m=+430768.748685523: ☑️ agreed by breezewish.

@Lloyd-Pottiger (Contributor, Author) commented:

/hold

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 13, 2024
Lloyd-Pottiger and others added 2 commits November 13, 2024 16:22
Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
@Lloyd-Pottiger (Contributor, Author) commented:

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 13, 2024
@ti-chi-bot ti-chi-bot bot merged commit 9dc8374 into pingcap:master Nov 13, 2024
@Lloyd-Pottiger Lloyd-Pottiger deleted the op-vs branch November 13, 2024 09:29
@Lloyd-Pottiger (Contributor, Author) commented:

/cherry-pick release-8.5

@ti-chi-bot (Member) commented:

@Lloyd-Pottiger: new pull request created to branch release-8.5: #9609.

Details

In response to this:

/cherry-pick release-8.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Comment on lines +3524 to +3529
auto * vector_index_stream = dynamic_cast<ConcatVectorIndexBlockInputStream *>(skippable_stream.get());
auto stream = std::make_shared<BitmapFilterBlockInputStream>(columns_to_read, skippable_stream, bitmap_filter);
if (vector_index_stream)
{
// Squash blocks to reduce the number of blocks.
return std::make_shared<SquashingBlockInputStream>(
Contributor review comment:

Suggested change
auto * vector_index_stream = dynamic_cast<ConcatVectorIndexBlockInputStream *>(skippable_stream.get());
auto stream = std::make_shared<BitmapFilterBlockInputStream>(columns_to_read, skippable_stream, bitmap_filter);
if (vector_index_stream)
{
// Squash blocks to reduce the number of blocks.
return std::make_shared<SquashingBlockInputStream>(
auto stream = std::make_shared<BitmapFilterBlockInputStream>(columns_to_read, skippable_stream, bitmap_filter);
if (auto * vector_index_stream = dynamic_cast<ConcatVectorIndexBlockInputStream *>(skippable_stream.get()); vector_index_stream)
{
// For vector search, there are more likely to return small blocks from different sub-streams. Squash blocks to reduce the number of blocks thus improve the performance of upper layer.
return std::make_shared<SquashingBlockInputStream>(

Comment on lines +213 to +222
auto select_size = search_results.size() > topk ? topk : search_results.size();
auto top_k_end = search_results.begin() + select_size;
std::nth_element(search_results.begin(), top_k_end, search_results.end(), [](const auto & lhs, const auto & rhs) {
return lhs.distance < rhs.distance;
});
search_results.resize(select_size);
std::vector<UInt32> selected_rows;
selected_rows.reserve(search_results.size());
for (const auto & row : search_results)
selected_rows.push_back(row.key);
Contributor review comment:

Suggested change
auto select_size = search_results.size() > topk ? topk : search_results.size();
auto top_k_end = search_results.begin() + select_size;
std::nth_element(search_results.begin(), top_k_end, search_results.end(), [](const auto & lhs, const auto & rhs) {
return lhs.distance < rhs.distance;
});
search_results.resize(select_size);
std::vector<UInt32> selected_rows;
selected_rows.reserve(search_results.size());
for (const auto & row : search_results)
selected_rows.push_back(row.key);
const auto select_size = std::min(search_results.size(), topk);
auto top_k_end = search_results.begin() + select_size;
std::nth_element(search_results.begin(), top_k_end, search_results.end(), [](const auto & lhs, const auto & rhs) {
return lhs.distance < rhs.distance;
});
std::vector<UInt32> selected_rows(select_size);
for (size_t i = 0; i < select_size; ++i)
selected_rows[i] = search_results[i].key;

This avoids re-allocating the vector as it grows.

Contributor review comment:

Better to add a check for select_size != 0, because we cannot ensure that the length of index_stream->load() is always larger than 0 in this class.


UInt32 precedes_rows = 0;
std::vector<VectorIndexViewer::SearchResult> search_results;
for (size_t i = 0; i < stream->children.size(); ++i)
Contributor review comment:

Suggested change
for (size_t i = 0; i < stream->children.size(); ++i)
assert(stream->children.size() == index_streams.size()); // otherwise the `row.key` of the search result is not correct
for (size_t i = 0; i < stream->children.size(); ++i)

JaySon-Huang added a commit to JaySon-Huang/tiflash that referenced this pull request Nov 14, 2024
ti-chi-bot bot pushed a commit that referenced this pull request Nov 14, 2024
…9609)

close #9599, ref #9600

Improves the performance of vector search in scenarios with updates by about 75%.

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>

Co-authored-by: Lloyd-Pottiger <yan1579196623@gmail.com>
Co-authored-by: JaySon <tshent@qq.com>

Labels

approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.


Development

Successfully merging this pull request may close these issues.

Vector search lower recall when there is multiple ColumnFile in delta layer with vector index

6 participants