
Storages: Optimize vector search in scenarios with updates#9597

Merged
ti-chi-bot[bot] merged 6 commits into pingcap:master from Lloyd-Pottiger:op-vs
Nov 13, 2024

Conversation

@Lloyd-Pottiger (Contributor) commented Nov 8, 2024

What problem does this PR solve?

Issue Number: ref #9600 , close #9599

Problem Summary:

What is changed and how it works?

Before this PR, the read path of vector search was:

  1. load stable vector index.
  2. read all packs that contain TopK results.
  3. load all ColumnFileTiny vector index.
  4. read all ColumnFileTiny that contain TopK results.
  5. read Memtable.

A simple example:

Suppose we have cf_0 (indexed, offsets 100-110), cf_1 (indexed, offsets 110-120), cf_3 (indexed, offsets 120-130), cf_4 (memtable, offsets 130-140), and dmfile_0 (indexed, offsets 0-100, with every 10 rows forming a pack). To find top5:

  1. load stable vector index: top5 is [0, 10, 13, 25, 99]
  2. read all packs that contain TopK results: read pack_0, pack_1, pack_2, pack_9
  3. load all ColumnFileTiny vector index: top5 is [100, 101, 115, 116, 119]
  4. read all ColumnFileTiny that contain TopK results: read cf_0, cf_1
  5. read Memtable: read cf_4.

So, in this example, we need to read 4 packs and 3 ColumnFileTinys.

This PR changes the read path to:

  1. load stable vector index and all ColumnFileTiny vector index.
  2. read all packs that contain TopK results.
  3. read all ColumnFileTiny that contain TopK results.
  4. read Memtable.

Still using the above example, now to find top5:

  1. load stable vector index and all ColumnFileTiny vector index: top5 is [0, 10, 13, 100, 101]
  2. read all packs that contain TopK results: read pack_0, pack_1
  3. read all ColumnFileTiny that contain TopK results: read cf_0
  4. read Memtable: read cf_4

Now, we only need to read 2 packs and 2 ColumnFileTinys.

Benchmark

Run VectorDBBench with about 750k rows in the stable layer, 250k rows in the delta layer with vector index, and 4k rows in the delta layer without vector index.

Note: compacting the tiflash replica should be disabled, as in the following patch to VectorDBBench:

diff --git a/vectordb_bench/backend/clients/tidb_serverless/tidb.py b/vectordb_bench/backend/clients/tidb_serverless/tidb.py
index b927568..fe9d5e3 100644
--- a/vectordb_bench/backend/clients/tidb_serverless/tidb.py
+++ b/vectordb_bench/backend/clients/tidb_serverless/tidb.py
@@ -104,9 +104,9 @@ class TiDBServeless(VectorDB):
             else:
                 break
 
-        log.info("Begin compact tiflash replica")
-        self._compact_tiflash()
-        log.info("Successful compacted tiflash replica")
+        # log.info("Begin compact tiflash replica")
+        # self._compact_tiflash()
+        # log.info("Successful compacted tiflash replica")
 
         log_reduce_seq = 0
         while True:

Without this PR (after the fix in #9599):

"metrics": {
    "max_load_count": 0,
    "load_duration": 747.9345,
    "qps": 71.4146,
    "serial_latency_p99": 0.1081,
    "recall": 0.9327
}

With this PR

"metrics": {
    "max_load_count": 0,
    "load_duration": 778.9001,
    "qps": 125.046,
    "serial_latency_p99": 0.0851,
    "recall": 0.931
}

For comparison, with all data in the stable layer:

"metrics": {
    "max_load_count": 0,
    "load_duration": 0.0,
    "qps": 205.3793,
    "serial_latency_p99": 0.0282,
    "recall": 0.9016
}
This improves the QPS of vector search in scenarios with updates by about 75% (71.4 → 125.0).

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 8, 2024
@JaySon-Huang (Contributor) commented:

Can you briefly describe what changes you've done in this PR?

@Lloyd-Pottiger Lloyd-Pottiger marked this pull request as draft November 11, 2024 03:03
@ti-chi-bot ti-chi-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 11, 2024
@Lloyd-Pottiger Lloyd-Pottiger changed the title Storages: refactor vector search Storages: Optimize vector search in update scenario Nov 11, 2024
@Lloyd-Pottiger Lloyd-Pottiger marked this pull request as ready for review November 11, 2024 04:02
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 11, 2024
@Lloyd-Pottiger Lloyd-Pottiger changed the title Storages: Optimize vector search in update scenario Storages: Optimize vector search in scenarios with updates Nov 11, 2024
@Lloyd-Pottiger (Contributor, Author) commented:

Can you briefly describe what changes you've done in this PR?

done

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>

fix

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>

fix

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
Comment on lines +41 to +42
// Each ColumnFileTiny has its own vector index; rowid_start_offset is the offset of this ColumnFile within the ColumnFilePersistSet.
vec_index->get(rowid - rowid_start_offset, value);
@JaySon-Huang (Contributor) commented Nov 11, 2024:

Note: these lines fix the recall issue: #9599

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
@CalvinNeo (Member) commented Nov 12, 2024:


It seems that what changed is the order of these operations (the yellow and green lines swapped their order). Could you explain more about why changing the order can improve performance? I guess you can somehow reduce the vector index loading from either the ColumnFileTiny or Stable layer, but I am not so clear.

@Lloyd-Pottiger (Contributor, Author) commented:

It seems that what changed is the order of these operations (the yellow and green lines swapped their order). Could you explain more about why changing the order can improve performance? I guess you can somehow reduce the vector index loading from either the ColumnFileTiny or Stable layer, but I am not so clear.

Add an example in PR description.

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
@Lloyd-Pottiger (Contributor, Author) commented:

Flamegraph: tiflash_127.speedscope.json

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Nov 13, 2024
@ti-chi-bot (bot) commented Nov 13, 2024:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: breezewish, JinheLin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: needs approval from an approver in each of these files:
  • OWNERS [JinheLin,breezewish]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Nov 13, 2024
@ti-chi-bot (bot) commented Nov 13, 2024:

[LGTM Timeline notifier]

Timeline:

  • 2024-11-13 06:21:12.963395754 +0000 UTC m=+423635.154264753: ☑️ agreed by JinheLin.
  • 2024-11-13 08:20:06.557816524 +0000 UTC m=+430768.748685523: ☑️ agreed by breezewish.

@Lloyd-Pottiger (Contributor, Author) commented:

/hold

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 13, 2024
Lloyd-Pottiger and others added 2 commits November 13, 2024 16:22
Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
@Lloyd-Pottiger (Contributor, Author) commented:

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 13, 2024
@ti-chi-bot ti-chi-bot bot merged commit 9dc8374 into pingcap:master Nov 13, 2024
@Lloyd-Pottiger Lloyd-Pottiger deleted the op-vs branch November 13, 2024 09:29
@Lloyd-Pottiger (Contributor, Author) commented:

/cherry-pick release-8.5

@ti-chi-bot (Member) commented:

@Lloyd-Pottiger: new pull request created to branch release-8.5: #9609.

Details

In response to this:

/cherry-pick release-8.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Comment on lines +3524 to +3529
auto * vector_index_stream = dynamic_cast<ConcatVectorIndexBlockInputStream *>(skippable_stream.get());
auto stream = std::make_shared<BitmapFilterBlockInputStream>(columns_to_read, skippable_stream, bitmap_filter);
if (vector_index_stream)
{
// Squash blocks to reduce the number of blocks.
return std::make_shared<SquashingBlockInputStream>(
Contributor review comment:

Suggested change
auto * vector_index_stream = dynamic_cast<ConcatVectorIndexBlockInputStream *>(skippable_stream.get());
auto stream = std::make_shared<BitmapFilterBlockInputStream>(columns_to_read, skippable_stream, bitmap_filter);
if (vector_index_stream)
{
// Squash blocks to reduce the number of blocks.
return std::make_shared<SquashingBlockInputStream>(
auto stream = std::make_shared<BitmapFilterBlockInputStream>(columns_to_read, skippable_stream, bitmap_filter);
if (auto * vector_index_stream = dynamic_cast<ConcatVectorIndexBlockInputStream *>(skippable_stream.get()); vector_index_stream)
{
// For vector search, there are more likely to return small blocks from different sub-streams. Squash blocks to reduce the number of blocks thus improve the performance of upper layer.
return std::make_shared<SquashingBlockInputStream>(

Comment on lines +213 to +222
auto select_size = search_results.size() > topk ? topk : search_results.size();
auto top_k_end = search_results.begin() + select_size;
std::nth_element(search_results.begin(), top_k_end, search_results.end(), [](const auto & lhs, const auto & rhs) {
return lhs.distance < rhs.distance;
});
search_results.resize(select_size);
std::vector<UInt32> selected_rows;
selected_rows.reserve(search_results.size());
for (const auto & row : search_results)
selected_rows.push_back(row.key);
Contributor review comment:

Suggested change
auto select_size = search_results.size() > topk ? topk : search_results.size();
auto top_k_end = search_results.begin() + select_size;
std::nth_element(search_results.begin(), top_k_end, search_results.end(), [](const auto & lhs, const auto & rhs) {
return lhs.distance < rhs.distance;
});
search_results.resize(select_size);
std::vector<UInt32> selected_rows;
selected_rows.reserve(search_results.size());
for (const auto & row : search_results)
selected_rows.push_back(row.key);
const auto select_size = std::min(search_results.size(), topk);
auto top_k_end = search_results.begin() + select_size;
std::nth_element(search_results.begin(), top_k_end, search_results.end(), [](const auto & lhs, const auto & rhs) {
return lhs.distance < rhs.distance;
});
std::vector<UInt32> selected_rows(select_size);
for (size_t i = 0; i < select_size; ++i)
selected_rows[i] = search_results[i].key;

This avoids re-allocating the vector as it grows.

Contributor review comment:

Better to add a check for select_size != 0, because we cannot ensure that the length of index_stream->load() is always larger than 0 in this class.


UInt32 precedes_rows = 0;
std::vector<VectorIndexViewer::SearchResult> search_results;
for (size_t i = 0; i < stream->children.size(); ++i)
Contributor review comment:

Suggested change
for (size_t i = 0; i < stream->children.size(); ++i)
assert(stream->children.size() == index_streams.size()); // otherwise the `row.key` of the search result is not correct
for (size_t i = 0; i < stream->children.size(); ++i)

JaySon-Huang added a commit to JaySon-Huang/tiflash that referenced this pull request Nov 14, 2024
ti-chi-bot bot pushed a commit that referenced this pull request Nov 14, 2024
…9609)

close #9599, ref #9600

Improves the performance of vector search in scenarios with updates by about 75%.

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>

Co-authored-by: Lloyd-Pottiger <yan1579196623@gmail.com>
Co-authored-by: JaySon <tshent@qq.com>

Labels

approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.


Development

Successfully merging this pull request may close these issues.

Vector search lower recall when there is multiple ColumnFile in delta layer with vector index

6 participants