*: support vector search#9486

Merged: ti-chi-bot[bot] merged 30 commits into master from feature/vector-index on Sep 30, 2024

Lloyd-Pottiger (Contributor) commented Sep 27, 2024

What problem does this PR solve?

Issue Number: close #9032

Problem Summary:

What is changed and how it works?

*: support vector search

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Lloyd-Pottiger and others added 21 commits July 31, 2024 15:08
ref #9032

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>

Co-authored-by: JaySon-Huang <tshent@qq.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref #9032

Co-authored-by: JaySon-Huang <tshent@qq.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref #9032

storage: Use mmap to view vector index

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>

Co-authored-by: JaySon-Huang <tshent@qq.com>
ref #9032

storage: Add vector search metrics

Signed-off-by: Wish <breezewish@outlook.com>

Co-authored-by: Wenxuan <breezewish@outlook.com>
ref #9032

*: use SimSIMD for vectors

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
ref #9032

storage: Add system.dt_local_indexes

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
ref #9032

DMFile: Support modify DMFile meta

---------

Signed-off-by: Wish <breezewish@outlook.com>
Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
Co-authored-by: Wenxuan <breezewish@outlook.com>
Co-authored-by: JaySon <tshent@qq.com>
ref #9032

storage: Force evict when downloading vector index files

Signed-off-by: Wish <breezewish@outlook.com>
Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>

Co-authored-by: Wenxuan <breezewish@outlook.com>
ref #9032

storage: add local indexer scheduler

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref #9032

storage: Support adding vector index in background

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
close #9032

storage: Abort vector index building as soon as possible

Signed-off-by: Wish <breezewish@outlook.com>
Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>

Co-authored-by: Wenxuan <breezewish@outlook.com>
Co-authored-by: Lloyd-Pottiger <60744015+Lloyd-Pottiger@users.noreply.github.com>
ref #9032

ddl: Support parsing VectorIndex defined in IndexInfo

Co-authored-by: JaySon <tshent@qq.com>
ref #9032

storage: support the HTTP API of sync table schema

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>

Co-authored-by: Lynn <zimu_xia@126.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref #9032

storage: cache PK column in memory

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…yncTableSchema (#9451)

ref #9032

*: support vector index and adding/dropping vector index when doing syncTableSchema

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>

Co-authored-by: JaySon <tshent@qq.com>
ref #9032

ddl: Adapt with the latest vector index def
ref #9032

Storage: Support multiple vec indexes on the same column

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>

Co-authored-by: JaySon <tshent@qq.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…is dropped (#9475)

### What problem does this PR solve?

Issue Number: ref #9032

Problem Summary:

### What is changed and how it works?

Pick tidbcloud/tiflash-cse#283 and
tidbcloud/tiflash-cse#300

* Unify the logic of `generateLocalIndexInfos` and `initLocalIndexInfos`
* Print a single log line per table that summarizes the vector indexes
added, dropped, or already existing. This avoids flooding the log when
TiFlash restarts with many tables that have vector indexes defined
* Support dropping a vector index defined on ColumnInfo after the column
has been dropped in TiDB
* Add more unit tests at the DeltaMergeStore read level
* Make vector search fall back when top_k = max uint32
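
The last bullet can be read as a guard on the read path. The sketch below is only an illustration under stated assumptions: `ReadPath` and `chooseReadPath` are hypothetical names, not TiFlash's actual API, and it assumes that "fallback" means skipping the vector index and reading without it when `top_k` carries the maximum `uint32` value.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical names for illustration only; not taken from the TiFlash code base.
enum class ReadPath
{
    VectorIndex, // answer the query through the vector index
    FullScan, // read without the vector index
};

// Assumption: top_k == max uint32 is treated as "no usable limit",
// so the index lookup is skipped and the read falls back to a scan.
ReadPath chooseReadPath(uint32_t top_k, bool index_available)
{
    constexpr uint32_t no_limit = std::numeric_limits<uint32_t>::max();
    if (!index_available || top_k == no_limit)
        return ReadPath::FullScan;
    return ReadPath::VectorIndex;
}
```

Keeping the decision in one small function like this makes the fallback condition easy to cover with a unit test.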

```commit-message

```

### Check List

Tests <!-- At least one of them must be included. -->

- [ ] Unit test
- [ ] Integration test
- [ ] Manual test (add detailed scripts or steps below)
- [ ] No code

Side effects

- [ ] Performance regression: Consumes more CPU
- [ ] Performance regression: Consumes more Memory
- [ ] Breaking backward compatibility

Documentation

- [ ] Affects user behaviors
- [ ] Contains syntax changes
- [ ] Contains variable changes
- [ ] Contains experimental features
- [ ] Changes MySQL compatibility

### Release note

<!-- bugfix or new feature needs a release note -->

```release-note
None
```

---------

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
Co-authored-by: JaySon <tshent@qq.com>
Co-authored-by: jinhelin <linjinhe33@gmail.com>
@Lloyd-Pottiger Lloyd-Pottiger marked this pull request as draft September 27, 2024 08:59
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Sep 27, 2024
ti-chi-bot bot commented Sep 27, 2024

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 27, 2024
JaySon-Huang and others added 3 commits September 27, 2024 09:55
)

close #9485

vector: Fix ColumnArray does not work well with CHBlockChunkCodec

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Storage: Add error message when fail to build local index
JaySon-Huang and others added 4 commits September 29, 2024 09:16
…oat32), Nullable(Array(Float32))" (#9490)

ref #9032

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
storage: remove vector_index in column level
@JaySon-Huang JaySon-Huang marked this pull request as ready for review September 30, 2024 05:00
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 30, 2024
@JaySon-Huang JaySon-Huang removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 30, 2024
@JaySon-Huang JaySon-Huang reopened this Sep 30, 2024

Temporarily disable clang-tidy for this PR: it changes too many files, so the check takes too long.

@JaySon-Huang JaySon-Huang left a comment

LGTM

@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. release-note-none Denotes a PR that doesn't merit a release note. labels Sep 30, 2024
@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Sep 30, 2024

ti-chi-bot bot commented Sep 30, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-09-30 08:01:06.261755596 +0000 UTC m=+255421.681968632: ☑️ agreed by JaySon-Huang.
  • 2024-09-30 08:13:12.453821332 +0000 UTC m=+256147.874034343: ☑️ agreed by zanmato1984.


ti-chi-bot bot commented Sep 30, 2024

@zimulala: adding LGTM is restricted to approvers and reviewers in OWNERS files.


ti-chi-bot bot commented Sep 30, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JaySon-Huang, zanmato1984, zimulala

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [JaySon-Huang,zanmato1984]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Labels

approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Vector Search

4 participants