[Fix](partial-update) Correct the alignment process when the table has sequence column and add cases #25346

bobhan1 · 2023-10-12T06:56:12Z

Proposed changes

This PR fix the alignment process during publish phase when conflict occurs during concurrent partial updates: if we encounter a row with the same key and larger value in sequence column, it means that there exists another load which introduces a row with the same keys and larger sequence column value published successfully after the commit phase of the current load. We should act as follows:

If the columns we update include sequence column, we should delete the current row becase the partial update on the current row has been overwritten by the previous one with larger sequence column value.
Otherwise, we should combine the values of the missing columns in the previous row and the values of the including columns in the current row into a new row.

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

bobhan1 · 2023-10-12T06:56:23Z

run buildall

bobhan1 · 2023-10-12T07:00:54Z

run buildall

dataroaring

LGTM

github-actions · 2023-10-12T09:59:08Z

PR approved by at least one committer and no changes requested.

github-actions · 2023-10-12T09:59:11Z

PR approved by anyone and no changes requested.

liaoxin01

LGTM

regression-test/suites/unique_with_mow_p0/partial_update/test_partial_update_parallel.groovy

bobhan1 · 2023-10-12T11:18:13Z

run buildall

bobhan1 · 2023-10-13T11:20:47Z

run buildall

zhannngchen

LGTM

github-actions · 2023-10-13T11:27:15Z

PR approved by at least one committer and no changes requested.

github-actions · 2023-10-13T11:27:28Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2023-10-16T02:04:09Z

clang-tidy review says "All clean, LGTM! 👍"

bobhan1 · 2023-10-16T03:10:15Z

run buildall

github-actions · 2023-10-16T03:19:41Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2023-10-17T07:05:16Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2023-10-17T07:05:53Z

clang-tidy review says "All clean, LGTM! 👍"

doris-robot · 2023-10-17T07:39:56Z

TeamCity be ut coverage result:
Function Coverage: 37.02% (8284/22376)
Line Coverage: 29.07% (66450/228614)
Region Coverage: 27.75% (34473/124230)
Branch Coverage: 24.38% (17521/71878)
Coverage Report: http://coverage.selectdb-in.cc/coverage/de1190a37b147592c2a5a260370549840943ce0d_de1190a37b147592c2a5a260370549840943ce0d/report/index.html

doris-robot · 2023-10-17T08:02:01Z

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.1 seconds
stream load tsv: 554 seconds loaded 74807831229 Bytes, about 128 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
insert into select: 29.2 seconds inserted 10000000 Rows, about 342K ops/s
storage size: 17162117359 Bytes

bobhan1 · 2023-10-17T08:28:20Z

run p0

bobhan1 · 2023-10-17T09:07:04Z

run p0

bobhan1 · 2023-10-17T09:12:36Z

run buildall

github-actions · 2023-10-17T09:31:42Z

clang-tidy review says "All clean, LGTM! 👍"

doris-robot · 2023-10-17T09:41:15Z

TeamCity be ut coverage result:
Function Coverage: 37.01% (8280/22374)
Line Coverage: 29.05% (66408/228561)
Region Coverage: 27.74% (34461/124213)
Branch Coverage: 24.36% (17508/71884)
Coverage Report: http://coverage.selectdb-in.cc/coverage/3246ed229fde9be1b599ca58cb8a7840f0ac7d90_3246ed229fde9be1b599ca58cb8a7840f0ac7d90/report/index.html

doris-robot · 2023-10-17T09:56:44Z

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.11 seconds
stream load tsv: 556 seconds loaded 74807831229 Bytes, about 128 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17161999736 Bytes

zhannngchen

LGTM

zhannngchen · 2023-10-18T01:46:16Z

run p0

…s when the table has sequence column and add cases #25346" (#25789)

…s sequence column and add cases (apache#25346) This PR fix the alignment process during publish phase when conflict occurs during concurrent partial updates: if we encounter a row with the same key and larger value in sequence column, it means that there exists another load which introduces a row with the same keys and larger sequence column value published successfully after the commit phase of the current load. We should act as follows: - If the columns we update include sequence column, we should delete the current row becase the partial update on the current row has been overwritten by the previous one with larger sequence column value. - Otherwise, we should combine the values of the missing columns in the previous row and the values of the including columns in the current row into a new row.

…update (#41701) 1. Fix incorrect result when there are insert after delete for same key in one load. In flexible partial update, if there is an insert after a delete on the same key in one batch, the final result will be always delete that row in table, which is incorrect. The reason is that when doing aggregation for rows with same keys in memtable, we treat delete sign column as an ordinary column. So if there is a row with delete sign in that batch, the final aggregated row will always have a delete sign and any insert on the same key after it will be lost. In the above situation, we should keep those rows and don't merge them when doing aggregation in memtable. When flush the memtable to segment, we should first apply the semantic of delete if exists and then insert the row after it. 2. fix flexible partial update don't use row store column to read data 3. Fix flexible partial update incorrect read index when resolving conflict in publish phase 4. add some default value cases and fix on update current timestamp 5. fix publish phase alignment problem when table has sequence column and the rowset to read has multi segments. 6. add some publish alignment case when table has sequence column 7. ~revert #25346, we use the sequence value in flush phase as its final value and will not do alignment in publish phase if the partial update load doesn't specified sequence column.~ 8. fix auto-increment column value is wrongly updated when it's not specified. 9. Fix the problem that the value of sequence column may decrease. 10. Only read seq col and seq map col when the previous row is deleted in partial update and flexible partial update when doing alignment. 11. Fix the value of auto-increment column is not consistent between replicas.

…update (apache#41701) 1. Fix incorrect result when there are insert after delete for same key in one load. In flexible partial update, if there is an insert after a delete on the same key in one batch, the final result will be always delete that row in table, which is incorrect. The reason is that when doing aggregation for rows with same keys in memtable, we treat delete sign column as an ordinary column. So if there is a row with delete sign in that batch, the final aggregated row will always have a delete sign and any insert on the same key after it will be lost. In the above situation, we should keep those rows and don't merge them when doing aggregation in memtable. When flush the memtable to segment, we should first apply the semantic of delete if exists and then insert the row after it. 2. fix flexible partial update don't use row store column to read data 3. Fix flexible partial update incorrect read index when resolving conflict in publish phase 4. add some default value cases and fix on update current timestamp 5. fix publish phase alignment problem when table has sequence column and the rowset to read has multi segments. 6. add some publish alignment case when table has sequence column 7. ~revert apache#25346, we use the sequence value in flush phase as its final value and will not do alignment in publish phase if the partial update load doesn't specified sequence column.~ 8. fix auto-increment column value is wrongly updated when it's not specified. 9. Fix the problem that the value of sequence column may decrease. 10. Only read seq col and seq map col when the previous row is deleted in partial update and flexible partial update when doing alignment. 11. Fix the value of auto-increment column is not consistent between replicas.

bobhan1 force-pushed the coverage-row-store-concurrent-partial-update branch from 4c04b73 to cb55e9a Compare October 12, 2023 06:58

dataroaring previously approved these changes Oct 12, 2023

View reviewed changes

dataroaring added the dev/2.0.3 label Oct 12, 2023

github-actions bot added the approved Indicates a PR has been approved by one committer. label Oct 12, 2023

github-actions bot added the reviewed label Oct 12, 2023

liaoxin01 approved these changes Oct 12, 2023

View reviewed changes

zhannngchen reviewed Oct 12, 2023

View reviewed changes

regression-test/suites/unique_with_mow_p0/partial_update/test_partial_update_parallel.groovy Show resolved Hide resolved

bobhan1 dismissed dataroaring’s stale review via b8def92 October 12, 2023 11:17

bobhan1 force-pushed the coverage-row-store-concurrent-partial-update branch from cb55e9a to b8def92 Compare October 12, 2023 11:17

github-actions bot removed the approved Indicates a PR has been approved by one committer. label Oct 12, 2023

bobhan1 force-pushed the coverage-row-store-concurrent-partial-update branch from b8def92 to 9f82ffc Compare October 13, 2023 11:19

bobhan1 changed the title ~~[Coverage](merge-on-write) add concurrent partial updates case for row store table~~ [Fix](merge-on-write) Correct the alignment process when the table has sequence column and add cases Oct 13, 2023

bobhan1 requested a review from zhannngchen October 13, 2023 11:22

zhannngchen previously approved these changes Oct 13, 2023

View reviewed changes

github-actions bot added the approved Indicates a PR has been approved by one committer. label Oct 13, 2023

bobhan1 force-pushed the coverage-row-store-concurrent-partial-update branch from 9f82ffc to bbf535d Compare October 16, 2023 01:54

bobhan1 dismissed zhannngchen’s stale review via f3a1729 October 16, 2023 03:09

bobhan1 force-pushed the coverage-row-store-concurrent-partial-update branch from bbf535d to f3a1729 Compare October 16, 2023 03:09

github-actions bot removed the approved Indicates a PR has been approved by one committer. label Oct 16, 2023

update

3246ed2

bobhan1 force-pushed the coverage-row-store-concurrent-partial-update branch from de1190a to 3246ed2 Compare October 17, 2023 09:12

zhannngchen approved these changes Oct 18, 2023

View reviewed changes

zhannngchen merged commit 64aeeb9 into apache:master Oct 18, 2023

xiaokang added the merge_conflict label Oct 22, 2023

bobhan1 mentioned this pull request Oct 23, 2023

[branch-2.0-pick] Picks "[Fix](partial-update) Correct the alignment process when the table has sequence column and add cases #25346" #25789

Merged

xiaokang pushed a commit that referenced this pull request Oct 23, 2023

[branch-2.0-pick] "[Fix](partial-update) Correct the alignment proces…

8093b4b

…s when the table has sequence column and add cases #25346" (#25789)

xiaokang added dev/2.0.3-merged and removed dev/2.0.3 labels Oct 23, 2023

bobhan1 mentioned this pull request Nov 1, 2024

[Fix](flexible partial update) Fix some problems in flexible partial update #41701

Merged

bobhan1 mentioned this pull request Feb 14, 2025

[Opt](partial update) Add some cases for partial update #47900

Merged

16 tasks

[Fix](partial-update) Correct the alignment process when the table has sequence column and add cases #25346

[Fix](partial-update) Correct the alignment process when the table has sequence column and add cases #25346

Uh oh!

Conversation

bobhan1 commented Oct 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Proposed changes

Further comments

Uh oh!

bobhan1 commented Oct 12, 2023

Uh oh!

bobhan1 commented Oct 12, 2023

Uh oh!

dataroaring left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Oct 12, 2023

Uh oh!

github-actions bot commented Oct 12, 2023

Uh oh!

liaoxin01 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

bobhan1 commented Oct 12, 2023

Uh oh!

bobhan1 commented Oct 13, 2023

Uh oh!

zhannngchen left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Oct 13, 2023

Uh oh!

github-actions bot commented Oct 13, 2023

Uh oh!

github-actions bot commented Oct 16, 2023

Uh oh!

bobhan1 commented Oct 16, 2023

Uh oh!

github-actions bot commented Oct 16, 2023

Uh oh!

github-actions bot commented Oct 17, 2023

Uh oh!

github-actions bot commented Oct 17, 2023

Uh oh!

doris-robot commented Oct 17, 2023

Uh oh!

doris-robot commented Oct 17, 2023

Uh oh!

bobhan1 commented Oct 17, 2023

Uh oh!

bobhan1 commented Oct 17, 2023

Uh oh!

bobhan1 commented Oct 17, 2023

Uh oh!

github-actions bot commented Oct 17, 2023

Uh oh!

doris-robot commented Oct 17, 2023

Uh oh!

doris-robot commented Oct 17, 2023

Uh oh!

zhannngchen left a comment

Choose a reason for hiding this comment

Uh oh!

zhannngchen commented Oct 18, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

bobhan1 commented Oct 12, 2023 •

edited

Loading