-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[Fix](flexible partial update) Fix some problems in flexible partial update #41701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fix](flexible partial update) Fix some problems in flexible partial update #41701
Conversation
|
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
|
run buildall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
|
TeamCity be ut coverage result: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
24069a4 to
378ef2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
d21f1b1 to
d2e1b4c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
6428fa3 to
f177418
Compare
|
run buildall |
|
TeamCity be ut coverage result: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
|
run buildall |
TPC-H: Total hot run time: 40999 ms |
|
TeamCity be ut coverage result: |
TPC-DS: Total hot run time: 192400 ms |
ClickBench: Total hot run time: 32.6 s |
79fe26c to
02cb8af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
|
run buildall |
|
TeamCity be ut coverage result: |
TPC-H: Total hot run time: 40756 ms |
TPC-DS: Total hot run time: 192006 ms |
ClickBench: Total hot run time: 33.62 s |
|
run buildall |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-H: Total hot run time: 33716 ms |
TPC-DS: Total hot run time: 186267 ms |
ClickBench: Total hot run time: 30.31 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run cloud_p0 |
zhannngchen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
dataroaring
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…update (apache#41701) 1. Fix incorrect result when there are insert after delete for same key in one load. In flexible partial update, if there is an insert after a delete on the same key in one batch, the final result will be always delete that row in table, which is incorrect. The reason is that when doing aggregation for rows with same keys in memtable, we treat delete sign column as an ordinary column. So if there is a row with delete sign in that batch, the final aggregated row will always have a delete sign and any insert on the same key after it will be lost. In the above situation, we should keep those rows and don't merge them when doing aggregation in memtable. When flush the memtable to segment, we should first apply the semantic of delete if exists and then insert the row after it. 2. fix flexible partial update don't use row store column to read data 3. Fix flexible partial update incorrect read index when resolving conflict in publish phase 4. add some default value cases and fix on update current timestamp 5. fix publish phase alignment problem when table has sequence column and the rowset to read has multi segments. 6. add some publish alignment case when table has sequence column 7. ~revert apache#25346, we use the sequence value in flush phase as its final value and will not do alignment in publish phase if the partial update load doesn't specified sequence column.~ 8. fix auto-increment column value is wrongly updated when it's not specified. 9. Fix the problem that the value of sequence column may decrease. 10. Only read seq col and seq map col when the previous row is deleted in partial update and flexible partial update when doing alignment. 11. Fix the value of auto-increment column is not consistent between replicas.
…update (apache#41701) 1. Fix incorrect result when there are insert after delete for same key in one load. In flexible partial update, if there is an insert after a delete on the same key in one batch, the final result will be always delete that row in table, which is incorrect. The reason is that when doing aggregation for rows with same keys in memtable, we treat delete sign column as an ordinary column. So if there is a row with delete sign in that batch, the final aggregated row will always have a delete sign and any insert on the same key after it will be lost. In the above situation, we should keep those rows and don't merge them when doing aggregation in memtable. When flush the memtable to segment, we should first apply the semantic of delete if exists and then insert the row after it. 2. fix flexible partial update don't use row store column to read data 3. Fix flexible partial update incorrect read index when resolving conflict in publish phase 4. add some default value cases and fix on update current timestamp 5. fix publish phase alignment problem when table has sequence column and the rowset to read has multi segments. 6. add some publish alignment case when table has sequence column 7. ~revert apache#25346, we use the sequence value in flush phase as its final value and will not do alignment in publish phase if the partial update load doesn't specified sequence column.~ 8. fix auto-increment column value is wrongly updated when it's not specified. 9. Fix the problem that the value of sequence column may decrease. 10. Only read seq col and seq map col when the previous row is deleted in partial update and flexible partial update when doing alignment. 11. Fix the value of auto-increment column is not consistent between replicas.
…update (apache#41701) 1. Fix incorrect result when there are insert after delete for same key in one load. In flexible partial update, if there is an insert after a delete on the same key in one batch, the final result will be always delete that row in table, which is incorrect. The reason is that when doing aggregation for rows with same keys in memtable, we treat delete sign column as an ordinary column. So if there is a row with delete sign in that batch, the final aggregated row will always have a delete sign and any insert on the same key after it will be lost. In the above situation, we should keep those rows and don't merge them when doing aggregation in memtable. When flush the memtable to segment, we should first apply the semantic of delete if exists and then insert the row after it. 2. fix flexible partial update don't use row store column to read data 3. Fix flexible partial update incorrect read index when resolving conflict in publish phase 4. add some default value cases and fix on update current timestamp 5. fix publish phase alignment problem when table has sequence column and the rowset to read has multi segments. 6. add some publish alignment case when table has sequence column 7. ~revert apache#25346, we use the sequence value in flush phase as its final value and will not do alignment in publish phase if the partial update load doesn't specified sequence column.~ 8. fix auto-increment column value is wrongly updated when it's not specified. 9. Fix the problem that the value of sequence column may decrease. 10. Only read seq col and seq map col when the previous row is deleted in partial update and flexible partial update when doing alignment. 11. Fix the value of auto-increment column is not consistent between replicas.
…update (apache#41701) 1. Fix incorrect result when there are insert after delete for same key in one load. In flexible partial update, if there is an insert after a delete on the same key in one batch, the final result will be always delete that row in table, which is incorrect. The reason is that when doing aggregation for rows with same keys in memtable, we treat delete sign column as an ordinary column. So if there is a row with delete sign in that batch, the final aggregated row will always have a delete sign and any insert on the same key after it will be lost. In the above situation, we should keep those rows and don't merge them when doing aggregation in memtable. When flush the memtable to segment, we should first apply the semantic of delete if exists and then insert the row after it. 2. fix flexible partial update don't use row store column to read data 3. Fix flexible partial update incorrect read index when resolving conflict in publish phase 4. add some default value cases and fix on update current timestamp 5. fix publish phase alignment problem when table has sequence column and the rowset to read has multi segments. 6. add some publish alignment case when table has sequence column 7. ~revert apache#25346, we use the sequence value in flush phase as its final value and will not do alignment in publish phase if the partial update load doesn't specified sequence column.~ 8. fix auto-increment column value is wrongly updated when it's not specified. 9. Fix the problem that the value of sequence column may decrease. 10. Only read seq col and seq map col when the previous row is deleted in partial update and flexible partial update when doing alignment. 11. Fix the value of auto-increment column is not consistent between replicas.
…update (apache#41701) 1. Fix incorrect result when there are insert after delete for same key in one load. In flexible partial update, if there is an insert after a delete on the same key in one batch, the final result will be always delete that row in table, which is incorrect. The reason is that when doing aggregation for rows with same keys in memtable, we treat delete sign column as an ordinary column. So if there is a row with delete sign in that batch, the final aggregated row will always have a delete sign and any insert on the same key after it will be lost. In the above situation, we should keep those rows and don't merge them when doing aggregation in memtable. When flush the memtable to segment, we should first apply the semantic of delete if exists and then insert the row after it. 2. fix flexible partial update don't use row store column to read data 3. Fix flexible partial update incorrect read index when resolving conflict in publish phase 4. add some default value cases and fix on update current timestamp 5. fix publish phase alignment problem when table has sequence column and the rowset to read has multi segments. 6. add some publish alignment case when table has sequence column 7. ~revert apache#25346, we use the sequence value in flush phase as its final value and will not do alignment in publish phase if the partial update load doesn't specified sequence column.~ 8. fix auto-increment column value is wrongly updated when it's not specified. 9. Fix the problem that the value of sequence column may decrease. 10. Only read seq col and seq map col when the previous row is deleted in partial update and flexible partial update when doing alignment. 11. Fix the value of auto-increment column is not consistent between replicas.
…_update` (#53765) ### What problem does this PR solve? introduced in #41701 When doing alignment for partial update in publish phase, `read_index_old` may be not continuous if some rows in current block has delete signs. Thus `old_block_delete_signs[read_index_old[idx]]` may overflow when all rows in current block have delete signs because `old_block_delete_signs` is empty.
…_update` (apache#53765) ### What problem does this PR solve? introduced in apache#41701 When doing alignment for partial update in publish phase, `read_index_old` may be not continuous if some rows in current block has delete signs. Thus `old_block_delete_signs[read_index_old[idx]]` may overflow when all rows in current block have delete signs because `old_block_delete_signs` is empty.
…_update` (apache#53765) ### What problem does this PR solve? introduced in apache#41701 When doing alignment for partial update in publish phase, `read_index_old` may be not continuous if some rows in current block has delete signs. Thus `old_block_delete_signs[read_index_old[idx]]` may overflow when all rows in current block have delete signs because `old_block_delete_signs` is empty.
In the above situation, we should keep those rows and don't merge them when doing aggregation in memtable. When flush the memtable to segment, we should first apply the semantic of delete if exists and then insert the row after it.
revert [Fix](partial-update) Correct the alignment process when the table has sequence column and add cases #25346, we use the sequence value in flush phase as its final value and will not do alignment in publish phase if the partial update load doesn't specified sequence column.