-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[Fix](partial-update) Correct the alignment process when the table has sequence column and add cases #25346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fix](partial-update) Correct the alignment process when the table has sequence column and add cases #25346
Conversation
|
run buildall |
4c04b73 to
cb55e9a
Compare
|
run buildall |
dataroaring
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
liaoxin01
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
regression-test/suites/unique_with_mow_p0/partial_update/test_partial_update_parallel.groovy
Show resolved
Hide resolved
cb55e9a to
b8def92
Compare
|
run buildall |
b8def92 to
9f82ffc
Compare
|
run buildall |
zhannngchen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by at least one committer and no changes requested. |
|
clang-tidy review says "All clean, LGTM! 👍" |
9f82ffc to
bbf535d
Compare
|
clang-tidy review says "All clean, LGTM! 👍" |
bbf535d to
f3a1729
Compare
|
run buildall |
|
clang-tidy review says "All clean, LGTM! 👍" |
|
clang-tidy review says "All clean, LGTM! 👍" |
1 similar comment
|
clang-tidy review says "All clean, LGTM! 👍" |
|
TeamCity be ut coverage result: |
|
(From new machine)TeamCity pipeline, clickbench performance test result: |
|
run p0 |
1 similar comment
|
run p0 |
de1190a to
3246ed2
Compare
|
run buildall |
|
clang-tidy review says "All clean, LGTM! 👍" |
|
TeamCity be ut coverage result: |
|
(From new machine)TeamCity pipeline, clickbench performance test result: |
zhannngchen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
run p0 |
…s sequence column and add cases (apache#25346) This PR fix the alignment process during publish phase when conflict occurs during concurrent partial updates: if we encounter a row with the same key and larger value in sequence column, it means that there exists another load which introduces a row with the same keys and larger sequence column value published successfully after the commit phase of the current load. We should act as follows: - If the columns we update include sequence column, we should delete the current row becase the partial update on the current row has been overwritten by the previous one with larger sequence column value. - Otherwise, we should combine the values of the missing columns in the previous row and the values of the including columns in the current row into a new row.
…s sequence column and add cases (apache#25346) This PR fix the alignment process during publish phase when conflict occurs during concurrent partial updates: if we encounter a row with the same key and larger value in sequence column, it means that there exists another load which introduces a row with the same keys and larger sequence column value published successfully after the commit phase of the current load. We should act as follows: - If the columns we update include sequence column, we should delete the current row becase the partial update on the current row has been overwritten by the previous one with larger sequence column value. - Otherwise, we should combine the values of the missing columns in the previous row and the values of the including columns in the current row into a new row.
…update (#41701) 1. Fix incorrect result when there are insert after delete for same key in one load. In flexible partial update, if there is an insert after a delete on the same key in one batch, the final result will be always delete that row in table, which is incorrect. The reason is that when doing aggregation for rows with same keys in memtable, we treat delete sign column as an ordinary column. So if there is a row with delete sign in that batch, the final aggregated row will always have a delete sign and any insert on the same key after it will be lost. In the above situation, we should keep those rows and don't merge them when doing aggregation in memtable. When flush the memtable to segment, we should first apply the semantic of delete if exists and then insert the row after it. 2. fix flexible partial update don't use row store column to read data 3. Fix flexible partial update incorrect read index when resolving conflict in publish phase 4. add some default value cases and fix on update current timestamp 5. fix publish phase alignment problem when table has sequence column and the rowset to read has multi segments. 6. add some publish alignment case when table has sequence column 7. ~revert #25346, we use the sequence value in flush phase as its final value and will not do alignment in publish phase if the partial update load doesn't specified sequence column.~ 8. fix auto-increment column value is wrongly updated when it's not specified. 9. Fix the problem that the value of sequence column may decrease. 10. Only read seq col and seq map col when the previous row is deleted in partial update and flexible partial update when doing alignment. 11. Fix the value of auto-increment column is not consistent between replicas.
…update (apache#41701) 1. Fix incorrect result when there are insert after delete for same key in one load. In flexible partial update, if there is an insert after a delete on the same key in one batch, the final result will be always delete that row in table, which is incorrect. The reason is that when doing aggregation for rows with same keys in memtable, we treat delete sign column as an ordinary column. So if there is a row with delete sign in that batch, the final aggregated row will always have a delete sign and any insert on the same key after it will be lost. In the above situation, we should keep those rows and don't merge them when doing aggregation in memtable. When flush the memtable to segment, we should first apply the semantic of delete if exists and then insert the row after it. 2. fix flexible partial update don't use row store column to read data 3. Fix flexible partial update incorrect read index when resolving conflict in publish phase 4. add some default value cases and fix on update current timestamp 5. fix publish phase alignment problem when table has sequence column and the rowset to read has multi segments. 6. add some publish alignment case when table has sequence column 7. ~revert apache#25346, we use the sequence value in flush phase as its final value and will not do alignment in publish phase if the partial update load doesn't specified sequence column.~ 8. fix auto-increment column value is wrongly updated when it's not specified. 9. Fix the problem that the value of sequence column may decrease. 10. Only read seq col and seq map col when the previous row is deleted in partial update and flexible partial update when doing alignment. 11. Fix the value of auto-increment column is not consistent between replicas.
…update (apache#41701) 1. Fix incorrect result when there are insert after delete for same key in one load. In flexible partial update, if there is an insert after a delete on the same key in one batch, the final result will be always delete that row in table, which is incorrect. The reason is that when doing aggregation for rows with same keys in memtable, we treat delete sign column as an ordinary column. So if there is a row with delete sign in that batch, the final aggregated row will always have a delete sign and any insert on the same key after it will be lost. In the above situation, we should keep those rows and don't merge them when doing aggregation in memtable. When flush the memtable to segment, we should first apply the semantic of delete if exists and then insert the row after it. 2. fix flexible partial update don't use row store column to read data 3. Fix flexible partial update incorrect read index when resolving conflict in publish phase 4. add some default value cases and fix on update current timestamp 5. fix publish phase alignment problem when table has sequence column and the rowset to read has multi segments. 6. add some publish alignment case when table has sequence column 7. ~revert apache#25346, we use the sequence value in flush phase as its final value and will not do alignment in publish phase if the partial update load doesn't specified sequence column.~ 8. fix auto-increment column value is wrongly updated when it's not specified. 9. Fix the problem that the value of sequence column may decrease. 10. Only read seq col and seq map col when the previous row is deleted in partial update and flexible partial update when doing alignment. 11. Fix the value of auto-increment column is not consistent between replicas.
…update (apache#41701) 1. Fix incorrect result when there are insert after delete for same key in one load. In flexible partial update, if there is an insert after a delete on the same key in one batch, the final result will be always delete that row in table, which is incorrect. The reason is that when doing aggregation for rows with same keys in memtable, we treat delete sign column as an ordinary column. So if there is a row with delete sign in that batch, the final aggregated row will always have a delete sign and any insert on the same key after it will be lost. In the above situation, we should keep those rows and don't merge them when doing aggregation in memtable. When flush the memtable to segment, we should first apply the semantic of delete if exists and then insert the row after it. 2. fix flexible partial update don't use row store column to read data 3. Fix flexible partial update incorrect read index when resolving conflict in publish phase 4. add some default value cases and fix on update current timestamp 5. fix publish phase alignment problem when table has sequence column and the rowset to read has multi segments. 6. add some publish alignment case when table has sequence column 7. ~revert apache#25346, we use the sequence value in flush phase as its final value and will not do alignment in publish phase if the partial update load doesn't specified sequence column.~ 8. fix auto-increment column value is wrongly updated when it's not specified. 9. Fix the problem that the value of sequence column may decrease. 10. Only read seq col and seq map col when the previous row is deleted in partial update and flexible partial update when doing alignment. 11. Fix the value of auto-increment column is not consistent between replicas.
…update (apache#41701) 1. Fix incorrect result when there are insert after delete for same key in one load. In flexible partial update, if there is an insert after a delete on the same key in one batch, the final result will be always delete that row in table, which is incorrect. The reason is that when doing aggregation for rows with same keys in memtable, we treat delete sign column as an ordinary column. So if there is a row with delete sign in that batch, the final aggregated row will always have a delete sign and any insert on the same key after it will be lost. In the above situation, we should keep those rows and don't merge them when doing aggregation in memtable. When flush the memtable to segment, we should first apply the semantic of delete if exists and then insert the row after it. 2. fix flexible partial update don't use row store column to read data 3. Fix flexible partial update incorrect read index when resolving conflict in publish phase 4. add some default value cases and fix on update current timestamp 5. fix publish phase alignment problem when table has sequence column and the rowset to read has multi segments. 6. add some publish alignment case when table has sequence column 7. ~revert apache#25346, we use the sequence value in flush phase as its final value and will not do alignment in publish phase if the partial update load doesn't specified sequence column.~ 8. fix auto-increment column value is wrongly updated when it's not specified. 9. Fix the problem that the value of sequence column may decrease. 10. Only read seq col and seq map col when the previous row is deleted in partial update and flexible partial update when doing alignment. 11. Fix the value of auto-increment column is not consistent between replicas.
…update (apache#41701) 1. Fix incorrect result when there are insert after delete for same key in one load. In flexible partial update, if there is an insert after a delete on the same key in one batch, the final result will be always delete that row in table, which is incorrect. The reason is that when doing aggregation for rows with same keys in memtable, we treat delete sign column as an ordinary column. So if there is a row with delete sign in that batch, the final aggregated row will always have a delete sign and any insert on the same key after it will be lost. In the above situation, we should keep those rows and don't merge them when doing aggregation in memtable. When flush the memtable to segment, we should first apply the semantic of delete if exists and then insert the row after it. 2. fix flexible partial update don't use row store column to read data 3. Fix flexible partial update incorrect read index when resolving conflict in publish phase 4. add some default value cases and fix on update current timestamp 5. fix publish phase alignment problem when table has sequence column and the rowset to read has multi segments. 6. add some publish alignment case when table has sequence column 7. ~revert apache#25346, we use the sequence value in flush phase as its final value and will not do alignment in publish phase if the partial update load doesn't specified sequence column.~ 8. fix auto-increment column value is wrongly updated when it's not specified. 9. Fix the problem that the value of sequence column may decrease. 10. Only read seq col and seq map col when the previous row is deleted in partial update and flexible partial update when doing alignment. 11. Fix the value of auto-increment column is not consistent between replicas.
Proposed changes
This PR fix the alignment process during publish phase when conflict occurs during concurrent partial updates: if we encounter a row with the same key and larger value in sequence column, it means that there exists another load which introduces a row with the same keys and larger sequence column value published successfully after the commit phase of the current load. We should act as follows:
Further comments
If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...