bobhan1 commented on Jun 13, 2025

Versions

  • dev
  • 3.0
  • 2.1
  • 2.0

Languages

  • Chinese
  • English

Docs Checklist

  • Checked by AI
  • Test Cases Built

bobhan1 force-pushed the add-partial-udpate-new-key-behavior-doc branch from befcff5 to 45de723 on June 13, 2025 05:59
dataroaring pushed a commit to apache/doris that referenced this pull request Jun 25, 2025
…f newly inserted rows in partial update (#41232)

Currently, Doris uses strict mode to decide whether newly inserted rows in a partial update
should be appended or should cause an error, which is hard to use.
This PR adds a new session variable and load property
`partial_update_new_key_behavior` to control how newly
inserted rows in a partial update are handled.
`partial_update_new_key_behavior` has ~three~ two options:
- `APPEND`: append the newly inserted rows
- ~`IGNORE`: silently delete the newly inserted rows (they are not counted as filtered rows)~
- `ERROR`: report an error when a newly inserted row is encountered; the error message
contains the keys of one row that is not in the table.
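
To make the two modes concrete, here is a minimal Python model of what a partial update does with rows whose keys are not yet in the table. It is a sketch under stated assumptions, not Doris code: the table is a plain dict keyed by the key columns, and `partial_update`, `NewKeyError` and the `defaults` map are hypothetical names. Unspecified columns of an appended row are assumed to take their column defaults (or `None`), and an `ERROR` failure is modeled as rejecting the whole batch before anything is written.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


class NewKeyError(Exception):
    """Hypothetical error raised in ERROR mode for a key that is not in the table."""


def partial_update(
    table: Dict[Tuple, Row],
    key_cols: List[str],
    all_cols: List[str],
    defaults: Row,
    rows: List[Row],
    new_key_behavior: str = "APPEND",
) -> None:
    """Apply a partial update: each row carries the key columns plus a subset of the rest."""
    mode = new_key_behavior.upper()
    if mode not in ("APPEND", "ERROR"):
        # IGNORE was dropped from the final design, so only two modes are accepted here.
        raise ValueError(f"unsupported partial_update_new_key_behavior: {new_key_behavior}")

    def key_of(row: Row) -> Tuple:
        return tuple(row[c] for c in key_cols)

    if mode == "ERROR":
        # Model assumption: validate every row first so a failed load writes nothing.
        for row in rows:
            key = key_of(row)
            if key not in table:
                # Like the described behavior, report the keys of one missing row.
                raise NewKeyError(f"row with keys {dict(zip(key_cols, key))} is not in table")

    for row in rows:
        key = key_of(row)
        if key in table:
            # Existing key: overwrite only the columns supplied by the load.
            table[key].update(row)
        else:
            # New key (APPEND): fill unspecified columns from defaults, None if absent.
            new_row: Row = {c: defaults.get(c) for c in all_cols}
            new_row.update(row)
            table[key] = new_row
```

For example, with `table = {(1,): {"id": 1, "name": "a", "score": 10}}`, calling `partial_update(table, ["id"], ["id", "name", "score"], {"score": 0}, [{"id": 2, "name": "c"}])` appends `{"id": 2, "name": "c", "score": 0}`, while the same call with `new_key_behavior="ERROR"` raises `NewKeyError` naming `{'id': 2}` and leaves the table unchanged.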

---
The reason for not supporting `IGNORE` mode: to support it, we would need to add a delete sign
to newly inserted rows in a partial update, rather than marking them deleted in the delete bitmap,
because compaction does not use the delete bitmap when reading data. We would also need
to record which rows had their delete sign added this way, so that conflict resolution in the
publish phase does not wrongly delete rows that another concurrent load has successfully
inserted. This increases code complexity and is error-prone.

Doc: apache/doris-website#2472
bobhan1 force-pushed the add-partial-udpate-new-key-behavior-doc branch from a4959a3 to 5981bca on June 25, 2025 07:11
bobhan1 added seven commits to bobhan1/doris that referenced this pull request (four on Jun 25, 2025 and three on Jul 9, 2025), each carrying the same commit message as the one above.
dataroaring left a comment

LGTM

dataroaring force-pushed the add-partial-udpate-new-key-behavior-doc branch from 5981bca to 88fa98b on August 8, 2025 03:14
@dataroaring

run buildall

dataroaring merged commit 7a28d9d into apache:master on Aug 10, 2025
1 check passed