@bobhan1 bobhan1 commented Aug 22, 2024

This PR adds the ability to update different columns for each row in a single stream load.
Doc: apache/doris-website#1140

Example

MySQL root@127.1:d1> CREATE TABLE t1 (
                  -> `k` int(11) NULL, 
                  -> `v1` BIGINT NULL,
                  -> `v2` BIGINT NULL DEFAULT "9876",
                  -> `v3` BIGINT NOT NULL,
                  -> `v4` BIGINT NOT NULL DEFAULT "1234",
                  -> `v5` BIGINT NULL
                  -> ) UNIQUE KEY(`k`) DISTRIBUTED BY HASH(`k`) BUCKETS 1
                  -> PROPERTIES(
                  -> "replication_num" = "1",
                  -> "enable_unique_key_merge_on_write" = "true"); 
Query OK, 0 rows affected
Time: 0.013s
MySQL root@127.1:d1> insert into t1 select number, number, number, number, number, number from numbers("number" = "6");
Query OK, 6 rows affected
Time: 0.107s
MySQL root@127.1:d1> select * from t1;
+---+----+----+----+----+----+
| k | v1 | v2 | v3 | v4 | v5 |
+---+----+----+----+----+----+
| 0 | 0  | 0  | 0  | 0  | 0  |
| 1 | 1  | 1  | 1  | 1  | 1  |
| 2 | 2  | 2  | 2  | 2  | 2  |
| 3 | 3  | 3  | 3  | 3  | 3  |
| 4 | 4  | 4  | 4  | 4  | 4  |
| 5 | 5  | 5  | 5  | 5  | 5  |
+---+----+----+----+----+----+

test1.json:

{"k": 1, "v1": 10}
{"k": 2, "v2": 20, "v5": 25}
{"k": 3, "v3": 30}
{"k": 4, "v4": 20, "v1": 43, "v3": 99}
{"k": 5, "v5": null}
{"k": 6, "v1": 999, "v3": 777}
{"k": 2, "v4": 222}
{"k": 1, "v2": 111, "v3": 111}

curl --location-trusted -u root: \
-H "strict_mode:false" \
-H "format:json" \
-H "read_json_by_line:true" \
-H "unique_key_update_mode:UPDATE_FLEXIBLE_COLUMNS" \
-T test1.json \
-XPUT http://<host>:<http_port>/api/d1/t1/_stream_load

MySQL root@127.1:d1> select * from t1;
+---+-----+------+-----+------+--------+
| k | v1  | v2   | v3  | v4   | v5     |
+---+-----+------+-----+------+--------+
| 0 | 0   | 0    | 0   | 0    | 0      |
| 1 | 10  | 111  | 111 | 1    | 1      |
| 2 | 2   | 20   | 2   | 222  | 25     |
| 3 | 3   | 3    | 30  | 3    | 3      |
| 4 | 43  | 4    | 99  | 20   | 4      |
| 5 | 5   | 5    | 5   | 5    | <null> |
| 6 | 999 | 9876 | 777 | 1234 | <null> |
+---+-----+------+-----+------+--------+
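
The result above can be reproduced with a small model of the merge semantics. The following is a minimal sketch in Python, not the Doris implementation: `apply_flexible_update`, `DEFAULTS`, and the in-memory dict are hypothetical names used only for illustration. It assumes that each JSON line updates only the columns it names, that later lines with the same key are applied on top of earlier ones, and that a row with a new key starts from the column defaults (or NULL where no default is declared).

```python
import json

# Column defaults copied from the CREATE TABLE statement in the example.
# v3 is NOT NULL with no default; this sketch assumes a new row always supplies it.
DEFAULTS = {"v1": None, "v2": 9876, "v3": None, "v4": 1234, "v5": None}


def apply_flexible_update(table, json_lines):
    """Apply each JSON line to `table` (a dict keyed by k), touching only named columns."""
    for line in json_lines:
        row = json.loads(line)
        key = row.pop("k")
        if key not in table:
            # New key: start from the declared defaults.
            table[key] = dict(DEFAULTS)
        # Existing key: columns absent from this line keep their current values.
        table[key].update(row)
    return table


# Initial state: rows 0..5, every column equal to the key (as in the INSERT above).
table = {k: {c: k for c in DEFAULTS} for k in range(6)}

lines = [
    '{"k": 1, "v1": 10}',
    '{"k": 2, "v2": 20, "v5": 25}',
    '{"k": 3, "v3": 30}',
    '{"k": 4, "v4": 20, "v1": 43, "v3": 99}',
    '{"k": 5, "v5": null}',
    '{"k": 6, "v1": 999, "v3": 777}',
    '{"k": 2, "v4": 222}',
    '{"k": 1, "v2": 111, "v3": 111}',
]

result = apply_flexible_update(table, lines)
for k in sorted(result):
    print(k, result[k])
```

Under these assumptions the model yields the same rows as the final `select * from t1` output, including the defaults `9876` and `1234` for the newly inserted key 6.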

@doris-robot
Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@bobhan1 bobhan1 force-pushed the mow-flexible-partial-update branch from bd91d77 to 40a5580 Compare August 22, 2024 04:06
@bobhan1 bobhan1 changed the title [Draft](partial update) Support flexible partial update in stream load with json files [Feature](partial update) Support flexible partial update in stream load with json files Aug 22, 2024
@bobhan1 bobhan1 force-pushed the mow-flexible-partial-update branch from 21b47f7 to edb8bf3 Compare August 22, 2024 07:50
@bobhan1 bobhan1 force-pushed the mow-flexible-partial-update branch from edb8bf3 to a82fc76 Compare August 22, 2024 10:59
@bobhan1 bobhan1 force-pushed the mow-flexible-partial-update branch 3 times, most recently from 001ea2a to 830813c Compare August 23, 2024 02:54
@bobhan1 bobhan1 force-pushed the mow-flexible-partial-update branch from 830813c to 3e7e9f5 Compare August 23, 2024 03:02
@bobhan1 bobhan1 force-pushed the mow-flexible-partial-update branch 12 times, most recently from 92da59e to ccec48f Compare August 23, 2024 06:59
@bobhan1 bobhan1 force-pushed the mow-flexible-partial-update branch from ccec48f to 139660e Compare August 23, 2024 07:17
eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Oct 10, 2024
…oad with json files (apache#39756)

This PR adds the ability to update different columns for each row in a
single stream load.
Doc: apache/doris-website#1140
```sql
MySQL root@127.1:d1> CREATE TABLE t1 (
                  -> `k` int(11) NULL,
                  -> `v1` BIGINT NULL,
                  -> `v2` BIGINT NULL DEFAULT "9876",
                  -> `v3` BIGINT NOT NULL,
                  -> `v4` BIGINT NOT NULL DEFAULT "1234",
                  -> `v5` BIGINT NULL
                  -> ) UNIQUE KEY(`k`) DISTRIBUTED BY HASH(`k`) BUCKETS 1
                  -> PROPERTIES(
                  -> "replication_num" = "1",
                  -> "enable_unique_key_merge_on_write" = "true");
Query OK, 0 rows affected
Time: 0.013s
MySQL root@127.1:d1> insert into t1 select number, number, number, number, number, number from numbers("number" = "6");
Query OK, 6 rows affected
Time: 0.107s
MySQL root@127.1:d1> select * from t1;
+---+----+----+----+----+----+
| k | v1 | v2 | v3 | v4 | v5 |
+---+----+----+----+----+----+
| 0 | 0  | 0  | 0  | 0  | 0  |
| 1 | 1  | 1  | 1  | 1  | 1  |
| 2 | 2  | 2  | 2  | 2  | 2  |
| 3 | 3  | 3  | 3  | 3  | 3  |
| 4 | 4  | 4  | 4  | 4  | 4  |
| 5 | 5  | 5  | 5  | 5  | 5  |
+---+----+----+----+----+----+
```
test1.json:
```json
{"k": 1, "v1": 10}
{"k": 2, "v2": 20, "v5": 25}
{"k": 3, "v3": 30}
{"k": 4, "v4": 20, "v1": 43, "v3": 99}
{"k": 5, "v5": null}
{"k": 6, "v1": 999, "v3": 777}
{"k": 2, "v4": 222}
{"k": 1, "v2": 111, "v3": 111}
```

```bash
curl --location-trusted -u root: \
-H "strict_mode:false" \
-H "format:json" \
-H "read_json_by_line:true" \
-H "unique_key_update_mode:UPDATE_FLEXIBLE_COLUMNS" \
-T test1.json \
-XPUT http://<host>:<http_port>/api/d1/t1/_stream_load
```

```sql
MySQL root@127.1:d1> select * from t1;
+---+-----+------+-----+------+--------+
| k | v1  | v2   | v3  | v4   | v5     |
+---+-----+------+-----+------+--------+
| 0 | 0   | 0    | 0   | 0    | 0      |
| 1 | 10  | 111  | 111 | 1    | 1      |
| 2 | 2   | 20   | 2   | 222  | 25     |
| 3 | 3   | 3    | 30  | 3    | 3      |
| 4 | 43  | 4    | 99  | 20   | 4      |
| 5 | 5   | 5    | 5   | 5    | <null> |
| 6 | 999 | 9876 | 777 | 1234 | <null> |
+---+-----+------+-----+------+--------+
```
cjj2010 pushed a commit to cjj2010/doris that referenced this pull request Oct 12, 2024
…oad with json files (apache#39756)

amorynan pushed a commit to amorynan/doris that referenced this pull request Oct 12, 2024
…oad with json files (apache#39756)

bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 14, 2024
…eam load with json files (apache#39756)

bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 14, 2024
…eam load with json files (apache#39756)

fix compile
@bobhan1 bobhan1 mentioned this pull request Oct 14, 2024
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 14, 2024
…eam load with json files (apache#39756)

fix compile
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 14, 2024
…eam load with json files (apache#39756)

fix compile
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 14, 2024
…eam load with json files (apache#39756)

fix compile
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 15, 2024
…update apache#39619

pick [opt](partial update) Remove unnecessary lock and refactor some code for partial update (apache#40062)

1. apache#34112 made partial update fetch rowsets during the initialization
of RowsetBuilder rather than in the flush phase, so the tablet header lock
can be removed.
2. Refactor some partial update code.

fix compile

pick [Fix](partial update) Fix __DORIS_SEQUENCE_COL__ is not set for newly inserted rows in partial update apache#40272

picks apache#40272

pick [Cherry-pick](branch-2.1) Pick "[Featrue](default value) Support bitmap_empty default value (apache#40364)" (apache#40487)

Pick apache#40364

pick [Feature](partial update) Support flexible partial update in stream load with json files (apache#39756)

fix compile

pick [branch-2.1] Picks "[opt](partial update) Allow to only specify key columns in partial update apache#40736" (apache#40863)

picks apache#40736

fix
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 15, 2024
…update apache#39619

bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 15, 2024
…update apache#39619

pick [opt](partial update) Remove unnecessary lock and refactor some code for partial update (apache#40062)

1. apache#34112 let partial update fetch
rowsets in the initialization of RowsetBuilder rather than flush phase.
So we can remove that tablet header lock.
2. refactor some partial update code

fix compile

pick [Fix](partial update) Fix __DORIS_SEQUENCE_COL__ is not set for newly inserted rows in partial update apache#40272

picks apache#40272

pick [Cherry-pick](branch-2.1) Pick "[Featrue](default value) Support bitmap_empty default value (apache#40364)" (apache#40487)

Pick apache#40364

<!--Describe your changes.-->

pick [Feature](partial update) Support flexible partial update in stream load with json files (apache#39756)

This PR add the ability to update different columns for each row in one
stream load
Doc: apache/doris-website#1140
```sql
MySQL root@127.1:d1> CREATE TABLE t1 (
                  -> `k` int(11) NULL,
                  -> `v1` BIGINT NULL,
                  -> `v2` BIGINT NULL DEFAULT "9876",
                  -> `v3` BIGINT NOT NULL,
                  -> `v4` BIGINT NOT NULL DEFAULT "1234",
                  -> `v5` BIGINT NULL
                  -> ) UNIQUE KEY(`k`) DISTRIBUTED BY HASH(`k`) BUCKETS 1
                  -> PROPERTIES(
                  -> "replication_num" = "1",
                  -> "enable_unique_key_merge_on_write" = "true");
Query OK, 0 rows affected
Time: 0.013s
MySQL root@127.1:d1> insert into t1 select number, number, number, number, number, number from numbers("number" = "6");
Query OK, 6 rows affected
Time: 0.107s
MySQL root@127.1:d1> select * from t1;
+---+----+----+----+----+----+
| k | v1 | v2 | v3 | v4 | v5 |
+---+----+----+----+----+----+
| 0 | 0  | 0  | 0  | 0  | 0  |
| 1 | 1  | 1  | 1  | 1  | 1  |
| 2 | 2  | 2  | 2  | 2  | 2  |
| 3 | 3  | 3  | 3  | 3  | 3  |
| 4 | 4  | 4  | 4  | 4  | 4  |
| 5 | 5  | 5  | 5  | 5  | 5  |
+---+----+----+----+----+----+
```
test1.json:
```json
{"k": 1, "v1": 10}
{"k": 2, "v2": 20, "v5": 25}
{"k": 3, "v3": 30}
{"k": 4, "v4": 20, "v1": 43, "v3": 99}
{"k": 5, "v5": null}
{"k": 6, "v1": 999, "v3": 777}
{"k": 2, "v4": 222}
{"k": 1, "v2": 111, "v3": 111}
```
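The file above has one JSON object per line, which is what the `read_json_by_line:true` header in this example expects. As a minimal sketch (the helper name `to_json_lines` is hypothetical and not part of Doris), such a file could be produced from per-row dictionaries in Python. Note the difference between leaving a column out, as most rows do, and setting it to `None`, which serializes to an explicit `null` as in the `k = 5` row.

```python
import json

def to_json_lines(rows):
    """Serialize each row dict as one JSON object per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)

# Only the key column and the columns to update are listed for each row.
rows = [
    {"k": 1, "v1": 10},
    {"k": 2, "v2": 20, "v5": 25},
    {"k": 3, "v3": 30},
    {"k": 4, "v4": 20, "v1": 43, "v3": 99},
    {"k": 5, "v5": None},  # None becomes an explicit JSON null
    {"k": 6, "v1": 999, "v3": 777},
    {"k": 2, "v4": 222},
    {"k": 1, "v2": 111, "v3": 111},
]

text = to_json_lines(rows)
# To write the file used by the load: open("test1.json", "w").write(text)
```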

```bash
curl --location-trusted -u root: \
-H "strict_mode:false" \
-H "format:json" \
-H "read_json_by_line:true" \
-H "unique_key_update_mode:UPDATE_FLEXIBLE_COLUMNS" \
-T test1.json \
-XPUT http://<host>:<http_port>/api/d1/t1/_stream_load
```

```sql
MySQL root@127.1:d1> select * from t1;
+---+-----+------+-----+------+--------+
| k | v1  | v2   | v3  | v4   | v5     |
+---+-----+------+-----+------+--------+
| 0 | 0   | 0    | 0   | 0    | 0      |
| 1 | 10  | 111  | 111 | 1    | 1      |
| 2 | 2   | 20   | 2   | 222  | 25     |
| 3 | 3   | 3    | 30  | 3    | 3      |
| 4 | 43  | 4    | 99  | 20   | 4      |
| 5 | 5   | 5    | 5   | 5    | <null> |
| 6 | 999 | 9876 | 777 | 1234 | <null> |
+---+-----+------+-----+------+--------+
```
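The result above can be reproduced with a small in-memory model. This is only a sketch of the behavior visible in this example, not Doris's implementation: the names `COLUMNS`, `initial_value`, and `apply_flexible_update` are hypothetical, and raising an error for a new row that omits a NOT NULL column without a default is an assumption, since the example does not cover that case.

```python
import json

# Column metadata copied from the CREATE TABLE statement in this example:
# name -> (nullable, default). A default of None means no DEFAULT clause.
COLUMNS = {
    "v1": (True, None),
    "v2": (True, 9876),
    "v3": (False, None),
    "v4": (False, 1234),
    "v5": (True, None),
}

def initial_value(col):
    """Value for a column that a brand-new row leaves unspecified."""
    nullable, default = COLUMNS[col]
    if default is not None:
        return default
    if nullable:
        return None
    # Assumption: the example never exercises this case.
    raise ValueError(f"{col} is NOT NULL, has no default, and was not supplied")

def apply_flexible_update(table, json_lines):
    """Apply rows in order; each row touches only the columns it names."""
    for line in json_lines:
        row = json.loads(line)
        key = row.pop("k")
        if key in table:
            table[key].update(row)  # existing row: keep unspecified columns
        else:
            table[key] = {
                col: row[col] if col in row else initial_value(col)
                for col in COLUMNS
            }
    return table

TEST1_JSON = """\
{"k": 1, "v1": 10}
{"k": 2, "v2": 20, "v5": 25}
{"k": 3, "v3": 30}
{"k": 4, "v4": 20, "v1": 43, "v3": 99}
{"k": 5, "v5": null}
{"k": 6, "v1": 999, "v3": 777}
{"k": 2, "v4": 222}
{"k": 1, "v2": 111, "v3": 111}
"""

# Rows 0..5 start with every value column equal to the key, as in the INSERT.
table = {n: dict.fromkeys(COLUMNS, n) for n in range(6)}
result = apply_flexible_update(table, TEST1_JSON.splitlines())
```

In this model, the two lines for `k = 1` and the two for `k = 2` are merged in file order, and the new row `k = 6` takes `9876` and `1234` from the column defaults and NULL for `v5`, which matches the table printed above.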

fix compile

pick [branch-2.1] Picks "[opt](partial update) Allow to only specify key columns in partial update apache#40736" (apache#40863)

picks apache#40736

fix
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 15, 2024
…update apache#39619

bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 16, 2024
…eam load with json files (apache#39756)

fix compile

fix
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 21, 2024
dataroaring pushed a commit to apache/doris-website that referenced this pull request Oct 29, 2024
Doc for apache/doris#39756

# Versions 

- [x] dev
- [ ] 3.0
- [ ] 2.1
- [ ] 2.0

# Languages

- [x] Chinese
- [x] English
@Carl-Zhou-CN
Member

@dataroaring @zhannngchen Hi, is there any plan to release this PR? What risks are involved?

@bobhan1
Contributor Author

bobhan1 commented Jun 25, 2025

@Carl-Zhou-CN the feature will probably be in doris 4.0

@Carl-Zhou-CN
Member

> @Carl-Zhou-CN the feature will probably be in doris 4.0

Thank you very much for your response. I currently need this functionality urgently. If I merge and use it myself, what issues should I be aware of?

@bobhan1
Contributor Author

bobhan1 commented Jun 25, 2025

> > @Carl-Zhou-CN the feature will probably be in doris 4.0
>
> Thank you very much for your response. I currently need this functionality urgently. If I merge and use it myself, what issues should I be aware of?

This PR has problems, and some of them are fixed in #41701.
This feature may require high IOPS and I/O throughput.

@Carl-Zhou-CN
Member

> > > @Carl-Zhou-CN the feature will probably be in doris 4.0
> >
> > Thank you very much for your response. I currently need this functionality urgently. If I merge and use it myself, what issues should I be aware of?
>
> This PR has problems, and some of them are fixed in #41701. This feature may require high IOPS and I/O throughput.

Ok. Thank you.

bobhan1 added a commit to bobhan1/doris that referenced this pull request Jul 4, 2025
…oad with json files (apache#39756)

bobhan1 added a commit to bobhan1/doris that referenced this pull request Jul 7, 2025
…oad with json files (apache#39756)

bobhan1 added a commit to bobhan1/doris that referenced this pull request Jul 7, 2025
…oad with json files (apache#39756)

bobhan1 added a commit to bobhan1/doris that referenced this pull request Jul 8, 2025
…oad with json files (apache#39756)

bobhan1 added a commit to bobhan1/doris that referenced this pull request Jul 8, 2025
…oad with json files (apache#39756)

bobhan1 added a commit to bobhan1/doris that referenced this pull request Jul 8, 2025
…oad with json files (apache#39756)

morrySnow pushed a commit that referenced this pull request Jul 8, 2025

Labels

approved Indicates a PR has been approved by one committer. dev/3.1.0-merged reviewed

6 participants