Skip to content

Conversation

@wsjz
Copy link
Contributor

@wsjz wsjz commented Aug 25, 2023

Proposed changes

support nereids load grammar.

we will convert the broker load stmt to insert into clause:

  1. rename broker load to bulk load.
  2. add load grammar to nereids optimizer.
  3. convert to insert into clause with table value function.

#24221

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions
Copy link
Contributor

sh-checker report

To get the full details, please check in the job output.

shellcheck errors
'shellcheck ' found no issues.

shfmt errors

'shfmt ' returned error 1 finding the following formatting issues:

----------
--- bin/start_be.sh.orig
+++ bin/start_be.sh
@@ -82,13 +82,13 @@
 preload_jars+=("java-udf")
 
 for preload_jar_dir in "${preload_jars[@]}"; do
-  for f in "${DORIS_HOME}/lib/java_extensions/${preload_jar_dir}"/*.jar; do
-    if [[ -z "${DORIS_CLASSPATH}" ]]; then
-        export DORIS_CLASSPATH="${f}"
-    else
-        export DORIS_CLASSPATH="${DORIS_CLASSPATH}:${f}"
-    fi
-  done
+    for f in "${DORIS_HOME}/lib/java_extensions/${preload_jar_dir}"/*.jar; do
+        if [[ -z "${DORIS_CLASSPATH}" ]]; then
+            export DORIS_CLASSPATH="${f}"
+        else
+            export DORIS_CLASSPATH="${DORIS_CLASSPATH}:${f}"
+        fi
+    done
 done
 
 if [[ -d "${DORIS_HOME}/lib/hadoop_hdfs/" ]]; then
----------

You can reformat the above files to meet shfmt's requirements by typing:

  shfmt  -w filename


@github-actions
Copy link
Contributor

sh-checker report

To get the full details, please check in the job output.

shellcheck errors
'shellcheck ' found no issues.

shfmt errors

'shfmt ' returned error 1 finding the following formatting issues:

----------
--- bin/start_be.sh.orig
+++ bin/start_be.sh
@@ -82,13 +82,13 @@
 preload_jars+=("java-udf")
 
 for preload_jar_dir in "${preload_jars[@]}"; do
-  for f in "${DORIS_HOME}/lib/java_extensions/${preload_jar_dir}"/*.jar; do
-    if [[ -z "${DORIS_CLASSPATH}" ]]; then
-        export DORIS_CLASSPATH="${f}"
-    else
-        export DORIS_CLASSPATH="${DORIS_CLASSPATH}:${f}"
-    fi
-  done
+    for f in "${DORIS_HOME}/lib/java_extensions/${preload_jar_dir}"/*.jar; do
+        if [[ -z "${DORIS_CLASSPATH}" ]]; then
+            export DORIS_CLASSPATH="${f}"
+        else
+            export DORIS_CLASSPATH="${DORIS_CLASSPATH}:${f}"
+        fi
+    done
 done
 
 if [[ -d "${DORIS_HOME}/lib/hadoop_hdfs/" ]]; then
----------

You can reformat the above files to meet shfmt's requirements by typing:

  shfmt  -w filename


@github-actions
Copy link
Contributor

sh-checker report

To get the full details, please check in the job output.

shellcheck errors
'shellcheck ' found no issues.

shfmt errors

'shfmt ' returned error 1 finding the following formatting issues:

----------
--- bin/start_be.sh.orig
+++ bin/start_be.sh
@@ -82,13 +82,13 @@
 preload_jars+=("java-udf")
 
 for preload_jar_dir in "${preload_jars[@]}"; do
-  for f in "${DORIS_HOME}/lib/java_extensions/${preload_jar_dir}"/*.jar; do
-    if [[ -z "${DORIS_CLASSPATH}" ]]; then
-        export DORIS_CLASSPATH="${f}"
-    else
-        export DORIS_CLASSPATH="${DORIS_CLASSPATH}:${f}"
-    fi
-  done
+    for f in "${DORIS_HOME}/lib/java_extensions/${preload_jar_dir}"/*.jar; do
+        if [[ -z "${DORIS_CLASSPATH}" ]]; then
+            export DORIS_CLASSPATH="${f}"
+        else
+            export DORIS_CLASSPATH="${DORIS_CLASSPATH}:${f}"
+        fi
+    done
 done
 
 if [[ -d "${DORIS_HOME}/lib/hadoop_hdfs/" ]]; then
----------

You can reformat the above files to meet shfmt's requirements by typing:

  shfmt  -w filename


@github-actions
Copy link
Contributor

sh-checker report

To get the full details, please check in the job output.

shellcheck errors
'shellcheck ' found no issues.

shfmt errors

'shfmt ' returned error 1 finding the following formatting issues:

----------
--- bin/start_be.sh.orig
+++ bin/start_be.sh
@@ -82,13 +82,13 @@
 preload_jars+=("java-udf")
 
 for preload_jar_dir in "${preload_jars[@]}"; do
-  for f in "${DORIS_HOME}/lib/java_extensions/${preload_jar_dir}"/*.jar; do
-    if [[ -z "${DORIS_CLASSPATH}" ]]; then
-        export DORIS_CLASSPATH="${f}"
-    else
-        export DORIS_CLASSPATH="${DORIS_CLASSPATH}:${f}"
-    fi
-  done
+    for f in "${DORIS_HOME}/lib/java_extensions/${preload_jar_dir}"/*.jar; do
+        if [[ -z "${DORIS_CLASSPATH}" ]]; then
+            export DORIS_CLASSPATH="${f}"
+        else
+            export DORIS_CLASSPATH="${DORIS_CLASSPATH}:${f}"
+        fi
+    done
 done
 
 if [[ -d "${DORIS_HOME}/lib/hadoop_hdfs/" ]]; then
----------

You can reformat the above files to meet shfmt's requirements by typing:

  shfmt  -w filename


@github-actions
Copy link
Contributor

sh-checker report

To get the full details, please check in the job output.

shellcheck errors
'shellcheck ' found no issues.

shfmt errors

'shfmt ' returned error 1 finding the following formatting issues:

----------
--- bin/start_be.sh.orig
+++ bin/start_be.sh
@@ -82,13 +82,13 @@
 preload_jars+=("java-udf")
 
 for preload_jar_dir in "${preload_jars[@]}"; do
-  for f in "${DORIS_HOME}/lib/java_extensions/${preload_jar_dir}"/*.jar; do
-    if [[ -z "${DORIS_CLASSPATH}" ]]; then
-        export DORIS_CLASSPATH="${f}"
-    else
-        export DORIS_CLASSPATH="${DORIS_CLASSPATH}:${f}"
-    fi
-  done
+    for f in "${DORIS_HOME}/lib/java_extensions/${preload_jar_dir}"/*.jar; do
+        if [[ -z "${DORIS_CLASSPATH}" ]]; then
+            export DORIS_CLASSPATH="${f}"
+        else
+            export DORIS_CLASSPATH="${DORIS_CLASSPATH}:${f}"
+        fi
+    done
 done
 
 if [[ -d "${DORIS_HOME}/lib/hadoop_hdfs/" ]]; then
----------

You can reformat the above files to meet shfmt's requirements by typing:

  shfmt  -w filename


@github-actions
Copy link
Contributor

sh-checker report

To get the full details, please check in the job output.

shellcheck errors
'shellcheck ' found no issues.

shfmt errors

'shfmt ' returned error 1 finding the following formatting issues:

----------
--- bin/start_be.sh.orig
+++ bin/start_be.sh
@@ -82,13 +82,13 @@
 preload_jars+=("java-udf")
 
 for preload_jar_dir in "${preload_jars[@]}"; do
-  for f in "${DORIS_HOME}/lib/java_extensions/${preload_jar_dir}"/*.jar; do
-    if [[ -z "${DORIS_CLASSPATH}" ]]; then
-        export DORIS_CLASSPATH="${f}"
-    else
-        export DORIS_CLASSPATH="${DORIS_CLASSPATH}:${f}"
-    fi
-  done
+    for f in "${DORIS_HOME}/lib/java_extensions/${preload_jar_dir}"/*.jar; do
+        if [[ -z "${DORIS_CLASSPATH}" ]]; then
+            export DORIS_CLASSPATH="${f}"
+        else
+            export DORIS_CLASSPATH="${DORIS_CLASSPATH}:${f}"
+        fi
+    done
 done
 
 if [[ -d "${DORIS_HOME}/lib/hadoop_hdfs/" ]]; then
----------

You can reformat the above files to meet shfmt's requirements by typing:

  shfmt  -w filename


}

private String rewriteByPrecedingFilter(Expression whereExpr, Expression precedingFilterExpr) {
return whereExpr.toSql();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that toSql() to recover the origin sql string exactly.
How about just save the origin sql string when parsing the Load Stmt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no effects

@wsjz
Copy link
Contributor Author

wsjz commented Sep 4, 2023

run buildall

@wsjz wsjz marked this pull request as ready for review September 5, 2023 06:14
@wsjz wsjz changed the title [feature](Load)support nereids load [feature](Load)(step1)support nereids load, add load grammar Sep 5, 2023
@wsjz
Copy link
Contributor Author

wsjz commented Sep 5, 2023

run buildall

@wsjz
Copy link
Contributor Author

wsjz commented Sep 7, 2023

run buildall

@wsjz
Copy link
Contributor Author

wsjz commented Sep 7, 2023

run buildall

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.87 seconds
stream load tsv: 527 seconds loaded 74807831229 Bytes, about 135 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17161920596 Bytes

@wsjz
Copy link
Contributor Author

wsjz commented Sep 8, 2023

run buildall

@wsjz
Copy link
Contributor Author

wsjz commented Sep 18, 2023

run buildall

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.19 seconds
stream load tsv: 619 seconds loaded 74807831229 Bytes, about 115 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162623572 Bytes

@wsjz
Copy link
Contributor Author

wsjz commented Sep 18, 2023

run p0

@wsjz
Copy link
Contributor Author

wsjz commented Sep 18, 2023

run buildall

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.61 seconds
stream load tsv: 613 seconds loaded 74807831229 Bytes, about 116 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.5 seconds inserted 10000000 Rows, about 350K ops/s
storage size: 17162186050 Bytes

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.45 seconds
stream load tsv: 620 seconds loaded 74807831229 Bytes, about 115 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162257137 Bytes

@wsjz
Copy link
Contributor Author

wsjz commented Sep 18, 2023

run buildall

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 49.27 seconds
stream load tsv: 616 seconds loaded 74807831229 Bytes, about 115 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162495225 Bytes

@wsjz
Copy link
Contributor Author

wsjz commented Sep 18, 2023

run buildall

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.09 seconds
stream load tsv: 596 seconds loaded 74807831229 Bytes, about 119 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162110312 Bytes

@wsjz
Copy link
Contributor Author

wsjz commented Sep 18, 2023

run buildall

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.78 seconds
stream load tsv: 595 seconds loaded 74807831229 Bytes, about 119 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17162426406 Bytes

@wsjz
Copy link
Contributor Author

wsjz commented Sep 19, 2023

run buildall

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 48.44 seconds
stream load tsv: 600 seconds loaded 74807831229 Bytes, about 118 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17162456442 Bytes

@wsjz
Copy link
Contributor Author

wsjz commented Sep 20, 2023

run buildall

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.57 seconds
stream load tsv: 616 seconds loaded 74807831229 Bytes, about 115 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 29.3 seconds inserted 10000000 Rows, about 341K ops/s
storage size: 17162201019 Bytes

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Sep 20, 2023
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@morningman morningman merged commit 677b342 into apache:master Sep 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants