@kaka11chen kaka11chen commented Nov 24, 2023

Proposed changes

Optimize gzip decompression with libdeflate on x86 and x86_64 platforms. Part 1: add the libdeflate library.

Test result:

  • env: 1 node (16 cores, 64 GB memory).
  • parquet column: 100 million rows in a single char(255) column.
  • result: 9.09 s -> 6.04 s (about 1.5x faster).

@kaka11chen
Contributor Author

run buildall

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

defined(__i386) || defined(_M_IX86)
class GzipBlockCompressionByLibdeflate final : public GzipBlockCompression {
public:
GzipBlockCompressionByLibdeflate() : GzipBlockCompression() {}

warning: use '= default' to define a trivial default constructor [modernize-use-equals-default]

Suggested change
GzipBlockCompressionByLibdeflate() : GzipBlockCompression() {}
GzipBlockCompressionByLibdeflate() = default;

}
~GzipBlockCompressionByLibdeflate() override = default;

Status decompress(const Slice& input, Slice* output) override {

warning: method 'decompress' can be made static [readability-convert-member-functions-to-static]

Suggested change
Status decompress(const Slice& input, Slice* output) override {
static Status decompress(const Slice& input, Slice* output) override {
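
A minimal, hypothetical sketch of how the two hints above play out in standard C++: a defaulted constructor is written without a member-initializer list, and a function marked `override` is virtual, so it cannot also be declared `static`. The class names, the `backend()` method, and the `SKETCH_IS_X86` macro are illustrative assumptions and not Doris code. The set of predefined macros used to detect x86/x86_64 is likewise an assumption about common compilers, not the guard used in this PR.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Assumed compile-time check for x86 / x86_64 targets (GCC, Clang, MSVC macros).
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
#define SKETCH_IS_X86 1
#else
#define SKETCH_IS_X86 0
#endif

// Hypothetical stand-in for the generic gzip codec.
class GzipCodecSketch {
public:
    GzipCodecSketch() = default;
    virtual ~GzipCodecSketch() = default;
    // Reports which backend this codec would use.
    virtual std::string backend() const { return "zlib"; }
};

#if SKETCH_IS_X86
// Hypothetical x86-only variant, mirroring the shape of the reviewed class.
class GzipCodecByLibdeflateSketch final : public GzipCodecSketch {
public:
    // Defaulted constructor: no ": Base()" initializer list before "= default".
    GzipCodecByLibdeflateSketch() = default;
    ~GzipCodecByLibdeflateSketch() override = default;
    // An override is a virtual member function, so it stays non-static.
    std::string backend() const override { return "libdeflate"; }
};
#endif

// Picks the x86-specific codec when the build targets x86, else the generic one.
std::unique_ptr<GzipCodecSketch> make_gzip_codec() {
#if SKETCH_IS_X86
    return std::make_unique<GzipCodecByLibdeflateSketch>();
#else
    return std::make_unique<GzipCodecSketch>();
#endif
}

// Sanity check: the factory picks the backend that matches the build target.
void check_backend_selection() {
    const std::string expected = SKETCH_IS_X86 ? "libdeflate" : "zlib";
    assert(make_gzip_codec()->backend() == expected);
}
```

Selecting the class at compile time keeps non-x86 builds free of any reference to the x86-only type, which is one plausible reason for guarding the class with a preprocessor condition.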

@github-actions
Contributor

sh-checker report

To get the full details, please check in the job output.

shellcheck errors

'shellcheck ' returned error 1 finding the following syntactical issues:

----------

In thirdparty/build-thirdparty.sh line 139:
. "${TP_DIR}/vars.sh"
  ^-----------------^ SC1094 (warning): Parsing of sourced file failed. Ignoring it.


In thirdparty/build-thirdparty.sh line 143:
if [[ "${CLEAN}" -eq 1 ]] && [[ -d "${TP_SOURCE_DIR}" ]]; then
                                    ^--------------^ SC2154 (warning): TP_SOURCE_DIR is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 283:
        find "${TP_INSTALL_DIR}/lib64" -name "*.dylib" -delete
              ^---------------^ SC2154 (warning): TP_INSTALL_DIR is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 307:
        if [[ ! -f "${TP_LIB_DIR}/$1" ]]; then
                    ^-----------^ SC2154 (warning): TP_LIB_DIR is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 317:
    check_if_source_exist "${LIBBACKTRACE_SOURCE}"
                           ^--------------------^ SC2154 (warning): LIBBACKTRACE_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 320:
    CPPFLAGS="-I${TP_INCLUDE_DIR}" \
                ^---------------^ SC2154 (warning): TP_INCLUDE_DIR is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 331:
    check_if_source_exist "${LIBEVENT_SOURCE}"
                           ^----------------^ SC2154 (warning): LIBEVENT_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 359:
    check_if_source_exist "${OPENSSL_SOURCE}"
                           ^---------------^ SC2154 (warning): OPENSSL_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 384:
    check_if_source_exist "${THRIFT_SOURCE}"
                           ^--------------^ SC2154 (warning): THRIFT_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 417:
    check_if_source_exist "${PROTOBUF_SOURCE}"
                           ^----------------^ SC2154 (warning): PROTOBUF_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 449:
    check_if_source_exist "${GFLAGS_SOURCE}"
                           ^--------------^ SC2154 (warning): GFLAGS_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 467:
    check_if_source_exist "${GLOG_SOURCE}"
                           ^------------^ SC2154 (warning): GLOG_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 497:
    check_if_source_exist "${GTEST_SOURCE}"
                           ^-------------^ SC2154 (warning): GTEST_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 515:
    check_if_source_exist "${RAPIDJSON_SOURCE}"
                           ^-----------------^ SC2154 (warning): RAPIDJSON_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 532:
    check_if_source_exist "${SNAPPY_SOURCE}"
                           ^--------------^ SC2154 (warning): SNAPPY_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 557:
    check_if_source_exist "${GPERFTOOLS_SOURCE}"
                           ^------------------^ SC2154 (warning): GPERFTOOLS_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 576:
    check_if_source_exist "${ZLIB_SOURCE}"
                           ^------------^ SC2154 (warning): ZLIB_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 597:
    check_if_source_exist "${LZ4_SOURCE}"
                           ^-----------^ SC2154 (warning): LZ4_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 611:
    check_if_source_exist "${ZSTD_SOURCE}"
                           ^------------^ SC2154 (warning): ZSTD_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 626:
    check_if_source_exist "${BZIP_SOURCE}"
                           ^------------^ SC2154 (warning): BZIP_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 634:
    check_if_source_exist "${LZO2_SOURCE}"
                           ^------------^ SC2154 (warning): LZO2_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 648:
    check_if_source_exist "${CURL_SOURCE}"
                           ^------------^ SC2154 (warning): CURL_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 671:
    check_if_source_exist "${RE2_SOURCE}"
                           ^-----------^ SC2154 (warning): RE2_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 682:
    check_if_source_exist "${RAGEL_SOURCE}"
                           ^-------------^ SC2154 (warning): RAGEL_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 695:
    check_if_source_exist "${HYPERSCAN_SOURCE}"
                           ^-----------------^ SC2154 (warning): HYPERSCAN_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 713:
    check_if_source_exist "${BOOST_SOURCE}"
                           ^-------------^ SC2154 (warning): BOOST_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 732:
    check_if_source_exist "${MYSQL_SOURCE}"
                           ^-------------^ SC2154 (warning): MYSQL_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 778:
    check_if_source_exist "${LEVELDB_SOURCE}"
                           ^---------------^ SC2154 (warning): LEVELDB_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 794:
    check_if_source_exist "${BRPC_SOURCE}"
                           ^------------^ SC2154 (warning): BRPC_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 834:
    check_if_source_exist "${ROCKSDB_SOURCE}"
                           ^---------------^ SC2154 (warning): ROCKSDB_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 861:
    check_if_source_exist "${CYRUS_SASL_SOURCE}"
                           ^------------------^ SC2154 (warning): CYRUS_SASL_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 881:
    check_if_source_exist "${LIBRDKAFKA_SOURCE}"
                           ^------------------^ SC2154 (warning): LIBRDKAFKA_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 904:
    check_if_source_exist "${ODBC_SOURCE}"
                           ^------------^ SC2154 (warning): ODBC_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 918:
    check_if_source_exist "${FLATBUFFERS_SOURCE}"
                           ^-------------------^ SC2154 (warning): FLATBUFFERS_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 947:
    check_if_source_exist "${CARES_SOURCE}"
                           ^-------------^ SC2154 (warning): CARES_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 963:
    check_if_source_exist "${GRPC_SOURCE}"
                           ^------------^ SC2154 (warning): GRPC_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 997:
    check_if_source_exist "${ARROW_SOURCE}"
                           ^-------------^ SC2154 (warning): ARROW_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1003:
    export ARROW_BROTLI_URL="${TP_SOURCE_DIR}/${BROTLI_NAME}"
                                              ^------------^ SC2154 (warning): BROTLI_NAME is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1004:
    export ARROW_GLOG_URL="${TP_SOURCE_DIR}/${GLOG_NAME}"
                                            ^----------^ SC2154 (warning): GLOG_NAME is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1005:
    export ARROW_LZ4_URL="${TP_SOURCE_DIR}/${LZ4_NAME}"
                                           ^---------^ SC2154 (warning): LZ4_NAME is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1006:
    export ARROW_FLATBUFFERS_URL="${TP_SOURCE_DIR}/${FLATBUFFERS_NAME}"
                                                   ^-----------------^ SC2154 (warning): FLATBUFFERS_NAME is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1007:
    export ARROW_ZSTD_URL="${TP_SOURCE_DIR}/${ZSTD_NAME}"
                                            ^----------^ SC2154 (warning): ZSTD_NAME is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1008:
    export ARROW_JEMALLOC_URL="${TP_SOURCE_DIR}/${JEMALLOC_ARROW_NAME}"
                                                ^--------------------^ SC2154 (warning): JEMALLOC_ARROW_NAME is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1009:
    export ARROW_Thrift_URL="${TP_SOURCE_DIR}/${THRIFT_NAME}"
                                              ^------------^ SC2154 (warning): THRIFT_NAME is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1010:
    export ARROW_SNAPPY_URL="${TP_SOURCE_DIR}/${SNAPPY_NAME}"
                                              ^------------^ SC2154 (warning): SNAPPY_NAME is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1011:
    export ARROW_ZLIB_URL="${TP_SOURCE_DIR}/${ZLIB_NAME}"
                                            ^----------^ SC2154 (warning): ZLIB_NAME is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1012:
    export ARROW_XSIMD_URL="${TP_SOURCE_DIR}/${XSIMD_NAME}"
                                             ^-----------^ SC2154 (warning): XSIMD_NAME is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1013:
    export ARROW_ORC_URL="${TP_SOURCE_DIR}/${ORC_NAME}"
                                           ^---------^ SC2154 (warning): ORC_NAME is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1014:
    export ARROW_GRPC_URL="${TP_SOURCE_DIR}/${GRPC_NAME}"
                                            ^----------^ SC2154 (warning): GRPC_NAME is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1015:
    export ARROW_PROTOBUF_URL="${TP_SOURCE_DIR}/${PROTOBUF_NAME}"
                                                ^--------------^ SC2154 (warning): PROTOBUF_NAME is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1080:
    check_if_source_exist "${ABSEIL_SOURCE}"
                           ^--------------^ SC2154 (warning): ABSEIL_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1098:
    check_if_source_exist "${S2_SOURCE}"
                           ^----------^ SC2154 (warning): S2_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1121:
    check_if_source_exist "${BITSHUFFLE_SOURCE}"
                           ^------------------^ SC2154 (warning): BITSHUFFLE_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1199:
    check_if_source_exist "${CROARINGBITMAP_SOURCE}"
                           ^----------------------^ SC2154 (warning): CROARINGBITMAP_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1224:
    check_if_source_exist "${FMT_SOURCE}"
                           ^-----------^ SC2154 (warning): FMT_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1239:
    check_if_source_exist "${PARALLEL_HASHMAP_SOURCE}"
                           ^------------------------^ SC2154 (warning): PARALLEL_HASHMAP_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1246:
    check_if_archive_exist "${PDQSORT_FILE}"
                            ^-------------^ SC2154 (warning): PDQSORT_FILE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1253:
    check_if_source_exist "${LIBDIVIDE_SOURCE}"
                           ^-----------------^ SC2154 (warning): LIBDIVIDE_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1260:
    check_if_source_exist "${ORC_SOURCE}"
                           ^-----------^ SC2154 (warning): ORC_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1289:
    check_if_source_exist "${CCTZ_SOURCE}"
                           ^------------^ SC2154 (warning): CCTZ_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1303:
    check_if_source_exist "${DATATABLES_SOURCE}"
                           ^------------------^ SC2154 (warning): DATATABLES_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1318:
    if [[ ! -f "${TSAN_HEADER_FILE}" ]]; then
                ^-----------------^ SC2154 (warning): TSAN_HEADER_FILE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1329:
    check_if_source_exist "${AWS_SDK_SOURCE}"
                           ^---------------^ SC2154 (warning): AWS_SDK_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1370:
    check_if_source_exist "${LZMA_SOURCE}"
                           ^------------^ SC2154 (warning): LZMA_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1394:
    check_if_source_exist "${XML2_SOURCE}"
                           ^------------^ SC2154 (warning): XML2_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1416:
    check_if_source_exist "${IDN_SOURCE}"
                           ^-----------^ SC2154 (warning): IDN_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1430:
    check_if_source_exist "${GSASL_SOURCE}"
                           ^-------------^ SC2154 (warning): GSASL_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1446:
    check_if_source_exist "${KRB5_SOURCE}"
                           ^------------^ SC2154 (warning): KRB5_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1466:
    check_if_source_exist "${HDFS3_SOURCE}"
                           ^-------------^ SC2154 (warning): HDFS3_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1497:
    check_if_source_exist "${JEMALLOC_DORIS_SOURCE}"
                           ^----------------------^ SC2154 (warning): JEMALLOC_DORIS_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1517:
        check_if_source_exist "${LIBUNWIND_SOURCE}"
                               ^-----------------^ SC2154 (warning): LIBUNWIND_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1539:
    check_if_source_exist "${BENCHMARK_SOURCE}"
                           ^-----------------^ SC2154 (warning): BENCHMARK_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1564:
    check_if_source_exist "${SIMDJSON_SOURCE}"
                           ^----------------^ SC2154 (warning): SIMDJSON_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1582:
    check_if_source_exist "${NLOHMANN_JSON_SOURCE}"
                           ^---------------------^ SC2154 (warning): NLOHMANN_JSON_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1596:
    check_if_source_exist "${SSE2NEON_SOURCE}"
                           ^----------------^ SC2154 (warning): SSE2NEON_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1603:
    check_if_source_exist "${XXHASH_SOURCE}"
                           ^--------------^ SC2154 (warning): XXHASH_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1612:
    check_if_source_exist "${BINUTILS_SOURCE}"
                           ^----------------^ SC2154 (warning): BINUTILS_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1626:
    check_if_source_exist "${GETTEXT_SOURCE}"
                           ^---------------^ SC2154 (warning): GETTEXT_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1643:
    check_if_source_exist "${CONCURRENTQUEUE_SOURCE}"
                           ^-----------------------^ SC2154 (warning): CONCURRENTQUEUE_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1650:
    check_if_source_exist "${FAST_FLOAT_SOURCE}"
                           ^------------------^ SC2154 (warning): FAST_FLOAT_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1657:
    check_if_source_exist "${HADOOP_LIBS_SOURCE}"
                           ^-------------------^ SC2154 (warning): HADOOP_LIBS_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1673:
    check_if_source_exist "${DRAGONBOX_SOURCE}"
                           ^-----------------^ SC2154 (warning): DRAGONBOX_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1688:
    check_if_source_exist "${AVX2NEON_SOURCE}"
                           ^----------------^ SC2154 (warning): AVX2NEON_SOURCE is referenced but not assigned.


In thirdparty/build-thirdparty.sh line 1696:
    check_if_source_exist "${LIBDEFLATE_SOURCE}"
                           ^------------------^ SC2154 (warning): LIBDEFLATE_SOURCE is referenced but not assigned.


In thirdparty/download-thirdparty.sh line 49:
. "${TP_DIR}/vars.sh"
  ^-----------------^ SC1094 (warning): Parsing of sourced file failed. Ignoring it.


In thirdparty/download-thirdparty.sh line 133:
for TP_ARCH in "${TP_ARCHIVES[@]}"; do
                ^---------------^ SC2154 (warning): TP_ARCHIVES is referenced but not assigned.


In thirdparty/download-thirdparty.sh line 138:
        if ! download_func "${!NAME}" "${!URL}" "${TP_SOURCE_DIR}" "${!MD5SUM}"; then
                                                 ^--------------^ SC2154 (warning): TP_SOURCE_DIR is referenced but not assigned.


In thirdparty/download-thirdparty.sh line 225:
cd "${TP_SOURCE_DIR}/${ABSEIL_SOURCE}"
                     ^--------------^ SC2154 (warning): ABSEIL_SOURCE is referenced but not assigned.


In thirdparty/download-thirdparty.sh line 227:
    patch -p1 <"${TP_PATCH_DIR}/absl.patch"
                ^-------------^ SC2154 (warning): TP_PATCH_DIR is referenced but not assigned.


In thirdparty/download-thirdparty.sh line 234:
if [[ "${GLOG_SOURCE}" == "glog-0.4.0" ]]; then
       ^------------^ SC2154 (warning): GLOG_SOURCE is referenced but not assigned.


In thirdparty/download-thirdparty.sh line 252:
cd "${TP_SOURCE_DIR}/${GTEST_SOURCE}"
                     ^-------------^ SC2154 (warning): GTEST_SOURCE is referenced but not assigned.


In thirdparty/download-thirdparty.sh line 261:
cd "${TP_SOURCE_DIR}/${MYSQL_SOURCE}"
                     ^-------------^ SC2154 (warning): MYSQL_SOURCE is referenced but not assigned.


In thirdparty/download-thirdparty.sh line 270:
cd "${TP_SOURCE_DIR}/${LIBEVENT_SOURCE}"
                     ^----------------^ SC2154 (warning): LIBEVENT_SOURCE is referenced but not assigned.


In thirdparty/download-thirdparty.sh line 280:
cd "${TP_SOURCE_DIR}/${GSASL_SOURCE}"
                     ^-------------^ SC2154 (warning): GSASL_SOURCE is referenced but not assigned.


In thirdparty/download-thirdparty.sh line 290:
cd "${TP_SOURCE_DIR}/${CYRUS_SASL_SOURCE}"
                     ^------------------^ SC2154 (warning): CYRUS_SASL_SOURCE is referenced but not assigned.


In thirdparty/download-thirdparty.sh line 299:
cd "${TP_SOURCE_DIR}/${ODBC_SOURCE}"
                     ^------------^ SC2154 (warning): ODBC_SOURCE is referenced but not assigned.


In thirdparty/download-thirdparty.sh line 308:
if [[ "${ROCKSDB_SOURCE}" == "rocksdb-5.14.2" ]]; then
       ^---------------^ SC2154 (warning): ROCKSDB_SOURCE is referenced but not assigned.


In thirdparty/download-thirdparty.sh line 319:
if [[ "${ARROW_SOURCE}" == "arrow-apache-arrow-13.0.0" ]]; then
       ^-------------^ SC2154 (warning): ARROW_SOURCE is referenced but not assigned.


In thirdparty/download-thirdparty.sh line 330:
if [[ "${LIBRDKAFKA_SOURCE}" == "librdkafka-1.8.2" ]]; then
       ^------------------^ SC2154 (warning): LIBRDKAFKA_SOURCE is referenced but not assigned.


In thirdparty/download-thirdparty.sh line 341:
if [[ "${JEMALLOC_DORIS_SOURCE}" = "jemalloc-5.3.0" ]]; then
       ^----------------------^ SC2154 (warning): JEMALLOC_DORIS_SOURCE is referenced but not assigned.


In thirdparty/download-thirdparty.sh line 353:
if [[ "${HYPERSCAN_SOURCE}" == "vectorscan-vectorscan-5.4.7" ]]; then
       ^-----------------^ SC2154 (warning): HYPERSCAN_SOURCE is referenced but not assigned.


In thirdparty/download-thirdparty.sh line 363:
cd "${TP_SOURCE_DIR}/${AWS_SDK_SOURCE}"
                     ^---------------^ SC2154 (warning): AWS_SDK_SOURCE is referenced but not assigned.


In thirdparty/download-thirdparty.sh line 381:
if [[ "${SIMDJSON_SOURCE}" = "simdjson-3.0.1" ]]; then
       ^----------------^ SC2154 (warning): SIMDJSON_SOURCE is referenced but not assigned.


In thirdparty/download-thirdparty.sh line 391:
if [[ "${BRPC_SOURCE}" == 'brpc-1.4.0' ]]; then
       ^------------^ SC2154 (warning): BRPC_SOURCE is referenced but not assigned.


In thirdparty/vars.sh line 483:
LIBDEFLATE_NAME="libdeflate-1.19.tar.gz
^-- SC1009 (info): The mentioned syntax error was in this simple command.
                ^-- SC1078 (warning): Did you forget to close this double quoted string?


In thirdparty/vars.sh line 484:
LIBDEFLATE_SOURCE="libdeflate-1.19"
                  ^-- SC1079 (info): This is actually an end quote, but due to next char it looks suspect.


In thirdparty/vars.sh line 560:
if [[ "$(uname -s)" == 'Darwin' ]]; then
                  ^-- SC1078 (warning): Did you forget to close this double quoted string?


In thirdparty/vars.sh line 573:
    read -r -a TP_ARCHIVES <<<"${TP_ARCHIVES[*]} BINUTILS GETTEXT"
                              ^-- SC1079 (info): This is actually an end quote, but due to next char it looks suspect.
                                                                 ^-- SC1073 (error): Couldn't parse this double quoted string. Fix to allow more checks.


In thirdparty/vars.sh line 576:

^-- SC1072 (error): Expected end of double quoted string. Fix any mentioned problems and try again.

For more information:
  https://www.shellcheck.net/wiki/SC1078 -- Did you forget to close this doub...
  https://www.shellcheck.net/wiki/SC1094 -- Parsing of sourced file failed. I...
  https://www.shellcheck.net/wiki/SC2154 -- ABSEIL_SOURCE is referenced but n...
----------

You can address the above issues in one of three ways:
1. Manually correct the issue in the offending shell script;
2. Disable specific issues by adding the comment:
  # shellcheck disable=NNNN
above the line that contains the issue, where NNNN is the error code;
3. Add '-e NNNN' to the SHELLCHECK_OPTS setting in your .yml action file.



shfmt errors

'shfmt ' returned error 1 finding the following formatting issues:

----------
thirdparty/vars.sh:573:66: reached EOF without closing quote "
----------

You can reformat the above files to meet shfmt's requirements by typing:

  shfmt  -w filename


@kaka11chen
Contributor Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@morningman
Contributor

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit b9bdb279a64b5fb8a610bb3e4ec9428f0a1cde49, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4957	4635	4684	4635
q2	357	176	158	158
q3	2040	1924	1894	1894
q4	1382	1276	1238	1238
q5	3958	3926	4026	3926
q6	247	135	138	135
q7	1406	870	884	870
q8	2784	2793	2797	2793
q9	9776	9574	9422	9422
q10	3456	3545	3531	3531
q11	370	249	249	249
q12	445	298	297	297
q13	4586	3810	3813	3810
q14	319	292	285	285
q15	580	534	535	534
q16	667	582	580	580
q17	1148	979	949	949
q18	7766	7297	7254	7254
q19	1689	1678	1670	1670
q20	526	294	284	284
q21	4370	3962	3960	3960
q22	465	372	368	368
Total cold run time: 53294 ms
Total hot run time: 48842 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4582	4588	4621	4588
q2	340	228	279	228
q3	4035	3994	3984	3984
q4	2699	2682	2686	2682
q5	9499	9530	9531	9530
q6	245	121	127	121
q7	3006	2466	2484	2466
q8	4450	4477	4475	4475
q9	12922	12834	12888	12834
q10	4101	4176	4168	4168
q11	776	698	670	670
q12	998	821	823	821
q13	4288	3618	3539	3539
q14	394	345	345	345
q15	584	529	533	529
q16	734	661	669	661
q17	3849	3847	3871	3847
q18	9471	8877	8882	8877
q19	1819	1779	1788	1779
q20	2388	2058	2049	2049
q21	8812	8664	8669	8664
q22	922	795	810	795
Total cold run time: 80914 ms
Total hot run time: 77652 ms
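
The bot does not label the columns. The numbers are consistent with this reading, which is an inference and not something the report states: column 1 is the query, column 2 the first (cold) run, columns 3 and 4 two further runs, and column 5 the smaller of those two. Under that assumption, "Total cold run time" is the sum of column 2 and "Total hot run time" is the sum of column 5. A small sketch that recomputes both totals for the first table (the names `Totals`, `sum_runs`, and `kDefaultConfRuns` are hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>

struct Totals {
    long cold = 0;  // sum of the first run of every query, in ms
    long hot = 0;   // sum of the best later run of every query, in ms
};

// Rows copied from the "default conf and session variables" table above.
const std::string kDefaultConfRuns = R"(
q1 4957 4635 4684 4635
q2 357 176 158 158
q3 2040 1924 1894 1894
q4 1382 1276 1238 1238
q5 3958 3926 4026 3926
q6 247 135 138 135
q7 1406 870 884 870
q8 2784 2793 2797 2793
q9 9776 9574 9422 9422
q10 3456 3545 3531 3531
q11 370 249 249 249
q12 445 298 297 297
q13 4586 3810 3813 3810
q14 319 292 285 285
q15 580 534 535 534
q16 667 582 580 580
q17 1148 979 949 949
q18 7766 7297 7254 7254
q19 1689 1678 1670 1670
q20 526 294 284 284
q21 4370 3962 3960 3960
q22 465 372 368 368
)";

// Parses whitespace-separated rows and accumulates the two totals.
Totals sum_runs(const std::string& table) {
    Totals totals;
    std::istringstream in(table);
    std::string query;
    long run1 = 0, run2 = 0, run3 = 0, best = 0;
    while (in >> query >> run1 >> run2 >> run3 >> best) {
        // Assumed meaning of the last column: the faster of runs 2 and 3.
        assert(best == std::min(run2, run3));
        totals.cold += run1;
        totals.hot += best;
    }
    return totals;
}
```

Calling `sum_runs(kDefaultConfRuns)` yields 53294 ms cold and 48842 ms hot, matching the totals reported for that table.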

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.96 seconds
stream load tsv: 576 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 31 seconds loaded 2358488459 Bytes, about 72 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17165209795 Bytes
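
The "MB/s" figures above are consistent with bytes divided by seconds, expressed in units of 1024 * 1024 bytes and truncated to a whole number. This is inferred from the reported values and is not stated by the bot. A short sketch of that arithmetic, with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Load throughput in whole MiB/s (1 MiB = 1024 * 1024 bytes), truncated.
std::uint64_t mib_per_second(std::uint64_t bytes, std::uint64_t seconds) {
    assert(seconds > 0);
    return bytes / seconds / (1024ULL * 1024ULL);
}

// Insert rate in thousands of rows per second, truncated.
std::uint64_t kilo_rows_per_second(std::uint64_t rows, double seconds) {
    assert(seconds > 0.0);
    return static_cast<std::uint64_t>(static_cast<double>(rows) / seconds / 1000.0);
}
```

For example, `mib_per_second(74807831229, 576)` gives 123 and `kilo_rows_per_second(10000000, 28.7)` gives 348, in line with the tsv and insert-into-select lines of the report.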

@kaka11chen
Contributor Author

run buildall

@kaka11chen kaka11chen marked this pull request as ready for review November 27, 2023 09:52
@kaka11chen kaka11chen changed the title from "[Opt](compression) Opt gzip decompress by libdeflate on X86 and X86_64 platforms." to "[Opt](compression) Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 1. Add libdeflate lib." Nov 27, 2023
@morningman morningman left a comment

LGTM

@github-actions github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Nov 27, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@morningman
Contributor

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 99915c1c9855d5fd6b390d20b47fa89e59ab4e4b, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4880	4659	4648	4648
q2	359	163	159	159
q3	2052	1920	1918	1918
q4	1403	1260	1250	1250
q5	3939	3934	4022	3934
q6	257	128	130	128
q7	1448	895	879	879
q8	2828	2808	2776	2776
q9	9707	9649	9657	9649
q10	3460	3536	3496	3496
q11	390	239	240	239
q12	431	277	298	277
q13	4522	3828	3808	3808
q14	318	284	291	284
q15	577	520	518	518
q16	658	582	576	576
q17	1135	978	935	935
q18	7876	7562	7420	7420
q19	1672	1676	1689	1676
q20	568	283	298	283
q21	4424	3999	3980	3980
q22	468	377	386	377
Total cold run time: 53372 ms
Total hot run time: 49210 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4590	4586	4586	4586
q2	329	237	243	237
q3	4041	4010	4042	4010
q4	2708	2719	2721	2719
q5	9622	9565	9683	9565
q6	256	124	128	124
q7	3028	2466	2505	2466
q8	4424	4458	4453	4453
q9	13266	13141	13121	13121
q10	4057	4146	4162	4146
q11	780	668	654	654
q12	971	831	823	823
q13	4288	3566	3582	3566
q14	399	354	358	354
q15	568	516	517	516
q16	734	673	687	673
q17	3897	3937	3906	3906
q18	9584	9080	9063	9063
q19	1827	1784	1767	1767
q20	2418	2070	2081	2070
q21	8858	8638	8757	8638
q22	857	792	785	785
Total cold run time: 81502 ms
Total hot run time: 78242 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.18 seconds
stream load tsv: 562 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 26 seconds loaded 2358488459 Bytes, about 86 MB/s
stream load orc: 69 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17098463708 Bytes

@morningman morningman merged commit 6512645 into apache:master Nov 28, 2023
kaka11chen added a commit to kaka11chen/doris that referenced this pull request Nov 28, 2023
…64 platforms: 1. Add libdeflate lib. (apache#27542)

Test result:

- env: 1 node(16 cores, 64G).
- parquet column: 100 million rows of char(255) column.
- result: 9.09 s -> 6.04 s.
morningman pushed a commit that referenced this pull request Nov 28, 2023
…4 platforms: 2. Opt gzip decompression by libdeflate lib. (#27669)

Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 2. Opt gzip decompression by libdeflate after adding libdeflate lib in #27542.
morningman pushed a commit that referenced this pull request Nov 30, 2023
…64 platforms: 1. Add libdeflate lib. (#27542) (#27711)

Backport from #27542.
kaka11chen added a commit to kaka11chen/doris that referenced this pull request Nov 30, 2023
…4 platforms: 2. Opt gzip decompression by libdeflate lib. (apache#27669)

Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 2. Opt gzip decompression by libdeflate after adding libdeflate lib in apache#27542.
eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Dec 3, 2023
eldenmoon added a commit that referenced this pull request Dec 4, 2023
* [chore](case) Use correct insert stmt for cold heat separation case #27546 (#27585)

Co-authored-by: AlexYue <yj976240184@gmail.com>

* [enhance](S3) Print the error detail for every s3 operation (#27572) (#27615)

* [nereids] fix stats error when using dateTime type filter #27571 (#27577)

* [fix](planner)sort node should materialized required slots for itself #27605 (#27620)

* [fix](Nereids) non-deterministic expression should not be constant (#27606) (#27631)

* [enhancement](stats) Add process for aggstate type #27640 (#27642)

* [Fix](statistics)Fix bug and improve auto analyze. (#27626) (#27657)

1. Implement needReAnalyzeTable for ExternalTable. For now, an external table is not reanalyzed within 10 days.
2. In HiveMetastoreCache.loadPartitions, handle the empty-iterator case to avoid an index-out-of-bounds exception.
3. Wrap the show analyze handling loop in a try/catch, so that when one table fails (for example, the catalog was dropped and the table can no longer be found), the other tables are still shown.
4. For now, only OlapTable and Hive HMSExternalTable support sample analyze; throw an exception for other table types.
5. In StatisticsCollector, call constructJob after createTableLevelTaskForExternalTable to avoid an NPE.

* [profile](bugfix) should not cache profile content because the profile may not be a full profile (#27635)

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>

* [Enhance](fe) Support setting initial root password when FE firstly launch (#27438) (#27603)

* [opt](plan) only lock olap table when query plan #27639 (#27656)

bp #27639

* select coordinator node from user's tag when exec streaming load (#27106) (#27677)

* [fix](statistics)Need to recalculate health value when table row count become 0  #27673 (#27674)

backport #27673

* [fix](statistics)Fix sample min max npe bug  #27702 (#27707)

backport #27702

* [Bug](join) try fix wrong _has_null_in_build_side setted (#27684) (#27710)

* [Fix](show-load)Show load npe(userinfo is null) (#27698) (#27719)

* [pick](nereids)temporary partition is always pruned #27636 (#27722)

* [enhancement](stats) limit bq cap size for analyze task #27685 (#27687)

* [improvement](statistics) Add config for the threshold of column count for auto analyze #27713 (#27723)

* [doc](fix) k8s operator docs fix to 2.0 (#27476)

* [Improvement](planner)support select tablets with nereids optimize #23164 #23365 (#27740)

#23164
#23365

* [FIX](complextype)fix complex type hash equals (#27743)

* [fix](statistics) Fix show auto analyze missing jobs bug (#27761)

* [bugfix](topn) fix coredump in copy_column_data_to_block when nullable mismatch

Return a RuntimeError when copy_column_data_to_block hits a nullable mismatch, to avoid a core dump in input_col_ptr->filter_by_selector(sel_rowid_idx, select_size, raw_res_ptr).

The problem was reported by a Doris user, but I cannot reproduce it, so no test case has been added yet.

* [opt](stats) Use escape rather than base64 for min/max value #27746 (#27748)

* [refactor](http) disable snapshot and get_log_file api (#27724) (#27770)

* [branch-2.0](pick 27738) Warning log to trace send fragment #27738 (#27760)

* [branch-2.0](pick #27771) Add more detail msg for waitRPC exception (#27773)

* [Bug](pipeline) prevent PipelineFragmentContext destruct early (#27790)

* [deps](compression) Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 1. Add libdeflate lib.  (#27542) (#27711)

Backport from #27542.

* [FIX](case)fix case truncate table first #27792

* [doc](stats) add auto_analyze_table_width_threshold description. (#27818) (#27832)

* [fix](bdbje) Fix bdbje logging level not work (#27597) (#27788)

* `EnvironmentConfig.FILE_LOGGING_LEVEL` only sets the FileHandler level; the logger
   level must be set first, otherwise the setting does not take effect.
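
The reason a handler-only setting has no effect is that a log record is filtered twice: first by the logger's level, then by the handler's level. The sketch below is a simplified C++ model of that two-stage filter, not the BDB JE or `java.util.logging` API; the `Level`, `Handler`, and `Logger` types are hypothetical and exist only for illustration.

```cpp
#include <cassert>

// Simplified severity scale: larger values are more verbose.
enum class Level { Severe = 1, Warning = 2, Info = 3, Fine = 4 };

// Hypothetical stand-in for a file handler with its own level.
struct Handler {
    Level level;
};

// Hypothetical stand-in for a logger that owns one handler.
struct Logger {
    Level level;
    Handler handler;

    // A record is written only if it passes the logger's level first
    // and then the handler's level.
    bool would_write(Level record) const {
        return record <= level && record <= handler.level;
    }
};
```

In this model, `Logger{Level::Info, Handler{Level::Fine}}` still drops `Fine` records, because the logger rejects them before the handler is consulted. Raising the logger level to `Fine` as well lets them through.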

* [Opt](compression) Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 2. Opt gzip decompression by libdeflate lib. (#27669) (#27801)

Backport from #27669.

* [branch-2.0](fix) Fix broken exception message #27836

* [Bug](func) coredump in equal for null in function (#27843)

* [minor](stats) Update olap table row count after analyze (#27858)

pick from master #27814

* [fix](stats)min and max return NaN when table is empty (#27863)

Fix bugs when analyzing an empty table and when min/max values are null:
1. Skip the analyze task on an empty table for sample analyze tasks. (Full analyze tasks already skip it.)
2. Check that the sample row count is not 0 before calculating the scale factor.
3. Remove the ' from the SQL template now that base64 encoding of min/max values has been removed.
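
The zero check in item 2 matters because a scale factor computed as total rows over sampled rows degenerates to 0/0 on an empty table, which is NaN in floating point. A minimal sketch of such a guard follows; `scale_factor` and its signature are hypothetical and are not taken from the Doris code base.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: ratio used to extrapolate sampled statistics to the
// whole table. Returns false and leaves *out untouched when no rows were
// sampled, instead of dividing by zero.
inline bool scale_factor(std::int64_t total_rows, std::int64_t sample_rows, double* out) {
    if (sample_rows <= 0) {
        return false;  // empty sample: the caller should skip the task
    }
    *out = static_cast<double>(total_rows) / static_cast<double>(sample_rows);
    return true;
}
```

A caller would skip the analyze task when the function returns false, rather than propagating a NaN into the min/max statistics.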

backport #27862

* [minor](stats) Throw error when sync analyze failed (#27846)

pick from master #27845

* [fix](stats) Don't save colToPartitions anymore to save mem (#27880)

pick from master #27879

* [fix](nereids) set operation's result type is wrong if decimal overflows (#27872)

pick from master #27870

* [Config] Modify the default value of tablet_schema_cache_recycle_interval (#27877)

* [fix](like_func) incorrect result of like with 'NO_BACKSLASH_ESCAPES' mode(#27842) (#27851)

* [fix](fe) Fix show frontends npt in some situations (#27295) (#27789)

```
java.lang.NullPointerException: null
    at com.sleepycat.je.rep.util.ReplicationGroupAdmin.getMasterSocket(ReplicationGroupAdmin.java:191)
    at com.sleepycat.je.rep.util.ReplicationGroupAdmin.doMessageExchange(ReplicationGroupAdmin.java:607)
    at com.sleepycat.je.rep.util.ReplicationGroupAdmin.getGroup(ReplicationGroupAdmin.java:406)
    at org.apache.doris.ha.BDBHA.getElectableNodes(BDBHA.java:132)
    at org.apache.doris.common.proc.FrontendsProcNode.getFrontendsInfo(FrontendsProcNode.java:84)
    at org.apache.doris.qe.ShowExecutor.handleShowFrontends(ShowExecutor.java:1923)
    at org.apache.doris.qe.ShowExecutor.execute(ShowExecutor.java:355)
    at org.apache.doris.qe.StmtExecutor.handleShow(StmtExecutor.java:2113)
    ...
```

* [branch-2.0](fix) Fix extremely high CPU usage caused by rf merge #27894 (#27895)

* [fix](stacktrace) ignore stacktrace for error code INVALID_ARGUMENT INVERTED_INDEX_NOT_IMPLEMENTED (#27898)

* ignore stacktrace for error INVALID_ARGUMENT INVERTED_INDEX_NOT_IMPLEMENTED

* AndBlockColumnPredicate::evaluate

* [opt](nereids) Branch-2.0: remove partition & histogram from col stats to reduce memory usage #27885 (#27896)

* [pick](Nereids) temporary partition is selected only if user manually specified: Branch-2.0 #27893 (#27905)

* [fix](multi-catalog)support the max compute partition prune (#27154) (#27902)

backport #27154

* [fix](Nereids) should not push down project to the nullable side of outer join #27912 (#27913)

* fix compile

---------

Co-authored-by: Dongyang Li <hello_stephen@qq.com>
Co-authored-by: AlexYue <yj976240184@gmail.com>
Co-authored-by: xzj7019 <131111794+xzj7019@users.noreply.github.com>
Co-authored-by: starocean999 <40539150+starocean999@users.noreply.github.com>
Co-authored-by: morrySnow <101034200+morrySnow@users.noreply.github.com>
Co-authored-by: AKIRA <33112463+Kikyou1997@users.noreply.github.com>
Co-authored-by: Jibing-Li <64681310+Jibing-Li@users.noreply.github.com>
Co-authored-by: yiguolei <676222867@qq.com>
Co-authored-by: yiguolei <yiguolei@gmail.com>
Co-authored-by: DuRipeng <453243496@qq.com>
Co-authored-by: Mingyu Chen <morningman@163.com>
Co-authored-by: wangbo <wangbo@apache.org>
Co-authored-by: Pxl <pxl290@qq.com>
Co-authored-by: Calvin Kirs <acm_master@163.com>
Co-authored-by: minghong <englefly@gmail.com>
Co-authored-by: catpineapple <42031973+catpineapple@users.noreply.github.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: Kang <kxiao.tiger@gmail.com>
Co-authored-by: zhiqiang <seuhezhiqiang@163.com>
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
Co-authored-by: Lei Zhang <27994433+SWJTU-ZhangLei@users.noreply.github.com>
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: Lightman <31928846+Lchangliang@users.noreply.github.com>
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com>
gnehil pushed a commit to gnehil/doris that referenced this pull request Dec 4, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
…64 platforms: 1. Add libdeflate lib. (apache#27542)

Test result:

- env: 1 node(16 cores, 64G).
- parquet column: 100 million rows of char(255) column.
- result: 9.09 s -> 6.04 s.
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
…4 platforms: 2. Opt gzip decompression by libdeflate lib. (apache#27669)

Optimize gzip decompression with libdeflate on x86 and x86_64 platforms, part 2: switch gzip decompression to libdeflate, following the addition of the libdeflate library in apache#27542.

Labels

approved, dev/2.0.4, reviewed