
Better support of GCP storage #38069

Merged
CurtizJ merged 7 commits into ClickHouse:master from CurtizJ:better-support-gcp
Jun 23, 2022

Conversation

@CurtizJ
Member

@CurtizJ CurtizJ commented Jun 15, 2022

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@robot-clickhouse robot-clickhouse added the pr-not-for-changelog This PR should not be mentioned in the changelog label Jun 15, 2022
@CurtizJ CurtizJ marked this pull request as ready for review June 15, 2022 15:02
@CurtizJ CurtizJ requested a review from alesapin June 15, 2022 15:02
Comment on lines -392 to -397
if (size == 0)
{
LOG_TRACE(log, "Skipping single part upload. Buffer is empty.");
return;
}

Member Author


Actually, this PR fixes removal of empty parts from GCP storage. The bug was here: we didn't write empty files at all, so we couldn't delete them later. It doesn't reproduce with AWS S3, because there DeleteObject always returns OK even if the key doesn't exist.

@CurtizJ
Member Author

CurtizJ commented Jun 15, 2022

@Mergifyio update

@mergify
Contributor

mergify bot commented Jun 15, 2022

update

✅ Branch has been successfully updated

@CurtizJ
Member Author

CurtizJ commented Jun 17, 2022

@Mergifyio update

@mergify
Contributor

mergify bot commented Jun 17, 2022

update

✅ Branch has been successfully updated

@rschu1ze rschu1ze self-assigned this Jun 17, 2022
Member

@rschu1ze rschu1ze left a comment


Just two smaller comments, besides Sasha's.

@alesapin
Member

test_storage_s3/test.py::test_seekable_formats_url (✕2)

Flaky now?

@CurtizJ
Member Author

CurtizJ commented Jun 21, 2022

Always has been #35765

=================================== FAILURES ===================================
__________________________ test_seekable_formats_url ___________________________
[gw0] linux -- Python 3.8.10 /usr/bin/python3

started_cluster = <helpers.cluster.ClickHouseCluster object at 0x7f937bff59d0>

    def test_seekable_formats_url(started_cluster):
        bucket = started_cluster.minio_bucket
        instance = started_cluster.instances["dummy"]
    
        table_function = f"s3(s3_parquet, structure='a Int32, b String', format='Parquet')"
>       instance.query(
            f"insert into table function {table_function} select number, randomString(100) from numbers(5000000) settings s3_truncate_on_insert=1"
        )

test_storage_s3/test.py:1085: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
helpers/cluster.py:2884: in query
    return self.client.query(
helpers/client.py:32: in query
    return self.get_query_request(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <helpers.client.CommandRequest object at 0x7f937bf883d0>

    def get_answer(self):
        self.process.wait()
        self.stdout_file.seek(0)
        self.stderr_file.seek(0)
    
        stdout = self.stdout_file.read().decode("utf-8", errors="replace")
        stderr = self.stderr_file.read().decode("utf-8", errors="replace")
    
        if (
            self.timer is not None
            and not self.process_finished_before_timeout
            and not self.ignore_error
        ):
            logging.debug(f"Timed out. Last stdout:{stdout}, stderr:{stderr}")
            raise QueryTimeoutExceedException("Client timed out!")
    
        if (self.process.returncode != 0 or stderr) and not self.ignore_error:
>           raise QueryRuntimeException(
                "Client failed! Return code: {}, stderr: {}".format(
                    self.process.returncode, stderr
                ),
                self.process.returncode,
                stderr,
            )
E           helpers.client.QueryRuntimeException: Client failed! Return code: 243, stderr: Received exception from server (version 22.7.1):
E           Code: 499. DB::Exception: Received from 172.16.1.7:9000. DB::Exception: The specified multipart upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed. Tags:"2b5853b29381db5b12b64c947ec85f44" "ad6cd21bb34a6c02e477d09d1e6718e1" "d7b28b4f96e9e5223fb8d9be7013e2cc" "258494eeab9e793a9cc1c79fb677695a" "3ff3dfe951b1fac810fd9599536049f2" "baf09b8e4a16833314c804eddc65704b" "c544f603d7a4977f972a2ba2d5c9ad8f" "92fa436dbc6dd2901147fc172f8668ec" "47194aa2f43f60bc8c1e4702d490fd16" "826f860d717822086746cdd2bc3389cb" "2a6f67ea8039390491d5a03a45ca84b7" "fb773e05c40e1ecf1760a9a4754b40ee" "eaf235c8788bc64ca8e297bd03a6c6cc" "8ad7b100a5602e6b8989f93cffdca459" "938b3ef3288dd55364a1392af37fb5cf" "b0ce02777fdd0ed7034b6a5454700db8" "4f54c1737fcd2b8fad95efd4e5fe4689" "cd3e671d6e9fb0f2c366b85bb3e150f0" "920b28a4ac35d0fda036695b40c46b2d" "c3d8e8209067371e5746b84838da9f7a" "29ab47ca62d839c9c75b1907d3a8fd47" "ba3d61a6ad05144425d910e4109bccff" "ee09b0b71260df9070415699be3fdb64" "f55feb7ce4574fd1c85884fe0dc67445" "94d7eb3d55b8ccc3ad98a67fa8caf465" "a7fe28284d8c6a96af1c4f251c9b4aa8" "07012a95eacd8b57e280488e69b28d75" "3057f0871e289732276a4d1ae5a8f9de" "ff50970b654d489d102af50b453e4e05" "9c67190d29757288235c5506a599055d". Stack trace:
E           
E           0. ./build_docker/../contrib/libcxx/include/exception:133: Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x36f9eec9 in /usr/bin/clickhouse
E           1. ./build_docker/../src/Common/Exception.cpp:69: DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xd76fecc in /usr/bin/clickhouse
E           2. ./build_docker/../contrib/libcxx/include/string:1445: DB::Exception::Exception<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, fmt::v8::join_view<std::__1::__wrap_iter<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*>, std::__1::__wrap_iter<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*>, char> >(int, fmt::v8::basic_format_string<char, fmt::v8::type_identity<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&>::type, fmt::v8::type_identity<fmt::v8::join_view<std::__1::__wrap_iter<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*>, std::__1::__wrap_iter<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*>, char> >::type>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, fmt::v8::join_view<std::__1::__wrap_iter<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*>, std::__1::__wrap_iter<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*>, char>&&) @ 0x2b9ce84e in /usr/bin/clickhouse
E           3. ./build_docker/../src/IO/WriteBufferFromS3.cpp:373: DB::WriteBufferFromS3::completeMultipartUpload() @ 0x2b9c9203 in /usr/bin/clickhouse
E           4. ./build_docker/../src/IO/WriteBuffer.h:127: DB::WriteBuffer::finalize() @ 0xd72e5a3 in /usr/bin/clickhouse
E           5. ./build_docker/../src/Storages/StorageS3.cpp:626: DB::StorageS3Sink::onFinish() @ 0x2f033e80 in /usr/bin/clickhouse
E           6. ./build_docker/../src/Processors/Transforms/ExceptionKeepingTransform.cpp:103: DB::runStep(std::__1::function<void ()>, DB::ThreadStatus*, std::__1::atomic<unsigned long>*) @ 0x305fc352 in /usr/bin/clickhouse
E           7. ./build_docker/../contrib/libcxx/include/__functional/function.h:813: DB::ExceptionKeepingTransform::work() @ 0x305fb28e in /usr/bin/clickhouse
E           8. ./build_docker/../src/Processors/Executors/ExecutionThreadContext.cpp:0: DB::ExecutionThreadContext::executeTask() @ 0x300dbaad in /usr/bin/clickhouse
E           9. ./build_docker/../src/Processors/Executors/PipelineExecutor.cpp:222: DB::PipelineExecutor::executeStepImpl(unsigned long, std::__1::atomic<bool>*) @ 0x300be413 in /usr/bin/clickhouse
E           10. ./build_docker/../src/Processors/Executors/PipelineExecutor.cpp:0: DB::PipelineExecutor::executeImpl(unsigned long) @ 0x300bca2f in /usr/bin/clickhouse
E           11. ./build_docker/../contrib/libcxx/include/__memory/unique_ptr.h:284: DB::PipelineExecutor::execute(unsigned long) @ 0x300bbe70 in /usr/bin/clickhouse
E           12. ./build_docker/../contrib/libcxx/include/atomic:939: void std::__1::__function::__policy_invoker<void ()>::__call_impl<std::__1::__function::__default_alloc_func<ThreadFromGlobalPool::ThreadFromGlobalPool<DB::CompletedPipelineExecutor::execute()::$_0>(DB::CompletedPipelineExecutor::execute()::$_0&&)::'lambda'(), void ()> >(std::__1::__function::__policy_storage const*) @ 0x300ba377 in /usr/bin/clickhouse
E           13. ./build_docker/../contrib/libcxx/include/__functional/function.h:0: ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xd8ba6a7 in /usr/bin/clickhouse
E           14. ./build_docker/../contrib/libcxx/include/__memory/unique_ptr.h:312: void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()> >(void*) @ 0xd8c27dc in /usr/bin/clickhouse
E           15. ? @ 0x7fcd2752d609 in ?
E           16. clone @ 0x7fcd27452133 in ?
E           . (S3_ERROR)
E           (query: insert into table function s3(s3_parquet, structure='a Int32, b String', format='Parquet') select number, randomString(100) from numbers(5000000) settings s3_truncate_on_insert=1)

@CurtizJ
Member Author

CurtizJ commented Jun 23, 2022

Stateless tests (release, s3 storage, actions): `SystemLog (system.opentelemetry_span_log): Queue is full for system log 'DB::OpenTelemetrySpanLog' at 1944719` is flaky in the s3 check.
An example with another system table: https://s3.amazonaws.com/clickhouse-test-reports/0/7cc87d9a65572ba06206d06f770929a7ba245d6f/stateless_tests__release__s3_storage__actions_.html

AST Fuzzer and Stress Test are unrelated.

@CurtizJ CurtizJ merged commit 7efbae7 into ClickHouse:master Jun 23, 2022
azat added a commit to azat/ClickHouse that referenced this pull request Jul 4, 2022
The GCS server does not handle requests with a port and simply reports an error:

```xml
    <?xml version='1.0' encoding='UTF-8'?>
    <Error>
        <Code>InvalidURI</Code>
        <Message>Couldn't parse the specified URI.</Message>
        <Details>Invalid URL: storage.googleapis.com:443/...</Details>
    </Error>
```

Removing the port fixes the issue. Note that the port is present in the Host header anyway.

Note, this is a problem only for a proxy in tunnel mode, since only it sends such requests; others send requests directly via HTTP methods.

Refs: ClickHouse/poco#22 (comment) (cc @Jokser)
Refs: ClickHouse/poco#63
Refs: ClickHouse#38069 (cc @CurtizJ)
Cc: @alesapin @kssenii

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>