Add server UUID for disks access checks (read/read-by-offset/write/delete) to avoid possible races#43143

Merged
kssenii merged 11 commits into ClickHouse:master from azat:disks/s3-check-fix
Nov 21, 2022

Conversation

@azat
Member

@azat azat commented Nov 10, 2022

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add server UUID for disks access checks (read/read-by-offset/write/delete) to avoid possible races. Plus now disks checks will be done for all disks, and not only S3/Azure.

Otherwise, if you are unlucky enough, a race condition is possible, and you can get errors because one server has already removed the file while another was trying to read it.

But it was possible only for:

  • batch deletes check for "s3" disk
  • and all checks for the "s3_plain" disk, since this disk does not use random names

Fixes: #37791
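A minimal sketch of the idea (the function and file names here are illustrative, not ClickHouse's actual API): embedding the server UUID in the access-check file name gives each server its own check file on a shared disk, so one server deleting its check file can no longer race with another server that is still reading the same path.

```python
import uuid

# Hypothetical sketch, assuming the check file name is simply a fixed
# prefix plus the server UUID. Before the fix, a shared fixed name could
# collide on disks that do not randomize object names (e.g. "s3_plain",
# which stores objects under their given paths verbatim).
def check_access_path(server_uuid: str) -> str:
    return f"clickhouse_access_check_{server_uuid}"

# Two servers with different UUIDs now probe different objects, so the
# read/read-by-offset/write/delete checks of one server cannot observe
# (or delete) the check file of another.
server_a = check_access_path(str(uuid.uuid4()))
server_b = check_access_path(str(uuid.uuid4()))
assert server_a != server_b
```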

@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-not-for-changelog This PR should not be mentioned in the changelog label Nov 10, 2022
@kssenii kssenii self-assigned this Nov 10, 2022
@alexey-milovidov
Member

It should be added to all disk directories, not only s3.

@azat
Member Author

azat commented Nov 17, 2022

It should be added to all disk directories, not only s3.

You mean the Azure disk, all object storages, or all disks? What do you have in mind?

P.S. For now there are checks only for S3 and Azure.

@alexey-milovidov
Member

All disks.

@azat azat marked this pull request as draft November 18, 2022 04:45
@azat azat force-pushed the disks/s3-check-fix branch from 28b6ec7 to b678097 Compare November 18, 2022 12:46
@azat azat marked this pull request as ready for review November 18, 2022 12:46
@azat azat changed the title Add server UUID for the S3 disks checks to avoid possible races Add server UUID for disks access checks (read/read-by-offset/write/delete) to avoid possible races Nov 18, 2022
@azat azat force-pushed the disks/s3-check-fix branch 3 times, most recently from 46fc609 to a0deb94 Compare November 19, 2022 09:51
@azat
Member Author

azat commented Nov 19, 2022

Integration tests (tsan) [1/4] — fail: 1, passed: 561, flaky: 0

  • test_distributed_ddl/test.py::test_simple_alters[configs] query timeout - looks flaky (tested locally - passed)

Integration tests (tsan) [2/4] — fail: 1, passed: 529, flaky: 2

  • test_create_user_and_login/test.py::test_login_as_dropped_user_xml - flaky (locally not reproduced)
E               Exception: Expected either output '1' or one of errors '['no user with such name', 'not found in user directories']', got output= and error=Received exception from server (version 22.12.1):
E               Code: 192. DB::Exception: Received from 172.16.24.2:9000. DB::Exception: C: User has been dropped. Stack trace:

Integration tests flaky check (asan) — Failed tests found: error: 0, passed: 0, fail: 2, skipped: 11, flaky: 0

  • test_disk_types, hm, mkdirs for hdfs does not always help

@azat azat force-pushed the disks/s3-check-fix branch 2 times, most recently from 3926691 to 1c751fb Compare November 19, 2022 19:53
@azat
Member Author

azat commented Nov 20, 2022

Failures unrelated:

AST fuzzer (ubsan) — Logical error: 'Invalid number of rows in Chunk column Object position 3: expected 1, got 0'.

Stateless tests (tsan) [2/5] — fail: 1, passed: 909, skipped: 3

  • 01111_create_drop_replicated_db_stress
2022-11-19 21:30:00 [localhost] 2022.11.19 21:29:37.186300 [ 15741 ] {302301ba-6c95-4a5b-9c09-b8a10e85e6b2}  executeQuery: Code: 218. DB::Exception: Table default_repl_2.rmt_1422_11882_17658 is dropped. (TABLE_IS_DROPPED) (version 22.12.1.1) (from [::1]:49612) (comment: 01111_create_drop_replicated_db_stress.sh) (in query: insert into default_repl_2.rmt_1422_11882_17658 values ), Stack trace (when copying this message, always include the lines below):
2022-11-19 21:30:00 Code: 218. DB::Exception: Received from localhost:9000. DB::Exception: Table default_repl_2.rmt_1422_11882_17658 is dropped. (TABLE_IS_DROPPED)

Stress test (msan) — OOM killer (or signal 9) in clickhouse-server.log

2022.11.20 00:41:18.990557 [ 328526 ] {} <Error> 86d52eda-305e-4b27-8bcc-5c736281143f::all_0_4_10_41(MutateFromLogEntryTask): virtual bool DB::ReplicatedMergeMutateTaskBase::executeStep(): Code: 47. DB::Exception: Missing columns: 'value4' while processing query: 'value2, value3, value4, value0, key, key, value0, value4, value3, value2', required columns: 'value2' 'value3' 'value4' 'value0' 'key', maybe you meant: ['value2','value3','value0','value0','key']. (UNKNOWN_IDENTIFIER), Stack trace (when copying this message, always include the lines below):

And something with MergeTree

Stress test (ubsan) — Backward compatibility check: Error message in clickhouse-server.log (see bc_check_error_messages.txt)

2022.11.20 00:27:29.824241 [ 811379 ] {} <Error> d1c900ac-35ff-4156-a969-e0a310902f9f::all_0_38_9_51(MutateFromLogEntryTask): virtual bool DB::ReplicatedMergeMutateTaskBase::executeStep(): Code: 47. DB::Exception: Missing columns: 'value3' while processing query: 'value5, value3, value0, key, key, value0, value3, value5', required columns: 'value5' 'value3' 'value0' 'key', maybe you meant: ['value5','value0','value0','key']. (UNKNOWN_IDENTIFIER), Stack trace (when copying this message, always include the lines below):

@azat azat requested a review from kssenii November 20, 2022 12:43
azat added 9 commits November 20, 2022 16:11
Otherwise, if you are unlucky enough, a race condition is possible, and you
can get errors because one server has already removed the file while
another was trying to read it.

But it was possible only for:
- batch deletes check for "s3" disk
- and all checks for "s3_plain" disk, since this disk does not use
  random names

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Previously we had such access (read/write/delete) checks only for S3 and
Azure disks (read/read-by-offset/write/delete), this patch adds check
for all disks.

Also I've added the disk name into the IDisk interface, since it is required
for the error message in IDisk::checkAccess(), but I had to add
DiskEncrypted::encrypted_name because DiskEncrypted inherits from
DiskDecorator, not from IDisk, and so does not have the ability to set the
disk name (DiskEncrypted could pass the disk name to the DiskDecorator, but
it is not used anywhere, and besides, this would require duplicating the
name for each user of DiskDecorator).

Also, from now on, skip_access_check will make sense for every disk, plus
now startup() is called for each disk (before, it was missed for some of
them).

And I've added skip_access_check as a member of DiskRestartProxy,
since it calls startup() on restart().

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
This will fix clickhouse-disks

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
stacktrace:
    contrib/libcxx/src/condition_variable.cpp:47::std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&)
    contrib/libcxx/src/shared_mutex.cpp:65::std::__1::shared_timed_mutex::lock_shared()
    src/Disks/DiskRestartProxy.cpp:229::DB::DiskRestartProxy::writeFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, unsigned long, DB::WriteMode, DB::WriteSettings const&)
    src/Disks/IDisk.cpp:0::DB::IDisk::checkAccessImpl(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&)
    contrib/libcxx/include/string:1499::DB::IDisk::checkAccess()
    src/Disks/IDisk.cpp:0::DB::IDisk::startup(std::__1::shared_ptr<DB::Context const>, bool)
    src/Disks/DiskRestartProxy.cpp:375::DB::DiskRestartProxy::restart(std::__1::shared_ptr<DB::Context const>)
    contrib/libcxx/include/__memory/shared_ptr.h:701::DB::InterpreterSystemQuery::restartDisk(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&)
    src/Interpreters/InterpreterSystemQuery.cpp:508::DB::InterpreterSystemQuery::execute()
    src/Interpreters/executeQuery.cpp:0::DB::executeQueryImpl(char const*, char const*, std::__1::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum, DB::ReadBuffer*)
    src/Interpreters/executeQuery.cpp:1083::DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum)
    src/Server/TCPHandler.cpp:0::DB::TCPHandler::runImpl()
    src/Server/TCPHandler.cpp:1904::DB::TCPHandler::run()
    contrib/poco/Net/src/TCPServerConnection.cpp:57::Poco::Net::TCPServerConnection::start()
    contrib/libcxx/include/__memory/unique_ptr.h:48::Poco::Net::TCPServerDispatcher::run()
    contrib/poco/Foundation/src/ThreadPool.cpp:213::Poco::PooledThread::run()
    contrib/poco/Foundation/include/Poco/SharedPtr.h:156::Poco::ThreadImpl::runnableEntry(void*)

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
…r.py)

Right now cluster.py first creates HDFS and then creates clickhouse in
one API call, so you cannot interact and add missing mkdirs for the
clickhouse; fix this by using the root dir where possible.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
azat added 2 commits November 20, 2022 16:35
But it was not a problem before, since it was previously not possible to have
readonly==true.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
@azat azat force-pushed the disks/s3-check-fix branch from 1c751fb to c393af2 Compare November 20, 2022 15:40
@azat
Member Author

azat commented Nov 20, 2022

Test failures look unrelated:

Stateless tests (tsan) [2/5] — fail: 1, passed: 909, skipped: 3

Stress test (msan) — OOM killer (or signal 9) in clickhouse-server.log

Looks like slow parts removal, see also #43277 (comment)

@kssenii kssenii merged commit a59dac6 into ClickHouse:master Nov 21, 2022
@azat azat deleted the disks/s3-check-fix branch November 21, 2022 14:08


Development

Successfully merging this pull request may close these issues.

The specified key does not exist (S3_ERROR) on server startup in functional tests with Replicated database