Add server UUID for disks access checks (read/read-by-offset/write/delete) to avoid possible races#43143

Merged
kssenii merged 11 commits into ClickHouse:master from azat:disks/s3-check-fix
Nov 21, 2022

Conversation

@azat
Member

@azat azat commented Nov 10, 2022

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add server UUID for disks access checks (read/read-by-offset/write/delete) to avoid possible races. Plus now disks checks will be done for all disks, and not only S3/Azure.

Otherwise, if you are unlucky enough, a race condition is possible, and you can get errors because one server has already removed the file while another was trying to read it.

But it was possible only for:

  • batch deletes check for "s3" disk
  • and all checks for the "s3_plain" disk, since this disk does not use random names

Fixes: #37791
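A minimal sketch of the idea (the function and file names here are illustrative, not ClickHouse's actual API): embedding the server UUID in the access-check file name gives each server its own check file on a shared disk, so one server deleting its check file can no longer race with another server that is still reading the same path.

```python
import uuid

# Hypothetical sketch, assuming the check file name is simply a fixed
# prefix plus the server UUID. Before the fix, a shared fixed name could
# collide on disks that do not randomize object names (e.g. "s3_plain",
# which stores objects under their given paths verbatim).
def check_access_path(server_uuid: str) -> str:
    return f"clickhouse_access_check_{server_uuid}"

# Two servers with different UUIDs now probe different objects, so the
# read/read-by-offset/write/delete checks of one server cannot observe
# (or delete) the check file of another.
server_a = check_access_path(str(uuid.uuid4()))
server_b = check_access_path(str(uuid.uuid4()))
assert server_a != server_b
```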

@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-not-for-changelog This PR should not be mentioned in the changelog label Nov 10, 2022
@kssenii kssenii self-assigned this Nov 10, 2022
@alexey-milovidov
Member

It should be added to all disk directories, not only s3.

@azat
Member Author

azat commented Nov 17, 2022

It should be added to all disk directories, not only s3.

You mean the Azure disk, all object storages, or all disks? What do you have in mind?

P.S. For now there are checks only for S3 and Azure.

@alexey-milovidov
Member

All disks.

@azat azat marked this pull request as draft November 18, 2022 04:45
@azat azat force-pushed the disks/s3-check-fix branch from 28b6ec7 to b678097 Compare November 18, 2022 12:46
@azat azat marked this pull request as ready for review November 18, 2022 12:46
@azat azat changed the title Add server UUID for the S3 disks checks to avoid possible races Add server UUID for disks access checks (read/read-by-offset/write/delete) to avoid possible races Nov 18, 2022
@azat azat force-pushed the disks/s3-check-fix branch 3 times, most recently from 46fc609 to a0deb94 Compare November 19, 2022 09:51
@azat
Member Author

azat commented Nov 19, 2022

Integration tests (tsan) [1/4] — fail: 1, passed: 561, flaky: 0

  • test_distributed_ddl/test.py::test_simple_alters[configs] query timeout - looks flaky (tested locally - passed)

Integration tests (tsan) [2/4] — fail: 1, passed: 529, flaky: 2

  • test_create_user_and_login/test.py::test_login_as_dropped_user_xml - flaky (locally not reproduced)
E               Exception: Expected either output '1' or one of errors '['no user with such name', 'not found in user directories']', got output= and error=Received exception from server (version 22.12.1):
E               Code: 192. DB::Exception: Received from 172.16.24.2:9000. DB::Exception: C: User has been dropped. Stack trace:

Integration tests flaky check (asan) — Failed tests found: error: 0, passed: 0, fail: 2, skipped: 11, flaky: 0

  • test_disk_types, hm, mkdirs for hdfs does not always help

@azat azat force-pushed the disks/s3-check-fix branch 2 times, most recently from 3926691 to 1c751fb Compare November 19, 2022 19:53
@azat
Member Author

azat commented Nov 20, 2022

Failures unrelated:

AST fuzzer (ubsan) — Logical error: 'Invalid number of rows in Chunk column Object position 3: expected 1, got 0'.

Stateless tests (tsan) [2/5] — fail: 1, passed: 909, skipped: 3

  • 01111_create_drop_replicated_db_stress
2022-11-19 21:30:00 [localhost] 2022.11.19 21:29:37.186300 [ 15741 ] {302301ba-6c95-4a5b-9c09-b8a10e85e6b2}  executeQuery: Code: 218. DB::Exception: Table default_repl_2.rmt_1422_11882_17658 is dropped. (TABLE_IS_DROPPED) (version 22.12.1.1) (from [::1]:49612) (comment: 01111_create_drop_replicated_db_stress.sh) (in query: insert into default_repl_2.rmt_1422_11882_17658 values ), Stack trace (when copying this message, always include the lines below):
2022-11-19 21:30:00 Code: 218. DB::Exception: Received from localhost:9000. DB::Exception: Table default_repl_2.rmt_1422_11882_17658 is dropped. (TABLE_IS_DROPPED)

Stress test (msan) — OOM killer (or signal 9) in clickhouse-server.log

2022.11.20 00:41:18.990557 [ 328526 ] {} <Error> 86d52eda-305e-4b27-8bcc-5c736281143f::all_0_4_10_41(MutateFromLogEntryTask): virtual bool DB::ReplicatedMergeMutateTaskBase::executeStep(): Code: 47. DB::Exception: Missing columns: 'value4' while processing query: 'value2, value3, value4, value0, key, key, value0, value4, value3, value2', required columns: 'value2' 'value3' 'value4' 'value0' 'key', maybe you meant: ['value2','value3','value0','value0','key']. (UNKNOWN_IDENTIFIER), Stack trace (when copying this message, always include the lines below):

And something with MergeTree

Stress test (ubsan) — Backward compatibility check: Error message in clickhouse-server.log (see bc_check_error_messages.txt)

2022.11.20 00:27:29.824241 [ 811379 ] {} <Error> d1c900ac-35ff-4156-a969-e0a310902f9f::all_0_38_9_51(MutateFromLogEntryTask): virtual bool DB::ReplicatedMergeMutateTaskBase::executeStep(): Code: 47. DB::Exception: Missing columns: 'value3' while processing query: 'value5, value3, value0, key, key, value0, value3, value5', required columns: 'value5' 'value3' 'value0' 'key', maybe you meant: ['value5','value0','value0','key']. (UNKNOWN_IDENTIFIER), Stack trace (when copying this message, always include the lines below):

@azat azat requested a review from kssenii November 20, 2022 12:43
azat added 9 commits November 20, 2022 16:11
Otherwise, if you are unlucky enough, a race condition is possible, and you
can get errors because one server has already removed the file while
another was trying to read it.

But it was possible only for:
- batch deletes check for "s3" disk
- and all checks for "s3_plain" disk, since this disk does not use
  random names

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Previously we had such access (read/write/delete) checks only for S3 and
Azure disks (read/read-by-offset/write/delete), this patch adds check
for all disks.

Also I've added the disk name into the IDisk interface, since it is required
for the error message in IDisk::checkAccess(), but I had to add
DiskEncrypted::encrypted_name because DiskEncrypted inherits from
DiskDecorator, not from IDisk, and so does not have the ability to set the
disk name (DiskEncrypted could pass the disk name to the DiskDecorator, but
it is not used anywhere, and besides, this would require duplicating the
name for each user of DiskDecorator).

Also, from now on, skip_access_check will make sense for every disk, plus
now startup() is called for each disk (before, it was missed for some of
them).

And I've added skip_access_check as a member of DiskRestartProxy,
since it calls startup() on restart().

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
This will fix clickhouse-disks

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
stacktrace:
    contrib/libcxx/src/condition_variable.cpp:47::std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&)
    contrib/libcxx/src/shared_mutex.cpp:65::std::__1::shared_timed_mutex::lock_shared()
    src/Disks/DiskRestartProxy.cpp:229::DB::DiskRestartProxy::writeFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, unsigned long, DB::WriteMode, DB::WriteSettings const&)
    src/Disks/IDisk.cpp:0::DB::IDisk::checkAccessImpl(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&)
    contrib/libcxx/include/string:1499::DB::IDisk::checkAccess()
    src/Disks/IDisk.cpp:0::DB::IDisk::startup(std::__1::shared_ptr<DB::Context const>, bool)
    src/Disks/DiskRestartProxy.cpp:375::DB::DiskRestartProxy::restart(std::__1::shared_ptr<DB::Context const>)
    contrib/libcxx/include/__memory/shared_ptr.h:701::DB::InterpreterSystemQuery::restartDisk(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&)
    src/Interpreters/InterpreterSystemQuery.cpp:508::DB::InterpreterSystemQuery::execute()
    src/Interpreters/executeQuery.cpp:0::DB::executeQueryImpl(char const*, char const*, std::__1::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum, DB::ReadBuffer*)
    src/Interpreters/executeQuery.cpp:1083::DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum)
    src/Server/TCPHandler.cpp:0::DB::TCPHandler::runImpl()
    src/Server/TCPHandler.cpp:1904::DB::TCPHandler::run()
    contrib/poco/Net/src/TCPServerConnection.cpp:57::Poco::Net::TCPServerConnection::start()
    contrib/libcxx/include/__memory/unique_ptr.h:48::Poco::Net::TCPServerDispatcher::run()
    contrib/poco/Foundation/src/ThreadPool.cpp:213::Poco::PooledThread::run()
    contrib/poco/Foundation/include/Poco/SharedPtr.h:156::Poco::ThreadImpl::runnableEntry(void*)

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
…r.py)

Right now cluster.py first creates HDFS and then creates clickhouse in
one API call, so you cannot interact and add missing mkdirs for the
clickhouse; fix this by using the root dir where possible.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
azat added 2 commits November 20, 2022 16:35
But it was not a problem before, since it was previously not possible to have
readonly==true.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
@azat azat force-pushed the disks/s3-check-fix branch from 1c751fb to c393af2 Compare November 20, 2022 15:40
@azat
Member Author

azat commented Nov 20, 2022

Test failures look unrelated:

Stateless tests (tsan) [2/5] — fail: 1, passed: 909, skipped: 3

Stress test (msan) — OOM killer (or signal 9) in clickhouse-server.log

Looks like slow parts removal, see also #43277 (comment)

@kssenii kssenii merged commit a59dac6 into ClickHouse:master Nov 21, 2022
@azat azat deleted the disks/s3-check-fix branch November 21, 2022 14:08


Development

Successfully merging this pull request may close these issues.

The specified key does not exist (S3_ERROR) on server startup in functional tests with Replicated database