DatabaseReplicated: settings that make queries synchronous may not work on non-initiator replicas #34818
Description
We have some settings, such as database_atomic_wait_for_drop_and_detach_synchronously, replication_alter_partitions_sync and mutations_sync, that make a query wait until it has actually finished. However, they may not work as expected if the database engine is Replicated. When the query is finished on the initiator, we wait for it to finish on the other database replicas. More precisely, we wait for the replicas to create the log/query-xxxxx/finished/full_replica_name node. The problem is that DatabaseReplicated may create this node earlier (when the "commit point" is passed, but the query is not fully completed).
Example:
https://s3.amazonaws.com/clickhouse-test-reports/34749/9b753c84f099eca8fe6271f006ef7d8ca47ae00a/stateless_tests__release__databasereplicated__actions__[1/2].html
(the second replica created the "finished" node when the table was marked as dropped, but not actually dropped, so the initiator returned "Ok" before the table's metadata was removed from ZK)
A similar scenario is possible for some ALTERs.
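To make the ordering problem concrete, here is a toy in-memory model of the race. It is not ClickHouse code: FakeZooKeeper, Replica, the replica name "replica2" and the entry path "log/query-0000000001" are all hypothetical stand-ins, and the only thing taken from the description above is that the "finished" node is created at the commit point rather than at full completion.

```python
# Toy model only; every class, function and path here is a hypothetical stand-in.

class FakeZooKeeper:
    """Minimal stand-in for ZK: just a set of node paths."""
    def __init__(self):
        self.nodes = set()

    def create(self, path):
        self.nodes.add(path)

    def exists(self, path):
        return path in self.nodes


class Replica:
    def __init__(self, name, zk):
        self.name = name
        self.zk = zk
        self.table_state = "present"

    def start_drop(self, entry):
        # Commit point: the table is only marked as dropped,
        # yet the "finished" node is already created.
        self.table_state = "marked_dropped"
        self.zk.create(f"{entry}/finished/{self.name}")

    def complete_drop(self):
        # The actual drop happens later.
        self.table_state = "dropped"


def initiator_sees_finished(zk, entry, replica_names):
    # The initiator only checks that every replica has a "finished" node.
    return all(zk.exists(f"{entry}/finished/{n}") for n in replica_names)


zk = FakeZooKeeper()
entry = "log/query-0000000001"  # placeholder for log/query-xxxxx
replica = Replica("replica2", zk)

replica.start_drop(entry)
reported_done = initiator_sees_finished(zk, entry, [replica.name])
state_when_reported = replica.table_state  # what the replica looked like at "Ok"
replica.complete_drop()

print(reported_done, state_when_reported)
```

In this model the initiator reports success while the replica's table is still only marked as dropped, which is the situation seen in the test report.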
I'm not sure what the best way to fix it is. We could simply create another node in ZK when query execution is fully completed and make DDLQueryStatusSource wait for this node to appear instead of the "finished" node. On the other hand, we should either not allow running "ON CLUSTER" queries with "sync" settings enabled at all, or provide another way to wait for such queries, because waiting in the main thread of DDLWorker is awful.
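The first option (a second node created only on full completion) could look roughly like the sketch below. Again this is a toy model and not a patch: the "completed" node name, FakeZooKeeper, run_drop_on_replica and wait_for_replicas are assumptions made up for illustration, and the real waiting logic lives in DDLQueryStatusSource.

```python
import threading
import time

# Toy model only; the "completed" node and all helpers are hypothetical.

class FakeZooKeeper:
    """Minimal thread-safe stand-in for ZK: a set of node paths."""
    def __init__(self):
        self._nodes = set()
        self._lock = threading.Lock()

    def create(self, path):
        with self._lock:
            self._nodes.add(path)

    def exists(self, path):
        with self._lock:
            return path in self._nodes


def run_drop_on_replica(zk, entry, replica, state):
    state["table"] = "marked_dropped"
    zk.create(f"{entry}/finished/{replica}")   # existing node, created at commit point
    time.sleep(0.05)                           # simulated remaining work
    state["table"] = "dropped"
    zk.create(f"{entry}/completed/{replica}")  # hypothetical node, created at the very end


def wait_for_replicas(zk, entry, replicas, node_kind, timeout=5.0, poll=0.005):
    """Poll until every replica has created <entry>/<node_kind>/<replica>."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if all(zk.exists(f"{entry}/{node_kind}/{r}") for r in replicas):
            return True
        time.sleep(poll)
    return False


zk = FakeZooKeeper()
entry = "log/query-0000000001"  # placeholder for log/query-xxxxx
state = {"table": "present"}

worker = threading.Thread(
    target=run_drop_on_replica, args=(zk, entry, "replica2", state)
)
worker.start()

# Wait for the hypothetical "completed" node instead of "finished".
ok = wait_for_replicas(zk, entry, ["replica2"], "completed")
state_when_reported = state["table"]
worker.join()

print(ok, state_when_reported)
```

Because the replica creates the "completed" node only after the table is actually dropped, the waiter cannot return success early. The sketch says nothing about where the wait should run, so the concern about blocking the main thread of DDLWorker still stands.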
This issue is slightly related to #23513.