Always start embedded Keeper in async mode
Changelog category (leave one):
- Improvement
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Embedded Keeper will always start in the background allowing ClickHouse to start without achieving quorum.
Because of #40148, we shouldn't have anything that relies on an active ZK cluster during startup so we can safely start embedded Keeper in the background. cc @tavplubix @alesapin
Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/
How Replicated database engine will react?
How Replicated database engine will react?
https://github.com/ClickHouse/ClickHouse/blob/b90a806c35a32be1643fd7ce4d1aec4ea4eb9837/tests/integration/test_replicated_database/test.py#L781
ReplicatedAccessStorage has the same problem ReplicatedMergeTree had.
Startup fails if there is no ZK connection.
Everything else looks okay.
Do I understand correctly that now ReplicatedMergeTree (and DatabaseReplicated) will start in readonly mode if Keeper starts slowly than tables and databases? Of course they will connect to Keeper later in background and eventually everything will work, but my concern is that server may start to accept connections before replicated tables and databases have actually started up. However, it's probably not a problem if hosts are restarted one by one, so replicated tables and databases will simply connect to Keeper on other host.
@tavplubix that is correct. What we can do is still use the timeout and instead of failing to start CH, just fall back to starting Keeper in the background. That way we can give some priority to Keeper to try to connect first.
This task cannot be merged anyway until ReplicatedAccessStorage can also connect in the background so we can iterate a bit on the desired solution.
Seems like it breaks test_keeper_zookeeper_converter: https://play.clickhouse.com/play?user=play#c2VsZWN0IAp0b1N0YXJ0T2ZEYXkoY2hlY2tfc3RhcnRfdGltZSkgYXMgZCwKY291bnQoKSwgIGdyb3VwVW5pcUFycmF5KHB1bGxfcmVxdWVzdF9udW1iZXIpLCAgYW55KHJlcG9ydF91cmwpCmZyb20gY2hlY2tzIHdoZXJlICcyMDIyLTA2LTAxJyA8PSBjaGVja19zdGFydF90aW1lIGFuZCB0ZXN0X25hbWUgbGlrZSAnJXRlc3Rfa2VlcGVyX3pvb2tlZXBlcl9jb252ZXJ0ZXIlJyBhbmQgdGVzdF9zdGF0dXMgaW4gKCdGQUlMJywgJ0ZMQUtZJykgZ3JvdXAgYnkgZCBvcmRlciBieSBkIGRlc2M=
This was merged 13 days ago, it seems the issues started earlier than that.