Proper Materialized views startup dependencies#72123
Proper Materialized views startup dependencies#72123serxa merged 58 commits intoClickHouse:masterfrom
Conversation
|
Hello @serxa , could you have a look please since you worked on async table loading? The change really cures the issue, although I am still working on it because I am not sure why
|
|
This is an automated comment for commit cd1cc0c with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page
Successful checks
|
f63342c to
85631db
Compare
|
Hello @serxa , could you have a look, please. |
They serve different purposes.
Any unnecessary load dependency may potentially slow down server startup because more tables will be loaded sequentially instead of in parallel. So we tried our best to separate different kinds of dependencies. IIRC referential dependencies exist only to avoid dropping a table that is mentioned in the definition of another table, but such tables could be loaded in parallel. cc @tavplubix |
|
@serxa, thanks for reacting and for clarification. I have to admit that I am not sure that current solution is optimal/most elegant. I see following options
(3) eliminates need of checkDependencies, while these methods are needed for (1) and (2), but since the code is the same, it can be a single method of DataBaseCatalog. Suggestions are appreciated. |
|
BTW, if I am not mistaken, check for ErrorCodes::TOO_MANY_MATERIALIZED_VIEWS |
You are completely right, this is nonsense. I think option (3) is almost the right way to do it. We should get rid of flowchart TD
load_table1["load table1"] --> startup_table1["startup table1"]
load_table2["load table2"] --> startup_table2["startup table2"]
load_table1 --> load_table2
Which prevents execution parallel loading. But for views, we could instead use dependencies between startup jobs to make the following order of execution: flowchart TD
load_table1["load table1"] --> startup_table1["startup table1"]
load_table2["load table2"] --> startup_table2["startup table2"]
startup_table1 --> startup_table2
This won't prevent parallel execution of load jobs. And startup jobs are not long and it would not be a problem. Whenever a table is accessed by INSERT query it waits (blocks) until the startup job is done. So this logic remains the same and will automagically resolve the issue instead of the To implement this we should pass Seems not too hard at the first glance. Would you like to give it a try in this PR? |
What do you mean? |
We do this check in StorageMaterializedView ctor based on data collected in StorageMaterializedView::startup . |
Yes.
Should we limit the scope of this extra loop by streaming alike tables only?
Will see ;)
Yes, sure. BTW, I have an impression that in case of streaming tables and their MVs we don't have an implicit synchronization point, "Whenever a table is accessed by INSERT query it waits (blocks) until the startup job is done" does not work for them, that's why checkDependencies() are needed. Does it make sense? |
I'm not very deep in that streaming code, but yes, looks like that is true.
No, I believe it is needed for all tables. So it will be not an extra loop, but improvement of existing one that is already present there (current loop just iterates tables in some unspecified order). In #72589 streaming is not mentioned, but async_load_tables was identified as the root cause. It is just because without async loading there was no way to do INSERT in the middle (when the source table is started, but target is not started yet). With async load of tables it is now possible. So we will fix both issues by introducing these new dependencies directly into AsyncLoader DAG. |
|
@serxa, I have several assorted concerns/questions/assumptions (probably not all of them make sense) based on your graphs and explanations.
It might be more convenient to communicate via Telegram. May I write you (I know username)? |
76491d0 to
a6eac5a
Compare
088a82b to
618a436
Compare
877c24c to
0b39ba8
Compare
|
Looking into test_storage_s3_queue/test.py::test_registry failure. Update: matter of delay. I've managed to reproduce this on my laptop, artificially slowing it down. @kssenii , I suggest increasing timeout. |
8e82023 to
cd1cc0c
Compare
|
I resolved new conflicts. It seems now mergeable again. Tests are green except the flaky one.
This thing I do not understand: |
It seems to be the same problem of long tests, but it showed an empty report, so I did not realize it at the first glance. Let's override mergeable check one more time and wait for private CI to finish. |
Seems reasonable unfortunately. |
|
Hello @serxa , |
|
I think I messed up while resolving conflicts in private branch. And now there are failed tests. I was distracted, but I'll try to fix it now. |
| if (!table_id.table_name.empty()) | ||
| { | ||
| mv_to_dependency = table_id; | ||
| if (mv_to_dependency->getDatabaseName().empty()) |
There was a problem hiding this comment.
There are two places where getDatabaseName().empty() is called. But getDatabaseName() throws an exception on an empty database. I dont know why it has not been found by public CI, but some private tests found this problem.
There was a problem hiding this comment.
Great finding, I'll fix it by EOD.
There was a problem hiding this comment.
@serxa , could you share anything that facilitates reproduction?
Cannot manage yet.
Probably this code is 'inspired' by
There was a problem hiding this comment.
Well, I believe the problem is specific to private code because it uses catalog for tracking dependencies and got exception because it uses uuid instead of db.table DB::Exception: Received from 172.16.2.10:9000. DB::Exception: Database name is empty: While processing CREATE MATERIALIZED VIEW _ UUID 'ddfb88c6-d551-4467-bc53-b9244f3a8ef0' REFRESH EVERY 1 YEAR SETTINGS all_replicas = 1 APPEND TO INNER UUID '8e252499-6086-443b-8b65-a466db04ccb6' (x Int64) ENGINE = SharedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}') ORDER BY x SETTINGS index_granularity = 8192 DEFINER = default SQL SECURITY DEFINER AS SELECT rand() AS x
There was a problem hiding this comment.
Let's just make it similar to what it was. I think it should do the trick
QualifiedTableName target_name{table_id.database_name, table_id.table_name};
if (target_name.database.empty())
target_name.database = current_database;
Head branch was pushed to by a user without write access
|
Private tests should be fixed now. At least local check does not reveal any problems. But public CI is in bad shape at the moment... |
|
@ilejn Let's merge master into this branch one more time |
|
Private Sync is finally green, but for some reason, CI here does not recognize this fact here. |
c7357e5
|
I'm so glad it is finally merged. Thanks to everybody involved! |
|
For visibility: #82222 |
|
Any chance of backporting it to 25.3? |
No chances, this rework is too big |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
A materialized view can start too late, e.g. after the Kafka table that streams to it.