Skip to content

rework pushing to views#77309

Merged
CheSema merged 12 commits intomasterfrom
chesema-deduplicate-sync-inserts
May 2, 2025
Merged

rework pushing to views#77309
CheSema merged 12 commits intomasterfrom
chesema-deduplicate-sync-inserts

Conversation

@CheSema
Copy link
Copy Markdown
Member

@CheSema CheSema commented Mar 7, 2025

This PR contains the preparation step on a way to implement deduplication with async inserts with materialized views.

Here I changed:

  • recursion formula how pipeline is built. Now it is simple: InsertDependenciesBuilder::createPostSink calls createPostSink.
  • the logic how the stats are collected. ThreadStatus::internal_thread is removed. Instead ThreadGroup::createForMaterializedView is used. For each MV a new designated ThreadGroup is created. All the stats are accumulated in the counters inside the thread group.
  • code around logToQueryViewsLog. Now there is less code/classes to write the logs. No ViewRuntimeStats class is needed.
  • feature materialized_views_ignore_errors, this PR includes changes from fix setting materialized_views_ignore_errors #73802
  • resolve a race in *StorageLogs that resulted in dropping the data when MV is removed.

Also I faced and fixed:

  • LocalConnection did not send final stats at the end
  • PushingAsyncPipelineExecutor hanged at the timeout with BREAK policy
  • half of the *Log storages had wrong implementation of checkDependencies method. It returned true when there are no dependencies at all.
  • StorageURLSink connected remote url at the creating pipeline stage. I moved it to the execution stage.
  • StorageSystemZooKeeper::write had strange header configuration.

A bit more explanation about guarantees when exception happen:

  • if there is an exception (no matter where, in landing table or in dependent tables) than there is no guarantee what data is inserted. Query has status NOT_OK Example: exception happened in MV, than the data may be or may not be inserted in landing table.
  • if materialized_views_ignore_errors=true, than all exceptions in dependent tables have no effect on query execution. The landing table should have all data inserted. Query would have status OK, system.query_views_log would have the actual status for each dependent table.

Related issue: #73015

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Refactoring pushing to views.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@CheSema CheSema assigned CheSema and Michicosun and unassigned CheSema Mar 7, 2025
@CheSema CheSema force-pushed the chesema-deduplicate-sync-inserts branch from 07f6203 to 25e0b25 Compare March 7, 2025 15:16
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 7, 2025

Workflow [PR], commit [7929bc0]

@clickhouse-gh clickhouse-gh bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Mar 7, 2025
@CheSema CheSema force-pushed the chesema-deduplicate-sync-inserts branch from 25e0b25 to 8fd35ae Compare March 7, 2025 15:30
@CheSema
Copy link
Copy Markdown
Member Author

CheSema commented Mar 7, 2025

I looked at test 01268_mv_scalars
It is finny, but it catches a bug: #77327
May be I will fix it here.

@CheSema CheSema force-pushed the chesema-deduplicate-sync-inserts branch from 07e94f9 to 323fc70 Compare March 11, 2025 12:12
@CheSema CheSema force-pushed the chesema-deduplicate-sync-inserts branch 2 times, most recently from b04c0ff to 7467a76 Compare March 20, 2025 22:05
@CheSema CheSema force-pushed the chesema-deduplicate-sync-inserts branch 2 times, most recently from cde7cac to 457c1e0 Compare March 24, 2025 12:46
@CheSema CheSema force-pushed the chesema-deduplicate-sync-inserts branch from 4706987 to a27fb47 Compare March 31, 2025 16:46
@CheSema CheSema changed the title deduplicate async inserts rework pushing to views Apr 1, 2025
@CheSema CheSema force-pushed the chesema-deduplicate-sync-inserts branch 2 times, most recently from 30fa6b0 to 18ff169 Compare April 2, 2025 17:13
@CheSema CheSema marked this pull request as ready for review April 3, 2025 13:02
@CheSema
Copy link
Copy Markdown
Member Author

CheSema commented Apr 4, 2025

test_storage_rabbitmq/test_failed_connection.py::test_rabbitmq_restore_failed_connection_without_losses_2
#71049

@CheSema CheSema force-pushed the chesema-deduplicate-sync-inserts branch 3 times, most recently from 9000bfd to e2488f5 Compare April 7, 2025 23:12
@CheSema
Copy link
Copy Markdown
Member Author

CheSema commented Apr 8, 2025

Stateless tests release: LOGICAL_ERROR
void DB::ObjectStorageQueueMetadataFactory::remove(const std::string &, const StorageID &): Code: 49. DB::Exception: Cannot unregister: table \'969a73ea-6b28-4558-a5da-2db8682ec758\' is not registered.
#78285
#78541

Stateless test msan:
02286_drop_filesystem_cache
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=77309&sha=e2488f541ae8733e2cc96dffbce74224c1e1b004&name_0=PR&name_1=Stateless+tests+%28msan%2C+2%2F4%29
#75876

@CheSema CheSema force-pushed the chesema-deduplicate-sync-inserts branch 2 times, most recently from 636ed03 to 3c509fd Compare April 22, 2025 18:34
@CheSema CheSema force-pushed the chesema-deduplicate-sync-inserts branch from 4b55b78 to 92b8268 Compare April 25, 2025 16:19
@CheSema CheSema added this pull request to the merge queue May 2, 2025
Merged via the queue into master with commit 120768e May 2, 2025
116 of 122 checks passed
@CheSema CheSema deleted the chesema-deduplicate-sync-inserts branch May 2, 2025 09:03
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 2, 2025
nikitamikhaylov added a commit that referenced this pull request May 6, 2025
…-sync-inserts"

This reverts commit 120768e, reversing
changes made to 0d42a5c.
github-merge-queue bot pushed a commit that referenced this pull request May 7, 2025
CheSema added a commit that referenced this pull request May 8, 2025
Asm0deusss pushed a commit to Asm0deusss/ClickHouse that referenced this pull request May 14, 2025
…eduplicate-sync-inserts"

This reverts commit 120768e, reversing
changes made to 0d42a5c.
github-merge-queue bot pushed a commit that referenced this pull request May 16, 2025
…-to-views

Revert "Revert "rework pushing to views #77309""
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants