Skip to content

Use cluster table functions automatically if parallel replicas are enabled#70659

Merged
alexey-milovidov merged 15 commits intomasterfrom
auto-cluster-engines
Mar 13, 2025
Merged

Use cluster table functions automatically if parallel replicas are enabled#70659
alexey-milovidov merged 15 commits intomasterfrom
auto-cluster-engines

Conversation

@thevar1able
Copy link
Copy Markdown
Member

@thevar1able thevar1able commented Oct 14, 2024

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Replace table functions with their -Cluster alternatives if parallel replicas are enabled. Fixes #65024

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

CI Settings (Only check the boxes if you know what you are doing):

  • Allow: All Required Checks
  • Allow: Stateless tests
  • Allow: Stateful tests
  • Allow: Integration Tests
  • Allow: Performance tests
  • Allow: All Builds
  • Allow: batch 1, 2 for multi-batch jobs
  • Allow: batch 3, 4, 5, 6 for multi-batch jobs

  • Exclude: Style check
  • Exclude: Fast test
  • Exclude: All with ASAN
  • Exclude: All with TSAN, MSAN, UBSAN, Coverage
  • Exclude: All with aarch64, release, debug

  • Run only fuzzers related jobs (libFuzzer fuzzers, AST fuzzers, etc.)
  • Exclude: AST fuzzers

  • Do not test
  • Woolen Wolfdog
  • Upload binaries for special builds
  • Disable merge-commit
  • Disable CI cache

@robot-ch-test-poll robot-ch-test-poll added the pr-improvement Pull request with some product improvements label Oct 14, 2024
@thevar1able thevar1able changed the title Use cluster functions automatically if parallel replicas are enabled Use cluster table functions automatically if parallel replicas are enabled Oct 18, 2024
@thevar1able thevar1able force-pushed the auto-cluster-engines branch 3 times, most recently from a1c2b48 to 0ac14f6 Compare November 4, 2024 16:17
@thevar1able thevar1able force-pushed the auto-cluster-engines branch 3 times, most recently from 84b81cc to cb20417 Compare November 13, 2024 15:13
@thevar1able thevar1able marked this pull request as ready for review November 13, 2024 15:13
@thevar1able thevar1able force-pushed the auto-cluster-engines branch 2 times, most recently from 2a1d093 to 3ff22e3 Compare November 19, 2024 15:27
@Michicosun Michicosun self-assigned this Nov 19, 2024
Copy link
Copy Markdown
Member

@nickitat nickitat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@thevar1able thevar1able force-pushed the auto-cluster-engines branch 4 times, most recently from 7a7864c to 44079bd Compare December 9, 2024 15:56
@thevar1able thevar1able force-pushed the auto-cluster-engines branch 3 times, most recently from db1f28c to 46c234b Compare December 16, 2024 06:07
@thevar1able thevar1able force-pushed the auto-cluster-engines branch 2 times, most recently from fc8e52a to 930b13a Compare December 16, 2024 11:45
const auto & current_settings = new_context->getSettingsRef();
auto timeouts = ConnectionTimeouts::getTCPTimeoutsWithFailover(current_settings);

size_t max_replicas_to_use = current_settings[Setting::max_parallel_replicas];
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to limit all other queries (for example with explicitly specified Cluster function) to this number of used shards? By default it is 1 - so it is equivalent to disable cluster functions at all?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change the default value of max_parallel_replicas to 1000.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not change it to 1000 now, as it breaks too much stuff, but keep the old behaviour if max_parallel_replicas is set to 1.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With enable_parallel_replicas = false it shouldn't matter.

Copy link
Copy Markdown
Member Author

@thevar1able thevar1able Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First it is allow_experimental_parallel_reading_from_replicas, and second it breaks exactly in the same way with it set to false.

I agree, it shouldn't, but here we are.

Copy link
Copy Markdown
Member Author

@thevar1able thevar1able Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok there is enable_parallel_replicas, but is it used anywhere? https://pastila.nl/?00a5bc6c/4efc387b98c06c1c64b483fdc04aab56#Y4GdPVZrAdhCymqZEmH5AA==

EDIT: it is an alias. My bad.

Copy link
Copy Markdown
Member

@devcrafter devcrafter Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok there is enable_parallel_replicas, but is it used anywhere?

Later, we'll switch to it in all the places

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change the default value of max_parallel_replicas to 1000.

Split into #74504

)", BETA) \
\
DECLARE(Bool, parallel_replicas_for_cluster_engines, true, R"(
Replace table function engines with their -Cluster alternatives
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this elaborate a bit more on the functionality?

@thevar1able thevar1able force-pushed the auto-cluster-engines branch 3 times, most recently from 2ebaa0b to 96a1339 Compare March 11, 2025 15:11
@thevar1able thevar1able force-pushed the auto-cluster-engines branch from b36a0ec to 8d6ac55 Compare March 12, 2025 20:50
@alexey-milovidov alexey-milovidov added this pull request to the merge queue Mar 13, 2025
Merged via the queue into master with commit 65afe6f Mar 13, 2025
125 checks passed
@alexey-milovidov alexey-milovidov deleted the auto-cluster-engines branch March 13, 2025 04:44
@robot-clickhouse robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cluster engines (such as s3Cluster) should be used automatically with parallel replicas