Skip to content

Rewrite regular expression related functions into simpler and more efficient forms#81992

Merged
alexey-milovidov merged 17 commits intoClickHouse:masterfrom
amosbird:optimize_rewrite_regexp_functions
Jul 27, 2025
Merged

Rewrite regular expression related functions into simpler and more efficient forms#81992
alexey-milovidov merged 17 commits intoClickHouse:masterfrom
amosbird:optimize_rewrite_regexp_functions

Conversation

@amosbird
Copy link
Copy Markdown
Collaborator

@amosbird amosbird commented Jun 17, 2025

Resolves #81981

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Introduced the optimize_rewrite_regexp_functions setting (enabled by default), which allows the optimizer to rewrite certain replaceRegexpAll, replaceRegexpOne, and extract calls into simpler and more efficient forms when specific regular expression patterns are detected. (issue #81981)

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@amosbird amosbird marked this pull request as ready for review June 17, 2025 05:10
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Jun 17, 2025

Workflow [PR], commit [545162b]

Summary:

job_name test_name status info comment
Build (ppc64le) error
Stateless tests (amd_ubsan, parallel) failure
03040_dynamic_type_alters_2_compact_merge_tree FAIL

@clickhouse-gh clickhouse-gh bot added the pr-improvement Pull request with some product improvements label Jun 17, 2025
@novikd novikd self-assigned this Jun 17, 2025
@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Jun 22, 2025
Copy link
Copy Markdown
Member

@novikd novikd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR looks really good, but merging requires some work. Also, it'd be really nice to see any measures of performance improvement.

@amosbird
Copy link
Copy Markdown
Collaborator Author

PR looks really good, but merging requires some work. Also, it'd be really nice to see any measures of performance improvement.

Sure! I'll try to address the reviews sometime this week.

@amosbird
Copy link
Copy Markdown
Collaborator Author

@novikd All review comments have been addressed and conflicts resolved.

Also, it'd be really nice to see any measures of performance improvement.

SELECT
    REGEXP_REPLACE(Referer, '^https?://(?:www\\.)?([^/]+)/.*$', '\\1') AS k,
    AVG(length(Referer)) AS l,
    COUNT(*) AS c,
    MIN(Referer)
FROM hits
WHERE Referer != ''
GROUP BY k
HAVING COUNT(*) > 100000
ORDER BY l DESC
LIMIT 25
FORMAT `Null`
SETTINGS optimize_rewrite_regexp_functions = 0

0 rows in set. Elapsed: 2.863 sec. Processed 99.93 million rows, 7.56 GB (34.91 million rows/s., 2.64 GB/s.)
Peak memory usage: 3.21 GiB.


SELECT
    REGEXP_REPLACE(Referer, '^https?://(?:www\\.)?([^/]+)/.*$', '\\1') AS k,
    AVG(length(Referer)) AS l,
    COUNT(*) AS c,
    MIN(Referer)
FROM hits
WHERE Referer != ''
GROUP BY k
HAVING COUNT(*) > 100000
ORDER BY l DESC
LIMIT 25
FORMAT `Null`
SETTINGS optimize_rewrite_regexp_functions = 1

0 rows in set. Elapsed: 1.564 sec. Processed 99.93 million rows, 7.56 GB (63.89 million rows/s., 4.83 GB/s.)
Peak memory usage: 1.53 GiB.

@clickhouse-gh clickhouse-gh bot added the submodule changed At least one submodule changed in this PR. label Jul 18, 2025
@alexey-milovidov
Copy link
Copy Markdown
Member

@novikd, performance improvements are at https://benchmark.clickhouse.com/

@amosbird amosbird removed the submodule changed At least one submodule changed in this PR. label Jul 18, 2025
@alexey-milovidov alexey-milovidov self-assigned this Jul 27, 2025
@alexey-milovidov alexey-milovidov merged commit 1ca54cd into ClickHouse:master Jul 27, 2025
243 of 246 checks passed
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jul 27, 2025
@amosbird amosbird mentioned this pull request Sep 8, 2025
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-improvement Pull request with some product improvements pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

A few rewrite rules for regexp functions

5 participants