Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently regex_replace treats the pattern and replacement arguments as arrays even when they are scalars (since they've been cast that way). This adds significant overhead to every call to regex_replace: instead of iterating only over the arguments that are really arrays (e.g. the source string), we iterate over everything. As a result we pay for per-row pre-processing of the replacement and, more importantly, for the hashmap used to cache compiled regexes (unnecessary lookups).
Describe the solution you'd like
We can determine whether the pattern and replacement arguments are scalars and, if so, add a new case to the implementation that handles a scalar pattern and a scalar replacement. Initially, we might want to apply this check only to the pattern, but the optimization can later be extended to the replacement as well (the overhead of the replacement is relatively small, but it is not zero).
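To make the idea concrete, here is a minimal sketch of the proposed dispatch. It is a Python model, not the actual implementation: the names `regex_replace_rows` and `regex_replace` and the "scalar is a `str`, array is a `list`" convention are assumptions made for illustration, and Python's `re.sub` (which replaces every match and has its own replacement syntax) only stands in for the real regex engine.

```python
import re
from typing import Dict, List, Union

# Hypothetical convention for this sketch: a scalar argument is a plain str,
# an array argument is a list with one value per row.
Arg = Union[str, List[str]]


def regex_replace_rows(
    sources: List[str], patterns: List[str], replacements: List[str]
) -> List[str]:
    """General path: every argument is an array, so each row needs a cache lookup."""
    cache: Dict[str, "re.Pattern[str]"] = {}
    out = []
    for source, pattern, replacement in zip(sources, patterns, replacements):
        compiled = cache.get(pattern)  # one hashmap lookup per row
        if compiled is None:
            compiled = cache[pattern] = re.compile(pattern)
        out.append(compiled.sub(replacement, source))
    return out


def regex_replace(sources: List[str], pattern: Arg, replacement: Arg) -> List[str]:
    """Dispatch to a fast path when pattern and replacement are both scalars."""
    if isinstance(pattern, str) and isinstance(replacement, str):
        # Scalar fast path: compile once, no cache, iterate only over the sources.
        compiled = re.compile(pattern)
        return [compiled.sub(replacement, source) for source in sources]

    # Fallback: expand any scalar to an array and take the general path.
    n = len(sources)
    patterns = [pattern] * n if isinstance(pattern, str) else pattern
    replacements = [replacement] * n if isinstance(replacement, str) else replacement
    return regex_replace_rows(sources, patterns, replacements)
```

With a scalar pattern and replacement, `regex_replace(["foo1", "bar22"], r"\d+", "#")` compiles the regex once and never touches the cache; passing a list for `pattern` falls back to the per-row path.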
Describe alternatives you've considered
We can leave it as is, but this seems to be a common case, and it is also something we perform particularly badly on in the ClickHouse benchmark.
Additional context
Main issue on regex_replace by @Dandandan on #3518.