
Optimize regex_replace with a known pattern / replacement #3613

@isidentical

Description


Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently regex_replace treats the pattern and replacement as arrays even when they are scalars (since they have been cast to arrays). This makes every call to regex_replace pay a significant overhead: instead of iterating only over the fields that actually vary per row (e.g. the source string), we iterate over all of the arguments. As a result, we pay for per-row replacement pre-processing and, more importantly, for the hashmap used to cache compiled regexes (lookups that are unnecessary when the pattern never changes).
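The overhead described above can be illustrated with a minimal, dependency-free Rust sketch. It is not the project's actual implementation: CompiledPattern is a hypothetical stand-in for a compiled regex (it only does literal substring replacement so no crates are needed), and preprocess_replacement is a hypothetical no-op standing in for replacement pre-processing. The point is the shape of the loop: with array arguments, every row pays a hashmap lookup and a replacement pre-processing step, even when all rows carry the same pattern and replacement.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a compiled regex (e.g. `regex::Regex`). It only
/// performs literal substring replacement so the sketch needs no crates.
struct CompiledPattern {
    literal: String,
}

impl CompiledPattern {
    fn compile(pattern: &str) -> CompiledPattern {
        CompiledPattern { literal: pattern.to_string() }
    }

    fn replace_all(&self, haystack: &str, replacement: &str) -> String {
        haystack.replace(&self.literal, replacement)
    }
}

/// Hypothetical per-replacement pre-processing step (a no-op here).
fn preprocess_replacement(replacement: &str) -> String {
    replacement.to_string()
}

/// Array path: pattern and replacement hold one value per row, even when every
/// row carries the same value. Returns the results and the number of cache lookups.
fn regex_replace_arrays(
    sources: &[&str],
    patterns: &[&str],
    replacements: &[&str],
) -> (Vec<String>, usize) {
    let mut cache: HashMap<String, CompiledPattern> = HashMap::new();
    let mut lookups = 0;
    let mut out = Vec::with_capacity(sources.len());
    for ((source, pattern), replacement) in sources.iter().zip(patterns).zip(replacements) {
        // One hashmap lookup per row, even if the pattern never changes.
        lookups += 1;
        let compiled = cache
            .entry(pattern.to_string())
            .or_insert_with(|| CompiledPattern::compile(pattern));
        // The replacement is also pre-processed again for every row.
        let replacement = preprocess_replacement(replacement);
        out.push(compiled.replace_all(source, &replacement));
    }
    (out, lookups)
}

fn main() {
    let sources = ["a-b", "c-d", "e-f"];
    let patterns = ["-"; 3];
    let replacements = ["+"; 3];
    let (out, lookups) = regex_replace_arrays(&sources, &patterns, &replacements);
    // prints ["a+b", "c+d", "e+f"] after 3 cache lookups
    println!("{:?} after {} cache lookups", out, lookups);
}
```

With a real regex engine, the per-row lookup (and the key allocation it needs here) is pure overhead whenever the pattern is a constant.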

Describe the solution you'd like
We can determine whether the pattern and replacement arguments are scalars and, if so, add a new case to the implementation for a scalar pattern and replacement. Initially, we might want to apply this check only to the pattern, but the optimization can later be extended to the replacement as well (the overhead of processing the replacement is relatively small, but it is still non-zero).
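The proposed dispatch can be sketched in Rust under the same kind of assumptions. CompiledPattern is a hypothetical stand-in for a compiled regex (literal substring replacement only), preprocess_replacement is a hypothetical pre-processing step, and Arg is a hypothetical scalar-or-array argument type; none of these are the project's actual API. When both arguments are scalars, the pattern is compiled once and the replacement pre-processed once, and the loop touches only the source strings. With only a scalar pattern (the pattern-first step suggested above), compilation is still hoisted out of the loop. Otherwise the sketch falls back to a per-row path with a hashmap cache.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a compiled regex (e.g. `regex::Regex`). It only
/// performs literal substring replacement so the sketch needs no crates.
struct CompiledPattern {
    literal: String,
}

impl CompiledPattern {
    fn compile(pattern: &str) -> CompiledPattern {
        CompiledPattern { literal: pattern.to_string() }
    }

    fn replace_all(&self, haystack: &str, replacement: &str) -> String {
        haystack.replace(&self.literal, replacement)
    }
}

/// Hypothetical per-replacement pre-processing step (a no-op here).
fn preprocess_replacement(replacement: &str) -> String {
    replacement.to_string()
}

/// Hypothetical argument type: one value shared by all rows, or one per row.
enum Arg<'a> {
    Scalar(&'a str),
    Array(Vec<&'a str>),
}

impl<'a> Arg<'a> {
    /// Value for row `i`; a scalar yields the same value for every row.
    fn get(&self, i: usize) -> &'a str {
        match self {
            Arg::Scalar(value) => *value,
            Arg::Array(values) => values[i],
        }
    }
}

fn regex_replace(sources: &[&str], pattern: &Arg, replacement: &Arg) -> Vec<String> {
    match (pattern, replacement) {
        // Scalar pattern and replacement: compile and pre-process once,
        // then iterate over the source strings only.
        (Arg::Scalar(p), Arg::Scalar(r)) => {
            let compiled = CompiledPattern::compile(p);
            let r = preprocess_replacement(r);
            sources.iter().map(|s| compiled.replace_all(s, &r)).collect()
        }
        // Scalar pattern only: compilation is still hoisted out of the loop,
        // but the replacement is pre-processed per row.
        (Arg::Scalar(p), _) => {
            let compiled = CompiledPattern::compile(p);
            sources
                .iter()
                .enumerate()
                .map(|(i, s)| compiled.replace_all(s, &preprocess_replacement(replacement.get(i))))
                .collect()
        }
        // General case: per-row patterns, cached by pattern text.
        _ => {
            let mut cache: HashMap<&str, CompiledPattern> = HashMap::new();
            let mut out = Vec::with_capacity(sources.len());
            for (i, s) in sources.iter().enumerate() {
                let p = pattern.get(i);
                let compiled = cache.entry(p).or_insert_with(|| CompiledPattern::compile(p));
                out.push(compiled.replace_all(s, &preprocess_replacement(replacement.get(i))));
            }
            out
        }
    }
}

fn main() {
    let sources = ["a-b", "c_d"];
    // Fast path: both arguments are scalars. prints ["a+b", "c_d"]
    println!("{:?}", regex_replace(&sources, &Arg::Scalar("-"), &Arg::Scalar("+")));
    // General path: one pattern per row. prints ["a+b", "c+d"]
    println!("{:?}", regex_replace(&sources, &Arg::Array(vec!["-", "_"]), &Arg::Scalar("+")));
}
```

Keeping the scalar cases in their own match arms leaves the general path unchanged, so the fast path for the pattern can land first and the replacement case can follow later.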

Describe alternatives you've considered
We could leave it as is, but this seems to be a common case, and it is also one where we perform particularly badly in the ClickHouse benchmark.

Additional context
The main issue on regex_replace, by @Dandandan, is #3518.

Labels: enhancement (New feature or request)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions