-
Notifications
You must be signed in to change notification settings - Fork 1.9k
feat: support Binary for regexp_replace
#7365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f18efad to
bab7a3e
Compare
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Weijun-H -- this is looking close. I am not sure about the binary support for regex replace
It might make sense to break the min/max support into its own PR (that looks good to me) and then we can work on regex_replace as another
|
|
||
| # No groupy | ||
| query error DataFusion error: Internal error: Min/Max accumulator not implemented for type Binary\. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker | ||
| statement ok |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line should be changed to query rather than statement -- otherwise this test is simply ensuring the query doesn't error but not checking the results
| let func = specializer_func(args)?; | ||
| func(args) | ||
| } | ||
| DataType::Binary => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can make the coercion more general and not have to add something specific in the physical implementation
In general the coercion from one type to another is handled in https://github.com/apache/arrow-datafusion/blob/b6fe8159fd21ea101b84abc2629dbb484c8ad326/datafusion/expr/src/type_coercion/functions.rs#L192
Is there any chance you can try adding a coercion there for binary -> utf8 which would allow DataFusion to automatically try and cast binary --> UTF8 if users try and call a function or operator that requires a Utf8?
| } | ||
|
|
||
| #[test] | ||
| fn min_binary() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think the test in aggregate.slt will be sufficient -- we don't need these extra unit tests
| Ok(match arg_type { | ||
| DataType::LargeUtf8 => $largeUtf8Type, | ||
| DataType::Utf8 => $utf8Type, | ||
| DataType::LargeUtf8 | DataType::LargeBinary=> $largeUtf8Type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this -- this implies that any function that takes a binary input will produce binary output which doesn't seem right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| BuiltinScalarFunction::RegexpReplace => Signature::one_of( | ||
| vec![ | ||
| Exact(vec![Utf8, Utf8, Utf8]), | ||
| Exact(vec![Binary, Utf8, Utf8]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to try and cast the binary --> utf8 at plan time rather than support a binary argument and then cast at plan time as it is more general and will apply to other functions
I left another comment below to this effect
min/max and regrep_replaceregrep_replace
3e4dc21 to
754c20d
Compare
regrep_replaceregexp_replace
b0d591c to
22da4b5
Compare
4e97406 to
de51ea3
Compare
de51ea3 to
658468f
Compare
Which issue does this PR close?
Parts #7345
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?