Skip to content

Conversation

@Weijun-H
Copy link
Member

@Weijun-H Weijun-H commented Aug 22, 2023

Which issue does this PR close?

Parts #7345

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@Weijun-H Weijun-H force-pushed the support-clickhouse-test branch from f18efad to bab7a3e Compare August 22, 2023 09:49
@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt) labels Aug 22, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Weijun-H -- this is looking close. I am not sure about the binary support for regex replace

It might make sense to break the min/max support into its own PR (that looks good to me) and then we can work on regex_replace as another


# No groupy
query error DataFusion error: Internal error: Min/Max accumulator not implemented for type Binary\. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
statement ok
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line should be changed to query rather than statement -- otherwise this test is simply ensuring the query doesn't error but not checking the results

let func = specializer_func(args)?;
func(args)
}
DataType::Binary => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can make the coercion more general and not have to add something specific in the physical implementation

In general the coercion from one type to another is handled in https://github.com/apache/arrow-datafusion/blob/b6fe8159fd21ea101b84abc2629dbb484c8ad326/datafusion/expr/src/type_coercion/functions.rs#L192

Is there any chance you can try adding a coercion there for binary -> utf8 which would allow DataFusion to automatically try and cast binary --> UTF8 if users try and call a function or operator that requires a Utf8?

}

#[test]
fn min_binary() -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think the test in aggregate.slt will be sufficient -- we don't need these extra unit tests

Ok(match arg_type {
DataType::LargeUtf8 => $largeUtf8Type,
DataType::Utf8 => $utf8Type,
DataType::LargeUtf8 | DataType::LargeBinary=> $largeUtf8Type,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this -- this implies that any function that takes a binary input will produce binary output which doesn't seem right

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it back -- this change works. I have incorporated it into #7840

Thanks again @parkma99

BuiltinScalarFunction::RegexpReplace => Signature::one_of(
vec![
Exact(vec![Utf8, Utf8, Utf8]),
Exact(vec![Binary, Utf8, Utf8]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to try and cast the binary --> utf8 at plan time rather than support a binary argument and then cast at plan time as it is more general and will apply to other functions

I left another comment below to this effect

@Weijun-H Weijun-H changed the title feat: support Binary for min/max and regrep_replace feat: support Binary for regrep_replace Aug 24, 2023
@Weijun-H Weijun-H force-pushed the support-clickhouse-test branch from 3e4dc21 to 754c20d Compare August 24, 2023 15:11
@Weijun-H Weijun-H changed the title feat: support Binary for regrep_replace feat: support Binary for regexp_replace Aug 24, 2023
@Weijun-H Weijun-H force-pushed the support-clickhouse-test branch 2 times, most recently from b0d591c to 22da4b5 Compare August 24, 2023 15:50
@github-actions github-actions bot removed the physical-expr Changes to the physical-expr crates label Aug 24, 2023
@Weijun-H Weijun-H force-pushed the support-clickhouse-test branch 3 times, most recently from 4e97406 to de51ea3 Compare August 25, 2023 01:02
@Weijun-H
Copy link
Member Author

Thanks @Weijun-H -- this is looking close. I am not sure about the binary support for regex replace

It might make sense to break the min/max support into its own PR (that looks good to me) and then we can work on regex_replace as another

I split the min/max part into #7397

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants