Skip to content

Conversation

@devanbenz
Copy link
Contributor

Which issue does this PR close?

Closes #11953

Rationale for this change

This will add StringView support for the TRANSLATE() function as part of the work to add complete StringView support in DataFusion, which permits potentially much faster processing of string data. See: #10918 for more background.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Signed-off-by: Devan <devandbenz@gmail.com>
@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Aug 13, 2024
@devanbenz devanbenz marked this pull request as ready for review August 13, 2024 15:19
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @devanbenz

I tried this function out and it seems to get an internal error with now when trying to run with a StringArray column

DataFusion CLI v41.0.0
> create table foo as values (arrow_cast('foo', 'Utf8View'));
0 row(s) fetched.
Elapsed 0.027 seconds.

> SELECT  TRANSLATE(column1, 'foo', 'bar') from foo;
Internal error: could not cast value to arrow_array::array::byte_array::GenericByteArray<arrow_array::types::GenericStringType<i32>>.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker


fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> {
match args[0].data_type() {
DataType::Utf8View => make_scalar_function(translate::<i32>, vec![])(args),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this I think you'll likely also need to change the implementation of translate as well (see for example what @PsiACE has done in #11970)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good I'll take a look 👍

Signed-off-by: Devan <devandbenz@gmail.com>
Signed-off-by: Devan <devandbenz@gmail.com>
@devanbenz
Copy link
Contributor Author

@alamb I still need to implement the test case but the cast error should be gone now. I tried it out in the CLI (oops should have done that before! 😅) thanks for taking a look :)

> create table foo as values (arrow_cast('foo', 'Utf8View'));
0 row(s) fetched.
Elapsed 0.026 seconds.

> SELECT  TRANSLATE(column1, 'foo', 'bar') from foo;
+------------------------------------------------+
| translate(foo.column1,Utf8("foo"),Utf8("bar")) |
+------------------------------------------------+
| foo                                            |
+------------------------------------------------+
1 row(s) fetched.
Elapsed 0.011 seconds.

Signed-off-by: Devan <devandbenz@gmail.com>
Signed-off-by: Devan <devandbenz@gmail.com>
Signed-off-by: Devan <devandbenz@gmail.com>
@devanbenz devanbenz requested a review from alamb August 13, 2024 20:34
Signed-off-by: Devan <devandbenz@gmail.com>
Signed-off-by: Devan <devandbenz@gmail.com>
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is beautiful @devanbenz -- thank you 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update TRANSLATE scalar function to support Utf8View

2 participants