-
Notifications
You must be signed in to change notification settings - Fork 1.9k
implement utf8_view for replace #12004
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking nice @thinh2 🙏 -- I'll start CI for this PR
| } | ||
| } | ||
|
|
||
| fn replace_view(args: &[ArrayRef]) -> Result<ArrayRef> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
| 01)Projection: replace(__common_expr_1, Utf8("foo"), Utf8("bar")) AS c1, replace(__common_expr_1, CAST(test.column2_utf8view AS Utf8), Utf8("bar")) AS c2 | ||
| 02)--Projection: CAST(test.column1_utf8view AS Utf8) AS __common_expr_1, test.column2_utf8view | ||
| 03)----TableScan: test projection=[column1_utf8view, column2_utf8view] | ||
| 01)Projection: replace(test.column1_utf8view, Utf8("foo"), Utf8("bar")) AS c1, replace(test.column1_utf8view, test.column2_utf8view, Utf8("bar")) AS c2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding a test that shows this query running (aka run
SELECT
REPLACE(column1_utf8view, 'foo', 'bar') as c1,
REPLACE(column1_utf8view, column2_utf8view, 'bar') as c2
FROM test;in addition to EXPLAIN
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @alamb, after updating the replace.rs to process Utf8View, my sql logic test still uses the CAST in its logical plan and this test is failing. Do you have any idea to debug this issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @alamb, I already found the issue. The reason is that I didn't update the function's signature to Utf8View.
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me @thinh2 -- thank you 🙏
I pushed another commit to your branch that adds a slt test to very execution, and merged up from main
Thank you very much for your contribution
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @thinh2
| vec![ | ||
| Exact(vec![Utf8View, Utf8View, Utf8View]), | ||
| Exact(vec![Utf8, Utf8, Utf8]), | ||
| Exact(vec![LargeUtf8, LargeUtf8, LargeUtf8]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for also implementing support for LargeUTF8
|
Looks like there are some clippy errors and test updates needed to get CI clean on this PR |
|
Hi @alamb, currently my code fails at this sql logic test https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/functions.slt#L825 The reason is because of replace's signature order. If I push Utf8 before Utf8View, it will pass. However, test in string_view.slt will fail. I am trying to understand the functions test above to fix it. |
|
Hi @alamb, I already fixed all the test errors. Could you please check and run the CI pipeline again? |
|
Thank you @thinh2 |
|
🚀 |
Which issue does this PR close?
Closes #11913 .
Rationale for this change
Before this change, the
replacefunction only supportutf8andlargeUtf8. We used to castutf8Viewtoutf8, which reduces the performance of the function. This PR is to add support forutf8Viewinsidereplacefunction.What changes are included in this PR?
replaceforutf8Viewreplacefunctionsqllogictestto match ourutf8Viewexpectation (no CAST in the logical plans)replaceto handleutf8View. Without this change, the sql query will still castutf8Viewtoutf8Are these changes tested?
replacefunctionsqllogictestto match ourutf8Viewexpectation (no CAST in the logical plans)Are there any user-facing changes?
No