Split out arrow-string (#2594) #3295
use std::sync::Arc;

use regex::Regex;
/// Perform SQL `array ~ regex_array` operation on [`StringArray`] / [`LargeStringArray`].
These functions are moved from comparison.rs
//! [here](https://doc.rust-lang.org/stable/core/arch/) for more information.
//!

use crate::array::*;
The like and regex kernels are moved into arrow-string. The remaining kernels will be moved into an arrow-ord crate in a follow-up PR.
What will be left in arrow-compute 🤔
Nothing 🎉
Edit: well, nothing once I also split out the arithmetic kernels. The end goal is for the top-level arrow crate to be just a re-export of other crates.
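The re-export layout described in this reply can be sketched in a single file, with modules standing in for crates. This is only an illustration under stated assumptions: `arrow_string_sketch`, `arrow_facade`, and `byte_length` are hypothetical names, not the real crate contents.

```rust
// Modules stand in for separate crates so the sketch compiles as one file.
// All names here are hypothetical and only illustrate the re-export pattern.

/// Stand-in for a split-out kernel crate such as `arrow-string`.
pub mod arrow_string_sketch {
    /// Toy kernel: byte length of each value, with nulls passed through.
    pub fn byte_length(values: &[Option<&str>]) -> Vec<Option<usize>> {
        values.iter().map(|v| v.map(|s| s.len())).collect()
    }
}

/// Stand-in for the top-level crate, which owns no kernels of its own.
pub mod arrow_facade {
    pub mod compute {
        // Re-export everything from the split-out module. In a real facade
        // crate this would be a `pub use` of the external crate instead of a
        // relative path; `super::super` is just the single-file equivalent.
        pub use super::super::arrow_string_sketch::*;
    }
}
```

With this shape, a caller that imports `arrow_facade::compute::byte_length` keeps compiling after the kernel moves, because the old path still resolves to the same function.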
 ($offset_ty: ty, $result_ty: ty, $kernel: ident, $value: expr, $expected: expr) => {{
     let array = GenericBinaryArray::<$offset_ty>::from($value);
-    let result = $kernel(&array)?;
+    let result = $kernel(&array).unwrap();
Is this a drive-by cleanup to use unwrap rather than returning an Error in the tests?
Yes, it means you get an actual backtrace as opposed to some random error from somewhere 😆
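The difference being discussed can be sketched with plain `std`. `parse_kernel` and the two check functions are hypothetical stand-ins for a kernel and a test body; the point is only where the failure surfaces.

```rust
use std::num::ParseIntError;

// Hypothetical fallible "kernel" used only for illustration.
fn parse_kernel(input: &str) -> Result<i64, ParseIntError> {
    input.trim().parse::<i64>()
}

// Style 1: propagate with `?`. On failure the caller (or the test harness)
// only sees the returned `Err` value, with no record of which call produced it.
fn check_with_question_mark(input: &str) -> Result<i64, ParseIntError> {
    let value = parse_kernel(input)?;
    Ok(value + 1)
}

// Style 2: `unwrap`. On failure this panics at the `unwrap` call itself, so the
// panic message names this file and line, and `RUST_BACKTRACE=1` prints the stack.
fn check_with_unwrap(input: &str) -> i64 {
    let value = parse_kernel(input).unwrap();
    value + 1
}
```

Both styles behave the same on success; they differ only in how much location information a failing test reports.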
(cd arrow-array && cargo publish)
(cd arrow-select && cargo publish)
(cd arrow-cast && cargo publish)
(cd arrow-string && cargo publish)
So no changes outside the noise threshold
Which issue does this PR close?
Part of #2594
Rationale for this change
Splits out string kernels into a crate called `arrow-string`. Whilst these are primarily concerned with UTF8 strings, some also handle binary arrays. I therefore went with `arrow-string` instead of `arrow-str` as `str` is very specifically just UTF-8 in Rust, whereas the general concept of a string extends to binary strings. Or something like that...
What changes are included in this PR?
Are there any user-facing changes?