fix: string array numpy conversion fails with int32 offsets from parquet #3697
…anModesitt/awkward into dcm/fix-parquet-string-int32-offsets
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Think the GPU test failures are unrelated? Seems like a CMake/compiler configuration issue in the action.
I think so too. The CUDA kernels have been implemented correctly already. I'll have a look. Thanks!
The documentation preview is ready to be viewed at http://preview.awkward-array.org.s3-website.us-east-1.amazonaws.com/PR3697
The failure on macos-14 was due to a stale cache, so I cleared the cache and re-ran it. The GPU failure is pretty confusing because other PRs work fine. I'll look into it some more.
Oh, it seems like it's also a cache issue. In other PRs it's using a cached wheel, but in this one it's not. I'll fix it.
ianna left a comment
@DylanModesitt - Great! Thanks for fixing it. The tests pass, I'll enable auto-merge. Thanks.
Closes: #3696
Fixes a bug where converting string arrays to numpy fails after deserializing from parquet with `string_to32=True` (the default). Upon deserialization, the resulting `ListOffsetArray` has int32 offsets instead of int64, and the UTF-8 string conversion kernels only had int64 offset specializations.

Added int32 and uint32 kernel specializations for the three UTF8/padding kernels:
- `awkward_NumpyArray_prepare_utf8_to_utf32_padded`
- `awkward_NumpyArray_utf8_to_utf32_padded`
- `awkward_NumpyArray_pad_zero_to_length`
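
For readers unfamiliar with the offsets layout, here is a rough pure-Python sketch of the idea behind these kernels: strings are stored as one UTF-8 byte buffer plus an offsets array, and the conversion decodes each slice and zero-pads it to a common width. This is only an illustration, not the actual C/CUDA kernel code; the function name `utf8_to_utf32_padded` and the sample data are hypothetical. The point is that the logic is the same whether the offsets are 32-bit or 64-bit, which is why the fix amounts to adding specializations for the extra offset types.

```python
from array import array


def utf8_to_utf32_padded(data: bytes, offsets) -> list[list[int]]:
    """Hypothetical helper: decode data[offsets[i]:offsets[i + 1]] as UTF-8
    and return rows of code points, zero-padded to the longest row."""
    rows = []
    for i in range(len(offsets) - 1):
        start, stop = offsets[i], offsets[i + 1]
        rows.append([ord(ch) for ch in data[start:stop].decode("utf-8")])
    # "Prepare" step: the padded width is the maximum number of code points.
    width = max((len(row) for row in rows), default=0)
    # "Pad" step: right-pad every row with zeros up to that width.
    return [row + [0] * (width - len(row)) for row in rows]


# "héllo" is 6 bytes in UTF-8 (é takes 2 bytes) and "hi" is 2 bytes,
# so the offsets into the shared buffer are 0, 6, 8.
data = "héllo".encode("utf-8") + "hi".encode("utf-8")
offsets32 = array("i", [0, 6, 8])  # typically 32-bit signed integers
offsets64 = array("q", [0, 6, 8])  # 64-bit signed integers

padded32 = utf8_to_utf32_padded(data, offsets32)
padded64 = utf8_to_utf32_padded(data, offsets64)
```

In this sketch both offset widths produce identical padded rows. In a compiled kernel the offset type is fixed at compile time, so each width needs its own specialization.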