feat: Support custom type in Arrow schema import and export #16615
rui-mo wants to merge 2 commits into facebookincubator:main
Conversation
jinchengchenghh left a comment:
Linking arrow and velox_bridge_test just to test a Spark-specific type is a heavy dependency, and the arrow bridge test should stay in its own directory, so I would suggest registering a simple dummy type to test the arrow bridge.
@@ -1339,6 +1314,12 @@ TypePtr importFromArrowImpl(
     const char* format,
     const ArrowSchema& arrowSchema) {
   VELOX_CHECK_NOT_NULL(format);
+  for (const auto& name : getCustomTypeNames()) {
Could we add a test in Arrow Bridge and register a dummy data type to test this feature?
I added tests by registering a dummy type in the Arrow bridge tests. But I'm not sure whether we still need to verify that the Arrow conversion for TIMESTAMP_NTZ works correctly. Thanks.
I would prefer to remove the Arrow conversion verification for TIMESTAMP_NTZ.
@@ -99,6 +99,10 @@ class TimestampNTZTypeFactory : public CustomTypeFactory {
     return std::make_shared<fuzzer::RandomInputGenerator<int64_t>>(
         config.seed_, TIMESTAMP_NTZ(), config.nullRatio_);
   }

+  const char* getArrowFormatString() const override {
And just test this API.
Added relevant tests.
Custom types typically rely on an underlying physical type. However, during Velox–Arrow conversion, there is no corresponding Arrow type definition available for custom types. This PR introduces a new getArrowFormatString API in CustomTypeFactory, allowing custom types to provide their own Arrow schema identifier string. With this change, schema conversion for Spark's TIMESTAMP_NTZ type is enabled, and tests have been added to cover both schema conversion and data conversion.