Skip to content

feat: Support custom type in Arrow schema import and export#16615

Closed
rui-mo wants to merge 2 commits intofacebookincubator:mainfrom
rui-mo:wip_bridge
Closed

feat: Support custom type in Arrow schema import and export#16615
rui-mo wants to merge 2 commits intofacebookincubator:mainfrom
rui-mo:wip_bridge

Conversation

@rui-mo
Copy link
Copy Markdown
Contributor

@rui-mo rui-mo commented Mar 3, 2026

Custom types typically rely on an underlying physical type. While during Velox–
Arrow conversion, there is no corresponding type definition available for
custom types. This PR introduces a new getArrowFormatString API in
CustomTypeFactory, allowing custom types to provide their own Arrow schema
identifier string. With this change, schema conversion for Spark’s TIMESTAMP_NTZ
type is enabled, and tests have been added to cover both schema conversion and
data conversion.

@netlify
Copy link
Copy Markdown

netlify Bot commented Mar 3, 2026

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 562636c
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/69a8784d6eeded000848755e

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 3, 2026
Copy link
Copy Markdown
Contributor

@jinchengchenghh jinchengchenghh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link arrow and velox_bridge_test to test Spark specified type is a heavy overload, and the arrow bridge test should be in its own directory, so I would suggest to register a simple dummy type to test arrow bridge.

@@ -1339,6 +1314,12 @@ TypePtr importFromArrowImpl(
const char* format,
const ArrowSchema& arrowSchema) {
VELOX_CHECK_NOT_NULL(format);
for (const auto& name : getCustomTypeNames()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a test in Arrow Bridge and register a dummy data type to test this feature?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added tests by registering a dummy type in the Arrow bridge tests. But I'm not sure whether we still need to verify that the Arrow conversion for TIMESTAMP_NTZ works correctly. Thanks.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to remove Arrow conversion for TIMESTAMP_NTZ verification

@@ -99,6 +99,10 @@ class TimestampNTZTypeFactory : public CustomTypeFactory {
return std::make_shared<fuzzer::RandomInputGenerator<int64_t>>(
config.seed_, TIMESTAMP_NTZ(), config.nullRatio_);
}

const char* getArrowFormatString() const override {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And just test this API.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added relevant tests.

@rui-mo rui-mo closed this by deleting the head repository Mar 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants