Skip to content

Conversation

@kszucs
Copy link
Member

@kszucs kszucs commented Mar 16, 2020

This add support for casting from ExtensionType. We should probably add support for converting to an ExtensionType, but we can handle it in a follow-up.

@github-actions
Copy link

@kszucs
Copy link
Member Author

kszucs commented Mar 16, 2020

The test failure is related to #6632

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Casting to its own storage type (so eg not a different integer bit), still returns an ExtensionArray. With the example from the test:

In [3]: arr.cast(pa.int64()) 
Out[3]: 
<pyarrow.lib.ExtensionArray object at 0x7f4b45ca5fa8>
[
  1,
  2,
  3,
  4
]

In [4]: arr.cast(pa.int32()) 
Out[4]: 
<pyarrow.lib.Int32Array object at 0x7f4b45ca5348>
[
  1,
  2,
  3,
  4
]

I am not fully sure that this is the expected behaviour.

@kszucs
Copy link
Member Author

kszucs commented Mar 17, 2020

Nice catch, it was caused by the identity shortcut. Added a separate kernel which handles the identity case properly.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now! (but not very familiar with the cast kernel code)

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few notes

@kszucs
Copy link
Member Author

kszucs commented Mar 17, 2020

@bkietz please take a look again

kszucs and others added 2 commits March 17, 2020 21:48
Co-Authored-By: Benjamin Kietzman <bengilgit@gmail.com>
Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR title misled me when I reviews this, it should be "Support casting the storage of an ExtensionArray" or similar

// construct an ArrayData object with the underlying storage type
auto new_input = input.array()->Copy();
new_input->type = storage_type_;
return InvokeWithAllocation(ctx, storage_caster_.get(), new_input, out);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it allocate if the out_type and storage_type are the same?

Copy link
Member Author

@kszucs kszucs Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests on both C++ and Python side, seems like no allocation happens.

}
return true;
return (other.extension_name() == this->extension_name());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be the default implementation of ExtensionEquals

Copy link
Member Author

@kszucs kszucs Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name and the storage type should be equal, created a follow-up JIRA https://issues.apache.org/jira/browse/ARROW-8143

@wesm
Copy link
Member

wesm commented Mar 18, 2020

There was more C++ code in the implementation than I expected, not sure what could be done to make implementing something that "seems easy" (on paper) easier

@kszucs kszucs changed the title ARROW-7858: [C++][Python] Support casting an Extension type to its storage type ARROW-7858: [C++][Python] Support casting from ExtensionArray Mar 18, 2020
Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Thanks @kszucs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants