BUG: fix recursion for objects in ma.flatten_structured_array#30975
BUG: fix recursion for objects in ma.flatten_structured_array#30975antareepsarkar wants to merge 1 commit intonumpy:mainfrom
Conversation
|
Can you reproduce the recursion error from the original issue? |
|
This isn't the approach I was trying to explain on the other PR. The refactor is more extensive, because it should stop iterating the array as a whole and then working with individual elements. |
Actually, I forgot to mention in the description that the recursion error has stopped after #30855. |
|
@seberg Consider this example: ndtype = [('a', object), ('b', [('c', int), ('d', float)])]
a = np.array([([[1, 2, 3]], (3, 3.14)), ([[4, 5, 6]], (2, 2.7))], dtype=ndtype)
# Here, np.ma.flatten_structured_array(a) results in:
array([[list([[1, 2, 3]]), 3, 3.14], [list([[4, 5, 6]]), 2, 2.7]], dtype=object)Probably, this results in what is expected (or may be not?). I forgot to add such testcases, sorry for that. Should I add those now? Actually, I did things like this because, since it is a structured array and if there are no more fields inside a field then it must match the correct data type else structured array would have errored while declaration itself. But, if you find mistakes or edge cases, I will try to rectify them. |
|
I have added tests for object arrays. Please check it. |
|
@antareepsarkar this makes sense, but it would be good to avoid unpacking each element individually. Rather we should do that in a single loop for the whole array in one go. |
|
@seberg |
|
Are the test failures related to adding release notes? |
|
It seems something is wrong with the pandas intersphinx download: |
|
Yes, it should be Line 426 in 758d986 |
|
I see. They recommend pinning to a version, i.e. |
|
Please merge or rebase off main, there was a fix merged. Unfortunately CircleCI does not merge from HEAD before building, so you need to do it. |
What I was hoping for is to avoid iterating element by element and rather iterate the dtype itself directly. You need the number of total fields then to allocate the result array first, but the helper function has a kwarg to help with that. We also need to be aware of subarray dtypes unfortunately (those with Alternatively, we would build a new structured dtype with the identical structure but every field the new like |
|
Thanks for helping and clarifying. I made it sync with main and tests passed. |
|
@seberg I have tried to change the things. Please check. I also tried benchmarking the changes (with these): # 1
class flatten1(Benchmark):
def setup(self):
self.a = np.array([1], [('a', int, (100, 10, 100))])
def time_1(self):
np.ma.flatten_structured_array(self.a)
# 2
class flatten2(Benchmark):
def setup(self):
b = [('b', float)]
a = [('a', b)]
self.a = np.array([1], [('a', a, (100, 10, 100))])
def time_1(self):
np.ma.flatten_structured_array(self.a)The speed has improved to some extent for |
resolves #29349 to quite an extent.
It probably needs a release note but I want to be sure about the changes.
And, since AI disclosure policy has been added recently, I want to mention that in no way AI was used for this. Even the language(description, comments) is written by me.