Skip to content

BUG: fix recursion for objects in ma.flatten_structured_array#30975

Open
antareepsarkar wants to merge 1 commit intonumpy:mainfrom
antareepsarkar:core
Open

BUG: fix recursion for objects in ma.flatten_structured_array#30975
antareepsarkar wants to merge 1 commit intonumpy:mainfrom
antareepsarkar:core

Conversation

@antareepsarkar
Copy link
Copy Markdown
Contributor

@antareepsarkar antareepsarkar commented Mar 9, 2026

resolves #29349 to quite an extent.

It probably needs a release note but I want to be sure about the changes.

And, since AI disclosure policy has been added recently, I want to mention that in no way AI was used for this. Even the language(description, comments) is written by me.

@mattip
Copy link
Copy Markdown
Member

mattip commented Mar 9, 2026

Can you reproduce the recursion error from the original issue?

@seberg
Copy link
Copy Markdown
Member

seberg commented Mar 9, 2026

This isn't the approach I was trying to explain on the other PR. The refactor is more extensive, because it should stop iterating the array as a whole and then working with individual elements.
I don't think the tests actually hit the cases that should be changed by this (nor does the PR change it). What would change is if an object array contains lists, which the new behavior would stop flattening as well.

@antareepsarkar
Copy link
Copy Markdown
Contributor Author

Can you reproduce the recursion error from the original issue?

Actually, I forgot to mention in the description that the recursion error has stopped after #30855.

@antareepsarkar
Copy link
Copy Markdown
Contributor Author

antareepsarkar commented Mar 9, 2026

@seberg
I have taken ideas from what you mentioned in #30855 for this PR. But, I was unable to apply np.result_type() on the flattened data type and loop using it (because flattening made the names of the data types get lost).
So, I did this thing.

Consider this example:

ndtype = [('a', object), ('b', [('c', int), ('d', float)])]
a = np.array([([[1, 2, 3]], (3, 3.14)), ([[4, 5, 6]], (2, 2.7))], dtype=ndtype)

# Here, np.ma.flatten_structured_array(a) results in:
array([[list([[1, 2, 3]]), 3, 3.14],  [list([[4, 5, 6]]), 2, 2.7]], dtype=object)

Probably, this results in what is expected (or may be not?). I forgot to add such testcases, sorry for that. Should I add those now?

Actually, I did things like this because, since it is a structured array and if there are no more fields inside a field then it must match the correct data type else structured array would have errored while declaration itself.

But, if you find mistakes or edge cases, I will try to rectify them.

@antareepsarkar
Copy link
Copy Markdown
Contributor Author

antareepsarkar commented Mar 10, 2026

I have added tests for object arrays. Please check it.

@seberg
Copy link
Copy Markdown
Member

seberg commented Mar 11, 2026

@antareepsarkar this makes sense, but it would be good to avoid unpacking each element individually. Rather we should do that in a single loop for the whole array in one go.

@antareepsarkar
Copy link
Copy Markdown
Contributor Author

@seberg
Are the changes okay?

@antareepsarkar
Copy link
Copy Markdown
Contributor Author

Are the test failures related to adding release notes?

@mattip
Copy link
Copy Markdown
Member

mattip commented Mar 16, 2026

It seems something is wrong with the pandas intersphinx download:
intersphinx inventory 'https://pandas.pydata.org/pandas-docs/stable/objects.inv' not fetchable due to <class 'requests.exceptions.HTTPError'>: 404 Client Error: Not Found for url: https://pandas.pydata.org/pandas-docs/stable/objects.inv. Something must have changed in their docs build.

@mattip
Copy link
Copy Markdown
Member

mattip commented Mar 16, 2026

Yes, it should be https://pandas.pydata.org/docs/objects.inv here

'pandas': ('https://pandas.pydata.org/pandas-docs/stable', None),
. I wonder if they did that on purpose

@mattip
Copy link
Copy Markdown
Member

mattip commented Mar 16, 2026

I see. They recommend pinning to a version, i.e. https://pandas.pydata.org/pandas-docs/version/2.3.

@mattip
Copy link
Copy Markdown
Member

mattip commented Mar 16, 2026

Please merge or rebase off main, there was a fix merged. Unfortunately CircleCI does not merge from HEAD before building, so you need to do it.

@seberg
Copy link
Copy Markdown
Member

seberg commented Mar 16, 2026

Are the changes okay?

What I was hoping for is to avoid iterating element by element and rather iterate the dtype itself directly. You need the number of total fields then to allocate the result array first, but the helper function has a kwarg to help with that.

We also need to be aware of subarray dtypes unfortunately (those with dtype.shape != ()).

Alternatively, we would build a new structured dtype with the identical structure but every field the new like np.lib.recfunctions.structured_to_unstructured. Unfortunately, the .view() part doesn't work for us. However, I actually think np.ndarray() is unsafe and will just allow this.

@antareepsarkar
Copy link
Copy Markdown
Contributor Author

Thanks for helping and clarifying.

I made it sync with main and tests passed.

@antareepsarkar
Copy link
Copy Markdown
Contributor Author

@seberg
Sorry for the delay, I got caught up in something.

I have tried to change the things. Please check.

I also tried benchmarking the changes (with these):

# 1
class flatten1(Benchmark):
    def setup(self):
        self.a = np.array([1], [('a', int, (100, 10, 100))])
    def time_1(self):
        np.ma.flatten_structured_array(self.a)
# 2
class flatten2(Benchmark):
    def setup(self):
        b = [('b', float)]
        a = [('a', b)]
        self.a = np.array([1], [('a', a, (100, 10, 100))])
    def time_1(self):
        np.ma.flatten_structured_array(self.a)

The speed has improved to some extent for 1 and is kind of the same for 2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ENH: Refine np.ma.flatten_structured_array to flatten via descr fields rather than elements

3 participants