Skip to content

ENH: Add slots to NDArrayOperatorsMixin allowing them in subclasses#23113

Merged
seberg merged 3 commits intonumpy:mainfrom
hmaarrfk:slots_for_mixins
Jan 27, 2023
Merged

ENH: Add slots to NDArrayOperatorsMixin allowing them in subclasses#23113
seberg merged 3 commits intonumpy:mainfrom
hmaarrfk:slots_for_mixins

Conversation

@hmaarrfk
Copy link
Copy Markdown
Contributor

I use the mixins in a few different file backed arrays.

However, the lack of slots make it difficult for me to use slots.

I mostly use slots to ensure that performance optimized code doesn't create unecessary references to large chunks of memory.

If all parent classes do not have __slots__ defined, I think that Python (3.9) just ignores __slots__ alltogether.

Thank you for considering.

I use the mixings in a few different file backed arrays.

However, the lack of slots make it difficult for me to use slots.

I mostly use slots to ensure that performance optimized code doesn't
create unecessary references to large chunks of memory.

If all parent classes do not have `__slots__` defined, I think that Python
(3.9) just ignores `__slots__` alltogether.

Thank you for considering.
@seberg
Copy link
Copy Markdown
Member

seberg commented Jan 27, 2023

xref gh-22876. I don't think there is anything downside, so happy to add it. Do you think it should have a release note? (in which case, please add one :)).

@hmaarrfk
Copy link
Copy Markdown
Contributor Author

I do think it is useful to add a release note for this. I didn't know where they were stored I couldn't find it in the contributor guide either. Can you point me to them?

@seberg
Copy link
Copy Markdown
Member

seberg commented Jan 27, 2023

thought its mentioned somewhere, anyway: doc/release/upcoming_changes/README.md

@hmaarrfk
Copy link
Copy Markdown
Contributor Author

(Sorry for testing on your CI).

Let me know if you want me to change the commit history.

Sorry for not searching for this "little feature". I'll make sure to look through issues/PRs in the future and reference them appropriately.

@seberg seberg changed the title ENH: Add slots to NDArrayOperatorsMixin to make subclsasing smoother ENH: Add slots to NDArrayOperatorsMixin allowing them in subclasses Jan 27, 2023
Copy link
Copy Markdown
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hmaarrfk looks good and I think it is safe. I didn't remember that - works also and not just * for bullets, but release notes render OK here :).


m = np.ma.masked_array([1, 3, 5], mask=[False, True, False])
wm = WrappedArray(m)
assert_raises(AttributeError, wm.__setattr__, "not_an_attr", 2)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally prefer with pytest.raises() now, but the file uses this, so it seems fine.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i first wrote the test with pytest.raises then i realized pytest was never imported. So I copied the style from this file.

@jpivarski
Copy link
Copy Markdown

I think there is a downside: if subclasses of NDArrayOperatorsMixin in downstream dependencies don't also define __slots__, then it becomes impossible to assign to __class__ (i.e. dynamically change the class of an existing object)1. Classes with __slots__ have a different memory layout than classes without __slots__, even if we're talking about __slots__ = () in a superclass.

It came up here: scikit-hep/awkward#2534.

Footnotes

  1. You could argue that we shouldn't do that, that it's not a good OOP practice. But we have specialized reasons for it and do it in a controlled way within an inheritance tree. It's just that part of the inheritance tree now has a different memory layout than the rest.

@seberg
Copy link
Copy Markdown
Member

seberg commented Jun 21, 2023

@jpivarski do you have a preference if we are to do anything about it? I don't care either way, its a niche feature, so I don't mind if either of you just monkey-patches things.

But it sounds like it breaks awkward array users who upgrade NumPy, so happy to revert for 1.25.1 (even backport only).

@jpivarski
Copy link
Copy Markdown

Since it's already been released in a NumPy version, the most steady thing is to not change it again, or else there would be a window of NumPy versions with this niche issue. Well, that window would be only between 1.25.0 and 1.25.1, right? That's pretty narrow.

What about two classes to provide with __slots__ and without __slots__? There's probably a good way to make them share most of their code. Apart from the complication that one patch version, 1.25.0, has slots in the class with the traditional name, I think it would make the most sense for the traditional name to not have slots while a new one, NDArrayOperatorsMixinWithSlots, has slots. (Code that depends on whether it has slots or not would want to avoid that one version, to minimize complexity.)

What do you think?

@seberg
Copy link
Copy Markdown
Member

seberg commented Jun 21, 2023

What do you think?

I think I will hit the merge button on any PR that wants to do something about this, no matter what :). After giving @hmaarrfk a bit of time to protest.

(I don't love duplication, but we have this in one form, having the hack too seems OK.)

@hmaarrfk
Copy link
Copy Markdown
Contributor Author

I saw this and was trying to think of a good response.

Slots have tremendously helped my low level code that had to do with memory management and instruments. As i understanding it, CPython believes there are some performance improvements?

Is the proposal to keep slots in 1.26.0? Or revert slots for ever?

I don't think i understand the memory layout challenges that awkward is facing.

If the idea is to add back slots in this class in 1.26.0 i don't see the point in reverting in 1.25.1 it seems that awkward has already adapted???

(Ps, my own PR broke my own code too... Lol, but the small fixes i believe helped the quality overall)

@hmaarrfk
Copy link
Copy Markdown
Contributor Author

If the changes would be reverted. I would likely reimplements the class myself. I do see a way to likely have both options (slots and not) be provided, though it seems like an anti-pattern

@seberg
Copy link
Copy Markdown
Member

seberg commented Jun 21, 2023

it seems that awkward has already adapted???

Right, and if @jpivarski doesn't mind we can also just leave it at that and ask those who need __class__ assignment to add the hack.
The main problem is for end-users who get cryptic errors right now (I presume), if reverting helps them, that seems OK to me. (Especially if that might actually affect more than awkward array)

@jpivarski
Copy link
Copy Markdown

We (Awkward Array) are still deciding how we're going to adapt. The first idea was to monkey-patch the NDArrayOperatorsMixin to make a version without __slots__. The second was to just implement our own NDArrayOperatorsMixin.

I don't think NumPy should fully revert, since it's so important for @hmaarrfk's use-case and probably others like it. For Awkward Array, we'll always want a non-slots version because this is a protocol that is passed down to dependents of Awkward, it leads to a cryptic error message, and the Python metadata associated with an ak.Array will never be small enough for slots to be a relevant performance improvement.

So now I think it's just a question of whether the adjustment should be entirely on Awkward's side or if NumPy should provide a with-slots and a without-slots (with nearly 100% code reuse!). If we think this is so niche that it's unlikely that other dependents of NumPy will run into it, we'll adjust in Awkward only. Dynamically changing the type of an instantiated object does sound pretty niche.

My original message here was just meant as a heads-up, until you asked me if NumPy should adjust. Originally, I would have agreed with @seberg that

I don't think there is anything downside

but—surprisingly!—there is. This would have been hard to predict, but since NumPy is so low in the stack, little things can have surprising consequences for someone, somewhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants