Conversation
astropy/table/tests/test_info.py
Outdated
| with pytest.raises(AttributeError) as err: | ||
| c[:].info.name | ||
| assert 'failed access "info" attribute' in str(err.value) | ||
| # No problem any more? |
There was a problem hiding this comment.
I actually don't know why this test started failing...
|
@mhvk - this looks great from the description, but clearly needs a decent chunk of time to dig into and evaluate. So it's on my radar, but probably not in the next few days... |
|
No worries - but if you do have a small amount of time, it would be lovely to get #8854 out of the way... |
|
@mvhk - have you had a look at https://www.python.org/dev/peps/pep-0487/ ? This is available as of Python 3.6. Not clear if the |
|
(Taking this out of the inline comments as I think it goes to the heart of the matter.) @taldcroft - The approach with just defining descriptors on the class is indeed a lot clearer! The main reason I went for the meta-class approach was to keep the API similar, with its The question then is whether we are happy to change it, which in turn depends on whether anybody outside of astropy relies on it. In principle, I guess the metaclass could help here, by doing the auto-generation while emitting a deprecation warning. But of course easier if we can do without it... |
|
p.s. Have been meaning to look at the effects of PEP 487... Definitely useful for simple registration like Time fornats. |
Ah yes, I didn't immediately think of the API. That's possibly thorny... So what about some radical thoughts just for the sake of conversation:
|
astropy/utils/data_info.py
Outdated
| # arbitrary attributes could be set (as a by-product, save | ||
| # some memory). | ||
| dct.setdefault('__slots__', []) | ||
| # Set a default __doc__, mostly to avoid InheritDocstrings |
There was a problem hiding this comment.
I have no idea what InheritDocstrings does, but will #9024 help?
There was a problem hiding this comment.
It copied __doc__ to attributes - now that we do not use that anymore internally (#8881), this should indeed not be needed any more!
... testing ...
Indeed! So, removed. Nice catch!
| self._represent_as_dict_attrs = attrs | ||
|
|
||
| out = super()._represent_as_dict() | ||
| out = {} |
There was a problem hiding this comment.
Use data_info._get_obj_attrs_map. (Which maybe needs to be made public?).
Or maybe _represent_as_dict could take an optional attrs to allow overriding self._represent_as_dict_attrs?
There was a problem hiding this comment.
Much better. Here, represent_as_dict itself is overridden, but, yes, a super().represent_as_dict(attrs) would work. I'm not quite sure it is worth changing, especially if we're considering a larger overhaul.
There was a problem hiding this comment.
Ah, darn, I now see why you suggested it: _get_obj_attrs_map is not on DataInfo but in the data_info module...
|
Having given a decent first pass through this code and seen the timing, it looks great! So it might be a reasonable strategy to get this in shape to merge and separately think about a bigger redesign to make the code more elegant / readable and maybe even faster. |
354b7bf to
6b28365
Compare
|
OK, I've implemented the suggested changes. I like the idea of merging this, and worrying about a simpler design separately. The one outstanding question, I think, is whether it is worth the hassle with |
|
Here are some timing results on my laptop with your branch and master. I get qualitatively similar results. This also explores the speedup of a different strategy (attrs are stored as object attributes, not dict elements). from weakref import ref as weakref
from astropy.utils.data_info import DataInfo
from astropy.table import MaskedColumn%astroDataInfo._attr_defaults # Shows this is the faster-info-via-meta-class branchclass A:
info = DataInfo()a = A()a.info.description = 'asdf'timeit a.infoMASTER: 2.5 µs ± 43.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) timeit a.info.descriptionMASTER: 3.39 µs ± 64.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) a.__dict__class SimpleDataInfo:
"""Simplest possible data info container"""
__slots__ = ['name', 'description', 'dtype', 'meta', 'unit', '_parent_ref']class B:
"""Use SimpleDataInfo via a class property"""
@property
def info(self):
try:
info = self._info
except AttributeError:
info = self._info = SimpleDataInfo()
finally:
# info._parent_ref = self # weakref(self)
return info b = B()b.info.name = 'asdf'timeit b.info.nameclass FDataInfo:
"""Descriptor that roughly matches the property above"""
def __get__(self, instance, owner_cls):
try:
info = instance._info
except AttributeError:
if instance is None:
# This is an unbound descriptor on the class
self._parent_cls = owner_cls
return self
else:
info = instance._info = SimpleDataInfo()
finally:
# info._parent_ref = weakref.ref(instance)
return infoclass C:
info = FDataInfo()c = C()c.info.description = 'hello'timeit c.info.descriptionm = MaskedColumn([1, 2])
m.info.description = 'asdf'timeit m.descriptiontimeit m.info.descriptionjunk = SimpleDataInfo()
m.junk = junktimeit m.junkm.junk.name = 'name'timeit m.junk.name |
|
@mvhk - planning to get back to this now that you are returned from vacation. My recollection is that there were some things to explore, but that this PR itself was in reasonably good shape and definitely helps performance. |
|
Yes, I think the only real question here is whether we want to explicitly prevent people from setting arbitrary attributes on |
|
👍 on explicitly preventing people from setting arbitrary attributes on The current implementation as slots is thus fine to me. If we even move to a different implementation that does the same thing then that will also be fine. |
taldcroft
left a comment
There was a problem hiding this comment.
@mhvk - I gave this another go through today and don't see any issues apart from the one trivial (unrelated) fix and the issue with the lost parent. With those (and a rebase to fix the CHANGES conflict), this should be good to merge!
| out['mask'] = col.mask | ||
| self._represent_as_dict_attrs += ('mask',) | ||
|
|
||
| elif method is 'null_value': |
There was a problem hiding this comment.
Not sure how this got there, but it should be method == 'null_value'.
There was a problem hiding this comment.
Also not sure, but coincidentally being fixed in #9225, so I'll leave it here.
astropy/utils/data_info.py
Outdated
|
|
||
| else: | ||
| raise AttributeError("""\ | ||
| except TypeError: |
There was a problem hiding this comment.
Something is amok in this logic because it is not correctly catching the case below (which is exactly the same as the test_lost_parent_error unit test that got deleted). This should print the informative message instead of an obscure AttributeError.
In [17]: c = Column([1,2,3], name='a')
In [18]: c[:].info.name
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-18-6440802ca069> in <module>()
----> 1 c[:].info.name
~/git/astropy/astropy/utils/data_info.py in __get__(self, instance, owner_cls)
212 return self
213
--> 214 return getattr(instance._parent, self.attr)
215
216 def __set__(self, instance, value):
AttributeError: 'NoneType' object has no attribute 'name'
There was a problem hiding this comment.
Very weird - there must have been an intermediate state where it didn't happen, since the test was failing. In any case, I'll fix and reinstate the test!
Gets rid of __getattr__ and __setattr__, but now bad names can be stored.
Also cleans up some other code, in particular by now making use of the fact that dicts follow insertion order.
Makes code a bit clearer, though not much of a speed-up.
And put back the test that that gives an informative error message.
6b28365 to
30a07db
Compare
|
OK, I reinstated the test and fixed the code so that the appropriate message is given again. The difference was that now there always is a |
@taldcroft - I decided to try the metaclass approach to
DataInfo, where for every attribute one automatically generates a descriptor that either gets/sets in_attrsor from the parent. It is a bit of magic in the meta-class, but not really that much more than there was before in__getattr__and__setattr__. Note that I had to move to__slots__in order to continue to disallow setting arbitrary attributes; not completely clear whether that is worth it. I have some other comments in-line.Comparing this PR to current master:
The last item is the most relevant for possibly moving everything to
info-463 nsis still about 6 times slower than just%timeit m.name. But it is not obvious to me that with the general speed-up here, it would still be a limiting factor.