Skip to content

Conversation

@bsipocz
Copy link
Member

@bsipocz bsipocz commented Sep 26, 2019

To address #8586

I think #7728 might be also fixed, but I can still generate cases that raise a an error in case of QTables, so would not claim that issue just yet.

NotImplementedError: join requires masking column 'aaa' but column type Quantity does not support masking

@bsipocz bsipocz added this to the v4.0 milestone Sep 26, 2019
@bsipocz bsipocz requested review from astrofrog and taldcroft and removed request for taldcroft September 26, 2019 03:42
@astropy-bot astropy-bot bot added the table label Sep 26, 2019
left_mask = ~right_mask
if np.any(left_mask):
out[out_name][left_mask] = left[left_name].take(left_out)
left_mixin_mask = ~right_mask
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in fact a bug fix, so I'm happy to resubmit it as a separate PR to be backported.
(I couldn't really come up with a nice failing examples as I was running into errors (or was using Time columns as key what is a new thing)

ValueError: NumPy boolean array indexing assignment cannot assign 6 input values to the 4 output values where the mask is true

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably makes sense to split this out so it can be backported, and it would be worthwhile adding a regression test. Here's a minimal test that triggers the error:

from astropy import units as u
from astropy.table import QTable, join

tab1 = QTable()
tab1['index'] = [1, 2, 3, 4, 5]
tab1['flux1'] = [2, 3, 2, 1, 1] * u.Jy
tab1['flux2'] = [2, 3, 2, 1, 1] * u.Jy

tab2 = QTable()
tab2['index'] = [3, 4, 5, 6]
tab2['flux1'] = [2, 1, 1, 3] * u.Jy
tab2['flux2'] = [2, 1, 1, 3] * u.Jy

join(tab1, tab2, keys=('index', 'flux1', 'flux2'), join_type='outer')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this example still doesn't work. But, then it seems that np.where now seems to work nicely with Quantities, so no need to go into this branch of the code.

@bsipocz
Copy link
Member Author

bsipocz commented Sep 26, 2019

Let me know if you prefer the changelog to be in the timeseries section

Copy link
Member

@astrofrog astrofrog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, but I think it'd indeed be good to split out the bug fix and add a regression test.

left_mask = ~right_mask
if np.any(left_mask):
out[out_name][left_mask] = left[left_name].take(left_out)
left_mixin_mask = ~right_mask
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably makes sense to split this out so it can be backported, and it would be worthwhile adding a regression test. Here's a minimal test that triggers the error:

from astropy import units as u
from astropy.table import QTable, join

tab1 = QTable()
tab1['index'] = [1, 2, 3, 4, 5]
tab1['flux1'] = [2, 3, 2, 1, 1] * u.Jy
tab1['flux2'] = [2, 3, 2, 1, 1] * u.Jy

tab2 = QTable()
tab2['index'] = [3, 4, 5, 6]
tab2['flux1'] = [2, 1, 1, 3] * u.Jy
tab2['flux2'] = [2, 1, 1, 3] * u.Jy

join(tab1, tab2, keys=('index', 'flux1', 'flux2'), join_type='outer')

@bsipocz
Copy link
Member Author

bsipocz commented Oct 3, 2019

Wohooho, actually I think that branch of the code is not needed any more as all mixins seems to work (except SkyCoords, but those are failing much sooner than that part).

Currently everything is pushed to this branch to see how much CI likes it. I have a suspicion that this may very much build upon the recent changes in masking, so might not work in the bugfix branch anyway.

@bsipocz
Copy link
Member Author

bsipocz commented Oct 3, 2019

Hmm, interesting. Backporting this (well, the table.opeartions parts) to 3.2.x gives me a UnitConversionError, the same we see on 32bit.

@bsipocz
Copy link
Member Author

bsipocz commented Oct 3, 2019

OK, so the straight backport would need #8808, as well as the currently failing builds that use numpy 1.16. It's a pity.

@bsipocz
Copy link
Member Author

bsipocz commented Oct 4, 2019

This is now built on top of #9313, and thanks to #8808, the logic can be further simplified once our minimum np version is 1.17.

Don't merge before #9313

@bsipocz bsipocz requested a review from astrofrog October 4, 2019 06:12
@bsipocz
Copy link
Member Author

bsipocz commented Oct 4, 2019

rebased now and ready for final review.

@taldcroft
Copy link
Member

@bsipocz - this does appear to work, but there is a hidden issue in that they key array that gets created for sorting is an object array of individual Time objects. Here I inserted a print(out_keys) statement into the code (after line 786 in operations.py):

In [2]: tm = Time([40001,40002], format='cxcsec')
In [3]: tm2 = tm.copy()
In [4]: t1 = Table([tm, [1,2]], names=['tm', 'a'])
In [5]: t2 = Table([tm2, [3,4]], names=['tm', 'b'])
In [6]: table.join(t1, t2, join_type='outer')
[(<Time object: scale='tt' format='cxcsec' value=40001.00000000001>,)
 (<Time object: scale='tt' format='cxcsec' value=40001.00000000001>,)
 (<Time object: scale='tt' format='cxcsec' value=40002.0>,)
 (<Time object: scale='tt' format='cxcsec' value=40002.0>,)]

So if you have a time series with a million entries then memory might blow up.

@taldcroft
Copy link
Member

I think that the trick used in Time.argsort can be used here:

def argsort(self, axis=-1):

It just requires a little juggling to replace the 1-column time key with a 2-column version with jd_approx and jd_remainder. I opened up your branch and started playing with this to see if it gets messy.

@bsipocz
Copy link
Member Author

bsipocz commented Oct 11, 2019

After #9340 this works out of the box.

Do we still want to add test for the TimeSeries?

out[out_name] = col_cls.info.new_like(cols, n_out, metadata_conflicts, out_name)

if issubclass(col_cls, Column):
if not NUMPY_LT_1_17 or issubclass(col_cls, (Column, Time)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the NUMPY_L_1_17 stuff still useful?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I'll definitely keep that in here. I'm not sure about the usefulness of the tests though?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I don't know enough about the time series code to know. But maybe it's a good idea as a regression to ensure future changes in Table or TimeSeries don't break things.

@bsipocz
Copy link
Member Author

bsipocz commented Oct 12, 2019

OK. This one is now gutted out, the test is what remained, and the change that np.where now works for np >=1.17, no need to go into the workaround branch of the conditional.

@bsipocz bsipocz changed the title Enable Time as keys for Table/TimeSeries join Adding test for TimeSeries join Oct 12, 2019
Copy link
Member

@taldcroft taldcroft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@astrofrog astrofrog merged commit 2d10078 into astropy:master Oct 21, 2019
@bsipocz bsipocz deleted the timeseries_join branch September 16, 2024 21:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants