Adding test for TimeSeries join #9287

bsipocz · 2019-09-26T03:42:05Z

To address #8586

I think #7728 might also be fixed, but I can still generate cases that raise an error for QTables, so I would not claim that issue just yet.

NotImplementedError: join requires masking column 'aaa' but column type Quantity does not support masking

bsipocz · 2019-09-26T04:07:45Z

astropy/table/operations.py

-                left_mask = ~right_mask
-                if np.any(left_mask):
-                    out[out_name][left_mask] = left[left_name].take(left_out)
+                left_mixin_mask = ~right_mask


This is in fact a bug fix, so I'm happy to resubmit it as a separate PR to be backported.
(I couldn't really come up with a nice failing example, as I kept running into other errors, or was using Time columns as keys, which is a new thing.)

ValueError: NumPy boolean array indexing assignment cannot assign 6 input values to the 4 output values where the mask is true
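For readers unfamiliar with this error, here is a minimal NumPy-only sketch of how it arises. The names `out`, `fill_mask` and `values` are hypothetical stand-ins, not the variables in `operations.py`, and the final line shows just one way to make the shapes agree; it is not necessarily the change made in this PR.

```python
import numpy as np

# Hypothetical stand-ins: 6 output rows, 4 of which should be filled.
out = np.zeros(6)
fill_mask = np.array([True, True, False, True, False, True])  # 4 True entries
values = np.arange(6, dtype=float)  # 6 candidate values, one per output row

try:
    # Boolean-mask assignment expects one value per True entry (4 here),
    # but it is handed all 6 values, so NumPy raises ValueError.
    out[fill_mask] = values
except ValueError as exc:
    error_message = str(exc)

# Indexing the values with the same mask makes both sides the same length.
out[fill_mask] = values[fill_mask]
```

With the values indexed by the same mask, the assignment fills only the selected rows and leaves the others untouched.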

It probably makes sense to split this out so it can be backported, and it would be worthwhile adding a regression test. Here's a minimal test that triggers the error:

from astropy import units as u
from astropy.table import QTable, join

tab1 = QTable()
tab1['index'] = [1, 2, 3, 4, 5]
tab1['flux1'] = [2, 3, 2, 1, 1] * u.Jy
tab1['flux2'] = [2, 3, 2, 1, 1] * u.Jy

tab2 = QTable()
tab2['index'] = [3, 4, 5, 6]
tab2['flux1'] = [2, 1, 1, 3] * u.Jy
tab2['flux2'] = [2, 1, 1, 3] * u.Jy

join(tab1, tab2, keys=('index', 'flux1', 'flux2'), join_type='outer')

Hmm, this example still doesn't work. But it seems that np.where now works nicely with Quantities, so there is no need to go into this branch of the code.

bsipocz · 2019-09-26T04:14:33Z

Let me know if you prefer the changelog to be in the timeseries section

astrofrog

This looks good, but I think it'd indeed be good to split out the bug fix and add a regression test.

astrofrog · 2019-10-02T12:53:15Z

astropy/table/operations.py

-                left_mask = ~right_mask
-                if np.any(left_mask):
-                    out[out_name][left_mask] = left[left_name].take(left_out)
+                left_mixin_mask = ~right_mask


It probably makes sense to split this out so it can be backported, and it would be worthwhile adding a regression test. Here's a minimal test that triggers the error:

from astropy import units as u
from astropy.table import QTable, join

tab1 = QTable()
tab1['index'] = [1, 2, 3, 4, 5]
tab1['flux1'] = [2, 3, 2, 1, 1] * u.Jy
tab1['flux2'] = [2, 3, 2, 1, 1] * u.Jy

tab2 = QTable()
tab2['index'] = [3, 4, 5, 6]
tab2['flux1'] = [2, 1, 1, 3] * u.Jy
tab2['flux2'] = [2, 1, 1, 3] * u.Jy

join(tab1, tab2, keys=('index', 'flux1', 'flux2'), join_type='outer')

bsipocz · 2019-10-03T04:58:09Z

Wohooho, actually I think that branch of the code is not needed any more, as all mixins seem to work (except SkyCoords, but those are failing much sooner than that part).

Currently everything is pushed to this branch to see how much CI likes it. I have a suspicion that this may very much build upon the recent changes in masking, so might not work in the bugfix branch anyway.

bsipocz · 2019-10-03T05:20:16Z

Hmm, interesting. Backporting this (well, the table.operations parts) to 3.2.x gives me a UnitConversionError, the same one we see on 32-bit.

bsipocz · 2019-10-03T22:51:10Z

OK, so the straight backport would need #8808, as well as the currently failing builds that use numpy 1.16. It's a pity.

bsipocz · 2019-10-04T06:12:39Z

This is now built on top of #9313, and thanks to #8808, the logic can be further simplified once our minimum np version is 1.17.

Don't merge before #9313

bsipocz · 2019-10-04T18:18:27Z

rebased now and ready for final review.

taldcroft · 2019-10-07T22:10:44Z

@bsipocz - this does appear to work, but there is a hidden issue in that the key array that gets created for sorting is an object array of individual Time objects. Here I inserted a print(out_keys) statement into the code (after line 786 in operations.py):

In [2]: tm = Time([40001,40002], format='cxcsec')
In [3]: tm2 = tm.copy()
In [4]: t1 = Table([tm, [1,2]], names=['tm', 'a'])
In [5]: t2 = Table([tm2, [3,4]], names=['tm', 'b'])
In [6]: table.join(t1, t2, join_type='outer')
[(<Time object: scale='tt' format='cxcsec' value=40001.00000000001>,)
 (<Time object: scale='tt' format='cxcsec' value=40001.00000000001>,)
 (<Time object: scale='tt' format='cxcsec' value=40002.0>,)
 (<Time object: scale='tt' format='cxcsec' value=40002.0>,)]

So if you have a time series with a million entries then memory might blow up.

taldcroft · 2019-10-07T22:15:03Z

I think that the trick used in Time.argsort can be used here:

astropy/astropy/time/core.py

Line 1454 in 0812f6b

def argsort(self, axis=-1):

It just requires a little juggling to replace the 1-column time key with a 2-column version with jd_approx and jd_remainder. I opened up your branch and started playing with this to see if it gets messy.
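To make the two-column idea concrete, here is a rough NumPy-only sketch. It assumes a time is stored as a pair of float64 arrays (called `jd1` and `jd2` here, with made-up values) and is not the actual `Time.argsort` code; only the names `jd_approx` and `jd_remainder` are taken from the comment above.

```python
import numpy as np

# Assumed two-part representation: time = jd1 + jd2, each a float64 array.
jd1 = np.array([2458000.0, 2458000.0, 2457999.0])
jd2 = np.array([1e-12, 0.0, 0.5])

# Coarse key: the sum rounded to float64. The first two rows collide here
# because 1e-12 is below float64 resolution at this magnitude.
jd_approx = jd1 + jd2

# Fine key: what the coarse key lost to rounding.
jd_remainder = (jd1 - jd_approx) + jd2

# np.lexsort treats the LAST key as the primary one, so jd_approx is
# compared first and jd_remainder only breaks ties.
order = np.lexsort((jd_remainder, jd_approx))

# For comparison, sorting on the coarse key alone cannot separate the tie.
coarse_order = np.argsort(jd_approx, kind="stable")
```

Both keys are plain float arrays, so the sort key never has to be an object array holding one Python object per row.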

bsipocz · 2019-10-11T20:54:24Z

After #9340 this works out of the box.

Do we still want to add a test for the TimeSeries?

taldcroft · 2019-10-11T22:04:46Z

astropy/table/operations.py

            out[out_name] = col_cls.info.new_like(cols, n_out, metadata_conflicts, out_name)

-            if issubclass(col_cls, Column):
+            if not NUMPY_LT_1_17 or issubclass(col_cls, (Column, Time)):


Is the NUMPY_LT_1_17 stuff still useful?

Yes, I'll definitely keep that in here. I'm not sure about the usefulness of the tests though?

Well I don't know enough about the time series code to know. But maybe it's a good idea as a regression to ensure future changes in Table or TimeSeries don't break things.

bsipocz · 2019-10-12T00:56:28Z

OK. This one is now gutted; what remains is the test and the change that skips the workaround branch of the conditional, since np.where now works for np >= 1.17.
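As an illustration of the two branches being discussed, here is a small sketch on plain NumPy arrays. The arrays and the `use_left` mask are hypothetical, and the real code in `operations.py` works on table columns rather than bare arrays.

```python
import numpy as np

# Hypothetical per-row values from the left and right tables, already
# expanded to the output length, plus a mask saying which side to use.
left_vals = np.array([1.0, 2.0, 3.0, 0.0])
right_vals = np.array([0.0, 0.0, 30.0, 40.0])
use_left = np.array([True, True, False, False])

# Fast branch: a single vectorized selection.
fast = np.where(use_left, left_vals, right_vals)

# Workaround branch: allocate the output, then fill each side by mask.
# Note that the values are indexed by the same mask as the output.
slow = np.empty_like(left_vals)
slow[use_left] = left_vals[use_left]
slow[~use_left] = right_vals[~use_left]
```

Both branches give the same result for plain arrays; the workaround only matters for column types that `np.where` cannot handle.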

taldcroft

LGTM

bsipocz added time timeseries labels Sep 26, 2019

bsipocz added this to the v4.0 milestone Sep 26, 2019

bsipocz requested review from astrofrog and taldcroft and removed request for taldcroft September 26, 2019 03:42

astropy-bot bot added the table label Sep 26, 2019

bsipocz removed the time label Sep 26, 2019

bsipocz force-pushed the timeseries_join branch from 55cde32 to 7ea52df Compare September 26, 2019 04:14

astrofrog requested changes Oct 2, 2019

bsipocz mentioned this pull request Oct 4, 2019

Fix np.where logic in table.join #9313

Merged

bsipocz force-pushed the timeseries_join branch from 52c8f3b to 9e44b3c Compare October 4, 2019 06:10

bsipocz requested a review from astrofrog October 4, 2019 06:12

bsipocz force-pushed the timeseries_join branch from 9e44b3c to f0c0aa6 Compare October 4, 2019 18:18

bsipocz added the Ready-for-final-review label Oct 4, 2019

taldcroft mentioned this pull request Oct 8, 2019

Add DataInfo get_sortable_arrays to allow mixins as key columns in join #9340

Merged

bsipocz added 2 commits October 11, 2019 13:14

Adding test for TimeSeries join

1abf509

Use times in new rows that are unique

51406af

taldcroft reviewed Oct 11, 2019

np.where works for Time mixins, use faster branch

5b09597

bsipocz force-pushed the timeseries_join branch from f0c0aa6 to 5b09597 Compare October 12, 2019 00:54

bsipocz added no-changelog-entry-needed testing labels Oct 12, 2019

bsipocz changed the title ~~Enable Time as keys for Table/TimeSeries join~~ Adding test for TimeSeries join Oct 12, 2019

taldcroft approved these changes Oct 20, 2019

astrofrog approved these changes Oct 21, 2019

astrofrog merged commit 2d10078 into astropy:master Oct 21, 2019

bsipocz deleted the timeseries_join branch September 16, 2024 21:23
