MAINT,API: ignore and NULL fasttake/fastputmask ArrFuncs slots by seberg · Pull Request #14942 · numpy/numpy

seberg · 2019-11-19T20:40:48Z

Always NULL the fasttake slot, and instead move the optimized versions
by doing itemsize specialization in the Take function itself.

I still need to check that there is now speed regression.

numpy/core/src/multiarray/item_selection.c

numpy/core/src/multiarray/arraytypes.h

numpy/core/src/multiarray/item_selection.c

seberg · 2019-11-19T23:18:01Z

Think I addressed those, will squash in a bit. Maybe it is due to debug build, but now changed it to split up on chunk (because why not, it just speeds up certain cases). And it seems memmove is much faster even for 3 bools (which miss the specialized loops), due to memmove rather than a loop for 0 to 3. So overall, it seems like this is a pretty big speed win actually, plus it makes the error message cleaner...

seberg · 2019-11-19T23:19:11Z

I want to do the same thing for fastput, planning to do a new PR for that though right now.

numpy/core/src/multiarray/item_selection.c

seberg · 2019-11-20T17:47:27Z

The current timings are below. Basically, take is slower for 3 elements (which is too bad, since it was a strength of the old take compared to advanced indexing. That could probably be resolved by putting in the loop over the inner-size instead of the large chunk memmove.

For some reason complex256 is optimized much better in take (maybe alignment? maybe just compiler fluctuations). fastputmask is a bit slower, not sure why, but usually it is not a huge amount. And it is not a very common function.

Details

       before           after         ratio
     [01123f61]       [f75bf201]
     <clean-fasttake~3>                 
+      6.42±0.1μs      10.3±0.05μs     1.60  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'clip', 'complex256')
+      3.62±0.1μs      5.64±0.03μs     1.56  bench_itemselection.PutMask.time_dense(False, 'complex256')
+      6.16±0.1μs       9.46±0.1μs     1.54  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'wrap', 'complex256')
+      6.73±0.1μs      10.3±0.07μs     1.53  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'raise', 'complex256')
+      4.00±0.2μs      5.97±0.03μs     1.49  bench_itemselection.Take.time_contiguous((1000, 1), 'clip', 'complex256')
+     4.14±0.08μs      6.02±0.05μs     1.46  bench_itemselection.Take.time_contiguous((1000, 1), 'raise', 'complex256')
+     3.84±0.05μs      5.53±0.03μs     1.44  bench_itemselection.Take.time_contiguous((1000, 1), 'wrap', 'complex256')
+     6.18±0.09μs      7.94±0.02μs     1.28  bench_itemselection.Take.time_contiguous((1000, 3), 'raise', 'int16')
+      8.39±0.2μs       10.2±0.3μs     1.21  bench_itemselection.Take.time_contiguous((1000, 3), 'raise', 'complex256')
       6.14±0.9μs      7.40±0.02μs    ~1.20  bench_itemselection.Take.time_contiguous((1000, 3), 'wrap', 'float16')
       8.37±0.4μs       10.1±0.2μs    ~1.20  bench_itemselection.Take.time_contiguous((1000, 3), 'clip', 'complex256')
+     5.87±0.04μs      6.97±0.02μs     1.19  bench_itemselection.Take.time_contiguous((1000, 3), 'raise', 'int32')
+     6.06±0.09μs       7.10±0.2μs     1.17  bench_itemselection.Take.time_contiguous((1000, 2), 'clip', 'complex256')
+     6.04±0.05μs      7.04±0.04μs     1.17  bench_itemselection.Take.time_contiguous((1000, 3), 'raise', 'complex64')
         8.92±2μs       10.4±0.2μs    ~1.16  bench_itemselection.Take.time_contiguous((1000, 3), 'wrap', 'complex256')
+     6.16±0.06μs      7.09±0.07μs     1.15  bench_itemselection.Take.time_contiguous((1000, 2), 'wrap', 'complex256')
+      7.04±0.1μs      7.95±0.02μs     1.13  bench_itemselection.Take.time_contiguous((1000, 3), 'raise', 'float16')
+     6.38±0.09μs       7.16±0.1μs     1.12  bench_itemselection.Take.time_contiguous((1000, 2), 'raise', 'complex256')
       6.29±0.2μs      7.06±0.02μs    ~1.12  bench_itemselection.Take.time_contiguous((1000, 3), 'raise', 'int64')
+     2.05±0.01μs         2.28±0μs     1.11  bench_itemselection.PutMask.time_dense(True, 'float16')
+        2.64±0μs      2.93±0.01μs     1.11  bench_itemselection.PutMask.time_dense(False, 'int64')
+        2.06±0μs         2.28±0μs     1.11  bench_itemselection.PutMask.time_dense(True, 'float32')
+        2.10±0μs      2.31±0.01μs     1.10  bench_itemselection.PutMask.time_dense(True, 'int64')
+     2.06±0.01μs         2.27±0μs     1.10  bench_itemselection.PutMask.time_dense(True, 'int16')
      2.65±0.03μs      2.90±0.01μs     1.10  bench_itemselection.PutMask.time_dense(False, 'complex64')
      2.58±0.01μs      2.80±0.01μs     1.08  bench_itemselection.PutMask.time_dense(False, 'int16')
      2.58±0.01μs      2.80±0.01μs     1.08  bench_itemselection.PutMask.time_dense(False, 'float16')
       2.69±0.2μs      2.90±0.02μs     1.08  bench_itemselection.PutMask.time_dense(False, 'float64')
      2.60±0.01μs         2.81±0μs     1.08  bench_itemselection.PutMask.time_dense(False, 'float32')
      2.60±0.01μs         2.80±0μs     1.08  bench_itemselection.PutMask.time_dense(False, 'int32')
      4.91±0.04μs      5.28±0.05μs     1.07  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'clip', 'complex128')
       5.18±0.2μs      5.55±0.05μs     1.07  bench_itemselection.Take.time_contiguous((1000, 2), 'wrap', 'complex128')
      6.06±0.06μs      6.47±0.04μs     1.07  bench_itemselection.Take.time_contiguous((1000, 3), 'clip', 'complex64')
       6.13±0.2μs      6.55±0.06μs     1.07  bench_itemselection.Take.time_contiguous((1000, 3), 'wrap', 'float64')
       7.04±0.4μs      7.39±0.02μs     1.05  bench_itemselection.Take.time_contiguous((1000, 3), 'clip', 'float16')
      7.05±0.03μs      7.39±0.02μs     1.05  bench_itemselection.Take.time_contiguous((1000, 3), 'wrap', 'int16')
      3.32±0.03μs      3.47±0.03μs     1.05  bench_itemselection.Take.time_contiguous((1000, 1), 'clip', 'complex128')
       6.14±0.6μs      6.40±0.04μs     1.04  bench_itemselection.Take.time_contiguous((1000, 3), 'wrap', 'float32')
         2.56±0μs      2.62±0.01μs     1.02  bench_itemselection.PutMask.time_dense(True, 'complex128')
       6.31±0.2μs      6.44±0.08μs     1.02  bench_itemselection.Take.time_contiguous((1000, 3), 'wrap', 'int32')
         2.33±0μs      2.37±0.01μs     1.02  bench_itemselection.PutMask.time_sparse(False, 'complex256')
      2.96±0.04μs      3.02±0.05μs     1.02  bench_itemselection.Take.time_contiguous((1000, 1), 'clip', 'float16')
      2.99±0.03μs      3.04±0.03μs     1.02  bench_itemselection.Take.time_contiguous((1000, 1), 'clip', 'int16')
      4.33±0.03μs      4.35±0.02μs     1.00  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'clip', 'int16')
      4.36±0.02μs      4.37±0.03μs     1.00  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'clip', 'float16')
       7.03±0.1μs      7.04±0.07μs     1.00  bench_itemselection.Take.time_contiguous((1000, 3), 'raise', 'float64')
      4.43±0.04μs      4.43±0.04μs     1.00  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'clip', 'complex64')
       4.83±0.2μs      4.80±0.02μs     0.99  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'raise', 'int16')
      3.20±0.03μs      3.18±0.01μs     0.99  bench_itemselection.PutMask.time_dense(False, 'complex128')
      3.23±0.02μs      3.20±0.03μs     0.99  bench_itemselection.Take.time_contiguous((1000, 1), 'raise', 'float16')
      4.43±0.02μs      4.39±0.02μs     0.99  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'clip', 'int64')
      4.84±0.02μs      4.80±0.01μs     0.99  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'raise', 'float16')
      4.98±0.02μs      4.92±0.04μs     0.99  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'raise', 'int64')
      4.97±0.05μs      4.91±0.02μs     0.99  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'raise', 'float64')
      3.27±0.02μs      3.23±0.02μs     0.99  bench_itemselection.Take.time_contiguous((1000, 1), 'raise', 'int16')
         2.33±0μs      2.30±0.01μs     0.99  bench_itemselection.PutMask.time_dense(True, 'float64')
      4.39±0.02μs      4.34±0.03μs     0.99  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'clip', 'float32')
      3.08±0.04μs      3.04±0.01μs     0.99  bench_itemselection.Take.time_contiguous((1000, 1), 'clip', 'complex64')
      4.41±0.03μs      4.35±0.02μs     0.99  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'clip', 'float64')
      2.31±0.01μs      2.28±0.01μs     0.99  bench_itemselection.PutMask.time_dense(True, 'int32')
         2.31±0μs      2.27±0.01μs     0.98  bench_itemselection.PutMask.time_sparse(False, 'float64')
      4.92±0.02μs      4.84±0.04μs     0.98  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'raise', 'float32')
      4.96±0.04μs      4.86±0.01μs     0.98  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'raise', 'complex64')
      2.35±0.01μs      2.30±0.01μs     0.98  bench_itemselection.PutMask.time_dense(True, 'complex64')
      4.95±0.04μs      4.84±0.03μs     0.98  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'raise', 'int32')
      3.35±0.03μs      3.26±0.04μs     0.97  bench_itemselection.Take.time_contiguous((1000, 1), 'raise', 'float64')
      3.09±0.02μs      3.00±0.03μs     0.97  bench_itemselection.Take.time_contiguous((1000, 1), 'clip', 'int64')
      2.34±0.01μs         2.27±0μs     0.97  bench_itemselection.PutMask.time_sparse(False, 'int16')
      3.35±0.03μs      3.23±0.02μs     0.96  bench_itemselection.Take.time_contiguous((1000, 1), 'raise', 'float32')
      3.33±0.03μs      3.21±0.03μs     0.96  bench_itemselection.Take.time_contiguous((1000, 1), 'raise', 'int32')
      3.50±0.02μs      3.37±0.08μs     0.96  bench_itemselection.Take.time_contiguous((1000, 1), 'raise', 'complex128')
      3.12±0.03μs      3.00±0.05μs     0.96  bench_itemselection.Take.time_contiguous((1000, 1), 'clip', 'float64')
      3.38±0.04μs      3.25±0.02μs     0.96  bench_itemselection.Take.time_contiguous((1000, 1), 'raise', 'complex64')
      3.38±0.02μs      3.24±0.03μs     0.96  bench_itemselection.Take.time_contiguous((1000, 1), 'raise', 'int64')
      3.08±0.03μs      2.96±0.03μs     0.96  bench_itemselection.Take.time_contiguous((1000, 1), 'clip', 'float32')
      7.25±0.09μs      6.90±0.04μs     0.95  bench_itemselection.Take.time_contiguous((1000, 3), 'raise', 'float32')
         7.83±2μs      7.41±0.08μs     0.95  bench_itemselection.Take.time_contiguous((1000, 3), 'clip', 'int16')
      5.35±0.04μs      5.04±0.03μs     0.94  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'raise', 'complex128')
      3.17±0.02μs      2.94±0.02μs     0.93  bench_itemselection.Take.time_contiguous((1000, 1), 'clip', 'int32')
      4.67±0.03μs      4.32±0.03μs     0.93  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'clip', 'int32')
      3.34±0.03μs      3.07±0.04μs     0.92  bench_itemselection.Take.time_contiguous((1000, 1), 'wrap', 'complex128')
-     3.05±0.04μs      2.75±0.04μs     0.90  bench_itemselection.Take.time_contiguous((1000, 1), 'wrap', 'float64')
-     6.77±0.05μs      6.04±0.04μs     0.89  bench_itemselection.Take.time_contiguous((1000, 2), 'clip', 'complex128')
-     4.98±0.05μs      4.44±0.07μs     0.89  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'wrap', 'complex128')
-     3.09±0.02μs      2.75±0.03μs     0.89  bench_itemselection.Take.time_contiguous((1000, 1), 'wrap', 'int64')
-     2.98±0.03μs      2.65±0.05μs     0.89  bench_itemselection.Take.time_contiguous((1000, 1), 'wrap', 'int16')
-     3.10±0.02μs      2.75±0.02μs     0.89  bench_itemselection.Take.time_contiguous((1000, 1), 'wrap', 'complex64')
-     4.39±0.04μs      3.89±0.02μs     0.89  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'wrap', 'float64')
         7.42±2μs      6.57±0.04μs    ~0.88  bench_itemselection.Take.time_contiguous((1000, 3), 'wrap', 'complex64')
-     2.96±0.04μs         2.62±0μs     0.88  bench_itemselection.Take.time_contiguous((1000, 1), 'wrap', 'float16')
-     3.05±0.01μs      2.69±0.02μs     0.88  bench_itemselection.Take.time_contiguous((1000, 1), 'wrap', 'float32')
-        2.29±0μs      2.00±0.01μs     0.87  bench_itemselection.PutMask.time_sparse(True, 'complex128')
-     4.44±0.02μs      3.86±0.03μs     0.87  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'wrap', 'complex64')
-     2.30±0.01μs      2.00±0.02μs     0.87  bench_itemselection.PutMask.time_sparse(True, 'complex64')
-     4.31±0.03μs      3.75±0.05μs     0.87  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'wrap', 'int16')
-     4.51±0.04μs      3.92±0.07μs     0.87  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'wrap', 'int64')
-     4.33±0.09μs      3.76±0.02μs     0.87  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'wrap', 'float16')
      7.26±0.05μs       6.27±0.1μs    ~0.86  bench_itemselection.Take.time_contiguous((1000, 3), 'clip', 'complex128')
-     2.31±0.02μs         1.99±0μs     0.86  bench_itemselection.PutMask.time_sparse(True, 'int16')
-     4.42±0.06μs      3.82±0.02μs     0.86  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'wrap', 'float32')
-     7.01±0.07μs      6.04±0.06μs     0.86  bench_itemselection.Take.time_contiguous((1000, 2), 'raise', 'complex128')
-     3.08±0.01μs      2.65±0.02μs     0.86  bench_itemselection.Take.time_contiguous((1000, 1), 'wrap', 'int32')
-     2.31±0.03μs         1.99±0μs     0.86  bench_itemselection.PutMask.time_sparse(True, 'int32')
-     2.33±0.04μs         1.99±0μs     0.86  bench_itemselection.PutMask.time_sparse(True, 'float64')
-      4.49±0.2μs      3.83±0.02μs     0.85  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'wrap', 'int32')
         7.61±2μs      6.47±0.02μs    ~0.85  bench_itemselection.Take.time_contiguous((1000, 3), 'clip', 'int32')
-     2.38±0.01μs      2.00±0.01μs     0.84  bench_itemselection.PutMask.time_sparse(True, 'longfloat')
-        2.80±0μs      2.27±0.01μs     0.81  bench_itemselection.PutMask.time_sparse(False, 'int32')
-        8.10±1μs      6.51±0.05μs     0.80  bench_itemselection.Take.time_contiguous((1000, 3), 'clip', 'float64')
-        7.04±0μs      5.64±0.04μs     0.80  bench_itemselection.PutMask.time_dense(True, 'complex256')
-        8.41±1μs      6.55±0.04μs     0.78  bench_itemselection.Take.time_contiguous((1000, 3), 'clip', 'int64')
-     2.99±0.02μs      2.27±0.01μs     0.76  bench_itemselection.PutMask.time_sparse(False, 'longfloat')
-        8.54±2μs      6.47±0.02μs     0.76  bench_itemselection.Take.time_contiguous((1000, 3), 'clip', 'float32')
-     3.02±0.01μs         2.28±0μs     0.75  bench_itemselection.PutMask.time_sparse(False, 'int64')
-     3.02±0.01μs         2.27±0μs     0.75  bench_itemselection.PutMask.time_sparse(False, 'complex64')
-     3.02±0.01μs      2.27±0.01μs     0.75  bench_itemselection.PutMask.time_sparse(False, 'complex128')
-      8.72±0.9μs      6.54±0.04μs     0.75  bench_itemselection.Take.time_contiguous((1000, 3), 'wrap', 'int64')
-     3.05±0.02μs      2.29±0.01μs     0.75  bench_itemselection.PutMask.time_sparse(False, 'float16')
         8.67±2μs      6.45±0.02μs    ~0.74  bench_itemselection.Take.time_contiguous((1000, 3), 'wrap', 'complex128')
-     3.07±0.01μs      2.28±0.01μs     0.74  bench_itemselection.PutMask.time_sparse(False, 'float32')
-      5.08±0.2μs      3.45±0.06μs     0.68  bench_itemselection.Take.time_contiguous((1000, 2), 'clip', 'complex64')
-     4.97±0.04μs      3.34±0.02μs     0.67  bench_itemselection.Take.time_contiguous((1000, 2), 'raise', 'float64')
-     3.10±0.02μs         2.08±0μs     0.67  bench_itemselection.PutMask.time_sparse(True, 'complex256')
-     9.09±0.05μs      6.05±0.03μs     0.67  bench_itemselection.Take.time_contiguous((1000, 2), 'raise', 'longfloat')
-     3.01±0.01μs         1.99±0μs     0.66  bench_itemselection.PutMask.time_sparse(True, 'float16')
-      9.16±0.7μs      6.04±0.07μs     0.66  bench_itemselection.Take.time_contiguous((1000, 2), 'clip', 'longfloat')
-     3.03±0.01μs      1.99±0.01μs     0.66  bench_itemselection.PutMask.time_sparse(True, 'float32')
-     5.21±0.05μs      3.41±0.03μs     0.65  bench_itemselection.Take.time_contiguous((1000, 2), 'raise', 'complex64')
-      3.06±0.2μs         1.99±0μs     0.65  bench_itemselection.PutMask.time_sparse(True, 'int64')
-     5.36±0.04μs      3.44±0.04μs     0.64  bench_itemselection.Take.time_contiguous((1000, 2), 'raise', 'int64')
-     5.38±0.02μs      3.43±0.04μs     0.64  bench_itemselection.Take.time_contiguous((1000, 1), 'clip', 'longfloat')
-     5.57±0.07μs      3.49±0.05μs     0.63  bench_itemselection.Take.time_contiguous((1000, 2), 'clip', 'int64')
-     5.21±0.03μs      3.25±0.03μs     0.62  bench_itemselection.Take.time_contiguous((1000, 2), 'raise', 'int32')
-     4.98±0.04μs      3.09±0.07μs     0.62  bench_itemselection.Take.time_contiguous((1000, 2), 'wrap', 'float64')
-     5.65±0.02μs      3.46±0.03μs     0.61  bench_itemselection.Take.time_contiguous((1000, 2), 'clip', 'float64')
-     5.33±0.03μs      3.22±0.02μs     0.60  bench_itemselection.Take.time_contiguous((1000, 2), 'raise', 'int16')
-     4.99±0.05μs      3.00±0.04μs     0.60  bench_itemselection.Take.time_contiguous((1000, 2), 'clip', 'int16')
-     5.09±0.04μs      3.03±0.01μs     0.60  bench_itemselection.Take.time_contiguous((1000, 2), 'clip', 'int32')
-     5.05±0.04μs      2.98±0.03μs     0.59  bench_itemselection.Take.time_contiguous((1000, 2), 'wrap', 'complex64')
-     4.45±0.01μs      2.63±0.01μs     0.59  bench_itemselection.PutMask.time_dense(True, 'longfloat')
-     9.01±0.06μs      5.28±0.03μs     0.59  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'clip', 'longfloat')
-        11.8±2μs      6.84±0.03μs     0.58  bench_itemselection.Take.time_contiguous((1000, 3), 'raise', 'complex128')
-      12.0±0.1μs      6.77±0.06μs     0.56  bench_itemselection.Take.time_contiguous((1000, 3), 'raise', 'longfloat')
-     4.96±0.01μs      2.70±0.03μs     0.54  bench_itemselection.Take.time_contiguous((1000, 2), 'wrap', 'int32')
-     5.01±0.03μs      2.72±0.02μs     0.54  bench_itemselection.Take.time_contiguous((1000, 2), 'wrap', 'float32')
-     4.98±0.02μs      2.66±0.03μs     0.53  bench_itemselection.Take.time_contiguous((1000, 2), 'wrap', 'float16')
-     5.87±0.03μs      3.07±0.03μs     0.52  bench_itemselection.Take.time_contiguous((1000, 1), 'wrap', 'longfloat')
-     5.91±0.05μs      3.07±0.04μs     0.52  bench_itemselection.Take.time_contiguous((1000, 2), 'wrap', 'int64')
-     10.7±0.06μs      5.52±0.03μs     0.52  bench_itemselection.Take.time_contiguous((1000, 2), 'wrap', 'longfloat')
-     12.3±0.06μs      6.28±0.08μs     0.51  bench_itemselection.Take.time_contiguous((1000, 3), 'clip', 'longfloat')
-     6.52±0.04μs      3.23±0.04μs     0.50  bench_itemselection.Take.time_contiguous((1000, 2), 'raise', 'float16')
-     6.53±0.03μs      3.19±0.02μs     0.49  bench_itemselection.PutMask.time_dense(False, 'longfloat')
-     6.88±0.03μs      3.35±0.04μs     0.49  bench_itemselection.Take.time_contiguous((1000, 1), 'raise', 'longfloat')
-     6.97±0.04μs      3.29±0.02μs     0.47  bench_itemselection.Take.time_contiguous((1000, 2), 'raise', 'float32')
-      6.39±0.1μs      2.95±0.01μs     0.46  bench_itemselection.Take.time_contiguous((1000, 2), 'clip', 'float16')
-     9.98±0.04μs      4.44±0.04μs     0.44  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'wrap', 'longfloat')
-      6.39±0.4μs      2.68±0.04μs     0.42  bench_itemselection.Take.time_contiguous((1000, 2), 'wrap', 'int16')
-      12.0±0.2μs      5.00±0.04μs     0.42  bench_itemselection.Take.time_contiguous((2, 1000, 1), 'raise', 'longfloat')
-        17.3±4μs      6.49±0.03μs     0.38  bench_itemselection.Take.time_contiguous((1000, 3), 'wrap', 'longfloat')
-        8.42±1μs      3.03±0.03μs     0.36  bench_itemselection.Take.time_contiguous((1000, 2), 'clip', 'float32')

numpy/core/src/multiarray/item_selection.c

seberg · 2019-11-20T23:06:26Z

OK, I hope this is more or less converging. There is a slightly slowdown for certain takes, but I am not too concerned, and it could be adapted later if necessary. The complex256 slowdown is somewhat annoying, but I do not think it really matters much and may well just get magically fixed with other compilers or so.

seberg · 2019-11-21T00:39:33Z

Hmmm, the azure error seems real. While float96 should not be optimized, something goes horribly wrong. Does memmove have alignment requirements, or is this a compiler bug with the -O3 optimization level? Grrr....

seberg · 2019-11-21T01:20:03Z

Oooops, it was a stupid typo... The issue was only that none of the tests hit the default path, except on the 32bit (actually linux) platform.

Always NULL the fasttake slot, and instead move the optimized versions by doing itemsize specialization in the Take function itself.

This removes the use of the fastputmask ArrFuncs slot from within NumPy and always leaves it NULL.

seberg · 2020-01-15T20:03:21Z

From the call: It sounds like nobody is too concerned about ignoring these slots and a slight difference in speed trade-offs (and I guess if someone notices slowdowns we can optimize). Technical review awaiting of course.

mattip · 2020-01-15T20:48:28Z

doc/release/upcoming_changes/14942.compatibility.rst

+Fasttake and fastputmask slots are deprecated and NULL'ed
+---------------------------------------------------------
+The fasttake and fastputmask slots are now never used internally.
+They will always set to NULL for builtin dtypes.


"They will always be NULL" or "They will always be set to NULL ..."

mattip · 2020-01-15T20:54:29Z

In PyArray_RegisterDataType, maybe error out if the fields are not NULL?

seberg · 2020-01-15T21:12:03Z

Sounds like a good idea, although maybe a warning (forever) that it is just ignored is just as good? Removes any possible breaking, and at some point we have to deprecate that whole function anyway...

mattip · 2020-01-16T00:24:18Z

any possible breaking

Up to you. I would argue that

anyone taking the effort to write a function that is then ignored might appreciate getting an error rather than a warning
it is hard to find a non-existent call to their function in the code base, leading to a hard-to-debug situation where their code is never called by NumPy but may be called by some third party package

When we remove the functions the error will go away too.

seberg · 2020-01-16T00:33:19Z

numpy/core/src/multiarray/usertypes.c

        return -1;
    }
+
+    if (test_deprecated_arrfuncs_members(f) < 0) {


Put into its own helper in anticiapation of more functions joining. In particular fastclip is already deprecated at execution time, and should be added here as well now (as well as get the same documentation updates). Planning to do that once this is in.

seberg · 2020-01-16T00:34:05Z

Testing this seems a bit annoying, so I only tested all the cases manually right now... I can try to think of a way to test it reasonably...

seberg · 2020-01-16T00:45:29Z

I opted for warning, the alternative is to use the clip warning at the place where the calculation happens, but I do not think that makes sense here: chances are you are not actually doing that calculation, and the results should never change. You are right for someone who writes a dtype, I am thinking of the possibility of someone only using one.

As mentioned above: I think clip should be deprecated in both places (here would be enough I suppose, but I do not feel like removing the other one), I will do that as a followup.

mattip · 2020-01-16T02:41:18Z

Thanks @seberg

eric-wieser reviewed Nov 19, 2019

View reviewed changes

numpy/core/src/multiarray/item_selection.c Outdated Show resolved Hide resolved

eric-wieser reviewed Nov 19, 2019

View reviewed changes

numpy/core/src/multiarray/arraytypes.h Outdated Show resolved Hide resolved

seberg force-pushed the clean-fasttake branch from 627e6a1 to b52f33d Compare November 19, 2019 21:47

seberg changed the title ~~MAINT,API: Do not use fasttake slot of ArrFuncs and null it~~ MAINT,API: Do not use fasttake slot of ArrFuncs and NULL it Nov 19, 2019

seberg force-pushed the clean-fasttake branch 3 times, most recently from 857b0cd to 8d7cad0 Compare November 19, 2019 22:11

eric-wieser reviewed Nov 19, 2019

View reviewed changes

numpy/core/src/multiarray/item_selection.c Outdated Show resolved Hide resolved

eric-wieser reviewed Nov 19, 2019

View reviewed changes

numpy/core/src/multiarray/item_selection.c Outdated Show resolved Hide resolved

seberg commented Nov 19, 2019

View reviewed changes

numpy/core/src/multiarray/item_selection.c Outdated Show resolved Hide resolved

seberg force-pushed the clean-fasttake branch from 4856e13 to e67c1ce Compare November 20, 2019 02:07

eric-wieser reviewed Nov 20, 2019

View reviewed changes

numpy/core/src/multiarray/item_selection.c Outdated Show resolved Hide resolved

eric-wieser reviewed Nov 20, 2019

View reviewed changes

numpy/core/src/multiarray/item_selection.c Outdated Show resolved Hide resolved

seberg changed the title ~~MAINT,API: Do not use fasttake slot of ArrFuncs and NULL it~~ MAINT,API: Do not use fasttake/fastputmask slots of ArrFuncs and NULL them Nov 20, 2019

seberg changed the title ~~MAINT,API: Do not use fasttake/fastputmask slots of ArrFuncs and NULL them~~ MAINT,API: NULL fasttake/fastputmask slots of ArrFuncs and ignore them Nov 20, 2019

seberg changed the title ~~MAINT,API: NULL fasttake/fastputmask slots of ArrFuncs and ignore them~~ MAINT,API: ignore and NULL fasttake/fastputmask slots of ArrFuncs Nov 20, 2019

seberg changed the title ~~MAINT,API: ignore and NULL fasttake/fastputmask slots of ArrFuncs~~ MAINT,API: ignore and NULL fasttake/fastputmask ArrFuncs slots Nov 20, 2019

seberg force-pushed the clean-fasttake branch from e67c1ce to 6ec673a Compare November 20, 2019 17:44

eric-wieser reviewed Nov 20, 2019

View reviewed changes

numpy/core/src/multiarray/item_selection.c Outdated Show resolved Hide resolved

charris added 03 - Maintenance component: numpy._core 63 - C API Changes or additions to the C API. Mailing list should usually be notified. labels Nov 20, 2019

seberg force-pushed the clean-fasttake branch 2 times, most recently from f02ee2f to 6e32571 Compare November 20, 2019 23:04

seberg force-pushed the clean-fasttake branch 2 times, most recently from 9199017 to d8d6c36 Compare November 21, 2019 01:52

seberg force-pushed the clean-fasttake branch from d8d6c36 to 9cf0660 Compare January 7, 2020 18:08

seberg mentioned this pull request Jan 7, 2020

BENCH: Add basic benchmarks for take and putmask #15280

Merged

seberg added 2 commits January 7, 2020 12:19

MAINT,API: Do not use fasttake slot of ArrFuncs and null it

41ff3e9

Always NULL the fasttake slot, and instead move the optimized versions by doing itemsize specialization in the Take function itself.

MAINT,API: Do not use fastputmask dtype slot and always NULL it

b425ce6

This removes the use of the fastputmask ArrFuncs slot from within NumPy and always leaves it NULL.

seberg force-pushed the clean-fasttake branch from 9cf0660 to b425ce6 Compare January 7, 2020 18:20

seberg added triage review Issue/PR to be discussed at the next triage meeting triaged Issue/PR that was discussed in a triage meeting and removed triage review Issue/PR to be discussed at the next triage meeting labels Jan 15, 2020

mattip reviewed Jan 15, 2020

View reviewed changes

DEP: Actually deprecate fasttake and fastclipmask

68e8d47

seberg force-pushed the clean-fasttake branch from 9769e8c to 68e8d47 Compare January 16, 2020 00:31

seberg commented Jan 16, 2020

View reviewed changes

mattip merged commit feb0794 into numpy:master Jan 16, 2020

seberg deleted the clean-fasttake branch January 16, 2020 02:56

Uh oh!

Conversation

seberg commented Nov 19, 2019

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

seberg commented Nov 19, 2019

Uh oh!

seberg commented Nov 19, 2019

Uh oh!

Uh oh!

Uh oh!

Uh oh!

seberg commented Nov 20, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

seberg commented Nov 20, 2019

Uh oh!

seberg commented Nov 21, 2019

Uh oh!

seberg commented Nov 21, 2019

Uh oh!

seberg commented Jan 15, 2020

Uh oh!

mattip Jan 15, 2020

Choose a reason for hiding this comment

Uh oh!

mattip commented Jan 15, 2020

Uh oh!

seberg commented Jan 15, 2020

Uh oh!

mattip commented Jan 16, 2020

Uh oh!

seberg Jan 16, 2020

Choose a reason for hiding this comment

Uh oh!

seberg commented Jan 16, 2020

Uh oh!

seberg commented Jan 16, 2020

Uh oh!

mattip commented Jan 16, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

seberg commented Nov 20, 2019 •

edited

Loading