ENH: Implement faster argument parsing by seberg · Pull Request #15099 · numpy/numpy

seberg · 2019-12-11T19:11:11Z

This speeds up argument parsing quite a bit. Unfortunately, the local argument-parsing code within np.array itself is actually faster for an intermediate number of kwargs... Timing these speedups is tricky, but in %timeit it is very clear; e.g. try arr.astype(arr.dtype, copy=False).
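For anyone without IPython at hand, the same check can be approximated with the standard `timeit` module. This is only a rough sketch: `per_call_ns` is a hypothetical helper name, the lambda adds a little overhead of its own, and the NumPy part is skipped if NumPy is not importable.

```python
import timeit


def per_call_ns(func, number=100_000, repeat=5):
    """Return the best observed time per call of func(), in nanoseconds."""
    # Taking the minimum over several repeats reduces noise from other processes.
    best = min(timeit.repeat(func, number=number, repeat=repeat))
    return best / number * 1e9


try:
    import numpy as np
except ImportError:  # NumPy is optional for this sketch
    np = None

if np is not None:
    arr = np.arange(10)
    # With copy=False and an unchanged dtype no data is copied, so the
    # measured time is dominated by call overhead such as argument parsing.
    ns = per_call_ns(lambda: arr.astype(arr.dtype, copy=False))
    print(f"{ns:.0f} ns per call")
```

Comparing the printed number between two builds gives a crude before/after figure for the call overhead.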

This currently works by adding a static struct to the calling function. There are probably a few micro-optimization attempts in there that should be cleaned up, so for now I am marking it "draft" to see what people think.

One thing missing: ufunc argparsing does the same thing, so the code is currently duplicated (unifying it is quite a bit of cleanup that should go in here; I am dreading the merge conflict with the dtypes branch though...).

This replaces PyArg_ParseTupleAndKeywords in most cases.

This also moves asarray/asanyarray, etc. to C to make up for the slight loss in argument parsing speed; these functions should now be preferred when possible. This also uses the preferred functions in a few places.

numpy/core/src/multiarray/conversion_utils.c

eric-wieser · 2019-12-11T20:00:14Z

To what extent can you use METH_FASTCALL | METH_KEYWORDS for a speed boost here?

seberg · 2019-12-11T20:02:57Z

Hmmm, I had not looked at it too much, since I thought it was not supported. But I suppose the important thing is that it is merely not "officially" supported, while probably everyone supports it and they are making it official, so we can just use it on any Python version. I will look into it; adapting the code to it should be fairly straightforward.

eric-wieser · 2019-12-11T20:06:13Z

Using argument clinic might be another way to get some performance boosts

seberg · 2019-12-11T20:08:13Z

I looked at argument clinic at some point, but it is pretty baked into Python itself and it seemed like we would have to extract it completely. Plus, quite frankly, I am not sure I like the large change to the actual code that it seems to require (although maybe I looked too much at more complex cases defining a whole class). Not sure I feel like going there. OTOH, of course this is 200 lines of hand-written code for us...

seberg · 2019-12-11T20:17:51Z

numpy/core/src/multiarray/common.c

+ * if no match was found), or -1 on failure.
+ */
+static NPY_INLINE npy_intp
+locate_key(PyObject *const *kwnames, PyObject *key)


Hmm, this was the ufunc code, my code was slightly different... A bit of a mess-up here, so it might change (although making it into an inlined function may be nicer).
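The body of `locate_key` is not shown in the excerpt above. Purely as an illustration, a common strategy for matching keyword names is to compare by identity first (keyword names are usually interned strings, so this is cheap and normally succeeds) and only then fall back to equality. A Python model of that strategy follows; the name `locate_key_model` and the use of `None` for "not found" are assumptions of this sketch, not the C function's actual contract.

```python
def locate_key_model(kwnames, key):
    """Return the index of key in kwnames, or None if it is absent."""
    # Fast path: identity comparison, which succeeds when both sides
    # refer to the same (interned) string object.
    for i, name in enumerate(kwnames):
        if name is key:
            return i
    # Slow path: equality comparison for equal but distinct objects.
    for i, name in enumerate(kwnames):
        if name == key:
            return i
    return None


kwnames = ("dtype", "copy")
print(locate_key_model(kwnames, "copy"))   # found at index 1
print(locate_key_model(kwnames, "order"))  # not present
```

Doing the identity pass over all names before any equality check keeps the common case down to pointer comparisons.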

seberg · 2019-12-11T21:13:16Z

Hehe, you are right of course... I should have considered it more seriously earlier. METH_FASTCALL almost halves the time of a simple np.array(array, copy=False) call. The ufunc changes are a bit trickier with it (they need to percolate into the override changes). But if this is allowed, it is a huge gain.
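For context on where the gain comes from: with `METH_VARARGS | METH_KEYWORDS` the callee receives a tuple (and, when keywords are given, a dict) that has to be created for the call, whereas `METH_FASTCALL | METH_KEYWORDS` passes a flat array of arguments, the positional count, and a tuple of keyword names whose values follow the positional arguments in that array. A small Python model of the two layouts is sketched below; the function names are illustrative and not part of any API, and the model uses an empty tuple where C would pass NULL for "no keywords".

```python
def to_fastcall_layout(args, kwargs):
    """Model the (args array, nargs, kwnames) layout of a FASTCALL call."""
    kwnames = tuple(kwargs)
    # Keyword values are appended after the positional arguments,
    # in the same order as their names in kwnames.
    flat = list(args) + [kwargs[name] for name in kwnames]
    return flat, len(args), kwnames


def from_fastcall_layout(flat, nargs, kwnames):
    """Rebuild the classic (tuple, dict) pair from the flat layout."""
    args = tuple(flat[:nargs])
    kwargs = dict(zip(kwnames, flat[nargs:]))
    return args, kwargs


flat, nargs, kwnames = to_fastcall_layout(("array",), {"copy": False})
print(flat, nargs, kwnames)  # ['array', False] 1 ('copy',)
```

A parser written against the flat layout can look values up by index without ever materialising the dict.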

seberg · 2019-12-11T21:15:54Z

Hmmm, it might be that the main reason I did not consider clinic is that I had filed it away as using private API, which would have been the FASTCALL switch. This is pretty nice for now for sure; I wonder if clinic really does all that much better...

eric-wieser · 2019-12-11T21:54:21Z

> I wonder if clinic really does all that much better

I think the benefit is it generates the code that parses the arguments, which saves you from doing so.

Note also that FASTCALL is not supported on 3.6, so this would need to wait a release or two first.

seberg · 2019-12-11T22:12:54Z

> Note also that FASTCALL is not supported on 3.6, so this would need to wait a release or two first.

Ah, then that is why I did not consider it. Note that the changes in the code are fairly limited. I could easily solve this with something like 3-4 #ifdef blocks and macros that expand to the two alternative signatures. Considering that this speeds up np.asarray(array) by a factor of two, I think that may just be worth it?

seberg · 2019-12-11T22:54:10Z

Here are the timings (with METH_FASTCALL) on my computer. I will try to write the generic code so that it depends on whether or not METH_FASTCALL is available. Adding this to ufuncs should be interesting (although maybe as a second step; the main annoyance is the override machinery, although it should really be very easy in the end).

Master vs. `METH_FASTCALL` comparison. UPDATE: Including ufunc (not including `tp_vectorcall` since python 3.7)

       before           after         ratio
     [6be17044]       [e52e247f]
     <master>         <faster-argparsing>
              n/a              n/a      n/a  bench_linalg.Linalg.time_op('det', 'complex256')
              n/a              n/a      n/a  bench_linalg.Linalg.time_op('det', 'float16')
              n/a              n/a      n/a  bench_linalg.Linalg.time_op('det', 'longfloat')
              n/a              n/a      n/a  bench_linalg.Linalg.time_op('pinv', 'complex256')
              n/a              n/a      n/a  bench_linalg.Linalg.time_op('pinv', 'float16')
              n/a              n/a      n/a  bench_linalg.Linalg.time_op('pinv', 'longfloat')
              n/a              n/a      n/a  bench_linalg.Linalg.time_op('svd', 'complex256')
              n/a              n/a      n/a  bench_linalg.Linalg.time_op('svd', 'float16')
              n/a              n/a      n/a  bench_linalg.Linalg.time_op('svd', 'longfloat')
+         240±3ms        607±400ms     2.53  bench_function_base.Histogram2D.time_fine_binning
+       322±0.4μs         468±20μs     1.45  bench_lib.Nan.time_nanmin(200000, 0)
+       328±0.3μs          437±7μs     1.33  bench_lib.Nan.time_nanmin(200000, 0.1)
+       160±0.4μs        209±0.1μs     1.31  bench_core.UnpackBits.time_unpackbits_axis1
+       132±0.1μs        170±0.2μs     1.29  bench_function_base.Sort.time_argsort('quick', 'int64', ('uniform',))
+       374±0.4μs         480±10μs     1.28  bench_lib.Nan.time_nanmin(200000, 2.0)
+     1.71±0.01ms       2.16±0.4ms     1.27  bench_lib.Pad.time_pad((4, 4, 4, 4), (0, 32), 'constant')
+       127±0.2μs        160±0.2μs     1.26  bench_function_base.Sort.time_argsort('quick', 'int16', ('uniform',))
+       224±0.2μs          283±1μs     1.26  bench_function_base.Sort.time_argsort('merge', 'int64', ('sorted_block', 10))
+         730±2μs        923±0.3μs     1.26  bench_function_base.Sort.time_sort('heap', 'int64', ('ordered',))
+         772±2μs          956±1μs     1.24  bench_function_base.Sort.time_sort('heap', 'int64', ('reversed',))
      2.92±0.04ms      3.61±0.02ms    ~1.24  bench_io.Copy.time_memcpy_large_out_of_place('float32')
+       115±0.2ms        142±0.2ms     1.23  bench_function_base.Sort.time_sort_worst
+     10.0±0.03μs      12.3±0.05μs     1.23  bench_core.UnpackBits.time_unpackbits
+        1.00±0ms         1.22±0ms     1.22  bench_function_base.Sort.time_sort('heap', 'int64', ('sorted_block', 10))
+       964±0.5μs         1.18±0ms     1.22  bench_function_base.Sort.time_sort('heap', 'int64', ('sorted_block', 1000))
+        1.05±0ms         1.27±0ms     1.21  bench_function_base.Sort.time_sort('heap', 'int64', ('sorted_block', 100))
+        1.16±0ms         1.39±0ms     1.20  bench_function_base.Sort.time_sort('heap', 'int64', ('random',))
+       149±0.1μs        178±0.3μs     1.20  bench_function_base.Sort.time_argsort('merge', 'int64', ('sorted_block', 100))
+         301±3ms          357±2ms     1.19  bench_core.CorrConv.time_convolve(100000, 10000, 'full')
+     7.62±0.01μs      8.99±0.06μs     1.18  bench_avx.AVX_UFunc.time_ufunc('trunc', 1, 'd')
+       103±0.1μs        121±0.4μs     1.17  bench_function_base.Sort.time_argsort('quick', 'int16', ('ordered',))
+     8.03±0.03μs      9.38±0.03μs     1.17  bench_avx.AVX_UFunc.time_ufunc('square', 1, 'd')
+       959±0.8μs         1.11±0ms     1.16  bench_function_base.Sort.time_argsort('merge', 'int64', ('random',))
+       133±0.2μs         154±20μs     1.16  bench_indexing.Indexing.time_op('indexes_rand_', 'I', '=1')
       8.70±0.2ms      10.0±0.03ms    ~1.15  bench_lib.Pad.time_pad((4, 4, 4, 4), (0, 32), 'wrap')
         274±20ms         314±10ms    ~1.15  bench_core.CorrConv.time_correlate(100000, 10000, 'valid')
+      10.2±0.2ms         11.6±1ms     1.14  bench_lib.Pad.time_pad((4, 4, 4, 4), (0, 32), 'reflect')
+         302±3ms          343±2ms     1.13  bench_core.CorrConv.time_convolve(100000, 10000, 'same')
+     13.1±0.05μs      14.9±0.04μs     1.13  bench_function_base.Sort.time_sort('merge', 'int16', ('ordered',))
+       220±0.5μs        247±0.3μs     1.12  bench_function_base.Sort.time_argsort('quick', 'float64', ('uniform',))
+     13.3±0.04μs      14.9±0.05μs     1.12  bench_function_base.Sort.time_sort('merge', 'int16', ('uniform',))
+     5.73±0.05μs      6.39±0.06μs     1.12  bench_avx.AVX_UFunc.time_ufunc('reciprocal', 1, 'f')
+       161±0.2μs        179±0.3μs     1.11  bench_function_base.Sort.time_argsort('quick', 'int16', ('reversed',))
+     90.5±0.08μs       99.9±0.3μs     1.10  bench_function_base.Sort.time_argsort('merge', 'int64', ('sorted_block', 1000))
+         286±3ms          315±2ms     1.10  bench_core.CorrConv.time_convolve(100000, 10000, 'valid')
+         889±2μs          979±2μs     1.10  bench_function_base.Sort.time_sort('heap', 'float64', ('ordered',))
+        1.15±0ms         1.26±0ms     1.10  bench_function_base.Sort.time_sort('heap', 'float64', ('sorted_block', 1000))
      5.88±0.03ms       6.46±0.7ms     1.10  bench_core.Core.time_identity_3000
      5.87±0.03ms       6.45±0.6ms     1.10  bench_core.Core.time_eye_3000
       16.8±0.1μs       18.4±0.1μs     1.10  bench_function_base.Sort.time_sort('merge', 'int64', ('uniform',))
      17.0±0.07μs       18.6±0.2μs     1.09  bench_function_base.Sort.time_sort('merge', 'int64', ('ordered',))
       88.1±0.1μs       96.3±0.1μs     1.09  bench_function_base.Sort.time_sort('quick', 'int64', ('ordered',))
      2.80±0.07μs      3.05±0.03μs     1.09  bench_io.Copy.time_memcpy('int16')
       71.9±0.1μs       78.4±0.3μs     1.09  bench_ufunc.CustomInplace.time_double_add
          408±2μs          445±3μs     1.09  bench_ufunc.UFunc.time_ufunc_types('sign')
        234±0.5μs        254±0.2μs     1.09  bench_ufunc.UFunc.time_ufunc_types('isnan')
      7.63±0.01ms       8.29±0.7ms     1.09  bench_core.CountNonzero.time_count_nonzero_axis(3, 1000000, <class 'int'>)
       47.7±0.2μs       51.7±0.2μs     1.08  bench_function_base.Sort.time_sort('heap', 'int64', ('uniform',))
       73.5±0.2μs       79.3±0.2μs     1.08  bench_ufunc.CustomInplace.time_float_add
          950±1μs         1.02±0ms     1.08  bench_function_base.Sort.time_sort('heap', 'float64', ('reversed',))
          766±2μs          825±1μs     1.08  bench_function_base.Sort.time_argsort('quick', 'int64', ('sorted_block', 10))
          734±1μs        791±0.3μs     1.08  bench_function_base.Sort.time_argsort('quick', 'int64', ('sorted_block', 100))
        903±0.8μs        972±0.9μs     1.08  bench_function_base.Sort.time_argsort('quick', 'int64', ('random',))
      4.31±0.02μs      4.64±0.07μs     1.08  bench_avx.AVX_UFunc.time_ufunc('rint', 1, 'f')
       92.7±0.2μs       99.7±0.5μs     1.08  bench_ufunc.CustomInplace.time_double_add_temp
      14.3±0.04ms      15.3±0.06ms     1.07  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), 8, 'linear_ramp')
        564±0.3μs        606±0.8μs     1.07  bench_function_base.Sort.time_argsort('quick', 'int64', ('sorted_block', 1000))
          441±4μs          472±1μs     1.07  bench_ufunc.UFunc.time_ufunc_types('less')
          263±2μs        282±0.2μs     1.07  bench_shape_base.Block2D.time_block2d((1024, 1024), 'uint8', (4, 4))
      33.0±0.08μs      35.3±0.04μs     1.07  bench_function_base.Sort.time_argsort('merge', 'float64', ('ordered',))
          950±8μs      1.02±0.02ms     1.07  bench_lib.Pad.time_pad((4, 4, 4, 4), 8, 'wrap')
          357±4μs          381±2μs     1.07  bench_shape_base.Block2D.time_block2d((1024, 1024), 'uint16', (4, 4))
      7.65±0.01ms      8.16±0.01ms     1.07  bench_reduce.AddReduceSeparate.time_reduce(0, 'complex256')
        141±0.1μs        150±0.1μs     1.07  bench_function_base.Sort.time_sort('quick', 'int64', ('reversed',))
         1.24±0ms         1.32±0ms     1.07  bench_function_base.Sort.time_sort('heap', 'float64', ('sorted_block', 100))
         1.36±0ms         1.45±0ms     1.07  bench_function_base.Sort.time_sort('heap', 'float64', ('random',))
       12.5±0.5ms       13.3±0.1ms     1.07  bench_lib.Pad.time_pad((4, 4, 4, 4), (0, 32), 'linear_ramp')
       8.41±0.2μs       8.96±0.1μs     1.07  bench_io.Copy.time_strided_assign('int16')
      4.39±0.03μs      4.68±0.04μs     1.07  bench_avx.AVX_UFunc.time_ufunc('trunc', 1, 'f')
          236±3μs          251±2μs     1.06  bench_function_base.Sort.time_sort('merge', 'float64', ('sorted_block', 10))
       2.78±0.2ms       2.95±0.2ms     1.06  bench_lib.Pad.time_pad((1024, 1024), (0, 32), 'mean')
      33.1±0.03μs      35.2±0.05μs     1.06  bench_function_base.Sort.time_argsort('merge', 'float64', ('uniform',))
         1.20±0ms         1.28±0ms     1.06  bench_function_base.Sort.time_sort('heap', 'float64', ('sorted_block', 10))
      7.29±0.03μs       7.73±0.1μs     1.06  bench_avx.AVX_UFunc.time_ufunc('floor', 1, 'd')
          169±2μs          178±3μs     1.06  bench_shape_base.Block2D.time_block2d((1024, 1024), 'uint8', (2, 2))
        166±0.6μs        175±0.6μs     1.05  bench_ufunc.UFunc.time_ufunc_types('left_shift')
      1.25±0.06ms      1.32±0.05ms     1.05  bench_shape_base.Block2D.time_block2d((1024, 1024), 'uint64', (2, 2))
      3.59±0.01ms      3.77±0.04ms     1.05  bench_io.LoadtxtReadUint64Integers.time_read_uint64(1000)
       82.2±0.2μs       86.2±0.2μs     1.05  bench_function_base.Sort.time_sort('merge', 'int64', ('sorted_block', 1000))
         4.94±0ms      5.18±0.07ms     1.05  bench_core.CorrConv.time_convolve(100000, 100, 'valid')
          457±4μs          479±3μs     1.05  bench_ufunc.UFunc.time_ufunc_types('greater_equal')
         1.05±0ms         1.10±0ms     1.05  bench_function_base.Sort.time_argsort('quick', 'float64', ('random',))
          908±2μs         952±40μs     1.05  bench_indexing.Indexing.time_op('indexes_rand_', 'np.ix_(I, I)', '=1')
         4.95±0ms      5.19±0.08ms     1.05  bench_core.CorrConv.time_convolve(100000, 100, 'same')
          179±2ms          187±1ms     1.04  bench_app.LaplaceInplace.time_it('inplace')
       22.7±0.2μs       23.7±0.1μs     1.04  bench_function_base.Sort.time_sort('merge', 'int64', ('reversed',))
       15.8±0.3μs       16.5±0.1μs     1.04  bench_avx.AVX_UFunc.time_ufunc('square', 4, 'd')
          456±3μs          476±1μs     1.04  bench_ufunc.UFunc.time_ufunc_types('less_equal')
        150±0.4μs        157±0.3μs     1.04  bench_function_base.Sort.time_sort('merge', 'int64', ('sorted_block', 100))
       9.30±0.2μs       9.67±0.1μs     1.04  bench_avx.AVX_UFunc.time_ufunc('square', 4, 'f')
          628±5μs         654±20μs     1.04  bench_reduce.AddReduceSeparate.time_reduce(0, 'int64')
      1.05±0.02ms       1.09±0.1ms     1.04  bench_linalg.Eindot.time_matmul_a_b
          721±7μs         750±40μs     1.04  bench_lib.Nan.time_nanmean(200000, 0)
      5.14±0.05μs      5.34±0.01μs     1.04  bench_avx.AVX_UFunc.time_ufunc('floor', 1, 'f')
        770±0.4μs        800±0.8μs     1.04  bench_function_base.Sort.time_argsort('quick', 'int16', ('sorted_block', 10))
        220±0.3μs        229±0.6μs     1.04  bench_core.CountNonzero.time_count_nonzero(2, 10000, <class 'object'>)
       49.3±0.2ms       51.2±0.3ms     1.04  bench_core.CountNonzero.time_count_nonzero_axis(3, 1000000, <class 'str'>)
        885±0.6μs          918±2μs     1.04  bench_function_base.Sort.time_argsort('quick', 'float64', ('sorted_block', 10))
      1.14±0.01ms       1.19±0.2ms     1.04  bench_linalg.Linalg.time_op('det', 'int32')
        857±0.7μs          887±1μs     1.04  bench_function_base.Sort.time_argsort('quick', 'float64', ('sorted_block', 100))
        446±0.5μs          461±3μs     1.04  bench_ufunc.UFunc.time_ufunc_types('not_equal')
       49.3±0.2ms      51.1±0.05ms     1.04  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 1000000, <class 'str'>)
        597±0.6μs          618±2μs     1.03  bench_ufunc.UFunc.time_ufunc_types('fmin')
       14.9±0.2μs       15.4±0.2μs     1.03  bench_ufunc.CustomScalar.time_add_scalar2(<class 'numpy.float64'>)
        749±0.9μs        775±0.4μs     1.03  bench_function_base.Sort.time_argsort('quick', 'int16', ('sorted_block', 100))
       15.3±0.4μs       15.9±0.2μs     1.03  bench_avx.AVX_UFunc.time_ufunc('ceil', 4, 'd')
          644±2μs          666±1μs     1.03  bench_ufunc.UFunc.time_ufunc_types('nextafter')
          673±3ms          696±3ms     1.03  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(100000)
       35.4±0.2ms       36.6±0.3ms     1.03  bench_io.LoadtxtReadUint64Integers.time_read_uint64_neg_values(10000)
       5.22±0.7ms       5.40±0.7ms     1.03  bench_function_base.Histogram2D.time_small_coverage
        915±0.7μs        945±0.5μs     1.03  bench_function_base.Sort.time_argsort('quick', 'int16', ('random',))
        330±0.2μs          341±1μs     1.03  bench_core.CountNonzero.time_count_nonzero(3, 10000, <class 'object'>)
        570±0.7μs        588±0.3μs     1.03  bench_function_base.Sort.time_argsort('quick', 'int16', ('sorted_block', 1000))
          519±3ns          536±3ns     1.03  bench_ufunc.UFunc.time_ufunc_types('isnat')
        115±0.4μs        118±0.3μs     1.03  bench_indexing.Indexing.time_op('indexes_rand_', 'I', '')
          674±2ms         695±10ms     1.03  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 100000)
      58.6±0.08μs       60.4±0.5μs     1.03  bench_function_base.Sort.time_sort('heap', 'int16', ('uniform',))
      88.9±0.09μs      91.5±0.05μs     1.03  bench_io.CopyTo.time_copyto_8_dense
          700±2μs          720±7μs     1.03  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 100)
       6.05±0.7ms       6.22±0.7ms     1.03  bench_function_base.Histogram2D.time_full_coverage
          218±2μs        224±0.3μs     1.03  bench_indexing.Indexing.time_op('indexes_rand_', ':,I', '')
       111±0.05μs        114±0.4μs     1.03  bench_core.CountNonzero.time_count_nonzero(1, 10000, <class 'object'>)
      7.65±0.04μs      7.86±0.05μs     1.03  bench_avx.AVX_UFunc.time_ufunc('rint', 2, 'f')
          716±1μs          735±5μs     1.03  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 100)
       11.8±0.2μs      12.1±0.07μs     1.03  bench_avx.AVX_UFunc.time_ufunc('ceil', 2, 'd')
        700±0.8μs          719±2μs     1.03  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(100)
      2.00±0.01ms         2.05±0ms     1.03  bench_io.LoadtxtReadUint64Integers.time_read_uint64_neg_values(550)
      8.67±0.05μs      8.90±0.03μs     1.03  bench_avx.AVX_UFunc.time_ufunc('rint', 4, 'f')
       11.9±0.2μs       12.2±0.1μs     1.02  bench_avx.AVX_UFunc.time_ufunc('trunc', 2, 'd')
      8.86±0.03ms      9.08±0.01ms     1.02  bench_lib.Pad.time_pad((4194304,), 1, 'wrap')
      41.0±0.07ms       42.0±0.1ms     1.02  bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 1000000, <class 'object'>)
      3.62±0.02ms      3.71±0.02ms     1.02  bench_io.LoadtxtReadUint64Integers.time_read_uint64_neg_values(1000)
      5.09±0.01ms      5.21±0.07ms     1.02  bench_core.CorrConv.time_convolve(100000, 100, 'full')
       11.5±0.1μs      11.8±0.07μs     1.02  bench_io.Copy.time_memcpy('complex64')
          448±1μs          458±3μs     1.02  bench_ufunc.UFunc.time_ufunc_types('equal')
          169±5μs          173±2μs     1.02  bench_function_base.Sort.time_argsort('merge', 'int16', ('reversed',))
        305±0.8μs        312±0.6μs     1.02  bench_reduce.AddReduceSeparate.time_reduce(0, 'float32')
      8.82±0.03μs      9.03±0.07μs     1.02  bench_avx.AVX_UFunc.time_ufunc('ceil', 4, 'f')
      8.09±0.06μs      8.28±0.05μs     1.02  bench_avx.AVX_UFunc.time_ufunc('square', 2, 'f')
         3.67±0ms      3.76±0.08ms     1.02  bench_reduce.AddReduceSeparate.time_reduce(0, 'longfloat')
        449±0.7μs        460±0.6μs     1.02  bench_ufunc.UFunc.time_ufunc_types('greater')
      4.82±0.04μs      4.92±0.04μs     1.02  bench_avx.AVX_UFunc.time_ufunc('square', 1, 'f')
      41.1±0.09ms       42.1±0.1ms     1.02  bench_core.CountNonzero.time_count_nonzero_axis(2, 1000000, <class 'object'>)
      1.12±0.03ms      1.14±0.03ms     1.02  bench_linalg.Eindot.time_dot_a_b
        669±0.6ms          683±4ms     1.02  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 100000)
        115±0.2μs        118±0.8μs     1.02  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 10)
      13.7±0.04μs       14.0±0.2μs     1.02  bench_ma.Indexing.time_0d(True, 1, 1000)
        128±200μs        131±200μs     1.02  bench_shape_base.Block2D.time_block2d((256, 256), 'uint64', (2, 2))
       11.6±0.2μs       11.9±0.1μs     1.02  bench_avx.AVX_UFunc.time_ufunc('rint', 2, 'd')
       11.1±0.5ms      11.3±0.04ms     1.02  bench_ufunc.Broadcast.time_broadcast
      7.58±0.03μs      7.74±0.03μs     1.02  bench_avx.AVX_UFunc.time_ufunc('floor', 2, 'f')
      1.10±0.02ms      1.13±0.02ms     1.02  bench_lib.Nan.time_nanvar(200000, 0.1)
       61.7±0.2ms      62.9±0.06ms     1.02  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 1000000, <class 'object'>)
      8.87±0.02ms      9.05±0.03ms     1.02  bench_lib.Pad.time_pad((4194304,), 1, 'constant')
      8.90±0.01ms      9.08±0.02ms     1.02  bench_lib.Pad.time_pad((4194304,), 8, 'reflect')
         81.7±2ms         83.4±2ms     1.02  bench_ma.Concatenate.time_it('masked', 2000)
      20.6±0.04ms      21.0±0.06ms     1.02  bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 1000000, <class 'object'>)
      21.8±0.03ms       22.2±0.2ms     1.02  bench_core.CountNonzero.time_count_nonzero(2, 1000000, <class 'object'>)
      20.6±0.07ms      21.0±0.06ms     1.02  bench_core.CountNonzero.time_count_nonzero_axis(1, 1000000, <class 'object'>)
      9.05±0.02ms         9.23±0ms     1.02  bench_lib.Pad.time_pad((4194304,), 8, 'linear_ramp')
      8.88±0.02ms       9.05±0.1ms     1.02  bench_lib.Pad.time_pad((4194304,), (0, 32), 'wrap')
          806±3ms          821±3ms     1.02  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 100000)
        872±0.7μs        888±0.6μs     1.02  bench_function_base.Sort.time_sort('heap', 'int16', ('ordered',))
      4.47±0.01μs      4.56±0.02μs     1.02  bench_avx.AVX_UFunc.time_ufunc('absolute', 1, 'f')
         1.31±0ms         1.34±0ms     1.02  bench_ufunc.UFunc.time_ufunc_types('remainder')
      5.06±0.01ms      5.16±0.02ms     1.02  bench_reduce.AddReduceSeparate.time_reduce(1, 'float16')
      61.8±0.09ms       62.9±0.1ms     1.02  bench_core.CountNonzero.time_count_nonzero_axis(3, 1000000, <class 'object'>)
         2.00±0ms      2.04±0.01ms     1.02  bench_io.LoadtxtReadUint64Integers.time_read_uint64(550)
          533±3ms          543±3ms     1.02  bench_io.LoadtxtCSVStructured.time_loadtxt_csv_struct_dtype
       65.4±0.3ms       66.6±0.3ms     1.02  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 10000)
      5.09±0.03ms      5.18±0.04ms     1.02  bench_lib.Nan.time_nanmedian(200000, 2.0)
         1.31±0ms         1.34±0ms     1.02  bench_ufunc.UFunc.time_ufunc_types('mod')
       66.6±0.4ms       67.8±0.3ms     1.02  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(10000)
      58.4±0.06μs       59.4±0.9μs     1.02  bench_core.CorrConv.time_convolve(1000, 100, 'full')
      7.42±0.05μs      7.55±0.09μs     1.02  bench_avx.AVX_UFunc.time_ufunc('rint', 1, 'd')
      3.52±0.02ms      3.59±0.02ms     1.02  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), 8, 'mean')
        337±0.1μs          343±3μs     1.02  bench_ufunc.UFunc.time_ufunc_types('positive')
         1.15±0ms         1.17±0ms     1.02  bench_function_base.Sort.time_sort('heap', 'int16', ('sorted_block', 10))
          584±3μs        594±0.9μs     1.02  bench_ufunc.UFunc.time_ufunc_types('minimum')
      5.16±0.01μs      5.25±0.06μs     1.02  bench_avx.AVX_UFunc.time_ufunc('ceil', 1, 'f')
      3.73±0.06ms      3.80±0.08ms     1.02  bench_linalg.Eindot.time_tensordot_a_b_axes_1_0_0_1
       88.1±0.2ms         89.6±1ms     1.02  bench_ma.Concatenate.time_it('ndarray+masked', 2000)
      4.92±0.01ms       5.01±0.2ms     1.02  bench_core.CorrConv.time_correlate(100000, 100, 'valid')
      1.65±0.04ms      1.68±0.03ms     1.02  bench_lib.Pad.time_pad((256, 128, 1), 8, 'mean')
          283±1μs        288±0.4μs     1.02  bench_indexing.Indexing.time_op('indexes_rand_', ':,I', '=1')
      4.04±0.01ms      4.10±0.01ms     1.02  bench_lib.Nan.time_nanquantile(200000, 0)
      4.93±0.01ms       5.01±0.2ms     1.02  bench_core.CorrConv.time_correlate(100000, 100, 'same')
       26.4±0.4μs       26.8±0.3μs     1.02  bench_io.Copy.time_memcpy('complex128')
          668±2ms          679±3ms     1.02  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 100000)
      5.09±0.02ms      5.17±0.01ms     1.02  bench_io.Copy.time_memcpy_large_out_of_place('complex64')
      1.52±0.06ms      1.54±0.03ms     1.02  bench_shape_base.Block2D.time_block2d((1024, 1024), 'uint64', (4, 4))
      1.13±0.01ms      1.15±0.02ms     1.02  bench_lib.Nan.time_nanstd(200000, 0.1)
      2.05±0.01ms         2.08±0ms     1.02  bench_reduce.AddReduceSeparate.time_reduce(0, 'complex128')
       11.7±0.1μs      11.9±0.07μs     1.02  bench_io.Copy.time_memcpy('float64')
      13.6±0.05μs       13.8±0.2μs     1.01  bench_ma.Indexing.time_0d(True, 2, 10)
       65.7±0.3ms       66.6±0.3ms     1.01  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 10000)
       80.2±0.5ms       81.4±0.5ms     1.01  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 10000)
       15.8±0.5μs       16.1±0.5μs     1.01  bench_avx.AVX_UFunc.time_ufunc('reciprocal', 4, 'd')
      18.8±0.08ms       19.0±0.2ms     1.01  bench_records.Records.time_fromstring_formats_as_list
      4.16±0.01ms         4.21±0ms     1.01  bench_lib.Nan.time_nanmedian(200000, 0)
      4.62±0.02μs      4.68±0.01μs     1.01  bench_core.CountNonzero.time_count_nonzero(3, 100, <class 'object'>)
         1.30±0ms         1.32±0ms     1.01  bench_function_base.Sort.time_argsort('heap', 'int16', ('sorted_block', 1000))
      13.3±0.05μs      13.5±0.07μs     1.01  bench_avx.AVX_UFunc.time_ufunc('reciprocal', 2, 'd')
      2.33±0.03ms      2.36±0.01ms     1.01  bench_lib.Pad.time_pad((256, 128, 1), 8, 'reflect')
          257±6μs          261±4μs     1.01  bench_shape_base.Block2D.time_block2d((1024, 1024), 'uint16', (2, 2))
      1.41±0.01ms      1.43±0.01ms     1.01  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(200)
      1.44±0.02ms      1.46±0.02ms     1.01  bench_lib.Nan.time_nanvar(200000, 2.0)
      6.00±0.05μs       6.08±0.1μs     1.01  bench_ufunc.CustomScalar.time_less_than_scalar2(<class 'numpy.float32'>)
         5.06±0ms       5.13±0.2ms     1.01  bench_core.CorrConv.time_correlate(100000, 100, 'full')
       5.47±0.3μs       5.54±0.1μs     1.01  bench_io.Copy.time_memcpy('float32')
      2.82±0.01ms      2.86±0.03ms     1.01  bench_core.CountNonzero.time_count_nonzero_axis(2, 1000000, <class 'bool'>)
       19.7±0.1μs       20.0±0.2μs     1.01  bench_io.Copy.time_strided_assign('complex128')
      5.18±0.03ms      5.25±0.03ms     1.01  bench_lib.Nan.time_nanpercentile(200000, 2.0)
       56.8±0.2ms       57.5±0.2ms     1.01  bench_linalg.Eindot.time_einsum_ijk_jil_kl
      2.04±0.07ms      2.07±0.07ms     1.01  bench_core.Temporaries.time_large2
      8.88±0.01ms      8.99±0.09ms     1.01  bench_lib.Pad.time_pad((4194304,), 8, 'constant')
       8.86±0.1μs       8.97±0.3μs     1.01  bench_avx.AVX_UFunc.time_ufunc('floor', 4, 'f')
      39.6±0.09ms       40.1±0.2ms     1.01  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv([1, 3, 5, 7])
       66.9±0.2ms       67.7±0.3ms     1.01  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 10000)
      8.87±0.01ms       8.98±0.1ms     1.01  bench_lib.Pad.time_pad((4194304,), 8, 'edge')
       129±0.09μs        131±0.4μs     1.01  bench_function_base.Sort.time_argsort('quick', 'int64', ('ordered',))
          642±2μs          650±5μs     1.01  bench_core.CountNonzero.time_count_nonzero_axis(3, 10000, <class 'object'>)
      32.7±0.02ms       33.1±0.2ms     1.01  bench_core.CountNonzero.time_count_nonzero(3, 1000000, <class 'object'>)
      4.26±0.02ms      4.31±0.01ms     1.01  bench_lib.Nan.time_nanpercentile(200000, 0)
      8.86±0.01ms       8.96±0.1ms     1.01  bench_lib.Pad.time_pad((4194304,), 1, 'edge')
      55.9±0.07μs         56.6±2μs     1.01  bench_core.CorrConv.time_correlate(1000, 100, 'full')
        317±0.3μs          321±6μs     1.01  bench_ufunc.UFunc.time_ufunc_types('negative')
          341±5μs          345±4μs     1.01  bench_lib.Nan.time_nansum(200000, 0)
      13.3±0.04ms      13.4±0.08ms     1.01  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(2000)
         85.2±3ms         86.2±3ms     1.01  bench_ma.Concatenate.time_it('unmasked+masked', 2000)
        138±0.5ms          139±1ms     1.01  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(20000)
      50.3±0.05μs       50.8±0.7μs     1.01  bench_core.CorrConv.time_convolve(1000, 100, 'valid')
      3.20±0.03ms      3.24±0.03ms     1.01  bench_lib.Nan.time_nanquantile(200000, 2.0)
      1.29±0.01μs      1.30±0.01μs     1.01  bench_indexing.IndexingStructured0D.time_array_slice
      11.6±0.02ms      11.8±0.08ms     1.01  bench_lib.Pad.time_pad((4194304,), (0, 32), 'mean')
       72.4±0.2ms       73.2±0.5ms     1.01  bench_ma.Concatenate.time_it('ndarray', 2000)
        377±0.4μs        381±0.5μs     1.01  bench_ufunc.UFunc.time_ufunc_types('rad2deg')
       65.5±0.2ms       66.2±0.3ms     1.01  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 10000)
          188±5μs          190±6μs     1.01  bench_core.CorrConv.time_correlate(1000, 1000, 'same')
       33.5±0.1ms      33.8±0.09ms     1.01  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv([1, 3])
       12.3±0.2μs       12.4±0.1μs     1.01  bench_avx.AVX_UFunc.time_ufunc('square', 2, 'd')
          276±6μs          279±7μs     1.01  bench_core.CorrConv.time_correlate(1000, 1000, 'full')
          843±2μs          852±2μs     1.01  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 100)
       19.6±0.1ms       19.8±0.1ms     1.01  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv(2)
      8.15±0.04μs      8.23±0.04μs     1.01  bench_avx.AVX_UFunc.time_ufunc('reciprocal', 2, 'f')
          853±4ms          862±5ms     1.01  bench_io.LoadtxtCSVSkipRows.time_skiprows_csv(0)
         1.82±0ms      1.84±0.02ms     1.01  bench_function_base.Sort.time_argsort('heap', 'float64', ('random',))
        118±0.7μs          119±5μs     1.01  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 10)
          646±1ms          652±5ms     1.01  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 100000)
          706±5μs          712±3μs     1.01  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 100)
       55.1±0.2μs       55.6±0.8μs     1.01  bench_core.CorrConv.time_convolve(1000, 100, 'same')
        342±0.8ms        345±0.6ms     1.01  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), (0, 32), 'wrap')
          112±1μs        113±0.2μs     1.01  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 10)
      8.87±0.09μs       8.95±0.2μs     1.01  bench_avx.AVX_UFunc.time_ufunc('trunc', 4, 'f')
      13.8±0.01μs      14.0±0.06μs     1.01  bench_avx.AVX_UFunc.time_ufunc('sqrt', 1, 'd')
         1.65±0ms         1.67±0ms     1.01  bench_lib.Nan.time_nanmin(200000, 50.0)
         615±20μs         620±30μs     1.01  bench_linalg.Eindot.time_matmul_trans_atc_a
      10.8±0.02ms      10.8±0.05ms     1.01  bench_linalg.Linalg.time_op('pinv', 'complex64')
      5.81±0.01μs      5.86±0.07μs     1.01  bench_avx.AVX_UFunc.time_ufunc('sqrt', 1, 'f')
        220±0.6ms          221±1ms     1.01  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), (0, 32), 'linear_ramp')
          673±1μs          679±2μs     1.01  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 100)
        379±0.5μs          382±1μs     1.01  bench_ufunc.UFunc.time_ufunc_types('degrees')
       91.4±0.2μs         92.1±5μs     1.01  bench_core.CountNonzero.time_count_nonzero_axis(3, 10000, <class 'int'>)
      5.11±0.04ms      5.15±0.03ms     1.01  bench_io.Copy.time_memcpy_large_out_of_place('float64')
      9.06±0.05ms      9.13±0.08ms     1.01  bench_lib.Pad.time_pad((4194304,), 1, 'linear_ramp')
      11.6±0.03ms      11.7±0.07ms     1.01  bench_lib.Pad.time_pad((4194304,), 1, 'mean')
          360±6μs          363±4μs     1.01  bench_linalg.Eindot.time_inner_trans_a_ac
      2.04±0.02ms      2.05±0.03ms     1.01  bench_core.Temporaries.time_large
        116±0.5μs        117±0.2μs     1.01  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 10)
      10.9±0.01ms      11.0±0.09ms     1.01  bench_core.CountNonzero.time_count_nonzero(1, 1000000, <class 'object'>)
      1.43±0.02ms         1.44±0ms     1.01  bench_core.CountNonzero.time_count_nonzero_axis(1, 1000000, <class 'bool'>)
        588±0.5μs        593±0.8μs     1.01  bench_ufunc.UFunc.time_ufunc_types('maximum')
       35.8±0.3ms       36.1±0.3ms     1.01  bench_io.LoadtxtReadUint64Integers.time_read_uint64(10000)
         1.19±0ms      1.20±0.01ms     1.01  bench_lib.Nan.time_nansum(200000, 90.0)
        148±0.2μs        149±0.2μs     1.01  bench_core.CountNonzero.time_count_nonzero(2, 1000000, <class 'bool'>)
         1.32±0ms         1.32±0ms     1.01  bench_function_base.Sort.time_sort('heap', 'int16', ('random',))
      12.7±0.03μs      12.8±0.09μs     1.01  bench_ma.Indexing.time_0d(False, 1, 100)
      52.5±0.05μs         52.8±2μs     1.01  bench_core.CorrConv.time_correlate(1000, 100, 'same')
         1.79±0ms         1.81±0ms     1.01  bench_ufunc.UFunc.time_ufunc_types('hypot')
      7.42±0.02ms      7.47±0.03ms     1.01  bench_linalg.Linalg.time_op('pinv', 'int32')
      8.90±0.04ms      8.96±0.07ms     1.01  bench_lib.Pad.time_pad((4194304,), (0, 32), 'edge')
      3.73±0.03ms      3.76±0.02ms     1.01  bench_linalg.Linalg.time_op('det', 'complex64')
          703±1μs          708±2μs     1.01  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 100)
      6.77±0.07μs      6.81±0.01μs     1.01  bench_avx.AVX_UFunc.time_ufunc('absolute', 1, 'd')
        114±0.8μs        114±0.5μs     1.01  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(10)
         1.20±0ms         1.21±0ms     1.01  bench_function_base.Sort.time_sort('heap', 'int16', ('sorted_block', 100))
       15.4±0.4μs       15.5±0.4μs     1.01  bench_avx.AVX_UFunc.time_ufunc('floor', 4, 'd')
      2.55±0.01ms      2.56±0.01ms     1.01  bench_random.Shuffle.time_100000
          599±4μs          603±3μs     1.01  bench_lib.Nan.time_nanargmax(200000, 0.1)
      10.9±0.04μs      10.9±0.05μs     1.01  bench_ufunc.CustomScalar.time_divide_scalar2(<class 'numpy.float32'>)
         2.66±1ms         2.68±1ms     1.01  bench_shape_base.Block2D.time_block2d((512, 512), 'uint64', (2, 2))
      11.8±0.08ms      11.9±0.05ms     1.01  bench_lib.Pad.time_pad((4194304,), 8, 'mean')
         72.5±1μs         72.9±1μs     1.01  bench_shape_base.Block.time_3d(10, 'copy')
        198±0.4μs          199±1μs     1.01  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(20)
      13.8±0.05μs       13.9±0.3μs     1.01  bench_ma.Indexing.time_0d(True, 2, 100)
        222±0.8μs        223±0.9μs     1.01  bench_core.CountNonzero.time_count_nonzero(3, 1000000, <class 'bool'>)
        104±0.2μs        105±0.2μs     1.01  bench_function_base.Sort.time_sort('quick', 'int64', ('uniform',))
          848±4ms          853±4ms     1.01  bench_io.LoadtxtCSVSkipRows.time_skiprows_csv(500)
      6.14±0.01ms      6.17±0.01ms     1.01  bench_ufunc.UFunc.time_ufunc_types('arctanh')
        927±0.8μs          932±1μs     1.01  bench_function_base.Sort.time_sort('heap', 'int16', ('reversed',))
      3.31±0.02ms      3.33±0.01ms     1.01  bench_reduce.AddReduceSeparate.time_reduce(1, 'complex256')
      2.52±0.01ms         2.53±0ms     1.01  bench_io.Copy.time_memcpy_large_out_of_place('int16')
      2.85±0.04ms      2.87±0.04ms     1.01  bench_lib.Nan.time_nanstd(200000, 90.0)
        600±0.6μs        603±0.7μs     1.01  bench_ufunc.UFunc.time_ufunc_types('fmax')
      24.6±0.06μs      24.7±0.06μs     1.01  bench_function_base.Sort.time_argsort('merge', 'int16', ('ordered',))
      24.6±0.02μs      24.7±0.09μs     1.01  bench_function_base.Sort.time_argsort('merge', 'int16', ('uniform',))
         1.57±0ms         1.57±0ms     1.01  bench_random.Random.time_rng('poisson 10')
         5.54±0ms      5.57±0.01ms     1.00  bench_ufunc.UFunc.time_ufunc_types('log1p')
         1.77±0ms      1.78±0.01ms     1.00  bench_core.PackBits.time_packbits_axis0(<class 'numpy.uint64'>)
      4.67±0.01ms      4.69±0.02ms     1.00  bench_lib.Nan.time_nanquantile(200000, 0.1)
          288±2μs          289±5μs     1.00  bench_lib.Pad.time_pad((256, 128, 1), 1, 'edge')
         1.58±0ms      1.59±0.03ms     1.00  bench_reduce.AddReduceSeparate.time_reduce(0, 'complex64')
         93.0±1μs      93.4±0.05μs     1.00  bench_function_base.Sort.time_argsort('heap', 'float64', ('uniform',))
      12.9±0.01μs      12.9±0.02μs     1.00  bench_avx.AVX_UFunc.time_ufunc('reciprocal', 1, 'd')
      7.66±0.03μs      7.70±0.04μs     1.00  bench_avx.AVX_UFunc.time_ufunc('ceil', 2, 'f')
        225±0.5ms          226±1ms     1.00  bench_import.Import.time_ma
       27.1±0.4μs       27.2±0.3μs     1.00  bench_io.CopyTo.time_copyto
      3.28±0.08μs       3.30±0.1μs     1.00  bench_io.Copy.time_cont_assign('int16')
      1.45±0.01ms      1.46±0.02ms     1.00  bench_lib.Nan.time_nanstd(200000, 2.0)
      9.08±0.04ms      9.12±0.07ms     1.00  bench_lib.Pad.time_pad((4194304,), (0, 32), 'linear_ramp')
         2.48±0ms         2.49±0ms     1.00  bench_ufunc.UFunc.time_ufunc_types('sqrt')
      3.05±0.05ms      3.06±0.07ms     1.00  bench_lib.Nan.time_nancumsum(200000, 2.0)
      1.28±0.01ms         1.29±0ms     1.00  bench_io.Copy.time_memcpy_large_out_of_place('int8')
      7.19±0.02ms      7.22±0.04ms     1.00  bench_linalg.Linalg.time_op('svd', 'int64')
      30.9±0.08μs      31.0±0.08μs     1.00  bench_function_base.Sort.time_argsort('merge', 'int64', ('reversed',))
       60.8±0.1μs       61.0±0.6μs     1.00  bench_function_base.Sort.time_sort('merge', 'int16', ('random',))
        189±0.3μs        190±0.4μs     1.00  bench_indexing.Indexing.time_op('indexes_', ':,I', '')
      2.90±0.05ms      2.91±0.05ms     1.00  bench_lib.Nan.time_nancumprod(200000, 0.1)
      7.16±0.03ms      7.19±0.04ms     1.00  bench_linalg.Linalg.time_op('svd', 'int16')
          704±2μs        707±0.5μs     1.00  bench_random.Bounded.time_bounded('MT19937', [<class 'numpy.uint64'>, 2047])
      17.1±0.02ms      17.2±0.06ms     1.00  bench_reduce.AddReduce.time_axis_1
      9.04±0.05ms      9.07±0.02ms     1.00  bench_lib.Pad.time_pad((4194304,), (0, 32), 'reflect')
      5.00±0.07ms      5.01±0.07ms     1.00  bench_lib.Nan.time_nanvar(200000, 50.0)
      7.69±0.02μs      7.72±0.01μs     1.00  bench_avx.AVX_UFunc.time_ufunc('trunc', 2, 'f')
      2.34±0.03ms      2.34±0.01ms     1.00  bench_lib.Pad.time_pad((256, 128, 1), 8, 'edge')
          280±8μs          281±8μs     1.00  bench_core.CorrConv.time_convolve(1000, 1000, 'full')
          675±2ms          677±2ms     1.00  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 100000)
       63.1±0.3ms       63.3±0.2ms     1.00  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 10000)
          117±3μs        118±0.3μs     1.00  bench_indexing.Indexing.time_op('indexes_', 'I', '')
          669±3μs          671±3μs     1.00  bench_ufunc.UFunc.time_ufunc_types('rint')
      6.71±0.03ms      6.74±0.04ms     1.00  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), 8, 'edge')
       6.34±0.7ms       6.36±0.7ms     1.00  bench_lib.Nan.time_nanpercentile(200000, 50.0)
        180±300μs        180±200μs     1.00  bench_shape_base.Block2D.time_block2d((256, 256), 'uint64', (4, 4))
       6.24±0.7ms       6.26±0.7ms     1.00  bench_lib.Nan.time_nanmedian(200000, 50.0)
          397±2μs          398±2μs     1.00  bench_ufunc.UFunc.time_ufunc_types('square')
        216±0.6μs         216±20μs     1.00  bench_linalg.Eindot.time_dot_trans_a_at
        435±0.8μs        436±0.9μs     1.00  bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 10000, <class 'object'>)
      2.90±0.04ms      2.91±0.06ms     1.00  bench_lib.Nan.time_nancumprod(200000, 0)
        681±0.5μs        683±0.5μs     1.00  bench_function_base.Sort.time_sort('quick', 'int64', ('sorted_block', 10))
       3.98±0.7ms       3.99±0.7ms     1.00  bench_lib.Nan.time_nanmedian(200000, 90.0)
      1.12±0.01ms      1.12±0.02ms     1.00  bench_lib.Nan.time_nanstd(200000, 0)
        706±0.6μs          708±1μs     1.00  bench_random.Bounded.time_bounded('MT19937', [<class 'numpy.uint64'>, 1535])
       8.74±0.1μs      8.76±0.08μs     1.00  bench_avx.AVX_UFunc.time_ufunc('absolute', 4, 'f')
      10.3±0.04ms      10.4±0.06ms     1.00  bench_linalg.Linalg.time_op('svd', 'complex64')
         249±30ms        250±100ms     1.00  bench_shape_base.Block.time_3d(100, 'copy')
        767±0.8ms          768±2ms     1.00  bench_io.LoadtxtCSVSkipRows.time_skiprows_csv(10000)
      7.19±0.02ms      7.21±0.03ms     1.00  bench_linalg.Linalg.time_op('svd', 'int32')
         1.76±0ms      1.77±0.01ms     1.00  bench_random.RNG.time_raw('Philox')
         2.53±0ms         2.54±0ms     1.00  bench_core.CountNonzero.time_count_nonzero(1, 1000000, <class 'int'>)
      77.8±0.05μs      78.0±0.07μs     1.00  bench_core.CountNonzero.time_count_nonzero(3, 10000, <class 'int'>)
      52.4±0.03μs      52.5±0.03μs     1.00  bench_core.CountNonzero.time_count_nonzero(2, 10000, <class 'int'>)
        117±0.1μs          117±1μs     1.00  bench_core.Core.time_array_l_view
        861±0.6μs          863±1μs     1.00  bench_lib.Nan.time_nanmax(200000, 90.0)
        833±0.8μs        835±0.8μs     1.00  bench_random.Random.time_rng('weibull 1')
          192±7μs          192±6μs     1.00  bench_core.CorrConv.time_convolve(1000, 1000, 'same')
          347±5μs          348±4μs     1.00  bench_lib.Nan.time_nansum(200000, 0.1)
       48.0±0.2μs         48.1±2μs     1.00  bench_core.CorrConv.time_correlate(1000, 100, 'valid')
          761±2μs          762±4μs     1.00  bench_lib.Nan.time_nanargmax(200000, 2.0)
          453±3ms          454±3ms     1.00  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), (0, 32), 'reflect')
        225±0.6ms        226±0.5ms     1.00  bench_import.Import.time_fft
         5.20±0ms         5.21±0ms     1.00  bench_ufunc.UFunc.time_ufunc_types('log2')
      8.89±0.02ms      8.90±0.02ms     1.00  bench_lib.Pad.time_pad((4194304,), 1, 'reflect')
          946±6μs          947±4μs     1.00  bench_reduce.AddReduceSeparate.time_reduce(0, 'int16')
      2.84±0.04ms      2.84±0.05ms     1.00  bench_lib.Nan.time_nanvar(200000, 90.0)
       68.2±0.4μs       68.3±0.2μs     1.00  bench_io.CopyTo.time_copyto_sparse
      2.32±0.01ms      2.33±0.01ms     1.00  bench_lib.Nan.time_nansum(200000, 50.0)
         76.6±1μs         76.7±1μs     1.00  bench_core.Temporaries.time_mid
      3.95±0.09ms      3.96±0.01ms     1.00  bench_lib.Pad.time_pad((256, 128, 1), 8, 'wrap')
         2.38±0ms         2.38±0ms     1.00  bench_random.Randint_dtype.time_randint_slow('uint16')
      7.15±0.02ms      7.16±0.02ms     1.00  bench_linalg.Linalg.time_op('svd', 'float64')
          623±2μs          624±3μs     1.00  bench_core.CorrConv.time_correlate(100000, 10, 'full')
          601±2μs         602±10μs     1.00  bench_core.CorrConv.time_correlate(100000, 10, 'same')
      18.1±0.01ms         18.1±0ms     1.00  bench_ufunc.UFunc.time_ufunc_types('float_power')
         211±10μs          211±2μs     1.00  bench_linalg.Eindot.time_matmul_trans_a_at
        264±500μs        264±500μs     1.00  bench_shape_base.Block2D.time_block2d((512, 512), 'uint32', (2, 2))
      10.6±0.01ms      10.6±0.01ms     1.00  bench_ufunc.UFunc.time_ufunc_types('arcsin')
      6.16±0.02ms      6.17±0.03ms     1.00  bench_lib.Nan.time_nanprod(200000, 90.0)
         1.88±0ms         1.88±0ms     1.00  bench_random.Randint_dtype.time_randint_slow('uint8')
      27.1±0.02μs      27.1±0.09μs     1.00  bench_core.CountNonzero.time_count_nonzero(1, 10000, <class 'int'>)
      3.54±0.01μs      3.54±0.01μs     1.00  bench_core.CountNonzero.time_count_nonzero(2, 100, <class 'object'>)
       94.2±0.3μs       94.3±0.4μs     1.00  bench_ufunc.CustomInplace.time_float_add_temp
          341±1μs          341±1μs     1.00  bench_ufunc.UFunc.time_ufunc_types('trunc')
          763±4μs          763±3μs     1.00  bench_lib.Nan.time_nanargmin(200000, 2.0)
          706±1μs        707±0.9μs     1.00  bench_random.Bounded.time_bounded('MT19937', [<class 'numpy.uint64'>, 1024])
      13.8±0.05μs       13.8±0.1μs     1.00  bench_ma.Indexing.time_0d(True, 1, 10)
       76.0±0.1μs       76.0±0.2μs     1.00  bench_core.PackBits.time_packbits(<class 'numpy.uint64'>)
      2.19±0.03μs      2.19±0.02μs     1.00  bench_indexing.IndexingStructured0D.time_scalar_slice
         697±20μs         697±30μs     1.00  bench_shape_base.Block.time_block_complicated(100)
      7.56±0.07μs      7.57±0.03μs     1.00  bench_ufunc.CustomScalar.time_add_scalar2(<class 'numpy.float32'>)
      7.42±0.02ms      7.43±0.02ms     1.00  bench_linalg.Linalg.time_op('pinv', 'int64')
          375±1μs          375±2μs     1.00  bench_random.Bounded.time_bounded('SFC64', [<class 'numpy.uint16'>, 95])
      14.1±0.01ms      14.1±0.01ms     1.00  bench_ufunc.UFunc.time_ufunc_types('tan')
        461±0.8μs        461±0.3μs     1.00  bench_ufunc.UFunc.time_ufunc_types('lcm')
      10.7±0.06ms      10.7±0.05ms     1.00  bench_linalg.Linalg.time_op('pinv', 'complex128')
         1.20±0ms         1.20±0ms     1.00  bench_random.Bounded.time_bounded('PCG64', [<class 'numpy.uint8'>, 127])
      10.3±0.04ms      10.3±0.07ms     1.00  bench_linalg.Linalg.time_op('svd', 'complex128')
         1.35±0ms         1.35±0ms     1.00  bench_reduce.AddReduceSeparate.time_reduce(1, 'int16')
          149±1μs          149±1μs     1.00  bench_ma.Concatenate.time_it('unmasked+masked', 100)
         1.82±0ms      1.82±0.01ms     1.00  bench_random.RNG.time_raw('MT19937')
        230±0.4ms        230±0.4ms     1.00  bench_import.Import.time_numpy_inspect
          303±3μs          303±2μs     1.00  bench_ufunc.UFunc.time_ufunc_types('logical_not')
        226±0.3ms        226±0.6ms     1.00  bench_import.Import.time_random
       24.3±0.2μs      24.3±0.02μs     1.00  bench_ufunc.CustomScalar.time_divide_scalar2(<class 'numpy.float64'>)
          812±2μs         812±30μs     1.00  bench_random.Bounded.time_bounded('Philox', [<class 'numpy.uint32'>, 2047])
          5.07±0s          5.07±0s     1.00  bench_random.Choice.time_legacy_choice(100000000.0)
         7.57±0ms         7.57±0ms     1.00  bench_core.CountNonzero.time_count_nonzero(3, 1000000, <class 'int'>)
         1.58±0ms      1.58±0.01ms     1.00  bench_reduce.AddReduceSeparate.time_reduce(1, 'longfloat')
      7.45±0.03ms      7.45±0.04ms     1.00  bench_linalg.Linalg.time_op('pinv', 'float64')
        302±0.3μs        302±0.4μs     1.00  bench_ufunc.UFunc.time_ufunc_types('isfinite')
       2.09±0.2ms       2.09±0.2ms     1.00  bench_core.CorrConv.time_correlate(1000, 10000, 'valid')
        225±0.7ms        225±0.7ms     1.00  bench_import.Import.time_linalg
        664±0.4μs          664±1μs     1.00  bench_function_base.Sort.time_argsort('quick', 'float64', ('sorted_block', 1000))
        134±0.5μs        134±0.3μs     1.00  bench_indexing.Indexing.time_op('indexes_', 'I', '=1')
      9.74±0.01ms      9.74±0.01ms     1.00  bench_ufunc.UFunc.time_ufunc_types('arccosh')
      8.45±0.02ms       8.45±0.4ms     1.00  bench_io.Copy.time_memcpy_large_out_of_place('complex128')
      15.6±0.07ms      15.6±0.02ms     1.00  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), 8, 'reflect')
      27.8±0.02μs       27.8±0.2μs     1.00  bench_io.Copy.time_cont_assign('complex128')
         4.01±0ms      4.01±0.01ms     1.00  bench_random.Randint_dtype.time_randint_slow('uint64')
      3.05±0.08ms      3.05±0.04ms     1.00  bench_lib.Nan.time_nancumprod(200000, 2.0)
        857±0.5μs        856±0.9μs     1.00  bench_lib.Nan.time_nanmin(200000, 90.0)
      4.91±0.01ms      4.91±0.02ms     1.00  bench_ufunc.UFunc.time_ufunc_types('log')
      1.69±0.01μs         1.69±0μs     1.00  bench_core.CountNonzero.time_count_nonzero(2, 100, <class 'int'>)
      3.06±0.01ms         3.06±0ms     1.00  bench_random.Randint_dtype.time_randint_slow('uint32')
      9.44±0.01ms      9.43±0.01ms     1.00  bench_ufunc.UFunc.time_ufunc_types('tanh')
         1.44±0ms         1.44±0ms     1.00  bench_random.RNG.time_raw('PCG64')
       21.4±0.2ms       21.4±0.1ms     1.00  bench_linalg.Eindot.time_einsum_ij_jk_a_b
         1.38±0ms         1.38±0ms     1.00  bench_ufunc.UFunc.time_ufunc_types('absolute')
       6.15±0.6ms       6.15±0.6ms     1.00  bench_lib.Nan.time_nanquantile(200000, 50.0)
         5.06±0ms         5.06±0ms     1.00  bench_core.CountNonzero.time_count_nonzero(2, 1000000, <class 'int'>)
       7.89±0.2ms       7.88±0.1ms     1.00  bench_lib.Nan.time_nancumprod(200000, 90.0)
      10.5±0.09μs      10.5±0.04μs     1.00  bench_io.Copy.time_strided_assign('float64')
         75.9±1μs         75.8±1μs     1.00  bench_core.Temporaries.time_mid2
         3.07±0ms         3.06±0ms     1.00  bench_random.Randint.time_randint_slow
          627±2μs          627±1μs     1.00  bench_core.CorrConv.time_convolve(100000, 10, 'full')
         1.15±0ms         1.15±0ms     1.00  bench_random.Randint_dtype.time_randint_fast('uint64')
       76.0±0.4μs       75.9±0.2μs     1.00  bench_core.CountNonzero.time_count_nonzero(1, 1000000, <class 'bool'>)
      8.41±0.05ms      8.40±0.07ms     1.00  bench_lib.Pad.time_pad((256, 128, 1), (0, 32), 'wrap')
          532±4μs          532±4μs     1.00  bench_ufunc.UFunc.time_ufunc_types('subtract')
        117±0.4μs        116±0.3μs     1.00  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 10)
      13.9±0.07μs       13.9±0.2μs     1.00  bench_ma.Indexing.time_1d(True, 1, 10)
         2.60±0ms         2.60±0ms     1.00  bench_random.RNG.time_normal_zig('Philox')
      7.46±0.02ms      7.45±0.08ms     1.00  bench_linalg.Linalg.time_op('pinv', 'int16')
        226±0.6ms        225±0.3ms     1.00  bench_import.Import.time_numpy
      1.46±0.01ms      1.46±0.01ms     1.00  bench_lib.Nan.time_nanargmax(200000, 90.0)
       78.7±0.2μs       78.6±0.3μs     1.00  bench_ufunc.CustomInplace.time_char_or_temp
         1.34±0ms         1.34±0ms     1.00  bench_reduce.AddReduceSeparate.time_reduce(1, 'int32')
          584±1μs          583±3μs     1.00  bench_random.Bounded.time_bounded('PCG64', [<class 'numpy.uint32'>, 1535])
          544±2μs          543±2μs     1.00  bench_random.Bounded.time_bounded('SFC64', [<class 'numpy.uint32'>, 1535])
      9.49±0.04μs      9.48±0.03μs     1.00  bench_avx.AVX_UFunc.time_ufunc('reciprocal', 4, 'f')
          598±3μs          597±3μs     1.00  bench_lib.Nan.time_nanargmin(200000, 0)
       2.38±0.2ms       2.38±0.2ms     1.00  bench_core.CorrConv.time_correlate(1000, 10000, 'full')
          597±3μs          596±3μs     1.00  bench_lib.Nan.time_nanargmax(200000, 0)
       2.29±0.2ms       2.29±0.2ms     1.00  bench_core.CorrConv.time_correlate(1000, 10000, 'same')
         5.57±0ms      5.57±0.01ms     1.00  bench_ufunc.UFunc.time_ufunc_types('log10')
        291±0.2μs          291±2μs     1.00  bench_ufunc.UFunc.time_ufunc_types('conj')
          541±3μs          540±2μs     1.00  bench_random.Bounded.time_bounded('SFC64', [<class 'numpy.uint64'>, 1535])
         1.38±0ms         1.38±0ms     1.00  bench_ufunc.UFunc.time_ufunc_types('abs')
      14.2±0.02μs      14.1±0.02μs     1.00  bench_avx.AVX_UFunc.time_ufunc('sqrt', 2, 'd')
      1.46±0.01ms      1.46±0.01ms     1.00  bench_lib.Nan.time_nanargmin(200000, 90.0)
          227±1ms        226±0.6ms     1.00  bench_import.Import.time_matlib
        694±0.7μs        693±0.6μs     1.00  bench_random.RNG.time_32bit('Philox')
         2.16±0ms         2.16±0ms     1.00  bench_random.RNG.time_normal_zig('PCG64')
        132±0.3μs        132±0.3μs     1.00  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 10)
          605±1μs          604±1μs     1.00  bench_core.CorrConv.time_convolve(100000, 10, 'same')
          933±2μs          931±1μs     1.00  bench_reduce.AddReduceSeparate.time_reduce(1, 'int64')
          461±1μs        460±0.9μs     1.00  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 10000, <class 'str'>)
       18.1±0.4ms       18.1±0.4ms     1.00  bench_lib.Nan.time_nancumprod(200000, 50.0)
       59.9±0.3μs       59.8±0.3μs     1.00  bench_ufunc.CustomInplace.time_char_or
         2.38±0ms         2.38±0ms     1.00  bench_random.Bounded.time_bounded('numpy', [<class 'numpy.uint16'>, 1024])
        436±0.9μs        435±0.7μs     1.00  bench_random.RNG.time_64bit('SFC64')
        315±0.8μs        314±0.6μs     1.00  bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 10000, <class 'str'>)
      7.61±0.09μs      7.59±0.04μs     1.00  bench_avx.AVX_UFunc.time_ufunc('absolute', 2, 'f')
        181±0.3μs        181±0.6μs     1.00  bench_ufunc.UFunc.time_ufunc_types('right_shift')
      17.0±0.04ms      17.0±0.04ms     1.00  bench_lib.Nan.time_nanprod(200000, 50.0)
      12.9±0.06μs       12.9±0.2μs     1.00  bench_ma.Indexing.time_0d(False, 2, 1000)
       2.24±0.2ms       2.24±0.2ms     1.00  bench_core.CorrConv.time_convolve(1000, 10000, 'same')
         72.2±4ms         72.1±4ms     1.00  bench_ma.Concatenate.time_it('unmasked', 2000)
          364±4μs          363±2μs     1.00  bench_linalg.Eindot.time_dot_trans_a_atc
          244±1μs          244±2μs     1.00  bench_function_base.Bincount.time_weights
      9.28±0.01ms         9.26±0ms     1.00  bench_ufunc.UFunc.time_ufunc_types('expm1')
        449±0.3μs        448±0.6μs     1.00  bench_random.Bounded.time_bounded('PCG64', [<class 'numpy.uint16'>, 1024])
      10.5±0.02ms      10.5±0.01ms     1.00  bench_ufunc.UFunc.time_ufunc_types('arccos')
          601±1μs          599±2μs     1.00  bench_core.CorrConv.time_correlate(100000, 10, 'valid')
         2.34±0μs      2.34±0.01μs     1.00  bench_core.CountNonzero.time_count_nonzero(1, 100, <class 'object'>)
         1.25±0ms         1.24±0ms     1.00  bench_random.Bounded.time_bounded('MT19937', [<class 'numpy.uint8'>, 64])
          789±1μs        787±0.9μs     1.00  bench_random.Bounded.time_bounded('Philox', [<class 'numpy.uint64'>, 2047])
         1.87±0ms         1.86±0ms     1.00  bench_random.Bounded.time_bounded('numpy', [<class 'numpy.uint8'>, 64])
      3.70±0.08ms      3.69±0.09ms     1.00  bench_lib.Nan.time_nancumsum(200000, 90.0)
          692±1μs        690±0.6μs     1.00  bench_random.Randint_dtype.time_randint_fast('uint32')
         1.63±0ms         1.63±0ms     1.00  bench_ufunc.UFunc.time_ufunc_types('divmod')
      11.8±0.01ms      11.7±0.01ms     1.00  bench_ufunc.UFunc.time_ufunc_types('cos')
         1.48±0ms         1.47±0ms     1.00  bench_core.PackBits.time_packbits_axis1(<class 'numpy.uint64'>)
          709±2μs          707±1μs     1.00  bench_random.Bounded.time_bounded('MT19937', [<class 'numpy.uint64'>, 95])
          963±2μs        961±0.8μs     1.00  bench_random.Bounded.time_bounded('numpy', [<class 'numpy.uint8'>, 95])
          642±2μs          641±1μs     1.00  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 10000, <class 'object'>)
      25.5±0.07μs       25.5±0.2μs     1.00  bench_io.Copy.time_strided_copy('float64')
         1.65±0ms         1.65±0ms     1.00  bench_lib.Nan.time_nanmax(200000, 50.0)
      18.0±0.02μs      17.9±0.03μs     1.00  bench_io.Copy.time_strided_copy('int8')
         1.13±0ms         1.13±0ms     1.00  bench_random.RNG.time_64bit('MT19937')
       46.8±0.4μs       46.7±0.4μs     1.00  bench_linalg.Eindot.time_matmul_d_matmul_b_c
      25.1±0.04μs      25.1±0.04μs     1.00  bench_function_base.Sort.time_argsort('merge', 'int64', ('uniform',))
         1.13±0ms         1.13±0ms     1.00  bench_random.RNG.time_64bit('numpy')
         2.83±0ms         2.82±0ms     1.00  bench_random.RNG.time_normal_zig('MT19937')
        438±0.5μs          437±1μs     1.00  bench_random.Bounded.time_bounded('SFC64', [<class 'numpy.uint16'>, 1535])
         1.43±0ms         1.42±0ms     1.00  bench_function_base.Sort.time_argsort('heap', 'int16', ('sorted_block', 100))
      9.46±0.01ms      9.43±0.01ms     1.00  bench_ufunc.UFunc.time_ufunc_types('arcsinh')
      20.1±0.08μs      20.0±0.04μs     1.00  bench_io.Copy.time_strided_copy('float32')
          226±1μs          225±1μs     1.00  bench_function_base.Bincount.time_bincount
      14.0±0.06μs       14.0±0.1μs     1.00  bench_ma.Indexing.time_1d(True, 1, 1000)
         1.23±0ms         1.22±0ms     1.00  bench_random.Bounded.time_bounded('Philox', [<class 'numpy.uint8'>, 64])
        757±0.6μs          755±1μs     1.00  bench_random.Bounded.time_bounded('MT19937', [<class 'numpy.uint32'>, 1535])
      2.52±0.02ms      2.51±0.03ms     1.00  bench_core.CountNonzero.time_count_nonzero_axis(1, 1000000, <class 'int'>)
         23.0±2ms         22.9±2ms     1.00  bench_core.CorrConv.time_convolve(100000, 1000, 'full')
         1.08±0ms         1.08±0ms     1.00  bench_random.RNG.time_64bit('Philox')
       2.33±0.2ms       2.32±0.2ms     1.00  bench_core.CorrConv.time_convolve(1000, 10000, 'full')
       96.5±0.2μs       96.2±0.1μs     1.00  bench_io.CopyTo.time_copyto_8_sparse
      18.6±0.06μs      18.5±0.05μs     1.00  bench_io.Copy.time_strided_copy('int16')
          258±4ms          257±3ms     1.00  bench_io.Savez.time_vb_savez_squares
          998±1μs          995±2μs     1.00  bench_linalg.Eindot.time_einsum_i_ij_j
         1.65±0ms         1.64±0ms     1.00  bench_random.Bounded.time_bounded('PCG64', [<class 'numpy.uint8'>, 95])
         1.29±0ms         1.28±0ms     1.00  bench_random.Bounded.time_bounded('Philox', [<class 'numpy.uint8'>, 127])
         1.79±0ms         1.79±0ms     1.00  bench_random.Bounded.time_bounded('MT19937', [<class 'numpy.uint8'>, 95])
      34.1±0.08ms      34.0±0.03ms     1.00  bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 1000000, <class 'str'>)
      12.3±0.04ms      12.3±0.07ms     1.00  bench_lib.Pad.time_pad((256, 128, 1), (0, 32), 'linear_ramp')
          605±3μs          603±1μs     1.00  bench_core.CorrConv.time_convolve(100000, 10, 'valid')
      14.1±0.06μs       14.0±0.1μs     1.00  bench_ma.Indexing.time_1d(True, 1, 100)
         10.4±0ms      10.4±0.01ms     1.00  bench_ufunc.UFunc.time_ufunc_types('cosh')
         1.16±0ms         1.15±0ms     1.00  bench_random.Bounded.time_bounded('SFC64', [<class 'numpy.uint8'>, 127])
          707±1μs         705±10μs     1.00  bench_ufunc.UFunc.time_ufunc_types('modf')
          543±2μs          541±1μs     1.00  bench_random.Bounded.time_bounded('SFC64', [<class 'numpy.uint64'>, 1024])
      2.91±0.04ms      2.90±0.05ms     1.00  bench_lib.Nan.time_nancumsum(200000, 0)
        437±0.9μs        435±0.1μs     1.00  bench_random.RNG.time_32bit('SFC64')
          166±1ms        166±0.4ms     1.00  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), (0, 32), 'edge')
      2.57±0.02ms      2.56±0.02ms     1.00  bench_lib.Nan.time_nanargmax(200000, 50.0)
          632±7μs          630±2μs     1.00  bench_reduce.AddReduceSeparate.time_reduce(0, 'float64')
        421±0.5μs        420±0.2μs     1.00  bench_reduce.AddReduceSeparate.time_reduce(1, 'float32')
       3.88±0.7ms       3.87±0.7ms     1.00  bench_lib.Nan.time_nanquantile(200000, 90.0)
          543±2μs          541±3μs     1.00  bench_random.Bounded.time_bounded('SFC64', [<class 'numpy.uint32'>, 2047])
         22.7±2ms         22.6±2ms     1.00  bench_core.CorrConv.time_convolve(100000, 1000, 'valid')
        560±0.4μs        558±0.3μs     1.00  bench_random.Bounded.time_bounded('PCG64', [<class 'numpy.uint64'>, 2047])
      24.9±0.07μs       24.8±0.1μs     1.00  bench_ufunc.Custom.time_nonzero
       2.06±0.2ms       2.05±0.2ms     1.00  bench_core.CorrConv.time_convolve(1000, 10000, 'valid')
      4.74±0.09ms       4.72±0.1ms     1.00  bench_lib.Nan.time_nancumsum(200000, 50.0)
         1.32±0ms         1.32±0ms     1.00  bench_random.Bounded.time_bounded('MT19937', [<class 'numpy.uint8'>, 127])
        415±0.7μs        413±0.6μs     1.00  bench_random.Bounded.time_bounded('PCG64', [<class 'numpy.uint16'>, 95])
      4.84±0.01ms      4.82±0.01ms     1.00  bench_ma.UFunc.time_2d(True, False, 1000)
         1.61±0ms         1.60±0ms     1.00  bench_random.Bounded.time_bounded('SFC64', [<class 'numpy.uint8'>, 95])
         1.06±0ms         1.06±0ms     1.00  bench_random.Bounded.time_bounded('SFC64', [<class 'numpy.uint8'>, 64])
         22.9±2ms         22.8±2ms     1.00  bench_core.CorrConv.time_convolve(100000, 1000, 'same')
          255±1μs        254±0.6μs     1.00  bench_random.Permutation.time_permutation_1d
         1.22±0ms         1.21±0ms     1.00  bench_random.Bounded.time_bounded('numpy', [<class 'numpy.uint16'>, 95])
      4.16±0.01ms      4.14±0.01ms     1.00  bench_random.RNG.time_normal_zig('numpy')
        228±0.6μs        227±0.4μs     1.00  bench_lib.Nan.time_nanmax(200000, 0.1)
      2.57±0.01ms      2.56±0.01ms     1.00  bench_lib.Nan.time_nanargmin(200000, 50.0)
        295±0.2μs        294±0.4μs     1.00  bench_ufunc.UFunc.time_ufunc_types('gcd')
          545±2μs          543±2μs     1.00  bench_random.Bounded.time_bounded('SFC64', [<class 'numpy.uint32'>, 1024])
         1.22±0ms         1.22±0ms     1.00  bench_random.Bounded.time_bounded('numpy', [<class 'numpy.uint16'>, 1535])
          869±4μs          865±1μs     1.00  bench_reduce.AddReduceSeparate.time_reduce(1, 'complex64')
         1.78±0ms         1.77±0ms     1.00  bench_random.RNG.time_normal_zig('SFC64')
      1.13±0.01ms      1.13±0.01ms     1.00  bench_linalg.Eindot.time_dot_trans_at_a
       67.6±0.7ms       67.3±0.2ms     1.00  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), (0, 32), 'constant')
         11.3±0ms         11.2±0ms     1.00  bench_ufunc.UFunc.time_ufunc_types('sinh')
        603±0.5μs        601±0.2μs     1.00  bench_random.Bounded.time_bounded('Philox', [<class 'numpy.uint16'>, 1535])
        560±0.7μs        558±0.7μs     1.00  bench_random.Bounded.time_bounded('PCG64', [<class 'numpy.uint64'>, 1024])
          484±1μs        482±0.2μs     1.00  bench_random.Bounded.time_bounded('PCG64', [<class 'numpy.uint16'>, 1535])
        814±0.9μs          811±1μs     1.00  bench_random.Bounded.time_bounded('Philox', [<class 'numpy.uint32'>, 1535])
          493±1μs        491±0.4μs     1.00  bench_function_base.Sort.time_sort('quick', 'int64', ('sorted_block', 1000))
       5.49±0.1μs      5.47±0.03μs     1.00  bench_io.Copy.time_cont_assign('float32')
       28.1±0.2μs       28.0±0.3μs     1.00  bench_function_base.Where.time_2
          759±2μs          755±1μs     1.00  bench_random.Bounded.time_bounded('MT19937', [<class 'numpy.uint32'>, 1024])
          667±1μs        664±0.6μs     1.00  bench_function_base.Sort.time_sort('quick', 'int64', ('sorted_block', 100))
       42.4±0.2ms       42.2±0.2ms     1.00  bench_records.Records.time_fromstring_formats_as_string
          790±1μs        786±0.7μs     1.00  bench_random.Bounded.time_bounded('Philox', [<class 'numpy.uint64'>, 1535])
          502±5μs          500±4μs     1.00  bench_lib.Nan.time_nansum(200000, 2.0)
      1.51±0.01ms         1.51±0ms     1.00  bench_random.Bounded.time_bounded('numpy', [<class 'numpy.uint64'>, 1535])
          907±3μs          903±3μs     1.00  bench_indexing.Indexing.time_op('indexes_', 'np.ix_(I, I)', '=1')
          667±1μs        664±0.5μs     1.00  bench_random.RNG.time_32bit('numpy')
          568±4μs          565±4μs     1.00  bench_ufunc.UFunc.time_ufunc_types('logical_xor')
      3.07±0.01ms         3.06±0ms     1.00  bench_random.Bounded.time_bounded('numpy', [<class 'numpy.uint32'>, 1024])
      11.5±0.01ms      11.5±0.02ms     1.00  bench_ufunc.UFunc.time_ufunc_types('sin')
      18.7±0.03ms      18.6±0.01ms     1.00  bench_ufunc.UFunc.time_ufunc_types('power')
          700±3μs          697±3μs     1.00  bench_lib.Nan.time_nanprod(200000, 0.1)
          484±1μs          482±1μs     1.00  bench_random.Randint_dtype.time_randint_fast('uint16')
          817±1μs        813±0.8μs     1.00  bench_function_base.Sort.time_sort('quick', 'int64', ('random',))
         5.10±0ms      5.08±0.01ms     1.00  bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 1000000, <class 'int'>)
         1.11±0ms         1.10±0ms     1.00  bench_random.Bounded.time_bounded('PCG64', [<class 'numpy.uint8'>, 64])
         1.25±0ms         1.24±0ms     1.00  bench_random.RNG.time_raw('SFC64')
      5.10±0.04ms      5.07±0.03ms     1.00  bench_core.CountNonzero.time_count_nonzero_axis(2, 1000000, <class 'int'>)
          584±3μs        581±0.7μs     0.99  bench_random.Bounded.time_bounded('PCG64', [<class 'numpy.uint32'>, 1024])
      8.10±0.01ms      8.06±0.01ms     0.99  bench_ufunc.UFunc.time_ufunc_types('exp2')
      3.08±0.02ms         3.06±0ms     0.99  bench_random.Bounded.time_bounded('numpy', [<class 'numpy.uint64'>, 1024])
      5.04±0.06ms      5.01±0.07ms     0.99  bench_lib.Nan.time_nanstd(200000, 50.0)
         1.78±0ms         1.77±0ms     0.99  bench_random.Bounded.time_bounded('Philox', [<class 'numpy.uint8'>, 95])
         2.95±0ms      2.93±0.01ms     0.99  bench_ufunc.UFunc.time_ufunc_types('arctan2')
        292±0.5μs          290±3μs     0.99  bench_ufunc.UFunc.time_ufunc_types('conjugate')
        560±0.8μs        557±0.5μs     0.99  bench_random.Bounded.time_bounded('PCG64', [<class 'numpy.uint64'>, 95])
          590±1μs        587±0.6μs     0.99  bench_random.Bounded.time_bounded('Philox', [<class 'numpy.uint16'>, 2047])
       15.3±0.3μs      15.3±0.08μs     0.99  bench_avx.AVX_UFunc.time_ufunc('rint', 4, 'd')
          791±2μs          787±1μs     0.99  bench_random.Bounded.time_bounded('Philox', [<class 'numpy.uint64'>, 1024])
       10.6±0.1μs       10.6±0.1μs     0.99  bench_io.Copy.time_strided_assign('complex64')
       40.2±0.3ms       40.0±0.2ms     0.99  bench_core.CountNonzero.time_count_nonzero(3, 1000000, <class 'str'>)
      8.93±0.04ms      8.88±0.01ms     0.99  bench_lib.Pad.time_pad((4194304,), (0, 32), 'constant')
      7.25±0.03ms      7.21±0.03ms     0.99  bench_linalg.Linalg.time_op('svd', 'float32')
      8.94±0.04ms      8.89±0.01ms     0.99  bench_lib.Pad.time_pad((4194304,), 8, 'wrap')
      17.1±0.06ms      17.0±0.03ms     0.99  bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 1000000, <class 'str'>)
        559±0.5μs        556±0.5μs     0.99  bench_random.RNG.time_64bit('PCG64')
        277±0.2μs        275±0.2μs     0.99  bench_ufunc.UFunc.time_ufunc_types('copysign')
          791±3μs        786±0.9μs     0.99  bench_random.Bounded.time_bounded('Philox', [<class 'numpy.uint64'>, 95])
          593±1μs        589±0.3μs     0.99  bench_random.Bounded.time_bounded('MT19937', [<class 'numpy.uint16'>, 1535])
          406±5μs        403±0.6μs     0.99  bench_random.Bounded.time_bounded('SFC64', [<class 'numpy.uint16'>, 1024])
         1.94±0μs      1.93±0.01μs     0.99  bench_core.CountNonzero.time_count_nonzero(3, 100, <class 'int'>)
       86.7±0.9ms       86.2±0.3ms     0.99  bench_records.Records.time_fromarrays_formats_as_list
      25.4±0.03μs      25.2±0.04μs     0.99  bench_function_base.Sort.time_argsort('merge', 'int64', ('ordered',))
      4.85±0.01ms      4.82±0.01ms     0.99  bench_ma.UFunc.time_2d(True, True, 1000)
        579±0.8μs        576±0.8μs     0.99  bench_random.Bounded.time_bounded('MT19937', [<class 'numpy.uint16'>, 2047])
          412±1μs        410±0.2μs     0.99  bench_random.RNG.time_32bit('PCG64')
         2.19±0ms         2.18±0ms     0.99  bench_ufunc.UFunc.time_ufunc_types('floor_divide')
          815±1μs          810±1μs     0.99  bench_random.Bounded.time_bounded('Philox', [<class 'numpy.uint32'>, 1024])
          760±1μs          755±2μs     0.99  bench_random.Bounded.time_bounded('MT19937', [<class 'numpy.uint32'>, 2047])
      13.9±0.03μs      13.8±0.05μs     0.99  bench_ma.Indexing.time_0d(True, 2, 1000)
       69.5±0.3ms       69.1±0.5ms     0.99  bench_ufunc.UFunc.time_ufunc_types('matmul')
      4.01±0.02ms      3.99±0.02ms     0.99  bench_lib.Nan.time_nanpercentile(200000, 0.1)
      34.2±0.06ms      34.0±0.06ms     0.99  bench_core.CountNonzero.time_count_nonzero_axis(2, 1000000, <class 'str'>)
      17.1±0.05ms      17.0±0.01ms     0.99  bench_core.CountNonzero.time_count_nonzero_axis(1, 1000000, <class 'str'>)
          542±2μs        539±0.9μs     0.99  bench_random.Bounded.time_bounded('SFC64', [<class 'numpy.uint64'>, 95])
       11.9±0.2ms      11.8±0.05ms     0.99  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), 8, 'wrap')
        689±0.3μs          685±2μs     0.99  bench_random.Randint.time_randint_fast
        378±0.2μs        375±0.9μs     0.99  bench_ufunc.UFunc.time_ufunc_types('fabs')
       10.4±0.1μs      10.3±0.03μs     0.99  bench_io.Copy.time_cont_assign('complex64')
          785±2μs        779±0.7μs     0.99  bench_random.Random.time_rng('binomial 10 0.5')
          607±2μs          603±4μs     0.99  bench_lib.Nan.time_nanargmin(200000, 0.1)
        703±0.9μs        698±0.1μs     0.99  bench_ufunc.UFunc.time_ufunc_types('reciprocal')
          471±2μs        468±0.7μs     0.99  bench_random.Bounded.time_bounded('PCG64', [<class 'numpy.uint16'>, 2047])
          775±1μs        770±0.9μs     0.99  bench_indexing.Indexing.time_op('indexes_', 'np.ix_(I, I)', '')
          436±2μs        433±0.2μs     0.99  bench_core.CountNonzero.time_count_nonzero_axis(2, 10000, <class 'object'>)
        513±0.7μs        510±0.4μs     0.99  bench_random.Bounded.time_bounded('MT19937', [<class 'numpy.uint16'>, 95])
      1.52±0.01ms         1.51±0ms     0.99  bench_random.Bounded.time_bounded('numpy', [<class 'numpy.uint64'>, 95])
        423±0.9μs        420±0.5μs     0.99  bench_random.Bounded.time_bounded('SFC64', [<class 'numpy.uint16'>, 2047])
        381±0.2μs        379±0.2μs     0.99  bench_ufunc.UFunc.time_ufunc_types('radians')
        307±0.6μs        305±0.9μs     0.99  bench_random.Randint_dtype.time_randint_fast('uint8')
        560±0.7μs          556±1μs     0.99  bench_random.Bounded.time_bounded('PCG64', [<class 'numpy.uint64'>, 1535])
          493±7μs          489±4μs     0.99  bench_core.Indices.time_indices
      2.94±0.01μs      2.92±0.01μs     0.99  bench_core.CountNonzero.time_count_nonzero(2, 10000, <class 'bool'>)
          556±1μs        552±0.9μs     0.99  bench_random.Bounded.time_bounded('MT19937', [<class 'numpy.uint16'>, 1024])
      12.7±0.03μs       12.6±0.1μs     0.99  bench_ma.Indexing.time_1d(False, 1, 10)
       10.6±0.2μs       10.5±0.2μs     0.99  bench_ufunc.CustomScalar.time_less_than_scalar2(<class 'numpy.float64'>)
        224±0.9μs        222±0.6μs     0.99  bench_core.CountNonzero.time_count_nonzero_axis(1, 10000, <class 'object'>)
       94.1±0.3μs         93.3±1μs     0.99  bench_ufunc.CustomInplace.time_int_or_temp
       26.9±0.2ms       26.7±0.1ms     0.99  bench_core.CountNonzero.time_count_nonzero(2, 1000000, <class 'str'>)
      1.51±0.01ms         1.49±0ms     0.99  bench_random.Bounded.time_bounded('numpy', [<class 'numpy.uint32'>, 1535])
       14.1±0.1μs       14.0±0.3μs     0.99  bench_ma.Indexing.time_1d(True, 2, 100)
          944±9μs         936±10μs     0.99  bench_reduce.AddReduceSeparate.time_reduce(0, 'int32')
        256±0.8μs        254±0.4μs     0.99  bench_random.Permutation.time_permutation_int
          895±6μs         888±10μs     0.99  bench_lib.Nan.time_nanmean(200000, 2.0)
        111±0.9ms        110±0.6ms     0.99  bench_records.Records.time_fromarrays_formats_as_string
          168±1μs          166±4μs     0.99  bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 10000, <class 'str'>)
      7.53±0.06ms      7.47±0.03ms     0.99  bench_linalg.Linalg.time_op('pinv', 'float32')
      8.25±0.02ms      8.18±0.01ms     0.99  bench_ufunc.UFunc.time_ufunc_types('exp')
      2.92±0.06ms      2.89±0.05ms     0.99  bench_lib.Nan.time_nancumsum(200000, 0.1)
         1.48±0ms         1.47±0ms     0.99  bench_function_base.Sort.time_argsort('heap', 'float64', ('sorted_block', 1000))
        166±0.5μs        164±0.3μs     0.99  bench_core.CountNonzero.time_count_nonzero_axis(1, 10000, <class 'str'>)
          168±2μs          167±1μs     0.99  bench_function_base.Sort.time_argsort('merge', 'int16', ('sorted_block', 1000))
          457±2μs          453±1μs     0.99  bench_random.Permutation.time_permutation_2d
      2.84±0.04ms      2.81±0.05ms     0.99  bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 1000000, <class 'bool'>)
        470±500μs        466±500μs     0.99  bench_shape_base.Block.time_3d(10, 'block')
        585±0.3μs          580±2μs     0.99  bench_random.Bounded.time_bounded('PCG64', [<class 'numpy.uint32'>, 2047])
       11.8±0.1μs      11.7±0.09μs     0.99  bench_avx.AVX_UFunc.time_ufunc('absolute', 2, 'd')
         1.58±0ms         1.57±0ms     0.99  bench_function_base.Sort.time_argsort('heap', 'int16', ('random',))
      3.71±0.01μs      3.68±0.01μs     0.99  bench_core.CountNonzero.time_count_nonzero(3, 10000, <class 'bool'>)
          787±3μs          780±2μs     0.99  bench_indexing.Indexing.time_op('indexes_rand_', 'np.ix_(I, I)', '')
        315±0.6μs        313±0.6μs     0.99  bench_core.CountNonzero.time_count_nonzero_axis(2, 10000, <class 'str'>)
          857±4μs          850±5μs     0.99  bench_lib.Nan.time_nanprod(200000, 2.0)
         61.3±1μs       60.7±0.2μs     0.99  bench_function_base.Sort.time_sort('merge', 'int16', ('sorted_block', 10))
      1.26±0.01ms      1.24±0.01ms     0.99  bench_reduce.AddReduceSeparate.time_reduce(1, 'complex128')
          562±2μs        556±0.8μs     0.99  bench_random.Bounded.time_bounded('Philox', [<class 'numpy.uint16'>, 1024])
      7.54±0.06ms      7.47±0.01ms     0.99  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 1000000, <class 'int'>)
       28.3±0.2μs      28.0±0.08μs     0.99  bench_function_base.Where.time_1
        223±0.4μs        221±0.3μs     0.99  bench_lib.Nan.time_nanmax(200000, 0)
        520±0.9μs          515±1μs     0.99  bench_random.Bounded.time_bounded('Philox', [<class 'numpy.uint16'>, 95])
      1.42±0.01ms      1.41±0.01ms     0.99  bench_lib.Pad.time_pad((256, 128, 1), (0, 32), 'constant')
        307±0.5μs        304±0.3μs     0.99  bench_random.Bounded.time_bounded('numpy', [<class 'numpy.uint8'>, 127])
         1.02±0ms      1.01±0.01ms     0.99  bench_linalg.Eindot.time_matmul_trans_at_a
      3.93±0.01ms      3.89±0.02ms     0.99  bench_lib.Nan.time_nanmedian(200000, 0.1)
      14.0±0.09μs       13.9±0.2μs     0.99  bench_ma.Indexing.time_1d(True, 2, 10)
        704±0.5μs          697±2μs     0.99  bench_random.RNG.time_raw('numpy')
          694±1μs          687±2μs     0.99  bench_random.Bounded.time_bounded('numpy', [<class 'numpy.uint64'>, 2047])
      6.62±0.03ms      6.55±0.02ms     0.99  bench_linalg.Lstsq.time_numpy_linalg_lstsq_a__b_float64
          666±1μs          659±5μs     0.99  bench_random.RNG.time_32bit('MT19937')
         218±10μs        216±0.6μs     0.99  bench_linalg.Eindot.time_inner_trans_a_a
        192±0.8μs        190±0.3μs     0.99  bench_random.Randint_dtype.time_randint_slow('bool')
       4.11±0.7ms       4.06±0.7ms     0.99  bench_lib.Nan.time_nanpercentile(200000, 90.0)
        343±0.6μs          339±1μs     0.99  bench_lib.Pad.time_pad((4, 4, 4, 4), 8, 'constant')
      7.76±0.06μs      7.68±0.03μs     0.99  bench_avx.AVX_UFunc.time_ufunc('sqrt', 2, 'f')
       28.5±0.2μs       28.2±0.1μs     0.99  bench_function_base.Where.time_2_broadcast
          383±1μs        379±0.2μs     0.99  bench_ufunc.UFunc.time_ufunc_types('deg2rad')
          955±4μs          945±2μs     0.99  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 100)
      1.59±0.03ms      1.58±0.03ms     0.99  bench_lib.Nan.time_nanmean(200000, 90.0)
        226±0.5μs          223±2μs     0.99  bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 10000, <class 'object'>)
       70.4±0.8μs         69.6±1μs     0.99  bench_ma.Concatenate.time_it('unmasked', 100)
      13.8±0.06μs       13.7±0.1μs     0.99  bench_ma.Indexing.time_0d(True, 1, 100)
          695±4μs        687±0.9μs     0.99  bench_random.Bounded.time_bounded('numpy', [<class 'numpy.uint32'>, 2047])
         2.14±0μs      2.11±0.01μs     0.99  bench_core.CountNonzero.time_count_nonzero(1, 10000, <class 'bool'>)
       80.4±100μs       79.6±100μs     0.99  bench_shape_base.Block2D.time_block2d((512, 512), 'uint8', (2, 2))
        892±0.4μs        882±0.6μs     0.99  bench_function_base.Sort.time_sort('quick', 'float64', ('random',))
       46.9±0.1μs       46.3±0.7μs     0.99  bench_ufunc.UFunc.time_ufunc_types('bitwise_and')
       77.6±100μs       76.7±100μs     0.99  bench_shape_base.Block2D.time_block2d((256, 256), 'uint32', (2, 2))
          696±3μs          688±5μs     0.99  bench_lib.Nan.time_nanprod(200000, 0)
      25.7±0.05μs       25.4±0.1μs     0.99  bench_io.Copy.time_strided_copy('complex64')
      24.5±0.06μs      24.3±0.06μs     0.99  bench_ufunc.CustomScalar.time_divide_scalar2_inplace(<class 'numpy.float64'>)
          398±5ms        394±0.8ms     0.99  bench_app.LaplaceInplace.time_it('normal')
         2.48±0μs      2.45±0.03μs     0.99  bench_core.CountNonzero.time_count_nonzero(1, 100, <class 'str'>)
        943±0.9μs        932±0.4μs     0.99  bench_function_base.Sort.time_sort('merge', 'int64', ('random',))
       12.0±0.1μs      11.9±0.09μs     0.99  bench_avx.AVX_UFunc.time_ufunc('floor', 2, 'd')
         769±10μs         759±20μs     0.99  bench_linalg.Eindot.time_dot_trans_atc_a
          547±5μs          540±3μs     0.99  bench_random.Bounded.time_bounded('SFC64', [<class 'numpy.uint64'>, 2047])
       67.6±0.1μs       66.8±0.8μs     0.99  bench_function_base.Sort.time_sort('merge', 'int16', ('reversed',))
          312±4μs          308±5μs     0.99  bench_lib.Pad.time_pad((256, 128, 1), 1, 'reflect')
      2.66±0.05ms      2.62±0.08ms     0.99  bench_function_base.Histogram1D.time_fine_binning
       46.0±0.5μs       45.4±0.2μs     0.99  bench_io.Copy.time_strided_copy('complex128')
         1.13±0ms         1.11±0ms     0.99  bench_function_base.Sort.time_sort('heap', 'int16', ('sorted_block', 1000))
      1.12±0.04ms      1.11±0.02ms     0.99  bench_lib.Nan.time_nanvar(200000, 0)
      9.03±0.09μs      8.91±0.08μs     0.99  bench_avx.AVX_UFunc.time_ufunc('sqrt', 4, 'f')
          464±2μs          458±2μs     0.99  bench_core.CountNonzero.time_count_nonzero_axis(3, 10000, <class 'str'>)
      3.13±0.02ms      3.09±0.02ms     0.99  bench_lib.Pad.time_pad((1024, 1024), 1, 'mean')
          346±9μs          341±4μs     0.99  bench_linalg.Eindot.time_matmul_trans_a_atc
       12.9±0.1μs       12.7±0.2μs     0.99  bench_ma.Indexing.time_0d(False, 1, 10)
       14.3±0.2μs       14.1±0.1μs     0.99  bench_ma.Indexing.time_1d(True, 2, 1000)
          488±2μs          481±2μs     0.99  bench_random.Bounded.time_bounded('numpy', [<class 'numpy.uint16'>, 2047])
          649±4μs         639±10μs     0.99  bench_shape_base.Block2D.time_block2d((1024, 1024), 'uint32', (4, 4))
       69.1±0.7ms       68.1±0.8ms     0.99  bench_records.Records.time_fromarrays_w_dtype
      1.18±0.01ms      1.17±0.01ms     0.99  bench_linalg.Linalg.time_op('det', 'float64')
       68.4±0.5μs       67.4±0.6μs     0.99  bench_function_base.Sort.time_sort('merge', 'int16', ('sorted_block', 1000))
      3.87±0.01μs      3.81±0.02μs     0.99  bench_core.CountNonzero.time_count_nonzero(2, 100, <class 'str'>)
        223±0.9μs        219±0.8μs     0.98  bench_indexing.Indexing.time_op('indexes_', ':,I', '=1')
      12.9±0.03μs       12.7±0.2μs     0.98  bench_ma.Indexing.time_1d(False, 2, 10)
          951±7ns          936±6ns     0.98  bench_indexing.IndexingStructured0D.time_array_all
       12.8±0.1μs       12.6±0.1μs     0.98  bench_ma.Indexing.time_1d(False, 1, 1000)
      1.26±0.01μs      1.24±0.01μs     0.98  bench_core.CountNonzero.time_count_nonzero(1, 100, <class 'bool'>)
        163±0.7μs        160±0.4μs     0.98  bench_ma.UFunc.time_2d(False, True, 1000)
       62.8±0.1μs         61.7±1μs     0.98  bench_function_base.Sort.time_sort('merge', 'int16', ('sorted_block', 100))
        616±0.9μs          606±1μs     0.98  bench_ufunc.UFunc.time_ufunc_types('frexp')
      10.5±0.09μs       10.3±0.1μs     0.98  bench_io.Copy.time_cont_assign('float64')
         1.45±0μs         1.42±0μs     0.98  bench_core.CountNonzero.time_count_nonzero(1, 100, <class 'int'>)
          536±6μs         527±10μs     0.98  bench_shape_base.Block2D.time_block2d((1024, 1024), 'uint32', (2, 2))
      12.8±0.07μs       12.5±0.1μs     0.98  bench_ma.Indexing.time_0d(False, 2, 10)
        214±0.1μs        210±0.5μs     0.98  bench_function_base.Sort.time_sort('quick', 'float64', ('uniform',))
         1.62±0ms         1.60±0ms     0.98  bench_function_base.Sort.time_argsort('heap', 'float64', ('sorted_block', 100))
        249±0.6μs        244±0.3μs     0.98  bench_function_base.Sort.time_argsort('quick', 'float64', ('reversed',))
       73.9±0.4μs       72.6±0.3μs     0.98  bench_ufunc.CustomInplace.time_int_or
         1.45±0ms      1.42±0.02ms     0.98  bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 1000000, <class 'bool'>)
      2.54±0.02ms      2.49±0.01ms     0.98  bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 1000000, <class 'int'>)
       95.9±0.5μs       94.1±0.4μs     0.98  bench_ma.UFunc.time_2d(True, False, 100)
          165±2μs          162±1μs     0.98  bench_ma.UFunc.time_2d(False, False, 1000)
      8.70±0.04μs      8.54±0.04μs     0.98  bench_io.Copy.time_strided_assign('int8')
      41.7±0.09μs      40.9±0.05μs     0.98  bench_core.PackBits.time_packbits_axis1(<class 'bool'>)
          354±1μs        348±0.4μs     0.98  bench_ufunc.UFunc.time_ufunc_types('ldexp')
      8.89±0.03μs      8.72±0.03μs     0.98  bench_io.Copy.time_strided_assign('float32')
      1.30±0.01μs         1.27±0μs     0.98  bench_core.CountNonzero.time_count_nonzero(3, 100, <class 'bool'>)
         5.51±0ms      5.40±0.01ms     0.98  bench_ufunc.UFunc.time_ufunc_types('arctan')
          942±5μs          924±4μs     0.98  bench_ufunc.UFunc.time_ufunc_types('true_divide')
      1.35±0.01ms      1.32±0.01ms     0.98  bench_lib.Pad.time_pad((1024, 1024), 1, 'edge')
        143±0.4μs        140±0.3μs     0.98  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 10)
       15.6±0.1μs       15.3±0.1μs     0.98  bench_ma.UFunc.time_scalar(True, True, 10)
       12.8±0.1μs      12.5±0.08μs     0.98  bench_ma.Indexing.time_0d(False, 1, 1000)
         85.7±1μs         84.1±1μs     0.98  bench_linalg.Linalg.time_op('norm', 'complex256')
      3.62±0.01μs      3.55±0.01μs     0.98  bench_core.PackBits.time_packbits(<class 'bool'>)
       2.79±0.1ms      2.74±0.01ms     0.98  bench_app.MaxesOfDots.time_it
      4.32±0.01ms      4.23±0.05ms     0.98  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 1000000, <class 'bool'>)
       3.88±0.5ms       3.80±0.2ms     0.98  bench_lib.Pad.time_pad((4, 4, 4, 4), (0, 32), 'mean')
          304±3μs          298±4μs     0.98  bench_lib.Pad.time_pad((256, 128, 1), 1, 'wrap')
      2.95±0.01ms      2.89±0.01ms     0.98  bench_indexing.IndexingSeparate.time_mmap_slicing
      2.74±0.04ms      2.68±0.04ms     0.98  bench_lib.Nan.time_nanmean(200000, 50.0)
          254±2μs        249±0.4μs     0.98  bench_core.CountNonzero.time_count_nonzero(2, 10000, <class 'str'>)
       47.0±0.4μs       46.0±0.6μs     0.98  bench_ufunc.UFunc.time_ufunc_types('bitwise_xor')
         1.01±0ms        992±0.8μs     0.98  bench_function_base.Sort.time_argsort('heap', 'int16', ('ordered',))
      8.45±0.02μs      8.27±0.01μs     0.98  bench_io.CopyTo.time_copyto_dense
          381±2μs          373±2μs     0.98  bench_core.CountNonzero.time_count_nonzero(3, 10000, <class 'str'>)
          514±7μs          503±3μs     0.98  bench_reduce.AddReduceSeparate.time_reduce(1, 'float64')
         1.18±0ms         1.15±0ms     0.98  bench_function_base.Sort.time_argsort('merge', 'float64', ('random',))
      5.24±0.01μs      5.12±0.01μs     0.98  bench_core.CountNonzero.time_count_nonzero(3, 100, <class 'str'>)
        942±0.7μs          920±1μs     0.98  bench_ufunc.UFunc.time_ufunc_types('divide')
       3.93±0.7ms       3.84±0.7ms     0.98  bench_lib.Pad.time_pad((256, 128, 1), (0, 32), 'edge')
        684±0.7μs          668±2μs     0.98  bench_ufunc.UFunc.time_ufunc_types('spacing')
      2.32±0.01μs      2.27±0.02μs     0.98  bench_ufunc.Custom.time_not_bool
         1.01±0ms        985±0.6μs     0.98  bench_function_base.Sort.time_sort('merge', 'float64', ('random',))
        152±0.4μs          149±1μs     0.98  bench_ufunc.UFunc.time_ufunc_types('signbit')
      2.63±0.01ms      2.57±0.02ms     0.98  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), 8, 'constant')
      2.75±0.09ms      2.69±0.09ms     0.98  bench_function_base.Histogram1D.time_full_coverage
       87.7±0.9ms       85.6±0.4ms     0.98  bench_records.Records.time_fromarrays_wo_dtype
      12.9±0.04μs      12.6±0.03μs     0.98  bench_ma.Indexing.time_1d(False, 2, 1000)
       15.2±0.4μs       14.9±0.1μs     0.98  bench_avx.AVX_UFunc.time_ufunc('absolute', 4, 'd')
        138±200μs        135±200μs     0.98  bench_shape_base.Block2D.time_block2d((512, 512), 'uint16', (2, 2))
      1.21±0.02ms      1.18±0.05ms     0.98  bench_linalg.Linalg.time_op('det', 'int16')
        190±0.5μs          186±2μs     0.97  bench_lib.Pad.time_pad((256, 128, 1), 1, 'constant')
        748±0.9μs        729±0.4μs     0.97  bench_function_base.Sort.time_sort('quick', 'float64', ('sorted_block', 100))
      3.79±0.07ms      3.70±0.02ms     0.97  bench_linalg.Linalg.time_op('det', 'complex128')
          734±9μs         716±10μs     0.97  bench_lib.Nan.time_nanmean(200000, 0.1)
      12.8±0.03μs      12.4±0.06μs     0.97  bench_ma.Indexing.time_1d(False, 1, 100)
        497±0.7μs          484±6μs     0.97  bench_core.PackBits.time_packbits_axis0(<class 'bool'>)
      1.38±0.01ms      1.34±0.01ms     0.97  bench_lib.Pad.time_pad((1024, 1024), 1, 'reflect')
      1.29±0.01μs      1.26±0.01μs     0.97  bench_core.CountNonzero.time_count_nonzero(2, 100, <class 'bool'>)
          544±2μs          530±2μs     0.97  bench_ufunc.UFunc.time_ufunc_types('logaddexp2')
      1.35±0.02ms      1.31±0.01ms     0.97  bench_lib.Pad.time_pad((1024, 1024), 1, 'wrap')
      1.19±0.01ms      1.15±0.06ms     0.97  bench_linalg.Linalg.time_op('det', 'int64')
       16.0±0.1μs       15.6±0.4μs     0.97  bench_avx.AVX_UFunc.time_ufunc('trunc', 4, 'd')
      12.9±0.07μs      12.5±0.09μs     0.97  bench_ma.Indexing.time_1d(False, 2, 100)
         3.02±0ms         2.94±0ms     0.97  bench_ufunc.UFunc.time_ufunc_types('cbrt')
       79.5±0.8μs       77.3±0.5μs     0.97  bench_ma.Concatenate.time_it('unmasked+masked', 2)
      8.89±0.04μs      8.64±0.01μs     0.97  bench_core.CorrConv.time_correlate(1000, 10, 'valid')
       13.7±0.1ms      13.4±0.08ms     0.97  bench_core.CountNonzero.time_count_nonzero(1, 1000000, <class 'str'>)
       98.8±0.6μs         96.1±2μs     0.97  bench_shape_base.Block2D.time_block2d((256, 256), 'uint16', (4, 4))
         1.39±0ms         1.35±0ms     0.97  bench_function_base.Sort.time_argsort('heap', 'int16', ('sorted_block', 10))
        761±0.6μs        739±0.3μs     0.97  bench_function_base.Sort.time_sort('quick', 'float64', ('sorted_block', 10))
         92.9±1μs       90.2±0.8μs     0.97  bench_ma.Concatenate.time_it('masked', 100)
      6.84±0.05μs      6.65±0.07μs     0.97  bench_core.Core.time_eye_100
        129±0.3μs        125±0.2μs     0.97  bench_core.CountNonzero.time_count_nonzero(1, 10000, <class 'str'>)
       15.8±0.2μs       15.4±0.3μs     0.97  bench_ma.UFunc.time_scalar(True, True, 100)
          934±4ms          907±3ms     0.97  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 100000)
      6.11±0.07μs      5.93±0.03μs     0.97  bench_shape_base.Block.time_3d(1, 'copy')
          558±3μs        542±0.7μs     0.97  bench_ufunc.UFunc.time_ufunc_types('multiply')
      11.5±0.06μs      11.2±0.03μs     0.97  bench_ufunc.CustomScalar.time_divide_scalar2_inplace(<class 'numpy.float32'>)
       87.7±0.3μs       85.2±0.2μs     0.97  bench_random.Randint_dtype.time_randint_fast('bool')
      15.8±0.08μs       15.4±0.1μs     0.97  bench_ma.UFunc.time_scalar(True, True, 1000)
      1.34±0.01ms      1.30±0.01ms     0.97  bench_lib.Pad.time_pad((1024, 1024), 1, 'constant')
       3.89±0.6ms       3.78±0.7ms     0.97  bench_lib.Pad.time_pad((256, 128, 1), (0, 32), 'reflect')
       1.91±0.2μs      1.85±0.06μs     0.97  bench_io.Copy.time_memcpy('int8')
       74.1±0.2μs       71.9±0.9μs     0.97  bench_ma.Concatenate.time_it('ndarray', 100)
        526±0.6μs        510±0.8μs     0.97  bench_ufunc.UFunc.time_ufunc_types('add')
       2.76±0.2ms       2.68±0.2ms     0.97  bench_lib.Pad.time_pad((1024, 1024), 8, 'mean')
          496±3μs          481±2μs     0.97  bench_ufunc.UFunc.time_ufunc_types('logical_and')
        358±0.2μs        347±0.3μs     0.97  bench_ufunc.UFunc.time_ufunc_types('ceil')
         1.20±0ms      1.16±0.01ms     0.97  bench_linalg.Linalg.time_op('det', 'float32')
       15.6±0.4μs      15.1±0.06μs     0.97  bench_avx.AVX_UFunc.time_ufunc('sqrt', 4, 'd')
         1.21±0ms         1.17±0ms     0.97  bench_function_base.Sort.time_argsort('heap', 'float64', ('reversed',))
       97.1±0.1μs       94.0±0.3μs     0.97  bench_function_base.Select.time_select
        129±0.4μs          125±1μs     0.97  bench_shape_base.Block2D.time_block2d((512, 512), 'uint8', (4, 4))
      2.47±0.01μs      2.39±0.01μs     0.97  bench_overrides.ArrayFunction.time_mock_concatenate_many
          113±4μs          109±4μs     0.97  bench_shape_base.Block.time_block_simple_column_wise(100)
      28.8±0.08μs      27.9±0.05μs     0.97  bench_reduce.MinMax.time_min(<class 'numpy.int64'>)
      1.32±0.08ms      1.28±0.09ms     0.97  bench_lib.Pad.time_pad((1024, 1024), (0, 32), 'edge')
         1.09±0ms         1.05±0ms     0.97  bench_function_base.Sort.time_argsort('heap', 'int16', ('reversed',))
      22.6±0.08μs      21.8±0.05μs     0.97  bench_random.Random.time_rng('normal')
         1.59±0ms         1.54±0ms     0.97  bench_function_base.Sort.time_argsort('heap', 'float64', ('sorted_block', 10))
      1.35±0.07ms       1.31±0.1ms     0.97  bench_lib.Pad.time_pad((1024, 1024), (0, 32), 'reflect')
        329±500μs        317±500μs     0.97  bench_shape_base.Block2D.time_block2d((512, 512), 'uint32', (4, 4))
      17.7±0.03μs      17.1±0.08μs     0.97  bench_core.Core.time_array_l100
       46.2±0.5μs       44.6±0.3μs     0.97  bench_linalg.Linalg.time_op('norm', 'longfloat')
       92.4±0.3ms       89.2±0.6ms     0.96  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 10000)
        358±0.5μs        346±0.3μs     0.96  bench_ufunc.UFunc.time_ufunc_types('floor')
       29.0±0.2μs       28.0±0.1μs     0.96  bench_ma.UFunc.time_1d(False, False, 10)
       96.6±0.7μs       93.2±0.3μs     0.96  bench_ma.UFunc.time_2d(True, True, 100)
          487±5μs          470±5μs     0.96  bench_lib.Pad.time_pad((256, 128, 1), 1, 'mean')
      1.35±0.07ms      1.30±0.09ms     0.96  bench_lib.Pad.time_pad((1024, 1024), (0, 32), 'wrap')
          286±3μs          276±3μs     0.96  bench_shape_base.Block.time_nested(100)
      1.28±0.08ms      1.23±0.08ms     0.96  bench_lib.Pad.time_pad((1024, 1024), 8, 'edge')
      2.18±0.02ms      2.10±0.02ms     0.96  bench_lib.Pad.time_pad((256, 128, 1), 1, 'linear_ramp')
          205±1μs        197±0.5μs     0.96  bench_function_base.Sort.time_argsort('merge', 'int16', ('sorted_block', 100))
         72.2±1μs       69.6±0.6μs     0.96  bench_shape_base.Block.time_block_simple_row_wise(100)
      3.16±0.03μs      3.05±0.04μs     0.96  bench_ma.Indexing.time_scalar(False, 1, 10)
      3.19±0.04ms      3.07±0.01ms     0.96  bench_lib.Pad.time_pad((4, 4, 4, 4), 8, 'linear_ramp')
       755±1000μs       728±1000μs     0.96  bench_shape_base.Block2D.time_block2d((512, 512), 'uint64', (4, 4))
          542±2μs        522±0.9μs     0.96  bench_ufunc.UFunc.time_ufunc_types('logaddexp')
      13.0±0.04μs       12.5±0.1μs     0.96  bench_ma.Indexing.time_0d(False, 2, 100)
         4.35±0ms      4.19±0.01ms     0.96  bench_core.CountNonzero.time_count_nonzero_axis(3, 1000000, <class 'bool'>)
      47.1±0.03μs      45.3±0.06μs     0.96  bench_ufunc.UFunc.time_ufunc_types('bitwise_or')
       29.1±0.1μs      28.0±0.05μs     0.96  bench_reduce.MinMax.time_max(<class 'numpy.int64'>)
      1.31±0.08ms       1.26±0.1ms     0.96  bench_lib.Pad.time_pad((1024, 1024), 8, 'wrap')
      6.77±0.06μs      6.50±0.07μs     0.96  bench_records.Records.time_fromstring_w_dtype
       35.0±0.1ms      33.6±0.02ms     0.96  bench_reduce.AddReduce.time_axis_0
          540±4μs          519±1μs     0.96  bench_function_base.Select.time_select_larger
          141±1μs        136±0.7μs     0.96  bench_ma.Concatenate.time_it('ndarray+masked', 100)
      1.14±0.01ms      1.09±0.01ms     0.96  bench_lib.Pad.time_pad((256, 128, 1), 8, 'constant')
       5.04±0.5ms      4.84±0.04ms     0.96  bench_lib.Pad.time_pad((4, 4, 4, 4), (0, 32), 'edge')
      1.81±0.09ms       1.74±0.2ms     0.96  bench_lib.Pad.time_pad((1024, 1024), (0, 32), 'linear_ramp')
         1.05±0ms         1.01±0ms     0.96  bench_ufunc.UFunc.time_ufunc_types('fmod')
      9.49±0.07μs      9.11±0.03μs     0.96  bench_core.CorrConv.time_correlate(1000, 10, 'full')
      31.3±0.04μs       30.0±0.5μs     0.96  bench_ufunc.UFunc.time_ufunc_types('bitwise_not')
       29.1±0.4μs       28.0±0.1μs     0.96  bench_ma.UFunc.time_1d(False, False, 100)
       42.5±0.3μs       40.7±0.3μs     0.96  bench_ma.UFunc.time_1d(True, True, 100)
      1.43±0.02μs      1.37±0.02μs     0.96  bench_ufunc.Scalar.time_add_scalar_conv_complex
       41.7±0.4μs       39.9±0.4μs     0.96  bench_linalg.Linalg.time_op('norm', 'complex64')
      2.64±0.03μs      2.53±0.05μs     0.96  bench_ma.Indexing.time_scalar(True, 1, 100)
      26.0±0.04μs      24.9±0.07μs     0.96  bench_ma.Concatenate.time_it('unmasked', 2)
      1.79±0.01ms         1.72±0ms     0.96  bench_lib.Pad.time_pad((1024, 1024), 1, 'linear_ramp')
          230±6μs        220±0.1μs     0.96  bench_function_base.Sort.time_argsort('merge', 'int16', ('sorted_block', 10))
      1.43±0.01μs      1.37±0.02μs     0.96  bench_ufunc.Scalar.time_add_scalar_conv
         1.14±0ms         1.09±0ms     0.96  bench_function_base.Sort.time_argsort('heap', 'float64', ('ordered',))
      9.31±0.07μs      8.91±0.03μs     0.96  bench_core.CorrConv.time_correlate(1000, 10, 'same')
      2.62±0.01μs      2.50±0.02μs     0.96  bench_ma.Indexing.time_scalar(True, 1, 10)
       18.7±0.2μs       17.9±0.1μs     0.96  bench_reduce.MinMax.time_min(<class 'numpy.float64'>)
        175±0.3μs        167±0.2μs     0.96  bench_function_base.Sort.time_sort('merge', 'float64', ('sorted_block', 100))
      29.7±0.08μs      28.4±0.07μs     0.96  bench_ma.UFunc.time_1d(False, False, 1000)
       51.3±0.1μs      49.0±0.07μs     0.96  bench_ma.UFunc.time_1d(False, True, 100)
       41.6±0.2μs       39.7±0.2μs     0.96  bench_ma.UFunc.time_1d(True, True, 10)
      40.1±0.07μs       38.3±0.5μs     0.96  bench_ma.Concatenate.time_it('masked', 2)
       45.7±0.2μs       43.6±0.2μs     0.96  bench_linalg.Eindot.time_dot_d_dot_b_c
      11.2±0.06μs      10.7±0.04μs     0.95  bench_core.CorrConv.time_convolve(1000, 10, 'valid')
       51.1±0.2μs       48.8±0.3μs     0.95  bench_ma.UFunc.time_1d(True, False, 10)
          237±2μs        226±0.2μs     0.95  bench_function_base.Sort.time_sort('quick', 'float64', ('reversed',))
       37.0±0.2μs       35.4±0.2μs     0.95  bench_ma.UFunc.time_2d(False, True, 100)
      2.84±0.05μs         2.71±0μs     0.95  bench_ufunc.Custom.time_and_bool
      50.9±0.08μs       48.5±0.2μs     0.95  bench_ma.UFunc.time_1d(False, True, 10)
        114±0.2μs        109±0.5μs     0.95  bench_function_base.Sort.time_argsort('merge', 'float64', ('sorted_block', 1000))
       65.5±0.2μs       62.5±0.1μs     0.95  bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 10000, <class 'int'>)
        192±300μs        183±200μs     0.95  bench_shape_base.Block2D.time_block2d((512, 512), 'uint16', (4, 4))
       51.0±0.3μs      48.6±0.09μs     0.95  bench_shape_base.Block2D.time_block2d((256, 256), 'uint16', (2, 2))
        156±0.2μs        149±0.4μs     0.95  bench_function_base.Sort.time_argsort('quick', 'float64', ('ordered',))
       31.0±0.2μs      29.5±0.07μs     0.95  bench_ufunc.UFunc.time_ufunc_types('invert')
      2.68±0.01μs      2.55±0.03μs     0.95  bench_ma.Indexing.time_scalar(True, 2, 1000)
      1.08±0.01μs         1.03±0μs     0.95  bench_overrides.ArrayFunction.time_mock_concatenate_duck
       51.7±0.1μs       49.2±0.2μs     0.95  bench_ma.UFunc.time_1d(True, False, 100)
          119±1μs          114±1μs     0.95  bench_shape_base.Block2D.time_block2d((256, 256), 'uint32', (4, 4))
          877±4ns          835±5ns     0.95  bench_overrides.ArrayFunction.time_mock_concatenate_numpy
         88.2±2ms      83.9±0.04ms     0.95  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), (0, 32), 'mean')
      8.70±0.08μs      8.27±0.04μs     0.95  bench_reduce.AnyAll.time_any_slow
       33.9±0.3μs       32.2±0.2μs     0.95  bench_shape_base.Block2D.time_block2d((128, 128), 'uint16', (2, 2))
       60.9±0.5μs       57.8±0.3μs     0.95  bench_ma.UFunc.time_scalar_1d(True, False, 1000)
       45.1±0.2μs       42.8±0.3μs     0.95  bench_ma.UFunc.time_scalar_1d(True, True, 10)
       32.4±0.2μs       30.8±0.2μs     0.95  bench_ma.UFunc.time_2d(False, True, 10)
       37.3±0.3μs       35.4±0.2μs     0.95  bench_ma.UFunc.time_2d(False, False, 100)
      45.3±0.08μs       43.0±0.2μs     0.95  bench_ma.UFunc.time_scalar_1d(True, True, 100)
          181±1μs        172±0.7μs     0.95  bench_function_base.Percentile.time_percentile
      2.66±0.03μs      2.53±0.03μs     0.95  bench_ma.Indexing.time_scalar(True, 2, 10)
       57.4±0.2μs       54.4±0.2μs     0.95  bench_ma.UFunc.time_1d(False, True, 1000)
          966±7ns          916±7ns     0.95  bench_ufunc.Scalar.time_add_scalar
       38.6±0.3μs       36.6±0.1μs     0.95  bench_shape_base.Block2D.time_block2d((128, 128), 'uint32', (2, 2))
       49.8±0.1μs       47.2±0.3μs     0.95  bench_shape_base.Block2D.time_block2d((128, 128), 'uint64', (2, 2))
          168±1μs        159±0.6μs     0.95  bench_function_base.Percentile.time_quartile
          601±1μs        568±0.5μs     0.95  bench_function_base.Sort.time_sort('quick', 'float64', ('sorted_block', 1000))
      3.15±0.01μs      2.98±0.01μs     0.95  bench_ma.Indexing.time_scalar(False, 1, 100)
      3.24±0.05μs      3.07±0.02μs     0.95  bench_ufunc.Custom.time_or_bool
      2.64±0.02μs      2.50±0.01μs     0.95  bench_ma.Indexing.time_scalar(True, 1, 1000)
       8.67±0.1μs      8.20±0.06μs     0.95  bench_reduce.AnyAll.time_all_slow
      3.18±0.03μs         3.01±0μs     0.95  bench_ma.Indexing.time_scalar(False, 1, 1000)
        101±0.5μs       95.9±0.8μs     0.95  bench_function_base.Median.time_even
       29.4±0.2μs       27.8±0.3μs     0.95  bench_ma.Concatenate.time_it('ndarray', 2)
       49.7±0.6μs       46.9±0.5μs     0.95  bench_linalg.Linalg.time_op('norm', 'complex128')
       51.3±0.9μs       48.4±0.5μs     0.94  bench_ma.UFunc.time_scalar_1d(True, True, 1000)
       92.5±0.5μs       87.4±0.6μs     0.94  bench_function_base.Median.time_even_inplace
        409±0.5μs          386±4μs     0.94  bench_ufunc.UFunc.time_ufunc_types('logical_or')
        104±0.4μs       98.4±0.2μs     0.94  bench_function_base.Sort.time_sort('merge', 'float64', ('sorted_block', 1000))
      7.27±0.03μs       6.86±0.1μs     0.94  bench_core.CorrConv.time_correlate(50, 100, 'same')
      1.35±0.09ms      1.27±0.09ms     0.94  bench_lib.Pad.time_pad((1024, 1024), 8, 'reflect')
       91.8±0.4μs       86.4±0.9μs     0.94  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 10000, <class 'int'>)
       41.5±0.3μs       39.1±0.4μs     0.94  bench_shape_base.Block2D.time_block2d((256, 256), 'uint8', (2, 2))
      11.9±0.07μs      11.2±0.03μs     0.94  bench_core.CorrConv.time_convolve(1000, 10, 'full')
       83.0±0.4μs       78.0±0.3μs     0.94  bench_shape_base.Block2D.time_block2d((128, 128), 'uint32', (4, 4))
      1.98±0.01μs      1.86±0.02μs     0.94  bench_indexing.IndexingStructured0D.time_scalar_all
      1.26±0.08ms       1.19±0.1ms     0.94  bench_lib.Pad.time_pad((1024, 1024), 8, 'constant')
       54.1±0.2μs       50.8±0.4μs     0.94  bench_ma.UFunc.time_scalar_1d(True, False, 10)
      9.30±0.02μs       8.74±0.1μs     0.94  bench_core.CorrConv.time_correlate(50, 100, 'full')
       93.7±0.4μs       88.1±0.4μs     0.94  bench_shape_base.Block2D.time_block2d((128, 128), 'uint64', (4, 4))
        650±0.9μs          610±6μs     0.94  bench_ufunc.UFunc.time_ufunc_types('heaviside')
       11.6±0.1μs       10.9±0.1μs     0.94  bench_core.CorrConv.time_convolve(1000, 10, 'same')
       48.1±0.3μs       45.2±0.2μs     0.94  bench_random.Choice.time_legacy_choice(1000.0)
       54.8±0.3μs       51.5±0.4μs     0.94  bench_ma.UFunc.time_scalar_1d(True, False, 100)
       48.0±0.3μs       45.1±0.3μs     0.94  bench_ma.UFunc.time_2d(True, True, 10)
         380±70ms        357±200ms     0.94  bench_shape_base.Block.time_3d(100, 'block')
        298±0.4μs        279±0.4μs     0.94  bench_ufunc.UFunc.time_ufunc_types('isinf')
      3.33±0.01μs      3.12±0.01μs     0.94  bench_core.CorrConv.time_correlate(50, 10, 'full')
       47.9±0.2μs       44.9±0.2μs     0.94  bench_ma.UFunc.time_1d(True, True, 1000)
        164±0.4μs          154±1μs     0.94  bench_lib.Nan.time_nanpercentile(200, 50.0)
       75.8±0.6μs       71.0±0.4μs     0.94  bench_shape_base.Block2D.time_block2d((128, 128), 'uint16', (4, 4))
       73.5±0.3μs       68.8±0.8μs     0.94  bench_ma.Concatenate.time_it('ndarray+masked', 2)
        162±0.3μs          152±1μs     0.94  bench_lib.Nan.time_nanpercentile(200, 2.0)
       71.8±0.2μs       67.3±0.3μs     0.94  bench_shape_base.Block2D.time_block2d((64, 64), 'uint32', (4, 4))
       56.3±0.6μs       52.7±0.1μs     0.94  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 10000, <class 'bool'>)
       1.79±0.2ms       1.68±0.1ms     0.94  bench_lib.Pad.time_pad((1024, 1024), 8, 'linear_ramp')
      3.18±0.03μs      2.97±0.02μs     0.94  bench_ma.Indexing.time_scalar(False, 2, 100)
      65.6±0.06μs      61.3±0.05μs     0.94  bench_core.CountNonzero.time_count_nonzero_axis(2, 10000, <class 'int'>)
          304±3μs          284±2μs     0.93  bench_function_base.Sort.time_argsort('merge', 'float64', ('sorted_block', 10))
      1.29±0.01μs         1.20±0μs     0.93  bench_ufunc.ArgParsing.time_add_arg_parsing((array(1.), array(2.), subok=True))
          191±6μs        179±0.8μs     0.93  bench_function_base.Sort.time_argsort('merge', 'int16', ('random',))
       18.9±0.1μs       17.7±0.1μs     0.93  bench_reduce.MinMax.time_max(<class 'numpy.float64'>)
         30.9±2ms       28.9±0.2ms     0.93  bench_random.Choice.time_legacy_choice(1000000.0)
      3.19±0.03μs      2.98±0.02μs     0.93  bench_ma.Indexing.time_scalar(False, 2, 1000)
       69.8±0.2μs       65.1±0.2μs     0.93  bench_shape_base.Block2D.time_block2d((32, 32), 'uint64', (4, 4))
       32.7±0.2μs       30.5±0.2μs     0.93  bench_shape_base.Block2D.time_block2d((64, 64), 'uint64', (2, 2))
       39.1±0.1μs       36.4±0.2μs     0.93  bench_ma.MA.time_masked_array_l100_t100
         1.01±0μs          940±6ns     0.93  bench_overrides.ArrayFunction.time_mock_broadcast_to_duck
       84.8±0.3μs       79.0±0.3μs     0.93  bench_shape_base.Block2D.time_block2d((256, 256), 'uint8', (4, 4))
       67.9±0.2μs       63.2±0.1μs     0.93  bench_shape_base.Block2D.time_block2d((32, 32), 'uint8', (4, 4))
         1.31±0μs         1.22±0μs     0.93  bench_ufunc.ArgParsing.time_add_arg_parsing((array(1.), array(2.), subok=True, where=True))
       68.7±0.1μs       64.0±0.2μs     0.93  bench_shape_base.Block2D.time_block2d((32, 32), 'uint16', (4, 4))
      3.16±0.02μs      2.94±0.01μs     0.93  bench_ma.Indexing.time_scalar(False, 2, 10)
       48.3±0.1μs       44.9±0.1μs     0.93  bench_core.CorrConv.time_correlate(50, 1000, 'full')
       94.5±0.2μs       88.0±0.4μs     0.93  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), 1, 'reflect')
        306±0.3μs        285±0.4μs     0.93  bench_lib.Nan.time_nanmax(200000, 2.0)
       20.9±0.2μs       19.5±0.3μs     0.93  bench_linalg.Linalg.time_op('norm', 'float64')
      1.09±0.01μs         1.01±0μs     0.93  bench_ufunc.ArgParsing.time_add_arg_parsing((array(1.), array(2.), out=(array(3.),)))
         1.11±0μs         1.04±0μs     0.93  bench_ufunc.ArgParsing.time_add_arg_parsing((array(1.), array(2.), out=array(3.)))
       46.6±0.3μs       43.3±0.1μs     0.93  bench_core.CorrConv.time_correlate(50, 1000, 'same')
      1.30±0.07ms      1.21±0.08ms     0.93  bench_lib.Pad.time_pad((1024, 1024), (0, 32), 'constant')
      71.1±0.07μs       66.0±0.3μs     0.93  bench_shape_base.Block2D.time_block2d((64, 64), 'uint16', (4, 4))
        434±0.2μs        403±0.3μs     0.93  bench_core.CorrConv.time_correlate(50, 10000, 'same')
        201±0.2μs        187±0.3μs     0.93  bench_function_base.Sort.time_argsort('merge', 'float64', ('sorted_block', 100))
       31.2±0.2μs       28.9±0.2μs     0.93  bench_ma.MA.time_masked_array_l100
       67.4±0.2μs      62.5±0.08μs     0.93  bench_shape_base.Block2D.time_block2d((16, 16), 'uint16', (4, 4))
      67.7±0.08μs       62.8±0.3μs     0.93  bench_shape_base.Block2D.time_block2d((16, 16), 'uint32', (4, 4))
      37.7±0.05μs       35.0±0.2μs     0.93  bench_core.Core.time_diagflat_l50_l50
        163±0.4μs        151±0.9μs     0.93  bench_lib.Nan.time_nanpercentile(200, 90.0)
       70.6±0.5μs       65.5±0.2μs     0.93  bench_shape_base.Block2D.time_block2d((64, 64), 'uint8', (4, 4))
      2.73±0.01μs      2.53±0.02μs     0.93  bench_ma.Indexing.time_scalar(True, 2, 100)
       76.2±0.3μs       70.6±0.1μs     0.93  bench_shape_base.Block2D.time_block2d((64, 64), 'uint64', (4, 4))
          436±2μs        404±0.3μs     0.93  bench_core.CorrConv.time_correlate(50, 10000, 'full')
         834±80μs          773±1μs     0.93  bench_lib.Pad.time_pad((4, 4, 4, 4), 8, 'mean')
      67.2±0.08μs      62.2±0.02μs     0.93  bench_shape_base.Block2D.time_block2d((16, 16), 'uint8', (4, 4))
       48.2±0.2μs       44.6±0.2μs     0.93  bench_ma.UFunc.time_2d(True, False, 10)
       32.9±0.2μs       30.4±0.2μs     0.92  bench_ma.UFunc.time_2d(False, False, 10)
       44.2±0.1μs       40.9±0.1μs     0.92  bench_core.CorrConv.time_correlate(50, 1000, 'valid')
      13.7±0.04μs       12.7±0.2μs     0.92  bench_shape_base.Block.time_block_simple_row_wise(10)
        153±0.3μs          141±1μs     0.92  bench_lib.Nan.time_nanpercentile(200, 0.1)
        432±0.5μs        399±0.8μs     0.92  bench_core.CorrConv.time_correlate(50, 10000, 'valid')
       93.2±0.2μs       86.0±0.3μs     0.92  bench_function_base.Median.time_odd
        153±0.4μs        141±0.6μs     0.92  bench_lib.Nan.time_nanpercentile(200, 0)
       69.5±0.2μs      64.1±0.08μs     0.92  bench_shape_base.Block2D.time_block2d((32, 32), 'uint32', (4, 4))
      27.5±0.09μs       25.4±0.1μs     0.92  bench_core.Core.time_diag_l100
       68.5±0.2μs       63.1±0.1μs     0.92  bench_shape_base.Block2D.time_block2d((16, 16), 'uint64', (4, 4))
        162±0.5μs          150±1μs     0.92  bench_lib.Nan.time_nanquantile(200, 2.0)
        162±0.3μs        150±0.3μs     0.92  bench_lib.Nan.time_nanquantile(200, 50.0)
       26.5±0.2μs      24.4±0.06μs     0.92  bench_linalg.Linalg.time_op('norm', 'int64')
       74.7±0.2μs       68.8±0.1μs     0.92  bench_shape_base.Block2D.time_block2d((128, 128), 'uint8', (4, 4))
       12.9±0.1μs      11.9±0.02μs     0.92  bench_reduce.MinMax.time_max(<class 'numpy.float32'>)
        108±0.6μs       99.6±0.5μs     0.92  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), 1, 'wrap')
       84.8±0.4μs       77.9±0.4μs     0.92  bench_function_base.Median.time_odd_inplace
      45.2±0.06μs       41.5±0.2μs     0.92  bench_core.CorrConv.time_convolve(50, 1000, 'full')
      6.07±0.01μs      5.58±0.08μs     0.92  bench_array_coercion.ArrayCoercionSmall.time_array(range(0, 3))
          970±3ns        891±0.8ns     0.92  bench_core.Core.time_empty_100
       30.7±0.4μs      28.2±0.09μs     0.92  bench_shape_base.Block2D.time_block2d((64, 64), 'uint32', (2, 2))
       95.6±0.1μs       87.8±0.2μs     0.92  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), 1, 'edge')
      13.1±0.02μs       12.0±0.1μs     0.92  bench_reduce.MinMax.time_min(<class 'numpy.float32'>)
         77.2±1μs        70.8±40μs     0.92  bench_random.Choice.time_choice(100000000.0)
          386±2μs        354±0.8μs     0.92  bench_core.CorrConv.time_convolve(50, 10000, 'same')
      10.8±0.04μs       9.88±0.1μs     0.92  bench_core.CorrConv.time_convolve(50, 100, 'full')
      3.11±0.03μs      2.85±0.01μs     0.92  bench_core.CorrConv.time_correlate(50, 10, 'same')
          163±1μs        150±0.8μs     0.92  bench_lib.Nan.time_nanquantile(200, 90.0)
        152±0.5μs          139±2μs     0.92  bench_lib.Nan.time_nanquantile(200, 0.1)
       64.4±0.2μs       59.0±0.6μs     0.92  bench_function_base.Median.time_odd_small
        383±0.4μs        351±0.4μs     0.92  bench_core.CorrConv.time_convolve(50, 10000, 'valid')
       35.5±0.2μs       32.5±0.1μs     0.92  bench_ma.UFunc.time_scalar_1d(False, False, 1000)
       64.6±0.5μs       59.2±0.3μs     0.92  bench_ma.UFunc.time_scalar_1d(False, True, 1000)
       27.2±0.1μs       24.9±0.2μs     0.91  bench_shape_base.Block2D.time_block2d((32, 32), 'uint16', (2, 2))
       8.52±0.1μs      7.79±0.04μs     0.91  bench_core.Core.time_identity_100
        388±0.4μs        355±0.3μs     0.91  bench_core.CorrConv.time_convolve(50, 10000, 'full')
       29.1±0.1μs       26.6±0.2μs     0.91  bench_shape_base.Block2D.time_block2d((64, 64), 'uint16', (2, 2))
      16.2±0.04μs      14.8±0.09μs     0.91  bench_random.Random.time_rng('uniform')
       80.7±0.4μs       73.7±0.3μs     0.91  bench_lib.Nan.time_nanmedian(200, 2.0)
      4.87±0.04μs      4.45±0.07μs     0.91  bench_core.CorrConv.time_correlate(50, 100, 'valid')
       26.2±0.2μs      23.9±0.08μs     0.91  bench_shape_base.Block2D.time_block2d((16, 16), 'uint16', (2, 2))
        114±0.3μs        104±0.2μs     0.91  bench_lib.Pad.time_pad((4, 4, 4, 4), 1, 'reflect')
          736±1ns          672±2ns     0.91  bench_overrides.ArrayFunction.time_mock_broadcast_to_numpy
       41.5±0.1μs       37.9±0.1μs     0.91  bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 10000, <class 'bool'>)
       21.3±0.3μs      19.5±0.07μs     0.91  bench_linalg.Linalg.time_op('norm', 'int16')
      2.80±0.01μs      2.55±0.01μs     0.91  bench_core.CorrConv.time_correlate(50, 10, 'valid')
       81.3±0.4μs       74.2±0.1μs     0.91  bench_lib.Nan.time_nanmedian(200, 50.0)
      1.17±0.01μs      1.07±0.01μs     0.91  bench_ufunc.ArgParsing.time_add_arg_parsing((array(1.), array(2.), out=array(3.), subok=True, where=True))
      4.34±0.03ms       3.96±0.1ms     0.91  bench_core.UnpackBits.time_unpackbits_axis0
       36.5±0.1μs       33.2±0.1μs     0.91  bench_core.Core.time_diagflat_l100
      6.24±0.08μs       5.69±0.1μs     0.91  bench_core.CorrConv.time_convolve(1000, 1000, 'valid')
      27.9±0.05μs       25.4±0.1μs     0.91  bench_shape_base.Block2D.time_block2d((32, 32), 'uint32', (2, 2))
          363±3μs          331±1μs     0.91  bench_function_base.Histogram1D.time_small_coverage
       43.9±0.1μs       40.0±0.2μs     0.91  bench_core.CorrConv.time_convolve(50, 1000, 'same')
       66.7±0.2μs       60.7±0.3μs     0.91  bench_shape_base.Block.time_nested(10)
        807±0.2μs          735±1μs     0.91  bench_function_base.Sort.time_sort('quick', 'int16', ('random',))
        153±0.2μs        139±0.8μs     0.91  bench_lib.Nan.time_nanquantile(200, 0)
         62.6±4μs       57.0±0.2μs     0.91  bench_ma.UFunc.time_1d(True, False, 1000)
       75.8±0.2μs       68.9±0.3μs     0.91  bench_lib.Nan.time_nanmedian(200, 0.1)
       86.8±0.3μs       79.0±0.4μs     0.91  bench_lib.Pad.time_pad((4, 4, 4, 4), 1, 'edge')
-      27.4±0.3μs         24.9±1μs     0.91  bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 10000, <class 'bool'>)
-        81.9±4μs      74.5±0.08μs     0.91  bench_lib.Nan.time_nanmedian(200, 90.0)
-     56.9±0.08μs       51.7±0.1μs     0.91  bench_core.CountNonzero.time_count_nonzero_axis(3, 10000, <class 'bool'>)
-         214±1μs          194±1μs     0.91  bench_lib.Pad.time_pad((4, 4, 4, 4), 1, 'mean')
-      97.6±0.8μs       88.7±0.9μs     0.91  bench_lib.Pad.time_pad((4, 4, 4, 4), 1, 'wrap')
-      65.4±0.3μs       59.4±0.4μs     0.91  bench_function_base.Median.time_even_small
-      76.1±0.4μs       69.1±0.2μs     0.91  bench_lib.Nan.time_nanmedian(200, 0)
-      41.9±0.1μs       38.0±0.2μs     0.91  bench_core.CorrConv.time_convolve(50, 1000, 'valid')
-      34.8±0.3μs       31.6±0.1μs     0.91  bench_ma.UFunc.time_scalar_1d(False, False, 100)
-      55.7±0.3μs       50.6±0.2μs     0.91  bench_shape_base.Block.time_3d(1, 'block')
-        985±10ns          893±2ns     0.91  bench_core.Core.time_zeros_100
-      26.7±0.1μs      24.2±0.08μs     0.91  bench_shape_base.Block2D.time_block2d((16, 16), 'uint32', (2, 2))
-      72.9±0.8μs       66.1±0.7μs     0.91  bench_random.Choice.time_choice(1000000.0)
-      63.2±0.2μs      57.3±0.05μs     0.91  bench_shape_base.Block.time_nested(1)
-       103±0.6μs         92.9±1μs     0.91  bench_lib.Nan.time_nanvar(200, 0.1)
-     1.28±0.01μs         1.16±0μs     0.91  bench_ufunc.ArgParsing.time_add_arg_parsing((array(1.), array(2.)))
-     27.1±0.09μs       24.5±0.1μs     0.91  bench_shape_base.Block2D.time_block2d((16, 16), 'uint64', (2, 2))
-      27.1±0.2μs       24.5±0.1μs     0.90  bench_shape_base.Block2D.time_block2d((32, 32), 'uint8', (2, 2))
-      12.4±0.2μs       11.2±0.1μs     0.90  bench_shape_base.Block.time_block_simple_row_wise(1)
-      82.9±0.1μs       74.9±0.3μs     0.90  bench_function_base.Sort.time_argsort('heap', 'int16', ('uniform',))
-     40.0±0.09μs       36.1±0.1μs     0.90  bench_core.CountNonzero.time_count_nonzero_axis(1, 10000, <class 'int'>)
-     41.9±0.08μs      37.9±0.05μs     0.90  bench_core.CountNonzero.time_count_nonzero_axis(2, 10000, <class 'bool'>)
-         110±1μs       99.1±0.3μs     0.90  bench_lib.Nan.time_nanstd(200, 0)
        723±100μs          653±1μs    ~0.90  bench_lib.Pad.time_pad((4, 4, 4, 4), 8, 'edge')
-      28.7±0.1μs       25.9±0.2μs     0.90  bench_shape_base.Block2D.time_block2d((64, 64), 'uint8', (2, 2))
-      109±0.07μs      98.2±0.09μs     0.90  bench_linalg.Linalg.time_op('norm', 'float16')
-      26.4±0.1μs       23.8±0.1μs     0.90  bench_shape_base.Block2D.time_block2d((16, 16), 'uint8', (2, 2))
-      31.2±0.3μs       28.1±0.2μs     0.90  bench_shape_base.Block2D.time_block2d((128, 128), 'uint8', (2, 2))
-       149±0.2μs        135±0.4μs     0.90  bench_function_base.Sort.time_sort('quick', 'float64', ('ordered',))
-      29.2±0.1μs       26.2±0.2μs     0.90  bench_shape_base.Block2D.time_block2d((32, 32), 'uint64', (2, 2))
-      34.9±0.2μs       31.4±0.2μs     0.90  bench_ma.UFunc.time_scalar_1d(False, False, 10)
-       216±0.7μs          194±1μs     0.90  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), 1, 'mean')
        161±0.2μs          145±3μs    ~0.90  bench_function_base.Sort.time_sort('quick', 'int16', ('reversed',))
-      54.6±0.2μs       49.1±0.2μs     0.90  bench_shape_base.Block.time_block_complicated(10)
-       110±0.6μs       98.9±0.4μs     0.90  bench_lib.Nan.time_nanstd(200, 2.0)
-         111±1μs        100±0.1μs     0.90  bench_lib.Nan.time_nanstd(200, 50.0)
-     6.32±0.02μs      5.67±0.05μs     0.90  bench_array_coercion.ArrayCoercionSmall.time_array_dtype_not_kwargs(range(0, 3))
-      17.8±0.3μs       16.0±0.2μs     0.90  bench_ma.UFunc.time_scalar(False, False, 1000)
-      49.2±0.1μs       44.2±0.3μs     0.90  bench_shape_base.Block.time_block_complicated(1)
-       672±0.4μs          603±1μs     0.90  bench_function_base.Sort.time_sort('quick', 'int16', ('sorted_block', 100))
-      20.7±0.2μs       18.6±0.1μs     0.90  bench_shape_base.Block.time_block_simple_column_wise(1)
-     3.09±0.06μs      2.77±0.01μs     0.90  bench_core.CorrConv.time_correlate(1000, 1000, 'valid')
-       249±0.5μs        223±0.4μs     0.90  bench_function_base.Sort.time_sort('merge', 'int64', ('sorted_block', 10))
-      40.6±0.7μs       36.3±0.7μs     0.90  bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 10000, <class 'int'>)
-         111±1μs       99.4±0.5μs     0.89  bench_lib.Nan.time_nanstd(200, 90.0)
-       111±0.6μs       99.0±0.5μs     0.89  bench_lib.Nan.time_nanstd(200, 0.1)
-      57.3±0.8μs       51.2±0.2μs     0.89  bench_ma.UFunc.time_scalar_1d(False, True, 100)
-         340±2ms          304±8ms     0.89  bench_core.CorrConv.time_correlate(100000, 10000, 'same')
-      56.9±0.5μs       50.9±0.2μs     0.89  bench_ma.UFunc.time_scalar_1d(False, True, 10)
-     6.32±0.03μs      5.64±0.03μs     0.89  bench_array_coercion.ArrayCoercionSmall.time_array_no_copy(range(0, 3))
-     15.7±0.04ms      14.0±0.01ms     0.89  bench_reduce.AddReduceSeparate.time_reduce(0, 'float16')
-     9.07±0.04μs       8.10±0.1μs     0.89  bench_core.CorrConv.time_convolve(50, 100, 'same')
-      58.2±0.5μs       52.0±0.1μs     0.89  bench_random.Choice.time_choice(1000.0)
-      22.7±0.2μs       20.3±0.2μs     0.89  bench_shape_base.Block.time_block_simple_column_wise(10)
-      71.5±0.2μs       63.8±0.4μs     0.89  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), 1, 'constant')
-       692±0.5μs        616±0.3μs     0.89  bench_function_base.Sort.time_sort('quick', 'int16', ('sorted_block', 10))
-      21.3±0.4μs      19.0±0.04μs     0.89  bench_linalg.Linalg.time_op('norm', 'int32')
-     35.7±0.05μs      31.8±0.06μs     0.89  bench_function_base.Sort.time_argsort('merge', 'float64', ('reversed',))
-      12.2±0.1μs      10.9±0.05μs     0.89  bench_ma.MA.time_masked_array
      1.98±0.08μs      1.76±0.09μs    ~0.89  bench_io.Copy.time_cont_assign('int8')
-       105±0.5μs       93.6±0.7μs     0.89  bench_lib.Nan.time_nanvar(200, 50.0)
-     3.70±0.01μs      3.28±0.01μs     0.89  bench_reduce.AnyAll.time_any_fast
-      14.7±0.2μs       13.0±0.1μs     0.89  bench_lib.Nan.time_nancumsum(200, 50.0)
-       104±0.3μs       91.5±0.5μs     0.88  bench_lib.Nan.time_nanvar(200, 2.0)
-       104±0.2μs         91.4±1μs     0.88  bench_lib.Nan.time_nanvar(200, 0)
-      15.0±0.2μs       13.2±0.2μs     0.88  bench_linalg.Linalg.time_op('norm', 'float32')
-       104±0.7μs       92.0±0.5μs     0.88  bench_lib.Nan.time_nanvar(200, 90.0)
-     1.67±0.01ms      1.47±0.02ms     0.88  bench_function_base.Sort.time_argsort('heap', 'int64', ('random',))
-         520±1μs        458±0.2μs     0.88  bench_function_base.Sort.time_sort('quick', 'int16', ('sorted_block', 1000))
-     8.86±0.02μs      7.80±0.02μs     0.88  bench_avx.AVX_UFunc.time_ufunc('ceil', 1, 'd')
-       211±0.3μs        186±0.2μs     0.88  bench_function_base.Sort.time_argsort('quick', 'int64', ('reversed',))
-         774±2ns          681±4ns     0.88  bench_array_coercion.ArrayCoercionSmall.time_array_dtype_not_kwargs(1)
-      67.3±0.2μs       59.2±0.2μs     0.88  bench_lib.Pad.time_pad((4, 4, 4, 4), 1, 'constant')
-     6.76±0.05μs      5.94±0.09μs     0.88  bench_core.CorrConv.time_convolve(50, 100, 'valid')
-      15.0±0.2μs      13.2±0.09μs     0.88  bench_lib.Nan.time_nancumprod(200, 50.0)
-     1.27±0.01μs      1.11±0.03μs     0.88  bench_overrides.ArrayFunction.time_mock_concatenate_mixed
-      12.4±0.1μs      10.9±0.07μs     0.88  bench_lib.Nan.time_nanmax(200, 90.0)
-     17.9±0.07μs       15.7±0.1μs     0.88  bench_ma.UFunc.time_scalar(False, False, 10)
-      12.2±0.3μs       10.7±0.1μs     0.88  bench_lib.Nan.time_nanmax(200, 2.0)
-      14.5±0.1μs       12.7±0.1μs     0.88  bench_lib.Nan.time_nancumsum(200, 90.0)
-     3.73±0.04μs      3.26±0.01μs     0.87  bench_reduce.AnyAll.time_all_fast
-     2.47±0.02μs      2.16±0.01μs     0.87  bench_ufunc.ArgParsingReduce.time_add_reduce_arg_parsing((array([0., 1.]), 0, None, array(0.)))
-      12.2±0.1μs      10.7±0.07μs     0.87  bench_lib.Nan.time_nanmax(200, 50.0)
-      14.5±0.3μs      12.7±0.08μs     0.87  bench_lib.Nan.time_nancumprod(200, 90.0)
-     14.2±0.05μs      12.4±0.06μs     0.87  bench_lib.Nan.time_nancumsum(200, 2.0)
-      14.1±0.2μs       12.3±0.1μs     0.87  bench_lib.Nan.time_nancumprod(200, 0.1)
-     12.4±0.08μs      10.8±0.02μs     0.87  bench_lib.Nan.time_nanmin(200, 90.0)
-         351±8ms          305±2ms     0.87  bench_core.CorrConv.time_correlate(100000, 10000, 'full')
-      28.4±0.1μs       24.8±0.1μs     0.87  bench_lib.Nan.time_nanargmax(200, 50.0)
-      12.2±0.2μs      10.6±0.05μs     0.87  bench_lib.Nan.time_nanmin(200, 0.1)
-        1.21±0μs         1.05±0μs     0.87  bench_ufunc.ArgParsing.time_add_arg_parsing((array(1.), array(2.), array(3.), subok=True, where=True))
-         851±4ns          741±4ns     0.87  bench_array_coercion.ArrayCoercionSmall.time_array_dtype_not_kwargs(5)
-     6.66±0.04μs      5.79±0.06μs     0.87  bench_array_coercion.ArrayCoercionSmall.time_asanyarray_dtype(range(0, 3))
-     18.0±0.05μs      15.6±0.06μs     0.87  bench_ma.UFunc.time_scalar(False, False, 100)
-     26.7±0.04μs       23.2±0.2μs     0.87  bench_core.CountNonzero.time_count_nonzero_axis(1, 10000, <class 'bool'>)
-     5.36±0.02μs      4.66±0.03μs     0.87  bench_core.CorrConv.time_convolve(50, 10, 'same')
-     5.52±0.03μs      4.79±0.02μs     0.87  bench_core.CorrConv.time_convolve(50, 10, 'full')
-     1.15±0.01μs          995±2ns     0.87  bench_ufunc.ArgParsing.time_add_arg_parsing((array(1.), array(2.), array(3.)))
-      12.3±0.1μs       10.6±0.1μs     0.87  bench_lib.Nan.time_nanmin(200, 50.0)
-      12.1±0.1μs      10.5±0.08μs     0.87  bench_lib.Nan.time_nanmax(200, 0.1)
-      12.1±0.2μs      10.5±0.08μs     0.87  bench_lib.Nan.time_nanmax(200, 0)
-     6.61±0.03μs      5.73±0.02μs     0.87  bench_array_coercion.ArrayCoercionSmall.time_asarray_dtype(range(0, 3))
-     2.52±0.03μs      2.18±0.02μs     0.87  bench_ufunc.ArgParsingReduce.time_add_reduce_arg_parsing((array([0., 1.]), 0))
-     6.63±0.05μs       5.73±0.1μs     0.87  bench_array_coercion.ArrayCoercionSmall.time_array_subok(range(0, 3))
-      14.4±0.1μs      12.4±0.06μs     0.86  bench_lib.Nan.time_nancumprod(200, 2.0)
-     3.34±0.05μs      2.89±0.07μs     0.86  bench_core.Core.time_array_l
-      1.18±0.2ms         1.02±0ms     0.86  bench_lib.Pad.time_pad((4, 4, 4, 4), 8, 'reflect')
-     14.1±0.05μs      12.2±0.05μs     0.86  bench_lib.Nan.time_nancumsum(200, 0.1)
-         892±4ns          769±1ns     0.86  bench_array_coercion.ArrayCoercionSmall.time_array(array([5]))
-     2.57±0.05μs         2.21±0μs     0.86  bench_ufunc.ArgParsingReduce.time_add_reduce_arg_parsing((array([0., 1.]), 0, None))
-     2.40±0.01ms      2.06±0.02ms     0.86  bench_lib.Pad.time_pad((256, 128, 1), (0, 32), 'mean')
-     1.48±0.01ms         1.27±0ms     0.86  bench_function_base.Sort.time_argsort('heap', 'int64', ('sorted_block', 100))
-     9.30±0.06μs      8.00±0.04μs     0.86  bench_core.Core.time_dstack_l
-     2.73±0.05μs      2.35±0.01μs     0.86  bench_ufunc.ArgParsingReduce.time_add_reduce_arg_parsing((array([0., 1.]), axis=0, dtype=None))
-         937±5μs          805±3μs     0.86  bench_lib.Pad.time_pad((1, 1, 1, 1, 1), 1, 'linear_ramp')
-      17.4±0.1μs       14.9±0.1μs     0.86  bench_lib.Nan.time_nansum(200, 0.1)
-     12.4±0.07μs      10.7±0.03μs     0.86  bench_lib.Nan.time_nanmin(200, 2.0)
-     2.61±0.03μs         2.24±0μs     0.86  bench_ufunc.ArgParsingReduce.time_add_reduce_arg_parsing((array([0., 1.]), axis=0))
-      28.8±0.1μs       24.7±0.2μs     0.86  bench_lib.Nan.time_nanargmin(200, 50.0)
-     14.3±0.09μs      12.3±0.08μs     0.86  bench_lib.Nan.time_nancumsum(200, 0)
-     25.3±0.07μs       21.7±0.1μs     0.86  bench_ma.UFunc.time_scalar(True, False, 100)
-      18.1±0.2μs       15.5±0.1μs     0.86  bench_lib.Nan.time_nansum(200, 50.0)
-     25.5±0.07μs       21.8±0.1μs     0.86  bench_ma.UFunc.time_scalar(False, True, 100)
      24.6±0.03ms       21.0±0.2ms    ~0.85  bench_core.CorrConv.time_correlate(100000, 1000, 'full')
-      27.8±0.2μs       23.8±0.1μs     0.85  bench_lib.Nan.time_nanargmax(200, 0.1)
       24.5±0.2ms       20.9±0.2ms    ~0.85  bench_core.CorrConv.time_correlate(100000, 1000, 'same')
      24.3±0.02ms       20.7±0.2ms    ~0.85  bench_core.CorrConv.time_correlate(100000, 1000, 'valid')
-     8.30±0.04μs      7.07±0.04μs     0.85  bench_core.Core.time_vstack_l
-     5.69±0.04μs      4.85±0.02μs     0.85  bench_reduce.SmallReduction.time_small
-      28.0±0.1μs       23.8±0.2μs     0.85  bench_lib.Nan.time_nanargmax(200, 0)
-      28.2±0.2μs      24.0±0.04μs     0.85  bench_lib.Nan.time_nanargmin(200, 0.1)
-     6.74±0.01μs      5.74±0.07μs     0.85  bench_array_coercion.ArrayCoercionSmall.time_asarray(range(0, 3))
-      12.3±0.7μs      10.5±0.04μs     0.85  bench_lib.Nan.time_nanmin(200, 0)
-      17.4±0.1μs      14.8±0.08μs     0.85  bench_lib.Nan.time_nansum(200, 0)
-         775±4μs          658±3μs     0.85  bench_lib.Pad.time_pad((4, 4, 4, 4), 1, 'linear_ramp')
-        1.35±0ms         1.15±0ms     0.85  bench_function_base.Sort.time_argsort('heap', 'int64', ('sorted_block', 1000))
-      25.4±0.1μs       21.6±0.1μs     0.85  bench_ma.UFunc.time_scalar(False, True, 10)
-     5.02±0.02μs      4.27±0.03μs     0.85  bench_core.CorrConv.time_convolve(50, 10, 'valid')
-      28.5±0.2μs      24.2±0.07μs     0.85  bench_lib.Nan.time_nanargmax(200, 90.0)
-      25.7±0.6μs      21.8±0.07μs     0.85  bench_ma.UFunc.time_scalar(True, False, 1000)
-      28.7±0.1μs       24.3±0.1μs     0.85  bench_lib.Nan.time_nanargmin(200, 90.0)
-      28.3±0.1μs       24.0±0.2μs     0.85  bench_lib.Nan.time_nanargmax(200, 2.0)
-     2.45±0.02μs         2.07±0μs     0.85  bench_ufunc.ArgParsingReduce.time_add_reduce_arg_parsing((array([0., 1.])))
-      17.8±0.1μs      15.0±0.07μs     0.84  bench_lib.Nan.time_nanprod(200, 90.0)
-      17.6±0.2μs       14.8±0.2μs     0.84  bench_lib.Nan.time_nansum(200, 2.0)
-     17.7±0.06μs      14.9±0.09μs     0.84  bench_lib.Nan.time_nanprod(200, 2.0)
-      25.9±0.3μs      21.8±0.09μs     0.84  bench_ma.UFunc.time_scalar(False, True, 1000)
-     14.4±0.07μs      12.1±0.05μs     0.84  bench_lib.Nan.time_nancumprod(200, 0)
-     1.36±0.03μs      1.14±0.01μs     0.84  bench_core.Core.time_array_l1
-     1.41±0.01ms         1.18±0ms     0.84  bench_function_base.Sort.time_argsort('heap', 'int64', ('sorted_block', 10))
      6.79±0.05μs      5.71±0.07μs    ~0.84  bench_array_coercion.ArrayCoercionSmall.time_asanyarray(range(0, 3))
-      16.8±0.2μs       14.1±0.5μs     0.84  bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 100, <class 'str'>)
-         769±5ns          645±5ns     0.84  bench_array_coercion.ArrayCoercionSmall.time_array(5)
-     1.20±0.02μs      1.01±0.02μs     0.84  bench_core.Core.time_array_empty
-      25.8±0.4μs      21.6±0.07μs     0.84  bench_ma.UFunc.time_scalar(True, False, 10)
-      18.2±0.1μs      15.3±0.05μs     0.84  bench_lib.Nan.time_nanprod(200, 50.0)
-      28.4±0.2μs       23.8±0.2μs     0.84  bench_lib.Nan.time_nanargmin(200, 2.0)
-     1.10±0.01μs          924±4ns     0.84  bench_core.Core.time_arange_100
-      50.6±0.2μs       42.2±0.1μs     0.83  bench_lib.Nan.time_nanmean(200, 0.1)
-      17.6±0.3μs      14.7±0.08μs     0.83  bench_lib.Nan.time_nanprod(200, 0)
-      17.5±0.2μs       14.5±0.1μs     0.83  bench_lib.Nan.time_nanprod(200, 0.1)
-      18.7±0.2μs      15.5±0.09μs     0.83  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 100, <class 'str'>)
-     18.0±0.09μs       15.0±0.3μs     0.83  bench_lib.Nan.time_nansum(200, 90.0)
-     7.27±0.02μs      6.03±0.02μs     0.83  bench_array_coercion.ArrayCoercionSmall.time_array_all_kwargs(range(0, 3))
-         699±2ns          579±5ns     0.83  bench_array_coercion.ArrayCoercionSmall.time_array(1)
-     2.72±0.01μs      2.26±0.02μs     0.83  bench_ufunc.ArgParsingReduce.time_add_reduce_arg_parsing((array([0., 1.]), axis=0, dtype=None, out=array(0.)))
-      28.4±0.1μs       23.5±0.1μs     0.83  bench_lib.Nan.time_nanargmin(200, 0)
-     1.02±0.01μs        845±0.7ns     0.83  bench_array_coercion.ArrayCoercionSmall.time_array_dtype_not_kwargs(array([5]))
-      50.8±0.2μs      41.9±0.09μs     0.83  bench_lib.Nan.time_nanmean(200, 2.0)
-     7.00±0.07μs      5.79±0.06μs     0.83  bench_core.Core.time_hstack_l
-         469±3ns          387±1ns     0.83  bench_array_coercion.ArrayCoercionSmall.time_array_no_copy(array([5]))
-      50.8±0.4μs       41.9±0.2μs     0.82  bench_lib.Nan.time_nanmean(200, 0)
-     3.38±0.02μs      2.78±0.01μs     0.82  bench_core.Core.time_ones_100
-      34.8±0.1μs       28.7±0.1μs     0.82  bench_function_base.Sort.time_sort('merge', 'float64', ('ordered',))
-      34.9±0.1μs      28.7±0.08μs     0.82  bench_function_base.Sort.time_sort('merge', 'float64', ('uniform',))
-      51.2±0.5μs       42.1±0.3μs     0.82  bench_lib.Nan.time_nanmean(200, 90.0)
-      74.8±0.1μs       61.4±0.2μs     0.82  bench_function_base.Sort.time_argsort('heap', 'int64', ('uniform',))
-      51.6±0.3μs       42.4±0.1μs     0.82  bench_lib.Nan.time_nanmean(200, 50.0)
-     18.5±0.09μs       15.1±0.2μs     0.82  bench_core.CountNonzero.time_count_nonzero_axis(3, 100, <class 'str'>)
-      85.4±0.2μs         69.7±2μs     0.82  bench_function_base.Sort.time_sort('heap', 'float64', ('uniform',))
-      16.6±0.2μs      13.5±0.05μs     0.82  bench_core.CountNonzero.time_count_nonzero_axis(2, 100, <class 'str'>)
-     1.56±0.01μs      1.27±0.01μs     0.81  bench_array_coercion.ArrayCoercionSmall.time_array([1])
-      30.4±0.1μs       24.7±0.2μs     0.81  bench_function_base.Sort.time_sort('merge', 'float64', ('reversed',))
-     7.01±0.07ms      5.70±0.02ms     0.81  bench_indexing.IndexingSeparate.time_mmap_fancy_indexing
-      106±0.05μs       86.3±0.2μs     0.81  bench_function_base.Sort.time_sort('quick', 'int16', ('ordered',))
-      14.2±0.1μs      11.5±0.06μs     0.81  bench_core.CountNonzero.time_count_nonzero_axis(1, 100, <class 'str'>)
-     1.70±0.01μs      1.38±0.01μs     0.81  bench_array_coercion.ArrayCoercionSmall.time_array_dtype_not_kwargs([1])
-        1.09±0ms        876±0.5μs     0.81  bench_function_base.Sort.time_argsort('heap', 'int64', ('reversed',))
-      19.8±0.1μs       16.0±0.3μs     0.81  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 100, <class 'object'>)
-      19.8±0.1μs      15.9±0.06μs     0.80  bench_core.CountNonzero.time_count_nonzero_axis(3, 100, <class 'object'>)
-     17.6±0.06μs       14.0±0.1μs     0.79  bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 100, <class 'object'>)
-      15.6±0.1μs      12.4±0.03μs     0.79  bench_core.Core.time_triu_l10x10
-      15.7±0.2μs       12.4±0.1μs     0.79  bench_core.Core.time_tril_l10x10
-      14.8±0.2μs      11.7±0.07μs     0.79  bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 100, <class 'str'>)
-     7.05±0.06μs      5.56±0.02μs     0.79  bench_array_coercion.ArrayCoercionSmall.time_ascontiguousarray(range(0, 3))
-     18.0±0.05μs      14.1±0.07μs     0.79  bench_core.UnpackBits.time_unpackbits_little
-        1.03±0ms          807±2μs     0.78  bench_function_base.Sort.time_argsort('heap', 'int64', ('ordered',))
-     2.70±0.02μs      2.11±0.02μs     0.78  bench_ufunc.ArgParsingReduce.time_add_reduce_arg_parsing((array([0., 1.]), out=array(0.)))
-      17.7±0.2μs      13.7±0.06μs     0.78  bench_core.CountNonzero.time_count_nonzero_axis(2, 100, <class 'object'>)
-     21.4±0.05μs       16.3±0.2μs     0.76  bench_shape_base.Block.time_no_lists(10)
-       311±0.2μs        235±0.3μs     0.76  bench_core.UnpackBits.time_unpackbits_axis1_little
-      8.90±0.8ms      6.74±0.03ms     0.76  bench_lib.Pad.time_pad((256, 128, 1), 8, 'linear_ramp')
-         560±9ns          424±3ns     0.76  bench_core.Core.time_array_1
-     14.7±0.04μs      11.1±0.03μs     0.75  bench_core.CountNonzero.time_count_nonzero_axis(1, 100, <class 'object'>)
-      15.2±0.1μs      11.3±0.02μs     0.74  bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 100, <class 'object'>)
         136±20μs          100±6μs    ~0.74  bench_shape_base.Block.time_no_lists(100)
-     13.3±0.06μs      9.61±0.04μs     0.72  bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 100, <class 'int'>)
-     11.7±0.05μs      8.42±0.05μs     0.72  bench_core.CountNonzero.time_count_nonzero_axis(3, 100, <class 'bool'>)
-     13.5±0.06μs      9.74±0.07μs     0.72  bench_core.CountNonzero.time_count_nonzero_axis(3, 100, <class 'int'>)
-      13.7±0.2μs      9.81±0.02μs     0.72  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 100, <class 'int'>)
-     11.0±0.09μs       7.88±0.2μs     0.71  bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 100, <class 'bool'>)
-      11.5±0.1μs      8.17±0.08μs     0.71  bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 100, <class 'bool'>)
-     11.8±0.05μs      8.32±0.08μs     0.71  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 100, <class 'bool'>)
-     1.90±0.01μs         1.33±0μs     0.70  bench_array_coercion.ArrayCoercionSmall.time_array_no_copy([1])
-        20.9±1μs       14.6±0.1μs     0.70  bench_shape_base.Block.time_no_lists(1)
-     12.5±0.08μs      8.73±0.05μs     0.70  bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 100, <class 'int'>)
-      10.5±0.2μs      7.28±0.05μs     0.70  bench_core.CountNonzero.time_count_nonzero_axis(1, 100, <class 'bool'>)
-     11.6±0.04μs      8.02±0.02μs     0.69  bench_core.CountNonzero.time_count_nonzero_axis(2, 100, <class 'bool'>)
-      13.4±0.2μs      9.21±0.09μs     0.69  bench_core.CountNonzero.time_count_nonzero_axis(2, 100, <class 'int'>)
-     12.2±0.05μs       8.32±0.1μs     0.68  bench_core.CountNonzero.time_count_nonzero_axis(1, 100, <class 'int'>)
-        2.13±0μs      1.44±0.01μs     0.68  bench_array_coercion.ArrayCoercionSmall.time_asarray_dtype([1])
-       151±0.1μs       102±0.07μs     0.67  bench_function_base.Sort.time_sort('quick', 'int16', ('uniform',))
-     2.12±0.01μs      1.43±0.01μs     0.67  bench_array_coercion.ArrayCoercionSmall.time_asanyarray_dtype([1])
-     1.12±0.01μs          725±2ns     0.65  bench_array_coercion.ArrayCoercionSmall.time_array_no_copy(5)
-     1.05±0.03μs          661±4ns     0.63  bench_array_coercion.ArrayCoercionSmall.time_array_no_copy(1)
-     2.13±0.02μs         1.34±0μs     0.63  bench_array_coercion.ArrayCoercionSmall.time_array_subok([1])
-     2.15±0.02μs      1.34±0.08μs     0.62  bench_array_coercion.ArrayCoercionSmall.time_asanyarray([1])
-     2.74±0.02μs         1.70±0μs     0.62  bench_array_coercion.ArrayCoercionSmall.time_array_all_kwargs([1])
-        1.33±0μs          790±2ns     0.59  bench_array_coercion.ArrayCoercionSmall.time_asarray_dtype(5)
-     1.52±0.01μs          899±4ns     0.59  bench_array_coercion.ArrayCoercionSmall.time_asanyarray_dtype(array([5]))
-     1.51±0.01μs          895±2ns     0.59  bench_array_coercion.ArrayCoercionSmall.time_asarray_dtype(array([5]))
-     1.33±0.01μs          789±6ns     0.59  bench_array_coercion.ArrayCoercionSmall.time_asanyarray_dtype(5)
-     1.51±0.01μs          887±2ns     0.59  bench_array_coercion.ArrayCoercionSmall.time_array_invalid_kwarg(range(0, 3))
-     2.14±0.02μs      1.25±0.01μs     0.59  bench_array_coercion.ArrayCoercionSmall.time_asarray([1])
-        1.51±0μs          884±4ns     0.59  bench_array_coercion.ArrayCoercionSmall.time_array_invalid_kwarg([1])
-     1.52±0.01μs          883±1ns     0.58  bench_array_coercion.ArrayCoercionSmall.time_array_invalid_kwarg(5)
-        1.25±0μs          726±5ns     0.58  bench_array_coercion.ArrayCoercionSmall.time_asanyarray_dtype(1)
-        1.26±0μs          727±7ns     0.58  bench_array_coercion.ArrayCoercionSmall.time_asarray_dtype(1)
-     1.54±0.03μs          882±3ns     0.57  bench_array_coercion.ArrayCoercionSmall.time_array_invalid_kwarg(1)
-     1.56±0.01μs          881±2ns     0.56  bench_array_coercion.ArrayCoercionSmall.time_array_invalid_kwarg(array([5]))
-     1.89±0.04μs      1.04±0.01μs     0.55  bench_array_coercion.ArrayCoercionSmall.time_array_all_kwargs(1)
-     1.93±0.01μs         1.05±0μs     0.54  bench_array_coercion.ArrayCoercionSmall.time_array_all_kwargs(5)
-         619±1ns         337±40ns     0.54  bench_array_coercion.ArrayCoercionSmall.time_asanyarray(array([5]))
         1.35±0μs        734±100ns    ~0.54  bench_array_coercion.ArrayCoercionSmall.time_array_subok(5)
-     1.56±0.03μs          842±2ns     0.54  bench_array_coercion.ArrayCoercionSmall.time_array_subok(array([5]))
-       617±0.4ns        326±0.5ns     0.53  bench_array_coercion.ArrayCoercionSmall.time_asarray(array([5]))
-         639±1ns        329±0.9ns     0.51  bench_array_coercion.ArrayCoercionSmall.time_ascontiguousarray(array([5]))
-     1.27±0.01μs        651±0.9ns     0.51  bench_array_coercion.ArrayCoercionSmall.time_array_subok(1)
-     2.45±0.01μs      1.24±0.01μs     0.51  bench_array_coercion.ArrayCoercionSmall.time_ascontiguousarray([1])
-     1.33±0.01μs          648±4ns     0.49  bench_array_coercion.ArrayCoercionSmall.time_asarray(5)
-     1.34±0.01μs          651±3ns     0.49  bench_array_coercion.ArrayCoercionSmall.time_asanyarray(5)
-        1.58±0μs          753±3ns     0.48  bench_array_coercion.ArrayCoercionSmall.time_array_all_kwargs(array([5]))
-     1.25±0.06μs          584±3ns     0.47  bench_array_coercion.ArrayCoercionSmall.time_asanyarray(1)
-     1.28±0.02μs          584±1ns     0.46  bench_array_coercion.ArrayCoercionSmall.time_asarray(1)
-     1.82±0.02μs         815±30ns     0.45  bench_array_coercion.ArrayCoercionSmall.time_ascontiguousarray(1)
-     1.90±0.01μs          842±2ns     0.44  bench_array_coercion.ArrayCoercionSmall.time_ascontiguousarray(5)
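
For readers checking the table: the third column appears to be the "after" time divided by the "before" time, so values below 1 are speedups. Here is a minimal sketch that re-derives the ratio from one row. The helper name `parse_compare_line` is hypothetical and the parsing only assumes the column layout shown above; it is not part of asv.

```python
import re

# Unit suffixes seen in the table, converted to seconds.
# Both the Greek mu (U+03BC) and the micro sign (U+00B5) are accepted.
_UNITS = {"ns": 1e-9, "\u03bcs": 1e-6, "\u00b5s": 1e-6, "ms": 1e-3, "s": 1.0}
_TIME = re.compile(r"([\d.]+)±[\d.]+(ns|[\u03bc\u00b5]s|ms|s)")


def _seconds(field):
    """Convert a field such as '648±4ns' to seconds, ignoring the ± spread."""
    match = _TIME.fullmatch(field)
    if match is None:
        raise ValueError(f"unrecognised timing field: {field!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def parse_compare_line(line):
    """Hypothetical helper: split one row into (before_s, after_s, ratio, name)."""
    # Drop the leading change marker, then split into at most four fields so
    # that benchmark names containing spaces stay in one piece.
    before, after, ratio, name = line.lstrip("-+ ").split(None, 3)
    return _seconds(before), _seconds(after), float(ratio.lstrip("~")), name


row = ("-     1.33±0.01μs          648±4ns     0.49  "
       "bench_array_coercion.ArrayCoercionSmall.time_asarray(5)")
before, after, reported, name = parse_compare_line(row)
print(f"{after / before:.2f}")  # 0.49, matching the reported ratio
```

Rows whose ratio is prefixed with `~` are the ones without a leading `-`; the sketch strips that prefix rather than interpreting it.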

It might strip off a few ns for some reason, but probably not worth the trouble.

seberg · 2019-12-12T00:23:55Z

Hehehe, Python 3.6 has METH_FASTCALL, just got to cheat a bit :).

seberg · 2019-12-12T17:20:13Z

Functions wrapped by __array_function__ probably gain hardly anything from METH_FASTCALL, but I think I can assume that unpacking the arguments twice is fast compared to the old call overhead. Calls without any kwargs could be the one case with a slight penalty, I guess.

A lot of this is test fixes, since many things that were ValueErrors should really be TypeErrors
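
As a point of comparison only (this uses Python builtins, not the NumPy functions touched by the PR): Python itself reports a call with the wrong shape, such as an unknown keyword, as a TypeError, and keeps ValueError for arguments of the right kind with an unacceptable value. The helper `error_name` is hypothetical.

```python
def error_name(func, *args, **kwargs):
    """Hypothetical helper: name of the exception raised by a call, or None."""
    try:
        func(*args, **kwargs)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


# Unknown keyword argument: the call itself is malformed.
print(error_name(int, "12", bogus=1))   # TypeError
# Well-formed call, but the string cannot be interpreted as an integer.
print(error_name(int, "twelve"))        # ValueError
```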

Frankly, this probably only helps a real amount for keyword arguments, which is not the most common thing. But METH_FASTCALL makes a lot of sense for reductions where kwargs are much more common.
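
To illustrate why keyword-heavy calls are the ones that benefit: with METH_FASTCALL | METH_KEYWORDS the callee receives a flat array of values (positional first, then keyword values), a positional count, and a tuple of keyword names, so no tuple or dict has to be built per call. The following is a rough Python model of that layout, not the parser from this PR; `to_fastcall` and `parse_fastcall` are hypothetical names.

```python
def to_fastcall(*args, **kwargs):
    """Model the fastcall layout: (flat values, number of positionals, kwnames)."""
    flat = args + tuple(kwargs.values())
    return flat, len(args), tuple(kwargs)


def parse_fastcall(param_names, flat, nargs, kwnames):
    """Match a fastcall-style argument array against a parameter list.

    Returns a dict mapping each supplied parameter name to its value.
    """
    if nargs > len(param_names):
        raise TypeError(
            f"takes at most {len(param_names)} positional arguments "
            f"({nargs} given)")
    # Positional values fill the leading parameters in order.
    parsed = dict(zip(param_names, flat[:nargs]))
    # Keyword values sit after the positionals, in the order given by kwnames.
    for name, value in zip(kwnames, flat[nargs:]):
        if name not in param_names:
            raise TypeError(f"got an unexpected keyword argument {name!r}")
        if name in parsed:
            raise TypeError(f"got multiple values for argument {name!r}")
        parsed[name] = value
    return parsed


flat, nargs, kwnames = to_fastcall([1], "f8", copy=False)
print(parse_fastcall(("object", "dtype", "copy"), flat, nargs, kwnames))
```

In C the name lookup would typically be done against interned strings rather than through a Python dict; the sketch only shows the data layout and the error cases.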

seberg · 2019-12-16T03:04:43Z

It probably needs quite a bit of cleanup (and splitting up), but it now includes some cleanup of the ufunc code, unfortunately with a lot of #ifdef for METH_FASTCALL, which is fairly ugly. tp_vectorcall works on 3.8 (not sure about the symbol name changes Python may make in 3.9), but overall it is probably not worth much right now: ufuncs save a bit, but nothing overwhelming, and without kwargs the gain just does not seem that large.

It probably is just not worth the trouble of jumping through hoops to achieve it on other platforms...

mattip · 2019-12-16T21:53:14Z

It seems there is more than one PR here. Maybe you could split it into "code reorganization", "smaller cleanups", and "faster argument parsing", which would make reviewing easier and would allow benchmarking the cleanups separately.

seberg · 2019-12-16T22:04:22Z

Not sure how easy it is to split up all the way, but I can at least split it into: faster argparsing plus a simple example; array coercion and ufunc-related fixes. I am planning to do that, but was going to go back to the refcounting first.

seberg added 4 commits December 11, 2019 12:52

ENH: Implement a faster argument parsing infrastructure

73d4cf3

This replaces PyArg_ParseTupleAndKeywords in most cases.

ENH: Use new style argument parser for many functions

eb1a9c5

ENH: Use new style argument parser for array coercion

61940a4

This also moves asarray/asanyarray, etc. to C to make up for the slight loss in argument parsing speed; it should now be preferred to use these functions when possible. This also uses the preferred functions in a few places.

BENCH: Add benchmarks for small array coercions

1f0cf6c

seberg mentioned this pull request Dec 11, 2019

ENH: Optimize all python array coercion calls #14029

Closed

eric-wieser reviewed Dec 11, 2019

numpy/core/src/multiarray/conversion_utils.c

seberg commented Dec 11, 2019

ENH: Use METH_FASTCALL everywhere (undoing ufunc changes)

afd72d4

BUG: Fix missing check...

747d8da

ENH: Use some (annoying) macros to make METH_FASTCALL optional

d8c77f6

seberg force-pushed the faster-argparsing branch from 8471383 to d8c77f6 Compare December 12, 2019 00:16

MAINT: Remove silly super fast-path

66a3bb6

It might strip off a few ns for some reason, but probably not worth the trouble.

MAINT: Argparse cleanups and error message improvements

2d5b87e

seberg added 5 commits December 13, 2019 16:18

MAINT: UFUNC restructure ufunc override and argument parsing code

c70f258

TNY: Fix space in error

9c48291

FIXUPS for UFUNCS

3ca5ea8

A lot of this is test fixes, since many things that were ValueErrors should really be TypeErrors

More Fixes for ufuncs/ufuncoverrides

35a72b8

ENH,WIP: Implement tp_vectorcall for ufuncs

e52e247

Frankly, this probably only helps a real amount for keyword arguments, which is not the most common thing. But METH_FASTCALL makes a lot of sense for reductions where kwargs are much more common.

seberg force-pushed the faster-argparsing branch from 92fba24 to e52e247 Compare December 16, 2019 02:08

Fixup ufunc: define vectorcall with PyObject instead of PyUfuncObject

2d720d3

seberg force-pushed the faster-argparsing branch from 8363955 to 6250e74 Compare December 16, 2019 16:36

Fixup: UFUNC fix signature and PyDict_Size (macro version seems undocumented and not always there?)

e064932

seberg force-pushed the faster-argparsing branch from 6250e74 to e064932 Compare December 16, 2019 16:49

seberg added 2 commits December 16, 2019 14:36

UFUNC: Limit vectorcall usage to python 3.8+

24ecc53

It probably is just not worth the trouble of jumping through hoops to achieve it on other platforms...

Fixup Ufunc, incorrect type for vectorcall, again...

1aecfa2

seberg closed this Jan 6, 2020

seberg mentioned this pull request Jan 6, 2020

ENH: Implement faster keyword argument parsing capable of METH_FASTCALL #15269

Merged
