
[REVIEW][Perf] Remove Redundant string concatenations in dask code-base #6137

Merged
quasiben merged 33 commits into dask:master from galipremsagar:perf
May 8, 2020
Conversation

@galipremsagar
Contributor

@galipremsagar galipremsagar commented Apr 25, 2020

While profiling an internal cudf code sample, we found that there are some redundant computations being performed in nested for loops.

Here is a minimal profile:

from datetime import datetime

# Old: the concatenation is re-evaluated on every iteration of the
# nested comprehension
startTime = datetime.now()
k = 1000
t = str(454768798089)
for i in range(1000):
    y = ["helloworld" + t
        for i in range(k)
        for j in range(100)
    ]
print("old", (datetime.now() - startTime).total_seconds())

# New: the concatenation is hoisted out of the loops and computed once
startTime = datetime.now()
k = 1000
x = "helloworld" + str(454768798089)
for i in range(1000):
    ll = [x
        for i in range(k)
        for j in range(100)
    ]
print("new", (datetime.now() - startTime).total_seconds())

Output:

old 10.286778
new 3.931635
  • Tests added / passed
  • Passes black dask / flake8 dask

@galipremsagar galipremsagar changed the title [REVIEW] Remove Redundant string concatenations in dask code-base [REVIEW][Perf] Remove Redundant string concatenations in dask code-base Apr 25, 2020
@quasiben
Member

These look reasonable to me, but I am biased here. I think it would be good to have an additional maintainer review before merging.

It is worth noting these changes are in the graph construction
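To make the graph-construction context concrete, here is a toy sketch (illustrative only, not dask's actual implementation; `build_shuffle_graph` and the task shapes are invented for this example): dask-style task graphs are dicts whose keys are `(name, index)` tuples, so hoisting the label concatenation out of the loop means it runs once per graph instead of once per partition.

```python
# Illustrative sketch of graph construction with hoisted key labels.
# A dask-style task graph maps (name, index) keys to task tuples.
def build_shuffle_graph(token, npartitions):
    shuffle_join_name = "shuffle-join-" + token  # concatenated once
    input_name = "input-" + token                # concatenated once
    return {
        (shuffle_join_name, i): ("getitem", (input_name, i))
        for i in range(npartitions)
    }

graph = build_shuffle_graph("abc123", 3)
assert ("shuffle-join-abc123", 0) in graph
assert len(graph) == 3
```

With thousands of partitions, the per-task string building that this avoids is exactly the redundant work this PR removes.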

@mrocklin
Member

mrocklin commented Apr 27, 2020 via email

shuffle_join_token = "shuffle-join-" + token

start = dict(
(("shuffle-join-" + token, 0, inp), (b2.name, i) if i < b.npartitions else [])
Member

These changes look good @galipremsagar - although I do have a minor preference for names like shuffle_join_name rather than ..._token (this is probably just a personal preference).

@rjzamora
Member

Do these changes affect performance in real-world workflows?

This definitely reduces the time needed to build the graph for shuffle operations. I think we should definitely be sure to predefine (concatenate) all task labels outside Python loops in the future.
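As a minimal illustration of that guideline (generic Python, not dask code; the token and key names are invented), compute the label string once and reuse it inside the loop; both forms produce identical keys:

```python
token = "deadbeef"  # stand-in for a dask task token
n = 100_000

# Concatenation inside the loop: the same string is built n times
keys_naive = [("shuffle-" + token, i) for i in range(n)]

# Concatenation hoisted out of the loop: the string is built once
shuffle_name = "shuffle-" + token
keys_hoisted = [(shuffle_name, i) for i in range(n)]

assert keys_naive == keys_hoisted  # identical graph keys either way
```

The hoisted form does strictly less work per iteration, which is why the savings grow with the number of tasks.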

@mrocklin
Member

This definitely reduces the time needed to build the graph for shuffle operations. I think we should definitely be sure to predefine (concatenate) all task labels outside Python loops in the future.

Awesome. I'm glad to hear it. If you all happen to have a before and after comparison for a large graph, like a dataframe merge, I'd love to see the results. The microbenchmark in the original post shows that this is a great start here in that it shows that it will make things faster, but it could be made stronger by showing how that speedup affects performance in the broader context of Dask.

@galipremsagar
Contributor Author

Do these changes affect performance in real-world workflows?

Yes, it does. While testing an internal workflow, we noticed a 0.5 to 0.8 s drop in benchmark times.

@mrocklin
Member

mrocklin commented Apr 27, 2020 via email

@rjzamora
Member

Can you include a minimal example in the comments of this pull request?
Ideally something not using cudf so that others here can see the effects.
I recommend using dask.datasets.timeseries for this.

Here is a simple example that illustrates the performance bump for shuffle-graph creation in dask.dataframe:

from dask.datasets import timeseries
from dask.dataframe.shuffle import shuffle

ddf_d = timeseries(start='2000-01-01', end='2010-01-01', partition_freq='1d')
ddf_h = timeseries(start='2000-01-01', end='2010-01-01', partition_freq='1h')

%timeit ddf_d_2 = shuffle(ddf_d, "id", shuffle="tasks")
%timeit ddf_h_2 = shuffle(ddf_h, "id", shuffle="tasks")

Master:

# "Day" Partitioning
274 ms ± 389 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Hour Partitioning
12.3 s ± 38.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This PR:

# "Day" Partitioning
217 ms ± 8.01 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Hour Partitioning
9.72 s ± 54.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Note that the benefit is much more significant when the initial dataframe has a large number of partitions (common for ETL on large-scale datasets in the real world).

@quasiben
Member

Any other concerns with this PR? If not, I'd like to merge tomorrow afternoon.

@galipremsagar
Contributor Author

Any other concerns with this PR? If not, I'd like to merge tomorrow afternoon.

@quasiben Sure, there are a few more minor tweaks I have to test. If they work, I'll push those to this PR as well, and it should then be ready to merge.

@galipremsagar
Contributor Author

@quasiben This is ready for review.

Member

@rjzamora rjzamora left a comment

Thanks again for making these optimizations @galipremsagar - my only suggestion is to make sure we get back the same DataFrame subclass if we are using something like dask_cudf. This suggestion is not related to a change from your PR, but I think it is a good opportunity to fix it.

graph2, "repartition-get-" + token, df2, [None] * (npartitions + 1)
repartition_get_name, dsk, dependencies=[df2]
)
df3 = DataFrame(graph2, repartition_get_name, df2, [None] * (npartitions + 1))
Member

Just realizing since we can/do use shuffle on a dask_cudf DataFrame, this should probably be something like df.__class__ rather than DataFrame.

Member

See the new_dd_object function commonly used in dask/dataframe/core.py
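The idea can be sketched with plain Python classes (hypothetical stand-ins invented for this example; in dask itself, new_dd_object plays this role): constructing the result from the input's own class preserves subclasses such as a dask_cudf DataFrame, while hard-coding the base class silently drops them.

```python
class DataFrame:
    """Minimal stand-in for a dask.dataframe-style collection class."""
    def __init__(self, graph, name):
        self.graph, self.name = graph, name

class GPUDataFrame(DataFrame):
    """Hypothetical stand-in for a dask_cudf subclass."""

def rebuild_hardcoded(df, graph, name):
    return DataFrame(graph, name)   # always returns the base class

def rebuild_dispatched(df, graph, name):
    return type(df)(graph, name)    # preserves df's actual class

df = GPUDataFrame({}, "x")
print(type(rebuild_hardcoded(df, {}, "y")).__name__)   # DataFrame
print(type(rebuild_dispatched(df, {}, "y")).__name__)  # GPUDataFrame
```

Dispatching on the instance's class (or going through a helper like new_dd_object) keeps the downstream pipeline working with the GPU-backed type.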

Contributor Author

Made the change 👍

# Before:
graph = HighLevelGraph.from_collections("shuffle-" + token, dsk, dependencies=[df])
df2 = DataFrame(graph, "shuffle-" + token, df, df.divisions)
# After:
graph = HighLevelGraph.from_collections(shuffle_token, dsk, dependencies=[df])
df2 = DataFrame(graph, shuffle_token, df, df.divisions)
Member

See comment below - We might not want DataFrame here.

@quasiben
Member

quasiben commented Apr 30, 2020

@galipremsagar can I ask you to post new numbers with and without this PR ?

@galipremsagar
Contributor Author

@quasiben Here are the numbers with respect to master and current changes:

from dask.datasets import timeseries
from dask.dataframe.shuffle import shuffle

ddf_d = timeseries(start='2000-01-01', end='2010-01-01', partition_freq='1d')
ddf_h = timeseries(start='2000-01-01', end='2010-01-01', partition_freq='1h')

%timeit ddf_d_2 = shuffle(ddf_d, "id", shuffle="tasks")
%timeit ddf_h_2 = shuffle(ddf_h, "id", shuffle="tasks")

THIS PR:

# "Day" Partitioning
343 ms ± 7.26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


# Hour Partitioning
13.6 s ± 52.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

MASTER:

# "Day" Partitioning
402 ms ± 3.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


# Hour Partitioning
16.9 s ± 95.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@quasiben
Member

@galipremsagar thanks for doing that. I think this is ready to merge but I'd like to give other maintainers time to comment. If there are no more comments I'll plan to merge tonight or tomorrow morning

@galipremsagar
Contributor Author

@galipremsagar thanks for doing that. I think this is ready to merge but I'd like to give other maintainers time to comment. If there are no more comments I'll plan to merge tonight or tomorrow morning

Thanks @quasiben ! Sounds good to me.

root and others added 2 commits May 1, 2020 00:07
@rjzamora
Member

rjzamora commented May 5, 2020

@galipremsagar Thanks again for the work here. Do you think this is ready to go now?

@galipremsagar
Contributor Author

galipremsagar commented May 5, 2020

@galipremsagar Thanks again for the work here. Do you think this is ready to go now?

Yes, it is except for the hasattr check mentioned here: #6137 (comment)

I'm not sure we can remove the check, do you have any suggestions on that part of code? @rjzamora

Update: Changes are ready for review.

@quasiben
Member

quasiben commented May 5, 2020

Thanks for going through all this @galipremsagar, and I definitely appreciate the timings and snakeviz along the way. @TomAugspurger do you think this PR is worth running against the dask benchmarks to ensure a few things:

  1. we haven't broken anything
  2. we've either improved performance or, at worst, kept it the same without degradation?

I am only asking for an opinion, not for you to do this work.

@TomAugspurger
Member

TomAugspurger commented May 5, 2020 via email

galipremsagar and others added 6 commits May 5, 2020 15:09
@galipremsagar
Contributor Author

Here are the asv results against Master & current PR.

Master

[  0.00%] · For dask commit edd92d58:
[  0.00%] ·· Benchmarking conda-py3.7-cloudpickle-distributed-fsspec-numpy-pandas-partd-pip+cityhash-pip+xxhash-pyarrow-pytables-s3fs-scipy-toolz
[  1.09%] ··· Running (array.BlockInfoBlockwise.time_compute--)..........
[ 11.96%] ··· Running (array.Slicing.time_slice_slice_tail--).......
[ 19.57%] ··· Running (dataframe.MemoryDataFrame.time_reduction--).........
[ 29.35%] ··· Running (order.OrderCholesky.time_order_cholesky_lower--)....
[ 33.70%] ··· Running (order.OrderLinalgSolves.time_order_linalg_solve--)...
[ 36.96%] ··· Running (order.OrderManySubgraphs.time_order_many_subgraphs--)..
[ 39.13%] ··· Running (order.OrderRechunkTranspose.time_order_rechunk_transpose--).....
[ 44.57%] ··· Setting up io.py:33                                               ok
[ 44.57%] ··· Running (io.CSV.time_read_csv--)..
[ 46.74%] ··· Setting up io.py:58                                               ok
[ 46.74%] ··· Running (io.HDF5.time_read_hdf5--)..
[ 48.91%] ··· Setting up io.py:85                                               ok
[ 48.91%] ··· Running (io.Parquet.time_optimize_getitem--)..
[ 51.09%] ··· array.BlockInfoBlockwise.time_compute                     3.67±0.02s
[ 52.17%] ··· array.BlockInfoBlockwise.time_optimize                    21.3±0.2ms
[ 53.26%] ··· array.BlockInfoSingleton.time_optimize_singleton             140±4μs
[ 54.35%] ··· array.Blockwise.time_make_blockwise_graph                 2.70±0.02s
[ 55.43%] ··· array.FancyIndexing.time_fancy                            28.8±0.2ms
[ 56.52%] ··· array.Rechunk.time_rechunk                                25.0±0.6ms
[ 57.61%] ··· array.Rechunk.time_rechunk_meta                          10.2±0.03ms
[ 58.70%] ··· array.Slicing.time_slice_int_head                         8.85±0.4ms
[ 59.78%] ··· array.Slicing.time_slice_int_tail                         10.6±0.6ms
[ 60.87%] ··· array.Slicing.time_slice_slice_head                         20.7±1ms
[ 61.96%] ··· array.Slicing.time_slice_slice_tail                         21.0±1ms
[ 63.04%] ··· array.Slicing.time_slices_from_chunks                     1.19±0.01s
[ 64.13%] ··· array.TestSubs.time_subs                                    43.8±1ms
[ 65.22%] ··· array_overlap.MapOverlap.time_map_overlap                         ok
[ 65.22%] ··· ================= ========== ============
                    shape        boundary              
              ----------------- ---------- ------------
               (100, 100, 100)   reflect    62.1±0.7ms 
               (100, 100, 100)   periodic    61.0±1ms  
               (100, 100, 100)   nearest    62.4±0.8ms 
               (100, 100, 100)     none     15.5±0.3ms 
                (50, 512, 512)   reflect     69.8±2ms  
                (50, 512, 512)   periodic    68.5±2ms  
                (50, 512, 512)   nearest     71.2±2ms  
                (50, 512, 512)     none      21.4±1ms  
              ================= ========== ============

[ 66.30%] ··· dataframe.MemoryDataFrame.time_boolean_indexing              126±6ms
[ 67.39%] ··· dataframe.MemoryDataFrame.time_count_values                 39.6±1ms
[ 68.48%] ··· dataframe.MemoryDataFrame.time_groupby                       203±8ms
[ 69.57%] ··· dataframe.MemoryDataFrame.time_reduction                    137±20ms
[ 70.65%] ··· dataframe.MemoryDataFrame.time_scalar_comparison          12.5±0.7ms
[ 71.74%] ··· dataframe.MemoryDataFrame.time_set_index                    324±10ms
[ 72.83%] ··· optimization.Cull.time_cull                                  191±1ms
[ 73.91%] ··· optimization.Fuse.time_fuse                                       ok
[ 73.91%] ··· ========= =========
                param1           
              --------- ---------
               diamond   381±6ms 
                linear   175±3ms 
              ========= =========

[ 75.00%] ··· optimization.Inline.time_inline_constants                 43.2±0.5ms
[ 76.09%] ··· optimization.Inline.time_inline_functions                    259±4ms
[ 77.17%] ··· optimization.Inline.time_inline_keys                         168±4ms
[ 78.26%] ··· order.OrderCholesky.time_order_cholesky                      496±4ms
[ 79.35%] ··· order.OrderCholesky.time_order_cholesky_lower                474±6ms
[ 80.43%] ··· ...r.OrderCholeskyMixed.time_order_cholesky_mixed            384±7ms
[ 81.52%] ··· ...rCholeskyMixed.time_order_cholesky_mixed_lower           354±10ms
[ 82.61%] ··· order.OrderFullLayers.time_order_full_layers                      ok
[ 82.61%] ··· ============ =========
                 param1             
              ------------ ---------
               (1, 50000)   513±6ms 
               (2, 10000)   383±3ms 
               (10, 1000)   372±3ms 
               (100, 20)    416±3ms 
                (500, 2)    571±6ms 
               (9999, 1)    115±2ms 
               (50000, 1)   397±2ms 
              ============ =========

[ 83.70%] ··· order.OrderLinalgSolves.time_order_linalg_solve              340±6ms
[ 84.78%] ··· order.OrderLinearFull.time_order_linear_full              1.48±0.01s
[ 85.87%] ··· ...rLinearWithDanglers.time_order_linear_danglers                 ok
[ 85.87%] ··· ============ =========
                 param1             
              ------------ ---------
               (2, 10000)   191±2ms 
               (5, 5000)    265±2ms 
              ============ =========

[ 86.96%] ··· ...r.OrderManySubgraphs.time_order_many_subgraphs                 ok
[ 86.96%] ··· =========== =========
                 param1            
              ----------- ---------
               (1, 9999)   233±4ms 
               (3, 3333)   239±6ms 
               (10, 999)   258±4ms 
               (30, 303)   255±1ms 
               (100, 99)   277±3ms 
               (999, 10)   295±6ms 
              =========== =========

[ 88.04%] ··· order.OrderMapOverlap.time_order_mapoverlap                       ok
[ 88.04%] ··· =========================================== =========
                                 param1                            
              ------------------------------------------- ---------
                  ((10000.0, 10000.0), (200, 200), 1)      666±5ms 
               ((1000, 1000, 1000), (100, 100, 100), 10)   707±5ms 
              =========================================== =========

[ 89.13%] ··· ...rRechunkTranspose.time_order_rechunk_transpose           797±10ms
[ 90.22%] ··· order.OrderSVD.time_order_svd                             83.8±0.2ms
[ 91.30%] ··· tokenize.TokenizeBuiltins.time_tokenize                      197±1ms
[ 92.39%] ··· tokenize.TokenizeNumpy.time_tokenize                              ok
[ 92.39%] ··· ======== =============
               dtype                
              -------- -------------
                int       606±3μs   
               float      602±10μs  
                str     2.43±0.05ms 
               bytes      447±9μs   
               object     91.8±2ms  
              ======== =============

[ 93.48%] ··· tokenize.TokenizePandas.time_tokenize                             ok
[ 93.48%] ··· ===================== ============= =============
              --                             as_series         
              --------------------- ---------------------------
                      dtype              True         False    
              ===================== ============= =============
                      period           62.7±1μs     29.8±0.7μs 
                  datetime64[ns]       59.9±2μs     36.6±0.5μs 
               datetime64[ns, CET]    62.0±0.9μs    32.2±0.1μs 
                       int           64.9±0.08μs     24.3±1μs  
                     category          202±2μs      48.5±0.3μs 
                      sparse          68.6±0.2μs   30.3±0.05μs 
                       Int           1.23±0.01ms   1.19±0.01ms 
                      string           465±4μs       411±6μs   
                     boolean           871±10μs      820±1μs   
              ===================== ============= =============

[ 94.57%] ··· io.CSV.time_read_csv                                              ok
[ 94.57%] ··· ================= ============
                    param1                  
              ----------------- ------------
               single-threaded    279±4ms   
                  processes      1.96±0.02s 
                   threads        306±2ms   
              ================= ============

[ 95.65%] ··· io.CSV.time_read_csv_meta                                         ok
[ 95.65%] ··· ================= ============
                    param1                  
              ----------------- ------------
               single-threaded   28.3±0.1ms 
                  processes      29.2±0.4ms 
                   threads       28.7±0.3ms 
              ================= ============

[ 96.74%] ··· io.HDF5.time_read_hdf5                                            ok
[ 96.74%] ··· ================= =========
                    param1               
              ----------------- ---------
               single-threaded   393±3ms 
                  processes        n/a   
                   threads       409±9ms 
              ================= =========

[ 97.83%] ··· io.HDF5.time_read_hdf5_meta                                       ok
[ 97.83%] ··· ================= ===========
                    param1                 
              ----------------- -----------
               single-threaded   258±0.9ms 
                  processes         n/a    
                   threads       259±0.6ms 
              ================= ===========

[ 98.91%] ··· io.Parquet.time_optimize_getitem                          37.9±0.9ms
[100.00%] ··· io.Parquet.time_read_getitem_projection                     693±20ms

This PR

root@dt07:/cudf/dask-benchmarks/dask# asv run
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py3.7-cloudpickle-distributed-fsspec-numpy-pandas-partd-pip+cityhash-pip+xxhash-pyarrow-pytables-s3fs-scipy-toolz.
·· Building 14708abf for conda-py3.7-cloudpickle-distributed-fsspec-numpy-pandas-partd-pip+cityhash-pip+xxhash-pyarrow-pytables-s3fs-scipy-toolz...
·· Installing 14708abf into conda-py3.7-cloudpickle-distributed-fsspec-numpy-pandas-partd-pip+cityhash-pip+xxhash-pyarrow-pytables-s3fs-scipy-toolz..
· Running 46 total benchmarks (1 commits * 1 environments * 46 benchmarks)
[  0.00%] · For dask commit 14708abf:
[  0.00%] ·· Benchmarking conda-py3.7-cloudpickle-distributed-fsspec-numpy-pandas-partd-pip+cityhash-pip+xxhash-pyarrow-pytables-s3fs-scipy-toolz
[  1.09%] ··· Running (array.BlockInfoBlockwise.time_compute--)........
[  9.78%] ··· Running (array.Slicing.time_slice_int_tail--)........
[ 18.48%] ··· Running (dataframe.MemoryDataFrame.time_groupby--)......
[ 25.00%] ··· Running (optimization.Inline.time_inline_constants--)......
[ 31.52%] ··· Running (order.OrderCholeskyMixed.time_order_cholesky_mixed_lower--)...
[ 34.78%] ··· Running (order.OrderLinearFull.time_order_linear_full--)....
[ 39.13%] ··· Running (order.OrderRechunkTranspose.time_order_rechunk_transpose--).....
[ 44.57%] ··· Setting up io.py:33                                               ok
[ 44.57%] ··· Running (io.CSV.time_read_csv--)..
[ 46.74%] ··· Setting up io.py:58                                               ok
[ 46.74%] ··· Running (io.HDF5.time_read_hdf5--)..
[ 48.91%] ··· Setting up io.py:85                                               ok
[ 48.91%] ··· Running (io.Parquet.time_optimize_getitem--)..
[ 51.09%] ··· array.BlockInfoBlockwise.time_compute                     3.62±0.02s
[ 52.17%] ··· array.BlockInfoBlockwise.time_optimize                    21.3±0.7ms
[ 53.26%] ··· array.BlockInfoSingleton.time_optimize_singleton             134±3μs
[ 54.35%] ··· array.Blockwise.time_make_blockwise_graph                 2.63±0.01s
[ 55.43%] ··· array.FancyIndexing.time_fancy                            27.7±0.8ms
[ 56.52%] ··· array.Rechunk.time_rechunk                                26.4±0.9ms
[ 57.61%] ··· array.Rechunk.time_rechunk_meta                           10.2±0.1ms
[ 58.70%] ··· array.Slicing.time_slice_int_head                         10.6±0.5ms
[ 59.78%] ··· array.Slicing.time_slice_int_tail                         10.5±0.6ms
[ 60.87%] ··· array.Slicing.time_slice_slice_head                         20.3±1ms
[ 61.96%] ··· array.Slicing.time_slice_slice_tail                       20.4±0.9ms
[ 63.04%] ··· array.Slicing.time_slices_from_chunks                     1.16±0.01s
[ 64.13%] ··· array.TestSubs.time_subs                                  43.6±0.4ms
[ 65.22%] ··· array_overlap.MapOverlap.time_map_overlap                         ok
[ 65.22%] ··· ================= ========== ============
                    shape        boundary              
              ----------------- ---------- ------------
               (100, 100, 100)   reflect    61.5±0.7ms 
               (100, 100, 100)   periodic   62.2±0.6ms 
               (100, 100, 100)   nearest    62.7±0.6ms 
               (100, 100, 100)     none     15.7±0.3ms 
                (50, 512, 512)   reflect     69.3±2ms  
                (50, 512, 512)   periodic    69.9±2ms  
                (50, 512, 512)   nearest     70.6±1ms  
                (50, 512, 512)     none      23.1±1ms  
              ================= ========== ============

[ 66.30%] ··· dataframe.MemoryDataFrame.time_boolean_indexing              132±9ms
[ 67.39%] ··· dataframe.MemoryDataFrame.time_count_values               41.0±0.8ms
[ 68.48%] ··· dataframe.MemoryDataFrame.time_groupby                       201±9ms
[ 69.57%] ··· dataframe.MemoryDataFrame.time_reduction                     133±4ms
[ 70.65%] ··· dataframe.MemoryDataFrame.time_scalar_comparison          12.4±0.6ms
[ 71.74%] ··· dataframe.MemoryDataFrame.time_set_index                     326±6ms
[ 72.83%] ··· optimization.Cull.time_cull                                  172±1ms
[ 73.91%] ··· optimization.Fuse.time_fuse                                       ok
[ 73.91%] ··· ========= =========
                param1           
              --------- ---------
               diamond   372±2ms 
                linear   175±2ms 
              ========= =========

[ 75.00%] ··· optimization.Inline.time_inline_constants                 42.7±0.6ms
[ 76.09%] ··· optimization.Inline.time_inline_functions                    265±5ms
[ 77.17%] ··· optimization.Inline.time_inline_keys                         171±3ms
[ 78.26%] ··· order.OrderCholesky.time_order_cholesky                      488±9ms
[ 79.35%] ··· order.OrderCholesky.time_order_cholesky_lower                466±3ms
[ 80.43%] ··· ...r.OrderCholeskyMixed.time_order_cholesky_mixed            376±7ms
[ 81.52%] ··· ...rCholeskyMixed.time_order_cholesky_mixed_lower            337±3ms
[ 82.61%] ··· order.OrderFullLayers.time_order_full_layers                      ok
[ 82.61%] ··· ============ =========
                 param1             
              ------------ ---------
               (1, 50000)   506±8ms 
               (2, 10000)   365±6ms 
               (10, 1000)   369±2ms 
               (100, 20)    419±3ms 
                (500, 2)    575±6ms 
               (9999, 1)    119±2ms 
               (50000, 1)   400±4ms 
              ============ =========

[ 83.70%] ··· order.OrderLinalgSolves.time_order_linalg_solve              334±2ms
[ 84.78%] ··· order.OrderLinearFull.time_order_linear_full              1.44±0.01s
[ 85.87%] ··· ...rLinearWithDanglers.time_order_linear_danglers                 ok
[ 85.87%] ··· ============ ===========
                 param1               
              ------------ -----------
               (2, 10000)    189±2ms  
               (5, 5000)    268±0.7ms 
              ============ ===========

[ 86.96%] ··· ...r.OrderManySubgraphs.time_order_many_subgraphs                 ok
[ 86.96%] ··· =========== =========
                 param1            
              ----------- ---------
               (1, 9999)   228±2ms 
               (3, 3333)   240±3ms 
               (10, 999)   257±7ms 
               (30, 303)   252±6ms 
               (100, 99)   275±3ms 
               (999, 10)   284±2ms 
              =========== =========

[ 88.04%] ··· order.OrderMapOverlap.time_order_mapoverlap                       ok
[ 88.04%] ··· =========================================== =========
                                 param1                            
              ------------------------------------------- ---------
                  ((10000.0, 10000.0), (200, 200), 1)      650±7ms 
               ((1000, 1000, 1000), (100, 100, 100), 10)   696±5ms 
              =========================================== =========

[ 89.13%] ··· ...rRechunkTranspose.time_order_rechunk_transpose           801±10ms
[ 90.22%] ··· order.OrderSVD.time_order_svd                             84.0±0.5ms
[ 91.30%] ··· tokenize.TokenizeBuiltins.time_tokenize                    192±0.9ms
[ 92.39%] ··· tokenize.TokenizeNumpy.time_tokenize                              ok
[ 92.39%] ··· ======== =============
               dtype                
              -------- -------------
                int       589±7μs
               float      580±6μs   
                str     2.39±0.05ms 
               bytes      429±8μs   
               object     89.8±2ms
              ======== =============

[ 93.48%] ··· tokenize.TokenizePandas.time_tokenize                             ok
[ 93.48%] ··· ===================== ============ =============
              --                            as_series         
              --------------------- --------------------------
                      dtype             True         False    
              ===================== ============ =============
                      period          62.1±1μs    30.4±0.02μs 
                  datetime64[ns]      59.6±2μs     36.8±0.9μs 
               datetime64[ns, CET]    64.4±2μs      32.8±2μs  
                       int           63.2±0.1μs   24.5±0.03μs 
                     category         209±7μs       48.6±2μs  
                      sparse         69.0±0.2μs    30.3±0.1μs 
                       Int            1.23±0ms    1.20±0.02ms 
                      string          463±6μs       409±7μs   
                     boolean          862±1μs       826±8μs   
              ===================== ============ =============

[ 94.57%] ··· io.CSV.time_read_csv                                              ok
[ 94.57%] ··· ================= ============
                    param1                  
              ----------------- ------------
               single-threaded    280±1ms   
                  processes      2.00±0.01s 
                   threads        308±2ms   
              ================= ============

[ 95.65%] ··· io.CSV.time_read_csv_meta                                         ok
[ 95.65%] ··· ================= =============
                    param1                   
              ----------------- -------------
               single-threaded    29.0±0.1ms 
                  processes       29.1±0.8ms 
                   threads       28.3±0.09ms 
              ================= =============

[ 96.74%] ··· io.HDF5.time_read_hdf5                                            ok
[ 96.74%] ··· ================= =========
                    param1               
              ----------------- ---------
               single-threaded   396±2ms 
                  processes        n/a   
                   threads       409±4ms 
              ================= =========

[ 97.83%] ··· io.HDF5.time_read_hdf5_meta                                       ok
[ 97.83%] ··· ================= =========
                    param1               
              ----------------- ---------
               single-threaded   263±4ms 
                  processes        n/a   
                   threads       265±5ms 
              ================= =========

[ 98.91%] ··· io.Parquet.time_optimize_getitem                          37.7±0.3ms
[100.00%] ··· io.Parquet.time_read_getitem_projection                     691±20ms

Except for dataframe.MemoryDataFrame.time_boolean_indexing, all the numbers look improved or reasonable to me.

@kkraus14
Member

kkraus14 commented May 6, 2020

@galipremsagar is there any chance you could organize the above into a table? I'm struggling to read the ASV output above so I imagine others will as well :).

@galipremsagar
Contributor Author

Apologies for not organizing it in a table in the first place.

| Test | Master | This PR |
| --- | --- | --- |
| array.BlockInfoBlockwise.time_compute | 3.67±0.02s | 3.62±0.02s |
| array.BlockInfoBlockwise.time_optimize | 21.3±0.2ms | 21.3±0.7ms |
| array.BlockInfoSingleton.time_optimize_singleton | 140±4μs | 134±3μs |
| array.Blockwise.time_make_blockwise_graph | 2.70±0.02s | 2.63±0.01s |
| array.FancyIndexing.time_fancy | 28.8±0.2ms | 27.7±0.8ms |
| array.Rechunk.time_rechunk | 25.0±0.6ms | 26.4±0.9ms |
| array.Rechunk.time_rechunk_meta | 10.2±0.03ms | 10.2±0.1ms |
| array.Slicing.time_slice_int_head | 8.85±0.4ms | 10.6±0.5ms |
| array.Slicing.time_slice_int_tail | 10.6±0.6ms | 10.5±0.6ms |
| array.Slicing.time_slice_slice_head | 20.7±1ms | 20.3±1ms |
| array.Slicing.time_slice_slice_tail | 21.0±1ms | 20.4±0.9ms |
| array.Slicing.time_slices_from_chunks | 1.19±0.01s | 1.16±0.01s |
| array.TestSubs.time_subs | 43.8±1ms | 43.6±0.4ms |
| dataframe.MemoryDataFrame.time_boolean_indexing | 126±6ms | 132±9ms |
| dataframe.MemoryDataFrame.time_count_values | 39.6±1ms | 41.0±0.8ms |
| dataframe.MemoryDataFrame.time_groupby | 203±8ms | 201±9ms |
| dataframe.MemoryDataFrame.time_reduction | 137±20ms | 133±4ms |
| dataframe.MemoryDataFrame.time_scalar_comparison | 12.5±0.7ms | 12.4±0.6ms |
| dataframe.MemoryDataFrame.time_set_index | 324±10ms | 326±6ms |
| optimization.Cull.time_cull | 191±1ms | 172±1ms |
| optimization.Inline.time_inline_constants | 43.2±0.5ms | 42.7±0.6ms |
| optimization.Inline.time_inline_functions | 259±4ms | 265±5ms |
| optimization.Inline.time_inline_keys | 168±4ms | 171±3ms |
| order.OrderCholesky.time_order_cholesky | 496±4ms | 488±9ms |
| order.OrderCholesky.time_order_cholesky_lower | 474±6ms | 466±3ms |
| ...r.OrderCholeskyMixed.time_order_cholesky_mixed | 384±7ms | 376±7ms |
| ...rCholeskyMixed.time_order_cholesky_mixed_lower | 354±10ms | 337±3ms |
| order.OrderLinalgSolves.time_order_linalg_solve | 340±6ms | 334±2ms |
| order.OrderLinearFull.time_order_linear_full | 1.48±0.01s | 1.44±0.01s |
| ...rRechunkTranspose.time_order_rechunk_transpose | 797±10ms | 801±10ms |
| order.OrderSVD.time_order_svd | 83.8±0.2ms | 84.0±0.5ms |
| tokenize.TokenizeBuiltins.time_tokenize | 197±1ms | 192±0.9ms |
| io.Parquet.time_optimize_getitem | 37.9±0.9ms | 37.7±0.3ms |
| io.Parquet.time_read_getitem_projection | 693±20ms | 691±20ms |

array_overlap.MapOverlap.time_map_overlap

| shape | boundary | Master | This PR |
| --- | --- | --- | --- |
| (100, 100, 100) | reflect | 62.1±0.7ms | 61.5±0.7ms |
| (100, 100, 100) | periodic | 61.0±1ms | 62.2±0.6ms |
| (100, 100, 100) | nearest | 62.4±0.8ms | 62.7±0.6ms |
| (100, 100, 100) | none | 15.5±0.3ms | 15.7±0.3ms |
| (50, 512, 512) | reflect | 69.8±2ms | 69.3±2ms |
| (50, 512, 512) | periodic | 68.5±2ms | 69.9±2ms |
| (50, 512, 512) | nearest | 71.2±2ms | 70.6±1ms |
| (50, 512, 512) | none | 21.4±1ms | 23.1±1ms |

optimization.Fuse.time_fuse

| param1 | Master | This PR |
| --- | --- | --- |
| diamond | 381±6ms | 372±2ms |
| linear | 175±3ms | 175±2ms |

order.OrderFullLayers.time_order_full_layers

| param1 | Master | This PR |
| --- | --- | --- |
| (1, 50000) | 513±6ms | 506±8ms |
| (2, 10000) | 383±3ms | 365±6ms |
| (10, 1000) | 372±3ms | 369±2ms |
| (100, 20) | 416±3ms | 419±3ms |
| (500, 2) | 571±6ms | 575±6ms |
| (9999, 1) | 115±2ms | 119±2ms |
| (50000, 1) | 397±2ms | 400±4ms |

...rLinearWithDanglers.time_order_linear_danglers

| param1 | Master | This PR |
| --- | --- | --- |
| (2, 10000) | 191±2ms | 189±2ms |
| (5, 5000) | 265±2ms | 268±0.7ms |

...r.OrderManySubgraphs.time_order_many_subgraphs

| param1 | Master | This PR |
| --- | --- | --- |
| (1, 9999) | 233±4ms | 228±2ms |
| (3, 3333) | 239±6ms | 240±3ms |
| (10, 999) | 258±4ms | 257±7ms |
| (30, 303) | 255±1ms | 252±6ms |
| (100, 99) | 277±3ms | 275±3ms |
| (999, 10) | 295±6ms | 284±2ms |

order.OrderMapOverlap.time_order_mapoverlap

| param1 | Master | This PR |
| --- | --- | --- |
| ((10000.0, 10000.0), (200, 200), 1) | 666±5ms | 650±7ms |
| ((1000, 1000, 1000), (100, 100, 100), 10) | 707±5ms | 696±5ms |

tokenize.TokenizeNumpy.time_tokenize

| dtype | Master | This PR |
| --- | --- | --- |
| int | 606±3μs | 589±7μs |
| float | 602±10μs | 580±6μs |
| str | 2.43±0.05ms | 2.39±0.05ms |
| bytes | 447±9μs | 429±8μs |
| object | 91.8±2ms | 89.8±2ms |

tokenize.TokenizePandas.time_tokenize

| dtype | as_series=True (Master) | as_series=True (This PR) | as_series=False (Master) | as_series=False (This PR) |
| --- | --- | --- | --- | --- |
| period | 62.7±1μs | 62.1±1μs | 29.8±0.7μs | 30.4±0.02μs |
| datetime64[ns] | 59.9±2μs | 59.6±2μs | 36.6±0.5μs | 36.8±0.9μs |
| datetime64[ns, CET] | 62.0±0.9μs | 64.4±2μs | 32.2±0.1μs | 32.8±2μs |
| int | 64.9±0.08μs | 63.2±0.1μs | 24.3±1μs | 24.5±0.03μs |
| category | 202±2μs | 209±7μs | 48.5±0.3μs | 48.6±2μs |
| sparse | 68.6±0.2μs | 69.0±0.2μs | 30.3±0.05μs | 30.3±0.1μs |
| Int | 1.23±0.01ms | 1.23±0ms | 1.19±0.01ms | 1.20±0.02ms |
| string | 465±4μs | 463±6μs | 411±6μs | 409±7μs |
| boolean | 871±10μs | 862±1μs | 820±1μs | 826±8μs |

io.CSV.time_read_csv

| param1 | Master | This PR |
| --- | --- | --- |
| single-threaded | 279±4ms | 280±1ms |
| processes | 1.96±0.02s | 2.00±0.01s |
| threads | 306±2ms | 308±2ms |

io.CSV.time_read_csv_meta

| param1 | Master | This PR |
| --- | --- | --- |
| single-threaded | 28.3±0.1ms | 29.0±0.1ms |
| processes | 29.2±0.4ms | 29.1±0.8ms |
| threads | 28.7±0.3ms | 28.3±0.09ms |

io.HDF5.time_read_hdf5

| param1 | Master | This PR |
| --- | --- | --- |
| single-threaded | 393±3ms | 396±2ms |
| processes | n/a | n/a |
| threads | 409±9ms | 409±4ms |

io.HDF5.time_read_hdf5_meta

| param1 | Master | This PR |
| --- | --- | --- |
| single-threaded | 258±0.9ms | 263±4ms |
| processes | n/a | n/a |
| threads | 259±0.6ms | 265±5ms |

@kkraus14
Member

kkraus14 commented May 6, 2020

Thanks @galipremsagar!

@quasiben
Member

quasiben commented May 6, 2020

When I scanned through this only 3 items stood out and even then they are fairly close:

| Test | Master | This PR |
| --- | --- | --- |
| optimization.Cull.time_cull | 191±1ms | 172±1ms |
| map_overlap | 21.4±1ms | 23.1±1ms |
| order.OrderFullLayers.time_order_full_layers (2, 10000) | 383±3ms | 365±6ms |

If someone else has time, it would be good to recheck whether there are any other outliers in the ASV benchmarks.

@TomAugspurger
Member

TomAugspurger commented May 6, 2020

Just FYI, our ASVs likely aren't sensitive enough to pick up the gains here. For context, in the pandas benchmarks I watch for regressions, I ignore any differences smaller than ~30-40%.

We might be able to construct a micro-benchmark that targets just graph construction improvements here and have a reasonable chance of seeing these improvements.
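One way such a micro-benchmark could look: the sketch below (plain `timeit`, not dask's actual ASV suite; the function names, `TOKEN`, and `N` are illustrative) isolates the cost difference between concatenating a key-name prefix inside a graph-construction loop versus hoisting it out, which is the pattern this PR changes.

```python
import timeit

TOKEN = "a1b2c3"  # stand-in for a tokenize() hash
N = 10_000        # number of task keys to build


def build_keys_concat_in_loop():
    # Pre-PR pattern: the prefix + token concatenation runs once per key.
    return {("shuffle-join-" + TOKEN, i): None for i in range(N)}


def build_keys_hoisted():
    # Post-PR pattern: concatenate once, reuse the resulting name.
    name = "shuffle-join-" + TOKEN
    return {(name, i): None for i in range(N)}


# Both variants build identical graphs; only construction cost differs.
assert build_keys_concat_in_loop() == build_keys_hoisted()

slow = timeit.timeit(build_keys_concat_in_loop, number=200)
fast = timeit.timeit(build_keys_hoisted, number=200)
print(f"in-loop: {slow:.3f}s  hoisted: {fast:.3f}s")
```

Wrapping the two builders in an ASV benchmark class would then let regressions in graph construction show up directly rather than being buried inside end-to-end compute timings.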

@quasiben
Member

quasiben commented May 7, 2020

Thanks @TomAugspurger. I think for this PR we are good to merge. However, we'd like to pick up adding a micro-benchmark for graph construction, if that's ok with you?

@TomAugspurger
Member

TomAugspurger commented May 7, 2020 via email

@quasiben
Member

quasiben commented May 7, 2020

Thanks. I'll plan to merge tomorrow morning unless anyone else has concerns

@quasiben
Member

quasiben commented May 8, 2020

Thanks again all. Merging in
