
[REVIEW][Perf] Remove Redundant string concatenations in dask code-base #6137

Merged
quasiben merged 33 commits into dask:master from galipremsagar:perf
May 8, 2020
Conversation

@galipremsagar
Contributor

@galipremsagar galipremsagar commented Apr 25, 2020

While profiling an internal cudf code sample, we found that there are some redundant computations being performed in nested for loops.

Here is a minimal profile:

from datetime import datetime

# Old: the concatenation is re-evaluated on every iteration of the
# nested comprehension
startTime = datetime.now()
k = 1000
t = str(454768798089)
for i in range(1000):
    y = ["helloworld" + t
        for i in range(k)
        for j in range(100)
    ]
print("old", (datetime.now() - startTime).total_seconds())

# New: the concatenation is hoisted out of the loops and computed once
startTime = datetime.now()
k = 1000
x = "helloworld" + str(454768798089)
for i in range(1000):
    ll = [x
        for i in range(k)
        for j in range(100)
    ]
print("new", (datetime.now() - startTime).total_seconds())

Output:

old 10.286778
new 3.931635
  • Tests added / passed
  • Passes black dask / flake8 dask

@galipremsagar galipremsagar changed the title [REVIEW] Remove Redundant string concatenations in dask code-base [REVIEW][Perf] Remove Redundant string concatenations in dask code-base Apr 25, 2020
@quasiben
Member

These look reasonable to me, but I am biased here. I think it would be good to have an additional maintainer review before merging.

It is worth noting these changes are in the graph construction
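To make the graph-construction context concrete, here is a toy sketch (illustrative only, not dask's actual implementation; `build_shuffle_graph` and the task shapes are invented for this example): dask-style task graphs are dicts whose keys are `(name, index)` tuples, so hoisting the label concatenation out of the loop means it runs once per graph instead of once per partition.

```python
# Illustrative sketch of graph construction with hoisted key labels.
# A dask-style task graph maps (name, index) keys to task tuples.
def build_shuffle_graph(token, npartitions):
    shuffle_join_name = "shuffle-join-" + token  # concatenated once
    input_name = "input-" + token                # concatenated once
    return {
        (shuffle_join_name, i): ("getitem", (input_name, i))
        for i in range(npartitions)
    }

graph = build_shuffle_graph("abc123", 3)
assert ("shuffle-join-abc123", 0) in graph
assert len(graph) == 3
```

With thousands of partitions, the per-task string building that this avoids is exactly the redundant work this PR removes.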

@mrocklin
Member

mrocklin commented Apr 27, 2020 via email

shuffle_join_token = "shuffle-join-" + token

start = dict(
(("shuffle-join-" + token, 0, inp), (b2.name, i) if i < b.npartitions else [])
Member

These changes look good @galipremsagar - although I do have a minor preference for names like shuffle_join_name rather than ..._token (this is probably just a personal preference).

@rjzamora
Member

Do these changes affect performance in real-world workflows?

This definitely reduces the time needed to build the graph for shuffle operations. I think we should definitely be sure to predefine (concatenate) all task labels outside Python loops in the future.
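As a minimal illustration of that guideline (generic Python, not dask code; the token and key names are invented), compute the label string once and reuse it inside the loop; both forms produce identical keys:

```python
token = "deadbeef"  # stand-in for a dask task token
n = 100_000

# Concatenation inside the loop: the same string is built n times
keys_naive = [("shuffle-" + token, i) for i in range(n)]

# Concatenation hoisted out of the loop: the string is built once
shuffle_name = "shuffle-" + token
keys_hoisted = [(shuffle_name, i) for i in range(n)]

assert keys_naive == keys_hoisted  # identical graph keys either way
```

The hoisted form does strictly less work per iteration, which is why the savings grow with the number of tasks.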

@mrocklin
Member

This definitely reduces the time needed to build the graph for shuffle operations. I think we should definitely be sure to predefine (concatenate) all task labels outside Python loops in the future.

Awesome. I'm glad to hear it. If you all happen to have a before and after comparison for a large graph, like a dataframe merge, I'd love to see the results. The microbenchmark in the original post shows that this is a great start here in that it shows that it will make things faster, but it could be made stronger by showing how that speedup affects performance in the broader context of Dask.

@galipremsagar
Contributor Author

Do these changes affect performance in real-world workflows?

Yes, it does. While testing an internal workflow, we noticed a 0.5 to 0.8 s drop in benchmark times.

@mrocklin
Member

mrocklin commented Apr 27, 2020 via email

@rjzamora
Member

Can you include a minimal example in the comments of this pull request?
Ideally something not using cudf so that others here can see the effects.
I recommend using dask.datasets.timeseries for this.

Here is a simple example that illustrates the performance bump for shuffle-graph creation in dask.dataframe:

from dask.datasets import timeseries
from dask.dataframe.shuffle import shuffle

ddf_d = timeseries(start='2000-01-01', end='2010-01-01', partition_freq='1d')
ddf_h = timeseries(start='2000-01-01', end='2010-01-01', partition_freq='1h')

%timeit ddf_d_2 = shuffle(ddf_d, "id", shuffle="tasks")
%timeit ddf_h_2 = shuffle(ddf_h, "id", shuffle="tasks")

Master:

# "Day" Partitioning
274 ms ± 389 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Hour Partitioning
12.3 s ± 38.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This PR:

# "Day" Partitioning
217 ms ± 8.01 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Hour Partitioning
9.72 s ± 54.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Note that the benefit is much more significant when the initial dataframe has a large number of partitions (common for ETL on large-scale datasets in the real world).

@quasiben
Member

Any other concerns with this PR? If not, I'd like to merge tomorrow afternoon.

@galipremsagar
Contributor Author

Any other concerns with this PR? If not, I'd like to merge tomorrow afternoon.

@quasiben Sure, there are a few more minor tweaks I have to test. If they work, I'll push those to this PR as well, and it should then be ready to merge.

@galipremsagar
Contributor Author

@quasiben This is ready for review.

Member

@rjzamora rjzamora left a comment

Thanks again for making these optimizations @galipremsagar - my only suggestion is to make sure we get back the same DataFrame subclass if we are using something like dask_cudf. This suggestion is not related to a change from your PR, but I think it is a good opportunity to fix it.

graph2, "repartition-get-" + token, df2, [None] * (npartitions + 1)
repartition_get_name, dsk, dependencies=[df2]
)
df3 = DataFrame(graph2, repartition_get_name, df2, [None] * (npartitions + 1))
Member

Just realizing since we can/do use shuffle on a dask_cudf DataFrame, this should probably be something like df.__class__ rather than DataFrame.

Member

See the new_dd_object function commonly used in dask/dataframe/core.py
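The idea can be sketched with plain Python classes (hypothetical stand-ins invented for this example; in dask itself, new_dd_object plays this role): constructing the result from the input's own class preserves subclasses such as a dask_cudf DataFrame, while hard-coding the base class silently drops them.

```python
class DataFrame:
    """Minimal stand-in for a dask.dataframe-style collection class."""
    def __init__(self, graph, name):
        self.graph, self.name = graph, name

class GPUDataFrame(DataFrame):
    """Hypothetical stand-in for a dask_cudf subclass."""

def rebuild_hardcoded(df, graph, name):
    return DataFrame(graph, name)   # always returns the base class

def rebuild_dispatched(df, graph, name):
    return type(df)(graph, name)    # preserves df's actual class

df = GPUDataFrame({}, "x")
print(type(rebuild_hardcoded(df, {}, "y")).__name__)   # DataFrame
print(type(rebuild_dispatched(df, {}, "y")).__name__)  # GPUDataFrame
```

Dispatching on the instance's class (or going through a helper like new_dd_object) keeps the downstream pipeline working with the GPU-backed type.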

Contributor Author

Made the change 👍

# Before:
graph = HighLevelGraph.from_collections("shuffle-" + token, dsk, dependencies=[df])
df2 = DataFrame(graph, "shuffle-" + token, df, df.divisions)
# After:
graph = HighLevelGraph.from_collections(shuffle_token, dsk, dependencies=[df])
df2 = DataFrame(graph, shuffle_token, df, df.divisions)
Member

See comment below - We might not want DataFrame here.

@quasiben
Member

quasiben commented Apr 30, 2020

@galipremsagar can I ask you to post new numbers with and without this PR ?

@galipremsagar
Contributor Author

@quasiben Here are the numbers with respect to master and current changes:

from dask.datasets import timeseries
from dask.dataframe.shuffle import shuffle

ddf_d = timeseries(start='2000-01-01', end='2010-01-01', partition_freq='1d')
ddf_h = timeseries(start='2000-01-01', end='2010-01-01', partition_freq='1h')

%timeit ddf_d_2 = shuffle(ddf_d, "id", shuffle="tasks")
%timeit ddf_h_2 = shuffle(ddf_h, "id", shuffle="tasks")

THIS PR:

# "Day" Partitioning
343 ms ± 7.26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


# Hour Partitioning
13.6 s ± 52.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

MASTER:

# "Day" Partitioning
402 ms ± 3.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


# Hour Partitioning
16.9 s ± 95.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@quasiben
Member

@galipremsagar thanks for doing that. I think this is ready to merge but I'd like to give other maintainers time to comment. If there are no more comments I'll plan to merge tonight or tomorrow morning

@galipremsagar
Contributor Author

@galipremsagar thanks for doing that. I think this is ready to merge but I'd like to give other maintainers time to comment. If there are no more comments I'll plan to merge tonight or tomorrow morning

Thanks @quasiben ! Sounds good to me.

root and others added 2 commits May 1, 2020 00:07
@rjzamora
Member

rjzamora commented May 5, 2020

@galipremsagar Thanks again for the work here. Do you think this is ready to go now?

@galipremsagar
Contributor Author

galipremsagar commented May 5, 2020

@galipremsagar Thanks again for the work here. Do you think this is ready to go now?

Yes, it is except for the hasattr check mentioned here: #6137 (comment)

I'm not sure we can remove the check, do you have any suggestions on that part of code? @rjzamora

Update: Changes are ready for review.

@quasiben
Member

quasiben commented May 5, 2020

Thanks for going through all this @galipremsagar, and I definitely appreciate the timings and snakeviz along the way. @TomAugspurger do you think this PR is worth running against the dask benchmarks to ensure a few things:

  1. we haven't broken anything
  2. we've either improved performance or, at worst, kept it the same without degradation?

I am only asking for an opinion, not for you to do this work.

@TomAugspurger
Member

TomAugspurger commented May 5, 2020 via email

galipremsagar and others added 6 commits May 5, 2020 15:09
@galipremsagar
Contributor Author

Here are the asv results against Master & current PR.

Master

[  0.00%] · For dask commit edd92d58:
[  0.00%] ·· Benchmarking conda-py3.7-cloudpickle-distributed-fsspec-numpy-pandas-partd-pip+cityhash-pip+xxhash-pyarrow-pytables-s3fs-scipy-toolz
[  1.09%] ··· Running (array.BlockInfoBlockwise.time_compute--)..........
[ 11.96%] ··· Running (array.Slicing.time_slice_slice_tail--).......
[ 19.57%] ··· Running (dataframe.MemoryDataFrame.time_reduction--).........
[ 29.35%] ··· Running (order.OrderCholesky.time_order_cholesky_lower--)....
[ 33.70%] ··· Running (order.OrderLinalgSolves.time_order_linalg_solve--)...
[ 36.96%] ··· Running (order.OrderManySubgraphs.time_order_many_subgraphs--)..
[ 39.13%] ··· Running (order.OrderRechunkTranspose.time_order_rechunk_transpose--).....
[ 44.57%] ··· Setting up io.py:33                                               ok
[ 44.57%] ··· Running (io.CSV.time_read_csv--)..
[ 46.74%] ··· Setting up io.py:58                                               ok
[ 46.74%] ··· Running (io.HDF5.time_read_hdf5--)..
[ 48.91%] ··· Setting up io.py:85                                               ok
[ 48.91%] ··· Running (io.Parquet.time_optimize_getitem--)..
[ 51.09%] ··· array.BlockInfoBlockwise.time_compute                     3.67±0.02s
[ 52.17%] ··· array.BlockInfoBlockwise.time_optimize                    21.3±0.2ms
[ 53.26%] ··· array.BlockInfoSingleton.time_optimize_singleton             140±4μs
[ 54.35%] ··· array.Blockwise.time_make_blockwise_graph                 2.70±0.02s
[ 55.43%] ··· array.FancyIndexing.time_fancy                            28.8±0.2ms
[ 56.52%] ··· array.Rechunk.time_rechunk                                25.0±0.6ms
[ 57.61%] ··· array.Rechunk.time_rechunk_meta                          10.2±0.03ms
[ 58.70%] ··· array.Slicing.time_slice_int_head                         8.85±0.4ms
[ 59.78%] ··· array.Slicing.time_slice_int_tail                         10.6±0.6ms
[ 60.87%] ··· array.Slicing.time_slice_slice_head                         20.7±1ms
[ 61.96%] ··· array.Slicing.time_slice_slice_tail                         21.0±1ms
[ 63.04%] ··· array.Slicing.time_slices_from_chunks                     1.19±0.01s
[ 64.13%] ··· array.TestSubs.time_subs                                    43.8±1ms
[ 65.22%] ··· array_overlap.MapOverlap.time_map_overlap                         ok
[ 65.22%] ··· ================= ========== ============
                    shape        boundary              
              ----------------- ---------- ------------
               (100, 100, 100)   reflect    62.1±0.7ms 
               (100, 100, 100)   periodic    61.0±1ms  
               (100, 100, 100)   nearest    62.4±0.8ms 
               (100, 100, 100)     none     15.5±0.3ms 
                (50, 512, 512)   reflect     69.8±2ms  
                (50, 512, 512)   periodic    68.5±2ms  
                (50, 512, 512)   nearest     71.2±2ms  
                (50, 512, 512)     none      21.4±1ms  
              ================= ========== ============

[ 66.30%] ··· dataframe.MemoryDataFrame.time_boolean_indexing              126±6ms
[ 67.39%] ··· dataframe.MemoryDataFrame.time_count_values                 39.6±1ms
[ 68.48%] ··· dataframe.MemoryDataFrame.time_groupby                       203±8ms
[ 69.57%] ··· dataframe.MemoryDataFrame.time_reduction                    137±20ms
[ 70.65%] ··· dataframe.MemoryDataFrame.time_scalar_comparison          12.5±0.7ms
[ 71.74%] ··· dataframe.MemoryDataFrame.time_set_index                    324±10ms
[ 72.83%] ··· optimization.Cull.time_cull                                  191±1ms
[ 73.91%] ··· optimization.Fuse.time_fuse                                       ok
[ 73.91%] ··· ========= =========
                param1           
              --------- ---------
               diamond   381±6ms 
                linear   175±3ms 
              ========= =========

[ 75.00%] ··· optimization.Inline.time_inline_constants                 43.2±0.5ms
[ 76.09%] ··· optimization.Inline.time_inline_functions                    259±4ms
[ 77.17%] ··· optimization.Inline.time_inline_keys                         168±4ms
[ 78.26%] ··· order.OrderCholesky.time_order_cholesky                      496±4ms
[ 79.35%] ··· order.OrderCholesky.time_order_cholesky_lower                474±6ms
[ 80.43%] ··· ...r.OrderCholeskyMixed.time_order_cholesky_mixed            384±7ms
[ 81.52%] ··· ...rCholeskyMixed.time_order_cholesky_mixed_lower           354±10ms
[ 82.61%] ··· order.OrderFullLayers.time_order_full_layers                      ok
[ 82.61%] ··· ============ =========
                 param1             
              ------------ ---------
               (1, 50000)   513±6ms 
               (2, 10000)   383±3ms 
               (10, 1000)   372±3ms 
               (100, 20)    416±3ms 
                (500, 2)    571±6ms 
               (9999, 1)    115±2ms 
               (50000, 1)   397±2ms 
              ============ =========

[ 83.70%] ··· order.OrderLinalgSolves.time_order_linalg_solve              340±6ms
[ 84.78%] ··· order.OrderLinearFull.time_order_linear_full              1.48±0.01s
[ 85.87%] ··· ...rLinearWithDanglers.time_order_linear_danglers                 ok
[ 85.87%] ··· ============ =========
                 param1             
              ------------ ---------
               (2, 10000)   191±2ms 
               (5, 5000)    265±2ms 
              ============ =========

[ 86.96%] ··· ...r.OrderManySubgraphs.time_order_many_subgraphs                 ok
[ 86.96%] ··· =========== =========
                 param1            
              ----------- ---------
               (1, 9999)   233±4ms 
               (3, 3333)   239±6ms 
               (10, 999)   258±4ms 
               (30, 303)   255±1ms 
               (100, 99)   277±3ms 
               (999, 10)   295±6ms 
              =========== =========

[ 88.04%] ··· order.OrderMapOverlap.time_order_mapoverlap                       ok
[ 88.04%] ··· =========================================== =========
                                 param1                            
              ------------------------------------------- ---------
                  ((10000.0, 10000.0), (200, 200), 1)      666±5ms 
               ((1000, 1000, 1000), (100, 100, 100), 10)   707±5ms 
              =========================================== =========

[ 89.13%] ··· ...rRechunkTranspose.time_order_rechunk_transpose           797±10ms
[ 90.22%] ··· order.OrderSVD.time_order_svd                             83.8±0.2ms
[ 91.30%] ··· tokenize.TokenizeBuiltins.time_tokenize                      197±1ms
[ 92.39%] ··· tokenize.TokenizeNumpy.time_tokenize                              ok
[ 92.39%] ··· ======== =============
               dtype                
              -------- -------------
                int       606±3μs   
               float      602±10μs  
                str     2.43±0.05ms 
               bytes      447±9μs   
               object     91.8±2ms  
              ======== =============

[ 93.48%] ··· tokenize.TokenizePandas.time_tokenize                             ok
[ 93.48%] ··· ===================== ============= =============
              --                             as_series         
              --------------------- ---------------------------
                      dtype              True         False    
              ===================== ============= =============
                      period           62.7±1μs     29.8±0.7μs 
                  datetime64[ns]       59.9±2μs     36.6±0.5μs 
               datetime64[ns, CET]    62.0±0.9μs    32.2±0.1μs 
                       int           64.9±0.08μs     24.3±1μs  
                     category          202±2μs      48.5±0.3μs 
                      sparse          68.6±0.2μs   30.3±0.05μs 
                       Int           1.23±0.01ms   1.19±0.01ms 
                      string           465±4μs       411±6μs   
                     boolean           871±10μs      820±1μs   
              ===================== ============= =============

[ 94.57%] ··· io.CSV.time_read_csv                                              ok
[ 94.57%] ··· ================= ============
                    param1                  
              ----------------- ------------
               single-threaded    279±4ms   
                  processes      1.96±0.02s 
                   threads        306±2ms   
              ================= ============

[ 95.65%] ··· io.CSV.time_read_csv_meta                                         ok
[ 95.65%] ··· ================= ============
                    param1                  
              ----------------- ------------
               single-threaded   28.3±0.1ms 
                  processes      29.2±0.4ms 
                   threads       28.7±0.3ms 
              ================= ============

[ 96.74%] ··· io.HDF5.time_read_hdf5                                            ok
[ 96.74%] ··· ================= =========
                    param1               
              ----------------- ---------
               single-threaded   393±3ms 
                  processes        n/a   
                   threads       409±9ms 
              ================= =========

[ 97.83%] ··· io.HDF5.time_read_hdf5_meta                                       ok
[ 97.83%] ··· ================= ===========
                    param1                 
              ----------------- -----------
               single-threaded   258±0.9ms 
                  processes         n/a    
                   threads       259±0.6ms 
              ================= ===========

[ 98.91%] ··· io.Parquet.time_optimize_getitem                          37.9±0.9ms
[100.00%] ··· io.Parquet.time_read_getitem_projection                     693±20ms

This PR

root@dt07:/cudf/dask-benchmarks/dask# asv run
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py3.7-cloudpickle-distributed-fsspec-numpy-pandas-partd-pip+cityhash-pip+xxhash-pyarrow-pytables-s3fs-scipy-toolz.
·· Building 14708abf for conda-py3.7-cloudpickle-distributed-fsspec-numpy-pandas-partd-pip+cityhash-pip+xxhash-pyarrow-pytables-s3fs-scipy-toolz...
·· Installing 14708abf into conda-py3.7-cloudpickle-distributed-fsspec-numpy-pandas-partd-pip+cityhash-pip+xxhash-pyarrow-pytables-s3fs-scipy-toolz..
· Running 46 total benchmarks (1 commits * 1 environments * 46 benchmarks)
[  0.00%] · For dask commit 14708abf:
[  0.00%] ·· Benchmarking conda-py3.7-cloudpickle-distributed-fsspec-numpy-pandas-partd-pip+cityhash-pip+xxhash-pyarrow-pytables-s3fs-scipy-toolz
[  1.09%] ··· Running (array.BlockInfoBlockwise.time_compute--)........
[  9.78%] ··· Running (array.Slicing.time_slice_int_tail--)........
[ 18.48%] ··· Running (dataframe.MemoryDataFrame.time_groupby--)......
[ 25.00%] ··· Running (optimization.Inline.time_inline_constants--)......
[ 31.52%] ··· Running (order.OrderCholeskyMixed.time_order_cholesky_mixed_lower--)...
[ 34.78%] ··· Running (order.OrderLinearFull.time_order_linear_full--)....
[ 39.13%] ··· Running (order.OrderRechunkTranspose.time_order_rechunk_transpose--).....
[ 44.57%] ··· Setting up io.py:33                                               ok
[ 44.57%] ··· Running (io.CSV.time_read_csv--)..
[ 46.74%] ··· Setting up io.py:58                                               ok
[ 46.74%] ··· Running (io.HDF5.time_read_hdf5--)..
[ 48.91%] ··· Setting up io.py:85                                               ok
[ 48.91%] ··· Running (io.Parquet.time_optimize_getitem--)..
[ 51.09%] ··· array.BlockInfoBlockwise.time_compute                     3.62±0.02s
[ 52.17%] ··· array.BlockInfoBlockwise.time_optimize                    21.3±0.7ms
[ 53.26%] ··· array.BlockInfoSingleton.time_optimize_singleton             134±3μs
[ 54.35%] ··· array.Blockwise.time_make_blockwise_graph                 2.63±0.01s
[ 55.43%] ··· array.FancyIndexing.time_fancy                            27.7±0.8ms
[ 56.52%] ··· array.Rechunk.time_rechunk                                26.4±0.9ms
[ 57.61%] ··· array.Rechunk.time_rechunk_meta                           10.2±0.1ms
[ 58.70%] ··· array.Slicing.time_slice_int_head                         10.6±0.5ms
[ 59.78%] ··· array.Slicing.time_slice_int_tail                         10.5±0.6ms
[ 60.87%] ··· array.Slicing.time_slice_slice_head                         20.3±1ms
[ 61.96%] ··· array.Slicing.time_slice_slice_tail                       20.4±0.9ms
[ 63.04%] ··· array.Slicing.time_slices_from_chunks                     1.16±0.01s
[ 64.13%] ··· array.TestSubs.time_subs                                  43.6±0.4ms
[ 65.22%] ··· array_overlap.MapOverlap.time_map_overlap                         ok
[ 65.22%] ··· ================= ========== ============
                    shape        boundary              
              ----------------- ---------- ------------
               (100, 100, 100)   reflect    61.5±0.7ms 
               (100, 100, 100)   periodic   62.2±0.6ms 
               (100, 100, 100)   nearest    62.7±0.6ms 
               (100, 100, 100)     none     15.7±0.3ms 
                (50, 512, 512)   reflect     69.3±2ms  
                (50, 512, 512)   periodic    69.9±2ms  
                (50, 512, 512)   nearest     70.6±1ms  
                (50, 512, 512)     none      23.1±1ms  
              ================= ========== ============

[ 66.30%] ··· dataframe.MemoryDataFrame.time_boolean_indexing              132±9ms
[ 67.39%] ··· dataframe.MemoryDataFrame.time_count_values               41.0±0.8ms
[ 68.48%] ··· dataframe.MemoryDataFrame.time_groupby                       201±9ms
[ 69.57%] ··· dataframe.MemoryDataFrame.time_reduction                     133±4ms
[ 70.65%] ··· dataframe.MemoryDataFrame.time_scalar_comparison          12.4±0.6ms
[ 71.74%] ··· dataframe.MemoryDataFrame.time_set_index                     326±6ms
[ 72.83%] ··· optimization.Cull.time_cull                                  172±1ms
[ 73.91%] ··· optimization.Fuse.time_fuse                                       ok
[ 73.91%] ··· ========= =========
                param1           
              --------- ---------
               diamond   372±2ms 
                linear   175±2ms 
              ========= =========

[ 75.00%] ··· optimization.Inline.time_inline_constants                 42.7±0.6ms
[ 76.09%] ··· optimization.Inline.time_inline_functions                    265±5ms
[ 77.17%] ··· optimization.Inline.time_inline_keys                         171±3ms
[ 78.26%] ··· order.OrderCholesky.time_order_cholesky                      488±9ms
[ 79.35%] ··· order.OrderCholesky.time_order_cholesky_lower                466±3ms
[ 80.43%] ··· ...r.OrderCholeskyMixed.time_order_cholesky_mixed            376±7ms
[ 81.52%] ··· ...rCholeskyMixed.time_order_cholesky_mixed_lower            337±3ms
[ 82.61%] ··· order.OrderFullLayers.time_order_full_layers                      ok
[ 82.61%] ··· ============ =========
                 param1             
              ------------ ---------
               (1, 50000)   506±8ms 
               (2, 10000)   365±6ms 
               (10, 1000)   369±2ms 
               (100, 20)    419±3ms 
                (500, 2)    575±6ms 
               (9999, 1)    119±2ms 
               (50000, 1)   400±4ms 
              ============ =========

[ 83.70%] ··· order.OrderLinalgSolves.time_order_linalg_solve              334±2ms
[ 84.78%] ··· order.OrderLinearFull.time_order_linear_full              1.44±0.01s
[ 85.87%] ··· ...rLinearWithDanglers.time_order_linear_danglers                 ok
[ 85.87%] ··· ============ ===========
                 param1               
              ------------ -----------
               (2, 10000)    189±2ms  
               (5, 5000)    268±0.7ms 
              ============ ===========

[ 86.96%] ··· ...r.OrderManySubgraphs.time_order_many_subgraphs                 ok
[ 86.96%] ··· =========== =========
                 param1            
              ----------- ---------
               (1, 9999)   228±2ms 
               (3, 3333)   240±3ms 
               (10, 999)   257±7ms 
               (30, 303)   252±6ms 
               (100, 99)   275±3ms 
               (999, 10)   284±2ms 
              =========== =========

[ 88.04%] ··· order.OrderMapOverlap.time_order_mapoverlap                       ok
[ 88.04%] ··· =========================================== =========
                                 param1                            
              ------------------------------------------- ---------
                  ((10000.0, 10000.0), (200, 200), 1)      650±7ms 
               ((1000, 1000, 1000), (100, 100, 100), 10)   696±5ms 
              =========================================== =========

[ 89.13%] ··· ...rRechunkTranspose.time_order_rechunk_transpose           801±10ms
[ 90.22%] ··· order.OrderSVD.time_order_svd                             84.0±0.5ms
[ 91.30%] ··· tokenize.TokenizeBuiltins.time_tokenize                    192±0.9ms
[ 92.39%] ··· tokenize.TokenizeNumpy.time_tokenize                              ok
[ 92.39%] ··· ======== =============
               dtype                
              -------- -------------
                int       589±7μs
               float      580±6μs   
                str     2.39±0.05ms 
               bytes      429±8μs   
               object     89.8±2ms
              ======== =============

[ 93.48%] ··· tokenize.TokenizePandas.time_tokenize                             ok
[ 93.48%] ··· ===================== ============ =============
              --                            as_series         
              --------------------- --------------------------
                      dtype             True         False    
              ===================== ============ =============
                      period          62.1±1μs    30.4±0.02μs 
                  datetime64[ns]      59.6±2μs     36.8±0.9μs 
               datetime64[ns, CET]    64.4±2μs      32.8±2μs  
                       int           63.2±0.1μs   24.5±0.03μs 
                     category         209±7μs       48.6±2μs  
                      sparse         69.0±0.2μs    30.3±0.1μs 
                       Int            1.23±0ms    1.20±0.02ms 
                      string          463±6μs       409±7μs   
                     boolean          862±1μs       826±8μs   
              ===================== ============ =============

[ 94.57%] ··· io.CSV.time_read_csv                                              ok
[ 94.57%] ··· ================= ============
                    param1                  
              ----------------- ------------
               single-threaded    280±1ms   
                  processes      2.00±0.01s 
                   threads        308±2ms   
              ================= ============

[ 95.65%] ··· io.CSV.time_read_csv_meta                                         ok
[ 95.65%] ··· ================= =============
                    param1                   
              ----------------- -------------
               single-threaded    29.0±0.1ms 
                  processes       29.1±0.8ms 
                   threads       28.3±0.09ms 
              ================= =============

[ 96.74%] ··· io.HDF5.time_read_hdf5                                            ok
[ 96.74%] ··· ================= =========
                    param1               
              ----------------- ---------
               single-threaded   396±2ms 
                  processes        n/a   
                   threads       409±4ms 
              ================= =========

[ 97.83%] ··· io.HDF5.time_read_hdf5_meta                                       ok
[ 97.83%] ··· ================= =========
                    param1               
              ----------------- ---------
               single-threaded   263±4ms 
                  processes        n/a   
                   threads       265±5ms 
              ================= =========

[ 98.91%] ··· io.Parquet.time_optimize_getitem                          37.7±0.3ms
[100.00%] ··· io.Parquet.time_read_getitem_projection                     691±20ms

Except for dataframe.MemoryDataFrame.time_boolean_indexing, all the numbers look improved or reasonable to me.

@kkraus14
Member

kkraus14 commented May 6, 2020

@galipremsagar is there any chance you could organize the above into a table? I'm struggling to read the ASV output above so I imagine others will as well :).

@galipremsagar
Contributor Author

Apologies for not organizing it in a table in the first place.

| Test | Master | This PR |
| --- | --- | --- |
| array.BlockInfoBlockwise.time_compute | 3.67±0.02s | 3.62±0.02s |
| array.BlockInfoBlockwise.time_optimize | 21.3±0.2ms | 21.3±0.7ms |
| array.BlockInfoSingleton.time_optimize_singleton | 140±4μs | 134±3μs |
| array.Blockwise.time_make_blockwise_graph | 2.70±0.02s | 2.63±0.01s |
| array.FancyIndexing.time_fancy | 28.8±0.2ms | 27.7±0.8ms |
| array.Rechunk.time_rechunk | 25.0±0.6ms | 26.4±0.9ms |
| array.Rechunk.time_rechunk_meta | 10.2±0.03ms | 10.2±0.1ms |
| array.Slicing.time_slice_int_head | 8.85±0.4ms | 10.6±0.5ms |
| array.Slicing.time_slice_int_tail | 10.6±0.6ms | 10.5±0.6ms |
| array.Slicing.time_slice_slice_head | 20.7±1ms | 20.3±1ms |
| array.Slicing.time_slice_slice_tail | 21.0±1ms | 20.4±0.9ms |
| array.Slicing.time_slices_from_chunks | 1.19±0.01s | 1.16±0.01s |
| array.TestSubs.time_subs | 43.8±1ms | 43.6±0.4ms |
| dataframe.MemoryDataFrame.time_boolean_indexing | 126±6ms | 132±9ms |
| dataframe.MemoryDataFrame.time_count_values | 39.6±1ms | 41.0±0.8ms |
| dataframe.MemoryDataFrame.time_groupby | 203±8ms | 201±9ms |
| dataframe.MemoryDataFrame.time_reduction | 137±20ms | 133±4ms |
| dataframe.MemoryDataFrame.time_scalar_comparison | 12.5±0.7ms | 12.4±0.6ms |
| dataframe.MemoryDataFrame.time_set_index | 324±10ms | 326±6ms |
| optimization.Cull.time_cull | 191±1ms | 172±1ms |
| optimization.Inline.time_inline_constants | 43.2±0.5ms | 42.7±0.6ms |
| optimization.Inline.time_inline_functions | 259±4ms | 265±5ms |
| optimization.Inline.time_inline_keys | 168±4ms | 171±3ms |
| order.OrderCholesky.time_order_cholesky | 496±4ms | 488±9ms |
| order.OrderCholesky.time_order_cholesky_lower | 474±6ms | 466±3ms |
| ...r.OrderCholeskyMixed.time_order_cholesky_mixed | 384±7ms | 376±7ms |
| ...rCholeskyMixed.time_order_cholesky_mixed_lower | 354±10ms | 337±3ms |
| order.OrderLinalgSolves.time_order_linalg_solve | 340±6ms | 334±2ms |
| order.OrderLinearFull.time_order_linear_full | 1.48±0.01s | 1.44±0.01s |
| ...rRechunkTranspose.time_order_rechunk_transpose | 797±10ms | 801±10ms |
| order.OrderSVD.time_order_svd | 83.8±0.2ms | 84.0±0.5ms |
| tokenize.TokenizeBuiltins.time_tokenize | 197±1ms | 192±0.9ms |
| io.Parquet.time_optimize_getitem | 37.9±0.9ms | 37.7±0.3ms |
| io.Parquet.time_read_getitem_projection | 693±20ms | 691±20ms |

array_overlap.MapOverlap.time_map_overlap

| shape | boundary | Master | This PR |
| --- | --- | --- | --- |
| (100, 100, 100) | reflect | 62.1±0.7ms | 61.5±0.7ms |
| (100, 100, 100) | periodic | 61.0±1ms | 62.2±0.6ms |
| (100, 100, 100) | nearest | 62.4±0.8ms | 62.7±0.6ms |
| (100, 100, 100) | none | 15.5±0.3ms | 15.7±0.3ms |
| (50, 512, 512) | reflect | 69.8±2ms | 69.3±2ms |
| (50, 512, 512) | periodic | 68.5±2ms | 69.9±2ms |
| (50, 512, 512) | nearest | 71.2±2ms | 70.6±1ms |
| (50, 512, 512) | none | 21.4±1ms | 23.1±1ms |

optimization.Fuse.time_fuse

| param1 | Master | This PR |
| --- | --- | --- |
| diamond | 381±6ms | 372±2ms |
| linear | 175±3ms | 175±2ms |

order.OrderFullLayers.time_order_full_layers

| param1 | Master | This PR |
| --- | --- | --- |
| (1, 50000) | 513±6ms | 506±8ms |
| (2, 10000) | 383±3ms | 365±6ms |
| (10, 1000) | 372±3ms | 369±2ms |
| (100, 20) | 416±3ms | 419±3ms |
| (500, 2) | 571±6ms | 575±6ms |
| (9999, 1) | 115±2ms | 119±2ms |
| (50000, 1) | 397±2ms | 400±4ms |

...rLinearWithDanglers.time_order_linear_danglers

| param1 | Master | This PR |
| --- | --- | --- |
| (2, 10000) | 191±2ms | 189±2ms |
| (5, 5000) | 265±2ms | 268±0.7ms |

...r.OrderManySubgraphs.time_order_many_subgraphs

| param1 | Master | This PR |
| --- | --- | --- |
| (1, 9999) | 233±4ms | 228±2ms |
| (3, 3333) | 239±6ms | 240±3ms |
| (10, 999) | 258±4ms | 257±7ms |
| (30, 303) | 255±1ms | 252±6ms |
| (100, 99) | 277±3ms | 275±3ms |
| (999, 10) | 295±6ms | 284±2ms |

order.OrderMapOverlap.time_order_mapoverlap

| param1 | Master | This PR |
| --- | --- | --- |
| ((10000.0, 10000.0), (200, 200), 1) | 666±5ms | 650±7ms |
| ((1000, 1000, 1000), (100, 100, 100), 10) | 707±5ms | 696±5ms |

tokenize.TokenizeNumpy.time_tokenize

| dtype | Master | This PR |
| --- | --- | --- |
| int | 606±3μs | 589±7μs |
| float | 602±10μs | 580±6μs |
| str | 2.43±0.05ms | 2.39±0.05ms |
| bytes | 447±9μs | 429±8μs |
| object | 91.8±2ms | 89.8±2ms |

tokenize.TokenizePandas.time_tokenize

| dtype | as_series=True (Master) | as_series=True (This PR) | as_series=False (Master) | as_series=False (This PR) |
| --- | --- | --- | --- | --- |
| period | 62.7±1μs | 62.1±1μs | 29.8±0.7μs | 30.4±0.02μs |
| datetime64[ns] | 59.9±2μs | 59.6±2μs | 36.6±0.5μs | 36.8±0.9μs |
| datetime64[ns, CET] | 62.0±0.9μs | 64.4±2μs | 32.2±0.1μs | 32.8±2μs |
| int | 64.9±0.08μs | 63.2±0.1μs | 24.3±1μs | 24.5±0.03μs |
| category | 202±2μs | 209±7μs | 48.5±0.3μs | 48.6±2μs |
| sparse | 68.6±0.2μs | 69.0±0.2μs | 30.3±0.05μs | 30.3±0.1μs |
| Int | 1.23±0.01ms | 1.23±0ms | 1.19±0.01ms | 1.20±0.02ms |
| string | 465±4μs | 463±6μs | 411±6μs | 409±7μs |
| boolean | 871±10μs | 862±1μs | 820±1μs | 826±8μs |

io.CSV.time_read_csv

| param1 | Master | This PR |
| --- | --- | --- |
| single-threaded | 279±4ms | 280±1ms |
| processes | 1.96±0.02s | 2.00±0.01s |
| threads | 306±2ms | 308±2ms |

io.CSV.time_read_csv_meta

| param1 | Master | This PR |
| --- | --- | --- |
| single-threaded | 28.3±0.1ms | 29.0±0.1ms |
| processes | 29.2±0.4ms | 29.1±0.8ms |
| threads | 28.7±0.3ms | 28.3±0.09ms |

io.HDF5.time_read_hdf5

| param1 | Master | This PR |
| --- | --- | --- |
| single-threaded | 393±3ms | 396±2ms |
| processes | n/a | n/a |
| threads | 409±9ms | 409±4ms |

io.HDF5.time_read_hdf5_meta

| param1 | Master | This PR |
| --- | --- | --- |
| single-threaded | 258±0.9ms | 263±4ms |
| processes | n/a | n/a |
| threads | 259±0.6ms | 265±5ms |

@kkraus14
Member

kkraus14 commented May 6, 2020

Thanks @galipremsagar!

@quasiben
Member

quasiben commented May 6, 2020

When I scanned through this only 3 items stood out and even then they are fairly close:

| Test | Master | This PR |
| --- | --- | --- |
| optimization.Cull.time_cull | 191±1ms | 172±1ms |
| map_overlap | 21.4±1ms | 23.1±1ms |
| order.OrderFullLayers.time_order_full_layers (2, 10000) | 383±3ms | 365±6ms |

If someone else has time, it would be good to recheck whether there are any other outliers in the ASV benchmarks.

@TomAugspurger
Member

TomAugspurger commented May 6, 2020

Just FYI, our ASVs likely aren't sensitive enough to pick up the gains here. For context, in the pandas benchmarks I watch for regressions, I ignore any differences smaller than ~30-40%.

We might be able to construct a micro-benchmark that targets just graph construction improvements here and have a reasonable chance of seeing these improvements.
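One way such a micro-benchmark could look: the sketch below (plain `timeit`, not dask's actual ASV suite; the function names, `TOKEN`, and `N` are illustrative) isolates the cost difference between concatenating a key-name prefix inside a graph-construction loop versus hoisting it out, which is the pattern this PR changes.

```python
import timeit

TOKEN = "a1b2c3"  # stand-in for a tokenize() hash
N = 10_000        # number of task keys to build


def build_keys_concat_in_loop():
    # Pre-PR pattern: the prefix + token concatenation runs once per key.
    return {("shuffle-join-" + TOKEN, i): None for i in range(N)}


def build_keys_hoisted():
    # Post-PR pattern: concatenate once, reuse the resulting name.
    name = "shuffle-join-" + TOKEN
    return {(name, i): None for i in range(N)}


# Both variants build identical graphs; only construction cost differs.
assert build_keys_concat_in_loop() == build_keys_hoisted()

slow = timeit.timeit(build_keys_concat_in_loop, number=200)
fast = timeit.timeit(build_keys_hoisted, number=200)
print(f"in-loop: {slow:.3f}s  hoisted: {fast:.3f}s")
```

Wrapping the two builders in an ASV benchmark class would then let regressions in graph construction show up directly rather than being buried inside end-to-end compute timings.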

@quasiben
Member

quasiben commented May 7, 2020

Thanks @TomAugspurger. I think for this PR we are good to merge. However, we'd like to pick up adding a micro-benchmark for graph construction, if that's ok with you?

@TomAugspurger
Member

TomAugspurger commented May 7, 2020 via email

@quasiben
Member

quasiben commented May 7, 2020

Thanks. I'll plan to merge tomorrow morning unless anyone else has concerns

@quasiben
Member

quasiben commented May 8, 2020

Thanks again all. Merging in
