Closed
Labels: bug (Something isn't working)
Description
Describe the bug
The DataFrame methods .with_column and .with_column_renamed (and possibly others) are slow. The problem is most visible on DataFrames with many columns, where a single .with_column call can take seconds.
Related: #7698
Running benches/dataframe.rs (target/release/deps/dataframe-daaaeac4dbc7597d)
Gnuplot not found, using plotters backend
with_column_10 time: [6.2305 ms 6.2916 ms 6.4057 ms]
change: [+15.699% +18.873% +21.723%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
Benchmarking with_column_100: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 13.5s.
with_column_100 time: [1.3307 s 1.3633 s 1.3996 s]
change: [+8.2536% +11.473% +14.825%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
Benchmarking with_column_200: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 235.2s.
with_column_200 time: [23.350 s 25.011 s 27.300 s]
change: [-46.710% -42.652% -37.407%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
2 (20.00%) high severe
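One way to quantify how steeply the cost grows is to compute the log-log slope between the benchmark points, which gives the empirical exponent k in time ~ columns^k over each interval. The sketch below is a hedged example. It assumes the numeric suffix in with_column_10/100/200 is the number of columns, which the output above does not state. It also uses criterion's middle (point) estimate from each "time" line. POINTS and loglog_slope are illustrative names, not part of the benchmark.

```rust
/// Criterion point estimates from the run above: (columns, seconds).
/// ASSUMPTION: the benchmark-name suffix is the column count.
const POINTS: [(f64, f64); 3] = [(10.0, 0.0062916), (100.0, 1.3633), (200.0, 25.011)];

/// Slope of ln(time) vs ln(columns) between two points, i.e. the
/// empirical exponent k in time ~ columns^k over that interval.
fn loglog_slope((n1, t1): (f64, f64), (n2, t2): (f64, f64)) -> f64 {
    (t2 / t1).ln() / (n2 / n1).ln()
}

fn main() {
    // Compare each consecutive pair of benchmark points.
    for w in POINTS.windows(2) {
        let ratio = w[1].1 / w[0].1;
        let k = loglog_slope(w[0], w[1]);
        println!(
            "{} -> {} columns: time ratio {:.1}x, exponent ~{:.2}",
            w[0].0, w[1].0, ratio, k
        );
    }
}
```

With these numbers, the exponent is about 2.3 from 10 to 100 columns and about 4.2 from 100 to 200. Growth is therefore worse than quadratic over this range. Three noisy points are only a rough indication of the trend.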
To Reproduce
Time the .with_column or .with_column_renamed calls on a DataFrame with many columns. A PR adding a benchmark will follow soon.
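A minimal stdlib-only timing harness for such measurements might look like the sketch below. It is hypothetical. median_times is an illustrative helper, and the nested loop in main is a placeholder workload, not DataFusion code. In real use, the closure would build a DataFrame with n columns and call .with_column on it.

```rust
use std::time::{Duration, Instant};

/// Runs `work(n)` `reps` times for each width `n` and keeps the median
/// duration. Everything inside the closure is timed, including any setup
/// (such as constructing a DataFrame) done there.
fn median_times<F: FnMut(usize)>(widths: &[usize], reps: usize, mut work: F) -> Vec<(usize, Duration)> {
    assert!(reps > 0, "need at least one repetition");
    let mut out = Vec::with_capacity(widths.len());
    for &n in widths {
        let mut samples = Vec::with_capacity(reps);
        for _ in 0..reps {
            let start = Instant::now();
            work(n);
            samples.push(start.elapsed());
        }
        // Median is less sensitive to outliers than the mean.
        samples.sort();
        out.push((n, samples[reps / 2]));
    }
    out
}

fn main() {
    // Placeholder workload (hypothetical): a quadratic loop standing in
    // for the operation under test.
    let results = median_times(&[10, 100, 200], 5, |n| {
        let mut acc = 0u64;
        for i in 0..n {
            for j in 0..n {
                acc = acc.wrapping_add((i * j) as u64);
            }
        }
        std::hint::black_box(acc);
    });
    for (n, t) in &results {
        println!("width {n}: median {t:?}");
    }
}
```

The median is used here because single runs at large widths can include outliers, like the "high severe" samples reported above.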
Expected behavior
DataFrame method calls should be fast, on par with other operations in DataFusion.
Additional context
No response