Perf: Dataframe with_column and with_column_renamed are slow #14563

@Omega359

Describe the bug

The DataFrame functions .with_column and .with_column_renamed (and possibly others) are slow. This is most visible on DataFrames with many columns, where a .with_column call can take seconds.

Related: #7698

     Running benches/dataframe.rs (target/release/deps/dataframe-daaaeac4dbc7597d)
Gnuplot not found, using plotters backend
with_column_10          time:   [6.2305 ms 6.2916 ms 6.4057 ms]
                        change: [+15.699% +18.873% +21.723%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

Benchmarking with_column_100: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 13.5s.
with_column_100         time:   [1.3307 s 1.3633 s 1.3996 s]
                        change: [+8.2536% +11.473% +14.825%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

Benchmarking with_column_200: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 235.2s.
with_column_200         time:   [23.350 s 25.011 s 27.300 s]
                        change: [-46.710% -42.652% -37.407%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high severe
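
To make the growth in the numbers above easier to see, here is a small sketch that computes the slowdown between consecutive benchmarks from the middle (point-estimate) values Criterion printed. It assumes the numeric suffix in each benchmark name (10, 100, 200) is the problem size, which the output itself does not confirm, and `scaling_exponent` is a helper name made up for this illustration.

```rust
/// Empirical exponent k, assuming time grows roughly like n^k between two measurements.
/// (Hypothetical helper for illustration only.)
fn scaling_exponent(n1: f64, t1: f64, n2: f64, t2: f64) -> f64 {
    (t2 / t1).ln() / (n2 / n1).ln()
}

fn main() {
    // (assumed size from the benchmark name, point estimate in seconds)
    let runs = [(10.0, 6.2916e-3), (100.0, 1.3633), (200.0, 25.011)];

    // Compare each benchmark with the next larger one.
    for pair in runs.windows(2) {
        let (n1, t1) = pair[0];
        let (n2, t2) = pair[1];
        println!(
            "{n1} -> {n2}: {:.1}x slower, exponent ~{:.2}",
            t2 / t1,
            scaling_exponent(n1, t1, n2, t2)
        );
    }
}
```

Under that assumption, going from 10 to 100 is roughly 217x slower for 10x the size, and going from 100 to 200 is roughly 18x slower for 2x the size, so the cost grows much faster than linearly.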

To Reproduce

Just time the function calls. A PR for a benchmark will be coming soon.
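
Until the benchmark lands, a minimal way to time a call is a wrapper around `std::time::Instant`. The sketch below uses only the standard library; `time_it` is a made-up helper name, and the closure body is a stand-in workload that would be replaced by the DataFrame calls under test (for example, a chain of with_column calls on a wide DataFrame).

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed wall-clock time.
/// (Hypothetical helper for illustration only.)
fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

fn main() {
    // Stand-in workload: replace this closure body with the call being measured.
    let (sum, elapsed) = time_it(|| (0..1_000_000u64).sum::<u64>());
    println!("result = {sum}, took {elapsed:?}");
}
```

A single run like this is noisy. It is enough to see a call that takes seconds, but a harness such as Criterion (used for the output above) is better suited to tracking regressions.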

Expected behavior

DataFrame function calls should be fast, on par with other operations in DataFusion.

Additional context

No response

Labels: bug