Investigate `compute_as_if_collection` for performance issues

In [coiled/coiled-runtime#79](https://github.com/coiled/coiled-runtime/issues/79) we identified a case where writing a dataframe with `to_parquet(..., compute=True)` was *much* slower (~10x) than passing `compute=False` and then calling `compute()` on the resulting scalar.
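For anyone picking this up, here is a rough sketch of how the two call styles could be timed side by side. It is not the benchmark from the linked issue: `best_time` and `compare_write_paths` are hypothetical helper names, `dask.datasets.timeseries()` is only a stand-in for the real workload, and the output paths are placeholders. It assumes `dask[dataframe]` and a parquet engine such as `pyarrow` are installed.

```python
import time
from typing import Callable, Dict


def best_time(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the fastest wall-clock time (in seconds) over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare_write_paths(path_eager: str, path_lazy: str) -> Dict[str, float]:
    """Time `to_parquet` with compute=True vs. compute=False + .compute().

    Hypothetical helper: the dataset and paths are placeholders, not the
    workload from the original report.
    """
    # Imported lazily so that `best_time` stays usable without dask.
    import dask.datasets

    # Small demo dataframe shipped with dask; stands in for real data.
    ddf = dask.datasets.timeseries()

    def eager() -> None:
        ddf.to_parquet(path_eager, compute=True)

    def lazy() -> None:
        # Assumes the object returned with compute=False exposes .compute().
        ddf.to_parquet(path_lazy, compute=False).compute()

    # One run each, so every call writes into a fresh directory.
    return {
        "compute=True": best_time(eager, repeats=1),
        "compute=False + compute()": best_time(lazy, repeats=1),
    }
```

Calling `compare_write_paths("/tmp/eager.parquet", "/tmp/lazy.parquet")` returns the two timings in a dict. A large gap between them on a given dask version would suggest the discrepancy is still reproducible there.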

The two code paths differed more than they needed to: one used `dask.base.compute_as_if_collection`, and the other used a `dd.Scalar` directly. In #8982 we consolidated them to just use `Scalar`, which seems to have fixed the issue. However, it's still concerning that `compute_as_if_collection` performed so poorly, since it is used in a number of places throughout the codebase. It could be that some optimizations are not surviving the process.

Opening this issue to track follow-up investigations to #8982.

Investigate `compute_as_if_collection` for performance issues #8991

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Investigate compute_as_if_collection for performance issues #8991

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Investigate `compute_as_if_collection` for performance issues #8991