[Data] Ray Data currently lacks support for displaying execution plans and does not provide operator-level metrics within those plans

### What happened + What you expected to happen

Ray Data currently lacks support for displaying execution plans and does not provide operator-level metrics within those plans.

Specifically:
1. No Standard API for Execution Plans: Ray Data does not offer a standard API (e.g., an explain() method) to visualize the execution plan, encompassing both the logical and physical plans.

2. Limited Plan Visibility: While print(ds) can display aspects of the logical plan, it does not reveal the physical execution plan.

3. Insufficient Operator-Level Metrics: The execution plan view lacks detailed metrics for individual operators.  This makes it difficult to assess critical performance aspects such as:
The volume of data processed by each operator.
Whether predicate pushdown optimizations have been successfully applied.

4. Coarse-Grained Dataset Statistics: The metrics provided by the Dataset.stats() API are not sufficiently granular.  They are reported at the block level and lack comprehensive, global statistics like total input/output row counts.
```
Operator 1 ReadLance->Filter(NoneType): 200 tasks executed, 200 blocks produced in 29967.49s
* Remote wall time: 92.49ms min, 2.92s max, 1.22s mean, 243.21s total
* Remote cpu time: 7.36ms min, 418.41ms max, 208.63ms mean, 41.73s total
* UDF time: 305.39us min, 1.28s max, 150.0ms mean, 30.0s total
* Peak heap memory usage (MiB): 407.04 min, 683.52 max, 527 mean
* Output num rows per block: 0 min, 1 max, 0 mean, 1 total
* Output size bytes per block: 0 min, 18327 max, 91 mean, 18327 total
* Output rows per task: 0 min, 1 max, 0 mean, 200 tasks used
* Tasks per node: 14 min, 25 max, 20 mean; 10 nodes used
* Operator throughput:
        * Ray Data throughput: 3.336949276338555e-05 rows/s
        * Estimated single node throughput: 0.00411168768564564 rows/s

Operator 2 limit=20: 200 tasks executed, 200 blocks produced in 29967.49s
* Remote wall time: 92.49ms min, 2.92s max, 1.22s mean, 243.21s total
* Remote cpu time: 7.36ms min, 418.41ms max, 208.63ms mean, 41.73s total
* UDF time: 305.39us min, 1.28s max, 150.0ms mean, 30.0s total
* Peak heap memory usage (MiB): 407.04 min, 683.52 max, 527 mean
* Output num rows per block: 0 min, 1 max, 0 mean, 1 total
* Output size bytes per block: 0 min, 18327 max, 91 mean, 18327 total
* Output rows per task: 0 min, 1 max, 0 mean, 200 tasks used
* Tasks per node: 14 min, 25 max, 20 mean; 10 nodes used
* Operator throughput:
        * Ray Data throughput: 3.336949276338555e-05 rows/s
        * Estimated single node throughput: 0.00411168768564564 rows/s

Dataset iterator time breakdown:
* Total time overall: 7.87s
    * Total time in Ray Data iterator initialization code: 713.05ms
    * Total time user thread is blocked by Ray Data iter_batches: 7.15s
    * Total execution time for user thread: 1.65ms
* Batch iteration time breakdown (summed across prefetch threads):
    * In ray.get(): 150.7us min, 6.41ms max, 636.5us avg, 127.3ms total
    * In batch creation: 4.48us min, 4.48us max, 4.48us avg, 4.48us total
    * In batch formatting: 26.3us min, 26.3us max, 26.3us avg, 26.3us total

Dataset throughput:
        * Ray Data throughput: 3.336949276338555e-05 rows/s
        * Estimated single node throughput: 0.00205584384282282 rows/s
```


Is it because the detailed execution plan and execution metrics were not considered in the design of ray data, or is it just not supported for the time being?


### Versions / Dependencies

ray 2.40.0

### Reproduction script

None

### Issue Severity

None

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Data] Ray Data currently lacks support for displaying execution plans and does not provide operator-level metrics within those plans #55052

What happened + What you expected to happen

Versions / Dependencies

Reproduction script

Issue Severity

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Data] Ray Data currently lacks support for displaying execution plans and does not provide operator-level metrics within those plans #55052

Description

What happened + What you expected to happen

Versions / Dependencies

Reproduction script

Issue Severity

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions