Prepared physical plan reusage #14342

## Problem ## 
While implementing storage of prepared statements in our DataFusion-based storage system, we encountered the following issue:

It is inefficient to save the logical plan and rebuild the physical plan from it at execution time.
The physical planner and optimizer are quite heavy, and for large queries with many columns, building the physical plan can take 100-200 ms.

After reading the optimizer's code, I realized that it includes a lot of complex logic, which is generally quite difficult to optimize. Moreover, the time required for plan construction is an uncontrollable variable, as the optimizations can be arbitrarily complex in order to produce a good plan.

The current API does not allow a physical plan to be reused across multiple executions, for the following reasons:

1. Metrics. Since metrics are stored within the plan nodes, all streams created from a plan share a pointer to the same metrics during execution. If a physical plan is reused, concurrent streams contend on writes to those metrics, which hurts scalability.

2. Lack of physical placeholders. This makes it impossible to substitute parameters at the `execute(...)` stage.
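The metrics problem in point 1 can be illustrated with a minimal, standard-library-only sketch. `PlanNode` and `run_concurrently` are hypothetical names invented for this illustration; they are not DataFusion types and only model a node that owns its metrics.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a plan node that owns its metrics.
/// This is an assumption made for illustration, not a DataFusion type.
struct PlanNode {
    output_rows: Arc<AtomicUsize>,
}

impl PlanNode {
    fn new() -> Self {
        Self { output_rows: Arc::new(AtomicUsize::new(0)) }
    }

    /// Each execution clones a pointer to the same counter and writes to it.
    fn execute(&self, rows: usize) {
        let metrics = Arc::clone(&self.output_rows);
        for _ in 0..rows {
            metrics.fetch_add(1, Ordering::Relaxed);
        }
    }
}

/// Runs two executions of one plan in parallel and returns the shared counter.
fn run_concurrently() -> usize {
    let plan = PlanNode::new();
    std::thread::scope(|s| {
        s.spawn(|| plan.execute(1_000));
        s.spawn(|| plan.execute(2_000));
    });
    plan.output_rows.load(Ordering::Relaxed)
}
```

Both executions write to one counter, so the result is the merged total of 3000 rows, and neither execution can read its own row count. Every increment also touches the same memory location, which is the contention described above.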

## Possible solution ## 
To address these issues, I created experimental patches that:

1. Move metric storage out of the execution plans and into `TaskContext`. Each set of metrics is associated with a node of the `ExecutionPlan`; for example, one set can be associated with a `ProjectionExec` node.

2. Introduce physical placeholders, implemented as a `PhysicalExpr`. These placeholders are resolved at the `execute(...)` stage by fetching parameter values from the `TaskContext`.
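The two ideas above can be sketched together in a simplified, standard-library-only model. All names here (`ExecContext`, `Expr`, `FilterNode`, `reuse_plan`) are hypothetical and do not reflect the actual patches or the DataFusion API; the sketch only assumes that a per-execution context carries both the parameter values and the metrics, keyed by a node id.

```rust
use std::collections::HashMap;

/// Hypothetical per-execution context (loosely modelling `TaskContext`).
#[derive(Default)]
struct ExecContext {
    params: Vec<i64>,               // values for $1, $2, ...
    metrics: HashMap<usize, usize>, // node id -> output row count
}

/// Hypothetical physical expression over a single integer column.
enum Expr {
    Column,                   // value of the current row
    Placeholder(usize),       // index into `ExecContext::params`
    Gt(Box<Expr>, Box<Expr>), // left > right, evaluates to 1 or 0
}

impl Expr {
    fn eval(&self, row: i64, ctx: &ExecContext) -> Result<i64, String> {
        match self {
            Expr::Column => Ok(row),
            // The placeholder is resolved only here, at execution time.
            Expr::Placeholder(i) => ctx
                .params
                .get(*i)
                .copied()
                .ok_or_else(|| format!("missing parameter ${}", i + 1)),
            Expr::Gt(l, r) => Ok((l.eval(row, ctx)? > r.eval(row, ctx)?) as i64),
        }
    }
}

/// Immutable plan node: it holds no metrics and no parameter values.
struct FilterNode {
    id: usize,
    predicate: Expr,
}

impl FilterNode {
    fn execute(&self, input: &[i64], ctx: &mut ExecContext) -> Result<Vec<i64>, String> {
        let mut out = Vec::new();
        for &row in input {
            if self.predicate.eval(row, ctx)? != 0 {
                out.push(row);
            }
        }
        // Metrics are written to the context of this execution only.
        *ctx.metrics.entry(self.id).or_insert(0) += out.len();
        Ok(out)
    }
}

/// Builds the plan for `x > $1` once and executes it with two parameter sets.
fn reuse_plan() -> Result<(Vec<i64>, usize, Vec<i64>, usize), String> {
    let plan = FilterNode {
        id: 0,
        predicate: Expr::Gt(Box::new(Expr::Column), Box::new(Expr::Placeholder(0))),
    };
    let input = [1, 5, 10, 20];

    let mut first = ExecContext { params: vec![4], ..Default::default() };
    let a = plan.execute(&input, &mut first)?;

    let mut second = ExecContext { params: vec![10], ..Default::default() };
    let b = plan.execute(&input, &mut second)?;

    Ok((a, first.metrics[&0], b, second.metrics[&0]))
}
```

Because the plan node is never mutated, the same instance serves both executions, and each context ends up with its own row count.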

The experimental patches can be found here:

https://github.com/tarantool/datafusion/commits/askalt/physical-placehdolders/

## Challenges related to physical placeholders ## 

Physical placeholders come with trade-offs of their own. For example, since placeholder values are not known during the optimizer passes, optimizations that depend on these values cannot be performed. In some cases, rebuilding the plan once the parameters are known could be beneficial.

These issues can either be addressed by the user or resolved directly in DataFusion at a later stage.
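As one illustration of a value-dependent optimization, consider pruning files by their statistics for a predicate `x > $1`. The sketch below is an assumption-laden toy model (`FileStats` and `prune` are hypothetical names, not DataFusion APIs): with an unresolved placeholder nothing can be pruned, whereas a known value allows some files to be skipped.

```rust
/// Hypothetical per-file statistics for column `x`.
struct FileStats {
    name: &'static str,
    max: i64,
}

/// Keeps only the files that may contain rows satisfying `x > bound`.
/// `bound = None` models a placeholder whose value is still unknown.
fn prune(files: &[FileStats], bound: Option<i64>) -> Vec<&'static str> {
    files
        .iter()
        .filter(|f| match bound {
            Some(b) => f.max > b, // value known: skip files that cannot match
            None => true,         // value unknown: every file must be kept
        })
        .map(|f| f.name)
        .collect()
}

/// Example input used to compare the two cases.
fn example_files() -> Vec<FileStats> {
    vec![
        FileStats { name: "a", max: 10 },
        FileStats { name: "b", max: 100 },
        FileStats { name: "c", max: 1000 },
    ]
}
```

With these files, `prune(&example_files(), None)` keeps all three, while `prune(&example_files(), Some(50))` keeps only `b` and `c`. A reused plan built without parameter values corresponds to the first case.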

## In total ##

I’m creating this issue to discuss:

1. How do you view the problems mentioned above?
2. Could the experimental patches be useful for upstream (I’m ready to refine them if needed)?
