[Task]: Optimize Spark Runner parDo transform evaluator #32537
Closed
Description
What needs to happen?
When the Apache Spark Runner's TransformTranslator evaluates a ParDo, it applies more filter operations than necessary.
The filters exist because a ParDo can have multiple outputs: one filter per TupleTag selects only the elements that belong to that output.
However, the same filter is also applied to a ParDo with a single output, where every element already belongs to that output, so the filter only adds overhead.
Therefore, we should skip the filter operation when evaluating a ParDo with a single output.
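The proposed shortcut can be illustrated with a minimal plain-Java sketch. This is not the Spark Runner's code: it models tagged elements as key/value pairs in a list instead of an RDD, uses `String` tags in place of `TupleTag`, and the class name `ParDoOutputSplitter`, the method `split`, and the counter `filterPasses` are all hypothetical names introduced only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ParDoOutputSplitter {
  // Hypothetical counter: records how many filter passes were run,
  // so the effect of the single-output shortcut can be observed.
  public static int filterPasses = 0;

  // Splits tagged elements into one list per output tag.
  public static <T> Map<String, List<T>> split(
      List<Map.Entry<String, T>> tagged, List<String> outputTags) {
    Map<String, List<T>> outputs = new LinkedHashMap<>();

    if (outputTags.size() == 1) {
      // Single output: every element carries the same tag,
      // so unwrap the values without filtering.
      outputs.put(
          outputTags.get(0),
          tagged.stream().map(Map.Entry::getValue).collect(Collectors.toList()));
      return outputs;
    }

    // Multiple outputs: one filter pass per tag.
    for (String tag : outputTags) {
      filterPasses++;
      outputs.put(
          tag,
          tagged.stream()
              .filter(e -> e.getKey().equals(tag))
              .map(Map.Entry::getValue)
              .collect(Collectors.toList()));
    }
    return outputs;
  }

  public static void main(String[] args) {
    // Single-output ParDo: no filter pass is needed.
    List<Map.Entry<String, Integer>> single =
        List.of(Map.entry("main", 1), Map.entry("main", 2));
    System.out.println(split(single, List.of("main")));

    // Multi-output ParDo: each tag gets its own filter pass.
    List<Map.Entry<String, Integer>> multi =
        List.of(Map.entry("even", 2), Map.entry("odd", 3), Map.entry("even", 4));
    System.out.println(split(multi, List.of("even", "odd")));
    System.out.println("filter passes: " + filterPasses);
  }
}
```

In this sketch the single-output branch does one pass that only unwraps values, while the multi-output branch scans the input once per tag. How much an actual change to the Spark Runner would save depends on how Spark plans and caches the underlying RDD, which this sketch does not model.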
Issue Priority
Priority: 2 (default / most normal work should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner