[Task]: Optimize Spark Runner parDo transform evaluator #32537

@twosom

Description

What needs to happen?

When the TransformTranslator in the Apache Spark Runner evaluates a ParDo, it applies more filter operations than necessary.
The filters exist because a ParDo can have multiple outputs: the evaluator filters the mixed output stream so that each TupleTag receives only its own elements.

However, the filter is also applied to a ParDo with a single output, where every element necessarily belongs to the sole output tag, and this can hurt performance.
We should therefore skip the filter operation when evaluating a ParDo with a single output.
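The idea can be illustrated with a simplified sketch in plain Java (this is not the actual Spark Runner code; the `Tagged` record, `evaluateParDo` method, and tag names are hypothetical stand-ins for Beam's `WindowedValue`/`TupleTag` machinery and Spark's RDD `filter`):

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

/** Simplified model of the proposed change: skip per-tag filtering for single-output ParDos. */
public class ParDoFilterSketch {

    /** A tagged output element, standing in for a (TupleTag, WindowedValue) pair. */
    record Tagged(String tag, String value) {}

    /** Counts how many times the filter predicate runs, to show the cost being avoided. */
    static final AtomicInteger filterCalls = new AtomicInteger();

    /** Routes the ParDo's mixed output stream to a per-tag collection. */
    static Map<String, List<String>> evaluateParDo(List<Tagged> allOutputs, List<String> outputTags) {
        Map<String, List<String>> result = new HashMap<>();
        if (outputTags.size() == 1) {
            // Single output: every element already belongs to the sole tag, so no filter is needed.
            result.put(outputTags.get(0),
                    allOutputs.stream().map(Tagged::value).collect(Collectors.toList()));
        } else {
            // Multiple outputs: filter the mixed stream once per tag, as the translator does today.
            for (String tag : outputTags) {
                result.put(tag, allOutputs.stream()
                        .filter(t -> { filterCalls.incrementAndGet(); return t.tag().equals(tag); })
                        .map(Tagged::value)
                        .collect(Collectors.toList()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Single-output ParDo: results are produced without evaluating any filter predicate.
        List<Tagged> single = List.of(new Tagged("main", "a"), new Tagged("main", "b"));
        System.out.println(evaluateParDo(single, List.of("main")).get("main")); // [a, b]
        System.out.println(filterCalls.get());                                  // 0

        // Multi-output ParDo: the predicate runs once per element per tag (2 elements x 2 tags).
        List<Tagged> multi = List.of(new Tagged("main", "a"), new Tagged("side", "b"));
        evaluateParDo(multi, List.of("main", "side"));
        System.out.println(filterCalls.get());                                  // 4
    }
}
```

In the sketch, the single-output path produces identical results to the filtered path while evaluating zero filter predicates, which is the saving the optimization targets.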

Related mailing list context

Issue Priority

Priority: 2 (default / most normal work should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
