This repository was archived by the owner on Jan 12, 2026. It is now read-only.

Python Batch Runner: Beam Portability API Integration

Open
Overdue by 3 year(s)
Due by May 31, 2022
Last updated Oct 26, 2022

This milestone tracks all work required to run Beam batch pipelines on Ray by integrating with the Beam Portability API: https://beam.apache.org/roadmap/portability/.

The Beam Portability API offers several advantages over the current BundleBasedDirectRunner implementation, including:

  1. Better cross-language portability, because the SDK and runner communicate through language-agnostic constructs (the protobuf-based pipeline representation and gRPC-based APIs) rather than SDK-specific objects.
  2. Improved performance through built-in optimizers for the pipeline execution graph, such as fusing adjacent steps into a single stage.
  3. Lower maintenance burden through more future-proof APIs and reusable components such as execution graph schedulers and state managers.
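Stage fusion is one of the graph optimizations mentioned in item 2. The following is a toy, standard-library-only sketch of the idea, not Beam's or the Ray runner's implementation: the `Step` class, the `fuse`/`run` helpers, and the word-count steps are hypothetical. It borrows Beam's standard URN strings only as labels. Consecutive element-wise (ParDo) steps are fused into one stage that processes each element in a single pass, while a GroupByKey forces a stage boundary because it needs a shuffle.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional

# Beam standard URNs, used here only as labels in this toy model.
PARDO = "beam:transform:pardo:v1"
GBK = "beam:transform:group_by_key:v1"


@dataclass
class Step:
    """Hypothetical, simplified pipeline step (real Beam uses Runner API protos)."""
    name: str
    urn: str
    fn: Optional[Callable[[Any], Iterable[Any]]] = None


def fuse(steps: List[Step]) -> List[List[Step]]:
    """Group a linear chain of steps into stages; each GBK stands alone."""
    stages: List[List[Step]] = []
    current: List[Step] = []
    for step in steps:
        if step.urn == PARDO:
            current.append(step)  # element-wise steps can share a stage
        else:
            if current:
                stages.append(current)
                current = []
            stages.append([step])  # shuffle boundary
    if current:
        stages.append(current)
    return stages


def run_fused(stage: List[Step], elements: Iterable[Any]) -> Iterable[Any]:
    """Push each element through every fn in the stage without materializing
    intermediate collections between the fused steps."""
    def process(element: Any, fns: List[Callable]) -> Iterable[Any]:
        if not fns:
            yield element
            return
        for out in fns[0](element):
            yield from process(out, fns[1:])

    fns = [s.fn for s in stage]
    for element in elements:
        yield from process(element, fns)


def group_by_key(elements: Iterable[Any]) -> List[Any]:
    """In-memory GroupByKey: (k, v) pairs -> (k, [v, ...]) in first-seen order."""
    groups: dict = {}
    for key, value in elements:
        groups.setdefault(key, []).append(value)
    return list(groups.items())


def run(steps: List[Step], elements: Iterable[Any]) -> List[Any]:
    data = list(elements)
    for stage in fuse(steps):
        if stage[0].urn == GBK:
            data = group_by_key(data)
        else:
            data = list(run_fused(stage, data))
    return data


# Usage: a word count whose first two ParDos fuse into one stage.
steps = [
    Step("Split", PARDO, lambda line: line.split()),
    Step("PairWithOne", PARDO, lambda word: [(word, 1)]),
    Step("Group", GBK),
    Step("Sum", PARDO, lambda kv: [(kv[0], sum(kv[1]))]),
]
print([[s.name for s in stage] for stage in fuse(steps)])
# → [['Split', 'PairWithOne'], ['Group'], ['Sum']]
print(run(steps, ["a b", "b c b"]))
# → [('a', 1), ('b', 3), ('c', 1)]
```

Fusing `Split` and `PairWithOne` avoids materializing the intermediate word list. A real portable runner applies this kind of rewrite to the whole execution graph before scheduling the stages.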

Successful completion of this milestone will allow users to:

  1. Define and run any valid Python Beam batch pipeline either locally or on a remote cluster using Ray.
  2. Integrate Beam batch pipelines with existing machine learning and data science applications on Ray (e.g., via TFX, Beam's pandas-compatible DataFrame API, etc.).
50% complete


    There are no open issues in this milestone
