[Bug]: Python's Storage API streaming exactly-once writes on DataflowV2 is broken with autosharding, and it's the only option available #28587

@ahmedabu98

What happened?

Java's Storage Write API streaming writes are broken on Dataflow Runner V2 when autosharding is enabled. The Python wrapper uses the Java implementation and runs exclusively on Runner V2, so Python users currently have no alternative.

To provide a workaround and unblock users, we should enable setting a fixed number of shards.
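
As a rough sketch of what the proposed workaround could look like from the Python side: the helper below builds keyword arguments for `WriteToBigQuery` with autosharding turned off and a fixed stream count. The parameter name `num_storage_api_streams` is an assumption (it did not exist when this issue was filed; check the `WriteToBigQuery` signature in your Beam version), and the helper names, default values, table, and schema are all hypothetical.

```python
def fixed_shard_write_kwargs(table, schema, num_streams=4, triggering_frequency=5):
    """Build WriteToBigQuery kwargs that pin the number of write streams.

    Hypothetical helper: `num_storage_api_streams` is an assumed parameter
    name for the fixed-shard option proposed in this issue.
    """
    if num_streams < 1:
        raise ValueError("num_streams must be at least 1")
    return {
        "table": table,
        "schema": schema,
        # String value of WriteToBigQuery.Method.STORAGE_WRITE_API.
        "method": "STORAGE_WRITE_API",
        # Avoid the autosharding path reported as broken on Runner V2.
        "with_auto_sharding": False,
        # Assumed name for the fixed shard (stream) count.
        "num_storage_api_streams": num_streams,
        # Seconds between flushes for streaming writes.
        "triggering_frequency": triggering_frequency,
    }


def build_write(table, schema, **overrides):
    """Create the transform; Beam is imported lazily so the helper above
    can be inspected without Beam installed."""
    import apache_beam as beam

    return beam.io.WriteToBigQuery(
        **fixed_shard_write_kwargs(table, schema, **overrides)
    )
```

In a streaming pipeline this would be applied as `rows | build_write("my-project:my_dataset.my_table", "id:INTEGER,name:STRING", num_streams=8)`, where the table and schema are placeholders.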

Issue Priority

Priority: 1 (data loss / total loss of function)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
