fix schema inference inside parameterized types by reuvenlax · Pull Request #32705 · apache/beam

reuvenlax · 2024-10-08T22:35:53Z

Previously, Beam prioritized schemas over coders during inference but did not inspect nested parameterized types for schemas. This led to some sharp edges for users. For example, if Foo had a registered schema, then

PCollection<Foo> foo = readFoo();

would infer the correct SchemaCoder for Foo. However,

PCollection<Iterable<Foo>> allFoos = readAllFoos();

would not search for a schema, and would instead take whatever Coder accepted Foo (possibly SerializableCoder). This led to a lot of confusion for users.

This PR ensures that the schema lookup continues while inspecting type parameters.
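To make the difference concrete, here is a minimal, self-contained sketch of the two lookup strategies. It is a toy model and not Beam's implementation: `TypeNode`, `SCHEMA_TYPES`, `inferTopLevelOnly`, and `inferRecursively` are hypothetical names invented for illustration, and coders are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of coder inference. Hypothetical names; NOT Beam's actual code. */
final class SchemaInferenceSketch {

  /** Simplified type descriptor: a raw type name plus its type arguments. */
  static final class TypeNode {
    final String raw;
    final List<TypeNode> params;

    TypeNode(String raw, TypeNode... params) {
      this.raw = raw;
      this.params = Arrays.asList(params);
    }
  }

  /** Assumed stand-in for a schema registry: names of types with a registered schema. */
  static final Set<String> SCHEMA_TYPES = new HashSet<>(Arrays.asList("Foo"));

  /** Old behavior: the schema check happens only for the outermost type. */
  static String inferTopLevelOnly(TypeNode type) {
    if (SCHEMA_TYPES.contains(type.raw)) {
      return "SchemaCoder<" + type.raw + ">";
    }
    return inferWithoutSchemas(type);
  }

  /** Coder-only lookup: leaf types fall back to a generic coder. */
  static String inferWithoutSchemas(TypeNode type) {
    if (type.params.isEmpty()) {
      return "SerializableCoder<" + type.raw + ">";
    }
    List<String> components = new ArrayList<>();
    for (TypeNode param : type.params) {
      components.add(inferWithoutSchemas(param));
    }
    return type.raw + "Coder<" + String.join(", ", components) + ">";
  }

  /** New behavior: the schema check is repeated at every level of nesting. */
  static String inferRecursively(TypeNode type) {
    if (SCHEMA_TYPES.contains(type.raw)) {
      return "SchemaCoder<" + type.raw + ">";
    }
    if (type.params.isEmpty()) {
      return "SerializableCoder<" + type.raw + ">";
    }
    List<String> components = new ArrayList<>();
    for (TypeNode param : type.params) {
      components.add(inferRecursively(param));
    }
    return type.raw + "Coder<" + String.join(", ", components) + ">";
  }

  public static void main(String[] args) {
    TypeNode iterableOfFoo = new TypeNode("Iterable", new TypeNode("Foo"));
    // prints IterableCoder<SerializableCoder<Foo>>
    System.out.println(inferTopLevelOnly(iterableOfFoo));
    // prints IterableCoder<SchemaCoder<Foo>>
    System.out.println(inferRecursively(iterableOfFoo));
  }
}
```

In this sketch both strategies agree for a bare `Foo`; they diverge only once `Foo` appears as a type argument, which is the case described above.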

Note: this PR touches many files because of a new parameter added to CoderRegistry(); however, the vast majority of those changes are trivial.

github-actions · 2024-10-09T00:50:05Z

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment "assign set of reviewers".

reuvenlax · 2024-10-09T04:54:34Z

R: @ahmedabu98

github-actions · 2024-10-09T04:55:46Z

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment "assign set of reviewers".

ahmedabu98

Thanks, left one suggestion

ahmedabu98 · 2024-10-09T08:23:44Z

sdks/java/core/src/main/java/org/apache/beam/sdk/coders/CoderRegistry.java

+  public static CoderRegistry createDefault(@Nullable SchemaRegistry schemaRegistry) {
+    return new CoderRegistry(schemaRegistry);
  }


This might be a breaking change for users

Can we have the old createDefault() method as well and have it return new CoderRegistry(null)?

Would maintain existing use cases and limit the number of files in this PR

Good point, as this is a public method (even though it's probably not intended for use outside of core Beam). Added the old createDefault() back.
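The compatibility approach agreed on in this thread can be sketched as a pair of overloads. This is an illustrative stand-in rather than the real class: `CoderRegistrySketch` and `hasSchemaRegistry()` are hypothetical, and the schema registry is typed as `Object` only to keep the example free of Beam dependencies.

```java
/** Hypothetical stand-in for a registry class; NOT the actual Beam CoderRegistry. */
final class CoderRegistrySketch {

  // May be null, meaning "no schema lookup available".
  private final Object schemaRegistry;

  private CoderRegistrySketch(Object schemaRegistry) {
    this.schemaRegistry = schemaRegistry;
  }

  /** Original no-argument factory, kept so existing callers still compile. */
  static CoderRegistrySketch createDefault() {
    return new CoderRegistrySketch(null);
  }

  /** New factory that accepts an optional (nullable) schema registry. */
  static CoderRegistrySketch createDefault(Object schemaRegistry) {
    return new CoderRegistrySketch(schemaRegistry);
  }

  /** Illustrative helper, assumed for this sketch only. */
  boolean hasSchemaRegistry() {
    return schemaRegistry != null;
  }

  public static void main(String[] args) {
    System.out.println(createDefault().hasSchemaRegistry());             // prints false
    System.out.println(createDefault(new Object()).hasSchemaRegistry()); // prints true
  }
}
```

Keeping the no-argument overload means callers that never pass a schema registry need no changes, which is why restoring it also reduces the number of files the change has to touch.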

github-actions bot added the java label Oct 8, 2024

github-actions bot added io gcp labels Oct 9, 2024

fix nested schema inference

bc6fa83

reuvenlax force-pushed the schema_inference_parameterized_types branch from 2cff426 to bc6fa83 Compare October 9, 2024 02:26

fix compilation errors

0f7a73a

github-actions bot added extensions runners sql dataflow hadoop hbase kafka core flink labels Oct 9, 2024

fix compilation errors

dd79bae

ahmedabu98 reviewed Oct 9, 2024

add backwards compatible method

c7cec89

github-actions bot removed io extensions runners sql dataflow gcp hadoop hbase kafka labels Oct 9, 2024

github-actions bot removed core flink labels Oct 9, 2024

reuvenlax changed the title ~~fix nested schema inference~~ fix schema inference inside parameterized types Oct 9, 2024

reuvenlax merged commit c243491 into apache:master Oct 9, 2024

Abacn mentioned this pull request Nov 15, 2024

Validate Beam 2.61.0rc1 GoogleCloudPlatform/DataflowTemplates#2014

Closed

Abacn added a commit that referenced this pull request Nov 15, 2024

Revert "Merge pull request #32705: fix schema inference for parameter…

1ed5273

…ized types" This reverts commit c243491.

Abacn mentioned this pull request Nov 15, 2024

Revert "fix schema inference inside parameterized types" #33133

Merged

damccorm pushed a commit that referenced this pull request Nov 18, 2024

Revert "Merge pull request #32705: fix schema inference for parameter…

8da9bbe

…ized types" (#33133) This reverts commit c243491.

damccorm pushed a commit that referenced this pull request Nov 18, 2024

Revert "Merge pull request #32705: fix schema inference for parameter…

fd35ca6

…ized types" (#33133) This reverts commit c243491.

damccorm added a commit that referenced this pull request Nov 18, 2024

Revert "Merge pull request #32705: fix schema inference for parameter…

d4c6af2

…ized types" (#33133) (#33147) This reverts commit c243491. Co-authored-by: Yi Hu <yathu@google.com>

reeba212 pushed a commit to reeba212/beam that referenced this pull request Dec 4, 2024

Merge pull request apache#32705: fix schema inference for parameteriz…

7466316

…ed types

reuvenlax merged 4 commits into apache:master from reuvenlax:schema_inference_parameterized_types