Skip to content

[BigQueryIO] fetch updated schema for newly created Storage API stream writers#33231

Merged
ahmedabu98 merged 8 commits intoapache:masterfrom
ahmedabu98:storapi_schemaupdate_dynamicdest
Jan 8, 2025
Merged

[BigQueryIO] fetch updated schema for newly created Storage API stream writers#33231
ahmedabu98 merged 8 commits intoapache:masterfrom
ahmedabu98:storapi_schemaupdate_dynamicdest

Conversation

@ahmedabu98
Copy link
Copy Markdown
Contributor

@ahmedabu98 ahmedabu98 commented Nov 26, 2024

Fixes #33238

This problem happens whenever a StreamWriter is created after the table's schema is updated. The new StreamWriter ignores the previously updated schema and ends up using the base schema, dropping the new columns.
Such a case can happen when a pipeline decides to increase its parallelism (e.g. autosharding of threads or autoscaling of workers) after a schema update happens. The increased parallelism creates new stream writers that continue using the base schema, ignoring the updated schema.

This PR fixes this by fetching the stream's schema whenever we create a new append client. Existing tests are modified to cover this case, and new tests are added for schema update with dynamic destinations.

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Newly created Storage API write streams do not recognize the previously updated table schema

3 participants