Skip to content

feat: Modularization: Schema overrides via string and file#1840

Merged
manitgupta merged 12 commits intoGoogleCloudPlatform:mainfrom
manitgupta:modular-overrides-v2
Oct 1, 2024
Merged

feat: Modularization: Schema overrides via string and file#1840
manitgupta merged 12 commits intoGoogleCloudPlatform:mainfrom
manitgupta:modular-overrides-v2

Conversation

@manitgupta
Copy link
Copy Markdown
Member

No description provided.

@manitgupta manitgupta changed the title Schema overrides via String and file feat: Modularization: Schema overrides via string and file Sep 5, 2024
@codecov
Copy link
Copy Markdown

codecov bot commented Sep 5, 2024

Codecov Report

Attention: Patch coverage is 74.07407% with 35 lines in your changes missing coverage. Please review.

Project coverage is 50.22%. Comparing base (ab97ff6) to head (03ea26f).
Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
...oud/teleport/v2/templates/DataStreamToSpanner.java 0.00% 14 Missing ⚠️
...migrations/schema/SchemaStringOverridesParser.java 83.56% 5 Missing and 7 partials ⚠️
...emplates/transform/ChangeEventTransformerDoFn.java 0.00% 2 Missing and 1 partial ⚠️
...r/migrations/schema/NoopSchemaOverridesParser.java 0.00% 3 Missing ⚠️
...r/migrations/schema/SchemaFileOverridesParser.java 86.36% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1840      +/-   ##
============================================
+ Coverage     43.98%   50.22%   +6.23%     
+ Complexity     3457     1182    -2275     
============================================
  Files           827      372     -455     
  Lines         49052    20146   -28906     
  Branches       5141     2035    -3106     
============================================
- Hits          21576    10118   -11458     
+ Misses        25829     9352   -16477     
+ Partials       1647      676     -971     
Components Coverage Δ
spanner-templates 63.57% <66.80%> (-0.02%) ⬇️
spanner-import-export ∅ <ø> (∅)
spanner-live-forward-migration 74.80% <66.80%> (-0.44%) ⬇️
spanner-live-reverse-replication 69.03% <83.13%> (+0.64%) ⬆️
spanner-bulk-migration 84.28% <83.13%> (-0.02%) ⬇️
Files with missing lines Coverage Δ
...ations/convertors/ChangeEventSessionConvertor.java 80.80% <100.00%> (+0.56%) ⬆️
.../spanner/migrations/schema/SchemaFileOverride.java 100.00% <100.00%> (ø)
...emplates/transform/ChangeEventTransformerDoFn.java 83.00% <0.00%> (-4.76%) ⬇️
...r/migrations/schema/NoopSchemaOverridesParser.java 0.00% <0.00%> (ø)
...r/migrations/schema/SchemaFileOverridesParser.java 86.36% <86.36%> (ø)
...migrations/schema/SchemaStringOverridesParser.java 83.56% <83.56%> (ø)
...oud/teleport/v2/templates/DataStreamToSpanner.java 4.95% <0.00%> (-0.90%) ⬇️

... and 480 files with indirect coverage changes

@manitgupta manitgupta marked this pull request as ready for review September 9, 2024 09:44
@manitgupta manitgupta requested a review from a team as a code owner September 9, 2024 09:44
@manitgupta
Copy link
Copy Markdown
Member Author

@bharadwaj-aditya
I took a 2nd look at making changes to the session file based mapping logic inside bulk and live migration template.

Code ref -

  1. https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/30370f0460b2a29d4266d0ef95f6ec1e06a20d9e/v2/sourcedb-to-spanner/src/main/java/com/google/cloud/teleport/v2/templates/PipelineController.java#L64

  2. https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/30370f0460b2a29d4266d0ef95f6ec1e06a20d9e/v2/spanner-common/src/main/java/com/google/cloud/teleport/v2/spanner/migrations/convertors/ChangeEventSessionConvertor.java#L88

Observations -

  1. Post integration of SessionFileMapper in live migration template, it would only be used to call two utility methods getSpannerTableName and getSpannerColumnName instead of accessing them directly via the schema object, which live migration template does today. This is not much benefit.
  2. SessionFileMapper requires a Ddl object as input to construct it, which is not required at the place where overrides are implemented currently.
  3. A large part of the mapping logic for bulk migration is actually written inside the PipelineController, similar is the case for live migrations, where it is written in ChangeEventTransformerDoFn. In the discussion that we had on greater encapsulation of the mapping logic, it seems like we will need to refactor both bulk and live migration to achieve it.

I will have to dig a bit deeper into the bulk code, but it looks like actually some of the methods like getSpannerColumnName in SessionFileMapper are actually only used in UTs. It's not immediately clear to me why.

Screenshot 2024-09-25 at 3 03 40 PM

I'll think a bit more on how we can go about this, but we should treat it as a separate exercise IMO.

@bharadwaj-aditya
Copy link
Copy Markdown
Contributor

@bharadwaj-aditya I took a 2nd look at making changes to the session file based mapping logic inside bulk and live migration template.

Code ref -

  1. https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/30370f0460b2a29d4266d0ef95f6ec1e06a20d9e/v2/sourcedb-to-spanner/src/main/java/com/google/cloud/teleport/v2/templates/PipelineController.java#L64
  2. https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/30370f0460b2a29d4266d0ef95f6ec1e06a20d9e/v2/spanner-common/src/main/java/com/google/cloud/teleport/v2/spanner/migrations/convertors/ChangeEventSessionConvertor.java#L88

Observations -

  1. Post integration of SessionFileMapper in live migration template, it would only be used to call two utility methods getSpannerTableName and getSpannerColumnName instead of accessing them directly via the schema object, which live migration template does today. This is not much benefit.
  2. SessionFileMapper requires a Ddl object as input to construct it, which is not required at the place where overrides are implemented currently.
  3. A large part of the mapping logic for bulk migration is actually written inside the PipelineController, similar is the case for live migrations, where it is written in ChangeEventTransformerDoFn. In the discussion that we had on greater encapsulation of the mapping logic, it seems like we will need to refactor both bulk and live migration to achieve it.

I will have to dig a bit deeper into the bulk code, but it looks like actually some of the methods like getSpannerColumnName in SessionFileMapper are actually only used in UTs. It's not immediately clear to me why.

Screenshot 2024-09-25 at 3 03 40 PM

I'll think a bit more on how we can go about this, but we should treat it as a separate exercise IMO.

Regarding 1 - Agreed the ISchemaMapper does some stuff which is not required here.

Regarding 2 - Agreed - it is not required currently, but the logic to resolve it etc exists in the template in any case, any concern in using that in this flow ? It comes as the ddlView which is loaded as a part of the pipeline as opposed to launcher, but i'm not sure that is reason enough to not use it.

Regarding 3 -
In the bulk migration flow, the approach is to be SpannerSchema first, so the calls being made are getSourceColumn instead of getSpannerColumn. The mapper is being used for these calls in both identity and session.

Copy link
Copy Markdown
Contributor

@bharadwaj-aditya bharadwaj-aditya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@manitgupta manitgupta merged commit 3b15a14 into GoogleCloudPlatform:main Oct 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants