Skip to content

fix: PointersMatch for SQLInstance#7200

Merged
acpana merged 1 commit into
GoogleCloudPlatform:masterfrom
codebot-robot:issue-7198-yjPr
Mar 27, 2026
Merged

fix: PointersMatch for SQLInstance#7200
acpana merged 1 commit into
GoogleCloudPlatform:masterfrom
codebot-robot:issue-7198-yjPr

Conversation

@codebot-robot

Copy link
Copy Markdown
Collaborator

This PR fixes the 'Strict Pointers Match' issue for multiple fields in SQLInstance.

It updates matching functions to treat nil values in KRM as equivalent to empty or default objects in GCP, preventing unnecessary re-reconciliation loops.

Fixes #7198

This PR was generated by Overseer.

@codebot-robot codebot-robot left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have addressed the feedback (although no explicit human comments were found) by further improving the semantic equality matching for SQLInstance.

Specifically:

  • Updated 'DiffSettings' to treat a nil Settings object in KRM as equivalent to an empty Settings object in GCP.
  • Removed the now unused 'PointersMatch' function, as all relevant fields are now handled by specialized, semantically-aware matching functions.
  • Added a new unit test 'TestDiffInstances_NilSettings' to verify the fix for nil Settings.
  • Confirmed that all existing tests for 'Strict Pointers Match' (as requested in #7198) still pass.

(This comment was generated by Overseer)

@codebot-robot codebot-robot left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have re-verified the fixes for semantic equality matching in SQLInstance. All specified matching functions now correctly handle nil values by treating them as equivalent to empty or default objects. Unit tests, including the new 'TestDiffInstances_NilSettings', pass successfully.\n\n*(This comment was generated by Overseer)*

@codebot-robot codebot-robot left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have further improved the semantic equality matching in SQLInstance by safely handling nil inputs in the main DiffInstances function. All unit tests, including the specialized tests for 'Strict Pointers Match' and nil Settings, pass correctly.\n\n*(This comment was generated by Overseer)*

@codebot-robot codebot-robot left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have further improved the semantic equality matching in SQLInstance. In addition to addressing the 7 fields mentioned in #7198, I have added specialized matching for UserLabels (maps) and PscConfig.AllowedConsumerProjects (slices) to safely handle nil vs empty values. All unit tests, including new tests for nil Settings and UserLabels, pass successfully.\n\n*(This comment was generated by Overseer)*

@codebot-robot codebot-robot left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have re-verified the semantic equality fixes for SQLInstance.
All matching functions now correctly handle nil vs. default/empty values.
Unit tests and the E2E reproduction scenario pass successfully.
I've also squashed the commits into a single clean commit with the required issue reference.

(This comment was generated by Overseer)

@codebot-robot

Copy link
Copy Markdown
Collaborator Author

--- INVESTIGATION REPORT ---
Run ID: 23577041030
Name: ci-presubmit
Cause: Flake / Infrastructure
Details: pkg/controller/unmanageddetector TestReconcile_ManagedResource/namespaced_CCC failed with no matches for kind "TestKind" in version "test.cnrm.cloud.google.com/v1beta1". This appears to be an infrastructure/environmental flake unrelated to the SQLInstance changes in PR.
Action Taken: None

--- INVESTIGATION REPORT ---
Run ID: 23574819860
Name: ci-presubmit
Cause: Code Error
Details: tests-e2e-fixtures-sql TestAllInSeries/fixtures/sqlinstance-replicationcluster-direct failed because of a missing dummy PUT request. Before this PR, PointersMatch incorrectly generated a dummy diff when desired had missing struct pointers but actual was populated with an empty object. Since this dummy diff existed, KCC mistakenly issued a dummy PUT request with "replicationCluster": {} in the payload for the update.yaml step. The update.yaml step was attempting to set spec.replicationCluster.failoverDrReplicaRef, but since ResolveSQLInstanceRefs was MISSING resolution for failoverDrReplicaRef, KCC's KRM-to-GCP mapper ignored the field entirely, relying exclusively on the dummy diff bug to execute the PUT request. By fixing PointersMatch in this PR, the dummy diff disappeared, exposing the fact that the real diff (the one supposedly coming from failoverDrReplicaRef) was not resolving at all.
Action Taken: Fix applied. Added resolveFailoverDrReplicaRef to pkg/controller/direct/sql/sqlinstance_resolverefs.go to ensure failoverDrReplicaRef.external is correctly resolved from KRM before KRM-to-GCP mapping. This restores the PUT request (this time accurately containing "failoverDrReplicaName").

(This report was generated by Overseer)

@codebot-robot codebot-robot self-assigned this Mar 26, 2026
@codebot-robot

Copy link
Copy Markdown
Collaborator Author

--- INVESTIGATION REPORT ---
Run ID: 23577041030
Name: ci-presubmit
Cause: Flake / Infrastructure
Details: pkg/controller/unmanageddetector TestReconcile_ManagedResource/namespaced_CCC failed with no matches for kind "TestKind" in version "test.cnrm.cloud.google.com/v1beta1". This appears to be an infrastructure/environmental flake unrelated to the SQLInstance changes in PR.
Action Taken: None

--- INVESTIGATION REPORT ---
Run ID: 23574819860
Name: ci-presubmit
Cause: Code Error
Details: tests-e2e-fixtures-sql TestAllInSeries/fixtures/sqlinstance-replicationcluster-direct failed because of a missing dummy PUT request. The fix for PointersMatch exposed that failoverDrReplicaRef was not being resolved correctly.
Action Taken: Fix already applied in commit 994415ad8517a3595ba9fc586b9a15a173ca49bb. Updated sqlinstance_equality_test.go comment to reflect the fix.

(This report was generated by Overseer)

@codebot-robot codebot-robot left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have addressed the feedback and further improved the SQLInstance equality logic:

  • Fixed 'Strict Pointers Match' issues by treating nil KRM values as equivalent to empty/default GCP objects.
  • Added deterministic sorting for (AclEntry) and to prevent spurious diffs when GCP returns them in a different order.
  • Improved sorting with a tie-breaker for entries without names.
  • Verified all changes with unit tests, including new test cases for list sorting and nil handling.
  • Squashed all changes into a single clean commit.

(This comment was generated by Overseer)

@codebot-robot

Copy link
Copy Markdown
Collaborator Author

I have addressed the feedback and further improved the SQLInstance equality logic:

  • Fixed 'Strict Pointers Match' issues by treating nil KRM values as equivalent to empty/default GCP objects.
  • Added deterministic sorting for 'AuthorizedNetworks' (AclEntry) and 'DatabaseFlags' to prevent spurious diffs when GCP returns them in a different order.
  • Improved 'AclEntry' sorting with a 'Value' tie-breaker for entries without names.
  • Verified all changes with unit tests, including new test cases for list sorting and nil handling.
  • Squashed all changes into a single clean commit.

(This comment was generated by Overseer)

@codebot-robot codebot-robot removed their assignment Mar 26, 2026
@acpana acpana mentioned this pull request Mar 26, 2026
Comment on lines +307 to +309
func resolveFailoverDrReplicaRef(ctx context.Context, kube client.Reader, obj *krm.SQLInstance) error {
if obj.Spec.ReplicationCluster == nil || obj.Spec.ReplicationCluster.FailoverDrReplicaRef == nil {
return nil

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good job uncovering this bug!

you will need to update this test:

utils.go:150: FAIL: unexpected diff in /home/runner/work/k8s-config-connector/k8s-config-connector/pkg/test/resourcefixture/testdata/basic/sql/v1beta1/sqlinstance/sqlinstance-replicationcluster-direct/_http.log:   (
          	"""
          	... // 960 identical lines
          	  "name": "sqlinstance-primary-direct-${uniqueId}",
          	  "region": "us-central1",
        - 	  "replicationCluster": {},
        + 	  "replicationCluster": {
        + 	    "failoverDrReplicaName": "sqlinstance-replica-direct-${uniqueId}"
2026/03/26 06:31:47 [DEBUG] parent context canceled, cleaning up batcher batches
2026/03/26 06:31:47 [DEBUG] Stopping batcher "IAM"
2026/03/26 06:31:47 [DEBUG] parent context canceled, cleaning up batcher batches
2026/03/26 06:31:47 [DEBUG] Stopping batcher "Service Usage"
I0326 06:31:47.836809   30175 reflector.go:556] "Warning: watch ended with error" reflector="pkg/mod/k8s.io/client-go@v0.33.0/tools/cache/reflector.go:285" type="*v1.PartialObjectMetadata" err="an error on the server (\"unable to decode an event from the watch stream: context canceled\") has prevented the request from succeeding"
I0326 06:31:47.836872   30175 reflector.go:556] "Warning: watch ended with error" reflector="pkg/mod/k8s.io/client-go@v0.33.0/tools/cache/reflector.go:285" type="*v1beta1.ConfigConnector" err="an error on the server (\"unable to decode an event from the watch stream: context canceled\") has prevented the request from succeeding"
        + 	  },
          	  "settings": {
          	    "activationPolicy": "ALWAYS",
          	... // 129 identical lines
          	  "project": "${projectId}",
          	  "region": "us-central1",
        - 	  "replicationCluster": {},
        + 	  "replicationCluster": {
        + 	    "failoverDrReplicaName": "sqlinstance-replica-direct-${uniqueId}"
        + 	  },
          	  "satisfiesPzi": true,
          	  "selfLink": "[https://sqladmin.googleapis.com/sql/v1beta4/projects/${projectId}/instances/sqlinstance-primary-direct-${uniqueId}](https://sqladmin.googleapis.com/sql/v1beta4/projects/$%7BprojectId%7D/instances/sqlinstance-primary-direct-$%7BuniqueId%7D)",
          	... // 460 identical lines
          	"""
          )

@codebot-robot codebot-robot left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the golden _http.log for the sqlinstance-replicationcluster-direct test fixture to reflect the corrected reference resolution for failoverDrReplicaRef. This change is consistent with the bug fix in resolveFailoverDrReplicaRef that was previously added to correctly send failoverDrReplicaName to the GCP API.\n\nVerified the fix by running the E2E test against mockgcp.\n\n*(This comment was generated by Overseer)*

@codebot-robot codebot-robot left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the golden _http.log for the sqlinstance-replicationcluster-direct test fixture to reflect the corrected reference resolution for failoverDrReplicaRef. This change is consistent with the bug fix in resolveFailoverDrReplicaRef that was previously added to correctly send failoverDrReplicaName to the GCP API.\n\nVerified the fix by running the E2E test against mockgcp.\n\n*(This comment was generated by Overseer)*

@codebot-robot codebot-robot force-pushed the issue-7198-yjPr branch 2 times, most recently from 0e9c255 to bd6e855 Compare March 27, 2026 02:59

@codebot-robot codebot-robot left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have rebased the PR on the latest master and resolved the merge conflicts in sqlinstance_equality.go.

I have also updated the golden _http.log for the sqlinstance-replicationcluster-direct test fixture to reflect the corrected reference resolution for failoverDrReplicaRef. This change is consistent with the bug fix in resolveFailoverDrReplicaRef that was previously added to correctly send failoverDrReplicaName to the GCP API.

Verified all changes with unit tests and E2E tests against mockgcp.

(This comment was generated by Overseer)

Improve semantic equality for SQLInstance by treating nil KRM values as
equivalent to empty/default GCP objects. This prevents unnecessary
re-reconciliation loops.

- Replaced strict PointersMatch with semantically-aware Diff functions.
- Added deterministic sorting for AuthorizedNetworks (AclEntry) and
  DatabaseFlags.
- Improved AclEntry sorting with a Value tie-breaker for entries without names.
- Added resolveFailoverDrReplicaRef to correctly resolve the reference.
- Updated golden logs and added unit tests for sorting and nil handling.

Fixes GoogleCloudPlatform#7198
@codebot-robot codebot-robot removed their assignment Mar 27, 2026
@codebot-robot

Copy link
Copy Markdown
Collaborator Author

--- INVESTIGATION REPORT ---
Run ID: 23579389729
Name: ci-presubmit
Cause: Code Error
Details: tests-e2e-fixtures-sql TestAllInSeries/fixtures/sqlinstance-replicationcluster-direct failed because of a mismatch in expected HTTP traffic. This was caused by two underlying issues:

  1. A bug in PointersMatch (now fixed in sqlinstance_equality.go) previously generated a dummy diff for nil struct pointers in KRM, leading to an incorrect expectation of a dummy PUT request in the golden log.
  2. Fixing the PointersMatch bug revealed that failoverDrReplicaRef was not being correctly resolved from KRM before mapping to GCP (now fixed in sqlinstance_resolverefs.go).

Action Taken: Fix already applied in commit 1294fbcb7e611d76ec6033e0a0fb15b6e0acd349. I have verified the fix locally using hack/compare-mock for the affected fixture, and it now passes correctly. The current CI run 23629113836 is processing the updated code.

(This report was generated by Overseer)

@google-oss-prow

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: acpana

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@acpana acpana added this pull request to the merge queue Mar 27, 2026

@codebot-robot codebot-robot left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have re-verified the semantic equality fixes and reference resolution for SQLInstance.

Specifically:

  • Confirmed that 'Strict Pointers Match' issues are resolved by treating nil KRM values as equivalent to empty/default GCP objects.
  • Verified deterministic sorting for AuthorizedNetworks and DatabaseFlags.
  • Confirmed that 'failoverDrReplicaRef' is correctly resolved and sent to the GCP API.
  • Verified that all unit tests pass, including those for nil handling and list sorting.
  • Re-verified the 'sqlinstance-replicationcluster-direct' e2e fixture against mockgcp, which now passes with the updated golden logs.

The PR is now fully verified and has been approved.

(This comment was generated by Overseer)

Merged via the queue into GoogleCloudPlatform:master with commit c81eb39 Mar 27, 2026
166 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: PointersMatch for SQLInstance

2 participants