Skip to content

xds/internal/client: Add async gauge metrics for Connected and Resources (A78)#8807

Merged
mbissa merged 11 commits into
grpc:masterfrom
mbissa:a78-xds-gauge-metrics-with-tests
Apr 3, 2026
Merged

xds/internal/client: Add async gauge metrics for Connected and Resources (A78)#8807
mbissa merged 11 commits into
grpc:masterfrom
mbissa:a78-xds-gauge-metrics-with-tests

Conversation

@mbissa

@mbissa mbissa commented Jan 7, 2026

Copy link
Copy Markdown
Contributor

This PR leverages the async gauge framework and implements the xdsclient metrics to report number of xds resources and whether or not the xDS client currently has a working ADS stream to the xDS server along with the required labels as part of A78

RELEASE NOTES:

  • xds/internal/client: Add async gauge metrics for "grpc.xds_client.connected" and "grpc.xds_client.resources" as part of A78.

@mbissa mbissa added this to the 1.79 Release milestone Jan 7, 2026
@mbissa mbissa requested a review from easwars January 7, 2026 01:18
@mbissa mbissa self-assigned this Jan 7, 2026
@mbissa mbissa added Type: Feature New features or improvements in behavior Area: xDS Includes everything xDS related, including LB policies used with xDS. labels Jan 7, 2026
@codecov

codecov Bot commented Jan 7, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 82.82828% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.01%. Comparing base (a21508e) to head (93fa34a).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
internal/xds/xdsclient/clientimpl.go 36.84% 11 Missing and 1 partial ⚠️
internal/xds/clients/xdsclient/authority.go 83.33% 4 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8807      +/-   ##
==========================================
- Coverage   83.06%   83.01%   -0.05%     
==========================================
  Files         413      413              
  Lines       33135    33229      +94     
==========================================
+ Hits        27523    27586      +63     
- Misses       4199     4226      +27     
- Partials     1413     1417       +4     
Files with missing lines Coverage Δ
internal/xds/clients/xdsclient/ads_stream.go 85.22% <100.00%> (+0.36%) ⬆️
internal/xds/clients/xdsclient/xdsclient.go 85.84% <100.00%> (+1.26%) ⬆️
internal/xds/clients/xdsclient/authority.go 80.79% <83.33%> (+1.28%) ⬆️
internal/xds/xdsclient/clientimpl.go 78.29% <36.84%> (-7.16%) ⬇️

... and 22 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread internal/xds/clients/xdsclient/metrics/metrics.go
Comment thread internal/xds/clients/xdsclient/metrics/metrics.go
Comment thread internal/xds/clients/xdsclient/metrics/metrics.go Outdated
Comment thread internal/xds/xdsclient/clientimpl.go Outdated
Comment thread internal/xds/xdsclient/clientimpl.go
@easwars

easwars commented Feb 6, 2026

Copy link
Copy Markdown
Contributor

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces asynchronous gauge metrics for tracking xDS client connectivity and resource counts, which is a valuable addition for observability. The implementation looks mostly solid, but I've found a critical issue in the connectivity metric logic that causes it to report incorrect status. I've also included a couple of medium-severity suggestions to improve test coverage and code clarity. Please see the detailed comments below.

Comment thread internal/xds/clients/xdsclient/xdsclient.go Outdated
Comment thread internal/xds/clients/xdsclient/test/metrics_test.go
Comment thread internal/xds/xdsclient/clientimpl.go Outdated
Comment thread internal/xds/clients/config.go Outdated
Comment thread internal/xds/clients/config.go Outdated
Comment thread internal/xds/clients/config.go Outdated
Comment thread internal/xds/clients/config.go
Comment thread internal/xds/xdsclient/clientimpl.go Outdated
Comment thread internal/xds/clients/xdsclient/test/metrics_test.go
Comment thread internal/xds/xdsclient/clientimpl.go Outdated
Comment thread internal/xds/clients/xdsclient/xdsclient.go Outdated
Comment thread internal/xds/clients/xdsclient/xdsclient.go Outdated
Comment thread internal/xds/clients/xdsclient/xdsclient.go Outdated
@easwars easwars removed their assignment Feb 7, 2026
@mbissa mbissa modified the milestones: 1.80 Release, 1.81 Release Mar 6, 2026
@mbissa mbissa force-pushed the a78-xds-gauge-metrics-with-tests branch from 916bd0e to 30b9c73 Compare March 11, 2026 21:25
@mbissa mbissa assigned easwars and unassigned mbissa Mar 11, 2026
@mbissa

mbissa commented Mar 11, 2026

Copy link
Copy Markdown
Contributor Author
sequenceDiagram
    participant OS as "OTel/Stats System"
    participant Rec as "MetricsRecorder (Core)"
    participant MR as "MetricsReporter (Adapter)"
    participant XC as "XDSClient"
    participant AS as "ADS Stream"

    Note over XC, Rec: Phase 1: Registration (One-time)
    XC->>MR: RegisterAsyncReporter(reporter)
    MR->>Rec: RegisterAsyncReporter(hook)
    Rec-->>XC: metricsCleanup func

    Note over XC, AS: Phase 2: Synchronous Events & ADS Lifecycle
    XC->>AS: WatchResource(lds:foo)
    AS->>AS: NewStream() Success
    AS->>XC: established = true

    AS->>XC: Recv(Response: lds:foo)
    XC->>XC: Cache Update (Requested -> ACKed)
    
    AS->>AS: Recv() Error (Stream Breaks)
    AS->>XC: established = false
    alt if legitimate (gRFC A9)
        AS->>MR: (No Counter)
    else non-legitimate
        AS->>MR: ReportMetric(ServerFailure)
    end
    MR->>Rec: Record Counter

    Note over OS, XC: Phase 3: Asynchronous Scrape (Periodic)
    OS->>Rec: Scrape Metrics
    Rec->>MR: Trigger Hook
    MR->>XC: Report(AsyncMetricsRecorder)
    XC->>XC: reportConnectedState() --> 1 if established, else 0
    XC->>XC: reportResourceStats() --> counts by (Type, CacheState)
    XC-->>OS: Return Gauge Values (via Core)

    Note over XC, Rec: Phase 4: Cleanup
    XC->>Rec: metricsCleanup()
Loading

@mbissa

mbissa commented Mar 12, 2026

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces asynchronous gauge metrics for tracking the xDS client's connection status and the number of cached xDS resources, as specified in gRFC A78. The implementation is well-structured, introducing new interfaces for asynchronous metric reporting and ensuring proper lifecycle management. The accompanying tests are thorough and cover various states and scenarios for the new metrics. I have a couple of minor suggestions to improve variable naming for better code clarity and maintainability.

Comment thread internal/xds/clients/xdsclient/authority.go Outdated
Comment thread internal/xds/clients/xdsclient/xdsclient.go Outdated
Comment thread internal/xds/clients/xdsclient/metrics/metrics.go Outdated
Comment thread internal/xds/clients/xdsclient/ads_stream.go Outdated
Comment thread internal/xds/clients/xdsclient/ads_stream.go Outdated
Comment thread internal/xds/clients/xdsclient/ads_stream.go
Comment thread internal/xds/clients/xdsclient/ads_stream.go Outdated
Comment thread internal/xds/clients/xdsclient/authority.go Outdated
Comment thread internal/xds/clients/xdsclient/xdsclient.go Outdated
Comment thread internal/xds/clients/xdsclient/xdsclient.go
Comment thread internal/xds/clients/xdsclient/metrics/metrics.go
Comment thread internal/xds/clients/xdsclient/authority.go Outdated
@easwars easwars assigned mbissa and unassigned easwars Mar 16, 2026
mbissa and others added 2 commits March 20, 2026 14:29
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@mbissa mbissa force-pushed the a78-xds-gauge-metrics-with-tests branch from 7582c77 to 720d321 Compare March 20, 2026 11:58
@mbissa mbissa force-pushed the a78-xds-gauge-metrics-with-tests branch from 720d321 to f397493 Compare March 20, 2026 12:01
@mbissa mbissa assigned easwars and unassigned mbissa Mar 24, 2026
Comment thread internal/xds/clients/xdsclient/metrics/metrics.go Outdated
Comment thread internal/xds/clients/xdsclient/ads_stream.go Outdated
Comment thread internal/xds/xdsclient/clientimpl.go Outdated
Comment thread internal/xds/clients/config.go
Comment thread internal/xds/clients/xdsclient/test/helpers_test.go Outdated
Comment thread internal/xds/clients/xdsclient/test/helpers_test.go
Comment thread internal/xds/clients/xdsclient/test/helpers_test.go
Comment thread internal/xds/clients/xdsclient/test/metrics_test.go
Comment thread internal/xds/clients/xdsclient/test/metrics_test.go Outdated
Comment thread internal/xds/clients/xdsclient/test/metrics_test.go Outdated
Comment thread internal/xds/clients/xdsclient/test/metrics_test.go Outdated
Comment thread internal/xds/clients/xdsclient/test/metrics_test.go
Comment thread internal/xds/clients/xdsclient/test/metrics_test.go Outdated
Comment thread internal/xds/clients/xdsclient/test/metrics_test.go Outdated
@easwars easwars assigned mbissa and unassigned easwars Mar 25, 2026
@mbissa mbissa assigned easwars and unassigned mbissa Mar 31, 2026
Comment thread internal/xds/clients/xdsclient/test/metrics_test.go
Comment thread internal/xds/clients/xdsclient/test/metrics_test.go Outdated
Comment thread internal/xds/clients/xdsclient/test/metrics_test.go Outdated
Comment thread internal/xds/clients/xdsclient/test/metrics_test.go Outdated
// Verify state transitions to disconnected (Value: 0) after NewStream attempt fails.
close(unblock)

if err := tmr.waitForSpecificMetric(ctx, &metrics.ServerFailure{ServerURI: mgmtServer.Address}); err != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think it makes sense to specify a Value: 0 in this case as it makes things explicit.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, simplified.

Comment thread internal/xds/clients/xdsclient/test/metrics_test.go Outdated
Comment thread internal/xds/clients/xdsclient/test/metrics_test.go Outdated
Comment thread internal/xds/clients/xdsclient/test/metrics_test.go Outdated
if err := tmr.waitForSpecificMetric(ctx, &metrics.XDSClientConnected{ServerURI: mgmtServer.Address, Value: 1}); err != nil {
t.Fatalf("Expected XDSClientConnected to be 1 after response, got: %v", err)
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very long test that is testing a whole bunch of transitions. Is it possible to break them into smaller test(s) so that each one is just testing one particular transition. It is always better to have tests that do one particular thing instead of having tests that do N things. It is easier to maintain and easier to debug when something fails.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to reach to step 4/5 we will need to go through initial steps - because the logic is varies for initial NewStream and then disconnection (semantically) and then reconnection (semantically means different the second time).

Comment thread internal/xds/clients/xdsclient/xdsclient.go Outdated
Comment thread internal/xds/clients/xdsclient/test/helpers_test.go Outdated
Comment thread internal/xds/clients/xdsclient/test/metrics_test.go Outdated
Comment thread internal/xds/clients/xdsclient/test/metrics_test.go
},
})

// Verify that resources missing from the authoritative update transition to

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. authoritative update is not a term that is well known or regularly used.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@easwars easwars assigned mbissa and unassigned easwars Mar 31, 2026
@mbissa mbissa force-pushed the a78-xds-gauge-metrics-with-tests branch from bf0c55d to 32dc1cb Compare April 2, 2026 23:37
@mbissa mbissa assigned easwars and unassigned mbissa Apr 2, 2026
@mbissa mbissa merged commit aa4d281 into grpc:master Apr 3, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: xDS Includes everything xDS related, including LB policies used with xDS. Type: Feature New features or improvements in behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants