Skip to content

[core][telemetry/06] record gauge metric e2e#53231

Merged
can-anyscale merged 147 commits intomasterfrom
can-tel05
Jun 6, 2025
Merged

[core][telemetry/06] record gauge metric e2e#53231
can-anyscale merged 147 commits intomasterfrom
can-tel05

Conversation

@can-anyscale
Copy link
Copy Markdown
Contributor

@can-anyscale can-anyscale commented May 22, 2025

This is a series of PR to migrate metric collection from opencencus to openlemetry. For context on the existing components, see #53098.


This PR

  • Use open telemetry to export gauge metric to grpc endpoint
  • Grpc endpoint also export received metrics to prometheus server
  • Add unit tests + e2e tests. Note that I create a test job separately for open telemetry, the grpc server needs to use the network host directly (it doesn't work in a docker-in-docker environment from CI for reasons I don't 100% understand yet).

Overall, when RAY_experimental_enable_open_telemetry_on_core is True, gauge metric uses open telemetry and everything else keeps using opencensus. Otherwise, everything will use opencensus. This ensure correctness for both options.

Test:

  • CI

@can-anyscale can-anyscale force-pushed the can-tel04 branch 3 times, most recently from 7515515 to c93d2e5 Compare May 27, 2025 16:45
@can-anyscale can-anyscale force-pushed the can-tel05 branch 2 times, most recently from 14327e0 to 7cfc5af Compare May 27, 2025 20:32
@can-anyscale can-anyscale force-pushed the can-tel04 branch 2 times, most recently from 46124de to 5828c32 Compare May 27, 2025 23:11
@can-anyscale can-anyscale force-pushed the can-tel04 branch 2 times, most recently from 6fe7413 to f50e634 Compare May 28, 2025 00:50
@can-anyscale can-anyscale force-pushed the can-tel05 branch 4 times, most recently from 361ee76 to 4244eb1 Compare May 28, 2025 17:11
@can-anyscale can-anyscale force-pushed the can-tel04 branch 11 times, most recently from 0c934db to 6d5215e Compare May 28, 2025 21:58
@can-anyscale can-anyscale changed the base branch from can-tel04 to can-tel04bis May 28, 2025 22:59
can and others added 29 commits June 6, 2025 02:55
Signed-off-by: can <can@anyscale.com>
Signed-off-by: can <can@anyscale.com>
Signed-off-by: can <can@anyscale.com>
Signed-off-by: can <can@anyscale.com>
<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

<!-- Please give a short summary of the change and the problem this
solves. -->

<!-- For example: "Closes #1234" -->

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: can <can@anyscale.com>
tag.key: tag.value.string_value
for tag in data_point.attributes
},
data_point.as_double,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, here we assume that all metrics will be with double type, is that right? If that's the case, should we add a comment somewhere to clarify the assumption so that when people add new metrics, they will also be aware of that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants