[core][telemetry/08] record counter metric e2e#53449
Conversation
efc22ae to
461d11f
Compare
c3dab2e to
d6343cf
Compare
4ca8343 to
7efc7b3
Compare
1337adc to
241bb02
Compare
Signed-off-by: can <can@anyscale.com>
Signed-off-by: can <can@anyscale.com>
Signed-off-by: can <can@anyscale.com>
Signed-off-by: can <can@anyscale.com>
Signed-off-by: can <can@anyscale.com>
Signed-off-by: can <can@anyscale.com>
Signed-off-by: can <can@anyscale.com>
Signed-off-by: can <can@anyscale.com>
Signed-off-by: can <can@anyscale.com>
Signed-off-by: can <can@anyscale.com>
Signed-off-by: can <can@anyscale.com>
Signed-off-by: can <can@anyscale.com>
Signed-off-by: can <can@anyscale.com>
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> <!-- Please give a short summary of the change and the problem this solves. --> <!-- For example: "Closes #1234" --> - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Signed-off-by: can <can@anyscale.com>
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> <!-- Please give a short summary of the change and the problem this solves. --> <!-- For example: "Closes #1234" --> - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Signed-off-by: can <can@anyscale.com>
Signed-off-by: can <can@anyscale.com>
Signed-off-by: can <can@anyscale.com>
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> <!-- Please give a short summary of the change and the problem this solves. --> <!-- For example: "Closes #1234" --> - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Signed-off-by: can <can@anyscale.com>
Signed-off-by: can <can@anyscale.com>
| TEST_F(MetricTest, TestCounterMetric) { | ||
| ASSERT_TRUE(OpenTelemetryMetricRecorder::GetInstance().IsMetricRegistered( | ||
| "metric_counter_test")); | ||
| STATS_metric_counter_test.Record(100.0, {{"Tag1", "Value1"}, {"Tag2", "Value2"}}); |
There was a problem hiding this comment.
not testing anything after the record?
There was a problem hiding this comment.
yeah no, open telemetry doesn't have an api to get the value after recording so i just test that it's not crashing; the e2e tests (test_metrics_agent.py) will check the values.
| ASSERT_EQ(OpenTelemetryMetricRecorder::GetInstance().GetObservableMetricValue( | ||
| "metric_test", {{"Tag1", "Value1"}, {"Tag2", "Value2"}}), | ||
| "metric_gauge_test", {{"Tag1", "Value1"}, {"Tag2", "Value2"}}), | ||
| 42.0); |
There was a problem hiding this comment.
also one thing that I thought of, some of the get functions that exist only exist to be used in tests
we shouldn't have functions that are only used in tests, if tests really really need something private, they can be defined as friends or you can even have a function in the fixture class that accesses that private part and just make the fixture class a friend
There was a problem hiding this comment.
i can look into the friend stuff
src/ray/stats/metric.h
Outdated
| OpenTelemetryMetricRecorder::GetInstance().RegisterGaugeMetric(name, description); | ||
| } else if (T == COUNT && | ||
| ::RayConfig::instance().experimental_enable_open_telemetry_on_core()) { | ||
| OpenTelemetryMetricRecorder::GetInstance().RegisterCounterMetric(name, description); |
There was a problem hiding this comment.
nit but should just have a larger if around that checks for experimental_enable_open_telemetry_on_core
|
@dayshah's comments |
| """ | ||
| Check if a metric with the given name is an observable metric. | ||
| """ | ||
| return self._observations_by_name.get(name) is not None |
There was a problem hiding this comment.
do any of these 3 need to be separate functions? they're small and only seem to be used in one place each. Also they don't lock, and the fact that they exist increases the chance they're used without the lock
There was a problem hiding this comment.
merged \o/; i think in general though mutex should only be used at the top of the call chain and internal/private functions should not acquire or hold locks themselves (to avoid deadlock)
| with self._lock: | ||
| if name in self._registered_instruments: | ||
| # Counter with the same name is already registered. | ||
| return |
There was a problem hiding this comment.
should this condition ever happen?
There was a problem hiding this comment.
in our code base probably not, but it's an api so it tries to handle potential failures from the users (who are also us basically but well)
There was a problem hiding this comment.
Oh, actually this does happen and is expected behavior in [1]: for each batch of metrics received from the Export function, metrics are uniquely registered within the context of a single Export call, but not necessarily across multiple Export calls.
There was a problem hiding this comment.
Can we add a comment describing the case?
|
@dayshah's comments |
| data_points = metric.gauge.data_points | ||
| # counter metrics | ||
| if metric.WhichOneof("data") == "sum" and metric.sum.is_monotonic: | ||
| self._open_telemetry_metric_recorder.register_counter_metric( |
| with self._lock: | ||
| if name in self._registered_instruments: | ||
| # Counter with the same name is already registered. | ||
| return |
There was a problem hiding this comment.
Oh, actually this does happen and is expected behavior in [1]: for each batch of metrics received from the Export function, metrics are uniquely registered within the context of a single Export call, but not necessarily across multiple Export calls.
Signed-off-by: can <can@anyscale.com> Signed-off-by: Cuong Nguyen <can@anyscale.com>
| """ | ||
| Set the value of a metric with the given name and tags. | ||
| This will create a gauge if it does not exist. | ||
| Set the value of a metric with the given name and tags. If the metric is not |
There was a problem hiding this comment.
From the logic of the function, it seems that we will not store the value if the metric is not registered even for observable metrics and it seems that the logic is different from the logic on c++ side.
Wondering do we assume that once this function is called, the metric should be registered?
There was a problem hiding this comment.
It's a similar behavior in the c++ side that we'll ignore the metric value if it is not registered (https://github.com/ray-project/ray/blob/master/src/ray/telemetry/open_telemetry_metric_recorder.cc#L189). The only difference is that c++ will throw while python side just emits a warning.
I don't remember I throw on c++ side but I feel like c++ side should also emit a warning instead of throwing.
| with self._lock: | ||
| if name in self._registered_instruments: | ||
| # Counter with the same name is already registered. | ||
| return |
There was a problem hiding this comment.
Can we add a comment describing the case?
| recorder.set_metric_value( | ||
| name="test_counter", | ||
| tags={"label_key": "label_value"}, | ||
| value=10.0, |
There was a problem hiding this comment.
Wondering should we verify the value of the counter here as well?
There was a problem hiding this comment.
for counter metric we cannot retrieve its value (internal state within open-telemetry api) - there is another e2e test that we call the /metric endpoint to actually test the value ;)
This is a series of PR to migrate metric collection from opencencus to openlemetry. For context on the existing components, see #53098.
This PR
Test: