RuntimeMetrics creates the wrong kind of counter for some metrics #126167

@mattsains-msft

Description

Runtime metrics that are not monotonically increasing (e.g., thread pool queue length, which can go down as well as up) should not be created as ObservableCounters, because those are intended for monotonically increasing values (values that only go up, e.g., the number of times X happened).

The following metrics defined in RuntimeMetrics (https://github.com/dotnet/dotnet/blob/865f09801def155c0e8f6a14addbc82a48ebb9cd/src/runtime/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Metrics/RuntimeMetrics.cs) have the incorrect instrument type:

  • dotnet.thread_pool.queue.length
  • dotnet.thread_pool.thread.count

Both of these values can decrease as well as increase, so they should be defined as ObservableUpDownCounters.
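To illustrate the distinction, here is a minimal sketch using System.Diagnostics.Metrics directly. It is not the RuntimeMetrics implementation: the meter name `Example.ThreadPool` and the `example.*` instrument names are hypothetical and chosen only for this example. A value that only grows is registered as an ObservableCounter, and the two values that can shrink are registered as ObservableUpDownCounters.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;
using System.Threading;

// Hypothetical meter name, used only for this illustration.
using var meter = new Meter("Example.ThreadPool");

// Completed work items only ever increase, so ObservableCounter fits.
meter.CreateObservableCounter<long>(
    "example.thread_pool.work_item.count",
    () => ThreadPool.CompletedWorkItemCount);

// Queue length and thread count can go down as well as up,
// so they are registered as ObservableUpDownCounter instead.
meter.CreateObservableUpDownCounter<long>(
    "example.thread_pool.queue.length",
    () => ThreadPool.PendingWorkItemCount);
meter.CreateObservableUpDownCounter<long>(
    "example.thread_pool.thread.count",
    () => (long)ThreadPool.ThreadCount);

// Record which instruments were published as up-down counters.
var isUpDown = new Dictionary<string, bool>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, _) =>
{
    if (instrument.Meter.Name == "Example.ThreadPool")
    {
        isUpDown[instrument.Name] = instrument is ObservableUpDownCounter<long>;
    }
};
listener.Start();

foreach (var (name, upDown) in isUpDown)
{
    Console.WriteLine($"{name}: up-down counter = {upDown}");
}
```

With the OpenTelemetry SDK, an ObservableUpDownCounter of this kind is expected to surface as LongSumNonMonotonic, which is the type the reproduction below checks for.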

Reproduction Steps

The following code illustrates the problem:

// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

using System.Diagnostics;
using System.Net.Http;
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.Logging;
using OpenTelemetry;
using OpenTelemetry.Metrics;
using Azure.Monitor.OpenTelemetry.Exporter;
using System;
using System.Threading;

public class Program
{
    internal class M : BaseExporter<Metric>
    {
        public override ExportResult Export(in Batch<Metric> batch)
        {
            foreach (var metric in batch)
            {
                if (metric.Name == "dotnet.thread_pool.queue.length" || metric.Name == "dotnet.thread_pool.thread.count")
                {
                    if (metric.MetricType != MetricType.LongSumNonMonotonic)
                    {
                        Console.WriteLine("Detected {0} metric was of incorrect type: {1}", metric.Name, metric.MetricType);
                    }
                }
            }
            return ExportResult.Success;
        }
    }

    public static void Main(string[] args)
    {
        var builder = WebApplication.CreateBuilder(args);

        var meterProvider = Sdk
            .CreateMeterProviderBuilder()
            .AddReader(new PeriodicExportingMetricReader(new M(), 2000))
            .AddRuntimeInstrumentation()
            .Build();

        var app = builder.Build();

        app.MapGet("/", () =>
        {
            app.Logger.LogInformation("Hello World!");

            using var client = new HttpClient();
            var response = client.GetAsync("https://www.bing.com/").Result;

            return $"Hello World! OpenTelemetry Trace: {Activity.Current?.Id}";
        });

        app.Run();
    }
}

The output includes these lines:

Detected dotnet.thread_pool.thread.count metric was of incorrect type: LongSum
Detected dotnet.thread_pool.queue.length metric was of incorrect type: LongSum

Expected behavior

These two metrics should have type LongSumNonMonotonic, because their values are not monotonic over time.

Actual behavior

The metrics are defined as LongSum instead of LongSumNonMonotonic. This causes some readers to interpret decreases as negative values (i.e., they report a negative thread pool thread count and queue length).
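One way such negative values can arise is when a consumer converts successive cumulative observations into per-interval deltas. The sketch below shows only that arithmetic; the sample values are made up, and it does not reproduce the logic of any particular exporter or backend.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical successive observations of the thread pool queue length.
long[] observed = { 4, 9, 3, 0 };

// A consumer that assumes a monotonic sum may report each interval
// as the difference between the current and the previous observation.
var deltas = new List<long>();
long previous = 0;
foreach (long value in observed)
{
    deltas.Add(value - previous);
    previous = value;
}

Console.WriteLine(string.Join(", ", deltas)); // prints "4, 5, -6, -3"
```

For a truly monotonic counter every delta would be zero or positive. Here the queue draining from 9 to 3 to 0 shows up as -6 and -3, which is the kind of value a reader can end up presenting as a negative queue length.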

Regression?

Before .NET 9.0, these metrics were defined in OpenTelemetry.Instrumentation.Runtime, which defines them correctly: https://github.com/open-telemetry/opentelemetry-dotnet-contrib/blob/d2ff8863eadf51cedc25fcfeaad5f101bf956dd7/src/OpenTelemetry.Instrumentation.Runtime/RuntimeMetrics.cs#L137-L150

Known Workarounds

No response

Configuration

  • .NET 10.0.5
  • Windows 11 Enterprise 25H2
  • x64
  • It's not specific to this configuration
  • Not using Blazor

Other information

No response
