Buffer in CGroups parser can get corrupted #5114

@mobratil

Description

The CGroups parsers use a buffer for reading from the filesystem. Each parser has a single buffer that is shared between all calls.
If the parser methods are called in parallel, the buffer can easily get corrupted.

The issue is present in 2 classes:
https://github.com/dotnet/extensions/blob/main/src/Libraries/Microsoft.Extensions.Diagnostics.ResourceMonitoring/Linux/LinuxUtilizationParserCgroupV1.cs
https://github.com/dotnet/extensions/blob/main/src/Libraries/Microsoft.Extensions.Diagnostics.ResourceMonitoring/Linux/LinuxUtilizationParserCgroupV2.cs

Reproduction Steps

The issue is hard to reproduce. In LinuxUtilizationParser, calls to the parser can come from multiple threads in parallel.

One call can come from the observable gauge through the CpuUtilization() method. A second call can occur in the GetSnapshot() method, invoked by the publisher.

The parser's buffer can then get corrupted, resulting in garbled error messages such as:

Unable to gather utilization statistics.
Expected proc/stat to start with 'cpu ' but it was ' 8: 0000000000000000FFFF0000Ecpu 21390382 598466 10047926 536502883 17443424 0 2449856 0 0 0C043C0A:1F90 0000000000000000FFFF0000B8003C0A:D21D 01 00000000:00000000 00:00000000 00000000 1000 0 67727390 1 0000000000000000 22 0 0 10 -1 sl local_address remote_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode sl local_address remote_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode sl local_address remote_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode sl local_address remote_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode sl local_address remote_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode sl local_address remote_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode sl local_address remote_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode sl local_address remote_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode sl local_address remote_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode sl local_address remote_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inodecpu 21390556 598466 10048040 536505465 17443694 0 2449865 0 0 0cpu 21390674 598466 10048097 536508217 17443923 0 2449873 0 0 0cpu 21390836 598466 10048186 536511097 17443926 0 2449896 0 0 0cpu 21390946 598466 10048228 536514094 17443926 0 2449911 0 0 0cpu 21391034 598466 10048258 536517117 17443926 0 2449919 0 0 0cpu 21391131 598466 10048290 536520119 17443926 0 2449931 0 0 0'.
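
The message above looks like the contents of several files concatenated in one buffer. The following is a minimal sketch of how that can happen, not the actual parser code: `SharedBufferParser`, `Fill`, `ParseCpuLine` and `Demo` are hypothetical names, and the two-step "fill, then parse" split is an assumption made so the interleaving can be replayed deterministically on a single thread.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for a parser that reuses one buffer across calls.
public sealed class SharedBufferParser
{
    // Shared by every call; nothing protects it from concurrent use.
    private readonly StringBuilder _buffer = new StringBuilder();

    // Step 1 of a read: copy file content into the shared buffer.
    public void Fill(string fileContent) => _buffer.Append(fileContent);

    // Step 2 of a read: validate what is in the buffer, then reset it.
    public string ParseCpuLine()
    {
        string text = _buffer.ToString();
        _buffer.Clear();
        if (!text.StartsWith("cpu ", StringComparison.Ordinal))
        {
            throw new InvalidOperationException(
                $"Expected proc/stat to start with 'cpu ' but it was '{text}'.");
        }

        return text;
    }
}

public static class Demo
{
    public static string Run()
    {
        var parser = new SharedBufferParser();

        // Caller B starts reading some other file into the buffer...
        parser.Fill(" 8: 0000000000000000FFFF0000");

        // ...and before B consumes it, caller A appends the /proc/stat line.
        parser.Fill("cpu 1 2 3 4");

        try
        {
            parser.ParseCpuLine();
            return "ok";
        }
        catch (InvalidOperationException ex)
        {
            return ex.Message;
        }
    }
}
```

With real threads the interleaving depends on timing, which would explain why the failure only shows up occasionally.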

Expected behavior

Buffers shouldn't get corrupted when calls are made in parallel.

Actual behavior

The buffers in LinuxUtilizationParserCgroupV1 and LinuxUtilizationParserCgroupV2 get corrupted, which prevents the utilization values from being read.

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

The issue could be fixed by pooling buffers in the parser classes.
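
A rough sketch of what pooling could look like, assuming each call rents its own buffer from `ArrayPool<char>.Shared` (a real .NET API) and returns it when done. `PooledReader`, `ReadFirstLine` and the `maxChars` parameter are hypothetical names used for illustration only; this is not the change made in the repository.

```csharp
using System;
using System.Buffers;
using System.IO;

public static class PooledReader
{
    // Reads the first line from the reader into a rented buffer
    // that is private to this call, so parallel callers cannot interfere.
    public static string ReadFirstLine(TextReader reader, int maxChars = 512)
    {
        // Rent may return a larger array, so maxChars is used as the bound.
        char[] buffer = ArrayPool<char>.Shared.Rent(maxChars);
        try
        {
            int length = 0;
            int next;
            while (length < maxChars
                && (next = reader.Read()) != -1
                && next != '\n')
            {
                buffer[length++] = (char)next;
            }

            return new string(buffer, 0, length);
        }
        finally
        {
            // Always hand the buffer back, even if reading throws.
            ArrayPool<char>.Shared.Return(buffer);
        }
    }
}
```

For example, `PooledReader.ReadFirstLine(new StringReader("cpu 1 2 3\ncpu0 4 5 6"))` reads only the `cpu 1 2 3` line. Because no state is kept between calls, the gauge callback and the publisher could call it at the same time without sharing a buffer.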
