Skip to content

Parallel.For performance degrades significantly after 2.14 billion milliseconds of system uptime #87543

@brandonagr

Description

@brandonagr

Description

When Environment.TickCount wraps around after 24.9 days (int.MaxValue number of milliseconds) of system uptime, Parallel.For loops will start exhibiting poor performance, generating way more tasks than expected

We have a production service that is using Parallel.For to process 100s of millions of items, after 24.9 days of uptime we have noticed our calculation process going half as fast and not using as much CPU. A typical log entry for a loop might look like this

Count: 13,039,697
Time: 0.463
Workers: 64
Top Work: 0.462 256,561, 0.461 274,515, 0.460 140,719

After 24.9 days of uptime it takes twice as long and looks like:

Count: 13,039,697
Time: 0.871
Workers: 733,401
Top Work: 0.836 1,309,099, 0.008 16, 0.008 16

Note how all worker tasks except for one only complete 16 iterations of the work

I suspect that TaskReplicator.GenerateCooperativeMultitakingTaskTimeout() is returning a negative timeout
https://github.com/dotnet/runtime/blob/v7.0.7/src/libraries/System.Threading.Tasks.Parallel/src/System/Threading/Tasks/TaskReplicator.cs#L170

Which causes all loops inside the TaskReplicate.Run action to break out of the do while loop immediately
https://github.com/dotnet/runtime/blob/v7.0.7/src/libraries/System.Threading.Tasks.Parallel/src/System/Threading/Tasks/Parallel.cs#L1091

For tasks that are processing 100k items, instead of 10s of worker tasks being used, thousands of worker tasks are spawned, resulting in significantly degraded performance.

Reproduction Steps

On a system where Environment.TickCount is negative (after 24.9 days of system uptime), can run the following test console that outputs how Parallel.For is splitting up the work and see that way more workers are spawned than expected and each one only processes 16 items (other than the root task which does the bulk of all work)

Run this net7 console app

using System.Diagnostics;

int maxDop = 32;
if (args.Length != 0)
{
	if (!int.TryParse(args[0], out maxDop))
	{
		maxDop = 32;
	}
}

Console.WriteLine("Running withMaxDop {0}", maxDop);
EvaluationAndDedupeWorkerData.Process(100_000, maxDop);

class EvaluationAndDedupeWorkerData
{
	public int Iterations;
	public readonly Stopwatch Level;
	public readonly List<KeyValuePair<long, long>> DataPointCalcTime = new();

	public EvaluationAndDedupeWorkerData()
	{
		Iterations = 0;
		Level = Stopwatch.StartNew();
	}

	public static void Process(int count, int maxDop)
	{
		var workAnalysis = new List<Tuple<double, int>>();
		var sw = Stopwatch.StartNew();

		Parallel.For(0, count, new ParallelOptions { MaxDegreeOfParallelism = maxDop },
			() => {
				return new EvaluationAndDedupeWorkerData();
			},
			(evalIndex, _, local) => {

				local.Iterations++;

				int dummy = 0;
				while (dummy < 100_000)
				{
					dummy++;
				}

				return local;
			},
			local => {
				lock (workAnalysis)
				{
					workAnalysis.Add(Tuple.Create(local.Level.Elapsed.TotalSeconds, local.Iterations));
				}
			});

		var workString = string.Join(", ", workAnalysis.OrderByDescending(x => x.Item1).Select(x => $"{x.Item1:0.000} {x.Item2:N0}").Take(5));
		Console.WriteLine($"Count: {count:N0} maxDop: {maxDop} time: {sw.Elapsed.TotalSeconds:0.000} workers: {workAnalysis.Count:N0} Top Work: {workString}");
	}
}

Expected behavior

Count: 100,000 maxDop: 32 time: 1.304 workers: 57 Top Work: 1.272 4,587, 1.060 4,047, 0.968 4,692, 0.944 3,541, 0.859 3,339

The list of top work is the time spent by each worker task and the number of iterations completed, it's expected that the numbers would be somewhat evenly distributed

Actual behavior

Count: 100,000 maxDop: 32 time: 2.304 workers: 4,352 Top Work: 1.272 24,587, 0.060 16, 0.010 16, ...

Instead of the iterations being evenly spread over workers

Regression?

This code appears to be unchanged in 8+ years, so maybe it's always been broken in this way

Known Workarounds

Reboot server so that it doesn't have a negative Environment.TickCount

Configuration

Was tested with net 7.05 and 7.0.7, running on windows server 2022

Other information

No response

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions