Compiled regex timeout skewed on first evaluation #54747

@Timovzl

Description

When we need a performant, reusable Regex, we construct one with RegexOptions.Compiled. We accept the one-time construction cost and expect an instance that performs well on every evaluation afterwards.

The matchTimeout parameter lets us evaluate dynamically created expressions with a certain safety, protecting us against inefficient expressions. In this use case, we want to set a relatively small timeout. 50 milliseconds is quite a lot of time for the CPU to evaluate a single expression against a small input.

Unfortunately, with a reasonable timeout like 50 milliseconds, the first evaluation will sometimes unexpectedly exceed its timeout (about once per 7500 attempts in my test).

This is problematic because we can no longer reliably use a reasonable timeout. We would have to use a timeout like 300 milliseconds merely to prevent the false-positive exception on the first evaluation, but such a long timeout may be unacceptable.

It does not matter how efficient the expression is. It does not matter how simple the input string is (although empty input does not suffer from the problem, perhaps due to some short-circuiting).

Unit tests confirm that the first evaluation consistently takes much longer than subsequent ones, although the amount of overhead fluctuates.

My best guess is that the overhead on the first evaluation comes from the JIT compilation of the code generated by the Regex.
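
The difference between the first and later evaluations can be observed with a rough measurement like the one below. This is only a sketch: the class and method names (FirstEvaluationTiming, Measure) are made up for illustration, the 5-second timeout is an arbitrary generous value chosen so that the measurement itself does not throw, and the numbers will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original report
public static class FirstEvaluationTiming
{
	// Returns the elapsed Stopwatch ticks of the first and second IsMatch call
	// on a freshly constructed compiled Regex
	public static (long FirstTicks, long SecondTicks) Measure(string pattern, string input)
	{
		// Assumption: a generous timeout, so that neither call hits it
		var regex = new Regex(pattern,
			RegexOptions.Compiled | RegexOptions.ExplicitCapture,
			TimeSpan.FromSeconds(5));

		var stopwatch = Stopwatch.StartNew();
		_ = regex.IsMatch(input);
		var firstTicks = stopwatch.ElapsedTicks;

		stopwatch.Restart();
		_ = regex.IsMatch(input);
		var secondTicks = stopwatch.ElapsedTicks;

		return (firstTicks, secondTicks);
	}
}
```

Calling Measure with the pattern and input from the Minimal Repro and printing both values should show the first call taking noticeably longer, if the observation above holds on the machine in question.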

Workaround

By running the following code immediately after Regex construction, the problem can be worked around entirely:

// Warm up, since the first evaluation is slow and risks hitting the timeout, presumably because the compiled code needs to be JIT'ed
try
{
	_ = regex.IsMatch("");
}
catch (RegexMatchTimeoutException)
{
	// No problem, as our regex is now compiled and should be fast on further evaluations
}
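
To avoid repeating the warmup at every construction site, it could be wrapped in a small factory. The following is a sketch under that assumption; WarmRegex and Create are hypothetical names, and the helper always adds RegexOptions.Compiled, since that is the case this issue is about.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original report
public static class WarmRegex
{
	// Constructs a compiled Regex and performs one throwaway evaluation,
	// so that callers never observe the slow first evaluation
	public static Regex Create(string pattern, RegexOptions options, TimeSpan matchTimeout)
	{
		var regex = new Regex(pattern, options | RegexOptions.Compiled, matchTimeout);

		try
		{
			_ = regex.IsMatch("");
		}
		catch (RegexMatchTimeoutException)
		{
			// No problem, the warmup evaluation has still taken place
		}

		return regex;
	}
}
```

Usage would then look like `var regex = WarmRegex.Create(pattern, RegexOptions.ExplicitCapture, TimeSpan.FromMilliseconds(50));`, with the instance used as normal afterwards.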

Minimal Repro

With xUnit:

[Fact]
public void UsuallyThrowDueToSkewedInitialEvaluationTimeout()
{
	// Parallel helps run into the timeout with fewer attempts and in less time, although a regular loop also works with sufficient attempts
	Parallel.For(0, 30_000, _ =>
	{
		// Is complex enough to hit the issue, but evaluates in a mere ~50-500 ticks on non-first invocation
		var regex = new Regex(@"(^|[^0-9])00200\d{11}($|[^0-9])",
			RegexOptions.Compiled | RegexOptions.ExplicitCapture,
			matchTimeout: TimeSpan.FromMilliseconds(50));

		// If we enable warmup, the problem disappears
		//try
		//{
		//	regex.IsMatch("");
		//}
		//catch (RegexMatchTimeoutException)
		//{
		//	// No problem, as our regex is now compiled and should be fast on further evaluations
		//}

		// Only the initial evaluation of an instance is relevant to this test
		regex.IsMatch("Lorem 002001234567890123 ipsum dolor sit amet, consectetur adipiscing elit. Aenean ultrices eleifend volutpat.");
	});
}

Suggested Solution

At least if RegexOptions.Compiled is given, it makes sense to me to perform the described warmup (see Workaround) in the Regex constructor.

This way, the matchTimeout parameter becomes the reliable tool it was meant to be.

The cost is that we pay a bit more for an instance if we discard it unused.

Constructing a Regex with RegexOptions.Compiled comes at a cost anyway, so it seems like the correct place to pay for the overhead.

Configuration

Tested with both .NET Core 3.1 and .NET 5, on both 64-bit Windows and 64-bit Linux, in both debug and release mode.

.NET 5 requires a slightly harder input text to reproduce the issue, and each test run has a smaller chance of hitting it, so it may take a few runs to observe. Alternatively, a reduced timeout (say, 20 ms) also does the trick.
