Description
When we need a performant, reusable Regex, we construct one, with RegexOptions.Compiled. We accept the one-time performance hit, and expect an instance that can be used at high performance afterwards.
The matchTimeout parameter lets us evaluate dynamically created expressions with a certain safety, protecting us against inefficient expressions. In this use case, we want to set a relatively small timeout. 50 milliseconds is quite a lot of time for the CPU to evaluate a single expression against a small input.
Unfortunately, with a reasonable timeout like 50 milliseconds, the first evaluation will sometimes unexpectedly exceed its timeout (about once per 7500 attempts in my test).
This is problematic because we can no longer reliably use a reasonable timeout. We would have to use a timeout like 300 milliseconds merely to prevent the false-positive exception on the first evaluation, but such a long timeout may be unacceptable.
It does not matter how efficient the expression is. It does not matter how simple the input string is (although empty input does not suffer from the problem, perhaps due to some short-circuiting).
Unit tests confirm that the first evaluation consistently takes much longer than subsequent ones, although the amount of overhead fluctuates.
My best guess is that the overhead on the first evaluation comes from the JIT compilation of the code generated by the Regex.
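The gap between the first and later evaluations can be observed directly with a stopwatch. The following is a minimal sketch, not the unit tests referred to above: the class name `FirstEvaluationTiming` and the method `Measure` are hypothetical, and the generous 5-second timeout is an assumption chosen only so that the measurement itself never throws.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class FirstEvaluationTiming
{
    // Times the first IsMatch call on a fresh compiled Regex and a second, identical call.
    public static (TimeSpan First, TimeSpan Second) Measure(string pattern, string input)
    {
        // Assumption: a 5-second timeout is long enough that neither call times out.
        var regex = new Regex(
            pattern,
            RegexOptions.Compiled | RegexOptions.ExplicitCapture,
            TimeSpan.FromSeconds(5));

        // First evaluation: includes whatever one-time overhead the instance incurs.
        var stopwatch = Stopwatch.StartNew();
        _ = regex.IsMatch(input);
        var first = stopwatch.Elapsed;

        // Second evaluation of the same input on the same instance.
        stopwatch.Restart();
        _ = regex.IsMatch(input);
        var second = stopwatch.Elapsed;

        return (first, second);
    }
}
```

Calling `FirstEvaluationTiming.Measure` with the pattern and input from the repro and comparing `First.Ticks` with `Second.Ticks` should show the difference, although the exact numbers fluctuate between runs and machines.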
Workaround
By running the following code immediately after Regex construction, the problem can be worked around entirely:
// Warm up, since the first evaluation is slow and risks hitting the timeout, presumably because the compiled code needs to be JIT'ed
try
{
    _ = regex.IsMatch("");
}
catch (RegexMatchTimeoutException)
{
    // No problem, as our regex is now compiled and should be fast on further evaluations
}
Minimal Repro
With xUnit:
[Fact]
public void UsuallyThrowDueToSkewedInitialEvaluationTimeout()
{
    // Parallel helps run into the timeout with fewer attempts and in less time, although a regular loop also works with sufficient attempts
    Parallel.For(0, 30_000, _ =>
    {
        // Complex enough to hit the issue, but evaluates in a mere ~50-500 ticks on non-first invocations
        var regex = new Regex(@"(^|[^0-9])00200\d{11}($|[^0-9])",
            RegexOptions.Compiled | RegexOptions.ExplicitCapture,
            matchTimeout: TimeSpan.FromMilliseconds(50));
        // If we enable warmup, the problem disappears
        //try
        //{
        //    regex.IsMatch("");
        //}
        //catch (RegexMatchTimeoutException)
        //{
        //    // No problem, as our regex is now compiled and should be fast on further evaluations
        //}
        // Only the initial evaluation of an instance is relevant to this test
        regex.IsMatch("Lorem 002001234567890123 ipsum dolor sit amet, consectetur adipiscing elit. Aenean ultrices eleifend volutpat.");
    });
}
Suggested Solution
At least if RegexOptions.Compiled is given, it makes sense to me to perform the described warmup (see Workaround) in the Regex constructor.
This way, the matchTimeout parameter becomes the reliable tool it was meant to be.
The cost is that we pay a bit more for an instance if we discard it unused.
Constructing a Regex with RegexOptions.Compiled comes at a cost anyway, so it seems like the correct place to pay for the overhead.
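Until the constructor does this itself, the same effect can be approximated in user code by wrapping construction and warmup in one place. This is a minimal sketch under stated assumptions: `RegexFactory` and `CreateWarmedUp` are hypothetical names, and the warmup simply reuses the empty-input `IsMatch` call from the Workaround section.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical factory for illustration; not part of System.Text.RegularExpressions.
public static class RegexFactory
{
    // Constructs a Regex and immediately performs one throwaway evaluation,
    // so the cost of the slow first evaluation is paid at construction time.
    public static Regex CreateWarmedUp(string pattern, RegexOptions options, TimeSpan matchTimeout)
    {
        var regex = new Regex(pattern, options, matchTimeout);

        try
        {
            // Warmup call taken from the Workaround section; the result is discarded.
            _ = regex.IsMatch("");
        }
        catch (RegexMatchTimeoutException)
        {
            // A timeout during warmup is ignored; later evaluations are the ones that matter.
        }

        return regex;
    }
}
```

Callers would then use `RegexFactory.CreateWarmedUp(pattern, RegexOptions.Compiled, timeout)` wherever they currently call the constructor, which keeps the extra cost at the point where the compilation cost is already being paid.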
Configuration
Tested with both .NET Core 3.1 and .NET 5, on both 64-bit Windows and 64-bit Linux, in both debug and release mode.
.NET 5 requires a slightly harder input text to reproduce the issue, and each test run is less likely to hit it, so it may take a few runs to observe. Alternatively, a reduced timeout (say, 20 ms) also does the trick.