Skip to content

Warm up the query string obfuscator regex#5266

Merged
kevingosse merged 3 commits intomasterfrom
kevin/regex_timeout
Mar 8, 2024
Merged

Warm up the query string obfuscator regex#5266
kevingosse merged 3 commits intomasterfrom
kevin/regex_timeout

Conversation

@kevingosse
Copy link
Contributor

@kevingosse kevingosse commented Mar 5, 2024

Summary of changes

It looks like the regex is compiled during the first invocation, not during construction, which might explain the timeouts we're seeing. Now warmup the regex in the constructor.

Reason for change

Timeouts occur in tests, and telemetry show that they occur in the wild too.

See also: dotnet/runtime#54747

Implementation details

Matching o in the obfuscator constructor, and catching all exceptions. I initially tried with an empty string, but it looks like some values are optimized and short-circuit the regex.

@kevingosse kevingosse requested a review from a team as a code owner March 5, 2024 10:53
Copy link
Member

@andrewlock andrewlock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, I would personally be inclined to keep the timeout boosted (although everywhere) but I guess we'll see if I can say "I told you so"

Comment on lines -195 to -198
// After updating the regex implementation multiple runtimes started hitting the timeout.
// Increasing timeout here to prevent flaky tests.
SetEnvironmentVariable(Configuration.ConfigurationKeys.ObfuscationQueryStringRegexTimeout, "5000");

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bold move

catch (Exception exception)
{
// We're in the constructor, we can't let exceptions escape
_logger.Warning(exception, "Timeout during the query string obfuscation regex warmup, with a timeout value of {TotalMilliseconds} ms and regex pattern {Pattern}", _timeout.TotalMilliseconds, _regex.ToString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually care about timeouts during compilation, assuming that even if it does timeout, the compilation completes? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't care much, that's why I lowered to warning

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't ever bail-out of the compilation, even if it times out, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok ok, removing the log 😄

@datadog-ddstaging
Copy link

datadog-ddstaging bot commented Mar 5, 2024

Datadog Report

Branch report: kevin/regex_timeout
Commit report: 7da3e78
Test service: dd-trace-dotnet

✅ 0 Failed, 327274 Passed, 1555 Skipped, 56m 15.46s Wall Time
❄️ 1 New Flaky
⌛ 1 Performance Regression

New Flaky Tests (1)

  • ObfuscateWithDefaultPattern - Datadog.Trace.Tests.Util.Http.EvidenceRedactorTests - Last Failure

    Expand for error
     Expected string to be "key1=val1&<redacted>&key2=val2" with a length of 30, but "" has a length of 0, differs near "" (index 0).
    

⌛ Performance Regressions vs Default Branch (1)

  • Baseline_exception - scenarios 4.38s (+160.38ms, +4%) - Details

@andrewlock
Copy link
Member

andrewlock commented Mar 5, 2024

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the whole time it takes to execute a program. And are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are shown in red. The following thresholds were used for comparing the execution times:

  • Welch test with statistical test for significance of 5%
  • Only results indicating a difference greater than 5% and 5 ms are considered.

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5266) - mean (76ms)  : 63, 90
     .   : milestone, 76,
    master - mean (74ms)  : 66, 83
     .   : milestone, 74,

    section CallTarget+Inlining+NGEN
    This PR (5266) - mean (994ms)  : 969, 1019
     .   : milestone, 994,
    master - mean (996ms)  : 972, 1020
     .   : milestone, 996,

Loading
gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5266) - mean (112ms)  : 108, 116
     .   : milestone, 112,
    master - mean (111ms)  : 108, 115
     .   : milestone, 111,

    section CallTarget+Inlining+NGEN
    This PR (5266) - mean (721ms)  : 696, 747
     .   : milestone, 721,
    master - mean (717ms)  : 686, 749
     .   : milestone, 717,

Loading
gantt
    title Execution time (ms) FakeDbCommand (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5266) - mean (95ms)  : 92, 97
     .   : milestone, 95,
    master - mean (94ms)  : 91, 97
     .   : milestone, 94,

    section CallTarget+Inlining+NGEN
    This PR (5266) - mean (671ms)  : 650, 692
     .   : milestone, 671,
    master - mean (676ms)  : 641, 712
     .   : milestone, 676,

Loading
gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5266) - mean (188ms)  : 186, 191
     .   : milestone, 188,
    master - mean (188ms)  : 186, 190
     .   : milestone, 188,

    section CallTarget+Inlining+NGEN
    This PR (5266) - mean (1,071ms)  : 1051, 1090
     .   : milestone, 1071,
    master - mean (1,064ms)  : 1037, 1091
     .   : milestone, 1064,

Loading
gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5266) - mean (272ms)  : 268, 277
     .   : milestone, 272,
    master - mean (271ms)  : 268, 275
     .   : milestone, 271,

    section CallTarget+Inlining+NGEN
    This PR (5266) - mean (867ms)  : 843, 891
     .   : milestone, 867,
    master - mean (871ms)  : 843, 899
     .   : milestone, 871,

Loading
gantt
    title Execution time (ms) HttpMessageHandler (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5266) - mean (261ms)  : 257, 265
     .   : milestone, 261,
    master - mean (261ms)  : 257, 265
     .   : milestone, 261,

    section CallTarget+Inlining+NGEN
    This PR (5266) - mean (856ms)  : 826, 885
     .   : milestone, 856,
    master - mean (857ms)  : 833, 881
     .   : milestone, 857,

Loading

@andrewlock
Copy link
Member

Throughput/Crank Report:zap:

Throughput results for AspNetCoreSimpleController comparing the following branches/commits:

Cases where throughput results for the PR are worse than latest master (5% drop or greater), results are shown in red.

Note that these results are based on a single point-in-time result for each branch. For full results, see one of the many, many dashboards!

gantt
    title Throughput Linux x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (5266) (11.027M)   : 0, 11027168
    master (10.925M)   : 0, 10925303
    benchmarks/2.9.0 (10.620M)   : 0, 10620397

    section Automatic
    This PR (5266) (7.676M)   : 0, 7675574
    master (7.496M)   : 0, 7496085
    benchmarks/2.9.0 (7.702M)   : 0, 7701617

    section Trace stats
    This PR (5266) (7.888M)   : 0, 7887981
    master (7.872M)   : 0, 7871947

    section Manual
    This PR (5266) (9.607M)   : 0, 9607231
    master (9.635M)   : 0, 9635035

    section Manual + Automatic
    This PR (5266) (7.272M)   : 0, 7271867
    master (7.114M)   : 0, 7113688

    section Version Conflict
    This PR (5266) (6.409M)   : 0, 6409175
    master (6.292M)   : 0, 6292435

Loading
gantt
    title Throughput Linux arm64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (5266) (9.629M)   : 0, 9629288
    benchmarks/2.9.0 (9.526M)   : 0, 9525834

    section Automatic
    This PR (5266) (6.694M)   : 0, 6693837

    section Trace stats
    This PR (5266) (6.857M)   : 0, 6856945

    section Manual
    This PR (5266) (8.539M)   : 0, 8538647

    section Manual + Automatic
    This PR (5266) (6.198M)   : 0, 6198182

    section Version Conflict
    This PR (5266) (5.725M)   : 0, 5725422

Loading
gantt
    title Throughput Windows x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (5266) (10.235M)   : 0, 10235203
    master (9.856M)   : 0, 9856447
    benchmarks/2.9.0 (10.533M)   : 0, 10532864

    section Automatic
    This PR (5266) (7.487M)   : 0, 7487139
    master (7.076M)   : 0, 7076232
    benchmarks/2.9.0 (7.815M)   : 0, 7814762

    section Trace stats
    This PR (5266) (7.597M)   : 0, 7596712
    master (7.404M)   : 0, 7404111

    section Manual
    This PR (5266) (9.349M)   : 0, 9348947
    master (8.658M)   : 0, 8657799

    section Manual + Automatic
    This PR (5266) (7.193M)   : 0, 7192785
    master (6.832M)   : 0, 6831716

    section Version Conflict
    This PR (5266) (6.348M)   : 0, 6348073
    master (6.173M)   : 0, 6172929

Loading
gantt
    title Throughput Linux x64 (ASM) (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    master (7.559M)   : 0, 7559179
    benchmarks/2.9.0 (7.859M)   : 0, 7859139

    section No attack
    master (1.844M)   : 0, 1844300
    benchmarks/2.9.0 (3.196M)   : 0, 3196117

    section Attack
    master (1.451M)   : 0, 1451050
    benchmarks/2.9.0 (2.519M)   : 0, 2519149

    section Blocking
    master (3.102M)   : 0, 3101736

    section IAST default
    master (6.434M)   : 0, 6433840

    section IAST full
    master (5.626M)   : 0, 5625514

    section Base vuln
    master (0.917M)   : 0, 917115

    section IAST vuln
    master (0.881M)   : 0, 881291

Loading

@github-actions github-actions bot added the area:tracer The core tracer library (Datadog.Trace, does not include OpenTracing, native code, or integrations) label Mar 5, 2024
Copy link
Member

@lucaspimentel lucaspimentel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this initializes the regex the way we want to.

@lucaspimentel lucaspimentel changed the title Warmup the obfuscator regex Warm up the query string obfuscator regex Mar 5, 2024
@lucaspimentel
Copy link
Member

PR description:

I reverted some of the timeout changes introduced in #5255 because I'm hopelessly optimistic.

I don't see these changes.

@kevingosse
Copy link
Contributor Author

PR description:

I reverted some of the timeout changes introduced in #5255 because I'm hopelessly optimistic.

I don't see these changes.

Right. I reverted the revert because it was indeed too optimistic. Fixed the description.

@kevingosse kevingosse force-pushed the kevin/regex_timeout branch from d21a776 to 7da3e78 Compare March 7, 2024 14:08
@kevingosse kevingosse merged commit de4d915 into master Mar 8, 2024
@kevingosse kevingosse deleted the kevin/regex_timeout branch March 8, 2024 09:08
@github-actions github-actions bot added this to the vNext milestone Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:tracer The core tracer library (Datadog.Trace, does not include OpenTracing, native code, or integrations)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants