fix(internal): do not restart any ThreadRestartTimer threads #17312
Performance SLOsComparing candidate brettlangdon/fix-potential-thread-leak (0ee1f8e) with baseline main (48788ce) 📈 Performance Regressions (2 suites)📈 iastaspects - 118/118✅ add_aspectTime: ✅ 104.976µs (SLO: <130.000µs 📉 -19.2%) vs baseline: +3.3% Memory: ✅ 43.977MB (SLO: <46.000MB -4.4%) vs baseline: +4.9% ✅ add_inplace_aspectTime: ✅ 102.016µs (SLO: <130.000µs 📉 -21.5%) vs baseline: -0.6% Memory: ✅ 43.960MB (SLO: <46.000MB -4.4%) vs baseline: +4.9% ✅ add_inplace_noaspectTime: ✅ 28.058µs (SLO: <40.000µs 📉 -29.9%) vs baseline: -0.8% Memory: ✅ 43.895MB (SLO: <46.000MB -4.6%) vs baseline: +4.8% ✅ add_noaspectTime: ✅ 49.413µs (SLO: <70.000µs 📉 -29.4%) vs baseline: -0.4% Memory: ✅ 43.906MB (SLO: <46.000MB -4.6%) vs baseline: +4.8% ✅ bytearray_aspectTime: ✅ 257.057µs (SLO: <400.000µs 📉 -35.7%) vs baseline: +1.1% Memory: ✅ 43.932MB (SLO: <46.000MB -4.5%) vs baseline: +5.0% ✅ bytearray_extend_aspectTime: ✅ 659.487µs (SLO: <800.000µs 📉 -17.6%) vs baseline: -2.1% Memory: ✅ 44.358MB (SLO: <46.000MB -3.6%) vs baseline: +5.9% ✅ bytearray_extend_noaspectTime: ✅ 270.402µs (SLO: <400.000µs 📉 -32.4%) vs baseline: +0.3% Memory: ✅ 44.029MB (SLO: <46.000MB -4.3%) vs baseline: +5.2% ✅ bytearray_noaspectTime: ✅ 140.878µs (SLO: <300.000µs 📉 -53.0%) vs baseline: -0.6% Memory: ✅ 43.967MB (SLO: <46.000MB -4.4%) vs baseline: +4.8% ✅ bytes_aspectTime: ✅ 221.443µs (SLO: <300.000µs 📉 -26.2%) vs baseline: -0.9% Memory: ✅ 44.189MB (SLO: <46.000MB -3.9%) vs baseline: +5.6% ✅ bytes_noaspectTime: ✅ 134.217µs (SLO: <200.000µs 📉 -32.9%) vs baseline: +0.7% Memory: ✅ 43.541MB (SLO: <46.000MB -5.3%) vs baseline: +3.8% ✅ bytesio_aspectTime: ✅ 3.790ms (SLO: <5.000ms 📉 -24.2%) vs baseline: -0.6% Memory: ✅ 43.969MB (SLO: <46.000MB -4.4%) vs baseline: +4.6% ✅ bytesio_noaspectTime: ✅ 316.863µs (SLO: <420.000µs 📉 -24.6%) vs baseline: ~same Memory: ✅ 43.692MB (SLO: <46.000MB -5.0%) vs baseline: +4.3% ✅ capitalize_aspectTime: ✅ 89.949µs (SLO: <300.000µs 📉 -70.0%) vs baseline: -0.4% Memory: ✅ 43.690MB (SLO: 
<46.000MB -5.0%) vs baseline: +4.1% ✅ capitalize_noaspectTime: ✅ 249.102µs (SLO: <300.000µs 📉 -17.0%) vs baseline: -0.2% Memory: ✅ 43.993MB (SLO: <46.000MB -4.4%) vs baseline: +5.1% ✅ casefold_aspectTime: ✅ 93.848µs (SLO: <500.000µs 📉 -81.2%) vs baseline: +4.3% Memory: ✅ 44.018MB (SLO: <46.000MB -4.3%) vs baseline: +4.6% ✅ casefold_noaspectTime: ✅ 309.937µs (SLO: <500.000µs 📉 -38.0%) vs baseline: +1.9% Memory: ✅ 43.631MB (SLO: <46.000MB -5.1%) vs baseline: +4.4% ✅ decode_aspectTime: ✅ 86.391µs (SLO: <100.000µs 📉 -13.6%) vs baseline: -0.4% Memory: ✅ 43.868MB (SLO: <46.000MB -4.6%) vs baseline: +5.1% ✅ decode_noaspectTime: ✅ 152.999µs (SLO: <210.000µs 📉 -27.1%) vs baseline: +0.3% Memory: ✅ 43.791MB (SLO: <46.000MB -4.8%) vs baseline: +4.4% ✅ encode_aspectTime: ✅ 84.365µs (SLO: <200.000µs 📉 -57.8%) vs baseline: -0.7% Memory: ✅ 43.972MB (SLO: <46.000MB -4.4%) vs baseline: +5.6% ✅ encode_noaspectTime: ✅ 142.689µs (SLO: <200.000µs 📉 -28.7%) vs baseline: +0.7% Memory: ✅ 43.759MB (SLO: <46.000MB -4.9%) vs baseline: +4.7% ✅ format_aspectTime: ✅ 14.635ms (SLO: <19.200ms 📉 -23.8%) vs baseline: +0.3% Memory: ✅ 43.984MB (SLO: <46.000MB -4.4%) vs baseline: +5.1% ✅ format_map_aspectTime: ✅ 16.412ms (SLO: <21.500ms 📉 -23.7%) vs baseline: ~same Memory: ✅ 43.732MB (SLO: <46.000MB -4.9%) vs baseline: +3.9% ✅ format_map_noaspectTime: ✅ 376.238µs (SLO: <500.000µs 📉 -24.8%) vs baseline: -1.7% Memory: ✅ 43.935MB (SLO: <46.000MB -4.5%) vs baseline: +4.9% ✅ format_noaspectTime: ✅ 313.550µs (SLO: <500.000µs 📉 -37.3%) vs baseline: +1.0% Memory: ✅ 43.676MB (SLO: <46.000MB -5.1%) vs baseline: +4.1% ✅ index_aspectTime: ✅ 125.624µs (SLO: <300.000µs 📉 -58.1%) vs baseline: ~same Memory: ✅ 44.152MB (SLO: <46.000MB -4.0%) vs baseline: +5.4% ✅ index_noaspectTime: ✅ 40.146µs (SLO: <300.000µs 📉 -86.6%) vs baseline: -0.7% Memory: ✅ 43.818MB (SLO: <46.000MB -4.7%) vs baseline: +4.6% ✅ join_aspectTime: ✅ 214.336µs (SLO: <300.000µs 📉 -28.6%) vs baseline: -1.6% Memory: ✅ 44.027MB (SLO: <46.000MB -4.3%) vs 
baseline: +5.0% ✅ join_noaspectTime: ✅ 144.460µs (SLO: <300.000µs 📉 -51.8%) vs baseline: +0.2% Memory: ✅ 43.866MB (SLO: <46.000MB -4.6%) vs baseline: +4.8% ✅ ljust_aspectTime: ✅ 504.435µs (SLO: <700.000µs 📉 -27.9%) vs baseline: -0.7% Memory: ✅ 44.072MB (SLO: <46.000MB -4.2%) vs baseline: +5.3% ✅ ljust_noaspectTime: ✅ 259.688µs (SLO: <300.000µs 📉 -13.4%) vs baseline: -0.5% Memory: ✅ 43.699MB (SLO: <46.000MB -5.0%) vs baseline: +4.2% ✅ lower_aspectTime: ✅ 294.927µs (SLO: <500.000µs 📉 -41.0%) vs baseline: -2.8% Memory: ✅ 44.088MB (SLO: <46.000MB -4.2%) vs baseline: +5.2% ✅ lower_noaspectTime: ✅ 235.947µs (SLO: <300.000µs 📉 -21.4%) vs baseline: +0.1% Memory: ✅ 43.703MB (SLO: <46.000MB -5.0%) vs baseline: +4.5% ✅ lstrip_aspectTime: ✅ 0.357ms (SLO: <3.000ms 📉 -88.1%) vs baseline: 📈 +28.6% Memory: ✅ 43.810MB (SLO: <46.000MB -4.8%) vs baseline: +5.4% ✅ lstrip_noaspectTime: ✅ 0.176ms (SLO: <3.000ms 📉 -94.1%) vs baseline: -0.9% Memory: ✅ 43.728MB (SLO: <46.000MB -4.9%) vs baseline: +4.2% ✅ modulo_aspectTime: ✅ 14.345ms (SLO: <18.750ms 📉 -23.5%) vs baseline: +0.1% Memory: ✅ 43.999MB (SLO: <46.000MB -4.3%) vs baseline: +4.9% ✅ modulo_aspect_for_bytearray_bytearrayTime: ✅ 14.840ms (SLO: <19.350ms 📉 -23.3%) vs baseline: +0.3% Memory: ✅ 43.973MB (SLO: <46.000MB -4.4%) vs baseline: +4.4% ✅ modulo_aspect_for_bytesTime: ✅ 14.351ms (SLO: <18.900ms 📉 -24.1%) vs baseline: -0.4% Memory: ✅ 43.975MB (SLO: <46.000MB -4.4%) vs baseline: +5.0% ✅ modulo_aspect_for_bytes_bytearrayTime: ✅ 14.603ms (SLO: <19.150ms 📉 -23.7%) vs baseline: -0.1% Memory: ✅ 44.103MB (SLO: <46.000MB -4.1%) vs baseline: +5.5% ✅ modulo_noaspectTime: ✅ 0.361ms (SLO: <3.000ms 📉 -88.0%) vs baseline: -1.0% Memory: ✅ 43.667MB (SLO: <46.000MB -5.1%) vs baseline: +4.2% ✅ replace_aspectTime: ✅ 18.421ms (SLO: <24.000ms 📉 -23.2%) vs baseline: +0.3% Memory: ✅ 43.998MB (SLO: <46.000MB -4.4%) vs baseline: +4.6% ✅ replace_noaspectTime: ✅ 279.967µs (SLO: <400.000µs 📉 -30.0%) vs baseline: +0.4% Memory: ✅ 43.821MB (SLO: <46.000MB -4.7%) 
vs baseline: +4.5% ✅ repr_aspectTime: ✅ 322.812µs (SLO: <420.000µs 📉 -23.1%) vs baseline: +1.2% Memory: ✅ 44.080MB (SLO: <46.000MB -4.2%) vs baseline: +5.1% ✅ repr_noaspectTime: ✅ 46.752µs (SLO: <90.000µs 📉 -48.1%) vs baseline: ~same Memory: ✅ 43.826MB (SLO: <46.000MB -4.7%) vs baseline: +4.7% ✅ rstrip_aspectTime: ✅ 385.849µs (SLO: <500.000µs 📉 -22.8%) vs baseline: -1.4% Memory: ✅ 43.881MB (SLO: <46.000MB -4.6%) vs baseline: +5.2% ✅ rstrip_noaspectTime: ✅ 185.058µs (SLO: <300.000µs 📉 -38.3%) vs baseline: -0.2% Memory: ✅ 43.754MB (SLO: <46.000MB -4.9%) vs baseline: +4.4% ✅ slice_aspectTime: ✅ 184.221µs (SLO: <300.000µs 📉 -38.6%) vs baseline: -0.2% Memory: ✅ 44.041MB (SLO: <46.000MB -4.3%) vs baseline: +5.0% ✅ slice_noaspectTime: ✅ 53.894µs (SLO: <90.000µs 📉 -40.1%) vs baseline: -0.2% Memory: ✅ 43.680MB (SLO: <46.000MB -5.0%) vs baseline: +4.1% ✅ stringio_aspectTime: ✅ 3.824ms (SLO: <5.000ms 📉 -23.5%) vs baseline: -0.7% Memory: ✅ 44.039MB (SLO: <46.000MB -4.3%) vs baseline: +5.1% ✅ stringio_noaspectTime: ✅ 350.680µs (SLO: <500.000µs 📉 -29.9%) vs baseline: -0.5% Memory: ✅ 43.684MB (SLO: <46.000MB -5.0%) vs baseline: +4.2% ✅ strip_aspectTime: ✅ 277.423µs (SLO: <350.000µs 📉 -20.7%) vs baseline: -1.2% Memory: ✅ 44.138MB (SLO: <46.000MB -4.0%) vs baseline: +5.0% ✅ strip_noaspectTime: ✅ 179.453µs (SLO: <240.000µs 📉 -25.2%) vs baseline: +0.8% Memory: ✅ 43.683MB (SLO: <46.000MB -5.0%) vs baseline: +3.9% ✅ swapcase_aspectTime: ✅ 333.168µs (SLO: <500.000µs 📉 -33.4%) vs baseline: -0.6% Memory: ✅ 44.001MB (SLO: <46.000MB -4.3%) vs baseline: +4.7% ✅ swapcase_noaspectTime: ✅ 271.171µs (SLO: <400.000µs 📉 -32.2%) vs baseline: +1.0% Memory: ✅ 43.702MB (SLO: <46.000MB -5.0%) vs baseline: +4.4% ✅ title_aspectTime: ✅ 319.164µs (SLO: <500.000µs 📉 -36.2%) vs baseline: -0.5% Memory: ✅ 44.093MB (SLO: <46.000MB -4.1%) vs baseline: +5.3% ✅ title_noaspectTime: ✅ 258.108µs (SLO: <400.000µs 📉 -35.5%) vs baseline: +0.6% Memory: ✅ 43.791MB (SLO: <46.000MB -4.8%) vs baseline: +4.4% ✅ 
translate_aspectTime: ✅ 496.600µs (SLO: <700.000µs 📉 -29.1%) vs baseline: -1.4% Memory: ✅ 43.912MB (SLO: <46.000MB -4.5%) vs baseline: +5.0% ✅ translate_noaspectTime: ✅ 426.180µs (SLO: <500.000µs 📉 -14.8%) vs baseline: -1.9% Memory: ✅ 43.978MB (SLO: <46.000MB -4.4%) vs baseline: +4.9% ✅ upper_aspectTime: ✅ 296.411µs (SLO: <500.000µs 📉 -40.7%) vs baseline: -0.4% Memory: ✅ 44.117MB (SLO: <46.000MB -4.1%) vs baseline: +5.1% ✅ upper_noaspectTime: ✅ 237.082µs (SLO: <400.000µs 📉 -40.7%) vs baseline: +0.5% Memory: ✅ 43.752MB (SLO: <46.000MB -4.9%) vs baseline: +4.5% 📈 iastaspectsospath - 24/24✅ ospathbasename_aspectTime: ✅ 519.304µs (SLO: <700.000µs 📉 -25.8%) vs baseline: 📈 +23.5% Memory: ✅ 43.794MB (SLO: <46.000MB -4.8%) vs baseline: +4.7% ✅ ospathbasename_noaspectTime: ✅ 426.423µs (SLO: <700.000µs 📉 -39.1%) vs baseline: ~same Memory: ✅ 44.029MB (SLO: <46.000MB -4.3%) vs baseline: +5.1% ✅ ospathjoin_aspectTime: ✅ 626.692µs (SLO: <700.000µs 📉 -10.5%) vs baseline: +0.4% Memory: ✅ 43.940MB (SLO: <46.000MB -4.5%) vs baseline: +4.9% ✅ ospathjoin_noaspectTime: ✅ 631.470µs (SLO: <700.000µs -9.8%) vs baseline: ~same Memory: ✅ 43.906MB (SLO: <46.000MB -4.6%) vs baseline: +5.2% ✅ ospathnormcase_aspectTime: ✅ 350.738µs (SLO: <700.000µs 📉 -49.9%) vs baseline: -0.6% Memory: ✅ 43.973MB (SLO: <46.000MB -4.4%) vs baseline: +5.2% ✅ ospathnormcase_noaspectTime: ✅ 357.104µs (SLO: <700.000µs 📉 -49.0%) vs baseline: -0.5% Memory: ✅ 43.913MB (SLO: <46.000MB -4.5%) vs baseline: +5.0% ✅ ospathsplit_aspectTime: ✅ 486.307µs (SLO: <700.000µs 📉 -30.5%) vs baseline: -0.1% Memory: ✅ 43.965MB (SLO: <46.000MB -4.4%) vs baseline: +5.5% ✅ ospathsplit_noaspectTime: ✅ 499.336µs (SLO: <700.000µs 📉 -28.7%) vs baseline: +0.9% Memory: ✅ 43.925MB (SLO: <46.000MB -4.5%) vs baseline: +5.1% ✅ ospathsplitdrive_aspectTime: ✅ 374.393µs (SLO: <700.000µs 📉 -46.5%) vs baseline: +0.5% Memory: ✅ 43.884MB (SLO: <46.000MB -4.6%) vs baseline: +4.8% ✅ ospathsplitdrive_noaspectTime: ✅ 73.446µs (SLO: <700.000µs 📉 -89.5%) vs 
baseline: +0.9% Memory: ✅ 43.743MB (SLO: <46.000MB -4.9%) vs baseline: +4.6% ✅ ospathsplitext_aspectTime: ✅ 456.369µs (SLO: <700.000µs 📉 -34.8%) vs baseline: +0.4% Memory: ✅ 43.881MB (SLO: <46.000MB -4.6%) vs baseline: +4.5% ✅ ospathsplitext_noaspectTime: ✅ 464.274µs (SLO: <700.000µs 📉 -33.7%) vs baseline: +0.7% Memory: ✅ 43.917MB (SLO: <46.000MB -4.5%) vs baseline: +5.0% 🟡 Near SLO Breach (2 suites)🟡 djangosimple - 30/30✅ appsecTime: ✅ 19.712ms (SLO: <22.300ms 📉 -11.6%) vs baseline: ~same Memory: ✅ 69.560MB (SLO: <73.500MB -5.4%) vs baseline: +5.0% ✅ exception-replay-enabledTime: ✅ 1.327ms (SLO: <1.450ms -8.5%) vs baseline: -0.2% Memory: ✅ 67.692MB (SLO: <71.500MB -5.3%) vs baseline: +4.9% ✅ iastTime: ✅ 19.653ms (SLO: <22.250ms 📉 -11.7%) vs baseline: -0.4% Memory: ✅ 69.442MB (SLO: <75.000MB -7.4%) vs baseline: +4.7% ✅ profilerTime: ✅ 15.155ms (SLO: <16.550ms -8.4%) vs baseline: +1.0% Memory: ✅ 60.490MB (SLO: <61.000MB 🟡 -0.8%) vs baseline: +4.8% ✅ resource-renamingTime: ✅ 19.640ms (SLO: <21.750ms -9.7%) vs baseline: -0.1% Memory: ✅ 69.501MB (SLO: <73.500MB -5.4%) vs baseline: +4.8% ✅ span-code-originTime: ✅ 20.320ms (SLO: <28.200ms 📉 -27.9%) vs baseline: +1.9% Memory: ✅ 69.511MB (SLO: <75.000MB -7.3%) vs baseline: +4.8% ✅ tracerTime: ✅ 19.711ms (SLO: <21.750ms -9.4%) vs baseline: ~same Memory: ✅ 69.580MB (SLO: <75.000MB -7.2%) vs baseline: +4.9% ✅ tracer-and-profilerTime: ✅ 21.163ms (SLO: <23.500ms -9.9%) vs baseline: +0.6% Memory: ✅ 71.457MB (SLO: <75.000MB -4.7%) vs baseline: +4.8% ✅ tracer-dont-create-db-spansTime: ✅ 19.917ms (SLO: <21.500ms -7.4%) vs baseline: +0.5% Memory: ✅ 69.521MB (SLO: <75.000MB -7.3%) vs baseline: +4.8% ✅ tracer-minimalTime: ✅ 16.857ms (SLO: <17.500ms -3.7%) vs baseline: -0.1% Memory: ✅ 69.560MB (SLO: <75.000MB -7.3%) vs baseline: +4.9% ✅ tracer-nativeTime: ✅ 19.715ms (SLO: <21.750ms -9.4%) vs baseline: ~same Memory: ✅ 69.511MB (SLO: <72.500MB -4.1%) vs baseline: +4.8% ✅ tracer-no-cachesTime: ✅ 17.680ms (SLO: <19.650ms 📉 -10.0%) vs 
baseline: ~same Memory: ✅ 69.521MB (SLO: <75.000MB -7.3%) vs baseline: +4.9% ✅ tracer-no-databasesTime: ✅ 19.268ms (SLO: <20.100ms -4.1%) vs baseline: -0.5% Memory: ✅ 69.560MB (SLO: <75.000MB -7.3%) vs baseline: +4.9% ✅ tracer-no-middlewareTime: ✅ 19.455ms (SLO: <21.500ms -9.5%) vs baseline: -0.2% Memory: ✅ 69.491MB (SLO: <75.000MB -7.3%) vs baseline: +4.9% ✅ tracer-no-templatesTime: ✅ 19.734ms (SLO: <22.000ms 📉 -10.3%) vs baseline: +0.4% Memory: ✅ 69.493MB (SLO: <73.500MB -5.5%) vs baseline: +4.8% 🟡 flasksimple - 18/18✅ appsec-getTime: ✅ 3.391ms (SLO: <4.750ms 📉 -28.6%) vs baseline: +0.5% Memory: ✅ 56.649MB (SLO: <66.500MB 📉 -14.8%) vs baseline: +4.9% ✅ appsec-postTime: ✅ 2.874ms (SLO: <6.750ms 📉 -57.4%) vs baseline: ~same Memory: ✅ 56.588MB (SLO: <66.500MB 📉 -14.9%) vs baseline: +4.8% ✅ appsec-telemetryTime: ✅ 3.370ms (SLO: <4.750ms 📉 -29.0%) vs baseline: ~same Memory: ✅ 56.631MB (SLO: <66.500MB 📉 -14.8%) vs baseline: +5.0% ✅ debuggerTime: ✅ 1.881ms (SLO: <2.000ms -5.9%) vs baseline: ~same Memory: ✅ 49.352MB (SLO: <51.500MB -4.2%) vs baseline: +4.9% ✅ iast-getTime: ✅ 1.871ms (SLO: <2.000ms -6.5%) vs baseline: ~same Memory: ✅ 46.109MB (SLO: <49.000MB -5.9%) vs baseline: +4.9% ✅ profilerTime: ✅ 1.915ms (SLO: <2.100ms -8.8%) vs baseline: -0.2% Memory: ✅ 52.439MB (SLO: <53.500MB 🟡 -2.0%) vs baseline: +4.9% ✅ resource-renamingTime: ✅ 3.344ms (SLO: <3.650ms -8.4%) vs baseline: -0.6% Memory: ✅ 56.630MB (SLO: <60.000MB -5.6%) vs baseline: +4.9% ✅ tracerTime: ✅ 3.372ms (SLO: <3.650ms -7.6%) vs baseline: +0.3% Memory: ✅ 56.588MB (SLO: <60.000MB -5.7%) vs baseline: +4.8% ✅ tracer-nativeTime: ✅ 3.358ms (SLO: <3.650ms -8.0%) vs baseline: -0.1% Memory: ✅ 56.686MB (SLO: <60.000MB -5.5%) vs baseline: +4.9%
✅ Tests 🎉 All green! ❄️ No new flaky tests detected 🔗 Commit SHA: 4e9012d
This change is marked for backport to 4.5 and it does not conflict with that branch.

This change is marked for backport to 4.6 and it does not conflict with that branch.
## Description

We have a possible situation where we can end up with orphaned `ThreadRestartTimer` threads in fork-heavy applications.

If a fork occurs while the current `ThreadRestartTimer` thread is in the stopping state (after `cls._instance = None` has already been called) then it will appear in `periodic_threads` and be added to `_threads_to_restart_after_fork` and then force restarted.

## Testing

<!-- Describe your testing strategy or note what tests are included -->

## Risks

<!-- Note any risks associated with this change, or "None" if no risks -->

## Additional Notes

This bug is really really hard to reproduce and technically shouldn't be possible because the GIL should be held throughout the whole `os.fork()` call, but this is a code path which could cause some of the crashes we've seen in a fork-heavy application and is an easy defensive guard to add.

Co-authored-by: juanjux <juanjo.alvarezmartinez@datadoghq.com>
(cherry picked from commit 96352ec)
The only reason to keep the `ThreadRestartTimer` contraption is for cases where fork happens very frequently, and the threads are always restarted for no reason in the parent process. Things would be much easier without it, so if we could somehow declare the worst-case scenario rare, we could just get rid of it altogether 🤔

## Description

We have a possible situation where we can end up with orphaned `ThreadRestartTimer` threads in fork-heavy applications.

If a fork occurs while the current `ThreadRestartTimer` thread is in the stopping state (after `cls._instance = None` has already been called) then it will appear in `periodic_threads` and be added to `_threads_to_restart_after_fork` and then force restarted.

## Testing

## Risks

## Additional Notes

This bug is really really hard to reproduce and technically shouldn't be possible because the GIL should be held throughout the whole `os.fork()` call, but this is a code path which could cause some of the crashes we've seen in a fork-heavy application and is an easy defensive guard to add.