Skip to content

instrument the Django cache framework#70

Merged
palazzem merged 14 commits into
masterfrom
palazzem/django-cache
Sep 30, 2016
Merged

instrument the Django cache framework#70
palazzem merged 14 commits into
masterfrom
palazzem/django-cache

Conversation

@palazzem

@palazzem palazzem commented Sep 27, 2016

Copy link
Copy Markdown

What it does

Add the support of the Django cache framework out of the box. Each time any of the defined cache s is used, a new span is created despite the underlying cache client connector (redis or pylibmc) is instrumented or not.

It includes:

  • the cache client wrapper is traced
  • the cache_page decorator is wrapped
  • all tests that check if spans are created properly
  • all tests that check if the cache framework primitives work as expected and if the wrap has broken something

What is missing:

  • we're not auto-patching the underlying cache client connector
  • we should support better metas for the cache_page decorator

Comment thread ddtrace/contrib/django/cache.py Outdated

from .conf import import_from_string
from .utils import quantize_key_values
from ..util import _resource_from_cache_prefix

@palazzem palazzem Sep 27, 2016

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Utils of utils that refers utils... I think that we may update these names so that they're more "friendly". Suggestions?

'get_many',
'set_many',
'delete_many',
]

@palazzem palazzem Sep 27, 2016

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all the methods implemented by the base class.

Pros: we have just this list
Cons: we should remember to change that list if a method changes/is added/is removed in the next Django releases

An alternative is to do an introspection but I don't think that's good to over-engineering the code. Furthermore I don't think that it's so usual to add/remove methods to a cache system.... I mean a cache works in that way.

Comment thread ddtrace/contrib/django/cache.py Outdated
Django supported cache servers (Redis, Memcached, Database, Custom)
"""
# discover used cache backends
cache_backends = [cache['BACKEND'] for cache in settings.CACHES.values()]

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Django users can have more than one cache system connected to the application. With that I only get all BACKEND names such as django.core.cache.backends.memcached.PyLibMCCache


# trace all backends
for cache_module in cache_backends:
cache = import_from_string(cache_module, cache_module)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wrapping the class (not the instance nor the base class) so that when the instance is created, it's properly wrapped. I think this approach is good because it only suppose that developers don't change how Django loads stuff.

Obviously it could fail but this approach is so generic that could wrap any kind of backend (even custom/unexpected users backend) until they have the proper signature. Wrapping a base class is not possible because each backend implements their own methods (we don't have a class that wraps the cache system).

"""
# check if the backend owns the given bounded method
if not hasattr(cls, method_name):
return

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because some methods may not be available (i.e. custom users backend), I'm skipping the method.

@@ -0,0 +1,13 @@
def quantize_key_values(key):

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a better name? actually I'm doing easy stuff to remove values (i.e. security tokens in a set_many).

@palazzem palazzem force-pushed the palazzem/django-cache branch from 8f8caa7 to e34ae17 Compare September 27, 2016 15:56
@palazzem palazzem changed the title [WIP] instrument the Django cache framework instrument the Django cache framework Sep 28, 2016
@palazzem palazzem force-pushed the palazzem/django-cache branch from 0440667 to 3297a95 Compare September 30, 2016 08:24
# get the original function method
method = getattr(self, DATADOG_NAMESPACE.format(method=method_name))
with tracer.trace('django.cache',
span_type=TYPE, service=settings.DEFAULT_SERVICE) as span:

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is fine that the span service is django or what users have defined. Having a new service just for the cache I don't think it's correct. I mean it's always the cache system of this component and not the access to the redis (or whatever) cache. Am I right?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, that's a hard question. For instance, in our own code we have promoted our cache to an individual service.
As long as we have no proper navigation for non-top level names, these stats wouldn't be accessible. But at the same time, I might not want to see it as a top-level service.

@LeoCavaille @clutchski what do you think?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the previous flask-cache integration, I've provided a different service (flask_cache was the name). But for the case of Django, when we hit a view that is cached we will have traces sent to 3 different services:

  • django (web)
  • django_cache (the cache.get() call)
  • redis (if redis is the used cache)

eq_(span_decr.meta, expected_meta)
assert start < span_decr.start < span_decr.start + span_decr.duration < end

def test_cache_get_many(self):

@LotharSee LotharSee Sep 30, 2016

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, when we have a get_many accessing N keys, it generates N+1 spans? IS LocMemCache the only one in the case?
Just want to check what might generate a ton of spans which would flood us.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. It's only related to LocMemCache because it's not providing an atomic operation (it's simply calling the get() many times). I can provide some tests for other backends.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added tests to ensure the correct behavior

# get the original function method
method = getattr(self, DATADOG_NAMESPACE.format(method=method_name))
with tracer.trace('django.cache',
span_type=TYPE, service=settings.DEFAULT_SERVICE) as span:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, that's a hard question. For instance, in our own code we have promoted our cache to an individual service.
As long as we have no proper navigation for non-top level names, these stats wouldn't be accessible. But at the same time, I might not want to see it as a top-level service.

@LeoCavaille @clutchski what do you think?

Comment thread ddtrace/contrib/util.py Outdated
Combine the resource name with the cache prefix (if any) to generate
the cache resource name
"""
if getattr(cache, "key_prefix", None):

@LotharSee LotharSee Sep 30, 2016

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it luck or a shared lib which explain why both Flask and Django have the same cache.key_prefix convention?
Since it is very specific, we might want to keep it in each integration folder.

For instance, If I get to contrib/util directly and I find this, I have no idea what it is).

So I wouldn't be surprised to have our own version in django/utils.py.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right observation. I think they're just sharing this convention unofficially, so it's better that I move everything as it was, having our own version in django/utils.py. Working on that.

@@ -0,0 +1,330 @@
import time

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How hard would it be to test other cache backends (like Redis) here?
Wouldn't need to be as complete, just testing the patching + one simple case (let's say GET) + one sneaky case (like MGET).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost easy... I can add tests for each backend "duplicating" only a get() and get_many() both for Redis and Memcached (with all implementations such as pylibmc and django_pylibmc.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all tests are under the test_cache_backends.py module

Emanuele Palazzetti added 3 commits September 30, 2016 12:38
…jango/utils; it could be different from flask_cache for future versions

This reverts commit e308bda.
}
}

CACHES = {

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're testing all core backends and a reasonable set of third-party backends

'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
'LOCATION': '127.0.0.1:51211',
},
'django_pylibmc': {

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this configuration is usual for this kind of backend; furthermore it's used by our beta customer

@@ -0,0 +1,238 @@
import time

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests get() and get_many() operations for all cache backends


# tests
spans = self.tracer.writer.pop()
eq_(len(spans), 1)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in that case the operation is atomic so we're going to send only one span

@LotharSee LotharSee left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Clean implementation and great test coverage 👍

@palazzem palazzem merged commit 4ab6373 into master Sep 30, 2016
@palazzem palazzem deleted the palazzem/django-cache branch September 30, 2016 12:16
gh-worker-dd-mergequeue-cf854d Bot pushed a commit that referenced this pull request Apr 23, 2026
…llocator (#17664)

## Description

This PR fixes a segmentation fault in the memory allocation profiler that occurs when a hook call races with `memalloc` start/stop operations. The issue arises from concurrent access to the saved allocator struct, which could be partially written while being read, resulting in`NULL` function pointers being dereferenced.  The key indicator in that case is that `#1 0x0000000000000000` frame -- we are trying to execute a null function pointer.

````
Error UnixSignal: Process terminated with SEGV_MAPERR (SIGSEGV)
#0   0x00007ff3c303a8d4  
#1   0x0000000000000000 memalloc_alloc (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/profiling/collector/_memalloc.cpp:68)
#2   0x00007ff39dcb3b20 memalloc_alloc (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/profiling/collector/_memalloc.cpp:68)
#3   0x00007ff39dcb3b20 memalloc_malloc(void*, unsigned long) (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/profiling/collector/_memalloc.cpp:80)
#4   0x00007ff3c3087e1b PyUnicode_New 
#5   0x00007ff3c30889f4  
#6   0x00007ff3c3170c84  
#7   0x00007ff3c316b931  
#8   0x00007ff3c31aaac8  
#9   0x00007ff3c31033ac  
#10  0x00007ff3c310e2a6 PyObject_CallMethodObjArgs 
#11  0x00007ff3c310e46d  
#12  0x00007ff3c31a96c2  
#13  0x00007ff3c3102fd7 PyObject_Vectorcall 
#14  0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#15  0x00007ff3c323c094  
#16  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#17  0x00007ff3c323c094  
#18  0x00007ff3c30e997d PyObject_CallOneArg 
#19  0x00007ff3c306a480 _PyObject_GenericGetAttrWithDict 
#20  0x00007ff3c30c620d PyObject_GetAttr 
#21  0x00007ff3c32309e7 _PyEval_EvalFrameDefault 
#22  0x00007ff3c323c094  
#23  0x00007ff3c312880e  
#24  0x00007ff3c30e917c _PyObject_MakeTpCall 
#25  0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#26  0x00007ff3c323c094  
#27  0x00007ff3c312880e  
#28  0x00007ff3c30e917c _PyObject_MakeTpCall 
#29  0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#30  0x00007ff3c323c094  
#31  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#32  0x00007ff3c323c094  
#33  0x00007ff3c317d0fd  
#34  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#35  0x00007ff3c323c094  
#36  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#37  0x00007ff3c323c094  
#38  0x00007ff3c317d1b5  
#39  0x00007ff3c3102fd7 PyObject_Vectorcall 
#40  0x00007ff3c3232f4a _PyEval_EvalFrameDefault 
#41  0x00007ff3c3240da5  
#42  0x00007ff3c324112d  
#43  0x00007ff3c3233be1 _PyEval_EvalFrameDefault 
#44  0x00007ff3c323c094  
#45  0x00007ff3c317d1b5  
#46  0x00007ff3c3102fd7 PyObject_Vectorcall 
#47  0x00007ff3c3232f4a _PyEval_EvalFrameDefault 
#48  0x00007ff3c323c094  
#49  0x00007ff3c31033ac  
#50  0x00007ff3c310358d PyObject_CallFunctionObjArgs 
#51  0x00007ff3bf7eb91d WraptBoundFunctionWrapper_call (/project/src/wrapt/_wrappers.c:3750)
#52  0x00007ff3c3104055 _PyObject_Call 
#53  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#54  0x00007ff3c323c094  
#55  0x00007ff3c317d23c  
#56  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#57  0x00007ff3c323c094  
#58  0x00007ff3c310416f _PyObject_Call 
#59  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#60  0x00007ff3c3240da5  
#61  0x00007ff3c324112d  
#62  0x00007ff3c3233be1 _PyEval_EvalFrameDefault 
#63  0x00007ff3c323c094  
#64  0x00007ff3c317d1b5  
#65  0x00007ff3c3102fd7 PyObject_Vectorcall 
#66  0x00007ff3c3232f4a _PyEval_EvalFrameDefault 
#67  0x00007ff3c323c094  
#68  0x00007ff3c317d1b5  
#69  0x00007ff3c310416f _PyObject_Call 
#70  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#71  0x00007ff3c323c094  
#72  0x00007ff3c317d1b5  
#73  0x00007ff3c310416f _PyObject_Call 
#74  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#75  0x00007ff3c323c094  
#76  0x00007ff3c31033ac  
#77  0x00007ff3c310358d PyObject_CallFunctionObjArgs 
#78  0x00007ff3bf7eb91d WraptBoundFunctionWrapper_call (/project/src/wrapt/_wrappers.c:3750)
#79  0x00007ff3c30e917c _PyObject_MakeTpCall 
#80  0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#81  0x00007ff3c323c094  
#82  0x00007ff3c317d518  
#83  0x00007ff3c3155963  
#84  0x00007ff3c315393d  
#85  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#86  0x00007ff3c323c094  
#87  0x00007ff3c317d0fd  
#88  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#89  0x00007ff3c323c094  
#90  0x00007ff3c317d0fd  
#91  0x00007ff3c317d518  
#92  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#93  0x00007ff3c323c094  
#94  0x00007ff3c30e9371 _PyObject_FastCallDictTstate 
#95  0x00007ff3c30e958d _PyObject_Call_Prepend 
#96  0x00007ff3c3109150  
#97  0x00007ff3c3104055 _PyObject_Call 
#98  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#99  0x00007ff3c323c094  
#100 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#101 0x00007ff3c30e958d _PyObject_Call_Prepend 
#102 0x00007ff3c3109150  
#103 0x00007ff3c30e917c _PyObject_MakeTpCall 
#104 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#105 0x00007ff3c323c094  
#106 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#107 0x00007ff3c30e958d _PyObject_Call_Prepend 
#108 0x00007ff3c3109150  
#109 0x00007ff3c30e917c _PyObject_MakeTpCall 
#110 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#111 0x00007ff3c323c094  
#112 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#113 0x00007ff3c30e958d _PyObject_Call_Prepend 
#114 0x00007ff3c3109150  
#115 0x00007ff3c30e917c _PyObject_MakeTpCall 
#116 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#117 0x00007ff3c323c094  
#118 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#119 0x00007ff3c30e958d _PyObject_Call_Prepend 
#120 0x00007ff3c3109150  
#121 0x00007ff3c30e917c _PyObject_MakeTpCall 
#122 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#123 0x00007ff3c323c094  
#124 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#125 0x00007ff3c30e958d _PyObject_Call_Prepend 
#126 0x00007ff3c3109150  
#127 0x00007ff3c30e917c _PyObject_MakeTpCall 
#128 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#129 0x00007ff3c323c094  
#130 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#131 0x00007ff3c30e958d _PyObject_Call_Prepend 
#132 0x00007ff3c3109150  
#133 0x00007ff3c30e917c _PyObject_MakeTpCall 
#134 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#135 0x00007ff3c323c094  
#136 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#137 0x00007ff3c30e958d _PyObject_Call_Prepend 
#138 0x00007ff3c3109150  
#139 0x00007ff3c30e917c _PyObject_MakeTpCall 
#140 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#141 0x00007ff3c323c094  
#142 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#143 0x00007ff3c30e958d _PyObject_Call_Prepend 
#144 0x00007ff3c3109150  
#145 0x00007ff3c30e917c _PyObject_MakeTpCall 
#146 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#147 0x00007ff3c323c094  
#148 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#149 0x00007ff3c30e958d _PyObject_Call_Prepend 
#150 0x00007ff3c3109150  
#151 0x00007ff3c30e917c _PyObject_MakeTpCall 
#152 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#153 0x00007ff3c323c094  
#154 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#155 0x00007ff3c30e958d _PyObject_Call_Prepend 
#156 0x00007ff3c3109150  
#157 0x00007ff3c30e917c _PyObject_MakeTpCall 
#158 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#159 0x00007ff3c323c094  
#160 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#161 0x00007ff3c30e958d _PyObject_Call_Prepend 
#162 0x00007ff3c3109150  
#163 0x00007ff3c30e917c _PyObject_MakeTpCall 
#164 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#165 0x00007ff3c323c094  
#166 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#167 0x00007ff3c30e958d _PyObject_Call_Prepend 
#168 0x00007ff3c3109150  
#169 0x00007ff3c30e917c _PyObject_MakeTpCall 
#170 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#171 0x00007ff3c323c094  
#172 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#173 0x00007ff3c30e958d _PyObject_Call_Prepend 
#174 0x00007ff3c3109150  
#175 0x00007ff3c30e917c _PyObject_MakeTpCall 
#176 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#177 0x00007ff3c323c094  
#178 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#179 0x00007ff3c30e958d _PyObject_Call_Prepend 
#180 0x00007ff3c3109150  
#181 0x00007ff3c30e917c _PyObject_MakeTpCall 
#182 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#183 0x00007ff3c323c094  
#184 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#185 0x00007ff3c30e958d _PyObject_Call_Prepend 
#186 0x00007ff3c3109150  
#187 0x00007ff3c30e917c _PyObject_MakeTpCall 
#188 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#189 0x00007ff3c323c094  
#190 0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#191 0x00007ff3c323c094  
#192 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#193 0x00007ff3c30e958d _PyObject_Call_Prepend 
#194 0x00007ff3c3109150  
#195 0x00007ff3c30e917c _PyObject_MakeTpCall 
#196 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#197 0x00007ff3c323c094  
#198 0x00007ff3c317d0fd  
#199 0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#200 0x00007ff3c323c094  
#201 0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#202 0x00007ff3c323c094  
#203 0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#204 0x00007ff3c323c094  
#205 0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#206 0x00007ff3c323c094  
#207 0x00007ff3c317d23c  
#208 0x00007ff3c31a7ec5  
#209 0x00007ff3c301ac77  
#210 0x00007ff3c357c573  
````

The fix implements two key changes.

1. **Hook functions (`memalloc_alloc`, `memalloc_realloc`)**: Snapshot the allocator struct locally before use and guard indirect function calls with `NULL` checks. This prevents crashes if a partially-written struct is observed during a start/stop race.

2. **Start/stop operations (`memalloc_start`, `memalloc_stop`)**: Use local variables and single assignments when publishing the allocator struct to `global_memalloc_ctx.pymem_allocator_obj`. This ensures concurrent hook calls observe either the old or new struct, never a partially-written intermediate state.

The real root cause is that `PyMem_GetAllocator` is not documented as atomic, and the struct could be read field-by-field while being written to concurrently.  By using local copies and single assignments, we ensure atomicity at the C level and prevent observation of inconsistent state.

Co-authored-by: thomas.kowalski <thomas.kowalski@datadoghq.com>
emmettbutler pushed a commit that referenced this pull request Apr 24, 2026
…llocator (#17664)

## Description

This PR fixes a segmentation fault in the memory allocation profiler that occurs when a hook call races with `memalloc` start/stop operations. The issue arises from concurrent access to the saved allocator struct, which could be partially written while being read, resulting in`NULL` function pointers being dereferenced.  The key indicator in that case is that `#1 0x0000000000000000` frame -- we are trying to execute a null function pointer.

````
Error UnixSignal: Process terminated with SEGV_MAPERR (SIGSEGV)
#0   0x00007ff3c303a8d4  
#1   0x0000000000000000 memalloc_alloc (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/profiling/collector/_memalloc.cpp:68)
#2   0x00007ff39dcb3b20 memalloc_alloc (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/profiling/collector/_memalloc.cpp:68)
#3   0x00007ff39dcb3b20 memalloc_malloc(void*, unsigned long) (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/profiling/collector/_memalloc.cpp:80)
#4   0x00007ff3c3087e1b PyUnicode_New 
#5   0x00007ff3c30889f4  
#6   0x00007ff3c3170c84  
#7   0x00007ff3c316b931  
#8   0x00007ff3c31aaac8  
#9   0x00007ff3c31033ac  
#10  0x00007ff3c310e2a6 PyObject_CallMethodObjArgs 
#11  0x00007ff3c310e46d  
#12  0x00007ff3c31a96c2  
#13  0x00007ff3c3102fd7 PyObject_Vectorcall 
#14  0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#15  0x00007ff3c323c094  
#16  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#17  0x00007ff3c323c094  
#18  0x00007ff3c30e997d PyObject_CallOneArg 
#19  0x00007ff3c306a480 _PyObject_GenericGetAttrWithDict 
#20  0x00007ff3c30c620d PyObject_GetAttr 
#21  0x00007ff3c32309e7 _PyEval_EvalFrameDefault 
#22  0x00007ff3c323c094  
#23  0x00007ff3c312880e  
#24  0x00007ff3c30e917c _PyObject_MakeTpCall 
#25  0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#26  0x00007ff3c323c094  
#27  0x00007ff3c312880e  
#28  0x00007ff3c30e917c _PyObject_MakeTpCall 
#29  0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#30  0x00007ff3c323c094  
#31  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#32  0x00007ff3c323c094  
#33  0x00007ff3c317d0fd  
#34  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#35  0x00007ff3c323c094  
#36  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#37  0x00007ff3c323c094  
#38  0x00007ff3c317d1b5  
#39  0x00007ff3c3102fd7 PyObject_Vectorcall 
#40  0x00007ff3c3232f4a _PyEval_EvalFrameDefault 
#41  0x00007ff3c3240da5  
#42  0x00007ff3c324112d  
#43  0x00007ff3c3233be1 _PyEval_EvalFrameDefault 
#44  0x00007ff3c323c094  
#45  0x00007ff3c317d1b5  
#46  0x00007ff3c3102fd7 PyObject_Vectorcall 
#47  0x00007ff3c3232f4a _PyEval_EvalFrameDefault 
#48  0x00007ff3c323c094  
#49  0x00007ff3c31033ac  
#50  0x00007ff3c310358d PyObject_CallFunctionObjArgs 
#51  0x00007ff3bf7eb91d WraptBoundFunctionWrapper_call (/project/src/wrapt/_wrappers.c:3750)
#52  0x00007ff3c3104055 _PyObject_Call 
#53  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#54  0x00007ff3c323c094  
#55  0x00007ff3c317d23c  
#56  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#57  0x00007ff3c323c094  
#58  0x00007ff3c310416f _PyObject_Call 
#59  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#60  0x00007ff3c3240da5  
#61  0x00007ff3c324112d  
#62  0x00007ff3c3233be1 _PyEval_EvalFrameDefault 
#63  0x00007ff3c323c094  
#64  0x00007ff3c317d1b5  
#65  0x00007ff3c3102fd7 PyObject_Vectorcall 
#66  0x00007ff3c3232f4a _PyEval_EvalFrameDefault 
#67  0x00007ff3c323c094  
#68  0x00007ff3c317d1b5  
#69  0x00007ff3c310416f _PyObject_Call 
#70  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#71  0x00007ff3c323c094  
#72  0x00007ff3c317d1b5  
#73  0x00007ff3c310416f _PyObject_Call 
#74  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#75  0x00007ff3c323c094  
#76  0x00007ff3c31033ac  
#77  0x00007ff3c310358d PyObject_CallFunctionObjArgs 
#78  0x00007ff3bf7eb91d WraptBoundFunctionWrapper_call (/project/src/wrapt/_wrappers.c:3750)
#79  0x00007ff3c30e917c _PyObject_MakeTpCall 
#80  0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#81  0x00007ff3c323c094  
#82  0x00007ff3c317d518  
#83  0x00007ff3c3155963  
#84  0x00007ff3c315393d  
#85  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#86  0x00007ff3c323c094  
#87  0x00007ff3c317d0fd  
#88  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#89  0x00007ff3c323c094  
#90  0x00007ff3c317d0fd  
#91  0x00007ff3c317d518  
#92  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#93  0x00007ff3c323c094  
#94  0x00007ff3c30e9371 _PyObject_FastCallDictTstate 
#95  0x00007ff3c30e958d _PyObject_Call_Prepend 
#96  0x00007ff3c3109150  
#97  0x00007ff3c3104055 _PyObject_Call 
#98  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#99  0x00007ff3c323c094  
#100 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#101 0x00007ff3c30e958d _PyObject_Call_Prepend 
#102 0x00007ff3c3109150  
#103 0x00007ff3c30e917c _PyObject_MakeTpCall 
#104 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#105 0x00007ff3c323c094  
#106 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#107 0x00007ff3c30e958d _PyObject_Call_Prepend 
#108 0x00007ff3c3109150  
#109 0x00007ff3c30e917c _PyObject_MakeTpCall 
#110 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#111 0x00007ff3c323c094  
#112 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#113 0x00007ff3c30e958d _PyObject_Call_Prepend 
#114 0x00007ff3c3109150  
#115 0x00007ff3c30e917c _PyObject_MakeTpCall 
#116 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#117 0x00007ff3c323c094  
#118 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#119 0x00007ff3c30e958d _PyObject_Call_Prepend 
#120 0x00007ff3c3109150  
#121 0x00007ff3c30e917c _PyObject_MakeTpCall 
#122 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#123 0x00007ff3c323c094  
#124 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#125 0x00007ff3c30e958d _PyObject_Call_Prepend 
#126 0x00007ff3c3109150  
#127 0x00007ff3c30e917c _PyObject_MakeTpCall 
#128 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#129 0x00007ff3c323c094  
#130 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#131 0x00007ff3c30e958d _PyObject_Call_Prepend 
#132 0x00007ff3c3109150  
#133 0x00007ff3c30e917c _PyObject_MakeTpCall 
#134 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#135 0x00007ff3c323c094  
#136 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#137 0x00007ff3c30e958d _PyObject_Call_Prepend 
#138 0x00007ff3c3109150  
#139 0x00007ff3c30e917c _PyObject_MakeTpCall 
#140 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#141 0x00007ff3c323c094  
#142 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#143 0x00007ff3c30e958d _PyObject_Call_Prepend 
#144 0x00007ff3c3109150  
#145 0x00007ff3c30e917c _PyObject_MakeTpCall 
#146 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#147 0x00007ff3c323c094  
#148 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#149 0x00007ff3c30e958d _PyObject_Call_Prepend 
#150 0x00007ff3c3109150  
#151 0x00007ff3c30e917c _PyObject_MakeTpCall 
#152 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#153 0x00007ff3c323c094  
#154 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#155 0x00007ff3c30e958d _PyObject_Call_Prepend 
#156 0x00007ff3c3109150  
#157 0x00007ff3c30e917c _PyObject_MakeTpCall 
#158 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#159 0x00007ff3c323c094  
#160 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#161 0x00007ff3c30e958d _PyObject_Call_Prepend 
#162 0x00007ff3c3109150  
#163 0x00007ff3c30e917c _PyObject_MakeTpCall 
#164 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#165 0x00007ff3c323c094  
#166 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#167 0x00007ff3c30e958d _PyObject_Call_Prepend 
#168 0x00007ff3c3109150  
#169 0x00007ff3c30e917c _PyObject_MakeTpCall 
#170 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#171 0x00007ff3c323c094  
#172 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#173 0x00007ff3c30e958d _PyObject_Call_Prepend 
#174 0x00007ff3c3109150  
#175 0x00007ff3c30e917c _PyObject_MakeTpCall 
#176 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#177 0x00007ff3c323c094  
#178 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#179 0x00007ff3c30e958d _PyObject_Call_Prepend 
#180 0x00007ff3c3109150  
#181 0x00007ff3c30e917c _PyObject_MakeTpCall 
#182 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#183 0x00007ff3c323c094  
#184 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#185 0x00007ff3c30e958d _PyObject_Call_Prepend 
#186 0x00007ff3c3109150  
#187 0x00007ff3c30e917c _PyObject_MakeTpCall 
#188 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#189 0x00007ff3c323c094  
#190 0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#191 0x00007ff3c323c094  
#192 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#193 0x00007ff3c30e958d _PyObject_Call_Prepend 
#194 0x00007ff3c3109150  
#195 0x00007ff3c30e917c _PyObject_MakeTpCall 
#196 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#197 0x00007ff3c323c094  
#198 0x00007ff3c317d0fd  
#199 0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#200 0x00007ff3c323c094  
#201 0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#202 0x00007ff3c323c094  
#203 0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#204 0x00007ff3c323c094  
#205 0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#206 0x00007ff3c323c094  
#207 0x00007ff3c317d23c  
#208 0x00007ff3c31a7ec5  
#209 0x00007ff3c301ac77  
#210 0x00007ff3c357c573  
````

The fix implements two key changes.

1. **Hook functions (`memalloc_alloc`, `memalloc_realloc`)**: Snapshot the allocator struct locally before use and guard indirect function calls with `NULL` checks. This prevents crashes if a partially-written struct is observed during a start/stop race.

2. **Start/stop operations (`memalloc_start`, `memalloc_stop`)**: Use local variables and single assignments when publishing the allocator struct to `global_memalloc_ctx.pymem_allocator_obj`. This ensures concurrent hook calls observe either the old or new struct, never a partially-written intermediate state.

The real root cause is that `PyMem_GetAllocator` is not documented as atomic, and the struct could be read field-by-field while being written to concurrently.  By using local copies and single assignments, we ensure atomicity at the C level and prevent observation of inconsistent state.

Co-authored-by: thomas.kowalski <thomas.kowalski@datadoghq.com>
emmettbutler pushed a commit that referenced this pull request May 6, 2026
…llocator (#17664)

## Description

This PR fixes a segmentation fault in the memory allocation profiler that occurs when a hook call races with `memalloc` start/stop operations. The issue arises from concurrent access to the saved allocator struct, which could be partially written while being read, resulting in`NULL` function pointers being dereferenced.  The key indicator in that case is that `#1 0x0000000000000000` frame -- we are trying to execute a null function pointer.

````
Error UnixSignal: Process terminated with SEGV_MAPERR (SIGSEGV)
#0   0x00007ff3c303a8d4  
#1   0x0000000000000000 memalloc_alloc (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/profiling/collector/_memalloc.cpp:68)
#2   0x00007ff39dcb3b20 memalloc_alloc (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/profiling/collector/_memalloc.cpp:68)
#3   0x00007ff39dcb3b20 memalloc_malloc(void*, unsigned long) (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/profiling/collector/_memalloc.cpp:80)
#4   0x00007ff3c3087e1b PyUnicode_New 
#5   0x00007ff3c30889f4  
#6   0x00007ff3c3170c84  
#7   0x00007ff3c316b931  
#8   0x00007ff3c31aaac8  
#9   0x00007ff3c31033ac  
#10  0x00007ff3c310e2a6 PyObject_CallMethodObjArgs 
#11  0x00007ff3c310e46d  
#12  0x00007ff3c31a96c2  
#13  0x00007ff3c3102fd7 PyObject_Vectorcall 
#14  0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#15  0x00007ff3c323c094  
#16  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#17  0x00007ff3c323c094  
#18  0x00007ff3c30e997d PyObject_CallOneArg 
#19  0x00007ff3c306a480 _PyObject_GenericGetAttrWithDict 
#20  0x00007ff3c30c620d PyObject_GetAttr 
#21  0x00007ff3c32309e7 _PyEval_EvalFrameDefault 
#22  0x00007ff3c323c094  
#23  0x00007ff3c312880e  
#24  0x00007ff3c30e917c _PyObject_MakeTpCall 
#25  0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#26  0x00007ff3c323c094  
#27  0x00007ff3c312880e  
#28  0x00007ff3c30e917c _PyObject_MakeTpCall 
#29  0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#30  0x00007ff3c323c094  
#31  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#32  0x00007ff3c323c094  
#33  0x00007ff3c317d0fd  
#34  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#35  0x00007ff3c323c094  
#36  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#37  0x00007ff3c323c094  
#38  0x00007ff3c317d1b5  
#39  0x00007ff3c3102fd7 PyObject_Vectorcall 
#40  0x00007ff3c3232f4a _PyEval_EvalFrameDefault 
#41  0x00007ff3c3240da5  
#42  0x00007ff3c324112d  
#43  0x00007ff3c3233be1 _PyEval_EvalFrameDefault 
#44  0x00007ff3c323c094  
#45  0x00007ff3c317d1b5  
#46  0x00007ff3c3102fd7 PyObject_Vectorcall 
#47  0x00007ff3c3232f4a _PyEval_EvalFrameDefault 
#48  0x00007ff3c323c094  
#49  0x00007ff3c31033ac  
#50  0x00007ff3c310358d PyObject_CallFunctionObjArgs 
#51  0x00007ff3bf7eb91d WraptBoundFunctionWrapper_call (/project/src/wrapt/_wrappers.c:3750)
#52  0x00007ff3c3104055 _PyObject_Call 
#53  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#54  0x00007ff3c323c094  
#55  0x00007ff3c317d23c  
#56  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#57  0x00007ff3c323c094  
#58  0x00007ff3c310416f _PyObject_Call 
#59  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#60  0x00007ff3c3240da5  
#61  0x00007ff3c324112d  
#62  0x00007ff3c3233be1 _PyEval_EvalFrameDefault 
#63  0x00007ff3c323c094  
#64  0x00007ff3c317d1b5  
#65  0x00007ff3c3102fd7 PyObject_Vectorcall 
#66  0x00007ff3c3232f4a _PyEval_EvalFrameDefault 
#67  0x00007ff3c323c094  
#68  0x00007ff3c317d1b5  
#69  0x00007ff3c310416f _PyObject_Call 
#70  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#71  0x00007ff3c323c094  
#72  0x00007ff3c317d1b5  
#73  0x00007ff3c310416f _PyObject_Call 
#74  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#75  0x00007ff3c323c094  
#76  0x00007ff3c31033ac  
#77  0x00007ff3c310358d PyObject_CallFunctionObjArgs 
#78  0x00007ff3bf7eb91d WraptBoundFunctionWrapper_call (/project/src/wrapt/_wrappers.c:3750)
#79  0x00007ff3c30e917c _PyObject_MakeTpCall 
#80  0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#81  0x00007ff3c323c094  
#82  0x00007ff3c317d518  
#83  0x00007ff3c3155963  
#84  0x00007ff3c315393d  
#85  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#86  0x00007ff3c323c094  
#87  0x00007ff3c317d0fd  
#88  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#89  0x00007ff3c323c094  
#90  0x00007ff3c317d0fd  
#91  0x00007ff3c317d518  
#92  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#93  0x00007ff3c323c094  
#94  0x00007ff3c30e9371 _PyObject_FastCallDictTstate 
#95  0x00007ff3c30e958d _PyObject_Call_Prepend 
#96  0x00007ff3c3109150  
#97  0x00007ff3c3104055 _PyObject_Call 
#98  0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#99  0x00007ff3c323c094  
#100 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#101 0x00007ff3c30e958d _PyObject_Call_Prepend 
#102 0x00007ff3c3109150  
#103 0x00007ff3c30e917c _PyObject_MakeTpCall 
#104 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#105 0x00007ff3c323c094  
#106 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#107 0x00007ff3c30e958d _PyObject_Call_Prepend 
#108 0x00007ff3c3109150  
#109 0x00007ff3c30e917c _PyObject_MakeTpCall 
#110 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#111 0x00007ff3c323c094  
#112 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#113 0x00007ff3c30e958d _PyObject_Call_Prepend 
#114 0x00007ff3c3109150  
#115 0x00007ff3c30e917c _PyObject_MakeTpCall 
#116 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#117 0x00007ff3c323c094  
#118 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#119 0x00007ff3c30e958d _PyObject_Call_Prepend 
#120 0x00007ff3c3109150  
#121 0x00007ff3c30e917c _PyObject_MakeTpCall 
#122 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#123 0x00007ff3c323c094  
#124 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#125 0x00007ff3c30e958d _PyObject_Call_Prepend 
#126 0x00007ff3c3109150  
#127 0x00007ff3c30e917c _PyObject_MakeTpCall 
#128 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#129 0x00007ff3c323c094  
#130 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#131 0x00007ff3c30e958d _PyObject_Call_Prepend 
#132 0x00007ff3c3109150  
#133 0x00007ff3c30e917c _PyObject_MakeTpCall 
#134 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#135 0x00007ff3c323c094  
#136 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#137 0x00007ff3c30e958d _PyObject_Call_Prepend 
#138 0x00007ff3c3109150  
#139 0x00007ff3c30e917c _PyObject_MakeTpCall 
#140 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#141 0x00007ff3c323c094  
#142 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#143 0x00007ff3c30e958d _PyObject_Call_Prepend 
#144 0x00007ff3c3109150  
#145 0x00007ff3c30e917c _PyObject_MakeTpCall 
#146 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#147 0x00007ff3c323c094  
#148 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#149 0x00007ff3c30e958d _PyObject_Call_Prepend 
#150 0x00007ff3c3109150  
#151 0x00007ff3c30e917c _PyObject_MakeTpCall 
#152 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#153 0x00007ff3c323c094  
#154 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#155 0x00007ff3c30e958d _PyObject_Call_Prepend 
#156 0x00007ff3c3109150  
#157 0x00007ff3c30e917c _PyObject_MakeTpCall 
#158 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#159 0x00007ff3c323c094  
#160 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#161 0x00007ff3c30e958d _PyObject_Call_Prepend 
#162 0x00007ff3c3109150  
#163 0x00007ff3c30e917c _PyObject_MakeTpCall 
#164 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#165 0x00007ff3c323c094  
#166 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#167 0x00007ff3c30e958d _PyObject_Call_Prepend 
#168 0x00007ff3c3109150  
#169 0x00007ff3c30e917c _PyObject_MakeTpCall 
#170 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#171 0x00007ff3c323c094  
#172 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#173 0x00007ff3c30e958d _PyObject_Call_Prepend 
#174 0x00007ff3c3109150  
#175 0x00007ff3c30e917c _PyObject_MakeTpCall 
#176 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#177 0x00007ff3c323c094  
#178 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#179 0x00007ff3c30e958d _PyObject_Call_Prepend 
#180 0x00007ff3c3109150  
#181 0x00007ff3c30e917c _PyObject_MakeTpCall 
#182 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#183 0x00007ff3c323c094  
#184 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#185 0x00007ff3c30e958d _PyObject_Call_Prepend 
#186 0x00007ff3c3109150  
#187 0x00007ff3c30e917c _PyObject_MakeTpCall 
#188 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#189 0x00007ff3c323c094  
#190 0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#191 0x00007ff3c323c094  
#192 0x00007ff3c30e92f1 _PyObject_FastCallDictTstate 
#193 0x00007ff3c30e958d _PyObject_Call_Prepend 
#194 0x00007ff3c3109150  
#195 0x00007ff3c30e917c _PyObject_MakeTpCall 
#196 0x00007ff3c32335a2 _PyEval_EvalFrameDefault 
#197 0x00007ff3c323c094  
#198 0x00007ff3c317d0fd  
#199 0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#200 0x00007ff3c323c094  
#201 0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#202 0x00007ff3c323c094  
#203 0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#204 0x00007ff3c323c094  
#205 0x00007ff3c3233dd3 _PyEval_EvalFrameDefault 
#206 0x00007ff3c323c094  
#207 0x00007ff3c317d23c  
#208 0x00007ff3c31a7ec5  
#209 0x00007ff3c301ac77  
#210 0x00007ff3c357c573  
````

The fix implements two key changes.

1. **Hook functions (`memalloc_alloc`, `memalloc_realloc`)**: Snapshot the allocator struct locally before use and guard indirect function calls with `NULL` checks. This prevents crashes if a partially-written struct is observed during a start/stop race.

2. **Start/stop operations (`memalloc_start`, `memalloc_stop`)**: Use local variables and single assignments when publishing the allocator struct to `global_memalloc_ctx.pymem_allocator_obj`. This ensures concurrent hook calls observe either the old or new struct, never a partially-written intermediate state.

The real root cause is that `PyMem_GetAllocator` is not documented as atomic, and the struct could be read field-by-field while being written to concurrently.  By using local copies and single assignments, we ensure atomicity at the C level and prevent observation of inconsistent state.

Co-authored-by: thomas.kowalski <thomas.kowalski@datadoghq.com>
gh-worker-dd-mergequeue-cf854d Bot pushed a commit that referenced this pull request Jun 9, 2026
## Description

This PR fixes crashes (caught by Crash Tracking) caused by IAST trying to access freed memories during interpreter shutdown. 

`is_text` supposedly safely access objects but can only check whether the addresses make sense, it can't know  whether objects have been freed.

https://github.com/DataDog/dd-trace-py/blob/160d0fb6ac1a4f8487825d35064acb7489392f7c/ddtrace/appsec/_iast/_taint_tracking/utils/string_utils.h#L50-L67

When uvloop's `run_until_complete` unwinds (`Py_XDECREF` in frames 57 to 61), custom type objects can be freed by the cyclic GC before all of their instances finish cleanup. An instance can still have a positive refcount, its pointer is non-null and aligned, `ob_type` is non-null, but the type it points to has been deallocated.

There's already a `shutting_down` guard in `get_tainted_object_map_from_pyobject`, but `get_tainted_object_map` calls `is_text(obj)` before reaching that.

https://github.com/DataDog/dd-trace-py/blob/c641709fa9381957ef114f605b5eadb3fa79ceeb/ddtrace/appsec/_iast/_taint_tracking/context/taint_engine_context.cpp#L140-L147

**Example crashing stack**

```
Error UnixSignal: Process terminated with SEGV_MAPERR (SIGSEGV)
#0   0x000061dcfbfffe73 PyType_IsSubtype (/usr/src/python/Objects/typeobject.c:2137)
#1   0x0000000000000000 PyObject_TypeCheck (/opt/python/cp312-cp312/include/python3.12/object.h:381)
#2   0x0000000000000000 is_text(_object const*) (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/appsec/_iast/_taint_tracking/utils/string_utils.h:66)
#3   0x0000000000000000 is_text(_object const*) (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/appsec/_iast/_taint_tracking/utils/string_utils.h:51)
#4   0x000074d4052a7de1 PyObject_TypeCheck (/opt/python/cp312-cp312/include/python3.12/object.h:381)
#5   0x000074d4052a7de1 is_text(_object const*) (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/appsec/_iast/_taint_tracking/utils/string_utils.h:66)
#6   0x000074d4052a7de1 is_text(_object const*) (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/appsec/_iast/_taint_tracking/utils/string_utils.h:51)
#7   0x000074d4052a7de1 TaintEngineContext::get_tainted_object_map(_object*) (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/appsec/_iast/_taint_tracking/context/taint_engine_context.cpp:141)
#8   0x0000000000000000 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (/opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr_base.h:732)
#9   0x0000000000000000 std::__shared_ptr<absl::lts_20250127::node_hash_map<unsigned long, std::pair<long, std::shared_ptr<TaintedObject> >, absl::lts_20250127::hash_internal::Hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::pair<long, std::shared_ptr<TaintedObject> > > > >, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (/opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr_base.h:1183)
#10  0x0000000000000000 std::shared_ptr<absl::lts_20250127::node_hash_map<unsigned long, std::pair<long, std::shared_ptr<TaintedObject> >, absl::lts_20250127::hash_internal::Hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::pair<long, std::shared_ptr<TaintedObject> > > > > >::~shared_ptr() (/opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr.h:121)
#11  0x0000000000000000 operator() (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/appsec/_iast/_taint_tracking/context/taint_engine_context.cpp:357)
#12  0x0000000000000000 call_impl<bool, pyexport_taint_engine_context(pybind11::module&)::<lambda(pybind11::object)>&, 0, pybind11::detail::void_type> (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/appsec/_iast/_taint_tracking/_vendor/pybind11/include/pybind11/cast.h:2137)
#13  0x0000000000000000 call<bool, pybind11::detail::void_type, pyexport_taint_engine_context(pybind11::module&)::<lambda(pybind11::object)>&> (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/appsec/_iast/_taint_tracking/_vendor/pybind11/include/pybind11/cast.h:2105)
#14  0x0000000000000000 operator() (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/appsec/_iast/_taint_tracking/_vendor/pybind11/include/pybind11/pybind11.h:430)
#15  0x000074d4052a8a70 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (/opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr_base.h:732)
#16  0x000074d4052a8a70 std::__shared_ptr<absl::lts_20250127::node_hash_map<unsigned long, std::pair<long, std::shared_ptr<TaintedObject> >, absl::lts_20250127::hash_internal::Hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::pair<long, std::shared_ptr<TaintedObject> > > > >, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (/opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr_base.h:1183)
#17  0x000074d4052a8a70 std::shared_ptr<absl::lts_20250127::node_hash_map<unsigned long, std::pair<long, std::shared_ptr<TaintedObject> >, absl::lts_20250127::hash_internal::Hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::pair<long, std::shared_ptr<TaintedObject> > > > > >::~shared_ptr() (/opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr.h:121)
#18  0x000074d4052a8a70 operator() (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/appsec/_iast/_taint_tracking/context/taint_engine_context.cpp:357)
#19  0x000074d4052a8a70 call_impl<bool, pyexport_taint_engine_context(pybind11::module&)::<lambda(pybind11::object)>&, 0, pybind11::detail::void_type> (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/appsec/_iast/_taint_tracking/_vendor/pybind11/include/pybind11/cast.h:2137)
#20  0x000074d4052a8a70 call<bool, pybind11::detail::void_type, pyexport_taint_engine_context(pybind11::module&)::<lambda(pybind11::object)>&> (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/appsec/_iast/_taint_tracking/_vendor/pybind11/include/pybind11/cast.h:2105)
#21  0x000074d4052a8a70 operator() (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/appsec/_iast/_taint_tracking/_vendor/pybind11/include/pybind11/pybind11.h:430)
#22  0x000074d4052a8a70 pybind11::cpp_function::initialize<pyexport_taint_engine_context(pybind11::module_&)::{lambda(pybind11::object)#5}, bool, pybind11::object, pybind11::name, pybind11::scope, pybind11::sibling>(pyexport_taint_engine_context(pybind11::module_&)::{lambda(pybind11::object&&)#5}, bool (*)(pybind11::object), pybind11::name const, pybind11::scope&, pybind11::sibling)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) [clone .cold] (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/appsec/_iast/_taint_tracking/_vendor/pybind11/include/pybind11/pybind11.h:400)
#23  0x000074d40525fd00 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) (/go/src/github.com/DataDog/apm-reliability/dd-trace-py/ddtrace/appsec/_iast/_taint_tracking/_vendor/pybind11/include/pybind11/pybind11.h:1063)
#24  0x000061dcfbff87cf cfunction_call (/usr/src/python/Objects/methodobject.c:551)
#25  0x000061dcfbfe95de _PyObject_MakeTpCall (/usr/src/python/Objects/call.c:245)
#26  0x000061dcfc010357 _PyEval_EvalFrameDefault (/usr/src/python/Python/bytecodes.c:2719)
#27  0x000061dcfc0d8a91 _PyObject_VectorcallTstate (/usr/src/python/./Include/internal/pycore_call.h:93)
#28  0x000061dcfbfcd894 partial_vectorcall (/usr/src/python/./Modules/_functoolsmodule.c:269)
#29  0x000061dcfbfe9ad4 object_vacall (/usr/src/python/Objects/call.c:850)
#30  0x000061dcfc046d3e PyObject_CallFunctionObjArgs (/usr/src/python/Objects/call.c:961)
#31  0x000074d40e7d4d0a WraptBoundFunctionWrapper_call (/project/src/wrapt/_wrappers.c:3024)
#32  0x000061dcfbfe95de _PyObject_MakeTpCall (/usr/src/python/Objects/call.c:245)
#33  0x000061dcfc010357 _PyEval_EvalFrameDefault (/usr/src/python/Python/bytecodes.c:2719)
#34  0x000061dcfbfeca86 gen_send_ex2 (/usr/src/python/Objects/genobject.c:238)
#35  0x000074d40fdeadc7 task_step_impl (/usr/src/python/Modules/_asynciomodule.c:2869)
#36  0x000074d40fdeb5a2 task_step (/usr/src/python/Modules/_asynciomodule.c:3188)
#37  0x0000000000000000 __Pyx_PyObject_Call (/project/uvloop/loop.c:191431)
#38  0x000074d3f24978eb __Pyx_PyObject_Call (/project/uvloop/loop.c:191431)
#39  0x000074d3f24978eb __Pyx_PyObject_FastCallDict (/project/uvloop/loop.c:191552)
#40  0x000074d3f2573a69 __pyx_f_6uvloop_4loop_6Handle__run (/project/uvloop/loop.c:66873)
#41  0x000074d3f257796b __pyx_f_6uvloop_4loop_4Loop__on_idle (/project/uvloop/loop.c:17975)
#42  0x000074d3f2571e52 __pyx_f_6uvloop_4loop_6Handle__run (/project/uvloop/loop.c:66927)
#43  0x000074d3f2573c88 __pyx_f_6uvloop_4loop_cb_idle_callback (/project/uvloop/loop.c:87335)
#44  0x0000000000000000 uv__queue_empty (/project/build/libuv-x86_64/src/queue.h:33)
#45  0x000074d3f258f311 uv__queue_empty (/project/build/libuv-x86_64/src/queue.h:33)
#46  0x000074d3f258f311 uv__run_idle (/project/build/libuv-x86_64/src/unix/loop-watcher.c:68)
#47  0x000074d3f258c647 uv_run (/project/build/libuv-x86_64/src/unix/core.c:440)
#48  0x000074d3f24addb5 __pyx_f_6uvloop_4loop_4Loop__Loop__run (/project/uvloop/loop.c:18471)
#49  0x000074d3f2515e50 __pyx_f_6uvloop_4loop_4Loop__run (/project/uvloop/loop.c:18876)
#50  0x0000000000000000 __pyx_pf_6uvloop_4loop_4Loop_24run_forever (/project/uvloop/loop.c:31528)
#51  0x000074d3f2526cf0 __pyx_pf_6uvloop_4loop_4Loop_24run_forever (/project/uvloop/loop.c:31528)
#52  0x000074d3f2526cf0 __pyx_pw_6uvloop_4loop_4Loop_25run_forever (/project/uvloop/loop.c:31331)
#53  0x000061dcfbfea47c PyObject_VectorcallMethod (/usr/src/python/Objects/call.c:887)
#54  0x0000000000000000 Py_DECREF (/opt/_internal/cpython-3.12.11/include/python3.12/object.h:700)
#55  0x0000000000000000 Py_XDECREF (/opt/_internal/cpython-3.12.11/include/python3.12/object.h:798)
#56  0x000074d3f252ad60 Py_DECREF (/opt/_internal/cpython-3.12.11/include/python3.12/object.h:700)
#57  0x000074d3f252ad60 Py_XDECREF (/opt/_internal/cpython-3.12.11/include/python3.12/object.h:798)
#58  0x000074d3f252ad60 __pyx_pf_6uvloop_4loop_4Loop_44run_until_complete (/project/uvloop/loop.c:33769)
#59  0x0000000000000000 Py_XDECREF (/opt/_internal/cpython-3.12.11/include/python3.12/object.h:797)
#60  0x000074d3f252c591 Py_XDECREF (/opt/_internal/cpython-3.12.11/include/python3.12/object.h:797)
#61  0x000074d3f252c591 __pyx_pw_6uvloop_4loop_4Loop_45run_until_complete (/project/uvloop/loop.c:33322)
#62  0x000061dcfbfe98f8 PyObject_Vectorcall (/usr/src/python/Objects/call.c:325)
#63  0x000061dcfc010357 _PyEval_EvalFrameDefault (/usr/src/python/Python/bytecodes.c:2719)
#64  0x000061dcfbfeca86 gen_send_ex2 (/usr/src/python/Objects/genobject.c:238)
#65  0x000074d40fdeadc7 task_step_impl (/usr/src/python/Modules/_asynciomodule.c:2869)
#66  0x000074d40fdeb5a2 task_step (/usr/src/python/Modules/_asynciomodule.c:3188)
#67  0x000061dcfbfe95de _PyObject_MakeTpCall (/usr/src/python/Objects/call.c:245)
#68  0x000061dcfbf6c77a context_run (/usr/src/python/Python/context.c:668)
#69  0x000061dcfc05bbeb cfunction_vectorcall_FASTCALL_KEYWORDS (/usr/src/python/Objects/methodobject.c:439)
#70  0x000061dcfbf62275 _PyEval_EvalFrameDefault (/usr/src/python/Python/bytecodes.c:3227)
#71  0x000061dcfc093679 PyEval_EvalCode (/usr/src/python/Python/ceval.c:579)
#72  0x000061dcfc0b12bc run_eval_code_obj (/usr/src/python/Python/pythonrun.c:1723)
#73  0x000061dcfc0b1234 run_mod (/usr/src/python/Python/pythonrun.c:1744)
#74  0x000061dcfc0b0df1 pyrun_file (/usr/src/python/Python/pythonrun.c:1643)
#75  0x000061dcfc0b0c37 _PyRun_SimpleFileObject (/usr/src/python/Python/pythonrun.c:433)
#76  0x000061dcfc0b0a57 _PyRun_AnyFileObject (/usr/src/python/Python/pythonrun.c:78)
#77  0x000061dcfc0bafb0 Py_RunMain (/usr/src/python/Modules/main.c:713)
#78  0x000061dcfc0bab3d Py_BytesMain (/usr/src/python/Modules/main.c:768)
#79  0x000074d411000d90 __libc_start_call_main (sysdeps/nptl/libc_start_call_main.h:58)
#80  0x0000000000000000 call_init (csu/libc-start.c:128)
#81  0x000074d411000e40 call_init (csu/libc-start.c:128)
#82  0x000074d411000e40 __libc_start_main_alias_2 (csu/libc-start.c:379)
#83  0x000061dcfc033075 _start 
```

Co-authored-by: thomas.kowalski <thomas.kowalski@datadoghq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants