@thouis (Contributor) commented May 21, 2012

Add numpy.core.multiarray.trace_data_allocations() function for
controlling tracing malloc/free/realloc for numpy data. This
functionality is exposed by #defining NPY_TRACE_DATA_MALLOC to 1 in
ndarraytypes.h (the default with this commit). When enabled,
PyDataMem_NEW/FREE/RENEW become actual functions, defined in
multiarraymodule.c, which call back into Python if callbacks are
registered with trace_data_allocations().

Also added tests, including handling Python errors during a
callback, which I think should probably log errors but not actually
raise an exception.

@thouis (Contributor Author) commented May 21, 2012

Open question:

  • Which namespace should hold the trace_data_allocations() function?

The callback mechanism is pretty fast: running numpy.test() with the tracing callbacks set to None is not noticeably slower than before this functionality. This should be quantified/verified, though.

@teoliphant (Member)

This looks like a pretty useful facility. I don't remember seeing it advertised on the numpy-discussion list, though (but I might have missed it).

The trace_data_allocations function should probably live either in numpy.core or in a separate numpy.tools namespace. But this is another good question for the mailing list.

@thouis (Contributor Author) commented May 21, 2012

The brief discussion is here: http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062373.html

Sorry if it was a bit rushed, I thought having something that worked in place to see what it would involve would be helpful.

Member

Probably slightly better to use PyErr_WriteUnraisable here?

Contributor Author

I'm not sure about that, as I think the information that should be printed is exactly what PyErr_PrintEx(0) produces. Is there some other reason to prefer PyErr_WriteUnraisable?

Member

Just that PyErr_WriteUnraisable is defined by the Python API as the official thing to do in exactly this situation :-). I guess it might have trivial improvements too, like printing to sys.stderr even if that's reassigned.

Contributor Author

It looks like one major difference is that PyErr_PrintEx calls into sys.excepthook. Perhaps that's what's wanted, to allow, e.g., pdb to be started on an exception in the tracing callbacks.

Member

But if that's a good idea, then it's also a good idea for sys.excepthook to be called when there's an exception in a ctypes callback, a cython callback, a __del__ method, a weakrefs callback, etc. And who knows, maybe it is. But this just doesn't seem like a good place to be trying to second-guess and "fix" the Python interpreter's standard behavior.
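The distinction being debated can be seen from pure Python: an exception raised in a finalizer such as __del__ cannot propagate, so CPython routes it through the "unraisable" machinery (exposed as sys.unraisablehook since Python 3.8) rather than sys.excepthook. A small demonstration, using a custom hook to observe the swallowed exception:

```python
import sys

captured = []

def hook(unraisable):
    # unraisable carries exc_type/exc_value/object for the failed callback
    captured.append(unraisable.exc_type.__name__)

sys.unraisablehook = hook

class Noisy:
    def __del__(self):
        raise ValueError("error during finalization")

obj = Noisy()
del obj  # __del__ raises; the exception is swallowed, the hook sees it
```

The caller of `del` never sees the ValueError, which is exactly the behavior PyErr_WriteUnraisable gives C code in the same situation.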

Contributor Author

Good point. In that case, I guess the callback should either generate an exception (and the malloc/free/realloc happens but doesn't return to the point above), or do as you say. I don't know which is better, so for now will go with the latter.

What should happen seems like another question for the mailing list, so I will check there, as well.

Contributor Author

Actually, since most code just treats these functions as macro wrappers for malloc(), callers don't generally check PyErr_Occurred after calling them, so it's almost surely best to just log and continue, to limit the amount of code changed and avoid problems with third-party libraries.

Member

Right, it's simply not possible to throw an exception out of functions like these. That's why the Python interpreter has a function for dealing with exceptions that are "Unraisable" :-).

thouis added 2 commits May 23, 2012 22:25
Making these functions part of the API converts them to static, requiring the following
changes:
- npysort/sort.c.src: #define PyDataMem_NEW/FREE to malloc/free.
- src/_compiled_base.c: changed one pair of calls to use PyArray_malloc/free.

Also moved the trace_data_allocations() python function to numpy.core and
modified it to return the previous callbacks.
@thouis (Contributor Author) commented May 23, 2012

This is still a WIP, especially since some tests fail when I run np.test() under tracing that don't fail when run without tracing:

import numpy as np

total = 0
szs = {}

def inc(ptr, size):
    global total
    szs[ptr] = size
    total += size
    if size > 100:
        print total

def sub(ptr):
    global total
    total -= szs[ptr]

def res(newptr, oldptr, newsize):
    global total
    total -= szs[oldptr]
    szs[newptr] = newsize
    total += newsize

np.core.trace_data_allocations(inc, sub, res)
np.test()
print "END", total

@thouis (Contributor Author) commented May 23, 2012

My theory is that some tests are failing due to refcounts being invalid when the new functions are called, and the gc being triggered or objects otherwise being freed.

One test that regularly fails is test_regression.py:test_refcount_vectorize().

@thouis (Contributor Author) commented May 24, 2012

Some tests are still failing when run under tracing, though not consistently. Valgrind isn't showing me anything in particular within them, but disabling gc completely seems to prevent them from failing.

@thouis (Contributor Author) commented May 24, 2012

scipy.test() also has some errors that only occur while tracing.

I'm going to set this aside for a few days, and perhaps write a simpler version that doesn't call back into Python. Instead, it can expose the malloc/free counts/bytes via a function, and track them in pure C code. Then the same approach as https://github.com/fabianp/memory_profiler can be used to track allocation line-by-line and most of the complexity goes away.

@thouis (Contributor Author) commented Jun 5, 2012

All of the failing tests seem to have been caused by the buffer copy bug, fixed in https://github.com/mwiebe/numpy/tree/nditer_buffer_flag (but not yet pulled into numpy).

I also have a version that implements tracing, with pure C in the allocation functions writing to a dynamically allocated buffer, which must then be fetched proactively by Python. However, I think this version is a little nicer to use from the Python perspective.

@charris (Member) commented Jun 5, 2012

I'd think the PyDataMem_* memory allocations/deallocations should remain completely independent of Python; I suspect that is why they are there.

@thouis (Contributor Author) commented Jun 6, 2012

Thinking more about this, the lower-level version is probably just as easy to use, and avoids the potential issues with GC and python allocation, so I'll close this PR and open a new one when it's ready.

@thouis closed this Jun 6, 2012