Conversation

@thouis (Contributor) commented Jun 8, 2012

A simpler version of the numpy allocation logging facility. This is pure C in the malloc/free/realloc functions, with a dynamic logging buffer that Python code can fetch on demand.

It still lacks documentation for the C allocation and Python logging-related functions, and a home in the namespace. numpy.testing?

thouis added 2 commits June 8, 2012 19:26
First step to enabling allocation tracing for numpy arrays; this just
exposes the functions without any logging implemented.
This adds logging of allocations via the PyDataMem_NEW/FREE/RENEW
functions, controlled and accessed by the functions
alloc_log_enable(), alloc_log_disable(), alloc_log_isenabled(),
alloc_log_fetch(), and alloc_log_clear().

alloc_log_fetch() returns a list of tuples of
  (event_type, inptr, outptr, size)
where event_type is one of "malloc", "free", or "realloc",
corresponding to calls of the form:
  outptr = malloc(size)
  free(inptr)
  outptr = realloc(inptr, size)

Both alloc_log_fetch() and alloc_log_clear() clear the log.

TODO: python function documentation
TODO: place in numpy namespace somewhere other than multiarray
    (perhaps numpy.testing?)
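
A minimal usage sketch of the API described above; the final namespace is still a TODO, so the import location below is only an assumption:

import numpy as np
from numpy.core.multiarray import alloc_log_enable, alloc_log_disable, alloc_log_fetch  # assumed location

alloc_log_enable()
a = np.ones(10**6)   # data buffer allocated via PyDataMem_NEW -> "malloc" event
del a                # data buffer released via PyDataMem_FREE -> "free" event
alloc_log_disable()

for event in alloc_log_fetch():
    # each entry is an (event_type, inptr, outptr, size) tuple
    print(event)
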
@thouis (Contributor Author) commented Jun 8, 2012

Also needs some examples, but I have several ready.

Member:

This should probably call alloc_log_clear (so just return alloc_log_clear(NULL);)

Contributor Author:

I don't agree. The user may want to capture the log for a piece of code by enabling/disabling around it and fetching the log afterwards.

Member:

Then maybe alloc_log_enable should clear it. The point is that we don't want log entries to get mixed up between different logging sessions, since that would silently give misleading results.

Contributor Author:

Perhaps. I wonder if there are cases it makes more difficult, though. One example would be:

for i in range(100):
    alloc_log_enable()
    result = my_func()
    alloc_log_disable()
    other_computation(result)
log = alloc_log_fetch()
# analyze log for problems in my_func()

It would be a bit more difficult to do this if enabling cleared the log, but not impossible.

Perhaps enable/disable should take a keyword argument clear_log that defaults to True.

Member:

That could also just be written

log = []
for i in range(100):
    alloc_log_enable()
    result = my_func()
    alloc_log_disable()
    log += alloc_log_fetch()
    other_computation(result)

Slightly less efficient to keep appending to a list like that, but memory overhead always creates time overhead; no one will be enabling this and then expecting maximum speed while it's turned on.

Member:

@cournape: Re: LD_PRELOAD: You really think it would be easier to add a new shared library (multiarray_alloc.so or something) containing just these functions and then arrange for it to be dynamically linked to multiarray.so on all supported platforms? IIUC that would require at least one of: screwing around with RPATH, installing the library into a systemwide shared directory, or using dlsym directly. And it still wouldn't work on Win32...

@thouis: I may not be understanding what you mean about complexity. IIUC, with the callback case, the data collection code looks like:

import traceback

_heap_log = []
def record_heap_action(action, inptr, outptr, size):
    _heap_log.append((action, inptr, outptr, size, traceback.extract_stack()))
numpy.something.set_malloc_callback(record_heap_action)

With the log version, it's something like

_heap_log = []
def poll_heap_log():
    log = np.something.alloc_log_fetch()
    if log:
        stack = traceback.extract_stack()
        for entry in log:
            _heap_log.append(entry + (stack,))

plus a bunch of arcane gunk to set up a trace function; I'd have to stare at the sys.settrace docs to work out how that would look. Plus this adds much more overhead (assuming most lines of code are not actually allocating any numpy arrays). So the callback version is significantly less complex on the user side, IMHO.

(I'm assuming that people will use some front-end tool that then produces reports based on these logs, or dumps kcachegrind-compatible output files, etc.)

There's still the question about re-entering Python from malloc/free call-sites, but really this should work. Every DECREF is already a potential Python re-entry point that might execute arbitrary code.

Contributor Author:

@njsmith: The complexity is in extracting the relevant parts of the stack trace, which, in the cases I've dealt with so far, means finding the first line in the library you care about. In the callback case, this complexity is higher, because you have to walk the stack. In the log case, you only poll on the relevant lines. This is the trace function I'm using for profiling CellProfiler, for instance:

def trace_memory_usage(self, frame, event, arg):
    # We get line and return events just before they are executed.
    if self.path in frame.f_code.co_filename and \
            event in ('line', 'return'):
        new_state = (frame.f_code.co_filename, frame.f_lineno)
        if self.state != new_state:
            meminfo = get_memory()
            if meminfo['allocations'] + meminfo['frees'] > 16:
                meminfo['lineinfo'] = self.state
                meminfo['time'] = time.time()
                self.trace.append(meminfo)
            self.state = new_state

    # continue tracing
    return self.trace_memory_usage

self.path is the path to the directory with the code being profiled. get_memory() fetches and condenses the log to something useful. The overhead is not particularly high, since it only triggers on lines within the relevant code, and if the log is empty, very little work is done. I'd estimate running np.test() while profiling numpy takes only about 1.2 times as long.
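
get_memory() itself isn't shown here; a rough sketch of what condensing the fetched log might look like, assuming the tuple format from the commit messages — the real version is in the script linked in a later comment:

def get_memory():
    log = np.core.multiarray.alloc_log_fetch()   # assumed location of the fetch function
    return {
        'allocations': sum(1 for event, _, _, _ in log if event in ('malloc', 'realloc')),
        'frees': sum(1 for event, _, _, _ in log if event == 'free'),
        'bytes_requested': sum(size for event, _, _, size in log if event != 'free'),
    }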

I expect most people could make use of a simple profiling class, sidestepping most of this. I have one based on https://github.com/fabianp/memory_profiler that I'll add to this PR in the tools directory when it's cleaned up.

That said, I'd be fine with either version, as an intended user of the code. I don't know whether it's better to push on the code to make sure it's not fragile (the callback version), or to avoid changing assumptions about underlying behavior, even if those assumptions are incorrect (this version).

Contributor Author:

@njsmith: This is basically the complete script I'm using for profiling: https://gist.github.com/62ae8b026f82b45dc597

Member:

Figuring out the best granularity of analysis is best done at report time, not record time. I recommend playing with kcachegrind some time -- after you've "zoomed" in and out a few times on the fly, you'll never want to write that kind of stack walking code again.

And anyway, if you are going to do line-by-line tracing in any case, while turning the memory profiler on and off as you go, then you can do this just as easily with either the callback or logging APIs. (Reimplementing the logging API using the callback API is like 5 lines of code, most of them function signatures.) So the complexity of the callback version is never higher.
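
For concreteness, a sketch of that reimplementation, using the hypothetical set_malloc_callback name from earlier in this thread:

_log = []

def _record(action, inptr, outptr, size):
    _log.append((action, inptr, outptr, size))

def alloc_log_enable():
    numpy.something.set_malloc_callback(_record)

def alloc_log_disable():
    numpy.something.set_malloc_callback(None)

def alloc_log_fetch():
    entries = list(_log)
    del _log[:]
    return entries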

Contributor Author:

@njsmith: Good point. I think either version could produce that sort of output, with about the same amount of code.

@njsmith (Member) commented Jun 8, 2012

I think that's all my comments on the code itself, but you should take the design discussion to the list too. I actually think the Python-callback version was somewhat clearer, not sure what was wrong with it, but I don't have a strong opinion. But "do we want this", "is this the right API" are list questions, not code review questions.

@njsmith (Member) commented Jun 8, 2012

Oh right, and one other standard code review point: needs docs and tests before it could be merged.

thouis added 3 commits June 9, 2012 22:29
This should be reverted if npysort is merged into the multiarray
module build, as the PyDataMem_NEW/FREE functions will be available
again in that case.
@cournape (Member) commented:

@njsmith re LD_PRELOAD, it is clearly overkill for the feature being discussed, but having this implemented would be useful for other interesting use cases. In general, I am worried about adding a very specific API for use cases that can be supported with other tools on most platforms.

@thouis (Contributor Author) commented Jun 11, 2012

@cournape: I agree that "there should be one obvious way to do it" (to paraphrase), but I think there are factors that make numpy suitable for this feature:

  • numpy arrays use orders of magnitude more memory than most Python objects, and this is often a limiting factor in algorithms relying on numpy.
  • numpy can behave in complicated ways with regard to memory management, e.g., views, OWNDATA, etc., making it sometimes difficult to know where memory usage problems are happening and why.
  • numpy attracts a large number of programmers who have only worked with high-level languages (Python/R/Matlab/etc.), who don't have the skills (or time to acquire them) to use other tools, but still need to be able to diagnose these sorts of problems.
  • Other tools are not well integrated with Python.

I would also prefer a solution that wasn't specific to numpy. How about adding a PyDataMem_HOOK variable, NULL by default, with the same signature as _alloc_log_event, called on each malloc/free/realloc when non-NULL? I'm not sure whether global variables and the PyArray_API stuff (which I still don't completely understand) would support this, so perhaps there needs to be a PyDataMem_SET_HOOK() function.

If, on the other hand, the consensus is eventually to add tracing facilities directly to numpy, I'd suggest that they spend a few releases behind an #ifdef gate, off by default, to limit their use to a more knowledgeable subset of the numpy user base until support issues have been ironed out.

@njsmith (Member) commented Jun 14, 2012

@cournape: Maybe I'm not understanding what you're proposing for LD_PRELOAD, because the version I'm understanding (moving specifically these three functions into a shared library so that they're called by indirection through the linker) seems just as specific to this problem as the current PR does, while being absurdly more complex.

I'm not sure how useful I'd find this patch personally - I know enough to work out where copies are being made on my own, and would find a general Python memory profiler more useful - but it clearly has at least some user demand and is very low-hanging fruit. (Unlike a real Python memory profiler, which will probably never happen.) And to the best of my knowledge there simply isn't any good way to match up memory usage to specific lines of Python code on any popular platform, esp. Linux (which I'm most familiar with). If you know one, I'd be honestly interested in hearing about it.

@thouis: That sounds like an excess of caution to me -- there's no magic about 'malloc', it's just a C function. Anyone who uses this API directly will have to know what they're doing, but that's just the nature of low-level debugging/profiling APIs. I know you ran into a number of weird issues when you started doing this, but AFAICT that's just because there was existing buggy C code; those issues could have started showing up on any change, really, and you just got unlucky.

@thouis (Contributor Author) commented Jun 14, 2012

@njsmith: Actually, I think just adding a hook function may be better than either of the other approaches. Logging doesn't support calling back into Python (and so loses some immediacy), while Python callbacks may prevent effective debugging in multithreaded code or some large-scale or high-performance cases. Setting up a callback is pretty easy via Cython, so a hook facility is accessible to a fair number of numpy users, and examples of both logging and callbacks can be included with numpy or made available in other projects.

I'm also looking for consensus, and I don't know that either the logging or callback approaches can get it. It's early enough that I don't know that either should, but putting in a hook allows easy experimentation.

@cournape (Member) commented:

@njsmith As for LD_PRELOAD, I meant that having the means to do this in general (private shared libraries) would be very beneficial outside this precise use case. I am not arguing for doing it just for this; that would be insane.

For memory profiling, there is a newcomer that may be of some interest to you: https://github.com/sjagoe/pikos

@njsmith (Member) commented Jun 14, 2012

@cournape OH, you mean, it would be good to patch the linker so that LD_PRELOAD was usable in more situations? Well, I guess I agree with that, but yeah, it doesn't seem very relevant here :-). Also, looking at the Pikos slides, it isn't a memory profiler at all -- it's just a front-end to existing tools, and the motivation for this patch in the first place is that there are no reliable existing tools.

@thouis: Sure, a C callback approach seems fine. It's even easy to write a C callback in pure Python using ctypes -- though unfortunately I guess there'd be no way to get at the hook setting function using ctypes, so you'd need a C compiler for that.
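
A rough sketch of what such a ctypes callback might look like; the hook signature here just mirrors the (action, inptr, outptr, size) form discussed above, and both it and the setter name are assumptions:

import ctypes

# assumed hook signature: void hook(const char *action, void *inptr, void *outptr, size_t size)
ALLOC_HOOK = ctypes.CFUNCTYPE(None, ctypes.c_char_p, ctypes.c_void_p,
                              ctypes.c_void_p, ctypes.c_size_t)

def _record(action, inptr, outptr, size):
    print(action, inptr, outptr, size)

hook = ALLOC_HOOK(_record)   # keep a reference so the callback isn't garbage collected
# Installing it would still need the C-level setter (e.g. a hypothetical
# PyDataMem_SET_HOOK) exposed through a small C or Cython extension module.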

@thouis (Contributor Author) commented Jun 15, 2012

Closed in favor of #309

@thouis closed this Jun 15, 2012