Conversation
We will put them back if profiling shows it is really necessary.
Could you say a few words about how this work compares with Spacetime? When would one use one or the other? What are the advantages/disadvantages of one over the other?
I won't comment more on this PR, as I am biased towards OCamlPro's Memory Profiler (ocp-memprof), but since you compared the two memory profilers in your talk, I will give my own comparison here:
FWIW, I am sad to see so much work (this work and Spacetime) going into building competitors to ocp-memprof, whose license was really cheap and provided full access to the sources, and which OCamlPro now has little incentive to keep working on. Sometimes, it's better to pay a little to get a well-crafted tool with an efficient GUI than a plethora of unmaintained prototypes.
@lefessan I think many people would be pleased if OCamlPro would consider contributing the code for storing a compressed version of the heap graph (as applied to Spacetime snapshots). Although I understand this might not be possible for commercial reasons.
It's not a matter of commercial reasons; it's just that we have no good reason to spend any more time (and thus, money) on this subject. And actually, technically, releasing this code would be useless without releasing the code to analyse them, i.e., all of ocp-memprof.
From a technical point of view, this is very different from Spacetime: Spacetime not only profiles allocations, it also constructs a call graph that can be used for debugging as well as for profiling purposes. This approach, on the other hand, chooses only a few allocations at random, and gives the user the opportunity to gather a lot of information about the state of the program when allocating. This has two major advantages: first, it is much more lightweight, making it possible to enable it in production with almost no overhead. Second, because any user-chosen function can be called when sampling, it gives more flexibility as to which information is gathered.
This is true, but a little exaggerated: first, the memory overhead is, in fact, negligible if the sampling rate is low enough, and second, the memory is not "leaked", because it is recovered as soon as the corresponding heap block is freed.
When configured with --statmemprof, there is still a use of VLAs (variable-length arrays) in memprof.c, making the code non-C90-compliant and hence incompatible with MSVC.
Do you have numbers? For example on Coq, where backtraces might become large with some tactics?
What I meant was: if the program is leaking memory, it will leak even more memory, since these leaked blocks are never freed, and neither are the attached backtraces.
Let us consider an extremely bad case: the captured callstacks have average length 10^4. Then, if we set the sampling rate to 10^-5 (with which you already get plenty of statistical information), the memory overhead is bounded by 10%. Given that, in practice, I expect callstacks to be much smaller (an average length of 10^4 would mean that the program is quite close to stack overflow, in which case the programmer probably has other problems on her mind), I really think the memory overhead is negligible. If you are still worried about your memory being filled with callstacks, you can either compress them when sampling or limit their size.
I experimented a bit with statmemprof. Here are some observations:
Things that might be interesting to know / do.
I am very enthusiastic about this work, and would love to see it reviewed and merged. I don't think the integration with the Spacetime viewer is a prerequisite.
Nice! Thanks a lot @braibant. Two quick remarks on low-hanging fruit:
At the Mirage retreat @hannesm played with statmemprof and had the same issue (bigarrays are heavily used in the Mirage codebase due to Cstruct), and @chambart helped him do a sort of ad-hoc hack to track bigarray allocations (adding instrumentation directly in the C implementation of bigarrays) -- if I understand correctly, in a separate ephemeron. It would be nicer to have support for out-of-heap resources (and why not also the "virtual cost" API of custom values) built into statmemprof. This might emerge as an enhancement PR from @hannesm / @chambart's experiment, but I would not bet on it -- someone with the resources to do something complete for upstreaming may go quicker than this pair of already-overloaded programmers. (For Zarith, I would assume that the out-of-heap size is relatively small, and thus fairly proportional to the OCaml-side tracking? Do bignums get arbitrarily large in real-world computations, the way bigarrays do?)
I know that @let-def has done some amusing experiments encoding type information in "the rest of the header" (what is typically used by ocp-memprof or Spacetime); it might be possible to combine the two lines of work in interesting ways!
@jhjourdan In terms of getting this upstreamed, I think we need to break this down into its constituent parts, much like I'm doing for the gdb work. I believe there is sufficient consensus as to the overall aim here that we should merge individual parts of the work as they become ready, even if some parts have a few ragged edges and stubs, rather than doing everything strictly in dependency order. My previous experience has shown that the latter often leads to long delays. As discussed on caml-devel, JS are willing to devote resources to reviewing this from approximately the start of April onwards, and @let-def has kindly agreed to contribute some of his time as well. We can do the splitting up of the patch at that time if necessary. The rough idea so far is to concentrate on getting the core parts in for 4.09 and defer some of the more elaborate pieces for 4.10. There are two elaborate pieces that I have thought about so far:
@lpw25 and I have discussed what to do about the problem relating to backtraces that @braibant mentions. We think we have an approach that should be straightforward to implement without much code: add some functionality to retrieve the backtrace as a list of return addresses. These can then be put through the same mechanisms (which, if I remember correctly, use @let-def's
@braibant Thanks for the feedback!
I can indeed plan to support out-of-heap resources as an improvement of the current implementation. I do not think this is a difficult addition. It will require, however, some support from the corresponding C libraries. This could be as simple as requiring them to use
AFAIK, @let-def was already doing something in that direction.
I would say that this is something that can be done outside of the OCaml runtime, e.g., as part of statmemprof-emacs or similar tooling. It is easy to imagine that the sampling callback inspects, e.g., the current time and the current status of the GC (via Gc.quick_stat) to record the date of birth of each sampled block.
There is already such a mechanism in this merge request, and this deferral mechanism is the source of much of the complication, since it requires a specific data structure for recording the deferred allocations. The handling of non-deferred allocations is actually rather simple. The only simplification of your proposal (deferring all C allocations) would be in the public interface of
I agree that Comballoc adds a large amount of complexity to the code. Perhaps a solution for a first version would be to have statmemprof activated only with a specific configure option, and this configure option would deactivate Comballoc? I don't have a clear understanding of the performance impact of deactivating Comballoc, though. @xavierleroy?
Alright. Then I'll try to prepare smaller patches to review when I have a bit more time than now.
@jhjourdan OK, I'm going to have to look more carefully at the details for the
@chambart presented statmemprof at the MirageOS retreat in Marrakesh last week. He used a slightly modified statmemprof-emacs which includes three more numbers (roughly, GC generation numbers): the generation of the first allocation, the generation of the last allocation, and the average generation. If the first is 0, the last is the maximum, and the average is above the midpoint, the allocation may be a leak (these are worth looking for) -- I find these additions incredibly helpful.
Yes, this would be great to have integrated. I'm not sure whether it should be statistical as well, or take into account every allocation. Some initial places where code needs to be hooked were identified (esp. for bigarray); I will see whether I can develop this further.
I adapted the -> Could the
Converting a
A substantially updated version of this work has now been merged.
Thanks, @lpw25!
This GPR implements a mechanism to statistically profile the heap used by an OCaml program. The sampling rate is tunable so that the overhead can be reduced as much as desired. More information about the general idea can be found in this document.
The patch can be divided into several parts:
This GPR is still a work in progress. I would be happy to receive general comments from the OCaml development team. I also have other concerns:
The interface of the caml_alloc_shr function and its variants becomes a mess. This patch introduces a variant of caml_alloc_shr allowing GC calls, which is called as often as possible. So here is my question: apart from the OCaml runtime system, do you think there is much code in the wild exploiting the fact that caml_alloc_shr does not call the GC? Would it be possible to remove this guarantee?