
Avoid scanning the stack twice when collecting callstacks in Memprof.#9279

Merged
gasche merged 2 commits into ocaml:trunk from stedolan:optimise-callstacks-again
Feb 5, 2020

Conversation

@stedolan
Contributor

@stedolan stedolan commented Feb 3, 2020

The logic for collecting callstacks for Memprof is split into two passes: first, the stack is scanned to work out its length, a buffer is allocated, and then the stack is scanned again to fill it.

This is a de-optimisation: scanning the stack is much more expensive than reallocating and copying a buffer that turned out to be too short.

This patch does only one stack scan, using an off-heap buffer that's resized if necessary, and copying the result to a correctly-sized buffer afterwards. It makes a simple benchmark that spends ~30% of its time in Memprof go ~10% faster, so this cuts close to a third off the Memprof overhead. (Numbers will vary wildly according to stack depth and sampling rate.)

There's a further optimisation: memprof.c also caches the last used buffer to save a couple of allocations, as the stack size at the last sample is a good predictor of the stack size at the next.
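The scheme described above can be illustrated with a minimal standalone sketch in C. This is not the actual memprof.c code: the names (`capture`, `frame_t`, `cached_buf`) and the array-based "stack walker" are made up for illustration. It shows the single scan, the doubling of an off-heap buffer when it fills (leaving the cached buffer and length untouched if the resize fails), and the reuse of the cached buffer across calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef long frame_t;

/* Cached buffer: the stack depth at the last sample is a good
 * predictor of the depth at the next, so keep the buffer around. */
static frame_t *cached_buf = NULL;
static size_t cached_len = 0;

/* Simulated "stack walker": here the stack is just an array.
 * Returns the number of frames captured; *out receives a
 * correctly-sized heap copy of the captured frames. */
static size_t capture(const frame_t *stack, size_t depth,
                      frame_t **out, size_t max_frames)
{
    if (cached_buf == NULL) {
        cached_len = 16;                      /* initial size guess */
        cached_buf = malloc(cached_len * sizeof(frame_t));
    }
    size_t pos = 0;
    while (pos < depth && pos < max_frames) { /* single scan */
        if (pos == cached_len) {              /* buffer full: double it */
            size_t new_len = cached_len * 2;
            frame_t *p = realloc(cached_buf, new_len * sizeof(frame_t));
            if (p == NULL) break;             /* on failure, keep old buffer/length */
            cached_buf = p;
            cached_len = new_len;
        }
        cached_buf[pos] = stack[pos];
        pos++;
    }
    /* Copy to a correctly-sized result; the cached buffer is kept for reuse. */
    *out = malloc(pos * sizeof(frame_t));
    memcpy(*out, cached_buf, pos * sizeof(frame_t));
    return pos;
}
```

The point of the doubling-plus-copy strategy is that the (expensive) stack walk happens exactly once per sample, while the occasional realloc and final memcpy touch only the (cheap) off-heap buffer.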

@jhjourdan: Is the distinction between capture_callstack_postponed and capture_callstack still needed, now that caml_alloc doesn't invoke async callbacks?

Contributor

@jhjourdan jhjourdan left a comment

I approve this PR, which seems correct to me. I have a few minor nitpicks, which should not block merging.

Comment on lines +284 to +285
intnat trace_pos = 0, trace_len = *plen;
value* trace = *pbuffer;
Contributor

Caching trace_len and trace makes the code harder to read. Have you actually observed some performance benefit doing so?

Contributor Author

The locals are there for correctness in the reallocation case:

      trace_len *= 2;
      trace = caml_stat_resize_noexc(trace, trace_len * sizeof(value));
      if (trace == NULL) break; /* resize failed: old buffer and *plen stay valid */
      *pbuffer = trace;
      *plen = trace_len;

If caml_stat_resize_noexc fails, the size has not changed and *plen should not be updated.

Contributor

Right. But then I would have only a local at the allocation site.

Contributor Author

I think it's cleaner to just introduce the locals where needed, I'll push a new version.

Comment on lines +121 to +122
intnat trace_pos = 0, trace_len = *plen;
value* trace = *pbuffer;
Contributor

Cf. above.

}

for (; trace_pos < trace_size; trace_pos++) {
while (trace_pos < max_frames) {
Contributor

Ditto.

Contributor Author

I think I prefer the while here!

@jhjourdan
Contributor

@jhjourdan: Is the distinction between capture_callstack_postponed and capture_callstack still needed, now that caml_alloc doesn't invoke async callbacks?

Yes, this is still needed. The reason is that caml_alloc_shr gives the additional guarantee that no memory is moved in the OCaml heap, and there is indeed code that calls it without declaring all the roots. Therefore, it is not possible to allocate in the minor heap in capture_callstack_postponed, since this function is called by caml_alloc_shr.

@jhjourdan
Contributor

I have a few minor nitpicks, which should not block merging.

Well, actually the comment about caml_shutdown is serious and should be addressed since it can lead to UB.

@stedolan stedolan force-pushed the optimise-callstacks-again branch from cdf5c0a to 20f6a17 on February 4, 2020 11:03
@stedolan
Contributor Author

stedolan commented Feb 5, 2020

I have a few minor nitpicks, which should not block merging.

Well, actually the comment about caml_shutdown is serious and should be addressed since it can lead to UB.

@jhjourdan does the last commit fix these?

Contributor

@jhjourdan jhjourdan left a comment

Looks good to me. Ready to merge.

Member

@gasche gasche left a comment

Approved on Jacques-Henri's behalf.

@gasche gasche merged commit edee8ce into ocaml:trunk Feb 5, 2020
stedolan pushed a commit to janestreet/ocaml that referenced this pull request Mar 6, 2020
Avoid scanning the stack twice when collecting callstacks in Memprof.

(cherry picked from commit edee8ce)
stedolan pushed a commit to janestreet/ocaml that referenced this pull request Mar 17, 2020
…edee8ce)

Avoid scanning the stack twice when collecting callstacks in Memprof.
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Apr 7, 2020
…edee8ce)

Avoid scanning the stack twice when collecting callstacks in Memprof.

3 participants