Memprof: provide the guarantee that an allocation callback is always run in the same thread the allocation takes place. Attempt II.#9674
Conversation
|
One last comment: the first commit of this PR is actually #9653, which I'd loved to be merged first. |
41b78e8 to
2171097
Compare
2171097 to
905874a
Compare
|
@stedolan : any chance you could tell me what you think about this? Of course, you don't need to do a thorough review, but I'd appreciate to know your opinion. |
15012b0 to
4af10f3
Compare
4af10f3 to
c43efd3
Compare
|
Would it be possible to get a review for this PR? I know the diff is large, but I tried hard to split it into several commits, and this PR simplify some aspects of the implementation of Memprof. I'd really like to see it merged for 4.12, and given that the diff is large, the review process may take some time. For the records, the first version of this PR, #9449, has been there since mid April, and did not receive much attention since then.. |
c43efd3 to
bd8073a
Compare
|
@stedolan would you be available for reviewing this PR? I'm afraid it will just drop into oblivion otherwise. |
|
@stedolan, anychance you will have some time to have a look at this before 4.12? |
|
Yes! Apologies for the slowness, it's on my list but I haven't found time yet. I'll have a look early next week. |
|
Thanks ! And sorry if you feel a bit harassed : I am just afraid we won't have time before 4.12! |
|
This looks good! I'm very glad to see the back of idx_ptr. I'm going to have a more careful read of the reentrancy logic now, though, because I think that remains the trickiest part. By the way, is there still a test of what happens when Thread.exit is called during a callback? |
|
While reading this I thought of a slightly simplification: since each This is a bit easier to explain in code than text. See stedolan@25563fb |
4418ffd to
5372a4a
Compare
Thanks for the suggestion. I merged your ideas in the relevant commits of the PR. |
5372a4a to
38c46e2
Compare
|
Thanks! I've written two more tests for the bits of this PR that I find trickiest. (see https://github.com/stedolan/ocaml/commits/memprof_thread_alloc_callback_2) One of them uncovers a segfault if we stop Memprof during a Memprof callback, mind having a look? |
2b87c38 to
6d34733
Compare
Thanks. I have now fixed this. The only issue with this PR is that the new test |
|
While the Flambda strangeness is being investigated, you can turn the test off in Flambda mode: |
As you prefer. But I did not want to disable a test which reveals an actual bug! Actually it depends on whether #9978 can be investigated within short time bounds. |
6d34733 to
54e829f
Compare
|
Thanks to the suggestion of @lthls, I fixed the test. CI should pass soon. |
|
The PR was approved and the CI is green. Should we merge the PR, or does @jhjourdan want to change the commit history? (I looked at the history, my impression is that squashing everything would be not-nice, in particular it would make bisecting an issue within the PR much harder. There is a small amount of back-and-forth in the history, but I think it would be acceptable to merge right away if @jhjourdan doesn't happen to be a git history control addict.) |
|
Ok, let me have a look at history then. |
This is only effetive in native mode, since function calls in bytecode mode will trigger polling and hence run pending callbacks.
…rray struct, which only contains information specific to the array.
…'s data structures. Reasons: - Better abstraction in memprof.h - Simpler Saving/restore functions - We introduce a current's thread context, so that we don't need to do a spacial case for the current thread
…e returned by the callback.
…run in the same thread the allocation takes place. This is done by using a local entry array for each thread, containing tracked blocks whose allocation callback has not yet been called. This allows some simplification in the code running callbacks for young allocations. Indeed, since the entry array is local to one thread, we know for sure that it cannot be modified during a callback, and therefore we no longer need to remember the indices of the corresponding new entries.
Instead, we use a thread-local variable [callback_status] which contains the index of the corresponding entry when a callback is running. We can do this since there can only be one running callback at the same time in a given thread. This lifts the restriction forbidding the call of Thread.exit from a memprof callback.
…ets killed by fork.
54e829f to
a6980b2
Compare
|
Alright, the history looks better now, I think. This is ready to merge and cherry-pick to 4.12. (I don't have the commit bit, though.) |
Memprof: provide the guarantee that an allocation callback is always run in the same thread the allocation takes place. Attempt II. (cherry picked from commit 0cb298f)
|
Cherry-picked in 4.12 branch as part of commit 2c85e04. |
This PR is a second attempt at implementing the feature proposed by #9449.
In this new version, I use a separate extensible array for each thread, which contains the entries for the tracked blocks for which the allocation callback has not yet been called. Another shared array contains the entries for which the allocation callback has been called.
In addition, this PR contains a few code improvements which are related to this new data structure. In particular, tracking of young blocks has been made much simpler, since we know for sure that the allocating thread is the only one allowed to modify the local entry array (and hence no concurrent modification can happen when handling the different allocations of the same combined block).
This PR also gets rid of the
idx_ptrpointers, which had relatively complex invariants. I am no completely sure that the replacement is better, though. Since this is in a separate commit, please tell me if we should of not use this.Taking advantage of the removal of
idx_ptr, this PR also lifts the restriction aboutThread.exitin Memprof callbacks. Same remark as above: this is in a separate optional commit.The diff of this PR is quite large, since it changes significantly the main data structure of Memprof. I tried to make independent commits to ease reviewing (only one commit should be non-trivial). Please review this one commit at a time.
Since this PR have more thread-local data structures, it should help rebasing on top of multicore. I'm happy to help.
Cc @stedolan, @gasche, @ctk21.