Skip to content

Memprof: provide the guarantee that an allocation callback is always run in the same thread the allocation takes place.#9449

Closed
jhjourdan wants to merge 4 commits intoocaml:trunkfrom
jhjourdan:memprof_thread_alloc_callback
Closed

Memprof: provide the guarantee that an allocation callback is always run in the same thread the allocation takes place.#9449
jhjourdan wants to merge 4 commits intoocaml:trunkfrom
jhjourdan:memprof_thread_alloc_callback

Conversation

@jhjourdan
Copy link
Copy Markdown
Contributor

This is done by maintaining a bag of pending allocation callbacks for each thread, in the form of an linked list.

We also change the behavior of Memprof.stop, so that it does no longer try to run pending callbacks. This is only effective in native mode, since function calls in bytecode mode will trigger polling and hence run pending callbacks.

This MR is rather large, but only one commit actually has non-trivial content. Reviewing should be easier by reading one commit after the other.

@jhjourdan jhjourdan force-pushed the memprof_thread_alloc_callback branch from e36c8b6 to 657cc57 Compare April 15, 2020 14:01
@jhjourdan
Copy link
Copy Markdown
Contributor Author

We also change the behavior of Memprof.stop, so that it does no longer try to run pending callbacks. This is only effective in native mode, since function calls in bytecode mode will trigger polling and hence run pending callbacks.

Perhaps I could say a bit more on that: first, this is something which have been discussed several times (including in #8920), and second, in a multithreaded setting, it makes little sense to try to handle the callbacks of the current thread while discarding the others.

@stedolan
Copy link
Copy Markdown
Contributor

Over in #8920, when responding to @gadmm, I said:

Making this guarantee can cause unbounded delays. Suppose an allocation is sampled on a thread in a C call (where the callback cannot run promptly), and the C call then enters a blocking section. It can be a very long time before the blocking section returns.

@jhjourdan has since pointed out that this already happens for single-threaded programs, for the same reason. So, making this guarantee will not in fact make anything worse, and it would be nice to be able to rely on it.

@jhjourdan jhjourdan force-pushed the memprof_thread_alloc_callback branch from 657cc57 to 1153486 Compare April 16, 2020 09:49
Copy link
Copy Markdown
Contributor

@gadmm gadmm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is for the first three commits. Overall they look good to me but I have found two issues in the form of questions. I do the other three separately since you have updated them since I started the review.

@stedolan
Copy link
Copy Markdown
Contributor

@jhjourdan How important is it to get this into 4.11? I won't be able to review it tomorrow, but I could have a look on Monday if it's urgent.

(On the other hand, if you want this in 4.11 because you think the existing behaviour is broken, perhaps we could argue that this is a bugfix and spend a little longer reviewing it)

@gasche
Copy link
Copy Markdown
Member

gasche commented Apr 16, 2020

(Personally I think that rushing to review a complex PR a couple days before a deadline is an anti-pattern of collaborative software development.)

@gadmm
Copy link
Copy Markdown
Contributor

gadmm commented Apr 16, 2020

Would you like to submit the first patch separately to have it in 4.11? And some of the simple refactorings if you want, they do not look risky.

@jhjourdan
Copy link
Copy Markdown
Contributor Author

@stedolan : don't rush reviewing this for Monday. This is a rather complicated patch, for a feature which I think is relevant but not essential.

That said, I agree with @gadmm that we could consider merging the first commit before 4.11. But it is trivial and could be considered as a bugfix.

@jhjourdan
Copy link
Copy Markdown
Contributor Author

I moved the commit about the fatal error in Thread.exit in #9458
This PR now depends on #9458.

@jhjourdan jhjourdan force-pushed the memprof_thread_alloc_callback branch from 6af745c to 4c40586 Compare April 18, 2020 13:31
@gadmm gadmm mentioned this pull request Apr 21, 2020
@jhjourdan jhjourdan force-pushed the memprof_thread_alloc_callback branch 2 times, most recently from 07c83d3 to 438493e Compare April 27, 2020 12:26
@jhjourdan jhjourdan force-pushed the memprof_thread_alloc_callback branch from 438493e to aa83f14 Compare May 20, 2020 17:05
@jhjourdan
Copy link
Copy Markdown
Contributor Author

Now that #9458 is merged, I rebased on top of trunk.

if (trackst.young_idx == i) trackst.young_idx = j;
if (trackst.promote_dealloc_callback_idx == i)
trackst.promote_dealloc_callback_idx = j;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how the next_alloc_callback_idx fields are maintained when everything shuffles around here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, this is completely bogus. This is not because of next_alloc_callback_idx, but rather because of the idx_ptr pointers which are not updated accordingly.

@stedolan
Copy link
Copy Markdown
Contributor

stedolan commented Jun 2, 2020

I've started reading this, but honestly I find all of the idx_ptr manipulation and the mixture of linked lists and array structures very hard to follow. I think we might have the wrong data structure for this problem.

jhjourdan added 4 commits June 2, 2020 14:29
…run in the same thread the allocation takes place.

This is done by maintaining a bag of pending allocation callbacks for each thread, in the form of an linked list.
This is only effetive in native mode, since function calls in bytecode
mode will trigger polling and hence run pending callbacks.
@jhjourdan
Copy link
Copy Markdown
Contributor Author

Now that I look at what I have written again, I tend to agree with you :(.

We should probably redesign the data structures. Fortunately, if we get something where each thread allocates in its own data structure, then this will be easier to port to Multicore with good efficiency.

However, I know how much time we needed to spend on the current data structure, and I am afraid that this will take long too... Let me think more about this.

@jhjourdan
Copy link
Copy Markdown
Contributor Author

Closing this PR in favor of #9674.

@jhjourdan jhjourdan closed this Jun 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants