[mini] Introduce job control to the JIT. Limits by active and duplicate jobs.#4116
Merged
kumpera merged 4 commits into mono:master on Apr 11, 2017
This breaks app domain unload pretty badly.
The first iteration of this code suffered from spurious wakeups due to using a single cond var for all compilation.
This introduces a per-job cond var and makes threads wait on that.
Spurious wakeups are quite problematic in heavily threaded code such as Roslyn.
On average, every wait would get 2 extra wakeups for no good reason.
With this change, compilation times on my laptop are the following:
With spurious wakeups:
real 0m6.134s
user 0m23.588s
sys 0m2.331s
Without spurious wakeups:
real 0m5.658s
user 0m22.851s
sys 0m1.694s
This represents an 8% wall-clock speedup and a 3% user-time speedup.
As part of this change, the active threads accounting is gone, as the number of waits due to CPU overload was close to zero.
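The per-job cond var idea can be sketched roughly as follows, using POSIX threads. This is only an illustration under stated assumptions, not the code from this patch: `JitJob`, `jobs_init`, `job_begin`, `job_end`, `run_waiters`, and the use of small integer method ids are all hypothetical names and simplifications. Each job carries its own condition variable, so finishing one job wakes only the threads waiting on that job.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_JOBS 16      /* hypothetical: one job slot per small integer method id */
#define MAX_WAITERS 32

typedef struct {
    pthread_cond_t cond;  /* signalled only when this particular job finishes */
    bool in_flight;       /* some thread is compiling this method right now */
    bool done;            /* the method has been compiled */
} JitJob;

static pthread_mutex_t jobs_lock = PTHREAD_MUTEX_INITIALIZER;
static JitJob jobs[MAX_JOBS];

static void jobs_init(void)
{
    for (int i = 0; i < MAX_JOBS; i++) {
        pthread_cond_init(&jobs[i].cond, NULL);
        jobs[i].in_flight = false;
        jobs[i].done = false;
    }
}

/* Returns true if the caller should compile the method, false if it is
 * already compiled (possibly after waiting for the compiling thread). */
static bool job_begin(int method)
{
    assert(method >= 0 && method < MAX_JOBS);
    JitJob *job = &jobs[method];
    bool owner = false;

    pthread_mutex_lock(&jobs_lock);
    if (!job->done && !job->in_flight) {
        job->in_flight = true;
        owner = true;
    } else {
        /* The loop still guards against wakeups the OS may deliver spuriously. */
        while (!job->done)
            pthread_cond_wait(&job->cond, &jobs_lock);
    }
    pthread_mutex_unlock(&jobs_lock);
    return owner;
}

static void job_end(int method)
{
    assert(method >= 0 && method < MAX_JOBS);
    pthread_mutex_lock(&jobs_lock);
    jobs[method].in_flight = false;
    jobs[method].done = true;
    pthread_cond_broadcast(&jobs[method].cond);  /* wakes waiters of this job only */
    pthread_mutex_unlock(&jobs_lock);
}

typedef struct { int method; bool was_owner; } Waiter;

static void *waiter_thread(void *arg)
{
    Waiter *w = arg;
    w->was_owner = job_begin(w->method);
    return NULL;
}

/* The caller takes ownership of `method`, n threads request the same method,
 * an unrelated job starts and finishes, then the owner finishes. Returns how
 * many threads came back without compiling, or -1 if `method` was not free. */
static int run_waiters(int method, int other_method, int n)
{
    pthread_t threads[MAX_WAITERS];
    Waiter waiters[MAX_WAITERS];
    int reused = 0;

    assert(n > 0 && n <= MAX_WAITERS && method != other_method);
    if (!job_begin(method))
        return -1;
    for (int i = 0; i < n; i++) {
        waiters[i].method = method;
        waiters[i].was_owner = true;
        int rc = pthread_create(&threads[i], NULL, waiter_thread, &waiters[i]);
        assert(rc == 0);
    }
    if (job_begin(other_method))
        job_end(other_method);  /* does not touch the cond var of `method` */
    job_end(method);
    for (int i = 0; i < n; i++) {
        pthread_join(threads[i], NULL);
        reused += waiters[i].was_owner ? 0 : 1;
    }
    return reused;
}
```

After `jobs_init()`, a call such as `run_waiters(1, 2, 8)` has eight threads request method 1 while the caller owns it; in this sketch none of them compiles it, and finishing the unrelated method 2 broadcasts on a different cond var.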
Added better job wake control, which gave an additional 8% wall-clock speedup.
Failures are unrelated.
picenka21 pushed a commit to picenka21/runtime that referenced this pull request on Feb 18, 2022: [mini] Introduce job control to the JIT. Limits by active and duplicate jobs. Commit migrated from mono/mono@5548745
This showed up as a problem under parallel Roslyn.
We frequently hit the case of multiple threads compiling the same method in parallel,
with the work of all but one of them wasted.
Another issue that happens, but infrequently, is having more threads JITing than there
are cores in the machine. Such thrashing doesn't help.
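As a rough illustration of the two limits described here (no duplicate jobs for the same method, and no more active jobs than some cap), the following is a minimal sketch using POSIX threads. It is not the code from this patch: `jit_compile_once`, `count_compiles`, the integer method ids, the `compiled` array standing in for the code cache, and the cap of 4 are all assumptions made for the example. It uses one condition variable shared by all jobs, so waiters can also be woken when unrelated jobs finish.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_METHODS 16   /* hypothetical: methods are small integer ids here */
#define MAX_THREADS 32

static pthread_mutex_t jit_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  jit_cond = PTHREAD_COND_INITIALIZER;  /* shared by all jobs */
static bool compiled[MAX_METHODS];   /* stand-in for the JIT code cache */
static bool in_flight[MAX_METHODS];  /* a thread is compiling this method right now */
static int  active_jobs;
static int  max_active_jobs = 4;     /* assumed cap, e.g. the number of cores */

/* Returns true if this call compiled the method, false if it reused a result. */
static bool jit_compile_once(int method)
{
    assert(method >= 0 && method < MAX_METHODS);

    pthread_mutex_lock(&jit_lock);
    for (;;) {
        if (compiled[method]) {
            pthread_mutex_unlock(&jit_lock);
            return false;  /* another thread already did the work */
        }
        if (!in_flight[method] && active_jobs < max_active_jobs)
            break;  /* no duplicate and a free slot: this thread takes the job */
        /* Duplicate job or too many active jobs: wait and re-check. */
        pthread_cond_wait(&jit_cond, &jit_lock);
    }
    in_flight[method] = true;
    active_jobs++;
    pthread_mutex_unlock(&jit_lock);

    /* ... the expensive compilation would run here, outside the lock ... */

    pthread_mutex_lock(&jit_lock);
    compiled[method] = true;
    in_flight[method] = false;
    active_jobs--;
    pthread_cond_broadcast(&jit_cond);  /* wakes every waiter, related or not */
    pthread_mutex_unlock(&jit_lock);
    return true;
}

typedef struct { int method; bool did_compile; } Request;

static void *request_thread(void *arg)
{
    Request *req = arg;
    req->did_compile = jit_compile_once(req->method);
    return NULL;
}

/* Starts nthreads threads that all request the same method and returns
 * how many of them actually compiled it. */
static int count_compiles(int method, int nthreads)
{
    pthread_t threads[MAX_THREADS];
    Request reqs[MAX_THREADS];
    int compiles = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    for (int i = 0; i < nthreads; i++) {
        reqs[i].method = method;
        reqs[i].did_compile = false;
        int rc = pthread_create(&threads[i], NULL, request_thread, &reqs[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++) {
        pthread_join(threads[i], NULL);
        compiles += reqs[i].did_compile ? 1 : 0;
    }
    return compiles;
}
```

Calling `count_compiles(3, 8)` starts eight threads that all request method 3; in this sketch only one of them compiles it and the others reuse the result.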
Test setup: a 4/8 MacBook Pro compiling corlib.
Baseline:
real 0m7.665s
user 0m20.078s
sys 0m2.653s
Methods JITted using mono JIT : 22422
Total time spent JITting (sec) : 18.5365
With this patch:
real 0m6.149s
user 0m18.504s
sys 0m1.487s
Methods JITted using mono JIT : 16619
Total time spent JITting (sec) : 4.9420
New counters:
JIT compile waited others : 7681
JIT compile 1+ jobs : 1
JIT compile overload wait : 67
JIT compile spurious wakeups : 14469
This results in a 20% wall-clock reduction but only an 8% reduction in user time.
We JIT 26% fewer methods, with very few duplications, which shows that this drastically improves the situation.
JIT compilation time metrics are bogus due to .cctors and other sources of interference, so take them with a grain of salt.
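The percentages quoted here can be rechecked from the raw figures with a small helper. `pct_reduction` is a hypothetical name introduced for this check; it is not part of the patch.

```c
#include <assert.h>

/* Percentage by which `after` is smaller than `before`. */
static double pct_reduction(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}
```

With the numbers from the two runs, `pct_reduction(7.665, 6.149)` is about 19.8 (wall clock), `pct_reduction(20.078, 18.504)` is about 7.8 (user time), and `pct_reduction(22422, 16619)` is about 25.9 (methods JITted), which round to the 20%, 8%, and 26% stated.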
Future work:
Based on the new counters, it's clear that the current wakeup design is suboptimal, and we could improve it further by cutting down on spurious wakeups.