Make Gradle's memory management work appropriately in memory constrained docker containers draft#17628
Make Gradle's memory management work appropriately in memory constrained docker containers draft#17628Thrillpool wants to merge 3 commits into
Conversation
ghale
left a comment
There was a problem hiding this comment.
Thanks for the PR @Thrillpool! The general approach here seems sound and something we should incorporate. I agree that the detection mechanism feels a little bit creaky - let me know what you think about the approach I suggested.
f0ddd5a to
d5d06e2
Compare
|
This pull request has been automatically marked as stale because it has not had recent activity. Given the limited bandwidth of the team, it will be closed if no further activity occurs. If you intend to work on this pull request, please ask the team to reopen the PR or push a new PR. Thank you for your contributions. |
|
This pull request has been automatically closed due to inactivity. If you are still interested in contributing this, please ensure that it is rebased against the latest branch (usually |
|
@Thrillpool @pirateskipper Ping on this. It would be really helpful |
|
Thanks for this writeup @Thrillpool! We just encountered the same issue and added this patch ourselves. |
|
Reopening as this is still an issue and worth fixing. |
|
@Thrillpool I'd like to resurrect this PR. There was a DCO issue with the initial commits which might have been part of the reason why it languished. Can you resolve this, and if at all possible, can you see about getting it rebased onto current master? If rebasing is problematic, I can try to take a look at it when I get a chance. |
|
Hi @ghale sorry it took me so long. I will do this promptly, hope there is still an interest. I will be checking this every day for the next couple of weeks, hope we can get it resolved in that time. In a sense I would actually be quite excited to contribute to Gradle! Having now used it a lot, I think it's really an excellent piece of software, and actually that I was able to identify the issue myself in the source code I think is a testament to how well written it is... |
26ed7c4 to
73b5b2e
Compare
bantonsson
left a comment
There was a problem hiding this comment.
We have been using a similar change as this one, together with a smaller change (mentioned in #22977), running for months on both 7.5 and 7.6 with excellent results.
Before (gradle daemon crashed):

After (everything works fine):

|
@ghale For some testing done I made this repo https://github.com/Thrillpool/gradle-to-break-docker When I do
Get the above repo in and try to execute Conversely, if I update gradle-process-services jar to contain my changes, then if I make another instance of eclipse-temurin with the same memory constraint and use this modified gradle, the same command will not fail. Further, now with --debug one can see the workers getting cleaned up 'worker daemon(s) expired to free some memory'. Finally, I also built the same repo on my host (linux) machine with my patched version of Gradle and it went through OK, and actually I added some additional logging to validate the right memory manager was chosen. For some insight on what the repo I made to demo the bug is, it's basically just I artifically make the compilation use 500m, and I have a series of subprojects each with different jvm args such that workers can't be reused, and then I have max workers 1, so in theory app1, app2, app3, app4 run in order and workers get cleaned up as needed, but if they are not cleaned up, they occupy 2000m and so will get 137d by our 1g container. |
Signed-off-by: chinrank <thrillpool1@gmail.com>
Signed-off-by: chinrank <thrillpool1@gmail.com>
|
Thanks @Thrillpool, @bantonsson. This is in my queue, just haven't been able to shift my focus to it yet. It hasn't been forgotten - I'll look at it soon. |
|
This has been a long running issue for us that has got dramatically worse recently as we've upped the number of parallel tasks we run. We use circleci and gradle keeps spawning workers until all memory is exhausted due to using |
…ed Linux docker containers Supersedes #17628 Co-authored-by: Gary Hale <gary@gradle.com>
|
This has been merged as part of #23944. |
Resolves the issue described here #17610, in short, Gradle in a memory constrained docker context doesn't respect its memory limit which can lead to its host killing it, as well as general poor performance.
Presented in this PR is just a rough draft, the CGroupMemoryInfo code is tried and tested and confirmed to work very well in our CI pipeline, where we have in fact overwritten the MemInfoOsMemoryInfo classfile to have the logic of CGroupMemoryInfo.
Some thought will be needed to work out when to use this class, as some context, if the memory has not been constrained then the number that appears in /sys/fs/cgroup/memory/memory.limit_in_bytes is this magic max long of 9223372036854771712.
Context
I know from experience that many users have struggled with issues of CI pipelines having the docker daemon 'disappear', the underlying cause being the CI has killed the agent due to using too much memory. I suspect vew few people have identified the root cause, instead the memory of the agent has been increased.
This issue has been around a long time, the first time I can see someone hitting upon it is
#5102
which was never really resolved.
#7058 (comment) identified that cgroups should be used in some form, but refrained from doing it. The memory handling code linked alluded to in elasticsearch, https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/monitor/os/OsProbe.java looks more respectable than my attempt.
If it is determined such a change is not suitable, I would strongly ask that Gradle make this issue more explicit to people. I did not find the above issue when initially determining the problem, instead spending quite a lot of time reading the Gradle code and reasoning it myself. Documentation concerning problems in memory constrained docker containers would be very helpful.
Contributor Checklist
<subproject>/src/integTest) to verify changes from a user perspective<subproject>/src/test) to verify logic./gradlew sanityCheck./gradlew <changed-subproject>:quickTestGradle Core Team Checklist