virtcontainers/qemu: reduce memory footprint by devimc · Pull Request #296 · kata-containers/runtime

devimc · 2018-05-09T20:15:05Z

There is a relation between the maximum number of vCPUs and the
memory footprint, if QEMU maxcpus option and kernel nr_cpus
cmdline argument are big, then memory footprint is big, this
issue only occurs if CPU hotplug support is enabled in the kernel,
might be because of kernel needs to allocate resources to watch all
sockets waiting for a CPU to be connected (ACPI event).

For example

+---------------+-------------------------+
|               | Memory Footprint (KB)   |
+---------------+-------------------------+
| NR_CPUS=240   | 186501                  |
+---------------+-------------------------+
| NR_CPUS=8     | 110684                  |
+---------------+-------------------------+

In order to do not affect CPU hotplug and allow to users to have containers
with the same number of physical CPUs, this patch tries to mitigate the
big memory footprint by using the actual number of physical CPUs as the
maximum number of vCPUs for each container.

Before this patch a container with 256MB of RAM

              total        used        free      shared  buff/cache   available
Mem:           195M         40M        113M         26M         41M        112M
Swap:            0B          0B          0B

With this patch

              total        used        free      shared  buff/cache   available
Mem:           236M         11M        188M         26M         36M        186M
Swap:            0B          0B          0B

fixes #295

Signed-off-by: Julio Montes julio.montes@intel.com

devimc · 2018-05-09T20:15:41Z

/cc @sboeuf @WeiZhang555 @amshinde @egernst @laijs

bergwolf · 2018-05-10T01:43:22Z

Nice findings! I wasn't aware the maxcpus limit can have so much impact on memory footprint! Thanks @devimc ! And it

LGTM!

egernst · 2018-05-10T03:38:15Z

LGTM, though I see the CI failures.

--- FAIL: TestQemuKernelParameters (0.00s)
qemu_test.go:50: Got: panic=1 initcall_debug nr_cpus=4 foo=foo bar=bar, Expecting: panic=1 initcall_debug foo=foo bar=bar

Perhaps the test needs updating?

gnawux · 2018-05-10T03:43:09Z

Awesome

Never noticed the impaction of maxcpus

WeiZhang555 · 2018-05-10T13:50:37Z

This is a great improvement!
One question, should we make this configurable ? Then the service provider can choose to make the VM smaller via a smaller maxcpus

ChristophSGR · 2018-05-10T22:13:18Z

Von meinem iPhone gesendet Am 10.05.2018 um 15:50 schrieb zhangwei_cs <notifications@github.com>: This is a great improvement! One question, should we make this configurable ? Then the service provider can choose to make the VM smaller via a smaller maxcpus — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

egernst · 2018-05-10T22:14:45Z

@WeiZhang555 @ChristophSGR - Yes I think we should. @devimc is OOO today -- I'm sure he can take a look tomorrow (unless someone wants to add a patch to this PR in the meantime!)

gnawux · 2018-05-11T02:54:08Z

I thought about this ("make this configurable") as well, but I'd like to accept this patch and have another patch on configurable things if @devimc do not have a slot on this.

egernst · 2018-05-11T03:09:15Z

@gnawux - sure this is fine. I think we shoudl get this in ASAP for sure. I want to wait until we can get the tests update (seems like a legit CI failure)

gnawux · 2018-05-11T03:16:35Z

@egernst agree

WeiZhang555 · 2018-05-11T03:37:23Z

virtcontainers/qemu_amd64.go

 // returns the maximum number of vCPUs supported
 func maxQemuVCPUs() uint32 {
-	return uint32(240)
+	return uint32(runtime.NumCPU())


There will be some problems if the physical machine has more than 240 physical CPUs, right?
I think we should choose MAX(runtime.NumCPU(), 240) here

WeiZhang555 · 2018-05-11T03:38:33Z

Sounds fair enough to merge this first, after fixing the CI problem.

devimc · 2018-05-11T13:38:08Z

CI is not happy because of Azure VM has 4 CPUs and CRIO tries to create a container with 5 vCPUs

# time="2018-05-11 13:06:20.495616418Z" level=error msg="Container creation error: Unable to hotplug 2 CPUs, currently this SB has 3 CPUs and the maximum amount of CPUs is 4

hence I need to make maxvcpus configurable in this patch

devimc · 2018-05-13T18:29:19Z

please take a look, all these changes are needed to reduce memory footprint and make CRIO happy

gnawux · 2018-05-13T18:32:52Z

cli/config.go

+		return uint32(numcpus)
+	}
+
+	if h.DefaultMaxVCPUs > int32(numcpus) {


do we need this comparison?

yes, in case of user value is greater than the actual number of physical CPUs, we need to check this values doesn't exceed the maximum number of vCPUs supported by QEMU/KVM.
Adding a comment in the code.
Thanks

ohhh wait.. docker and kubernetes don't allow you to create containers with a number of CPUs greater than the actual number of physical CPUs, hence we don't need this condition, thanks

$ docker run --cpus 10 -ti centos bash docker: Error response from daemon: Range of CPUs is from 0.01 to 8.00, as there are only 8 CPUs available.

egernst

A few queries here. Looks good in general, and like the additional commit.

egernst · 2018-05-13T20:47:54Z

virtcontainers/qemu.go

+	// to reach out max vCPUs
 	if currentVCPUs+amount > q.config.DefaultMaxVCPUs {
-		return fmt.Errorf("Unable to hotplug %d CPUs, currently this SB has %d CPUs and the maximum amount of CPUs is %d",
+		q.Logger().Warnf("Cannot to hotplug %d CPUs, currently this SB has %d CPUs and the maximum amount of CPUs is %d",


good catch! thanks

egernst · 2018-05-13T20:52:31Z

cli/config.go

 	KernelParams          string `toml:"kernel_params"`
 	MachineType           string `toml:"machine_type"`
 	DefaultVCPUs          int32  `toml:"default_vcpus"`
+	DefaultMaxVCPUs       int32  `toml:"default_maxvcpus"`


Why signed int? Is there a scenario where you'd have a negative number number of CPUs?

this is part of configuration file, if this value is <= 0 then the actual number of vCPUs will be used

my question is why negative needs to be supported? Seems == 0 should be enough here?

yep, it should be enough, do you prefer to have an uint32 here?

I used an int32 to support negative values and be similar to default_vcpus

egernst · 2018-05-13T20:52:50Z

cli/config.go

+	numcpus := goruntime.NumCPU()
+	maxvcpus := vc.MaxQemuVCPUs()
+
+	if h.DefaultMaxVCPUs <= 0 {


can the default be negative?

yes, same as DefaultVCPUs

jodh-intel · 2018-05-14T08:21:57Z

virtcontainers/container.go

-		return c.sandbox.agent.onlineCPUMem(uint32(vCPUs))
+		vcpusAdded, ok := data.(uint32)
+		if !ok {
+			return fmt.Errorf("Could not get the number of vCPUs added")


This error is used multiple times so you could create a global for it maybe?

var errVCPUs = errors.New("Could not get the number of vCPUs added")

Having said that, it might be more useful to add the container ID to the returned error (which would justify using fmt.Errorf() rather than just errors.New(), which is all that is currently required).

I'll include data in the error, thanks

jodh-intel · 2018-05-14T08:22:02Z

virtcontainers/container.go

 	}

-	vCPUs := utils.ConstraintsToVCPUs(c.config.Resources.CPUQuota, c.config.Resources.CPUPeriod)
+	// fetch current configuration


This block is almost identical to the one added in addResources(). Could you refactor to create a new function they can both call. Something like:

// if add is true, add the VCPUs in the container config, else remove them. func (c *Container) handleVCPUs(add bool) error

you're right, updating commit, thanks

codecov · 2018-05-14T14:34:01Z

Codecov Report

Merging #296 into master will decrease coverage by <.01%.
The diff coverage is 46.72%.

@@            Coverage Diff             @@
##           master     #296      +/-   ##
==========================================
- Coverage   64.35%   64.35%   -0.01%     
==========================================
  Files          86       86              
  Lines        8430     8469      +39     
==========================================
+ Hits         5425     5450      +25     
- Misses       2429     2441      +12     
- Partials      576      578       +2

Impacted Files	Coverage Δ
virtcontainers/hypervisor.go	`72.78% <ø> (ø)`	⬆️
virtcontainers/hyperstart_agent.go	`59.39% <ø> (+0.81%)`	⬆️
virtcontainers/pkg/oci/utils.go	`78.26% <0%> (+0.72%)`	⬆️
virtcontainers/mock_hypervisor.go	`100% <100%> (ø)`	⬆️
virtcontainers/qemu_amd64.go	`93.54% <100%> (ø)`	⬆️
virtcontainers/qemu_arch_base.go	`76.99% <100%> (ø)`	⬆️
virtcontainers/container.go	`47.5% <44.73%> (-0.57%)`	⬇️
virtcontainers/qemu.go	`12.02% <6.06%> (ø)`	⬆️
virtcontainers/sandbox.go	`67.35% <75%> (+0.05%)`	⬆️
cli/config.go	`88.5% <85.71%> (-0.34%)`	⬇️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 90e3ba6...4527a80. Read the comment docs.

egernst · 2018-05-14T17:42:09Z

cli/config.go

+		return uint32(numcpus)
+	}
+
+	return uint32(h.DefaultMaxVCPUs)


This logic is still a bit odd to me. What happens if default-provided is not greater than numcpus, but it is still greater than is greater than maxvcpus?
e.g., max-from-hypervisor is 100, number of physical cores is 120, but our configured default is 110?

Also I think you should s/QEMU/hypervisor?

e.g., max-from-hypervisor is 100, number of physical cores is 120, but our configured default is 110?

to avoid issues with KVM, the maximum number of CPUs supported by KVM is used, in this example 100

Also I think you should s/QEMU/hypervisor?

sure, I can do it, but in another PR, wdyt?

devimc · 2018-05-14T17:57:06Z

@egernst @jodh-intel @gnawux changes applied, thanks

devimc · 2018-05-14T19:27:23Z

@egernst changes applied

egernst · 2018-05-14T20:37:21Z

cli/config.go

+
+	// Don't exceed the maximum number of vCPUs supported by hypervisor
+	if h.DefaultMaxVCPUs >= maxvcpus {
+		return maxvcpus


what if the default is also greater than numcpus? I think what you want is:

reqVCPUS := h.DefaultMaxVCPUS //don't exceed the number of physical CPUs. If a default is not provided, use the // numbers of physical CPUs if reqVCPUS >= numcpus || reqVCPUS == 0 { reqVCPUS := numcpus } // Don't exceed the maximum number of vCPUs supported by hypervisor if (reqVCPUS > maxvcpus ) { return maxvcpus } return reqVCPUS

actually is the same, but your code looks better, updating pr, thanks

There is a relation between the maximum number of vCPUs and the memory footprint, if QEMU maxcpus option and kernel nr_cpus cmdline argument are big, then memory footprint is big, this issue only occurs if CPU hotplug support is enabled in the kernel, might be because of kernel needs to allocate resources to watch all sockets waiting for a CPU to be connected (ACPI event). For example ``` +---------------+-------------------------+ | | Memory Footprint (KB) | +---------------+-------------------------+ | NR_CPUS=240 | 186501 | +---------------+-------------------------+ | NR_CPUS=8 | 110684 | +---------------+-------------------------+ ``` In order to do not affect CPU hotplug and allow to users to have containers with the same number of physical CPUs, this patch tries to mitigate the big memory footprint by using the actual number of physical CPUs as the maximum number of vCPUs for each container if `default_maxvcpus` is <= 0 in the runtime configuration file, otherwise `default_maxvcpus` is used as the maximum number of vCPUs. Before this patch a container with 256MB of RAM ``` total used free shared buff/cache available Mem: 195M 40M 113M 26M 41M 112M Swap: 0B 0B 0B ``` With this patch ``` total used free shared buff/cache available Mem: 236M 11M 188M 26M 36M 186M Swap: 0B 0B 0B ``` fixes kata-containers#295 Signed-off-by: Julio Montes <julio.montes@intel.com>

Don't fail if a new container with a CPU constraint was added to a POD and no more vCPUs are available, instead apply the constraint and let kernel balance the resources. Signed-off-by: Julio Montes <julio.montes@intel.com>

devimc force-pushed the cpu/fixMemFootprint branch from 063f4f4 to ded5fad Compare May 9, 2018 20:16

bergwolf mentioned this pull request May 10, 2018

kata containers memory footprint #295

Closed

WeiZhang555 reviewed May 11, 2018

View reviewed changes

devimc force-pushed the cpu/fixMemFootprint branch from ded5fad to b958d95 Compare May 11, 2018 12:22

devimc added the do-not-merge label May 11, 2018

devimc force-pushed the cpu/fixMemFootprint branch 7 times, most recently from 5444864 to ff26a40 Compare May 13, 2018 17:32

devimc removed the do-not-merge label May 13, 2018

gnawux reviewed May 13, 2018

View reviewed changes

egernst suggested changes May 13, 2018

View reviewed changes

jodh-intel reviewed May 14, 2018

View reviewed changes

devimc force-pushed the cpu/fixMemFootprint branch from ff26a40 to 2023f3c Compare May 14, 2018 14:33

egernst suggested changes May 14, 2018

View reviewed changes

devimc force-pushed the cpu/fixMemFootprint branch from 2023f3c to 92b57f7 Compare May 14, 2018 19:23

devimc force-pushed the cpu/fixMemFootprint branch from 92b57f7 to e2db932 Compare May 14, 2018 20:25

egernst reviewed May 14, 2018

View reviewed changes

Julio Montes added 2 commits May 14, 2018 17:33

virtcontainers/qemu: honour CPU constrains

4527a80

Don't fail if a new container with a CPU constraint was added to a POD and no more vCPUs are available, instead apply the constraint and let kernel balance the resources. Signed-off-by: Julio Montes <julio.montes@intel.com>

devimc force-pushed the cpu/fixMemFootprint branch from e2db932 to 4527a80 Compare May 14, 2018 22:33

egernst approved these changes May 15, 2018

View reviewed changes

egernst merged commit 90fc7e6 into kata-containers:master May 15, 2018

devimc deleted the cpu/fixMemFootprint branch August 3, 2018 17:52

Conversation

devimc commented May 9, 2018

Uh oh!

devimc commented May 9, 2018

Uh oh!

bergwolf commented May 10, 2018 • edited by amshinde Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

egernst commented May 10, 2018 • edited by amshinde Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gnawux commented May 10, 2018

Uh oh!

WeiZhang555 commented May 10, 2018

Uh oh!

ChristophSGR commented May 10, 2018 via email

Uh oh!

egernst commented May 10, 2018

Uh oh!

gnawux commented May 11, 2018

Uh oh!

egernst commented May 11, 2018

Uh oh!

gnawux commented May 11, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

WeiZhang555 commented May 11, 2018

Uh oh!

devimc commented May 11, 2018

Uh oh!

devimc commented May 13, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

egernst left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

codecov bot commented May 14, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

devimc commented May 14, 2018

Uh oh!

devimc commented May 14, 2018

Uh oh!

bergwolf commented May 10, 2018 •

edited by amshinde

Loading

egernst commented May 10, 2018 •

edited by amshinde

Loading

codecov bot commented May 14, 2018 •

edited

Loading

egernst May 14, 2018 •

edited

Loading