This repository was archived by the owner on May 12, 2021. It is now read-only.

protocols/grpc: fix CPU hotplug race condition#182

Merged
sboeuf merged 1 commit into kata-containers:master from devimc:cpu/fixRaceCondition
Apr 5, 2018

Conversation

@devimc

@devimc devimc commented Mar 21, 2018

With this patch the runtime will communicate to the agent the
number of vCPUs that were hot added, allowing the agent to online all vCPUs.
The agent will try to online all vCPUs 5 times, waiting 500 milliseconds
in each iteration; this is needed because when the runtime calls QMP
device_add, QEMU doesn't allocate all vCPUs immediately.

fixes #181

Signed-off-by: Julio Montes julio.montes@intel.com
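The retry mechanism the commit message describes can be sketched as follows. All names here (`onlineAll`, the callback, the timings) are illustrative stand-ins, not the agent's actual code:

```go
package main

import (
	"fmt"
	"time"
)

// onlineAll retries onlineFn until the expected number of vCPUs has been
// onlined, sleeping between attempts. onlineFn reports how many vCPUs it
// managed to online in one pass (mirroring the fact that QEMU exposes
// hot-added vCPUs gradually).
func onlineAll(expected uint32, maxTries int, wait time.Duration,
	onlineFn func() (uint32, error)) error {
	var count uint32
	for i := 0; i < maxTries; i++ {
		n, err := onlineFn()
		if err != nil {
			return err
		}
		count += n
		if count >= expected {
			return nil
		}
		time.Sleep(wait)
	}
	return fmt.Errorf("only %d of %d vCPUs were onlined", count, expected)
}

func main() {
	// Simulate QEMU exposing 5 vCPUs gradually: 2, then 2, then 1.
	batches := []uint32{2, 2, 1}
	i := 0
	err := onlineAll(5, 5, time.Millisecond, func() (uint32, error) {
		var n uint32
		if i < len(batches) {
			n = batches[i]
			i++
		}
		return n, nil
	})
	fmt.Println(err == nil) // true: all 5 onlined within the retry budget
}
```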

grpc.go Outdated
sysfsOnlinePath: sysfsMemOnlinePath,
regexpPattern: memRegexpPattern,
},
var onlineCPUMemMux sync.Mutex
The ...Mux suffix makes this look like a multiplexer. Please call this onlineCPUMemLock

grpc.go Outdated
},
var onlineCPUMemMux sync.Mutex

const onlineCPUMemWaitTime = 500 * time.Millisecond

This looks like a lot! We cannot afford to wait that long between two retries. Something like 50ms seems more appropriate; maybe increase the number of retries if you think that in some cases it might take longer.

grpc.go Outdated

func (a *agentGRPC) OnlineCPUMem(ctx context.Context, req *pb.OnlineCPUMemRequest) (*gpb.Empty, error) {
go onlineCPUMem()
go onlineCPUMem(req)

I think this should not be spawned into a go routine. It is the caller's responsibility to know whether it should wait for this function to return.
Also, looking at this patch, we expect to be able to return a valid error in case we could not hotplug the number of CPUs provided as a parameter.


message OnlineCPUMemRequest {
// Specify the number of CPUs that were added
uint32 cpus = 1;

I was thinking that nb_cpus, or something with "number" in it, might be more appropriate as the parameter name here, since we're providing a number of CPUs.

grpc.go Outdated
regexpPattern: cpuRegexpPattern,
}

err := forceOnlineResources(cpuResource, req.Cpus, onlineCPUMaxTries)

Having functions onlineCPUResources() and onlineMemResources() seems more appropriate to make the code more readable.

func onlineCPUResources(nbCPUs uint32) error {
	resource := onlineResource{
		sysfsOnlinePath: sysfsCPUOnlinePath,
		regexpPattern:   cpuRegexpPattern,
	}

	var count uint32
	for i := uint32(0); i < onlineCPUMaxTries; i++ {
		r, err := onlineResources(resource)
		if err != nil {
			return err
		}
		count += r
		if count == nbCPUs {
			return nil
		}
		time.Sleep(onlineCPUMemWaitTime)
	}

	return fmt.Errorf("only %d of %d CPUs were onlined", count, nbCPUs)
}

func onlineMemResources() error {
	resource := onlineResource{
		sysfsOnlinePath: sysfsMemOnlinePath,
		regexpPattern:   memRegexpPattern,
	}

	_, err := onlineResources(resource)
	return err
}

@jodh-intel

when the runtime calls QMP device_add, QEMU doesn't allocate all vCPUs immediately.

I think this is the piece of code you're referring to:

If so, I was about to say that QMP needs a transaction message, when -- guess what? -- I found that it already has one!

Hence, we should be able to replace that loop with a call to something like:

{
  "execute": "transaction",
  "arguments": { "actions": [
    {
      "type": "device_add",
      "data": {
        "id": "cpu-0", "socket-id": "...", "core-id": "...", "thread-id": "..."
      }
    },
    {
      "type": "device_add",
      "data": {
        "id": "cpu-1", "socket-id": "...", "core-id": "...", "thread-id": "..."
      }
    },
    {
      "type": "device_add",
      "data": {
        "id": "cpu-2", "socket-id": "...", "core-id": "...", "thread-id": "..."
      }
    }
  ] }
}

It would be useful to have this feature in https://github.com/intel/govmm.

/cc @rarindam, @markdryan.

@markdryan

@jodh-intel I'd never seen that transaction command before. I'd assumed it was new when you mentioned it, but looking at the QMP docs it seems to have been around since 1.0.

The way we currently model QMP commands doesn't really lend itself to a generic transaction mechanism. Perhaps this is a sign that the API is wrong and should be changed. In the short term it should be possible to add an ExecuteAddDevices that adds multiple devices transactionally.

The other comment I would make is: would using transactions actually solve the race condition? Does QMP generate an event when adding a device? Perhaps ExecuteDeviceAdd just needs to wait for that. Modifying ExecuteDeviceAdd might be a little risky, but we could add a new blocking version of that function and migrate to it over time.

@devimc

devimc commented Mar 22, 2018

Hi @jodh-intel, nice find!
But I think transactional messages won't fix the race condition, since they only guarantee a failure if any operation fails. The problem with the race condition is that QEMU does not allocate and assign vCPUs immediately, because it has to negotiate the resources with KVM.
For example, if 5 vCPUs are hot added to the VM, when onlineCPUMem is called, probably only 3 vCPUs have been allocated and assigned to the VM; that means the agent will online only 3 vCPUs. The other two vCPUs will be allocated and assigned later, but the agent won't be able to online them.

@devimc devimc force-pushed the cpu/fixRaceCondition branch from c96df19 to 07ea173 Compare March 22, 2018 22:27
@devimc

devimc commented Mar 22, 2018

@sboeuf changes applied, thanks


func (a *agentGRPC) OnlineCPUMem(ctx context.Context, req *pb.OnlineCPUMemRequest) (*gpb.Empty, error) {
go onlineCPUMem()
if !req.Wait {

IMO, no need to specify this Wait through the spec; the caller could simply decide to run the call to OnlineCPUMem() in a go routine if needed.

Author


In order to have a fast boot time, the go routine should run inside the VM; otherwise the runtime fails with the following error:

rpc error: code = Unavailable desc = transport is closing.


Ok makes sense!

grpc.go Outdated
count++
}

if nbResources != -1 && count == uint32(nbResources) {

if nbResources > 0 && count == uint32(nbResources) {


@sboeuf sboeuf left a comment


@devimc this looks better, I have 2 more comments.

grpc.go Outdated

onlinePath := filepath.Join(resource.sysfsOnlinePath, file.Name(), "online")
if strings.Trim(string(status), "\n\t ") == "0" {
ioutil.WriteFile(onlinePath, []byte("1"), 0600)

You should probably check the error here. What you should do, I'm not sure; perhaps simply continue. Looking at the old code, no action was taken after this line, so although the error wasn't checked, ignoring it was harmless. But with these changes I guess it's not.

Author


fixed, thanks


I'm not sure it's fixed correctly though. If you get one write error you'll give up entirely, i.e., the for loop in onlineCPUResources will terminate. Is this intentional?


var count uint32
for i := uint32(0); i < onlineCPUMaxTries; i++ {
r, err := onlineResources(resource, int32(nbCpus-count))

Do the directory contents of resource.sysfsOnlinePath change between invocations of this function? If not, do you need to read the directory up to 10 times and check the online files 10 times? I suppose I'm thinking of a situation in which in the first invocation you online 4 out of 5 CPUs. If I read the code correctly, you'd then re-read the online files associated with those CPUs up to 9 more times.

Author


In systems running low on resources, yes. Reading the directory in each iteration is needed.

@devimc devimc force-pushed the cpu/fixRaceCondition branch from 07ea173 to 7ee08e7 Compare March 23, 2018 14:51
sboeuf
sboeuf previously approved these changes Mar 23, 2018

@sboeuf sboeuf left a comment


Thanks @devimc

grpc.go Outdated
continue
if strings.Trim(string(status), "\n\t ") == "0" {
if err := ioutil.WriteFile(onlinePath, []byte("1"), 0600); err != nil {
return count, err

If you get one write error you'll give up entirely, i.e., the for loop in onlineCPUResources will terminate. Is this intentional?

Author


You're right, it would be better to log the error and continue.

@devimc devimc force-pushed the cpu/fixRaceCondition branch from 7ee08e7 to 11486e5 Compare March 23, 2018 15:44
@sboeuf sboeuf dismissed their stale review March 23, 2018 15:54

PR changed

@devimc devimc force-pushed the cpu/fixRaceCondition branch from 11486e5 to fb015e0 Compare March 23, 2018 17:19
@devimc

devimc commented Mar 23, 2018

@markdryan @sboeuf PTAL

@devimc

devimc commented Mar 28, 2018

🙁

@sboeuf

sboeuf commented Mar 29, 2018

@bergwolf @laijs PTAL


@bergwolf bergwolf left a comment


Just two nits.
lgtm

onlineCPUMemLock.Lock()
defer onlineCPUMemLock.Unlock()

if err := onlineCPUResources(req.NbCpus); err != nil {
Member


check req.NbCpus is larger than 0?

grpc.go Outdated

onlinePath := filepath.Join(resource.sysfsOnlinePath, file.Name(), "online")
ioutil.WriteFile(onlinePath, []byte("1"), 0600)
if nbResources > 0 && count == uint32(nbResources) {
Member


Nit: move it to after count++. It only makes sense to check when count increases.
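Restructured as suggested, the loop shape would look roughly like this; a standalone sketch with hypothetical names, not the actual grpc.go code:

```go
package main

import "fmt"

// onlineUpTo onlines every resource whose state is "0" (offline), stopping
// once nbResources have been onlined; nbResources <= 0 means no limit.
// The limit is checked only right after count is incremented.
func onlineUpTo(states []string, nbResources int32) uint32 {
	var count uint32
	for _, s := range states {
		if s != "0" { // already online, skip
			continue
		}
		// the real code writes "1" to the resource's online file here
		count++
		if nbResources > 0 && count == uint32(nbResources) {
			break
		}
	}
	return count
}

func main() {
	offline := []string{"0", "0", "0", "0"}
	fmt.Println(onlineUpTo(offline, 2))  // 2: limit honored
	fmt.Println(onlineUpTo(offline, -1)) // 4: no limit
}
```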

@devimc devimc force-pushed the cpu/fixRaceCondition branch from 3e417cb to 4c472c3 Compare April 3, 2018 13:56
@devimc

devimc commented Apr 3, 2018

@bergwolf changes applied, thanks

@codecov

codecov bot commented Apr 3, 2018

Codecov Report

Merging #182 into master will increase coverage by 2.93%.
The diff coverage is 82.35%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #182      +/-   ##
==========================================
+ Coverage   32.05%   34.98%   +2.93%     
==========================================
  Files          10       10              
  Lines        1482     1612     +130     
==========================================
+ Hits          475      564      +89     
- Misses        940      963      +23     
- Partials       67       85      +18
Impacted Files Coverage Δ
grpc.go 14.21% <82.35%> (+10.3%) ⬆️
network.go 46.24% <0%> (-0.21%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c8199f6...49f01ed. Read the comment docs.

@devimc

devimc commented Apr 4, 2018

ping @kata-containers/agent


@jodh-intel jodh-intel left a comment


lgtm

// CPU and Memory hotplug
const (
cpuRegexpPattern = "cpu[0-9]*"
memRegexpPattern = "memory[0-9]*"

Nit: shouldn't these be stricter?:

cpuRegexpPattern = "cpu[0-9][0-9]*"
memRegexpPattern = "memory[0-9][0-9]*"


It's fine as long as files called cpu or memory are never created in that directory.


@sboeuf sboeuf left a comment


@devimc Only a few nits, but looks good !


func (a *agentGRPC) OnlineCPUMem(ctx context.Context, req *pb.OnlineCPUMemRequest) (*gpb.Empty, error) {
go onlineCPUMem()
if !req.Wait {

Ok makes sense!


message OnlineCPUMemRequest {
// Wait specifies if the caller waits for the agent to online all resources.
// If true the agent returns until all resources have been connected, else all

s/until/once/


s/else/otherwise/

message OnlineCPUMemRequest {
// Wait specifies if the caller waits for the agent to online all resources.
// If true the agent returns until all resources have been connected, else all
// resources are connected asynchronously and the agent returns immediately

s/immediately/immediately./

// resources are connected asynchronously and the agent returns immediately
bool wait = 1;

// NbCpus Specifies the number of CPUs that were added and the agent has to online

s/Specifies/specifies/
s/online/online./

@sboeuf

sboeuf commented Apr 4, 2018

@markdryan could you give this a last review before we merge?

grpc.go Outdated
const onlineCPUMaxTries = 10

// online resources, nbResources specifies the maximum number of resources to online.
// If nbResources is -1 then there is no limit, all resources are connected

Looking at the code, there's no limit if nbResources==0 either.

@markdryan

@sboeuf Looks good to me. I think my comments have been addressed. I made one small comment on a comment.

@sboeuf

sboeuf commented Apr 4, 2018

@devimc please address both comments from @markdryan and myself, and we're good to merge this :)

With this patch the runtime will communicate to the agent the
number of vCPUs that were hot added, allowing the agent to online all vCPUs.
The agent will try to online all vCPUs 10 times, waiting 100 milliseconds
in each iteration; this is needed because when the runtime calls QMP
`device_add`, QEMU doesn't allocate all vCPUs immediately.

fixes kata-containers#181

Signed-off-by: Julio Montes <julio.montes@intel.com>
@devimc devimc force-pushed the cpu/fixRaceCondition branch from 4c472c3 to 49f01ed Compare April 5, 2018 12:36
@devimc

devimc commented Apr 5, 2018

@markdryan @sboeuf changes applied, thanks

@sboeuf

sboeuf commented Apr 5, 2018

Thanks @devimc, let's merge it !

@sboeuf sboeuf merged commit c775672 into kata-containers:master Apr 5, 2018
@sboeuf sboeuf removed the review label Apr 5, 2018


Development

Successfully merging this pull request may close these issues.

CPU hotplug race condition

5 participants