This repository was archived by the owner on May 12, 2021. It is now read-only.

protocols/grpc: fix CPU hotplug race condition#182

Merged
sboeuf merged 1 commit into kata-containers:master from devimc:cpu/fixRaceCondition
Apr 5, 2018

Conversation

@devimc

@devimc devimc commented Mar 21, 2018

With this patch the runtime will communicate to the agent the
number of vCPUs that were hot added, allowing the agent to online all vCPUs.
The agent will try to online all vCPUs 5 times, waiting 500 milliseconds
in each iteration; this is needed because when the runtime calls QMP
device_add, QEMU doesn't allocate all vCPUs immediately.

fixes #181

Signed-off-by: Julio Montes julio.montes@intel.com
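The retry mechanism the commit message describes can be sketched as follows. All names here (`onlineAll`, the callback, the timings) are illustrative stand-ins, not the agent's actual code:

```go
package main

import (
	"fmt"
	"time"
)

// onlineAll retries onlineFn until the expected number of vCPUs has been
// onlined, sleeping between attempts. onlineFn reports how many vCPUs it
// managed to online in one pass (mirroring the fact that QEMU exposes
// hot-added vCPUs gradually).
func onlineAll(expected uint32, maxTries int, wait time.Duration,
	onlineFn func() (uint32, error)) error {
	var count uint32
	for i := 0; i < maxTries; i++ {
		n, err := onlineFn()
		if err != nil {
			return err
		}
		count += n
		if count >= expected {
			return nil
		}
		time.Sleep(wait)
	}
	return fmt.Errorf("only %d of %d vCPUs were onlined", count, expected)
}

func main() {
	// Simulate QEMU exposing 5 vCPUs gradually: 2, then 2, then 1.
	batches := []uint32{2, 2, 1}
	i := 0
	err := onlineAll(5, 5, time.Millisecond, func() (uint32, error) {
		var n uint32
		if i < len(batches) {
			n = batches[i]
			i++
		}
		return n, nil
	})
	fmt.Println(err == nil) // true: all 5 onlined within the retry budget
}
```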

grpc.go Outdated
sysfsOnlinePath: sysfsMemOnlinePath,
regexpPattern: memRegexpPattern,
},
var onlineCPUMemMux sync.Mutex
The ...Mux suffix makes this look like a multiplexer. Please call this onlineCPUMemLock

grpc.go Outdated
},
var onlineCPUMemMux sync.Mutex

const onlineCPUMemWaitTime = 500 * time.Millisecond

This looks like a lot! We cannot afford to wait that long between two retries. Something like 50ms seems more appropriate; maybe increase the number of retries if you think that in some cases it might take longer.

grpc.go Outdated

func (a *agentGRPC) OnlineCPUMem(ctx context.Context, req *pb.OnlineCPUMemRequest) (*gpb.Empty, error) {
go onlineCPUMem()
go onlineCPUMem(req)

I think this should not be spawned into a go routine. It is the caller's responsibility to know whether it should wait for this function to return.
Also, looking at this patch, we expect to be able to return a valid error in case we could not hotplug the number of CPUs provided as a parameter.


message OnlineCPUMemRequest {
// Specify the number of CPUs that were added
uint32 cpus = 1;

I was thinking that nb_cpus, or something with "number" in it, might be more appropriate as the parameter name here, since we're providing a number of CPUs.

grpc.go Outdated
regexpPattern: cpuRegexpPattern,
}

err := forceOnlineResources(cpuResource, req.Cpus, onlineCPUMaxTries)

Having functions onlineCPUResources() and onlineMemResources() seems more appropriate to make the code more readable.

func onlineCPUResources(nbCPUs uint32) error {
	resource := onlineResource{
		sysfsOnlinePath: sysfsCPUOnlinePath,
		regexpPattern:   cpuRegexpPattern,
	}

	var count uint32
	for i := uint32(0); i < onlineCPUMaxTries; i++ {
		r, err := onlineResources(resource)
		if err != nil {
			return err
		}
		count += r
		if count == nbCPUs {
			return nil
		}
		time.Sleep(onlineCPUMemWaitTime)
	}

	return fmt.Errorf("only %d of %d CPUs were onlined", count, nbCPUs)
}

func onlineMemResources() error {
	resource := onlineResource{
		sysfsOnlinePath: sysfsMemOnlinePath,
		regexpPattern:   memRegexpPattern,
	}

	_, err := onlineResources(resource)
	return err
}

@jodh-intel

when the runtime calls QMP device_add, QEMU doesn't allocate all vCPUs immediately.

I think this is the piece of code you're referring to:

If so, I was about to say that QMP needs a transaction message, when -- guess what? -- I found that it already has one!

Hence, we should be able to replace that loop with a call to something like:

{
  "execute": "transaction",
  "arguments": { "actions": [
    {
      "type": "device_add",
      "data": {
        "id": "cpu-0", "socket-id": "...", "core-id": "...", "thread-id": "..."
      }
    },
    {
      "type": "device_add",
      "data": {
        "id": "cpu-1", "socket-id": "...", "core-id": "...", "thread-id": "..."
      }
    },
    {
      "type": "device_add",
      "data": {
        "id": "cpu-2", "socket-id": "...", "core-id": "...", "thread-id": "..."
      }
    }
  ] }
}

It would be useful to have this feature in https://github.com/intel/govmm.

/cc @rarindam, @markdryan.

@markdryan

@jodh-intel I'd never seen that transaction command before. I'd assumed it was new when you mentioned it, but looking at the QMP docs it seems to have been around since 1.0.

The way we currently model QMP commands doesn't really lend itself to a generic transaction mechanism. Perhaps this is a sign that the API is wrong and should be changed. In the short term it should be possible to add an ExecuteAddDevices that adds multiple devices transactionally.

The other comment I would make is: would using transactions actually solve the race condition? Does QMP generate an event when adding a device? Perhaps ExecuteDeviceAdd just needs to wait for that. Modifying ExecuteDeviceAdd might be a little risky, but we could add a new blocking version of that function and migrate to it over time.

@devimc

devimc commented Mar 22, 2018

Hi @jodh-intel, nice find!
But I think transactional messages won't fix the race condition, since they only guarantee a failure if any operation fails. The problem with the race condition is that QEMU does not allocate and assign vCPUs immediately, because it has to negotiate the resources with KVM.
For example, if 5 vCPUs are hot added to the VM, when onlineCPUMem is called, probably only 3 vCPUs have been allocated and assigned to the VM; that means the agent will online only 3 vCPUs. The other two vCPUs will be allocated and assigned later, but the agent won't be able to online them.

@devimc devimc force-pushed the cpu/fixRaceCondition branch from c96df19 to 07ea173 Compare March 22, 2018 22:27
@devimc

devimc commented Mar 22, 2018

@sboeuf changes applied, thanks


func (a *agentGRPC) OnlineCPUMem(ctx context.Context, req *pb.OnlineCPUMemRequest) (*gpb.Empty, error) {
go onlineCPUMem()
if !req.Wait {

IMO, no need to specify this Wait through the spec; the caller could simply decide to run the call to OnlineCPUMem() in a go routine if needed.

Author


In order to have a fast boot time, the go routine should run inside the VM; otherwise the runtime fails with the following error:

rpc error: code = Unavailable desc = transport is closing.


Ok makes sense!

grpc.go Outdated
count++
}

if nbResources != -1 && count == uint32(nbResources) {

if nbResources > 0 && count == uint32(nbResources) {


@sboeuf sboeuf left a comment


@devimc this looks better, I have 2 more comments.

grpc.go Outdated

onlinePath := filepath.Join(resource.sysfsOnlinePath, file.Name(), "online")
if strings.Trim(string(status), "\n\t ") == "0" {
ioutil.WriteFile(onlinePath, []byte("1"), 0600)

You should probably check the error here. What you should do, I'm not sure; perhaps simply continue. Looking at the old code, no action was taken after this line, so although the error wasn't checked, ignoring it was harmless. But with these changes I guess it's not.

Author


fixed, thanks


I'm not sure it's fixed correctly though. If you get one write error you'll give up entirely, i.e., the for loop in onlineCPUResources will terminate. Is this intentional?


var count uint32
for i := uint32(0); i < onlineCPUMaxTries; i++ {
r, err := onlineResources(resource, int32(nbCpus-count))

Do the directory contents of resource.sysfsOnlinePath change between invocations of this function? If not, do you need to read the directory up to 10 times and check the online files 10 times? I suppose I'm thinking of a situation in which in the first invocation you online 4 out of 5 CPUs. If I read the code correctly, you'd then re-read the online files associated with those CPUs up to 9 more times.

Author


In systems running low on resources, yes. Reading the directory in each iteration is needed.

@devimc devimc force-pushed the cpu/fixRaceCondition branch from 07ea173 to 7ee08e7 Compare March 23, 2018 14:51
sboeuf
sboeuf previously approved these changes Mar 23, 2018

@sboeuf sboeuf left a comment


Thanks @devimc

grpc.go Outdated
continue
if strings.Trim(string(status), "\n\t ") == "0" {
if err := ioutil.WriteFile(onlinePath, []byte("1"), 0600); err != nil {
return count, err

If you get one write error you'll give up entirely, i.e., the for loop in onlineCPUResources will terminate. Is this intentional?

Author


You're right, it would be better to log the error and continue.

@devimc devimc force-pushed the cpu/fixRaceCondition branch from 7ee08e7 to 11486e5 Compare March 23, 2018 15:44
@sboeuf sboeuf dismissed their stale review March 23, 2018 15:54

PR changed

@devimc devimc force-pushed the cpu/fixRaceCondition branch from 11486e5 to fb015e0 Compare March 23, 2018 17:19
@devimc

devimc commented Mar 23, 2018

@markdryan @sboeuf PTAL

@devimc

devimc commented Mar 28, 2018

🙁

@sboeuf

sboeuf commented Mar 29, 2018

@bergwolf @laijs PTAL


@bergwolf bergwolf left a comment


Just two nits.
lgtm

onlineCPUMemLock.Lock()
defer onlineCPUMemLock.Unlock()

if err := onlineCPUResources(req.NbCpus); err != nil {
Member


check req.NbCpus is larger than 0?

grpc.go Outdated

onlinePath := filepath.Join(resource.sysfsOnlinePath, file.Name(), "online")
ioutil.WriteFile(onlinePath, []byte("1"), 0600)
if nbResources > 0 && count == uint32(nbResources) {
Member


Nit: move it to after count++. It only makes sense to check when count increases.
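Restructured as suggested, the loop shape would look roughly like this; a standalone sketch with hypothetical names, not the actual grpc.go code:

```go
package main

import "fmt"

// onlineUpTo onlines every resource whose state is "0" (offline), stopping
// once nbResources have been onlined; nbResources <= 0 means no limit.
// The limit is checked only right after count is incremented.
func onlineUpTo(states []string, nbResources int32) uint32 {
	var count uint32
	for _, s := range states {
		if s != "0" { // already online, skip
			continue
		}
		// the real code writes "1" to the resource's online file here
		count++
		if nbResources > 0 && count == uint32(nbResources) {
			break
		}
	}
	return count
}

func main() {
	offline := []string{"0", "0", "0", "0"}
	fmt.Println(onlineUpTo(offline, 2))  // 2: limit honored
	fmt.Println(onlineUpTo(offline, -1)) // 4: no limit
}
```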

@devimc devimc force-pushed the cpu/fixRaceCondition branch from 3e417cb to 4c472c3 Compare April 3, 2018 13:56
@devimc

devimc commented Apr 3, 2018

@bergwolf changes applied, thanks

@codecov

codecov bot commented Apr 3, 2018

Codecov Report

Merging #182 into master will increase coverage by 2.93%.
The diff coverage is 82.35%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #182      +/-   ##
==========================================
+ Coverage   32.05%   34.98%   +2.93%     
==========================================
  Files          10       10              
  Lines        1482     1612     +130     
==========================================
+ Hits          475      564      +89     
- Misses        940      963      +23     
- Partials       67       85      +18
Impacted Files Coverage Δ
grpc.go 14.21% <82.35%> (+10.3%) ⬆️
network.go 46.24% <0%> (-0.21%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c8199f6...49f01ed. Read the comment docs.

@devimc

devimc commented Apr 4, 2018

ping @kata-containers/agent


@jodh-intel jodh-intel left a comment


lgtm

// CPU and Memory hotplug
const (
cpuRegexpPattern = "cpu[0-9]*"
memRegexpPattern = "memory[0-9]*"

Nit: shouldn't these be stricter?:

cpuRegexpPattern = "cpu[0-9][0-9]*"
memRegexpPattern = "memory[0-9][0-9]*"


It's fine as long as files called cpu or memory are never created in that directory.


@sboeuf sboeuf left a comment


@devimc Only a few nits, but looks good !


func (a *agentGRPC) OnlineCPUMem(ctx context.Context, req *pb.OnlineCPUMemRequest) (*gpb.Empty, error) {
go onlineCPUMem()
if !req.Wait {

Ok makes sense!


message OnlineCPUMemRequest {
// Wait specifies if the caller waits for the agent to online all resources.
// If true the agent returns until all resources have been connected, else all

s/until/once/


s/else/otherwise/

message OnlineCPUMemRequest {
// Wait specifies if the caller waits for the agent to online all resources.
// If true the agent returns until all resources have been connected, else all
// resources are connected asynchronously and the agent returns immediately

s/immediately/immediately./

// resources are connected asynchronously and the agent returns immediately
bool wait = 1;

// NbCpus Specifies the number of CPUs that were added and the agent has to online

s/Specifies/specifies/
s/online/online./

@sboeuf

sboeuf commented Apr 4, 2018

@markdryan could you give this a last review before we merge?

grpc.go Outdated
const onlineCPUMaxTries = 10

// online resources, nbResources specifies the maximum number of resources to online.
// If nbResources is -1 then there is no limit, all resources are connected

Looking at the code, there's no limit if nbResources==0 either.

@markdryan

@sboeuf Looks good to me. I think my comments have been addressed. I made one small comment on a comment.

@sboeuf

sboeuf commented Apr 4, 2018

@devimc please address both comments from @markdryan and myself, and we're good to merge this :)

With this patch the runtime will communicate to the agent the
number of vCPUs that were hot added, allowing the agent to online all vCPUs.
The agent will try to online all vCPUs 10 times, waiting 100 milliseconds
in each iteration; this is needed because when the runtime calls QMP
`device_add`, QEMU doesn't allocate all vCPUs immediately.

fixes kata-containers#181

Signed-off-by: Julio Montes <julio.montes@intel.com>
@devimc devimc force-pushed the cpu/fixRaceCondition branch from 4c472c3 to 49f01ed Compare April 5, 2018 12:36
@devimc

devimc commented Apr 5, 2018

@markdryan @sboeuf changes applied, thanks

@sboeuf

sboeuf commented Apr 5, 2018

Thanks @devimc, let's merge it !

@sboeuf sboeuf merged commit c775672 into kata-containers:master Apr 5, 2018
@sboeuf sboeuf removed the review label Apr 5, 2018


Development

Successfully merging this pull request may close these issues.

CPU hotplug race condition

5 participants