protocols/grpc: fix CPU hotplug race condition #182
sboeuf merged 1 commit into kata-containers:master from
Conversation
grpc.go (Outdated)
```go
	sysfsOnlinePath: sysfsMemOnlinePath,
	regexpPattern:   memRegexpPattern,
},

var onlineCPUMemMux sync.Mutex
```
Looks like a multiplexer with the naming `...Mux`. Please call this `onlineCPUMemLock`.
grpc.go (Outdated)
```go
const onlineCPUMemWaitTime = 500 * time.Millisecond
```
This looks like a lot! We cannot afford to wait that long between two retries. Something like 50ms seems more appropriate; maybe increase the number of retries if you think that in some cases it might take some time.
grpc.go (Outdated)
```go
func (a *agentGRPC) OnlineCPUMem(ctx context.Context, req *pb.OnlineCPUMemRequest) (*gpb.Empty, error) {
	go onlineCPUMem(req)
```
I think this should not be spawned into a goroutine. It is the caller's responsibility to know whether it should wait for the return of this function.
Also, if we look at this patch, we expect to be able to return a valid error in case we could not hotplug the number of CPUs provided as a parameter.
protocols/grpc/agent.proto (Outdated)
```proto
message OnlineCPUMemRequest {
	// Specify the number of CPUs that were added
	uint32 cpus = 1;
```
I was thinking that `nb_cpus`, or something with "number" in it, might be more appropriate as the parameter here, since we're providing a number of CPUs.
grpc.go (Outdated)
```go
	regexpPattern: cpuRegexpPattern,
}

err := forceOnlineResources(cpuResource, req.Cpus, onlineCPUMaxTries)
```
Having functions `onlineCPUResources()` and `onlineMemResources()` seems more appropriate to make the code more readable:
```go
func onlineCPUResources(nbCpus uint32) error {
	resource := onlineResource{
		sysfsOnlinePath: sysfsCPUOnlinePath,
		regexpPattern:   cpuRegexpPattern,
	}

	var count uint32
	for i := uint32(0); i < onlineCPUMaxTries; i++ {
		r, err := onlineResources(resource)
		if err != nil {
			return err
		}
		count += r
		if count == nbCpus {
			return nil
		}
		time.Sleep(onlineCPUMemWaitTime)
	}

	return fmt.Errorf("only %d of %d CPUs were connected", count, nbCpus)
}

func onlineMemResources() error {
	resource := onlineResource{
		sysfsOnlinePath: sysfsMemOnlinePath,
		regexpPattern:   memRegexpPattern,
	}

	_, err := onlineResources(resource)
	return err
}
```
I think this is the piece of code you're referring to. If so, I was about to say that QMP needs a transaction message, when -- guess what? -- I found that it already has one! Hence, we should be able to replace that loop with a call to something like: It would be useful to have this feature in https://github.com/intel/govmm. /cc @rarindam, @markdryan.

@jodh-intel I'd never seen that transaction command before. I'd assumed it was new when you mentioned it, but looking at the QMP docs it seems to have been around since 1.0. The way we currently model QMP commands doesn't really lend itself to a generic transaction mechanism; perhaps this is a sign that the API is wrong and should be changed. In the short term it should be possible to add an ExecuteAddDevices that adds multiple devices transactionally. The other comment I would make is: would using transactions actually solve the race condition? Does QMP generate an event when adding a device? Perhaps ExecuteDeviceAdd just needs to wait for that. Modifying ExecuteDeviceAdd might be a little risky, but we could add a new blocking version of that function and migrate to it over time.

Hi @jodh-intel, nice finding!
c96df19 to 07ea173
@sboeuf changes applied, thanks
grpc.go
```go
func (a *agentGRPC) OnlineCPUMem(ctx context.Context, req *pb.OnlineCPUMemRequest) (*gpb.Empty, error) {
	if !req.Wait {
```
IMO, no need to specify this `Wait` through the spec; the caller could simply decide to run the call to `OnlineCPUMem()` in a goroutine if needed.

In order to have a fast boot time, the goroutine should run inside the VM, otherwise the runtime fails with the following error:
```
rpc error: code = Unavailable desc = transport is closing
```
grpc.go (Outdated)
```go
	count++
}

if nbResources != -1 && count == uint32(nbResources) {
```
Suggested:
```go
if nbResources > 0 && count == uint32(nbResources) {
```
grpc.go (Outdated)
```go
onlinePath := filepath.Join(resource.sysfsOnlinePath, file.Name(), "online")
if strings.Trim(string(status), "\n\t ") == "0" {
	ioutil.WriteFile(onlinePath, []byte("1"), 0600)
```
You should probably check the error here. What you should do, I'm not sure; perhaps simply continue. Looking at the old code, no action was taken after this line, so although the error was discarded, ignoring it was harmless. But with these changes I guess it's not.

I'm not sure it's fixed correctly though. If you get one write error you'll give up entirely, i.e., the for loop in onlineCPUResources will terminate. Is this intentional?
grpc.go
```go
var count uint32
for i := uint32(0); i < onlineCPUMaxTries; i++ {
	r, err := onlineResources(resource, int32(nbCpus-count))
```
Do the directory contents of resource.sysfsOnlinePath change between invocations of this function? If not, do you need to read the directory up to 10 times and check the online files 10 times? I suppose I'm thinking of a situation in which on the first invocation you online 4 out of 5 CPUs. If I read the code correctly, you'd then re-read the online files associated with those CPUs up to 9 more times.

In systems running low on resources, yes. Reading the directory in each iteration is needed.
07ea173 to 7ee08e7
grpc.go (Outdated)
```go
		continue
	if strings.Trim(string(status), "\n\t ") == "0" {
		if err := ioutil.WriteFile(onlinePath, []byte("1"), 0600); err != nil {
			return count, err
```
If you get one write error you'll give up entirely, i.e., the for loop in onlineCPUResources will terminate. Is this intentional?

You're right, it would be better to log the error and continue.
7ee08e7 to 11486e5

11486e5 to fb015e0
@markdryan @sboeuf PTAL
fb015e0 to 3e417cb
🙁
grpc.go
```go
onlineCPUMemLock.Lock()
defer onlineCPUMemLock.Unlock()

if err := onlineCPUResources(req.NbCpus); err != nil {
```
Check that req.NbCpus is larger than 0?
grpc.go (Outdated)
```go
onlinePath := filepath.Join(resource.sysfsOnlinePath, file.Name(), "online")
ioutil.WriteFile(onlinePath, []byte("1"), 0600)
if nbResources > 0 && count == uint32(nbResources) {
```
Nit: move this check to after count++. It only makes sense to check when count increases.
3e417cb to 4c472c3
@bergwolf changes applied, thanks
Codecov Report
```
@@            Coverage Diff             @@
##           master     #182      +/-   ##
==========================================
+ Coverage   32.05%   34.98%   +2.93%
==========================================
  Files          10       10
  Lines        1482     1612     +130
==========================================
+ Hits          475      564      +89
- Misses        940      963      +23
- Partials       67       85      +18
```
Continue to review the full report at Codecov.
ping @kata-containers/agent
grpc.go
```go
// CPU and Memory hotplug
const (
	cpuRegexpPattern = "cpu[0-9]*"
	memRegexpPattern = "memory[0-9]*"
```
Nit: shouldn't these strictly be the following?
```go
cpuRegexpPattern = "cpu[0-9][0-9]*"
memRegexpPattern = "memory[0-9][0-9]*"
```
It's fine as long as files called `cpu` or `memory` are never created in that directory.
grpc.go
```go
func (a *agentGRPC) OnlineCPUMem(ctx context.Context, req *pb.OnlineCPUMemRequest) (*gpb.Empty, error) {
	if !req.Wait {
```
protocols/grpc/agent.proto (Outdated)
```proto
message OnlineCPUMemRequest {
	// Wait specifies if the caller waits for the agent to online all resources.
	// If true the agent returns until all resources have been connected, else all
	// resources are connected asynchronously and the agent returns immediately
	bool wait = 1;

	// NbCpus Specifies the number of CPUs that were added and the agent has to online
```
s/Specifies/specifies/
s/online/online./
@markdryan could you give this a last review before we merge?
grpc.go (Outdated)
```go
const onlineCPUMaxTries = 10

// online resources, nbResources specifies the maximum number of resources to online.
// If nbResources is -1 then there is no limit, all resources are connected
```
Looking at the code, there's no limit if nbResources == 0 either.
@sboeuf Looks good to me. I think my comments have been addressed. I made one small comment on a comment.

@devimc please address both comments from @markdryan and myself, and we're good to merge this :)
With this patch the runtime will communicate to the agent the number of vCPUs that were hot added, allowing the agent to online all vCPUs. The agent will try to online all vCPUs 10 times, waiting 100 milliseconds in each iteration; this is needed since when the runtime calls QMP `device_add`, QEMU doesn't allocate all vCPUs immediately.

fixes kata-containers#181

Signed-off-by: Julio Montes <julio.montes@intel.com>
4c472c3 to 49f01ed
@markdryan @sboeuf changes applied, thanks

Thanks @devimc, let's merge it!
With this patch the runtime will communicate to the agent the number of vCPUs that were hot added, allowing the agent to online all vCPUs. The agent will try to online all vCPUs 5 times, waiting 500 milliseconds in each iteration; this is needed since when the runtime calls QMP device_add, QEMU doesn't allocate all vCPUs immediately.

fixes #181

Signed-off-by: Julio Montes julio.montes@intel.com