Skip to content
This repository was archived by the owner on May 12, 2021. It is now read-only.

protocols: client: Add timeout for hybrid vsock handshake#734

Merged
devimc merged 1 commit intokata-containers:masterfrom
jcvenegas:fix-372
Feb 20, 2020
Merged

protocols: client: Add timeout for hybrid vsock handshake#734
devimc merged 1 commit intokata-containers:masterfrom
jcvenegas:fix-372

Conversation

@jcvenegas
Copy link
Copy Markdown
Member

When the client tries to connect sometimes a race condition
could happen when the between bind and listen calls in the agent
vsock side.

This will block the hypervisor wanting for a response and
as consequence the agent client where it checks for an OK
response.

This case needs to be handled by the guest kernel, see
https://lore.kernel.org/netdev/668b0eda8823564cd604b1663dc53fbaece0cd4e.camel@intel.com/

As an extra protection make the agent client timeout if no OK response
is given. The response should be quick so is OK to wait a few seconds
and then timeout.

This also allow to return an error from the dialler function so retry
does not fallback on grpc retry making retries faster.

Fixes: #372

@jcvenegas jcvenegas requested review from devimc and sboeuf February 13, 2020 18:12
@codecov
Copy link
Copy Markdown

codecov bot commented Feb 13, 2020

Codecov Report

Merging #734 into master will decrease coverage by 0.18%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master     #734      +/-   ##
==========================================
- Coverage   60.63%   60.45%   -0.19%     
==========================================
  Files          17       17              
  Lines        2558     2569      +11     
==========================================
+ Hits         1551     1553       +2     
- Misses        858      868      +10     
+ Partials      149      148       -1

@jcvenegas
Copy link
Copy Markdown
Member Author

kata-containers/runtime#2454 test of code in runtime

@jcvenegas
Copy link
Copy Markdown
Member Author

/test

@jcvenegas
Copy link
Copy Markdown
Member Author

@sboeuf @chavafg @amshinde @devimc @jodh-intel @grahamwhaley this PR is ready folks, please take a look

@chavafg
Copy link
Copy Markdown
Contributor

chavafg commented Feb 18, 2020

restarted CI jobs that failed.
The fedora one failed with

Stderr: Failed to check if grpc server is working: rpc error: code = Unavailable desc = transport is closing

Copy link
Copy Markdown

@devimc devimc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @jcvenegas , I left some questions

agentClientLog.WithField("response", response).Debug("HybridVsock trivial handshake")

if strings.Contains(response, "OK") {
successfulConnectionChan <- true
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we reuse the errChan ? errChan <- nil

case <-successfulConnectionChan:
return conn, nil
case err = <-errChan:
conn.Close()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if errChan can be reused, here we need to check if err != nil , right?

When the client tries to connect sometimes a race condition
could happen when the between bind and listen calls in the agent
vsock side.

This will block the hypervisor wanting for a response and
as consequence the agent client where it checks for an OK
response.

This case needs to be handled by the guest kernel, see
https://lore.kernel.org/netdev/668b0eda8823564cd604b1663dc53fbaece0cd4e.camel@intel.com/

As an extra protection make the agent client timeout if no OK response
is given. The response should be quick so is OK to wait a few seconds
and then timeout.

This also allow to return an error from the dialler function so retry
does not fallback on grpc retry making retries faster.

Fixes: #372

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
@jcvenegas
Copy link
Copy Markdown
Member Author

/test-ubuntu

@jcvenegas
Copy link
Copy Markdown
Member Author

kata-containers/runtime#2454 CI passing with latest changes, @devimc PR updated

@devimc devimc merged commit d26a505 into kata-containers:master Feb 20, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

uevent listening goroutine is not (always) working

4 participants