-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Description
Description
containerd.WithTimeout doesn't work on containerd v2 client and returns immediately if the connection can't be established.
This is a consequence of moving from grpc.DialContext to grpc.NewClient and dropping the usage of WithBlock: 63b4688.
The former attempts the connection immediately, and the WithBlock blocks until the connection is ready.
As explained in https://github.com/grpc/grpc-go/blob/master/Documentation/anti-patterns.md - this is considered as an anti-pattern because one can't assume that the connection would still be available later.
The shift to grpc.NewClient makes the gRPC connection "lazy" - it isn't performed at the time the client is being created, it's deferred to a later time where an actual RPC is performed.
This makes the containerd.WithTimeout useless, because it only worked for the initial connection, and not all connections dialed throughout the whole lifecycle of the containerd client.
The timeout was first introduced in #2554 - however I'm not able to determine whether it was intentional that it only applies to the initial connection. This is where I need your input, because we have two options depending on this decision :)
- Deprecate/remove
WithTimeoutbecause there's no initial connection anymore - Pass the timeout to the dialer, so the timeout is applied to every dial attempt client: Respect
client.WithTimeoutoption #11508
On the other hand, there's also a question of whether the new behavior of client.New should produce such "lazily" initialized client, instead of verifying that at least the initial connection is performed successfully.
Looking at the godoc:
// New returns a new containerd client that is connected to the containerd
// instance provided by address
func New(address string, opts ...Opt) (*Client, error) {The that is connected to the containerd instance sounds like that function should at least make sure that there's a containerd instance listening on the provided address.
Currently, on the v2 client this behavior isn't really consistent, because it depends on whether the default namespace was specified or not (client.WithDefaultNamespace):
Lines 173 to 180 in c63d132
| // check namespace labels for default runtime | |
| if copts.defaultRuntime == "" && c.defaultns != "" { | |
| if label, err := c.GetLabel(context.Background(), defaults.DefaultRuntimeNSLabel); err != nil { | |
| return nil, err | |
| } else if label != "" { | |
| c.runtime = label | |
| } | |
| } |
If it was, then the client will attempt to query the daemon for the default runtime label which effectively "unlazies" the client.
- a PR to fix that is here: client/New: Don't unlazy the gRPC connection implicitly #11509)
We should decide whether the client.New should produce an "unlazied" connection and if so - encode that explicitly.
Steps to reproduce the issue
package main
import (
"context"
"testing"
"time"
"github.com/containerd/containerd"
containerdv2 "github.com/containerd/containerd/v2/client"
)
func TestContainerdV1(t *testing.T) {
start := time.Now()
defer func() {
s := time.Since(start)
t.Logf("took %s", s)
if s < time.Second*8 {
t.Fatal("returned earlier than expected")
}
}()
c8d, err := containerd.New("/run/containerd.sock",
containerd.WithTimeout(time.Second*10),
)
if err != nil {
t.Logf("new error: %s", err)
return
}
defer c8d.Close()
_, err = c8d.ListImages(context.Background())
if err != nil {
t.Logf("list error: %s", err)
return
}
}
func TestContainerdV2(t *testing.T) {
start := time.Now()
defer func() {
s := time.Since(start)
t.Logf("took %s", s)
if s < time.Second*8 {
t.Fatal("returned earlier than expected")
}
}()
c8d, err := containerdv2.New("/run/containerd.sock",
containerdv2.WithTimeout(time.Second*10),
)
if err != nil {
t.Logf("new error: %s", err)
return
}
defer c8d.Close()
_, err = c8d.ListImages(context.Background())
if err != nil {
t.Logf("list error: %s", err)
return
}
}Describe the results you received and expected
Got:
$ go test -v .
=== RUN TestContainerdV1
main_test.go:28: new error: failed to dial "/run/containerd.sock": context deadline exceeded: connection error: desc = "transport: error while dialing: dial unix:///run/containerd.sock: timeout"
main_test.go:18: took 10.002191797s
--- PASS: TestContainerdV1 (10.00s)
=== RUN TestContainerdV2
main_test.go:62: list error: connection error: desc = "transport: Error while dialing: dial unix:///run/containerd.sock: timeout"
main_test.go:44: took 1.003387459s
main_test.go:46: returned earlier than expected
--- FAIL: TestContainerdV2 (1.00s)
FAIL
FAIL github.com/vvoland/ctrdnewtimeout 11.018s
FAIL
Expected the second test to pass (retry the connection for at least 10 seconds).
What version of containerd are you using?
v2.0.3 and v1.7.26
Any other relevant information
No response
Show configuration if it is related to CRI plugin.
No response
Metadata
Metadata
Assignees
Labels
Type
Projects
Status