CSI plugins sent incorrect authority headers during registration with kubelet #108254

@EricRnR

What happened?

When using the CSI node-driver-registrar sidecar container, the kubelet-registration-path parameter can be set to either unix:///path/to/unix.sock or /path/to/unix.sock. At least one of these forms should result in kubelet sending a valid authority header over the socket, but neither does. In the former case, kubelet fails to find the socket file because it passes the full string, including the unix: prefix, to net.Dialer as the dial target. In the latter case, the dial succeeds and the call reaches the container over the socket, but kubelet sends an incorrect :authority pseudo-header (the socket path, /path/to/unix.sock).

What did you expect to happen?

A call into the CSI container with a valid authority header, using either form of the kubelet-registration-path parameter.

How can we reproduce it (as minimally and precisely as possible)?

Deploy an example CSI plugin, setting kubelet-registration-path on the node-driver-registrar sidecar container to either unix:///path/to/unix.sock or /path/to/unix.sock. Note that a plugin may not fail in the latter case if it is written in a language whose HTTP/2 library does not strictly check the authority header; the header is still incorrect. In Rust, the h2 library strictly checks the authority header and returns a protocol error. Go does not seem to reject the invalid authority header, which is perhaps why most plugins do not notice the issue.

Anything else we need to know?

Using /path/to/unix.sock means the target has no 'unix:' prefix. The code checks for this prefix before substituting 'localhost' as the authority here, so the substitution does not happen in this case.

Using unix:///path/to/unix.sock gets the authority substituted, but passes the full string, 'unix:' prefix included, to the dialer as the file path. Related code can be seen here (non-nil custom dialer) and here (newGrpcConn looks like it expects no unix: prefix, based on its log entry and the externally supplied dial context). This work may have overlapped with related work in grpc-go here and plans here, where both libraries now seem to be taking responsibility for managing the authority header for unix sockets.
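The two cases above suggest that the dial path and the authority need to be derived separately from the configured endpoint. A minimal sketch of that idea follows. The helper name splitUnixEndpoint is hypothetical and this is not kubelet's or grpc-go's actual code; it only assumes that 'localhost' is the desired authority for any unix socket target, as described above.

```go
package main

import (
	"fmt"
	"strings"
)

// splitUnixEndpoint is a hypothetical helper (not kubelet code). It accepts
// either "unix:///path/to/unix.sock" or "/path/to/unix.sock" and returns the
// bare filesystem path to dial plus the authority to send, which is assumed
// to be "localhost" for every unix socket endpoint.
func splitUnixEndpoint(endpoint string) (path, authority string) {
	path = endpoint
	if trimmed := strings.TrimPrefix(endpoint, "unix://"); trimmed != endpoint {
		path = trimmed
	} else {
		// Also tolerate the shorter "unix:/path" spelling.
		path = strings.TrimPrefix(endpoint, "unix:")
	}
	return path, "localhost"
}

func main() {
	for _, ep := range []string{"unix:///path/to/unix.sock", "/path/to/unix.sock"} {
		path, authority := splitUnixEndpoint(ep)
		fmt.Printf("endpoint=%q dial path=%q authority=%q\n", ep, path, authority)
	}
}
```

With a helper of this shape, both parameter forms would yield the same dial path and the same authority, which is the behaviour the report expects.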

Kubelet logs when using the unix: prefix:

Feb 19 14:41:11 server001.ga.racksandrails.net kubelet[238934]: I0219 14:41:11.454702  238934 csi_plugin.go:99] kubernetes.io/csi: Trying to validate a new CSI Driver with name: thin-sync-csi.racksandrails.com endpoint: unix:///var/lib/kubelet/plugins/csi-thinsync/csi.sock versions: 1.0.0
Feb 19 14:41:11 server001.ga.racksandrails.net kubelet[238934]: I0219 14:41:11.454787  238934 csi_plugin.go:112] kubernetes.io/csi: Register new plugin with name: thin-sync-csi.racksandrails.com at endpoint: unix:///var/lib/kubelet/plugins/csi-thinsync/csi.sock
Feb 19 14:41:11 server001.ga.racksandrails.net kubelet[238934]: W0219 14:41:11.455279  238934 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {/var/lib/kubelet/plugins/csi-thinsync/csi.sock localhost 0xc0044710c0 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial unix unix:///var/lib/kubelet/plugins/csi-thinsync/csi.sock: connect: no such file or directory". Reconnecting...
Feb 19 14:41:11 server001.ga.racksandrails.net kubelet[238934]: W0219 14:41:11.455476  238934 csi_client.go:184] Error calling CSI NodeGetInfo(): rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix unix:///var/lib/kubelet/plugins/csi-thinsync/csi.sock: connect: no such file or directory"
Feb 19 14:41:11 server001.ga.racksandrails.net kubelet[238934]: E0219 14:41:11.478062  238934 goroutinemap.go:150] Operation for "/var/lib/kubelet/plugins_registry/thin-sync-csi.racksandrails.com-reg.sock" failed. No retries permitted until 2022-02-19 14:41:11.978009921 -0500 EST m=+3309.907550017 (durationBeforeRetry 500ms). Error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix unix:///var/lib/kubelet/plugins/csi-thinsync/csi.sock: connect: no such file or directory": rpc error: code = Unavailable desc = error reading from server: EOF
Feb 19 14:41:11 server001.ga.racksandrails.net kubelet[238934]: W0219 14:41:11.478162  238934 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {/var/lib/kubelet/plugins_registry/thin-sync-csi.racksandrails.com-reg.sock /var/lib/kubelet/plugins_registry/thin-sync-csi.racksandrails.com-reg.sock <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins_registry/thin-sync-csi.racksandrails.com-reg.sock: connect: connection refused". Reconnecting...

Of note are the inconsistent 'Error while dialing dial unix' entries: the CSI socket path is shown with the unix: prefix, while the registration socket path is shown without it. Also, the 'localhost' replacement is visible in the logs for the first address (unix:-prefixed) but not for the second, the registration notification call (non-prefixed).

Node registration sidecar logs when not using the unix: prefix:

I0221 13:07:20.377752 1 main.go:167] Running node-driver-registrar in mode=registration
I0221 13:07:20.378223 1 main.go:191] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0221 13:07:20.378237 1 connection.go:154] Connecting to unix:///csi/csi.sock
I0221 13:07:20.378554 1 main.go:198] Calling CSI driver to discover driver name
I0221 13:07:20.378565 1 connection.go:183] GRPC call: /csi.v1.Identity/GetPluginInfo
I0221 13:07:20.378568 1 connection.go:184] GRPC request: {}
I0221 13:07:20.380870 1 connection.go:186] GRPC response: {"name":"thin-sync-csi.racksandrails.com","vendor_version":"0.1"}
I0221 13:07:20.380918 1 connection.go:187] GRPC error: <nil>
I0221 13:07:20.380924 1 main.go:208] CSI driver name: "thin-sync-csi.racksandrails.com"
I0221 13:07:20.380954 1 node_register.go:53] Starting Registration Server at: /registration/thin-sync-csi.racksandrails.com-reg.sock
I0221 13:07:20.381066 1 node_register.go:62] Registration Server started at: /registration/thin-sync-csi.racksandrails.com-reg.sock
I0221 13:07:20.381106 1 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
I0221 13:07:21.953021 1 main.go:102] Received GetInfo call: &InfoRequest{}
I0221 13:07:21.953276 1 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/csi-thinsync/registration"
I0221 13:07:21.961349 1 main.go:120] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: PROTOCOL_ERROR,}
E0221 13:07:21.961377 1 main.go:122] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: PROTOCOL_ERROR, restarting registration container.

Of note, kubelet notifies the container that it received a protocol error. The CSI container's Rust logs show the matching protocol error and the authority header value:

[2022-02-21T15:05:56Z DEBUG h2::server] malformed headers: malformed authority (b"/var/lib/kubelet/plugins/csi-thinsync/csi.sock"): invalid uri character
[2022-02-21T15:05:56Z DEBUG h2::codec::framed_read] received frame=Data { stream_id: StreamId(1), flags: (0x1: END_STREAM) }
[2022-02-21T15:05:56Z DEBUG h2::codec::framed_write] send frame=Reset { stream_id: StreamId(1), error_code: PROTOCOL_ERROR }
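To illustrate why a strict server rejects this value, here is a simplified sketch of an authority check. It is an approximation written for this report, not the logic of the Rust h2 crate or of any Go library; the function name looksLikeAuthority is hypothetical. It relies only on the URI rule that an authority component ends at the first '/', '?' or '#', so a value containing any of those cannot be a valid authority.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeAuthority is a deliberately simplified, hypothetical check. It
// rejects empty values and values containing characters that cannot appear
// inside a URI authority component ('/', '?', '#') or whitespace. A real
// parser validates far more than this.
func looksLikeAuthority(s string) bool {
	if s == "" {
		return false
	}
	return !strings.ContainsAny(s, "/?# \t")
}

func main() {
	for _, v := range []string{
		"localhost",
		"/var/lib/kubelet/plugins/csi-thinsync/csi.sock",
	} {
		fmt.Printf("%q valid authority: %v\n", v, looksLikeAuthority(v))
	}
}
```

Under this check the socket path fails on its first character, while the 'localhost' substitution passes.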

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:30:48Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:32:02Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

Bare metal

OS version

# On Linux:
$ cat /etc/os-release
NAME="Fedora Linux"
VERSION="35 (Server Edition)"
ID=fedora
VERSION_ID=35
VERSION_CODENAME=""
PLATFORM_ID="platform:f35"
PRETTY_NAME="Fedora Linux 35 (Server Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:35"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f35/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=35
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=35
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Server Edition"
VARIANT_ID=server

$ uname -a
Linux server001.ga.racksandrails.net 5.15.14-200.fc35.x86_64 #1 SMP Tue Jan 11 16:49:27 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Install tools

Container runtime (CRI) and version (if applicable)

cri-o

Related plugins (CNI, CSI, ...) and versions (if applicable)

calico, metal-lb, bgp, applicable 1.23 versions.

Labels: kind/bug, lifecycle/rotten, needs-triage, sig/storage