nvmeof: add examples and deployment files#5641
Conversation
|
/test ci/centos/mini-e2e/k8s-1.34 |
1 similar comment
|
/test ci/centos/mini-e2e/k8s-1.34 |
|
/test ci/centos/mini-e2e/k8s-1.34 |
4d5676f to
0298e90
Compare
|
/test ci/centos/mini-e2e/k8s-1.34/nvmeof |
|
/test ci/centos/mini-e2e/k8s-1.33 |
0298e90 to
dd3d82d
Compare
|
/test ci/centos/mini-e2e/k8s-1.34 |
|
/test ci/centos/mini-e2e/k8s-1.33/nvmeof |
|
/test ci/centos/mini-e2e/k8s-1.32 |
|
Deploying the NVMe-oF gateway fails because the Log from the init-container that constructs the configuration and announces the gateway in the Ceph environment: |
|
/test ci/centos/mini-e2e/k8s-1.34 |
|
/test ci/centos/mini-e2e/k8s-1.34/nvmeof |
2 similar comments
|
/test ci/centos/mini-e2e/k8s-1.34/nvmeof |
|
/test ci/centos/mini-e2e/k8s-1.34/nvmeof |
19de06a to
be0d918
Compare
|
/test ci/centos/mini-e2e/k8s-1.34/nvmeof |
1 similar comment
|
/test ci/centos/mini-e2e/k8s-1.34/nvmeof |
394858c to
77aacdb
Compare
|
/test ci/centos/mini-e2e/k8s-1.34/nvmeof |
77aacdb to
72e0e02
Compare
|
/test ci/centos/mini-e2e/k8s-1.34/nvmeof |
1 similar comment
|
/test ci/centos/mini-e2e/k8s-1.34/nvmeof |
14a2ccf to
7af1a69
Compare
|
/test ci/centos/mini-e2e/k8s-1.34/nvmeof |
gadididi
left a comment
There was a problem hiding this comment.
LGTM!
small nits\questions
| listeners: | | ||
| [{ | ||
| "address": "10.242.64.32", | ||
| "port": "4420", | ||
| "hostname": "cephnvme-vm9" | ||
| }] |
There was a problem hiding this comment.
maybe we should write under comment the alternative way (AutoListener. and write it will be supported from ceph tentacle?
UPDATE:
Aviv does not want to release the manually option at all, because he does not want users will start use this (uneasy) approach..
There was a problem hiding this comment.
Autolisteners are not available yet, right? Needs a newer Ceph version, new gateway and updated Ceph-CSI.
This is what we have now, and it works for Ceph clusters that run their gateways outside of the Kubernetes cluster.
We can update the examples later on, if we know when or what version combination works better.
But hopefully we get a way that deploys/configures the gateway in a stable fashion with Rook. Services and Cluster-IP addresses without specifying IP-addresses in the listerners section would be a clean approach too.
| metadata: | ||
| name: "ceph-csi-config" |
There was a problem hiding this comment.
We can add here in metadata entry- the kms type (under comment)
//"encryptionKMSType": "metadata"
(If someone works with dh-chap he must add it)
There was a problem hiding this comment.
ideally the PR that adds dh-chap would add the option here. If that feature gets merged before this, we can include it here.
| # FIXME: delete imagePullPolicy, or set to "IfNotPresent" | ||
| imagePullPolicy: "Always" |
| @@ -0,0 +1,12 @@ | |||
| --- | |||
There was a problem hiding this comment.
maybe we can add another PVC example with
volumeAttributesClassName property, and its value will be default
There was a problem hiding this comment.
Not all kubernetes versions we are testing with, support that yet. Creating the PVC with an old Kubernetes version would fail.
|
|
||
| # subsystemNQN is the NQN of the subsystem where all volumes belong to | ||
| subsystemNQN: "nqn.2025-08.io.ceph:k8s-ceph-csi" | ||
|
|
There was a problem hiding this comment.
maybe add dh-chap option under comment?
|
|
||
| e2eGatewaySecurityContext = e2eGatewayPath + "scc.yaml" | ||
| e2eGatewayServiceAccount = e2eGatewayPath + "serviceaccount.yaml" | ||
| e2eGatewayConfig = e2eGatewayPath + "config.yaml" |
There was a problem hiding this comment.
inside the e2e/nvmeof/config.yaml file I think we should add this line:
enable_monitor_client = True in the GW configuration. without monitor client the GW cannot be deployed
There was a problem hiding this comment.
Well, it works now, I'd rather change as little as possible at the moment. Hopefully we can use Rook to deploy the gateway soon.
| // After a failure, the deployment is automatically removed. The gateway Pod | ||
| // is not available anymore when the e2e job gathers all the logs. Record | ||
| // the logs in the job for now (the way of deploying will move to Rook in | ||
| // the future anyway). | ||
| for _, container := range []string{"generate-minimal-ceph-conf", "nvmeof-gateway"} { | ||
| logs, _ := frameworkPod.GetPodLogs(context.TODO(), f.ClientSet, rookNamespace, pod, container) | ||
| framework.Logf("Logs from the %q container of the NVMe-oF gateway:\n%s", container, logs) | ||
| } |
There was a problem hiding this comment.
I hope we can drop all the manual deploying of the gateway and rely on Rook soon.
| Expect(err).ShouldNot(HaveOccurred()) | ||
| Expect(pods.Items).Should(HaveLen(1)) | ||
|
|
||
| return pods.Items[0].Name, pods.Items[0].Status.PodIP |
There was a problem hiding this comment.
maybe return list of ip port , depending on how many GW you deployed in e2e/nvmeof/deployment.yaml
| Port int `json:"port"` | ||
| }{ | ||
| { | ||
| Hostname: gwHost, // FIXME: is this required? |
| validateRBDImageCount(f, 1, nvmeofPool) | ||
| validateOmapCount(f, 1, rbdType, nvmeofPool, volumesType) | ||
|
|
There was a problem hiding this comment.
maybe in the future we can add another pod nvmeog-gw-toolbox and then validate the listener\host\subsystem\namespace were created, what do you think?
There was a problem hiding this comment.
Sure, when we add tests that validate the state of the gateway, we should deploy the additional toolbox too.
| metadata: | ||
| name: ceph-nvmeof-nodeplugin | ||
| rules: | ||
| - apiGroups: [""] |
There was a problem hiding this comment.
Are these RBAC's copy from the RBD driver or nvme needs these permissions as well?
There was a problem hiding this comment.
the NVMe implementation calls some RBD functionality. It may not need all permissions at the moment, but new features get added, so the permissions need to be updated at that time too. Instead of trying to remove permissions that get added later, I just kept them.
| - mountPath: /csi-logs | ||
| name: logs-dir | ||
| - args: | ||
| - --v=5 |
There was a problem hiding this comment.
log level 1 like in deployment?
| return s | ||
| } | ||
|
|
||
| // parse splilts the version string into sepate pieces. |
There was a problem hiding this comment.
splilts to splits sepate to separate
| flag.BoolVar(&testNBD, "test-nbd", false, "test rbd csi driver with rbd-nbd mounter") | ||
| flag.BoolVar(&testNFS, "test-nfs", false, "test nfs csi driver") | ||
| flag.BoolVar(&testNVMeoF, "test-nvmeof", false, "test nvmeof csi driver") | ||
| flag.BoolVar(&testNVMeoF, "test-nvmeof", true, "test nvmeof csi driver") |
There was a problem hiding this comment.
this should be set to false as we dont deploy-nvme by default?
There was a problem hiding this comment.
testing should be done by default. When no support for the nvme-gateway is detected, the tests are skipped. This option makes it possible to explicitly disable the testing, if needed for some reason.
| @@ -0,0 +1,249 @@ | |||
| /* | |||
| Copyright 2025 The Ceph-CSI Authors. | |||
| } | ||
| rawListeners, err := json.Marshal(listeners) | ||
| Expect(err).ShouldNot(HaveOccurred()) | ||
| sc.Parameters["listeners"] = string(rawListeners) |
There was a problem hiding this comment.
i thought we were going to use auto listeners as the listeners will have problem if the gateway pod restarts, this will leave inconcistent test results
There was a problem hiding this comment.
Autolisteners are not available yet. Ideally we do not need to provide IP-addresses when the gateway gets deployed by Rook. But how that will be done, is something we need to figure out.
There was a problem hiding this comment.
@Madhu-1
the AutoListener will not resolve the nvme connection issue (from the nodesever side) . it just will make easier life for the administrator . No need to re-edit the SC every time the GW pod restarts with the IPs..
the connection issue is still open without solution. hopefully soon will be resolved
| csi.storage.k8s.io/controller-expand-secret-name: csi-nvmeof-secret | ||
| csi.storage.k8s.io/controller-expand-secret-namespace: default | ||
| csi.storage.k8s.io/node-stage-secret-name: csi-nvmeof-secret | ||
| csi.storage.k8s.io/node-stage-secret-namespace: default |
There was a problem hiding this comment.
publish secrets are not required?
There was a problem hiding this comment.
No, maybe later when dh-chap is supported.
| @@ -0,0 +1,13 @@ | |||
| --- | |||
| # apiVersion: storage.k8s.io/v1beta1 for k8s version 1.33 | |||
There was a problem hiding this comment.
IMO one class example is enough with commented parameters, we dont need to add multiple files to show different parameters
There was a problem hiding this comment.
I think practical examples are valuable for users. It makes it easier to understand that they should create multiple VolumeAttributesClasses.
| sc.Parameters["csi.storage.k8s.io/provisioner-secret-name"] = nvmeofProvisionerSecretName | ||
|
|
||
| sc.Parameters["csi.storage.k8s.io/controller-publish-secret-namespace"] = cephCSINamespace | ||
| sc.Parameters["csi.storage.k8s.io/controller-publish-secret-name"] = nvmeofProvisionerSecretName |
There was a problem hiding this comment.
expand secrets are missing here?
There was a problem hiding this comment.
Not really missing, there is no expand volume test yet. I'll add them to keep everything a little more complete.
| // After a failure, the deployment is automatically removed. The gateway Pod | ||
| // is not available anymore when the e2e job gathers all the logs. Record | ||
| // the logs in the job for now (the way of deploying will move to Rook in | ||
| // the future anyway). | ||
| for _, container := range []string{"generate-minimal-ceph-conf", "nvmeof-gateway"} { | ||
| logs, _ := frameworkPod.GetPodLogs(context.TODO(), f.ClientSet, rookNamespace, pod, container) | ||
| framework.Logf("Logs from the %q container of the NVMe-oF gateway:\n%s", container, logs) | ||
| } |
There was a problem hiding this comment.
I hope we can drop all the manual deploying of the gateway and rely on Rook soon.
Yes, ceph-csi-operator isn't ready yet, so we have to do this for the time being. |
Add example of storage class and pvc, need to modify storage class before use! Signed-off-by: Niels de Vos <ndevos@ibm.com> Signed-off-by: gadi-didi <gadi.didi@ibm.com>
The deployment files make it possible to start the NVMe-oF controller and nodeplugin by the e2e suite. Co-authored-by: gadi-didi <gadi.didi@ibm.com> Signed-off-by: Niels de Vos <ndevos@ibm.com>
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Create a PVC and wait for it to get Bound. This does not exersice the node-plugin yet, only the controller/provisioner. Signed-off-by: Niels de Vos <ndevos@ibm.com>
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Recent versions of Rook use Ceph-CSI Operator by default. There is no need to have Ceph-CSI deployed by Rook, as our CI deploys it from the PR that is being tested. Signed-off-by: Niels de Vos <ndevos@ibm.com>
ceph-volume in Tentacle (v20) uses udev data to detect some properties of the potential OSDs. Inside the minikube cluster (running with Podman), there is no udev. Mounting the host udev runtime data into the containers makes ceph-volume happy. Signed-off-by: Niels de Vos <ndevos@ibm.com>
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Add example of volumeAttributesClass yaml files. it is possible to use in with PVC, by using the field volumeAttributesClassName. Co-authored-by: Niels de Vos <ndevos@ibm.com> Signed-off-by: gadi-didi <gadi.didi@ibm.com>
Ceph Tentacle is required for deploying the NVMe-oF gateway. Earlier versions do not support the gateway. Signed-off-by: Niels de Vos <ndevos@ibm.com>
e0bb01c to
17d5ccc
Compare
|
@Merigfyio queue |
|
@Mergifyio queue |
Merge Queue StatusRule:
This pull request spent 2 hours 44 minutes 44 seconds in the queue, including 2 hours 44 minutes 29 seconds running CI. Required conditions to merge
|
Examples and deployment files for NVMe-oF. These are used with the e2e testing
as well.
See-also: #5672