Ceph: provision OSDs using Drive Groups #4916
Conversation
This is obviously not complete. Unit tests have been commented out instead of changed, but what currently exists propagates the Drive Group config to the OSD provisioning pod. Relevant Drive Groups config in `driveGroups`:

```yaml
driveGroups:
  # # Examples below
  regular_nodes:
    host_pattern: "*-osd"
    data_devices:
      all: true
  fast_nodes:
    host_pattern: "*-osd-fast"
    data_devices:
      limit: 6
      size: "10TB:10TB"
```

Relevant OSD prepare pod logs:
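The propagation described above can be sketched as serializing the drive groups into a value handed to the prepare pod. Since the drive group schema is defined by Ceph/ceph-volume rather than Rook, the operator can treat each group as an opaque map. The function name and the idea of passing JSON are illustrative assumptions, not necessarily what this PR implements:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// driveGroupsToEnv serializes drive groups to JSON so they could be passed
// to the OSD prepare pod (e.g. via an environment variable). The schema is
// Ceph's, so the values are kept as opaque maps. Illustrative sketch only.
func driveGroupsToEnv(groups map[string]map[string]interface{}) (string, error) {
	b, err := json.Marshal(groups)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	groups := map[string]map[string]interface{}{
		"regular_nodes": {
			"host_pattern": "*-osd",
			"data_devices": map[string]interface{}{"all": true},
		},
	}
	env, err := driveGroupsToEnv(groups)
	if err != nil {
		panic(err)
	}
	fmt.Println(env)
}
```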
pkg/daemon/ceph/osd/volume.go (Outdated)

```go
// TODO: what if log path were /dev/stderr and we didn't do combined output so that logs would
// appear in the prepare pod log?
```

Sounds good for the logs to be piped to the same pod log.
cluster.yaml:

```yaml
# deviceFilter: "^sd."
# Ceph supports adding devices as OSDs by Drive Group definitions. TODO: LINK
# Rook will give precedence to creating OSDs using Drive Groups before using other methods.
driveGroups:
```

What if we added a new example for drive groups? The main cluster.yaml would be used for raw devices, cluster-on-pvc.yaml for storageClassDeviceSets, and cluster-drivegroups.yaml for this example. If we keep it in cluster.yaml we would need to comment it all out by default.
I still like the idea of doing this in the main cluster.yaml and commenting these out. I intend to comment this out, but for my testing they are uncommented. Thoughts?
IMO if the example is a big chunk of yaml it makes more sense as a separate yaml, but I'm ok with it staying in cluster.yaml as long as it's not too long. If it's a separate yaml, the docs can also refer to a working example instead of needing to uncomment it. Another thought is to comment out a very short example in cluster.yaml, and have a bigger example in a new yaml.
I opted for a short example in cluster.yaml and made a new cluster-with-drive-groups.yaml.
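For illustration, the short commented-out snippet left in cluster.yaml might look something like the following; the group name and contents here are assumptions based on the examples in this PR, not the actual committed text:

```yaml
# Ceph supports adding devices as OSDs by Drive Group definitions.
# Uncomment and adjust to provision OSDs via Drive Groups; see
# cluster-with-drive-groups.yaml for a fuller working example.
# driveGroups:
#   all_nodes:
#     host_pattern: "*"
#     data_devices:
#       all: true
```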
```yaml
# Rook will give precedence to creating OSDs using Drive Groups before using other methods.
driveGroups:
  # # Examples below
  regular_nodes:
```
This is the same schema defined by ceph-volume (outside rook's control), right? It just has a different look and feel with the underscores. Not a problem, only an observation.
This is defined by Ceph (and ceph-volume) yes. Somewhat annoying, but... 🤷♀️
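For comparison, the same group expressed as the upstream Ceph orchestrator consumes it (the schema Rook is mirroring here; field layout per Ceph's drive group documentation) looks roughly like:

```yaml
service_type: osd
service_id: fast_nodes
placement:
  host_pattern: "*-osd-fast"
data_devices:
  limit: 6
  size: "10TB:10TB"
```

The per-group keys (`data_devices`, `host_pattern`, etc.) carry over directly; Rook's `driveGroups` map key plays the role of `service_id`.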
Still waiting on ceph-volume PR ceph/ceph#35728 for Ceph to support this in master.
Since the admission controller isn't always enabled, I think having both is still good. I added admission controller checks.
pkg/operator/ceph/admission.go (Outdated)

```go
	return nil
}

func validateStorageSpec(cluster cephv1.CephCluster) error {
```
Note that the admission controller code will move to a different location with #5761.
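A minimal sketch of the kind of validation a `validateStorageSpec` function could perform on drive groups. The simplified types and the specific check (requiring a `host_pattern`) are assumptions for illustration; the real `cephv1` types and the checks in this PR may differ:

```go
package main

import "fmt"

// DriveGroup is a simplified stand-in for the real cephv1 type.
type DriveGroup struct {
	HostPattern string
}

// CephCluster is likewise a simplified stand-in; the real CRD nests
// this under the cluster spec.
type CephCluster struct {
	DriveGroups map[string]DriveGroup
}

// validateStorageSpec rejects drive groups that do not target any hosts.
// This is an illustrative check, not necessarily the one in this PR.
func validateStorageSpec(cluster CephCluster) error {
	for name, dg := range cluster.DriveGroups {
		if dg.HostPattern == "" {
			return fmt.Errorf("drive group %q has no host_pattern", name)
		}
	}
	return nil
}

func main() {
	bad := CephCluster{DriveGroups: map[string]DriveGroup{"broken": {}}}
	fmt.Println(validateStorageSpec(bad))
}
```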
This pull request has merge conflicts that must be resolved before it can be merged. @BlaineEXE please rebase it. https://rook.io/docs/rook/master/development-flow.html#updating-your-fork
Had a failure on k8s 1.18. Restarting CI.
This pull request has merge conflicts that must be resolved before it can be merged. @BlaineEXE please rebase it. https://rook.io/docs/rook/master/development-flow.html#updating-your-fork
travisn left a comment: LGTM after the merge conflict is resolved.
Add the ability to provision Ceph OSDs with Drive Groups. This adds Drive Groups to the CephCluster CRD, and it sets code in place for propagating this config to the OSD provisioning pod.

Signed-off-by: Blaine Gardner <blaine.gardner@suse.com>
travisn left a comment: The CI issues look unrelated to this PR.
[test ceph]
[test full]
Add a design document to plan how Rook will allow Ceph's Rook Orchestrator Mgr Module to add and remove OSDs to/from a Rook-Ceph cluster, including changes needed in both the Orchestrator Module and in Rook.
Related to CI:

I will implement unit tests in Rook as needed. For CI, I intend to implement tests for the Drive Group work using the SUSE/rookcheck framework, which we hope to soon present and use as a Ceph nightly build tester, since it is primarily SUSE who will use the drive group functionality.

Integration test matrix to verify functionality around drive groups:

- `useAllNodes: false` and `storage` config empty
- `placement` defined
- `placement`, `host_pattern`, and `service_id` in the drive group spec
- `storage` config applied previously (upgrade)

Signed-off-by: Blaine Gardner <blaine.gardner@suse.com>