cephadm, ceph-volume: deploy crimson OSDs using cephadm#66811

Merged
shraddhaag merged 2 commits into ceph:main from shraddhaag:wip-shraddhaag-cephadm-add-osd-type
Feb 4, 2026

@shraddhaag
Contributor

@shraddhaag shraddhaag commented Jan 6, 2026

This commit enables deploying both classic and crimson OSDs using cephadm. To support this, a new field, osd_type, is added to DriveGroupSpec. It defaults to classic but can also be set to crimson. When cephadm reads the value crimson, it changes the entrypoint from /usr/bin/ceph-osd to /usr/bin/ceph-osd-crimson.
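
The entrypoint selection described above could be sketched roughly as follows. This is only an illustration, not the cephadm implementation: the helper name `osd_entrypoint` and the lookup table are hypothetical, and only the two binary paths and the `classic`/`crimson` values come from the description.

```python
# Hypothetical helper for illustration; not taken from the cephadm source.
# Only the two paths and the two osd_type values come from the PR description.
OSD_ENTRYPOINTS = {
    "classic": "/usr/bin/ceph-osd",
    "crimson": "/usr/bin/ceph-osd-crimson",
}


def osd_entrypoint(osd_type: str = "classic") -> str:
    """Return the OSD binary path for the given osd_type."""
    try:
        return OSD_ENTRYPOINTS[osd_type]
    except KeyError:
        # Reject unknown values instead of silently falling back to classic.
        raise ValueError(f"unsupported osd_type: {osd_type!r}") from None
```

Keeping `classic` as the default argument mirrors the default stated in the description, so callers that never mention `osd_type` keep the existing behaviour.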

Fixes: https://tracker.ceph.com/issues/74081

@shraddhaag shraddhaag force-pushed the wip-shraddhaag-cephadm-add-osd-type branch 4 times, most recently from 168cf2a to 8a66d63 Compare January 12, 2026 06:50
@Matan-B Matan-B added this to Crimson Jan 12, 2026
@Matan-B Matan-B moved this to In Progress in Crimson Jan 12, 2026
Contributor

@phlogistonjohn phlogistonjohn left a comment

typo in commit message subject: add osd_type to orchestratory

I also generally recommend prefixing the subject with a directory/topic, as nearly all existing Ceph commits do, e.g. cephadm: add osd_type to orchestrator

@shraddhaag shraddhaag force-pushed the wip-shraddhaag-cephadm-add-osd-type branch from 1888d7a to 9c784b5 Compare January 19, 2026 14:48
Contributor

@Matan-B Matan-B left a comment

Once ready, can you please update the Crimson user docs with how a Crimson OSD could be added (command example)?

@guits
Contributor

guits commented Jan 30, 2026

jenkins ceph-volume unit tests

@shraddhaag shraddhaag force-pushed the wip-shraddhaag-cephadm-add-osd-type branch 2 times, most recently from afa11f0 to 9302a65 Compare January 30, 2026 13:48
@guits
Contributor

guits commented Jan 30, 2026

jenkins ceph-volume unit tests

@guits
Contributor

guits commented Jan 30, 2026

@shraddhaag should this test be updated?

    _run_cephadm.assert_any_call(
        'test', 'osd', 'ceph-volume',
        ['--config-json', '-', '--', 'lvm', 'batch',
         '--no-auto', '/dev/sdb', '--db-devices', '/dev/sdc',
         '--wal-devices', '/dev/sdd', '--objectstore', 'bluestore', '--yes', '--no-systemd'],
        env_vars=['CEPH_VOLUME_OSDSPEC_AFFINITY=noncollocated'],
        error_ok=True, stdin='{"config": "", "keyring": ""}',
    )

(Is it missing the new --osd-type flag?)

@shraddhaag shraddhaag force-pushed the wip-shraddhaag-cephadm-add-osd-type branch 4 times, most recently from 151cf17 to 1521dcb Compare February 2, 2026 07:55
@shraddhaag shraddhaag force-pushed the wip-shraddhaag-cephadm-add-osd-type branch from 1521dcb to 346a396 Compare February 2, 2026 12:05
This commit enables us to deploy both classic and crimson
type OSDs using cephadm. To enable the same, a new feature,
osd_type is added to DriveGroupSpec. The default value for
the same is classic, but can also be set to crimson.
When this value is read by cephadm, the entrypoint is
changed from /usr/bin/ceph-osd to /usr/bin/ceph-osd-crimson.

Fixes: https://tracker.ceph.com/issues/74081
Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com>
Prior to this commit, ceph-volume was using hardcoded OSD binary
to issue commands (eg - to perform mkfs, etc). This commit enables
ceph-volume to start supporting crimson OSDs.

A new argument, --osd-type is introduced with the default value
classic. When this parameter is set to 'crimson', ceph-osd-crimson
binary will be used to execute OSD commands.

Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com>
@shraddhaag shraddhaag force-pushed the wip-shraddhaag-cephadm-add-osd-type branch from 346a396 to 585992a Compare February 2, 2026 12:15
Contributor

@Matan-B Matan-B left a comment

lgtm!
I think that we have an ack from cephadm side (@adk3798).
Missing an ack from ceph-vol (@guits) to merge this.

      [
          # no preview and only one disk, prepare is used due the hack that is in place.
    -     (['/dev/sda'], False, ["lvm batch --no-auto /dev/sda --objectstore bluestore --yes --no-systemd"]),
    +     (['/dev/sda'], False, ["lvm batch --no-auto /dev/sda --objectstore bluestore --osd-type classic --yes --no-systemd"]),
Contributor

Can we keep classic as the default to avoid passing it in the test files?

Contributor Author

I've added classic as the default in both cephadm and ceph-volume.

The commands have to be supplied this way because DriveGroupSpec injects these arguments. Also, when crimson becomes the default in the future, we will only need to change the default value rather than make larger changes.
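
To illustrate why the expected test strings now carry `--osd-type classic`, here is a minimal sketch of a command builder that always injects the flag. The function `build_lvm_batch_cmd` is hypothetical and far simpler than the real drive group translation code; the argument order simply follows the updated test expectation quoted in this review thread.

```python
from typing import List


def build_lvm_batch_cmd(devices: List[str],
                        objectstore: str = "bluestore",
                        osd_type: str = "classic") -> str:
    """Assemble an `lvm batch` argument string (illustrative only).

    --osd-type is always emitted, even for the default value, so the
    generated command is explicit about which OSD flavour is intended.
    """
    args = [
        "lvm", "batch", "--no-auto", *devices,
        "--objectstore", objectstore,
        "--osd-type", osd_type,
        "--yes", "--no-systemd",
    ]
    return " ".join(args)
```

Under this assumption, switching the default later is a one-line change to the `osd_type` parameter, and the expected strings in the tests change with it.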

@shraddhaag
Contributor Author

jenkins test rook e2e

@shraddhaag
Contributor Author

jenkins test make check

@shraddhaag
Contributor Author

Some context about why ceph-volume changes were needed:

While testing these changes (0fdc81b), OSDs were being killed during the boot process itself. In the debug logs for the OSD boot process (post #67090), the final error looked something like this:

DEBUG 2026-01-28 15:14:36,301 [shard 0:main] osd - OSD::start: open metadata collection
DEBUG 2026-01-28 15:14:36,301 [shard 0:main] osd - OSD::open_meta_coll: opening metadata collection
DEBUG 2026-01-28 15:14:36,301 [shard 0:main] alienstore - open_collection
INFO  2026-01-28 15:14:36,301 [shard 0:main] prioritycache - prioritycache tune_memory target: 4294967296 mapped: 6004736 unmapped: 688128 heap: 6692864 old mem: 134217728 new mem: 2841625343
DEBUG 2026-01-28 15:14:36,301 [shard 0:main] osd - OSD::open_meta_coll: registering metadata collection
DEBUG 2026-01-28 15:14:36,302 [shard 0:main] osd - OSD::start: loading superblock
DEBUG 2026-01-28 15:14:36,302 [shard 0:main] osd - OSDMeta::load_superblock:
DEBUG 2026-01-28 15:14:36,302 [shard 0:main] alienstore - read
ERROR 2026-01-28 15:14:36,302 [shard 0:main] osd - /ceph/rpmbuild/BUILD/ceph-20.3.0-4805-g06aa012d/src/crimson/common/errorator.h:1319 : In function 'crimson::ct_error::assert_all::operator()<const crimson::unthrowable_wrapper<const std::error_code&, ((const std::error_code&)(& crimson::ec<2>))>&>(const crimson::unthrowable_wrapper<const std::error_code&, ((const std::error_code&)(& crimson::ec<2>))>&)::<lambda(auto:119&&)> [with auto:119 = const std::error_code&]', abort() 
 open_meta_coll error: No such file or directory

ERROR 2026-01-28 15:14:36,302 [shard 0:main] osd - Aborting Got SIGABRT on shard 0 - Stopping all shards
ERROR 2026-01-28 15:14:37,591 [shard 0:main] osd - Got SIGABRT on shard 0

That is, the crimson OSD was failing while trying to open the superblock, with the error No such file or directory. So I looked at the earlier logs to see whether the superblock had been created:

2026-01-28T15:08:59.138+0000 7f2628a6a780 10 write_superblock sb(0577bd6c-fc5b-11f0-a2d1-525400b1a44c osd.0 401f34c7-c712-4e99-a948-d47fba81a834 e0 maps [] lci=[0,0] tlb=0)

The log above shows that the superblock is indeed being created, but the line comes from the classic OSD, not the crimson OSD (ref: https://github.com/shraddhaag/ceph/blob/main/src/osd/OSD.cc#L2134).

The problem was that the initial mkfs command was being issued by the classic OSD binary, while the OSD boot was being done by the crimson OSD. In cephadm, mkfs is done by ceph-volume. Looking at the code, the OSD binary it used was hardcoded ().

To fix this, a new flag, --osd-type, is introduced in ceph-volume, with classic as the default. When the value crimson is provided, the crimson binary is used to issue the mkfs command.
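
The fix can be pictured with a small sketch: parse `--osd-type` and use the matching binary for mkfs, so that the binary that formats the OSD is the same one that later boots it. This is not the ceph-volume code. The names `OSD_BINARIES`, `parse_args` and `mkfs_command` are hypothetical, and the mkfs argument list is heavily abbreviated compared with what ceph-volume actually passes.

```python
import argparse
from typing import List

# Binary names come from the commit message; the mapping itself is illustrative.
OSD_BINARIES = {"classic": "ceph-osd", "crimson": "ceph-osd-crimson"}


def parse_args(argv: List[str]) -> argparse.Namespace:
    """Parse a stripped-down command line that only knows --osd-type."""
    parser = argparse.ArgumentParser(prog="ceph-volume-sketch")
    parser.add_argument(
        "--osd-type",
        choices=sorted(OSD_BINARIES),
        default="classic",
        help="which OSD implementation to use for OSD commands such as mkfs",
    )
    return parser.parse_args(argv)


def mkfs_command(osd_type: str, osd_id: int) -> List[str]:
    """Build an abbreviated mkfs invocation with the selected OSD binary."""
    # Assumption: the real call carries many more arguments than shown here.
    return [OSD_BINARIES[osd_type], "--mkfs", "-i", str(osd_id)]
```

For example, `mkfs_command(parse_args(["--osd-type", "crimson"]).osd_type, 0)` starts with `ceph-osd-crimson`, whereas omitting the flag falls back to `ceph-osd`.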

@shraddhaag
Contributor Author

shraddhaag commented Feb 4, 2026

The PR has been tested with KCLI. Both a classic and a crimson cluster came up successfully with HEALTH_OK.

Docs to follow in a follow-up PR.

@shraddhaag shraddhaag marked this pull request as ready for review February 4, 2026 08:08
@shraddhaag shraddhaag requested a review from a team as a code owner February 4, 2026 08:08
@shraddhaag shraddhaag requested a review from guits February 4, 2026 08:09
@shraddhaag
Contributor Author

jenkins ceph-volume unit tests

@shraddhaag shraddhaag changed the title WIP: deploy crimson OSDs using cephadm cephadm, ceph-volume: deploy crimson OSDs using cephadm Feb 4, 2026
@shraddhaag shraddhaag dismissed phlogistonjohn’s stale review February 4, 2026 11:21

Dismissing the review for merging this PR. Will add the default value in a followup PR.

@shraddhaag shraddhaag merged commit 030f947 into ceph:main Feb 4, 2026
21 of 25 checks passed
@github-actions

github-actions bot commented Feb 4, 2026

This is an automated message by src/script/redmine-upkeep.py.

I have resolved the following tracker ticket due to the merge of this PR:

No backports are pending for the ticket. If this is incorrect, please update the tracker
ticket and reset to Pending Backport state.

Update Log: https://github.com/ceph/ceph/actions/runs/21669492167

@shraddhaag shraddhaag moved this from In Progress to Merged (Main) in Crimson Feb 4, 2026
shraddhaag added a commit to shraddhaag/ceph that referenced this pull request Feb 17, 2026
This commit adds support for deploying crimson OSDs using
cephadm with the method raw.

Support for lvm crimson OSD was added previously in:
ceph#66811.

Fixes: https://tracker.ceph.com/issues/74960
Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com>
shraddhaag added a commit to shraddhaag/ceph that referenced this pull request Feb 18, 2026
kginonredhat pushed a commit to kginonredhat/ceph that referenced this pull request Mar 13, 2026