Skip to content

crimson/osd: add verbose DEBUG logs for OSD startup#67090

Merged
shraddhaag merged 1 commit intoceph:mainfrom
shraddhaag:wip-shraddhaag-add-more-osd-bootstrap-logs
Feb 2, 2026
Merged

crimson/osd: add verbose DEBUG logs for OSD startup#67090
shraddhaag merged 1 commit intoceph:mainfrom
shraddhaag:wip-shraddhaag-add-more-osd-bootstrap-logs

Conversation

@shraddhaag
Copy link
Contributor

@shraddhaag shraddhaag commented Jan 27, 2026

This commit adds verbose logs for each step of OSD startup to aid in debugging OSD startup failures.

Logs in a vstart cluster:

INFO   osd - OSD::main: early config parsed successfully
INFO   osd - OSD::main: initializing seastar app_template
INFO   osd - OSD::main: registering CLI options
INFO   osd - OSD::main: entering seastar runtime
INFO  2026-01-28 07:08:10,428 [shard 0:main] osd - OSD::main: seastar runtime started
INFO  2026-01-28 07:08:10,429 [shard 0:main] osd - OSD::main: enabling debug logging
DEBUG 2026-01-28 07:08:10,429 [shard 0:main] osd - OSD::main: starting sharded config service
DEBUG 2026-01-28 07:08:10,485 [shard 0:main] osd - OSD::main: starting performance counters
DEBUG 2026-01-28 07:08:10,485 [shard 0:main] osd - OSD::main: parsing config files
DEBUG 2026-01-28 07:07:52,660 [shard 0:main] osd - OSD::main: writing pidfile
DEBUG 2026-01-28 07:07:52,660 [shard 0:main] osd - OSD::main: setting ignore SIGHUP
DEBUG 2026-01-28 07:07:52,660 [shard 0:main] osd - OSD::main: creating messengers
DEBUG 2026-01-28 07:07:52,660 [shard 0:main] osd - OSD::main: creating object store
INFO  2026-01-28 07:07:52,660 [shard 0:main] osd - OSD::main: passed objectstore is bluestore
DEBUG 2026-01-28 07:07:52,661 [shard 0:main] osd - OSD::main: constructing OSD instance
DEBUG 2026-01-28 07:07:52,661 [shard 0:main] osd - OSD::OSD: 
WARN  2026-01-28 07:07:52,666 [shard 0:main] osd - OSD::OSD: warning: got an error loading one or more classes: (5) Input/output error
INFO  2026-01-28 07:07:52,666 [shard 0:main] osd - OSD::OSD: nonce is 1024147670
DEBUG 2026-01-28 07:07:52,666 [shard 0:main] osd - OSD::main: fetching config from monitors
DEBUG 2026-01-28 07:07:52,687 [shard 0:main] osd - OSD::main: running mkfs
DEBUG 2026-01-28 07:07:52,687 [shard 0:main] osd - OSD::mkfs: starting store mkfs
DEBUG 2026-01-28 07:07:52,706 [shard 0:main] osd - OSD::mkfs: calling store mkfs
DEBUG 2026-01-28 07:07:54,752 [shard 0:main] osd - OSD::mkfs: mounting store mkfs
DEBUG 2026-01-28 07:07:55,314 [shard 0:main] osd - OSD::open_or_create_meta_coll: 
DEBUG 2026-01-28 07:07:55,314 [shard 0:main] osd - OSD::open_or_create_meta_coll: creating new metadata collection
DEBUG 2026-01-28 07:07:55,315 [shard 0:main] osd - OSD::_write_superblock: try loading existing superblock
INFO  2026-01-28 07:07:55,316 [shard 0:main] osd - OSD::_write_superblock: writing superblock cluster_fsid 2b6d0539-5cf2-4771-88a0-af2fef08e9b5 osd_fsid f13b4da1-943b-4e99-a26b-6aa09dfe1835
DEBUG 2026-01-28 07:07:55,316 [shard 0:main] osd - OSD::_write_superblock: do_transaction: create meta collection and store superblock
INFO  2026-01-28 07:07:55,334 [shard 0:main] osd - OSD::mkfs: created object store /home/shraddhaag/Documents/ceph/build/dev/osd0 for osd.0 fsid 2b6d0539-5cf2-4771-88a0-af2fef08e9b5
DEBUG 2026-01-28 07:07:55,552 [shard 0:main] osd - OSD::main: exiting, mkkey 0, mkfs 1
DEBUG 2026-01-28 07:08:00,257 [shard 0:main] osd - OSD::main: writing pidfile
DEBUG 2026-01-28 07:08:00,257 [shard 0:main] osd - OSD::main: setting ignore SIGHUP
DEBUG 2026-01-28 07:08:00,257 [shard 0:main] osd - OSD::main: creating messengers
DEBUG 2026-01-28 07:08:00,257 [shard 0:main] osd - OSD::main: creating object store
INFO  2026-01-28 07:08:00,257 [shard 0:main] osd - OSD::main: passed objectstore is bluestore
DEBUG 2026-01-28 07:08:00,257 [shard 0:main] osd - OSD::main: constructing OSD instance
DEBUG 2026-01-28 07:08:00,257 [shard 0:main] osd - OSD::OSD: 
WARN  2026-01-28 07:08:00,263 [shard 0:main] osd - OSD::OSD: warning: got an error loading one or more classes: (5) Input/output error
INFO  2026-01-28 07:08:00,263 [shard 0:main] osd - OSD::OSD: nonce is 3576695781
DEBUG 2026-01-28 07:08:00,263 [shard 0:main] osd - OSD::main: fetching config from monitors
DEBUG 2026-01-28 07:08:00,284 [shard 0:main] osd - OSD::main: starting OSD services
INFO  2026-01-28 07:08:00,284 [shard 0:main] osd - OSD::start: seastar::smp::count 1
DEBUG 2026-01-28 07:08:00,284 [shard 0:main] osd - OSD::start: starting store
DEBUG 2026-01-28 07:08:00,306 [shard 0:main] osd - OSD::start: mounting store
DEBUG 2026-01-28 07:08:01,076 [shard 0:main] osd - OSD::start: open metadata collection
DEBUG 2026-01-28 07:08:01,076 [shard 0:main] osd - OSD::open_meta_coll: opening metadata collection
DEBUG 2026-01-28 07:08:01,076 [shard 0:main] osd - OSD::open_meta_coll: registering metadata collection
DEBUG 2026-01-28 07:08:01,077 [shard 0:main] osd - OSD::start: loading superblock
DEBUG 2026-01-28 07:08:01,081 [shard 0:main] osd - OSD::start: loading PGs
DEBUG 2026-01-28 07:08:01,091 [shard 0:main] osd - OSD::start: starting mon and mgr clients
DEBUG 2026-01-28 07:08:01,107 [shard 0:main] osd - OSD::start: adding to crush
INFO  2026-01-28 07:08:01,108 [shard 0:main] osd - OSD::_add_me_to_crush: crush location is "host=sockeni05", "root=default"
INFO  2026-01-28 07:08:01,167 [shard 0:main] osd - OSD::_add_me_to_crush: added to crush: create-or-move updating item name 'osd.0' weight 0.0986 at location {host=sockeni05,root=default} to crush map
INFO  2026-01-28 07:08:01,170 [shard 0:main] osd - OSD::_add_device_class: device_class is ssd 
INFO  2026-01-28 07:08:02,190 [shard 0:main] osd - OSD::_add_device_class: device_class was set: set osd(s) 0 to class 'ssd'
DEBUG 2026-01-28 07:08:02,192 [shard 0:main] osd - OSD::start: replacing unkwnown addrs of cluster messenger
DEBUG 2026-01-28 07:08:02,205 [shard 0:main] osd - OSD::start: starting boot
INFO  2026-01-28 07:08:02,226 [shard 0:main] osd - OSD::_handle_osd_map: osd_map(6..6 src has 1..6) v4
INFO  2026-01-28 07:08:02,226 [shard 0:main] osd - OSD::_handle_osd_map:  epochs [6..6], i have 0, src has [1..6]
DEBUG 2026-01-28 07:08:02,226 [shard 0:main] osd - OSD::_handle_osd_map: superblock cluster_osdmap_trim_lower_bound new epoch is: 1
INFO  2026-01-28 07:08:02,226 [shard 0:main] osd - OSD::_handle_osd_map: message skips epochs 1..5
INFO  2026-01-28 07:08:02,227 [shard 0:main] osd - OSD::_preboot: osd.0
INFO  2026-01-28 07:08:02,227 [shard 0:main] osd - OSD::_preboot: waiting for initial osdmap
INFO  2026-01-28 07:08:02,228 [shard 0:main] osd - OSD::main: crimson startup completed

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

You must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.

@shraddhaag shraddhaag moved this to In Progress in Crimson Jan 27, 2026
Copy link
Contributor

@Matan-B Matan-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should make osd boot more verbose on debug.
Added few suggestions. Could be a good opportunity to update the shorter functions to coroutines such as open_or_create_meta_coll, open_meta_coll or even OSD::start() (which is a bit longer)

// handled by S* must be blocked for alien threads (see AlienStore).
seastar::handle_signal(SIGHUP, [] {});

DEBUG("initializing prometheus (if enabled)");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, might be confusing, the log line within the if branch should be sufficient if is enabled

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main.cc has a static seastar::logger& logger() which could be removed now if there aren't any other user of it.

crimson::osd::populate_config_from_mon().get();
}
if (config.count("mkfs")) {
DEBUG("running mkfs");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might help!

logger().info("crimson startup completed");
INFO("crimson startup completed");

DEBUG("waiting for stop signal");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be confusing, let's avoid this line. We are waiting as long as the OSD is alive. Might be better to let the last log be crimson startup completed (after completing).

seastar::future<> OSD::open_meta_coll()
{
LOG_PREFIX(OSD::open_meta_coll);
DEBUG("verifing this is the primary core before registering metadata collection");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case the assert would fail we would get abort message with the assertion condition.
Otherwise, we know we passed it.
How about:

ceph_assert(seastar::this_shard_id() == PRIMARY_CORE);
DEBUG("opening collection");
return store.get_sharded_store().open_collection

}).then([this] {
return log_client.set_fsid(monc->get_fsid());
}).then([this] {
DEBUG("osd: start, starting boot");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the prefix as we already have this prefix LOG_PREFIX(OSD::start); in this funciton.

ret, cpp_strerror(-ret)));
}

DEBUG("installing signal handlers");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are setting SIGHUP to be ignored here

logger().info("passed objectstore is {}", local_conf().get_val<std::string>("osd_objectstore"));
INFO("passed objectstore is {}", local_conf().get_val<std::string>("osd_objectstore"));

DEBUG("constructing OSD instance");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about "creating OSD"? Or alternatively add log line to OSD::OSD

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added log line to OSD::OSD instead, seemed cleaner.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove l.243 as OSD::OSD was adjusted

DEBUG("running mkfs");
auto osd_uuid = local_conf().get_val<uuid_d>("osd_uuid");
if (osd_uuid.is_zero()) {
DEBUG("generating random osd uuid");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uuid not specified, generating random osd uuid

config["osdspec-affinity"].as<std::string>()).get();
}
if (config.count("mkkey") || config.count("mkfs")) {
DEBUG("administrative mode complete, exiting");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit,
"Exiting, mkkey {}, mkfs {}", config.count("mkkey").....

@shraddhaag shraddhaag force-pushed the wip-shraddhaag-add-more-osd-bootstrap-logs branch 5 times, most recently from da8365e to 78aa116 Compare January 28, 2026 07:02
@shraddhaag shraddhaag marked this pull request as ready for review January 28, 2026 07:21
@shraddhaag shraddhaag requested a review from a team as a code owner January 28, 2026 07:21
@shraddhaag
Copy link
Contributor Author

@Matan-B I've addressed all the comments, thank you for the detailed review! You can see the final logs in the description, these are from the vstart cluster with these changes build.

As for converting some of these functions to coroutines, I'll handle that in a followup PR as I'd like to test that more thoroughly, and I want these changes to land in main sooner to unblock testing my cephadm changes for debugging (#66811).

If you could take a final look at the PR that'd be great, thanks!

@shraddhaag shraddhaag requested a review from Matan-B January 28, 2026 07:27
@shraddhaag shraddhaag moved this from In Progress to Awaits review in Crimson Jan 28, 2026
);
}
if (config.count("trace")) {
INFO("enabling trace logging");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed all logs above this to INFO as debug level is not set till config is parsed.

Copy link
Contributor

@Matan-B Matan-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

logger().info("passed objectstore is {}", local_conf().get_val<std::string>("osd_objectstore"));
INFO("passed objectstore is {}", local_conf().get_val<std::string>("osd_objectstore"));

DEBUG("constructing OSD instance");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove l.243 as OSD::OSD was adjusted

@Matan-B Matan-B moved this from Awaits review to Needs QA in Crimson Jan 28, 2026
@shraddhaag shraddhaag force-pushed the wip-shraddhaag-add-more-osd-bootstrap-logs branch from 78aa116 to 58280a0 Compare January 28, 2026 09:54
This commit adds verbose logs for each step of OSD startup
to aid in debugging OSD startup failures.

Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com>
@shraddhaag shraddhaag force-pushed the wip-shraddhaag-add-more-osd-bootstrap-logs branch from 58280a0 to 5473386 Compare January 28, 2026 09:57
@shraddhaag shraddhaag merged commit c21c063 into ceph:main Feb 2, 2026
12 of 13 checks passed
@shraddhaag shraddhaag moved this from Needs QA to Merged (Main) in Crimson Feb 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: Merged (Main)

Development

Successfully merging this pull request may close these issues.

2 participants