crimson: Set device class during spawn of a crimson osd #60747

Merged: Matan-B merged 1 commit into ceph:main from mohit84:crimson_device_class on Nov 26, 2024

mohit84 commented Nov 15, 2024

Implement a wrapper over the different storage backends to set device_class when the OSD process is spawned.

Fixes: https://tracker.ceph.com/issues/66627

mohit84 requested a review from a team as a code owner on November 15, 2024 05:13

mohit84 commented Nov 15, 2024

jenkins test make check

Matan-B left a comment

Looks good so far; I left a few comments.
I would also suggest trying coroutines in OSD::_add_device_class(), as they can make the implementation easier to read.

Comment on lines +149 to +150
//TODO In case of classic memstore we are always
// returning hdd so for the timebeing we are skipping
Matan-B:

Let's return empty string, no need for TODO with CyanStore for now.

seastar::future<> OSD::_add_device_class()
{
  LOG_PREFIX(OSD::_add_device_class);
  if (!local_conf().get_val<bool>("osd_crush_update_on_start")) {
Matan-B:

osd_class_update_on_start should be used here

mohit84:

done

Comment on lines +623 to +626
}).handle_exception([FNAME](std::exception_ptr e) {
  ERROR("Failed to get device class: ", e);
  return seastar::make_ready_future<std::string>("");
});
Matan-B:

Are we expecting a thrown exception here?
For expected errors in Crimson we use errorator, see dev/crimson/error-handling.

mohit84:

I have changed the code; please let me know if it is OK now.

Comment on lines +631 to +633
string cmd = string("{\"prefix\": \"osd crush set-device-class\", ") +
             string("\"class\": \"") + device_class + string("\", ") +
             string("\"ids\": [\"") + stringify(whoami) + string("\"]}");
Matan-B:

Let's switch to fmt::format here

mohit84:

done

});
};

return get_device_class().then([FNAME, this](auto device_class) {
Matan-B:

We should check that device_class is not an empty string before using it

mohit84:

done
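
A minimal sketch of that guard, separated from the Seastar and MonClient plumbing. `make_set_device_class_cmd` is a hypothetical helper, not a function from the PR, and it uses plain `std::string` concatenation only so that the example needs nothing beyond the standard library (the PR itself moved to fmt::format). The JSON layout mirrors the command string quoted earlier in this review.

```cpp
#include <optional>
#include <string>

// Hypothetical helper (not from the PR): builds the mon command JSON,
// or returns std::nullopt when there is no device class to set.
std::optional<std::string> make_set_device_class_cmd(
    const std::string& device_class, int osd_id) {
  // Guard requested in the review: never send a command with an empty class.
  if (device_class.empty()) {
    return std::nullopt;
  }
  // Same JSON shape as the command string quoted in the review.
  return std::string("{\"prefix\": \"osd crush set-device-class\", ") +
         "\"class\": \"" + device_class + "\", " +
         "\"ids\": [\"" + std::to_string(osd_id) + "\"]}";
}
```

Returning `std::nullopt` lets the caller skip the mon command entirely when the store reports no device class, which is presumably the case for a backend that returns an empty string.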

[[maybe_unused]] auto [code, message, out] = std::move(command_result);
if (code) {
  WARN("fail to set device_class : {} ({})", message, code);
  throw std::runtime_error("fail to add to crush");
Matan-B:

"fail to set device_class"

mohit84:

done

Matan-B commented Nov 17, 2024

makecheck:

/home/jenkins-build/build/workspace/ceph-pull-requests/src/crimson/osd/osd.cc:621:58: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
    return store.get_default_device_class().then([FNAME, this](std::string device_class) {
                                                  ~~~~~~~^~~~
/home/jenkins-build/build/workspace/ceph-pull-requests/src/crimson/osd/osd.cc:623:26: error: variable 'FNAME' cannot be implicitly captured in a lambda with no capture-default specified
    }).handle_exception([FNAME](std::exception_ptr e) {
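
For readers unfamiliar with these two diagnostics, here is a standard-library-only sketch of the capture rules they enforce. `FakeOsd`, `log_lines`, and `pick_device_class` are hypothetical names, and the local `FNAME` string only imitates the variable that `LOG_PREFIX` is assumed to define in the real code. This is an illustration of the rule, not the change that was made in the PR.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the OSD class; only the capture lists matter here.
struct FakeOsd {
  std::vector<std::string> log_lines;  // hypothetical log sink

  std::string pick_device_class(const std::string& reported) {
    // Local variable, imitating what LOG_PREFIX is assumed to provide.
    const std::string FNAME = "FakeOsd::pick_device_class";

    // Uses no members and no locals, so the capture list stays empty.
    // Listing `this` here is what -Wunused-lambda-capture rejects.
    auto passthrough = [](const std::string& dc) { return dc; };

    // Uses a local (FNAME) and a member (log_lines). With no
    // capture-default, each one has to be named explicitly.
    auto on_error = [&FNAME, this](const std::string& what) {
      log_lines.push_back(FNAME + ": " + what);
      return std::string{};
    };

    return reported.empty() ? on_error("no device class reported")
                            : passthrough(reported);
  }
};
```

The rule of thumb is to capture exactly what the lambda body uses: anything extra trips the unused-capture warning under -Werror, and anything missing fails to compile.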

mohit84 commented Nov 18, 2024

jenkins test make check

mohit84 force-pushed the crimson_device_class branch from ea5684f to 29d8312 on November 18, 2024 11:47

mohit84 commented Nov 18, 2024

jenkins test make check

Comment on lines +620 to +628
auto get_device_class = [FNAME, this]() -> seastar::future<std::string> {
  try {
    std::string device_class = co_await store.get_default_device_class();
    co_return device_class;
  } catch (const std::exception& e) {
    ERROR("Failed to get device class: ", e.what());
    co_return "";
  }
};
Matan-B:

There's no need for try/catch here: Crimson handles expected errors with its errorator concept rather than with exceptions.
See TEST_F(coroutine_test_t, test_ertr_coroutine_error).

Matan-B:

We can skip the get_device_class wrapper:

std::string device_class = co_await store.get_default_device_class();

mohit84:

done

Comment on lines +651 to +640
} catch (const std::exception& e) {
  ERROR("Failed to add device class: ", e.what());
  throw;
}
Matan-B:

ditto

mohit84:

I have changed the code

device_class, stringify(whoami)
);

auto [code, message, out] = co_await monc->run_command(std::move(cmd), {});
Matan-B:

Great usage!

auto [code, message, out] = co_await monc->run_command(std::move(cmd), {});
if (code) {
  WARN("fail to set device_class : {} ({})", message, code);
  throw std::runtime_error("fail to set device_class");
Matan-B:

nit:

// to be caught by crimson/osd/main.cc 
throw std::runtime_error("fail to set device_class");

mohit84:

done

mohit84 force-pushed the crimson_device_class branch from 29d8312 to 39305b2 on November 19, 2024 08:53

Matan-B left a comment

lgtm! Left final comments

co_return;
}

INFO("device_class is {} whoami: {}", device_class, whoami);
Matan-B:

The OSD id is already printed as a log prefix, so whoami is redundant.

mohit84:

done

  WARN("fail to set device_class : {} ({})", message, code);
  throw std::runtime_error("fail to set device_class");
} else {
  INFO("newly added to crush: {}", message);
Matan-B:

"device_class was set: {}"

mohit84:

done

Implement a wrapper for different backend storage to
set device_class during spawn of a process.

Fixes: https://tracker.ceph.com/issues/66627
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
mohit84 force-pushed the crimson_device_class branch from 39305b2 to d2ee4c1 on November 19, 2024 09:33

mohit84 commented Nov 19, 2024

jenkins test make check

Matan-B commented Nov 24, 2024

jenkins test make check

Matan-B commented Nov 24, 2024

@Matan-B https://pulpito.ceph.com/moagrawa-2024-11-21_08:55:14-crimson-rados-wip-mohit-crimson-device_class-distro-crimson-smithi/

Failures: 8003283 (https://tracker.ceph.com/issues/67972), 8003305 (https://tracker.ceph.com/issues/66852)

Please also include "dead" jobs.
8003264 from the perf suite failed as well; it is unrelated, since a similar failure was seen in a recent main run:
https://pulpito.ceph.com/teuthology-2024-11-23_20:56:02-crimson-rados-main-distro-crimson-smithi/

Merging, thank you!

Matan-B commented Nov 25, 2024

jenkins test make check

Matan-B commented Nov 25, 2024

Dead job links (it seems both jobs are dead due to some issue at the infra level):
https://pulpito.ceph.com/teuthology-2024-11-23_20:56:02-crimson-rados-main-distro-crimson-smithi/8005981/
https://pulpito.ceph.com/teuthology-2024-11-23_20:56:02-crimson-rados-main-distro-crimson-smithi/8005998/

There seems to be something off with Crimson and cbt (perf suite). As mentioned earlier, these failures also appear in main, so I'll merge once CI passes.

Matan-B commented Nov 26, 2024

jenkins test make check

Matan-B merged commit 3caa542 into ceph:main on Nov 26, 2024