Skip to content

crypto/isal: add vectorized async md5 library#52385

Closed
cbodley wants to merge 2 commits intoceph:mainfrom
cbodley:wip-isal-md5
Closed

crypto/isal: add vectorized async md5 library#52385
cbodley wants to merge 2 commits intoceph:mainfrom
cbodley:wip-isal-md5

Conversation

@cbodley
Copy link
Contributor

@cbodley cbodley commented Jul 10, 2023

builds an asynchronous batching library on top of the isal crypto library's multi-buffer md5 facilities:

The MD5 CTX interface functions are available for 4 architectures: SSE, AVX, AVX2 and
AVX512. In addition, a multibinary interface is provided, which selects the appropriate
architecture-specific function at runtime.

the goal, tracked in https://tracker.ceph.com/issues/61646, is to reduce the cpu load from ETag calculations in rgw

based on top of two PRs, #51573 and #50064, to enable the use of new asio features like any_completion_handler

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@github-actions
Copy link

github-actions bot commented Aug 2, 2023

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@cbodley
Copy link
Contributor Author

cbodley commented Aug 10, 2023

this work is specific both to MD5 hashes and to the isa-l library, but i think there's interest both in support other hashes, and for other libraries like QAT

// have to post() the completion instead
boost::asio::post(boost::asio::bind_executor(get_executor(), std::move(h)));
} else {
boost::asio::dispatch(std::move(h));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while testing #52488, we sometimes see a crash here due to what looks like an empty any_completion_handler

in the trace i'm looking at, Batch::init_async_hash() was calling complete() on the ctx returned by md5_ctx_mgr_submit(), which was not the same as submit_ctx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i added a unit test that can reproduce this crash. after enabling the thread sanitizer:

diff --git a/src/test/isal/CMakeLists.txt b/src/test/isal/CMakeLists.txt
index bbff3bd0d03..1b83b89d14a 100644
--- a/src/test/isal/CMakeLists.txt
+++ b/src/test/isal/CMakeLists.txt
@@ -1,3 +1,5 @@
 add_executable(unittest_isal_md5 test_isal_md5.cc)
 add_ceph_unittest(unittest_isal_md5)
 target_link_libraries(unittest_isal_md5 isal_md5)
+target_compile_options(unittest_isal_md5 PRIVATE "-fsanitize=thread")
+target_link_options(unittest_isal_md5 PRIVATE "-fsanitize=thread")

and running that test case repeatedly:

bin/unittest_isal_md5 --gtest_filter=isal_md5.batch_thread_pool --gtest_repeat=100 &> batch_thread_pool.log

thread sanitizer reports these data races: https://gist.github.com/cbodley/5f4dcc6b1148b2302313a5ba7edc1c15

Comment on lines +113 to +120
void init_async_hash(boost::asio::any_completion_handler<void(error_code)> handler,
Digest& digest, std::string_view input, bool last);

struct initiate_async_hash {
Batch* self;

using executor_type = Batch::executor_type;
executor_type get_executor() const noexcept { return self->ex; }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the intent of struct initiate_async_hash and get_executor() here was to run init_async_hash() within the Batch executor, but that doesn't seem to happen for 'awaitable' completion tokens

in the isal_md5.batch_thread_pool unit test, the calls to init_async_hash() are running on the thread pool context rather than the strand executor, hence the data races that lead to crashes

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

instead of building isa-l_crypto source files directly into the
ceph_crypto_isal plugin, use a cmake ExternalProject to build the
library with make and expose the static library as cmake target
ISAL::Crypto

Signed-off-by: Casey Bodley <cbodley@redhat.com>
builds an asynchronous batching library on top of the isal crypto
library's multi-buffer md5 facilities:

> The MD5 CTX interface functions are available for 4 architectures: SSE, AVX, AVX2 and
> AVX512. In addition, a multibinary interface is provided, which selects the appropriate
> architecture-specific function at runtime.

Signed-off-by: Casey Bodley <cbodley@redhat.com>
@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Feb 19, 2024
@github-actions
Copy link

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant