Skip to content

rgw: fix valgrind errors when protected_fixedsize_stack is used#39025

Merged
cbodley merged 1 commit intoceph:masterfrom
yuvalif:wip-yuval-fix-48963
Jan 29, 2021
Merged

rgw: fix valgrind errors when protected_fixedsize_stack is used#39025
cbodley merged 1 commit intoceph:masterfrom
yuvalif:wip-yuval-fix-48963

Conversation

@yuvalif
Copy link
Contributor

@yuvalif yuvalif commented Jan 22, 2021

Fixes: https://tracker.ceph.com/issues/48963

Signed-off-by: Yuval Lifshitz ylifshit@redhat.com


Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

Copy link
Contributor

@mattbenjamin mattbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting; does protected_fixedsize_stack supercede the approaches we've been using in other coroutines?

endfunction()

add_definitions(-DBOOST_USE_VALGRIND)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this one will need to apply globally in ceph, because we use boost all over. we don't want rgw to build it differently than other libraries like common/librados

it's also not clear what performance impact this has, so we might only want to turn it on when ALLOCATOR=libc, which (i think?) is the only way we can run with valgrind

i'd also like to see a short comment here describing why we need the flag

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be turned on when WITH_BOOST_VALGRIND is turned on. I thought it was.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the pointer @adamemerson!

in CMakeLists.txt:

CMAKE_DEPENDENT_OPTION(WITH_BOOST_VALGRIND "Boost support for valgrind" OFF
  "NOT WITH_SYSTEM_BOOST" OFF)

and cmake/modules/BuildBoost.cmake:

    if((c MATCHES "coroutine|context") AND (WITH_BOOST_VALGRIND))
      set_target_properties(Boost::${c} PROPERTIES
        INTERFACE_COMPILE_DEFINITIONS "BOOST_USE_VALGRIND")
    endif()

it looks like that only takes effect for non-system boost, which i don't think is the case for any of our shaman builds. for our teuthology testing, we'd specifically need this on for the centos notcmalloc build

@cbodley
Copy link
Contributor

cbodley commented Jan 22, 2021

interesting; does protected_fixedsize_stack supercede the approaches we've been using in other coroutines?

nope, that is the same mmap+mprotect allocator that we switched to for rgw_asio_frontend.cc. i don't really think it's necessary here in rgw_notify.cc, where the code is very small and unlikely to overflow its stack - but these coroutines are long-lived, so the overhead of the extra system calls on spawn() is negligible

@yuvalif yuvalif force-pushed the wip-yuval-fix-48963 branch from 2b27f39 to f197ce6 Compare January 22, 2021 17:21
@yuvalif
Copy link
Contributor Author

yuvalif commented Jan 22, 2021

interesting; does protected_fixedsize_stack supercede the approaches we've been using in other coroutines?

nope, that is the same mmap+mprotect allocator that we switched to for rgw_asio_frontend.cc. i don't really think it's necessary here in rgw_notify.cc, where the code is very small and unlikely to overflow its stack - but these coroutines are long-lived, so the overhead of the extra system calls on spawn() is negligible

i did not increase the stack size beyond the default, but wanted to make sure it crashed if actual size is too big

@yuvalif
Copy link
Contributor Author

yuvalif commented Jan 23, 2021

@cbodley
Copy link
Contributor

cbodley commented Jan 25, 2021

that's with valgrind off. there's #38871 to turn it back on. you can test against that with:

--suite-repo https://github.com/cbodley/ceph.git --suite-branch wip-qa-rgw-valgrind-on

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@yuvalif yuvalif force-pushed the wip-yuval-fix-48963 branch from 343842d to 76d4030 Compare January 28, 2021 15:33
@yuvalif yuvalif requested a review from cbodley January 29, 2021 07:35
Copy link
Contributor

@adamemerson adamemerson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lawndarts, a Game Too Menacing

@cbodley cbodley merged commit 09bdf42 into ceph:master Jan 29, 2021
@cbodley
Copy link
Contributor

cbodley commented Feb 3, 2021

i still saw the InvalidRead issues from spawn::detail::continuation_context::resume() on master with https://pulpito.ceph.com/cbodley-2021-02-02_00:26:24-rgw:verify-master-distro-basic-gibba/

looking through the shaman build logs, it doesn't look like it's using WITH_SYSTEM_BOOST so this PR doesn't take effect

WITH_BOOST_VALGRIND seems to be the right way to resolve this one. i opened ceph/ceph-build#1736 against the ceph-build project to add that flag to the cmake command line for notcmalloc builds

regarding this PR, i think we should revert the cmake change but keep and backport the protected_fixedsize_stack change in rgw_notify.cc

@cbodley
Copy link
Contributor

cbodley commented Feb 3, 2021

opened #39263 to revert the cmake change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants