test/rgw: fix use of poll() with timers in unittest_rgw_dmclock_scheduler by cbodley · Pull Request #42425 · ceph/ceph

cbodley · 2021-07-20T17:00:05Z

the AsyncScheduler uses an asio timer to dispatch work to its executor with an optional delay. when no delay is requested, it waits on the timer with an expiration time in the past (crimson::dmclock::TimeZero)

tests are failing here because poll() is returning without executing the handlers of those expired timers

asio implements these timers with timerfd and epoll. debugging with strace, i see that these timers armed with timerfd_settime() are not always immediately ready according to epoll_wait():

  eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK)   = 3
  epoll_create1(EPOLL_CLOEXEC)            = 4
  timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC) = 5
  epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLERR|EPOLLET, data={u32=14164052, u64=14164052}}) = 0
  epoll_ctl(4, EPOLL_CTL_ADD, 5, {events=EPOLLIN|EPOLLERR, data={u32=14164064, u64=14164064}}) = 0
  timerfd_settime(5, TFD_TIMER_ABSTIME, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=1}}, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=0}}) = 0
  epoll_wait(4, [{events=EPOLLIN, data={u32=14164052, u64=14164052}}], 128, 0) = 1
  epoll_wait(4, [], 128, 0)               = 0
  epoll_wait(4, [], 128, 0)               = 0
  epoll_wait(4, [], 128, 0)               = 0
  epoll_wait(4, [], 128, 0)               = 0
  epoll_wait(4, [{events=EPOLLIN, data={u32=14164064, u64=14164064}}], 128, 0) = 1

in this example, it took 6 calls to context.poll() before it was ready to execute the timer's handler

to work around this, replace calls to context.poll() with calls to context.run_for() with a very short duration

Fixes: https://tracker.ceph.com/issues/42788

Signed-off-by: Casey Bodley <cbodley@redhat.com>

cbodley · 2021-07-20T17:05:14Z

@tchaikov @t-msn essentially the same fix as before, except it uses run_for(1ms) instead of run(). hopefully 1ms is long enough that we don't see test failures from jenkins

t-msn · 2021-07-21T05:57:44Z

@cbodley I tested this on my local machine and the test passed successfully. Thanks!

cbodley added 2 commits July 19, 2021 18:07

test/rgw: use spawn library for test_rgw_dmclock_scheduler

a8e3589

Signed-off-by: Casey Bodley <cbodley@redhat.com>

cbodley added rgw tests labels Jul 20, 2021

github-actions bot added the build/ops label Jul 20, 2021

cbodley requested a review from tchaikov July 20, 2021 17:02

mgfritch mentioned this pull request Jul 21, 2021

mgr/cephadm: replace execnet and remoto with asyncssh #42051

Merged

tchaikov approved these changes Jul 21, 2021

tchaikov merged commit b4920f0 into ceph:master Jul 21, 2021

cbodley deleted the wip-42788 branch July 21, 2021 16:16

cfsnyder mentioned this pull request Aug 4, 2021

pacific: test/rgw: fix use of poll() with timers in unittest_rgw_dmclock_scheduler #42651

Merged
