jdurgin
left a comment
hmm, this may explain some transient watch/notify test failures! nice find!
could you file a tracker ticket and reference it in the commit message so we can backport this as well? the logic here hasn't changed since 2014
@jdurgin here is the ticket: https://tracker.ceph.com/issues/47004
When a linger ping fails with an error such as ENOTCONN, last_error is set to that error. After that, last_error never recovers to success (0), even after a successful reconnect, which stops linger pings from being sent to the OSD. As a result, this otherwise normal client **can't receive notify messages** after osd_client_watch_timeout expires.
Fixes: https://tracker.ceph.com/issues/47004
Signed-off-by: Song Shun <song.shun3@zte.com.cn>
Force-pushed from f886150 to 65d05fd.
Watches can be expected to fail randomly. librbd uses the [1] https://github.com/ceph/ceph/blob/master/src/include/rados/librados.hpp#L191
@jdurgin the ceph API test failure seems to be caused by high kv commit latency, so it is not related to this PR?
We have at least 3 tracker tickets related to watch/notify tests, if not more. It would be good to verify whether this PR fixes all or any of these issues.
@jdurgin @neha-ojha Going to run a few iterations of these to see if I can reproduce reasonably reliably. I'll update here with findings.
@jdurgin @neha-ojha I successfully completed 50 iterations without being able to reproduce the failures, so I don't really have a way of testing this fix against those bugs.
When a linger ping fails with an error such as ENOTCONN,
last_error is set to that error.
After that, last_error never recovers to success (0),
even after a successful reconnect, which stops linger pings from being sent to the OSD.
As a result, this otherwise normal client **can't receive notify messages**
after osd_client_watch_timeout expires.
Fixes: https://tracker.ceph.com/issues/47004
Signed-off-by: Song Shun <song.shun3@zte.com.cn>
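To make the failure mode described above easier to follow, here is a minimal toy model in C++. It is a sketch only, not the actual Ceph Objecter code. `ToyWatch`, `maybe_send_ping`, `on_ping_failed`, `on_reconnected` and `pings_after_reconnect` are hypothetical names invented for illustration. Clearing the error on a successful reconnect is an assumption about the general shape of the fix, based only on the description.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical, simplified model of a watch ("linger") registration.
// This is NOT the Ceph Objecter code; all names are illustrative only.
struct ToyWatch {
  int last_error = 0;  // 0 means healthy; a negative errno after a failure
  int pings_sent = 0;  // number of keepalive pings sent so far
};

// Keepalive pings are only sent while no error is recorded.
bool maybe_send_ping(ToyWatch& w) {
  if (w.last_error != 0) {
    return false;  // a recorded error suppresses further pings
  }
  ++w.pings_sent;
  return true;
}

// A failed ping records its error code.
void on_ping_failed(ToyWatch& w, int err) {
  w.last_error = err;
}

// On a successful reconnect, either keep the stale error (the behavior
// described as the bug) or clear it (the assumed shape of the fix).
void on_reconnected(ToyWatch& w, bool clear_error) {
  if (clear_error) {
    w.last_error = 0;
  }
}

// Returns how many pings go out during three timer ticks after a reconnect.
int pings_after_reconnect(bool clear_error) {
  ToyWatch w;
  bool sent = maybe_send_ping(w);
  assert(sent);  // a healthy watch pings normally
  (void)sent;    // avoid an unused-variable warning when NDEBUG is defined

  on_ping_failed(w, -ENOTCONN);    // the connection dropped
  on_reconnected(w, clear_error);  // the connection came back

  int before = w.pings_sent;
  for (int tick = 0; tick < 3; ++tick) {
    maybe_send_ping(w);
  }
  return w.pings_sent - before;
}
```

In this model, `pings_after_reconnect(false)` sends no pings after the reconnect, which corresponds to the watch eventually timing out on the OSD side, while `pings_after_reconnect(true)` resumes pinging on every tick.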