Conversation
|
IMO we need to limit the total number of the |
6a0669b to
f465af3
Compare
|
Just curious to understand, you mentioned that this issue being fixed is easily seen with the PR #45073 (*#45073 (comment))**. The num_fwd is per request right? Does a single create request caused this to overflow ? |
Correct.
Yeah, it is. A single create request was bouncing between |
|
jenkins test make check |
If the CInode and CDir of the directory are on different mdses, I am failing to understand why it should take more than 256 hops? Is this some corner case that is being hit here ? |
|
The PR and the commits miss the tracker id. Doesn't this need backport ? |
As mentioned in PR #45073, this bug was found from @luis-henrix previous patches, which has been fixed after that. Except that as I know there is another case that when the pool is full, it will forward the create requests to the CInode auth MDS which could cause the requests hopping, maybe there are some others cases we haven't figured out yet which also could cause this. For the retry attempt one, it should be triggered by the clients instead, such as when reconnecting. Anyway the type mismatch issue should be fixed, which introduced confusion in the mds logs. |
IMO it should be okay, since the client side fixes have been backported. |
kotreshhr
left a comment
There was a problem hiding this comment.
Looks good to me otherwise
|
@lxbsz Can you talk about how you made sure these encoding changes maintain compatibility with old clients? I see you're changing the ceph_mds_request_head from a WRITE_RAW_ENCODER to hand-rolled encode/decode functions, and it's possible to do that correctly but makes me nervous — we need to be very careful with it. I also note that it looks like the changes mean the MDS simply ignores the "classic" u8 num_fwd, which seems like it will break old clients if the MDS forwards requests around? |
In the MDS side it will only maintain The old clients are still using the old
In the MDS side it will only maintain NOTE: the retry counter will be increased by the clients. As above mentioned, for the old clients they will continue parsing and using the old counters and in the MDS side the new counters are just two relay stations, receiving the values from clients and saving them and then passing them back via the old counters. ==== Okay, I found bug: In old clients when parsing the new |
|
This is good to merge. @lxbsz please rebase. |
Prepare switching num_retry and num_fwd to __le32. Fixes: https://tracker.ceph.com/issues/57854 Signed-off-by: Xiubo Li <xiubli@redhat.com>
To make sure both the num_retry and num_fwd in all the structs have the same type, which is 32 bits. And also for old cephs and clients they still could use the 8 bits members. Fixes: https://tracker.ceph.com/issues/57854 Signed-off-by: Xiubo Li <xiubli@redhat.com>
Check the CEPHFS_FEATURE_32BITS_RETRY_FWD feature bit and if not set, that means it's connecting to an old MDS and will limit the max retry to 256 times. Fixes: https://tracker.ceph.com/issues/57854 Signed-off-by: Xiubo Li <xiubli@redhat.com>
The MDS will increase the forward count, and if the forward count is less than the one saved in request in client side, that means the MDS is old version and it was overflowed. Then just stop forwarding. Fixes: https://tracker.ceph.com/issues/57854 Signed-off-by: Xiubo Li <xiubli@redhat.com>
Done. The conflicts were introduced by one old commit, which is doing |
Comments were addressed.
…quest_head_legacy When a client is too old and uses struct ceph_mds_request_head_legacy we must fill new ext_num_retry and ext_num_fwd fields from an old num_retry and num_fwd. Fixes: ceph#45669 Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
…quest_head_legacy When a client is too old and uses struct ceph_mds_request_head_legacy we must fill new ext_num_retry and ext_num_fwd fields from an old num_retry and num_fwd. Fixes: ceph#45669 Fixes: https://tracker.ceph.com/issues/63288 Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
…quest_head_legacy When a client is too old and uses struct ceph_mds_request_head_legacy we must fill new ext_num_retry and ext_num_fwd fields from an old num_retry and num_fwd. Fixes: ceph#45669 Fixes: https://tracker.ceph.com/issues/63288 Fixes: commit cbd7e30 ("ceph_fs.h: add 32 bits extended num_retry and num_fwd support") Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
…quest_head_legacy When a client is too old and uses struct ceph_mds_request_head_legacy we must fill new ext_num_retry and ext_num_fwd fields from an old num_retry and num_fwd. Fixes: ceph#45669 Fixes: https://tracker.ceph.com/issues/63288 Fixes: commit cbd7e30 ("ceph_fs.h: add 32 bits extended num_retry and num_fwd support") Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> (cherry picked from commit 43f32a4)
…quest_head_legacy When a client is too old and uses struct ceph_mds_request_head_legacy we must fill new ext_num_retry and ext_num_fwd fields from an old num_retry and num_fwd. Fixes: ceph#45669 Fixes: https://tracker.ceph.com/issues/63288 Fixes: commit cbd7e30 ("ceph_fs.h: add 32 bits extended num_retry and num_fwd support") Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> (cherry picked from commit 43f32a4)
…quest_head_legacy When a client is too old and uses struct ceph_mds_request_head_legacy we must fill new ext_num_retry and ext_num_fwd fields from an old num_retry and num_fwd. Fixes: ceph#45669 Fixes: https://tracker.ceph.com/issues/63288 Fixes: commit cbd7e30 ("ceph_fs.h: add 32 bits extended num_retry and num_fwd support") Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> (cherry picked from commit 43f32a4)
…quest_head_legacy When a client is too old and uses struct ceph_mds_request_head_legacy we must fill new ext_num_retry and ext_num_fwd fields from an old num_retry and num_fwd. Fixes: ceph#45669 Fixes: https://tracker.ceph.com/issues/63288 Fixes: commit cbd7e30 ("ceph_fs.h: add 32 bits extended num_retry and num_fwd support") Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> (cherry picked from commit 43f32a4)
…quest_head_legacy When a client is too old and uses struct ceph_mds_request_head_legacy we must fill new ext_num_retry and ext_num_fwd fields from an old num_retry and num_fwd. Fixes: ceph/ceph#45669 Fixes: https://tracker.ceph.com/issues/63288 Fixes: commit cbd7e30 ("ceph_fs.h: add 32 bits extended num_retry and num_fwd support") Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> (cherry picked from commit 43f32a4)
…quest_head_legacy When a client is too old and uses struct ceph_mds_request_head_legacy we must fill new ext_num_retry and ext_num_fwd fields from an old num_retry and num_fwd. Fixes: ceph/ceph#45669 Fixes: https://tracker.ceph.com/issues/63288 Fixes: commit cbd7e30 ("ceph_fs.h: add 32 bits extended num_retry and num_fwd support") Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> (cherry picked from commit 43f32a4) (cherry picked from commit 312bb5b)
…quest_head_legacy When a client is too old and uses struct ceph_mds_request_head_legacy we must fill new ext_num_retry and ext_num_fwd fields from an old num_retry and num_fwd. Fixes: ceph#45669 Fixes: https://tracker.ceph.com/issues/63288 Fixes: commit cbd7e30 ("ceph_fs.h: add 32 bits extended num_retry and num_fwd support") Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
…quest_head_legacy When a client is too old and uses struct ceph_mds_request_head_legacy we must fill new ext_num_retry and ext_num_fwd fields from an old num_retry and num_fwd. Fixes: ceph#45669 Fixes: https://tracker.ceph.com/issues/63288 Fixes: commit cbd7e30 ("ceph_fs.h: add 32 bits extended num_retry and num_fwd support") Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
…quest_head_legacy When a client is too old and uses struct ceph_mds_request_head_legacy we must fill new ext_num_retry and ext_num_fwd fields from an old num_retry and num_fwd. Fixes: ceph#45669 Fixes: https://tracker.ceph.com/issues/63288 Fixes: commit cbd7e30 ("ceph_fs.h: add 32 bits extended num_retry and num_fwd support") Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
…quest_head_legacy When a client is too old and uses struct ceph_mds_request_head_legacy we must fill new ext_num_retry and ext_num_fwd fields from an old num_retry and num_fwd. Fixes: ceph#45669 Fixes: https://tracker.ceph.com/issues/63288 Fixes: commit cbd7e30 ("ceph_fs.h: add 32 bits extended num_retry and num_fwd support") Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> (cherry picked from commit 43f32a4)
The num_fwd in MClientRequestForward is int32_t, while the num_fwd
in ceph_mds_request_head is __u8. This is buggy when the num_fwd
is larger than 256 it will always be truncate to 0 again. But the
client couldn't recoginize this.
Fixes: https://tracker.ceph.com/issues/57854
Signed-off-by: Xiubo Li xiubli@redhat.com
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windows