mds: prevent clients from exceeding the xattrs key/value limits#46357
mds: prevent clients from exceeding the xattrs key/value limits#46357
Conversation
741f8ba to
4fbe62b
Compare
|
I've just pushed another iteration of this PR. I think I've implemented all the bits. The thing I'm almost sure isn't 100% right is how to not grant 'Xx' caps to clients that don't support the new feature. It seems to be working, but the MDS code isn't really something I feel comfortable changing. More knowledgeable people are welcome to jump in. I'll send out the companion kernel client RFC patch shortly. |
|
Thanks for your review @lxbsz I'll update the code accordingly. Eventually, the new feature bit may be dropped and the mdsmap may be used instead as suggested on the mailing-list. |
|
Ok, I've pushed yet another draft to fix this issue. It now uses the mdsmap to share the xattrs max size knob with the clients, and this knob is now per-filesystem (not per-MDS). |
4fbe62b to
16fb361
Compare
src/mon/FSCommands.cc
Outdated
| } | ||
| if (n < MDS_MAX_XATTR_SIZE) { | ||
| ss << var << " must at least " << MDS_MAX_XATTR_SIZE; | ||
| return -ERANGE; |
There was a problem hiding this comment.
Why fail it here ? Shouldn't the MDS_MAX_XATTR_SIZE just a default value ? Users should be allowed to set any value larger or less than it, right ?
There was a problem hiding this comment.
Sure, I can drop this check. I'm not really sure if in which situations admins should be really playing with this setting anyway.
16fb361 to
56a4c74
Compare
|
Pushed a revised version, added a note to the PendingReleaseNotes files as suggested by @lxbsz. Also opened this PR for review (not a "draft" anymore). |
|
Hi @luis-henrix , BTW, shouldn't we fix the same issue in the |
Good point! I remember trying to reproduce this issue earlier using the fuse client and I wasn't able to, so I assumed it wasn't affected (i.e. I assumed it always did sync setxattr operations). But a quick look at the code seems to indicate otherwise. Let me go look closer. (And try to figure out again how to debug the fuse client again, as can never remember how to do that!) UPDATE: No, I was right in the first place. It looks like the fuse client always uses the |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
56a4c74 to
984fded
Compare
|
jenkins test make check |
|
jenkins test dashboard cephadm |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
|
@luis-henrix I hope this is the last rebase you have to push for this PR. Tests are running here: https://pulpito.ceph.com/vshankar-2023-02-24_04:31:46-fs-wip-vshankar-testing-20230222.044949-testing-default-smithi/ |
This new configuration option will allow to define the maximum size for a filesystem xattrs blob. This is a filesystem-wide knob that will replace the per-MDS mds_max_xattr_pairs_size option. Note: The kernel client patch to handle this new configuration was merged before the corresponding ceph-side pull-request. This was unfortunate because in the meantime PR ceph#43284 was merged and the encoding/decoding of 'bal_rank_mask' got in between. Hence the 'max_xattr_size' is being encoding/decoded before 'bal_rank_mask'. URL: https://tracker.ceph.com/issues/55725 Signed-off-by: Luís Henriques <lhenriques@suse.de>
Commit eb915d0 ("cephfs: fix write_buf's _len overflow problem") added a limit to the total size of xattrs. This limit is respected by clients doing a "sync" operation, i.e. MDS_OP_SETXATTR. However, clients with CAP_XATTR_EXCL can still buffer these operations and ignore these limits. This patch prevents clients from crashing the MDSs by also imposing the xattr limits even when they have the Xx caps. Replaces the per-MDS knob "max_xattr_pairs_size" by the new mdsmap setting that the clients can access. Unfortunately, clients that misbehave, i.e. old clients that don't respect this xattrs limit and buffer their xattrs, will see them vanishing. URL: https://tracker.ceph.com/issues/55725 Signed-off-by: Luís Henriques <lhenriques@suse.de>
When doing a sync setxattr (MDS_OP_SETXATTR) to set the first xattr in an inode it is possible for the client to set a huge xattr key/value that would exceed the limit. Prevent this from from happening by checking the size against the maximum allowed size. URL: https://tracker.ceph.com/issues/55725 Signed-off-by: Luís Henriques <lhenriques@suse.de>
…p schema
Let the mdsmap schema know about the new field 'max_xattr_size'. This prevents
the following error:
tasks.mgr.dashboard.helper._ValError: \
In `input['fs_map']['filesystems'][0]['mdsmap']`: unknown keys: {'max_xattr_size'}
URL: https://tracker.ceph.com/issues/55725
Signed-off-by: Luís Henriques <lhenriques@suse.de>
…ield Signed-off-by: Luís Henriques <lhenriques@suse.de>
25e9be0 to
b70c752
Compare
|
@luis-henrix I hope this is the last rebase you have to push for this PR.
I hope too :-)
Pushed a new rebase. Thanks @vshankar
|
|
I've no idea what these failures are. I've just received an email with this cryptic information:
Do I need to do something here? Or is this still some infrastruture problem? |
|
jenkins test docs |
vshankar
left a comment
There was a problem hiding this comment.
Tests ran successfully: https://pulpito.ceph.com/vshankar-2023-03-02_09:21:58-fs-wip-vshankar-testing-20230222.044949-testing-default-smithi/
Failures are not related to this PR. This should get merged soon.
|
jenkins test make check arm64 |
1 similar comment
|
jenkins test make check arm64 |
This is required for a reef client to work with a higher revision
MDS, since essentially, this happens:
reef(client):
if (version >=17) {
decode(bal_rank_mask, p);
}
and higher-revision MDS (say, upcoming squid):
version = 17
encode(version, bl);
...
...
encode(max_xattr_size, bl);
encode(bal_rank_mask, bl);
The client incorrectly decodes max_xattr_size (type: uint64_t) into
bal_rank_mask (type: string).
This situation ended up due to a couple of reasons:
* the kclient patchset hanlding `max_xattr_size` was merged early on
and another MDS side change that bumped the MDSMap encoding version
to 17 got merged in the midst (PR ceph#43284). Details in comment:
ceph#46357 (comment)
* The reef backport for PR ceph#46357 got delayed (and, reef branched out).
Which means reef(18.2.0) user-space clients are broken with higher version
MDSs.
Fixes: https://tracker.ceph.com/issues/63713
Signed-off-by: Venky Shankar <vshankar@redhat.com>
This is required for a reef client to work with a higher revision
MDS, since essentially, this happens:
reef(client):
if (version >=17) {
decode(bal_rank_mask, p);
}
and higher-revision MDS (say, upcoming squid):
version = 17
encode(version, bl);
...
...
encode(max_xattr_size, bl);
encode(bal_rank_mask, bl);
The client incorrectly decodes max_xattr_size (type: uint64_t) into
bal_rank_mask (type: string).
This situation ended up due to a couple of reasons:
* the kclient patchset hanlding `max_xattr_size` was merged early on
and another MDS side change that bumped the MDSMap encoding version
to 17 got merged in the midst (PR ceph#43284). Details in comment:
ceph#46357 (comment)
* The reef backport for PR ceph#46357 got delayed (and, reef branched out).
Which means reef(18.2.0) user-space clients are broken with higher version
MDSs.
Fixes: https://tracker.ceph.com/issues/63713
Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit 36ee8e7)
This is required for a reef client to work with a higher revision
MDS, since essentially, this happens:
reef(client):
if (version >=17) {
decode(bal_rank_mask, p);
}
and higher-revision MDS (say, upcoming squid):
version = 17
encode(version, bl);
...
...
encode(max_xattr_size, bl);
encode(bal_rank_mask, bl);
The client incorrectly decodes max_xattr_size (type: uint64_t) into
bal_rank_mask (type: string).
This situation ended up due to a couple of reasons:
* the kclient patchset hanlding `max_xattr_size` was merged early on
and another MDS side change that bumped the MDSMap encoding version
to 17 got merged in the midst (PR #43284). Details in comment:
ceph/ceph#46357 (comment)
* The reef backport for PR #46357 got delayed (and, reef branched out).
Which means reef(18.2.0) user-space clients are broken with higher version
MDSs.
Fixes: https://tracker.ceph.com/issues/63713
Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit 36ee8e7)
A client is allowed to set as many xattrs in an inode as it wants, as long as it has CAP_XATTR_EXCL. This will allow him to buffer the xattrs and send them to the MDS later on. This will eventually result in crashing the MDS (and possibly also requires recovering the filesystem?).
There was an attempt long time ago to fix this in https://tracker.ceph.com/issues/19033, which resulted in commit eb915d0 ("cephfs: fix write_buf's _len overflow problem"). This fix added a maximum size for the xattrs (keys + values). But this maximum is only enforced when a client uses the synchronous MDS_OP_SETXATTR operation.
Here's a small test case:
Note that this PR is just an RFC, I'm really not sure what's the best way to fix this issue: a user that sets an attribute in a file doesn't get any error at all. So, maybe we should simply not buffer xattrs on the client. Also, I think the MDS will need to take some further actions after detecting the problem: besides dropping the xattrs, it probably needs to force the revogation of caps from the client so that it can get an update.
Fixes: https://tracker.ceph.com/issues/55725
Signed-off-by: Luís Henriques lhenriques@suse.de