
Revert "quincy: ceph_fs.h: add separate owner_{u,g}id fields"#54108

Merged
yuriw merged 1 commit into quincy from revert-53139-wip-62571-quincy
Oct 23, 2023
Conversation

@vshankar
Contributor

@vshankar vshankar commented Oct 19, 2023

Reverts #53139

Revert "quincy: ceph_fs.h: add separate owner_{u,g}id fields"

This seems to be causing test failures:

        https://pulpito.ceph.com/yuriw-2023-10-19_16:09:31-smoke-quincy-release-distro-default-smithi/7432399

Where the MDS crashes with

In function 'CInode* Server::prepare_new_inode(MDRequestRef&, CDir*, inodeno_t, unsigned int, const file_layout_t*)'
3449: FAILED ceph_assert(_inode->gid != (unsigned)-1)

1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14f) [0x7f991631c89e]
2: /usr/lib/ceph/libceph-common.so.2(+0x27eab0) [0x7f991631cab0]
3: (Server::prepare_new_inode(boost::intrusive_ptr<MDRequestImpl>&, CDir*, inodeno_t, unsigned int, file_layout_t const*)+0x2791) [0x55563bec4831]
4: (Server::handle_client_mkdir(boost::intrusive_ptr<MDRequestImpl>&)+0x1ec) [0x55563bede88c]

Revert this change till the failure is fixed.

@vshankar vshankar added the cephfs Ceph File System label Oct 19, 2023
@vshankar vshankar requested a review from a team October 19, 2023 17:09
@github-actions github-actions bot added this to the quincy milestone Oct 19, 2023
@vshankar vshankar force-pushed the revert-53139-wip-62571-quincy branch from df24f7a to cca2217 on October 19, 2023 17:12
This seems to be causing test failures:

        https://pulpito.ceph.com/yuriw-2023-10-19_16:09:31-smoke-quincy-release-distro-default-smithi/7432399

Where the MDS crashes with

In function 'CInode* Server::prepare_new_inode(MDRequestRef&, CDir*, inodeno_t, unsigned int, const file_layout_t*)'
3449: FAILED ceph_assert(_inode->gid != (unsigned)-1)

1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14f) [0x7f991631c89e]
2: /usr/lib/ceph/libceph-common.so.2(+0x27eab0) [0x7f991631cab0]
3: (Server::prepare_new_inode(boost::intrusive_ptr<MDRequestImpl>&, CDir*, inodeno_t, unsigned int, file_layout_t const*)+0x2791) [0x55563bec4831]
4: (Server::handle_client_mkdir(boost::intrusive_ptr<MDRequestImpl>&)+0x1ec) [0x55563bede88c]

Revert this change till the failure is fixed.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
@vshankar
Contributor Author

Test runs in ~3hrs.

@mihalicyn
Contributor

Test runs in ~3hrs.

Ugh. So this failure is hard to reproduce and shows up only after many iterations, right?

This assertion was added in my PR. Before that we had no analogous assertion in this place. I will take a closer look at the test code and try to understand what happens here.

@vshankar
Contributor Author

jenkins test make check

@vshankar
Contributor Author

Test runs in ~3hrs.

Ugh. So this failure is hard to reproduce and shows up only after many iterations, right?

Not really. We saw this in the smoke tests and not in the fs suite. So there is something the smoke tests exercise that the fs suite tests don't - I'm going to audit that. We probably didn't see this in the main branch run since smoke tests are not run there :/

This assertion was added in my PR. Before that we had no analogous assertion in this place. I will take a closer look at the test code and try to understand what happens here.

ACK. Thx!

@mihalicyn
Contributor

Ok, I have analyzed the ceph code and the test code too. So far I haven't found anything suspicious that could make gid == -1.

But there is one serious concern about this ceph_assert. During review it was proposed to add a sanity check for the uid/gid values (https://github.com/ceph/ceph/pull/52575/files#r1285297994), and I added ceph_assert(_inode->uid != (unsigned)-1);. I now think adding an assert here was a bad idea, because an assert, AFAIU, kills the whole MDS server, right? At the same time, these uid/gid fields are fully under the client's control, which means a malicious client can send -1 on purpose and crash the entire MDS. My suggestion is to remove these assertions from the request-handling code, as they are unsafe.
@lxbsz Xiubo, what do you think about it?

Probably we need to revert #53138 for now too.

Speaking of the smoke test failure, I wasn't able to understand how -1 ends up in the owner_gid field. Is it possible to rerun these tests with request dump/debug logging enabled for the MDS?

Judging from the test logs, we have 5.4.0-165-generic (an Ubuntu kernel?) and the cephfs kernel client. That means the cephfs client is old (without the new feature that adds owner_uid/gid field support).

@vshankar
Contributor Author

Ok, I have analyzed the ceph code and the test code too. So far I haven't found anything suspicious that could make gid == -1.

But there is one serious concern about this ceph_assert. During review it was proposed to add a sanity check for the uid/gid values (https://github.com/ceph/ceph/pull/52575/files#r1285297994), and I added ceph_assert(_inode->uid != (unsigned)-1);. I now think adding an assert here was a bad idea, because an assert, AFAIU, kills the whole MDS server, right?

Correct.

At the same time, these uid/gid fields are fully under the client's control, which means a malicious client can send -1 on purpose and crash the entire MDS. My suggestion is to remove these assertions from the request-handling code, as they are unsafe. @lxbsz Xiubo, what do you think about it?

Probably we need to revert #53138 for now too.

See my comment below - this failure isn't seen in the main branch smoke tests.

Speaking of the smoke test failure, I wasn't able to understand how -1 ends up in the owner_gid field. Is it possible to rerun these tests with request dump/debug logging enabled for the MDS?

We have a core. With some effort we can setup an env and gdb the core to examine frames.

Judging from the test logs, we have 5.4.0-165-generic (an Ubuntu kernel?) and the cephfs kernel client. That means the cephfs client is old (without the new feature that adds owner_uid/gid field support).

@mihalicyn I haven't looked at the failures closely, but the very recent smoke run on the main branch doesn't have this failure - https://pulpito.ceph.com/?branch=wip-vshankar-testing-20231018.065603 - it's possible that it's something in quincy that's tripping it off...

Just FYI - I'm on vacation next week, so won't be able to look much into this.

@mihalicyn
Contributor

Ok, I have analyzed the ceph code and the test code too. So far I haven't found anything suspicious that could make gid == -1.
But there is one serious concern about this ceph_assert. During review it was proposed to add a sanity check for the uid/gid values (https://github.com/ceph/ceph/pull/52575/files#r1285297994), and I added ceph_assert(_inode->uid != (unsigned)-1);. I now think adding an assert here was a bad idea, because an assert, AFAIU, kills the whole MDS server, right?

Correct.

Ok, then I need to send a new PR to fix that and replace ceph_assert with something like this (an existing example from the prepare_new_inode function):

  if (useino && useino != _inode->ino) {
    dout(0) << "WARNING: client specified " << useino << " and i allocated " << _inode->ino << dendl;
    mds->clog->error() << mdr->client_request->get_source()
       << " specified ino " << useino
       << " but mds." << mds->get_nodeid() << " allocated " << _inode->ino;
    //ceph_abort(); // just for now.
  }

So we can replace ceph_assert(_inode->gid != (unsigned)-1) with:

  if (_inode->gid == (unsigned)-1) {
    dout(0) << "WARNING: client specified gid = " << _inode->gid << " for ino " << _inode->ino << dendl;
    mds->clog->error() << mdr->client_request->get_source()
       << " specified gid = " << _inode->gid << " for ino " << _inode->ino;
  }

Do I need to create a new issue in the ceph tracker, or can I send the PR and reuse the old https://tracker.ceph.com/issues/62217 ?
I'll also need to create backport PRs for it after it is merged to the main branch.

Speaking of the smoke test failure, I wasn't able to understand how -1 ends up in the owner_gid field. Is it possible to rerun these tests with request dump/debug logging enabled for the MDS?

We have a core. With some effort we can setup an env and gdb the core to examine frames.

Great! It would be awesome to analyze the core.

I haven't looked at the failures closely, but the very recent smoke run on the main branch doesn't have this failure -

That's very interesting information...

it's possible that it's something in quincy that's tripping it off...

Yep, it's possible. A buffer overflow somewhere, or something else. Ugh. :(

By the way, how many smoke test runs have we had for the quincy branch? Only one? Or more?
If more, the question is whether all of the runs failed on this assertion.

Just FYI - I'm on vacation next week, so won't be able to look much into this.

Have a great vacation, Venky!

I'll start by replacing the unsafe ceph_assert with something safer then.

mihalicyn added a commit to mihalicyn/ceph that referenced this pull request Oct 20, 2023
Having ceph_assert calls in Server::prepare_new_inode is unsafe
when the assertion depends on client-provided data.
A malicious client can easily crash the MDS on purpose.

Let's replace the two ceph_assert calls for the owner_{u,g}id fields
with log prints.

See also:
ceph#54108

Fixes: commit 46cb244 ("ceph_fs.h: add separate owner_{u,g}id fields")
Fixes: https://tracker.ceph.com/issues/62217

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
@vshankar
Contributor Author

Ok, then I need to send a new PR to fix that and replace ceph_assert with something like this (an existing example from the prepare_new_inode function):

I meant to say that the assert will obviously crash the MDS, but I'm not sure whether the assert was incorrectly placed - that would require carefully looking at the original PR. I can do that when I'm back.

@vshankar
Contributor Author

I'll start by replacing the unsafe ceph_assert with something safer then.

You could run this through the smoke test in quincy and see what shows up - that would be interesting.

By the way, how many smoke test runs have we had for the quincy branch? Only one? Or more?
If more, the question is whether all of the runs failed on this assertion.

The automatic cron jobs that trigger smoke runs were disabled; the run was triggered manually, and that's when we saw this. Those cron jobs have been re-enabled now, so we can see if the crash is reproducible.

@vshankar
Contributor Author

Reviewed-by: Venky Shankar vshankar@redhat.com

@yuriw yuriw requested a review from mchangir October 21, 2023 14:28
Contributor

@mchangir mchangir left a comment


Reviewed-by: Milind Changire mchangir@redhat.com

@mihalicyn
Contributor

mihalicyn commented Oct 23, 2023

I think I found where the issue is: the two recent protocol extensions cbd7e30 and 46cb244 do not handle very old ceph clients.

Let me show that. We have (https://github.com/ceph/ceph/blob/c672d7aab0240cefa8a3decdffd498e0ade3d5be/src/messages/MClientRequest.h#L230C13-L230C13):

  void decode_payload() override { // decoder for messages coming from the client
    using ceph::decode;
    auto p = payload.cbegin();

    if (header.version >= 4) { // relatively new ceph clients (after the "version" field was added to struct ceph_mds_request_head)
      decode(head, p); // this path checks h.version and correctly handles the 32-bit ext_num_fwd and owner_uid/gid fields
    } else { // old clients that send struct ceph_mds_request_head_legacy, which does *not* contain the "version" field
      struct ceph_mds_request_head_legacy old_mds_head;

      decode(old_mds_head, p); // <<< here we have no fallback code that fills the new fields ext_num_retry, ext_num_fwd, owner_uid, owner_gid with data from the old fields num_retry, num_fwd, caller_uid, caller_gid
      copy_from_legacy_head(&head, &old_mds_head);
      head.version = 0;

      /* Can't set the btime from the legacy struct */
      if (head.op == CEPH_MDS_OP_SETATTR) {
        int localmask = head.args.setattr.mask;

        localmask &= ~CEPH_SETATTR_BTIME;

        head.args.setattr.btime = { ceph_le32(0), ceph_le32(0) };
        head.args.setattr.mask = localmask;
      }
    }

cc @lxbsz

I'm ready to prepare fixes for both the CEPHFS_FEATURE_32BITS_RETRY_FWD and CEPHFS_FEATURE_HAS_OWNER_UIDGID features. The only reason it failed on CEPHFS_FEATURE_HAS_OWNER_UIDGID is that we added an assertion in the code. We don't have any analogous assertions for CEPHFS_FEATURE_32BITS_RETRY_FWD, so it's hard to notice any issues there.

The reproducer in the smoke test uses Linux kernel 5.4, which speaks the very old cephfs protocol with struct ceph_mds_request_head_legacy.

@mihalicyn
Contributor

mihalicyn commented Oct 23, 2023

I was able to reproduce this issue with Linux kernel 5.4.240 and the ceph version from the main branch.
I'm working on a fix and will create a tracker issue.

We also need to fix it for the reef branch.

cc @yuriw @batrick @mchangir

@mihalicyn
Contributor

Fixed in #54149
