
client: handle snapped inode reply in add_updated_inode() #61838

Closed

vshankar wants to merge 2 commits into ceph:main from vshankar:wip-69624

Conversation


@vshankar vshankar commented Feb 17, 2025

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

Calling

        ceph_ll_lookup(cmount, root, "/", ...)

causes the MDS to receive a client request of the form

        handle_client_request client_request(client.44210:2 lookup #0x1// ...)

The MDS interprets this as a lookup on a snapped inode (a snapdir in this case)
and replies to the client with snap-based inode metadata. Using this inode
to perform further lookups causes the client to segfault, since the snapdir
component (metadata) isn't available.

Fixes: https://tracker.ceph.com/issues/69624
Signed-off-by: Venky Shankar <vshankar@redhat.com>
@vshankar vshankar added the cephfs Ceph File System label Feb 17, 2025
@vshankar vshankar requested a review from a team February 17, 2025 06:48
@github-actions github-actions bot added the tests label Feb 17, 2025

/* treated as a request under a snapdir */
ret = ceph_ll_lookup(cmount, slash, dir_name, &diri, &stx, CEPH_STATX_INO, 0, perms);
ASSERT_EQ(ret, -ENOENT);
Contributor Author
FYI, note that when using the parent inode (slash), this is treated as a request under a snap, since // is interpreted as a snap request by the MDS.

Credit to @synarete (Shachar Sharon) and @anoopcs9 (Anoop C S)
for reproducer.
Signed-off-by: Venky Shankar <vshankar@redhat.com>
@dparmar18
Contributor

dparmar18 commented Feb 17, 2025

@vshankar locally it's failing with coredump in mds:

2025-02-17T15:17:16.997+0530 7ff983d4c6c0 10 mds.0.server reply to stat on client_request(client.4285:3 lookup #0x1// 2025-02-17T15:17:16.997746+0530 caller_uid=1000, caller_gid=1000{})
2025-02-17T15:17:17.001+0530 7ff983d4c6c0 -1 *** Caught signal (Aborted) **
 in thread 7ff983d4c6c0 thread_name:ms_dispatch

 ceph version 19.3.0-6441-g2163cb10cf4 (2163cb10cf47667861d059b3dd887eabc2c5f413) squid (dev)
 1: /home/dparmar/CephRepoForRunningTests/ceph/build/bin/ceph-mds(+0x70e406) [0x55824f718406]
 2: /lib64/libc.so.6(+0x40d10) [0x7ff988a4ed10]
 3: /lib64/libc.so.6(+0x99bd4) [0x7ff988aa7bd4]
 4: gsignal()
 5: abort()
 6: (std::chrono::_V2::system_clock::now()+0) [0x7ff988cdabf0]
 7: (std::vector<CDentry*, std::allocator<CDentry*> >::operator[](unsigned long)+0) [0x55824f48df4e]
 8: (Server::handle_client_getattr(boost::intrusive_ptr<MDRequestImpl> const&, bool)+0xd1e) [0x55824f444898]
 9: (Server::dispatch_client_request(boost::intrusive_ptr<MDRequestImpl> const&)+0xa7d) [0x55824f48673b]
 10: (Server::handle_client_request(boost::intrusive_ptr<MClientRequest const> const&)+0xb2c) [0x55824f487776]
 11: (Server::dispatch(boost::intrusive_ptr<Message const> const&)+0xa4) [0x55824f48afaa]
 12: (MDSRank::handle_message(boost::intrusive_ptr<Message const> const&)+0x4ee) [0x55824f3a7bda]
 13: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x239) [0x55824f3a9d81]
 14: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x206) [0x55824f3aa728]
 15: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x2b1) [0x55824f39877f]
 16: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x5d) [0x7ff989f540bd]
 17: (DispatchQueue::entry()+0x418) [0x7ff989f50bfc]
 18: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ff989ffce33]
 19: (Thread::entry_wrapper()+0x2f) [0x7ff989df713d]
 20: (Thread::_entry_func(void*)+0x9) [0x7ff989df714f]
 21: /lib64/libc.so.6(+0x97c17) [0x7ff988aa5c17]
 22: /lib64/libc.so.6(+0x11bb0c) [0x7ff988b29b0c]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

eats up the standby:

    health: HEALTH_ERR
            1 filesystem is degraded
            insufficient standby MDS daemons available
mds: 1/1 daemons up

culprit looks to be:

        ret = ceph_ll_lookup(cmount, root, "/", &slash, &stx, CEPH_STATX_INO, 0, perms);
        ASSERT_EQ(ret, 0);

@vshankar
Contributor Author

@vshankar locally it's failing with coredump in mds:

ha! I saw that failure initially when I used the reproducer and mentioned it here: https://tracker.ceph.com/issues/69624#note-4, but then I used the latest HEAD and didn't see it again, which made me believe that some of my local changes were contributing to it.

@vshankar
Contributor Author

@dparmar18 can you track down the LoC where the MDS is crashing if you are able to reproduce this with the latest HEAD?

@dparmar18
Contributor

@vshankar locally it's failing with coredump in mds:

ha! I saw that failure initially when I used the reproducer and mentioned it here: https://tracker.ceph.com/issues/69624#note-4, but then I used the latest HEAD and didn't see it again, which made me believe that some of my local changes were contributing to it.

hmm, let me rebase and try with the latest HEAD; I'm using an old repo

sprintf(dir_name, "dir_%d", mypid);
sprintf(dir_path, "/%s", dir_name);

ASSERT_EQ(ceph_mkdir(cmount, dir_path, 0777), 0);
Contributor

Do we need to create this dir? We're making use of dir_name instead of dir_path in the last lookup (because we want ENOENT), so I guess we can just omit this.

Contributor Author

ACK.

Contributor Author

err, actually I wanted to create the file and check if we really get back an ENOENT.

@anoopcs9 anoopcs9 left a comment

FWIW, I could no longer reproduce the crash with the reproducer from tracker using a custom built libcephfs including the suggested fix.

# ./lookup foo cephvol
root inode = 3
first lookup at / completed
Segmentation fault (core dumped)

# LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib ./lookup foo cephvol
root inode = 3
first lookup at / completed
ceph_ll_lookup failed: No such file or directory

@dparmar18 dparmar18 left a comment

LGTM, tested it locally and works fine.

@anoopcs9
Contributor

Can we move forward with the next steps to get the changes merged soon?

@vshankar
Contributor Author

Can we move forward with the next steps to get the changes merged soon?

Need to run this through tests. Will be doing it this week.

@vshankar
Contributor Author

This PR is under test in https://tracker.ceph.com/issues/70201.

@vshankar
Contributor Author

vshankar commented Mar 3, 2025

Tests did not run over the weekend. Triggered it again.

@github-actions

github-actions bot commented Mar 3, 2025

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@anoopcs9
Contributor

anoopcs9 commented Mar 4, 2025

@vshankar I see that our CI runs have passed this morning and my guess is that #60746 might have fixed it. It can very well be the case that the reproducer(from the tracker) and/or test case from this change is already passing with the latest code base. I'll also check in sometime.

@vshankar
Contributor Author

vshankar commented Mar 4, 2025

@vshankar I see that our CI runs have passed this morning and my guess is that #60746 might have fixed it. It can very well be the case that the reproducer(from the tracker) or test case from this change is already passing with the latest code base. I'll also check in sometime.

I will check the path walk/traverse changes in that PR and see if this change still needs to be taken in.

@anoopcs9
Contributor

anoopcs9 commented Mar 4, 2025

This is interesting. With the changes from #60746 the second lookup from the reproducer is successfully completed whereas it returned ENOENT previously with just the fix from the current PR.

# ./lookup foo cephvol
root inode = 1
first lookup at / completed
Segmentation fault (core dumped)

With just the changes from the current PR(excluding #60746):

# ./lookup foo cephvol
root inode = 1
first lookup at / completed
ceph_ll_lookup failed: No such file or directory

With the current code base including #60746:

# ./lookup foo cephvol
root inode = 1
first lookup at / completed

@anoopcs9
Contributor

anoopcs9 commented Mar 5, 2025

@vshankar I see that our CI runs have passed this morning and my guess is that #60746 might have fixed it. It can very well be the case that the reproducer(from the tracker) and/or test case from this change is already passing with the latest code base. I'll also check in sometime.

To confirm, I ran the test case from the PR against the current code base and it failed:

# bin/ceph_test_libcephfs --gtest_filter="LibCephFS.PathWalkAndLookup"
Note: Google Test filter = LibCephFS.PathWalkAndLookup
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from LibCephFS
[ RUN      ] LibCephFS.PathWalkAndLookup
/home/workspace/ceph.git/src/test/libcephfs/test.cc:4079: Failure
Expected equality of these values:
  ret
    Which is: 0
  -2
[  FAILED  ] LibCephFS.PathWalkAndLookup (178 ms)
[----------] 1 test from LibCephFS (178 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (178 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] LibCephFS.PathWalkAndLookup

 1 FAILED TEST

@vshankar
Contributor Author

vshankar commented Mar 5, 2025

@anoopcs9 Thx for the updates. I will get to this tomorrow (finishing up another priority bug).

@vshankar
Contributor Author

vshankar commented Mar 6, 2025

This is interesting. With the changes from #60746 the second lookup from the reproducer is successfully completed whereas it returned ENOENT previously with just the fix from the current PR.

Yeah - the path_walk changes in the case-insensitivity PR treat looking up "/" as success, and I think the request isn't even forwarded to the MDS. The question is, what's the correct/intended behaviour?

@vshankar
Contributor Author

vshankar commented Mar 6, 2025

The thing is, if it's possible to sneak in a request to the MDS with a path of the form <ino>//, then the issue in the client can still be hit.

@vshankar
Contributor Author

So, here's what's required from this change: the path_walk changes from the case-insensitivity PR have transparently worked around the issue that this change was aiming to fix. The intended behaviour of looking up / is actually undefined -- in the sense that it's unclear whether it should return the inode of the directory itself or the inode of its snapdir (since // has a special meaning in the MDS). So, we can leave the behaviour as-is (@anoopcs9 FYI).

What can be done in this change is to disallow operations if the dname contains a /, especially in the lower-level API which is used by nfs-ganesha and SMB.

@anoopcs9
Contributor

So, here's what's required from this change: the path_walk changes from the case-insensitivity PR have transparently worked around the issue that this change was aiming to fix. The intended behaviour of looking up / is actually undefined -- in the sense that it's unclear whether it should return the inode of the directory itself or the inode of its snapdir (since // has a special meaning in the MDS). So, we can leave the behaviour as-is (@anoopcs9 FYI).

Here's my take:
As of now ceph_ll_lookup() treats anything that starts with / as absolute and ignores the parent, which is the natural (and expected) behaviour. So I agree that the current approach is defined and expected.

What can be done in this change is to disallow operations if the dname contains a /, especially in the lower-level API which is used by nfs-ganesha and SMB.

I'm not sure it is needed. / is valid, and if I get it correctly the parent is ignored when the name is absolute, which should mostly avoid the problem of //, but I may be wrong.

@anoopcs9
Contributor

anoopcs9 commented Apr 22, 2025

@vshankar Should we proceed with this PR further? I don't think so.

Also what to do with the underlying tracker #69624?

@vshankar
Contributor Author

@vshankar Should we proceed with this PR further? I don't think so.

Also what to do with the underlying tracker #69624?

I think this can be closed since the issue is fixed by the path_walk changes for case-insensitivity.

@vshankar vshankar closed this Apr 22, 2025
@vshankar
Contributor Author

I will fixup the tracker.
