Skip to content

mgr/nfs: disallow non-existent paths when creating export#49460

Merged
vshankar merged 9 commits intoceph:mainfrom
dparmar18:wip-58228
Mar 31, 2023
Merged

mgr/nfs: disallow non-existent paths when creating export#49460
vshankar merged 9 commits intoceph:mainfrom
dparmar18:wip-58228

Conversation

@dparmar18
Copy link
Contributor

@dparmar18 dparmar18 commented Dec 15, 2022

Fixes:

Signed-off-by: Dhairya Parmar dparmar@redhat.com

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@dparmar18 dparmar18 marked this pull request as ready for review December 15, 2022 12:02
@dparmar18 dparmar18 requested a review from a team December 15, 2022 12:02
@dparmar18 dparmar18 force-pushed the wip-58228 branch 4 times, most recently from 857a116 to 7f9e2c3 Compare December 15, 2022 19:51
@dparmar18 dparmar18 requested review from a team and vshankar December 15, 2022 19:52
@dparmar18 dparmar18 force-pushed the wip-58228 branch 5 times, most recently from b41a50c to e0c613c Compare December 19, 2022 11:05
@dparmar18 dparmar18 requested review from a team and vshankar December 19, 2022 11:09
@dparmar18
Copy link
Contributor Author

jenkins test make check

@dparmar18
Copy link
Contributor Author

jenkins test api

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@dparmar18
Copy link
Contributor Author

no changes made, resolved a conflicted and rebased.

@dparmar18
Copy link
Contributor Author

@dparmar18 Seeing your latest commits it looks like integrating mocking for the added changes is on-going (as we discussed in IRC). Let me know if you need help.

As we discussed it's a test that is causing the issue, im figuring out how to fix it

@dparmar18
Copy link
Contributor Author

@phlogistonjohn Hi there. Seeing your contributions to the mgr/nfs code, pinging you case you have some insights on the issue this PR is facing.

Summary of the code before moving to the issue: This PR uses mgr_util.py's CephfsClient and open_filesystem() in export.py to open a fs handle to check if the cephfs path is valid or not before creating cephfs exports.

Issues are not faced in teuthology, it's all green but rather make check tests never complete, runs till timeout i.e. 3 hours. I had to run tox tests locally to find out the actual issue that at the time of exit, the run would halt and need manual interruption (CTRL + C) else it would wait till infinity.

At first the issue seemed to be with fs not monkey patched in test_nfs.py and thus leaded to Exception getting caught and raised which I fixed by wrapping the code that checks for cephfs path into a helper function under utils.py and mocking it as true in test_nfs.py (52c5812). Once this got sorted the tests passed but the halting issue still persisted, then i pushed a commit a9cb9e5 that comments a test case(how did i figure this out? trial and error with every test case) in test_nfs.py and also monkey patches CephfsClient but very vaguely, i mean we would need all the things to be patched but this was just to see if bypassing it makes it work or not and to my surprise it went fine and make check passed. So there's a lot weird going on. @vshankar and I agree to completely monkey patch CephfsClient which should this issue however we thought to ping and ask you if you have any thoughts on this. Especially the mystery of the test case that is commented out.

@phlogistonjohn
Copy link
Contributor

phlogistonjohn commented Mar 27, 2023

@phlogistonjohn Hi there. Seeing your contributions to the mgr/nfs code, pinging you case you have some insights on the issue this PR is facing.

Summary of the code before moving to the issue: This PR uses mgr_util.py's CephfsClient and open_filesystem() in export.py to open a fs handle to check if the cephfs path is valid or not before creating cephfs exports.

Issues are not faced in teuthology, it's all green but rather make check tests never complete, runs till timeout i.e. 3 hours. I had to run tox tests locally to find out the actual issue that at the time of exit, the run would halt and need manual interruption (CTRL + C) else it would wait till infinity.

At first the issue seemed to be with fs not monkey patched in test_nfs.py and thus leaded to Exception getting caught and raised which I fixed by wrapping the code that checks for cephfs path into a helper function under utils.py and mocking it as true in test_nfs.py (52c5812). Once this got sorted the tests passed but the halting issue still persisted, then i pushed a commit a9cb9e5 that comments a test case(how did i figure this out? trial and error with every test case) in test_nfs.py and also monkey patches CephfsClient but very vaguely, i mean we would need all the things to be patched but this was just to see if bypassing it makes it work or not and to my surprise it went fine and make check passed. So there's a lot weird going on. @vshankar and I agree to completely monkey patch CephfsClient which should this issue however we thought to ping and ask you if you have any thoughts on this. Especially the mystery of the test case that is commented out.

I haven't dug into the behavior of CephfsClient much myself but it seems that if adding:

                mock.patch('nfs.export.check_cephfs_path',
                           return_value=(0, "cephfs path is valid")), \

did not fix the hang but adding:

                mock.patch('nfs.export.CephfsClient', return_value=mock.MagicMock()), \

did I bet the fact that the CephfsClient is created and assigned in ExportMgr.init is an issue:

self.fs_client = CephfsClient(mgr)

IOW, if there's state that needs to connect to some "real" ceph process this could interfere with the unit tests.

I really like that you moved the path check to a function anyway, I think that is a good change regardless of the test hang. But I do think it would be better to only mock check_cephfs_path if possible. Could you experiment with moving the instantiation of the CephfsClient into check_cephfs_path instead of in __init__?

I'm not sure about the other aspect. I would probably need to experiment with it locally myself. If you think that would help please just say so, otherwise I'll let you try the above first.

Once you have things working I'll be happy to review this as well. I will make one small style suggestion today.

this helper instantiates CephfsClient, however this was
initially planned in ExportMgr class in export.py but
due to make check failure where main python thread
experienced a dead lock which after several efforts
pointed at instantiation of CephfsClient in ExportMgr
was problematic, it was decided in order to achieve
singleton behavior, func has been added inside this
helper func that restricts instantiation using functool's
lru_cache.

Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
@dparmar18
Copy link
Contributor Author

@dparmar18
Copy link
Contributor Author

Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
Fixes: https://tracker.ceph.com/issues/58228
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
Also adds a function _nfs_complete_cmd() that returns process obj so that stdout/stderr
can be used for evaluation(_nfs_cmd() uses raw_cluster_cmd() that returns just stdout
and it became difficult to time cluster creation errors in _test_create_cluster()).

It takes sometime to update the cluster data, therefore running the command set
(check nfs server status -> nfs cluster create test -> check cluster status) in
a loop (max six iteration with sleep of 5 secs at each iteration) fixes the issue.

Fixes: https://tracker.ceph.com/issues/58744
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
Comment in the code says to wait for two minutes as cluster
creation takes time but actually it's waiting for thirteen
minutes, it's not required to wait this long, i think a minute
here is more than enough, also switched to using safe_while().

Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
_get_port_ip_info() fails to fetch port and ip due to empty 'backend' key:

2023-02-24T20:49:09.084 DEBUG:teuthology.orchestra.run.smithi042:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph nfs cluster info test
2023-02-24T20:49:09.471 INFO:teuthology.orchestra.run.smithi042.stdout:{
2023-02-24T20:49:09.472 INFO:teuthology.orchestra.run.smithi042.stdout:  "test": {
2023-02-24T20:49:09.472 INFO:teuthology.orchestra.run.smithi042.stdout:    "backend": [],
2023-02-24T20:49:09.472 INFO:teuthology.orchestra.run.smithi042.stdout:    "virtual_ip": null
2023-02-24T20:49:09.472 INFO:teuthology.orchestra.run.smithi042.stdout:  }
2023-02-24T20:49:09.472 INFO:teuthology.orchestra.run.smithi042.stdout:}

it then raises:

2023-02-24T20:49:10.323 INFO:tasks.cephfs_test_runner:    info_output = json.loads(self._nfs_cmd('cluster', 'info', self.cluster_id))['test']['backend'][0]
2023-02-24T20:49:10.323 INFO:tasks.cephfs_test_runner:IndexError: list index out of range

Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
@dparmar18
Copy link
Contributor Author

dparmar18 commented Mar 30, 2023

no major changes in last push - https://github.com/ceph/ceph/compare/0ae4720686608595dcf475a83292464a745d3e3c..310286fa185194fdd27bfe7692d516703e18fa11

just the removal of (0, "cephfs path is valid") tuple as return value as its not needed anymore. helper check_cephfs_path doesn't return anything now.

Copy link
Contributor

@vshankar vshankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @dparmar18

@dparmar18
Copy link
Contributor Author

Nice work @dparmar18

thanks venky!

@phlogistonjohn
Copy link
Contributor

Hi, sorry for the delayed response to this but I am behind on many tasks.
In go-ceph we had some tests that were impacted by this change. I updated the tests but I also noticed that the text of the error message was a bit awkward. (See ceph/go-ceph#858)

Specifically No such file or directory: "error in stat: b'/march': No such file or directory [Errno 2] is a bit long and redundant IMO. If you don't care to tweak this in a follow up PR, let me know and I'll submit a PR to improve the error text. Thanks @dparmar18

@dparmar18
Copy link
Contributor Author

That looks quite ugly to be honest, don't worry I'll push appropriate change in a follow up PR soon. Thanks for reporting this @phlogistonjohn

@dparmar18
Copy link
Contributor Author

Follow up PR #51005

@dparmar18
Copy link
Contributor Author

@phlogistonjohn above commented PR contains patch for the redundancy mentioned in #49460 (comment), as per test results, it should be now something like this:

No such file or directory path <non_existent_path> does not exist

@dparmar18
Copy link
Contributor Author

dparmar18 commented May 2, 2023

@vshankar #51005 this need to backported right? Should I create a tracker for this or just add the commits to the open backport PRs i.e. #50807, #50808, #50809?

@vshankar
Copy link
Contributor

vshankar commented May 2, 2023

@vshankar this need to backported right? Should I create a tracker for this or just add the commits to the open backport PRs i.e. #50807, #50808, #50809?

Just add the additional commits. I've updated the tracker mentioning a note for backporters.

@dparmar18
Copy link
Contributor Author

@vshankar this need to backported right? Should I create a tracker for this or just add the commits to the open backport PRs i.e. #50807, #50808, #50809?

Just add the additional commits. I've updated the tracker mentioning a note for backporters.

Done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants