Skip to content

mgr/volumes: handle bad arguments during subvolume create#53989

Merged
rishabh-d-dave merged 5 commits intoceph:mainfrom
mchangir:mds-handle-bad-arguments-during-subvolume-create-2
May 6, 2025
Merged

mgr/volumes: handle bad arguments during subvolume create#53989
rishabh-d-dave merged 5 commits intoceph:mainfrom
mchangir:mds-handle-bad-arguments-during-subvolume-create-2

Conversation

@mchangir
Copy link
Contributor

@mchangir mchangir commented Oct 13, 2023

test subvolgroup and subvol names and declare empty/whitespace strings as bad names

Fixes: https://tracker.ceph.com/issues/58645
Signed-off-by: Milind Changire mchangir@redhat.com

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@mchangir mchangir requested a review from a team as a code owner October 13, 2023 05:39
@mchangir
Copy link
Contributor Author

this is a new PR to resurrect PR #50028

@mchangir mchangir self-assigned this Oct 16, 2023
@vshankar
Copy link
Contributor

This changes isn't really an mds change - the PR title is a bit misleading ;)

@mchangir mchangir changed the title mds: handle bad arguments during subvolume create mgr/volumes: handle bad arguments during subvolume create Oct 16, 2023
@mchangir
Copy link
Contributor Author

This changes isn't really an mds change - the PR title is a bit misleading ;)

fixed the PR title

@vshankar vshankar requested a review from a team October 20, 2023 10:42
@R-0-Y
Copy link

R-0-Y commented Nov 23, 2023

Hi @mchangir , is there any update on this PR ? Is someone working on the make check (arm64) ?
Thanks in advance for your help on this.

@mchangir
Copy link
Contributor Author

jenkins test make check arm64

@mchangir
Copy link
Contributor Author

Hi @mchangir , is there any update on this PR ? Is someone working on the make check (arm64) ? Thanks in advance for your help on this.

@R-0-Y make check (arm64) is not a mandatory test that needs to pass

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@vshankar
Copy link
Contributor

@mchangir Did we ever run this through tests? Also, I had requested a review from broader folks from other teams a while back since this was a generic change.

@mchangir
Copy link
Contributor Author

@mchangir Did we ever run this through tests? Also, I had requested a review from broader folks from other teams a while back since this was a generic change.

Sorry about that. Probably slipped from my mind.
I'll tag people appropriately to solicit comments.

@mchangir mchangir requested a review from a team January 16, 2024 11:55
@mchangir mchangir force-pushed the mds-handle-bad-arguments-during-subvolume-create-2 branch from 7e70864 to 33d8e27 Compare January 16, 2024 12:00
@mchangir
Copy link
Contributor Author

rebased

@mchangir
Copy link
Contributor Author

@ljflores could you review this PR ... especially the applicability of allowempty attribute to other parts of the codebase.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Mar 16, 2024
@mchangir mchangir force-pushed the mds-handle-bad-arguments-during-subvolume-create-2 branch from 33d8e27 to 7506ed2 Compare March 19, 2024 07:54
@rishabh-d-dave
Copy link
Contributor

This PR is under test in https://tracker.ceph.com/issues/71020.

@rishabh-d-dave
Copy link
Contributor

jenkins test make check arm64

@rishabh-d-dave
Copy link
Contributor

jenkins test make check

Copy link
Contributor

@rishabh-d-dave rishabh-d-dave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QA run[1] had some failures in fs:volumes jobs, but they were unrelated to this PR. I re-ran fs:volumes suite with a branch that contained only this PR for extra confirmation, see [2][3]. All the jobs passed, therefore this PR has completely passed QA.

[1] https://tracker.ceph.com/projects/cephfs/wiki/QA_main_2025#wip-rishabh-testing-20250426123842-debug
[2] https://pulpito.ceph.com/rishabh-2025-05-04_19:44:23-fs:volumes-pr-53989-testing-default-smithi/
[3] https://pulpito.ceph.com/rishabh-2025-05-05_07:34:59-fs:volumes-pr-53989-testing-default-smithi/

@rishabh-d-dave rishabh-d-dave added the passed-qa for PRs that have passed QA but there's an on-going conversation, so merging would be premature label May 5, 2025
@rishabh-d-dave
Copy link
Contributor

Waitng for this CI to pass now.

@rishabh-d-dave
Copy link
Contributor

jenkins test make check

2 similar comments
@rishabh-d-dave
Copy link
Contributor

jenkins test make check

@rishabh-d-dave
Copy link
Contributor

jenkins test make check

Copy link
Contributor

@rishabh-d-dave rishabh-d-dave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to approve while reporting about (successful) QA run.

@rishabh-d-dave rishabh-d-dave dismissed vshankar’s stale review May 6, 2025 12:34

Changes requested were incorporated.

@rishabh-d-dave rishabh-d-dave merged commit a88c5ec into ceph:main May 6, 2025
12 checks passed
@VallariAg
Copy link
Member

VallariAg commented May 12, 2025

Hi @rishabh-d-dave, this might have created a regression in ceph CLI for latest jenkins main branch builds:

After cephadm bootstrap, "ceph orch host add" command failed:

Traceback (most recent call last):
  File "/usr/bin/ceph", line 1327, in <module>
    retval = main()
  File "/usr/bin/ceph", line 1247, in main
    sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli')
  File "/usr/lib/python3.9/site-packages/ceph_argparse.py", line 1004, in parse_json_funcsigs
    cmd['sig'] = parse_funcsig(cmd['sig'])
  File "/usr/lib/python3.9/site-packages/ceph_argparse.py", line 948, in parse_funcsig
    newsig.append(argdesc(t,
  File "/usr/lib/python3.9/site-packages/ceph_argparse.py", line 803, in __init__
    self.instance = self.t(**self.typeargs)
TypeError: __init__() got an unexpected keyword argument 'allowempty'

I think the reason we probably didn't catch this in teuthology tests is because teuthology install ceph rpm packages and cephadm binary of same build. So we didn't know that it was a breaking change that would make new builds including this PR incompatible with ceph builds that do not include this PR.

cc: @gbregman

@vshankar
Copy link
Contributor

Hi @rishabh-d-dave, this might have created a regression in ceph CLI for latest jenkins main branch builds:

After cephadm bootstrap, "ceph orch host add" command failed:

Traceback (most recent call last):
  File "/usr/bin/ceph", line 1327, in <module>
    retval = main()
  File "/usr/bin/ceph", line 1247, in main
    sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli')
  File "/usr/lib/python3.9/site-packages/ceph_argparse.py", line 1004, in parse_json_funcsigs
    cmd['sig'] = parse_funcsig(cmd['sig'])
  File "/usr/lib/python3.9/site-packages/ceph_argparse.py", line 948, in parse_funcsig
    newsig.append(argdesc(t,
  File "/usr/lib/python3.9/site-packages/ceph_argparse.py", line 803, in __init__
    self.instance = self.t(**self.typeargs)
TypeError: __init__() got an unexpected keyword argument 'allowempty'

I think the reason we probably didn't catch this in teuthology tests is because teuthology install ceph rpm packages and cephadm binary of same build. So we didn't know that it was a breaking change that would make new builds including this PR incompatible with ceph builds that do not include this PR.

cc: @gbregman

Maybe its not the package version but rather kwargs holding allowempty keyword argument, but some type-specific classes in ceph-argparse (CephInt, CephFloat, etc...), not accepting the argument.

@mchangir
Copy link
Contributor Author

Hi @rishabh-d-dave, this might have created a regression in ceph CLI for latest jenkins main branch builds:
After cephadm bootstrap, "ceph orch host add" command failed:

Traceback (most recent call last):
  File "/usr/bin/ceph", line 1327, in <module>
    retval = main()
  File "/usr/bin/ceph", line 1247, in main
    sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli')
  File "/usr/lib/python3.9/site-packages/ceph_argparse.py", line 1004, in parse_json_funcsigs
    cmd['sig'] = parse_funcsig(cmd['sig'])
  File "/usr/lib/python3.9/site-packages/ceph_argparse.py", line 948, in parse_funcsig
    newsig.append(argdesc(t,
  File "/usr/lib/python3.9/site-packages/ceph_argparse.py", line 803, in __init__
    self.instance = self.t(**self.typeargs)
TypeError: __init__() got an unexpected keyword argument 'allowempty'

I think the reason we probably didn't catch this in teuthology tests is because teuthology install ceph rpm packages and cephadm binary of same build. So we didn't know that it was a breaking change that would make new builds including this PR incompatible with ceph builds that do not include this PR.
cc: @gbregman

Maybe its not the package version but rather kwargs holding allowempty keyword argument, but some type-specific classes in ceph-argparse (CephInt, CephFloat, etc...), not accepting the argument.

I was also concerned about where does the allowempty keyword get injected for non-CephString data types ... since allowempty is in use only with the CephString data type.

@vshankar
Copy link
Contributor

Hi @rishabh-d-dave, this might have created a regression in ceph CLI for latest jenkins main branch builds:

After cephadm bootstrap, "ceph orch host add" command failed:

Traceback (most recent call last):
  File "/usr/bin/ceph", line 1327, in <module>
    retval = main()
  File "/usr/bin/ceph", line 1247, in main
    sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli')
  File "/usr/lib/python3.9/site-packages/ceph_argparse.py", line 1004, in parse_json_funcsigs
    cmd['sig'] = parse_funcsig(cmd['sig'])
  File "/usr/lib/python3.9/site-packages/ceph_argparse.py", line 948, in parse_funcsig
    newsig.append(argdesc(t,
  File "/usr/lib/python3.9/site-packages/ceph_argparse.py", line 803, in __init__
    self.instance = self.t(**self.typeargs)
TypeError: __init__() got an unexpected keyword argument 'allowempty'

I think the reason we probably didn't catch this in teuthology tests is because teuthology install ceph rpm packages and cephadm binary of same build. So we didn't know that it was a breaking change that would make new builds including this PR incompatible with ceph builds that do not include this PR.

cc: @gbregman

This issues is seen with fs:upgrade. See: https://pulpito.ceph.com/vshankar-2025-05-12_08:22:09-fs-wip-vshankar-testing-20250508.200127-debug-testing-default-smithi/8281059/

@vshankar
Copy link
Contributor

QA run[1] had some failures in fs:volumes jobs, but they were unrelated to this PR. I re-ran fs:volumes suite with a branch that contained only this PR for extra confirmation, see [2][3]. All the jobs passed, therefore this PR has completely passed QA.

[1] https://tracker.ceph.com/projects/cephfs/wiki/QA_main_2025#wip-rishabh-testing-20250426123842-debug [2] https://pulpito.ceph.com/rishabh-2025-05-04_19:44:23-fs:volumes-pr-53989-testing-default-smithi/ [3] https://pulpito.ceph.com/rishabh-2025-05-05_07:34:59-fs:volumes-pr-53989-testing-default-smithi/

@rishabh-d-dave - I'm see the first link you shared above and that points to QA tracker: https://tracker.ceph.com/issues/71020. The QA run link there does not point to a teuthology run. What's going on?

@vshankar
Copy link
Contributor

I am confused as in what tests this change went through since https://tracker.ceph.com/issues/71020 points to a non-existent teuthology run. It's unlikely that ptl-tool messed up the run link while updating the redmine ticket. @rishabh-d-dave please provide information on what happened.

This change is breaking fs:upgrade tests. See: https://pulpito.ceph.com/vshankar-2025-05-12_08:22:09-fs-wip-vshankar-testing-20250508.200127-debug-testing-default-smithi/8281059/

2025-05-12T08:56:36.912 INFO:teuthology.orchestra.run.smithi082.stderr:2025-05-12T08:56:36.911+0000 7f0d1971a640  1 --2- 172.21.15.82:0/4233805895 >> [v2:172.21.15.121:6828/1515519538,v1:172.21.15.121:6829/1515519538] conn(0x7f0cdc076a30 0x7f0
cdc078ef0 secure :-1 s=READY pgs=20 cs=0 l=1 rev1=1 crypto rx=0x7f0d140694d0 tx=0x7f0cfc031040 comp rx=0 tx=0).ready entity=mgr.24365 client_cookie=0 server_cookie=0 in_seq=0 out_seq=0
2025-05-12T08:56:36.917 INFO:teuthology.orchestra.run.smithi082.stderr:2025-05-12T08:56:36.916+0000 7f0d12ffd640  1 -- 172.21.15.82:0/4233805895 <== mon.1 v2:172.21.15.121:3300/0 6 ==== mon_command_ack([{"prefix": "get_command_descriptions"}]=
0  v0) v1 ==== 72+0+223317 (secure 0 0 0) 0x7f0d0005dc30 con 0x7f0d140ff440
2025-05-12T08:56:36.923 INFO:teuthology.orchestra.run.smithi082.stderr:Traceback (most recent call last):
2025-05-12T08:56:36.923 INFO:teuthology.orchestra.run.smithi082.stderr:  File "/usr/bin/ceph", line 1327, in <module>
2025-05-12T08:56:36.923 INFO:teuthology.orchestra.run.smithi082.stderr:    retval = main()
2025-05-12T08:56:36.923 INFO:teuthology.orchestra.run.smithi082.stderr:  File "/usr/bin/ceph", line 1247, in main
2025-05-12T08:56:36.923 INFO:teuthology.orchestra.run.smithi082.stderr:    sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli')
2025-05-12T08:56:36.924 INFO:teuthology.orchestra.run.smithi082.stderr:  File "/usr/lib/python3.9/site-packages/ceph_argparse.py", line 1004, in parse_json_funcsigs
2025-05-12T08:56:36.924 INFO:teuthology.orchestra.run.smithi082.stderr:    cmd['sig'] = parse_funcsig(cmd['sig'])
2025-05-12T08:56:36.924 INFO:teuthology.orchestra.run.smithi082.stderr:  File "/usr/lib/python3.9/site-packages/ceph_argparse.py", line 948, in parse_funcsig
2025-05-12T08:56:36.924 INFO:teuthology.orchestra.run.smithi082.stderr:    newsig.append(argdesc(t,
2025-05-12T08:56:36.924 INFO:teuthology.orchestra.run.smithi082.stderr:  File "/usr/lib/python3.9/site-packages/ceph_argparse.py", line 803, in __init__
2025-05-12T08:56:36.924 INFO:teuthology.orchestra.run.smithi082.stderr:    self.instance = self.t(**self.typeargs)
2025-05-12T08:56:36.924 INFO:teuthology.orchestra.run.smithi082.stderr:TypeError: __init__() got an unexpected keyword argument 'allowempty'
2025-05-12T08:56:38.351 DEBUG:teuthology.orchestra.run.smithi082:> sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:reef shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 69096072-2f0d-11f0-a0c9-31e
8a22acf7e -e sha1=e0d253341dc86cddb888d7b0ea9782af334ba45a -- bash -c 'ceph versions | jq -e '"'"'.mgr | length == 1'"'"''

I proposed to revert this change as @VallariAg too mentioned that issues have been seen elsewhere.

vshankar added a commit that referenced this pull request May 14, 2025
PR #53989 is causing failures in
fs:upgrade. Also, @VallariAg reported an issue with something
similar. I don't think adequate tests were run to qualify the PR
as mergeable. Reverting the change for now.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
@VallariAg
Copy link
Member

@vshankar, thank you for looking into this and reverting the change! Looks like it was also caught by rook team: rook/rook#15844

@rishabh-d-dave
Copy link
Contributor

rishabh-d-dave commented May 14, 2025

I am confused as in what tests this change went through since https://tracker.ceph.com/issues/71020 points to a non-existent teuthology run. It's unlikely that ptl-tool messed up the run link while updating the redmine ticket.

The tracker was very slow few days ago, and due to it I remember running into trouble while running backport scripts. Likely same thing happened with that QA ticket as well. Will look into it.

@vshankar
Copy link
Contributor

I am confused as in what tests this change went through since https://tracker.ceph.com/issues/71020 points to a non-existent teuthology run. It's unlikely that ptl-tool messed up the run link while updating the redmine ticket.

The tracker was very slow few days ago, and due to it I remember running into trouble while running backport scripts. Likely same thing happened with that QA ticket as well. Will look into it.

I do not understand how a slowness in redmine can cause non-existent build and run.

@rishabh-d-dave
Copy link
Contributor

I am confused as in what tests this change went through since https://tracker.ceph.com/issues/71020 points to a non-existent teuthology run. It's unlikely that ptl-tool messed up the run link while updating the redmine ticket.

The tracker was very slow few days ago, and due to it I remember running into trouble while running backport scripts. Likely same thing happened with that QA ticket as well. Will look into it.

I do not understand how a slowness in redmine can cause non-existent build and run.

Perhaps the part of the script that was trying to update the ticket timed out since redmine tracker was too slow.

@vshankar
Copy link
Contributor

I am confused as in what tests this change went through since https://tracker.ceph.com/issues/71020 points to a non-existent teuthology run. It's unlikely that ptl-tool messed up the run link while updating the redmine ticket.

The tracker was very slow few days ago, and due to it I remember running into trouble while running backport scripts. Likely same thing happened with that QA ticket as well. Will look into it.

I do not understand how a slowness in redmine can cause non-existent build and run.

Perhaps the part of the script that was trying to update the ticket timed out since redmine tracker was too slow.

Even if it timed out, the branch didn't build and then how were you able to schedule the suite?

@rishabh-d-dave
Copy link
Contributor

Even if it timed out, the branch didn't build and then how were you able to schedule the suite?

Tracker was down but Sepia lab/build infra is different and it wasn't down.

aainscow pushed a commit to aainscow/ceph that referenced this pull request Jun 10, 2025
PR ceph#53989 is causing failures in
fs:upgrade. Also, @VallariAg reported an issue with something
similar. I don't think adequate tests were run to qualify the PR
as mergeable. Reverting the change for now.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cephfs Ceph File System documentation passed-qa for PRs that have passed QA but there's an on-going conversation, so merging would be premature pybind tests wip-rishabh-testing2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants