ceph-volume: major batch refactor #34740

Merged
jan--f merged 31 commits into ceph:master from jan--f:c-v-refactor-batch-use-create
Sep 30, 2020

Conversation


@jan--f jan--f commented Apr 24, 2020

This is a major rewrite of batch that moves it onto the create/prepare code path and significantly simplifies the subcommand's code. At the same time it adds support for VG re-use, idempotency, using existing LVs with batch, and additional implicit sizing arguments.

This also closes a number of bugs and unblocks other work in cephadm and rook.
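The idempotency claim can be pictured with a tiny sketch (made-up names and a dict-based device model, not ceph-volume's actual API): devices already consumed by ceph drop out of the plan, so re-running the same batch invocation deploys nothing new.

```python
# Minimal illustration of idempotent batch planning. The dict shape and the
# "used_by_ceph" key are hypothetical stand-ins, not ceph-volume's real model.
def plan_batch(devices):
    """Return only the devices that still need an OSD deployed."""
    return [d for d in devices if not d.get("used_by_ceph", False)]

fresh = [{"path": "/dev/sdb"}, {"path": "/dev/sdc"}]
plan = plan_batch(fresh)  # first run: both devices get planned
redo = plan_batch([dict(d, used_by_ceph=True) for d in plan])  # second run: empty
```

Run twice over the same disks, the second plan is empty, which is the property the description above refers to.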

Fixes: https://tracker.ceph.com/issues/24969
Fixes: https://tracker.ceph.com/issues/36242
Fixes: https://tracker.ceph.com/issues/36283
Fixes: https://tracker.ceph.com/issues/37502
Fixes: https://tracker.ceph.com/issues/37590
Fixes: https://tracker.ceph.com/issues/38168
Fixes: https://tracker.ceph.com/issues/42412
Fixes: https://tracker.ceph.com/issues/43899
Fixes: https://tracker.ceph.com/issues/44749
Fixes: https://tracker.ceph.com/issues/44783
Fixes: https://tracker.ceph.com/issues/44951
Fixes: https://tracker.ceph.com/issues/46031
Fixes: https://tracker.ceph.com/issues/46033

Signed-off-by: Jan Fajerski <jfajerski@suse.com>

Some things are still TODO:

  • Add auto functionality by sorting passed drives
  • Tidy code
  • Tidy commits
  • Tidy tests
  • Add new unit tests
  • Documentation

@tchaikov

jenkins test docs

@jan--f jan--f force-pushed the c-v-refactor-batch-use-create branch from cbbd9f8 to 7f0f8c0 Compare April 27, 2020 09:53
@jan--f jan--f force-pushed the c-v-refactor-batch-use-create branch from 7f0f8c0 to e523d64 Compare April 27, 2020 10:29
@jan--f jan--f force-pushed the c-v-refactor-batch-use-create branch 3 times, most recently from f79fabc to 63b0e8a Compare May 21, 2020 11:41
@jan--f jan--f force-pushed the c-v-refactor-batch-use-create branch from 63b0e8a to a1b8cac Compare June 9, 2020 16:03
jschmid1 pushed a commit to SUSE/DeepSea that referenced this pull request Jun 15, 2020
Signed-off-by: Joshua Schmid <jschmid@suse.de>
@jan--f jan--f force-pushed the c-v-refactor-batch-use-create branch 2 times, most recently from b7855f0 to de8f034 Compare June 18, 2020 09:09
@jan--f jan--f force-pushed the c-v-refactor-batch-use-create branch from e86004a to 6d746d6 Compare June 23, 2020 15:06
@jan--f jan--f changed the title [WIP] ceph-volume: major batch refactor ceph-volume: major batch refactor Jun 23, 2020
@jan--f jan--f marked this pull request as ready for review June 23, 2020 15:06
@jan--f jan--f requested a review from a team as a code owner June 23, 2020 15:06
@jan--f jan--f force-pushed the c-v-refactor-batch-use-create branch 3 times, most recently from f6ce9f9 to 6ba8195 Compare June 23, 2020 15:13

jan--f commented Jun 23, 2020

jenkins test ceph-volume tox


jan--f commented Jun 23, 2020

jenkins test ceph-volume all

@jan--f jan--f force-pushed the c-v-refactor-batch-use-create branch 5 times, most recently from 38d109c to 9920e6f Compare June 24, 2020 11:31
@jan--f jan--f removed the DNM label Jun 24, 2020

jan--f commented Sep 29, 2020

Doc render available at http://docs.ceph.com/ceph-prs/34740/

it just throws a 404 error 🤷‍♂️

I suppose that is a symptom of the switch-over to readthedocs.


rishabh-d-dave commented Sep 29, 2020

I suppose that is a symptom of the switch-over to readthedocs.

Yep. I triggered the render a second time just to confirm that.

@rishabh-d-dave rishabh-d-dave left a comment

Looks good except for two things; first, the PR description says that it fixes a dozen tracker issues but the commit messages only link one tracker issue and, second, the lvm/batch doc doesn't have a sample output.


guits commented Sep 29, 2020

Looks good except for two things; first, the PR description says that it fixes a dozen tracker issues but the commit messages only link one tracker issue and, second, the lvm/batch doc doesn't have a sample output.

indeed, it could be nice to know which commit fixes which issue.

@rishabh-d-dave

That would also help a lot with backporting.

@smithfarm

That would also help a lot with backporting.

My working assumption is that this PR will get backported as a whole to both octopus and nautilus. @jan--f Am I far from the truth?

If that's correct, and if @jan--f is going to shoulder the backporting responsibility, then I don't think we need to insist on knowing exactly which commit(s) fix which tracker issue(s).


guits commented Sep 29, 2020

That would also help a lot with backporting.

My working assumption is that this PR will get backported as a whole to both octopus and nautilus. @jan--f Am I far from the truth?

If that's correct, and if @jan--f is going to shoulder the backporting responsibility, then I don't think we need to insist on knowing exactly which commit(s) fix which tracker issue(s).

This is not only about backporting concerns.
IMO, when you debug or troubleshoot and have to revisit the code months and months later, it's very nice to be able to link a commit to the corresponding tracker so you get more context.

I would say that, generally, having no corresponding tracker link (if there is one...) and no commit message at all can be a nightmare in that kind of situation.

Just saying...


dsavineau commented Sep 29, 2020

FYI the ceph-volume CI jobs via ceph-ansible are useless since the PR changes aren't included. (because those changes could break ceph-ansible)


jan--f commented Sep 29, 2020

FYI the ceph-volume CI jobs via ceph-ansible are useless since the PR changes aren't included. (because those changes could break ceph-ansible)

Can you elaborate please? I do think the changes are tested. ceph-ansible did receive patches to handle the changed output format a while ago.

@dsavineau

FYI the ceph-volume CI jobs via ceph-ansible are useless since the PR changes aren't included. (because those changes could break ceph-ansible)

Can you elaborate please? I do think the changes are tested. ceph-ansible did receive patches to handle the changed output format a while ago.

You're currently using ceph-ansible@master, which uses ceph@master, so it picks up the latest shaman build from the master branch but not your PR.
Instead you should have your own build via a wip-xxx label and use it via the ceph_dev_branch variable.

https://github.com/ceph/ceph-ansible/blob/master/group_vars/all.yml.sample#L199-L204


jan--f commented Sep 29, 2020

This is not only about backporting concerns.
IMO, when you debug or troubleshoot and have to revisit the code months and months later, it's very nice to be able to link a commit to the corresponding tracker so you get more context.

I would say that, generally, having no corresponding tracker link (if there is one...) and no commit message at all can be a nightmare in that kind of situation.

Just saying...

The majority of the Fixes would have to be added to b0b7973. I'm not entirely sure how helpful that would be, but I can add those links to that commit. We should already be able to trace a change back to this PR: get the sha from git blame, search for it in the git log or on GitHub, and determine the PR from the branch name (GitHub also shows it directly).

I was going to agree with @smithfarm...


jan--f commented Sep 29, 2020

You're currently using ceph-ansible@master, which uses ceph@master, so it picks up the latest shaman build from the master branch but not your PR.
Instead you should have your own build via a wip-xxx label and use it via the ceph_dev_branch variable.

https://github.com/ceph/ceph-ansible/blob/master/group_vars/all.yml.sample#L199-L204

The functional testing setup rsyncs the ceph-volume code to the VMs (see https://github.com/ceph/ceph/blob/master/src/ceph-volume/ceph_volume/tests/functional/playbooks/deploy.yml). It's all rather convoluted, but the PR code is actually run. Search for DEPRECATION NOTICE in a jenkins log, e.g. https://jenkins.ceph.com/job/ceph-volume-prs-batch-centos8-filestore-mixed_type_explicit/98/consoleFull

Those notices were added in this PR.


jan--f commented Sep 29, 2020

second, the lvm/batch doc doesn't have a sample output.

Thx, good catch. Will push those asap.

@dsavineau

You're currently using ceph-ansible@master, which uses ceph@master, so it picks up the latest shaman build from the master branch but not your PR.
Instead you should have your own build via a wip-xxx label and use it via the ceph_dev_branch variable.
https://github.com/ceph/ceph-ansible/blob/master/group_vars/all.yml.sample#L199-L204

The functional testing setup rsyncs the ceph-volume code to the VMs (see https://github.com/ceph/ceph/blob/master/src/ceph-volume/ceph_volume/tests/functional/playbooks/deploy.yml). It's all rather convoluted, but the PR code is actually run. Search for DEPRECATION NOTICE in a jenkins log, e.g. https://jenkins.ceph.com/job/ceph-volume-prs-batch-centos8-filestore-mixed_type_explicit/98/consoleFull

Those notices were added in this PR.

ack, I wasn't aware of the rsync of the ceph-volume source.

Sorry for the noise.


guits commented Sep 29, 2020

This is not only about backporting concerns.
IMO, when you debug or troubleshoot and have to revisit the code months and months later, it's very nice to be able to link a commit to the corresponding tracker so you get more context.
I would say that, generally, having no corresponding tracker link (if there is one...) and no commit message at all can be a nightmare in that kind of situation.
Just saying...

The majority of the Fixes would have to be added to b0b7973. I'm not entirely sure how helpful that would be.

yes, in that case maybe it's not really relevant, I was more speaking about how I think it should be from a more general point of view.

Jan Fajerski added 5 commits September 29, 2020 22:37
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
Some devices we never want to pass to the batch subcommand. For now this
includes devices that have a partition or are mounted on the machine.
One goal is to filter out the root device, so it is not included in a batch
command and does not contribute to the implicit sizing calculation.

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
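The rule this commit describes could look roughly like the following sketch (the dict keys are illustrative assumptions, not the actual ceph-volume Device API):

```python
def usable_for_batch(dev):
    """Reject devices that carry partitions or are mounted (sketch only).

    Mounted devices include the root disk, which must not contribute to
    batch's implicit sizing calculation.
    """
    if dev.get("partitions"):   # device already has partitions on it
        return False
    if dev.get("mount_point"):  # device is mounted somewhere, e.g. "/"
        return False
    return True
```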
This enables users to pass sizes like "10G", which batch now understands.

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
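Suffix handling of this sort can be sketched as follows (a simplified stand-in for illustration, not the actual ceph-volume parser):

```python
# Illustrative size-suffix parser; ceph-volume's real implementation differs.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}

def parse_size(text):
    """Convert strings like '10G' or '512M' into a byte count."""
    text = text.strip().upper()
    if text.endswith("B"):          # tolerate '10GB' as well as '10G'
        text = text[:-1]
    if text and text[-1] in _UNITS:
        return int(float(text[:-1]) * _UNITS[text[-1]])
    return int(text)                # plain byte count, no suffix
```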
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
Signed-off-by: Jan Fajerski <jfajerski@suse.com>

jan--f commented Sep 30, 2020

Alright, I added the missing output in the doc. One last round of tests and unless someone has objections I'll merge this.


jan--f commented Sep 30, 2020

jenkins test ceph-volume all


jan--f commented Sep 30, 2020

jenkins test ceph-volume tox


haslersn commented Mar 10, 2021

I just experienced the following error with docker.io/ceph/ceph:v15.2.9. It seems to be related to this PR.

2021-03-10 00:17:53.440791 D | exec: Traceback (most recent call last):
2021-03-10 00:17:53.440803 D | exec:   File "/usr/sbin/ceph-volume", line 11, in <module>
2021-03-10 00:17:53.440809 D | exec:     load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
2021-03-10 00:17:53.440817 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 40, in __init__
2021-03-10 00:17:53.440824 D | exec:     self.main(self.argv)
2021-03-10 00:17:53.440828 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
2021-03-10 00:17:53.440832 D | exec:     return f(*a, **kw)
2021-03-10 00:17:53.440836 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 152, in main
2021-03-10 00:17:53.440840 D | exec:     terminal.dispatch(self.mapper, subcommand_args)
2021-03-10 00:17:53.440844 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
2021-03-10 00:17:53.440849 D | exec:     instance.main()
2021-03-10 00:17:53.440853 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py", line 42, in main
2021-03-10 00:17:53.440857 D | exec:     terminal.dispatch(self.mapper, self.argv)
2021-03-10 00:17:53.440861 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
2021-03-10 00:17:53.440865 D | exec:     instance.main()
2021-03-10 00:17:53.440869 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
2021-03-10 00:17:53.440872 D | exec:     return func(*a, **kw)
2021-03-10 00:17:53.440876 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 402, in main
2021-03-10 00:17:53.440879 D | exec:     plan = self.get_plan(self.args)
2021-03-10 00:17:53.440883 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 440, in get_plan
2021-03-10 00:17:53.440888 D | exec:     args.wal_devices)
2021-03-10 00:17:53.440895 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 472, in get_deployment_layout
2021-03-10 00:17:53.440901 D | exec:     fast_type)
2021-03-10 00:17:53.440910 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 509, in fast_allocations
2021-03-10 00:17:53.440914 D | exec:     ret.extend(get_lvm_fast_allocs(lvm_devs))
2021-03-10 00:17:53.440918 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 146, in get_lvm_fast_allocs
2021-03-10 00:17:53.440922 D | exec:     disk.Size(b=int(d.lvs[0].lv_size)), 1) for d in lvs if not
2021-03-10 00:17:53.440926 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 147, in <listcomp>
2021-03-10 00:17:53.440930 D | exec:     d.used_by_ceph]
2021-03-10 00:17:53.440933 D | exec: IndexError: list index out of range
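The list comprehension at batch.py:146-147 indexes d.lvs[0] unconditionally, so a device whose lvs list is empty triggers exactly this IndexError. A defensive variant might look like this (a sketch using dicts and the names from the traceback, not the actual fix that landed):

```python
def get_lvm_fast_allocs(lvs):
    # Guard on d["lvs"] being non-empty before indexing [0], instead of
    # crashing on devices that carry no logical volumes at all.
    return [(d["path"], int(d["lvs"][0]["lv_size"]))
            for d in lvs
            if d["lvs"] and not d["used_by_ceph"]]
```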


guits commented Mar 10, 2021

I just experienced the following error with docker.io/ceph/ceph:v15.2.9. It seems to be related to this PR.

[…]

@haslersn could you paste the exact command you tried to run?
