
Extend batch to accept explicit device lists#25542

Merged
andrewschoen merged 12 commits intoceph:masterfrom
jan--f:c-v-extend-batch
Jan 30, 2019

Conversation

@jan--f
Contributor

@jan--f jan--f commented Dec 13, 2018

This is the first PR to address parts of http://tracker.ceph.com/issues/37086.

On a high level this adds the ability for batch to accept explicit device lists for data and db/journal devices.

The RFC here is mostly for the argument parsing. For this PR I added 3 new subcommands to batch: auto to keep the existing functionality, and bluestore and filestore. Other implementations are possible, for example:

  1. Adding the new functionality under its own subcommand instead of batch.
  2. Adding the new functionality as flags (though I think this is messy, as there would be a long list of arguments).
  3. Keeping the bluestore and filestore subcommands and moving the existing functionality behind a flag, e.g. --auto.

Imho the implementation here is the cleanest, as it allows for proper separation between non-overlapping argument groups (i.e. not having mutually exclusive arguments under the same subcommand) and keeps the existing functionality untouched (besides moving it to a subcommand). Ideally the subcommands would be optional and we could keep the existing invocation as is, but one cannot have an optional positional argument (the subcommands) together with a mandatory positional argument list (the existing DEVICES arguments); working around that would require a not-so-pretty hack in the argument parsing (peeking at the positionals and deciding what the appropriate action is).
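For illustration, a minimal, hypothetical argparse sketch (not ceph-volume's actual parser) of the conflict described above: once subcommands exist, the legacy bare device list can no longer be parsed, because argparse reads the first positional as a subcommand name.

```python
import argparse

# Hypothetical sketch, not ceph-volume's real parser: subcommands plus a
# positional device list, as proposed in this PR.
parser = argparse.ArgumentParser(prog='batch')
sub = parser.add_subparsers(dest='subcommand')
for name in ('auto', 'bluestore', 'filestore'):
    sub.add_parser(name).add_argument('devices', nargs='*')

# With an explicit subcommand, parsing works as expected:
args = parser.parse_args(['bluestore', '/dev/sda', '/dev/sdb'])
assert args.subcommand == 'bluestore'
assert args.devices == ['/dev/sda', '/dev/sdb']

# The legacy call style has no subcommand; argparse treats '/dev/sda' as
# an (invalid) subcommand name and bails out with a usage error:
try:
    parser.parse_args(['/dev/sda', '/dev/sdb'])
except SystemExit:
    print('legacy device-list invocation rejected')
```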

To fully address http://tracker.ceph.com/issues/37086 a few more PRs are necessary:

  1. Add an option to explicitly deploy bluestore on 3 devices.
  2. Add the arguments related to [db-wal]-slots on shared devices.
  3. Add the replace-OSDs functionality.

I think 1) is neither very hard (as it is new functionality) nor terribly urgent. I started on 2), but it requires a bit of an overhaul of how c-v figures out the appropriate sizes for logical volumes; two tracker tickets cover part of that (http://tracker.ceph.com/issues/37590 and http://tracker.ceph.com/issues/37502). I have not yet looked into 3), but since ceph-volume lvm create can do that already, most of the functionality should be in place.

Once there is consensus over the argument parsing I'll clean up the wip commit and add functional testing for the new functionality.

Contributor

@alfredodeza alfredodeza left a comment


We would like to see grouping happening with the same subcommand as before, without adding three more subcommands which don't really do anything special for grouping.

In ceph-volume lvm zap we've recently added the ability to accept devices both as positional arguments and as part of flags.

This approach could still be used here, something like:

ceph-volume lvm batch --db-devices=/dev/sda --block-devices=/dev/sdb

Which would keep the same API as before, and would need much less refactoring.
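A hedged sketch of that flag-based shape (flag names taken from the comment above; defaults and `--wal-devices` are illustrative assumptions, not the merged implementation):

```python
import argparse

# Positional data devices plus device-list flags, mirroring the zap-style
# API suggested above. Names and defaults are illustrative assumptions.
parser = argparse.ArgumentParser(prog='ceph-volume lvm batch')
parser.add_argument('devices', nargs='*', default=[])
parser.add_argument('--db-devices', nargs='*', default=[])
parser.add_argument('--wal-devices', nargs='*', default=[])

args = parser.parse_args(
    ['/dev/sdb', '/dev/sdc', '--db-devices', '/dev/nvme0n1'])
assert args.devices == ['/dev/sdb', '/dev/sdc']
assert args.db_devices == ['/dev/nvme0n1']
assert args.wal_devices == []
```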

_help = dedent("""
Automatically size devices ready for OSD provisioning based on default strategies.
Automatically size devices ready for OSD provisioning.
The 'auto' subcommand will automatically determine the strategy (based on
Contributor


I don't understand why another (3rd level) sub command is needed

Contributor Author


As mentioned above

one cannot have an optional positional argument (the subcommands) and a mandatory positional argument list (the existing DEVICES arguments)

Contributor


This is certainly not needed unless you are objecting to anything besides a 3rd level sub-command implementation. We already have this type of arrangement for zap (positional arguments for devices as well as flags defining devices)

Contributor Author


As mentioned in other places this RFC is mostly about the argparsing. I'm absolutely not married to the subcommand structure, I just think this is the clearest approach.

My worry about adding a bunch of new flags is that batch's argument list will grow rather long and confusing. I'll prepare a commit with this style of argparsing to compare.

The 'auto' subcommand will automatically determine the strategy (based on
detection of the drives' rotational flags).

The 'bluestore' and 'filestore' subcommands allow specifying lists of
Contributor


why are these subcommands added vs. just using the existing --filestore and --bluestore flags?

Contributor Author


Because having the object store as a subcommand allows for proper separation of bluestore (or filestore) specific arguments. I.e. this allows the argument parser to reject, say, the use of --journal-size with bluestore, instead of adding code validation (or simply ignoring it).

Contributor


This is not how any other part of ceph-volume currently operates with both objectstores. Not that it means it can't change, but just to say that there is no need for extra validation to be done. In the case of lvm create for example, if --filestore is specified then any bluestore flag is ignored


ceph-volume lvm batch --report [DEVICE...]
ceph-volume lvm batch auto --report [DEVICE...]
ceph-volume lvm batch bluestore --report
Contributor


if using a subcommand like this one, a user has to understand that there are two types of objectstores now. This goes against a design principle in ceph-volume: we (currently) default to bluestore, and a user doesn't need to understand what an objectstore is or which one they need.

@jan--f
Contributor Author

jan--f commented Dec 14, 2018

We would like to see grouping happening with the same sub-command as before, without adding three more sub commands which don't really do anything special for grouping.

Yes, that is certainly a way to go. I'm just worried that this means batch will grow quite a long list of flags, some of which conflict. That'll require quite a bit of additional validation code to catch incompatible arguments.
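To illustrate the worry (a hypothetical sketch, not actual batch code): argparse happily accepts conflicting flags, so every incompatible pair needs a hand-written check.

```python
import argparse

# Hypothetical flag set; argparse has no notion that --journal-devices
# only makes sense together with --filestore.
parser = argparse.ArgumentParser(prog='batch')
parser.add_argument('--bluestore', action='store_true')
parser.add_argument('--filestore', action='store_true')
parser.add_argument('--journal-devices', nargs='*', default=[])

args = parser.parse_args(['--bluestore', '--journal-devices', '/dev/sdd'])

# The conflicting combination parses without complaint...
assert args.bluestore and args.journal_devices

# ...so it has to be rejected by explicit validation code:
if args.bluestore and args.journal_devices:
    print('error: --journal-devices cannot be used with --bluestore')
```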

@jan--f jan--f force-pushed the c-v-extend-batch branch 2 times, most recently from 26e59ae to 1e23caf Compare December 21, 2018 15:10
@jan--f
Contributor Author

jan--f commented Dec 21, 2018

@alfredodeza I pushed a version with flag based arg parsing. Note that this still misses a bunch of flags and some flag validation.

help='Devices to provision OSDs journal volumes',
)
parser.add_argument(
'--auto',
Contributor


why would one need to define --auto here instead of assuming automatic behavior if db/wal/journal aren't specified?

Contributor Author


Because without this it would be impossible to deploy standalone OSDs on a mixed group of drives (i.e. a bunch of spinners and ssd's).

Though to be fair --no-auto is probably the better choice here, will update.


if args.report:
self.report(args)
if (self.args.no_auto or self.args.db_devices or
Contributor


this could just use any

Suggested change:
-    if (self.args.no_auto or self.args.db_devices or
+    if any([self.args.no_auto, self.args.db_devices, self.args.journal_devices, self.args.wal_devices]):

Contributor Author


any makes sense for a list of 'truthy' values of unknown length. What's the benefit here that offsets creating a list purely to use any?

Contributor


It's a cosmetic change, but I also think that using any reads better here.
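Side by side, the two equivalent forms (attribute names taken from the snippet above; the Args stand-in is purely illustrative):

```python
class Args(object):
    """Stand-in for the parsed batch arguments (illustrative only)."""
    no_auto = False
    db_devices = ['/dev/nvme0n1']
    wal_devices = []
    journal_devices = []

args = Args()

# Chained boolean version:
chained = bool(args.no_auto or args.db_devices or
               args.journal_devices or args.wal_devices)

# any() version; it builds a throwaway list, but reads as one unit:
flat = any([args.no_auto, args.db_devices,
            args.journal_devices, args.wal_devices])

assert chained is True and flat is True
```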

self.strategy = strategies.bluestore.MixedType(
self.usable,
self.db_usable,
[],
Contributor


why is this an empty list if --wal-devices is allowed? shouldn't there be a condition here to accommodate the different options?

Contributor Author


--wal-devices currently only exists as a parameter for the RFC purpose. The bluestore strategy does not yet deploy on three devices. Depending on opinions I'm happy to either raise a NotImplementedError and provide an implementation in a separate PR, or remove the argument altogether for now.

[],
self.args)
else:
self.strategy = strategies.bluestore.SingleType(
Contributor


if this is the SingleType case, why is self.db_usable being passed in along with an empty list for wal devices? SingleType will only deploy OSDs with 1 device only, no db or wal

Contributor Author


Right, a copy-paste error. Amounts to the same code though, will fix this.

default=[],
help='Devices to provision OSDs',
)
parser.add_argument(
Contributor


why was all this moved into __init__ ? adding the parsing in the constructor makes it harder to test (there are no tests added with these changes)

Contributor Author


I think __init__ is the right place for argument parsing, since putting it into main requires running under root just to see the help text. As for testing: as I mentioned in a previous comment, I'll add tests once this is no longer RFC. As for testability: I'm not sure we need to test what argparse does..?

parser.add_argument(
'--no-auto',
action='store_true',
help=('deploy standalone OSDs if rotational and non-rotational drives'
Contributor


This is not going to hold true for filestore. Saying standalone also reads a bit odd; I don't think we refer to a bluestore OSD with only block (data) defined as standalone. Maybe it would be better to describe that this flag disables the automatic grouping of devices by checking for solid state vs. rotational?

Contributor Author


This is not going to hold true for filestore.

Why not? What if a user wants to deploy standalone (for lack of a better term) filestore OSDs on spinners and SSDs?

Saying standalone also reads a bit odd, I don't think we refer to a bluestore OSD with only block (data) defined as standalone. Maybe it would be better to describe that this flag would disable the automatic grouping of devices by checking solid and rotational?

I'm aware of the lack of standard terms here. I'm happy to make this a bit more verbose and clear.

Contributor


I'm confused on what the intention of the --no-auto flag is here? A user would need to pass --no-auto and something like --db-devices? If so, can't we just assume they don't want an auto behavior if those flags are passed?

Contributor Author


But what if a user passes a mix of HDDs and SSDs in the devices list but intends to only deploy standalone OSDs (OSDs occupying only one device)? Without --no-auto or similar this is not possible.

Contributor


If they have a mix of HDDs and SSDs then it'd use a MixedType strategy and would not collocate all services on the same device. So this --no-auto flag is an attempt to always make OSDs with everything collocated on the same device regardless of the type of device? If so, maybe naming it --collocated makes more sense? Collocated is the term ceph-ansible used to describe an OSD with either block.db or the journal on the same device as data.

Contributor Author


I'm happy to rename the option, but it is needed in any case.

Ideally we'd have some project wide term to use for the different OSD deployment modes.

class Strategy(object):

def __init__(self, devices, args):
def __init__(self, data_devs, db_devs, wal_devs, args):
Contributor


now that all strategies are inheriting from this, it makes it odd to have the same parent for both filestore and bluestore. Filestore doesn't have a db_dev or a wal_dev, but somehow the child classes for filestore use this, forcing the code to pick either db_devs or wal_devs.

Either these parent classes need to be split into bluestore and filestore or the whole sub-classing would have to go

Contributor Author


Python does not allow for ctor overloading, so I went with the most explicit version (which happens to be bluestore on three devices).

Splitting this into bluestore and filestore specific implementations will again add duplicate code. I took the pragmatic route. I think this can be resolved with better naming and a comment, like

def __init__(self, data_devs, db_or_journal_devs, wal_devs, args):
'''
Note that any filestore strategy will always pass an empty list for wal_devs
'''

Not sure what can be gained by going back on the subclassing?

Contributor


The proposed implementation is trying to save a few lines of code, but it adds incorrect definitions for devices (journals aren't present in bluestore, and filestore ends up using db_devs, which don't exist for it).

The gain here would be that it would be clear that filestore uses journals, not db. At the cost of some duplication. I would prefer clarity over Lines Of Code savings

Contributor Author


Just to be sure: I changed the name prior to your comment.

The saved lines of code actually stem from #25238. Not sure I'd call ~50 loc "a few lines". In any case the main goal was to remove duplicate implementations in four strategy constructors to adhere a bit more to a DRY approach.

I'm certainly happy to add some boilerplate code in order to make a name in an internal implementation less confusing.

def split_devices_rotational(devices):
data_devs = [device for device in devices if device.sys_api['rotational'] == '1']
db_devs = [device for device in devices if device.sys_api['rotational'] == '0']
return data_devs, db_devs
Contributor


what happens here for wal devs? why would this omit them

Contributor Author


See comment above.
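For reference, the helper quoted above can be exercised standalone with a minimal Device stand-in (the stand-in class is an assumption; ceph-volume's real Device carries much more state):

```python
class Device(object):
    """Minimal stand-in for ceph_volume.util.device.Device."""
    def __init__(self, path, rotational):
        self.path = path
        self.sys_api = {'rotational': rotational}

def split_devices_rotational(devices):
    # Spinners become data devices, solid state devices become db devices;
    # wal devices are deliberately not produced here (see the thread above).
    data_devs = [device for device in devices if device.sys_api['rotational'] == '1']
    db_devs = [device for device in devices if device.sys_api['rotational'] == '0']
    return data_devs, db_devs

devs = [Device('/dev/sda', '1'), Device('/dev/sdb', '1'), Device('/dev/nvme0n1', '0')]
data_devs, db_devs = split_devices_rotational(devs)
assert [d.path for d in data_devs] == ['/dev/sda', '/dev/sdb']
assert [d.path for d in db_devs] == ['/dev/nvme0n1']
```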

self.validate_compute()

@classmethod
def with_auto_devices(cls, devices, args):
Contributor


this seems to have been added for test purposes only, was it not possible to declare the auto/non-auto behavior via an argument? Maybe more difficult now that the args moved to init

Contributor Author


This method is used here:

return strategy.with_auto_devices(unused_devices, self.args)

was it not possible to declare the auto/non-auto behavior via an argument? Maybe more difficult now that the args moved to init

Not sure what you mean here? Using auto or no-auto is of course decided by the arguments. When auto is desired the drives still have to be split at some point.

Contributor


I think the question is why do we need this class method when the class should be able to figure out if it needs the auto behavior or not by inspecting self.args?

Contributor Author


In the existing code batch inspects args and figures out the strategy, I just kept with that. That could certainly be implemented differently, though I'd argue to find consensus on this feature and refactor things in a later PR.




total_lv_size=str(self.total_available_journal_space),
total_lvs=self.journals_needed,
block_db_devices=', '.join([d.path for d in self.ssds]),
block_db_devices=', '.join([d.path for d in self.db_or_journal_devs]),
Contributor


Should this be journal_devices instead of block_db_devices?

Contributor Author


self.computed = {'osds': [], 'vgs': [], 'filtered_devices': args.filtered_devices}
self.devices = data_devs + wal_devs + db_or_journal_devs
self.data_devs = data_devs
self.db_or_journal_devs = db_or_journal_devs
Contributor


Instead of db_or_journal_devs can we have explicit db_devs and journal_devs properties? I think this will make the code easier to understand in the long run.
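One hedged way to get explicit names without duplicating the constructor (a sketch only; the attribute and property names here are suggestions, not the merged code):

```python
class Strategy(object):
    def __init__(self, data_devs, db_or_journal_devs, wal_devs, args):
        # Filestore strategies always pass [] for wal_devs.
        self.data_devs = data_devs
        self._fast_devs = db_or_journal_devs
        self.wal_devs = wal_devs
        self.args = args

    @property
    def db_devs(self):
        # Meaningful for bluestore strategies.
        return self._fast_devs

    @property
    def journal_devs(self):
        # Meaningful for filestore strategies.
        return self._fast_devs

s = Strategy(['/dev/sda'], ['/dev/nvme0n1'], [], args=None)
assert s.db_devs == s.journal_devs == ['/dev/nvme0n1']
```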


@jan--f
Contributor Author

jan--f commented Jan 14, 2019

@alfredodeza @andrewschoen Can I assume this is generally ready to be dressed up for a merge? Are there any fundamental concerns about the arg parsing or something similar? Tests (functional and unit) are on their way.

@andrewschoen
Contributor

@jan--f I've been wondering if what you're trying to accomplish here might be better suited to lvm create instead of lvm batch. I believe that lvm create already has all the flags you might want (except maybe --osds-per-device). It'd just be a matter of using some of the lv creation utilities from batch, so that every flag in lvm create can accept a raw device and then create or reuse vgs on it for lv creation. How would you feel about exploring this direction instead?

This would also solve problems for others who want the flexibility of lvm create with the lv and vg creation of lvm batch. The lvm batch command was really originally designed to be strict anyway and not flexible.

@jan--f
Contributor Author

jan--f commented Jan 16, 2019

@andrewschoen I'm not sure I understand. This PR intends to add the option to explicitly specify drives for the batch creation of OSDs.
Can you explain a bit more how this functionality is better suited to be implemented in lvm create?

@andrewschoen
Contributor

@jan--f ok, so you do need many OSDs created with one call? In that case lvm create will not work. My thought was that if you're only needing to spin up one OSD at a time, lvm create already allows you to specify drives for each component of an OSD. All it's missing is the ability to pass in raw devices for each flag and have the vgs/lvs created automatically.

@andrewschoen
Contributor

jenkins test ceph-volume tox

@andrewschoen
Contributor

jenkins test ceph-volume batch all

@andrewschoen
Contributor

@jan--f I think that you're right that dressing this up and writing tests is the next step. Once we've got test coverage on the new work and the existing tests passing then we can move forward with this.

@jan--f jan--f changed the title [DNM][RFC] Extend batch to accept explicit device lists Extend batch to accept explicit device lists Jan 25, 2019
@andrewschoen
Contributor

@jan--f Some of those do look like jenkins troubles, but one of the idempotency checks did find what looks to be a valid failure:

TASK [ensure batch create is idempotent] ***************************************
task path: /home/jenkins-build/build/workspace/ceph-volume-prs-batch-centos7-bluestore-mixed_type_dmcrypt_explicit/src/ceph-volume/ceph_volume/tests/functional/batch/centos7/bluestore/mixed-type-dmcrypt-explicit/test.yml:37
changed: [osd0] => {
    "changed": true, 
    "cmd": [
        "ceph-volume", 
        "--cluster", 
        "test", 
        "lvm", 
        "batch", 
        "--yes", 
        "--bluestore", 
        "--dmcrypt", 
        "/dev/sdb", 
        "/dev/sdc", 
        "--db-devices", 
        "/dev/nvme0n1", 
        "/dev/nvme1n1"
    ], 
    "delta": "0:00:01.078254", 
    "end": "2019-01-25 15:36:34.372398", 
    "failed_when_result": false, 
    "rc": 1, 
    "start": "2019-01-25 15:36:33.294144"
}

STDERR:

Traceback (most recent call last):
  File "/sbin/ceph-volume", line 9, in <module>
    load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
  File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 38, in __init__
    self.main(self.argv)
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 148, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in dispatch
    instance.main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/main.py", line 40, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in dispatch
    instance.main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/batch.py", line 316, in main
    self._get_explicit_strategy()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/batch.py", line 334, in _get_explicit_strategy
    self.wal_usable)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/strategies/bluestore.py", line 138, in __init__
    self.validate_compute()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/strategies/strategies.py", line 29, in validate_compute
    self.validate()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/strategies/bluestore.py", line 409, in validate
    self._validate_db_devs()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/strategies/bluestore.py", line 466, in _validate_db_devs
    self.block_db_size = self.total_available_db_space / self.dbs_needed
  File "/usr/lib/python2.7/site-packages/ceph_volume/util/disk.py", line 649, in __div__
    _b = self._b / other
ZeroDivisionError: float division by zero


MSG:

non-zero return code

From: https://jenkins.ceph.com/job/ceph-volume-prs-batch-centos7-bluestore-mixed_type_dmcrypt_explicit/3/console

@andrewschoen
Contributor

jenkins test ceph-volume tox

@jan--f
Contributor Author

jan--f commented Jan 25, 2019

@andrewschoen indeed, I see this one locally as well:

 stderr: WARNING: Not using lvmetad because duplicate PVs were found.
  WARNING: Use multipath or vgimportclone to resolve duplicate PVs?
  WARNING: After duplicates are resolved, run "pvscan --cache" to enable lvmetad.
 stderr: WARNING: Not using device /dev/nvme0n1 for PV 11dvnc-qJsW-0TQF-6fu6-ROT9-k0n8-WcSPiA.
  WARNING: Not using device /dev/nvme1n1 for PV 6zGo16-pXNk-T4tN-HqKH-yaZJ-o3fc-Z8Irb6.
  WARNING: PV 11dvnc-qJsW-0TQF-6fu6-ROT9-k0n8-WcSPiA prefers device /dev/loop0 because device is used by LV.
  WARNING: PV 6zGo16-pXNk-T4tN-HqKH-yaZJ-o3fc-Z8Irb6 prefers device /dev/loop1 because device is used by LV.
 stdout: Logical volume "osd-block-718de379-9e57-4795-b3db-1bf413c37b5b" created.
Running command: /usr/sbin/lvcreate --yes -l 10 -n osd-block-db-41ba0fdf-95a8-4435-936f-70bce3f18702 ceph-block-dbs-1295c4bb-9d2f-4596-86ff-54b16dac12e1
 stderr: WARNING: Not using lvmetad because duplicate PVs were found.
  WARNING: Use multipath or vgimportclone to resolve duplicate PVs?
  WARNING: After duplicates are resolved, run "pvscan --cache" to enable lvmetad.
 stderr: WARNING: Not using device /dev/nvme0n1 for PV 11dvnc-qJsW-0TQF-6fu6-ROT9-k0n8-WcSPiA.
  WARNING: Not using device /dev/nvme1n1 for PV 6zGo16-pXNk-T4tN-HqKH-yaZJ-o3fc-Z8Irb6.
  WARNING: PV 11dvnc-qJsW-0TQF-6fu6-ROT9-k0n8-WcSPiA prefers device /dev/loop0 because device is used by LV.
  WARNING: PV 6zGo16-pXNk-T4tN-HqKH-yaZJ-o3fc-Z8Irb6 prefers device /dev/loop1 because device is used by LV.
 stdout: Logical volume "osd-block-db-41ba0fdf-95a8-4435-936f-70bce3f18702" created.

So it creates the LVs on loop0 instead of nvme0n1. Seems like an LVM config issue?

In any case the behaviour (divide by zero) could be caught by the validation (no data devices but some db-devices). Will add a check.
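A minimal sketch of such a check (function and variable names are modeled on the traceback above; the actual fix may look different):

```python
def block_db_size(total_available_db_space, dbs_needed):
    # Guard the division from the traceback above: if no usable data
    # devices remain, dbs_needed is 0 and the division must not happen.
    if not dbs_needed:
        raise RuntimeError(
            'No data devices available to deploy OSDs, '
            'but db devices were passed')
    return total_available_db_space / dbs_needed

assert block_db_size(100.0, 4) == 25.0
try:
    block_db_size(100.0, 0)
except RuntimeError:
    print('validation rejects db devices without data devices')
```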

@jan--f
Contributor Author

jan--f commented Jan 26, 2019

@andrewschoen I added a test and fix for this scenario.

@jan--f jan--f force-pushed the c-v-extend-batch branch 2 times, most recently from 25437e1 to c58fe2f Compare January 26, 2019 15:28
@ceph ceph deleted a comment from tspmelo Jan 26, 2019
@jan--f jan--f force-pushed the c-v-extend-batch branch 2 times, most recently from ca0d416 to 81061d7 Compare January 27, 2019 13:08
@jan--f
Contributor Author

jan--f commented Jan 27, 2019

jenkins test ceph-volume batch all

Copy link
Contributor

@andrewschoen andrewschoen left a comment


This is getting really close. Thanks for working on the functional testing!

I've only found one small testing related concern.

- batch_cmd.rc != 0
- "'strategy changed' not in batch_cmd.stdout"

- name: run batch --report to see if devices get filtered
Contributor


The command here is missing the slice to only use HDDs for the data devices.

It needs to be:

{{ devices[:2] | join(' ') }} --db-devices {{ devices[2:] | join(' ') }}"

This test needs to ensure that running the same command again with --report will show changed: False in the json output. I've noticed that we don't check that json output in the following task, which might be good to add here as well. I realize the rest of the tests don't do that, which was an oversight for sure.

Contributor Author


Yes, the slice is indeed missing; fix coming up. Should we deal with the additional check in a separate PR?
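For clarity, the Jinja slice from the comment above mirrored in plain Python (the contents of the devices list are illustrative):

```python
# devices[:2] -> data devices (HDDs), devices[2:] -> db devices (NVMe),
# mirroring:
#   {{ devices[:2] | join(' ') }} --db-devices {{ devices[2:] | join(' ') }}
devices = ['/dev/sdb', '/dev/sdc', '/dev/nvme0n1', '/dev/nvme1n1']
cmd = '{} --db-devices {}'.format(' '.join(devices[:2]),
                                  ' '.join(devices[2:]))
assert cmd == '/dev/sdb /dev/sdc --db-devices /dev/nvme0n1 /dev/nvme1n1'
```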

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
@jan--f
Contributor Author

jan--f commented Jan 29, 2019

anything still blocking this @andrewschoen @alfredodeza ?

@andrewschoen
Contributor

@jan--f it looks like there are flake8 failures from the tox tests. If we can get that resolved I think we're good to go.

flake8 ceph_volume 
ceph_volume/devices/lvm/strategies/strategies.py:42:32: F821 undefined name 'osd_id_available'

Jan Fajerski added 3 commits January 30, 2019 15:48
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
Also add test case for a MixedStrategy with unusable data devices.

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
@jan--f
Contributor Author

jan--f commented Jan 30, 2019

@andrewschoen Thx, good catch. Done.

@andrewschoen
Contributor

jenkins test ceph-volume tox

@andrewschoen
Contributor

jenkins test ceph-volume batch all

@andrewschoen andrewschoen merged commit 93094a5 into ceph:master Jan 30, 2019
@andrewschoen
Contributor

@jan--f thanks for your work here. Would you please create backport PRs for luminous and mimic? Thanks!
