Rely on "bootstrap" to configure MGR module #270

Merged
ricardoasmarques merged 1 commit into ceph:master from ricardoasmarques:rely-on-cephadm-bootstrap-to-configure-mgr-module
Jul 16, 2020
@ricardoasmarques
Contributor

@ricardoasmarques ricardoasmarques commented Jun 18, 2020

After ceph/ceph#35678 is backported
After ceph/ceph#35805 is backported
After #283


This PR relies on cephadm bootstrap to configure the cephadm MGR module, which guarantees that all initialization steps are performed properly (including the execution of ceph orch apply crash).

Fixes: #236

(Requires ceph/ceph#35195 - merged and backported, and ceph/ceph#35678)

Signed-off-by: Ricardo Marques <rimarques@suse.com>

@ricardoasmarques ricardoasmarques added the bug and Add To Changelog labels Jun 18, 2020
@ricardoasmarques ricardoasmarques changed the title Rely on "bootstrap" to configure MGR module [DNM] Rely on "bootstrap" to configure MGR module Jun 18, 2020
@ricardoasmarques
Contributor Author

ricardoasmarques commented Jun 18, 2020

Sometimes, when testing this PR with:

sesdev create octopus --ceph-salt-repo https://github.com/ricardoasmarques/ceph-salt.git --ceph-salt-branch rely-on-cephadm-bootstrap-to-configure-mgr-module octopus

I'm facing the following failure:

    master: Failure in minion: node2.octopus.com
    master: __id__: add host to ceph orch
    master: __run_num__: 65
    master: __sls__: ceph-salt.cephorch
    master: changes: {}
    master: comment: ''
    master: duration: 7234.587
    master: name: add host to ceph orch
    master: result: false
    master: start_time: '23:12:19.267270'
    master: state: ceph_orch_|-add host to ceph orch_|-add host to ceph orch_|-add_host

On node2:/var/log/salt/minion, I see the following error:

2020-06-18 23:12:26,500 [salt.loaded.int.module.cmdmod:838 ][ERROR   ][9434] Command '['ssh', '-o', 'StrictHostKeyChecking=no', '-i', '/tmp/ceph-salt-ssh-id_rsa', 'root@node1.octopus.com', 'ceph orch host add node2']' failed with return code: 2
2020-06-18 23:12:26,501 [salt.loaded.int.module.cmdmod:842 ][ERROR   ][9434] stderr: Error ENOENT: Failed to connect to node2 (node2).
Check that the host is reachable and accepts connections using the cephadm SSH key

you may want to run:
> ceph cephadm get-ssh-config > ssh_config
> ceph config-key get mgr/cephadm/ssh_identity_key > key
> ssh -F ssh_config -i key root@node2

But after the failure, when I run ssh -o StrictHostKeyChecking=no -i /tmp/ceph-salt-ssh-id_rsa root@node1.octopus.com 'ceph orch host add node2' manually, it works:

node1:~ # ceph orch host ls
HOST   ADDR   LABELS  STATUS  
node1  node1     

node1:~ # ssh node2
Last login: Thu Jun 18 23:21:04 2020 from 10.20.25.201
Have a lot of fun...

node2:~ # ssh -o StrictHostKeyChecking=no -i /tmp/ceph-salt-ssh-id_rsa root@node1.octopus.com 'ceph orch host add node2'
Added host 'node2'

node2:~ # exit
logout
Connection to node2 closed.

node1:~ # ceph orch host ls
HOST   ADDR   LABELS  STATUS  
node1  node1                  
node2  node2  

I wasn't able to reproduce this on master, so I'm pretty sure this is an issue introduced in this PR.

@ricardoasmarques
Contributor Author

I'm tempted to say this is a bug in ceph/ceph#35195 (but I'm not sure yet)

@ricardoasmarques
Contributor Author

Apparently the error is caused by ceph orch status returning "OK" even before SSH keys are set.

The following PR should fix that: ceph/ceph#35678
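
The timing problem described above (a readiness check that reports success before the orchestrator can actually reach other hosts) can be illustrated with a small polling helper that retries a readiness predicate until it holds or a timeout expires. This is only a sketch: `wait_until` and its parameters are hypothetical names chosen for illustration, not part of ceph-salt or cephadm.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool],
               timeout: float = 60.0,
               interval: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `check` until it returns True or `timeout` seconds have elapsed.

    `sleep` is injectable so the helper can be exercised without real delays.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

The helper is only as reliable as the predicate passed to it: if `check` reports readiness too early, the caller still proceeds too early.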

@ricardoasmarques ricardoasmarques changed the title [DNM] Rely on "bootstrap" to configure MGR module [After ceph/ceph#35678] Rely on "bootstrap" to configure MGR module Jun 19, 2020
@ricardoasmarques
Contributor Author

Note that, at the moment, we rely on the exit code of ceph orch status [1], but ceph/ceph#35678 will not change the exit code, so we need to find a different way to check ceph orch status before merging this PR:

master:~ # ceph orch status
Backend: cephadm
Available: False (SSH keys not set. Use `ceph cephadm set-priv-key` and `ceph cephadm set-pub-key` or `ceph cephadm generate-key`)
master:~ # echo $?
0

[1] https://github.com/ceph/ceph-salt/blob/master/ceph-salt-formula/salt/_modules/ceph_orch.py#L10
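
One possible alternative to relying on the exit code alone is to inspect the "Available:" line of the output. The sketch below assumes the text format shown above stays stable; `orch_available` and `orch_is_ready` are hypothetical helper names, not the functions in ceph_orch.py.

```python
import subprocess


def orch_available(status_output: str) -> bool:
    """Return True only if the output contains an 'Available: True' line."""
    for line in status_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Available":
            # The value may carry a trailing explanation in parentheses,
            # e.g. "False (SSH keys not set. ...)", so only the first word counts.
            words = value.split()
            return bool(words) and words[0] == "True"
    # No "Available" line at all is treated as not ready.
    return False


def orch_is_ready() -> bool:
    """Run `ceph orch status`; require exit code 0 and 'Available: True'."""
    result = subprocess.run(["ceph", "orch", "status"],
                            capture_output=True, text=True)
    return result.returncode == 0 and orch_available(result.stdout)
```

Parsing human-readable output is fragile, so this is best seen as a stopgap until the exit code itself reflects availability.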

@ricardoasmarques
Contributor Author

ricardoasmarques commented Jun 26, 2020

The exit code issue is now reported in cephadm: https://tracker.ceph.com/issues/46233

@ricardoasmarques ricardoasmarques changed the title [After ceph/ceph#35678] Rely on "bootstrap" to configure MGR module [After #283] Rely on "bootstrap" to configure MGR module Jul 1, 2020
@ricardoasmarques ricardoasmarques force-pushed the rely-on-cephadm-bootstrap-to-configure-mgr-module branch from e0dccff to b90c0f2 Compare July 6, 2020 09:40
Fixes: ceph#236

Signed-off-by: Ricardo Marques <rimarques@suse.com>
@ricardoasmarques ricardoasmarques force-pushed the rely-on-cephadm-bootstrap-to-configure-mgr-module branch from b90c0f2 to e794816 Compare July 16, 2020 09:05
@ricardoasmarques ricardoasmarques changed the title [After #283] Rely on "bootstrap" to configure MGR module Rely on "bootstrap" to configure MGR module Jul 16, 2020
@ricardoasmarques
Contributor Author

All required ceph PRs are now backported, and this PR is now working as expected.

@ricardoasmarques ricardoasmarques merged commit 9fc113c into ceph:master Jul 16, 2020
@ricardoasmarques ricardoasmarques removed the Add To Changelog label Jul 17, 2020
Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

Need to invoke ceph orch apply crash

2 participants