cephadm: Get rid of injected_argv by jmolmo · Pull Request #37495 · ceph/ceph

jmolmo · 2020-09-30T13:48:04Z

Removed the injected_argv parameter and the injection of code into the cephadm
script we send to hosts.
Now the script is copied to the host first, and then we execute the cephadm command.
I would like to copy it only once (when adding new hosts), but that will be
part of a future PR, together with other PRs to:

- Introduce cephadm version
- Get rid of packaged/root mode
- Use pex or eggs

Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>

Note:
Testing this is a bit tricky, because the cephadm binary is obtained by default from the "/usr/sbin" folder in the mgr container.
The new cephadm binary included in this PR is the one to test, so in order to do it properly:

1. Use the new script to bootstrap the cluster

2. Copy the new script to a location accessible from manager container:
for example:
/usr/share/ceph/mgr/cephadm/cephadm

3. Open a ceph shell and change "cephadm_path" to point to the new "cephadm tool" script:
# ceph config get mgr cephadm_path
/usr/sbin/cephadm

# ceph config set mgr cephadm_path /usr/share/ceph/mgr/cephadm/cephadm

# Force read the new "cephadm" script from the orchestrator module
ceph mgr module disable cephadm
ceph mgr module enable cephadm

4. Add new hosts and OSDs.


jschmid1 · 2020-10-01T09:45:01Z

Looks like there's a test failure:

  File "/home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/orchestrator/_interface.py", line 295, in _finalize
    next_result = self._on_complete(self._value)
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/cephadm/module.py", line 110, in <lambda>
    return CephadmCompletion(on_complete=lambda _: f(*args, **kwargs))
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/cephadm/module.py", line 1365, in add_host
    return self._add_host(spec)
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/cephadm/module.py", line 1351, in _add_host
    error_ok=True, no_fsid=True)
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/cephadm/module.py", line 1284, in _run_cephadm
    self._copy_cephadm(conn, cephadm_dest)
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/cephadm/module.py", line 1227, in _copy_cephadm
    stdin=self._cephadm.encode('utf-8'))
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/cephadm/tests/test_cephadm.py", line 802, in _check
    raise Exception("boom: connection is dead")
Exception: boom: connection is dead

jschmid1 · 2020-10-01T09:59:02Z

For my own clarity. This PR does the following:

Every time we execute a command from mgr/cephadm, we no longer execute it by directly feeding it to python's stdin but rather save it to /var/lib/ceph/fsid/cephadm/$mgr_name and execute this script with the provided args instead.

We also mount /var/lib/ceph/fsid/ into any container of type mgr.

Do we do anything with the mounted cephadm_path inside the container?

Is that correct @jmolmo ?
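A minimal local sketch of the two execution styles summarized above, using `subprocess` in place of the remoto connection. The stand-in script, the destination directory, and the helper names `run_injected` and `run_copied` are assumptions made for illustration; this is not the code from the PR.

```python
import json
import os
import subprocess
import sys
import tempfile

# Stand-in for the cephadm script: it prefers injected_argv when it is defined.
SCRIPT = (
    "import sys\n"
    "argv = globals().get('injected_argv', sys.argv[1:])\n"
    "print(' '.join(argv))\n"
)


def run_injected(args):
    """Old style: prepend an argv assignment and pipe the script to python's stdin."""
    script = "injected_argv = " + json.dumps(args) + "\n" + SCRIPT
    proc = subprocess.run([sys.executable, "-u"], input=script.encode("utf-8"),
                          stdout=subprocess.PIPE, check=True)
    return proc.stdout.decode("utf-8").strip()


def run_copied(args, dest_dir):
    """New style: write the unmodified script to disk, then execute it with args."""
    dest = os.path.join(dest_dir, "cephadm")
    with open(dest, "w") as f:
        f.write(SCRIPT)
    proc = subprocess.run([sys.executable, "-u", dest] + args,
                          stdout=subprocess.PIPE, check=True)
    return proc.stdout.decode("utf-8").strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_injected(["ls", "--no-detail"]))
        print(run_copied(["ls", "--no-detail"], tmp))
```

In the second style the script on disk is byte-for-byte the original, so nothing has to be generated or modified at execution time.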

jmolmo · 2020-10-01T10:34:25Z

> Looks like there's a test failure:
> [...]
> Exception: boom: connection is dead

From the context, it seems that something happened to the destination host while cephadm was being copied to it. The error message suggests it may have been a network error.

Can you share the link to the log file so I can look for more information?

jmolmo · 2020-10-01T11:16:04Z

> For my own clarity. This PR does the following:
>
> Every time we execute a command from mgr/cephadm, we no longer execute it by directly feeding it to python's stdin but rather save it to /var/lib/ceph/fsid/cephadm/$mgr_name and execute this script with the provided args instead.

Yes, you're right. The goal here is to avoid injecting code at execution time. We copy the script and then execute the commands.
I do not like copying the script each time we need to call the cephadm tool, but I hope to solve that soon in a new PR (it seems the team does not like having a lot of things together in the same PR).

> We also mount /var/lib/ceph/fsid/ into any container of type mgr.
>
> Do we do anything with the mounted cephadm_path inside the container?

No! Good catch! (This was part of my "source" PR, where I had more things.) Removed!

> Is that correct @jmolmo ?

Everything correct @jschmid1

jschmid1 · 2020-10-01T11:37:16Z

> Can you share the link to the log file so I can look for more information?

Click on "make check" at the bottom of the pull request.

jschmid1 · 2020-10-01T11:40:01Z

> > For my own clarity. This PR does the following:
> > Every time we execute a command from mgr/cephadm, we no longer execute it by directly feeding it to python's stdin but rather save it to /var/lib/ceph/fsid/cephadm/$mgr_name and execute this script with the provided args instead.
>
> Yes, you're right. The goal here is to avoid injecting code at execution time. We copy the script and then execute the commands.
> I do not like copying the script each time we need to call the cephadm tool, but I hope to solve that soon in a new PR (it seems the team does not like having a lot of things together in the same PR).

Indeed it's easier to review and judge smaller portions of code.

Looking forward to future PRs

toabctl · 2020-10-01T12:35:48Z

src/pybind/mgr/cephadm/module.py

+                    cephadm_dest = '/var/lib/ceph/%s/cephadm/%s' % (self._cluster_fsid, self.get_mgr_id())
+                    self._copy_cephadm(conn, cephadm_dest)
+
+                    self.log.info("Tying to execute : %s" % (['sudo', 'python', cephadm_dest] + final_args))


Use the python variable here, not the string literal 'python'.

jmolmo · 2020-10-01T13:16:45Z

It seems a test is failing because I reuse the remoto connection object. As we can read in the test:

"A mocked connection class that only allows the use of the connection once. If you attempt to use it again via a _check, it'll explode (go boom!)."

Does anybody know why reusing a remoto connection object is not a good idea?

jmolmo · 2020-10-02T08:58:28Z

> It seems a test is failing because I reuse the remoto connection object. As we can read in the test:
>
> "A mocked connection class that only allows the use of the connection once. If you attempt to use it again via a _check, it'll explode (go boom!)."
>
> Does anybody know why reusing a remoto connection object is not a good idea?

Yes. There is nothing like reading the code... We are reusing connections all the time, so there is no need to do it explicitly :-)

jmolmo · 2020-10-06T10:24:36Z

I have modified the unit test, because the previous one tested the "refresh" of the connection indirectly, through higher-level calls (check_host).
That approach has several problems:

- You need to mock more things than strictly needed.
- A change in the implementation of other functions can break the test even though what is tested (dealing with stale connections) is not affected by the modification (this is what happened with this PR).

The modified unit test now directly tests only the function where "reuse connection or reconnect" is implemented.
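As a rough illustration of testing the "reuse connection or reconnect" logic directly, here is a self-contained sketch. The `ConnectionCache` class, its `get_connection` method, and the `is_alive` check are hypothetical stand-ins, not the cephadm module's API; only the testing approach is the point.

```python
from unittest import mock


class ConnectionCache:
    """Hypothetical holder for per-host connections (not the cephadm class)."""

    def __init__(self, connect):
        self._connect = connect  # factory: host -> connection object
        self._cons = {}

    def get_connection(self, host):
        """Reuse a cached connection if it is alive, otherwise reconnect."""
        conn = self._cons.get(host)
        if conn is not None and conn.is_alive():
            return conn
        conn = self._connect(host)
        self._cons[host] = conn
        return conn


def test_reuse_or_reconnect():
    alive, fresh = mock.Mock(), mock.Mock()
    alive.is_alive.return_value = True
    connect = mock.Mock(side_effect=[alive, fresh])
    cache = ConnectionCache(connect)

    # First call connects; second call reuses the live connection.
    assert cache.get_connection("host1") is alive
    assert cache.get_connection("host1") is alive
    assert connect.call_count == 1

    # Once the connection goes stale, the next call reconnects.
    alive.is_alive.return_value = False
    assert cache.get_connection("host1") is fresh
    assert connect.call_count == 2
```

Only the connection factory is mocked here, so changes to higher-level callers cannot break the test.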

jmolmo · 2020-10-08T11:30:10Z

jenkins test make check


sebastian-philipp

minor nit and then I think we're good to go

sebastian-philipp · 2020-11-24T10:43:49Z

src/pybind/mgr/cephadm/module.py

+                    mgrd = self.cache.get_daemon("mgr.%s" % self.get_mgr_id())
+                    cephadm_dest = '/var/lib/ceph/%s/cephadm/%s/%s/cephadm' % (
+                        self._cluster_fsid, self.get_mgr_id(), mgrd.container_image_id)


This already looks good. There is just one problem: in a vstart cluster, there is no
containerized MGR. Therefore I think we can simplify this to:

Suggested change

-                    mgrd = self.cache.get_daemon("mgr.%s" % self.get_mgr_id())
-                    cephadm_dest = '/var/lib/ceph/%s/cephadm/%s/%s/cephadm' % (
-                        self._cluster_fsid, self.get_mgr_id(), mgrd.container_image_id)
+                    cephadm_hash = sha256(self._read_cephadm_script())
+                    cephadm_dest = f'/var/lib/ceph/{self._cluster_fsid}/cephadm-{cephadm_hash}/cephadm'
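For reference, a standalone sketch of deriving such a content-addressed path with the standard library. It assumes the script is available as a `str`; the function name `cephadm_dest_for` and the sample fsid are illustrative only. Note that `hashlib.sha256` takes bytes and returns a hash object, so `hexdigest()` is needed before the value can be formatted into a path.

```python
import hashlib


def cephadm_dest_for(cluster_fsid: str, script: str) -> str:
    """Build a destination path that changes whenever the script content changes."""
    # hashlib.sha256 needs bytes and returns a hash object; hexdigest() yields
    # the 64-character hex string that can be embedded in a path.
    digest = hashlib.sha256(script.encode("utf-8")).hexdigest()
    return f"/var/lib/ceph/{cluster_fsid}/cephadm-{digest}/cephadm"


if __name__ == "__main__":
    print(cephadm_dest_for("00000000-0000-0000-0000-000000000000", "print('hi')\n"))
```

Because the path depends only on the script content, it does not require a containerized MGR, and two different script versions never share a location.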


sebastian-philipp · 2020-12-09T14:04:41Z

src/pybind/mgr/cephadm/module.py

+                    mgrd = self.cache.get_daemon("mgr.%s" % self.get_mgr_id())
+                    cephadm_dest = '/var/lib/ceph/%s/cephadm/%s/%s/cephadm' % (
+                        self._cluster_fsid, self.get_mgr_id(), mgrd.container_image_id)


ping. Needs rebase: now you also need to fix

ceph/src/pybind/mgr/cephadm/module.py

Lines 1930 to 1939 in fd1c8dd

    def _deploy_cephadm_binary(self, host: str) -> bool:
        # Use tee (from coreutils) to create a copy of cephadm on the target machine
        self.log.info(f"Deploying cephadm binary to {host}")
        with self._remote_connection(host) as tpl:
            conn, _connr = tpl
            _out, _err, code = remoto.process.check(
                conn,
                ['tee', '-', '/var/lib/ceph/{}/cephadm'.format(self._cluster_fsid)],
                stdin=self._cephadm.encode('utf-8'))
        return code == 0

to contain the hash.

liewegas · 2021-01-15T20:02:29Z

Is the goal to eliminate injected_args, or to avoid the overhead of transferring the script each time? If it is the former, we could just replace that with an environment variable, e.g.:

                    out, err, code = remoto.process.check(
                        conn,
                        [python, '-u'],
                        stdin=script.encode('utf-8'),
                        extend_env={'CEPHADM_ARGS': json.dumps(args)})
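To make the idea concrete, here is a local sketch of the receiving side, using `subprocess` in place of remoto. The stand-in script and the helper name `run_with_env_args` are assumptions; only the `CEPHADM_ARGS` variable name and the JSON encoding come from the snippet above.

```python
import json
import os
import subprocess
import sys

# Stand-in for the cephadm script: it reads its arguments from CEPHADM_ARGS
# when that variable is set, and falls back to sys.argv otherwise.
SCRIPT = (
    "import json, os, sys\n"
    "raw = os.environ.get('CEPHADM_ARGS')\n"
    "argv = json.loads(raw) if raw is not None else sys.argv[1:]\n"
    "print(' '.join(argv))\n"
)


def run_with_env_args(args):
    """Pipe the unmodified script to python and pass args through the environment."""
    env = dict(os.environ, CEPHADM_ARGS=json.dumps(args))
    proc = subprocess.run([sys.executable, "-u"], input=SCRIPT.encode("utf-8"),
                          stdout=subprocess.PIPE, env=env, check=True)
    return proc.stdout.decode("utf-8").strip()


if __name__ == "__main__":
    print(run_with_env_args(["ls", "--no-detail"]))
```

This keeps the script text unmodified, but it is still transferred on every call, which is the overhead the question distinguishes from the injection itself.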

sebastian-philipp · 2021-01-28T15:11:02Z

> Is the goal to eliminate injected_args, or to avoid the overhead of transferring the script each time?

Both: injected_args prevents us from splitting up bin/cephadm into multiple source code files. And in addition, we already have cephadm deployed on the hosts:

ceph/src/pybind/mgr/cephadm/serve.py

Lines 1018 to 1027 in 39fd806

    def _deploy_cephadm_binary(self, host: str) -> bool:
        # Use tee (from coreutils) to create a copy of cephadm on the target machine
        self.log.info(f"Deploying cephadm binary to {host}")
        with self._remote_connection(host) as tpl:
            conn, _connr = tpl
            _out, _err, code = remoto.process.check(
                conn,
                ['tee', '-', '/var/lib/ceph/{}/cephadm'.format(self.mgr._cluster_fsid)],
                stdin=self.mgr._cephadm.encode('utf-8'))
        return code == 0

Why not use it, if it is already there?

sebastian-philipp · 2021-01-28T16:01:56Z

just created #39141 to fix the unsafe cephadm daemon path

jmolmo added the cephadm label Sep 30, 2020

jmolmo requested a review from a team September 30, 2020 13:48

Daniel-Pivonka reviewed Sep 30, 2020

jschmid1 reviewed Oct 1, 2020

jschmid1 reviewed Oct 1, 2020

jmolmo force-pushed the rm_injected_argv branch from 2e8a3f0 to a898367 October 1, 2020 10:29

jmolmo force-pushed the rm_injected_argv branch from a898367 to 0bf7b2b October 1, 2020 11:08

toabctl reviewed Oct 1, 2020

jmolmo force-pushed the rm_injected_argv branch from 0bf7b2b to f880d2a October 6, 2020 10:14

jmolmo requested review from jschmid1 and toabctl October 6, 2020 11:09

jmolmo force-pushed the rm_injected_argv branch 6 times, most recently from c651b7e to 1fbb837 October 7, 2020 15:30

jmolmo force-pushed the rm_injected_argv branch from 1fbb837 to 8f81f81 October 9, 2020 10:19

sebastian-philipp suggested changes Oct 12, 2020

jmolmo force-pushed the rm_injected_argv branch from 8f81f81 to 0a25e21 October 13, 2020 10:11

jmolmo requested review from matthewoliver and sebastian-philipp October 13, 2020 10:14

jmolmo added 2 commits October 29, 2020 10:17

cephadm: Get rid of injected_argv

4482488

Added container image hash to cephadm binary location. Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>

jmolmo force-pushed the rm_injected_argv branch from 0a25e21 to 4482488 October 29, 2020 15:35

sebastian-philipp suggested changes Nov 24, 2020

sebastian-philipp mentioned this pull request Nov 24, 2020: cephadm: Add a daemon mode for cephadm to provide a metadata endpoint #37130 (Merged)

sebastian-philipp suggested changes Dec 9, 2020

sebastian-philipp added the needs-rebase label Dec 10, 2020

sebastian-philipp mentioned this pull request Dec 31, 2020: cephadm: splits bootstrap function, add context, drop global variables #38739 (Merged)

sebastian-philipp mentioned this pull request Feb 2, 2021: cephadm: Make path to cephadm binary unique #39141 (Closed)

This was referenced Feb 21, 2021:

gravel: expose bootstrap progress aquarist-labs/aquarium#179 (Merged)

cephadm: remove injected_args #39619 (Merged)

sebastian-philipp closed this Feb 22, 2021
