
cephadm,msg: ensure msgr address is unique when we have an init in our container #39739

Merged: liewegas merged 3 commits into ceph:master from liewegas:cephadm-nonce on Mar 2, 2021

Conversation

@liewegas (Member) commented Feb 27, 2021

We normally detect that we're in a container by checking whether our pid is 1, but when we have an init process, that doesn't work: our pid will be small but not 1 (e.g., 7). Ensure we choose a random nonce in such situations.
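The heuristic being fixed can be sketched roughly like this (a standalone illustration, not the actual Ceph messenger code; the helper name pick_nonce and the explicit pid/in_container parameters are for demonstration only):

```cpp
#include <cstdint>
#include <random>

// Sketch: on a normal host the pid makes a fine messenger nonce because it
// is unique per machine. Inside a container the pid namespace restarts, so
// the pid (1 without an init, a small number like 7 with one) can collide
// with a daemon in another container behind the same IP address; in that
// case fall back to a random value.
static uint64_t pick_nonce(uint64_t pid, bool in_container) {
  if (in_container) {
    std::random_device rd;
    std::mt19937_64 gen(rd());
    return gen();   // random nonce, effectively unique per process start
  }
  return pid;       // pid is unique host-wide outside a container
}
```

With the pre-fix check, `in_container` was effectively just `pid == 1`, which is exactly what breaks once an init process occupies pid 1.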


@liewegas (Member Author) commented Mar 1, 2021

jenkins test make check

@liewegas (Member Author) commented Mar 1, 2021

retest this please

@sebastian-philipp (Contributor) commented:

we probably need to backport this. @travisn this might also be important for Rook!

Review comment from @mgfritch (Contributor), Mar 1, 2021, on this hunk:

    {
      uint64_t nonce = getpid();
    - if (nonce == 1) {
    + if (nonce <= 10 || getenv("CEPH_CONTAINER_HAS_INIT")) {

Suggested change:

    - if (nonce <= 10 || getenv("CEPH_CONTAINER_HAS_INIT")) {
    + if (nonce <= 10 || getenv("CONTAINER_IMAGE")) {

(nit) rook and cephadm currently set CONTAINER_IMAGE ... maybe that would simplify things?

See: https://github.com/rook/rook/blob/bd9010e7fcae43edc6f7a076bfe9f2a5f8dc03c8/pkg/operator/k8sutil/pod.go#L301

@liewegas (Member Author) replied:

I thought about that, but (1) it seems possible that this variable is set and we're not in a container, and (2) we might be in a container and not have an init process, in which case the pid behavior is still appropriate.

We could do CEPH_USE_RANDOM_NONCE instead though!

@liewegas (Member Author) replied:

updated. also switched it back to pid == 1
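The resulting check can be sketched as follows (an illustration of the condition agreed on above, not the literal patch; the pid and the env-var value are factored out as parameters purely to make the logic testable, whereas the real code reads them via getpid() and getenv()):

```cpp
#include <sys/types.h>

// Sketch of the final condition: fall back to a random nonce when we are
// pid 1 (container without an init) or when the launcher opts in
// explicitly via CEPH_USE_RANDOM_NONCE (container with an init process,
// e.g. one started by cephadm).
static bool need_random_nonce(pid_t pid, const char* use_random_nonce_env) {
  return pid == 1 || use_random_nonce_env != nullptr;
}
```

This avoids both objections raised above: it never misfires on a bare-metal host where some container-related variable happens to be set, and a container without an init still takes the pid == 1 path.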

A Contributor replied:

should we always set CEPH_USE_RANDOM_NONCE (regardless of if an init is used)?

liewegas added 3 commits March 1, 2021 11:27
This reverts commit 9200b1e, reversing
changes made to e42bbba.

For running tests to narrow down the root cause of:
https://tracker.ceph.com/issues/49237

Signed-off-by: Michael Fritch <mfritch@suse.com>
If we are in a container, then we do not have a unique pid, and need to
use a random nonce.  We normally detect this if our pid is 1, but that
doesn't work when we have an init process--we'll (probably?) have a small
pid (in my tests, the OSDs were getting pid 7).

To be safe, also check for an environment variable set by cephadm.

This avoids problems that arise when we don't have a unique address.

Fixes: https://tracker.ceph.com/issues/49534
Signed-off-by: Sage Weil <sage@newdream.net>
This ensures that daemon messenger nonces don't collide when PIDs are no
longer unique for a given IP address.

Signed-off-by: Sage Weil <sage@newdream.net>
@travisn (Member) commented Mar 1, 2021

> we probably need to backport this. @travisn this might also be important for Rook!

@sebastian-philipp Rook runs the ceph daemons as PID 1, so I believe we're actually fine unless there is another implication I'm missing.

@sebastian-philipp (Contributor) commented:

> @sebastian-philipp Rook runs the ceph daemons as PID 1, so I believe we're actually fine unless there is another implication I'm missing.

Interesting! I thought Rook uses tini. Just make sure the MGR is not accumulating zombies.

@travisn (Member) commented Mar 1, 2021

> Interesting! I thought Rook uses tini. Just make sure the MGR is not accumulating zombies.

Yes, in the past we were using tini, but it was removed since the conversion to take the rook container out of the ceph daemon pods. Does ceph not handle them as pid 1? If not, which daemons would this affect? I imagine the mgr with the modules would be mostly affected, although I haven't heard of any regression in this regard since that change.
@leseb fyi

@sebastian-philipp (Contributor) commented:

you shouldn't be able to get coredumps though.

@smithfarm (Contributor) commented:

> Rook runs the ceph daemons as PID 1, so I believe we're actually fine unless there is another implication I'm missing.

@travisn How do you get coredumps from the daemons when they crash, without a parent init process?

@leseb (Member) commented Mar 2, 2021

> @travisn How do you get coredumps from the daemons when they crash, without a parent init process?

Each container still has a parent in the host namespace, which is the container engine process. So we can collect coredumps normally, just like for any daemon, most of the time in /var/lib/systemd/coredump/.

liewegas merged commit 65724b1 into ceph:master on Mar 2, 2021
@sebastian-philipp (Contributor) commented:

@leseb @travisn fyi, we just saw zombies in the MGR container in the LRC cluster

@smithfarm (Contributor) commented Mar 2, 2021

> Each container still has a parent in the host namespace, which is the container engine process. So we can collect coredumps normally, just like for any daemon, most of the time in /var/lib/systemd/coredump/.

Sounds like the missing coredumps issue on the cephadm side is a "feature" of podman, then.

@travisn (Member) commented Mar 2, 2021

> @leseb @travisn fyi, we just saw zombies in the MGR container in the LRC cluster

@sebastian-philipp That's in a rook cluster? Could you open a rook issue with any details you observed or repro steps? thanks

@sebastian-philipp (Contributor) commented:

> @sebastian-philipp That's in a rook cluster? Could you open a rook issue with any details you observed or repro steps? thanks

cephadm


6 participants