
cephadm,msg: ensure msgr address is unique when we have an init in our container #39739

Merged: liewegas merged 3 commits into ceph:master from liewegas:cephadm-nonce on Mar 2, 2021

Conversation

@liewegas (Member) commented Feb 27, 2021

We normally detect that we're in a container by checking whether our pid is 1, but when we have an init process, that doesn't work: our pid will be small but not 1 (e.g., 7). Ensure we choose a random nonce in such situations.
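The heuristic being fixed can be sketched roughly like this (a standalone illustration, not the actual Ceph messenger code; the helper name pick_nonce and the explicit pid/in_container parameters are for demonstration only):

```cpp
#include <cstdint>
#include <random>

// Sketch: on a normal host the pid makes a fine messenger nonce because it
// is unique per machine. Inside a container the pid namespace restarts, so
// the pid (1 without an init, a small number like 7 with one) can collide
// with a daemon in another container behind the same IP address; in that
// case fall back to a random value.
static uint64_t pick_nonce(uint64_t pid, bool in_container) {
  if (in_container) {
    std::random_device rd;
    std::mt19937_64 gen(rd());
    return gen();   // random nonce, effectively unique per process start
  }
  return pid;       // pid is unique host-wide outside a container
}
```

With the pre-fix check, `in_container` was effectively just `pid == 1`, which is exactly what breaks once an init process occupies pid 1.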


@liewegas (Member Author) commented Mar 1, 2021

jenkins test make check

@liewegas (Member Author) commented Mar 1, 2021

retest this please

@sebastian-philipp (Contributor) commented:

we probably need to backport this. @travisn this might also be important for Rook!

Review comment from @mgfritch (Contributor), Mar 1, 2021, on this hunk:

    {
      uint64_t nonce = getpid();
    - if (nonce == 1) {
    + if (nonce <= 10 || getenv("CEPH_CONTAINER_HAS_INIT")) {

Suggested change:

    - if (nonce <= 10 || getenv("CEPH_CONTAINER_HAS_INIT")) {
    + if (nonce <= 10 || getenv("CONTAINER_IMAGE")) {

(nit) rook and cephadm currently set CONTAINER_IMAGE ... maybe that would simplify things?

See: https://github.com/rook/rook/blob/bd9010e7fcae43edc6f7a076bfe9f2a5f8dc03c8/pkg/operator/k8sutil/pod.go#L301

@liewegas (Member Author) replied:

I thought about that, but (1) it seems possible that this variable is set and we're not in a container, and (2) we might be in a container and not have an init process, in which case the pid behavior is still appropriate.

We could do CEPH_USE_RANDOM_NONCE instead though!

@liewegas (Member Author) replied:

updated. also switched it back to pid == 1
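The resulting check can be sketched as follows (an illustration of the condition agreed on above, not the literal patch; the pid and the env-var value are factored out as parameters purely to make the logic testable, whereas the real code reads them via getpid() and getenv()):

```cpp
#include <sys/types.h>

// Sketch of the final condition: fall back to a random nonce when we are
// pid 1 (container without an init) or when the launcher opts in
// explicitly via CEPH_USE_RANDOM_NONCE (container with an init process,
// e.g. one started by cephadm).
static bool need_random_nonce(pid_t pid, const char* use_random_nonce_env) {
  return pid == 1 || use_random_nonce_env != nullptr;
}
```

This avoids both objections raised above: it never misfires on a bare-metal host where some container-related variable happens to be set, and a container without an init still takes the pid == 1 path.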

A Contributor replied:

should we always set CEPH_USE_RANDOM_NONCE (regardless of if an init is used)?

liewegas added 3 commits March 1, 2021 11:27
This reverts commit 9200b1e, reversing
changes made to e42bbba.

For running tests to narrow down the root cause of:
https://tracker.ceph.com/issues/49237

Signed-off-by: Michael Fritch <mfritch@suse.com>
If we are in a container, then we do not have a unique pid, and need to
use a random nonce.  We normally detect this if our pid is 1, but that
doesn't work when we have an init process--we'll (probably?) have a small
pid (in my tests, the OSDs were getting pid 7).

To be safe, also check for an environment variable set by cephadm.

This avoids problems that arise when we don't have a unique address.

Fixes: https://tracker.ceph.com/issues/49534
Signed-off-by: Sage Weil <sage@newdream.net>
This ensures that daemon messenger nonces don't collide when PIDs are no
longer unique for a given IP address.

Signed-off-by: Sage Weil <sage@newdream.net>
@travisn (Member) commented Mar 1, 2021

> we probably need to backport this. @travisn this might also be important for Rook!

@sebastian-philipp Rook runs the ceph daemons as PID 1, so I believe we're actually fine unless there is another implication I'm missing.

@sebastian-philipp (Contributor) commented:

> @sebastian-philipp Rook runs the ceph daemons as PID 1, so I believe we're actually fine unless there is another implication I'm missing.

Interesting! I thought Rook uses tini. Just make sure the MGR is not accumulating zombies.

@travisn (Member) commented Mar 1, 2021

> Interesting! I thought Rook uses tini. Just make sure the MGR is not accumulating zombies.

Yes, in the past we were using tini, but it was removed since the conversion to take the rook container out of the ceph daemon pods. Does ceph not handle them as pid 1? If not, which daemons would this affect? I imagine the mgr with the modules would be mostly affected, although I haven't heard of any regression in this regard since that change.
@leseb fyi

@sebastian-philipp (Contributor) commented:

you shouldn't be able to get coredumps though.

@smithfarm (Contributor) commented:

> Rook runs the ceph daemons as PID 1, so I believe we're actually fine unless there is another implication I'm missing.

@travisn How do you get coredumps from the daemons when they crash, without a parent init process?

@leseb (Member) commented Mar 2, 2021

> @travisn How do you get coredumps from the daemons when they crash, without a parent init process?

Each container still has a parent in the host namespace, which is the container engine process. So we can collect coredumps normally, just like for any daemon, most of the time in /var/lib/systemd/coredump/.

liewegas merged commit 65724b1 into ceph:master on Mar 2, 2021
@sebastian-philipp (Contributor) commented:

@leseb @travisn fyi, we just saw zombies in the MGR container in the LRC cluster

@smithfarm (Contributor) commented Mar 2, 2021

> Each container still has a parent in the host namespace, which is the container engine process. So we can collect coredumps normally, just like for any daemon, most of the time in /var/lib/systemd/coredump/.

Sounds like the missing coredumps issue on the cephadm side is a "feature" of podman, then.

@travisn (Member) commented Mar 2, 2021

> @leseb @travisn fyi, we just saw zombies in the MGR container in the LRC cluster

@sebastian-philipp That's in a rook cluster? Could you open a rook issue with any details you observed or repro steps? thanks

@sebastian-philipp (Contributor) commented:

> @sebastian-philipp That's in a rook cluster? Could you open a rook issue with any details you observed or repro steps? thanks

cephadm


6 participants