
cephadm: Add --container-init #36822

Merged

sebastian-philipp merged 2 commits into ceph:master from sebastian-philipp:cephadm-container-init on Sep 11, 2020

sebastian-philipp (Contributor) commented Aug 26, 2020

cephadm: Add --container-init

The kernel treats the process with PID 1 differently from other
processes; in particular, it does not generate a core dump for it.
Call podman / docker with --init in order to get core dumps.

In addition, we can now properly reap zombie processes.

Fixes: https://tracker.ceph.com/issues/44231
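The following is a minimal sketch of what "call podman / docker with --init" amounts to when a tool assembles the container command line. The helper `build_run_command`, its parameters and the image name are hypothetical and are not taken from cephadm's actual code. Only `--init` and `--name` are real `podman run` / `docker run` options.

```python
from typing import List


def build_run_command(engine: str, image: str, name: str,
                      daemon_args: List[str],
                      container_init: bool = False) -> List[str]:
    """Assemble an `<engine> run` argv for a daemon container.

    Hypothetical helper: it only shows where --init goes, not how
    cephadm really builds its command lines.
    """
    cmd = [engine, "run", "--name", name]
    if container_init:
        # Ask podman/docker to start a small init (e.g. catatonit) as
        # PID 1; the daemon then runs as its child instead of as PID 1.
        cmd.append("--init")
    # Everything after the image is passed to the container's entrypoint,
    # so run options must come before it.
    cmd.append(image)
    cmd.extend(daemon_args)
    return cmd


if __name__ == "__main__":
    print(" ".join(build_run_command(
        "podman", "example.com/ceph/ceph:tag", "ceph-mgr-example",
        ["-n", "mgr.x", "-f"], container_init=True)))
```

The option has to be placed before the image name. Any argument placed after the image would reach the daemon instead of podman.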

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
sebastian-philipp requested a review from a team as a code owner on August 26, 2020 at 12:54
sebastian-philipp changed the title from "Cephadm container init" to "cephadm: Add --container-image" on Aug 26, 2020
sebastian-philipp (Contributor, Author) commented:

@rouming

rouming commented Aug 26, 2020

@sebastian-philipp So does "--init" for podman work as expected?

denisok commented Aug 26, 2020

blueshark-1:~ # podman exec -it ceph-fa407826-e3a1-11ea-bae5-0cc47aaa2edc-mgr.blueshark-1.mtdpez bash
blueshark-1:/ # ps -ax
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:00 /dev/init -- /usr/bin/ceph-mgr -n mgr.blueshark-1.mtdpez -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug
    8 ?        Sl     0:01 /usr/bin/ceph-mgr -n mgr.blueshark-1.mtdpez -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug
   48 pts/0    Ss     0:00 bash
   76 pts/0    R+     0:00 ps -ax
blueshark-1:/ # kill -11 8
blueshark-1:/ # Error: non zero exit code: 137: OCI runtime error
blueshark-1:~ # podman ps
CONTAINER ID  IMAGE                                     COMMAND               CREATED            STATUS                PORTS  NAMES
0275b375677f  registry.suse.com/ses/7/ceph/ceph:latest  -n mon.blueshark-...  About an hour ago  Up About an hour ago         ceph-fa407826-e3a1-11ea-bae5-0cc47aaa2edc-mon.blueshark-1
92662573450b  registry.suse.com/ses/7/ceph/ceph:latest  -n client.crash.b...  2 hours ago        Up 2 hours ago               ceph-fa407826-e3a1-11ea-bae5-0cc47aaa2edc-crash.blueshark-1
blueshark-1:~ # podman ps
CONTAINER ID  IMAGE                                     COMMAND               CREATED            STATUS                PORTS  NAMES
e1a5b644b2db  registry.suse.com/ses/7/ceph/ceph:latest  -n mgr.blueshark-...  1 second ago       Up 1 second ago              ceph-fa407826-e3a1-11ea-bae5-0cc47aaa2edc-mgr.blueshark-1.mtdpez
0275b375677f  registry.suse.com/ses/7/ceph/ceph:latest  -n mon.blueshark-...  About an hour ago  Up About an hour ago         ceph-fa407826-e3a1-11ea-bae5-0cc47aaa2edc-mon.blueshark-1
92662573450b  registry.suse.com/ses/7/ceph/ceph:latest  -n client.crash.b...  2 hours ago        Up 2 hours ago               ceph-fa407826-e3a1-11ea-bae5-0cc47aaa2edc-crash.blueshark-1
blueshark-1:~ # coredumpctl list
TIME                            PID   UID   GID SIG COREFILE  EXE
Wed 2020-08-26 10:48:29 UTC    3682     0     0  11 present   /usr/bin/sleep
Wed 2020-08-26 10:50:08 UTC    2677   167   167  11 present   /usr/bin/ceph-mgr
Wed 2020-08-26 10:52:16 UTC    4389   167   167  11 present   /usr/bin/ceph-mgr
Wed 2020-08-26 11:25:57 UTC    5856   167   167  11 present   /usr/bin/ceph-mgr
Wed 2020-08-26 11:26:57 UTC    8838   167   167  11 present   /usr/bin/ceph-mgr
Wed 2020-08-26 13:08:45 UTC   17417   167   167  11 present   /usr/bin/ceph-mgr
Wed 2020-08-26 13:10:59 UTC   17838   167   167  11 present   /usr/bin/ceph-mgr
blueshark-1:~ # date
Wed Aug 26 13:11:24 UTC 2020

Looks like it works; the only strange thing is:
Error: non zero exit code: 137: OCI runtime error

sebastian-philipp (Contributor, Author) commented:

> @sebastian-philipp So does "--init" for podman work as expected?

When --init is added, podman spawns the new process as a child of an init process (typically catatonit; cc @cyphar). I haven't been able to verify this, though.
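One way to check which process ended up as PID 1 is to read `/proc/1/cmdline` from inside the container, for example via `podman exec`. A small Python sketch follows. `pid1_command` is a hypothetical helper. The `/dev/init` path it would report comes from the `ps` output earlier in this thread, not from podman documentation.

```python
import os
from typing import List


def pid1_command(proc_root: str = "/proc") -> List[str]:
    """Return the argv of PID 1 as seen from the current PID namespace.

    /proc/<pid>/cmdline holds the arguments NUL-separated with a trailing
    NUL; empty pieces are dropped (this also drops genuinely empty
    arguments, which is fine for a quick check).
    """
    with open(os.path.join(proc_root, "1", "cmdline"), "rb") as f:
        raw = f.read()
    return [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]


if __name__ == "__main__":
    # With --init, argv[0] should be the init process, and the daemon
    # should appear among its arguments.
    print(pid1_command())
```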

denisok commented Aug 26, 2020

btw, I just added --init to the existing mgr. So I didn't verify this code, just that the --init option works.

sebastian-philipp (Contributor, Author) commented:

> btw, I just added --init to the existing mgr. So I didn't verify this code, just that the --init option works.

Thanks!

denisok commented Aug 26, 2020

> > btw, I just added --init to the existing mgr. So I didn't verify this code, just that the --init option works.
>
> Thanks!

np, but please check the "Error: non zero exit code: 137: OCI runtime error" message; it doesn't look good.

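sebastian-philipp (Contributor, Author) commented:

> > > btw, I just added --init to the existing mgr. So I didn't verify this code, just that the --init option works.
> >
> > Thanks!
>
> np, but please check the "Error: non zero exit code: 137: OCI runtime error" message; it doesn't look good.

137 seems to be related to OOM: 137 is 128 + 9, i.e. the process was killed by SIGKILL, which is the signal the OOM killer sends.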
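Exit code 137 follows the shell convention of reporting "terminated by signal N" as 128 + N. A small decoding sketch follows; `describe_exit_code` is a hypothetical helper. A SIGKILL alone does not prove an OOM kill. The OOM killer is one sender of SIGKILL, but any process that runs `kill -9` produces the same status.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a shell-style exit status.

    By convention, shells and container tools report "terminated by
    signal N" as 128 + N. A program can also simply exit with such a
    value, so the result is a hint, not proof.
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"unknown signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    for code in (0, 137, 139):
        print(code, describe_exit_code(code))
```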

smithfarm changed the title from "cephadm: Add --container-image" to "cephadm: Add --container-init" on Aug 26, 2020
smithfarm (Contributor) commented:

@sebastian-philipp Do I understand this PR correctly that the --container-init option

  1. causes core dumps to be properly generated,
  2. causes zombie processes to be properly reaped, and
  3. is not the default?

Maybe the option should be --no-container-init instead?
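If the default ever flips, offering both spellings keeps existing command lines working. The following is a sketch with Python's `argparse`, assuming the option is parsed there; `make_parser` and its `init_by_default` switch are hypothetical. Python 3.9+ also provides `argparse.BooleanOptionalAction`, which generates a `--x`/`--no-x` pair automatically. The explicit pair below also works on older interpreters.

```python
import argparse


def make_parser(init_by_default: bool = False) -> argparse.ArgumentParser:
    """Hypothetical parser exposing both --container-init and
    --no-container-init, writing to the same destination."""
    parser = argparse.ArgumentParser(prog="cephadm-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--container-init", dest="container_init",
                       action="store_true",
                       help="run an init process as PID 1 in the container")
    group.add_argument("--no-container-init", dest="container_init",
                       action="store_false",
                       help="run the daemon itself as PID 1")
    # set_defaults also overrides the per-action defaults (store_true
    # would otherwise default to False, store_false to True).
    parser.set_defaults(container_init=init_by_default)
    return parser


if __name__ == "__main__":
    print(make_parser(init_by_default=True).parse_args([]).container_init)
```

Only the `init_by_default` value has to change to flip the default. Both flags keep their meaning either way.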

sebastian-philipp (Contributor, Author) commented:

> 3. is not the default?
>
> Maybe the option should be --no-container-init instead?

If I run this on my machine, I'm getting

$ podman run --init -it opensuse/leap:15.2 
Error: container-init binary not found on the host: stat /usr/libexec/podman/catatonit: no such file or directory 

No way we can make this the default right now.

cyphar commented Aug 27, 2020

For SUSE/openSUSE that is a packaging bug -- it's because podman doesn't want to use the installed catatonit (in /usr/bin/catatonit) and instead wants to use a podman-specific one (this is how Red Hat packages it). We probably need to patch that (or otherwise configure it).

sebastian-philipp (Contributor, Author) commented Aug 27, 2020

> For SUSE/openSUSE that is a packaging bug -- it's because podman doesn't want to use the installed catatonit (in /usr/bin/catatonit) and instead wants to use a podman-specific one (this is how Red Hat packages it). We probably need to patch that (or otherwise configure it).

I think SUSE is fine here. I was testing this on Ubuntu.

denisok commented Aug 27, 2020

I would argue that core dumps are very important for tackling many issues, and sometimes they are the only thing devs have. I think --init or a similar approach should be the default behavior.
There is also --init-path in case the binary is somewhere else.

It could also be worked around with something like --entrypoint=["bash", "-c", "ceph-...", "<ceph params>", "; kill -9 -1"], but I think it is better to investigate whether just --init could work on all supported platforms.

So I would say cephadm/ceph orch could be a bit more intelligent about this and use some init process by default.
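The following sketch shows how a tool could choose between plain `--init` and `--init-path` for podman. When no init binary is found, it falls back to running without one, which avoids the "container-init binary not found on the host" failure reported earlier. The candidate paths are the two mentioned in this thread. The function names and the fallback policy are assumptions for illustration. Podman's default init path may also differ per distribution or configuration.

```python
import os
from typing import List, Sequence

# Paths mentioned in this thread: where podman looked for catatonit in the
# error message on Ubuntu, and the distro-packaged catatonit on SUSE.
PODMAN_DEFAULT_INIT = "/usr/libexec/podman/catatonit"
FALLBACK_INITS = ("/usr/bin/catatonit",)


def _is_executable(path: str) -> bool:
    return os.path.isfile(path) and os.access(path, os.X_OK)


def podman_init_args(default_init: str = PODMAN_DEFAULT_INIT,
                     fallbacks: Sequence[str] = FALLBACK_INITS) -> List[str]:
    """Return extra `podman run` arguments that enable an init process.

    - default init present: plain --init is enough
    - only a fallback present: --init plus --init-path pointing at it
    - nothing found: [] (run the daemon as PID 1, as before)
    """
    if _is_executable(default_init):
        return ["--init"]
    for path in fallbacks:
        if _is_executable(path):
            return ["--init", f"--init-path={path}"]
    return []


if __name__ == "__main__":
    print(podman_init_args())
```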

sebastian-philipp (Contributor, Author) commented:

2020-08-27T11:17:24.294 INFO:tasks.workunit.client.0.smithi089.stderr:Traceback (most recent call last):
2020-08-27T11:17:24.294 INFO:tasks.workunit.client.0.smithi089.stderr:  File "/tmp/tmp.CRma7MlGmD/cephadm", line 5759, in <module>
2020-08-27T11:17:24.295 INFO:tasks.workunit.client.0.smithi089.stderr:    r = args.func()
2020-08-27T11:17:24.295 INFO:tasks.workunit.client.0.smithi089.stderr:  File "/tmp/tmp.CRma7MlGmD/cephadm", line 1215, in _default_image
2020-08-27T11:17:24.295 INFO:tasks.workunit.client.0.smithi089.stderr:    return func()
2020-08-27T11:17:24.295 INFO:tasks.workunit.client.0.smithi089.stderr:  File "/tmp/tmp.CRma7MlGmD/cephadm", line 3639, in command_adopt
2020-08-27T11:17:24.295 INFO:tasks.workunit.client.0.smithi089.stderr:    command_adopt_ceph(daemon_type, daemon_id, fsid);
2020-08-27T11:17:24.296 INFO:tasks.workunit.client.0.smithi089.stderr:  File "/tmp/tmp.CRma7MlGmD/cephadm", line 3843, in command_adopt_ceph
2020-08-27T11:17:24.296 INFO:tasks.workunit.client.0.smithi089.stderr:    c = get_container(fsid, daemon_type, daemon_id)
2020-08-27T11:17:24.296 INFO:tasks.workunit.client.0.smithi089.stderr:  File "/tmp/tmp.CRma7MlGmD/cephadm", line 1841, in get_container
2020-08-27T11:17:24.296 INFO:tasks.workunit.client.0.smithi089.stderr:    init=args.container_init,
2020-08-27T11:17:24.296 INFO:tasks.workunit.client.0.smithi089.stderr:AttributeError: 'Namespace' object has no attribute 'container_init'

https://pulpito.ceph.com/swagner-2020-08-27_10:58:02-rados:cephadm-wip-swagner3-testing-2020-08-27-0951-distro-basic-smithi/5380123/
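The `AttributeError` above is what `argparse` produces when code reads an attribute that the selected sub-command's parser never defined. The sketch below reproduces that pattern and shows one way to avoid it: declare the option on the top-level parser. It is a guess at the kind of bug involved, not a diagnosis of that teuthology run, and the parser layout is hypothetical. A more defensive alternative would be `getattr(args, "container_init", False)` at the call site.

```python
import argparse


def make_parser(shared_flag: bool) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cephadm-sketch")
    if shared_flag:
        # Declared once, on the top-level parser: every Namespace has it.
        parser.add_argument("--container-init", action="store_true")
    sub = parser.add_subparsers(dest="command")
    deploy = sub.add_parser("deploy")
    if not shared_flag:
        # Declared only for "deploy": an "adopt" Namespace lacks the
        # attribute, and reading args.container_init raises AttributeError.
        deploy.add_argument("--container-init", action="store_true")
    sub.add_parser("adopt")
    return parser


if __name__ == "__main__":
    args = make_parser(shared_flag=False).parse_args(["adopt"])
    print(hasattr(args, "container_init"))  # False
    args = make_parser(shared_flag=True).parse_args(["--container-init", "adopt"])
    print(args.container_init)  # True
```

In this sketch, a top-level option must appear before the sub-command name on the command line.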

cyphar commented Aug 27, 2020

@denisok

> It could also be worked around with something like --entrypoint=["bash", "-c", "ceph-...", "<ceph params>", "; kill -9 -1"], but I think it is better to investigate whether just --init could work on all supported platforms.

--init solves several problems in addition to the whole coredump thing -- running a program under bash is probably not a good idea in general because bash doesn't forward signals to child processes (so you won't get SIGTERM for graceful shutdowns by the container runtime). To be fair, this problem likely already exists (I don't know if Ceph does explicit signal handling if run as PID1 -- but most programs don't).

@sebastian-philipp Ah okay, yeah. Someone should probably send a bug report to Ubuntu. I just double-checked and --init does actually work on openSUSE.
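Here is a toy PID-1 shim in Python that makes the difference from a plain bash wrapper concrete. It does the two jobs discussed here: it forwards termination signals to the daemon and reaps exited children. It is a sketch assuming a POSIX host, not how catatonit or tini are implemented. It also skips details a real init handles, such as the short window between `fork` and installing the handlers.

```python
import os
import signal
import sys
from typing import List

FORWARDED_SIGNALS = (signal.SIGTERM, signal.SIGINT, signal.SIGHUP)


def run_as_init(argv: List[str]) -> int:
    """Toy init: run argv as a child, forward termination signals to it,
    and reap every child that exits until argv's process is gone.

    Returns the child's status in shell convention: its exit code, or
    128 + N if it was killed by signal N.
    """
    child = os.fork()
    if child == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    def forward(signum, _frame):
        # Unlike a plain bash wrapper, pass termination requests on.
        try:
            os.kill(child, signum)
        except ProcessLookupError:
            pass

    previous = {sig: signal.signal(sig, forward) for sig in FORWARDED_SIGNALS}
    try:
        while True:
            # waitpid(-1) reaps *any* child; when running as PID 1 that
            # includes orphans re-parented to us, so they don't stay zombies.
            pid, status = os.waitpid(-1, 0)
            if pid != child:
                continue
            if os.WIFSIGNALED(status):
                return 128 + os.WTERMSIG(status)
            return os.WEXITSTATUS(status)
    finally:
        for sig, handler in previous.items():
            if handler is not None:
                signal.signal(sig, handler)


if __name__ == "__main__":
    sys.exit(run_as_init(sys.argv[1:]))
```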

denisok commented Aug 27, 2020

> @denisok
>
> > It could also be worked around with something like --entrypoint=["bash", "-c", "ceph-...", "<ceph params>", "; kill -9 -1"], but I think it is better to investigate whether just --init could work on all supported platforms.
>
> --init solves several problems in addition to the whole coredump thing -- running a program under bash is probably not a good idea in general because bash doesn't forward signals to child processes (so you won't get SIGTERM for graceful shutdowns by the container runtime). To be fair, this problem likely already exists (I don't know if Ceph does explicit signal handling if run as PID1 -- but most programs don't).

I don't like that either. In that example bash runs as PID 1, at least on my machine: you need multiple commands for bash to stay PID 1 (with a single command, bash just execs it).

sebastian-philipp added the wip-swagner-testing (My Teuthology tests) label on Aug 28, 2020
sebastian-philipp removed the wip-swagner-testing (My Teuthology tests) label on Aug 31, 2020