oem-gce.service is broken after upgrading to flatcar 3139.2.0 #714

@ffilippopoulos

Description

We run a Kubernetes cluster on GCP nodes using Flatcar Container Linux. After updating the nodes from Flatcar 3033.2.4 to 3139.2.0 (latest), oem-gce.service fails to start with the following logs:

core@master-k8s-exp-1-839c ~ $ journalctl -u oem-gce.service                                                                                                                                                       
Apr 12 09:29:25 localhost systemd[1]: Starting GCE Linux Agent...                                                                                                                                                  
Apr 12 09:29:26 localhost mkfs.ext4[981]: mke2fs 1.45.5 (07-Jan-2020)
Apr 12 09:29:26 localhost mkfs.ext4[981]: [82B blob data]
Apr 12 09:29:26 localhost mkfs.ext4[981]: Creating filesystem with 262144 4k blocks and 65536 inodes
Apr 12 09:29:26 localhost mkfs.ext4[981]: Filesystem UUID: 9376da71-5201-4916-b603-141a89463da7
Apr 12 09:29:26 localhost mkfs.ext4[981]: Superblock backups stored on blocks:
Apr 12 09:29:26 localhost mkfs.ext4[981]:         32768, 98304, 163840, 229376
Apr 12 09:29:26 localhost mkfs.ext4[981]: [41B blob data]
Apr 12 09:29:26 localhost mkfs.ext4[981]: [38B blob data]
Apr 12 09:29:26 localhost mkfs.ext4[981]: Creating journal (8192 blocks): done
Apr 12 09:29:27 localhost mkfs.ext4[981]: [75B blob data]
Apr 12 09:29:27 localhost umount[1033]: umount: /var/lib/flatcar-oem-gce.img: not mounted.
Apr 12 09:29:33 master-k8s-exp-1-839c.c.uw-dev.internal systemd-nspawn[1714]: Spawning container oem-gce on /var/lib/flatcar-oem-gce.img.
Apr 12 09:29:33 master-k8s-exp-1-839c.c.uw-dev.internal systemd-nspawn[1714]: Press ^] three times within 1s to kill container.
Apr 12 09:29:33 master-k8s-exp-1-839c.c.uw-dev.internal systemd[1]: Started GCE Linux Agent.
Apr 12 09:29:33 master-k8s-exp-1-839c.c.uw-dev.internal systemd-nspawn[1714]: + '[' -e /etc/default/instance_configs.cfg.template ']'
Apr 12 09:29:33 master-k8s-exp-1-839c.c.uw-dev.internal systemd-nspawn[1714]: + echo -e '[InstanceSetup]\nset_host_keys = false'
Apr 12 09:29:33 master-k8s-exp-1-839c.c.uw-dev.internal systemd-nspawn[1714]: + /usr/bin/google_instance_setup
Apr 12 09:29:33 master-k8s-exp-1-839c.c.uw-dev.internal systemd-nspawn[1714]: /init.sh: /usr/bin/google_instance_setup: /usr/lib/python-exec/python3.9/python3: bad interpreter: No such file or directory
Apr 12 09:29:33 master-k8s-exp-1-839c.c.uw-dev.internal systemd-nspawn[1714]: Container oem-gce failed with error code 126.
Apr 12 09:29:33 master-k8s-exp-1-839c.c.uw-dev.internal systemd[1]: oem-gce.service: Main process exited, code=exited, status=126/n/a
Apr 12 09:29:33 master-k8s-exp-1-839c.c.uw-dev.internal systemd[1]: oem-gce.service: Failed with result 'exit-code'
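
The "bad interpreter" line in the log suggests that the shebang of /usr/bin/google_instance_setup names an interpreter path that does not exist inside the container. The following is a minimal sketch of a check for that condition, not part of Flatcar or the GCE agent. The helper name check_shebang and the idea of inspecting the image contents under some directory are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether the interpreter named in a script's
# shebang line exists (and is executable) under a given root directory.
# Usage: check_shebang ROOT SCRIPT_PATH_RELATIVE_TO_ROOT
# Returns 0 if the interpreter is present, 1 if it is missing,
# 2 if the script cannot be read or has no shebang.
# Limitation: an interpreter that is a symlink with an absolute target is
# resolved against the host root, not against ROOT.
check_shebang() {
    root=$1
    script=$2
    # Read only the first line of the script.
    first_line=$(head -n 1 "$root/$script") || return 2
    case $first_line in
        '#!'*) ;;
        *) echo "no shebang: $script"; return 2 ;;
    esac
    # Strip "#!", then leading spaces, then any interpreter arguments.
    interp=${first_line#'#!'}
    interp=${interp#"${interp%%[! ]*}"}
    interp=${interp%% *}
    if [ -x "$root/$interp" ]; then
        echo "ok: $interp"
    else
        echo "missing: $interp"
        return 1
    fi
}
```

Assuming the contents of the OEM image are available under a directory such as /mnt/oem (a hypothetical mount point), running `check_shebang /mnt/oem usr/bin/google_instance_setup` would print `missing: /usr/lib/python-exec/python3.9/python3` if the interpreter path from the log is absent there.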

Impact

Since this service is critical for several GCP features, including setting up host IP routes, many things broke. In particular, external load balancing to the cluster failed.

Environment and steps to reproduce

Running the latest Flatcar release (3139.2.0) on GCP nodes should be enough to observe this behaviour.

Additional information

Flatcar Container Linux by Kinvolk 3139.2.0 (Oklo)
Kernel: 5.15.32-flatcar
Kubernetes: 1.22.5
Container-Runtime: docker://20.10.12

Labels

kind/bug (Something isn't working)
platform/GCP (Related to Google Cloud Platform)
