Controller pod has been restarted 90 times in the last week #607

@bourquep

Description

On my prod cluster, which has been running Conduit 0.3.1 for a week now:

k get pod -n conduit

NAME                          READY     STATUS    RESTARTS   AGE
controller-598b867959-5qw44   6/6       Running   90         6d
grafana-758656bbf4-x7qhn      2/2       Running   0          6d
prometheus-55b86d854b-wjm7f   2/2       Running   0          6d
web-698bd69459-2b25z          2/2       Running   0          1m

Notice the 90 restarts of the controller pod.
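The log of the crashed proxy instance would likely be the most useful next artifact. A hedged sketch (pod and container names taken from the output above; --previous fetches the log of the last terminated instance, which kubectl keeps around after a restart):

```shell
# Fetch the log of the previously terminated conduit-proxy container:
#   kubectl logs -n conduit controller-598b867959-5qw44 -c conduit-proxy --previous

# The restart count itself is column 4 of `kubectl get pod`; replayed here
# against the pasted line as a quick awk sanity check:
restarts=$(printf 'controller-598b867959-5qw44   6/6   Running   90   6d\n' | awk '{print $4}')
echo "$restarts"
```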


k describe pod -n conduit controller-598b867959-5qw44

Name:           controller-598b867959-5qw44
Namespace:      conduit
Node:           gke-studyo-prod-default-pool-6cc404b0-zr5h/10.142.0.4
Start Time:     Fri, 16 Mar 2018 14:01:09 -0400
Labels:         conduit.io/control-plane-component=controller
                conduit.io/control-plane-ns=conduit
                pod-template-hash=1546423515
Annotations:    conduit.io/created-by=conduit/cli v0.3.1
                conduit.io/proxy-version=v0.3.1
                kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"conduit","name":"controller-598b867959","uid":"018241b3-2944-11e8-8aed-42010af000...
Status:         Running
IP:             10.32.0.215
Created By:     ReplicaSet/controller-598b867959
Controlled By:  ReplicaSet/controller-598b867959
Init Containers:
  conduit-init:
    Container ID:  docker://5655baddb2d72b3f84562d0d61c5064240c09316c21f3a579953763127e62364
    Image:         gcr.io/runconduit/proxy-init:v0.3.1
    Image ID:      docker-pullable://gcr.io/runconduit/proxy-init@sha256:b82bec4add084a618fe75e419401190ce3044a23706aa846b6f5e59c8225586d
    Port:          <none>
    Args:
      --incoming-proxy-port
      4143
      --outgoing-proxy-port
      4140
      --proxy-uid
      2102
      --inbound-ports-to-ignore
      4190
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 16 Mar 2018 14:01:11 -0400
      Finished:     Fri, 16 Mar 2018 14:01:11 -0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from conduit-controller-token-xpjlv (ro)
Containers:
  public-api:
    Container ID:  docker://645c669882b7690f1097fd43579400293339c1dcd67a8a73a3cabb59cabcd819
    Image:         gcr.io/runconduit/controller:v0.3.1
    Image ID:      docker-pullable://gcr.io/runconduit/controller@sha256:ad98feaff4d25cc7ef6d70aa6c9a65e4334896363cfb9095fb361587e6e013a2
    Ports:         8085/TCP, 9995/TCP
    Args:
      public-api
      -controller-namespace=conduit
      -log-level=info
      -logtostderr=true
    State:          Running
      Started:      Fri, 16 Mar 2018 14:01:16 -0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from conduit-controller-token-xpjlv (ro)
  destination:
    Container ID:  docker://82bcaa31fb1dc74e60f5bb4a03cedd73559ca176146aca60237ebf232808af7b
    Image:         gcr.io/runconduit/controller:v0.3.1
    Image ID:      docker-pullable://gcr.io/runconduit/controller@sha256:ad98feaff4d25cc7ef6d70aa6c9a65e4334896363cfb9095fb361587e6e013a2
    Ports:         8089/TCP, 9999/TCP
    Args:
      destination
      -log-level=info
      -logtostderr=true
    State:          Running
      Started:      Fri, 16 Mar 2018 14:01:17 -0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from conduit-controller-token-xpjlv (ro)
  proxy-api:
    Container ID:  docker://a7ebbf3b120e23798483b6747c99746cfd2aebf9a4ab7f5493a896607452850c
    Image:         gcr.io/runconduit/controller:v0.3.1
    Image ID:      docker-pullable://gcr.io/runconduit/controller@sha256:ad98feaff4d25cc7ef6d70aa6c9a65e4334896363cfb9095fb361587e6e013a2
    Ports:         8086/TCP, 9996/TCP
    Args:
      proxy-api
      -log-level=info
      -logtostderr=true
    State:          Running
      Started:      Fri, 16 Mar 2018 14:01:17 -0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from conduit-controller-token-xpjlv (ro)
  tap:
    Container ID:  docker://8690ab373e84fc0497953daac78b550fbbf2aeea0874b06b34244e3d3575bd19
    Image:         gcr.io/runconduit/controller:v0.3.1
    Image ID:      docker-pullable://gcr.io/runconduit/controller@sha256:ad98feaff4d25cc7ef6d70aa6c9a65e4334896363cfb9095fb361587e6e013a2
    Ports:         8088/TCP, 9998/TCP
    Args:
      tap
      -log-level=info
      -logtostderr=true
    State:          Running
      Started:      Fri, 16 Mar 2018 14:01:18 -0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from conduit-controller-token-xpjlv (ro)
  telemetry:
    Container ID:  docker://e6030099a2f5e95e00c0b323982dbe79ee9504522aaa481fef6596c207e44819
    Image:         gcr.io/runconduit/controller:v0.3.1
    Image ID:      docker-pullable://gcr.io/runconduit/controller@sha256:ad98feaff4d25cc7ef6d70aa6c9a65e4334896363cfb9095fb361587e6e013a2
    Ports:         8087/TCP, 9997/TCP
    Args:
      telemetry
      -ignore-namespaces=kube-system
      -prometheus-url=http://prometheus.conduit.svc.cluster.local:9090
      -log-level=info
      -logtostderr=true
    State:          Running
      Started:      Fri, 16 Mar 2018 14:01:18 -0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from conduit-controller-token-xpjlv (ro)
  conduit-proxy:
    Container ID:   docker://93201f10b729767bf5f277bbb1fabb6052b2f015682fe15d4ecf755e499dc260
    Image:          gcr.io/runconduit/proxy:v0.3.1
    Image ID:       docker-pullable://gcr.io/runconduit/proxy@sha256:847986d67b64fd68b0c369a9db84dc4f76a661248be61f5fc0b8e877cda42591
    Port:           4143/TCP
    State:          Running
      Started:      Fri, 23 Mar 2018 13:07:16 -0400
    Last State:     Terminated
      Reason:       Error
      Exit Code:    101
      Started:      Fri, 23 Mar 2018 12:57:41 -0400
      Finished:     Fri, 23 Mar 2018 13:07:01 -0400
    Ready:          True
    Restart Count:  90
    Environment:
      CONDUIT_PROXY_LOG:               warn,conduit_proxy=info
      CONDUIT_PROXY_CONTROL_URL:       tcp://localhost:8086
      CONDUIT_PROXY_CONTROL_LISTENER:  tcp://0.0.0.0:4190
      CONDUIT_PROXY_PRIVATE_LISTENER:  tcp://127.0.0.1:4140
      CONDUIT_PROXY_PUBLIC_LISTENER:   tcp://0.0.0.0:4143
      CONDUIT_PROXY_NODE_NAME:          (v1:spec.nodeName)
      CONDUIT_PROXY_POD_NAME:          controller-598b867959-5qw44 (v1:metadata.name)
      CONDUIT_PROXY_POD_NAMESPACE:     conduit (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from conduit-controller-token-xpjlv (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          True 
  PodScheduled   True 
Volumes:
  conduit-controller-token-xpjlv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  conduit-controller-token-xpjlv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute for 300s
                 node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason      Age                From                                                 Message
  ----     ------      ----               ----                                                 -------
  Warning  BackOff     41m (x7 over 6d)   kubelet, gke-studyo-prod-default-pool-6cc404b0-zr5h  Back-off restarting failed container
  Warning  FailedSync  41m (x7 over 6d)   kubelet, gke-studyo-prod-default-pool-6cc404b0-zr5h  Error syncing pod
  Normal   Created     41m (x91 over 6d)  kubelet, gke-studyo-prod-default-pool-6cc404b0-zr5h  Created container
  Normal   Pulled      41m (x90 over 6d)  kubelet, gke-studyo-prod-default-pool-6cc404b0-zr5h  Container image "gcr.io/runconduit/proxy:v0.3.1" already present on machine
  Normal   Started     41m (x91 over 6d)  kubelet, gke-studyo-prod-default-pool-6cc404b0-zr5h  Started container

k get events --sort-by='.metadata.creationTimestamp' -n conduit

LAST SEEN   FIRST SEEN   COUNT     NAME                                           KIND         SUBOBJECT                           TYPE      REASON                  SOURCE                                                MESSAGE
45m         6d           91        controller-598b867959-5qw44.151c7883611b3dea   Pod          spec.containers{conduit-proxy}      Normal    Started                 kubelet, gke-studyo-prod-default-pool-6cc404b0-zr5h   Started container
45m         6d           90        controller-598b867959-5qw44.151c8fe15df4a8f2   Pod          spec.containers{conduit-proxy}      Normal    Pulled                  kubelet, gke-studyo-prod-default-pool-6cc404b0-zr5h   Container image "gcr.io/runconduit/proxy:v0.3.1" already present on machine
45m         6d           91        controller-598b867959-5qw44.151c788357a90359   Pod          spec.containers{conduit-proxy}      Normal    Created                 kubelet, gke-studyo-prod-default-pool-6cc404b0-zr5h   Created container
45m         6d           7         controller-598b867959-5qw44.151cb385400d97d9   Pod                                              Warning   FailedSync              kubelet, gke-studyo-prod-default-pool-6cc404b0-zr5h   Error syncing pod
45m         6d           7         controller-598b867959-5qw44.151cb3853ffa2160   Pod          spec.containers{conduit-proxy}      Warning   BackOff                 kubelet, gke-studyo-prod-default-pool-6cc404b0-zr5h   Back-off restarting failed container
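
For what it's worth, the Exit Code 101 in the proxy's Last State above is the conventional exit status of a panicking Rust program (conduit-proxy is written in Rust), which suggests the proxy is panicking rather than being OOM-killed (that would typically surface as 137, i.e. SIGKILL). A minimal shell illustration of how a child's exit status propagates, purely for reference:

```shell
# 101 is Rust's default panic exit code; 137 = 128 + 9 (SIGKILL, e.g. OOM).
# A child process's exit status is what the kubelet records as Exit Code:
sh -c 'exit 101'
echo "exit code: $?"   # prints: exit code: 101
```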

This does not affect runtime behavior: all calls are being proxied as expected, and live stats are still reported by conduit stat and conduit dashboard.

However, conduit dashboard does not reflect deployment scale changes (e.g. after scaling up a deployment, the dashboard still shows the old count of green dots).

I won't touch anything for now; if there is any other info you need me to provide, I'll be happy to do so.
