On my prod cluster, which has been running Conduit 0.3.1 for a week now:
k get pod -n conduit
NAME READY STATUS RESTARTS AGE
controller-598b867959-5qw44 6/6 Running 90 6d
grafana-758656bbf4-x7qhn 2/2 Running 0 6d
prometheus-55b86d854b-wjm7f 2/2 Running 0 6d
web-698bd69459-2b25z 2/2 Running 0 1m
Notice the 90 restarts of the controller pod.
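(All of the restarts are on the conduit-proxy sidecar; the other five controller containers each show a restart count of 0. A quick per-container check with something along these lines confirms it, and the full describe output follows.)
k get pod -n conduit controller-598b867959-5qw44 \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\n"}{end}'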
k describe pod -n conduit controller-598b867959-5qw44
Name: controller-598b867959-5qw44
Namespace: conduit
Node: gke-studyo-prod-default-pool-6cc404b0-zr5h/10.142.0.4
Start Time: Fri, 16 Mar 2018 14:01:09 -0400
Labels: conduit.io/control-plane-component=controller
conduit.io/control-plane-ns=conduit
pod-template-hash=1546423515
Annotations: conduit.io/created-by=conduit/cli v0.3.1
conduit.io/proxy-version=v0.3.1
kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"conduit","name":"controller-598b867959","uid":"018241b3-2944-11e8-8aed-42010af000...
Status: Running
IP: 10.32.0.215
Created By: ReplicaSet/controller-598b867959
Controlled By: ReplicaSet/controller-598b867959
Init Containers:
conduit-init:
Container ID: docker://5655baddb2d72b3f84562d0d61c5064240c09316c21f3a579953763127e62364
Image: gcr.io/runconduit/proxy-init:v0.3.1
Image ID: docker-pullable://gcr.io/runconduit/proxy-init@sha256:b82bec4add084a618fe75e419401190ce3044a23706aa846b6f5e59c8225586d
Port: <none>
Args:
--incoming-proxy-port
4143
--outgoing-proxy-port
4140
--proxy-uid
2102
--inbound-ports-to-ignore
4190
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 16 Mar 2018 14:01:11 -0400
Finished: Fri, 16 Mar 2018 14:01:11 -0400
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from conduit-controller-token-xpjlv (ro)
Containers:
public-api:
Container ID: docker://645c669882b7690f1097fd43579400293339c1dcd67a8a73a3cabb59cabcd819
Image: gcr.io/runconduit/controller:v0.3.1
Image ID: docker-pullable://gcr.io/runconduit/controller@sha256:ad98feaff4d25cc7ef6d70aa6c9a65e4334896363cfb9095fb361587e6e013a2
Ports: 8085/TCP, 9995/TCP
Args:
public-api
-controller-namespace=conduit
-log-level=info
-logtostderr=true
State: Running
Started: Fri, 16 Mar 2018 14:01:16 -0400
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from conduit-controller-token-xpjlv (ro)
destination:
Container ID: docker://82bcaa31fb1dc74e60f5bb4a03cedd73559ca176146aca60237ebf232808af7b
Image: gcr.io/runconduit/controller:v0.3.1
Image ID: docker-pullable://gcr.io/runconduit/controller@sha256:ad98feaff4d25cc7ef6d70aa6c9a65e4334896363cfb9095fb361587e6e013a2
Ports: 8089/TCP, 9999/TCP
Args:
destination
-log-level=info
-logtostderr=true
State: Running
Started: Fri, 16 Mar 2018 14:01:17 -0400
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from conduit-controller-token-xpjlv (ro)
proxy-api:
Container ID: docker://a7ebbf3b120e23798483b6747c99746cfd2aebf9a4ab7f5493a896607452850c
Image: gcr.io/runconduit/controller:v0.3.1
Image ID: docker-pullable://gcr.io/runconduit/controller@sha256:ad98feaff4d25cc7ef6d70aa6c9a65e4334896363cfb9095fb361587e6e013a2
Ports: 8086/TCP, 9996/TCP
Args:
proxy-api
-log-level=info
-logtostderr=true
State: Running
Started: Fri, 16 Mar 2018 14:01:17 -0400
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from conduit-controller-token-xpjlv (ro)
tap:
Container ID: docker://8690ab373e84fc0497953daac78b550fbbf2aeea0874b06b34244e3d3575bd19
Image: gcr.io/runconduit/controller:v0.3.1
Image ID: docker-pullable://gcr.io/runconduit/controller@sha256:ad98feaff4d25cc7ef6d70aa6c9a65e4334896363cfb9095fb361587e6e013a2
Ports: 8088/TCP, 9998/TCP
Args:
tap
-log-level=info
-logtostderr=true
State: Running
Started: Fri, 16 Mar 2018 14:01:18 -0400
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from conduit-controller-token-xpjlv (ro)
telemetry:
Container ID: docker://e6030099a2f5e95e00c0b323982dbe79ee9504522aaa481fef6596c207e44819
Image: gcr.io/runconduit/controller:v0.3.1
Image ID: docker-pullable://gcr.io/runconduit/controller@sha256:ad98feaff4d25cc7ef6d70aa6c9a65e4334896363cfb9095fb361587e6e013a2
Ports: 8087/TCP, 9997/TCP
Args:
telemetry
-ignore-namespaces=kube-system
-prometheus-url=http://prometheus.conduit.svc.cluster.local:9090
-log-level=info
-logtostderr=true
State: Running
Started: Fri, 16 Mar 2018 14:01:18 -0400
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from conduit-controller-token-xpjlv (ro)
conduit-proxy:
Container ID: docker://93201f10b729767bf5f277bbb1fabb6052b2f015682fe15d4ecf755e499dc260
Image: gcr.io/runconduit/proxy:v0.3.1
Image ID: docker-pullable://gcr.io/runconduit/proxy@sha256:847986d67b64fd68b0c369a9db84dc4f76a661248be61f5fc0b8e877cda42591
Port: 4143/TCP
State: Running
Started: Fri, 23 Mar 2018 13:07:16 -0400
Last State: Terminated
Reason: Error
Exit Code: 101
Started: Fri, 23 Mar 2018 12:57:41 -0400
Finished: Fri, 23 Mar 2018 13:07:01 -0400
Ready: True
Restart Count: 90
Environment:
CONDUIT_PROXY_LOG: warn,conduit_proxy=info
CONDUIT_PROXY_CONTROL_URL: tcp://localhost:8086
CONDUIT_PROXY_CONTROL_LISTENER: tcp://0.0.0.0:4190
CONDUIT_PROXY_PRIVATE_LISTENER: tcp://127.0.0.1:4140
CONDUIT_PROXY_PUBLIC_LISTENER: tcp://0.0.0.0:4143
CONDUIT_PROXY_NODE_NAME: (v1:spec.nodeName)
CONDUIT_PROXY_POD_NAME: controller-598b867959-5qw44 (v1:metadata.name)
CONDUIT_PROXY_POD_NAMESPACE: conduit (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from conduit-controller-token-xpjlv (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
conduit-controller-token-xpjlv:
Type: Secret (a volume populated by a Secret)
SecretName: conduit-controller-token-xpjlv
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 41m (x7 over 6d) kubelet, gke-studyo-prod-default-pool-6cc404b0-zr5h Back-off restarting failed container
Warning FailedSync 41m (x7 over 6d) kubelet, gke-studyo-prod-default-pool-6cc404b0-zr5h Error syncing pod
Normal Created 41m (x91 over 6d) kubelet, gke-studyo-prod-default-pool-6cc404b0-zr5h Created container
Normal Pulled 41m (x90 over 6d) kubelet, gke-studyo-prod-default-pool-6cc404b0-zr5h Container image "gcr.io/runconduit/proxy:v0.3.1" already present on machine
Normal Started 41m (x91 over 6d) kubelet, gke-studyo-prod-default-pool-6cc404b0-zr5h Started container
k get events --sort-by='.metadata.creationTimestamp' -n conduit
LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
45m 6d 91 controller-598b867959-5qw44.151c7883611b3dea Pod spec.containers{conduit-proxy} Normal Started kubelet, gke-studyo-prod-default-pool-6cc404b0-zr5h Started container
45m 6d 90 controller-598b867959-5qw44.151c8fe15df4a8f2 Pod spec.containers{conduit-proxy} Normal Pulled kubelet, gke-studyo-prod-default-pool-6cc404b0-zr5h Container image "gcr.io/runconduit/proxy:v0.3.1" already present on machine
45m 6d 91 controller-598b867959-5qw44.151c788357a90359 Pod spec.containers{conduit-proxy} Normal Created kubelet, gke-studyo-prod-default-pool-6cc404b0-zr5h Created container
45m 6d 7 controller-598b867959-5qw44.151cb385400d97d9 Pod Warning FailedSync kubelet, gke-studyo-prod-default-pool-6cc404b0-zr5h Error syncing pod
45m 6d 7 controller-598b867959-5qw44.151cb3853ffa2160 Pod spec.containers{conduit-proxy} Warning BackOff kubelet, gke-studyo-prod-default-pool-6cc404b0-zr5h Back-off restarting failed container
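The proxy keeps exiting with code 101 and the kubelet goes into back-off. If the crash output would help, I can pull the logs from the last failed run of the sidecar with:
k logs -n conduit controller-598b867959-5qw44 -c conduit-proxy --previous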
This does not affect runtime behavior: all calls are being proxied as expected, and live stats are reported by conduit stats and conduit dashboard.
However, conduit dashboard does not reflect deployment scale changes (e.g. after scaling up a deployment, the dashboard still shows the old count of green dots).
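For context, the scale change was nothing fancy, just a plain kubectl scale along these lines (deployment name is a placeholder):
# deployment name below is just an example; afterwards the dashboard still shows the old replica count
k scale deployment example-app --replicas=3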
I won't touch anything for now; if there is any other info you need me to provide, I'll be happy to do so.