K8S & Calico information
HostOS: RHEL 8.2
K8S: on-premise cluster; version is v1.21.1; "IPVS" mode; IPv4/IPv6 dual stack; installed using kubespray
Calico: version is v3.18.4; non-BGP mode; "IPv6" DNAT enabled.
Our Docker image is built on top of "RHEL ubi:8"
We did not set up an external ETCD cluster.
"kubectl describe" output
[support@node-cont-1-qa conf]$ kubectl describe pod export-job-job-dp8hb
Name: export-job-job-dp8hb
Namespace: pio
Priority: 0
Node: node-df1-1/10.0.156.180
Start Time: Wed, 23 Feb 2022 05:57:18 -0800
Labels: app.kubernetes.io/instance=export-job-job
controller-uid=5d9f3e4b-e74c-4280-a3be-e31d37e92b84
job-name=export-job-job
Annotations: cni.projectcalico.org/podIP:
cni.projectcalico.org/podIPs:
Status: Pending
IP:
IPs: <none>
Controlled By: Job/export-job-job
Containers:
export-job-job:
Container ID:
Image: 10.0.156.250:5000/img-admf:9.3.0.0B038
Image ID:
Port: <none>
Host Port: <none>
Command:
csh
Args:
-c
source /TT9/configXcp.sh; lis_conf; python2 /etc/pio/APPL/XcdbBackup.py --exportdb --dir /var/tmp; sleep 300
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
cpu: 500m
memory: 512Mi
Requests:
cpu: 200m
memory: 256Mi
Environment: <none>
Mounts:
/TT9/PIO/9.0.0/RUN/config/APPL/DBConMgr.cnfg from db-conf (rw,path="DBConMgr.cnfg")
/TT9/PIO/9.0.0/RUN/config/feature_conf.json from feature-conf (rw,path="feature_conf.json")
/TT9/PIO/9.0.0/RUN/license/license.json from license-conf (rw,path="license.json")
/etc/pio/APPL/XcdbBackup.py from job-script (rw,path="XcdbBackup.py")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jh7lg (ro)
/var/tmp from external-pv (rw)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
job-script:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: export-job-script
Optional: false
db-conf:
Type: Secret (a volume populated by a Secret)
SecretName: db-secret
Optional: false
feature-conf:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: feature
Optional: false
license-conf:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: license
Optional: false
external-pv:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: backup-pvc
ReadOnly: false
kube-api-access-jh7lg:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 52m default-scheduler Successfully assigned pio/export-job-job-dp8hb to node-df1-1
Warning FailedCreatePodSandBox 52m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "e46d8d9df11ef97e7e1d8b38ced7efef32e1cb4bfb0aa85809cb3198464b6167" network for pod "export-job-job-dp8hb": networkPlugin cni failed to set up pod "export-job-job-dp8hb_pio" network: connection is unauthorized: Unauthorized, failed to clean up sandbox container "e46d8d9df11ef97e7e1d8b38ced7efef32e1cb4bfb0aa85809cb3198464b6167" network for pod "export-job-job-dp8hb": networkPlugin cni failed to teardown pod "export-job-job-dp8hb_pio" network: error getting ClusterInformation: connection is unauthorized: Unauthorized]
Normal SandboxChanged 50m (x10 over 52m) kubelet Pod sandbox changed, it will be killed and re-created.
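The FailedCreatePodSandBox message above bundles two separate CNI failures (one during setup, one during teardown) into a single line. The following is a minimal sketch, not part of any Kubernetes or Calico tooling, that splits such a message into its phases so the underlying error of each is easier to see. The function name `cni_failures` and the regular expression are my own assumptions, written against the exact wording of the event shown above; other kubelet versions may word the message differently.

```python
import re

# Hypothetical helper: matches each
#   networkPlugin cni failed to <phase> pod "<pod>" network: <error>
# clause in a kubelet FailedCreatePodSandBox message.
CLAUSE = re.compile(
    r'networkPlugin cni failed to (?P<phase>set up|teardown) '
    r'pod "(?P<pod>[^"]+)" network: '
    r'(?P<error>.*?)(?=, failed to clean up sandbox container|\]?$)'
)


def cni_failures(message):
    """Return a list of (phase, pod, error) tuples found in the message."""
    return [(m["phase"], m["pod"], m["error"]) for m in CLAUSE.finditer(message)]


# The event text from the "kubectl describe" output above.
SANDBOX = "e46d8d9df11ef97e7e1d8b38ced7efef32e1cb4bfb0aa85809cb3198464b6167"
EXAMPLE = (
    'Failed to create pod sandbox: rpc error: code = Unknown desc = '
    '[failed to set up sandbox container "' + SANDBOX + '" network for pod '
    '"export-job-job-dp8hb": networkPlugin cni failed to set up pod '
    '"export-job-job-dp8hb_pio" network: connection is unauthorized: Unauthorized, '
    'failed to clean up sandbox container "' + SANDBOX + '" network for pod '
    '"export-job-job-dp8hb": networkPlugin cni failed to teardown pod '
    '"export-job-job-dp8hb_pio" network: error getting ClusterInformation: '
    'connection is unauthorized: Unauthorized]'
)

if __name__ == "__main__":
    for phase, pod, error in cni_failures(EXAMPLE):
        print(phase, "|", pod, "|", error)
```

Both phases end in the same "connection is unauthorized: Unauthorized" text, which suggests that the setup and the teardown calls hit the same authorization problem.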
Expected Behavior
The pod should start successfully.
Steps to Reproduce
Sorry, the issue happened twice, on different K8S clusters in our lab, and I did not keep any logs.
I would also like to know how to reproduce it.
My initial thoughts (may be wrong)
Since the "kubectl describe" output contains "connection is unauthorized", I searched the K8S v1.21.1 source code for it; the K8S code does NOT contain it. I then searched the Calico v3.22 source (I am using v3.18.4, but there should not be a big difference) and found "connection is unauthorized" in "libcalico-go/lib/errors/errors.go". So it looks like the issue is caused by Calico. Next, I searched for "error getting ClusterInformation": it does not appear in the K8S code, but it does appear in the Calico code. So I am confident that the issue is 100% related to Calico.
Because the "connection is unauthorized" message comes from "type ErrorConnectionUnauthorized struct", and "ErrorConnectionUnauthorized" appears to be related to the interaction with ETCD, it looks like the issue is a communication problem between Calico and ETCD.
By the way, /var/log/calico/cni/ does NOT contain anything related to "etcd" for pod start/destroy during normal operation.
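One way to check the ETCD hypothesis is to see which datastore the Calico CNI plugin is configured to use on the affected node. The sketch below is only an illustration under stated assumptions: it assumes the CNI configuration is a JSON file under /etc/cni/net.d/ (the exact file name depends on the installer) and that the Calico plugin entry uses the `datastore_type` and `etcd_endpoints` fields described in the Calico CNI configuration reference. The helper name `calico_datastore` and the sample configuration are hypothetical and are not taken from this cluster.

```python
import json


def calico_datastore(conf_text):
    """Return (datastore_type, etcd_endpoints) from a Calico CNI config.

    Accepts either a .conflist document (with a "plugins" array) or a
    single-plugin .conf document. Fields that are not set come back as None.
    """
    conf = json.loads(conf_text)
    # A .conflist wraps plugin entries in "plugins"; a .conf is the entry itself.
    plugins = conf.get("plugins", [conf])
    for plugin in plugins:
        if plugin.get("type") == "calico":
            return plugin.get("datastore_type"), plugin.get("etcd_endpoints")
    raise ValueError("no calico plugin entry found")


# Hypothetical sample, NOT the configuration of the cluster in this report.
SAMPLE_CONFLIST = """
{
  "name": "cni0",
  "cniVersion": "0.3.1",
  "plugins": [
    {"type": "calico", "datastore_type": "kubernetes"},
    {"type": "portmap", "capabilities": {"portMappings": true}}
  ]
}
"""

if __name__ == "__main__":
    datastore, endpoints = calico_datastore(SAMPLE_CONFLIST)
    print("datastore_type:", datastore)
    print("etcd_endpoints:", endpoints)
```

If the real file reports `kubernetes` as the datastore type, the CNI plugin would be talking to the Kubernetes API server rather than directly to ETCD, which would be consistent with the absence of "etcd" in the CNI logs.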
What I expect:
If possible, can you please tell me:
1). Which webpage describes the control/data flow between Calico and ETCD?
2). Which log files does Calico as a whole use, and where are they located?
3). Did I miss any debug information?
Thanks