CI: K8sChaosTest Connectivity demo application Endpoint can still connect while Cilium is not running #13552

@pchaigno

Stacktrace

/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Kernel/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:461
cilium pre-flight checks failed
Expected
    <*errors.errorString | 0xc000523410>: {
        s: "Cilium validation failed: 4m0s timeout expired: Last polled error: connectivity health is failing: Cluster connectivity is unhealthy on 'cilium-v8b2c': Exitcode: 255 \nErr: exit status 255\nStdout:\n \t \nStderr:\n \t Error: Cannot get status/probe: Put \"http://%2Fvar%2Frun%2Fcilium%2Fhealth.sock/v1beta/status/probe\": context deadline exceeded\n\t \n\t command terminated with exit code 255\n\t \n",
    }
to be nil
/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Kernel/src/github.com/cilium/cilium/test/k8sT/assertionHelpers.go:107
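
The host part of the URL in the error above is percent-encoded, which makes the message harder to read. Decoding it suggests that the failing `Put` was sent to a local Unix socket, not to a network address. A minimal Python sketch, using only the URL string copied from the message (the helper name `socket_path_from_url` is made up for illustration and is not part of Cilium):

```python
from typing import Tuple
from urllib.parse import unquote


def socket_path_from_url(url: str) -> Tuple[str, str]:
    """Split an http URL whose host is a percent-encoded socket path.

    Returns (socket_path, request_path). Hypothetical helper for reading
    the error message; not part of Cilium or its test suite.
    """
    rest = url.split("://", 1)[1]        # drop the scheme
    host, _, path = rest.partition("/")  # the encoded host contains no literal '/'
    return unquote(host), "/" + path


# URL copied verbatim from the error string in the stack trace.
url = "http://%2Fvar%2Frun%2Fcilium%2Fhealth.sock/v1beta/status/probe"
print(socket_path_from_url(url))
# ('/var/run/cilium/health.sock', '/v1beta/status/probe')
```

Read this way, the `context deadline exceeded` is the request to `/v1beta/status/probe` on the health socket timing out on `cilium-v8b2c`.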

Standard Output

Number of "context deadline exceeded" in logs: 0
Number of "level=error" in logs: 0
Number of "level=warning" in logs: 0
Number of "Cilium API handler panicked" in logs: 0
Number of "Goroutine took lock for more than" in logs: 0
No errors/warnings found in logs
Number of "context deadline exceeded" in logs: 0
Number of "level=error" in logs: 0
Number of "level=warning" in logs: 0
Number of "Cilium API handler panicked" in logs: 0
Number of "Goroutine took lock for more than" in logs: 0
No errors/warnings found in logs
⚠️  Number of "context deadline exceeded" in logs: 16
Number of "level=error" in logs: 0
⚠️  Number of "level=warning" in logs: 6
Number of "Cilium API handler panicked" in logs: 0
⚠️  Number of "Goroutine took lock for more than" in logs: 7
Top 3 errors/warnings:
Session affinity for host reachable services needs kernel 5.7.0 or newer to work properly when accessed from inside cluster: the same service endpoint will be selected from all network namespaces on the host.
BPF bandwidth manager needs kernel 5.0 or newer. Disabling the feature.
Unable to update ipcache map entry on pod add
Cilium pods: [cilium-htxg2 cilium-v8b2c]
Netpols loaded: 
CiliumNetworkPolicies loaded: 
Endpoint Policy Enforcement:
Pod                           Ingress   Egress
grafana-54dbdc987-hgv4n                 
prometheus-6ff848df8b-5klz7             
coredns-7964865f77-t6r8z                
Cilium agent 'cilium-htxg2': Status: Ok  Health: Ok Nodes "" ContinerRuntime:  Kubernetes: Ok KVstore: Ok Controllers: Total 17 Failed 0
Cilium agent 'cilium-v8b2c': Status: Ok  Health: Ok Nodes "" ContinerRuntime:  Kubernetes: Ok KVstore: Ok Controllers: Total 21 Failed 0
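
The output above contains three blocks of `Number of "..." in logs` counters, and only the third one reports non-zero values. As a rough sketch for triaging similar output (the helper name and the regular expression are illustrative assumptions, not part of the Cilium test suite), the non-zero counters can be pulled out like this:

```python
import re

# Matches lines such as: Number of "level=warning" in logs: 6
# Any prefix on the line (for example the warning emoji) is ignored.
COUNTER = re.compile(r'Number of "(?P<pattern>[^"]+)" in logs: (?P<count>\d+)')


def nonzero_counters(output: str) -> dict:
    """Return {pattern: highest count seen} for counters above zero."""
    found = {}
    for match in COUNTER.finditer(output):
        count = int(match.group("count"))
        if count > 0:
            pattern = match.group("pattern")
            found[pattern] = max(count, found.get(pattern, 0))
    return found


# A few lines copied from the Standard Output section above.
sample = """Number of "context deadline exceeded" in logs: 0
Number of "level=error" in logs: 0
⚠️  Number of "context deadline exceeded" in logs: 16
⚠️  Number of "level=warning" in logs: 6
⚠️  Number of "Goroutine took lock for more than" in logs: 7
"""
print(nonzero_counters(sample))
# {'context deadline exceeded': 16, 'level=warning': 6, 'Goroutine took lock for more than': 7}
```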

https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Kernel/3445/testReport/junit/Suite-k8s-1/18/K8sChaosTest_Connectivity_demo_application_Endpoint_can_still_connect_while_Cilium_is_not_running/
0892951b_K8sChaosTest_Connectivity_demo_application_Endpoint_can_still_connect_while_Cilium_is_not_running.zip

This failure then caused the two subsequent tests to fail with "failed due to BeforeAll failure":

Suite-k8s-1.18.K8sChaosTest Restart with long lived connections TCP connection is not dropped when cilium restarts
Suite-k8s-1.18.K8sChaosTest Restart with long lived connections L3/L4 policies still work while Cilium is restarted

Labels: area/CI, area/proxy, ci/flake, stale
