In cert-manager 1.12 we introduced an experimental feature that allows the cert-manager controller to list only the Secrets it is concerned with. In cert-manager 1.13 this SecretsFilteredCaching feature was enabled by default. The feature is primarily intended to reduce the memory usage of the cert-manager controller, but it is relevant to this issue because it also stops cert-manager from requesting the data of all Secrets when it starts up; that query is partly responsible for overloading the Kubernetes API server.
We've documented why, when and how to configure cainjector to only list the Secrets in the cert-manager namespace, which also avoids requesting the data of all Secrets on start-up.
The WatchList feature will be the ultimate solution to this problem, but it is an alpha/beta feature and is not enabled by default in the Kubernetes API server. Meanwhile, you can experiment with the feature and tell us whether it helps.
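As a concrete illustration of the cainjector mitigation mentioned above, the following is a minimal sketch of restricting cainjector to a single namespace via Helm. It assumes a Helm-based install with the default release and namespace names; the exact values to use for your cluster may differ, so treat this as a template rather than a definitive command:

```shell
# Sketch: limit cainjector to listing/watching Secrets in the
# cert-manager namespace only, instead of all Secrets cluster-wide.
# Release name ("cert-manager") and namespace are assumptions here.
helm upgrade cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --reuse-values \
  --set "cainjector.extraArgs={--namespace=cert-manager}"
```

Note that this trades cluster-wide CA injection for a much smaller watch scope, so it only fits deployments where cainjector is needed for cert-manager's own webhooks.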
Describe the bug:
On clusters with more than 20,000 Secrets this becomes a problem: the query that cert-manager makes is not optimal.
/api/v1/secrets?limit=500&resourceVersion=0
resourceVersion=0 causes the API server to always return all Secrets, and limit=500 is not taken into account. This makes cert-manager unscalable for large deployments, since Secrets are used for much more than certificates.
As mentioned in kubernetes/kubernetes#56278 and https://kubernetes.io/docs/reference/using-api/api-concepts/, I suggest removing resourceVersion=0 from the query, which should make it much faster.
Furthermore, cert-manager retries those queries without waiting for the previous ones to complete; they pile up and cause significant load, and even crashes, on the API server. cert-manager effectively DDoSes the API server.
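The difference between the two list semantics can be observed directly with raw API calls (a sketch assuming a working kubeconfig; this only illustrates the request shapes, the exact server behaviour depends on the Kubernetes version):

```shell
# With resourceVersion=0 the API server serves the list from its
# watch cache, and on the affected versions the limit is ignored,
# so every Secret in the cluster comes back in one response.
kubectl get --raw '/api/v1/secrets?limit=500&resourceVersion=0'

# Without resourceVersion, the list is served from etcd with real
# pagination: at most 500 items plus a "continue" token for the
# next page, spreading the load across smaller requests.
kubectl get --raw '/api/v1/secrets?limit=500'
```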
We're hitting the same issue with:
quay.io/jetstack/cert-manager-cainjector:v0.11.0
quay.io/jetstack/cert-manager-controller:v0.11.0
and
quay.io/jetstack/cert-manager-controller:v1.1.0
quay.io/jetstack/cert-manager-cainjector:v1.1.0
Logs from API server:
E0115 18:27:27.893242 1 runtime.go:78] Observed a panic: &errors.errorString{s:"killing connection/stream because serving request timed out and response had been started"} (killing connection/stream because serving request timed out and response had been started)
goroutine 79221267 [running]:
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.logPanic(0x3b1fda0, 0xc0001c6650)
/workspace/anago-v1.16.13-rc.0.25+dda9914de448ab/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0xa3
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0xc0feb65c90, 0x1, 0x1)
/workspace/anago-v1.16.13-rc.0.25+dda9914de448ab/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x82
panic(0x3b1fda0, 0xc0001c6650)
/usr/local/go/src/runtime/panic.go:679 +0x1b2
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.(*baseTimeoutWriter).timeout(0xc08dc08740, 0xc09ea59b80)
/workspace/anago-v1.16.13-rc.0.25+dda9914de448ab/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:257 +0x1cf
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP(0xc019eb1960, 0x4edf040, 0xc0a1206af0, 0xc07749d900)
/workspace/anago-v1.16.13-rc.0.25+dda9914de448ab/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:141 +0x310
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.WithWaitGroup.func1(0x4edf040, 0xc0a1206af0, 0xc07749d800)
/workspace/anago-v1.16.13-rc.0.25+dda9914de448ab/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/waitgroup.go:47 +0x10f
net/http.HandlerFunc.ServeHTTP(0xc0434bf3e0, 0x4edf040, 0xc0a1206af0, 0xc07749d800)
/usr/local/go/src/net/http/server.go:2007 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithRequestInfo.func1(0x4edf040, 0xc0a1206af0, 0xc07749d600)
/workspace/anago-v1.16.13-rc.0.25+dda9914de448ab/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters/requestinfo.go:39 +0x274
net/http.HandlerFunc.ServeHTTP(0xc0434bf470, 0x4edf040, 0xc0a1206af0, 0xc07749d600)
/usr/local/go/src/net/http/server.go:2007 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithCacheControl.func1(0x4edf040, 0xc0a1206af0, 0xc07749d600)
/workspace/anago-v1.16.13-rc.0.25+dda9914de448ab/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters/cachecontrol.go:31 +0xa8
net/http.HandlerFunc.ServeHTTP(0xc019eb1a20, 0x4edf040, 0xc0a1206af0, 0xc07749d600)
/usr/local/go/src/net/http/server.go:2007 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/httplog.WithLogging.func1(0x4ed2980, 0xc0c4b6b240, 0xc07749d500)
/workspace/anago-v1.16.13-rc.0.25+dda9914de448ab/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/httplog/httplog.go:89 +0x2ca
net/http.HandlerFunc.ServeHTTP(0xc019eb1a40, 0x4ed2980, 0xc0c4b6b240, 0xc07749d500)
/usr/local/go/src/net/http/server.go:2007 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.withPanicRecovery.func1(0x4ed2980, 0xc0c4b6b240, 0xc07749d500)
/workspace/anago-v1.16.13-rc.0.25+dda9914de448ab/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/wrap.go:51 +0x13e
net/http.HandlerFunc.ServeHTTP(0xc019eb1a60, 0x4ed2980, 0xc0c4b6b240, 0xc07749d500)
/usr/local/go/src/net/http/server.go:2007 +0x44
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server.(*APIServerHandler).ServeHTTP(0xc0434bf4a0, 0x4ed2980, 0xc0c4b6b240, 0xc07749d500)
/workspace/anago-v1.16.13-rc.0.25+dda9914de448ab/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/handler.go:189 +0x51
net/http.serverHandler.ServeHTTP(0xc009896a80, 0x4ed2980, 0xc0c4b6b240, 0xc07749d500)
/usr/local/go/src/net/http/server.go:2802 +0xa4
net/http.initNPNRequest.ServeHTTP(0x4eeb300, 0xc06df08a50, 0xc07a0df180, 0xc009896a80, 0x4ed2980, 0xc0c4b6b240, 0xc07749d500)
/usr/local/go/src/net/http/server.go:3366 +0x8d
k8s.io/kubernetes/vendor/golang.org/x/net/http2.(*serverConn).runHandler(0xc094106480, 0xc0c4b6b240, 0xc07749d500, 0xc08dc08340)
/workspace/anago-v1.16.13-rc.0.25+dda9914de448ab/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/golang.org/x/net/http2/server.go:2149 +0x9f
created by k8s.io/kubernetes/vendor/golang.org/x/net/http2.(*serverConn).processHeaders
/workspace/anago-v1.16.13-rc.0.25+dda9914de448ab/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/golang.org/x/net/http2/server.go:1883 +0x4eb
E0115 18:27:27.893364 1 wrap.go:39] apiserver panic'd on GET /api/v1/secrets?limit=500&resourceVersion=0
I0115 18:27:27.893567 1 log.go:172] http2: panic serving 10.148.0.16:53202: killing connection/stream because serving request timed out and response had been started
goroutine 79221267 [running]:
k8s.io/kubernetes/vendor/golang.org/x/net/http2.(*serverConn).runHandler.func1(0xc0c4b6b240, 0xc0feb65f67, 0xc094106480)
/workspace/anago-v1.16.13-rc.0.25+dda9914de448ab/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/golang.org/x/net/http2/server.go:2142 +0x16b
panic(0x3b1fda0, 0xc0001c6650)
/usr/local/go/src/runtime/panic.go:679 +0x1b2
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0xc0feb65c90, 0x1, 0x1)
/workspace/anago-v1.16.13-rc.0.25+dda9914de448ab/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0x105
panic(0x3b1fda0, 0xc0001c6650)
/usr/local/go/src/runtime/panic.go:679 +0x1b2
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.(*baseTimeoutWriter).timeout(0xc08dc08740, 0xc09ea59b80)
/workspace/anago-v1.16.13-rc.0.25+dda9914de448ab/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:257 +0x1cf
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.(*time
Logs from ETCD:
...
Dec 11 09:21:08 ow-prod-k8s-master01 etcd[6830]: 2020-12-11 08:21:08.106948 W | etcdserver: failed to send out heartbeat on time (exceeded the 250ms timeout for 5.348150525s)
Dec 11 09:21:08 ow-prod-k8s-master01 etcd[6830]: 2020-12-11 08:21:08.106954 W | etcdserver: server is likely overloaded
afterwards, slow but successful requests ...
Dec 11 09:23:26 ow-prod-k8s-master01 etcd[6830]: 2020-12-11 08:23:26.433315 W | etcdserver: read-only range request "key:\"/registry/persistentvolumes/pvc-f31decea-7a39-4d11-bbbf-8eb45f433239\" " with result "range_response_count:1 size:1017" took too long (13.750148565s) to execute
Logs from cert-manager:
E0203 15:18:34.063192 1 wrap.go:39] apiserver panic'd on GET /api/v1/secrets?limit=500&resourceVersion=0
E0203 15:18:33.969252 1 reflector.go:123] external/io_k8s_client_go/tools/cache/reflector.go:96: Failed to list *v1.Secret: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 37511; INTERNAL_ERROR
Expected behaviour:
cert-manager should not make heavy queries that list all Secrets in all namespaces; it should instead work per namespace.
Steps to reproduce the bug:
Generate 15,000 Secrets; they do not need to be TLS certificates, any Secret will do.
Look at the API server load and the cert-manager logs.
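The reproduction step above can be sketched as a small script. The Secret names (`load-test-N`) and the count are made up for this example; it emits the manifests as one stream so they are applied in batched round-trips rather than one request per Secret:

```shell
#!/bin/sh
# Sketch: generate many dummy Opaque Secrets to reproduce the load.
N="${N:-15000}"

manifests() {
  i=1
  while [ "$i" -le "$N" ]; do
    # One minimal Secret manifest per iteration, separated by "---".
    printf 'apiVersion: v1\nkind: Secret\nmetadata:\n  name: load-test-%d\ntype: Opaque\nstringData:\n  key: value\n---\n' "$i"
    i=$((i + 1))
  done
}

if command -v kubectl >/dev/null 2>&1; then
  manifests | kubectl apply -f -
else
  echo "kubectl not found; printing manifest count only"
  manifests | grep -c '^kind: Secret'
fi
```

After applying, watch the API server metrics and the cert-manager logs while the controller performs its initial Secret list.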
Anything else we need to know?:
Environment details:
Kubernetes version: v1.16.13
Cloud-provider/provisioner: Vanilla K8s
cert-manager version: v1.1.0
Install method: helm (with CRDs applied before that)
/kind bug