client-go: do not print the request latency logs at level 0 by default (#101634)

SataQiu wants to merge 1 commit into kubernetes:master
Conversation
/priority important-soon

/sig instrumentation

/cc @logicalhan
Force-pushed 416d8ec to f7bd795.

/test pull-kubernetes-e2e-gce-ubuntu-containerd
logicalhan left a comment
I'm confused by this PR. From what I can tell you are changing a throttling setting, which shouldn't affect which level request latency logs are printed at. This change, IIUC, should just mean that instead of throttling at logs of V(0) we are throttling V(1) logs.
@logicalhan This change doesn't look like it's changing the throttling setting, but rather the log level at which the throttling message is printed. And it's true, it was last changed in PR #88134 that @SataQiu linked (before that, the message was logged at a higher verbosity level). Since I updated my client I see this message when querying any resource. Hope this explanation helps. I give this a +1 to merge, because this info is not useful for most API clients, and for the ones that need a bit more verbosity it will still show up when verbosity is set to 1.
/triage accept
@caesarxuchao: The label(s) `triage/accept` cannot be applied. In response to this: /triage accept

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/triage accepted
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: logicalhan, SataQiu

The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
/assign @liggitt
/unassign

Looks like #88134 cited #87740 (comment) as the reason for the higher log level. It prints to stderr, right? There are many things that can print to stderr; I would not expect structured output to be passed via that stream.
@liggitt Maybe V(1) and V(2) would be appropriate log levels (instead of V(0) and V(2)).
cc @deads2k
TBH, the messages have been useful to us at their current level, because component authors find them and wonder why they see them. I'm biased towards keeping delays of over one second as info-level messages. @SataQiu do you have examples of healthy components (no hot or warm loops) with properly configured client-side rate limits seeing this message unnecessarily? It seems like these messages at an info level would be helpful, not harmful.
Emm... /close |
@SataQiu: Closed this PR. In response to this: /close
My guess is that most people don't know they are using client-side rate limits, and on average printing this log message saves 10 hours of debugging for every hour it costs someone trying to hide it.
@lavalamp since everyone finds these messages helpful, maybe I'm just interpreting them wrong, and rather than wanting to hide them I should figure out what causes them. Edit: I know that our clusters have many CRDs (mainly through cloud-vendor provisioners like Google Config Connector), and I know it has to do with the number of CRDs and the discovery cache that kubectl keeps around. But I don't know how to improve the situation and how to not constantly run into rate limiting...
@jonnylangefeld In code you can modify the rest.Config struct fields QPS and Burst. -1 disables client-side rate limiting, 0 chooses a default (50 qps, IIRC). Now that APF is defaulted on, I think most users can just disable client-side limits and let the server worry about it (the exception is if there are many watchers of the resources you are modifying, e.g. Endpoints). We were going to add these to kubeconfig but didn't, because they're only applicable to the standard Go client: kubernetes/enhancements#1630. Some binaries (kube-controller-manager) might expose command-line flags for these.
Thanks for the explanation! Hmm, so for kubectl that would mean we maintain our own fork and build it ourselves?
How are you hitting this limit with kubectl?
It seems like it has to do with the discovery cache. If I run kubectl with verbose request logging, I can see that it's making requests to all the custom resources we have installed on the cluster. I think the many GET requests trigger the client-side throttling.
OK, that was going to be my guess. How many CRDs do you have? You should need over 100 group versions to trigger this.
Yeah, we have a bunch. As mentioned in my comment above, it's over 200 individual CRD resources. And since you said only distinct group versions matter, I counted those with a jq query: not quite 100 group versions, but close. Now the question would be: what options do we have to either extend the validity of the cache (so kubectl can rely on it longer) or, maybe even better, only rebuild the cache when I'm querying for a resource that's not already in the discovery cache?
That jq query is undercounting (it needs to emit a name for every version, not just the first). I have filed an issue about this, since no one is going to see it buried here :) #105489. Thanks for the report.
What type of PR is this?
/kind bug
What this PR does / why we need it:
Printing the latency logs at verbosity 0 by default can break users' existing workflows, especially if they are doing something with the command output. IMO, we should still print this message, but at a higher verbosity level rather than the default level of 0.
This PR changes the level at which the throttledLogger prints request latency messages, so they are no longer logged at level 0 by default.

Which issue(s) this PR fixes:
Fixes #101631
Ref #88134 (before that PR, the log was printed at level 2)
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: