Add option to k8s apiserver to reject incoming requests upon audit failure#65763
Add option to k8s apiserver to reject incoming requests upon audit failure#65763k8s-ci-robot merged 1 commit intokubernetes:masterfrom
Conversation
tallclair
left a comment
There was a problem hiding this comment.
just a handful of nits, then LGTM.
There was a problem hiding this comment.
I don't think this is the right error code... maybe just return a 500?
There was a problem hiding this comment.
If this is a plain 500, then we won't be able to differentiate it from other errors. The one I used is from 5xx family, since it's a server side error. At the same time, it is more specific than "something broke server side" which is what 500 means. Maybe someone from apimachinery should take a look?
There was a problem hiding this comment.
Yeah, I was thinking maybe we don't want to distinguish it intentionally. Error messages like this can leak useful data to attackers, so distinguishing this case makes me a little nervous. Debug logs & metrics can provide information for those authorized, without clueing in the (possibly unprivileged) requester.
There was a problem hiding this comment.
That is a valid point. At the same time, using the same error code will limit the ability to monitor how often requests are rejected because of audit backend failures. We'll be able to see just how often they are rejected in general. Is it possible to have both security and observability?
There was a problem hiding this comment.
If you're talking about monitoring from the client side, then no, not without returning this information. But if you're talking about monitoring the server, then that's what prometheus metrics are for.
There was a problem hiding this comment.
I thought about existing prometheus metric that already exposes http response code, but you're right, this can be a separate metric exclusively for audit errors.
There was a problem hiding this comment.
Is this rendered correctly for plain content? I would think setting content-type + nosniff should be sufficient...
There was a problem hiding this comment.
Use html.EscapeString instead? https://golang.org/pkg/html/
There was a problem hiding this comment.
Maybe just use responsewriters.InternalErorr?
There was a problem hiding this comment.
alternatively, just use a named return value. I'll let you decide which is cleaner.
There was a problem hiding this comment.
Actually, buffer will not be used when blocking-strict is selected, changed to true to keep existing behavior.
There was a problem hiding this comment.
hmm, I was surprised you didn't just return the error, but I guess since we're handling the errors in-place it makes sense.
There was a problem hiding this comment.
alternatively: success = success && b.logEvent(ev)
There was a problem hiding this comment.
This would cause events to stop being logged after first failure.
There was a problem hiding this comment.
success = b.logEvent(ev) && success would work though.
There was a problem hiding this comment.
nit: just embed it, then you only need to define ProcessEvents.
There was a problem hiding this comment.
A dedicated package feels a bit heavy weight for this.... maybe just embed it in options? (should only need ~6 lines)
There was a problem hiding this comment.
What do you mean by embedding this in options? Wouldn't it require handling this flag in every backend?
There was a problem hiding this comment.
I meant just append https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/server/options/audit.go with:
type safeBackend struct {
audit.Backend
}
func (s *safeBackend) ProcessEvents(ev ...*auditinternal.Event) bool {
s.Backend.ProcessEvents(ev...)
return true
}
Since it's only used in that file anyway.
There was a problem hiding this comment.
Geez, you're right, that makes way more sense. Thanks, updated.
There was a problem hiding this comment.
How about the buffered webhook audit? Does this function always return true in this scenario?
There was a problem hiding this comment.
Saw it, return false when the buffer is full.
There was a problem hiding this comment.
Actually, changed to true since buffered backend will not be used when blocking-strict is selected.
There was a problem hiding this comment.
Only reject request on stage StageRequestReceived, It is not consistent ignoring other stags audit failure.
There was a problem hiding this comment.
I doesn't make sense to reject request at later stages (at least for mutating requests), as the request will already have been processed at this point.
There was a problem hiding this comment.
Yes,it has been processed. But also without audit logs of these stages. So what i want to say is this pr does not handle lost audit events completely.
There was a problem hiding this comment.
With my change, an auditor can get a guarantee that if there was no event with RequestReceived stage, there was no change to apiserver state and the caller will have a guarantee that if the request failed, the state wasn't changed.
With what you are suggesting, auditor will still get the guarantee that no event implies no change, but you will take away that guarantee from the caller. When request fails, you don't really know what happened.
One change I think might make sense here would be to reject non-mutating requests upon failures on other stages. WDYT?
There was a problem hiding this comment.
Yeah, this is kind of enhancement whatever .
There was a problem hiding this comment.
After some thought I don't like the idea of rejecting requests inconsistently - with audit policy configured to omit RequestReceived stage this would mean that audit backend issue will block all non-mutating requests, but allow mutating ones.
There was a problem hiding this comment.
@x13n I don't understand your response to this - The code that is here only rejects requests when the RequestReceived event cannot be written. I think this is the correct behavior, but I'm not sure it matches what you said above?
There was a problem hiding this comment.
Apologies for being unclear. Let me try to clarify what I meant above.
I think this check should be applied only at RequestReceived stage. Later on, a failure to audit log anything shouldn't affect apiserver response, because the request was already processed, possibly modifying etcd state. Now, one could argue that there is no harm in rejecting the request if it was non-mutating, but failed to be audit logged e.g. at ResponseCompleted stage. However, this would lead to some weirdness, when RequestReceived stage is omitted and audit backend has problems: mutating requests would ignore audit logging failures and simply work, while read only requests would be rejected because of audit backend problems.
|
/retest |
|
/retest |
|
/assign @deads2k |
tallclair
left a comment
There was a problem hiding this comment.
lgtm, please squash commits
There was a problem hiding this comment.
@x13n I don't understand your response to this - The code that is here only rejects requests when the RequestReceived event cannot be written. I think this is the correct behavior, but I'm not sure it matches what you said above?
|
Thanks! Squashed. |
|
Ping. Is there any additional work needed in this PR? |
|
/lgtm |
There was a problem hiding this comment.
Looks like type safeBackend should be put in package pluginbuffered
There was a problem hiding this comment.
I initially had a separate package for this, but @tallclair convinced me this makes more sense here, as it is quite trivial. Since nothing in safeBackend is pluginbuffered-specific, I'd rather leave it here.
There was a problem hiding this comment.
Good catch, some leftover. Removed now.
|
/retest Review the full test history for this PR. Silence the bot with an |
12 similar comments
|
/retest Review the full test history for this PR. Silence the bot with an |
|
/retest Review the full test history for this PR. Silence the bot with an |
|
/retest Review the full test history for this PR. Silence the bot with an |
|
/retest Review the full test history for this PR. Silence the bot with an |
|
/retest Review the full test history for this PR. Silence the bot with an |
|
/retest Review the full test history for this PR. Silence the bot with an |
|
/retest Review the full test history for this PR. Silence the bot with an |
|
/retest Review the full test history for this PR. Silence the bot with an |
|
/retest Review the full test history for this PR. Silence the bot with an |
|
/retest Review the full test history for this PR. Silence the bot with an |
|
/retest Review the full test history for this PR. Silence the bot with an |
|
/retest Review the full test history for this PR. Silence the bot with an |
|
/hold I need to fix some merge conflicts and failing tests... |
|
New changes are detected. LGTM label has been removed. |
|
/retest |
|
@sttts I rebased this and fixed failing tests. PTAL. |
|
/retest Review the full test history for this PR. Silence the bot with an |
1 similar comment
|
/retest Review the full test history for this PR. Silence the bot with an |
What this PR does / why we need it: Part of #65266
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)format, will close the issue(s) when PR gets merged):Special notes for your reviewer:
Release note:
cc @loburm @tallclair @wojtek-t