log,debug: fix logspy, new HTTP API to change vmodule#61378
log,debug: fix logspy, new HTTP API to change vmodule#61378craig[bot] merged 2 commits intocockroachdb:masterfrom
logspy, new HTTP API to change vmodule#61378Conversation
stevendanna
left a comment
There was a problem hiding this comment.
Neat! I left some comments from my initial read through.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @knz and @miretskiy)
pkg/util/log/intercept.go, line 59 at r1 (raw file):
type InterceptMessageFn func(message []byte) // interceptorsPtr stores a pointer to the list of currently active interceptors.
Should this comment be on the line below?
pkg/util/log/intercept.go, line 72 at r1 (raw file):
// interceptorID returns the ID for the specified interceptor. func interceptorID(interceptor *InterceptMessageFn) InterceptorID { return InterceptorID(unsafe.Pointer(interceptor))
I will likely expose some ignorance of Go internals here, but if I'm reading the documentation right, comparing the uintptr's works today for heap-allocated objects but isn't guaranteed by the language spec (and I think wouldn't work for stack-allocated objects: https://play.golang.org/p/0_f3MAp96JL).
Not recommending a change, just talking through my understanding. I believe this will work on our current version of Go.
pkg/util/log/intercept.go, line 99 at r1 (raw file):
} if yank == -1 { panic("cannot remove interceptor")
Might be worth a comment on why we want this to be a hard panic vs a no-op (perhaps just returning l)?
pkg/util/log/intercept.go, line 129 at r1 (raw file):
newInterceptorsPtr = &newInterceptors } if atomic.CompareAndSwapPointer(&interceptorsPtr, addr, unsafe.Pointer(newInterceptorsPtr)) {
Question: Would it be worth a comment on why a mutex wouldn't suffice here?
miretskiy
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @knz, @miretskiy, and @stevendanna)
pkg/util/log/intercept.go, line 72 at r1 (raw file):
Previously, stevendanna (Steven Danna) wrote…
I will likely expose some ignorance of Go internals here, but if I'm reading the documentation right, comparing the uintptr's works today for heap-allocated objects but isn't guaranteed by the language spec (and I think wouldn't work for stack-allocated objects: https://play.golang.org/p/0_f3MAp96JL).
Not recommending a change, just talking through my understanding. I believe this will work on our current version of Go.
It's unlikely you're ignorant in Go -- I'd say I am... I was looking for weak_ptr in Go -- that kinda shows the level of ignorance I have :)
So, this particular code, I was going back and forth, at least 4 or 5 times...
There are few things at play here:
- (and that's the most important) Interceptors should have "id" associated with them so that
I have tried using a pointer itself as an ID, but found it not very satisfying. - To support add/remove as a pointer value, I have to pass an implementation of a (now deleted) LogMessageInterceptor interface. But interface comparisons are tricky. In particular, you could have different
interceptors, but as long as their underlying implementation is the same (i.e. empty struct), then interface value comparison would not work. - I felt (not strongly) that having an interface might be an overkill, where a function would work (note, you cannot compare functions).
So, taking these together, i'm back to my original desire: I need a unique ID for an interceptor.
One way: just maybe use time, or some such, and redo things in a loop when adding interceptor. And maybe, that's a good way to go.
The above works (and maybe it's too smart for its own good) because to add an interceptor, you pass in an InterceptMessageFn (note, not a pointer) by value to the AddInterceptor method. Thereafter, I take the address of that stack allocated parameter, and If I understand correctly, that parameter "escapes to heap" (because I will save it inside my interceptorList). As long as that object is in use, nothing should be stored at that memory address -- so I think taking uintptr address of that value is okay.
But then, maybe I'm entirely wrong...
pkg/util/log/intercept.go, line 129 at r1 (raw file):
Previously, stevendanna (Steven Danna) wrote…
Question: Would it be worth a comment on why a mutex wouldn't suffice here?
Having mutex means you have to lock to check if you have interceptors. We don't want to lock every time you log.
knz
left a comment
There was a problem hiding this comment.
I wish to review this more in depth but note that I am on PTO at this moment. So please give me some time to process this.
My initial remark though, in response to the ongoing conversation in the review thread, is that you made this overly complicated. IMHO you could have written type InterceptorID *interceptorSink and that would have made your API simpler. We'll discuss more when I have a chance to scrutinize this further.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @miretskiy and @stevendanna)
|
Not sure *interceptorSink would work (at least not as implemented now)....
and the interceptor type must be available outside log package.. Anyway,
have great PTO and we'll discuss when you get back
…On Wed, Mar 3, 2021 at 1:26 PM kena ***@***.***> wrote:
***@***.**** commented on this pull request.
I wish to review this more in depth but note that I am on PTO at this
moment. So please give me some time to process this.
My initial remark though, in response to the ongoing conversation in the
review thread, is that you made this overly complicated. IMHO you could
have written type InterceptorID *interceptorSink and that would have made
your API simpler. We'll discuss more when I have a chance to scrutinize
this further.
*Reviewable
<https://reviewable.io/reviews/cockroachdb/cockroach/61378#-:-MUtKR7r8r2ctwJN_53b:b986aqr>*
status: [image:
|
miretskiy
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @knz and @stevendanna)
pkg/util/log/intercept.go, line 59 at r1 (raw file):
Previously, stevendanna (Steven Danna) wrote…
Should this comment be on the line below?
obsolete.
pkg/util/log/intercept.go, line 72 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
It's unlikely you're ignorant in Go -- I'd say I am... I was looking for weak_ptr in Go -- that kinda shows the level of ignorance I have :)
So, this particular code, I was going back and forth, at least 4 or 5 times...
There are few things at play here:
- (and that's the most important) Interceptors should have "id" associated with them so that
I have tried using a pointer itself as an ID, but found it not very satisfying.- To support add/remove as a pointer value, I have to pass an implementation of a (now deleted) LogMessageInterceptor interface. But interface comparisons are tricky. In particular, you could have different
interceptors, but as long as their underlying implementation is the same (i.e. empty struct), then interface value comparison would not work.- I felt (not strongly) that having an interface might be an overkill, where a function would work (note, you cannot compare functions).
So, taking these together, i'm back to my original desire: I need a unique ID for an interceptor.
One way: just maybe use time, or some such, and redo things in a loop when adding interceptor. And maybe, that's a good way to go.The above works (and maybe it's too smart for its own good) because to add an interceptor, you pass in an InterceptMessageFn (note, not a pointer) by value to the AddInterceptor method. Thereafter, I take the address of that stack allocated parameter, and If I understand correctly, that parameter "escapes to heap" (because I will save it inside my interceptorList). As long as that object is in use, nothing should be stored at that memory address -- so I think taking uintptr address of that value is okay.
But then, maybe I'm entirely wrong...
Alright -- I think it was trying to be too fancy, as pointed out by @stevendanna and @knz
Reworked to something which I think is A LOT simpler.
pkg/util/log/intercept.go, line 99 at r1 (raw file):
Previously, stevendanna (Steven Danna) wrote…
Might be worth a comment on why we want this to be a hard panic vs a no-op (perhaps just returning
l)?
Obsolete.
pkg/util/log/intercept.go, line 129 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
Having mutex means you have to lock to check if you have interceptors. We don't want to lock every time you log.
This is now obsolete.
knz
left a comment
There was a problem hiding this comment.
Ok I have given this PR a deeper look.
The main contention I have with the current approach is that it incurs the cost of an extra sink on all channels even when there is no interceptor currently defined. This is surprising and arguably undesirable.
In my opinion, we should add a sink only when a test requires interception and remove it when the interception is not needed any more. We should also do 1 interception per sink. So if there are 10 interceptors registered, that would be 10 different sinks.
Meanwhile, I would also like to point out that we do not need to intercept all channels.
The tests that need an interceptor only need to intercept 1 channel.
The logspy feature only needs to intercept the DEV channel.
So the intercept API should take a channel argument and only install an interceptor sink for that channel.
For example I am thinking of the following API:
package log
// RegisterInterceptor registers the given interceptFn on the specificed channel.
// The log entries are formatted using the format specified in desiredFormat
// before they are passed to the intercept fn.
// The interceptor is set up as a sink with no severity filter, and with redaction markers enabled.
//
// The caller is responsible for calling the returned cleanup fn when the interceptor is
// not needed any more, to de-register the sink.
func RegisterInterceptor(ch channels.Channel, desiredFormat string, interceptFn func([]byte)) (cleanup func())
Reviewed 2 of 4 files at r1, 1 of 3 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @knz and @stevendanna)
miretskiy
left a comment
There was a problem hiding this comment.
"The main contention I have with the current approach is that it incurs the cost of an extra sink on all channels even when there is no interceptor currently defined. This is surprising and arguably undesirable."
I'm not sure the cost is larger in any meaningful way than it is today. We check for atomic variable when calling outputLogEntry; as opposed to this implementation, which, when iterating over configured sinks checks that interceptor is not active (by loading atomic variable) -- are you concerned over the cost of 1 more sink in the loop?
I like the suggestion re adding sink per channel (which makes above moot, btw). Before I do that, I wanted to ask your opinion on the implementation details... loggerT contains the list of sinkInfos -- and presumably, if the intercept is active, that's where I would need to add interceptor sink. However, the loggerT is initialized at the startup and never modified.
I don't love the idea of adding any locks there.
I could use the loggingT rmu lock, which protects channel->*loggerT map, but, the comment on that lock seems to
imply it's a mis-design and should be removed.
Finally, I could change loggerT itself to have an atomic ptr instead of []sinkInfos, and use atomic compare/swap when I want to add/remove interceptor sink(s).
Finally, while I can certainly use func([]byte) as interceptor, and I did so in the earlier incarnations, is it much better
vs using an interface? Interface could be extended -- for example to support redaction for example.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @knz and @stevendanna)
Very good concern and very good finding!
More specifically, you mean using a proper linked list, with a Note: I have added some slack in our 21.2 release planning to improve this area. So don't feel pressured by time to complete everything in-between other projects; we can make it a proper milestone achievement, for example in April or May.
Yes that seems reasonable - but what interface would be suitable? To make a good choice, we'd need to review existing usage of the interceptor in unit tests to see what is actually needed. |
3c8addd to
40ffffe
Compare
miretskiy
left a comment
There was a problem hiding this comment.
@knz
Just pushed new version which uses atomics to manage sink infos w/out locks.
A bit more complex than before.
Take a look -- let me know what you think. I'm not sure about the Config() method I added
to the interceptor and whether it's a good idea.
I'm almost inclined to use default interceptor configuration that will redact by default
(and make LogMessageInterceptor not configurable). On the other hand, maybe there are uses
for non-redacted interceptors. WDYT?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @knz and @stevendanna)
40ffffe to
a7ed153
Compare
miretskiy
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @knz and @stevendanna)
pkg/util/log/intercept.go, line 72 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
Alright -- I think it was trying to be too fancy, as pointed out by @stevendanna and @knz
Reworked to something which I think is A LOT simpler.
Done.
miretskiy
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @knz and @stevendanna)
pkg/util/log/intercept.go, line 72 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
Done.
Done.
a7ed153 to
a1c8d3e
Compare
a1c8d3e to
0060aa3
Compare
|
ok so time is coming to resuscitate this work. I'll be doing a new pass of reviews. |
|
Thanks, @knz. Just pushed freshly rebased version (and fixed lint warning). |
|
@knz any updates on this? |
logspy, new HTTP API to change vmodule
|
Hi Yevgueniy, I have rebased your PR on top of #66096, made a few changes and added another commit. Feel free to have a look, and perhaps refresh your local copy of the branch from github. Can you and/or Steven review these changes (specifically, in the light of the new and updated commit messages). Then we can merge this. |
3327c17 to
45b3d7b
Compare
|
Thank you for working on this! I haven't reviewed this in depth, but there is a UX problem here that I don't think has been discussed, and it's the split between vmodule and logspy. It's generally great that we can now logspy without fully turning on the verbose firehose. However, the UX here will make it very hard to debug in practice. It's not great, but we often do end up on a screenshare and ask a customer to open a certain logspy page on their browser. Asking them to also open parallel tabs that set the right vmodule is going to degrade the debugging experience during customer issues. I would really like to avoid that. The separate vmodule endpoint is fine, but it would be a great help if The logspy endpoint is also currently self-documenting (if you call it without parameters, you get one line showing you the parameters). This should remain true, or the link on the debug page in the ui should include all parameters (maybe it already does, as I mentioned, haven't reviewed the code yet). The commit message could specify how overlapping |
65342: multi-region syntax diagrams r=ericharmeling a=ericharmeling - Updated ALTER/CREATE DATABASE diagrams for multi-region - Updated CREATE TABLE diagram for multi-region - Added opt_locality diagram Release note: None 66096: util/log: extend `logpb.Entry` with more fine grained fields r=cameronnunez a=knz Needed for #65633 and #61378. cc @cockroachdb/server 66142: cli: skip under testrace as this test might be timing out r=jlinder a=adityamaru Release note: None Co-authored-by: Eric Harmeling <eric.harmeling@cockroachlabs.com> Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net> Co-authored-by: Aditya Maru <adityamaru@gmail.com>
1bcc587 to
4f7769d
Compare
knz
left a comment
There was a problem hiding this comment.
it would be a great help if vmodule could be specified within a logspy invocation as well
Good idea. Done. (The way you suggested.)
The logspy endpoint is also currently self-documenting (if you call it without parameters, you get one line showing you the parameters)
It shows you a representation of the Go struct, not the HTTP query parameters. I don't think that's a big deal, because...
the link on the debug page in the ui should include all parameters (maybe it already does)
it does.
The commit message could specify how overlapping vmodule requests are handled.
Done:
Note: at most one in-flight HTTP API request is allowed to modify the
vmodule parameter. This maintain the invariant that the
configuration restored at the end of each request is the same as when
the request started.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @knz and @stevendanna)
Please make sure that the logspy link in the UI, or otherwise the logspy default, is to enable all the log messages. I've enjoyed the fact that logspy used to grep through everything; the less I need to tell anyone to type, and the less confusion about the parameters, the better. |
|
Was just testing this out, the debug page prints:
I think this should also list Also, the actual link is too vanilla: http://td:8080/debug/logspy?count=1&duration=10s&grep=. Could all the parameters be populated in the link? That way, it's easier to click through and adjust rather than always having to remember the description. I think a sane default would be Overall this is great though! Very happy about this. |
Prior to this patch, the logspy mechanism was utterly broken: any time it was running, it would cut off any and all log entries from going to files, stderr, network sinks etc. This was a gross violation of the guarantees we have constructed around structured logging, as well as a security vulnerability (see release note below). Additionally, it was impossible to launch multiple logspy sessions simultaneously on the same node, for example using different grep filters. This commit rectifies the situation. At a technical level, we have two changes: one in the logging Go API and one in the `/debug/logspy` HTTP API. **For the logging changes**, this patch replaces the "interceptor" singleton callback function that takes over `(*loggerT).outputLogEntry()`, by a new *interceptor sink* that lives alongside the other sinks on every channel. Because the interceptor logic is now a regular sink, log entries continue to be delivered to other sinks while an interceptor is active. Reusing the sink abstraction, with its own sink configuration with no severity filter, clarifies that the interceptor accepts all the log entries regardless of which filters are configured on other sinks. Additionally, the interceptor sink now supports multiple concurrent interception sessions. Each log entry is delivered to all current interceptors. The `logspy` logic is adapted to use this facility, so that multiple `logspy` requests can run side-by-side. **For the HTTP API change**, we are changing the `/debug/logspy` semantics. This is explained in the release note below. Release note (security update): All the logging output to files or network sinks was previously disabled temporarily while an operator was using the `/debug/logspy` HTTP API, resulting in lost entries and a breach of auditability guarantees. This behavior has been corrected. Release note (bug fix): Log entries are not lost any more while the `/debug/logspy` HTTP API is being used. This bug had existed since CockroachDB v1.1. Release note (api change): The `/debug/logspy` HTTP API has changed. The endpoint now returns JSON data by default. This change is motivated as follows: - the previous format, `crdb-v1`, cannot be parsed reliably. - using JSON entries guarantees that the text of each entry all fits on a single line of output (newline characters inside the messages are escaped). This makes filtering easier and more reliable. - using JSON enables the user to apply `jq` on the output, for example via `curl -s .../debug/logspy | jq ...` If the previous format is desired, the user can pass the query argument `&flatten=1` to the `logspy` URL to obtain the previous flat text format (`crdb-v1`) instead. Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Prior to this patch, any ongoing `/debug/logspy` query would trigger maximum verbosity in the logging package - i.e. cause all logging API calls under `log.V` to be activated. This was problematic, as it would cause a large amount of logging traffic to be pumped through the interceptor logic, increasing the chance for entries to be dropped (especially when `logspy?grep=...` is not used). Additionally, since the previous change to make the interceptor logic a regular sink, all the entries captured by the interceptors are now also duplicated to the other sinks (this is a feature / bug fix, as explained in the previous commit message). However, this change also meant that any ongoing `logspy` request would cause maximum verbosity to all the sinks with threshold INFO (most logging calls under `log.V` have severity INFO). For example, the DEV channel accepts all entries at level INFO or higher in the default config. This in turn could incur unacceptable disk space or IOPS consumption in certain deployments. In orde to mitigate this new problem, this patch removes the special conditional from the logging package. From this point forward, the verbosity of the entries delivered via `/debug/logspy` are those configured via `vmodule` -- no more and no less. To make this configurable for a running server, including one where the SQL endpoint may not be available yet, this patch also introduces a new `/debug/vmodule` HTTP API and extends `/debug/logspy` with the ability to temporarily change `vmodule` *upon explicit request*. Release note (api change): The `/debug/logspy` API does not any more enable maximum logging verbosity automatically. To change the verbosity, use the new `/debug/vmodule` endpoint or pass the `&vmodule=` query parameter to the `/debug/logspy` endpoint. For example, suppose you wish to run a 20s logspy session: - Before: ``` curl 'https://.../debug/logspy?duration=20s&...' ``` - Now: ``` curl 'https://.../debug/logspy?duration=20s&vmodule=...' ``` OR ``` curl 'https://.../debug/vmodule?duration=22s&vmodule=...' curl 'https://.../debug/logspy?duration=20s' ``` As for the regular `vmodule` command-line flag, the maximum verbosity across all the source code can be selected with the pattern `*=4`. Note: at most one in-flight HTTP API request is allowed to modify the `vmodule` parameter. This maintain the invariant that the configuration restored at the end of each request is the same as when the request started. Release note (api change): The new `/debug/vmodule` API makes it possible for an operator to configure the logging verbosity in a similar way as the SQL built-in function `crdb_internal.set_vmodule()`, or to query the current configuration as in `crdb_internal.get_vmodule()`. Additionally, any configuration change performed via this API can be automatically reverted after a configurable delay. The API forms are: - `/debug/vmodule` - retrieve the current configuration. - `/debug/vmodule?set=[vmodule config]&duration=[duration]` - change the configuration to `[vmodule config]` . The previous configuration at the time the `/debug/vmodule` request started is restored after `[duration]`. This duration, if not specified, defaults to twice the default duration of a `logspy` request (currently, the `logspy` default duration is 5s, so the `vmodule` default duration is 10s). If the duration is zero or negative, the previous configuration is never restored.
knz
left a comment
There was a problem hiding this comment.
Was just testing this out, the debug page prints:
/debug/logspy?count=[count]&duration=[duration]&grep=[regexp]&vmodule=[vmodule]
I think this should also list &flatten=[bool]
Done.
Also, the actual link is too vanilla: http://td:8080/debug/logspy?count=1&duration=10s&grep=.
Could all the parameters be populated in the link? That way, it's easier to click through and adjust rather than always having to remember the description.
Done.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @knz and @stevendanna)
|
TFYRs! bors r=stevendanna,tbg,andreimatei |
|
Build succeeded: |
Fixes #61335.
First 3 commits from #66096.
tldr: the logspy endpoint does not enable maximum verbosity any more, and now returns JSON entries. Use the new vmodule endpoint to configure verbosity if needed.
See the commit messages of the last 2 commits for details and rationales.
Relevant user-facing changes:
Release note (security update): All the logging output to files
or network sinks was previously disabled temporarily while an operator
was using the
/debug/logspyHTTP API, resulting in lost entriesand a breach of auditability guarantees. This behavior has been corrected.
Release note (bug fix): Log entries are not lost any more while the
/debug/logspyHTTP API is being used. This bug had existed sinceCockroachDB v1.1.
Release note (api change): The
/debug/logspyHTTP API has changed.The endpoint now returns JSON data by default.
This change is motivated as follows:
crdb-v1, cannot be parsed reliably.all fits on a single line of output (newline characters
inside the messages are escaped). This makes filtering
easier and more reliable.
jqon the output, forexample via
curl -s .../debug/logspy | jq ...If the previous format is desired, the user can pass the query
argument
&flatten=1to thelogspyURL to obtain the previous flattext format (
crdb-v1) instead.Release note (api change): The
/debug/logspyAPI does not any moreenable maximum logging verbosity automatically. To change the
verbosity, use the new
/debug/vmoduleendpoint or pass the&vmodule=query parameter to the/debug/logspyendpoint.For example, suppose you wish to run a 20s logspy session:
Before:
Now:
OR
As for the regular
vmodulecommand-line flag, the maximum verbosityacross all the source code can be selected with the pattern
*=4.Note: at most one in-flight HTTP API request is allowed to modify the
vmoduleparameter. This maintain the invariant that theconfiguration restored at the end of each request is the same as when
the request started.
Release note (api change): The new
/debug/vmoduleAPI makes itpossible for an operator to configure the logging verbosity in a
similar way as the SQL built-in function
crdb_internal.set_vmodule(), or to query the current configurationas in
crdb_internal.get_vmodule(). Additionally, any configurationchange performed via this API can be automatically reverted after a
configurable delay. The API forms are:
/debug/vmodule- retrieve the current configuration./debug/vmodule?set=[vmodule config]&duration=[duration]- changethe configuration to
[vmodule config]. The previous configurationat the time the
/debug/vmodulerequest started is restored after[duration]. This duration, if not specified, defaults to twice thedefault duration of a
logspyrequest (currently, thelogspydefault duration is 5s, so the
vmoduledefault duration is 10s).If the duration is zero or negative, the previous configuration
is never restored.