[stability] sql: add special tracing to trap #7881#8681
[stability] sql: add special tracing to trap #7881#8681andreimatei merged 4 commits intocockroachdb:masterfrom
Conversation
|
Reviewed 4 of 4 files at r1, 1 of 1 files at r2. internal/client/txn.go, line 103 [r1] (raw file):
sql/executor.go, line 520 [r3] (raw file):
Add sql/executor.go, line 544 [r3] (raw file):
sql/executor.go, line 549 [r3] (raw file):
Perhaps hide in a method: sql/session.go, line 162 [r1] (raw file):
sql/session.go, line 253 [r1] (raw file):
Might be worth optimizing the common case where the transaction is only executed once: sql/session.go, line 55 [r3] (raw file):
This seems overly specific to 7881. Perhaps sql/session.go, line 245 [r3] (raw file):
Are you not using a snowball trace because of the greater expense? util/log/structured.go, line 63 [r2] (raw file):
Not sure if we want this. It isn't too hard to figure out which line generated a trace statement. Grep is your friend. @RaduBerinde thoughts? util/tracing/tracer.go, line 94 [r3] (raw file):
Seems like this belongs in Comments from Reviewable |
|
Do we want to retain some of this infrastructure beyond 7881? If so we should change to more generic names I don't understand one thing - is the intention that we would collect traces from other nodes involved in the txn and dump those too? Because that would only happen if we enable snowball tracing for that txn no? (or am I missing something?) Review status: 1 of 5 files reviewed at latest revision, 12 unresolved discussions, all commit checks successful. sql/session.go, line 128 [r1] (raw file):
Seems odd that we check sql/session.go, line 164 [r1] (raw file):
The comment is deceiving, "is used to init the contexts for every SQL txn" means it is some kind of generic context that is a parent for txn contexts, but from the code it seems it just is the transaction context.. I would rename to util/log/structured.go, line 63 [r2] (raw file):
|
b1bd1b3 to
b8f25cc
Compare
|
(Thanks guys! Please take another look.) Well there's no "infrastructure" here to write home about. You know I'm a fan of tracing in principle, and we should have a way to sample everything and be able to get a hold of the current trace. But I don't think this is it, by creating a new tracer per request in a random place and accumulating the trace state in a random object. We should all sit down with Tobi too and have a heart to heart. I didn't enable snowballing because, according to some comment by Peter, that's the expensive part of "sql tracing", so I didn't want to enable if for everything (and there's no way to enable it per trace). Plus frankly I don't think it'd be very useful for finding this bug. @RaduBerinde I've made a change in Review status: 0 of 7 files reviewed at latest revision, 12 unresolved discussions, some commit checks pending. internal/client/txn.go, line 103 [r1] (raw file):
|
|
Sure there is, Snowballing is enabled in a span (and its future children) by setting the snowball baggage item. In short, if you set that baggage item, requests that go out will return populated Line 793 in a635041 Review status: 0 of 7 files reviewed at latest revision, 12 unresolved discussions, all commit checks successful. Comments from Reviewable |
|
oh I see, ok. But anyway, by the time I figure out a trace is interesting, it's too late for future snowballing to be useful. Review status: 0 of 7 files reviewed at latest revision, 12 unresolved discussions, all commit checks successful. Comments from Reviewable |
|
Ah, gotcha Review status: 0 of 7 files reviewed at latest revision, 12 unresolved discussions, all commit checks successful. Comments from Reviewable |
|
Review status: 0 of 7 files reviewed at latest revision, 12 unresolved discussions, all commit checks successful. server/admin.go, line 172 [r6] (raw file):
Nice, yeah this is in principle what we need to do - we need to "merge" information from an incoming context with information that is relevant to where the request got in now. We will eventually do the same with log tags and whatever else we store in context, but there is no infrastructure for that yet. Maybe add a Comments from Reviewable |
|
Reviewed 7 of 7 files at r4, 1 of 1 files at r5, 1 of 4 files at r6. server/admin.go, line 172 [r6] (raw file):
|
|
Review status: 4 of 7 files reviewed at latest revision, 14 unresolved discussions, all commit checks successful. server/admin.go, line 172 [r6] (raw file):
|
|
Review status: 4 of 7 files reviewed at latest revision, 14 unresolved discussions, all commit checks successful. server/admin.go, line 172 [r6] (raw file):
|
b8f25cc to
23b042b
Compare
|
Review status: 1 of 7 files reviewed at latest revision, 14 unresolved discussions, some commit checks pending. server/admin.go, line 172 [r6] (raw file):
|
|
Reviewed 1 of 6 files at r7, 1 of 1 files at r8, 3 of 4 files at r9. sql/executor.go, line 287 [r4] (raw file):
|
|
Review status: 6 of 7 files reviewed at latest revision, 13 unresolved discussions, some commit checks failed. sql/executor.go, line 287 [r4] (raw file):
|
|
Review status: 6 of 7 files reviewed at latest revision, 13 unresolved discussions, some commit checks failed. sql/executor.go, line 287 [r4] (raw file):
|
|
Review status: 6 of 7 files reviewed at latest revision, 10 unresolved discussions, some commit checks failed. util/log/structured.go, line 63 [r2] (raw file):
|
sql/executor.go
Outdated
| var err error | ||
| results, remainingStmts, err = runTxnAttempt(e, planMaker, origState, txnState, opt, stmtsToExec) | ||
|
|
||
| // Until #7881 fixed. |
b185701 to
abf2773
Compare
|
Reviewed 1 of 7 files at r4, 6 of 6 files at r7, 7 of 7 files at r10, 1 of 1 files at r11, 5 of 5 files at r13, 4 of 4 files at r14. internal/client/txn.go, line 449 [r12] (raw file):
s/Txn/transaction/ internal/client/txn.go, line 560 [r12] (raw file):
TODO internal/client/txn.go, line 569 [r12] (raw file):
TODO sql/executor.go, line 1069 [r7] (raw file):
This would always happen before sql/executor.go, line 527 [r12] (raw file):
TODO sql/executor.go, line 551 [r12] (raw file):
Is this log message above part of 7881? Because it looks pretty messy (I guess that's ok for now though if it helps). sql/session.go, line 162 [r1] (raw file):
|
|
This isn't clear from the PR description, but what this PR accomplishes is actually quite useful: It'll allow us to trace complete SQL transactions in lightstep (where now the parts of the transaction which go through TxnCoordSender aren't part of a common root span, and thus aren't related to each other). This should be useful for getting a better idea of where transactions are spending their time, and how much they retry, and where. |
abf2773 to
6c7774d
Compare
|
edited the PR description Review status: all files reviewed at latest revision, 20 unresolved discussions, all commit checks successful. internal/client/txn.go, line 449 [r12] (raw file):
|
|
Reviewed 6 of 6 files at r15, 1 of 1 files at r16. sql/session.go, line 67 [r7] (raw file):
|
|
Review status: 2 of 8 files reviewed at latest revision, 15 unresolved discussions, some commit checks failed. sql/executor.go, line 520 [r3] (raw file):
|
|
Reviewed 2 of 2 files at r17, 4 of 4 files at r18. internal/client/txn.go, line 569 [r18] (raw file):
use %v for the error as it can be nil. sql/executor.go, line 551 [r12] (raw file):
|
6c7774d to
db50ac0
Compare
|
TFTR guys! Review status: 2 of 8 files reviewed at latest revision, 12 unresolved discussions. internal/client/txn.go, line 569 [r18] (raw file):
|
|
Review status: 2 of 8 files reviewed at latest revision, 12 unresolved discussions, some commit checks pending. sql/executor.go, line 520 [r3] (raw file):
|
|
Reviewed 6 of 6 files at r19, 2 of 2 files at r21, 4 of 4 files at r22. internal/client/txn.go, line 569 [r18] (raw file):
|
The TxnState matches the lifetime of a SQL txn, whereas the client.Txn goes away when the SQL txn goes, for example, to the Aborted state. Plus, the SQL executor has direct access to a txnState, so it's natural for it to use the TxnState's context for log messages, and we want these messages aassocited with the "transaction's trace". In fact, the TxnState was already the the holder of a txn's root span, so this is in effect unifying the owner of the span and of all the collected children spans. This commit also makes all SQL transactions be executed with a root span.
At the very least, we have to close the transaction's root span. We also do more cleanup and attempt to rollback the txn. This is nice because it lets the rollback attempt be done in the context of the txn. It supersedes the aborting that was being done by the TxnCoordSender through the cancellation of the session's context with interrupted the txn's heartbeat loop.
This only affects messages coming from log.Infof(). Messages going directly to log.Trace(), or messages going through log.VTrace() but with the verbosity turned off are not affected and will continue to not have file/line.
db50ac0 to
321d5a1
Compare
I've decided to go medieval and invest in tracing all SQL execution in the hope that cockroachdb#7881 will be triggered again. This builds on the prev commit that added the ability to collect spans across the executor and a client.Txn. The tracing is gated by an env var. When turned on, every SQL txn creates a new tracer that accumulates spans in the session's txnState (this is similar to how "SQL tracing" currently works). When we detect 7881, we mark the root span of the current one of these traces for "sampling", which means that later, when that span (and hence the trace) is closed, we dump the trace with all the log messages in it (note that currently this trace only has one span, since we're not very good at starting child spans yet).
321d5a1 to
4659e38
Compare
|
Review status: 2 of 8 files reviewed at latest revision, 8 unresolved discussions. internal/client/txn.go, line 569 [r18] (raw file):
|
|
Reviewed 6 of 6 files at r23, 2 of 2 files at r25, 4 of 4 files at r26. Comments from Reviewable |
sql: create root spans for transactions
Also:
I've decided to go medieval and invest in tracing all SQL execution in
the hope that #7881 will be triggered again.
The tracing is gated by an env var. When turned on, every SQL txn
creates a new tracer that accumulates spans in the session's txnState
(this is similar to how "SQL tracing" currently works). When we detect
7881, we mark the root span of the current one of these traces for
"sampling", which means that later, when that span (and hence the trace)
is closed, we dump the trace with all the log messages in it (note that
currently this trace only has one span, since we're not very good at
starting child spans yet).
cc @knz, @cockroachdb/stability
This change is