Skip to content

sql: enhance the handling of internal errors, add sentry reporting#35820

Merged
craig[bot] merged 7 commits intocockroachdb:masterfrom
knz:20190315-int-errs
Mar 18, 2019
Merged

sql: enhance the handling of internal errors, add sentry reporting#35820
craig[bot] merged 7 commits intocockroachdb:masterfrom
knz:20190315-int-errs

Conversation

@knz
Copy link
Copy Markdown
Contributor

@knz knz commented Mar 16, 2019

Subsumes #35826.
Subsumes #35823.

This PR achieves the following:

  • (jordan) removes the stack trace from the telemetry counters
  • (roadmap) encloses a stack trace in pgerror.Error objects
  • (roadmap) reports the details of pgerrors with CodeInternalError to sentry
  • (radu) shows these details when an internal error is encountered in logic tests
  • (radu) shows these details when a pgerror is printed somewhere else with printf("%+v", err)
  • (bug fix) fixes the crash reporting enable flag (detection was broken)

@knz knz requested review from a team, RaduBerinde and jordanlewis March 16, 2019 12:22
@knz knz requested a review from a team as a code owner March 16, 2019 12:22
@knz knz requested review from a team March 16, 2019 12:22
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@knz knz force-pushed the 20190315-int-errs branch from eb7ea5f to d6370a9 Compare March 16, 2019 12:24
@knz knz force-pushed the 20190315-int-errs branch from 9426d64 to 076ea94 Compare March 16, 2019 13:07
knz added a commit to knz/cockroach that referenced this pull request Mar 16, 2019
Before:

```
testdata/logic_test/assertion_error:4:
expected success, but found
crdb_internal.force_assertion_error(): internal error: woo
```

After (for example, after cockroachdb#35820 is separately applied):

```
testdata/logic_test/assertion_error:4:
expected success, but found
(XX000) crdb_internal.force_assertion_error(): internal error: woo
sql/sem/builtins/builtins.go:2815: in func211()
DETAIL: stack trace:
github.com/cockroachdb/cockroach/pkg/sql/sem/builtins/builtins.go:2815: in func211()
github.com/cockroachdb/cockroach/pkg/sql/sem/tree/eval.go:3651: in Eval()
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/expr.go:196: in eval()
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/processors.go:375: in ProcessRow()
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/processors.go:778: in ProcessRowHelper()
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/values.go:126: in Next()
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/base.go:174: in Run()
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/processors.go:801: in Run()
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/flow.go:627: in Run()
github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:252: in Run()
github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:839: in PlanAndRun()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1112: in execWithDistSQLEngine()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:948: in dispatchToExecutionEngine()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:456: in execStmtInOpenState()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:102: in execStmt()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1178: in run()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:429: in ServeConn()
github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:337: in func4()

NOTE: internal errors may have more details in log. Use -show-logs.
```

Release note: None
@knz knz force-pushed the 20190315-int-errs branch from 076ea94 to e0f9b64 Compare March 16, 2019 13:14
@knz knz force-pushed the 20190315-int-errs branch from e0f9b64 to 4fac91d Compare March 16, 2019 13:36
craig bot pushed a commit that referenced this pull request Mar 16, 2019
35827: sql/logictest: reveal error details if present r=knz a=knz

Before:

```
testdata/logic_test/assertion_error:4:
expected success, but found
crdb_internal.force_assertion_error(): internal error: woo
```

After (for example, after #35820 is separately applied):

```
testdata/logic_test/assertion_error:4:
expected success, but found
(XX000) crdb_internal.force_assertion_error(): internal error: woo
sql/sem/builtins/builtins.go:2815: in func211()
DETAIL: stack trace:
github.com/cockroachdb/cockroach/pkg/sql/sem/builtins/builtins.go:2815: in func211()
github.com/cockroachdb/cockroach/pkg/sql/sem/tree/eval.go:3651: in Eval()
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/expr.go:196: in eval()
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/processors.go:375: in ProcessRow()
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/processors.go:778: in ProcessRowHelper()
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/values.go:126: in Next()
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/base.go:174: in Run()
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/processors.go:801: in Run()
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/flow.go:627: in Run()
github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:252: in Run()
github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:839: in PlanAndRun()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1112: in execWithDistSQLEngine()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:948: in dispatchToExecutionEngine()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:456: in execStmtInOpenState()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:102: in execStmt()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1178: in run()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:429: in ServeConn()
github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:337: in func4()

NOTE: internal errors may have more details in log. Use -show-logs.
```

Release note: None

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
craig bot pushed a commit that referenced this pull request Mar 16, 2019
35822: sql: add a missing comment r=knz a=knz

This goes alongside #35820.

Release note: None

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
... like it had always been intended.

Release note: None
Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Very nice work, thank you! This gets us all the benefits we had with panics but without actually crashing!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis, @knz, and @RaduBerinde)


pkg/sql/pgwire/pgerror/errors.go, line 295 at r5 (raw file):

}

// Format implements the fmt.Formatter interface.

[nit] Add to this comment and explain the three variations v, +v, s. It's important because it's not common for %sand%v` to be different.


pkg/sql/pgwire/pgerror/errors.go, line 299 at r5 (raw file):

	switch verb {
	case 'v':
		switch {

[nit] switch with a single case should be if. Or just put them all in a big switch with

case verb == 'v' && s.Flag('+'):
case verb == 'v':
case verb == 's':

pkg/sql/pgwire/pgerror/errors.go, line 318 at r5 (raw file):

	case 's':
		fmt.Fprintf(s, "%s", pg.Message)
	}

What is this supposed to return if the verb is bad? Unfortunately this interface is poorly documented in fmt.


pkg/sql/pgwire/pgerror/errors.proto, line 50 at r5 (raw file):

  }

  // complement to the detail field that can be reported

[nit] explicitly mention that all info is scrubbed

@knz knz force-pushed the 20190315-int-errs branch from 4fac91d to 19a5391 Compare March 17, 2019 00:46
knz added 4 commits March 17, 2019 10:11
The `depth` argument wasn't incremented properly, causing one level to
be lost.

Release note: None
We wish to embed stack traces in Sentry reports for internal errors.
However, since internal errors flow through RPCs as protobufs and
raven.Stacktrace objects (used for Sentry reports) do not have a
protobuf encoding, we want another serialization format.

This patch provides this by means of an encapsulation of
`raven.Stacktrace` as `log.StackTrace`, with working `Encode`/`Decode`
methods.

Release note: None
This patch splits `log.SendCrashReport` into the following pieces:

- `log.ShouldSendReport` tests whether a report should be assembled+sent;
- `log.SendReport` sends a report from a message and zero or more
  reportalbe objects.
- `log.ReportableObject` aliases `raven.Interface` and is instantiated by:
  - `log.NewReportableMessage()` for simple strings.
  - `log.NewException()` for string + stack trace.

Release note: None
@knz knz force-pushed the 20190315-int-errs branch 2 times, most recently from e5c55f1 to 7810506 Compare March 17, 2019 11:02
Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis and @RaduBerinde)


pkg/sql/pgwire/pgerror/errors.go, line 295 at r5 (raw file):

Previously, RaduBerinde wrote…

[nit] Add to this comment and explain the three variations v, +v, s. It's important because it's not common for %sand%v` to be different.

I changed it back - it was breaking otherwise.


pkg/sql/pgwire/pgerror/errors.go, line 299 at r5 (raw file):

Previously, RaduBerinde wrote…

[nit] switch with a single case should be if. Or just put them all in a big switch with

case verb == 'v' && s.Flag('+'):
case verb == 'v':
case verb == 's':

Done.


pkg/sql/pgwire/pgerror/errors.go, line 318 at r5 (raw file):

Previously, RaduBerinde wrote…

What is this supposed to return if the verb is bad? Unfortunately this interface is poorly documented in fmt.

Then nothing is printed. That's the way the errors package does it, so I'm not asking further questions.


pkg/sql/pgwire/pgerror/errors.proto, line 50 at r5 (raw file):

Previously, RaduBerinde wrote…

[nit] explicitly mention that all info is scrubbed

Done.

@RaduBerinde
Copy link
Copy Markdown
Member


pkg/sql/pgwire/pgerror/errors.go, line 295 at r5 (raw file):

Previously, knz (kena) wrote…

I changed it back - it was breaking otherwise.

Ok, great. I actually hit this in a test where I had to change %s to %v which felt strange.

Still worth a comment mentioning that %s / %v show the error message and %+v shows details as well (+v is not particularly well documented in fmt, it just says "when printing structs, the plus flag (%+v) adds field names").

Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis and @RaduBerinde)


pkg/sql/pgwire/pgerror/errors.go, line 295 at r5 (raw file):

Previously, RaduBerinde wrote…

Ok, great. I actually hit this in a test where I had to change %s to %v which felt strange.

Still worth a comment mentioning that %s / %v show the error message and %+v shows details as well (+v is not particularly well documented in fmt, it just says "when printing structs, the plus flag (%+v) adds field names").

Added the comment.

@knz knz force-pushed the 20190315-int-errs branch from 7810506 to b72f746 Compare March 17, 2019 11:59
@knz
Copy link
Copy Markdown
Contributor Author

knz commented Mar 17, 2019

This PR is more or less complete, but I want to get a green build on #35821 to verify that the API changes here are good.

@knz knz force-pushed the 20190315-int-errs branch 2 times, most recently from e87789c to 2779d6f Compare March 17, 2019 22:50
@knz
Copy link
Copy Markdown
Contributor Author

knz commented Mar 17, 2019

It was a good idea to iterate on the other PR before merging this one. I discovered that the pgerror.Wrap functions were not well defined:

  1. they should return nil if given a nil err as input (that's how errors.Wrap is used in many places)
  2. because they must return a nil interface reference, they can't return *Error -- a nil *Error is turned into a non-nil error, and that breaks things badly.

Adjusted the PR accordingly, with an explanatory comment.

@knz knz force-pushed the 20190315-int-errs branch from 2779d6f to b9d9e67 Compare March 18, 2019 00:23
@knz
Copy link
Copy Markdown
Contributor Author

knz commented Mar 18, 2019

Another "oops" caught by the other PR: the result message for
pgerror.Wrap(errors.Wrap(pgerror.NewError("x", "a"), "b"), "c")

should be c: b: a, not c: a.

Fixed (also added a test).

@knz knz force-pushed the 20190315-int-errs branch from b9d9e67 to ccb7ead Compare March 18, 2019 00:34
@knz
Copy link
Copy Markdown
Contributor Author

knz commented Mar 18, 2019

Yet another "oops" caught in tests:

pgerror.Wrapf(err, CodeDataExceptionError, ...) is meant to add the pg code if err didn't have one already.

However if err is a non-pg retry error, we want the retry error code to prevail over the one provided to Wrapf.

I modified the logic accordingly to preserve retry errors + ambiguity errors.

knz added 2 commits March 18, 2019 01:55
This patch achieves the following:

- it extends the pgerror.Error object with a new field `SafeDetail`,
  aimed to contain non-PII details and thuis suitable for inclusion in
  reporting.

- it populates the scrubbed format + safe
  arguments of error messages in `SafeDetail` using the same logic as
  `log.ReportablesToSafeError`.

- it captures the stack trace in `SafeDetail` but also `Detail` so
  that a human user seeing the error can report the stack trace with a
  manually created issue.

- it extends `pgerror.Wrapf()` so as to combine details in an existing
  error with the details of the point that calls `pgerror.Wrapf()`.

- it provides a custom `fmt.Formatter` for `pgerror.Error`, so that:

  - `%#v` will now print the pg error code in addition to the message;
  - `%+v` will now print the source, code, message and stored `SafeDetail`.

Release note: None
This patch ensures that a sentry report is created when a pg error
with CodeInternalError flows out of the execution layers.

Release note: None
@knz knz force-pushed the 20190315-int-errs branch from ccb7ead to a6ef12c Compare March 18, 2019 00:55
craig bot pushed a commit that referenced this pull request Mar 18, 2019
35821: sql,*: error handling improvements r=knz a=knz

Subsumes #35825.
Subsumes #35820.

This PR replaces uses of `errors.New`, `errors.Errorf` and `errors.Wrap` by their `pgerror` equivalents. It also embeds the subsumed PRs; see those PR's discussion for details.

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
@craig craig bot merged commit a6ef12c into cockroachdb:master Mar 18, 2019
@knz knz deleted the 20190315-int-errs branch March 18, 2019 07:10
@RaduBerinde
Copy link
Copy Markdown
Member


pkg/sql/pgwire/pgerror/wrap.go, line 82 at r12 (raw file):

		// We'll want to report what we can from the provided context.
		st := log.NewStackTrace(depth + 1)

If the error already was CodeInternalError and had a stack trace, do we need to embed another stack trace every time we wrap it?

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Mar 18, 2019

If the error already was CodeInternalError and had a stack trace, do we need to embed another stack trace every time we wrap it?

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Mar 18, 2019

If the error already was CodeInternalError and had a stack trace, do we need to embed another stack trace every time we wrap it?

As I was debugging a couple of things I discovered that error objects do not simply flow on return paths, they also move horizontally in code (i.e. passed "down" to function calls) and also across the network (distsql).

capturing the wrap stack traces gives a better idea of the paths taken through the code by an error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants