Skip to content

cli: command to deserialize descriptors#52972

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
spaskob:unmarshal
Aug 24, 2020
Merged

cli: command to deserialize descriptors#52972
craig[bot] merged 1 commit intocockroachdb:masterfrom
spaskob:unmarshal

Conversation

@spaskob
Copy link
Copy Markdown
Contributor

@spaskob spaskob commented Aug 18, 2020

We currently export the descriptors and job payloads in debug zips with
hex encoding. It is often very valuable to decode these protos into
JSON for inspection.

Fixes #52063.

Release note (cli change):
The new debug command decode-proto reads descriptor from stdin in hex
or base64 format (auto-detected) and a flag --schema=<fully qualified name to decode>
with default value cockroach.sql.sqlbase.Descriptor and outputs to stdout the
deserialized proto in JSON format.

@spaskob spaskob requested a review from knz August 18, 2020 07:53
@spaskob spaskob requested a review from a team as a code owner August 18, 2020 07:53
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@spaskob spaskob requested a review from ajwerner August 18, 2020 07:54
Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good! But this will need a test before it merges.

Also from a UX perspective, I don't think that copy-pasting long proto encodings on the command line is the way to go.

First of all, the typical UX will be a bunch of codes inside a file (e.g. a dump of system.descriptors in TSV format)
also some shells do not like argument lists that are longer than X characters.

So I think I would find it more convenient to use if the tool were to read strings from its standard input.
Ideally, it would scan through standard input, reading one string of input at a time (with any kind of junk in-between), and spit out the decoded value on standard output. This way I could redirect the entire TSV dump of system.descriptors to it and it would ignore the columns that don't contain proto-encoded data automatically.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @knz)

Copy link
Copy Markdown
Contributor Author

@spaskob spaskob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this makes sense - PTAL and let me know if this is what you were looking for.
After confirmation I will add the tests.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @spaskob)


pkg/cli/debug.go, line 479 at r1 (raw file):

	Args: cobra.ArbitraryArgs,
	RunE: func(cmd *cobra.Command, args []string) error {
		sc := bufio.NewScanner(os.Stdin)

Recommendations:

  1. pull this entire function into a separate funcon runDebugDecodeProto

  2. then also pull the scanner/scan logic into a separate function that takes an io.Reader as argument -- this will facilitate testing

  3. finally pull the single code decoding logic also into a separate function

So you'd have:

func runDebugDecodeProto(_ *cobra.Command, _ []string) error { 
    return streamMap(os.Stdout, os.Stdin, 
               func (s string) (bool, string, error) { return tryDecodeValue(s, debugDecodeProtoName) })
}

// streamMap applies `fn` to all the scanned fields in `in`, and reports
// the result of `fn` on `out`.
// Errors returned by `fn` are emitted on `out` with a "warning" prefix.
func streamMap(out os.Writer, in os.Reader, fn func(string) (bool, string, error)) error {
  for sc := NewScanner(); sc.Scan(); {
     for _, field := range strings.Fields(sc.Text()) {
         ok, value, err := fn(field, debugDecodeProtoName)
         if err != nil {
               fmt.Fprintln(out, "warning:  %v", err)
         } else if ok {
               fmt.Fprintln(out, value)
         }
     }
  }
}

// tryDecodeValue tries to decode the given string with the given proto name
// reports ok=false if the data was not valid proto-encoded.
func tryDecodeValue(s, protoName string) (ok bool, val string, err error) {
				bytes, err := gohex.DecodeString(s)
				if err != nil {
					b, err := base64.StdEncoding.DecodeString(field)
					if err != nil {
                                                return false, "", nil
					}
					bytes = b
				}
				msg, err := protoreflect.DecodeMessage(protoName, bytes)
				if err != nil {
                                         return false, "", nil
				}
				j, err := protoreflect.MessageToJSON(msg)
				if err != nil {
                                        // Unexpected error: the data was valid protobuf, but does not
                                        // reflect back to JSON. We report the protobuf struct in the
                                       // error message nonetheless.
					return false, "", errors.Wrapf(err, "while JSON-encoding %#v", msg)
				}
				return true, j.String(), nil
}

Then you can have separate unit tests for streamMap and tryDecodeValue.

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @spaskob)


pkg/cli/debug.go, line 478 at r1 (raw file):

`,
	Args: cobra.ArbitraryArgs,
	RunE: func(cmd *cobra.Command, args []string) error {

Also, separately, a tip: when a command-line utility reads from stdin, and a user doesn't know this upfront, and they run it mistakenly interactively from their terminal, then it looks like the command is "hanging" - the user wants "something" to happen but nothing happens.

The idiom to solve this UX pitfall is to inform the user that the command is not interactive when they expect it to be. This information message is only useful when a human user can see it, so there is no need to print it if the command is not run interactively. You can check this with isatty.IsTerminal() (which we already use in other places. For example:

if isatty.IsTerminal(os.Stdin.Fd()) {
   fmt.Fprintln(stderr, "# Reading proto-encoded pieces of data from stdin.\n# Press Ctrl+C or Ctrl+D to terminate.")
}

Copy link
Copy Markdown
Contributor Author

@spaskob spaskob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL - refactor and tests done

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @knz)


pkg/cli/debug.go, line 478 at r1 (raw file):

Previously, knz (kena) wrote…

Also, separately, a tip: when a command-line utility reads from stdin, and a user doesn't know this upfront, and they run it mistakenly interactively from their terminal, then it looks like the command is "hanging" - the user wants "something" to happen but nothing happens.

The idiom to solve this UX pitfall is to inform the user that the command is not interactive when they expect it to be. This information message is only useful when a human user can see it, so there is no need to print it if the command is not run interactively. You can check this with isatty.IsTerminal() (which we already use in other places. For example:

if isatty.IsTerminal(os.Stdin.Fd()) {
   fmt.Fprintln(stderr, "# Reading proto-encoded pieces of data from stdin.\n# Press Ctrl+C or Ctrl+D to terminate.")
}

Done.


pkg/cli/debug.go, line 479 at r1 (raw file):

Previously, knz (kena) wrote…

Recommendations:

  1. pull this entire function into a separate funcon runDebugDecodeProto

  2. then also pull the scanner/scan logic into a separate function that takes an io.Reader as argument -- this will facilitate testing

  3. finally pull the single code decoding logic also into a separate function

So you'd have:

func runDebugDecodeProto(_ *cobra.Command, _ []string) error { 
    return streamMap(os.Stdout, os.Stdin, 
               func (s string) (bool, string, error) { return tryDecodeValue(s, debugDecodeProtoName) })
}

// streamMap applies `fn` to all the scanned fields in `in`, and reports
// the result of `fn` on `out`.
// Errors returned by `fn` are emitted on `out` with a "warning" prefix.
func streamMap(out os.Writer, in os.Reader, fn func(string) (bool, string, error)) error {
  for sc := NewScanner(); sc.Scan(); {
     for _, field := range strings.Fields(sc.Text()) {
         ok, value, err := fn(field, debugDecodeProtoName)
         if err != nil {
               fmt.Fprintln(out, "warning:  %v", err)
         } else if ok {
               fmt.Fprintln(out, value)
         }
     }
  }
}

// tryDecodeValue tries to decode the given string with the given proto name
// reports ok=false if the data was not valid proto-encoded.
func tryDecodeValue(s, protoName string) (ok bool, val string, err error) {
				bytes, err := gohex.DecodeString(s)
				if err != nil {
					b, err := base64.StdEncoding.DecodeString(field)
					if err != nil {
                                                return false, "", nil
					}
					bytes = b
				}
				msg, err := protoreflect.DecodeMessage(protoName, bytes)
				if err != nil {
                                         return false, "", nil
				}
				j, err := protoreflect.MessageToJSON(msg)
				if err != nil {
                                        // Unexpected error: the data was valid protobuf, but does not
                                        // reflect back to JSON. We report the protobuf struct in the
                                       // error message nonetheless.
					return false, "", errors.Wrapf(err, "while JSON-encoding %#v", msg)
				}
				return true, j.String(), nil
}

Then you can have separate unit tests for streamMap and tryDecodeValue.

Done.

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: awesome

a nit, and also the linter is unhappy:

[11:10:28]W:	 [Step 6/6] --- a/pkg/cli/decode_test.go
[11:10:28]W:	 [Step 6/6] +++ b/pkg/cli/decode_test.go
[11:10:28]W:	 [Step 6/6] @@ -9,12 +9,14 @@ import (
[11:10:28]W:	 [Step 6/6]  
[11:10:28]W:	 [Step 6/6]  	"github.com/cockroachdb/cockroach/pkg/sql/catalog/descpb"
[11:10:28]W:	 [Step 6/6]  	"github.com/cockroachdb/cockroach/pkg/sql/protoreflect"
[11:10:28]W:	 [Step 6/6] +	"github.com/cockroachdb/cockroach/pkg/util/leaktest"
[11:10:28]W:	 [Step 6/6]  	"github.com/cockroachdb/cockroach/pkg/util/protoutil"
[11:10:28]W:	 [Step 6/6]  	"github.com/cockroachdb/errors"
[11:10:28]W:	 [Step 6/6]  	"github.com/stretchr/testify/require"
[11:10:28]W:	 [Step 6/6]  )
[11:10:28]W:	 [Step 6/6]  
[11:10:28]W:	 [Step 6/6]  func TestStreamMap(t *testing.T) {
[11:10:28]W:	 [Step 6/6] +	defer leaktest.AfterTest(t)()
[11:10:28]W:	 [Step 6/6]  	tests := []struct {
[11:10:28]W:	 [Step 6/6]  		name    string
[11:10:28]W:	 [Step 6/6]  		in      string
[11:10:28]W:	 [Step 6/6] @@ -58,6 +60,7 @@ func TestStreamMap(t *testing.T) {
[11:10:28]W:	 [Step 6/6]  }
[11:10:28]W:	 [Step 6/6]  
[11:10:28]W:	 [Step 6/6]  func TestTryDecodeValue(t *testing.T) {
[11:10:28]W:	 [Step 6/6] +	defer leaktest.AfterTest(t)()
[11:10:28]W:	 [Step 6/6]  	protoName := "cockroach.sql.sqlbase.TableDescriptor"
[11:10:28]W:	 [Step 6/6]  	marshal := func(pb protoutil.Message) []byte {
[11:10:28]W:	 [Step 6/6]  		s, err := protoutil.Marshal(pb)

Reviewed 3 of 3 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @spaskob)


pkg/cli/decode_test.go, line 49 at r2 (raw file):

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			out := &bytes.Buffer{}

nit: var out bytes.Buffer is more idiomatic


pkg/cli/decode_test.go, line 50 at r2 (raw file):

		t.Run(tt.name, func(t *testing.T) {
			out := &bytes.Buffer{}
			err := streamMap(out, strings.NewReader(tt.in), tt.fn)

and then pass this via &out

@spaskob spaskob force-pushed the unmarshal branch 2 times, most recently from 1299e5e to 3d9eb2d Compare August 24, 2020 11:03
@knz
Copy link
Copy Markdown
Contributor

knz commented Aug 24, 2020

also you need to update the release note with the new name of the flag

@spaskob spaskob force-pushed the unmarshal branch 2 times, most recently from e6ea48b to 7076c8d Compare August 24, 2020 12:47
We currently export the descriptors and job payloads in debug zips with
hex encoding.  It is often very valuable to decode these protos into
JSON for inspection.

Fixes cockroachdb#52063.

Release note (cli change):
The new debug command `decode-proto` reads descriptor from stdin in hex
or base64 format (auto-detected) and a flag
--schema=<fully qualified name to decode> with default value
`cockroach.sql.sqlbase.Descriptor` and outputs to stdout the deserialized
proto in JSON format. If the input is not a hex/base64 encoded proto
then it is outputted verbatim.
@spaskob
Copy link
Copy Markdown
Contributor Author

spaskob commented Aug 24, 2020

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 24, 2020

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 24, 2020

Build succeeded:

@craig craig bot merged commit a40c8cb into cockroachdb:master Aug 24, 2020
@spaskob spaskob deleted the unmarshal branch September 2, 2020 04:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cli: debug commands to deserialize descriptors and job payloads

3 participants