cli: debug merge-logs output should include node id to ensure uniqueness #55395

@florence-crl

Describe the problem
The merge-logs output is prefixed by the short machine name, which may not be unique. If more than one node has the same short machine name, you cannot easily determine which node produced a log message.

This issue was discovered using the debug.zip of a 15-node cluster which used three of the same short machine names.

To Reproduce

(1) Create a cluster in which multiple nodes share the same short machine name. For example:

node   machine-name
1      cockroachdb-0.db.us1.example.dev
2      cockroachdb-1.db.us1.example.dev
3      cockroachdb-2.db.us1.example.dev
4      cockroachdb-0.db.us2.example.dev
5      cockroachdb-1.db.us2.example.dev
6      cockroachdb-2.db.us2.example.dev

(2) Generate a debug.zip for the cluster. For testing purposes, you can use:
sample_debug.zip

(3) Unzip the debug.zip file
(4) Grep the logs for "running on machine". Notice that the log files from multiple nodes in the debug zip will have the same short machine name (not the fully qualified name). For example, nodes 1 and 4 will have cockroachdb-0:

Florences-MBP:nodes florencemorris$ grep "running on machine" */logs/*
1/logs/cockroach.cockroachdb-0.root.2020-09-30T23_02_25Z.000001.log:I200930 23:02:25.482846 29002615 util/log/sync_buffer.go:49  [config] running on machine: cockroachdb-0
4/logs/cockroach.cockroachdb-0.root.2020-10-02T19_09_03Z.000001.log:I201002 19:09:03.549656 268864616 util/log/sync_buffer.go:49  [config] running on machine: cockroachdb-0

(5) Merge the log files:

Florences-MBP:nodes florencemorris$ cockroach debug merge-logs */logs/*

(6) In the merge-logs output, notice that the lines are prefixed by the short machine name, so you cannot quickly determine which node a line came from. For example, each of the lines below could be from either node 1 or node 4:

Florences-MBP:nodes florencemorris$ cockroach debug merge-logs */logs/*
cockroachdb-0> I200930 23:02:25.482846 29002615 util/log/sync_buffer.go:49  [config] running on machine: cockroachdb-0
cockroachdb-0> I201002 19:09:03.549656 268864616 util/log/sync_buffer.go:49  [config] running on machine: cockroachdb-0

Expected behavior

The merge-logs output prefix should include the node ID, which is unique. For example:

Florences-MBP:nodes florencemorris$ cockroach debug merge-logs */logs/*
cockroachdb-0(n1)> I200930 23:02:25.482846 29002615 util/log/sync_buffer.go:49  [config] running on machine: cockroachdb-0
cockroachdb-0(n4)> I201002 19:09:03.549656 268864616 util/log/sync_buffer.go:49  [config] running on machine: cockroachdb-0
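
Until the CLI does this itself, the prefix shown above can be approximated outside of cockroach. The following is only a rough sketch and not how `cockroach debug merge-logs` works: `merge_with_node_id` is a hypothetical helper, and it assumes it is run from the `nodes` directory of an unzipped debug.zip, that log files are named `cockroach.<short-host>.<user>.<timestamp>.<seq>.log` as in the grep output earlier, and that every log line starts with a header such as `I200930 23:02:25.482846`. Lines that continue a multi-line message would not sort correctly.

```shell
# Hypothetical helper, not part of the cockroach CLI.
# Prints all log lines in time order, each prefixed with "<short-host>(n<id>)> ".
merge_with_node_id() {
  tab=$(printf '\t')
  for f in */logs/*.log; do
    node=${f%%/*}                         # leading directory name = node ID
    host=$(basename "$f" | cut -d. -f2)   # second dot-separated part = short machine name
    # Emit "<date> <time><TAB><prefix><original line>" so sort can order by time.
    # substr($1, 2) drops the severity letter (I/W/E/F) in front of the date.
    awk -v p="${host}(n${node})> " '{ print substr($1, 2) " " $2 "\t" p $0 }' "$f"
  done | LC_ALL=C sort -s -t "$tab" -k1,1 | cut -f2-
}

# Usage (from the debug.zip "nodes" directory):
#   merge_with_node_id | less
```

The sort key is stripped again by `cut`, so the output contains only the prefix and the original log line.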

Additional data / screenshots
Workaround:
Grep the logs for the timestamp, such as I200930 23:02:25.482846, to determine the exact log file and node the message came from.
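
A minimal sketch of that workaround as a shell function, assuming it is run from the `nodes` directory of the unzipped debug.zip (layout `<node-id>/logs/<file>`, as in the grep output earlier). The name `nodes_for_timestamp` is made up for illustration and is not part of the cockroach CLI.

```shell
# Hypothetical helper, not part of the cockroach CLI.
# Prints the node ID (top-level directory name) of every node whose logs
# contain the given timestamp string.
nodes_for_timestamp() {
  # -l prints only the names of matching files; -F treats the timestamp as a
  # literal string, so "." is not interpreted as a regex wildcard.
  grep -l -F -- "$1" */logs/* | cut -d/ -f1 | sort -u
}

# Usage (from the debug.zip "nodes" directory):
#   nodes_for_timestamp "I200930 23:02:25.482846"
```

If two nodes happen to log at the same microsecond, both IDs are printed, so a longer fragment of the line may be needed to tell them apart.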

Environment:

  • CockroachDB version 20.1.6
  • Server OS: Mac
  • Client app: cockroach debug merge-logs

Additional context
What was the impact?
It makes troubleshooting large clusters difficult.

Metadata

Labels

  • A-cli-admin: CLI commands that pertain to controlling and configuring nodes
  • A-logging: In and around the logging infrastructure.
  • C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
  • T-server-and-security: DB Server & Security