cli: debug merge-logs output should include node id to ensure uniqueness #55395
Description
Describe the problem
The merge-logs output is prefixed by the short machine name, which may not be unique. If more than one node has the same short machine name, you cannot easily determine which node produced a log message.
This issue was discovered using the debug.zip of a 15-node cluster in which the same three short machine names were shared across nodes.
To Reproduce
(1) Create a cluster with nodes with the same short machine names. For example,
node  machine-name
1     cockroachdb-0.db.us1.example.dev
2     cockroachdb-1.db.us1.example.dev
3     cockroachdb-2.db.us1.example.dev
4     cockroachdb-0.db.us2.example.dev
5     cockroachdb-1.db.us2.example.dev
6     cockroachdb-2.db.us2.example.dev
(2) Generate a debug.zip for the cluster. For testing purposes, you can use:
sample_debug.zip
(3) Unzip the debug.zip file
(4) Grep the logs for "running on machine". Notice that the log files from multiple nodes in the debug zip will have the same short machine name (not the fully qualified name). For example, nodes 1 and 4 will have cockroachdb-0:
Florences-MBP:nodes florencemorris$ grep "running on machine" */logs/*
1/logs/cockroach.cockroachdb-0.root.2020-09-30T23_02_25Z.000001.log:I200930 23:02:25.482846 29002615 util/log/sync_buffer.go:49 [config] running on machine: cockroachdb-0
4/logs/cockroach.cockroachdb-0.root.2020-10-02T19_09_03Z.000001.log:I201002 19:09:03.549656 268864616 util/log/sync_buffer.go:49 [config] running on machine: cockroachdb-0
(5) Merge the log files:
Florences-MBP:nodes florencemorris$ cockroach debug merge-logs */logs/*
(6) In the merge-logs output, notice that the lines are prefixed by the short machine name, so you cannot quickly determine which node a line came from. For example, each of the lines below could be from either node 1 or node 4:
Florences-MBP:nodes florencemorris$ cockroach debug merge-logs */logs/*
cockroachdb-0> I200930 23:02:25.482846 29002615 util/log/sync_buffer.go:49 [config] running on machine: cockroachdb-0
cockroachdb-0> I201002 19:09:03.549656 268864616 util/log/sync_buffer.go:49 [config] running on machine: cockroachdb-0
Expected behavior
The merge-logs output prefix should include the node ID, which is unique. For example,
Florences-MBP:nodes florencemorris$ cockroach debug merge-logs */logs/*
cockroachdb-0(n1)> I200930 23:02:25.482846 29002615 util/log/sync_buffer.go:49 [config] running on machine: cockroachdb-0
cockroachdb-0(n4)> I201002 19:09:03.549656 268864616 util/log/sync_buffer.go:49 [config] running on machine: cockroachdb-0
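As a rough illustration of the requested prefix (not how merge-logs is implemented), the sketch below derives it from a log file path. It assumes the debug.zip layout seen in the grep output above, `<node id>/logs/<file>`, and file names shaped like `cockroach.<short host>.<user>.<timestamp>.<seq>.log`. The helper names `node_prefix` and `prefix_line` are hypothetical.

```python
from pathlib import PurePosixPath


def node_prefix(log_path: str) -> str:
    """Return 'shorthost(nID)' for a path shaped like '<node id>/logs/<file>'."""
    path = PurePosixPath(log_path)
    # Assumption: the directory two levels above the file is the node ID.
    node_id = path.parent.parent.name
    # Assumption: the second dot-separated field of the file name is the short host.
    short_host = path.name.split(".")[1]
    return f"{short_host}(n{node_id})"


def prefix_line(log_path: str, line: str) -> str:
    """Prefix one log line the way the expected output above does."""
    return f"{node_prefix(log_path)}> {line}"
```

With the two files from the reproduction steps, this yields distinct prefixes for nodes 1 and 4 even though both report the short machine name cockroachdb-0.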
Additional data / screenshots
Workaround:
Grep the logs for the timestamp of the message, such as I200930 23:02:25.482846, to determine the exact log file and node the message came from.
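The workaround can be scripted. The following is a minimal sketch, assuming the same `<node id>/logs/<file>` layout as in the unzipped debug.zip above; `find_source` is a hypothetical helper name, and the function simply does a substring search, as grep would.

```python
from pathlib import Path
from typing import List, Tuple


def find_source(timestamp: str, root=".") -> List[Tuple[str, str]]:
    """Return (node id, log file) pairs whose file contains the timestamp."""
    matches = []
    # Assumption: logs live under '<root>/<node id>/logs/', as in the debug.zip.
    for path in sorted(Path(root).glob("*/logs/*")):
        if not path.is_file():
            continue
        with path.open(errors="replace") as log:
            if any(timestamp in line for line in log):
                # The directory two levels above the file is taken as the node ID.
                matches.append((path.parent.parent.name, str(path)))
    return matches
```

Run from the nodes directory, `find_source("I200930 23:02:25.482846")` would list the node directory and file that contain that line.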
Environment:
- CockroachDB version 20.1.6
- Server OS: Mac
- Client app: cockroach debug merge-logs
Additional context
What was the impact?
It makes troubleshooting large clusters difficult, because merged log lines cannot be attributed to a specific node at a glance.