very high CPU usage on one of 5 nodes in cluster #17075
Description
Is this a question, feature request, or bug report?
BUG REPORT
Please supply the header (i.e. the first few lines) of your most recent
log file for each node in your cluster. On most unix-based systems
running with defaults, this boils down to the output of:
grep -F '[config]' cockroach-data/logs/cockroach.log
When log files are not available, supply the output of
cockroach version
and all flags/environment variables passed to cockroach start instead.
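For a multi-node cluster, the per-node headers can be gathered in one pass. Below is a minimal Python sketch equivalent to running grep -F '[config]' on each node's log. It assumes the logs have already been copied into one local directory per node; that directory layout and the config_header/collect_headers helpers are hypothetical.

```python
from pathlib import Path


def config_header(log_path):
    """Return the lines of a log containing the literal text '[config]'.

    Like `grep -F`, this is a fixed-string match, not a regex.
    """
    with open(log_path, encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in f if "[config]" in line]


def collect_headers(root):
    """Map each node directory under `root` to its [config] lines.

    Hypothetical layout: <root>/<node>/cockroach-data/logs/cockroach.log.
    Nodes without a log file at that path are skipped.
    """
    headers = {}
    for node_dir in sorted(Path(root).iterdir()):
        log = node_dir / "cockroach-data" / "logs" / "cockroach.log"
        if log.is_file():  # follows symlinks, so a symlinked log also works
            headers[node_dir.name] = config_header(log)
    return headers
```

For example, collect_headers("collected-logs") (a hypothetical local directory) returns a dict such as {"node1": [...], "node2": [...]} that can be pasted into the report node by node.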
This is a CockroachDB cluster running a fairly recent master commit. Not ideal, but I'm doing testing that relies on functionality that will only be released in 1.0.4.
$ cockroach version
Build Tag: a26544c-dirty
Build Time: 2017/07/16 14:48:16
Distribution: CCL
Platform: linux amd64
Go Version: go1.8.3
C Compiler: gcc 4.9.3
Build SHA-1: a26544ce14386cc07bd21420cfbc63208757f5b0
Build Type: development
I have attached the entire debug bundle (link at the bottom of the description). This is a test cluster, and I am keeping it running in this state for the moment in the hope that there is additional output I can provide.
- Please describe the issue you observed:
- What did you do?
Start a 5-node cluster.
Issue queries to each of the nodes at a rate of ~1 per second.
Perform thousands of inserts and deletes over a period of a day or two.
As these are benchmarks, most inserts are performed serially in a 'preparation' phase, one statement at a time against Node 2. This is followed by a few hundred insert/delete transactions distributed evenly across all 5 nodes, and then once more by serial delete statements against Node 2 to clean up after the benchmark. Throughout all of this, other processes in the cluster issue a slow background query load of ~1/sec.
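The phases above can be modeled as a simple schedule to see how the foreground statements are spread across nodes. This is a hypothetical sketch, not the actual benchmark code: the function names are illustrative, and the ~1/sec background query load is not modeled.

```python
from collections import Counter

# Node IDs as used in the report.
NODES = [1, 2, 3, 4, 5]


def benchmark_plan(prep_inserts, mixed_txns, cleanup_deletes, serial_node=2):
    """Return (node, statement kind) pairs in the order the report describes."""
    # Phase 1: serial inserts, one statement at a time, against a single node.
    plan = [(serial_node, "insert") for _ in range(prep_inserts)]
    # Phase 2: insert/delete transactions spread evenly (round-robin) over all nodes.
    plan += [(NODES[i % len(NODES)], "insert/delete txn") for i in range(mixed_txns)]
    # Phase 3: serial cleanup deletes against the same single node.
    plan += [(serial_node, "delete") for _ in range(cleanup_deletes)]
    return plan


def statements_per_node(plan):
    """Count how many foreground statements or transactions each node receives."""
    return Counter(node for node, _ in plan)
```

With illustrative counts of 1000 preparation inserts, 300 mixed transactions, and 1000 cleanup deletes (the report only says "thousands" and "a few hundred"), statements_per_node(benchmark_plan(1000, 300, 1000)) gives Node 2 a total of 2,060 while every other node, including Node 3, receives 60.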
I've attached a complete dump of the database, and as you can see, there is really very little data at the moment.
- What did you expect to see?
Inserts and queries taking milliseconds.
Specifically, queries of the following form not taking several seconds:
SELECT resources.id AS resources_id, resources.rid AS resources_rid, resources.description AS resources_description, aces.actions AS aces_actions, aces.id AS aces_id, aces.user_id AS aces_user_id, aces.group_id AS aces_group_id, aces.resource_id AS aces_resource_id
FROM aces JOIN users ON users.id = aces.user_id JOIN resources ON resources.id = aces.resource_id
WHERE users.uid = %(uid_1)s
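To time this query shape in isolation, here is a minimal harness that uses Python's built-in sqlite3 as a stand-in database. It only illustrates the timing approach. The schema is hypothetical (inferred from the column names in the query), and SQLite's planner tells you nothing about CockroachDB's. Against the real cluster, the same timing approach applies with a cursor from a PostgreSQL driver such as psycopg2, keeping the original %(uid_1)s placeholder.

```python
import sqlite3
import time

# Hypothetical schema: only the column names come from the reported query.
# Types, keys, and indexes are guesses; the real schema is in the attached dump.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, uid TEXT);
CREATE TABLE resources (id INTEGER PRIMARY KEY, rid TEXT, description TEXT);
CREATE TABLE aces (id INTEGER PRIMARY KEY, actions TEXT, user_id INTEGER,
                   group_id INTEGER, resource_id INTEGER);
"""

# The reported query, with sqlite3's named placeholder :uid_1 in place of the
# pyformat-style %(uid_1)s used by PostgreSQL drivers such as psycopg2.
QUERY = """
SELECT resources.id AS resources_id, resources.rid AS resources_rid,
       resources.description AS resources_description,
       aces.actions AS aces_actions, aces.id AS aces_id,
       aces.user_id AS aces_user_id, aces.group_id AS aces_group_id,
       aces.resource_id AS aces_resource_id
FROM aces JOIN users ON users.id = aces.user_id
JOIN resources ON resources.id = aces.resource_id
WHERE users.uid = :uid_1
"""


def timed_query(conn, uid):
    """Run the join for one uid and return (rows, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    rows = conn.execute(QUERY, {"uid_1": uid}).fetchall()
    return rows, time.perf_counter() - start
```

Usage: create a connection with sqlite3.connect(":memory:"), call conn.executescript(SCHEMA), insert a few rows, then call timed_query(conn, some_uid) repeatedly to collect latencies.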
CockroachDB on all nodes using some fraction of a core.
- What did you see instead?
Queries (admittedly the aforementioned 3-way joins) taking several seconds.
CockroachDB on nodes 1, 2, 4, and 5 hovering around 10% CPU, while the process on Node 3 is pegged at ~400% CPU.
top - 13:25:46 up 1 day, 22:28, 2 users, load average: 6.70, 6.65, 6.58
Tasks: 180 total, 1 running, 179 sleeping, 0 stopped, 0 zombie
%Cpu(s): 90.8 us, 7.3 sy, 0.0 ni, 1.8 id, 0.1 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 16004820 total, 6722184 free, 4530252 used, 4752384 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 11036620 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28794 dcos_co+ 20 0 3395548 1.553g 24296 S 379.1 10.2 8760:22 cockroach
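In top's default Irix mode, %CPU is relative to a single core, so 379.1% means roughly 3.8 cores busy. The sketch below parses a task row like the one above and flags nodes far above the cluster median. The helper names and the 5x threshold are arbitrary choices for illustration.

```python
import statistics

# Column order of top's task area, as shown in the output above.
TOP_COLUMNS = ["PID", "USER", "PR", "NI", "VIRT", "RES", "SHR", "S",
               "%CPU", "%MEM", "TIME+", "COMMAND"]


def parse_top_row(row):
    """Parse one top task row into a dict keyed by column name."""
    # Split at most len-1 times so a COMMAND containing spaces stays whole.
    fields = row.split(None, len(TOP_COLUMNS) - 1)
    return dict(zip(TOP_COLUMNS, fields))


def cores_busy(cpu_percent):
    """Convert an Irix-mode %CPU value to cores (100% equals one core)."""
    return cpu_percent / 100.0


def hot_nodes(cpu_by_node, factor=5.0):
    """Return nodes whose %CPU exceeds `factor` times the cluster median."""
    median = statistics.median(cpu_by_node.values())
    return sorted(n for n, cpu in cpu_by_node.items() if cpu > factor * median)
```

Feeding in the approximate readings from the report ({1: 10.0, 2: 10.0, 3: 379.1, 4: 10.0, 5: 10.0}), hot_nodes flags only node 3.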
I've attached the debug bundle and the complete database dump, as well as a CPU profile (SVG) and goroutine stack traces.