very high CPU usage on one of 5 nodes in cluster #17075

@gpaul

Is this a question, feature request, or bug report?

BUG REPORT

  1. Please supply the header (i.e. the first few lines) of your most recent
    log file for each node in your cluster. On most unix-based systems
    running with defaults, this boils down to the output of

    grep -F '[config]' cockroach-data/logs/cockroach.log

    When log files are not available, supply the output of cockroach version
    and all flags/environment variables passed to cockroach start instead.

This is a CockroachDB cluster running a fairly recent master commit. That is not ideal, but my testing relies on functionality that will only be released in 1.0.4.

$ cockroach version
Build Tag:    a26544c-dirty
Build Time:   2017/07/16 14:48:16
Distribution: CCL
Platform:     linux amd64
Go Version:   go1.8.3
C Compiler:   gcc 4.9.3
Build SHA-1:  a26544ce14386cc07bd21420cfbc63208757f5b0
Build Type:   development

I attached the entire debug bundle (link at the bottom of the description). This is a test cluster, and I am keeping it running in this state for the moment in case there is additional output I can provide.

  2. Please describe the issue you observed:
  • What did you do?

Start a 5-node cluster.

Issue queries to each of the nodes at a rate of ~1 per second.

Perform thousands of inserts and deletes over a period of a day or two.

As these are benchmarks, the workload runs in three phases. First, in a 'preparation' phase, most inserts are performed serially, one statement at a time, against Node 2. Next, a few hundred insert/delete transactions are distributed evenly across all 5 nodes. Finally, serial delete statements are issued against Node 2 to clean up after the benchmark. Throughout all of this, other processes in the cluster issue a slow background query load of ~1/sec.

I've attached a complete dump of the database and as you can see there is really very little data at the moment.

  • What did you expect to see?

Inserts and queries taking milliseconds.

Specifically, queries of the following form not taking several seconds:

SELECT resources.id AS resources_id, resources.rid AS resources_rid, resources.description AS resources_description, aces.actions AS aces_actions, aces.id AS aces_id, aces.user_id AS aces_user_id, aces.group_id AS aces_group_id, aces.resource_id AS aces_resource_id 
FROM aces JOIN users ON users.id = aces.user_id JOIN resources ON resources.id = aces.resource_id 
WHERE users.uid = %(uid_1)s
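
To put a number on "several seconds" versus "milliseconds", the query above can be timed from a client. The following is a minimal sketch, assuming a Python DB-API cursor whose driver accepts the `%(name)s` placeholder style seen in the query. The helper name `time_query` is hypothetical and not part of the original report.

```python
import time

# The query from this report, with its placeholder left unchanged.
QUERY = """
SELECT resources.id AS resources_id, resources.rid AS resources_rid,
       resources.description AS resources_description,
       aces.actions AS aces_actions, aces.id AS aces_id,
       aces.user_id AS aces_user_id, aces.group_id AS aces_group_id,
       aces.resource_id AS aces_resource_id
FROM aces JOIN users ON users.id = aces.user_id
          JOIN resources ON resources.id = aces.resource_id
WHERE users.uid = %(uid_1)s
"""


def time_query(cursor, sql, params):
    """Execute sql on a DB-API cursor and return (rows, elapsed_seconds).

    Hypothetical helper: `cursor` is assumed to provide execute() and
    fetchall() as described by the Python DB-API.
    """
    start = time.perf_counter()
    cursor.execute(sql, params)
    rows = cursor.fetchall()  # include result transfer in the measurement
    elapsed = time.perf_counter() - start
    return rows, elapsed
```

With a driver such as psycopg2 (an assumption about the client in use), this would be called as `time_query(conn.cursor(), QUERY, {"uid_1": some_uid})` against each node in turn, which would show whether the slowdown depends on the node receiving the query.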

CockroachDB on all nodes using some fraction of a core.

  • What did you see instead?

Queries (admittedly the aforementioned 3-way joins) taking several seconds.

CockroachDB on Nodes 1, 2, 4, and 5 hovering around 10% CPU while the process on Node 3 is pegged at ~400% CPU.

top - 13:25:46 up 1 day, 22:28,  2 users,  load average: 6.70, 6.65, 6.58
Tasks: 180 total,   1 running, 179 sleeping,   0 stopped,   0 zombie
%Cpu(s): 90.8 us,  7.3 sy,  0.0 ni,  1.8 id,  0.1 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 16004820 total,  6722184 free,  4530252 used,  4752384 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 11036620 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                   
28794 dcos_co+  20   0 3395548 1.553g  24296 S 379.1 10.2   8760:22 cockroach                                                                                                                 
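
For readers less familiar with `top`, the `%CPU` column is per-core, so values above 100 mean more than one core is busy. The sketch below converts the process row above into cores in use. It assumes the default `top` column order shown in the header line, and the function name `cores_in_use` is hypothetical.

```python
def cores_in_use(top_process_line):
    """Return the %CPU value of a top process row, expressed in cores.

    Assumes the default column order:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    """
    fields = top_process_line.split()
    cpu_percent = float(fields[8])  # %CPU is the ninth column
    return cpu_percent / 100.0


line = "28794 dcos_co+  20   0 3395548 1.553g  24296 S 379.1 10.2   8760:22 cockroach"
print(round(cores_in_use(line), 2))  # 3.79
```

So the cockroach process on Node 3 is keeping nearly four cores busy, against roughly a tenth of a core on the other nodes.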

I've attached the debug bundle and complete database dump.

I've also attached a CPU profile SVG and goroutine stack traces.

cockroach-debug.zip
dump.zip
cpu-profile.zip
