tsdb: metrics graphs reporting abnormally high values #99486

@abarganier

Describe the problem

Original thread: https://cockroachlabs.slack.com/archives/C01CNRP6TSN/p1679669195626919

The roachprod cluster in the Slack thread linked above is reporting abnormally high readings in DB Console metrics charts. For example, the normalized CPU Usage charts read > 1000% per node, per-node memory usage reads > 250GB, etc.

We recently merged #98077, which modified the TSDB query code to work for in-process tenants. It's possible we introduced a bug into the aggregation logic.
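To make the suspicion concrete, here is a minimal sketch of one way an aggregation bug could inflate a summed metric: the same source being matched more than once. This is purely illustrative and not a diagnosis of what #98077 actually changed; the `sample` type and both functions are hypothetical and are not taken from the TSDB code.

```go
package main

import "fmt"

// sample is a hypothetical (source, value) reading; it is not a CockroachDB type.
type sample struct {
	source string  // e.g. a node ID
	value  float64 // e.g. normalized CPU as a fraction of 1.0
}

// naiveSum adds every sample it is given, so a source that is matched
// more than once is counted more than once.
func naiveSum(samples []sample) float64 {
	total := 0.0
	for _, s := range samples {
		total += s.value
	}
	return total
}

// dedupedSum keeps only the last value seen per source before summing.
func dedupedSum(samples []sample) float64 {
	latest := make(map[string]float64)
	for _, s := range samples {
		latest[s.source] = s.value
	}
	total := 0.0
	for _, v := range latest {
		total += v
	}
	return total
}

func main() {
	samples := []sample{
		{"1", 0.5}, {"2", 0.25},
		{"1", 0.5}, {"2", 0.25}, // same sources matched a second time (assumed failure mode)
	}
	fmt.Println(naiveSum(samples))   // 1.5
	fmt.Println(dedupedSum(samples)) // 0.75
}
```

If the real bug is of this kind, the inflation factor should track the number of times each source is matched, which would be one thing to compare against the observed readings.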

To Reproduce

  1. Set up a roachprod cluster (not multi-tenant; note that the specific roachprod cluster where this was discovered did not have multiple tenants).
  2. Generate a workload against the cluster.
  3. Observe abnormally high metric readings.
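For step 3, a rough plausibility check can help separate "busy" from "impossible". The sketch below assumes that a normalized CPU percentage should never exceed 100% per node; the `reading` type and the sample values are hypothetical, entered by hand rather than fetched from the cluster.

```go
package main

import "fmt"

// reading is a hypothetical per-node value copied by hand from a chart.
type reading struct {
	nodeID     int
	cpuPercent float64 // normalized CPU usage, in percent
}

// implausible returns the readings whose normalized CPU falls outside
// [0, 100], the range assumed here for a normalized percentage.
func implausible(rs []reading) []reading {
	var out []reading
	for _, r := range rs {
		if r.cpuPercent < 0 || r.cpuPercent > 100 {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	// Made-up example values, not data from the affected cluster.
	rs := []reading{{1, 42}, {2, 1000}, {3, 100}}
	for _, r := range implausible(rs) {
		fmt.Printf("n%d: normalized CPU %.0f%% is out of range\n", r.nodeID, r.cpuPercent)
	}
}
```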

Additional data / screenshots
Screenshot 2023-03-24 at 11 29 02 AM
Screenshot 2023-03-24 at 11 28 56 AM

Jira issue: CRDB-25898

Metadata

Labels

A-observability-inf
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
GA-blocker
branch-master: Failures and bugs on the master branch.
branch-release-23.1: Used to mark GA and release blockers, technical advisories, and bugs for 23.1
release-blocker: Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
