storage: unmarshaling descriptors makes 35% of garbage on cluster with many ranges #21702

@jordanlewis

Description

I noticed this while playing with the TPCC-1000 dataset against a cluster that exhibited #21689, which creates a ton of ranges. The cluster had around 50,000 ranges.

The metrics collector is supposed to iterate over every range every 10 seconds, collecting stats on it. One of those steps is checking the range's zone config to see how many replicas the range is supposed to have. Checking the zone config requires unmarshaling the range's table descriptor, and the table's zone config if it has one, which is the source of the garbage.

The reason this produces so much garbage is that none of these lookups are cached, so every one of the 50,000 ranges triggers a full table descriptor unmarshal on each pass.

We should figure out a way to fix this. Here are some ideas:

  1. Cache the results of getZoneConfigForID, which maps descriptor IDs to zone configs. The challenge here would be determining the correct eviction policy, how it interacts with descriptor changes arriving via gossip, and so on.
  2. A simpler option would be to memoize the results of that call just for the lifetime of a single scan over all of the ranges. That way I think we could avoid reasoning about eviction entirely.
  3. @benesch suggested adding an index to the system.namespace table on its config id column, which would let us look up whether a range has a specialized zone config for its table or partition without having to unmarshal its descriptor to check.

I uploaded the profiles I used to find this issue below. You can see that the ComputeMetrics flow takes 20% of the cluster's CPU; the cluster was running a large COUNT(*) at the time of profiling. @danhhz rightly points out that you can't conclude much from that since the cluster wasn't fully loaded, but it's still concerning in my opinion.

pprof.alloc_objects.alloc_space.inuse_objects.inuse_space.tpcc-1000-1node-count-stock.pb.gz
pprof.cockroach.samples.cpu.tpcc-1000-1node-count-stock.pb.gz

I think it would be great to get this fixed by 2.0. I can't prove it, but I suspect that this issue is one of the reasons why idle clusters eat up a lot of CPU, something that several users have asked about.

Metadata
Labels

C-performance: Perf of queries or internals. Solution not expected to change functional behavior.
