
cache descriptors to reduce API calls#218

Merged
SuperQ merged 1 commit into prometheus-community:master from fbs:descriptorcache
May 25, 2023

Conversation


@fbs fbs commented Apr 20, 2023

See issue #217

This commit introduces a simple prefix -> []descriptor cache. If the prefix isn't in the cache, the original page-based lookup is performed and the cache is updated afterwards; if the prefix is in the cache, the cached value is used.
No attempt is made to prevent two goroutines from updating the cache concurrently. If two concurrent requests see an expired cache entry, they'll both do the ListDescriptor call and update the cache. This keeps the cache simple and doesn't block one request on the other. The situation will also be rare in practice, as there are usually only one or two prometheuses scraping.

The cache itself comes in 3 flavors:

  • noop, does nothing
  • descriptorCache, caches all descriptors
  • googleDescriptorCache, caches only descriptors of google metrics

The cache is disabled by default and can be enabled with the new `--monitoring.descriptor-cache-ttl` CLI flag: setting the TTL to a value >0s enables the cache. It defaults to the google-only cache; the new `--no-monitoring.descriptor-cache-only-google` flag switches to caching all descriptors.

@fbs force-pushed the descriptorcache branch from b9c0c6d to ac4876e on April 20, 2023 at 12:34
return true
}

func TestDescriptorCache(t *testing.T) {
@fbs fbs Apr 20, 2023


The test feels fragile, but I'm not sure how bad it turns out to be in practice. Will see what CI has to say about it.

@fbs force-pushed the descriptorcache branch from ac4876e to 1350904 on April 21, 2023 at 11:47
See issue prometheus-community#217

Currently the prefix -> descriptor mapping happens on every /metrics
call. In our setup we have a lot of exporters running with specific
prefixes that usually map to only a single/handful of metrics. This
makes the ratio of ListDescriptor to ListTimeseries calls often close to
50/50 (43% of total requests were ListDescriptor in a random sampling).

In our case we're only interested in scraping the metrics produced by
google services. Those services very rarely change their metrics so
fetching the descriptors every time is just wasted work.
While it's only a few calls per exporter, it adds up to millions of calls
a month (2 (prometheuses in HA) * 1 (scrape per minute) * 60 * 24 * 30
(minutes per month) * 20 prefixes = 1'728'000), across many google
projects.

This commit introduces a simple prefix -> []descriptor cache. If the
prefix isn't in the cache, the original page-based lookup is performed
and the cache is updated afterwards; if the prefix is in the cache,
the cached value is used.
No attempt is made to prevent two goroutines from updating the cache
concurrently. If two concurrent requests see an expired cache entry,
they'll both do the ListDescriptor call and update the cache. This
keeps the cache simple and doesn't block one request on the other.
The situation will also be rare in practice, as there are usually only
one or two prometheuses scraping.

The cache itself comes in 3 flavors:

- noop, does nothing
- descriptorCache, caches all descriptors
- googleDescriptorCache, caches only descriptors of google metrics

The cache is disabled by default and can be enabled with the new
`--monitoring.descriptor-cache-ttl` CLI flag: setting the TTL to a
value >0s enables the cache. It defaults to the google-only cache;
the new `--no-monitoring.descriptor-cache-only-google` flag switches
to caching all descriptors.

Signed-off-by: bas smit <bsmit@bol.com>
@fbs force-pushed the descriptorcache branch from 1350904 to a16df59 on May 4, 2023 at 17:31
// descriptorCache is a MetricTypePrefix -> MetricDescriptor cache
type descriptorCache struct {
	cache map[string]*descriptorCacheEntry
	lock  sync.Mutex
}
A reviewer commented:

Would it make sense to use an RWMutex?

@fbs replied:

Don't think it matters in this case. The lock is not held during the API call itself, only for read/write of the map itself.
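The locking pattern under discussion can be shown in a small self-contained sketch (all names here are illustrative, not the exporter's): the mutex is held only around the map access, while the slow fetch, standing in for the ListDescriptors API call, runs unlocked, so an RWMutex would buy little.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry and cache are stand-ins, not the exporter's types.
type entry struct {
	vals   []string
	expiry time.Time
}

type cache struct {
	mu sync.Mutex
	m  map[string]entry
}

// get holds the mutex only around map reads/writes; the slow fetch
// (e.g. the ListDescriptors API call) runs with no lock held, so two
// concurrent misses may both fetch, which is accepted for simplicity.
func (c *cache) get(prefix string, fetch func(string) []string) []string {
	c.mu.Lock()
	e, ok := c.m[prefix]
	c.mu.Unlock() // release before any slow work
	if ok && time.Now().Before(e.expiry) {
		return e.vals
	}
	vals := fetch(prefix) // slow network call, no lock held
	c.mu.Lock()
	c.m[prefix] = entry{vals: vals, expiry: time.Now().Add(time.Minute)}
	c.mu.Unlock()
	return vals
}

func main() {
	c := &cache{m: map[string]entry{}}
	calls := 0
	fetch := func(string) []string { calls++; return []string{"d1"} }
	c.get("p/", fetch)
	c.get("p/", fetch) // served from cache, fetch not called again
	fmt.Println(calls) // 1
}
```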


@SuperQ SuperQ left a comment


Question about whether we should use an RWMutex/RLock(); otherwise LGTM.

@fbs

fbs commented May 25, 2023

Had >200 exporters running with this patch for a while, haven't seen any weird behavior, and achieved a 40% reduction in the number of API calls.

@SuperQ SuperQ merged commit 50c3be2 into prometheus-community:master May 25, 2023
SuperQ added a commit that referenced this pull request May 25, 2023
* [FEATURE] cache descriptors to reduce API calls #218

Signed-off-by: SuperQ <superq@gmail.com>
@weyert

weyert commented May 25, 2023

Had >200 exporters running with this patch for a while, haven't seen any weird behavior, and achieved a 40% reduction in the number of API calls.

40% fewer API calls, that's great! This collector gets expensive really quickly.

SuperQ added a commit that referenced this pull request May 26, 2023
* [FEATURE] cache descriptors to reduce API calls #218

Signed-off-by: SuperQ <superq@gmail.com>
SuperQ added a commit that referenced this pull request May 26, 2023
Release v0.14.0

* [FEATURE] cache descriptors to reduce API calls #218

---------

Signed-off-by: SuperQ <superq@gmail.com>
@fbs

fbs commented May 26, 2023

The issue has more details, but the reduction is because in a lot of cases we have very specific prefixes to reduce cost. If you only have a broad prefix like storage.googleapis.com/ the reduction will be way less.

Semi-related: another thing you could look at is 'caching' the reply. In an HA environment you have 2 prometheuses scraping the exporter, resulting in 2x the API calls to google. Depending on your use case it might be OK to give one of them the cached results of the other.

@fbs fbs deleted the descriptorcache branch May 26, 2023 12:22
@WojciechKuk

WojciechKuk commented Feb 3, 2025

Semi related, another thing you could look at is 'caching' the reply.

@fbs is this parameter (--monitoring.descriptor-cache-ttl) enough for this? (I have 2 prometheuses in an HA setup scraping one stackdriver_exporter instance; I need all metrics+values to be cached for 60s, not only the descriptor data.)

@fbs

fbs commented Feb 3, 2025

@WojciechKuk no, AFAIK there is no way to cache the /metrics response yet. If there is no ticket for it, it might be good to create one. I'm still interested in the feature, just never got to it.

4 participants