Add more Optimizer data in metrics by JojiiOfficial · Pull Request #7275 · qdrant/qdrant

JojiiOfficial · 2025-09-19T08:56:10Z

Adds two new metrics, collections_optimizer_trigger_count and optimizer_total_processes_running, to the metrics API. The first counts how often our optimizers are triggered, per (local) collection.
The second measures the total number of optimization processes running across all local shards of the given node.

# HELP collections_optimizer_run_count number of optimization triggers per optimizer
# TYPE collections_optimizer_run_count counter
collections_optimizer_run_count{id="benchmark",optimizer="merge"} 1
collections_optimizer_run_count{id="benchmark",optimizer="index"} 4

and

# HELP optimizer_total_processes_running number of optimization processes running in total
# TYPE optimizer_total_processes_running gauge
optimizer_total_processes_running 1
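For readers unfamiliar with the two metric kinds shown above, here is a minimal sketch of how a labelled counter and an unlabelled gauge could be rendered in the Prometheus text format. The `OptimizerMetrics` struct and `render` function are hypothetical names made up for illustration; they are not Qdrant's actual types or the code in this PR.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Hypothetical in-memory view of the two metrics (an assumption, not Qdrant's real type).
pub struct OptimizerMetrics {
    /// (collection id, optimizer name) -> number of runs so far. Only ever grows.
    pub run_counts: BTreeMap<(String, String), u64>,
    /// Optimization processes currently running on this node. Can go up and down.
    pub total_running: u64,
}

/// Renders both metrics as Prometheus text exposition lines.
pub fn render(metrics: &OptimizerMetrics) -> String {
    let mut out = String::new();

    // Counter: one sample per (collection, optimizer) label pair.
    out.push_str("# HELP collections_optimizer_run_count number of optimization triggers per optimizer\n");
    out.push_str("# TYPE collections_optimizer_run_count counter\n");
    for ((id, optimizer), count) in &metrics.run_counts {
        writeln!(
            out,
            "collections_optimizer_run_count{{id=\"{}\",optimizer=\"{}\"}} {}",
            id, optimizer, count
        )
        .unwrap(); // writing into a String does not fail
    }

    // Gauge: a single node-wide value without labels.
    out.push_str("# HELP optimizer_total_processes_running number of optimization processes running in total\n");
    out.push_str("# TYPE optimizer_total_processes_running gauge\n");
    writeln!(out, "optimizer_total_processes_running {}", metrics.total_running).unwrap();

    out
}
```

The counter is partitioned by labels and is only meaningful as a rate over time, while the gauge is a point-in-time value for the whole node.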

lib/collection/src/shards/local_shard/telemetry.rs

timvisee · 2025-09-24T13:59:17Z

lib/collection/src/update_handler.rs

+                // Optimizer has been triggered, so we need to increment the trigger-counter.
+                optimizer.increment_run_counter();


Though it is triggered, it may not actually run now because of the CPU/IO budget code below. It is possible to hit this function a lot of times without actually doing any optimization work.

Maybe it's better to move this further down to where we actually spawn the optimization task.

This value is reported as optimization "triggers" in the API, so I added the increment here, even though we might not start executing the optimization right away.
I agree the naming here should be improved to make this clearer.

I'd suggest renaming the functions. Alternatively, if we want to count the executions of optimizers (instead of triggers), I'd suggest renaming it in the API and moving the increment further down.

What do you think?

We may 'trigger' thousands of times.

Do you think it makes sense to trigger that? I don't see a clear use case for it to be honest.

I am more interested in the total number of started and finished optimizations. Or started and running.

I see! I'll rename it and move the counter increment down the function 👍🏻
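To illustrate the difference between counting triggers and counting runs, here is a rough sketch. `Budget`, `Counters`, and `on_trigger` are hypothetical stand-ins invented for this example; the real CPU/IO budget and update handler in Qdrant look different, and this sketch never returns permits to the budget.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

/// Hypothetical stand-in for the CPU/IO budget (an assumption, not Qdrant's type).
pub struct Budget {
    permits: AtomicUsize,
}

impl Budget {
    pub fn new(permits: usize) -> Self {
        Budget { permits: AtomicUsize::new(permits) }
    }

    /// Takes one permit if any is left; returns false when the budget is exhausted.
    pub fn try_acquire(&self) -> bool {
        self.permits
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |p| p.checked_sub(1))
            .is_ok()
    }
}

#[derive(Default)]
pub struct Counters {
    /// Incremented every time the handler is hit.
    pub triggers: AtomicU64,
    /// Incremented only when an optimization would actually be spawned.
    pub runs: AtomicU64,
}

/// Returns true if an optimization task would be started for this trigger.
pub fn on_trigger(budget: &Budget, counters: &Counters) -> bool {
    counters.triggers.fetch_add(1, Ordering::Relaxed);

    // No budget: the trigger is recorded, but no work is done.
    if !budget.try_acquire() {
        return false;
    }

    // Budget acquired: this is the point where the task would be spawned,
    // so this is where the run counter is incremented.
    counters.runs.fetch_add(1, Ordering::Relaxed);
    true
}
```

With a budget of two permits and five triggers, `triggers` ends at 5 while `runs` ends at 2, which is the gap the review comments describe.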

cc @generall, thoughts?

I am mostly interested in the total number of optimizations actually running at the current moment.
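One common way to keep a "currently running" gauge accurate is an RAII guard that increments on start and decrements on drop, so the count is restored even if the task exits early. The sketch below assumes a shared `Arc<AtomicUsize>` as the gauge; `RunningGuard` is a hypothetical name and not necessarily how this PR or #7316 implements it.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical guard: while it is alive, one optimization counts as running.
pub struct RunningGuard {
    running: Arc<AtomicUsize>,
}

impl RunningGuard {
    /// Marks one optimization as started and returns the guard that tracks it.
    pub fn start(running: &Arc<AtomicUsize>) -> Self {
        running.fetch_add(1, Ordering::Relaxed);
        RunningGuard { running: Arc::clone(running) }
    }
}

impl Drop for RunningGuard {
    /// Runs on normal completion, early return, or unwinding.
    fn drop(&mut self) {
        self.running.fetch_sub(1, Ordering::Relaxed);
    }
}
```

The metrics endpoint would then only need to load the shared value to report the gauge.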

src/common/metrics.rs

generall

Could you please make a separate PR that only introduces optimizer_total_processes_running?

There are a lot of code changes for collections_optimizer_run_count, and I am not sure they are worth it.

JojiiOfficial · 2025-09-26T08:16:48Z

Done: #7316

timvisee · 2025-10-01T08:21:32Z

There are a lot of code changes for collections_optimizer_run_count, and I am not sure they are worth it.

@generall wouldn't it be an interesting metric for detecting optimizer loops, or just very hot optimizer runs? While they aren't common, we have seen them happen every so often.

We cannot detect those with just the current running count. We might see it through CPU/IO usage, but that isn't very direct.

We might simplify it by not partitioning per optimizer type.
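As a sketch of how a run counter could surface optimizer loops, the snippet below derives a run rate from two scrapes of the counter and compares it against a threshold. The `Sample` type, both function names, and the threshold are assumptions chosen for illustration; nothing here comes from the PR itself.

```rust
use std::time::Duration;

/// One scrape of the (hypothetical) run counter, with the time it was taken.
pub struct Sample {
    pub runs: u64,
    /// Time of the scrape, measured from an arbitrary fixed start.
    pub at: Duration,
}

/// Runs per second between two scrapes.
/// Returns None if time did not advance or the counter went backwards (e.g. a restart).
pub fn runs_per_second(prev: &Sample, curr: &Sample) -> Option<f64> {
    let elapsed = curr.at.checked_sub(prev.at)?;
    if elapsed.is_zero() {
        return None;
    }
    let delta = curr.runs.checked_sub(prev.runs)?;
    Some(delta as f64 / elapsed.as_secs_f64())
}

/// Flags a suspiciously hot optimizer; the threshold is an arbitrary assumption.
pub fn looks_like_loop(prev: &Sample, curr: &Sample, max_runs_per_second: f64) -> bool {
    runs_per_second(prev, curr).map_or(false, |rate| rate > max_runs_per_second)
}
```

A running-count gauge stays at a small constant value during such a loop, whereas the counter's rate climbs, which is why the two metrics are not interchangeable for this purpose.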

JojiiOfficial · 2025-10-29T15:00:49Z

Closing in favor of #7316

JojiiOfficial marked this pull request as ready for review September 19, 2025 09:07

JojiiOfficial marked this pull request as draft September 19, 2025 09:31

JojiiOfficial force-pushed the optimizer-trigger-count-metric branch 2 times, most recently from b60f758 to 3eebd85 Compare September 19, 2025 14:00

JojiiOfficial marked this pull request as ready for review September 19, 2025 14:24

JojiiOfficial changed the title from "Add new metric collections_optimizer_trigger_count" to "Add more Optimizer data in metrics" Sep 24, 2025

JojiiOfficial requested a review from generall September 24, 2025 08:30

timvisee self-requested a review September 24, 2025 10:20

JojiiOfficial mentioned this pull request Sep 24, 2025

Add replica metrics #7301

Merged

qdrant deleted a comment from coderabbitai bot Sep 24, 2025

timvisee requested changes Sep 24, 2025

qdrant deleted a comment from coderabbitai bot Sep 24, 2025

JojiiOfficial force-pushed the optimizer-trigger-count-metric branch from e1f6f5b to 424a232 Compare September 25, 2025 06:54

qdrant deleted a comment from coderabbitai bot Sep 25, 2025

JojiiOfficial added 8 commits September 25, 2025 11:23

Add new metric collections_optimizer_trigger_count

c0bd7a7

Only add metric if exist

c19f180

Manually counting triggers since logs are truncated

eb06699

Update openapi docs

64f81f6

add metric for total running optimizer

aa538f0

Destruct OptimizerTriggers for better maintainability

2a99594

Measure Optimizer runs instead of triggers

d3310f7

Update openapi specs

c099206

JojiiOfficial force-pushed the optimizer-trigger-count-metric branch from f3b7c36 to c099206 Compare September 25, 2025 09:23

generall requested changes Sep 25, 2025

JojiiOfficial mentioned this pull request Sep 26, 2025

Currently running optimizers in Metrics #7316

Merged

qdrant deleted a comment from coderabbitai bot Oct 1, 2025

JojiiOfficial closed this Oct 29, 2025
