Skip to content

Add more Optimizer data in metrics#7275

Closed
JojiiOfficial wants to merge 8 commits intodevfrom
optimizer-trigger-count-metric
Closed

Add more Optimizer data in metrics#7275
JojiiOfficial wants to merge 8 commits intodevfrom
optimizer-trigger-count-metric

Conversation

@JojiiOfficial
Copy link
Contributor

@JojiiOfficial JojiiOfficial commented Sep 19, 2025

Adds 2 new metrics, collections_optimizer_trigger_count and optimizer_total_processes_running, to the metrics API. The first one measures the amount of triggers of our optimizers on per (local) collection level.
The second one measures the total amount of optimization processes running on all local-shards of the given node.

# HELP collections_optimizer_run_count number of optimization triggers per optimizer
# TYPE collections_optimizer_run_count counter
collections_optimizer_run_count{id="benchmark",optimizer="merge"} 1
collections_optimizer_run_count{id="benchmark",optimizer="index"} 4

and

# HELP optimizer_total_processes_running number of optimization processes running in total
# TYPE optimizer_total_processes_running gauge
optimizer_total_processes_running 1

@JojiiOfficial JojiiOfficial marked this pull request as ready for review September 19, 2025 09:07
coderabbitai[bot]

This comment was marked as outdated.

@JojiiOfficial JojiiOfficial marked this pull request as draft September 19, 2025 09:31
@JojiiOfficial JojiiOfficial force-pushed the optimizer-trigger-count-metric branch 2 times, most recently from b60f758 to 3eebd85 Compare September 19, 2025 14:00
@JojiiOfficial JojiiOfficial marked this pull request as ready for review September 19, 2025 14:24
coderabbitai[bot]

This comment was marked as off-topic.

@JojiiOfficial JojiiOfficial changed the title Add new metric collections_optimizer_trigger_count Add more Optimizer data in metrics Sep 24, 2025
coderabbitai[bot]

This comment was marked as outdated.

@timvisee timvisee self-requested a review September 24, 2025 10:20
@qdrant qdrant deleted a comment from coderabbitai bot Sep 24, 2025
Comment on lines +358 to +359
// Optimizer has been triggered, so we need to increment the trigger-counter.
optimizer.increment_run_counter();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though it is triggered, it may not actually run now because of the CPU/IO budget code below. It is possible to hit this function a lot of times without actually doing any optimization work.

Maybe it's better to move this further down to where we actually spawn the optimization task.

Copy link
Contributor Author

@JojiiOfficial JojiiOfficial Sep 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This value is populated as optimization "triggers" in API, so I added the increment here, even though we might not start executing the optimization right away.
I agree the naming here should be improved to make this more clear.

I'd suggest to rename the functions. Alternatively, if we want to count the executions of optimizers (instead of triggers), I'd suggest to rename it in the API together with moving this further down.

What do you think?

Copy link
Member

@timvisee timvisee Sep 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may happen that we 'trigger' multiple thousands of times.

Do you think it makes sense to trigger that? I don't see a clear use case for it to be honest.

I am more interested in the total number of started and finished optimizations. Or started and running.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see! I'll rename it and move the counter increment down the function 👍🏻

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @generall, thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am mostly interested in total number of actually running optimizations at the current moment.

coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai bot Sep 24, 2025
@JojiiOfficial JojiiOfficial force-pushed the optimizer-trigger-count-metric branch from e1f6f5b to 424a232 Compare September 25, 2025 06:54
coderabbitai[bot]

This comment was marked as off-topic.

@qdrant qdrant deleted a comment from coderabbitai bot Sep 25, 2025
@JojiiOfficial JojiiOfficial force-pushed the optimizer-trigger-count-metric branch from f3b7c36 to c099206 Compare September 25, 2025 09:23
coderabbitai[bot]

This comment was marked as resolved.

Copy link
Member

@generall generall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please make a separate PR which only introduces optimizer_total_processes_running

There are a lot of code changes for collections_optimizer_run_count which I am not sure that it worth it

@JojiiOfficial
Copy link
Contributor Author

Done: #7316

@qdrant qdrant deleted a comment from coderabbitai bot Oct 1, 2025
@timvisee
Copy link
Member

timvisee commented Oct 1, 2025

There are a lot of code changes for collections_optimizer_run_count which I am not sure that it worth it

@generall wouldn't it be an interesting metric to detect optimizer loops, or just very hot optimizer runs? While they aren't common, we have seen it happen once every so often.

We cannot detect those with just the current running count. We might see it through CPU/IO usage, but that isn't very direct.

We might simplify it by not partitioning per optimizer type.

@JojiiOfficial
Copy link
Contributor Author

Closing in favor of #7316

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants