lib/collection/src/update_handler.rs
Outdated
// Optimizer has been triggered, so we need to increment the trigger-counter.
optimizer.increment_run_counter();
Though it is triggered, it may not actually run now because of the CPU/IO budget check below. This function can be hit many times without doing any optimization work.
Maybe it's better to move this further down to where we actually spawn the optimization task.
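To illustrate the difference, here is a minimal sketch, assuming a simple permit-based budget. `OptimizerCounters`, `Budget`, and `on_trigger` are hypothetical names for illustration only, not the actual types in update_handler.rs. The trigger count grows on every call, while the start count grows only once a permit has been acquired.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the optimizer's counters (not the real type).
#[derive(Default)]
struct OptimizerCounters {
    triggered: AtomicUsize,
    started: AtomicUsize,
}

/// Hypothetical resource budget modeled as a fixed number of permits.
struct Budget {
    available: AtomicUsize,
}

impl Budget {
    /// Try to take one permit; returns false when the budget is exhausted.
    fn try_acquire(&self) -> bool {
        self.available
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1))
            .is_ok()
    }
}

/// Called on every trigger. Returns true only if work would actually start.
fn on_trigger(counters: &OptimizerCounters, budget: &Budget) -> bool {
    counters.triggered.fetch_add(1, Ordering::Relaxed);
    if !budget.try_acquire() {
        // Triggered, but no optimization work was started.
        return false;
    }
    // Count the run only at the point where the task would be spawned.
    counters.started.fetch_add(1, Ordering::Relaxed);
    true
}
```

With a budget of two permits and a thousand calls to `on_trigger`, `triggered` would reach 1000 while `started` stays at 2, which is the gap described above.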
This value is exposed as optimization "triggers" in the API, so I added the increment here, even though we might not start executing the optimization right away.
I agree the naming here should be improved to make this more clear.
I'd suggest renaming the functions. Alternatively, if we want to count optimizer executions (instead of triggers), I'd suggest renaming it in the API as well and moving the increment further down.
What do you think?
It may happen that we 'trigger' many thousands of times.
Do you think it makes sense to track that? I don't see a clear use case for it, to be honest.
I am more interested in the total number of started and finished optimizations. Or started and running.
I see! I'll rename it and move the counter increment down the function 👍🏻
I am mostly interested in the total number of optimizations actually running at the current moment.
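One possible way to track a "currently running" value is a gauge paired with an RAII guard, so the count drops even if the task exits early. This is only a sketch: `RunningGauge` and `RunningGuard` are hypothetical names, and the actual implementation may look different.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical gauge of optimizations that are running right now.
#[derive(Clone, Default)]
struct RunningGauge(Arc<AtomicUsize>);

/// Keeps the gauge incremented for as long as the guard is alive.
struct RunningGuard(Arc<AtomicUsize>);

impl RunningGauge {
    /// Mark one optimization as running; hold the guard inside the task.
    fn start(&self) -> RunningGuard {
        self.0.fetch_add(1, Ordering::Relaxed);
        RunningGuard(Arc::clone(&self.0))
    }

    /// Number of optimizations running at this moment.
    fn current(&self) -> usize {
        self.0.load(Ordering::Relaxed)
    }
}

impl Drop for RunningGuard {
    /// Runs when the task finishes, fails, or is cancelled.
    fn drop(&mut self) {
        self.0.fetch_sub(1, Ordering::Relaxed);
    }
}
```

The guard would be created where the optimization task is spawned and moved into the task, so `current()` reflects only work that is really in flight.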
generall left a comment
Could you please make a separate PR which only introduces optimizer_total_processes_running?
There are a lot of code changes for collections_optimizer_run_count, and I am not sure they are worth it.
Done: #7316
@generall wouldn't it be an interesting metric to detect optimizer loops, or just very hot optimizer runs? While they aren't common, we have seen them happen every so often. We cannot detect those with just the current running count. We might see it through CPU/IO usage, but that isn't very direct. We might simplify it by not partitioning per optimizer type.
Closing in favor of #7316
Adds 2 new metrics, collections_optimizer_trigger_count and optimizer_total_processes_running, to the metrics API. The first measures the number of optimizer triggers per (local) collection. The second measures the total number of optimization processes running across all local shards of the given node.