Skip to content

feature(scheduler): implement plugin_evaluation_total metric#115082

Merged
k8s-ci-robot merged 2 commits intokubernetes:masterfrom
sanposhiho:filter-metrics
Mar 6, 2023
Merged

feature(scheduler): implement plugin_evaluation_total metric#115082
k8s-ci-robot merged 2 commits intokubernetes:masterfrom
sanposhiho:filter-metrics

Conversation

@sanposhiho
Copy link
Copy Markdown
Member

@sanposhiho sanposhiho commented Jan 15, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

New "schedule_attempts" "plugin_evaluation_total" metric is added to the scheduler. This metric counts how many times the specific plugin affects the scheduling result while taking Skip status into consideration.

It's now only supporting the Filter status. After #114827, we can support "Score" as well.

Which issue(s) this PR fixes:

Part of #110643

Special notes for your reviewer:

Not sure if "schedule_attempts" is the most suitable name. Tell me if any other good idea on naming.

EDIT: renamed to plugin_evaluation_total as suggested in #115082 (comment).

Does this PR introduce a user-facing change?

New "plugin_evaluation_total" is added to the scheduler. 
This metric counts how many times the specific plugin affects the scheduling result. The metric doesn't get incremented when the plugin has nothing to do with an incoming Pod.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 15, 2023
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 15, 2023
@sanposhiho
Copy link
Copy Markdown
Member Author

/cc @alculquicondor

Comment thread pkg/scheduler/metrics/metrics.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

counters should always end in _total https://prometheus.io/docs/practices/naming/#metric-names

plugin_evaluation_total? This is a low level metric, so I think it's fair to have the word plugin in it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed as suggested, thanks.

@alculquicondor
Copy link
Copy Markdown
Member

@kerthcet or @chendave do you have cycles to look at this?

@kerthcet
Copy link
Copy Markdown
Member

I'll be back tomorrow.
/assign

@sanposhiho
Copy link
Copy Markdown
Member Author

@alculquicondor @kerthcet

Fixed based on the suggestion, please take a look. 🙏

@sanposhiho sanposhiho changed the title feature(scheduler): implement schedule_attempts metric feature(scheduler): implement plugin_evaluation_total metric Feb 4, 2023
@kerthcet
Copy link
Copy Markdown
Member

kerthcet commented Feb 6, 2023

This metric counts how many times the specific plugin affects the scheduling result while taking Skip status into consideration. In other words, the metric doesn't get incremented when the plugin has nothing to do with an incoming Pod.

By evaluating the plug being called times, what can we get from this? I mean in what scenario we need this metrics. Not opposing it, something I haven't figured out.

@kerthcet
Copy link
Copy Markdown
Member

kerthcet commented Feb 6, 2023

And sorry for let this sitting a long time. Busy with works. ☹️

@alculquicondor
Copy link
Copy Markdown
Member

By evaluating the plug being called times, what can we get from this?

This is a common question in PRR for any new features: how can an admin know if a feature is being used? This PR moves the needle in that direction.

if state.SkipFilterPlugins.Has(pl.Name()) {
continue
}
metrics.PluginEvaluationTotal.WithLabelValues(pl.Name(), Filter).Inc()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add PreFilter too? Just in case some plugins only implement PreFilter.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we can add to all extension points for counting plugin called times. Starting with filter is also OK.

Copy link
Copy Markdown
Member Author

@sanposhiho sanposhiho Mar 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's start with preFilter/filter in this PR.
@kidddddddddddddddddddddd will work on the prescore/score.
#110643 (comment)

I'm not sure if it's worth implementing on other extension points for now because it's other extension points don't have Skip feature. I'd say we can support other extension points when we support Skip on them.

@alculquicondor
Copy link
Copy Markdown
Member

/label sig-instrumentation

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@alculquicondor: The label(s) /label sig-instrumentation cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor, official-cve-feed. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

Details

In response to this:

/label sig-instrumentation

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@alculquicondor
Copy link
Copy Markdown
Member

/sig instrumentation

@k8s-ci-robot k8s-ci-robot added the sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. label Feb 6, 2023
@kerthcet
Copy link
Copy Markdown
Member

kerthcet commented Feb 7, 2023

This is a common question in PRR for any new features: how can an admin know if a feature is being used? This PR moves the needle in that direction.

Got it, before without the support of Skip, all enabled filter plugins' being called times should be equal, now they're not very likely, this can be grabbed by metrics. Same to new plugins added.

if state.SkipFilterPlugins.Has(pl.Name()) {
continue
}
metrics.PluginEvaluationTotal.WithLabelValues(pl.Name(), Filter).Inc()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we can add to all extension points for counting plugin called times. Starting with filter is also OK.

Comment thread pkg/scheduler/metrics/metrics.go Outdated
Name: "plugin_evaluation_total",
Help: "Number of attempts to schedule pods by each plugin and the extension point (only Filter is supported now.).",
StabilityLevel: metrics.ALPHA,
}, []string{"plugin", "extension_point"})
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe profile here is also needed?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point 👍

@logicalhan
Copy link
Copy Markdown

/assign
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 9, 2023
@sanposhiho
Copy link
Copy Markdown
Member Author

Thanks @kerthcet @alculquicondor.
These days, I'm running out of my time with KEPs related implementation,
so will get back here after them. 🙏

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 6, 2023
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 6, 2023
@sanposhiho
Copy link
Copy Markdown
Member Author

@kerthcet @alculquicondor Sorry for being late on it. Fixed based on the suggestions:

  • support PreFilter
  • support profile label

@alculquicondor
Copy link
Copy Markdown
Member

This metric counts how many times the specific plugin affects the scheduling result. In other words, the metric doesn't get incremented when the scheduling plugin makes no decision about an incoming Pod. For example, the metric only increments for the NodeAffinity plugin if a Pod being scheduled defines node affinity.

Skip is an implementation detail that users are not aware of.

Copy link
Copy Markdown
Member

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

if state.SkipFilterPlugins.Has(pl.Name()) {
continue
}
metrics.PluginEvaluationTotal.WithLabelValues(pl.Name(), Filter, f.profileName).Inc()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 why is the Filter value exported, but not preFilter? But this is outside of the scope of this PR.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems Filter value is used in another package. (schedule_one.go)
https://github.com/sanposhiho/kubernetes/blob/799444a689da80470ad1aa87153f4feb5134b3db/pkg/scheduler/schedule_one.go#L533

It may be better to move those variables to metrics package? since they are used in the metric value's label.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that sounds like a better location.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL #116312

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 6, 2023
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

LGTM label has been added.

DetailsGit tree hash: 95cdc134b79f33807c3ee8847b54a32b64167d1d

Name: "plugin_evaluation_total",
Help: "Number of attempts to schedule pods by each plugin and the extension point (available only in PreFilter and Filter.).",
StabilityLevel: metrics.ALPHA,
}, []string{"plugin", "extension_point", "profile"})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the cardinality of these labels?

Copy link
Copy Markdown
Member

@alculquicondor alculquicondor Mar 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<20, <10, <5

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 6, 2023
@k8s-ci-robot k8s-ci-robot merged commit 283c26f into kubernetes:master Mar 6, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.27 milestone Mar 6, 2023
@sanposhiho
Copy link
Copy Markdown
Member Author

#115082 (comment)

modified the release not a bit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants