Skip to content

node: cpumgr: metrics: add uncore cache alignment metrics#130133

Merged
k8s-ci-robot merged 5 commits intokubernetes:masterfrom
ffromani:split-uncore-metrics
Mar 14, 2025
Merged

node: cpumgr: metrics: add uncore cache alignment metrics#130133
k8s-ci-robot merged 5 commits intokubernetes:masterfrom
ffromani:split-uncore-metrics

Conversation

@ffromani
Copy link
Copy Markdown
Contributor

@ffromani ffromani commented Feb 13, 2025

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Add alignment metrics for uncore (aka last-level) cache (KEP-4800)

Which issue(s) this PR fixes:

Part of the work needed for KEP 4800: kubernetes/enhancements#4800

Special notes for your reviewer:

continuation of #100145 29529

Does this PR introduce a user-facing change?

Add metrics to track allocation of Uncore (aka last-level aka L3) Cache blocks

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 13, 2025
@ffromani
Copy link
Copy Markdown
Contributor Author

/sig node
/sig testing

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. area/kubelet area/test and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 13, 2025
@ffromani ffromani changed the title Split uncore metrics node: cpumgr: metrics: add uncore cache alignment metrics Feb 13, 2025
@ffromani
Copy link
Copy Markdown
Contributor Author

/retest-required

Comment thread test/e2e_node/cpu_manager_metrics_test.go Outdated
@ffromani ffromani force-pushed the split-uncore-metrics branch from fdbd95c to 9a69f7c Compare February 14, 2025 09:51
@ffromani
Copy link
Copy Markdown
Contributor Author

/retest

@haircommander haircommander moved this from Triage to Archive-it in SIG Node CI/Test Board Feb 26, 2025
@ffromani
Copy link
Copy Markdown
Contributor Author

ffromani commented Mar 4, 2025

/cc @dashpole we are adding a new metric and we can use your review. Hopefully this will be quick!

Copy link
Copy Markdown
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

straightforward enough

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 13, 2025
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

LGTM label has been added.

DetailsGit tree hash: ec80b0b82a58e0a2a94b46cbce8aaa8b0362d859

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, SergeyKanzhelev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 13, 2025
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 13, 2025
@ffromani
Copy link
Copy Markdown
Contributor Author

/triage accepted

kinda implied by the context, but still

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 14, 2025
abort the test in the unlikely case we fail to create
the Policy object. Exactly because this is unlikely
we should fail loudly.

In my case this happened because unrelated bad parameters,
which was hard to catch

Signed-off-by: Francesco Romani <fromani@redhat.com>
There's no need to use the generic reflect.DeepEqual,
we can use the cpuset Equals method in tests,
which is more idiomatic and a tad clearer.

Signed-off-by: Francesco Romani <fromani@redhat.com>
get nicer output for free

Signed-off-by: Francesco Romani <fromani@redhat.com>
fix test name. Cosmetic only.

Signed-off-by: Francesco Romani <fromani@redhat.com>
add missing metric about uncore / L3 / Last-Level cache alignment,
plus its e2e tests.

Exposing uncore alignment requires a bit of refactoring in the static
policy implementation because, differently from full PCPUs alignment
and NUMA alignment, can't be easily and safely inferred by construction.

The main reason for this is that uncore cache alignment is preferred,
not mandatory, thus the cpu allocator can legally use cross-uncore
allocation. Because of that, the final cpuset union step can
create a final cpuset which is not uncore-aligned even though all
its parts are uncore-aligned.

The safest way seems thus to run just a final uncore-alignment check
once the final cpuset is computed.

Signed-off-by: Francesco Romani <fromani@redhat.com>
@ffromani ffromani force-pushed the split-uncore-metrics branch from 9a69f7c to 5c17e7b Compare March 14, 2025 08:22
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 14, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 14, 2025
@ffromani
Copy link
Copy Markdown
Contributor Author

/retest

Copy link
Copy Markdown
Contributor

@swatisehgal swatisehgal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a minor comment but not a blocker.

/lgtm

}
return
}
// TODO: move in updateMetricsOnAllocate
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for postponing this? Should we handle this in this PR itself?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

had to serialize quite some PRs in this area, not sure which one would be the last so can carry this cleanup

Copy link
Copy Markdown
Contributor Author

@ffromani ffromani Mar 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...AND there 's a thorny issue I don't have a solution yet. In theory, the code portion which owns alignment information is the topology object; hence, we should report metrics based on that. Yet, for historical reasons, we make logic on the options object. I don't know how to properly cleanup this yet.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 14, 2025
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

LGTM label has been added.

DetailsGit tree hash: e5799ca8b1cb5dc6737684efc7f0e0d0a942b02e

@k8s-ci-robot k8s-ci-robot merged commit 83d33a9 into kubernetes:master Mar 14, 2025
@k8s-ci-robot k8s-ci-robot added this to the v1.33 milestone Mar 14, 2025
@github-project-automation github-project-automation Bot moved this from Archive-it to Done in SIG Node CI/Test Board Mar 14, 2025
@github-project-automation github-project-automation Bot moved this from Needs Reviewer to Done in SIG Node: code and documentation PRs Mar 14, 2025
@ffromani ffromani deleted the split-uncore-metrics branch March 17, 2025 08:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

6 participants