Skip to content

Use UnmarshalTo instead of UnmarshalAny when dealing with cgroup metrics.#9195

Closed
alam0rt wants to merge 1 commit into
containerd:mainfrom
zendesk:sam.lockart/unmarshal-metrics-to-type
Closed

Use UnmarshalTo instead of UnmarshalAny when dealing with cgroup metrics.#9195
alam0rt wants to merge 1 commit into
containerd:mainfrom
zendesk:sam.lockart/unmarshal-metrics-to-type

Conversation

@alam0rt

@alam0rt alam0rt commented Oct 5, 2023

Copy link
Copy Markdown
Contributor

After encountering the issue here: #9052, it appears that the type assertion was failing as the typeof data was an interface{} despite the concrete type being a pointer to v1.Metrics. We can instead unmarshal the data directly into the expected type variable which appears to work as expected.

@k8s-ci-robot

Copy link
Copy Markdown

Hi @alam0rt. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@alam0rt alam0rt changed the title Use UnmarshalTo instead of UnmarshalAny Use UnmarshalTo instead of UnmarshalAny when dealing with cgroup metrics. Oct 5, 2023
@alam0rt alam0rt mentioned this pull request Oct 5, 2023
Signed-off-by: Sam Lockart <sam.lockart@zendesk.com>
@alam0rt alam0rt force-pushed the sam.lockart/unmarshal-metrics-to-type branch from 2a860ef to 9b68866 Compare October 5, 2023 03:11
@henry118

henry118 commented Oct 7, 2023

Copy link
Copy Markdown
Member

The unit test below didn't result in any error to me (with v1.7.5). Do you mind running it on your end?

func TestMetrics(t *testing.T) {
	m := &v1.Metrics{
		CPU: &stats.CPUStat{
			Usage: &stats.CPUUsage{
				Total: 50,
			},
		},
	}
	stats, err := typeurl.MarshalAny(m)
	assert.NoError(t, err)

	data, err := typeurl.UnmarshalAny(stats)
	assert.NoError(t, err)
	_, ok := data.(*v1.Metrics)
	assert.True(t, ok)

	dst := &v1.Metrics{}
	err = typeurl.UnmarshalTo(stats, dst)
	assert.NoError(t, err)
}


data, err := typeurl.UnmarshalAny(stats)
if err != nil {
s := &v1.Metrics{}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This chunk of code has remained the same in over 4 years now. I am not sure this is anything specific to containerd 1.7.5 like the github issue here mentions #9052 . What version were you previously on? Has there been any changes on your application?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using containerd with kubelet, there's nothing that comes to mind. I'll run this test on a kubelet and let you know

@alam0rt

alam0rt commented Oct 10, 2023

Copy link
Copy Markdown
Contributor Author

The unit test below didn't result in any error to me (with v1.7.5). Do you mind running it on your end?

func TestMetrics(t *testing.T) {
	m := &v1.Metrics{
		CPU: &stats.CPUStat{
			Usage: &stats.CPUUsage{
				Total: 50,
			},
		},
	}
	stats, err := typeurl.MarshalAny(m)
	assert.NoError(t, err)

	data, err := typeurl.UnmarshalAny(stats)
	assert.NoError(t, err)
	_, ok := data.(*v1.Metrics)
	assert.True(t, ok)

	dst := &v1.Metrics{}
	err = typeurl.UnmarshalTo(stats, dst)
	assert.NoError(t, err)
}

Damn, won't compile on OSX due to build flags.

I tried using ctr metrics

Oct 10 00:08:24 containerd[1448]: {"error":null,"level":"error","msg":"invalid metric type for 535655fc1c59d67a1c3c0e6a0f81f1636b6758485b80e162c730be9f86de0360","time":"2023-10-10T00:08:24.769770849Z"}
ID                                                                  TIMESTAMP
535655fc1c59d67a1c3c0e6a0f81f1636b6758485b80e162c730be9f86de0360    seconds:1696896532  nanos:163248295

METRIC                   VALUE
memory.usage_in_bytes    30474240
memory.limit_in_bytes    268435456
memory.stat.cache        2060288
cpuacct.usage            2204998916
cpuacct.usage_percpu     [144178557 89870576 31787241 48788881 365999279 34713446 43435968 539148328 53590245 112969897 453806691 44179111 145900230 33398840 31158931 32072695]
pids.current             24
pids.limit               0

No issue there. Will continue looking

@alam0rt

alam0rt commented Oct 10, 2023

Copy link
Copy Markdown
Contributor Author

Some more info: I believe this started occuring after the upgrade to 1.7.0 which I can see is when gogoproto was removed?

I added a print to https://github.com/containerd/containerd/blob/fe457eb99ac0e27b3ce638175ef8e68a7d2bc373/vendor/github.com/containerd/typeurl/v2/types.go
https://github.com/zendesk/containerd/blob/fe457eb99ac0e27b3ce638175ef8e68a7d2bc373/vendor/github.com/containerd/typeurl/v2/types.go and can see that the typeURL Is gogoproto

		case gogoproto.Message:
			err = gogoproto.Unmarshal(value, t)
			fmt.Printf("(typeURL: %v), gogoproto: %s\n", typeURL, t.String())
		}

containerd[58435]: (typeURL: io.containerd.cgroups.v1.Metrics), gogoproto: &Metrics{H....

@dmcgowan

Copy link
Copy Markdown
Member

@alam0rt What build of containerd are you using? I was wondering how the types could conflict since a panic will occur with multiple registrations. However, typeurl, gogoproto, and protobuf packages each have their own global registrations. I'm wondering why your build has this type registered while others don't seem to.

@alam0rt

alam0rt commented Oct 10, 2023

Copy link
Copy Markdown
Contributor Author

@alam0rt What build of containerd are you using? I was wondering how the types could conflict since a panic will occur with multiple registrations. However, typeurl, gogoproto, and protobuf packages each have their own global registrations. I'm wondering why your build has this type registered while others don't seem to.

Hey @dmcgowan, we aren't using anything to esoteric I don't think

# containerd github.com/containerd/containerd v1.7.5 https://github.com/containerd/containerd/commit/fe457eb99ac0e27b3ce638175ef8e68a7d2bc373

$ runc --version
runc version 1.1.8
commit: v1.1.8-0-g82f18fe0
spec: 1.0.2-dev
go: go1.20.3
libseccomp: 2.5.4

@alam0rt

alam0rt commented Oct 19, 2023

Copy link
Copy Markdown
Contributor Author

Any guidance on this would be great - not sure why we are seeing this error considering the code seems fine + we are running a proper release

@k8s-ci-robot

Copy link
Copy Markdown

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Champ-Goblem

Copy link
Copy Markdown
Contributor

Hi @alam0rt, @dmcgowan, @kiashok, @henry118
I would like to flag this patch as being important, I am currently seeing issues with stats when using Containerd 1.7.x with default runC (and gVisor), however when switching back to any of the 1.6.x releases this problem is resolved.
I have also checked the stats returned from the various shims and they line up with the expected fields required by Containerd.

I also see that ctr currently uses the suggested method by @alam0rt to parse the metrics results from the shim API (see here), so it would be a good idea to keep the implementations similar anyway?

@github-actions

github-actions Bot commented May 1, 2024

Copy link
Copy Markdown

This PR is stale because it has been open 90 days with no activity. This PR will be closed in 7 days unless new comments are made or the stale label is removed.

@github-actions github-actions Bot added the Stale label May 1, 2024
@github-actions

github-actions Bot commented May 9, 2024

Copy link
Copy Markdown

This PR was closed because it has been stalled for 7 days with no activity.

@github-actions github-actions Bot closed this May 9, 2024
@Iceber

Iceber commented Jul 11, 2024

Copy link
Copy Markdown
Member

I would suggest reopening this pr as the issue still exists and this pr fixes it.

Unmarshal io.containerd.cgroups.v1.Metrics with UnmarshalAny in 1.7+ (don't track which version it started with, but the latest version still has this issue) will first be parsed as github.com/containerd/cgroups/stats/v1.Metrics.

If UnmarshalAny in a unit test it shouldn't have this problem because it's not importing the wrong Metrics

This is due to hcsshim causing the dependency to have both github.com/containerd/cgroups/stats/v1/ and . /vendor/github.com/containerd/cgroups/v3/cgroup1/stats

How to reproduce this issue:
env:

  • containerd release/1.7 or 1.7.19
  • cgroup v1

steps:

  1. configure /etc/containerd/config.toml to enable metrics
[metrics]
  address = "127.0.0.1:7000"
  1. curl 127.0.0.1:7000/v1/metrics | grep ^container_
  2. You will notice that there is no container_* indicator.
  3. Check the logs of containerd, error log -> {"error":null,"level":"error","msg":"invalid metric type for ...

@Iceber Iceber reopened this Jul 11, 2024
@Iceber

Iceber commented Jul 11, 2024

Copy link
Copy Markdown
Member

This pr should rebase though, and it's not the issue in 2.0 because there aren't two v1/stat.Metrics

@github-actions github-actions Bot removed the Stale label Jul 12, 2024
@github-actions

Copy link
Copy Markdown

This PR is stale because it has been open 90 days with no activity. This PR will be closed in 7 days unless new comments are made or the stale label is removed.

@alam0rt

alam0rt commented Oct 14, 2024

Copy link
Copy Markdown
Contributor Author

Oh my bad I didn't see this get reopened. I'll rebase it if required

@github-actions github-actions Bot removed the Stale label Oct 15, 2024
@Iceber

Iceber commented Oct 15, 2024

Copy link
Copy Markdown
Member

Hi @alam0rt , I used a new pr to follow up on this fix and you are the core author.

I'm going to close this pr.

@Iceber Iceber closed this Oct 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants