Skip to content

go.{mod,sum}: update NRI dependency, fixing a potential fd double close error.#9743

Merged
fuweid merged 1 commit intocontainerd:mainfrom
klihub:fixes/nri-fd-double-close
Feb 7, 2024
Merged

go.{mod,sum}: update NRI dependency, fixing a potential fd double close error.#9743
fuweid merged 1 commit intocontainerd:mainfrom
klihub:fixes/nri-fd-double-close

Conversation

@klihub
Copy link
Copy Markdown
Member

@klihub klihub commented Feb 3, 2024

Update our NRI dependency. This fixes a potential fd double close bug (found and fixed by @champtar), occasionally seen in the wild when running with runtime-launched NRI plugins in place, and the node under high I/O load (for instance system boot).

Fixes #8860

go.mod Outdated
github.com/containerd/platforms v0.1.1
github.com/containerd/plugin v0.1.0
github.com/containerd/ttrpc v1.2.2
github.com/containerd/ttrpc v1.2.3-0.20231030150553-baadfd8e7956
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have done a release of ttrpc before the nri release? Was this a requirement for nri to work? https://github.com/containerd/nri/blob/v0.6.0/go.mod#L6

I see it was updated in containerd/nri@f600cf6 along with other dependencies.

Mostly wondering what the minimal requirement is for nri - for libraries, we should strive to stick to Go modules "MVS" (minimum version selection), and leave it to consumers to pick versions above that where needed (e.g. the golang.org/x/sys version was originally put at v0.1.0 as the module did not depend on anything newer than that).

Copy link
Copy Markdown
Member Author

@klihub klihub Feb 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last non-bugfix commit in NRI (45b9e3f1437f) was to allow otel trace context propagation over ttrpc (for which I am about to file another PR). For that 40f227ddbb9e in ttrpc would be enough, since everything above that are depdendabot version bumps.

I guess that means that ideally we should have tagged a new ttrpc release at that sha1, right ? In general, we've not been very eager with tagging ttrpc in the past, so we tend to have sha1-exact requirements for ttrpc in NRI.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, maybe it's not a "big deal" in this case;

But the thing I try to keep in mind is that updating the versions in library modules means (in a Go modules world); this is the absolute minimal version that can be used. And that every consumer of the module MUST update all those dependencies to at least the given version.

When setting those minimum versions too optimistic "let's use latest!" that can have a ripple effect, which can be problematic if those are common dependencies, and while nri (in this case) itself may not be affected by other changes in those dependency-modules (because it may not be using those other parts), consumers of nri (or containerd, or some other code depending on containerd) could be affected, and now being forced to update those dependencies (and so on).

So for library modules, I try to keep required versions as low as possible; doing so allows consumers (such as containerd) to pick the version they need, and having those versions "low" also keeps the module more flexible (e.g. if there's a need, we can update the version in the 1.7 release branches if there's a fix in nri, without that resulting in all other dependencies to also have to be updated).

w.r.t. the tag for ttrpc; it's good practice to (where reasonably possible) only use tagged versions in (non-pre-)releases.

Technically, v1.2.3-0.20231030150553-baadfd8e7956 is an unreleased (and thus unstable version) of a dependency. Changes can be made before the next release, and with Go modules requiring SemVer, the version of that release is not yet known until it's released (you don't know versions ahead of time, because the changes in code dictate wheter the "major", "minor", or "patch" version must be updated, so it could become v1.2.3, v1.3.0 or if there's breaking changes, it could even become v2.0.0). This can be a risk. In this case we "control" the module, so we have more influence over what happens, but I know that distro packagers in most cases do not use unreleased versions, so will pick v1.2.2 instead (NOT the version we specified), which has led to packages being broken (sometimes in subtle ways that do not immediately surface).

Copy link
Copy Markdown
Member Author

@klihub klihub Feb 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, maybe it's not a "big deal" in this case;

But the thing I try to keep in mind is that updating the versions in library modules means (in a Go modules world); this is the absolute minimal version that can be used. And that every consumer of the module MUST update all those dependencies to at least the given version.

When setting those minimum versions too optimistic "let's use latest!" that can have a ripple effect, which can be problematic if those are common dependencies, and while nri (in this case) itself may not be affected by other changes in those dependency-modules (because it may not be using those other parts), consumers of nri (or containerd, or some other code depending on containerd) could be affected, and now being forced to update those dependencies (and so on).

So for library modules, I try to keep required versions as low as possible; doing so allows consumers (such as containerd) to pick the version they need, and having those versions "low" also keeps the module more flexible (e.g. if there's a need, we can update the version in the 1.7 release branches if there's a fix in nri, without that resulting in all other dependencies to also have to be updated).

w.r.t. the tag for ttrpc; it's good practice to (where reasonably possible) only use tagged versions in (non-pre-)releases.

Technically, v1.2.3-0.20231030150553-baadfd8e7956 is an unreleased (and thus unstable version) of a dependency. Changes can be made before the next release, and with Go modules requiring SemVer, the version of that release is not yet known until it's released (you don't know versions ahead of time, because the changes in code dictate wheter the "major", "minor", or "patch" version must be updated, so it could become v1.2.3, v1.3.0 or if there's breaking changes, it could even become v2.0.0). This can be a risk. In this case we "control" the module, so we have more influence over what happens, but I know that distro packagers in most cases do not use unreleased versions, so will pick v1.2.2 instead (NOT the version we specified), which has led to packages being broken (sometimes in subtle ways that do not immediately surface).

@thaJeztah That's nice recap/reminder about module dependency management. And I have to say that I've probably failed constantly/consistently on that front, thoughtlessly giving in to the "let's just use the latest" urge. At least I have never consciously considered these aspects in such an explicitly articulated manner.

Hmm, and maybe already that baad sha1-prefix in the offending v1.2.3-0.20231030150553-baadfd8e7956 dependency was the universe trying to tell me something but I was just not listening...

So would this be a decent corrective course of actions ?

  1. tag ttrpc baadfd8e7956 as v1.3.0 (v1.2.2..40f227ddbb9e contains backward-compatible changes/additions to the public API)
  2. update NRI's ttrpc dependency to v1.3.0 and tag it as v0.6.1 (only dependency changes)
  3. update this PR to require NRI v0.6.1 and ttrpc v1.3.0

WDYT ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw v1.2.3 was tagged, and opened containerd/nri#71

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I thought that semver-compatible versioning would have required us to bump the minor version number (going to 1.3.0), since we've added new public functions since 1.2.2.

Copy link
Copy Markdown
Member

@thaJeztah thaJeztah Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agreed, I think that should've been v1.3.0, but I guess it's released, so not sure if that can still be fixed, other than doing a v1.2.4 with that change reverted (possibly retracting v1.2.3), and a v1.3.0 after that 😬 (cc @dmcgowan)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think we shouldn't do that. The cat's out of the bag. I just asked to make sure my impression was right.

Pull in latest NRI fixing a potential fd double close error.

Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
@klihub klihub force-pushed the fixes/nri-fd-double-close branch from 766242a to 4e8e21a Compare February 6, 2024 12:03
Copy link
Copy Markdown
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fuweid fuweid added this pull request to the merge queue Feb 7, 2024
Merged via the queue into containerd:main with commit 805ed8e Feb 7, 2024
@champtar
Copy link
Copy Markdown
Contributor

champtar commented Feb 7, 2024

Can we backport this to 1.7 ?

@fuweid fuweid added the cherry-pick/1.7.x Change to be cherry picked to release/1.7 branch label Feb 7, 2024
@fuweid
Copy link
Copy Markdown
Member

fuweid commented Feb 7, 2024

@champtar Sounds good to me

@klihub klihub deleted the fixes/nri-fd-double-close branch February 7, 2024 12:06
@champtar
Copy link
Copy Markdown
Contributor

champtar commented Feb 7, 2024

@klihub will you take care of the cherry-pick or should I do it ?

@klihub
Copy link
Copy Markdown
Member Author

klihub commented Feb 7, 2024

@klihub will you take care of the cherry-pick or should I do it ?

@champtar I can do it. Gimme a sec and I file it if everything goes fine with it...

@AkihiroSuda AkihiroSuda added cherry-picked/1.7.x PR commits are cherry-picked into release/1.7 branch and removed cherry-pick/1.7.x Change to be cherry picked to release/1.7 branch labels Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cherry-picked/1.7.x PR commits are cherry-picked into release/1.7 branch size/XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

【Bug】crictl exec failed

6 participants