Skip to content

Use Windows matcher when on Windows platform in all code paths#6491

Merged
estesp merged 1 commit intocontainerd:mainfrom
jsturtevant:windows-platforms
Feb 18, 2022
Merged

Use Windows matcher when on Windows platform in all code paths#6491
estesp merged 1 commit intocontainerd:mainfrom
jsturtevant:windows-platforms

Conversation

@jsturtevant
Copy link
Contributor

When looking into containerd/nerdctl#751 I found that the platform Normalization function is dropping the OSVersion because it was marked as depreciated but this value is used throughout the windows

OSVersion: fmt.Sprintf("%d.%d.%d", major, minor, build),

There maybe more going on here than just this change. For example, this may be why this code uses Ordered here and then adds the osVersionPrefix.

This change does make it so that nerdctl will work with multiarch manifests but there may need to be more work done here.

@k8s-ci-robot
Copy link

Hi @jsturtevant. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@AkihiroSuda
Copy link
Member

/ok-to-test

@jsturtevant
Copy link
Contributor Author

/hold
I was testing some more and this may not solve it fully

@kevpar
Copy link
Member

kevpar commented Jan 31, 2022

This looks good based on the previous investigation I did: #6168 (comment). Although we probably don't want to unset OSFeatures either.

Copy link
Member

@kzys kzys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some tests? Windows' platform handling seems really tricky...

@jsturtevant
Copy link
Contributor Author

This looks good based on the previous investigation I did: #6168 (comment). Although we probably don't want to unset OSFeatures either.

There is a bit more going on. I have a partial fix that I can push up but it caused some other issues around LOCW support on windows. The LCOW support doesn't really work since the matcher would never pass the osPrefix version check

The main thing is on Windows some code paths (such as image pulling) are using the windows matcher and other it is using linux matcher. This is because the Ordered function news up a matcher that will be Linux default matcher that doesn't include OSVersion Prefix matching.

In the case of Windows image pull scenario with ctr this means it matched and pulled both Windows version as described in containerd/nerdctl#751 (comment: If it helps, I noticed something weird when I run nerdctl.exe images. When I run the command, it prints the container image twice for some reason:)

@jsturtevant
Copy link
Contributor Author

I added 5ae90c6 commit to demonstrate my comments in #6491 (comment)

In particular that:

  • The LCOW support doesn't really work since the matcher would never pass the osPrefix version check
  • This is because the Ordered (and only) function news up a matcher that will be Linux default matcher that doesn't include OSVersion Prefix matching.

@thaJeztah
Copy link
Member

I just opened a PR to address the Normalize() option (I overlooked your PR 😬 😂) #6497

The LCOW support doesn't really work since the matcher would never pass the osPrefix version check

Would LCOW use the default matcher though? Given that in the LCOW case, the container effectively runs "remotely", wouldn't it require a matcher that's created based on that environment?

},
},
{
platform: fmt.Sprintf("windows/amd64//%s", buildStr+".1"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly orthogonal, but wondering if the string representation for osversion (and features) was already "formalised" . I recall there were some discussion about this in the past (but don't recall the outcome). We should probably have some of this added to the OCI specs at some point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if this valid string. The parser doesn't actually work in this case but was trying to demonstrate this isn't really handled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! Yes, I also was looking at Parse() not yet supporting it (and left that for a later exercise).

Empty // may work for missing components (but could potentially be problematic if someone (eg) uses path.Join() to construct the string).

Perhaps there's alternatives (/-/ for explicitly omitting a field, or /*/); in any case, likely something for further discussion

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussing with someone else, and an alternative for the format could be os/arch/variant@os-version[+feature+feature...]

(we should look at the + symbol though, in case it may be used in URLs (where + can be ambiguous)

@jsturtevant
Copy link
Contributor Author

Would LCOW use the default matcher though? Given that in the LCOW case, the container effectively runs "remotely", wouldn't it require a matcher that's created based on that environment?

This might be the case but I not sure every architecture/variant of Linux will run on Windows so we may need a custom one? Maybe @kevpar knows?

@thaJeztah
Copy link
Member

but I not sure every architecture/variant of Linux will run on Windows

At least linux/<host arch>[/variant] should work; depending on wether or not quemu (binfmt) is installed, other architectures may be "functional" as well.

But I guess @kevpar and/or @katiewasnothere may have more insight

@jsturtevant
Copy link
Contributor Author

jsturtevant commented Feb 1, 2022

I've updated to use the appropriate matcher on a given OS. On Windows it will use a Windows Matcher which uses the default matcher and only adds the additional checks if the platform specific is Windows. Otherwise, the behavior should be the same which is to use the default matcher. This should cover the comment #6491 (comment)

@jsturtevant jsturtevant changed the title OSversion is used by Windows and should not be dropped Use Windows matcher when on Windows platform in all code paths Feb 2, 2022
@jsturtevant
Copy link
Contributor Author

This last commit works with the original issue reported (containerd/nerdctl#751 )

PS C:\> .\nerdctl.exe run --rm --net none hello-world:latest
docker.io/library/hello-world:latest:                                             resolved       |++++++++++++++++++++++++++++++++++++++|
index-sha256:507ecde44b8eb741278274653120c2bf793b174c06ff4eaa672b713b3263477b:    exists         |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:b17da88a43678dd8de2c335b977330a5f4e3f1175251a4204f54ef2ed1360709: exists         |++++++++++++++++++++++++++++++++++++++|
config-sha256:bbca594eee38d2bcbda18490d2667560b2452e74e73b8c26bddea59cb6920ad1:   done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 0.9 s                                                                    total:   0.0 B (0.0 B/s)

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (windows-amd64, nanoserver-1809)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run a Windows Server container with:
 PS C:\> docker run -it mcr.microsoft.com/windows/servercore:ltsc2019 powershell

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

PS C:\> cmd /c ver

Microsoft Windows [Version 10.0.17763.2452]
PS C:\> .\nerdctl.exe --version
nerdctl version 0.16.0-39-g2f0cf75.m

@dmcgowan dmcgowan added this to the 1.6 milestone Feb 2, 2022
Match(platform specs.Platform) bool
}

// NewMatcher returns a simple matcher based on the provided platform
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best to have the exported function in the non-platform specific file then call into non-exported implementations. This will also need an other implementation besides just Linux and Windows

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also need an other implementation besides just Linux and Windows

I've updated to include other implementations, it uses the same defaults as previous except Windows.

Best to have the exported function in the non-platform specific file then call into non-exported implementations

I don't think I fully understood what I should do here. could you explain a bit more? Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewMatcher is an exported function. So it should be in platforms.go which calls the un-exported newDefaultMatcher in the various platforms_.go files. But actually thats another point. Why is NewDefaultMatcher exported? There are no callers right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I see what I did and I've updated

Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
Copy link
Contributor

@jterry75 jterry75 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jsturtevant
Copy link
Contributor Author

I've addressed all the feedback and this is ready for review.

The node test failure was a Prow pod timeout. not sure if the same commands work here:
/retest

@thaJeztah
Copy link
Member

One thing I just realised; so this changes the default on windows to match on architecture and os-version, correct? I think that's correct for containers running with process isolation, but would this be an issue when running with hyper-v isolation? (not sure of that's supported by containerd, but I know dockerd allows switching)

@jterry75
Copy link
Contributor

@kevpar - Is there such a thing as x86 hypervisor isolated on Windows? I've never seen such an image to my knowledge. Would Seb's comment actually be possible?

@TBBle
Copy link
Contributor

TBBle commented Feb 11, 2022

Hyper-V isolation is going to be messy... You'd have to expose a flag to ctr images pull and any similar command telling it whether you planned on using the resulting pulled image in process isolation or Hyper-V isolation, so it could tell the matcher which rules to apply to accept an image from the named index.

Or you could always pull via the Hyper-V matching rules (a superset of the process isolation matching rules), and warn the user if the resulting image choice would not work in process isolation.

Both of those options seem undesirable to me. The least-awful option might well be: Note in the documentation that if you want to use Hyper-V isolation to run an older image than your current host, pull the specific image you want to use, not an image index.

That said, since the client code can provide its own matcher, perhaps this is a client thing to implement? Defaulting to process-isolation rules seems fine to me.

Edit: Reading the above comments, I realise I'm assuming we're talking about Hyper-V isolation in regards to wider version matching. I'd never even considered the idea of cross-architecture support. I didn't think x86 would even be a thing in containers (32-bits is so last millennium...) and I don't expect to see Arm64 Windows Server outside my most feverish of dreams for a while, yet.

@jterry75
Copy link
Contributor

Yea, ..., no good answer here imo. I think I agree with @thaJeztah earlier comment. Windows should match Windows process execution by default. Only the caller has enough context to know the intent of the container that will be scheduled next. IE: The caller knows if the container config is isolated or not. But it does not tell that to containerd when pulling images. But it CAN by providing a different matcher when pulling images. So I think the right default is Windows process works as expected, and if hypervisor is needed caller can control it.

@jsturtevant
Copy link
Contributor Author

so this changes the default on windows to match on architecture and os-version, correct?

The windows matcher did match on os-version already but wasn't being used in every code path. For example, os-version was used in cri apis and on ctr run but not ctr pull.

If a client used the Only() or Ordered() api functions then it would use a different matcher (a matcher that did not use os-version). This was an issue for nerdctl becuase it is using those Only/Ordered API's.

I am under the impression that Hyper-v containers are not supported right now and agree that we should use Windows Process as the default and leave it up the client to provide a different matcher for now. Maybe @dcantah or @kevpar knows more about the state of hyper-v contianers.

Copy link
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jterry75
Copy link
Contributor

so this changes the default on windows to match on architecture and os-version, correct?

The windows matcher did match on os-version already but wasn't being used in every code path. For example, os-version was used in cri apis and on ctr run but not ctr pull.

If a client used the Only() or Ordered() api functions then it would use a different matcher (a matcher that did not use os-version). This was an issue for nerdctl becuase it is using those Only/Ordered API's.

I am under the impression that Hyper-v containers are not supported right now and agree that we should use Windows Process as the default and leave it up the client to provide a different matcher for now. Maybe @dcantah or @kevpar knows more about the state of hyper-v contianers.

SGTM

@estesp estesp merged commit c0bbaf5 into containerd:main Feb 18, 2022
@jterry75
Copy link
Contributor

jterry75 commented Mar 1, 2022

@jsturtevant - Any chance you have seen this?

PS C:\Users\Administrator\Desktop\Tools> ctr image ls
time="2022-03-01T18:40:26Z" level=error msg="failed calculating size for image mcr.microsoft.com/windows/servercore:10.0.17763.2458" error="no match for platform in manifest: not found"
time="2022-03-01T18:40:26Z" level=error msg="failed calculating size for image mcr.microsoft.com/windows/servercore:10.0.17763.2565" error="no match for platform in manifest: not found"
time="2022-03-01T18:40:26Z" level=error msg="failed calculating size for image mcr.microsoft.com/windows/servercore:ltsc2019" error="no match for platform in manifest: not found"
REF                                                       TYPE                                                      DIGEST                                                                  SIZE    PLATFORMS     LABELS
mcr.microsoft.com/windows/servercore:10.0.17763.2458      application/vnd.docker.distribution.manifest.list.v2+json sha256:7d370baf7e097a20bdd52d5a214fcad25f83caccca049cced5d5eb64d659c45e 480.0 B windows/amd64 -
mcr.microsoft.com/windows/servercore:10.0.17763.2565      application/vnd.docker.distribution.manifest.list.v2+json sha256:08968b1a61ea651efa830ade84c332df560bf5b551a94bb065c128989deb346d 480.0 B windows/amd64 -
mcr.microsoft.com/windows/servercore:1809-KB5010791-amd64 application/vnd.docker.distribution.manifest.v2+json      sha256:80b95537f26ad6f44e9d3675a0f4be3a076364dd1f988b7f0ceb3190eb31115d 2.5 GiB windows/amd64 -
mcr.microsoft.com/windows/servercore:ltsc2019             application/vnd.docker.distribution.manifest.list.v2+json sha256:d8c2b69511a49e9eb6fd2a17f2727ae011b0382cd11562fa5cb4c70bb460cf12 480.0 B windows/amd64 -

Seems related to the change. But I can look closer if you are unaware.

Oddly Images pull just fine, and unpack just fine.

Here is for the gen index:

ctr image pull mcr.microsoft.com/windows/servercore:1809-KB5010791
mcr.microsoft.com/windows/servercore:1809-KB5010791:                              resolved       |++++++++++++++++++++++++++++++++++++++|
index-sha256:1cff1297b7aa2c8b51d593609708324920dff6e40e65ca60bf2cd497ca7a2258:    done           |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:80b95537f26ad6f44e9d3675a0f4be3a076364dd1f988b7f0ceb3190eb31115d: exists         |++++++++++++++++++++++++++++++++++++++|
layer-sha256:567fd00846e9a9f44eea5925b497356dda00fe89b8335d2a3b2a8b9d84b0bb69:    done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:629047deab7bae31fc8e9398ad08c7685f56178c31e65f0ada91b5a99cef5149:   done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4612f6d0b889cad0ed0292fae3a0b0c8a9e49aff6dea8eb049b2386d9b07986f:    done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 1.0 s                                                                    total:  480.0  (473.0 B/s)
unpacking windows/amd64 sha256:1cff1297b7aa2c8b51d593609708324920dff6e40e65ca60bf2cd497ca7a2258...
ctr: no match for platform in manifest sha256:1cff1297b7aa2c8b51d593609708324920dff6e40e65ca60bf2cd497ca7a2258: not found

You can see the above warning.

Here is for the -amd64 index that works just fine:

ctr image pull mcr.microsoft.com/windows/servercore:1809-KB5010791-amd64
mcr.microsoft.com/windows/servercore:1809-KB5010791-amd64:                        resolved       |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:80b95537f26ad6f44e9d3675a0f4be3a076364dd1f988b7f0ceb3190eb31115d: exists         |++++++++++++++++++++++++++++++++++++++|
layer-sha256:567fd00846e9a9f44eea5925b497356dda00fe89b8335d2a3b2a8b9d84b0bb69:    done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:629047deab7bae31fc8e9398ad08c7685f56178c31e65f0ada91b5a99cef5149:   done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4612f6d0b889cad0ed0292fae3a0b0c8a9e49aff6dea8eb049b2386d9b07986f:    done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 0.2 s                                                                    total:   0.0 B (0.0 B/s)
unpacking windows/amd64 sha256:80b95537f26ad6f44e9d3675a0f4be3a076364dd1f988b7f0ceb3190eb31115d...
done: 20.0951ms

@jsturtevant
Copy link
Contributor Author

I am not, my suspicion is the matcher or string to match with is wired up wrong for that code path in ctr. This looks similar to what I was seeing nerdctl when I turned on debug mode.

As a point of reference, We have a nightly job for containerd which has been running well since this change merged https://testgrid.k8s.io/sig-windows-master-release#aks-engine-windows-containerd-nightly

@jterry75
Copy link
Contributor

jterry75 commented Mar 1, 2022

Oh, I see. It looks like we have broken the unpacker for platform agnostic variants when the Host os doesn't match (meaning won't unpack so Xenon cant even run). So for example:

On WS2022:

ctr image pull mcr.microsoft.com/windows/servercore:ltsc2019
ctr image pull mcr.microsoft.com/windows/servercore:ltsc2019
mcr.microsoft.com/windows/servercore:ltsc2019:                                    resolved       |++++++++++++++++++++++++++++++++++++++|
index-sha256:d8c2b69511a49e9eb6fd2a17f2727ae011b0382cd11562fa5cb4c70bb460cf12:    done           |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:596c8a02a8e73b53688052907d121879357d2a9208700306bafaa1ffb533cbab: done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:1bd78008c728d8f9e56dc2093e6eb55f0f0b1aa96e5d0c7ccc830c5f60876cdf:    exists         |++++++++++++++++++++++++++++++++++++++|
config-sha256:c703d4d605332bcb954f514c7d2c6cd53ee51024d7a7a3f13eb907abbeedbca2:   done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4612f6d0b889cad0ed0292fae3a0b0c8a9e49aff6dea8eb049b2386d9b07986f:    exists         |++++++++++++++++++++++++++++++++++++++|
elapsed: 0.6 s                                                                    total:  1.3 Ki (2.2 KiB/s)
unpacking windows/amd64 sha256:d8c2b69511a49e9eb6fd2a17f2727ae011b0382cd11562fa5cb4c70bb460cf12...
ctr: no match for platform in manifest sha256:d8c2b69511a49e9eb6fd2a17f2727ae011b0382cd11562fa5cb4c70bb460cf12: not found

But if you add the -amd64

ctr image pull mcr.microsoft.com/windows/servercore:ltsc2019-amd64
mcr.microsoft.com/windows/servercore:ltsc2019-amd64:                              resolved       |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:596c8a02a8e73b53688052907d121879357d2a9208700306bafaa1ffb533cbab: exists         |++++++++++++++++++++++++++++++++++++++|
config-sha256:c703d4d605332bcb954f514c7d2c6cd53ee51024d7a7a3f13eb907abbeedbca2:   exists         |++++++++++++++++++++++++++++++++++++++|
layer-sha256:1bd78008c728d8f9e56dc2093e6eb55f0f0b1aa96e5d0c7ccc830c5f60876cdf:    exists         |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4612f6d0b889cad0ed0292fae3a0b0c8a9e49aff6dea8eb049b2386d9b07986f:    exists         |++++++++++++++++++++++++++++++++++++++|
elapsed: 0.2 s                                                                    total:   0.0 B (0.0 B/s)
unpacking windows/amd64 sha256:596c8a02a8e73b53688052907d121879357d2a9208700306bafaa1ffb533cbab...
done: 3m12.8486039s

The mcr.microsoft.com/windows/servercore:ltsc2019 is actually a manifest list which I why I think it breaks for only that variant.

@jterry75
Copy link
Contributor

jterry75 commented Mar 1, 2022

@TBBle - I've already forgotten :(. Did we decide it was "ok" to break hv isolated pulls when the host os doesnt match the image hahaha? And that to do this would require the caller to pass in a custom matcher? I think my brain is giving out

@jsturtevant
Copy link
Contributor Author

#6491 (comment) 😄

The fact that it worked before was accidental IMO. I am still not sure if hyper-v upstream is fully functional, talking with @dcantah it sounds like it isn't

@kevpar
Copy link
Member

kevpar commented Mar 1, 2022

My $0.02:

I think for cases like nerdctl and ctr we should probably have separate flag(s) to control the exact matching behavior so that cases like hypervisor-isolation can work. In CRI my view is we should add fields to Runtime to control image matching behavior. We may need to extend CRI's PullImage to take a runtime name, but that would allow pulling an image for a specific runtime, and the runtime controls what image variant is pulled.

One difficulty is last I checked containerd doesn't have great support for multi-arch images. I think there were places where it's assumed we're only working with a single platform at a time.

@jterry75
Copy link
Contributor

jterry75 commented Mar 1, 2022

#6491 (comment) 😄

The fact that it worked before was accidental IMO. I am still not sure if hyper-v upstream is fully functional, talking with @dcantah it sounds like it isn't

So funny.

@jterry75
Copy link
Contributor

jterry75 commented Mar 1, 2022

@kevpar - Agreed. I think when I get around to it I will add something like:
ctr image pull --isolated <> or something tricky like that to indicate that I'm ok with some set of partial matches that honor how hypervisor compat is supposed to work.

@TBBle
Copy link
Contributor

TBBle commented Mar 2, 2022

Yeah, that's what I took away from the conversation as well: Take Linux out of the containerd default Windows-host matcher, and let tools that are happy to support Hyper-V isolation and/or WCOW (i.e. Xenon aka UVM) provide a matcher to suit.

It might be feasible to add a function that creates a Xenon-capable matcher, to save some code duplication in clients, and hence centralise any particular complex or specific rules that might appear. It might also make sense to put something like that in hcsshim or other central-but-Windows-host-specific repo instead.

But that assumes a tool that wants to integrate Xenon wants to accept both kinds of matches, so maybe it's not that valuable after all.

For CRI, I noticed that ImageSpec given to PullImage specifically describes its Annotation field for almost this use-case:

    // Unstructured key-value map holding arbitrary metadata.
    // ImageSpec Annotations can be used to help the runtime target specific
    // images in multi-arch images.
    map<string, string> annotations = 2;

It does make sense that if you are using Runtime to choose Isolation mode and other config options, then that should be visible to PullImge too; the backend can ignore the Runtime in the case where it doesn't matter. It could probably be plumbed through the Annotations pending the next spec update.

@jterry75
Copy link
Contributor

jterry75 commented Mar 2, 2022

@TBBle - That's exactly why we added annotations to the CRI Image :). Glad it's finally paying off.

I think we have consensus haha. We are good to go with the code as is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.