ci: Add conditional CI for the containerd backed image store#46402
ci: Add conditional CI for the containerd backed image store#46402rumpl wants to merge 2 commits intomoby:masterfrom
Conversation
8841e54 to
9e9af81
Compare
|
Note: we can't do the same in jenkins (filter on the label), we don't have the required plugins that would let us do that. We could have a check on the branch name perhaps? Check that the branch contains Another possibility is to run the unit/integration tests (~15 minutes run) for all pull requests, but make it pass no matter what. Any preferences? |
9e9af81 to
423b4ee
Compare
True, but these new jobs don't run for all PRs, only for those that they are needed |
This is true... for now. At one point it would be great to test both graph drivers and containerd snapshotters at the same time (once the CI is green). |
.github/workflows/.test.yml
Outdated
| use_snapshotter: | ||
| required: true | ||
| type: boolean | ||
| graph_driver: |
There was a problem hiding this comment.
| graph_driver: | |
| storage_driver: |
Since the flag is actually --storage-driver.
423b4ee to
4883e9d
Compare
955bd80 to
c194553
Compare
|
I don't think it's actually testing the containerd image store. (Very late edit: I meant on Windows here, but didn't write it....) When I tried something similar (see master...TBBle:moby:hack/containerd_image_store^) I found that the env-var Anyway, you should see Another sign it's not using the snapshotter is that it should have failed all of the tests during setup due to (I think) the wildly incorrect WCOW mount code in containerd 1.6 (which I contributed, AFAIR) vendored into Docker. That was replaced with good code in containerd 1.7.2, so I think getting anything meaningful out of the testsuite will depend on #45966. You can also see a couple of quick-hack code changes I had to get the daemon even start, to deal with two issues: One was that when the containerd storage backend is enabled, the network setup code for builder-next assumes we're on Linux and hence uses an explicit network mode instead of "auto". The other is that the graph driver name (also used to select snapshotter) is different between Docker ( I also added the matrix in a different spot, but that should not affect any of the above. Side thought: Are we actually running the Linux build through CI with |
I did at one point manage to make it build/run and run a container: But I never pushed the code because it was a quick hack to see what's needed
For Linux we have a draft PR that @vvoland opened and rebases from time to time, which is not ideal. That's why I opened this PR. I didn't update Jenkins because we don't have the plugins needed to filter with PR labels as I do in this PR. So basically, we don't yet test on linux but this PR changes that.
Thanks, let me double check that! |
|
The logs from the relevant run have expired, but the problem I was alluding to is triggered by the Docker integration test suite setup code, not usual container-running operations. I assume |
4767350 to
85b8def
Compare
|
I have rebased this PR and removed the windows part, at least with this we will have CI run on linux/amd64 |
85b8def to
c852725
Compare
TBBle
left a comment
There was a problem hiding this comment.
Comments, but nothing majorly blocking as far as I can see, just some possible opportunities to reduce future work a little bit (and a typo).
My only real concern here is the artifact/code-cov question. The other questions/ideas are Windows-specific and since Windows isn't being enabled here, they can be punted to whoever does that, if you prefer.
| If ("${{ inputs.use_snapshotter }}" -eq "true") { | ||
| echo "TEST_INTEGRATION_USE_SNAPSHOTTER=1" | Out-File -FilePath $Env:GITHUB_ENV -Encoding utf-8 -Append | ||
| echo "DOCKER_GRAPHDRIVER=windows" | Out-File -FilePath $Env:GITHUB_ENV -Encoding utf-8 -Append | ||
| } |
There was a problem hiding this comment.
Unless things have changed since last time I looked, this change won't have any effect:
- The daemon gets its env-vars from the service setup, created during the
register-servicecall, and Go'sCreateServiceAPI doesn't appear to support populating env-vars. If there was a command-line option for this, that would work, as the command-line passed to the daemon during--register-serviceis bundled into the service configuration. The best option appears to be editing the service definition between registration and starting it. (Edit: The FIXME comment was added after I wrote this, so at least it's explicit this doesn't currently work now) - The daemon doesn't honour
DOCKER_GRAPHDRIVERorDOCKER_DRIVERon Windows. I'm not sure if there's any intention to do so, either, but it might be valuable once containerd image store works, as there's possibly going to be multiple Windows snapshotters available.
So I'd suggest moving this block down between the Start-Process ... --register-service and the Start-Service calls now, and dropping DOCKER_GRAPHDRIVER, as that'll have to be done later anyway. I'm not sure if the test suite actually cares about TEST_INTEGRATION_USE_SNAPSHOTTER, but no harm in setting it when also setting the registry key for the service, I guess? (I did so in my version, but don't recall if I checked for it having any effect or not)
| use_snapshotter: # always false for now until we update containerd | ||
| required: false | ||
| type: boolean | ||
| default: false |
There was a problem hiding this comment.
I think we should skip the runtime==builtin axis of the two matrix jobs in this file, when this is true. use_snapshotter makes no sense when not using containerd, and since it's started from a separate optional job, simply ignoring use_snapshotter in that case will needlessly duplicate the integration test run (and fail).
I'm not sure how to do that in GitHub Workflows though. Maybe you can exclude from the matrix using the job inputs?
There was a problem hiding this comment.
Buildkit has something like this https://github.com/moby/buildkit/blob/master/.github/workflows/validate.yml#L22-L60
I'm sure I can find something with a little help from @crazy-max
There was a problem hiding this comment.
Also, we can do this in a followup since we are not calling this with containerd snapshotters enabled any more
There was a problem hiding this comment.
I just looked more-closely at your link to BuildKit, and wow, today I learned that GitHub Actions lets you do crimes.
But yeah, should be feasible to mess with the matrix to insert an appropriate exclude clause based on the workflow inputs.
|
For the record, I was curious how the test suite would go on Windows now that containerd 1.7.6 has been vendored, so I rebased my version of this PR onto the mainline branch. Sadly, it still fails at the first action during busybox build ( Merged logs of failed stepBut at least it's pretty easy to get it into a "testable" state, as the only CI things in my branch that this PR doesn't have is the TODO-marked Windows enablement. I might rebase my branch onto this PR later to verify that, and provide a base for further exploration, if I find time to play with it before this PR is merged. Edit: I did the rebase, it was relatively simple. Manually triggered a Windows 2022 run and it produces the same result as above. @rumpl if you wish to steal TBBle@7d4ffa1 or part thereof for this PR, you're welcome to. If not, that's fine, it can sit in my branch, and once this PR lands, I'll probably put it up in a draft PR for visibility along with whatever hacks I have to get the Windows-side (closer) to parity. Final edit: With enough hacks, I got that branch to actually run the testsuite on Windows. Still waiting for the last few to complete, but it looks like it'll be pretty close to the pass-rate of Linux (from this PR), which is nice. So there'll be at least a draft PR (due to all the hacks) for the Windows side of this coming in the future, once this one lands. Final final edit: #46539 created, since it's now in a good-enough state for initial review, and is mostly pending on this PR to merge. Post-edit: That 60% guess was wrong, it was calling a nil function-type value and panicking. |
48dad8c to
711982e
Compare
|
For future reference / future self; we discussed this PR in Yesterday's maintainers call;
I should also add that this is a large diff, and "lotsa YAML", so I'll give it another look, but in doing so will somewhat depend on prior reviews of those that already happened, and I'll ask @crazy-max (as our in-house GHA guru) to give it a final "once-over". |
711982e to
d639855
Compare
| @@ -0,0 +1,32 @@ | |||
| name: snapshotter-test | |||
There was a problem hiding this comment.
Since it's an analogue with test but using the snapshotter, should it be test-snapshotter? Then it should sort adjacent-to but below the test workflow, I hope.
This occurred to me when I was thinking about whether we'd want to do split out the snapshotter test job for windows-2022 the same way, and if so, what to call it.
Also, thinking about future naming if and when the snapshotter becomes the default, i.e. in test, what will the non-snapshotter test be named? non-snapshotter-test is very English-like, but sorts weirdly, and the most important aspect (test) is way over at the end.
There was a problem hiding this comment.
We could call them test-graphdriver and test-snapshotter
There was a problem hiding this comment.
We could call them
test-graphdriverandtest-snapshotter
Works for me 👍
(GitHub grouping / listing checks remains horrible in either case 😂)
|
Thanks for splitting! Gonna have a look; one quick thing if you're updating/pushing; there's a typo in the first commit ( |
20126b0 to
24cf1a6
Compare
Moving these jobs to reusable workflows to declutter the test.yaml workflow, this will also help reuse these workflows to run tests with the container-snapshotter feature enabled Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
This workflow will run all the tests (unit, integration and integration-cli) with the containerd-snapshotter feature enalbed. We only run these tests for pull requests that have the "containerd-integration" label and also on any other branch/tag. Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
24cf1a6 to
1952344
Compare
Fixed that typo and another one in the second commit that you didn't see :) |
|
Passing thought: Are there unit tests affected by the Prompted because unit tests are about 25% of the GHA runtime, and serialise ahead of the integration tests, so avoiding needless duplication seems valuable, although this would lead to the fascinating result of the snapshotter integration tests starting (and completing?) before the graphdriver integration tests. |
I don't think the unit-tests should be affected by this, but (honestly) can't say with 100% certainty (I know that some unit-tests are not-as-unit-as-they-should be (looking at you: "networking unit-tests that depend on iptables"). I definitely think we need to look at the matrices as a whole, as I think there's some unneeded duplication (as well as looking if we can make them more efficient; some checks depend "too much" on others, which means having to wait for others to complete before being able to restart them, whereas others are fast to run (but relatively high overhead for building the initial parts). All that said; I think it's ok to look at that in follow-ups (but we could consider creating a tracking ticket for this). (Also trying to give @rumpl some sleep after waking up in the middle of night, dreaming of YAML and more YAML) 😂 |
|
Sooo much yaml! 😭😂 But you are right @TBBle, we might be able to run unit tests only once. I’m off for Dockercon next week so won’t have time to take a closer look, so maybe we should review this as is and we can optimize later? The double unit tests only hit those that open a PR in the snapshotter case; ok on master too but no one really waits for that I guess. Hold off merging this though, now I kinda want to take a look at this this weekend while on the plane. On top of this there are other optimisations we can do to our CI but I would say they are out of scope of what I want to accomplish here but will definitely get back to it while the |
crazy-max
left a comment
There was a problem hiding this comment.
I think we could merge test-graphdriver and test-snapshotter together and create a dynamic matrix if snapshotter is enabled. That would reduce changes footprint significantly but needs to make sure it's possible. I'm looking at it on another branch and keep you posted.
| # reusable workflow | ||
| name: .smoke |
There was a problem hiding this comment.
Why do we need a reusable workflow for smoke tests? It seems only called from test-graphdriver.
There was a problem hiding this comment.
Maybe instead we could just move this to a regular workflow named test-smoke.yml as it's not tied to graphdriver or snapshotter.
There was a problem hiding this comment.
I assume the idea is that if smoke fails, we can abort the run early. That's what the name suggests to me, anyway. So perhaps it should be called from both test-runs?
| - | ||
| name: Set up tracing | ||
| uses: ./.github/actions/setup-tracing |
There was a problem hiding this comment.
Don't think we need to set up tracing for unit tests? cc @cpuguy83
There was a problem hiding this comment.
Correct we aren't doing any tracing in unit tests.
| # reusable workflow | ||
| name: .windows-build |
There was a problem hiding this comment.
Why can't we just keep this in .windows.yml? We already have a bunch of workflows, would be nice to avoid unnecessary split.
| runs-on: ${{ inputs.os }} | ||
| timeout-minutes: 120 | ||
| needs: | ||
| - build |
There was a problem hiding this comment.
Related to previous comment, this is racy as we could call this reusable workflow without having a dependency on build so I think it's best to keep it build here.
There was a problem hiding this comment.
nit: I think it should be a rename of test.yml > test-graphdriver.yml instead
There was a problem hiding this comment.
AFAIK in Git, renames are detected automatically at diff time, and target minimising diff size. In this case, the bulk of what was in test.yml ended up in .test.yml so it picked that as a rename.
| build-dev: | ||
| needs: | ||
| - validate-dco | ||
| uses: ./.github/workflows/.build-dev.yml |
There was a problem hiding this comment.
build-dev is called from test-graphdriver and test-snapshotter which would make the cache racy.
|
Closing in favor of #46572 |


- What I did
Added CI for the containerd integration
Any PR that has the label
containerd-integrationwill also run tests with the snapshottersExample PR without the needed label, note the skipped jobs rumpl#138
- How I did it
Moved everything from
test.ymlinto a reusable workflow in.test.ymland used that to run tests with graph drivers and containerd snapshotters. Also added a workflow to test snapshotters in windows.- How to verify it
- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)