Skip to content

Rework vmcheck to use kola spawn, move off of PAPR#1949

Merged
openshift-merge-robot merged 6 commits intocoreos:masterfrom
jlebon:pr/vmcheck-cci
Dec 13, 2019
Merged

Rework vmcheck to use kola spawn, move off of PAPR#1949
openshift-merge-robot merged 6 commits intocoreos:masterfrom
jlebon:pr/vmcheck-cci

Conversation

@jlebon
Copy link
Member

@jlebon jlebon commented Dec 6, 2019

There's a lot going on here, but essentially:

  1. We change the vmcheck model so that it always operates on an
    immutable base image. It takes that image and dynamically launches a
    separate VM for each test using kola spawn. This means we can drop
    a lot of hacks around re-using the same VMs.
  2. Following from 1., vmoverlay now takes as input a base image,
    overlays the built rpm-ostree bits, then creates a new base image. Of
    course, we don't have to do this in CI, because we build FCOS with
    the freshly built RPMs (so it uses SKIP_VMOVERLAY=1). vmoverlay
    then will be more for the developer case where one doesn't want to
    iterate via cosa build to test rpm-ostree changes. I say "will"
    because the functionality doesn't exist yet; I'd like to enhance
    cosa dev-overlay to do this. (Note vmsync should still works just
    as before too.)
  3. vmcheck can be run without building the tree first, as
    tests/vmcheck.sh. The make vmcheck target still exists though for
    finger compatibility and better meshing with vmoverlay in the
    developer case.

What's really nice about using kola spawn is that it takes care of a lot
of things for us, such as the qemu command, journal and console
gathering, and SSH.

Similarly to the compose testsuites, we're using parallel here to run
multiple vmcheck tests at once. (On developer laptops, we cap
parallelism at $(nproc) - 1).

@jlebon
Copy link
Member Author

jlebon commented Dec 9, 2019

There's something weird going on where rebooting sometimes takes around 5 minutes. I think it might be related to kola streaming the journal via SSH and having no idea we're rebooting (it has special handling for rebooting: https://github.com/coreos/mantle/blob/b729d0ba015693f92b485df461e44c70b63ff6ba/platform/util.go#L95, though doing systemctl stop sshd.socket ourselves doesn't work either). It also sometimes does work, so it might be a race somewhere.

Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only very briefly skimmed this...seems sane.

.cci.jenkinsfile Outdated
echo "starting compose tests in supermin"
ci/installdeps.sh # really, we just need test deps, but meh...
curl -Lo tests/vmcheck/image.qcow2.xz "https://builds.coreos.fedoraproject.org/prod/streams/testing-devel/builds/31.20191205.dev.1/x86_64/fedora-coreos-31.20191205.dev.1-qemu.x86_64.qcow2.xz"
unxz tests/vmcheck/image.qcow2.xz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really want curl | xz -d > image.qcow2 I think.

@cgwalters
Copy link
Member

and having no idea we're rebooting (it has special handling for rebooting

Hmm, this looks like mantle likely needs all of the special logic we have around checking boot IDs? And hmm, the way it's doing reboots looks like the same naive method we were doing before, I think we may need a systemd-run in there to make it fully async?

@jlebon
Copy link
Member Author

jlebon commented Dec 12, 2019

OK I filed systemd/systemd#14328 for the reboot issue. Trying to see if we can work around this by masking systemd-logind.

@jlebon jlebon force-pushed the pr/vmcheck-cci branch 2 times, most recently from 5a06ee5 to f5f79b3 Compare December 13, 2019 03:26
@jlebon
Copy link
Member Author

jlebon commented Dec 13, 2019

Now requires: coreos/fedora-coreos-config#254

Makes it less repetitive and allows controlling the images from a
central place.
That way, anyone can easily download the latest built RPMs from master
or a specific PR. This isn't a replacement for automated builds in Koji
though since it's not multi-arch.

Also fetch the tags so that the NEVRA derived from `git describe` is
nicer.
So that the built FCOS has them. This is a prereq for actually testing
what we built in `vmcheck`.
This is a hack to allow using `inject-pkglist` without having to build
the tree first.

Higher-level, I think we can split this back out again if we have a
`-tests` subpackage where we ship the vmcheck testsuite.
Likely leftover from when it was an experimental feature. It's not
anymore.
There's a lot going on here, but essentially:

1. We change the `vmcheck` model so that it always operates on an
   immutable base image. It takes that image and dynamically launches a
   separate VM for each test using `kola spawn`. This means we can drop
   a lot of hacks around re-using the same VMs.
2. Following from 1., `vmoverlay` now takes as input a base image,
   overlays the built rpm-ostree bits, then creates a new base image. Of
   course, we don't have to do this in CI, because we build FCOS with
   the freshly built RPMs (so it uses `SKIP_VMOVERLAY=1`). `vmoverlay`
   then will be more for the developer case where one doesn't want to
   iterate via `cosa build` to test rpm-ostree changes. I say "will"
   because the functionality doesn't exist yet; I'd like to enhance
   `cosa dev-overlay` to do this. (Note `vmsync` should still works just
   as before too.)
3. `vmcheck` can be run without building the tree first, as
   `tests/vmcheck.sh`. The `make vmcheck` target still exists though for
   finger compatibility and better meshing with `vmoverlay` in the
   developer case.

What's really nice about using kola spawn is that it takes care of a lot
of things for us, such as the qemu command, journal and console
gathering, and SSH.

Similarly to the compose testsuites, we're using parallel here to run
multiple vmcheck tests at once. (On developer laptops, we cap
parallelism at `$(nproc) - 1`).
@jlebon jlebon changed the title WIP: vmcheck: target FCOS, use kola spawn, move off of PAPR Rework vmcheck to use kola spawn, move off of PAPR Dec 13, 2019
@jlebon jlebon marked this pull request as ready for review December 13, 2019 17:27
Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks great! There's a lot of "next steps" here - in particular I'd really like to generalize/expand this to other repos; perhaps some of this flow really lives in cosa for example.

But let's get this in!

@cgwalters
Copy link
Member

/lgtm

@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, jlebon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cgwalters
Copy link
Member

/override f29-compose2

@openshift-ci-robot
Copy link
Collaborator

@cgwalters: Overrode contexts on behalf of cgwalters: f29-compose2

Details

In response to this:

/override f29-compose2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot openshift-merge-robot merged commit c7a9c3b into coreos:master Dec 13, 2019
@cgwalters
Copy link
Member

@jlebon
cosa dev-overlay --src-image "${src_img}" - missing a PR to cosa?

@jlebon
Copy link
Member Author

jlebon commented Dec 16, 2019

cosa dev-overlay --src-image "${src_img}" - missing a PR to cosa?

Right, the code has a # XXX: to develop near this. I mentioned this in the commit message too:

vmoverlay
then will be more for the developer case where one doesn't want to
iterate via cosa build to test rpm-ostree changes. I say "will"
because the functionality doesn't exist yet; I'd like to enhance
cosa dev-overlay to do this. (Note vmsync should still works just
as before too.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants