tests/inst: Add destructive test framework by cgwalters · Pull Request #2127 · ostreedev/ostree

cgwalters · 2020-06-08T21:40:49Z

This adds infrastructure to the Rust test suite for destructive
tests, and adds a new transactionality test which runs
rpm-ostree in a loop (along with ostree-finalize-staged) and
repeatedly kills them.

The main goal here is to flush out any "logic errors". I plan
to further extend this to reboots and then force poweroffs.

As I was working on extending some of ostree's destructive test suite to do reboots (ostreedev/ostree#2127), I realized that the Debian autopkgtest API for rebooting is better, because it allows *saving state external to the host*. Rather than having the test count boots as ostree is doing today, the "mark" allows us to dispatch more reliably.

Further, because we don't rely on writing anything to disk on the target, we can add clean support for "forced reboots" that might kill the OS before we write to persistent storage there. The "between reboot" state lives in the test runner's memory instead.

We retain support for the previous (two!) reboot APIs here for now.

I tested this with basically the example script from the Debian autopkgtest specification:

```
set -xeuo pipefail
case "${AUTOPKGTEST_REBOOT_MARK:-}" in
  "") echo "test beginning"; /tmp/autopkgtest-reboot mark1 ;;
  mark1) echo "test in mark1"; /tmp/autopkgtest-reboot mark2 ;;
  mark2) echo "test in mark2" ;;
  *) echo "unexpected mark: ${AUTOPKGTEST_REBOOT_MARK}"; exit 1;;
esac
echo "ok autopkgtest rebooting"
```

I think it will actually make sense to implement more of the autopkgtest API. Debian has a nontrivial number of tests using this, and I think there's even work upstream in e.g. systemd to bridge its tests to autopkgtest, which would mean we gain "run systemd's tests in kola" for free.
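For tests written in Rust, the same mark-based dispatch could look roughly like the sketch below. This is only an illustration: the `Next` enum and the `dispatch` function are hypothetical names invented here, not part of ostree or kola, and the marks simply mirror the ones used in the script above.

```rust
/// What the test should do after handling the current boot.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum Next {
    /// Ask the harness to reboot and hand this mark back on the next boot.
    RebootWithMark(&'static str),
    /// The test is finished.
    Done,
}

/// Decide the next step from the mark the harness supplied for this boot.
/// `None` or an empty string means this is the first boot.
fn dispatch(mark: Option<&str>) -> Result<Next, String> {
    match mark.unwrap_or("") {
        "" => Ok(Next::RebootWithMark("mark1")),
        "mark1" => Ok(Next::RebootWithMark("mark2")),
        "mark2" => Ok(Next::Done),
        other => Err(format!("unexpected mark: {}", other)),
    }
}
```

A caller would presumably read the mark with `std::env::var("AUTOPKGTEST_REBOOT_MARK").ok()` and pass `mark.as_deref()` to `dispatch`. Because the state is just the mark, nothing has to be written to the target's disk between boots.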

cgwalters · 2020-07-27T13:29:16Z

OK a lot more work here; we're testing that we reliably survive forced poweroffs. Still TODO:

~~Re-merge the "kill -9" code and alternate between multiple "interrupt strategies" of "none, kill -9, reboot -ff" etc.~~
Test that we successfully fail if we e.g. omit fsync()
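One way to picture "alternating between interrupt strategies" is a simple rotation keyed on the cycle number. The sketch below is an assumption about how that could be structured, not the code in this PR: `InterruptStrategy`, `STRATEGIES`, and `strategy_for_cycle` are hypothetical names, and the real test may well pick strategies randomly instead.

```rust
/// Ways the harness might interrupt an in-progress upgrade.
/// (Hypothetical names; the variants mirror "none, kill -9, reboot -ff".)
#[derive(Debug, Clone, Copy, PartialEq)]
enum InterruptStrategy {
    /// Let the upgrade run to completion.
    NoInterrupt,
    /// Kill the processes with SIGKILL (`kill -9`).
    Kill9,
    /// Force an immediate reboot (`reboot -ff`).
    ForceReboot,
}

const STRATEGIES: [InterruptStrategy; 3] = [
    InterruptStrategy::NoInterrupt,
    InterruptStrategy::Kill9,
    InterruptStrategy::ForceReboot,
];

/// Rotate through the strategies so every one is exercised equally often.
fn strategy_for_cycle(cycle: usize) -> InterruptStrategy {
    STRATEGIES[cycle % STRATEGIES.len()]
}
```

Keeping the uninterrupted case in the rotation means the results also cover upgrades that complete normally.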

cgwalters · 2020-08-06T00:18:32Z

OK, lifting WIP on this. Fault injection would be another level, and before doing that I need to solve some other problems, such as being able to pull containers/binaries/packages from the "host" cosa container (at least in qemu) so we don't rely on internet access. I really want to be able to run this test in a loop and not have it randomly flake for internet reasons.

jlebon

Some comments, but LGTM overall. Cool stuff! It took a while honestly to grok how everything fits together. I think the cognitive load of thinking across reboots made it harder.

tests/inst/itest-macro/src/itest-macro.rs

tests/inst/src/destructive.rs

jlebon · 2020-08-07T21:12:49Z

tests/inst/src/destructive.rs

+        let res = res.context("Failed during upgrade")?;
+        if res {
+            println!(
+                "Failed to interrupt upgrade, attempt {}/{}",


We could lower the timeout by e.g. 10% in this case to increase the odds for the next time (maybe after we get to 5 retries or something).

I thought about doing things like this - an issue I've seen is that when I start up 4 instances of this test in parallel, there's heavy CPU usage in the VMs where they're all doing the initial setup, and that inflates the timing for the test upgrade.

The better fix, I think, would be something like an environment variable `OSTREE_PAUSE_POINT=pre-deploy,post-deploy,pre-cleanup` where the harness can control when the process continues. (Or maybe implement this by scripting gdb or so.)
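For reference, the timeout reduction suggested at the start of this thread could be sketched as follows. This is a hypothetical helper, not code from the PR: `next_timeout` and `RETRIES_BEFORE_SHRINK` are invented names, and the 10% and 5-retry figures are just the examples from the comment.

```rust
use std::time::Duration;

/// Number of failed interrupt attempts to tolerate before shrinking
/// the timeout (the "5 retries or something" from the review comment).
const RETRIES_BEFORE_SHRINK: u32 = 5;

/// Return the timeout to use for the next attempt: unchanged until the
/// retry threshold is reached, then reduced by 10% per further failure.
fn next_timeout(current: Duration, failed_attempts: u32) -> Duration {
    if failed_attempts >= RETRIES_BEFORE_SHRINK {
        current * 9 / 10
    } else {
        current
    }
}
```

As the reply notes, wall-clock tuning like this stays sensitive to CPU load when several VMs run in parallel, which is why harness-controlled pause points are described as the better fix.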

cgwalters · 2020-08-17T13:04:51Z

OK ended up doing more fixes and tweaks here; I noticed that the results weren't including the "no-interrupt" case, and fixing/handling that required another tricky special case.

I also noticed there were fewer "completed" results for live interrupts than I expected; that turned out to be me forgetting that it needs to be `systemctl stop ostree-finalize-staged` and not `start`.

tests/inst/src/destructive.rs

This adds infrastructure to the Rust test suite for destructive tests, and adds a new `transactionality` test which runs rpm-ostree in a loop (along with `ostree-finalize-staged`) and repeatedly uses one of `kill -9`, `reboot`, or `reboot -ff`.

The main goal here is to flush out any "logic errors".

So far I've validated that this passes a lot of cycles using

```
$ kola run --qemu-image=fastbuild-fedora-coreos-ostree-qemu.qcow2 ext.ostree.destructive-rs.transactionality --debug --multiply 8 --parallel 4
```

a number of times.

jlebon · 2020-08-17T14:47:41Z

/lgtm

openshift-ci-robot · 2020-08-17T14:47:45Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, jlebon


Needs approval from an approver in each of these files:

~~OWNERS~~ [cgwalters,jlebon]


We want to test upgrades that actually change files as a general rule; in some cases we want to test "large" upgrades to validate performance. This code generates a "synthetic" upgrade that adds an ELF note to a percentage of ELF files (randomly selected). By doing it this way we are only actually testing one version of the code. Migrated from coreos/coreos-assembler#1635 using the Rust code from ostreedev/ostree#2127
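The selection step described above (pick a random percentage of ELF files) could be sketched like this. It is a minimal illustration under stated assumptions, not the code from either referenced PR: `is_elf`, `XorShift64`, and `select_for_mutation` are hypothetical names, the small xorshift generator only stands in for a real random number crate, and the step that actually adds the ELF note is left out.

```rust
/// ELF files begin with the four magic bytes 0x7f 'E' 'L' 'F'.
fn is_elf(header: &[u8]) -> bool {
    header.starts_with(b"\x7fELF")
}

/// Tiny xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Pick roughly `percentage` percent of the candidate paths.
/// The same seed always yields the same selection.
fn select_for_mutation<'a>(
    candidates: &[&'a str],
    percentage: u64,
    seed: u64,
) -> Vec<&'a str> {
    // xorshift state must be non-zero, so clamp the seed.
    let mut rng = XorShift64(seed.max(1));
    candidates
        .iter()
        .copied()
        .filter(|_| rng.next_u64() % 100 < percentage)
        .collect()
}
```

Seeding the generator makes a synthetic upgrade reproducible, which may help when comparing runs. Whether the real implementation is seeded is not stated here.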

openshift-ci-robot added the do-not-merge/work-in-progress label Jun 8, 2020

openshift-ci-robot requested review from d4s and mwleeds June 8, 2020 21:40

openshift-ci-robot added the approved label Jun 8, 2020

cgwalters mentioned this pull request Jun 11, 2020

kola: Support the Debian autopkgtest reboot API coreos/coreos-assembler#1528

Merged

cgwalters force-pushed the destructive-rs branch 3 times, most recently from ceb0ff0 to d818375 Compare July 2, 2020 13:57

cgwalters force-pushed the destructive-rs branch 3 times, most recently from d66194f to dcf3914 Compare July 28, 2020 12:40

cgwalters mentioned this pull request Jul 30, 2020

Bug 1850057: stage OS updates (nicely) while etcd is still running openshift/machine-config-operator#1897

Closed

cgwalters force-pushed the destructive-rs branch 7 times, most recently from 190aca6 to 3addc62 Compare August 6, 2020 00:17

cgwalters changed the title ~~WIP: tests/inst: Add destructive test framework~~ tests/inst: Add destructive test framework Aug 6, 2020

openshift-ci-robot removed the do-not-merge/work-in-progress label Aug 6, 2020

jlebon approved these changes Aug 7, 2020


jlebon mentioned this pull request Aug 7, 2020

dev-synthesize-osupdate: New command coreos/coreos-assembler#1635

Merged

cgwalters force-pushed the destructive-rs branch 2 times, most recently from 43a46b8 to 7ccd3e6 Compare August 16, 2020 22:09

jlebon reviewed Aug 17, 2020


cgwalters force-pushed the destructive-rs branch from 7ccd3e6 to 1101c02 Compare August 17, 2020 14:34

openshift-ci-robot assigned jlebon Aug 17, 2020

openshift-ci-robot added the lgtm label Aug 17, 2020

openshift-merge-robot merged commit 543610b into ostreedev:master Aug 17, 2020

cgwalters mentioned this pull request Aug 17, 2020

Add testutils generate-synthetic-upgrade coreos/rpm-ostree#2199

Merged

cgwalters mentioned this pull request Sep 25, 2020

exttests: New container image with upstream tests coreos/coreos-assembler#1745

Merged
