
nixos/tests/systemd: Fix x-initrd-mount flakiness#67798

Merged
disassembler merged 1 commit into NixOS:master from aszlig:unflake-nixos-test-systemd on Aug 31, 2019

Conversation

@aszlig
Member

@aszlig aszlig commented Aug 30, 2019

It turns out that checking for the last mount time of an ext4 file system isn't a very reliable way to check whether the file system was properly unmounted.

When I first wrote that test (88530e0), I was reluctant to inspect the file system while the VM was down, so I looked for a way to check for a clean unmount *after* the file system had been mounted again. That way, we wouldn't need to create a 512 MB raw image on the host.

Fortunately, however, qemu-img writes a sparse file when converting from qcow2, so on most file systems (that is, those supporting sparse files) this shouldn't waste much disk space.
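To confirm that a converted image really is sparse, it's enough to compare its apparent size with the space actually allocated on disk. The following is a minimal Python sketch, not part of this PR; the function names are hypothetical, and it assumes a Linux host, where `st_blocks` counts 512-byte units. The raw image itself would come from something like `qemu-img convert -f qcow2 -O raw <image>.qcow2 <image>.raw`, with the file names left as placeholders.

```python
import os
import sys


def allocated_bytes(st: os.stat_result) -> int:
    # On Linux, st_blocks is counted in 512-byte units,
    # independent of the file system's own block size.
    return st.st_blocks * 512


def sparseness(path: str) -> tuple[int, int]:
    """Return (apparent_size, allocated_size) in bytes for the file at path."""
    st = os.stat(path)
    return st.st_size, allocated_bytes(st)


def is_sparse(path: str) -> bool:
    # A file is sparse if fewer bytes are allocated than its apparent size,
    # i.e. some ranges are holes that read back as zeros.
    apparent, allocated = sparseness(path)
    return allocated < apparent


if __name__ == "__main__":
    for image in sys.argv[1:]:
        apparent, allocated = sparseness(image)
        print(f"{image}: apparent {apparent} B, allocated {allocated} B, "
              f"sparse={allocated < apparent}")
```

A 512 MB raw image that is mostly empty should show an allocated size far below its apparent size; on a file system without sparse-file support, the two would be roughly equal.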

While investigating the flakiness, I found that whenever the test failed, /test-x-initrd-mount had already been unmounted *before* the final step, during which systemd remounts and unmounts all remaining file systems.

I haven't investigated why this happens, but the test is a regression test for #35268, where the file system wasn't unmounted *at all*, so all we really need to check here is whether the unmount happened, not *how*.

To make sure that checking the file system state is sufficient, I temporarily replaced the $machine->shutdown call with $machine->crash and verified that the file system state is "not clean".
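The PR page doesn't show the updated test code itself, so here is a rough, hypothetical sketch of what "checking the file system state" can look like. On ext2/3/4, the superblock summary printed by `dumpe2fs -h` (from e2fsprogs) contains a `Filesystem state:` field that reads `clean` after a proper unmount and `not clean` when the file system was never unmounted, e.g. after $machine->crash as described above. The helper names below are illustrative only (not the NixOS test driver's API), and the sketch assumes `dumpe2fs` is on `PATH`.

```python
import subprocess


def parse_fs_state(header: str) -> str:
    """Extract the value of the 'Filesystem state:' field from dumpe2fs -h output."""
    for line in header.splitlines():
        # Split on the first colon only, so values containing colons stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem state":
            return value.strip()
    raise ValueError("no 'Filesystem state' field found")


def is_cleanly_unmounted(header: str) -> bool:
    # e2fsprogs reports "clean" or "not clean", optionally followed by
    # " with errors"; only a plain "clean" counts as a proper unmount here.
    return parse_fs_state(header) == "clean"


def read_superblock_header(image_path: str) -> str:
    # dumpe2fs -h prints only the superblock summary and also works on
    # raw image files, so the VM doesn't need to be running.
    result = subprocess.run(
        ["dumpe2fs", "-h", image_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, `is_cleanly_unmounted(read_superblock_header("test.raw"))` would be expected to return `True` after a normal shutdown and `False` after a crash, with `test.raw` standing in for the converted image.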

Fixes: #67555

Signed-off-by: aszlig <aszlig@nix.build>
Fixes: NixOS#67555
@aszlig aszlig requested a review from flokli August 30, 2019 22:29
@aszlig
Member Author

aszlig commented Aug 30, 2019

@GrahamcOfBorg test systemd

@ofborg ofborg bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux. labels Aug 30, 2019
@aszlig
Member Author

aszlig commented Aug 30, 2019

The test failure on aarch64-linux is unrelated to this PR; the failing subtest was introduced in 8e923df (#66482).

@disassembler disassembler merged commit d7c7fc4 into NixOS:master Aug 31, 2019
@disassembler
Member

As the failure is unrelated and already in master, merging.

dtzWill pushed a commit to dtzWill/nixpkgs that referenced this pull request Sep 11, 2019
(cherry picked from commit d7c7fc4)



Development

Successfully merging this pull request may close these issues.

nixos/tests/systemd.nix is broken

2 participants