Skip to content

Bug fixes and improvements for the udev test suite#8819

Closed
mwilck wants to merge 24 commits intosystemd:masterfrom
mwilck:udev-test
Closed

Bug fixes and improvements for the udev test suite#8819
mwilck wants to merge 24 commits intosystemd:masterfrom
mwilck:udev-test

Conversation

@mwilck
Copy link
Copy Markdown
Contributor

@mwilck mwilck commented Apr 25, 2018

This patch set fixes a few minor problems and adds new features to the udev test program, udev-test.pl, and the python code that is used to set up the fake sysfs for these tests.
The new features are:

  • allow addition and removal of several devices in a single test
  • allow testing for the (non-)existence of several symlinks per device in a single test
  • allow auto-generation of test device lists from the fake sysfs
  • generate (almost) arbitrarily many SCSI device entries and test on them

The motivation for doing this was to reproduce the problem addressed in #8667 in the test suite, and to test for performance regressions (lock contention) caused by my proposed fix for the same issue.
I verified that all tests are still correct and pass. I actually made the test suite a bit harder to pass by adding additional consistence checks. The ability to test several symlinks in a single test allowed me to merge a few tests.
Two of the tests added in the "add multiple device test" commit do FAIL sporadically with the current code because of the mentioned race condition.

mwilck added 21 commits April 25, 2018 22:12
/dev/null is required by udev for output redirection, otherwise it
emits irritating warnings for several test cases. Create this
device in the test environment. This requires dropping "nodev" from
the mountpoint for /dev in the test environment.
Allow testing cases where multiple devices are added and removed.
This implies a change of the data structure: every test allows
for multiple devices to be added, and "exp_name" etc. are now properties
of the device, not of the test.
It's not necessary to write the rules for every udev run, as we
now may have many (rather than just 2) per test.
Allow testing cases where multiple devices are added and removed
simultaneously. Tests are started as synchronously as possible using a
semaphore, in order to test possible race conditions. If this isn't desired,
the test parameter "sleep_us" can be set to the number of microseconds to wait
between udev invocations.
More often than not, the created devnode is the basename of the
sysfs entry. The "devnode" device may be used to override the
auto-detected node name.

Permissions and major/minor number are now verified on the devnode
itself, not on symlinks.

This allows separate testing for devnodes and symlinks an a follow-up
patch.
Expected device node names are now tested automatically. Use exp_name
only for symlinks in the test definitions, to avoid confusion and duplicated
tests.
Test if symlinks are created correctly by comparing the symlink
targets to the devnode path. This implies (for the symlink) that
major/minor numbers and permissions are correct, as we have tested
that on the devnode already.
Instead of testing the existence or non-exisitence of just a single
symlink, allow testing of several links per device.
Change the test definitions to work with the previous patch.
udev hasn't supported renaming device nodes for some time.
the "last_rule" option hasn't been supported for some time.
Therefore this test fails if a "not_exp_links" attribute is added,
as it should be. Mark it appropriately.
Add some rules that make it a bit harder to pass, mainly the
non-existence checks.
These rules have survived from an ancient version of the code
and save no purpose any more.
As we can check multiple links in a single test now, these 3
tests can be merged into one.
As we can test multiple devices and multiple links per device
in one test now, these two tests can be merged into one.
This is helpful to catch possible regressions in the test.
Also, don't count wait() errors, they are likely not udev errors.
Add 4 new tests using multiple devices. Number 2-4 use many
devices claiming the same symlink, where only one device has
a higher priority thatn the others. They fail sporadically with
the current code, if a race condition causes the symlink to point
to the wrong device. Test 4 is like test 2 with sleeps in between,
it's much less likely to fail.
for easier reproduction of sporadic test failures.
Manually listing all devices in the test definition becomes cumbersome with
lots of devices. Add a function that scans on all block devices in
the test sysfs and generates a list of devices to test.
Script for generating LOTS of SCSI disk and partition devices in
the fake sysfs we use for udev testing.

This script is supposed to be run after sys-script.py. It uses
code from sys-script.py as template to generate additional SCSI disk data
structures. Together with the "generator" code in udev-test.pl
added in the previous patch, it allows to run udev tests with almost
arbitrarily many devices, and thus to do performance scaling tests.
Copy link
Copy Markdown
Member

@keszybz keszybz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't locate IPC/SysV.pm in @INC (you may need to install the IPC::SysV module) (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5) at /builddir/build/BUILD/systemd-239/test/udev-test.pl line 23.

It seems we need to tweak our CI, otherwise we can't even check this properly.

{
devpath => "/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda1",
exp_name => "sda1" ,
exp_rem_error => "yes",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this one is actually the single (?) case where devices has two items. It'd be great to make this a separate commit, so it is not buried in all the other trivial cases where devices has one item.

@mwilck
Copy link
Copy Markdown
Contributor Author

mwilck commented Apr 26, 2018

It seems we need to tweak our CI, otherwise we can't even check this properly.

Oops - IPC::SysV is a "core" Perl module, I didn't expect that to cause problems. That's not the only module I needed to add; there's also POSIX, IPC:Semaphore, Time:HiRes, and Cwd. They are all core modules.

Copy link
Copy Markdown
Member

@keszybz keszybz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests fail for me after 2d27a67 'add multiple device test' (sometimes), and then the next commit is d299247 'add repeat count' after which the tests fail reliably.


$sema->remove;
print "$error errors occurred\n\n";
print "$error errors occurred. $good good results.\n\n";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make sense to go one step further, and verify that good is above some threshold (and that threshold would have to be adjusted when tests are added or removed).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an additional patch for the "expected good" count.

@keszybz
Copy link
Copy Markdown
Member

keszybz commented Apr 26, 2018

I added IPC::SysV to Fedora CI in https://src.fedoraproject.org/rpms/systemd/c/1eb8bc72df8f4dccff1e094b11eefce8adda896f?branch=rpm-master. The other ones might be pulled in by something else. Let's add them if it turns out that they are missing. If you push again, Fedora CI will run.

The failures on Ubuntu seem to be caused by something different.

autopkgtest [21:19:55]: test udev: [-----------------------
umount: test/tmpfs: no mount point specified.
Can't exec "/usr/bin/mknod": No such file or directory at ./udev-test.pl line 2254.
...
91 errors occurred. 6539 good results.

@mwilck
Copy link
Copy Markdown
Contributor Author

mwilck commented Apr 26, 2018

The tests fail for me after 2d27a67 'add multiple device test' (sometimes), and then the next commit is d299247 'add repeat count' after which the tests fail reliably.

Right. That's because these tests show a deficiency of the current udev code. The failure is sporadic, therefore adding the repeat count increases the failure probability. Merging #8667 should fix this. I kept the two pull requests separate on purpose for now.

umount: test/tmpfs: no mount point specified.

That happens all the time. AFAIK it's not caused by my patches, but I'll double-check.

Can't exec "/usr/bin/mknod": No such file or directory at ./udev-test.pl line 2254

OK, it's /bin/mknod on Ubuntu. I'll fix that.

mwilck added 2 commits April 26, 2018 13:25
umount emits an error message "no mount point specified" if the
tmpfs isn't mounted yet, which is the normal case.
Suppress that by redirecting stderr.
On some distros, mknod isn't under /usr/bin. Rely on PATH instead.

Fixes: e4c1f9a  "test/udev-test.pl: create /dev/null"
@mwilck
Copy link
Copy Markdown
Contributor Author

mwilck commented Apr 26, 2018

I just pushed fixes for the mknod and umount issues you mentioned.

Wrt the "multiple-devices" test, I need to understand the policy. If all tests are required to pass do to CI requirements, I can add "exp_err_" flags to these tests. We'd need to mark clearly that these errors are only "expected" because we know the current udev code is buggy.

Since 'test/udev-test.pl: count "good" results', we know how many
checks succeeded. Add an "expected good" count to make that number
more meaningful.
@mwilck
Copy link
Copy Markdown
Contributor Author

mwilck commented Apr 30, 2018

@keszybz, anything else I can do to get this merged?

@evverx
Copy link
Copy Markdown
Contributor

evverx commented May 6, 2018

Oops - IPC::SysV is a "core" Perl module, I didn't expect that to cause problems.

Fedora, CentOS and RH are unusual in the sense that an additional package has to be installed there to get the core Perl modules:

, so in practice, any use of any module might cause the test to fail immediately. I think it would make sense to check if all the modules used in test-udev.pl are installed and skip the test if they aren't.

Two of the tests added in the "add multiple device test" commit do FAIL sporadically with the current code because of the mentioned race condition.

Would it be better to move the test cases related to #8667 there so that the addition of the expected errors could be avoided?

@mwilck
Copy link
Copy Markdown
Contributor Author

mwilck commented May 16, 2018

I think it would make sense to check if all the modules used in test-udev.pl are installed and skip the test if they aren't.

That's dangerous for a regression test. The current behavior - fail the test, and require one-time developer action in the CI setup - seems to be safer to me. For the IPC::SysV module, that has happened already according to @keszybz' comment above, so AFAICS there's no further action needed, right?

Would it be better to move the test cases related to #8667 there so that the addition of the expected errors could be avoided?

I could do that. I can also disable the tests, or modify the test to temporarily ignore certain failures and spit out a warning. I was wondering whether systemd might have an established policy for cases like this.

What exactly is the recommendation?

@evverx
Copy link
Copy Markdown
Contributor

evverx commented May 16, 2018

That's dangerous for a regression test.

I think it depends on the perception of the test. If it's expected that it has to fail no matter what, then checks like b1ffacb should be removed as well.

For the IPC::SysV module, that has happened already according to @keszybz' comment above, so AFAICS there's no further action needed, right?

As far as I can see, perl(IPC::SysV) was added as a build dependency. It is a great shortcut, which made the test work on Fedora Rawhide CI x86_64. To make it more robust, though, perl should be installed as described in https://fedoraproject.org/wiki/Changes/perl_Package_to_Install_Core_Modules.

What exactly is the recommendation?

In my opinion this PR is just a part of #8667, so I think it would make sense to move all the commits there and continue the discussion there too. It would automatically resolve all the issues related to whether the test should be disabled due to the expected failures or not.

@evverx
Copy link
Copy Markdown
Contributor

evverx commented May 16, 2018

I was wondering whether systemd might have an established policy for cases like this.

The only case like this I was able to find is #5520, where the test has been waiting for the bug to be fixed. Apparently, it will be merged as soon as it doesn't fail.

@mwilck
Copy link
Copy Markdown
Contributor Author

mwilck commented May 16, 2018

@evverx,

To make it more robust, though, perl should be installed as described in https://fedoraproject.org/wiki/Changes/perl_Package_to_Install_Core_Modules.

I don't think I can help with that.

Apparently, it will be merged as soon as it doesn't fail.

OK, I'll work on it ASAP, currently busy with other stuff.

@mwilck
Copy link
Copy Markdown
Contributor Author

mwilck commented Jul 9, 2018

Superseded by #9551.

@mwilck mwilck closed this Jul 9, 2018
PizzaLovingNerd pushed a commit to risiIndustries/systemd-pkg that referenced this pull request Mar 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Development

Successfully merging this pull request may close these issues.

4 participants