We have set xfail_strict = true in setup.cfg, which, in theory, should turn xpass results into failures. However, while testing 3.1RC on my Windows 10 machine, the run reported 2 xpassed but did not turn them into failures (as shown in pytest-dev/pytest#1386).
Maybe this would naturally be solved if we move away from using setup.py test anyway? 🤷‍♀️
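One thing that might be worth ruling out first: when pytest takes its configuration from setup.cfg, it reads the [tool:pytest] section. Below is a rough sketch (not our actual config, and xfail_strict_setting is just a hypothetical helper name) that checks whether a setup.cfg-style text has xfail_strict in that section. It only inspects the file text with the standard library; it does not tell us what pytest actually picked up at run time.

```python
import configparser


def xfail_strict_setting(cfg_text):
    """Return the raw xfail_strict value from setup.cfg-style text, or None.

    Hypothetical helper for illustration only: it looks solely at the
    [tool:pytest] section, which is where pytest reads its options
    when the configuration lives in setup.cfg.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    if parser.has_section("tool:pytest"):
        return parser.get("tool:pytest", "xfail_strict", fallback=None)
    return None


# Assumed example content, not copied from our repository.
example = """
[tool:pytest]
xfail_strict = true
"""

print(xfail_strict_setting(example))
```

If this returns None for our real setup.cfg, the option is probably sitting in a section pytest does not read from that file; if it returns a value, the cause is somewhere else (for example, how setup.py test invokes pytest).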