Currently, the integration tests determine success by checking whether a hardcoded number of wheels was built: https://github.com/joerick/cibuildwheel/blob/master/bin/run_test.py#L25.
Instead, each test should be able to define its own success condition, independently of the other tests.
See also #101 (comment) for more details.
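
As a rough illustration of the per-test condition proposed above, here is a minimal sketch. The helper names `built_wheels` and `assert_wheel_count` are hypothetical and not part of cibuildwheel or `run_test.py`; the sketch also assumes that each test knows the directory its wheels were written to.

```python
import os


def built_wheels(output_dir):
    """Return the sorted file names of all wheels found in output_dir.

    Hypothetical helper, not an existing cibuildwheel function.
    """
    return sorted(name for name in os.listdir(output_dir) if name.endswith('.whl'))


def assert_wheel_count(output_dir, expected):
    """Fail if the number of wheels in output_dir differs from expected.

    Hypothetical helper: each test would call this with its own expected
    value, so the runner no longer needs one shared hardcoded number.
    """
    wheels = built_wheels(output_dir)
    assert len(wheels) == expected, (
        'expected %d wheels, found %d: %s' % (expected, len(wheels), wheels)
    )
```

A test that skips some builds could then call `assert_wheel_count(output_dir, n)` with its own `n`, and a test that cares about something other than the count could inspect the list returned by `built_wheels` directly.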