Skip to content

Conversation

@jamesob
Copy link
Contributor

@jamesob jamesob commented Apr 27, 2018

Add the option (--failfast) to stop the functional test runner's execution when it encounters the first failure.

Also cleans up run_test's arguments list (no more mutable default for args) and call site.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can keep this in one line, so that it doesn't break when you e.g. grep for run_tests.

@jamesob jamesob force-pushed the 2018-04-testrunner-failfast branch 3 times, most recently from c70ec95 to 86dc5af Compare April 27, 2018 17:28
@jamesob
Copy link
Contributor Author

jamesob commented Apr 27, 2018

@MarcoFalke thanks for the look. I've addressed the formatting nit and the process cleanup issue you pointed out.

@theuni
Copy link
Member

theuni commented Apr 27, 2018

Concept ACK. Any reason not to enable this for Travis?

@jamesob jamesob force-pushed the 2018-04-testrunner-failfast branch from 86dc5af to fe66ff6 Compare April 27, 2018 18:13
@jamesob
Copy link
Contributor Author

jamesob commented Apr 27, 2018

@theuni good point; I guess that'd encourage a shift in compute consumption onto developers who write buggy code vs. scarce Travis resources. Will add.

jamesob added 2 commits April 27, 2018 15:01
Also cleans up run_test's arguments list (no more mutable default for `args`)
and call site.
@jamesob jamesob force-pushed the 2018-04-testrunner-failfast branch from b1f64a5 to 58f9a0a Compare April 27, 2018 19:01
@fanquake fanquake added the Tests label Apr 27, 2018
@laanwj
Copy link
Member

laanwj commented Apr 28, 2018

Concept ACK

@practicalswift
Copy link
Contributor

Concept ACK

Very nice!

@laanwj
Copy link
Member

laanwj commented Apr 30, 2018

Tested ACK 58f9a0a
I made the first test that runs here (wallet_hd.py) fail.
Without --failfast it continues after the error.
With --failfast it stopped immediately.

@laanwj laanwj merged commit 58f9a0a into bitcoin:master Apr 30, 2018
laanwj added a commit that referenced this pull request Apr 30, 2018
58f9a0a Use --failfast when running functional tests on Travis (James O'Beirne)
bf720c1 Add --failfast option to functional test runner (James O'Beirne)

Pull request description:

  Add the option (`--failfast`) to stop the functional test runner's execution when it encounters the first failure.

  Also cleans up run_test's arguments list ([no more mutable default for `args`](http://docs.python-guide.org/en/latest/writing/gotchas/#mutable-default-arguments)) and call site.

Tree-SHA512: e854b1b1634bf613ae8ae88e715df1460982fa68db9d785aafeb5eccf5bf324c7f20dded2ca6840ebf18a28347ecac2138d6c7592507b34939b02609ef55e1b3
codablock pushed a commit to codablock/dash that referenced this pull request Jan 7, 2020
…nner

58f9a0a Use --failfast when running functional tests on Travis (James O'Beirne)
bf720c1 Add --failfast option to functional test runner (James O'Beirne)

Pull request description:

  Add the option (`--failfast`) to stop the functional test runner's execution when it encounters the first failure.

  Also cleans up run_test's arguments list ([no more mutable default for `args`](http://docs.python-guide.org/en/latest/writing/gotchas/#mutable-default-arguments)) and call site.

Tree-SHA512: e854b1b1634bf613ae8ae88e715df1460982fa68db9d785aafeb5eccf5bf324c7f20dded2ca6840ebf18a28347ecac2138d6c7592507b34939b02609ef55e1b3
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Apr 20, 2020
Summary:
58f9a0a Use --failfast when running functional tests on Travis (James O'Beirne)
bf720c1 Add --failfast option to functional test runner (James O'Beirne)

Pull request description:

  Add the option (`--failfast`) to stop the functional test runner's execution when it encounters the first failure.

  Also cleans up run_test's arguments list ([no more mutable default for `args`](http://docs.python-guide.org/en/latest/writing/gotchas/#mutable-default-arguments)) and call site.

Tree-SHA512: e854b1b1634bf613ae8ae88e715df1460982fa68db9d785aafeb5eccf5bf324c7f20dded2ca6840ebf18a28347ecac2138d6c7592507b34939b02609ef55e1b3

Partial backport of Core [[bitcoin/bitcoin#13105 | PR13105]]

This backport only includes some of the --failfast behavior since our test_runner has diverged so much.
This patch implements --failfast in a way that does not kill currently running tests like the original PR does.
This is due to a limitation of the thread-level parallelization that we implemented in test_runner.

Though the documentation on these matters is extremely scattered, I will attempt to sum up my conclusions:

Threading implementation (current) + modifications that match this backport:
* Attempting to kill running tests either fails completely or leaves child bitcoind processes running.
  My understanding is this is due to a limitation in the way Python handles threads that spawn processes.

Multiprocessing implementation:
* I attempted a full migration from Threading to Multiprocessing to alleviate the above.
* Despite documentation indicating that killing a Multiprocess will also kill it's children, this behavior
  does not appear to work as intended. Some example issues that have persisted even in recent Python
  versions that might be related:
    https://bugs.python.org/issue9205
    https://bugs.python.org/issue22393
  While it's difficult to say exactly which of these issues is related to the root cause, it appears that
  we will not find a stable version of Python that supports our Multiprocessing needs in the near future.
  It also looks like I'm not the only one with such frustration:
    https://stackoverflow.com/questions/1408356/keyboard-interrupts-with-pythons-multiprocessing-pool#comment101399084_1408476
    (note the timeline of the entire conversation: 2010 - 2019)
* Despite getting a mostly-working implementation using various workarounds, I ran into all sorts of signal
  passing issues where SIGKILL, SIGTERM, and SIGINT would not behave as expected when these signals originated
  from worker processes.  This was especially annoying, as it effectively had the same behavior as the Threading
  implementation when you attempt to Ctrl+C test_runner: leaving dangling bitcoind processes for an unknown
  amount of time after test_runner had quit.

After much experimentation, it appears there is no stable way to kill the currently running tests with widely
available Python versions (my current is 3.7.3).  Even if we could get something to work, it is clear to me
that the maintenance cost of doing so just to pull in --failfast is not worth it.  We should expect the
default to be that the test suite passes, so I'm not concerned with a small performance hit on --failfast.

It also appears that Core's test_runner runs tests in parallel now, so doing a migration to a more up-to-date
version of their test_runner is likely our best course of action in the long run.

Pulling in this patch sooner rather than later will help improve UX of our future land bot when test failures
occur by reducing patch queuing time.

Test Plan:
Add `assert True == False` to some test(s).
```
test_runner.py
test_runner.py --failfast
```
Remove the assert from those tests and re-run the above for sanity.

Reviewers: #bitcoin_abc, deadalnix

Reviewed By: #bitcoin_abc, deadalnix

Subscribers: deadalnix

Differential Revision: https://reviews.bitcoinabc.org/D5742
ftrader pushed a commit to bitcoin-cash-node/bitcoin-cash-node that referenced this pull request Aug 17, 2020
Summary:
58f9a0a Use --failfast when running functional tests on Travis (James O'Beirne)
bf720c1 Add --failfast option to functional test runner (James O'Beirne)

Pull request description:

  Add the option (`--failfast`) to stop the functional test runner's execution when it encounters the first failure.

  Also cleans up run_test's arguments list ([no more mutable default for `args`](http://docs.python-guide.org/en/latest/writing/gotchas/#mutable-default-arguments)) and call site.

Tree-SHA512: e854b1b1634bf613ae8ae88e715df1460982fa68db9d785aafeb5eccf5bf324c7f20dded2ca6840ebf18a28347ecac2138d6c7592507b34939b02609ef55e1b3

Partial backport of Core [[bitcoin/bitcoin#13105 | PR13105]]

This backport only includes some of the --failfast behavior since our test_runner has diverged so much.
This patch implements --failfast in a way that does not kill currently running tests like the original PR does.
This is due to a limitation of the thread-level parallelization that we implemented in test_runner.

Though the documentation on these matters is extremely scattered, I will attempt to sum up my conclusions:

Threading implementation (current) + modifications that match this backport:
* Attempting to kill running tests either fails completely or leaves child bitcoind processes running.
  My understanding is this is due to a limitation in the way Python handles threads that spawn processes.

Multiprocessing implementation:
* I attempted a full migration from Threading to Multiprocessing to alleviate the above.
* Despite documentation indicating that killing a Multiprocess will also kill it's children, this behavior
  does not appear to work as intended. Some example issues that have persisted even in recent Python
  versions that might be related:
    https://bugs.python.org/issue9205
    https://bugs.python.org/issue22393
  While it's difficult to say exactly which of these issues is related to the root cause, it appears that
  we will not find a stable version of Python that supports our Multiprocessing needs in the near future.
  It also looks like I'm not the only one with such frustration:
    https://stackoverflow.com/questions/1408356/keyboard-interrupts-with-pythons-multiprocessing-pool#comment101399084_1408476
    (note the timeline of the entire conversation: 2010 - 2019)
* Despite getting a mostly-working implementation using various workarounds, I ran into all sorts of signal
  passing issues where SIGKILL, SIGTERM, and SIGINT would not behave as expected when these signals originated
  from worker processes.  This was especially annoying, as it effectively had the same behavior as the Threading
  implementation when you attempt to Ctrl+C test_runner: leaving dangling bitcoind processes for an unknown
  amount of time after test_runner had quit.

After much experimentation, it appears there is no stable way to kill the currently running tests with widely
available Python versions (my current is 3.7.3).  Even if we could get something to work, it is clear to me
that the maintenance cost of doing so just to pull in --failfast is not worth it.  We should expect the
default to be that the test suite passes, so I'm not concerned with a small performance hit on --failfast.

It also appears that Core's test_runner runs tests in parallel now, so doing a migration to a more up-to-date
version of their test_runner is likely our best course of action in the long run.

Pulling in this patch sooner rather than later will help improve UX of our future land bot when test failures
occur by reducing patch queuing time.

Test Plan:
Add `assert True == False` to some test(s).
```
test_runner.py
test_runner.py --failfast
```
Remove the assert from those tests and re-run the above for sanity.

Reviewers: #bitcoin_abc, deadalnix

Reviewed By: #bitcoin_abc, deadalnix

Subscribers: deadalnix

Differential Revision: https://reviews.bitcoinabc.org/D5742
gades pushed a commit to cosanta/cosanta-core that referenced this pull request Jun 30, 2021
…nner

58f9a0a Use --failfast when running functional tests on Travis (James O'Beirne)
bf720c1 Add --failfast option to functional test runner (James O'Beirne)

Pull request description:

  Add the option (`--failfast`) to stop the functional test runner's execution when it encounters the first failure.

  Also cleans up run_test's arguments list ([no more mutable default for `args`](http://docs.python-guide.org/en/latest/writing/gotchas/#mutable-default-arguments)) and call site.

Tree-SHA512: e854b1b1634bf613ae8ae88e715df1460982fa68db9d785aafeb5eccf5bf324c7f20dded2ca6840ebf18a28347ecac2138d6c7592507b34939b02609ef55e1b3
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants