Skip to content

handle worker crash#158

Merged
icemac merged 11 commits intopytest-dev:masterfrom
xoviat:crashitem
Sep 17, 2021
Merged

handle worker crash#158
icemac merged 11 commits intopytest-dev:masterfrom
xoviat:crashitem

Conversation

@xoviat
Copy link
Contributor

@xoviat xoviat commented Apr 23, 2021

@sallner What do you think about this?

@xoviat xoviat marked this pull request as draft April 23, 2021 21:26
@xoviat xoviat changed the title wip: handle worker crash handle worker crash Apr 24, 2021
@xoviat xoviat marked this pull request as ready for review April 24, 2021 18:46
@xoviat
Copy link
Contributor Author

xoviat commented Apr 27, 2021

Potentially, we can avoid using a database using the pytest_report_from_serializable hook in the controller to count failures, and re-configuring the failure history for tests. However, the number of reruns needs to be communicated to the controller before the test run is started. Otherwise, the controller will not be able to determine whether the crash should be re-run or not.

About the db approach: it's as good as it could be, but for the fact that it doesn't work when the nodes are executed on a remote server (as is possible with xdist).

@icemac
Copy link
Contributor

icemac commented Apr 28, 2021

This PR fails in all configurations, could you please have a look.

@xoviat
Copy link
Contributor Author

xoviat commented Apr 28, 2021

The reason for the failure is that it uses hooks from pytest-xdist.

@icemac
Copy link
Contributor

icemac commented May 4, 2021

The reason for the failure is that it uses hooks from pytest-xdist.

So a release of pytest-xdist is needed? But on the other hand pytest-xdist is not a dependency for pytest-rerunfailures and I do not think that it should become one.

@xoviat
Copy link
Contributor Author

xoviat commented May 4, 2021

I still plan to complete this, but it will be delayed for a bit. The main problem is that the xdist controller needs to know whether a worker crash should be reported as a rerun or a failure, which requires the workers to communicate this information to the controller before running the test setup (so that if the test setup crashes, the controller can rerun it). Currently this is done through a file, but that appears to only work on windows and it doesn't work if the workers are remote. The new plan is set-up a simple key-value database on the controller that's stored in-memory.

xdist will not be a dependency in the final PR, but this functionality will be included if xdist is detected.

@icemac
Copy link
Contributor

icemac commented May 5, 2021

@xoviat Thank you for the clarification.

@xoviat
Copy link
Contributor Author

xoviat commented May 21, 2021

new plan: start socket server on controller, use as db:

https://stackoverflow.com/questions/7749341/basic-python-client-socket-example

@xoviat
Copy link
Contributor Author

xoviat commented Jun 2, 2021

@icemac I believe this may be close to complete. Do you have any questions about this, or should I clean up the commits?

@icemac
Copy link
Contributor

icemac commented Jun 3, 2021

@xoviat The last CI run was a crash. Each and every test failed and had to be killed by GHA after 6 hours. (This does not increase my confidence in this PR.)
What went wrong there and how are you planning to fix it?

@xoviat
Copy link
Contributor Author

xoviat commented Jun 3, 2021

The latest run passed (https://github.com/xoviat/pytest-rerunfailures/actions/runs/903077274)

2bff11b wasn't ready yet, I was just pushing changes so that I could see the diffs on github.

Copy link
Contributor

@icemac icemac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questions.

@icemac
Copy link
Contributor

icemac commented Jun 4, 2021

Additionally the linter tests fail.

There is one skipped test: Could it be that this is your newly written test?

@icemac icemac marked this pull request as draft June 4, 2021 06:33
@xoviat
Copy link
Contributor Author

xoviat commented Jun 4, 2021

There is one skipped test: Could it be that this is your newly written test?

The test is skipped because it requires pytest-xdist development branch. What would you like to do about this?

@xoviat
Copy link
Contributor Author

xoviat commented Jun 4, 2021

Again, to clarify, this adds new functionality that reruns a test that crashes a worker with pytest-xdist. If a crash occurs without pytest-xdist, there is nothing that can be done, therefore this code is a no-op without pytest-xdist (the next release as of this writing). However, this functionality is important enough to add because it allows a timed-out test (via pytest-timeout) to be rerun (which cannot be currently done).

@icemac
Copy link
Contributor

icemac commented Jun 7, 2021

The test is skipped because it requires pytest-xdist development branch. What would you like to do about this?

Probably this PR could use that development branch for now until it is merged/released.

@xoviat
Copy link
Contributor Author

xoviat commented Jun 7, 2021

@icemac Is this a good summary of required changes?

  1. add explanation for set/get.
  2. add xdist development branch to tests

@icemac
Copy link
Contributor

icemac commented Jun 8, 2021

@xoviat additionally I'd request:

  1. fix the linting issues
  2. in case of not using xdist make the StatusDB as minimal as possible

@xoviat xoviat marked this pull request as ready for review June 8, 2021 20:25
@xoviat
Copy link
Contributor Author

xoviat commented Jun 8, 2021

Anything else?

@icemac
Copy link
Contributor

icemac commented Jun 9, 2021

The linter still fails, see https://github.com/pytest-dev/pytest-rerunfailures/pull/158/checks?check_run_id=2780766837
You can run the linter tests locally by installing pre-commit.

if a  test crashes the worker while rerunfailures
is running with a supported xdist, then
the test will now be rerun.

this new behavior:
1. requires pytest 6 or newer and a supported xdist
2. doesn't increase the number of times a worker can crash
3. allows rerunning a timed-out test when
 combined with pytest-timeout
@xoviat
Copy link
Contributor Author

xoviat commented Jun 14, 2021

cc @icemac

@xoviat
Copy link
Contributor Author

xoviat commented Jun 18, 2021

ping @icemac

@icemac
Copy link
Contributor

icemac commented Jun 22, 2021

The checks are green now, currently I do not have the time and energy for a final review of this PR. (Maybe @sallner can take over.)
Additionally we need a release of pytest-xdist before this PR can be merged.

@icemac icemac requested a review from sallner June 22, 2021 06:23
@xoviat
Copy link
Contributor Author

xoviat commented Sep 10, 2021

@icemac Is there an update on this? I'd be willing to resolve the merge conflicts if I could get an indication that this was moving forward.

@icemac
Copy link
Contributor

icemac commented Sep 13, 2021

@xoviat Thank you for coming back to this PR. Has the needed xdist release already been created?

@xoviat
Copy link
Contributor Author

xoviat commented Sep 13, 2021

Yes

@icemac
Copy link
Contributor

icemac commented Sep 15, 2021

Additionally there is a problem the linter shows up.

@xoviat
Copy link
Contributor Author

xoviat commented Sep 16, 2021

@icemac Fixes appear to work on my branch. To be clear, pytest-xdist is not required with this pr, but it is required if you want to run all of the tests (in this repository, not in general).

Copy link
Contributor

@icemac icemac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I will merge it after the CI checks have passed.

Copy link
Contributor

@icemac icemac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just before merging I saw that documentation and a change log entry are missing.

Could you please add them? (in README.rst resp. CHANGES.rst)

@xoviat
Copy link
Contributor Author

xoviat commented Sep 16, 2021

@icemac Done.

@icemac icemac merged commit 51c8d36 into pytest-dev:master Sep 17, 2021
@icemac
Copy link
Contributor

icemac commented Sep 17, 2021

Thank you for this PR. 😃

@icemac
Copy link
Contributor

icemac commented Sep 17, 2021

I just created a release including this PR: https://pypi.org/project/pytest-rerunfailures/10.2/

@xoviat xoviat deleted the crashitem branch September 17, 2021 13:12
andrea-segalini added a commit to andrea-segalini/pytest-rerunfailures that referenced this pull request Oct 19, 2023
We tested pytest-timeout with xdist:
pytest             7.0.1
pytest-timeout     2.1.0
pytest-xdist       3.0.2
And the plugin reruns the failed tests without the additional machinery.
Since we don't really need to handle the general case of a worker crash
(tests should never cause that), we can revert the commit and simply
the design.

This reverts commit 51c8d36.
andrea-segalini added a commit to andrea-segalini/pytest-rerunfailures that referenced this pull request Oct 19, 2023
We tested pytest-timeout with xdist:
pytest             7.0.1
pytest-timeout     2.1.0
pytest-xdist       3.0.2
And the plugin reruns the failed tests without the additional machinery.
Since we don't really need to handle the general case of a worker crash
(tests should never cause that), we can revert the commit and simply
the design.

This reverts commit 51c8d36.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants