Fix race condition between test and supervisor restart #530

exarkun · 2022-08-11T16:09:15Z

until polls with fairly low resolution. If the supervisor managed to
receive exit notification and launch a new process between two consecutive
polls for is_running() == False the test would miss the stopped condition
entirely and never complete (or rather, time out after 10 seconds).

Now just wait for the supervisor to report the pid has changed and assert the
process ends in the desire state (running).

`until` polls with fairly low resolution. If the supervisor managed to receive exit notification and launch a new process between two consecutive polls for `is_running() == False` the test would miss the stopped condition entirely and never complete (or rather, time out after 10 seconds). Now just wait for the supervisor to report the pid has changed and assert the process ends in the desire state (running).

crwood · 2022-08-18T13:57:23Z

If the supervisor managed to receive exit notification and launch a new process between two consecutive
polls for is_running() == False the test would miss the stopped condition entirely and never complete (or rather, time out after 10 seconds).

The default restart_delay value of 1 is intended to guard against this (i.e., by enforcing a wait of restart_delay amount of seconds before re-launching), however, since that's merely a default -- and since there's no mechanism/check in place to ensure that the value of restart_delay is greater than the until polling resolution -- I'm fine with this change.

Now just wait for the supervisor to report the pid has changed and assert the process ends in the desire state (running).

I recall reading somewhere that, on Windows, PIDs can be recycled immediately (in which case this approach would not allow us to reliably determine whether the process has been re-started) however a) I can't seem to find an authoritative source for that claim now and b) having this test in place might help to us to demonstrate whether it's true -- so thanks!

exarkun · 2022-08-18T15:28:45Z

I recall reading somewhere that, on Windows, PIDs can be recycled immediately (in which case this approach would not allow us to reliably determine whether the process has been re-started) however a) I can't seem to find an authoritative source for that claim now and b) having this test in place might help to us to demonstrate whether it's true -- so thanks!

Ah, this is a good point. PID re-use is certainly a thing. I don't know how likely it is that a PID gets re-used in this particular scenario but maybe it's not impossible.

Sometimes folks use the combination of (pid, start time) as a more robust unique identifier. We could consider doing that kind of thing around here (maybe exposing this functionality behind another Supervisor API), especially if this new version of the test also proves to be unreliable.

crwood merged commit 28ecf7f into gridsync:master Aug 18, 2022

exarkun deleted the more-reliable-supervisor-kill-test branch August 18, 2022 15:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix race condition between test and supervisor restart #530

Fix race condition between test and supervisor restart #530

Uh oh!

exarkun commented Aug 11, 2022

Uh oh!

crwood commented Aug 18, 2022

Uh oh!

exarkun commented Aug 18, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix race condition between test and supervisor restart #530

Fix race condition between test and supervisor restart #530

Uh oh!

Conversation

exarkun commented Aug 11, 2022

Uh oh!

crwood commented Aug 18, 2022

Uh oh!

exarkun commented Aug 18, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants