Skip to content

ChildPidWatcher: fix a race condition when unregistering a pid#2814

Merged
sporksmith merged 1 commit intoshadow:mainfrom
sporksmith:childpidwatcher-unreg2
Mar 29, 2023
Merged

ChildPidWatcher: fix a race condition when unregistering a pid#2814
sporksmith merged 1 commit intoshadow:mainfrom
sporksmith:childpidwatcher-unreg2

Conversation

@sporksmith
Copy link
Copy Markdown
Contributor

We were getting rare failures in CI unit tests, and consistent failures in large benchmarks. I was able to reproduce with:

for i in `seq 1000`; do
  echo trial $i
  if ! RUST_BACKTRACE=full cargo test --manifest-path=src/Cargo.toml -p shadow-rs childpid_watcher ; then
    break
  fi
done

I think what was happening is that we could get an epoll event for a dying process in the worker thread just as we were in the middle of unregistering it from another thread.

The simplest fix is to just let the worker thread handle unregistration so that it's serialized with other operations in that thread.

We were getting rare failures in CI unit tests, and consistent failures
in large benchmarks. I was able to reproduce with:

```
for i in `seq 1000`; do
  echo trial $i
  if ! RUST_BACKTRACE=full cargo test --manifest-path=src/Cargo.toml -p shadow-rs childpid_watcher ; then
    break
  fi
done
```

I think what was happening is that we could get an epoll event for a
dieing process in the worker thread just as we were in the middle of
unregistering it from another thread.

The simplest fix is to just let the worker thread handle unregistration
so that it's serialized with other operations in that thread.
@sporksmith sporksmith self-assigned this Mar 29, 2023
@github-actions github-actions bot added the Component: Main Composing the core Shadow executable label Mar 29, 2023
@sporksmith sporksmith requested a review from stevenengler March 29, 2023 15:03
@sporksmith
Copy link
Copy Markdown
Contributor Author

Actually I'm going to take a look at a more thorough revision moving all the "work" into the worker thread ...

@sporksmith
Copy link
Copy Markdown
Contributor Author

Though maybe it makes sense to get this in now since it's causing the regular CI to be flaky in addition to breaking the runner

@sporksmith sporksmith merged commit b43386c into shadow:main Mar 29, 2023
@sporksmith sporksmith deleted the childpidwatcher-unreg2 branch March 29, 2023 16:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Component: Main Composing the core Shadow executable

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants