Thread-safety improvements. Add tests with multiple threads. CI improvements.#122
Thread-safety improvements. Add tests with multiple threads. CI improvements.#122JamesWrigley merged 16 commits intomasterfrom
Conversation
|
The test failures are coming from replacing I think it's best to be conservative with this update so I'm going to revert that commit. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #122 +/- ##
==========================================
- Coverage 79.29% 79.19% -0.10%
==========================================
Files 10 10
Lines 1922 1923 +1
==========================================
- Hits 1524 1523 -1
- Misses 398 400 +2 ☔ View full report in Codecov by Sentry. |
2be833c to
877be4f
Compare
|
As discussed offline, added back the commits replacing |
|
Again discussed on slack. We decided to revert to |
This should fix an exception seen in CI from the lingering timeout task:
```
Test Summary: | Pass Total Time
Deserialization error recovery and include() | 11 11 3.9s
From worker 4: Unhandled Task ERROR: EOFError: read end of file
From worker 4: Stacktrace:
From worker 4: [1] wait
From worker 4: @ .\asyncevent.jl:159 [inlined]
From worker 4: [2] sleep(sec::Float64)
From worker 4: @ Base .\asyncevent.jl:265
From worker 4: [3] (::DistributedNext.var"#34#37"{DistributedNext.Worker, Float64})()
From worker 4: @ DistributedNext D:\a\DistributedNext.jl\DistributedNext.jl\src\cluster.jl:213
```
a866f6b to
c09b034
Compare
f6f1d38 to
190a6b6
Compare
e7212a9 to
dcd89eb
Compare
ddb8ad4 to
836bed1
Compare
3d6f794 to
54575e2
Compare
54575e2 to
e28698f
Compare
| function start_gc_msgs_task() | ||
| errormonitor( | ||
| Threads.@spawn begin | ||
| @async begin |
There was a problem hiding this comment.
This change fixed the race (#124). It seems to have always been a @spawn hence why I didn't initially convert it to @async.
e28698f to
ca2bd3c
Compare
IanButterworth
left a comment
There was a problem hiding this comment.
Approving to help the process along given review is required for merge
This cuts the running time down from ~1m to ~10s. It shouldn't be necessary to re-create the workers for every iteration of the inner loop.
f189b1c to
8494bbd
Compare
Co-authored by @IanButterworth
@spawnto@asyncto get tests passing. Moving back to@spawnwill take more work, but it should be low priority because the main thread, where Distributed usually runs, will already bestickyanyway, so the gain is minor.fbf8c12(#122) is necessary to avoid each worker in those tests re-precompiling any stdlibs they use. Speeds up macos runs by ~1 minute97fc237(#122) adds a sigusr1/siginfo profile before killing a worker that won't shut down, to aid debuggingFixes #124
TODO
@async)CC @jpsamaroo