Add a lock to distributed.profile for better concurrency control#6421
Add a lock to distributed.profile for better concurrency control#6421
distributed.profile for better concurrency control#6421Conversation
| } | ||
|
|
||
|
|
||
| _watch_running: set[int] = set() |
There was a problem hiding this comment.
Removed in favor of the lock to avoid inconsistencies.
| if time() > last + cycle: | ||
| if time() > last + cycle: | ||
| recent = create() | ||
| with lock: |
There was a problem hiding this comment.
Should we move this down to just above the try: block? My sense is that we're really trying to lock around the creation and deletion of the frame object.
There was a problem hiding this comment.
I believe for what we're looking for we need to lock for the entire context the frame exists, i.e. if external code has the lock, there is no frame.
In the context of GC, the frame is the offender that holds references to some objects we want to ensure are collected.
There was a problem hiding this comment.
I think it's a trade-off between more accurate timestamps and less locking.
My thought was that in case we have some thread hogging the lock for a bit (e.g., a particularly expensive run of garbage collection), this would ensure that timestamps are only created once we have actually acquired the lock. However, in practice, this should be an edge case.
At the same time, the reduction in lock time that we gain from moving the lock should be marginal, so I think I'd prefer more accurate timestamps. What's your take on this?
There was a problem hiding this comment.
My sense is that we're really trying to lock around the creation and deletion of the frame object.
The issue I found in #6033 (comment) was not that the profiler was keeping hold of the frame object it, it appeared as if the GIL was released while the thread owned the frame from the main_thread.
try:
# thread_id is for some <Thread(IO loop, started daemon 140502607329024)>
frame = sys._current_frames() [thread_id]
# ~~~~~~GIL released here~~~~^ ??
except KeyError:
return
There was a problem hiding this comment.
1 0 RESUME 0
2 2 NOP
4 4 LOAD_GLOBAL 0 (sys)
16 LOAD_METHOD 1 (_current_frames)
38 PRECALL 0
42 CALL 0
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ profile thread owns all frames
52 LOAD_FAST 0 (thread_id)
54 BINARY_SUBSCR
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
64 STORE_FAST 1 (frame)
66 LOAD_CONST 0 (None)
68 RETURN_VALUE
>> 70 PUSH_EXC_INFO
6 72 LOAD_GLOBAL 4 (KeyError)
84 CHECK_EXC_MATCH
86 POP_JUMP_FORWARD_IF_FALSE 4 (to 96)
88 POP_TOP
7 90 POP_EXCEPT
92 LOAD_CONST 0 (None)
94 RETURN_VALUE
6 >> 96 RERAISE 0
>> 98 COPY 3
100 POP_EXCEPT
102 RERAISE 1
There was a problem hiding this comment.
See #6364 (comment) for the motivation to do this. We essentially want a means to avoid that a profiler holds references to frames while gc.collect() runs. I wasn't aware of #6033 before, so I'm not sure if this is related to the particular issue you found there.
Unit Test Results 15 files + 15 15 suites +15 6h 15m 19s ⏱️ + 6h 15m 19s For more details on these failures, see this check. Results for commit 1974e26. ± Comparison against base commit d84485b. ♻️ This comment has been updated with latest results. |
|
|
|
This seems like an improvement to me. Merging. |
|
OK, no worries. It was a small comment, please don't trouble yourself with
it. +1 from me
…On Mon, May 23, 2022 at 12:52 PM Hendrik Makait ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In distributed/profile.py
<#6421 (comment)>:
>
while not stop():
- _watch_running.add(watch_id)
- try:
- if time() > last + cycle:
+ if time() > last + cycle:
+ recent = create()
+ with lock:
I think it's a trade-off between more accurate timestamps and less locking.
My thought was that in case we have some thread hogging the lock for a bit
(e.g., a particularly expensive run of garbage collection), this would
ensure that timestamps are only created once we have actually acquired the
lock. In practice, this should be an edge case.
On the other hand, the reduction in lock time should also be marginal, so
I think I'd prefer more accurate timestamps. What's your take on this?
—
Reply to this email directly, view it on GitHub
<#6421 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTEWAS3ZMQJKEMFVGJLVLPAVPANCNFSM5WWJ6NMA>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Adds a
Locktodistributed.profileto enable better concurrency control. In particular, it allows running garbage collection without a profiling thread holding references to objects, which is necessary for #6250.pre-commit run --all-files