Add internal_api.global_gc() method, which triggers gc.collect() on all workers #7327
ericl merged 10 commits into ray-project:master from
Can one of the admins verify this patch?
// Trigger local GC at the next heartbeat interval.
if (heartbeat_data.should_global_gc()) {
  should_local_gc_ = true;
}
Could we just DoLocalGC here and not have the should_local_gc_ flag?
Oh never mind, I guess we might receive more than one HeartbeatAdded per interval?
Yeah. Actually I need to change the other site to set the flag too.
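The reason for the flag can be modeled in a few lines: HeartbeatAdded may fire several times per interval (once per remote heartbeat that carries should_global_gc), so the handler only records the request and the periodic heartbeat performs at most one local GC. This is a minimal Python sketch under that assumption, not Ray's C++ code; the class RayletGCState and the methods on_heartbeat_added, heartbeat_tick and do_local_gc are hypothetical stand-ins for HeartbeatAdded, Heartbeat and DoLocalGC.

```python
class RayletGCState:
    """Hypothetical model of the raylet's local-GC flag (not Ray's actual code)."""

    def __init__(self):
        self.should_local_gc = False
        self.local_gc_count = 0

    def on_heartbeat_added(self, should_global_gc: bool) -> None:
        # Hypothetical analogue of HeartbeatAdded: may run several times per
        # interval, so it only records that a GC was requested.
        if should_global_gc:
            self.should_local_gc = True

    def heartbeat_tick(self) -> None:
        # Hypothetical analogue of Heartbeat(): runs once per interval and
        # performs at most one local GC, then clears the flag.
        if self.should_local_gc:
            self.do_local_gc()
            self.should_local_gc = False

    def do_local_gc(self) -> None:
        # Stand-in for sending a local GC RPC to every worker on this node.
        self.local_gc_count += 1


raylet = RayletGCState()
for _ in range(3):  # three remote heartbeats request GC within one interval
    raylet.on_heartbeat_added(should_global_gc=True)
raylet.heartbeat_tick()  # one local GC for all three requests
raylet.heartbeat_tick()  # no new requests, so no further GC
print(raylet.local_gc_count)  # 1
```

Calling do_local_gc directly inside on_heartbeat_added would instead run three GCs in that interval, which is the situation raised in the question above.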
src/ray/raylet/node_manager.cc
RAY_LOG(WARNING) << "Broadcasting global GC request to all raylets.";
should_global_gc_ = true;
// We won't see our own request, so trigger local GC immediately too.
DoLocalGC();
Could move this to Heartbeat(), or just set should_local_gc_, so that we also throttle if there are a bunch of global GC requests from all local workers.
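The suggestion above can be sketched as follows: each request from a local worker only sets flags, and the heartbeat sends at most one broadcast and runs at most one local GC per interval, however many workers called global_gc(). This is a hypothetical Python model of the reviewer's proposal, not the merged C++ code; RayletGlobalGC, handle_global_gc_request and the heartbeat dict are illustrative names, and local_gcs stands in for DoLocalGC().

```python
class RayletGlobalGC:
    """Hypothetical model of a raylet handling global GC requests from local workers."""

    def __init__(self):
        self.should_global_gc = False
        self.should_local_gc = False
        self.broadcasts = 0
        self.local_gcs = 0

    def handle_global_gc_request(self) -> None:
        # Called once per worker request. Setting flags instead of calling
        # DoLocalGC() directly collapses a burst of requests into one action.
        self.should_global_gc = True
        # This raylet won't see its own broadcast, so it schedules a local GC too.
        self.should_local_gc = True

    def heartbeat(self) -> dict:
        # Runs once per heartbeat interval; the returned dict models the
        # heartbeat data broadcast to the other raylets.
        heartbeat_data = {"should_global_gc": self.should_global_gc}
        if self.should_global_gc:
            self.broadcasts += 1
            self.should_global_gc = False
        if self.should_local_gc:
            self.local_gcs += 1  # stand-in for DoLocalGC()
            self.should_local_gc = False
        return heartbeat_data


raylet = RayletGlobalGC()
for _ in range(5):  # five local workers call global_gc() in the same interval
    raylet.handle_global_gc_request()
first = raylet.heartbeat()
second = raylet.heartbeat()
print(first, second, raylet.broadcasts, raylet.local_gcs)
# {'should_global_gc': True} {'should_global_gc': False} 1 1
```

With the immediate DoLocalGC() call in the excerpt above, the same five requests would trigger five local GCs in one interval.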
  repeated ObjectReferenceCount borrowed_refs = 1;
}
message LocalGCRequest {
Suggested change:
- message LocalGCRequest {
+ message TriggerLocalGCRequest {
I probably won't change this, it doesn't seem that much more clear.
That's fine, it just bothers me because the other RPCs start with a verb.
Test FAILed.
Test FAILed.
Test PASSed.
Test FAILed.
Why are these changes needed?
This adds a global_gc method, which triggers gc.collect() on all workers to collect cyclic object references. It can be called when there is object store memory pressure to trigger the release of distributed object references.

It works like this:

1. When global_gc() is called, the local raylet sets the should_global_gc flag in its heartbeat, which is broadcast to all other raylets.
2. When a raylet sees a heartbeat with should_global_gc set, it sets a local should_local_gc flag.
3. When a raylet has should_local_gc set, it sends an RPC to all workers at the next heartbeat.

This effectively throttles worker GC to once per raylet heartbeat (100 ms), no matter how often global_gc is called across the cluster.
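Why workers need an explicit gc.collect() at all: CPython frees most objects by reference counting, but objects that reference each other in a cycle keep their counts above zero and survive until the cyclic collector runs. This self-contained CPython illustration uses no Ray APIs; Node is a hypothetical stand-in for a Python object that holds a reference (for example, to a distributed object), and automatic collection is disabled only so that the demo is deterministic.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for an object holding a reference to something else."""

    def __init__(self):
        self.other = None


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # reference cycle: refcounts never drop to zero
    return weakref.ref(a)    # weak reference lets us observe when `a` is freed


gc.disable()  # keep the automatic cyclic collector from running during the demo
try:
    ref = make_cycle()
    alive_before = ref() is not None  # still alive after leaving make_cycle()
    gc.collect()  # what each worker runs when it receives the local GC RPC
    alive_after = ref() is not None   # the cycle has now been reclaimed
finally:
    gc.enable()

print(alive_before, alive_after)  # True False
```

In a worker, anything such a cycle still references stays alive until a collection like this runs, which is why the request is fanned out to every worker rather than relying on reference counting alone.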