
[Object spilling] Add policy to automatically spill objects on OutOfMemory #11673

Merged
ericl merged 18 commits into ray-project:master from stephanie-wang:automatic-spill on Nov 2, 2020


Conversation

@stephanie-wang
Contributor

Why are these changes needed?

This adds a callback that chooses objects to spill when an object store node runs out of memory, after first attempting to make space by evicting objects that are not currently referenced. Since spilling is asynchronous, the object store client must try to create the object again. Once the object spilling is complete, the object creation will succeed.
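The flow above can be illustrated with a minimal sketch. This is not Ray's actual API; `FakeObjectStore`, `spill_objects_callback`, and `create_with_retry` are hypothetical names standing in for the plasma store, the new spill callback, and the client's retry loop:

```python
# Hypothetical sketch of the create/spill/retry flow, not Ray's actual code.
import time


class FakeObjectStore:
    def __init__(self, capacity, spill_objects_callback):
        self.capacity = capacity
        self.used = 0
        self.spill_objects_callback = spill_objects_callback

    def create(self, size):
        if self.used + size > self.capacity:
            # Spilling is asynchronous: request space, then fail this attempt.
            self.spill_objects_callback(size)
            return False
        self.used += size
        return True


def create_with_retry(store, size, retries=10, delay_s=0.01):
    # The client retries the create call until spilling has freed space.
    for _ in range(retries):
        if store.create(size):
            return True
        time.sleep(delay_s)
    return False
```

In the real system the callback kicks off spilling on IO workers; here a test can simply free bytes synchronously to exercise the retry path.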

A TODO is to modify the object store to respond to the client asynchronously. Then, in the case where we can definitely make enough space by spilling other objects, the client would not have to retry the create call on a timer, and the object store would not be blocked while the objects are being spilled.

This PR also introduces some changes to the way configs are passed around the cluster to accommodate passing around the object spilling config, which is a JSON string. Long-term, we should have a less brittle way to pass around arbitrary config values.

Related issue number

Closes #9849.

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@ericl
Contributor

ericl commented Oct 29, 2020

I think there are some missing BUILD files (doesn't compile).

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Oct 29, 2020
@stephanie-wang stephanie-wang removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Oct 29, 2020
// TODO: Only respond to the client with OutOfMemory if we could not
// make enough space through spilling. If we could make enough space,
// respond to the plasma client once spilling is complete.
static_cast<void>(spill_objects_callback_(space_needed));
Contributor


I might misunderstand here, but doesn't this mean the raylet can try creating or pulling new objects "before" all objects are actually spilled, which could lead to OOM in some scenarios?

Contributor


Or is it handled from the object store side?

Contributor Author


Yes, it's possible that if the spilling is too slow, the client will still receive an OOM even though enough space will be made eventually.

// See the License for the specific language governing permissions and
// limitations under the License.

namespace ray {
Contributor


#include <functional>?

@ericl
Contributor

ericl commented Oct 29, 2020

This is pretty cool. One thing I noticed is that

import json
import numpy as np
import ray

ray.init(
    object_store_memory=100 * 1024 * 1024,
    _system_config={
        "automatic_object_spilling_enabled": True,
        "object_spilling_config": json.dumps(
            {"type": "filesystem", "params": {"directory_path": "/tmp/spill"}},
            separators=(",", ":")
        )
    },
)

@ray.remote
def f():
    return np.zeros(10 * 1024 * 1024)

ids = []
for _ in range(10):
    x = f.remote()
    ids.append(x)

for x in ids:
    print(ray.get(x).shape)

This kind of workload will hang because the retries do not wait long enough. I guess this will be addressed with the async RPC, but in the interim, can we return a special return code that allows indefinite retries with a low delay (e.g., retry every 10 ms)? This would allow spilling to be used for real workloads with pretty good performance.

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Oct 29, 2020
stephanie-wang and others added 6 commits October 29, 2020 16:24
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
@stephanie-wang
Contributor Author

This kind of workload will hang because the retries do not wait long enough. I guess this will be addressed with the async RPC, but in the interim, can we return a special return code that allows indefinite retries with a low delay (e.g., retry every 10 ms)? This would allow spilling to be used for real workloads with pretty good performance.

Hmm so I tried this and it actually still has the same problem. I added a new error code for the object store being transiently out of memory, and the client will retry quickly after receiving this error code. This is to prevent indefinite retries in the case where two clients need to create an object and there are pending spills, but only one of the clients will be able to create its object after the spills complete. If we just did an infinite retry loop on the second client, we could hang and never throw OutOfMemory.

There are two issues in the script you posted, both pretty subtle:

  1. Since there are multiple clients trying to create an object at the same time, only one of them will succeed after an object has been spilled. But then since all the clients retry at around the same time, the others retry while the one that succeeded is still creating its object (so its object can't be evicted or spilled). I managed to get this to work by resetting the normal OutOfMemory retries if the client ever receives the transient error, but I don't think that will work in all cases.
  2. During the script, there is one client (the driver) trying to get objects. With only one IO worker, this will hang because the IO worker will try to restore the driver's object, but that requires the other object to be spilled, which also requires an IO worker. I confirmed that it works once max IO workers is set to 2.

I'll push what I have, but it seems like there's a lot more to do here.
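The retry policy described above can be sketched as follows. The names (`TRANSIENT_OOM`, `create_with_retry`) are hypothetical, not Ray's actual identifiers: a transient error code triggers a fast retry and resets the hard-OOM budget, while a hard OOM consumes it.

```python
# Hypothetical sketch of the retry policy described above, not Ray's code.
import time

OK = "OK"
TRANSIENT_OOM = "TransientOutOfMemory"  # spill in progress; retry quickly
HARD_OOM = "OutOfMemory"                # no space and nothing left to spill


def create_with_retry(try_create, oom_retries=3, delay_s=0.01):
    remaining = oom_retries
    while remaining > 0:
        status = try_create()
        if status == OK:
            return True
        if status == TRANSIENT_OOM:
            # A pending spill may free space (possibly for another client):
            # reset the hard-OOM budget and retry after a short delay.
            remaining = oom_retries
        else:
            remaining -= 1
        time.sleep(delay_s)
    return False
```

Note the caveat from the comment above: if the transient error keeps recurring (e.g., another client always wins the freed space), this loop never raises a hard OutOfMemory, so resetting the budget does not work in all cases.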

@stephanie-wang stephanie-wang removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Oct 31, 2020
}

int64_t LocalObjectManager::SpillObjectsOfSize(int64_t num_bytes_required) {
if (RayConfig::instance().object_spilling_config().empty() ||
Contributor


Btw, aren't the plasma store and object store running in a separate thread? If so, don't we need to lock this method (since it is called within store.cc)?

Contributor Author


Oh hmm I think you're right about that. I'll add a lock. It seems weird that this wasn't triggered in any of the tests yet.
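The locking being discussed can be illustrated with a small Python sketch. The structure is hypothetical (the real code is C++ in `local_object_manager.cc`); the point is that `spill_objects_of_size` may be called from the plasma store's thread, so the map of pinned objects must be guarded:

```python
# Hypothetical illustration of guarding the spill entry point with a lock.
import threading


class LocalObjectManager:
    def __init__(self):
        self._lock = threading.Lock()
        self._pinned = {}  # object_id -> size in bytes

    def pin_object(self, object_id, size):
        with self._lock:
            self._pinned[object_id] = size

    def spill_objects_of_size(self, num_bytes_required):
        # May be called from the plasma store thread, so take the lock
        # before touching the pinned-object map.
        with self._lock:
            freed = 0
            for object_id in list(self._pinned):
                if freed >= num_bytes_required:
                    break
                freed += self._pinned.pop(object_id)
            return freed
```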

@rkooo567
Contributor

rkooo567 commented Nov 2, 2020

@stephanie-wang I don't have enough familiarity with the codebase, so this may not be possible, but why don't we just queue up the creation requests and invoke them after the object spilling requests are completed? Is it tricky to implement?

@stephanie-wang
Contributor Author

@stephanie-wang I don't have enough familiarity with the codebase, so this may not be possible, but why don't we just queue up the creation requests and invoke them after the object spilling requests are completed? Is it tricky to implement?

It is possible, and I think we should do it, but it's a bit complicated to do right now without refactoring the plasma store. I thought it'd be better to leave it for a separate PR.
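The queueing alternative discussed here could look roughly like the following sketch. All names (`QueueingStore`, `on_spill_complete`, `request_spill`) are hypothetical; the real change would require the plasma store refactoring mentioned above:

```python
# Hypothetical sketch of queueing creates until spilling frees space.
from collections import deque


class QueueingStore:
    def __init__(self, capacity, request_spill):
        self.capacity = capacity
        self.used = 0
        self.request_spill = request_spill
        self.pending = deque()  # (size, on_ready) waiting for space

    def create(self, size, on_ready):
        if self.used + size <= self.capacity:
            self.used += size
            on_ready()
        else:
            # Queue the request instead of failing it; answer it later.
            self.pending.append((size, on_ready))
            self.request_spill(size)

    def on_spill_complete(self, freed_bytes):
        self.used = max(0, self.used - freed_bytes)
        # Serve queued creates now that space has been freed.
        while self.pending and self.used + self.pending[0][0] <= self.capacity:
            size, on_ready = self.pending.popleft()
            self.used += size
            on_ready()
```

This removes the client-side retry timer entirely: the store answers each create exactly once, when space is actually available.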


Development

Successfully merging this pull request may close these issues.

[Object Spilling] Automatically spill objects on OutOfMemory

4 participants