Slower performance when passing numpy array into task versus list. #1813

@robertnishihara

import numpy as np
import ray
import time

ray.init()

@ray.remote
class Foo(object):
    def method(self, x):
        return x

a = Foo.remote()
x = np.random.rand(10, 10).tolist()

time.sleep(1)  # Wait for the actor to start.

start = time.time()
for i in range(1000):
    ray.get(a.method.remote(x))
print("Using list: ", time.time() - start)

x = np.random.rand(10, 10)

start = time.time()
for i in range(1000):
    ray.get(a.method.remote(x))
print("Using numpy array: ", time.time() - start)

On my laptop, this prints

Using list:  0.6015908718109131
Using numpy array:  0.895500898361206

The numpy array case is slower (presumably because the array does not get inlined in the task specification and goes through the object store instead).

Proposal:

  1. Allow small numpy arrays to be inlined in the task specification.
  2. Raise the size limit so that larger arguments can be inlined in tasks.
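
To make the proposal concrete, here is a rough sketch of a size-based inlining rule that treats numpy arrays and lists the same way. It is not Ray's actual code: `should_inline`, `serialized_size`, and `INLINE_THRESHOLD_BYTES` are hypothetical names, the 4096-byte cutoff is an arbitrary placeholder, and `pickle` stands in for whatever serializer Ray really uses.

```python
import pickle

import numpy as np

# Hypothetical cutoff chosen for illustration; not a value taken from Ray.
INLINE_THRESHOLD_BYTES = 4096


def serialized_size(value):
    """Return the pickled size of value in bytes (pickle is a stand-in serializer)."""
    return len(pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL))


def should_inline(value, threshold=INLINE_THRESHOLD_BYTES):
    """Decide by serialized size alone, regardless of the Python type."""
    return serialized_size(value) <= threshold


small = np.random.rand(10, 10)      # 800 bytes of float64 data
large = np.random.rand(1000, 1000)  # 8,000,000 bytes of float64 data

print("small array inlined:", should_inline(small))
print("small list inlined: ", should_inline(small.tolist()))
print("large array inlined:", should_inline(large))
```

With a rule like this, the 10x10 array from the benchmark would take the same path as its list equivalent, and only genuinely large arguments would go through the object store.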

Potential Issues:

  1. The larger the tasks are, the sooner Redis will run out of memory (at least until we start flushing keys from Redis).
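
For a sense of scale, a back-of-the-envelope estimate of the extra data that inlining would add. This is only a sketch: it counts the raw array bytes, ignores serialization and task-spec overhead, and assumes every task spec stays in Redis until it is flushed. `inlined_bytes` is a hypothetical helper, not part of Ray.

```python
import numpy as np


def inlined_bytes(num_tasks, payload_bytes):
    """Raw argument bytes retained if every task inlines one payload."""
    return num_tasks * payload_bytes


x = np.random.rand(10, 10)
payload = x.nbytes  # 10 * 10 float64 values * 8 bytes each

for num_tasks in (1_000, 1_000_000):
    total_mb = inlined_bytes(num_tasks, payload) / 1e6
    print(f"{num_tasks} tasks: {total_mb:.1f} MB of inlined argument data")
```

So the array in the benchmark above costs very little per task, but the total grows linearly with the number of tasks submitted.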

cc @jsuarez5341
