To reproduce, run: python -m ray.experimental.shuffle --num-partitions=1000 --partition-size=1e6
In `top`, you'll see that the raylet and the owner process each end up using >10GB of heap memory. This is very unexpected, since these processes should (1) only be storing metadata, and (2) the amount of "real data" in the benchmark above is only 1GB (1000 partitions x 1e6 bytes each).
There might be some memory leak or other unexpected issue here.
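
For reference, here is a rough sketch of the arithmetic behind the 1GB figure. It assumes `--partition-size` is given in bytes, and the function and variable names are just illustrative, not part of Ray:

```python
def expected_data_bytes(num_partitions: int, partition_size: float) -> int:
    """Total payload the benchmark should move, assuming partition_size is in bytes."""
    return int(num_partitions * partition_size)


# Values taken from the repro command above.
total_bytes = expected_data_bytes(num_partitions=1000, partition_size=1e6)

# Lower bound on the heap usage observed in top (>10GB), in bytes.
observed_heap_bytes = 10 * 10**9

# How many times larger the observed heap is than the real data.
overhead_ratio = observed_heap_bytes / total_bytes

print(f"expected data: {total_bytes / 1e9:.1f} GB")
print(f"observed heap is at least {overhead_ratio:.0f}x the data size")
```

So even if these processes held a full copy of the data, that would not account for the usage seen; metadata alone should be far smaller.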