Are reference cycles a performance problem? #4987

@mrocklin

@gjoseph92 noticed that, under some profiling conditions, turning off garbage collection had a significant impact on scheduler performance. I'm including some of his notes in the summary below.

Notes from Gabe

See #4825 for initial discussion of the problem. It also comes up on #4881 (comment).

I've also run these with GC debug mode on (gjoseph92/dask-profiling-coiled@c0ea2aa1) and looked at the GC logs. Interestingly, GC debug mode generally reports GC as taking zero time:

gc: done, 0 unreachable, 0 uncollectable, 0.0000s elapsed

Some of those logs are here: https://rawcdn.githack.com/gjoseph92/dask-profiling-coiled/61fc875173a5b2f9195346f2a523cb1d876c48ad/results/cython-shuffle-gc-debug-noprofiling-ecs-prod-nopyspy.txt?raw=true
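Since the debug log reports 0.0000s, an independent measurement of each collection could help cross-check it. Here is a minimal sketch using `gc.callbacks` from the standard library. The `GCTimer` class and the synthetic workload are hypothetical, for illustration only, and are not part of distributed:

```python
import gc
import time


class GCTimer:
    """Hypothetical helper: record the wall-clock duration of each collection."""

    def __init__(self):
        self.pauses = []  # (generation, seconds, objects collected)
        self._start = None

    def __call__(self, phase, info):
        # The interpreter calls every entry in gc.callbacks with
        # phase "start" before a collection and "stop" after it.
        if phase == "start":
            self._start = time.perf_counter()
        elif phase == "stop" and self._start is not None:
            elapsed = time.perf_counter() - self._start
            self.pauses.append((info["generation"], elapsed, info["collected"]))
            self._start = None


timer = GCTimer()
gc.callbacks.append(timer)
try:
    # Synthetic workload: 1000 self-referential lists that only the
    # cyclic collector can free.
    junk = None
    for _ in range(1000):
        junk = []
        junk.append(junk)
    del junk
    gc.collect()
finally:
    gc.callbacks.remove(timer)

total_seconds = sum(seconds for _, seconds, _ in timer.pauses)
print(f"{len(timer.pauses)} collections, {total_seconds:.6f}s total")
```

In a real run the callback would stay registered for the whole computation, and the per-generation totals could then be compared with what the profiler attributes to GC.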

The types of objects being listed as collectable are interesting (cells, frames, tracebacks, asyncio Futures/Tasks, SelectorKey) since those are the sorts of things you might expect to create cycles. It's also interesting that there are already ~150k objects in the third (oldest) generation before the computation has even started, and ~300k (and growing) once it's been running for a little bit.
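A census like the one described above can be taken from inside the process. This is a rough sketch, assuming Python 3.8+ (where `gc.get_objects` accepts a `generation` argument). The function name `oldest_generation_census` is made up for this example:

```python
import gc
from collections import Counter


def oldest_generation_census(top=10):
    """Count the objects tracked in the oldest GC generation, grouped by type."""
    # A full collection first, so that surviving objects have had a chance
    # to be promoted into the oldest generation.
    gc.collect()
    objs = gc.get_objects(generation=2)
    counts = Counter(type(o).__name__ for o in objs)
    return len(objs), counts.most_common(top)


total, most_common = oldest_generation_census(top=5)
print(f"{total} objects in the oldest generation")
for type_name, n in most_common:
    print(f"{n:>8}  {type_name}")
```

Calling this once before the computation starts and again while it is running would show which types account for the growth.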

I've also tried:

  • turning off statistical profiling
  • turning off the bokeh dashboard
  • using uvloop instead of native asyncio

But none of those affected the issue.

What I wanted to do next was use refcycle or objgraph or a similar tool to try to see what's causing the cycles. Or possibly use tracemalloc + GC hooks to try to log where the objects that were being collected were initially created.
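One possible shape for the tracemalloc idea, as a sketch rather than a tested recipe for the scheduler: set `gc.DEBUG_SAVEALL` so that unreachable objects are parked in `gc.garbage` instead of being freed, then ask tracemalloc where each one was allocated. This uses the debug flag rather than `gc.callbacks`. The names `cyclic_garbage_origins`, `Node`, and `make_cycles` are hypothetical, and the frame indexing assumes Python 3.7+ (frames ordered oldest to most recent):

```python
import gc
import tracemalloc
from collections import Counter


def cyclic_garbage_origins(workload, nframes=5, top=5):
    """Run workload() and report which source lines allocated cyclic garbage."""
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start(nframes)
    gc.collect()  # start from a clean baseline
    old_flags = gc.get_debug()
    gc.set_debug(gc.DEBUG_SAVEALL)  # keep unreachable objects in gc.garbage
    try:
        workload()
        gc.collect()
        origins = Counter()
        for obj in gc.garbage:
            tb = tracemalloc.get_object_traceback(obj)
            if tb is not None:
                frame = tb[-1]  # most recent frame (Python 3.7+ ordering)
                origins[(frame.filename, frame.lineno)] += 1
        return origins.most_common(top)
    finally:
        gc.set_debug(old_flags)
        gc.garbage.clear()
        if not was_tracing:
            tracemalloc.stop()


class Node:
    """Toy object used only to create cycles for the demonstration."""

    def __init__(self):
        self.other = None


def make_cycles():
    for _ in range(100):
        a, b = Node(), Node()
        a.other, b.other = b, a  # two-object reference cycle


result = cyclic_garbage_origins(make_cycles)
for (filename, lineno), n in result:
    print(f"{n:>6}  {filename}:{lineno}")
```

Tracing every allocation is expensive, so this is likely only practical on a short, representative slice of the workload.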

I notice that we have reference cycles in our scheduler state:

In [1]: from dask.distributed import Client
In [2]: client = Client()
In [3]: import dask.array as da
In [4]: x = da.random.random((1000, 1000)).sum().persist()
In [5]: s = client.cluster.scheduler
In [6]: a, b = s.tasks.values()

In [7]: a
Out[7]: <TaskState "('sum-aggregate-832c859ad539eafe39d0e7207de9f1e7',)" memory>

In [8]: b
Out[8]: <TaskState "('random_sample-sum-sum-aggregate-832c859ad539eafe39d0e7207de9f1e7',)" released>

In [9]: a in b.dependents
Out[9]: True
In [10]: b in a.dependencies
Out[10]: True

Should we be concerned about our use of reference cycles?
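To make the concern concrete, here is a minimal, self-contained sketch of the same dependencies/dependents pattern. `ToyTaskState` is a hypothetical stand-in, not the real `TaskState` class. It shows that objects linked this way are not freed by reference counting alone and have to wait for the cyclic collector:

```python
import gc
import weakref


class ToyTaskState:
    """Hypothetical stand-in for the scheduler's TaskState (not the real class)."""

    def __init__(self, key):
        self.key = key
        self.dependencies = set()
        self.dependents = set()


def build_cycle():
    a = ToyTaskState("sum-aggregate")
    b = ToyTaskState("random_sample")
    a.dependencies.add(b)  # a -> b
    b.dependents.add(a)    # b -> a, closing the cycle
    # Return only weak references, so nothing outside the cycle keeps
    # the two objects alive once this function returns.
    return weakref.ref(a), weakref.ref(b)


was_enabled = gc.isenabled()
gc.disable()  # keep automatic collections from interfering with the demo
try:
    ref_a, ref_b = build_cycle()
    # Reference counting alone cannot free the pair: each still holds the other.
    alive_before = ref_a() is not None and ref_b() is not None
    gc.collect()
    alive_after = ref_a() is not None or ref_b() is not None
finally:
    if was_enabled:
        gc.enable()

print(alive_before, alive_after)  # True False
```

Every task that ends up in a cycle like this adds to the set of objects the collector has to traverse, which is one plausible way the cycles could tie into the GC cost discussed above.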

cc @jakirkham @pitrou
