
pure delayed gives wrong result for dataclass methods #8376

@kdebrab


What happened:

from dask import delayed
from dataclasses import dataclass, field


@dataclass(frozen=True)
class A:
    param: float = field(repr=False)

    def get_param(self):
        return self.param

    def get_delayed_param(self, *args, **kwargs):
        return delayed(self.get_param, pure=True)(*args, **kwargs)

(A(1).get_delayed_param() - A(0).get_delayed_param()).compute()
Out[2]: 0

This result is incorrect: the expected value is 1.

Apparently, A(1).get_delayed_param().key is erroneously the same as A(0).get_delayed_param().key. It seems that tokenize fails and falls back on something that gives the same result for both instances, possibly the string representation, since str(A(1)) == str(A(0)).
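To make the suspicion above concrete, here is a minimal sketch using only the standard library (no dask involved). It shows that the two instances, and their bound methods, have identical string forms even though the objects themselves compare unequal. The names a0 and a1 are only for illustration, and this does not by itself confirm that dask falls back on str() here; it only shows why such a fallback would produce colliding keys.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class A:
    # repr=False excludes `param` from the generated __repr__,
    # and therefore from str() as well.
    param: float = field(repr=False)

    def get_param(self):
        return self.param


a0, a1 = A(0), A(1)

# The string forms are identical because `param` is hidden from the repr.
print(str(a0) == str(a1))                      # True

# The repr of a bound method embeds the repr of its instance,
# so the bound methods look identical as strings too.
print(str(a0.get_param) == str(a1.get_param))  # True

# Equality still takes `param` into account (compare=True by default),
# so the instances themselves are distinguishable.
print(a0 == a1)                                # False
```

Any hashing scheme that only looks at the string form of self.get_param would therefore be unable to tell A(0) and A(1) apart.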

What you expected to happen:
The correct result is obtained without delayed:

A(1).get_param() - A(0).get_param()
Out[3]: 1

One also gets the correct result when not using dataclasses:

class B:

    def __init__(self, param):
        self.param = param

    def __repr__(self):
        return 'B()'

    def get_param(self):
        return self.param

    def get_delayed_param(self, *args, **kwargs):
        return delayed(self.get_param, pure=True)(*args, **kwargs)

(B(1).get_delayed_param() - B(0).get_delayed_param()).compute()
Out[4]: 1

Here, we get the correct result even though str(B(1)) == str(B(0)).

Environment:

  • Dask version: 2021.10.0
  • Python version: 3.9.4.final.0
  • Operating System: Windows 10
  • Install method (conda, pip, source): pip
